diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md deleted file mode 100644 index 098f2c8f3c836a998d78f7607787b7476108a57e..0000000000000000000000000000000000000000 --- a/.github/ISSUE_TEMPLATE.md +++ /dev/null @@ -1 +0,0 @@ -## Please let us know which model this issue is about (specify the top-level directory) diff --git a/ISSUE_TEMPLATE.md b/ISSUE_TEMPLATE.md new file mode 100644 index 0000000000000000000000000000000000000000..4da144cdd9a2b61aa9a136faa639554e12f89de5 --- /dev/null +++ b/ISSUE_TEMPLATE.md @@ -0,0 +1,37 @@ +Please go to Stack Overflow for help and support: + +http://stackoverflow.com/questions/tagged/tensorflow + +Also, please understand that many of the models included in this repository are experimental and research-style code. If you open a GitHub issue, here is our policy: + +1. It must be a bug or a feature request. +2. The form below must be filled out. + +**Here's why we have that policy**: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow. 
+ +------------------------ + +### System information +- **What is the top-level directory of the model you are using**: +- **Have I written custom code (as opposed to using a stock example script provided in TensorFlow)**: +- **OS Platform and Distribution (e.g., Linux Ubuntu 16.04)**: +- **TensorFlow installed from (source or binary)**: +- **TensorFlow version (use command below)**: +- **Bazel version (if compiling from source)**: +- **CUDA/cuDNN version**: +- **GPU model and memory**: +- **Exact command to reproduce**: + +You can collect some of this information using our environment capture script: + +https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh + +You can obtain the TensorFlow version with + +python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)" + +### Describe the problem +Describe the problem clearly here. Be sure to convey here why it's a bug in TensorFlow or a feature request. + +### Source code / logs +Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem. diff --git a/README.md b/README.md index 1b81c6ddbc28c07a4ce632f617dd447b6c8318c1..50f5cebd324f94edd31139801306bd7fd22f89b6 100644 --- a/README.md +++ b/README.md @@ -2,22 +2,39 @@ This repository contains machine learning models implemented in [TensorFlow](https://tensorflow.org). The models are maintained by their -respective authors. +respective authors. To propose a model for inclusion, please submit a pull +request. -To propose a model for inclusion please submit a pull request. +Currently, the models are compatible with TensorFlow 1.0 or later. If you are +running TensorFlow 0.12 or earlier, please +[upgrade your installation](https://www.tensorflow.org/install). 
## Models -- [autoencoder](autoencoder) -- various autoencoders -- [differential_privacy](differential_privacy) -- privacy-preserving student models from multiple teachers -- [im2txt](im2txt) -- image-to-text neural network for image captioning. -- [inception](inception) -- deep convolutional networks for computer vision -- [namignizer](namignizer) -- recognize and generate names -- [neural_gpu](neural_gpu) -- highly parallel neural computer -- [neural_programmer](neural_programmer) -- neural network augmented with logic and mathematic operations. -- [resnet](resnet) -- deep and wide residual networks -- [slim](slim) -- image classification models in TF-Slim -- [swivel](swivel) -- the Swivel algorithm for generating word embeddings -- [syntaxnet](syntaxnet) -- neural models of natural language syntax -- [textsum](textsum) -- sequence-to-sequence with attention model for text summarization. -- [transformer](transformer) -- spatial transformer network, which allows the spatial manipulation of data within the network +- [adversarial_crypto](adversarial_crypto): protecting communications with adversarial neural cryptography. +- [adversarial_text](adversarial_text): semi-supervised sequence learning with adversarial training. +- [attention_ocr](attention_ocr): a model for real-world image text extraction. +- [autoencoder](autoencoder): various autoencoders. +- [cognitive_mapping_and_planning](cognitive_mapping_and_planning): implementation of a spatial memory based mapping and planning architecture for visual navigation. +- [compression](compression): compressing and decompressing images using a pre-trained Residual GRU network. +- [differential_privacy](differential_privacy): privacy-preserving student models from multiple teachers. +- [domain_adaptation](domain_adaptation): domain separation networks. +- [im2txt](im2txt): image-to-text neural network for image captioning. +- [inception](inception): deep convolutional networks for computer vision. 
+- [learning_to_remember_rare_events](learning_to_remember_rare_events): a large-scale life-long memory module for use in deep learning. +- [lm_1b](lm_1b): language modeling on the one billion word benchmark. +- [namignizer](namignizer): recognize and generate names. +- [neural_gpu](neural_gpu): highly parallel neural computer. +- [neural_programmer](neural_programmer): neural network augmented with logic and mathematic operations. +- [next_frame_prediction](next_frame_prediction): probabilistic future frame synthesis via cross convolutional networks. +- [real_nvp](real_nvp): density estimation using real-valued non-volume preserving (real NVP) transformations. +- [resnet](resnet): deep and wide residual networks. +- [skip_thoughts](skip_thoughts): recurrent neural network sentence-to-vector encoder. +- [slim](slim): image classification models in TF-Slim. +- [street](street): identify the name of a street (in France) from an image using a Deep RNN. +- [swivel](swivel): the Swivel algorithm for generating word embeddings. +- [syntaxnet](syntaxnet): neural models of natural language syntax. +- [textsum](textsum): sequence-to-sequence with attention model for text summarization. +- [transformer](transformer): spatial transformer network, which allows the spatial manipulation of data within the network. +- [tutorials](tutorials): models described in the [TensorFlow tutorials](https://www.tensorflow.org/tutorials/). +- [video_prediction](video_prediction): predicting future video frames with neural advection. 
diff --git a/adversarial_crypto/README.md b/adversarial_crypto/README.md new file mode 100644 index 0000000000000000000000000000000000000000..504ca234bebeb71421128467e0eee3e172abcf6b --- /dev/null +++ b/adversarial_crypto/README.md @@ -0,0 +1,58 @@ +# Learning to Protect Communications with Adversarial Neural Cryptography + +This is a slightly updated version of the model used in the paper +["Learning to Protect Communications with Adversarial Neural +Cryptography"](https://arxiv.org/abs/1610.06918). + +> We ask whether neural networks can learn to use secret keys to protect +> information from other neural networks. Specifically, we focus on ensuring +> confidentiality properties in a multiagent system, and we specify those +> properties in terms of an adversary. Thus, a system may consist of neural +> networks named Alice and Bob, and we aim to limit what a third neural +> network named Eve learns from eavesdropping on the communication between +> Alice and Bob. We do not prescribe specific cryptographic algorithms to +> these neural networks; instead, we train end-to-end, adversarially. +> We demonstrate that the neural networks can learn how to perform forms of +> encryption and decryption, and also how to apply these operations +> selectively in order to meet confidentiality goals. + +This code allows you to train an encoder/decoder/adversary triplet +and evaluate their effectiveness on randomly generated input and key +pairs. + +## Prerequisites + +The only software requirement for running the encoder and decoder is having +TensorFlow installed. + +Requires TensorFlow r0.12 or later. + +## Training and evaluating + +After installing TensorFlow and ensuring that your paths are configured +appropriately: + +``` +python train_eval.py +``` + +This will begin training a fresh model. If and when the model becomes +sufficiently well trained, it will reset the Eve model multiple times +and retrain it from scratch, outputting the accuracy thus obtained +in each run. 
+ +## Model differences from the paper + +The model has been simplified slightly from the one described in +the paper: the convolutional layer width was reduced by a factor +of two. In the version in the paper, there was a nonlinear unit +after the fully-connected layer; that nonlinearity has been removed +here. These changes improve the robustness of training. The +initializer for the convolution layers has switched to the +tf.contrib.layers default of xavier_initializer instead of +a simpler truncated_normal. + +## Contact information + +This model repository is maintained by David G. Andersen +([dave-andersen](https://github.com/dave-andersen)). diff --git a/adversarial_crypto/train_eval.py b/adversarial_crypto/train_eval.py new file mode 100644 index 0000000000000000000000000000000000000000..09de7e513059028d4a5c2673fd72a1e880c2edec --- /dev/null +++ b/adversarial_crypto/train_eval.py @@ -0,0 +1,274 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Adversarial training to learn trivial encryption functions, +from the paper "Learning to Protect Communications with +Adversarial Neural Cryptography", Abadi & Andersen, 2016. + +https://arxiv.org/abs/1610.06918 + +This program creates and trains three neural networks, +termed Alice, Bob, and Eve. Alice takes inputs +in_m (message), in_k (key) and outputs 'ciphertext'. 
+ +Bob takes inputs in_k, ciphertext and tries to reconstruct +the message. + +Eve is an adversarial network that takes input ciphertext +and also tries to reconstruct the message. + +The main function attempts to train these networks and then +evaluates them, all on random plaintext and key values. + +""" + +# TensorFlow Python 3 compatibility +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +import signal +import sys +from six.moves import xrange # pylint: disable=redefined-builtin +import tensorflow as tf + +flags = tf.app.flags + +flags.DEFINE_float('learning_rate', 0.0008, 'Constant learning rate') +flags.DEFINE_integer('batch_size', 4096, 'Batch size') + +FLAGS = flags.FLAGS + +# Input and output configuration. +TEXT_SIZE = 16 +KEY_SIZE = 16 + +# Training parameters. +ITERS_PER_ACTOR = 1 +EVE_MULTIPLIER = 2 # Train Eve 2x for every step of Alice/Bob +# Train until either max loops or Alice/Bob "good enough": +MAX_TRAINING_LOOPS = 850000 +BOB_LOSS_THRESH = 0.02 # Exit when Bob loss < 0.02 and Eve > 7.7 bits +EVE_LOSS_THRESH = 7.7 + +# Logging and evaluation. +PRINT_EVERY = 200 # In training, log every 200 steps. +EVE_EXTRA_ROUNDS = 2000 # At end, train eve a bit more. +RETRAIN_EVE_ITERS = 10000 # Retrain eve up to ITERS*LOOPS times. +RETRAIN_EVE_LOOPS = 25 # With an evaluation each loop +NUMBER_OF_EVE_RESETS = 5 # And do this up to 5 times with a fresh eve. +# Use EVAL_BATCHES samples each time we check accuracy. +EVAL_BATCHES = 1 + + +def batch_of_random_bools(batch_size, n): + """Return a batch of random "boolean" numbers. + + Args: + batch_size: Batch size dimension of returned tensor. + n: number of entries per batch. + + Returns: + A [batch_size, n] tensor of "boolean" numbers, where each number is + represented as -1 or 1. 
+ """ + + as_int = tf.random_uniform( + [batch_size, n], minval=0, maxval=2, dtype=tf.int32) + expanded_range = (as_int * 2) - 1 + return tf.cast(expanded_range, tf.float32) + + +class AdversarialCrypto(object): + """Primary model implementation class for Adversarial Neural Crypto. + + This class contains the code for the model itself, + and when created, plumbs the pathways from Alice to Bob and + Eve, creates the optimizers and loss functions, etc. + + Attributes: + eve_loss: Eve's loss function. + bob_loss: Bob's loss function. Different units from eve_loss. + eve_optimizer: A tf op that runs Eve's optimizer. + bob_optimizer: A tf op that runs Bob's optimizer. + bob_reconstruction_loss: Bob's message reconstruction loss, + which is comparable to eve_loss. + reset_eve_vars: Execute this op to completely reset Eve. + """ + + def get_message_and_key(self): + """Generate random pseudo-boolean key and message values.""" + + batch_size = tf.placeholder_with_default(FLAGS.batch_size, shape=[]) + + in_m = batch_of_random_bools(batch_size, TEXT_SIZE) + in_k = batch_of_random_bools(batch_size, KEY_SIZE) + return in_m, in_k + + def model(self, collection, message, key=None): + """The model for Alice, Bob, and Eve. If key=None, the first FC layer + takes only the message as input. Otherwise, it uses both the key + and the message. + + Args: + collection: The graph keys collection to add new vars to. + message: The input message to process. + key: The input key (if any) to use. + """ + + if key is not None: + combined_message = tf.concat(axis=1, values=[message, key]) + else: + combined_message = message + + # Ensure that all variables created are in the specified collection. 
+ with tf.contrib.framework.arg_scope( + [tf.contrib.layers.fully_connected, tf.contrib.layers.conv2d], + variables_collections=[collection]): + + fc = tf.contrib.layers.fully_connected( + combined_message, + TEXT_SIZE + KEY_SIZE, + biases_initializer=tf.constant_initializer(0.0), + activation_fn=None) + + # Perform a sequence of 1D convolutions (by expanding the message out to 2D + # and then squeezing it back down). + fc = tf.expand_dims(fc, 2) + # 2,1 -> 1,2 + conv = tf.contrib.layers.conv2d( + fc, 2, 2, 2, 'SAME', activation_fn=tf.nn.sigmoid) + # 1,2 -> 1, 2 + conv = tf.contrib.layers.conv2d( + conv, 2, 1, 1, 'SAME', activation_fn=tf.nn.sigmoid) + # 1,2 -> 1, 1 + conv = tf.contrib.layers.conv2d( + conv, 1, 1, 1, 'SAME', activation_fn=tf.nn.tanh) + conv = tf.squeeze(conv, 2) + return conv + + def __init__(self): + in_m, in_k = self.get_message_and_key() + encrypted = self.model('alice', in_m, in_k) + decrypted = self.model('bob', encrypted, in_k) + eve_out = self.model('eve', encrypted, None) + + self.reset_eve_vars = tf.group( + *[w.initializer for w in tf.get_collection('eve')]) + + optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate) + + # Eve's goal is to decrypt the entire message: + eve_bits_wrong = tf.reduce_sum( + tf.abs((eve_out + 1.0) / 2.0 - (in_m + 1.0) / 2.0), [1]) + self.eve_loss = tf.reduce_sum(eve_bits_wrong) + self.eve_optimizer = optimizer.minimize( + self.eve_loss, var_list=tf.get_collection('eve')) + + # Alice and Bob want to be accurate... + self.bob_bits_wrong = tf.reduce_sum( + tf.abs((decrypted + 1.0) / 2.0 - (in_m + 1.0) / 2.0), [1]) + # ... and to not let Eve do better than guessing. + self.bob_reconstruction_loss = tf.reduce_sum(self.bob_bits_wrong) + bob_eve_error_deviation = tf.abs(float(TEXT_SIZE) / 2.0 - eve_bits_wrong) + # 7-9 bits wrong is OK too, so we squish the error function a bit. 
+ # Without doing this, we often tend to hang out at 0.25 / 7.5 error, + # and it seems bad to have continued, high communication error. + bob_eve_loss = tf.reduce_sum( + tf.square(bob_eve_error_deviation) / (TEXT_SIZE / 2)**2) + + # Rescale the losses to [0, 1] per example and combine. + self.bob_loss = (self.bob_reconstruction_loss / TEXT_SIZE + bob_eve_loss) + + self.bob_optimizer = optimizer.minimize( + self.bob_loss, + var_list=(tf.get_collection('alice') + tf.get_collection('bob'))) + + +def doeval(s, ac, n, itercount): + """Evaluate the current network on n batches of random examples. + + Args: + s: The current TensorFlow session + ac: an instance of the AdversarialCrypto class + n: The number of iterations to run. + itercount: Iteration count label for logging. + + Returns: + Bob and eve's loss, as a percent of bits incorrect. + """ + + bob_loss_accum = 0 + eve_loss_accum = 0 + for _ in xrange(n): + bl, el = s.run([ac.bob_reconstruction_loss, ac.eve_loss]) + bob_loss_accum += bl + eve_loss_accum += el + bob_loss_percent = bob_loss_accum / (n * FLAGS.batch_size) + eve_loss_percent = eve_loss_accum / (n * FLAGS.batch_size) + print('%d %.2f %.2f' % (itercount, bob_loss_percent, eve_loss_percent)) + sys.stdout.flush() + return bob_loss_percent, eve_loss_percent + + +def train_until_thresh(s, ac): + for j in xrange(MAX_TRAINING_LOOPS): + for _ in xrange(ITERS_PER_ACTOR): + s.run(ac.bob_optimizer) + for _ in xrange(ITERS_PER_ACTOR * EVE_MULTIPLIER): + s.run(ac.eve_optimizer) + if j % PRINT_EVERY == 0: + bob_avg_loss, eve_avg_loss = doeval(s, ac, EVAL_BATCHES, j) + if (bob_avg_loss < BOB_LOSS_THRESH and eve_avg_loss > EVE_LOSS_THRESH): + print('Target losses achieved.') + return True + return False + + +def train_and_evaluate(): + """Run the full training and evaluation loop.""" + + ac = AdversarialCrypto() + init = tf.global_variables_initializer() + + with tf.Session() as s: + s.run(init) + print('# Batch size: ', FLAGS.batch_size) + print('# Iter 
Bob_Recon_Error Eve_Recon_Error') + + if train_until_thresh(s, ac): + for _ in xrange(EVE_EXTRA_ROUNDS): + s.run(ac.eve_optimizer) + print('Loss after eve extra training:') + doeval(s, ac, EVAL_BATCHES * 2, 0) + for _ in xrange(NUMBER_OF_EVE_RESETS): + print('Resetting Eve') + s.run(ac.reset_eve_vars) + eve_counter = 0 + for _ in xrange(RETRAIN_EVE_LOOPS): + for _ in xrange(RETRAIN_EVE_ITERS): + eve_counter += 1 + s.run(ac.eve_optimizer) + doeval(s, ac, EVAL_BATCHES, eve_counter) + doeval(s, ac, EVAL_BATCHES, eve_counter) + + +def main(unused_argv): + # Exit more quietly with Ctrl-C. + signal.signal(signal.SIGINT, signal.SIG_DFL) + train_and_evaluate() + + +if __name__ == '__main__': + tf.app.run() diff --git a/adversarial_text/BUILD b/adversarial_text/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..b0fdc6332f96e6c55101dc4c707c5c9d8609da02 --- /dev/null +++ b/adversarial_text/BUILD @@ -0,0 +1,76 @@ +# Binaries +# ============================================================================== +py_binary( + name = "evaluate", + srcs = ["evaluate.py"], + deps = [ + ":graphs", + ], +) + +py_binary( + name = "train_classifier", + srcs = ["train_classifier.py"], + deps = [ + ":graphs", + ":train_utils", + ], +) + +py_binary( + name = "pretrain", + srcs = [ + "pretrain.py", + ], + deps = [ + ":graphs", + ":train_utils", + ], +) + +# Libraries +# ============================================================================== +py_library( + name = "graphs", + srcs = ["graphs.py"], + deps = [ + ":adversarial_losses", + ":inputs", + ":layers", + ], +) + +py_library( + name = "adversarial_losses", + srcs = ["adversarial_losses.py"], +) + +py_library( + name = "inputs", + srcs = ["inputs.py"], + deps = [ + "//adversarial_text/data:data_utils", + ], +) + +py_library( + name = "layers", + srcs = ["layers.py"], +) + +py_library( + name = "train_utils", + srcs = ["train_utils.py"], +) + +# Tests +# 
============================================================================== +py_test( + name = "graphs_test", + size = "large", + srcs = ["graphs_test.py"], + deps = [ + ":graphs", + "//adversarial_text/data:data_utils", + ], +) diff --git a/adversarial_text/README.md b/adversarial_text/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a27d56c9e7ff2d9c703bebc717f7b4e2000a90e1 --- /dev/null +++ b/adversarial_text/README.md @@ -0,0 +1,157 @@ +# Adversarial Text Classification + +Code for [*Adversarial Training Methods for Semi-Supervised Text Classification*](https://arxiv.org/abs/1605.07725) and [*Semi-Supervised Sequence Learning*](https://arxiv.org/abs/1511.01432). + +## Requirements + +* Bazel ([install](https://bazel.build/versions/master/docs/install.html)) +* TensorFlow >= v1.1 + +## End-to-end IMDB Sentiment Classification + +### Fetch data + +``` +$ wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz \ + -O /tmp/imdb.tar.gz +$ tar -xf /tmp/imdb.tar.gz -C /tmp +``` + +The directory `/tmp/aclImdb` contains the raw IMDB data. + +### Generate vocabulary + +``` +$ IMDB_DATA_DIR=/tmp/imdb +$ bazel run data:gen_vocab -- \ + --output_dir=$IMDB_DATA_DIR \ + --dataset=imdb \ + --imdb_input_dir=/tmp/aclImdb \ + --lowercase=False +``` + +Vocabulary and frequency files will be generated in `$IMDB_DATA_DIR`. + +### Generate training, validation, and test data + +``` +$ bazel run data:gen_data -- \ + --output_dir=$IMDB_DATA_DIR \ + --dataset=imdb \ + --imdb_input_dir=/tmp/aclImdb \ + --lowercase=False \ + --label_gain=False +``` + +`$IMDB_DATA_DIR` contains TFRecords files. 
+ +### Pretrain IMDB Language Model + +``` +$ PRETRAIN_DIR=/tmp/models/imdb_pretrain +$ bazel run :pretrain -- \ + --train_dir=$PRETRAIN_DIR \ + --data_dir=$IMDB_DATA_DIR \ + --vocab_size=86934 \ + --embedding_dims=256 \ + --rnn_cell_size=1024 \ + --num_candidate_samples=1024 \ + --optimizer=adam \ + --batch_size=256 \ + --learning_rate=0.001 \ + --learning_rate_decay_factor=0.9999 \ + --max_steps=100000 \ + --max_grad_norm=1.0 \ + --num_timesteps=400 \ + --keep_prob_emb=0.5 \ + --normalize_embeddings +``` + +`$PRETRAIN_DIR` contains checkpoints of the pretrained language model. + +### Train classifier + +Most flags stay the same, save for the removal of candidate sampling and the +addition of `pretrained_model_dir`, from which the classifier will load the +pretrained embedding and LSTM variables, and flags related to adversarial +training and classification. + +``` +$ TRAIN_DIR=/tmp/models/imdb_classify +$ bazel run :train_classifier -- \ + --train_dir=$TRAIN_DIR \ + --pretrained_model_dir=$PRETRAIN_DIR \ + --data_dir=$IMDB_DATA_DIR \ + --vocab_size=86934 \ + --embedding_dims=256 \ + --rnn_cell_size=1024 \ + --cl_num_layers=1 \ + --cl_hidden_size=30 \ + --optimizer=adam \ + --batch_size=64 \ + --learning_rate=0.0005 \ + --learning_rate_decay_factor=0.9998 \ + --max_steps=15000 \ + --max_grad_norm=1.0 \ + --num_timesteps=400 \ + --keep_prob_emb=0.5 \ + --normalize_embeddings \ + --adv_training_method=vat +``` + +### Evaluate on test data + +``` +$ EVAL_DIR=/tmp/models/imdb_eval +$ bazel run :evaluate -- \ + --eval_dir=$EVAL_DIR \ + --checkpoint_dir=$TRAIN_DIR \ + --eval_data=test \ + --run_once \ + --num_examples=25000 \ + --data_dir=$IMDB_DATA_DIR \ + --vocab_size=86934 \ + --embedding_dims=256 \ + --rnn_cell_size=1024 \ + --batch_size=256 \ + --num_timesteps=400 \ + --normalize_embeddings +``` + +## Code Overview + +The main entry points are the binaries listed below. 
Each training binary builds +a `VatxtModel`, defined in `graphs.py`, which in turn uses graph building blocks +defined in `inputs.py` (defines input data reading and parsing), `layers.py` +(defines core model components), and `adversarial_losses.py` (defines +adversarial training losses). The training loop itself is defined in +`train_utils.py`. + +### Binaries + +* Pretraining: `pretrain.py` +* Classifier Training: `train_classifier.py` +* Evaluation: `evaluate.py` + +### Command-Line Flags + +Flags related to distributed training and the training loop itself are defined +in `train_utils.py`. + +Flags related to model hyperparameters are defined in `graphs.py`. + +Flags related to adversarial training are defined in `adversarial_losses.py`. + +Flags particular to each job are defined in the main binary files. + +### Data Generation + +* Vocabulary generation: `gen_vocab.py` +* Data generation: `gen_data.py` + +Command-line flags defined in `document_generators.py` control which dataset is +processed and how. + +## Contact for Issues + +* Ryan Sepassi, @rsepassi diff --git a/adversarial_text/adversarial_losses.py b/adversarial_text/adversarial_losses.py new file mode 100644 index 0000000000000000000000000000000000000000..f8fba6d3558d52938e065388003bd9c6f3fd5bbe --- /dev/null +++ b/adversarial_text/adversarial_losses.py @@ -0,0 +1,238 @@ +# Copyright 2017 Google, Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Adversarial losses for text models.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import tensorflow as tf + +flags = tf.app.flags +FLAGS = flags.FLAGS + +# Adversarial and virtual adversarial training parameters. +flags.DEFINE_float('perturb_norm_length', 0.1, + 'Norm length of adversarial perturbation to be ' + 'optimized with validation') + +# Virtual adversarial training parameters +flags.DEFINE_integer('num_power_iteration', 1, 'The number of power iteration') +flags.DEFINE_float('small_constant_for_finite_diff', 1e-3, + 'Small constant for finite difference method') + +# Parameters for building the graph +flags.DEFINE_string('adv_training_method', None, + 'The flag which specifies training method. ' + '"rp" : random perturbation training ' + '"at" : adversarial training ' + '"vat" : virtual adversarial training ' + '"atvat" : at + vat ') +flags.DEFINE_float('adv_reg_coeff', 1.0, + 'Regularization coefficient of adversarial loss.') + + +def random_perturbation_loss(embedded, length, loss_fn): + """Adds noise to embeddings and recomputes classification loss.""" + noise = tf.random_normal(shape=tf.shape(embedded)) + perturb = _scale_l2(_mask_by_length(noise, length), FLAGS.perturb_norm_length) + return loss_fn(embedded + perturb) + + +def adversarial_loss(embedded, loss, loss_fn): + """Adds gradient to embedding and recomputes classification loss.""" + grad, = tf.gradients( + loss, + embedded, + aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N) + grad = tf.stop_gradient(grad) + perturb = _scale_l2(grad, FLAGS.perturb_norm_length) + return loss_fn(embedded + perturb) + + +def virtual_adversarial_loss(logits, embedded, inputs, + logits_from_embedding_fn): + """Virtual adversarial loss. 
+ + Computes virtual adversarial perturbation by finite difference method and + power iteration, adds it to the embedding, and computes the KL divergence + between the new logits and the original logits. + + Args: + logits: 2-D float Tensor, [num_timesteps*batch_size, m], where m=1 if + num_classes=2, otherwise m=num_classes. + embedded: 3-D float Tensor, [batch_size, num_timesteps, embedding_dim]. + inputs: VatxtInput. + logits_from_embedding_fn: callable that takes embeddings and returns + classifier logits. + + Returns: + kl: float scalar. + """ + # Stop gradient of logits. See https://arxiv.org/abs/1507.00677 for details. + logits = tf.stop_gradient(logits) + # Only care about the KL divergence on the final timestep. + weights = _end_of_seq_mask(inputs.labels) + + # Initialize perturbation with random noise. + # shape(embedded) = (batch_size, num_timesteps, embedding_dim) + d = _mask_by_length(tf.random_normal(shape=tf.shape(embedded)), inputs.length) + + # Perform finite difference method and power iteration. + # See Eq.(8) in the paper http://arxiv.org/pdf/1507.00677.pdf, + # Adding small noise to input and taking gradient with respect to the noise + # corresponds to 1 power iteration. 
+ for _ in range(FLAGS.num_power_iteration): + d = _scale_l2(d, FLAGS.small_constant_for_finite_diff) + d_logits = logits_from_embedding_fn(embedded + d) + kl = _kl_divergence_with_logits(logits, d_logits, weights) + d, = tf.gradients( + kl, + d, + aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N) + d = tf.stop_gradient(d) + + perturb = _scale_l2( + _mask_by_length(d, inputs.length), FLAGS.perturb_norm_length) + vadv_logits = logits_from_embedding_fn(embedded + perturb) + return _kl_divergence_with_logits(logits, vadv_logits, weights) + + +def random_perturbation_loss_bidir(embedded, length, loss_fn): + """Adds noise to embeddings and recomputes classification loss.""" + noise = [tf.random_normal(shape=tf.shape(emb)) for emb in embedded] + masked = [_mask_by_length(n, length) for n in noise] + scaled = [_scale_l2(m, FLAGS.perturb_norm_length) for m in masked] + return loss_fn([e + s for (e, s) in zip(embedded, scaled)]) + + +def adversarial_loss_bidir(embedded, loss, loss_fn): + """Adds gradient to embeddings and recomputes classification loss.""" + grads = tf.gradients( + loss, + embedded, + aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N) + adv_exs = [ + emb + _scale_l2(tf.stop_gradient(g), FLAGS.perturb_norm_length) + for emb, g in zip(embedded, grads) + ] + return loss_fn(adv_exs) + + +def virtual_adversarial_loss_bidir(logits, embedded, inputs, + logits_from_embedding_fn): + """Virtual adversarial loss for bidirectional models.""" + logits = tf.stop_gradient(logits) + f_inputs, _ = inputs + weights = _end_of_seq_mask(f_inputs.labels) + + perturbs = [ + _mask_by_length(tf.random_normal(shape=tf.shape(emb)), f_inputs.length) + for emb in embedded + ] + for _ in range(FLAGS.num_power_iteration): + perturbs = [ + _scale_l2(d, FLAGS.small_constant_for_finite_diff) for d in perturbs + ] + d_logits = logits_from_embedding_fn( + [emb + d for (emb, d) in zip(embedded, perturbs)]) + kl = _kl_divergence_with_logits(logits, d_logits, 
weights) + perturbs = tf.gradients( + kl, + perturbs, + aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N) + perturbs = [tf.stop_gradient(d) for d in perturbs] + + perturbs = [ + _scale_l2(_mask_by_length(d, f_inputs.length), FLAGS.perturb_norm_length) + for d in perturbs + ] + vadv_logits = logits_from_embedding_fn( + [emb + d for (emb, d) in zip(embedded, perturbs)]) + return _kl_divergence_with_logits(logits, vadv_logits, weights) + + +def _mask_by_length(t, length): + """Mask t, 3-D [batch, time, dim], by length, 1-D [batch,].""" + maxlen = t.get_shape().as_list()[1] + mask = tf.sequence_mask(length, maxlen=maxlen) + mask = tf.expand_dims(tf.cast(mask, tf.float32), -1) + # shape(mask) = (batch, num_timesteps, 1) + return t * mask + + +def _scale_l2(x, norm_length): + # shape(x) = (batch, num_timesteps, d) + + # Divide x by max(abs(x)) for a numerically stable L2 norm. + # 2norm(x) = a * 2norm(x/a) + # Scale over the full sequence, dims (1, 2) + alpha = tf.reduce_max(tf.abs(x), (1, 2), keep_dims=True) + 1e-12 + l2_norm = alpha * tf.sqrt(tf.reduce_sum(tf.pow(x / alpha, 2), (1, 2), + keep_dims=True) + 1e-6) + x_unit = x / l2_norm + return norm_length * x_unit + + +def _end_of_seq_mask(tokens): + """Generate a mask for the EOS token (1.0 on EOS, 0.0 otherwise). + + Args: + tokens: 1-D integer tensor [num_timesteps*batch_size]. Each element is an + id from the vocab. + + Returns: + Float tensor same shape as tokens, whose values are 1.0 on the end of + sequence and 0.0 on the others. + """ + eos_id = FLAGS.vocab_size - 1 + return tf.cast(tf.equal(tokens, eos_id), tf.float32) + + +def _kl_divergence_with_logits(q_logits, p_logits, weights): + """Returns weighted KL divergence between distributions q and p. + + Args: + q_logits: logits for 1st argument of KL divergence shape + [num_timesteps * batch_size, num_classes] if num_classes > 2, and + [num_timesteps * batch_size] if num_classes == 2. 
+ p_logits: logits for 2nd argument of KL divergence, same shape as q_logits. + weights: 1-D float tensor with shape [num_timesteps * batch_size]. + Elements should be 1.0 only at the end of sequences. + + Returns: + KL: float scalar. + """ + # For logistic regression + if FLAGS.num_classes == 2: + q = tf.nn.sigmoid(q_logits) + p = tf.nn.sigmoid(p_logits) + kl = (-tf.nn.sigmoid_cross_entropy_with_logits(logits=q_logits, labels=q) + + tf.nn.sigmoid_cross_entropy_with_logits(logits=p_logits, labels=q)) + + # For softmax regression + else: + q = tf.nn.softmax(q_logits) + p = tf.nn.softmax(p_logits) + kl = tf.reduce_sum(q * (tf.log(q) - tf.log(p)), 1) + + num_labels = tf.reduce_sum(weights) + num_labels = tf.where(tf.equal(num_labels, 0.), 1., num_labels) + + kl.get_shape().assert_has_rank(2) + weights.get_shape().assert_has_rank(1) + loss = tf.identity(tf.reduce_sum(tf.expand_dims(weights, -1) * kl) / + num_labels, name='kl') + return loss diff --git a/adversarial_text/data/BUILD b/adversarial_text/data/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..33d46bcc1643be6964da810dc83e0ce11cdfcc30 --- /dev/null +++ b/adversarial_text/data/BUILD @@ -0,0 +1,41 @@ +package( + default_visibility = [ + "//adversarial_text:__subpackages__", + ], +) + +py_binary( + name = "gen_vocab", + srcs = ["gen_vocab.py"], + deps = [ + ":data_utils", + ":document_generators", + ], +) + +py_binary( + name = "gen_data", + srcs = ["gen_data.py"], + deps = [ + ":data_utils", + ":document_generators", + ], +) + +py_library( + name = "document_generators", + srcs = ["document_generators.py"], +) + +py_library( + name = "data_utils", + srcs = ["data_utils.py"], +) + +py_test( + name = "data_utils_test", + srcs = ["data_utils_test.py"], + deps = [ + ":data_utils", + ], +) diff --git a/adversarial_text/data/data_utils.py b/adversarial_text/data/data_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..1c31ab96d1d8e4573b9657d3792bd880da3d50ec --- /dev/null 
+++ b/adversarial_text/data/data_utils.py @@ -0,0 +1,326 @@ +# Copyright 2017 Google, Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Utilities for generating/preprocessing data for adversarial text models.""" + +import operator +import os +import random +import re +import tensorflow as tf + +EOS_TOKEN = '</s>' + +# Data filenames +# Sequence Autoencoder +ALL_SA = 'all_sa.tfrecords' +TRAIN_SA = 'train_sa.tfrecords' +TEST_SA = 'test_sa.tfrecords' +# Language Model +ALL_LM = 'all_lm.tfrecords' +TRAIN_LM = 'train_lm.tfrecords' +TEST_LM = 'test_lm.tfrecords' +# Classification +TRAIN_CLASS = 'train_classification.tfrecords' +TEST_CLASS = 'test_classification.tfrecords' +VALID_CLASS = 'validate_classification.tfrecords' +# LM with bidirectional LSTM +TRAIN_REV_LM = 'train_reverse_lm.tfrecords' +TEST_REV_LM = 'test_reverse_lm.tfrecords' +# Classification with bidirectional LSTM +TRAIN_BD_CLASS = 'train_bidir_classification.tfrecords' +TEST_BD_CLASS = 'test_bidir_classification.tfrecords' +VALID_BD_CLASS = 'validate_bidir_classification.tfrecords' + + +class ShufflingTFRecordWriter(object): + """Thin wrapper around TFRecordWriter that shuffles records.""" + + def __init__(self, path): + self._path = path + self._records = [] + self._closed = False + + def write(self, record): + assert not self._closed + self._records.append(record) + + def close(self): + assert not 
self._closed + random.shuffle(self._records) + with tf.python_io.TFRecordWriter(self._path) as f: + for record in self._records: + f.write(record) + self._closed = True + + def __enter__(self): + return self + + def __exit__(self, unused_type, unused_value, unused_traceback): + self.close() + + +class Timestep(object): + """Represents a single timestep in a SequenceWrapper.""" + + def __init__(self, token, label, weight, multivalent_tokens=False): + """Constructs Timestep from empty Features.""" + self._token = token + self._label = label + self._weight = weight + self._multivalent_tokens = multivalent_tokens + self._fill_with_defaults() + + @property + def token(self): + if self._multivalent_tokens: + raise TypeError('Timestep may contain multiple values; use `tokens`') + return self._token.int64_list.value[0] + + @property + def tokens(self): + return self._token.int64_list.value + + @property + def label(self): + return self._label.int64_list.value[0] + + @property + def weight(self): + return self._weight.float_list.value[0] + + def set_token(self, token): + if self._multivalent_tokens: + raise TypeError('Timestep may contain multiple values; use `add_token`') + self._token.int64_list.value[0] = token + return self + + def add_token(self, token): + self._token.int64_list.value.append(token) + return self + + def set_label(self, label): + self._label.int64_list.value[0] = label + return self + + def set_weight(self, weight): + self._weight.float_list.value[0] = weight + return self + + def copy_from(self, timestep): + self.set_token(timestep.token).set_label(timestep.label).set_weight( + timestep.weight) + return self + + def _fill_with_defaults(self): + if not self._multivalent_tokens: + self._token.int64_list.value.append(0) + self._label.int64_list.value.append(0) + self._weight.float_list.value.append(0.0) + + +class SequenceWrapper(object): + """Wrapper around tf.SequenceExample.""" + + F_TOKEN_ID = 'token_id' + F_LABEL = 'label' + F_WEIGHT = 'weight' + + 
def __init__(self, multivalent_tokens=False): + self._seq = tf.train.SequenceExample() + self._flist = self._seq.feature_lists.feature_list + self._timesteps = [] + self._multivalent_tokens = multivalent_tokens + + @property + def seq(self): + return self._seq + + @property + def multivalent_tokens(self): + return self._multivalent_tokens + + @property + def _tokens(self): + return self._flist[SequenceWrapper.F_TOKEN_ID].feature + + @property + def _labels(self): + return self._flist[SequenceWrapper.F_LABEL].feature + + @property + def _weights(self): + return self._flist[SequenceWrapper.F_WEIGHT].feature + + def add_timestep(self): + timestep = Timestep( + self._tokens.add(), + self._labels.add(), + self._weights.add(), + multivalent_tokens=self._multivalent_tokens) + self._timesteps.append(timestep) + return timestep + + def __iter__(self): + for timestep in self._timesteps: + yield timestep + + def __len__(self): + return len(self._timesteps) + + def __getitem__(self, idx): + return self._timesteps[idx] + + +def build_reverse_sequence(seq): + """Builds a sequence that is the reverse of the input sequence.""" + reverse_seq = SequenceWrapper() + + # Copy all but last timestep + for timestep in reversed(seq[:-1]): + reverse_seq.add_timestep().copy_from(timestep) + + # Copy final timestep + reverse_seq.add_timestep().copy_from(seq[-1]) + + return reverse_seq + + +def build_bidirectional_seq(seq, rev_seq): + bidir_seq = SequenceWrapper(multivalent_tokens=True) + for forward_ts, reverse_ts in zip(seq, rev_seq): + bidir_seq.add_timestep().add_token(forward_ts.token).add_token( + reverse_ts.token) + + return bidir_seq + + +def build_lm_sequence(seq): + """Builds language model sequence from input sequence. + + Args: + seq: SequenceWrapper. + + Returns: + SequenceWrapper with `seq` tokens copied over to output sequence tokens and + labels (offset by 1, i.e. predict next token) with weights set to 1.0. 
+ """ + lm_seq = SequenceWrapper() + for i, timestep in enumerate(seq[:-1]): + lm_seq.add_timestep().set_token(timestep.token).set_label( + seq[i + 1].token).set_weight(1.0) + + return lm_seq + + +def build_seq_ae_sequence(seq): + """Builds seq_ae sequence from input sequence. + + Args: + seq: SequenceWrapper. + + Returns: + SequenceWrapper with `seq` inputs copied and concatenated, and with labels + copied in on the right-hand (i.e. decoder) side with weights set to 1.0. + The new sequence will have length `len(seq) * 2 - 1`, as the last timestep + of the encoder section and the first step of the decoder section will + overlap. + """ + seq_ae_seq = SequenceWrapper() + + for i in range(len(seq) * 2 - 1): + ts = seq_ae_seq.add_timestep() + + if i < len(seq) - 1: + # Encoder + ts.set_token(seq[i].token) + elif i == len(seq) - 1: + # Transition step + ts.set_token(seq[i].token) + ts.set_label(seq[0].token) + ts.set_weight(1.0) + else: + # Decoder + ts.set_token(seq[i % len(seq)].token) + ts.set_label(seq[(i + 1) % len(seq)].token) + ts.set_weight(1.0) + + return seq_ae_seq + + +def build_labeled_sequence(seq, class_label, label_gain=False): + """Builds labeled sequence from input sequence. + + Args: + seq: SequenceWrapper. + class_label: bool. + label_gain: bool. If True, class_label will be put on every timestep and + weight will increase linearly from 0 to 1. + + Returns: + SequenceWrapper with `seq` copied in and `class_label` added as label to + final timestep. 
+ """ + label_seq = SequenceWrapper(multivalent_tokens=seq.multivalent_tokens) + + # Copy sequence without labels + seq_len = len(seq) + final_timestep = None + for i, timestep in enumerate(seq): + label_timestep = label_seq.add_timestep() + if seq.multivalent_tokens: + for token in timestep.tokens: + label_timestep.add_token(token) + else: + label_timestep.set_token(timestep.token) + if label_gain: + label_timestep.set_label(int(class_label)) + weight = 1.0 if seq_len < 2 else float(i) / (seq_len - 1) + label_timestep.set_weight(weight) + if i == (seq_len - 1): + final_timestep = label_timestep + + # Edit final timestep to have class label and weight = 1. + final_timestep.set_label(int(class_label)).set_weight(1.0) + + return label_seq + + +def split_by_punct(segment): + """Splits str segment by punctuation, filters out empties and spaces.""" + return [s for s in re.split(r'\W+', segment) if s and not s.isspace()] + + +def sort_vocab_by_frequency(vocab_freq_map): + """Sorts vocab_freq_map by count. + + Args: + vocab_freq_map: dict mapping vocabulary terms to counts. + + Returns: + list of (term, count) tuples sorted by count, descending. + """ + return sorted( + vocab_freq_map.items(), key=operator.itemgetter(1), reverse=True) + + +def write_vocab_and_frequency(ordered_vocab_freqs, output_dir): + """Writes ordered_vocab_freqs into vocab.txt and vocab_freq.txt.""" + tf.gfile.MakeDirs(output_dir) + with open(os.path.join(output_dir, 'vocab.txt'), 'w') as vocab_f: + with open(os.path.join(output_dir, 'vocab_freq.txt'), 'w') as freq_f: + for word, freq in ordered_vocab_freqs: + vocab_f.write('{}\n'.format(word)) + freq_f.write('{}\n'.format(freq)) diff --git a/adversarial_text/data/data_utils_test.py b/adversarial_text/data/data_utils_test.py new file mode 100644 index 0000000000000000000000000000000000000000..614b12953e77f9d66503f1d1a1e0d81b98e84a14 --- /dev/null +++ b/adversarial_text/data/data_utils_test.py @@ -0,0 +1,192 @@ +# Copyright 2017 Google, Inc. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for data_utils.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import tensorflow as tf + +from adversarial_text.data import data_utils + +data = data_utils + + +class SequenceWrapperTest(tf.test.TestCase): + + def testDefaultTimesteps(self): + seq = data.SequenceWrapper() + t1 = seq.add_timestep() + _ = seq.add_timestep() + self.assertEqual(len(seq), 2) + + self.assertEqual(t1.weight, 0.0) + self.assertEqual(t1.label, 0) + self.assertEqual(t1.token, 0) + + def testSettersAndGetters(self): + ts = data.SequenceWrapper().add_timestep() + ts.set_token(3) + ts.set_label(4) + ts.set_weight(2.0) + self.assertEqual(ts.token, 3) + self.assertEqual(ts.label, 4) + self.assertEqual(ts.weight, 2.0) + + def testTimestepIteration(self): + seq = data.SequenceWrapper() + seq.add_timestep().set_token(0) + seq.add_timestep().set_token(1) + seq.add_timestep().set_token(2) + for i, ts in enumerate(seq): + self.assertEqual(ts.token, i) + + def testFillsSequenceExampleCorrectly(self): + seq = data.SequenceWrapper() + seq.add_timestep().set_token(1).set_label(2).set_weight(3.0) + seq.add_timestep().set_token(10).set_label(20).set_weight(30.0) + + seq_ex = seq.seq + fl = seq_ex.feature_lists.feature_list + fl_token = fl[data.SequenceWrapper.F_TOKEN_ID].feature + fl_label = 
fl[data.SequenceWrapper.F_LABEL].feature + fl_weight = fl[data.SequenceWrapper.F_WEIGHT].feature + _ = [self.assertEqual(len(f), 2) for f in [fl_token, fl_label, fl_weight]] + self.assertAllEqual([f.int64_list.value[0] for f in fl_token], [1, 10]) + self.assertAllEqual([f.int64_list.value[0] for f in fl_label], [2, 20]) + self.assertAllEqual([f.float_list.value[0] for f in fl_weight], [3.0, 30.0]) + + +class DataUtilsTest(tf.test.TestCase): + + def testSplitByPunct(self): + output = data.split_by_punct( + 'hello! world, i\'ve been\nwaiting\tfor\ryou for.a long time') + expected = [ + 'hello', 'world', 'i', 've', 'been', 'waiting', 'for', 'you', 'for', + 'a', 'long', 'time' + ] + self.assertListEqual(output, expected) + + def _buildDummySequence(self): + seq = data.SequenceWrapper() + for i in range(10): + seq.add_timestep().set_token(i) + return seq + + def testBuildLMSeq(self): + seq = self._buildDummySequence() + lm_seq = data.build_lm_sequence(seq) + for i, ts in enumerate(lm_seq): + self.assertEqual(ts.token, i) + self.assertEqual(ts.label, i + 1) + self.assertEqual(ts.weight, 1.0) + + def testBuildSAESeq(self): + seq = self._buildDummySequence() + sa_seq = data.build_seq_ae_sequence(seq) + + self.assertEqual(len(sa_seq), len(seq) * 2 - 1) + + # Tokens should be sequence twice, minus the EOS token at the end + for i, ts in enumerate(sa_seq): + self.assertEqual(ts.token, seq[i % 10].token) + + # Weights should be len-1 0.0's and len 1.0's. 
+ for i in range(len(seq) - 1): + self.assertEqual(sa_seq[i].weight, 0.0) + for i in range(len(seq) - 1, len(sa_seq)): + self.assertEqual(sa_seq[i].weight, 1.0) + + # Labels should be len-1 0's, and then the sequence + for i in range(len(seq) - 1): + self.assertEqual(sa_seq[i].label, 0) + for i in range(len(seq) - 1, len(sa_seq)): + self.assertEqual(sa_seq[i].label, seq[i - (len(seq) - 1)].token) + + def testBuildLabelSeq(self): + seq = self._buildDummySequence() + eos_id = len(seq) - 1 + label_seq = data.build_labeled_sequence(seq, True) + for i, ts in enumerate(label_seq[:-1]): + self.assertEqual(ts.token, i) + self.assertEqual(ts.label, 0) + self.assertEqual(ts.weight, 0.0) + + final_timestep = label_seq[-1] + self.assertEqual(final_timestep.token, eos_id) + self.assertEqual(final_timestep.label, 1) + self.assertEqual(final_timestep.weight, 1.0) + + def testBuildBidirLabelSeq(self): + seq = self._buildDummySequence() + reverse_seq = data.build_reverse_sequence(seq) + bidir_seq = data.build_bidirectional_seq(seq, reverse_seq) + label_seq = data.build_labeled_sequence(bidir_seq, True) + + for (i, ts), j in zip( + enumerate(label_seq[:-1]), reversed(range(len(seq) - 1))): + self.assertAllEqual(ts.tokens, [i, j]) + self.assertEqual(ts.label, 0) + self.assertEqual(ts.weight, 0.0) + + final_timestep = label_seq[-1] + eos_id = len(seq) - 1 + self.assertAllEqual(final_timestep.tokens, [eos_id, eos_id]) + self.assertEqual(final_timestep.label, 1) + self.assertEqual(final_timestep.weight, 1.0) + + def testReverseSeq(self): + seq = self._buildDummySequence() + reverse_seq = data.build_reverse_sequence(seq) + for i, ts in enumerate(reversed(reverse_seq[:-1])): + self.assertEqual(ts.token, i) + self.assertEqual(ts.label, 0) + self.assertEqual(ts.weight, 0.0) + + final_timestep = reverse_seq[-1] + eos_id = len(seq) - 1 + self.assertEqual(final_timestep.token, eos_id) + self.assertEqual(final_timestep.label, 0) + self.assertEqual(final_timestep.weight, 0.0) + + def 
testBidirSeq(self): + seq = self._buildDummySequence() + reverse_seq = data.build_reverse_sequence(seq) + bidir_seq = data.build_bidirectional_seq(seq, reverse_seq) + for (i, ts), j in zip( + enumerate(bidir_seq[:-1]), reversed(range(len(seq) - 1))): + self.assertAllEqual(ts.tokens, [i, j]) + self.assertEqual(ts.label, 0) + self.assertEqual(ts.weight, 0.0) + + final_timestep = bidir_seq[-1] + eos_id = len(seq) - 1 + self.assertAllEqual(final_timestep.tokens, [eos_id, eos_id]) + self.assertEqual(final_timestep.label, 0) + self.assertEqual(final_timestep.weight, 0.0) + + def testLabelGain(self): + seq = self._buildDummySequence() + label_seq = data.build_labeled_sequence(seq, True, label_gain=True) + for i, ts in enumerate(label_seq): + self.assertEqual(ts.token, i) + self.assertEqual(ts.label, 1) + self.assertNear(ts.weight, float(i) / (len(seq) - 1), 1e-3) + + +if __name__ == '__main__': + tf.test.main() diff --git a/adversarial_text/data/document_generators.py b/adversarial_text/data/document_generators.py new file mode 100644 index 0000000000000000000000000000000000000000..990dae775fe4218de7cfa84445dae1f0e3bc1eda --- /dev/null +++ b/adversarial_text/data/document_generators.py @@ -0,0 +1,370 @@ +# Copyright 2017 Google, Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Input readers and document/token generators for datasets.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from collections import namedtuple +import csv +import os +import random + +import tensorflow as tf + +from adversarial_text.data import data_utils + +flags = tf.app.flags +FLAGS = flags.FLAGS + +flags.DEFINE_string('dataset', '', 'Which dataset to generate data for') + +# Preprocessing config +flags.DEFINE_boolean('output_unigrams', True, 'Whether to output unigrams.') +flags.DEFINE_boolean('output_bigrams', False, 'Whether to output bigrams.') +flags.DEFINE_boolean('output_char', False, 'Whether to output characters.') +flags.DEFINE_boolean('lowercase', True, 'Whether to lowercase document terms.') + +# IMDB +flags.DEFINE_string('imdb_input_dir', '', 'The input directory containing the ' + 'IMDB sentiment dataset.') +flags.DEFINE_integer('imdb_validation_pos_start_id', 10621, 'File id of the ' + 'first file in the pos sentiment validation set.') +flags.DEFINE_integer('imdb_validation_neg_start_id', 10625, 'File id of the ' + 'first file in the neg sentiment validation set.') + +# DBpedia +flags.DEFINE_string('dbpedia_input_dir', '', + 'Path to DBpedia directory containing train.csv and ' + 'test.csv.') + +# Reuters Corpus (rcv1) +flags.DEFINE_string('rcv1_input_dir', '', + 'Path to rcv1 directory containing train.csv, unlab.csv, ' + 'and test.csv.') + +# Rotten Tomatoes +flags.DEFINE_string('rt_input_dir', '', + 'The Rotten Tomatoes dataset input directory.') + + +# The amazon reviews input file to use in either the RT or IMDB datasets. +flags.DEFINE_string('amazon_unlabeled_input_file', '', + 'The unlabeled Amazon Reviews dataset input file. 
If set, ' + 'the input file is used to augment RT and IMDB vocab.') + +Document = namedtuple('Document', + 'content is_validation is_test label add_tokens') + + +def documents(dataset='train', + include_unlabeled=False, + include_validation=False): + """Generates Documents based on FLAGS.dataset. + + Args: + dataset: str, identifies folder within IMDB data directory, test or train. + include_unlabeled: bool, whether to include the unsup directory. Only valid + when dataset=train. + include_validation: bool, whether to include validation data. + + Yields: + Document + + Raises: + ValueError: if include_unlabeled is true but dataset is not 'train' + """ + + if include_unlabeled and dataset != 'train': + raise ValueError('If include_unlabeled=True, must use train dataset') + + # Set the random seed so that we have the same validation set when running + # gen_data and gen_vocab. + random.seed(302) + + ds = FLAGS.dataset + if ds == 'imdb': + docs_gen = imdb_documents + elif ds == 'dbpedia': + docs_gen = dbpedia_documents + elif ds == 'rcv1': + docs_gen = rcv1_documents + elif ds == 'rt': + docs_gen = rt_documents + else: + raise ValueError('Unrecognized dataset %s' % FLAGS.dataset) + + for doc in docs_gen(dataset, include_unlabeled, include_validation): + yield doc + + +def tokens(doc): + """Given a Document, produces character or word tokens. + + Tokens can be either characters, or word-level tokens (unigrams and/or + bigrams). + + Args: + doc: Document to produce tokens from. + + Yields: + token + + Raises: + ValueError: if all FLAGS.{output_unigrams, output_bigrams, output_char} + are False. 
+ """ + if not (FLAGS.output_unigrams or FLAGS.output_bigrams or FLAGS.output_char): + raise ValueError( + 'At least one of {FLAGS.output_unigrams, FLAGS.output_bigrams, ' + 'FLAGS.output_char} must be true') + + content = doc.content.strip() + if FLAGS.lowercase: + content = content.lower() + + if FLAGS.output_char: + for char in content: + yield char + + else: + tokens_ = data_utils.split_by_punct(content) + for i, token in enumerate(tokens_): + if FLAGS.output_unigrams: + yield token + + if FLAGS.output_bigrams: + previous_token = (tokens_[i - 1] if i > 0 else data_utils.EOS_TOKEN) + bigram = '_'.join([previous_token, token]) + yield bigram + if (i + 1) == len(tokens_): + bigram = '_'.join([token, data_utils.EOS_TOKEN]) + yield bigram + + +def imdb_documents(dataset='train', + include_unlabeled=False, + include_validation=False): + """Generates Documents for IMDB dataset. + + Data from http://ai.stanford.edu/~amaas/data/sentiment/ + + Args: + dataset: str, identifies folder within IMDB data directory, test or train. + include_unlabeled: bool, whether to include the unsup directory. Only valid + when dataset=train. + include_validation: bool, whether to include validation data. + + Yields: + Document + + Raises: + ValueError: if FLAGS.imdb_input_dir is empty. 
+ """ + if not FLAGS.imdb_input_dir: + raise ValueError('Must provide FLAGS.imdb_input_dir') + + tf.logging.info('Generating IMDB documents...') + + def check_is_validation(filename, class_label): + if class_label is None: + return False + file_idx = int(filename.split('_')[0]) + is_pos_valid = (class_label and + file_idx >= FLAGS.imdb_validation_pos_start_id) + is_neg_valid = (not class_label and + file_idx >= FLAGS.imdb_validation_neg_start_id) + return is_pos_valid or is_neg_valid + + dirs = [(dataset + '/pos', True), (dataset + '/neg', False)] + if include_unlabeled: + dirs.append(('train/unsup', None)) + + for d, class_label in dirs: + for filename in os.listdir(os.path.join(FLAGS.imdb_input_dir, d)): + is_validation = check_is_validation(filename, class_label) + if is_validation and not include_validation: + continue + + with open(os.path.join(FLAGS.imdb_input_dir, d, filename)) as imdb_f: + content = imdb_f.read() + yield Document( + content=content, + is_validation=is_validation, + is_test=False, + label=class_label, + add_tokens=True) + + if FLAGS.amazon_unlabeled_input_file and include_unlabeled: + with open(FLAGS.amazon_unlabeled_input_file) as rt_f: + for content in rt_f: + yield Document(content=content, is_validation=False, is_test=False, + label=None, add_tokens=False) + + +def dbpedia_documents(dataset='train', + include_unlabeled=False, + include_validation=False): + """Generates Documents for DBpedia dataset. + + Dataset linked to at https://github.com/zhangxiangxiao/Crepe. + + Args: + dataset: str, identifies the csv file within the DBpedia data directory, + test or train. + include_unlabeled: bool, unused. + include_validation: bool, whether to include validation data, which is a + randomly selected 10% of the data. + + Yields: + Document + + Raises: + ValueError: if FLAGS.dbpedia_input_dir is empty. 
+ """ + del include_unlabeled + + if not FLAGS.dbpedia_input_dir: + raise ValueError('Must provide FLAGS.dbpedia_input_dir') + + tf.logging.info('Generating DBpedia documents...') + + with open(os.path.join(FLAGS.dbpedia_input_dir, dataset + '.csv')) as db_f: + reader = csv.reader(db_f) + for row in reader: + # 10% of the data is randomly held out + is_validation = random.randint(1, 10) == 1 + if is_validation and not include_validation: + continue + + content = row[1] + ' ' + row[2] + yield Document( + content=content, + is_validation=is_validation, + is_test=False, + label=int(row[0]), + add_tokens=True) + + +def rcv1_documents(dataset='train', + include_unlabeled=True, + include_validation=False): + # pylint:disable=line-too-long + """Generates Documents for Reuters Corpus (rcv1) dataset. + + Dataset described at http://www.ai.mit.edu/projects/jmlr/papers/volume5/lewis04a/lyrl2004_rcv1v2_README.htm + + Args: + dataset: str, identifies the csv file within the rcv1 data directory. + include_unlabeled: bool, whether to include the unlab file. Only valid + when dataset=train. + include_validation: bool, whether to include validation data, which is a + randomly selected 10% of the data. + + Yields: + Document + + Raises: + ValueError: if FLAGS.rcv1_input_dir is empty. 
+ """ + # pylint:enable=line-too-long + + if not FLAGS.rcv1_input_dir: + raise ValueError('Must provide FLAGS.rcv1_input_dir') + + tf.logging.info('Generating rcv1 documents...') + + datasets = [dataset] + if include_unlabeled: + if dataset == 'train': + datasets.append('unlab') + for dset in datasets: + with open(os.path.join(FLAGS.rcv1_input_dir, dset + '.csv')) as db_f: + reader = csv.reader(db_f) + for row in reader: + # 10% of the data is randomly held out + is_validation = random.randint(1, 10) == 1 + if is_validation and not include_validation: + continue + + content = row[1] + yield Document( + content=content, + is_validation=is_validation, + is_test=False, + label=int(row[0]), + add_tokens=True) + + +def rt_documents(dataset='train', + include_unlabeled=True, + include_validation=False): + # pylint:disable=line-too-long + """Generates Documents for the Rotten Tomatoes dataset. + + Dataset available at http://www.cs.cornell.edu/people/pabo/movie-review-data/ + In this dataset, amazon reviews are used for the unlabeled data. + + Args: + dataset: str, identifies the data subdirectory. + include_unlabeled: bool, whether to include the unlabeled data. Only valid + when dataset=train. + include_validation: bool, whether to include validation data, which is a + randomly selected 10% of the data. + + Yields: + Document + + Raises: + ValueError: if FLAGS.rt_input_dir is empty. 
+ """ + # pylint:enable=line-too-long + + if not FLAGS.rt_input_dir: + raise ValueError('Must provide FLAGS.rt_input_dir') + + tf.logging.info('Generating rt documents...') + + data_files = [] + input_filenames = os.listdir(FLAGS.rt_input_dir) + for inp_fname in input_filenames: + if inp_fname.endswith('.pos'): + data_files.append((os.path.join(FLAGS.rt_input_dir, inp_fname), True)) + elif inp_fname.endswith('.neg'): + data_files.append((os.path.join(FLAGS.rt_input_dir, inp_fname), False)) + if include_unlabeled and FLAGS.amazon_unlabeled_input_file: + data_files.append((FLAGS.amazon_unlabeled_input_file, None)) + + for filename, class_label in data_files: + with open(filename) as rt_f: + for content in rt_f: + if class_label is None: + # Process Amazon Review data for unlabeled dataset + if content.startswith('review/text'): + yield Document(content=content, is_validation=False, + is_test=False, label=None, add_tokens=False) + else: + # 10% of the data is randomly held out for the validation set and + # another 10% of it is randomly held out for the test set + random_int = random.randint(1, 10) + is_validation = random_int == 1 + is_test = random_int == 2 + if (is_test and dataset != 'test') or ( + is_validation and not include_validation): + continue + + yield Document(content=content, is_validation=is_validation, + is_test=is_test, label=class_label, add_tokens=True) diff --git a/adversarial_text/data/gen_data.py b/adversarial_text/data/gen_data.py new file mode 100644 index 0000000000000000000000000000000000000000..0631de8e77520c9ec00179fa44e5425de7ec2cb2 --- /dev/null +++ b/adversarial_text/data/gen_data.py @@ -0,0 +1,215 @@ +# Copyright 2017 Google, Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Create TFRecord files of SequenceExample protos from dataset. + +Constructs 3 datasets: + 1. Labeled data for the LSTM classification model, optionally with label gain. + "*_classification.tfrecords" (for both unidirectional and bidirectional + models). + 2. Data for the unsupervised LM-LSTM model that predicts the next token. + "*_lm.tfrecords" (generates forward and reverse data). + 3. Data for the unsupervised SA-LSTM model that uses Seq2Seq. + "*_sa.tfrecords". +""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import string + +import tensorflow as tf + +from adversarial_text.data import data_utils +from adversarial_text.data import document_generators + +data = data_utils +flags = tf.app.flags +FLAGS = flags.FLAGS + +# Flags for input data are in document_generators.py +flags.DEFINE_string('vocab_file', '', 'Path to the vocabulary file. Defaults ' + 'to FLAGS.output_dir/vocab.txt.') +flags.DEFINE_string('output_dir', '', 'Path to save tfrecords.') + +# Config +flags.DEFINE_boolean('label_gain', False, + 'Enable linear label gain. 
If True, sentiment label will ' + 'be included at each timestep with linear weight ' + 'increase.') + + +def build_shuffling_tf_record_writer(fname): + return data.ShufflingTFRecordWriter(os.path.join(FLAGS.output_dir, fname)) + + +def build_tf_record_writer(fname): + return tf.python_io.TFRecordWriter(os.path.join(FLAGS.output_dir, fname)) + + +def build_input_sequence(doc, vocab_ids): + """Builds input sequence from a Document. + + Splits lines on whitespace. Treats punctuation as whitespace. For word-level + sequences, only keeps terms that are in the vocab. + + Terms are added as tokens in the SequenceExample. The EOS_TOKEN is also + appended. Label and weight features are set to 0. + + Args: + doc: Document (defined in `document_generators`) from which to build the + sequence. + vocab_ids: dict mapping token strings to integer ids. + + Returns: + SequenceWrapper. + """ + seq = data.SequenceWrapper() + for token in document_generators.tokens(doc): + if token in vocab_ids: + seq.add_timestep().set_token(vocab_ids[token]) + + # Add EOS token to end + seq.add_timestep().set_token(vocab_ids[data.EOS_TOKEN]) + + return seq + + +def make_vocab_ids(vocab_filename): + if FLAGS.output_char: + ret = dict([(char, i) for i, char in enumerate(string.printable)]) + ret[data.EOS_TOKEN] = len(string.printable) + return ret + else: + with open(vocab_filename) as vocab_f: + return dict([(line.strip(), i) for i, line in enumerate(vocab_f)]) + + +def generate_training_data(vocab_ids, writer_lm_all, writer_seq_ae_all): + """Generates training data.""" + + # Construct training data writers + writer_lm = build_shuffling_tf_record_writer(data.TRAIN_LM) + writer_seq_ae = build_shuffling_tf_record_writer(data.TRAIN_SA) + writer_class = build_shuffling_tf_record_writer(data.TRAIN_CLASS) + writer_valid_class = build_tf_record_writer(data.VALID_CLASS) + writer_rev_lm = build_shuffling_tf_record_writer(data.TRAIN_REV_LM) + writer_bd_class = build_shuffling_tf_record_writer(data.TRAIN_BD_CLASS) + writer_bd_valid_class = 
build_shuffling_tf_record_writer(data.VALID_BD_CLASS) + + for doc in document_generators.documents( + dataset='train', include_unlabeled=True, include_validation=True): + input_seq = build_input_sequence(doc, vocab_ids) + if len(input_seq) < 2: + continue + rev_seq = data.build_reverse_sequence(input_seq) + lm_seq = data.build_lm_sequence(input_seq) + rev_lm_seq = data.build_lm_sequence(rev_seq) + seq_ae_seq = data.build_seq_ae_sequence(input_seq) + if doc.label is not None: + # Used for sentiment classification. + label_seq = data.build_labeled_sequence( + input_seq, + doc.label, + label_gain=(FLAGS.label_gain and not doc.is_validation)) + bd_label_seq = data.build_labeled_sequence( + data.build_bidirectional_seq(input_seq, rev_seq), + doc.label, + label_gain=(FLAGS.label_gain and not doc.is_validation)) + class_writer = writer_valid_class if doc.is_validation else writer_class + bd_class_writer = (writer_bd_valid_class + if doc.is_validation else writer_bd_class) + class_writer.write(label_seq.seq.SerializeToString()) + bd_class_writer.write(bd_label_seq.seq.SerializeToString()) + + # Write + lm_seq_ser = lm_seq.seq.SerializeToString() + seq_ae_seq_ser = seq_ae_seq.seq.SerializeToString() + writer_lm_all.write(lm_seq_ser) + writer_seq_ae_all.write(seq_ae_seq_ser) + if not doc.is_validation: + writer_lm.write(lm_seq_ser) + writer_rev_lm.write(rev_lm_seq.seq.SerializeToString()) + writer_seq_ae.write(seq_ae_seq_ser) + + # Close writers + writer_lm.close() + writer_seq_ae.close() + writer_class.close() + writer_valid_class.close() + writer_rev_lm.close() + writer_bd_class.close() + writer_bd_valid_class.close() + + +def generate_test_data(vocab_ids, writer_lm_all, writer_seq_ae_all): + """Generates test data.""" + # Construct test data writers + writer_lm = build_shuffling_tf_record_writer(data.TEST_LM) + writer_rev_lm = build_shuffling_tf_record_writer(data.TEST_REV_LM) + writer_seq_ae = build_shuffling_tf_record_writer(data.TEST_SA) + writer_class = 
build_tf_record_writer(data.TEST_CLASS) + writer_bd_class = build_shuffling_tf_record_writer(data.TEST_BD_CLASS) + + for doc in document_generators.documents( + dataset='test', include_unlabeled=False, include_validation=True): + input_seq = build_input_sequence(doc, vocab_ids) + if len(input_seq) < 2: + continue + rev_seq = data.build_reverse_sequence(input_seq) + lm_seq = data.build_lm_sequence(input_seq) + rev_lm_seq = data.build_lm_sequence(rev_seq) + seq_ae_seq = data.build_seq_ae_sequence(input_seq) + label_seq = data.build_labeled_sequence(input_seq, doc.label) + bd_label_seq = data.build_labeled_sequence( + data.build_bidirectional_seq(input_seq, rev_seq), doc.label) + + # Write + writer_class.write(label_seq.seq.SerializeToString()) + writer_bd_class.write(bd_label_seq.seq.SerializeToString()) + lm_seq_ser = lm_seq.seq.SerializeToString() + seq_ae_seq_ser = seq_ae_seq.seq.SerializeToString() + writer_lm.write(lm_seq_ser) + writer_rev_lm.write(rev_lm_seq.seq.SerializeToString()) + writer_seq_ae.write(seq_ae_seq_ser) + writer_lm_all.write(lm_seq_ser) + writer_seq_ae_all.write(seq_ae_seq_ser) + + # Close test writers + writer_lm.close() + writer_rev_lm.close() + writer_seq_ae.close() + writer_class.close() + writer_bd_class.close() + + +def main(_): + tf.logging.info('Assigning vocabulary ids...') + vocab_ids = make_vocab_ids( + FLAGS.vocab_file or os.path.join(FLAGS.output_dir, 'vocab.txt')) + + with build_shuffling_tf_record_writer(data.ALL_LM) as writer_lm_all: + with build_shuffling_tf_record_writer(data.ALL_SA) as writer_seq_ae_all: + + tf.logging.info('Generating training data...') + generate_training_data(vocab_ids, writer_lm_all, writer_seq_ae_all) + + tf.logging.info('Generating test data...') + generate_test_data(vocab_ids, writer_lm_all, writer_seq_ae_all) + + +if __name__ == '__main__': + tf.app.run() diff --git a/adversarial_text/data/gen_vocab.py b/adversarial_text/data/gen_vocab.py new file mode 100644 index 
0000000000000000000000000000000000000000..43a8688fa95fd4fa917894e4a709df2303bdb1e0 --- /dev/null +++ b/adversarial_text/data/gen_vocab.py @@ -0,0 +1,98 @@ +# Copyright 2017 Google, Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Generates vocabulary and term frequency files for datasets.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from collections import defaultdict + +import tensorflow as tf + +from adversarial_text.data import data_utils +from adversarial_text.data import document_generators + +flags = tf.app.flags +FLAGS = flags.FLAGS + +# Flags controlling input are in document_generators.py + +flags.DEFINE_string('output_dir', '', + 'Path to save vocab.txt and vocab_freq.txt.') + +flags.DEFINE_boolean('use_unlabeled', True, 'Whether to use the ' + 'unlabeled sentiment dataset in the vocabulary.') +flags.DEFINE_boolean('include_validation', False, 'Whether to include the ' + 'validation set in the vocabulary.') +flags.DEFINE_integer('doc_count_threshold', 1, 'The minimum number of ' + 'documents a word or bigram should occur in to keep ' + 'it in the vocabulary.') + +MAX_VOCAB_SIZE = 100 * 1000 + + +def fill_vocab_from_doc(doc, vocab_freqs, doc_counts): + """Fills vocabulary and doc counts with tokens from doc. + + Args: + doc: Document to read tokens from. 
+ vocab_freqs: dict + doc_counts: dict + + Returns: + None + """ + doc_seen = set() + + for token in document_generators.tokens(doc): + if doc.add_tokens or token in vocab_freqs: + vocab_freqs[token] += 1 + if token not in doc_seen: + doc_counts[token] += 1 + doc_seen.add(token) + + +def main(_): + vocab_freqs = defaultdict(int) + doc_counts = defaultdict(int) + + # Fill vocabulary frequencies map and document counts map + for doc in document_generators.documents( + dataset='train', + include_unlabeled=FLAGS.use_unlabeled, + include_validation=FLAGS.include_validation): + fill_vocab_from_doc(doc, vocab_freqs, doc_counts) + + # Filter out low-occurring terms + vocab_freqs = dict((term, freq) for term, freq in vocab_freqs.iteritems() + if doc_counts[term] > FLAGS.doc_count_threshold) + + # Sort by frequency + ordered_vocab_freqs = data_utils.sort_vocab_by_frequency(vocab_freqs) + + # Limit vocab size + ordered_vocab_freqs = ordered_vocab_freqs[:MAX_VOCAB_SIZE] + + # Add EOS token + ordered_vocab_freqs.append((data_utils.EOS_TOKEN, 1)) + + # Write + tf.gfile.MakeDirs(FLAGS.output_dir) + data_utils.write_vocab_and_frequency(ordered_vocab_freqs, FLAGS.output_dir) + + +if __name__ == '__main__': + tf.app.run() diff --git a/adversarial_text/evaluate.py b/adversarial_text/evaluate.py new file mode 100644 index 0000000000000000000000000000000000000000..2c96b7990dee2d15271883e6ea60a6e2f1cecb59 --- /dev/null +++ b/adversarial_text/evaluate.py @@ -0,0 +1,136 @@ +# Copyright 2017 Google, Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Evaluates text classification model.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +import time + +import tensorflow as tf + +import graphs + +flags = tf.app.flags +FLAGS = flags.FLAGS + +flags.DEFINE_string('master', '', + 'BNS name prefix of the Tensorflow eval master, ' + 'or "local".') +flags.DEFINE_string('eval_dir', '/tmp/text_eval', + 'Directory where to write event logs.') +flags.DEFINE_string('eval_data', 'test', 'Specify which dataset is used. ' + '("train", "valid", "test") ') + +flags.DEFINE_string('checkpoint_dir', '/tmp/text_train', + 'Directory where to read model checkpoints.') +flags.DEFINE_integer('eval_interval_secs', 60, 'How often to run the eval.') +flags.DEFINE_integer('num_examples', 32, 'Number of examples to run.') +flags.DEFINE_bool('run_once', False, 'Whether to run eval only once.') + + +def restore_from_checkpoint(sess, saver): + """Restore model from checkpoint. + + Args: + sess: Session. + saver: Saver for restoring the checkpoint. + + Returns: + bool: Whether the checkpoint was found and restored + """ + ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir) + if not ckpt or not ckpt.model_checkpoint_path: + tf.logging.info('No checkpoint found at %s', FLAGS.checkpoint_dir) + return False + + saver.restore(sess, ckpt.model_checkpoint_path) + return True + + +def run_eval(eval_ops, summary_writer, saver): + """Runs evaluation over FLAGS.num_examples examples. + + Args: + eval_ops: dict + summary_writer: Summary writer. + saver: Saver. + + Returns: + dict, with value being the average over all examples. 
+ """ + sv = tf.train.Supervisor(logdir=FLAGS.eval_dir, saver=None, summary_op=None) + with sv.managed_session( + master=FLAGS.master, start_standard_services=False) as sess: + if not restore_from_checkpoint(sess, saver): + return + sv.start_queue_runners(sess) + + metric_names, ops = zip(*eval_ops.items()) + value_ops, update_ops = zip(*ops) + + value_ops_dict = dict(zip(metric_names, value_ops)) + + # Run update ops + num_batches = int(math.ceil(FLAGS.num_examples / FLAGS.batch_size)) + tf.logging.info('Running %d batches for evaluation.', num_batches) + for i in range(num_batches): + if (i + 1) % 10 == 0: + tf.logging.info('Running batch %d/%d...', i + 1, num_batches) + if (i + 1) % 50 == 0: + _log_values(sess, value_ops_dict) + sess.run(update_ops) + + _log_values(sess, value_ops_dict, summary_writer=summary_writer) + + +def _log_values(sess, value_ops, summary_writer=None): + metric_names, value_ops = zip(*value_ops.items()) + values = sess.run(value_ops) + + tf.logging.info('Eval metric values:') + summary = tf.summary.Summary() + for name, val in zip(metric_names, values): + summary.value.add(tag=name, simple_value=val) + tf.logging.info('%s = %.3f', name, val) + + if summary_writer is not None: + global_step_val = sess.run(tf.train.get_global_step()) + summary_writer.add_summary(summary, global_step_val) + + +def main(_): + tf.logging.set_verbosity(tf.logging.INFO) + tf.gfile.MakeDirs(FLAGS.eval_dir) + tf.logging.info('Building eval graph...') + output = graphs.get_model().eval_graph(FLAGS.eval_data) + eval_ops, moving_averaged_variables = output + + saver = tf.train.Saver(moving_averaged_variables) + summary_writer = tf.summary.FileWriter( + FLAGS.eval_dir, graph=tf.get_default_graph()) + + while True: + run_eval(eval_ops, summary_writer, saver) + if FLAGS.run_once: + break + time.sleep(FLAGS.eval_interval_secs) + + +if __name__ == '__main__': + tf.app.run() diff --git a/adversarial_text/graphs.py b/adversarial_text/graphs.py new file mode 100644 index 
0000000000000000000000000000000000000000..4d5dce8d0e01cb0ed22d4bbcc56ac43ec11840b1 --- /dev/null +++ b/adversarial_text/graphs.py @@ -0,0 +1,661 @@ +# Copyright 2017 Google, Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Virtual adversarial text models.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import csv +import os +import tensorflow as tf + +import adversarial_losses as adv_lib +import inputs as inputs_lib +import layers as layers_lib + +flags = tf.app.flags +FLAGS = flags.FLAGS + +# Flags governing adversarial training are defined in adversarial_losses.py. + +# Classifier +flags.DEFINE_integer('num_classes', 2, 'Number of classes for classification') + +# Data path +flags.DEFINE_string('data_dir', '/tmp/IMDB', + 'Directory path to preprocessed text dataset.') +flags.DEFINE_string('vocab_freq_path', None, + 'Path to pre-calculated vocab frequency data. 
If ' + 'None, use FLAGS.data_dir/vocab_freq.txt.') +flags.DEFINE_integer('batch_size', 64, 'Size of the batch.') +flags.DEFINE_integer('num_timesteps', 100, 'Number of timesteps for BPTT') + +# Model architecture +flags.DEFINE_bool('bidir_lstm', False, 'Whether to build a bidirectional LSTM.') +flags.DEFINE_integer('rnn_num_layers', 1, 'Number of LSTM layers.') +flags.DEFINE_integer('rnn_cell_size', 512, + 'Number of hidden units in the LSTM.') +flags.DEFINE_integer('cl_num_layers', 1, + 'Number of hidden layers of classification model.') +flags.DEFINE_integer('cl_hidden_size', 30, + 'Number of hidden units in classification layer.') +flags.DEFINE_integer('num_candidate_samples', -1, + 'Num samples used in the sampled output layer.') +flags.DEFINE_bool('use_seq2seq_autoencoder', False, + 'If True, seq2seq auto-encoder is used to pretrain. ' + 'If False, standard language model is used.') + +# Vocabulary and embeddings +flags.DEFINE_integer('embedding_dims', 256, 'Dimensions of embedded vector.') +flags.DEFINE_integer('vocab_size', 86934, + 'The size of the vocabulary. This value ' + 'should be exactly the same as the number of the ' + 'vocabulary used in dataset. 
Because the last ' + 'indexed vocabulary of the dataset preprocessed by ' + 'my preprocessed code, is always the EOS token and here we ' + 'specify the EOS token with the index.') +flags.DEFINE_bool('normalize_embeddings', True, + 'Normalize word embeddings by vocab frequency') + +# Optimization +flags.DEFINE_float('learning_rate', 0.001, 'Learning rate while fine-tuning.') +flags.DEFINE_float('learning_rate_decay_factor', 1.0, + 'Learning rate decay factor') +flags.DEFINE_boolean('sync_replicas', False, 'sync_replica or not') +flags.DEFINE_integer('replicas_to_aggregate', 1, + 'The number of replicas to aggregate') + +# Regularization +flags.DEFINE_float('max_grad_norm', 1.0, + 'Clip the global gradient norm to this value.') +flags.DEFINE_float('keep_prob_emb', 1.0, 'keep probability on embedding layer') +flags.DEFINE_float('keep_prob_lstm_out', 1.0, + 'keep probability on lstm output.') +flags.DEFINE_float('keep_prob_cl_hidden', 1.0, + 'keep probability on classification hidden layer') + + +def get_model(): + if FLAGS.bidir_lstm: + return VatxtBidirModel() + else: + return VatxtModel() + + +class VatxtModel(object): + """Constructs training and evaluation graphs. + + Main methods: `classifier_training()`, `language_model_training()`, + and `eval_graph()`. + + Variable reuse is a critical part of the model, both for sharing variables + between the language model and the classifier, and for reusing variables for + the adversarial loss calculation. To ensure correct variable reuse, all + variables are created in Keras-style layers, wherein stateful layers (i.e. + layers with variables) are represented as callable instances of the Layer + class. Each time the Layer instance is called, it is using the same variables. + + All Layers are constructed in the __init__ method and reused in the various + graph-building functions. 
+ """ + + def __init__(self, cl_logits_input_dim=None): + self.global_step = tf.contrib.framework.get_or_create_global_step() + self.vocab_freqs = _get_vocab_freqs() + + # Cache VatxtInput objects + self.cl_inputs = None + self.lm_inputs = None + + # Cache intermediate Tensors that are reused + self.tensors = {} + + # Construct layers which are reused in constructing the LM and + # Classification graphs. Instantiating them all once here ensures that + # variable reuse works correctly. + self.layers = {} + self.layers['embedding'] = layers_lib.Embedding( + FLAGS.vocab_size, FLAGS.embedding_dims, FLAGS.normalize_embeddings, + self.vocab_freqs, FLAGS.keep_prob_emb) + self.layers['lstm'] = layers_lib.LSTM( + FLAGS.rnn_cell_size, FLAGS.rnn_num_layers, FLAGS.keep_prob_lstm_out) + self.layers['lm_loss'] = layers_lib.SoftmaxLoss( + FLAGS.vocab_size, + FLAGS.num_candidate_samples, + self.vocab_freqs, + name='LM_loss') + + cl_logits_input_dim = cl_logits_input_dim or FLAGS.rnn_cell_size + self.layers['cl_logits'] = layers_lib.cl_logits_subgraph( + [FLAGS.cl_hidden_size] * FLAGS.cl_num_layers, cl_logits_input_dim, + FLAGS.num_classes, FLAGS.keep_prob_cl_hidden) + + @property + def pretrained_variables(self): + return (self.layers['embedding'].trainable_weights + + self.layers['lstm'].trainable_weights) + + def classifier_training(self): + loss = self.classifier_graph() + train_op = optimize(loss, self.global_step) + return train_op, loss, self.global_step + + def language_model_training(self): + loss = self.language_model_graph() + train_op = optimize(loss, self.global_step) + return train_op, loss, self.global_step + + def classifier_graph(self): + """Constructs classifier graph from inputs to classifier loss. + + * Caches the VatxtInput object in `self.cl_inputs` + * Caches tensors: `cl_embedded`, `cl_logits`, `cl_loss` + + Returns: + loss: scalar float. 
+ """ + inputs = _inputs('train', pretrain=False) + self.cl_inputs = inputs + embedded = self.layers['embedding'](inputs.tokens) + self.tensors['cl_embedded'] = embedded + + _, next_state, logits, loss = self.cl_loss_from_embedding( + embedded, return_intermediates=True) + tf.summary.scalar('classification_loss', loss) + self.tensors['cl_logits'] = logits + self.tensors['cl_loss'] = loss + + acc = layers_lib.accuracy(logits, inputs.labels, inputs.weights) + tf.summary.scalar('accuracy', acc) + + adv_loss = (self.adversarial_loss() * tf.constant( + FLAGS.adv_reg_coeff, name='adv_reg_coeff')) + tf.summary.scalar('adversarial_loss', adv_loss) + + total_loss = loss + adv_loss + tf.summary.scalar('total_classification_loss', total_loss) + + with tf.control_dependencies([inputs.save_state(next_state)]): + total_loss = tf.identity(total_loss) + + return total_loss + + def language_model_graph(self, compute_loss=True): + """Constructs LM graph from inputs to LM loss. + + * Caches the VatxtInput object in `self.lm_inputs` + * Caches tensors: `lm_embedded` + + Args: + compute_loss: bool, whether to compute and return the loss or stop after + the LSTM computation. + + Returns: + loss: scalar float. 
+ """ + inputs = _inputs('train', pretrain=True) + self.lm_inputs = inputs + return self._lm_loss(inputs, compute_loss=compute_loss) + + def _lm_loss(self, + inputs, + emb_key='lm_embedded', + lstm_layer='lstm', + lm_loss_layer='lm_loss', + loss_name='lm_loss', + compute_loss=True): + embedded = self.layers['embedding'](inputs.tokens) + self.tensors[emb_key] = embedded + lstm_out, next_state = self.layers[lstm_layer](embedded, inputs.state, + inputs.length) + if compute_loss: + loss = self.layers[lm_loss_layer]( + [lstm_out, inputs.labels, inputs.weights]) + with tf.control_dependencies([inputs.save_state(next_state)]): + loss = tf.identity(loss) + tf.summary.scalar(loss_name, loss) + + return loss + + def eval_graph(self, dataset='test'): + """Constructs classifier evaluation graph. + + Args: + dataset: the labeled dataset to evaluate, {'train', 'test', 'valid'}. + + Returns: + eval_ops: dict + var_restore_dict: dict mapping variable restoration names to variables. + Trainable variables will be mapped to their moving average names. + """ + inputs = _inputs(dataset, pretrain=False) + embedded = self.layers['embedding'](inputs.tokens) + _, next_state, logits, _ = self.cl_loss_from_embedding( + embedded, inputs=inputs, return_intermediates=True) + + eval_ops = { + 'accuracy': + tf.contrib.metrics.streaming_accuracy( + layers_lib.predictions(logits), inputs.labels, + inputs.weights) + } + + with tf.control_dependencies([inputs.save_state(next_state)]): + acc, acc_update = eval_ops['accuracy'] + acc_update = tf.identity(acc_update) + eval_ops['accuracy'] = (acc, acc_update) + + var_restore_dict = make_restore_average_vars_dict() + return eval_ops, var_restore_dict + + def cl_loss_from_embedding(self, + embedded, + inputs=None, + return_intermediates=False): + """Compute classification loss from embedding. + + Args: + embedded: 3-D float Tensor [batch_size, num_timesteps, embedding_dim] + inputs: VatxtInput, defaults to self.cl_inputs. 
+ return_intermediates: bool, whether to return intermediate tensors or only + the final loss. + + Returns: + If return_intermediates is True: + lstm_out, next_state, logits, loss + Else: + loss + """ + if inputs is None: + inputs = self.cl_inputs + + lstm_out, next_state = self.layers['lstm'](embedded, inputs.state, + inputs.length) + logits = self.layers['cl_logits'](lstm_out) + loss = layers_lib.classification_loss(logits, inputs.labels, inputs.weights) + + if return_intermediates: + return lstm_out, next_state, logits, loss + else: + return loss + + def adversarial_loss(self): + """Compute adversarial loss based on FLAGS.adv_training_method.""" + + def random_perturbation_loss(): + return adv_lib.random_perturbation_loss(self.tensors['cl_embedded'], + self.cl_inputs.length, + self.cl_loss_from_embedding) + + def adversarial_loss(): + return adv_lib.adversarial_loss(self.tensors['cl_embedded'], + self.tensors['cl_loss'], + self.cl_loss_from_embedding) + + def virtual_adversarial_loss(): + """Computes virtual adversarial loss. + + Uses lm_inputs and constructs the language model graph if it hasn't yet + been constructed. + + Also ensures that the LM input states are saved for LSTM state-saving + BPTT. + + Returns: + loss: float scalar. 
+ """ + if self.lm_inputs is None: + self.language_model_graph(compute_loss=False) + + def logits_from_embedding(embedded, return_next_state=False): + _, next_state, logits, _ = self.cl_loss_from_embedding( + embedded, inputs=self.lm_inputs, return_intermediates=True) + if return_next_state: + return next_state, logits + else: + return logits + + next_state, lm_cl_logits = logits_from_embedding( + self.tensors['lm_embedded'], return_next_state=True) + + va_loss = adv_lib.virtual_adversarial_loss( + lm_cl_logits, self.tensors['lm_embedded'], self.lm_inputs, + logits_from_embedding) + + with tf.control_dependencies([self.lm_inputs.save_state(next_state)]): + va_loss = tf.identity(va_loss) + + return va_loss + + def combo_loss(): + return adversarial_loss() + virtual_adversarial_loss() + + adv_training_methods = { + # Random perturbation + 'rp': random_perturbation_loss, + # Adversarial training + 'at': adversarial_loss, + # Virtual adversarial training + 'vat': virtual_adversarial_loss, + # Both at and vat + 'atvat': combo_loss, + '': lambda: tf.constant(0.), + None: lambda: tf.constant(0.), + } + + with tf.name_scope('adversarial_loss'): + return adv_training_methods[FLAGS.adv_training_method]() + + +class VatxtBidirModel(VatxtModel): + """Extension of VatxtModel that supports bidirectional input.""" + + def __init__(self): + super(VatxtBidirModel, + self).__init__(cl_logits_input_dim=FLAGS.rnn_cell_size * 2) + + # Reverse LSTM and LM loss for bidirectional models + self.layers['lstm_reverse'] = layers_lib.LSTM( + FLAGS.rnn_cell_size, + FLAGS.rnn_num_layers, + FLAGS.keep_prob_lstm_out, + name='LSTM_Reverse') + self.layers['lm_loss_reverse'] = layers_lib.SoftmaxLoss( + FLAGS.vocab_size, + FLAGS.num_candidate_samples, + self.vocab_freqs, + name='LM_loss_reverse') + + @property + def pretrained_variables(self): + variables = super(VatxtBidirModel, self).pretrained_variables + variables.extend(self.layers['lstm_reverse'].trainable_weights) + return variables + + def 
classifier_graph(self): + """Constructs classifier graph from inputs to classifier loss. + + * Caches the VatxtInput objects in `self.cl_inputs` + * Caches tensors: `cl_embedded` (tuple of forward and reverse), `cl_logits`, + `cl_loss` + + Returns: + loss: scalar float. + """ + inputs = _inputs('train', pretrain=False, bidir=True) + self.cl_inputs = inputs + f_inputs, _ = inputs + + # Embed both forward and reverse with a shared embedding + embedded = [self.layers['embedding'](inp.tokens) for inp in inputs] + self.tensors['cl_embedded'] = embedded + + _, next_states, logits, loss = self.cl_loss_from_embedding( + embedded, return_intermediates=True) + tf.summary.scalar('classification_loss', loss) + self.tensors['cl_logits'] = logits + self.tensors['cl_loss'] = loss + + acc = layers_lib.accuracy(logits, f_inputs.labels, f_inputs.weights) + tf.summary.scalar('accuracy', acc) + + adv_loss = (self.adversarial_loss() * tf.constant( + FLAGS.adv_reg_coeff, name='adv_reg_coeff')) + tf.summary.scalar('adversarial_loss', adv_loss) + + total_loss = loss + adv_loss + tf.summary.scalar('total_classification_loss', total_loss) + + saves = [inp.save_state(state) for (inp, state) in zip(inputs, next_states)] + with tf.control_dependencies(saves): + total_loss = tf.identity(total_loss) + + return total_loss + + def language_model_graph(self, compute_loss=True): + """Constructs forward and reverse LM graphs from inputs to LM losses. + + * Caches the VatxtInput objects in `self.lm_inputs` + * Caches tensors: `lm_embedded`, `lm_embedded_reverse` + + Args: + compute_loss: bool, whether to compute and return the loss or stop after + the LSTM computation. + + Returns: + loss: scalar float, sum of forward and reverse losses. 
+ """ + inputs = _inputs('train', pretrain=True, bidir=True) + self.lm_inputs = inputs + f_inputs, r_inputs = inputs + f_loss = self._lm_loss(f_inputs, compute_loss=compute_loss) + r_loss = self._lm_loss( + r_inputs, + emb_key='lm_embedded_reverse', + lstm_layer='lstm_reverse', + lm_loss_layer='lm_loss_reverse', + loss_name='lm_loss_reverse', + compute_loss=compute_loss) + if compute_loss: + return f_loss + r_loss + + def eval_graph(self, dataset='test'): + """Constructs classifier evaluation graph. + + Args: + dataset: the labeled dataset to evaluate, {'train', 'test', 'valid'}. + + Returns: + eval_ops: dict + var_restore_dict: dict mapping variable restoration names to variables. + Trainable variables will be mapped to their moving average names. + """ + inputs = _inputs(dataset, pretrain=False, bidir=True) + embedded = [self.layers['embedding'](inp.tokens) for inp in inputs] + _, next_states, logits, _ = self.cl_loss_from_embedding( + embedded, inputs=inputs, return_intermediates=True) + f_inputs, _ = inputs + + eval_ops = { + 'accuracy': + tf.contrib.metrics.streaming_accuracy( + layers_lib.predictions(logits), f_inputs.labels, + f_inputs.weights) + } + + # Save states on accuracy update + saves = [inp.save_state(state) for (inp, state) in zip(inputs, next_states)] + with tf.control_dependencies(saves): + acc, acc_update = eval_ops['accuracy'] + acc_update = tf.identity(acc_update) + eval_ops['accuracy'] = (acc, acc_update) + + var_restore_dict = make_restore_average_vars_dict() + return eval_ops, var_restore_dict + + def cl_loss_from_embedding(self, + embedded, + inputs=None, + return_intermediates=False): + """Compute classification loss from embedding. + + Args: + embedded: Length 2 tuple of 3-D float Tensor + [batch_size, num_timesteps, embedding_dim]. + inputs: Length 2 tuple of VatxtInput, defaults to self.cl_inputs. + return_intermediates: bool, whether to return intermediate tensors or only + the final loss. 
+ + Returns: + If return_intermediates is True: + lstm_out, next_states, logits, loss + Else: + loss + """ + if inputs is None: + inputs = self.cl_inputs + + out = [] + for (layer_name, emb, inp) in zip(['lstm', 'lstm_reverse'], embedded, + inputs): + out.append(self.layers[layer_name](emb, inp.state, inp.length)) + lstm_outs, next_states = zip(*out) + + # Concatenate output of forward and reverse LSTMs + lstm_out = tf.concat(lstm_outs, 1) + + logits = self.layers['cl_logits'](lstm_out) + f_inputs, _ = inputs # pylint: disable=unpacking-non-sequence + loss = layers_lib.classification_loss(logits, f_inputs.labels, + f_inputs.weights) + + if return_intermediates: + return lstm_out, next_states, logits, loss + else: + return loss + + def adversarial_loss(self): + """Compute adversarial loss based on FLAGS.adv_training_method.""" + + def random_perturbation_loss(): + return adv_lib.random_perturbation_loss_bidir(self.tensors['cl_embedded'], + self.cl_inputs[0].length, + self.cl_loss_from_embedding) + + def adversarial_loss(): + return adv_lib.adversarial_loss_bidir(self.tensors['cl_embedded'], + self.tensors['cl_loss'], + self.cl_loss_from_embedding) + + def virtual_adversarial_loss(): + """Computes virtual adversarial loss. + + Uses lm_inputs and constructs the language model graph if it hasn't yet + been constructed. + + Also ensures that the LM input states are saved for LSTM state-saving + BPTT. + + Returns: + loss: float scalar. 
+ """ + if self.lm_inputs is None: + self.language_model_graph(compute_loss=False) + + def logits_from_embedding(embedded, return_next_state=False): + _, next_states, logits, _ = self.cl_loss_from_embedding( + embedded, inputs=self.lm_inputs, return_intermediates=True) + if return_next_state: + return next_states, logits + else: + return logits + + lm_embedded = (self.tensors['lm_embedded'], + self.tensors['lm_embedded_reverse']) + next_states, lm_cl_logits = logits_from_embedding( + lm_embedded, return_next_state=True) + + va_loss = adv_lib.virtual_adversarial_loss_bidir( + lm_cl_logits, lm_embedded, self.lm_inputs, logits_from_embedding) + + saves = [ + inp.save_state(state) + for (inp, state) in zip(self.lm_inputs, next_states) + ] + with tf.control_dependencies(saves): + va_loss = tf.identity(va_loss) + + return va_loss + + def combo_loss(): + return adversarial_loss() + virtual_adversarial_loss() + + adv_training_methods = { + # Random perturbation + 'rp': random_perturbation_loss, + # Adversarial training + 'at': adversarial_loss, + # Virtual adversarial training + 'vat': virtual_adversarial_loss, + # Both at and vat + 'atvat': combo_loss, + '': lambda: tf.constant(0.), + None: lambda: tf.constant(0.), + } + + with tf.name_scope('adversarial_loss'): + return adv_training_methods[FLAGS.adv_training_method]() + + +def _inputs(dataset='train', pretrain=False, bidir=False): + return inputs_lib.inputs( + data_dir=FLAGS.data_dir, + phase=dataset, + bidir=bidir, + pretrain=pretrain, + use_seq2seq=pretrain and FLAGS.use_seq2seq_autoencoder, + state_size=FLAGS.rnn_cell_size, + num_layers=FLAGS.rnn_num_layers, + batch_size=FLAGS.batch_size, + unroll_steps=FLAGS.num_timesteps) + + +def _get_vocab_freqs(): + """Returns vocab frequencies. + + Returns: + List of integers, length=FLAGS.vocab_size. + + Raises: + ValueError: if the length of the frequency file is not equal to the vocab + size, or if the file is not found. 
+ """ + path = FLAGS.vocab_freq_path or os.path.join(FLAGS.data_dir, 'vocab_freq.txt') + + if tf.gfile.Exists(path): + with tf.gfile.Open(path) as f: + # Get pre-calculated frequencies of words. + reader = csv.reader(f, quoting=csv.QUOTE_NONE) + freqs = [int(row[-1]) for row in reader] + if len(freqs) != FLAGS.vocab_size: + raise ValueError('Frequency file length %d != vocab size %d' % + (len(freqs), FLAGS.vocab_size)) + else: + if FLAGS.vocab_freq_path: + raise ValueError('vocab_freq_path not found') + freqs = [1] * FLAGS.vocab_size + + return freqs + + +def make_restore_average_vars_dict(): + """Returns dict mapping moving average names to variables.""" + var_restore_dict = {} + variable_averages = tf.train.ExponentialMovingAverage(0.999) + for v in tf.global_variables(): + if v in tf.trainable_variables(): + name = variable_averages.average_name(v) + else: + name = v.op.name + var_restore_dict[name] = v + return var_restore_dict + + +def optimize(loss, global_step): + return layers_lib.optimize( + loss, global_step, FLAGS.max_grad_norm, FLAGS.learning_rate, + FLAGS.learning_rate_decay_factor, FLAGS.sync_replicas, + FLAGS.replicas_to_aggregate, FLAGS.task) diff --git a/adversarial_text/graphs_test.py b/adversarial_text/graphs_test.py new file mode 100644 index 0000000000000000000000000000000000000000..849e3d06f9f5d51eb4c6b81fe568a16a90fd962c --- /dev/null +++ b/adversarial_text/graphs_test.py @@ -0,0 +1,224 @@ +# Copyright 2017 Google, Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for graphs.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from collections import defaultdict +import operator +import os +import random +import shutil +import string +import tempfile + +import tensorflow as tf + +import graphs +from adversarial_text.data import data_utils + +flags = tf.app.flags +FLAGS = flags.FLAGS +data = data_utils + +flags.DEFINE_integer('task', 0, 'Task id; needed for SyncReplicas test') + + +def _build_random_vocabulary(vocab_size=100): + """Builds and returns a dict.""" + vocab = set() + while len(vocab) < (vocab_size - 1): + rand_word = ''.join( + random.choice(string.ascii_lowercase) + for _ in range(random.randint(1, 10))) + vocab.add(rand_word) + + vocab_ids = dict([(word, i) for i, word in enumerate(vocab)]) + vocab_ids[data.EOS_TOKEN] = vocab_size - 1 + return vocab_ids + + +def _build_random_sequence(vocab_ids): + seq_len = random.randint(10, 200) + ids = vocab_ids.values() + seq = data.SequenceWrapper() + for token_id in [random.choice(ids) for _ in range(seq_len)]: + seq.add_timestep().set_token(token_id) + return seq + + +def _build_vocab_frequencies(seqs, vocab_ids): + vocab_freqs = defaultdict(int) + ids_to_words = dict([(i, word) for word, i in vocab_ids.iteritems()]) + for seq in seqs: + for timestep in seq: + vocab_freqs[ids_to_words[timestep.token]] += 1 + + vocab_freqs[data.EOS_TOKEN] = 0 + return vocab_freqs + + +class GraphsTest(tf.test.TestCase): + """Test graph construction methods.""" + + @classmethod + def setUpClass(cls): + # Make model small + FLAGS.batch_size = 2 + FLAGS.num_timesteps = 3 + FLAGS.embedding_dims = 4 + FLAGS.rnn_num_layers = 2 + FLAGS.rnn_cell_size = 4 + FLAGS.cl_num_layers = 2 + FLAGS.cl_hidden_size = 4 + FLAGS.vocab_size = 10 + + # 
Set input/output flags + FLAGS.data_dir = tempfile.mkdtemp() + + # Build and write sequence files. + vocab_ids = _build_random_vocabulary(FLAGS.vocab_size) + seqs = [_build_random_sequence(vocab_ids) for _ in range(5)] + seqs_label = [ + data.build_labeled_sequence(seq, random.choice([True, False])) + for seq in seqs + ] + seqs_lm = [data.build_lm_sequence(seq) for seq in seqs] + seqs_ae = [data.build_seq_ae_sequence(seq) for seq in seqs] + seqs_rev = [data.build_reverse_sequence(seq) for seq in seqs] + seqs_bidir = [ + data.build_bidirectional_seq(seq, rev) + for seq, rev in zip(seqs, seqs_rev) + ] + seqs_bidir_label = [ + data.build_labeled_sequence(bd_seq, random.choice([True, False])) + for bd_seq in seqs_bidir + ] + + filenames = [ + data.TRAIN_CLASS, data.TRAIN_LM, data.TRAIN_SA, data.TEST_CLASS, + data.TRAIN_REV_LM, data.TRAIN_BD_CLASS, data.TEST_BD_CLASS + ] + seq_lists = [ + seqs_label, seqs_lm, seqs_ae, seqs_label, seqs_rev, seqs_bidir, + seqs_bidir_label + ] + for fname, seq_list in zip(filenames, seq_lists): + with tf.python_io.TFRecordWriter( + os.path.join(FLAGS.data_dir, fname)) as writer: + for seq in seq_list: + writer.write(seq.seq.SerializeToString()) + + # Write vocab.txt and vocab_freq.txt + vocab_freqs = _build_vocab_frequencies(seqs, vocab_ids) + ordered_vocab_freqs = sorted( + vocab_freqs.items(), key=operator.itemgetter(1), reverse=True) + with open(os.path.join(FLAGS.data_dir, 'vocab.txt'), 'w') as vocab_f: + with open(os.path.join(FLAGS.data_dir, 'vocab_freq.txt'), 'w') as freq_f: + for word, freq in ordered_vocab_freqs: + vocab_f.write('{}\n'.format(word)) + freq_f.write('{}\n'.format(freq)) + + @classmethod + def tearDownClass(cls): + shutil.rmtree(FLAGS.data_dir) + + def setUp(self): + # Reset FLAGS + FLAGS.rnn_num_layers = 1 + FLAGS.sync_replicas = False + FLAGS.adv_training_method = None + FLAGS.num_candidate_samples = -1 + FLAGS.num_classes = 2 + FLAGS.use_seq2seq_autoencoder = False + + # Reset Graph + tf.reset_default_graph() + + 
def testClassifierGraph(self): + FLAGS.rnn_num_layers = 2 + model = graphs.VatxtModel() + train_op, _, _ = model.classifier_training() + # Pretrained vars: embedding + LSTM layers + self.assertEqual( + len(model.pretrained_variables), 1 + 2 * FLAGS.rnn_num_layers) + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + tf.train.start_queue_runners(sess) + sess.run(train_op) + + def testLanguageModelGraph(self): + train_op, _, _ = graphs.VatxtModel().language_model_training() + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + tf.train.start_queue_runners(sess) + sess.run(train_op) + + def testMulticlass(self): + FLAGS.num_classes = 10 + graphs.VatxtModel().classifier_graph() + + def testATMethods(self): + at_methods = [None, 'rp', 'at', 'vat', 'atvat'] + for method in at_methods: + FLAGS.adv_training_method = method + with tf.Graph().as_default(): + graphs.VatxtModel().classifier_graph() + + # Ensure variables have been reused + # Embedding + LSTM layers + hidden layers + logits layer + expected_num_vars = 1 + 2 * FLAGS.rnn_num_layers + 2 * ( + FLAGS.cl_num_layers) + 2 + self.assertEqual(len(tf.trainable_variables()), expected_num_vars) + + def testSyncReplicas(self): + FLAGS.sync_replicas = True + graphs.VatxtModel().language_model_training() + + def testCandidateSampling(self): + FLAGS.num_candidate_samples = 10 + graphs.VatxtModel().language_model_training() + + def testSeqAE(self): + FLAGS.use_seq2seq_autoencoder = True + graphs.VatxtModel().language_model_training() + + def testBidirLM(self): + graphs.VatxtBidirModel().language_model_graph() + + def testBidirClassifier(self): + at_methods = [None, 'rp', 'at', 'vat', 'atvat'] + for method in at_methods: + FLAGS.adv_training_method = method + with tf.Graph().as_default(): + graphs.VatxtBidirModel().classifier_graph() + + # Ensure variables have been reused + # Embedding + 2 LSTM layers + hidden layers + logits layer + expected_num_vars = 1 + 2 * 2 * 
FLAGS.rnn_num_layers + 2 * ( + FLAGS.cl_num_layers) + 2 + self.assertEqual(len(tf.trainable_variables()), expected_num_vars) + + def testEvalGraph(self): + _, _ = graphs.VatxtModel().eval_graph() + + def testBidirEvalGraph(self): + _, _ = graphs.VatxtBidirModel().eval_graph() + + +if __name__ == '__main__': + tf.test.main() diff --git a/adversarial_text/inputs.py b/adversarial_text/inputs.py new file mode 100644 index 0000000000000000000000000000000000000000..ec99eded05ed4abf4dfbf8e52d36c0ef796026ba --- /dev/null +++ b/adversarial_text/inputs.py @@ -0,0 +1,325 @@ +# Copyright 2017 Google, Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Input utils for virtual adversarial text classification.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import tensorflow as tf + +from adversarial_text.data import data_utils + + +class VatxtInput(object): + """Wrapper around NextQueuedSequenceBatch.""" + + def __init__(self, batch, state_name=None, tokens=None, num_states=0): + """Construct VatxtInput. + + Args: + batch: NextQueuedSequenceBatch. + state_name: str, name of state to fetch and save. + tokens: int Tensor, tokens. Defaults to batch's F_TOKEN_ID sequence. + num_states: int The number of states to store. 
+ """ + self._batch = batch + self._state_name = state_name + self._tokens = (tokens if tokens is not None else + batch.sequences[data_utils.SequenceWrapper.F_TOKEN_ID]) + self._num_states = num_states + + # Once the tokens have passed through embedding and LSTM, the output Tensor + # shapes will be time-major, i.e. shape = (time, batch, dim). Here we make + # both weights and labels time-major with a transpose, and then merge the + # time and batch dimensions such that they are both vectors of shape + # (time*batch). + w = batch.sequences[data_utils.SequenceWrapper.F_WEIGHT] + w = tf.transpose(w, [1, 0]) + w = tf.reshape(w, [-1]) + self._weights = w + + l = batch.sequences[data_utils.SequenceWrapper.F_LABEL] + l = tf.transpose(l, [1, 0]) + l = tf.reshape(l, [-1]) + self._labels = l + + @property + def tokens(self): + return self._tokens + + @property + def weights(self): + return self._weights + + @property + def labels(self): + return self._labels + + @property + def length(self): + return self._batch.length + + @property + def state_name(self): + return self._state_name + + @property + def state(self): + # LSTM tuple states + state_names = _get_tuple_state_names(self._num_states, self._state_name) + return tuple([ + tf.contrib.rnn.LSTMStateTuple( + self._batch.state(c_name), self._batch.state(h_name)) + for c_name, h_name in state_names + ]) + + def save_state(self, value): + # LSTM tuple states + state_names = _get_tuple_state_names(self._num_states, self._state_name) + save_ops = [] + for (c_state, h_state), (c_name, h_name) in zip(value, state_names): + save_ops.append(self._batch.save_state(c_name, c_state)) + save_ops.append(self._batch.save_state(h_name, h_state)) + return tf.group(*save_ops) + + +def _get_tuple_state_names(num_states, base_name): + """Returns state names for use with LSTM tuple state.""" + state_names = [('{}_{}_c'.format(i, base_name), '{}_{}_h'.format( + i, base_name)) for i in range(num_states)] + return state_names + + +def 
_split_bidir_tokens(batch): + tokens = batch.sequences[data_utils.SequenceWrapper.F_TOKEN_ID] + # Tokens have shape [batch, time, 2] + # forward and reverse have shape [batch, time]. + forward, reverse = [ + tf.squeeze(t, axis=[2]) for t in tf.split(tokens, 2, axis=2) + ] + return forward, reverse + + +def _filenames_for_data_spec(phase, bidir, pretrain, use_seq2seq): + """Returns input filenames for configuration. + + Args: + phase: str, 'train', 'test', or 'valid'. + bidir: bool, bidirectional model. + pretrain: bool, pretraining or classification. + use_seq2seq: bool, seq2seq data, only valid if pretrain=True. + + Returns: + Tuple of filenames. + + Raises: + ValueError: if an invalid combination of arguments is provided that does not + map to any data files (e.g. pretrain=False, use_seq2seq=True). + """ + data_spec = (phase, bidir, pretrain, use_seq2seq) + data_specs = { + ('train', True, True, False): (data_utils.TRAIN_LM, + data_utils.TRAIN_REV_LM), + ('train', True, False, False): (data_utils.TRAIN_BD_CLASS,), + ('train', False, True, False): (data_utils.TRAIN_LM,), + ('train', False, True, True): (data_utils.TRAIN_SA,), + ('train', False, False, False): (data_utils.TRAIN_CLASS,), + ('test', True, True, False): (data_utils.TEST_LM, + data_utils.TRAIN_REV_LM), + ('test', True, False, False): (data_utils.TEST_BD_CLASS,), + ('test', False, True, False): (data_utils.TEST_LM,), + ('test', False, True, True): (data_utils.TEST_SA,), + ('test', False, False, False): (data_utils.TEST_CLASS,), + ('valid', True, False, False): (data_utils.VALID_BD_CLASS,), + ('valid', False, False, False): (data_utils.VALID_CLASS,), + } + if data_spec not in data_specs: + raise ValueError( + 'Data specification (phase, bidir, pretrain, use_seq2seq) %s not ' + 'supported' % str(data_spec)) + + return data_specs[data_spec] + + +def _read_single_sequence_example(file_list, tokens_shape=None): + """Reads and parses SequenceExamples from TFRecord-encoded file_list.""" + 
tf.logging.info('Constructing TFRecordReader from files: %s', file_list) + file_queue = tf.train.string_input_producer(file_list) + reader = tf.TFRecordReader() + seq_key, serialized_record = reader.read(file_queue) + ctx, sequence = tf.parse_single_sequence_example( + serialized_record, + sequence_features={ + data_utils.SequenceWrapper.F_TOKEN_ID: + tf.FixedLenSequenceFeature(tokens_shape or [], dtype=tf.int64), + data_utils.SequenceWrapper.F_LABEL: + tf.FixedLenSequenceFeature([], dtype=tf.int64), + data_utils.SequenceWrapper.F_WEIGHT: + tf.FixedLenSequenceFeature([], dtype=tf.float32), + }) + return seq_key, ctx, sequence + + +def _read_and_batch(data_dir, + fname, + state_name, + state_size, + num_layers, + unroll_steps, + batch_size, + bidir_input=False): + """Inputs for text model. + + Args: + data_dir: str, directory containing TFRecord files of SequenceExample. + fname: str, input file name. + state_name: string, key for saved state of LSTM. + state_size: int, size of LSTM state. + num_layers: int, the number of layers in the LSTM. + unroll_steps: int, number of timesteps to unroll for TBTT. + batch_size: int, batch size. + bidir_input: bool, whether the input is bidirectional. If True, creates 2 + states, state_name and state_name + '_reverse'. + + Returns: + Instance of NextQueuedSequenceBatch + + Raises: + ValueError: if file for input specification is not found. + """ + data_path = os.path.join(data_dir, fname) + if not tf.gfile.Exists(data_path): + raise ValueError('Failed to find file: %s' % data_path) + + tokens_shape = [2] if bidir_input else [] + seq_key, ctx, sequence = _read_single_sequence_example( + [data_path], tokens_shape=tokens_shape) + # Set up stateful queue reader. 
+ state_names = _get_tuple_state_names(num_layers, state_name) + initial_states = {} + for c_state, h_state in state_names: + initial_states[c_state] = tf.zeros(state_size) + initial_states[h_state] = tf.zeros(state_size) + if bidir_input: + rev_state_names = _get_tuple_state_names(num_layers, + '{}_reverse'.format(state_name)) + for rev_c_state, rev_h_state in rev_state_names: + initial_states[rev_c_state] = tf.zeros(state_size) + initial_states[rev_h_state] = tf.zeros(state_size) + batch = tf.contrib.training.batch_sequences_with_states( + input_key=seq_key, + input_sequences=sequence, + input_context=ctx, + input_length=tf.shape(sequence['token_id'])[0], + initial_states=initial_states, + num_unroll=unroll_steps, + batch_size=batch_size, + allow_small_batch=False, + num_threads=4, + capacity=batch_size * 10, + make_keys_unique=True, + make_keys_unique_seed=29392) + return batch + + +def inputs(data_dir=None, + phase='train', + bidir=False, + pretrain=False, + use_seq2seq=False, + state_name='lstm', + state_size=None, + num_layers=0, + batch_size=32, + unroll_steps=100): + """Inputs for text model. + + Args: + data_dir: str, directory containing TFRecord files of SequenceExample. + phase: str, dataset for evaluation {'train', 'valid', 'test'}. + bidir: bool, bidirectional LSTM. + pretrain: bool, whether to read pretraining data or classification data. + use_seq2seq: bool, whether to read seq2seq data or the language model data. + state_name: string, key for saved state of LSTM. + state_size: int, size of LSTM state. + num_layers: int, the number of LSTM layers. + batch_size: int, batch size. + unroll_steps: int, number of timesteps to unroll for TBTT. + + Returns: + Instance of VatxtInput (x2 if bidir=True and pretrain=True, i.e. forward and + reverse). 
+ """ + with tf.name_scope('inputs'): + filenames = _filenames_for_data_spec(phase, bidir, pretrain, use_seq2seq) + + if bidir and pretrain: + # Bidirectional pretraining + # Requires separate forward and reverse language model data. + forward_fname, reverse_fname = filenames + forward_batch = _read_and_batch(data_dir, forward_fname, state_name, + state_size, num_layers, unroll_steps, + batch_size) + state_name_rev = state_name + '_reverse' + reverse_batch = _read_and_batch(data_dir, reverse_fname, state_name_rev, + state_size, num_layers, unroll_steps, + batch_size) + forward_input = VatxtInput( + forward_batch, state_name=state_name, num_states=num_layers) + reverse_input = VatxtInput( + reverse_batch, state_name=state_name_rev, num_states=num_layers) + return forward_input, reverse_input + + elif bidir: + # Classifier bidirectional LSTM + # Shared data source, but separate token/state streams + fname, = filenames + batch = _read_and_batch( + data_dir, + fname, + state_name, + state_size, + num_layers, + unroll_steps, + batch_size, + bidir_input=True) + forward_tokens, reverse_tokens = _split_bidir_tokens(batch) + forward_input = VatxtInput( + batch, + state_name=state_name, + tokens=forward_tokens, + num_states=num_layers) + reverse_input = VatxtInput( + batch, + state_name=state_name + '_reverse', + tokens=reverse_tokens, + num_states=num_layers) + return forward_input, reverse_input + else: + # Unidirectional LM or classifier + fname, = filenames + batch = _read_and_batch( + data_dir, + fname, + state_name, + state_size, + num_layers, + unroll_steps, + batch_size, + bidir_input=False) + return VatxtInput(batch, state_name=state_name, num_states=num_layers) diff --git a/adversarial_text/layers.py b/adversarial_text/layers.py new file mode 100644 index 0000000000000000000000000000000000000000..c560be306494dd0bae43d4fefe3164b7852b1495 --- /dev/null +++ b/adversarial_text/layers.py @@ -0,0 +1,387 @@ +# Copyright 2017 Google, Inc. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Layers for VatxtModel.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import tensorflow as tf + +K = tf.contrib.keras + + +def cl_logits_subgraph(layer_sizes, input_size, num_classes, keep_prob=1.): + """Construct multiple ReLU layers with dropout and a linear layer.""" + subgraph = K.models.Sequential(name='cl_logits') + for i, layer_size in enumerate(layer_sizes): + if i == 0: + subgraph.add( + K.layers.Dense(layer_size, activation='relu', input_dim=input_size)) + else: + subgraph.add(K.layers.Dense(layer_size, activation='relu')) + + if keep_prob < 1.: + subgraph.add(K.layers.Dropout(1. - keep_prob)) + subgraph.add(K.layers.Dense(1 if num_classes == 2 else num_classes)) + return subgraph + + +class Embedding(K.layers.Layer): + """Embedding layer with frequency-based normalization and dropout.""" + + def __init__(self, + vocab_size, + embedding_dim, + normalize=False, + vocab_freqs=None, + keep_prob=1., + **kwargs): + self.vocab_size = vocab_size + self.embedding_dim = embedding_dim + self.normalized = normalize + self.keep_prob = keep_prob + + if normalize: + assert vocab_freqs is not None + self.vocab_freqs = tf.constant( + vocab_freqs, dtype=tf.float32, shape=(vocab_size, 1)) + + super(Embedding, self).__init__(**kwargs) + + def build(self, input_shape): + with
tf.device('/cpu:0'): + self.var = self.add_weight( + shape=(self.vocab_size, self.embedding_dim), + initializer=tf.random_uniform_initializer(-1., 1.), + name='embedding') + + if self.normalized: + self.var = self._normalize(self.var) + + super(Embedding, self).build(input_shape) + + def call(self, x): + embedded = tf.nn.embedding_lookup(self.var, x) + if self.keep_prob < 1.: + embedded = tf.nn.dropout(embedded, self.keep_prob) + return embedded + + def _normalize(self, emb): + weights = self.vocab_freqs / tf.reduce_sum(self.vocab_freqs) + mean = tf.reduce_sum(weights * emb, 0, keep_dims=True) + var = tf.reduce_sum(weights * tf.pow(emb - mean, 2.), 0, keep_dims=True) + stddev = tf.sqrt(1e-6 + var) + return (emb - mean) / stddev + + +class LSTM(object): + """LSTM layer using static_rnn. + + Exposes variables in `trainable_weights` property. + """ + + def __init__(self, cell_size, num_layers=1, keep_prob=1., name='LSTM'): + self.cell_size = cell_size + self.num_layers = num_layers + self.keep_prob = keep_prob + self.reuse = None + self.trainable_weights = None + self.name = name + + def __call__(self, x, initial_state, seq_length): + with tf.variable_scope(self.name, reuse=self.reuse) as vs: + cell = tf.contrib.rnn.MultiRNNCell([ + tf.contrib.rnn.BasicLSTMCell( + self.cell_size, + forget_bias=0.0, + reuse=tf.get_variable_scope().reuse) + for _ in xrange(self.num_layers) + ]) + + # shape(x) = (batch_size, num_timesteps, embedding_dim) + # Convert into a time-major list for static_rnn + x = tf.unstack(tf.transpose(x, perm=[1, 0, 2])) + + lstm_out, next_state = tf.contrib.rnn.static_rnn( + cell, x, initial_state=initial_state, sequence_length=seq_length) + + # Merge time and batch dimensions + # shape(lstm_out) = timesteps * (batch_size, cell_size) + lstm_out = tf.concat(lstm_out, 0) + # shape(lstm_out) = (timesteps*batch_size, cell_size) + + if self.keep_prob < 1.: + lstm_out = tf.nn.dropout(lstm_out, self.keep_prob) + + if self.reuse is None: + self.trainable_weights 
= vs.global_variables() + + self.reuse = True + + return lstm_out, next_state + + +class SoftmaxLoss(K.layers.Layer): + """Softmax xentropy loss with candidate sampling.""" + + def __init__(self, + vocab_size, + num_candidate_samples=-1, + vocab_freqs=None, + **kwargs): + self.vocab_size = vocab_size + self.num_candidate_samples = num_candidate_samples + self.vocab_freqs = vocab_freqs + super(SoftmaxLoss, self).__init__(**kwargs) + + def build(self, input_shape): + input_shape = input_shape[0] + with tf.device('/cpu:0'): + self.lin_w = self.add_weight( + shape=(input_shape[-1], self.vocab_size), + name='lm_lin_w', + initializer='glorot_uniform') + self.lin_b = self.add_weight( + shape=(self.vocab_size,), + name='lm_lin_b', + initializer='glorot_uniform') + + super(SoftmaxLoss, self).build(input_shape) + + def call(self, inputs): + x, labels, weights = inputs + if self.num_candidate_samples > -1: + assert self.vocab_freqs is not None + labels = tf.expand_dims(labels, -1) + sampled = tf.nn.fixed_unigram_candidate_sampler( + true_classes=labels, + num_true=1, + num_sampled=self.num_candidate_samples, + unique=True, + range_max=self.vocab_size, + unigrams=self.vocab_freqs) + + lm_loss = tf.nn.sampled_softmax_loss( + weights=tf.transpose(self.lin_w), + biases=self.lin_b, + labels=labels, + inputs=x, + num_sampled=self.num_candidate_samples, + num_classes=self.vocab_size, + sampled_values=sampled) + else: + logits = tf.matmul(x, self.lin_w) + self.lin_b + lm_loss = tf.nn.sparse_softmax_cross_entropy_with_logits( + logits=logits, labels=labels) + + lm_loss = tf.identity( + tf.reduce_sum(lm_loss * weights) / _num_labels(weights), + name='lm_xentropy_loss') + return lm_loss + + +def classification_loss(logits, labels, weights): + """Computes cross entropy loss between logits and labels. + + Args: + logits: 2-D [timesteps*batch_size, m] float tensor, where m=1 if + num_classes=2, otherwise m=num_classes. + labels: 1-D [timesteps*batch_size] integer tensor. 
+ weights: 1-D [timesteps*batch_size] float tensor. + + Returns: + Loss scalar of type float. + """ + inner_dim = logits.get_shape().as_list()[-1] + with tf.name_scope('classifier_loss'): + # Logistic loss + if inner_dim == 1: + loss = tf.nn.sigmoid_cross_entropy_with_logits( + logits=tf.squeeze(logits), labels=tf.cast(labels, tf.float32)) + # Softmax loss + else: + loss = tf.nn.sparse_softmax_cross_entropy_with_logits( + logits=logits, labels=labels) + + num_lab = _num_labels(weights) + tf.summary.scalar('num_labels', num_lab) + return tf.identity( + tf.reduce_sum(weights * loss) / num_lab, name='classification_xentropy') + + +def accuracy(logits, targets, weights): + """Computes prediction accuracy. + + Args: + logits: 2-D classifier logits [timesteps*batch_size, num_classes] + targets: 1-D [timesteps*batch_size] integer tensor. + weights: 1-D [timesteps*batch_size] float tensor. + + Returns: + Accuracy: float scalar. + """ + with tf.name_scope('accuracy'): + eq = tf.cast(tf.equal(predictions(logits), targets), tf.float32) + return tf.identity( + tf.reduce_sum(weights * eq) / _num_labels(weights), name='accuracy') + + +def predictions(logits): + """Class prediction from logits.""" + inner_dim = logits.get_shape().as_list()[-1] + with tf.name_scope('predictions'): + # For binary classification + if inner_dim == 1: + pred = tf.cast(tf.greater(tf.squeeze(logits), 0.), tf.int64) + # For multi-class classification + else: + pred = tf.argmax(logits, 1) + return pred + + +def _num_labels(weights): + """Number of 1's in weights. Returns 1. if 0.""" + num_labels = tf.reduce_sum(weights) + num_labels = tf.where(tf.equal(num_labels, 0.), 1., num_labels) + return num_labels + + +def optimize(loss, + global_step, + max_grad_norm, + lr, + lr_decay, + sync_replicas=False, + replicas_to_aggregate=1, + task_id=0): + """Builds optimization graph.
+ + * Creates an optimizer, and optionally wraps with SyncReplicasOptimizer + * Computes, clips, and applies gradients + * Maintains moving averages for all trainable variables + * Summarizes variables and gradients + + Args: + loss: scalar loss to minimize. + global_step: integer scalar Variable. + max_grad_norm: float scalar. Grads will be clipped to this value. + lr: float scalar, learning rate. + lr_decay: float scalar, learning rate decay rate. + sync_replicas: bool, whether to use SyncReplicasOptimizer. + replicas_to_aggregate: int, number of replicas to aggregate when using + SyncReplicasOptimizer. + task_id: int, id of the current task; used to ensure proper initialization + of SyncReplicasOptimizer. + + Returns: + train_op + """ + with tf.name_scope('optimization'): + # Compute gradients. + tvars = tf.trainable_variables() + grads = tf.gradients( + loss, + tvars, + aggregation_method=tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N) + + # Clip non-embedding grads + non_embedding_grads_and_vars = [(g, v) for (g, v) in zip(grads, tvars) + if 'embedding' not in v.op.name] + embedding_grads_and_vars = [(g, v) for (g, v) in zip(grads, tvars) + if 'embedding' in v.op.name] + + ne_grads, ne_vars = zip(*non_embedding_grads_and_vars) + ne_grads, _ = tf.clip_by_global_norm(ne_grads, max_grad_norm) + non_embedding_grads_and_vars = zip(ne_grads, ne_vars) + + grads_and_vars = embedding_grads_and_vars + non_embedding_grads_and_vars + + # Summarize + _summarize_vars_and_grads(grads_and_vars) + + # Decaying learning rate + lr = tf.train.exponential_decay( + lr, global_step, 1, lr_decay, staircase=True) + tf.summary.scalar('learning_rate', lr) + opt = tf.train.AdamOptimizer(lr) + + # Track the moving averages of all trainable variables. 
+ variable_averages = tf.train.ExponentialMovingAverage(0.999, global_step) + + # Apply gradients + if sync_replicas: + opt = tf.train.SyncReplicasOptimizer( + opt, + replicas_to_aggregate, + variable_averages=variable_averages, + variables_to_average=tvars, + total_num_replicas=replicas_to_aggregate) + apply_gradient_op = opt.apply_gradients( + grads_and_vars, global_step=global_step) + with tf.control_dependencies([apply_gradient_op]): + train_op = tf.no_op(name='train_op') + + # Initialization ops + tf.add_to_collection(tf.GraphKeys.QUEUE_RUNNERS, + opt.get_chief_queue_runner()) + if task_id == 0: # Chief task + local_init_op = opt.chief_init_op + tf.add_to_collection('chief_init_op', opt.get_init_tokens_op()) + else: + local_init_op = opt.local_step_init_op + tf.add_to_collection('local_init_op', local_init_op) + tf.add_to_collection('ready_for_local_init_op', + opt.ready_for_local_init_op) + else: + # Non-sync optimizer + variables_averages_op = variable_averages.apply(tvars) + apply_gradient_op = opt.apply_gradients(grads_and_vars, global_step) + with tf.control_dependencies([apply_gradient_op, variables_averages_op]): + train_op = tf.no_op(name='train_op') + + return train_op + + +def _summarize_vars_and_grads(grads_and_vars): + tf.logging.info('Trainable variables:') + tf.logging.info('-' * 60) + for grad, var in grads_and_vars: + tf.logging.info(var) + + def tag(name, v=var): + return v.op.name + '_' + name + + # Variable summary + mean = tf.reduce_mean(var) + tf.summary.scalar(tag('mean'), mean) + with tf.name_scope(tag('stddev')): + stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean))) + tf.summary.scalar(tag('stddev'), stddev) + tf.summary.scalar(tag('max'), tf.reduce_max(var)) + tf.summary.scalar(tag('min'), tf.reduce_min(var)) + tf.summary.histogram(tag('histogram'), var) + + # Gradient summary + if grad is not None: + if isinstance(grad, tf.IndexedSlices): + grad_values = grad.values + else: + grad_values = grad + + 
tf.summary.histogram(tag('gradient'), grad_values) + tf.summary.scalar(tag('gradient_norm'), tf.global_norm([grad_values])) + else: + tf.logging.info('Var %s has no gradient', var.op.name) diff --git a/adversarial_text/pretrain.py b/adversarial_text/pretrain.py new file mode 100644 index 0000000000000000000000000000000000000000..25d6a47669ab4e2a6042ba97a013e9e13ce26bb8 --- /dev/null +++ b/adversarial_text/pretrain.py @@ -0,0 +1,45 @@ +# Copyright 2017 Google, Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Pretrains a recurrent language model. + +Computational time: + 5 days to train 100000 steps on a 1-layer LSTM with 1024 hidden units, + 256-dimensional embeddings, 400 truncated BP steps, and a minibatch of 64 on + each of 4 GPUs with SyncReplicasOptimizer, i.e. a total minibatch of 256.
+""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import tensorflow as tf + +import graphs +import train_utils + +FLAGS = tf.app.flags.FLAGS + + +def main(_): + """Trains Language Model.""" + tf.logging.set_verbosity(tf.logging.INFO) + with tf.device(tf.train.replica_device_setter(FLAGS.ps_tasks)): + model = graphs.get_model() + train_op, loss, global_step = model.language_model_training() + train_utils.run_training(train_op, loss, global_step) + + +if __name__ == '__main__': + tf.app.run() diff --git a/adversarial_text/train_classifier.py b/adversarial_text/train_classifier.py new file mode 100644 index 0000000000000000000000000000000000000000..94fba3f6f67330929add25beda5799e9b0cc0d2a --- /dev/null +++ b/adversarial_text/train_classifier.py @@ -0,0 +1,62 @@ +# Copyright 2017 Google, Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Trains LSTM text classification model. + +Model trains with adversarial or virtual adversarial training. + +Computational time: + 6 hours to train 10000 steps without adversarial or virtual adversarial + training, on 1 layer 1024 hidden units LSTM, 256 embeddings, 400 truncated + BP, 64 minibatch and on single GPU. + + 12 hours to train 10000 steps with adversarial or virtual adversarial + training, with above condition. 
+ +To initialize embedding and LSTM cell weights from a pretrained model, set +FLAGS.pretrained_model_dir to the pretrained model's checkpoint directory. +""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import tensorflow as tf + +import graphs +import train_utils + +flags = tf.app.flags +FLAGS = flags.FLAGS + +flags.DEFINE_string('pretrained_model_dir', None, + 'Directory path to pretrained model to restore from') + + +def main(_): + """Trains LSTM classification model.""" + tf.logging.set_verbosity(tf.logging.INFO) + with tf.device(tf.train.replica_device_setter(FLAGS.ps_tasks)): + model = graphs.get_model() + train_op, loss, global_step = model.classifier_training() + train_utils.run_training( + train_op, + loss, + global_step, + variables_to_restore=model.pretrained_variables, + pretrained_model_dir=FLAGS.pretrained_model_dir) + + +if __name__ == '__main__': + tf.app.run() diff --git a/adversarial_text/train_utils.py b/adversarial_text/train_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..91104a1352c3878b5a40cfc8563e9a314f61e6ee --- /dev/null +++ b/adversarial_text/train_utils.py @@ -0,0 +1,133 @@ +# Copyright 2017 Google, Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Utilities for training adversarial text models.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import time + +import numpy as np +import tensorflow as tf + +flags = tf.app.flags +FLAGS = flags.FLAGS + +flags.DEFINE_string('master', '', 'Master address.') +flags.DEFINE_integer('task', 0, 'Task id of the replica running the training.') +flags.DEFINE_integer('ps_tasks', 0, 'Number of parameter servers.') +flags.DEFINE_string('train_dir', '/tmp/text_train', + 'Directory for logs and checkpoints.') +flags.DEFINE_integer('max_steps', 1000000, 'Number of batches to run.') +flags.DEFINE_boolean('log_device_placement', False, + 'Whether to log device placement.') + + +def run_training(train_op, + loss, + global_step, + variables_to_restore=None, + pretrained_model_dir=None): + """Sets up and runs training loop.""" + tf.gfile.MakeDirs(FLAGS.train_dir) + + # Create pretrain Saver + if pretrained_model_dir: + assert variables_to_restore + tf.logging.info('Will attempt restore from %s: %s', pretrained_model_dir, + variables_to_restore) + saver_for_restore = tf.train.Saver(variables_to_restore) + + # Init ops + if FLAGS.sync_replicas: + local_init_op = tf.get_collection('local_init_op')[0] + ready_for_local_init_op = tf.get_collection('ready_for_local_init_op')[0] + else: + local_init_op = tf.train.Supervisor.USE_DEFAULT + ready_for_local_init_op = tf.train.Supervisor.USE_DEFAULT + + is_chief = FLAGS.task == 0 + sv = tf.train.Supervisor( + logdir=FLAGS.train_dir, + is_chief=is_chief, + save_summaries_secs=5 * 60, + save_model_secs=5 * 60, + local_init_op=local_init_op, + ready_for_local_init_op=ready_for_local_init_op, + global_step=global_step) + + # Delay starting standard services to allow possible pretrained model restore. 
+ with sv.managed_session( + master=FLAGS.master, + config=tf.ConfigProto(log_device_placement=FLAGS.log_device_placement), + start_standard_services=False) as sess: + # Initialization + if is_chief: + if pretrained_model_dir: + maybe_restore_pretrained_model(sess, saver_for_restore, + pretrained_model_dir) + if FLAGS.sync_replicas: + sess.run(tf.get_collection('chief_init_op')[0]) + sv.start_standard_services(sess) + + sv.start_queue_runners(sess) + + # Training loop + global_step_val = 0 + while not sv.should_stop() and global_step_val < FLAGS.max_steps: + global_step_val = train_step(sess, train_op, loss, global_step) + sv.stop() + + # Final checkpoint + if is_chief: + sv.saver.save(sess, sv.save_path, global_step=global_step) + + +def maybe_restore_pretrained_model(sess, saver_for_restore, model_dir): + """Restores pretrained model if there is no ckpt model.""" + ckpt = tf.train.get_checkpoint_state(FLAGS.train_dir) + checkpoint_exists = ckpt and ckpt.model_checkpoint_path + if checkpoint_exists: + tf.logging.info('Checkpoint exists in FLAGS.train_dir; skipping ' + 'pretraining restore') + return + + pretrain_ckpt = tf.train.get_checkpoint_state(model_dir) + if not (pretrain_ckpt and pretrain_ckpt.model_checkpoint_path): + raise ValueError( + 'Asked to restore model from %s but no checkpoint found.' 
% model_dir) + saver_for_restore.restore(sess, pretrain_ckpt.model_checkpoint_path) + + +def train_step(sess, train_op, loss, global_step): + """Runs a single training step.""" + start_time = time.time() + _, loss_val, global_step_val = sess.run([train_op, loss, global_step]) + duration = time.time() - start_time + + # Logging + if global_step_val % 10 == 0: + examples_per_sec = FLAGS.batch_size / duration + sec_per_batch = float(duration) + + format_str = ('step %d, loss = %.2f (%.1f examples/sec; %.3f ' 'sec/batch)') + tf.logging.info(format_str % (global_step_val, loss_val, examples_per_sec, + sec_per_batch)) + + if np.isnan(loss_val): + raise OverflowError('Loss is nan') + + return global_step_val diff --git a/attention_ocr/README.md b/attention_ocr/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1079bb74073c4e4d355c9f15c046bdfd99b696d6 --- /dev/null +++ b/attention_ocr/README.md @@ -0,0 +1,179 @@ +## Attention-based Extraction of Structured Information from Street View Imagery + +*A TensorFlow model for real-world image text extraction problems.* + +This folder contains the code needed to train a new Attention OCR model on the +[FSNS dataset][FSNS] to transcribe street names in France. You can +also use it to train on your own data. + +More details can be found in our paper: + +["Attention-based Extraction of Structured Information from Street View +Imagery"](https://arxiv.org/abs/1704.03549) + +## Contacts + +Authors: +Zbigniew Wojna , +Alexander Gorban + +Pull requests: +[alexgorban](https://github.com/alexgorban) + +## Requirements + +1. Install the TensorFlow library ([instructions][TF]). For example: + +``` +virtualenv --system-site-packages ~/.tensorflow +source ~/.tensorflow/bin/activate +pip install --upgrade pip +pip install --upgrade tensorflow_gpu +``` + +2. 
At least 158GB of free disk space to download the FSNS dataset: + +``` +cd models/attention_ocr/python/datasets +aria2c -c -j 20 -i ../../../street/python/fsns_urls.txt +cd .. +``` + +3. 16GB of RAM or more; 32GB is recommended. +4. `train.py` works with both CPU and GPU, though using GPU is preferable. It has been tested with a Titan X and with a GTX980. + +[TF]: https://www.tensorflow.org/install/ +[FSNS]: https://github.com/tensorflow/models/tree/master/street + +## How to use this code + +To run all unit tests: + +``` +cd models/attention_ocr/python +python -m unittest discover -p '*_test.py' +``` + +To train from scratch: + +``` +python train.py +``` + +To train a model using pre-trained Inception weights as initialization: + +``` +wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz +tar xf inception_v3_2016_08_28.tar.gz +python train.py --checkpoint_inception=inception_v3.ckpt +``` + +To fine tune the Attention OCR model using a checkpoint: + +``` +wget http://download.tensorflow.org/models/attention_ocr_2017_05_17.tar.gz +tar xf attention_ocr_2017_05_17.tar.gz +python train.py --checkpoint=model.ckpt-399731 +``` + +## How to use your own image data to train the model + +You need to define a new dataset. There are two options: + +1. Store data in the same format as the FSNS dataset and just reuse the +[python/datasets/fsns.py](https://github.com/tensorflow/models/blob/master/attention_ocr/python/datasets/fsns.py) +module. 
E.g., create a file datasets/newtextdataset.py: +``` +import fsns + +DEFAULT_DATASET_DIR = 'path/to/the/dataset' + +DEFAULT_CONFIG = { + 'name': + 'MYDATASET', + 'splits': { + 'train': { + 'size': 123, + 'pattern': 'tfexample_train*' + }, + 'test': { + 'size': 123, + 'pattern': 'tfexample_test*' + } + }, + 'charset_filename': + 'charset_size.txt', + 'image_shape': (150, 600, 3), + 'num_of_views': + 4, + 'max_sequence_length': + 37, + 'null_code': + 42, + 'items_to_descriptions': { + 'image': + 'A [150 x 600 x 3] color image.', + 'label': + 'Characters codes.', + 'text': + 'A unicode string.', + 'length': + 'A length of the encoded text.', + 'num_of_views': + 'A number of different views stored within the image.' + } +} + + +def get_split(split_name, dataset_dir=None, config=None): + if not dataset_dir: + dataset_dir = DEFAULT_DATASET_DIR + if not config: + config = DEFAULT_CONFIG + + return fsns.get_split(split_name, dataset_dir, config) +``` +You will also need to include it into the `datasets/__init__.py` and specify the +dataset name in the command line. + +``` +python train.py --dataset_name=newtextdataset +``` + +Please note that eval.py will also require the same flag. + +2. Define a new dataset format. The model needs the following data to train: + +- images: input images, shape [batch_size x H x W x 3]; +- labels: ground truth label ids, shape=[batch_size x seq_length]; +- labels_one_hot: labels in one-hot encoding, shape [batch_size x seq_length x num_char_classes]; + +Refer to [python/data_provider.py](https://github.com/tensorflow/models/blob/master/attention_ocr/python/data_provider.py#L33) +for more details. You can use [python/datasets/fsns.py](https://github.com/tensorflow/models/blob/master/attention_ocr/python/datasets/fsns.py) +as the example. + +## How to use a pre-trained model + +The inference part was not released yet, but it is pretty straightforward to +implement one in Python or C++. 
+ +The recommended way is to use the [Serving infrastructure](https://tensorflow.github.io/serving/serving_basic). + +Alternatively you can: +1. Define a placeholder for images (or use a numpy array directly). +2. [Create a graph](https://github.com/tensorflow/models/blob/master/attention_ocr/python/eval.py#L60): +`endpoints = model.create_base(images_placeholder, labels_one_hot=None)` +3. [Load a pretrained model](https://github.com/tensorflow/models/blob/master/attention_ocr/python/model.py#L494). +4. Run computations through the graph: +`predictions = sess.run(endpoints.predicted_chars, feed_dict={images_placeholder:images_actual_data})` +5. Convert character IDs (predictions) to UTF8 using the provided charset file. + +## Disclaimer + +This code is a modified version of the internal model we used for our paper. +Currently it reaches 83.79% full sequence accuracy after 400k steps of training. +The main differences between this version and the version used in the paper: +for the paper we used distributed training with 50 GPU (K80) workers +(asynchronous updates), whereas the provided checkpoint was created using this +code after ~6 days of training on a single GPU (Titan X) (it reached 81% after +24 hours of training); the coordinate encoding is also missing TODO(alexgorban@). diff --git a/attention_ocr/python/all_jobs.screenrc b/attention_ocr/python/all_jobs.screenrc new file mode 100644 index 0000000000000000000000000000000000000000..ef7fdf237387c95eeb9a61e507b1c74db212502d --- /dev/null +++ b/attention_ocr/python/all_jobs.screenrc @@ -0,0 +1,9 @@ +# A GPU/screen config to run all jobs for training and evaluation in parallel. 
+# Execute: +# source /path/to/your/virtualenv/bin/activate +# screen -R TF -c all_jobs.screenrc + +screen -t train 0 python train.py --train_log_dir=workdir/train +screen -t eval_train 1 python eval.py --split_name=train --train_log_dir=workdir/train --eval_log_dir=workdir/eval_train +screen -t eval_test 2 python eval.py --split_name=test --train_log_dir=workdir/train --eval_log_dir=workdir/eval_test +screen -t tensorboard 3 tensorboard --logdir=workdir diff --git a/attention_ocr/python/common_flags.py b/attention_ocr/python/common_flags.py new file mode 100644 index 0000000000000000000000000000000000000000..996bf4c6c0e9aa67135e7a6f4b47d64b1e1f9e41 --- /dev/null +++ b/attention_ocr/python/common_flags.py @@ -0,0 +1,149 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Defines flags that are common to both train.py and eval.py scripts.""" +import sys + +from tensorflow.python.platform import flags +import logging + +import datasets +import model + +FLAGS = flags.FLAGS + +logging.basicConfig( + level=logging.DEBUG, + stream=sys.stderr, + format='%(levelname)s ' + '%(asctime)s.%(msecs)06d: ' + '%(filename)s: ' + '%(lineno)d ' + '%(message)s', + datefmt='%Y-%m-%d %H:%M:%S') + + +def define(): + """Define common flags.""" + # yapf: disable + flags.DEFINE_integer('batch_size', 32, + 'Batch size.') + + flags.DEFINE_integer('crop_width', None, + 'Width of the central crop for images.') + + flags.DEFINE_integer('crop_height', None, + 'Height of the central crop for images.') + + flags.DEFINE_string('train_log_dir', '/tmp/attention_ocr/train', + 'Directory where to write event logs.') + + flags.DEFINE_string('dataset_name', 'fsns', + 'Name of the dataset. Supported: fsns') + + flags.DEFINE_string('split_name', 'train', + 'Dataset split name to run evaluation for: test,train.') + + flags.DEFINE_string('dataset_dir', None, + 'Dataset root folder.') + + flags.DEFINE_string('checkpoint', '', + 'Path for checkpoint to restore weights from.') + + flags.DEFINE_string('master', + '', + 'BNS name of the TensorFlow master to use.') + + # Model hyper parameters + flags.DEFINE_float('learning_rate', 0.004, + 'learning rate') + + flags.DEFINE_string('optimizer', 'momentum', + 'the optimizer to use') + + flags.DEFINE_float('momentum', 0.9, + 'momentum value for the momentum optimizer if used') + + flags.DEFINE_bool('use_augment_input', True, + 'If True will use image augmentation') + + # Method hyper parameters + # conv_tower_fn + flags.DEFINE_string('final_endpoint', 'Mixed_5d', + 'Endpoint to cut inception tower') + + # sequence_logit_fn + flags.DEFINE_bool('use_attention', True, + 'If True will use the attention mechanism') + + flags.DEFINE_bool('use_autoregression', 
True, + 'If True will use autoregression (a feedback link)') + + flags.DEFINE_integer('num_lstm_units', 256, + 'number of LSTM units for sequence LSTM') + + flags.DEFINE_float('weight_decay', 0.00004, + 'weight decay for char prediction FC layers') + + flags.DEFINE_float('lstm_state_clip_value', 10.0, + 'cell state is clipped by this value prior to the cell' + ' output activation') + + # 'sequence_loss_fn' + flags.DEFINE_float('label_smoothing', 0.1, + 'weight for label smoothing') + + flags.DEFINE_bool('ignore_nulls', True, + 'ignore null characters for computing the loss') + + flags.DEFINE_bool('average_across_timesteps', False, + 'divide the returned cost by the total label weight') + # yapf: enable + + +def get_crop_size(): + if FLAGS.crop_width and FLAGS.crop_height: + return (FLAGS.crop_width, FLAGS.crop_height) + else: + return None + + +def create_dataset(split_name): + ds_module = getattr(datasets, FLAGS.dataset_name) + return ds_module.get_split(split_name, dataset_dir=FLAGS.dataset_dir) + + +def create_mparams(): + return { + 'conv_tower_fn': + model.ConvTowerParams(final_endpoint=FLAGS.final_endpoint), + 'sequence_logit_fn': + model.SequenceLogitsParams( + use_attention=FLAGS.use_attention, + use_autoregression=FLAGS.use_autoregression, + num_lstm_units=FLAGS.num_lstm_units, + weight_decay=FLAGS.weight_decay, + lstm_state_clip_value=FLAGS.lstm_state_clip_value), + 'sequence_loss_fn': + model.SequenceLossParams( + label_smoothing=FLAGS.label_smoothing, + ignore_nulls=FLAGS.ignore_nulls, + average_across_timesteps=FLAGS.average_across_timesteps) + } + + +def create_model(*args, **kwargs): + ocr_model = model.Model(mparams=create_mparams(), *args, **kwargs) + return ocr_model diff --git a/attention_ocr/python/data_provider.py b/attention_ocr/python/data_provider.py new file mode 100644 index 0000000000000000000000000000000000000000..1b1181158385cc181566176ae85b710a291b7826 --- /dev/null +++ b/attention_ocr/python/data_provider.py @@ -0,0 +1,199 @@ +# 
Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Functions to read, decode and pre-process input data for the Model. +""" +import collections +import functools +import tensorflow as tf +from tensorflow.contrib import slim + +import inception_preprocessing + +# Tuple to store input data endpoints for the Model. +# It has the following fields (tensors): +# images: input images, +# shape [batch_size x H x W x 3]; +# labels: ground truth label ids, +# shape=[batch_size x seq_length]; +# labels_one_hot: labels in one-hot encoding, +# shape [batch_size x seq_length x num_char_classes]; +InputEndpoints = collections.namedtuple( + 'InputEndpoints', ['images', 'images_orig', 'labels', 'labels_one_hot']) + +# A namedtuple to define a configuration for shuffled batch fetching. +# num_batching_threads: A number of parallel threads to fetch data. +# queue_capacity: a max number of elements in the batch shuffling queue. +# min_after_dequeue: a min number of elements in the queue after a dequeue, +# used to ensure a level of mixing of elements. 
+ShuffleBatchConfig = collections.namedtuple('ShuffleBatchConfig', [ + 'num_batching_threads', 'queue_capacity', 'min_after_dequeue' +]) + +DEFAULT_SHUFFLE_CONFIG = ShuffleBatchConfig( + num_batching_threads=8, queue_capacity=3000, min_after_dequeue=1000) + + +def augment_image(image): + """Augments the image with a random modification. + + Args: + image: input Tensor image of rank 3, with the last dimension + of size 3. + + Returns: + Distorted Tensor image of the same shape. + """ + with tf.variable_scope('AugmentImage'): + height = image.get_shape().dims[0].value + width = image.get_shape().dims[1].value + + # Random crop cut from the street sign image, resized to the same size. + # Ensures that the crop covers at least 0.8 of the input image area. + bbox_begin, bbox_size, _ = tf.image.sample_distorted_bounding_box( + tf.shape(image), + bounding_boxes=tf.zeros([0, 0, 4]), + min_object_covered=0.8, + aspect_ratio_range=[0.8, 1.2], + area_range=[0.8, 1.0], + use_image_if_no_bounding_boxes=True) + distorted_image = tf.slice(image, bbox_begin, bbox_size) + + # Randomly chooses one of the 4 interpolation methods + distorted_image = inception_preprocessing.apply_with_random_selector( + distorted_image, + lambda x, method: tf.image.resize_images(x, [height, width], method), + num_cases=4) + distorted_image.set_shape([height, width, 3]) + + # Color distortion + distorted_image = inception_preprocessing.apply_with_random_selector( + distorted_image, + functools.partial( + inception_preprocessing.distort_color, fast_mode=False), + num_cases=4) + distorted_image = tf.clip_by_value(distorted_image, -1.5, 1.5) + + return distorted_image + + +def central_crop(image, crop_size): + """Returns a central crop for the specified size of an image. + + Args: + image: A tensor with shape [height, width, channels] + crop_size: A tuple (crop_width, crop_height) + + Returns: + A tensor of shape [crop_height, crop_width, channels]. 
+ """ + with tf.variable_scope('CentralCrop'): + target_width, target_height = crop_size + image_height, image_width = tf.shape(image)[0], tf.shape(image)[1] + assert_op1 = tf.Assert( + tf.greater_equal(image_height, target_height), + ['image_height < target_height', image_height, target_height]) + assert_op2 = tf.Assert( + tf.greater_equal(image_width, target_width), + ['image_width < target_width', image_width, target_width]) + with tf.control_dependencies([assert_op1, assert_op2]): + offset_width = (image_width - target_width) / 2 + offset_height = (image_height - target_height) / 2 + return tf.image.crop_to_bounding_box(image, offset_height, offset_width, + target_height, target_width) + + +def preprocess_image(image, augment=False, central_crop_size=None, + num_towers=4): + """Normalizes image to have values in a narrow range around zero. + + Args: + image: a [H x W x 3] uint8 tensor. + augment: optional, if True do random image distortion. + central_crop_size: A tuple (crop_width, crop_height). + num_towers: optional, number of shots of the same image in the input image. + + Returns: + A float32 tensor of shape [H x W x 3] with RGB values in the required + range. + """ + with tf.variable_scope('PreprocessImage'): + image = tf.image.convert_image_dtype(image, dtype=tf.float32) + if augment or central_crop_size: + if num_towers == 1: + images = [image] + else: + images = tf.split(value=image, num_or_size_splits=num_towers, axis=1) + if central_crop_size: + view_crop_size = (central_crop_size[0] / num_towers, + central_crop_size[1]) + images = [central_crop(img, view_crop_size) for img in images] + if augment: + images = [augment_image(img) for img in images] + image = tf.concat(images, 1) + + image = tf.subtract(image, 0.5) + image = tf.multiply(image, 2.5) + + return image + + +def get_data(dataset, + batch_size, + augment=False, + central_crop_size=None, + shuffle_config=None, + shuffle=True): + """Wraps calls to DatasetDataProviders and shuffle_batch. 
+ + For more details about supported Dataset objects refer to datasets/fsns.py. + + Args: + dataset: a slim.data.dataset.Dataset object. + batch_size: number of samples per batch. + augment: optional, if True does random image distortion. + central_crop_size: A tuple (crop_width, crop_height). + shuffle_config: A namedtuple ShuffleBatchConfig. + shuffle: if True use data shuffling. + + Returns: + An InputEndpoints namedtuple with batched tensors. + """ + if not shuffle_config: + shuffle_config = DEFAULT_SHUFFLE_CONFIG + + provider = slim.dataset_data_provider.DatasetDataProvider( + dataset, + shuffle=shuffle, + common_queue_capacity=2 * batch_size, + common_queue_min=batch_size) + image_orig, label = provider.get(['image', 'label']) + + image = preprocess_image( + image_orig, augment, central_crop_size, num_towers=dataset.num_of_views) + label_one_hot = slim.one_hot_encoding(label, dataset.num_char_classes) + + images, images_orig, labels, labels_one_hot = (tf.train.shuffle_batch( + [image, image_orig, label, label_one_hot], + batch_size=batch_size, + num_threads=shuffle_config.num_batching_threads, + capacity=shuffle_config.queue_capacity, + min_after_dequeue=shuffle_config.min_after_dequeue)) + + return InputEndpoints( + images=images, + images_orig=images_orig, + labels=labels, + labels_one_hot=labels_one_hot) diff --git a/attention_ocr/python/data_provider_test.py b/attention_ocr/python/data_provider_test.py new file mode 100644 index 0000000000000000000000000000000000000000..551bc75e02cc470c40aad8a4066b6bba7ceeb62c --- /dev/null +++ b/attention_ocr/python/data_provider_test.py @@ -0,0 +1,72 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for data_provider.""" + +import numpy as np +import tensorflow as tf +from tensorflow.contrib.slim import queues + +import datasets +import data_provider + + +class DataProviderTest(tf.test.TestCase): + def setUp(self): + tf.test.TestCase.setUp(self) + + def test_preprocessed_image_values_are_in_range(self): + image_shape = (5, 4, 3) + fake_image = np.random.randint(low=0, high=255, size=image_shape) + image_tf = data_provider.preprocess_image(fake_image) + + with self.test_session() as sess: + image_np = sess.run(image_tf) + + self.assertEqual(image_np.shape, image_shape) + min_value, max_value = np.min(image_np), np.max(image_np) + self.assertTrue((-1.28 < min_value) and (min_value < 1.27)) + self.assertTrue((-1.28 < max_value) and (max_value < 1.27)) + + def test_provided_data_has_correct_shape(self): + batch_size = 4 + data = data_provider.get_data( + dataset=datasets.fsns_test.get_test_split(), + batch_size=batch_size, + augment=True, + central_crop_size=None) + + with self.test_session() as sess, queues.QueueRunners(sess): + images_np, labels_np = sess.run([data.images, data.labels_one_hot]) + + self.assertEqual(images_np.shape, (batch_size, 150, 600, 3)) + self.assertEqual(labels_np.shape, (batch_size, 37, 134)) + + def test_optionally_applies_central_crop(self): + batch_size = 4 + data = data_provider.get_data( + dataset=datasets.fsns_test.get_test_split(), + batch_size=batch_size, + augment=True, + central_crop_size=(500, 100)) + + with self.test_session() 
as sess, queues.QueueRunners(sess): + images_np = sess.run(data.images) + + self.assertEqual(images_np.shape, (batch_size, 100, 500, 3)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/tutorials/rnn/linear.py b/attention_ocr/python/datasets/__init__.py similarity index 67% rename from tutorials/rnn/linear.py rename to attention_ocr/python/datasets/__init__.py index 6aad2283668f253191fd3af38ba112d1cd8bf91d..e2fef7b2dd275051861a29c6d4f708162575eac6 100644 --- a/tutorials/rnn/linear.py +++ b/attention_ocr/python/datasets/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2015 The TensorFlow Authors. All Rights Reserved. +# Copyright 2017 The TensorFlow Authors All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,9 +12,8 @@ # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== -"""Import linear python op for backward compatibility.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -raise ImportError("This module is deprecated. Use tf.contrib.layers.linear.") +import fsns +import fsns_test + +__all__ = ['fsns', 'fsns_test'] diff --git a/attention_ocr/python/datasets/fsns.py b/attention_ocr/python/datasets/fsns.py new file mode 100644 index 0000000000000000000000000000000000000000..d8dd5efb4eb047889b4f8cdab30a1c872f51f44b --- /dev/null +++ b/attention_ocr/python/datasets/fsns.py @@ -0,0 +1,183 @@ +# -*- coding: utf-8 -*- +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Configuration to read FSNS dataset https://goo.gl/3Ldm8v.""" + +import os +import re +import tensorflow as tf +from tensorflow.contrib import slim +import logging + +DEFAULT_DATASET_DIR = os.path.join(os.path.dirname(__file__), 'data/fsns') + +# The dataset configuration, should be used only as a default value. +DEFAULT_CONFIG = { + 'name': 'FSNS', + 'splits': { + 'train': { + 'size': 1044868, + 'pattern': 'train/train*' + }, + 'test': { + 'size': 20404, + 'pattern': 'test/test*' + }, + 'validation': { + 'size': 16150, + 'pattern': 'validation/validation*' + } + }, + 'charset_filename': 'charset_size=134.txt', + 'image_shape': (150, 600, 3), + 'num_of_views': 4, + 'max_sequence_length': 37, + 'null_code': 133, + 'items_to_descriptions': { + 'image': 'A [150 x 600 x 3] color image.', + 'label': 'Characters codes.', + 'text': 'A unicode string.', + 'length': 'A length of the encoded text.', + 'num_of_views': 'A number of different views stored within the image.' + } +} + + +def read_charset(filename, null_character=u'\u2591'): + """Reads a charset definition from a tab separated text file. + + charset file has to have format compatible with the FSNS dataset. + + Args: + filename: a path to the charset file. + null_character: a unicode character used to replace '' character. the + default value is a light shade block '░'. + + Returns: + a dictionary with keys equal to character codes and values - unicode + characters. 
+ """ + pattern = re.compile(r'(\d+)\t(.+)') + charset = {} + with tf.gfile.GFile(filename) as f: + for i, line in enumerate(f): + m = pattern.match(line) + if m is None: + logging.warning('incorrect charset file. line #%d: %s', i, line) + continue + code = int(m.group(1)) + char = m.group(2).decode('utf-8') + if char == '': + char = null_character + charset[code] = char + return charset + + +class _NumOfViewsHandler(slim.tfexample_decoder.ItemHandler): + """Convenience handler to determine number of views stored in an image.""" + + def __init__(self, width_key, original_width_key, num_of_views): + super(_NumOfViewsHandler, self).__init__([width_key, original_width_key]) + self._width_key = width_key + self._original_width_key = original_width_key + self._num_of_views = num_of_views + + def tensors_to_item(self, keys_to_tensors): + return tf.to_int64( + self._num_of_views * keys_to_tensors[self._original_width_key] / + keys_to_tensors[self._width_key]) + + +def get_split(split_name, dataset_dir=None, config=None): + """Returns a dataset tuple for FSNS dataset. + + Args: + split_name: A train/test split name. + dataset_dir: The base directory of the dataset sources, by default it uses + a predefined CNS path (see DEFAULT_DATASET_DIR). + config: A dictionary with dataset configuration. If None - will use the + DEFAULT_CONFIG. + + Returns: + A `Dataset` namedtuple. + + Raises: + ValueError: if `split_name` is not a valid train/test split. + """ + if not dataset_dir: + dataset_dir = DEFAULT_DATASET_DIR + + if not config: + config = DEFAULT_CONFIG + + if split_name not in config['splits']: + raise ValueError('split name %s was not recognized.' % split_name) + + logging.info('Using %s dataset split_name=%s dataset_dir=%s', config['name'], + split_name, dataset_dir) + + # Ignores the 'image/height' feature. 
+ zero = tf.zeros([1], dtype=tf.int64) + keys_to_features = { + 'image/encoded': + tf.FixedLenFeature((), tf.string, default_value=''), + 'image/format': + tf.FixedLenFeature((), tf.string, default_value='png'), + 'image/width': + tf.FixedLenFeature([1], tf.int64, default_value=zero), + 'image/orig_width': + tf.FixedLenFeature([1], tf.int64, default_value=zero), + 'image/class': + tf.FixedLenFeature([config['max_sequence_length']], tf.int64), + 'image/unpadded_class': + tf.VarLenFeature(tf.int64), + 'image/text': + tf.FixedLenFeature([1], tf.string, default_value=''), + } + items_to_handlers = { + 'image': + slim.tfexample_decoder.Image( + shape=config['image_shape'], + image_key='image/encoded', + format_key='image/format'), + 'label': + slim.tfexample_decoder.Tensor(tensor_key='image/class'), + 'text': + slim.tfexample_decoder.Tensor(tensor_key='image/text'), + 'num_of_views': + _NumOfViewsHandler( + width_key='image/width', + original_width_key='image/orig_width', + num_of_views=config['num_of_views']) + } + decoder = slim.tfexample_decoder.TFExampleDecoder(keys_to_features, + items_to_handlers) + charset_file = os.path.join(dataset_dir, config['charset_filename']) + charset = read_charset(charset_file) + file_pattern = os.path.join(dataset_dir, + config['splits'][split_name]['pattern']) + return slim.dataset.Dataset( + data_sources=file_pattern, + reader=tf.TFRecordReader, + decoder=decoder, + num_samples=config['splits'][split_name]['size'], + items_to_descriptions=config['items_to_descriptions'], + # additional parameters for convenience. 
+ charset=charset, + num_char_classes=len(charset), + num_of_views=config['num_of_views'], + max_sequence_length=config['max_sequence_length'], + null_code=config['null_code']) diff --git a/attention_ocr/python/datasets/fsns_test.py b/attention_ocr/python/datasets/fsns_test.py new file mode 100644 index 0000000000000000000000000000000000000000..17cee7d404445e2c1e8f28cfb5b87c10fbbc5289 --- /dev/null +++ b/attention_ocr/python/datasets/fsns_test.py @@ -0,0 +1,103 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for FSNS datasets module.""" + +import collections +import os +import tensorflow as tf +from tensorflow.contrib import slim + +import fsns +import unittest_utils + +FLAGS = tf.flags.FLAGS + + +def get_test_split(): + config = fsns.DEFAULT_CONFIG.copy() + config['splits'] = {'test': {'size': 50, 'pattern': 'fsns-00000-of-00001'}} + return fsns.get_split('test', dataset_dir(), config) + + +def dataset_dir(): + return os.path.join(os.path.dirname(__file__), 'testdata/fsns') + + +class FsnsTest(tf.test.TestCase): + def test_decodes_example_proto(self): + expected_label = range(37) + expected_image, encoded = unittest_utils.create_random_image( + 'PNG', shape=(150, 600, 3)) + serialized = unittest_utils.create_serialized_example({ + 'image/encoded': [encoded], + 'image/format': ['PNG'], + 'image/class': + expected_label, + 'image/unpadded_class': + range(10), + 'image/text': ['Raw text'], + 'image/orig_width': [150], + 'image/width': [600] + }) + + decoder = fsns.get_split('train', dataset_dir()).decoder + with self.test_session() as sess: + data_tuple = collections.namedtuple('DecodedData', decoder.list_items()) + data = sess.run(data_tuple(*decoder.decode(serialized))) + + self.assertAllEqual(expected_image, data.image) + self.assertAllEqual(expected_label, data.label) + self.assertEqual(['Raw text'], data.text) + self.assertEqual([1], data.num_of_views) + + def test_label_has_shape_defined(self): + serialized = 'fake' + decoder = fsns.get_split('train', dataset_dir()).decoder + + [label_tf] = decoder.decode(serialized, ['label']) + + self.assertEqual(label_tf.get_shape().dims[0], 37) + + def test_dataset_tuple_has_all_extra_attributes(self): + dataset = fsns.get_split('train', dataset_dir()) + + self.assertTrue(dataset.charset) + self.assertTrue(dataset.num_char_classes) + self.assertTrue(dataset.num_of_views) + self.assertTrue(dataset.max_sequence_length) + 
self.assertTrue(dataset.null_code) + + def test_can_use_the_test_data(self): + batch_size = 1 + dataset = get_test_split() + provider = slim.dataset_data_provider.DatasetDataProvider( + dataset, + shuffle=True, + common_queue_capacity=2 * batch_size, + common_queue_min=batch_size) + image_tf, label_tf = provider.get(['image', 'label']) + + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + with slim.queues.QueueRunners(sess): + image_np, label_np = sess.run([image_tf, label_tf]) + + self.assertEqual((150, 600, 3), image_np.shape) + self.assertEqual((37, ), label_np.shape) + + +if __name__ == '__main__': + tf.test.main() diff --git a/attention_ocr/python/datasets/testdata/fsns/charset_size=134.txt b/attention_ocr/python/datasets/testdata/fsns/charset_size=134.txt new file mode 100644 index 0000000000000000000000000000000000000000..5c7fcde2ae0ab679f279a083d6de1c50d33ff90b --- /dev/null +++ b/attention_ocr/python/datasets/testdata/fsns/charset_size=134.txt @@ -0,0 +1,139 @@ +0 +133 +1 l +2 ’ +3 é +4 t +5 e +6 i +7 n +8 s +9 x +10 g +11 u +12 o +13 1 +14 8 +15 7 +16 0 +17 - +18 . +19 p +20 a +21 r +22 è +23 d +24 c +25 V +26 v +27 b +28 m +29 ) +30 C +31 z +32 S +33 y +34 , +35 k +36 É +37 A +38 h +39 E +40 » +41 D +42 / +43 H +44 M +45 ( +46 G +47 P +48 ç +2 ' +49 R +50 f +51 " +52 2 +53 j +54 | +55 N +56 6 +57 ° +58 5 +59 T +60 O +61 U +62 3 +63 % +64 9 +65 q +66 Z +67 B +68 K +69 w +70 W +71 : +72 4 +73 L +74 F +75 ] +76 ï +2 ‘ +77 I +78 J +79 ä +80 î +81 ; +82 à +83 ê +84 X +85 ü +86 Y +87 ô +88 = +89 + +90 \ +91 { +92 } +93 _ +94 Q +95 œ +96 ñ +97 * +98 ! +99 Ü +51 “ +100 â +101 Ç +102 Œ +103 û +104 ? 
+105 $ +106 ë +107 « +108 € +109 & +110 < +51 ” +111 æ +112 # +113 ® +114  +115 È +116 > +117 [ +17 — +118 Æ +119 ù +120 Î +121 Ô +122 ÿ +123 À +124 Ê +125 @ +126 Ï +127 © +128 Ë +129 Ù +130 £ +131 Ÿ +132 Û diff --git a/attention_ocr/python/datasets/testdata/fsns/fsns-00000-of-00001 b/attention_ocr/python/datasets/testdata/fsns/fsns-00000-of-00001 new file mode 100644 index 0000000000000000000000000000000000000000..eacafcc810fafba6c747e81a9f5e30e21c98d816 Binary files /dev/null and b/attention_ocr/python/datasets/testdata/fsns/fsns-00000-of-00001 differ diff --git a/attention_ocr/python/datasets/testdata/fsns/links.txt b/attention_ocr/python/datasets/testdata/fsns/links.txt new file mode 100644 index 0000000000000000000000000000000000000000..da98d305fa02a61a9ac42b5e5490aa4e0c709b7e --- /dev/null +++ b/attention_ocr/python/datasets/testdata/fsns/links.txt @@ -0,0 +1 @@ +http://download.tensorflow.org/data/fsns-20160927/testdata/fsns-00000-of-00001 diff --git a/attention_ocr/python/datasets/unittest_utils.py b/attention_ocr/python/datasets/unittest_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..f74a40a4997d95de5c8353998a74ff32158fe7ad --- /dev/null +++ b/attention_ocr/python/datasets/unittest_utils.py @@ -0,0 +1,64 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Functions to make unit testing easier.""" + +import StringIO +import numpy as np +from PIL import Image as PILImage +import tensorflow as tf + + +def create_random_image(image_format, shape): + """Creates an image with random values. + + Args: + image_format: An image format (PNG or JPEG). + shape: A tuple with image shape (including channels). + + Returns: + A tuple (image as a numpy array, encoded image as a string). + """ + image = np.random.randint(low=0, high=255, size=shape, dtype='uint8') + io = StringIO.StringIO() + image_pil = PILImage.fromarray(image) + image_pil.save(io, image_format, subsampling=0, quality=100) + return image, io.getvalue() + + +def create_serialized_example(name_to_values): + """Creates a tf.Example proto using a dictionary. + + It automatically detects the type of values and defines a corresponding feature. + + Args: + name_to_values: A dictionary. + + Returns: + tf.Example proto. + """ + example = tf.train.Example() + for name, values in name_to_values.items(): + feature = example.features.feature[name] + if isinstance(values[0], str): + add = feature.bytes_list.value.extend + elif isinstance(values[0], float): + add = feature.float_list.value.extend + elif isinstance(values[0], int): + add = feature.int64_list.value.extend + else: + raise AssertionError('Unsupported type: %s' % type(values[0])) + add(values) + return example.SerializeToString() diff --git a/attention_ocr/python/datasets/unittest_utils_test.py b/attention_ocr/python/datasets/unittest_utils_test.py new file mode 100644 index 0000000000000000000000000000000000000000..a127143320971f24b389afc973accda81cea8432 --- /dev/null +++ b/attention_ocr/python/datasets/unittest_utils_test.py @@ -0,0 +1,64 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for unittest_utils.""" +import StringIO + +import numpy as np +from PIL import Image as PILImage +import tensorflow as tf + +import unittest_utils + + +class UnittestUtilsTest(tf.test.TestCase): + def test_creates_an_image_of_specified_shape(self): + image, _ = unittest_utils.create_random_image('PNG', (10, 20, 3)) + self.assertEqual(image.shape, (10, 20, 3)) + + def test_encoded_image_corresponds_to_numpy_array(self): + image, encoded = unittest_utils.create_random_image('PNG', (20, 10, 3)) + pil_image = PILImage.open(StringIO.StringIO(encoded)) + self.assertAllEqual(image, np.array(pil_image)) + + def test_created_example_has_correct_values(self): + example_serialized = unittest_utils.create_serialized_example({ + 'labels': [1, 2, 3], + 'data': ['FAKE'] + }) + example = tf.train.Example() + example.ParseFromString(example_serialized) + self.assertProtoEquals(""" + features { + feature { + key: "labels" + value { int64_list { + value: 1 + value: 2 + value: 3 + }} + } + feature { + key: "data" + value { bytes_list { + value: "FAKE" + }} + } + } + """, example) + + +if __name__ == '__main__': + tf.test.main() diff --git a/attention_ocr/python/eval.py b/attention_ocr/python/eval.py new file mode 100644 index 0000000000000000000000000000000000000000..ec68ad50bc25cd8528f4e9fd7976adad72782641 --- /dev/null +++ b/attention_ocr/python/eval.py @@ -0,0 +1,78 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Script to evaluate a trained Attention OCR model. + +A simple usage example: +python eval.py +""" +import tensorflow as tf +from tensorflow.contrib import slim +from tensorflow import app +from tensorflow.python.platform import flags + +import data_provider +import common_flags + +FLAGS = flags.FLAGS +common_flags.define() + +# yapf: disable +flags.DEFINE_integer('num_batches', 100, + 'Number of batches to run eval for.') + +flags.DEFINE_string('eval_log_dir', '/tmp/attention_ocr/eval', + 'Directory where the evaluation results are saved to.') + +flags.DEFINE_integer('eval_interval_secs', 60, + 'Frequency in seconds to run evaluations.') + +flags.DEFINE_integer('number_of_steps', None, + 'Number of times to run evaluation.') +# yapf: enable + + +def main(_): + if not tf.gfile.Exists(FLAGS.eval_log_dir): + tf.gfile.MakeDirs(FLAGS.eval_log_dir) + + dataset = common_flags.create_dataset(split_name=FLAGS.split_name) + model = common_flags.create_model(dataset.num_char_classes, + dataset.max_sequence_length, + dataset.num_of_views, dataset.null_code) + data = data_provider.get_data( + dataset, + FLAGS.batch_size, + augment=False, + central_crop_size=common_flags.get_crop_size()) + endpoints = model.create_base(data.images, labels_one_hot=None) + model.create_loss(data, endpoints) + eval_ops = model.create_summaries( + data, endpoints, 
dataset.charset, is_training=False) + slim.get_or_create_global_step() + session_config = tf.ConfigProto(device_count={"GPU": 0}) + slim.evaluation.evaluation_loop( + master=FLAGS.master, + checkpoint_dir=FLAGS.train_log_dir, + logdir=FLAGS.eval_log_dir, + eval_op=eval_ops, + num_evals=FLAGS.num_batches, + eval_interval_secs=FLAGS.eval_interval_secs, + max_number_of_evaluations=FLAGS.number_of_steps, + session_config=session_config) + + +if __name__ == '__main__': + app.run() diff --git a/attention_ocr/python/inception_preprocessing.py b/attention_ocr/python/inception_preprocessing.py new file mode 100644 index 0000000000000000000000000000000000000000..d3c3a5b07c24bc1a9e62d52b3213aff31c67d7b7 --- /dev/null +++ b/attention_ocr/python/inception_preprocessing.py @@ -0,0 +1,315 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +"""Provides utilities to preprocess images for the Inception networks.""" + +# TODO(gorban): add as a dependency, when slim or tensorflow/models are pipfied +# Source: +# https://raw.githubusercontent.com/tensorflow/models/a9d0e6e8923a4/slim/preprocessing/inception_preprocessing.py +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import tensorflow as tf + +from tensorflow.python.ops import control_flow_ops + + +def apply_with_random_selector(x, func, num_cases): + """Computes func(x, sel), with sel sampled from [0...num_cases-1]. + + Args: + x: input Tensor. + func: Python function to apply. + num_cases: Python int32, number of cases to sample sel from. + + Returns: + The result of func(x, sel), where func receives the value of the + selector as a python integer, but sel is sampled dynamically. + """ + sel = tf.random_uniform([], maxval=num_cases, dtype=tf.int32) + # Pass the real x only to one of the func calls. + return control_flow_ops.merge([ + func(control_flow_ops.switch(x, tf.equal(sel, case))[1], case) + for case in range(num_cases) + ])[0] + + +def distort_color(image, color_ordering=0, fast_mode=True, scope=None): + """Distort the color of a Tensor image. + + Each color distortion is non-commutative and thus ordering of the color ops + matters. Ideally we would randomly permute the ordering of the color ops. + Rather than adding that level of complication, we select a distinct ordering + of color ops for each preprocessing thread. + + Args: + image: 3-D Tensor containing single image in [0, 1]. + color_ordering: Python int, a type of distortion (valid values: 0-3). + fast_mode: Avoids slower ops (random_hue and random_contrast) + scope: Optional scope for name_scope. 
+ Returns: + 3-D Tensor color-distorted image on range [0, 1] + Raises: + ValueError: if color_ordering not in [0, 3] + """ + with tf.name_scope(scope, 'distort_color', [image]): + if fast_mode: + if color_ordering == 0: + image = tf.image.random_brightness(image, max_delta=32. / 255.) + image = tf.image.random_saturation(image, lower=0.5, upper=1.5) + else: + image = tf.image.random_saturation(image, lower=0.5, upper=1.5) + image = tf.image.random_brightness(image, max_delta=32. / 255.) + else: + if color_ordering == 0: + image = tf.image.random_brightness(image, max_delta=32. / 255.) + image = tf.image.random_saturation(image, lower=0.5, upper=1.5) + image = tf.image.random_hue(image, max_delta=0.2) + image = tf.image.random_contrast(image, lower=0.5, upper=1.5) + elif color_ordering == 1: + image = tf.image.random_saturation(image, lower=0.5, upper=1.5) + image = tf.image.random_brightness(image, max_delta=32. / 255.) + image = tf.image.random_contrast(image, lower=0.5, upper=1.5) + image = tf.image.random_hue(image, max_delta=0.2) + elif color_ordering == 2: + image = tf.image.random_contrast(image, lower=0.5, upper=1.5) + image = tf.image.random_hue(image, max_delta=0.2) + image = tf.image.random_brightness(image, max_delta=32. / 255.) + image = tf.image.random_saturation(image, lower=0.5, upper=1.5) + elif color_ordering == 3: + image = tf.image.random_hue(image, max_delta=0.2) + image = tf.image.random_saturation(image, lower=0.5, upper=1.5) + image = tf.image.random_contrast(image, lower=0.5, upper=1.5) + image = tf.image.random_brightness(image, max_delta=32. / 255.) + else: + raise ValueError('color_ordering must be in [0, 3]') + + # The random_* ops do not necessarily clamp. 
+ return tf.clip_by_value(image, 0.0, 1.0) + + +def distorted_bounding_box_crop(image, + bbox, + min_object_covered=0.1, + aspect_ratio_range=(0.75, 1.33), + area_range=(0.05, 1.0), + max_attempts=100, + scope=None): + """Generates cropped_image using one of the bboxes randomly distorted. + + See `tf.image.sample_distorted_bounding_box` for more documentation. + + Args: + image: 3-D Tensor of image (it will be converted to floats in [0, 1]). + bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] + where each coordinate is [0, 1) and the coordinates are arranged + as [ymin, xmin, ymax, xmax]. If num_boxes is 0 then it would use the + whole image. + min_object_covered: An optional `float`. Defaults to `0.1`. The cropped + area of the image must contain at least this fraction of any bounding box + supplied. + aspect_ratio_range: An optional list of `floats`. The cropped area of the + image must have an aspect ratio = width / height within this range. + area_range: An optional list of `floats`. The cropped area of the image + must contain a fraction of the supplied image within this range. + max_attempts: An optional `int`. Number of attempts at generating a cropped + region of the image of the specified constraints. After `max_attempts` + failures, return the entire image. + scope: Optional scope for name_scope. + Returns: + A tuple, a 3-D Tensor cropped_image and the distorted bbox + """ + with tf.name_scope(scope, 'distorted_bounding_box_crop', [image, bbox]): + # Each bounding box has shape [1, num_boxes, box coords] and + # the coordinates are ordered [ymin, xmin, ymax, xmax]. + + # A large fraction of image datasets contain a human-annotated bounding + # box delineating the region of the image containing the object of interest. 
+ # We choose to create a new bounding box for the object which is a randomly + # distorted version of the human-annotated bounding box that obeys an + # allowed range of aspect ratios, sizes and overlap with the human-annotated + # bounding box. If no box is supplied, then we assume the bounding box is + # the entire image. + sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( + tf.shape(image), + bounding_boxes=bbox, + min_object_covered=min_object_covered, + aspect_ratio_range=aspect_ratio_range, + area_range=area_range, + max_attempts=max_attempts, + use_image_if_no_bounding_boxes=True) + bbox_begin, bbox_size, distort_bbox = sample_distorted_bounding_box + + # Crop the image to the specified bounding box. + cropped_image = tf.slice(image, bbox_begin, bbox_size) + return cropped_image, distort_bbox + + +def preprocess_for_train(image, + height, + width, + bbox, + fast_mode=True, + scope=None): + """Distort one image for training a network. + + Distorting images provides a useful technique for augmenting the data + set during training in order to make the network invariant to aspects + of the image that do not affect the label. + + Additionally it would create image_summaries to display the different + transformations applied to the image. + + Args: + image: 3-D Tensor of image. If dtype is tf.float32 then the range should be + [0, 1], otherwise it would be converted to tf.float32 assuming that the range + is [0, MAX], where MAX is largest positive representable number for + int(8/16/32) data type (see `tf.image.convert_image_dtype` for details). + height: integer + width: integer + bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] + where each coordinate is [0, 1) and the coordinates are arranged + as [ymin, xmin, ymax, xmax]. + fast_mode: Optional boolean, if True avoids slower transformations (i.e. + bi-cubic resizing, random_hue or random_contrast). + scope: Optional scope for name_scope. 
+ Returns: + 3-D float Tensor of distorted image used for training with range [-1, 1]. + """ + with tf.name_scope(scope, 'distort_image', [image, height, width, bbox]): + if bbox is None: + bbox = tf.constant( + [0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4]) + if image.dtype != tf.float32: + image = tf.image.convert_image_dtype(image, dtype=tf.float32) + # Each bounding box has shape [1, num_boxes, box coords] and + # the coordinates are ordered [ymin, xmin, ymax, xmax]. + image_with_box = tf.image.draw_bounding_boxes( + tf.expand_dims(image, 0), bbox) + tf.summary.image('image_with_bounding_boxes', image_with_box) + + distorted_image, distorted_bbox = distorted_bounding_box_crop(image, bbox) + # Restore the shape since the dynamic slice based upon the bbox_size loses + # the third dimension. + distorted_image.set_shape([None, None, 3]) + image_with_distorted_box = tf.image.draw_bounding_boxes( + tf.expand_dims(image, 0), distorted_bbox) + tf.summary.image('images_with_distorted_bounding_box', + image_with_distorted_box) + + # This resizing operation may distort the images because the aspect + # ratio is not respected. We select a resize method in a round robin + # fashion based on the thread number. + # Note that ResizeMethod contains 4 enumerated resizing methods. + + # We select only 1 case for fast_mode bilinear. + num_resize_cases = 1 if fast_mode else 4 + distorted_image = apply_with_random_selector( + distorted_image, + lambda x, method: tf.image.resize_images(x, [height, width], method=method), + num_cases=num_resize_cases) + + tf.summary.image('cropped_resized_image', + tf.expand_dims(distorted_image, 0)) + + # Randomly flip the image horizontally. + distorted_image = tf.image.random_flip_left_right(distorted_image) + + # Randomly distort the colors. There are 4 ways to do it. 
+ distorted_image = apply_with_random_selector( + distorted_image, + lambda x, ordering: distort_color(x, ordering, fast_mode), + num_cases=4) + + tf.summary.image('final_distorted_image', + tf.expand_dims(distorted_image, 0)) + distorted_image = tf.subtract(distorted_image, 0.5) + distorted_image = tf.multiply(distorted_image, 2.0) + return distorted_image + + +def preprocess_for_eval(image, + height, + width, + central_fraction=0.875, + scope=None): + """Prepare one image for evaluation. + + If height and width are specified it would output an image with that size by + applying resize_bilinear. + + If central_fraction is specified it would crop the central fraction of the + input image. + + Args: + image: 3-D Tensor of image. If dtype is tf.float32 then the range should be + [0, 1], otherwise it would be converted to tf.float32 assuming that the range + is [0, MAX], where MAX is largest positive representable number for + int(8/16/32) data type (see `tf.image.convert_image_dtype` for details) + height: integer + width: integer + central_fraction: Optional Float, fraction of the image to crop. + scope: Optional scope for name_scope. + Returns: + 3-D float Tensor of prepared image. + """ + with tf.name_scope(scope, 'eval_image', [image, height, width]): + if image.dtype != tf.float32: + image = tf.image.convert_image_dtype(image, dtype=tf.float32) + # Crop the central region of the image with an area containing 87.5% of + # the original image. + if central_fraction: + image = tf.image.central_crop(image, central_fraction=central_fraction) + + if height and width: + # Resize the image to the specified height and width. 
+ image = tf.expand_dims(image, 0) + image = tf.image.resize_bilinear( + image, [height, width], align_corners=False) + image = tf.squeeze(image, [0]) + image = tf.subtract(image, 0.5) + image = tf.multiply(image, 2.0) + return image + + +def preprocess_image(image, + height, + width, + is_training=False, + bbox=None, + fast_mode=True): + """Pre-process one image for training or evaluation. + + Args: + image: 3-D Tensor [height, width, channels] with the image. + height: integer, image expected height. + width: integer, image expected width. + is_training: Boolean. If true it would transform an image for train, + otherwise it would transform it for evaluation. + bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] + where each coordinate is [0, 1) and the coordinates are arranged as + [ymin, xmin, ymax, xmax]. + fast_mode: Optional boolean, if True avoids slower transformations. + + Returns: + 3-D float Tensor containing an appropriately scaled image + + Raises: + ValueError: if user does not provide bounding box + """ + if is_training: + return preprocess_for_train(image, height, width, bbox, fast_mode) + else: + return preprocess_for_eval(image, height, width) diff --git a/attention_ocr/python/metrics.py b/attention_ocr/python/metrics.py new file mode 100644 index 0000000000000000000000000000000000000000..9e2a6a7579812583dc60546f97976f05befe07ff --- /dev/null +++ b/attention_ocr/python/metrics.py @@ -0,0 +1,90 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Quality metrics for the model.""" + +import tensorflow as tf + + +def char_accuracy(predictions, targets, rej_char, streaming=False): + """Computes character level accuracy. + + Both predictions and targets should have the same shape + [batch_size x seq_length]. + + Args: + predictions: predicted character ids. + targets: ground truth character ids. + rej_char: the character id used to mark an empty element (end of sequence). + streaming: if True, uses the streaming mean from the slim.metric module. + + Returns: + an update_ops for execution and a value tensor whose value on evaluation + returns the total character accuracy. + """ + with tf.variable_scope('CharAccuracy'): + predictions.get_shape().assert_is_compatible_with(targets.get_shape()) + + targets = tf.to_int32(targets) + const_rej_char = tf.constant(rej_char, shape=targets.get_shape()) + weights = tf.to_float(tf.not_equal(targets, const_rej_char)) + correct_chars = tf.to_float(tf.equal(predictions, targets)) + accuracy_per_example = tf.div( + tf.reduce_sum(tf.multiply(correct_chars, weights), 1), + tf.reduce_sum(weights, 1)) + if streaming: + return tf.contrib.metrics.streaming_mean(accuracy_per_example) + else: + return tf.reduce_mean(accuracy_per_example) + + +def sequence_accuracy(predictions, targets, rej_char, streaming=False): + """Computes sequence level accuracy. + + Both input tensors should have the same shape: [batch_size x seq_length]. + + Args: + predictions: predicted character classes. + targets: ground truth character classes. + rej_char: the character id used to mark an empty element (end of sequence). + streaming: if True, uses the streaming mean from the slim.metric module. + + Returns: + an update_ops for execution and a value tensor whose value on evaluation + returns the total sequence accuracy. 
+ """ + + with tf.variable_scope('SequenceAccuracy'): + predictions.get_shape().assert_is_compatible_with(targets.get_shape()) + + targets = tf.to_int32(targets) + const_rej_char = tf.constant( + rej_char, shape=targets.get_shape(), dtype=tf.int32) + include_mask = tf.not_equal(targets, const_rej_char) + include_predictions = tf.to_int32( + tf.where(include_mask, predictions, + tf.zeros_like(predictions) + rej_char)) + correct_chars = tf.to_float(tf.equal(include_predictions, targets)) + correct_chars_counts = tf.cast( + tf.reduce_sum(correct_chars, reduction_indices=[1]), dtype=tf.int32) + target_length = targets.get_shape().dims[1].value + target_chars_counts = tf.constant( + target_length, shape=correct_chars_counts.get_shape()) + accuracy_per_example = tf.to_float( + tf.equal(correct_chars_counts, target_chars_counts)) + if streaming: + return tf.contrib.metrics.streaming_mean(accuracy_per_example) + else: + return tf.reduce_mean(accuracy_per_example) diff --git a/attention_ocr/python/metrics_test.py b/attention_ocr/python/metrics_test.py new file mode 100644 index 0000000000000000000000000000000000000000..68b9724f1d20e62f39f8b1b5c0130d4ea76cf825 --- /dev/null +++ b/attention_ocr/python/metrics_test.py @@ -0,0 +1,97 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for the metrics module.""" +import contextlib +import numpy as np +import tensorflow as tf + +import metrics + + +class AccuracyTest(tf.test.TestCase): + def setUp(self): + tf.test.TestCase.setUp(self) + self.rng = np.random.RandomState([11, 23, 50]) + self.num_char_classes = 3 + self.batch_size = 4 + self.seq_length = 5 + self.rej_char = 42 + + @contextlib.contextmanager + def initialized_session(self): + """Wrapper for test session context manager with required initialization. + + Yields: + A session object that should be used as a context manager. + """ + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + sess.run(tf.local_variables_initializer()) + yield sess + + def _fake_labels(self): + return self.rng.randint( + low=0, + high=self.num_char_classes, + size=(self.batch_size, self.seq_length), + dtype='int32') + + def _incorrect_copy(self, values, bad_indexes): + incorrect = np.copy(values) + incorrect[bad_indexes] = values[bad_indexes] + 1 + return incorrect + + def test_sequence_accuracy_identical_samples(self): + labels_tf = tf.convert_to_tensor(self._fake_labels()) + + accuracy_tf = metrics.sequence_accuracy(labels_tf, labels_tf, + self.rej_char) + with self.initialized_session() as sess: + accuracy_np = sess.run(accuracy_tf) + + self.assertAlmostEqual(accuracy_np, 1.0) + + def test_sequence_accuracy_one_char_difference(self): + ground_truth_np = self._fake_labels() + ground_truth_tf = tf.convert_to_tensor(ground_truth_np) + prediction_tf = tf.convert_to_tensor( + self._incorrect_copy(ground_truth_np, bad_indexes=((0, 0)))) + + accuracy_tf = metrics.sequence_accuracy(prediction_tf, ground_truth_tf, + self.rej_char) + with self.initialized_session() as sess: + accuracy_np = sess.run(accuracy_tf) + + # 1 of 4 sequences is incorrect. 
+ self.assertAlmostEqual(accuracy_np, 1.0 - 1.0 / self.batch_size) + + def test_char_accuracy_one_char_difference_with_padding(self): + ground_truth_np = self._fake_labels() + ground_truth_tf = tf.convert_to_tensor(ground_truth_np) + prediction_tf = tf.convert_to_tensor( + self._incorrect_copy(ground_truth_np, bad_indexes=((0, 0)))) + + accuracy_tf = metrics.char_accuracy(prediction_tf, ground_truth_tf, + self.rej_char) + with self.initialized_session() as sess: + accuracy_np = sess.run(accuracy_tf) + + chars_count = self.seq_length * self.batch_size + self.assertAlmostEqual(accuracy_np, 1.0 - 1.0 / chars_count) + + +if __name__ == '__main__': + tf.test.main() diff --git a/attention_ocr/python/model.py b/attention_ocr/python/model.py new file mode 100644 index 0000000000000000000000000000000000000000..8e0e19bb887e1476a4e2a6df82491a5e9a812460 --- /dev/null +++ b/attention_ocr/python/model.py @@ -0,0 +1,531 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Functions to build the Attention OCR model. + +Usage example: + ocr_model = model.Model(num_char_classes, seq_length, num_of_views) + + data = ... # create namedtuple InputEndpoints + endpoints = model.create_base(data.images, data.labels_one_hot) + # endpoints.predicted_chars is a tensor with predicted character codes. 
+ total_loss = model.create_loss(data, endpoints) +""" +import sys +import collections +import logging +import tensorflow as tf +from tensorflow.contrib import slim +from tensorflow.contrib.slim.nets import inception + +import metrics +import sequence_layers +import utils + + +OutputEndpoints = collections.namedtuple('OutputEndpoints', [ + 'chars_logit', 'chars_log_prob', 'predicted_chars', 'predicted_scores' +]) + +# TODO(gorban): replace with tf.HParams when it is released. +ModelParams = collections.namedtuple('ModelParams', [ + 'num_char_classes', 'seq_length', 'num_views', 'null_code' +]) + +ConvTowerParams = collections.namedtuple('ConvTowerParams', ['final_endpoint']) + +SequenceLogitsParams = collections.namedtuple('SequenceLogitsParams', [ + 'use_attention', 'use_autoregression', 'num_lstm_units', 'weight_decay', + 'lstm_state_clip_value' +]) + +SequenceLossParams = collections.namedtuple('SequenceLossParams', [ + 'label_smoothing', 'ignore_nulls', 'average_across_timesteps' +]) + + +def _dict_to_array(id_to_char, default_character): + num_char_classes = max(id_to_char.keys()) + 1 + array = [default_character] * num_char_classes + for k, v in id_to_char.iteritems(): + array[k] = v + return array + + +class CharsetMapper(object): + """A simple class to map tensor ids into strings. + + It works only when the character set is 1:1 mapping between individual + characters and individual ids. + + Make sure you call tf.tables_initializer().run() as part of the init op. + """ + + def __init__(self, charset, default_character='?'): + """Creates a lookup table. + + Args: + charset: a dictionary with id-to-character mapping. + """ + mapping_strings = tf.constant(_dict_to_array(charset, default_character)) + self.table = tf.contrib.lookup.index_to_string_table_from_tensor( + mapping=mapping_strings, default_value=default_character) + + def get_text(self, ids): + """Returns a string corresponding to a sequence of character ids. 
+ + Args: + ids: a tensor with shape [batch_size, max_sequence_length] + """ + return tf.reduce_join( + self.table.lookup(tf.to_int64(ids)), reduction_indices=1) + + +def get_softmax_loss_fn(label_smoothing): + """Returns sparse or dense loss function depending on the label_smoothing. + + Args: + label_smoothing: weight for label smoothing + + Returns: + a function which takes labels and predictions as arguments and returns + a softmax loss for the selected type of labels (sparse or dense). + """ + if label_smoothing > 0: + + def loss_fn(labels, logits): + return (tf.nn.softmax_cross_entropy_with_logits( + logits=logits, labels=labels)) + else: + + def loss_fn(labels, logits): + return tf.nn.sparse_softmax_cross_entropy_with_logits( + logits=logits, labels=labels) + + return loss_fn + + +class Model(object): + """Class to create the Attention OCR Model.""" + + def __init__(self, + num_char_classes, + seq_length, + num_views, + null_code, + mparams=None): + """Initialized model parameters. + + Args: + num_char_classes: size of character set. + seq_length: number of characters in a sequence. + num_views: Number of views (conv towers) to use. + null_code: A character code corresponding to a character which + indicates end of a sequence. + mparams: a dictionary with hyper parameters for methods, keys - + function names, values - corresponding namedtuples. 
+ """ + super(Model, self).__init__() + self._params = ModelParams( + num_char_classes=num_char_classes, + seq_length=seq_length, + num_views=num_views, + null_code=null_code) + self._mparams = self.default_mparams() + if mparams: + self._mparams.update(mparams) + + def default_mparams(self): + return { + 'conv_tower_fn': + ConvTowerParams(final_endpoint='Mixed_5d'), + 'sequence_logit_fn': + SequenceLogitsParams( + use_attention=True, + use_autoregression=True, + num_lstm_units=256, + weight_decay=0.00004, + lstm_state_clip_value=10.0), + 'sequence_loss_fn': + SequenceLossParams( + label_smoothing=0.1, + ignore_nulls=True, + average_across_timesteps=False) + } + + def set_mparam(self, function, **kwargs): + self._mparams[function] = self._mparams[function]._replace(**kwargs) + + def conv_tower_fn(self, images, is_training=True, reuse=None): + """Computes convolutional features using the InceptionV3 model. + + Args: + images: A tensor of shape [batch_size, height, width, channels]. + is_training: whether is training or not. + reuse: whether or not the network and its variables should be reused. To + be able to reuse 'scope' must be given. + + Returns: + A tensor of shape [batch_size, OH, OW, N], where OWxOH is resolution of + output feature map and N is number of output features (depends on the + network architecture). + """ + mparams = self._mparams['conv_tower_fn'] + logging.debug('Using final_endpoint=%s', mparams.final_endpoint) + with tf.variable_scope('conv_tower_fn/INCE'): + if reuse: + tf.get_variable_scope().reuse_variables() + with slim.arg_scope(inception.inception_v3_arg_scope()): + net, _ = inception.inception_v3_base( + images, final_endpoint=mparams.final_endpoint) + return net + + def _create_lstm_inputs(self, net): + """Splits an input tensor into a list of tensors (features). + + Args: + net: A feature map of shape [batch_size, num_features, feature_size]. + + Raises: + AssertionError: if num_features is less than seq_length. 
+ + Returns: + A list with seq_length tensors of shape [batch_size, feature_size] + """ + num_features = net.get_shape().dims[1].value + if num_features < self._params.seq_length: + raise AssertionError('Incorrect dimension #1 of input tensor' + ' %d should be bigger than %d (shape=%s)' % + (num_features, self._params.seq_length, + net.get_shape())) + elif num_features > self._params.seq_length: + logging.warning('Ignoring some features: use %d of %d (shape=%s)', + self._params.seq_length, num_features, net.get_shape()) + net = tf.slice(net, [0, 0, 0], [-1, self._params.seq_length, -1]) + + return tf.unstack(net, axis=1) + + def sequence_logit_fn(self, net, labels_one_hot): + mparams = self._mparams['sequence_logit_fn'] + # TODO(gorban): remove /alias suffixes from the scopes. + with tf.variable_scope('sequence_logit_fn/SQLR'): + layer_class = sequence_layers.get_layer_class(mparams.use_attention, + mparams.use_autoregression) + layer = layer_class(net, labels_one_hot, self._params, mparams) + return layer.create_logits() + + def max_pool_views(self, nets_list): + """Max pool across all nets in spatial dimensions. + + Args: + nets_list: A list of 4D tensors with identical size. + + Returns: + A tensor with the same size as any input tensors. + """ + batch_size, height, width, num_features = [ + d.value for d in nets_list[0].get_shape().dims + ] + xy_flat_shape = (batch_size, 1, height * width, num_features) + nets_for_merge = [] + with tf.variable_scope('max_pool_views', values=nets_list): + for net in nets_list: + nets_for_merge.append(tf.reshape(net, xy_flat_shape)) + merged_net = tf.concat(nets_for_merge, 1) + net = slim.max_pool2d( + merged_net, kernel_size=[len(nets_list), 1], stride=1) + net = tf.reshape(net, (batch_size, height, width, num_features)) + return net + + def pool_views_fn(self, nets): + """Combines output of multiple convolutional towers into a single tensor. + + It stacks towers one on top another (in height dim) in a 4x1 grid. 
+ The order is arbitrary design choice and shouldn't matter much. + + Args: + nets: list of tensors of shape=[batch_size, height, width, num_features]. + + Returns: + A tensor of shape [batch_size, seq_length, features_size]. + """ + with tf.variable_scope('pool_views_fn/STCK'): + net = tf.concat(nets, 1) + batch_size = net.get_shape().dims[0].value + feature_size = net.get_shape().dims[3].value + return tf.reshape(net, [batch_size, -1, feature_size]) + + def char_predictions(self, chars_logit): + """Returns confidence scores (softmax values) for predicted characters. + + Args: + chars_logit: chars logits, a tensor with shape + [batch_size x seq_length x num_char_classes] + + Returns: + A tuple (ids, log_prob, scores), where: + ids - predicted characters, a int32 tensor with shape + [batch_size x seq_length]; + log_prob - a log probability of all characters, a float tensor with + shape [batch_size, seq_length, num_char_classes]; + scores - corresponding confidence scores for characters, a float + tensor + with shape [batch_size x seq_length]. + """ + log_prob = utils.logits_to_log_prob(chars_logit) + ids = tf.to_int32(tf.argmax(log_prob, dimension=2), name='predicted_chars') + mask = tf.cast( + slim.one_hot_encoding(ids, self._params.num_char_classes), tf.bool) + all_scores = tf.nn.softmax(chars_logit) + selected_scores = tf.boolean_mask(all_scores, mask, name='char_scores') + scores = tf.reshape(selected_scores, shape=(-1, self._params.seq_length)) + return ids, log_prob, scores + + def create_base(self, + images, + labels_one_hot, + scope='AttentionOcr_v1', + reuse=None): + """Creates a base part of the Model (no gradients, losses or summaries). + + Args: + images: A tensor of shape [batch_size, height, width, channels]. + labels_one_hot: Optional (can be None) one-hot encoding for ground truth + labels. If provided the function will create a model for training. + scope: Optional variable_scope. 
+ reuse: whether or not the network and its variables should be reused. To + be able to reuse 'scope' must be given. + + Returns: + A named tuple OutputEndpoints. + """ + logging.debug('images: %s', images) + is_training = labels_one_hot is not None + with tf.variable_scope(scope, reuse=reuse): + views = tf.split( + value=images, num_or_size_splits=self._params.num_views, axis=2) + logging.debug('Views=%d single view: %s', len(views), views[0]) + + nets = [ + self.conv_tower_fn(v, is_training, reuse=(i != 0)) + for i, v in enumerate(views) + ] + logging.debug('Conv tower: %s', nets[0]) + + net = self.pool_views_fn(nets) + logging.debug('Pooled views: %s', net) + + chars_logit = self.sequence_logit_fn(net, labels_one_hot) + logging.debug('chars_logit: %s', chars_logit) + + predicted_chars, chars_log_prob, predicted_scores = ( + self.char_predictions(chars_logit)) + + return OutputEndpoints( + chars_logit=chars_logit, + chars_log_prob=chars_log_prob, + predicted_chars=predicted_chars, + predicted_scores=predicted_scores) + + def create_loss(self, data, endpoints): + """Creates all losses required to train the model. + + Args: + data: InputEndpoints namedtuple. + endpoints: Model namedtuple. + + Returns: + Total loss. + """ + # NOTE: the return value of ModelLoss is not used directly for the + # gradient computation because under the hood it calls slim.losses.AddLoss, + # which registers the loss in an internal collection and later returns it + # as part of GetTotalLoss. We need to use total loss because model may have + # multiple losses including regularization losses. + self.sequence_loss_fn(endpoints.chars_logit, data.labels) + total_loss = slim.losses.get_total_loss() + tf.summary.scalar('TotalLoss', total_loss) + return total_loss + + def label_smoothing_regularization(self, chars_labels, weight=0.1): + """Applies a label smoothing regularization. + + Uses the same method as in https://arxiv.org/abs/1512.00567. 
+ + Args: + chars_labels: ground truth ids of characters, + shape=[batch_size, seq_length]; + weight: label-smoothing regularization weight. + + Returns: + A tensor with shape [batch_size, seq_length, num_char_classes]. + """ + one_hot_labels = tf.one_hot( + chars_labels, depth=self._params.num_char_classes, axis=-1) + pos_weight = 1.0 - weight + neg_weight = weight / self._params.num_char_classes + return one_hot_labels * pos_weight + neg_weight + + def sequence_loss_fn(self, chars_logits, chars_labels): + """Loss function for char sequence. + + Depending on values of hyper parameters it applies label smoothing and can + also ignore all null chars after the first one. + + Args: + chars_logits: logits for predicted characters, + shape=[batch_size, seq_length, num_char_classes]; + chars_labels: ground truth ids of characters, + shape=[batch_size, seq_length]; + mparams: method hyper parameters. + + Returns: + A Tensor with shape [batch_size] - the log-perplexity for each sequence. + """ + mparams = self._mparams['sequence_loss_fn'] + with tf.variable_scope('sequence_loss_fn/SLF'): + if mparams.label_smoothing > 0: + smoothed_one_hot_labels = self.label_smoothing_regularization( + chars_labels, mparams.label_smoothing) + labels_list = tf.unstack(smoothed_one_hot_labels, axis=1) + else: + # NOTE: in case of sparse softmax we are not using one-hot + # encoding. + labels_list = tf.unstack(chars_labels, axis=1) + + batch_size, seq_length, _ = chars_logits.shape.as_list() + if mparams.ignore_nulls: + weights = tf.ones((batch_size, seq_length), dtype=tf.float32) + else: + # Suppose that reject character is the last in the charset. 
+ reject_char = tf.constant( + self._params.num_char_classes - 1, + shape=(batch_size, seq_length), + dtype=tf.int64) + known_char = tf.not_equal(chars_labels, reject_char) + weights = tf.to_float(known_char) + + logits_list = tf.unstack(chars_logits, axis=1) + weights_list = tf.unstack(weights, axis=1) + loss = tf.contrib.legacy_seq2seq.sequence_loss( + logits_list, + labels_list, + weights_list, + softmax_loss_function=get_softmax_loss_fn(mparams.label_smoothing), + average_across_timesteps=mparams.average_across_timesteps) + tf.losses.add_loss(loss) + return loss + + def create_summaries(self, data, endpoints, charset, is_training): + """Creates all summaries for the model. + + Args: + data: InputEndpoints namedtuple. + endpoints: OutputEndpoints namedtuple. + charset: A dictionary with mapping between character codes and + unicode characters. Use the one provided by a dataset.charset. + is_training: If True will create summary prefixes for training job, + otherwise - for evaluation. + + Returns: + A list of evaluation ops + """ + + def sname(label): + prefix = 'train' if is_training else 'eval' + return '%s/%s' % (prefix, label) + + max_outputs = 4 + # TODO(gorban): uncomment, when tf.summary.text released. 
+ # charset_mapper = CharsetMapper(charset) + # pr_text = charset_mapper.get_text( + # endpoints.predicted_chars[:max_outputs,:]) + # tf.summary.text(sname('text/pr'), pr_text) + # gt_text = charset_mapper.get_text(data.labels[:max_outputs,:]) + # tf.summary.text(sname('text/gt'), gt_text) + tf.summary.image(sname('image'), data.images, max_outputs=max_outputs) + + if is_training: + tf.summary.image( + sname('image/orig'), data.images_orig, max_outputs=max_outputs) + for var in tf.trainable_variables(): + tf.summary.histogram(var.op.name, var) + return None + + else: + names_to_values = {} + names_to_updates = {} + + def use_metric(name, value_update_tuple): + names_to_values[name] = value_update_tuple[0] + names_to_updates[name] = value_update_tuple[1] + + use_metric('CharacterAccuracy', + metrics.char_accuracy( + endpoints.predicted_chars, + data.labels, + streaming=True, + rej_char=self._params.null_code)) + # Sequence accuracy computed by cutting sequence at the first null char + use_metric('SequenceAccuracy', + metrics.sequence_accuracy( + endpoints.predicted_chars, + data.labels, + streaming=True, + rej_char=self._params.null_code)) + + for name, value in names_to_values.iteritems(): + summary_name = 'eval/' + name + tf.summary.scalar(summary_name, tf.Print(value, [value], summary_name)) + return names_to_updates.values() + + def create_init_fn_to_restore(self, master_checkpoint, inception_checkpoint): + """Creates an init operations to restore weights from various checkpoints. + + Args: + master_checkpoint: path to a checkpoint which contains all weights for + the whole model. + inception_checkpoint: path to a checkpoint which contains weights for the + inception part only. + + Returns: + a function to run initialization ops. 
+ """ + all_assign_ops = [] + all_feed_dict = {} + + def assign_from_checkpoint(variables, checkpoint): + logging.info('Request to re-store %d weights from %s', + len(variables), checkpoint) + if not variables: + logging.error('Can\'t find any variables to restore.') + sys.exit(1) + assign_op, feed_dict = slim.assign_from_checkpoint(checkpoint, variables) + all_assign_ops.append(assign_op) + all_feed_dict.update(feed_dict) + + if master_checkpoint: + assign_from_checkpoint(utils.variables_to_restore(), master_checkpoint) + + if inception_checkpoint: + variables = utils.variables_to_restore( + 'AttentionOcr_v1/conv_tower_fn/INCE', strip_scope=True) + assign_from_checkpoint(variables, inception_checkpoint) + + def init_assign_fn(sess): + logging.info('Restoring checkpoint(s)') + sess.run(all_assign_ops, all_feed_dict) + + return init_assign_fn diff --git a/attention_ocr/python/model_test.py b/attention_ocr/python/model_test.py new file mode 100644 index 0000000000000000000000000000000000000000..3626788b2124779702694a6b71e3aa5923021b32 --- /dev/null +++ b/attention_ocr/python/model_test.py @@ -0,0 +1,181 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for the model.""" + +import numpy as np +import string +import tensorflow as tf +from tensorflow.contrib import slim +from tensorflow.contrib.tfprof import model_analyzer + +import model +import data_provider + + +def create_fake_charset(num_char_classes): + charset = {} + for i in xrange(num_char_classes): + charset[i] = string.printable[i % len(string.printable)] + return charset + + +class ModelTest(tf.test.TestCase): + def setUp(self): + tf.test.TestCase.setUp(self) + + self.rng = np.random.RandomState([11, 23, 50]) + + self.batch_size = 4 + self.image_width = 600 + self.image_height = 30 + self.seq_length = 40 + self.num_char_classes = 72 + self.null_code = 62 + self.num_views = 4 + + feature_size = 288 + self.conv_tower_shape = (self.batch_size, 1, 72, feature_size) + self.features_shape = (self.batch_size, self.seq_length, feature_size) + self.chars_logit_shape = (self.batch_size, self.seq_length, + self.num_char_classes) + self.length_logit_shape = (self.batch_size, self.seq_length + 1) + + self.initialize_fakes() + + def initialize_fakes(self): + self.images_shape = (self.batch_size, self.image_height, self.image_width, + 3) + self.fake_images = tf.constant( + self.rng.randint(low=0, high=255, + size=self.images_shape).astype('float32'), + name='input_node') + self.fake_conv_tower_np = tf.constant( + self.rng.randn(*self.conv_tower_shape).astype('float32')) + self.fake_logits = tf.constant( + self.rng.randn(*self.chars_logit_shape).astype('float32')) + self.fake_labels = tf.constant( + self.rng.randint( + low=0, + high=self.num_char_classes, + size=(self.batch_size, self.seq_length)).astype('int64')) + + def create_model(self): + return model.Model( + self.num_char_classes, self.seq_length, num_views=4, null_code=62) + + def test_char_related_shapes(self): + ocr_model = self.create_model() + with self.test_session() as sess: + endpoints_tf = ocr_model.create_base( 
+ images=self.fake_images, labels_one_hot=None) + + sess.run(tf.global_variables_initializer()) + endpoints = sess.run(endpoints_tf) + + self.assertEqual((self.batch_size, self.seq_length, + self.num_char_classes), endpoints.chars_logit.shape) + self.assertEqual((self.batch_size, self.seq_length, + self.num_char_classes), endpoints.chars_log_prob.shape) + self.assertEqual((self.batch_size, self.seq_length), + endpoints.predicted_chars.shape) + self.assertEqual((self.batch_size, self.seq_length), + endpoints.predicted_scores.shape) + + def test_predicted_scores_are_within_range(self): + ocr_model = self.create_model() + + _, _, scores = ocr_model.char_predictions(self.fake_logits) + with self.test_session() as sess: + scores_np = sess.run(scores) + + values_in_range = (scores_np >= 0.0) & (scores_np <= 1.0) + self.assertTrue( + np.all(values_in_range), + msg=('Scores contains out of the range values %s' % + scores_np[np.logical_not(values_in_range)])) + + def test_conv_tower_shape(self): + with self.test_session() as sess: + ocr_model = self.create_model() + conv_tower = ocr_model.conv_tower_fn(self.fake_images) + + sess.run(tf.global_variables_initializer()) + conv_tower_np = sess.run(conv_tower) + + self.assertEqual(self.conv_tower_shape, conv_tower_np.shape) + + def test_model_size_less_then1_gb(self): + # NOTE: Actual amount of memory occupied by TF during training will be at + # least 4X bigger because of the space needed to store original weights, + # updates, gradients and variances. It also depends on the type of + # optimizer used. 
+ ocr_model = self.create_model() + ocr_model.create_base(images=self.fake_images, labels_one_hot=None) + with self.test_session() as sess: + tfprof_root = model_analyzer.print_model_analysis( + sess.graph, + tfprof_options=model_analyzer.TRAINABLE_VARS_PARAMS_STAT_OPTIONS) + + model_size_bytes = 4 * tfprof_root.total_parameters + self.assertLess(model_size_bytes, 1 * 2**30) + + def test_create_summaries_is_runnable(self): + ocr_model = self.create_model() + data = data_provider.InputEndpoints( + images=self.fake_images, + images_orig=self.fake_images, + labels=self.fake_labels, + labels_one_hot=slim.one_hot_encoding(self.fake_labels, + self.num_char_classes)) + endpoints = ocr_model.create_base( + images=self.fake_images, labels_one_hot=None) + charset = create_fake_charset(self.num_char_classes) + summaries = ocr_model.create_summaries( + data, endpoints, charset, is_training=False) + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + sess.run(tf.local_variables_initializer()) + tf.tables_initializer().run() + sess.run(summaries) # just check it is runnable + + def test_sequence_loss_function_without_label_smoothing(self): + model = self.create_model() + model.set_mparam('sequence_loss_fn', label_smoothing=0) + + loss = model.sequence_loss_fn(self.fake_logits, self.fake_labels) + with self.test_session() as sess: + loss_np = sess.run(loss) + + # This test checks that the loss function is 'runnable'. 
+ self.assertEqual(loss_np.shape, tuple()) + + +class CharsetMapperTest(tf.test.TestCase): + def test_text_corresponds_to_ids(self): + charset = create_fake_charset(36) + ids = tf.constant( + [[17, 14, 21, 21, 24], [32, 24, 27, 21, 13]], dtype=tf.int64) + charset_mapper = model.CharsetMapper(charset) + + with self.test_session() as sess: + tf.tables_initializer().run() + text = sess.run(charset_mapper.get_text(ids)) + + self.assertAllEqual(text, ['hello', 'world']) + + +if __name__ == '__main__': + tf.test.main() diff --git a/attention_ocr/python/sequence_layers.py b/attention_ocr/python/sequence_layers.py new file mode 100644 index 0000000000000000000000000000000000000000..6e1e8493fdcf81eaf90d6769edefaf55a2baf7e8 --- /dev/null +++ b/attention_ocr/python/sequence_layers.py @@ -0,0 +1,422 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Various implementations of sequence layers for character prediction. + +A 'sequence layer' is a part of a computation graph which is responsible for +producing a sequence of characters using extracted image features. There are +many reasonable ways to implement such layers. All of them use RNNs. 
+This module provides implementations which use an 'attention' mechanism to +spatially 'pool' image features and also can use a previously predicted +character to predict the next one (aka autoregression). + +Usage: + Select one of available classes, e.g. Attention or use a wrapper function to + pick one based on your requirements: + layer_class = sequence_layers.get_layer_class(use_attention=True, + use_autoregression=True) + layer = layer_class(net, labels_one_hot, model_params, method_params) + char_logits = layer.create_logits() +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import collections +import abc +import logging +import numpy as np + +import tensorflow as tf + +from tensorflow.contrib import slim + + +def orthogonal_initializer(shape, dtype=tf.float32, *args, **kwargs): + """Generates orthonormal matrices with random values. + + Orthonormal initialization is important for RNNs: + http://arxiv.org/abs/1312.6120 + http://smerity.com/articles/2016/orthogonal_init.html + + For non-square shapes the returned matrix will be semi-orthonormal: if the + number of columns exceeds the number of rows, then the rows are orthonormal + vectors; but if the number of rows exceeds the number of columns, then the + columns are orthonormal vectors. + + We use SVD decomposition to generate an orthonormal matrix with random + values. The same way as it is done in the Lasagne library for Theano. Note + that both u and v returned by the svd are orthogonal and random. We just need + to pick one with the right shape. + + Args: + shape: a shape of the tensor matrix to initialize. + dtype: a dtype of the initialized tensor. + *args: not used. + **kwargs: not used. + + Returns: + An initialized tensor. 
+ """ + del args + del kwargs + flat_shape = (shape[0], np.prod(shape[1:])) + w = np.random.randn(*flat_shape) + u, _, v = np.linalg.svd(w, full_matrices=False) + w = u if u.shape == flat_shape else v + return tf.constant(w.reshape(shape), dtype=dtype) + + +SequenceLayerParams = collections.namedtuple('SequenceLogitsParams', [ + 'num_lstm_units', 'weight_decay', 'lstm_state_clip_value' +]) + + +class SequenceLayerBase(object): + """A base abstract class for all sequence layers. + + A child class has to define the following methods: + get_train_input + get_eval_input + unroll_cell + """ + __metaclass__ = abc.ABCMeta + + def __init__(self, net, labels_one_hot, model_params, method_params): + """Stores arguments in member variables for further use. + + Args: + net: A tensor with shape [batch_size, num_features, feature_size] which + contains some extracted image features. + labels_one_hot: An optional (can be None) ground truth labels for the + input features. Is a tensor with shape + [batch_size, seq_length, num_char_classes] + model_params: A namedtuple with model parameters (model.ModelParams). + method_params: A SequenceLayerParams instance. + """ + self._params = model_params + self._mparams = method_params + self._net = net + self._labels_one_hot = labels_one_hot + self._batch_size = net.get_shape().dims[0].value + + # Initialize parameters for char logits which will be computed on the fly + # inside an LSTM decoder. 
+ self._char_logits = {} + regularizer = slim.l2_regularizer(self._mparams.weight_decay) + self._softmax_w = slim.model_variable( + 'softmax_w', + [self._mparams.num_lstm_units, self._params.num_char_classes], + initializer=orthogonal_initializer, + regularizer=regularizer) + self._softmax_b = slim.model_variable( + 'softmax_b', [self._params.num_char_classes], + initializer=tf.zeros_initializer(), + regularizer=regularizer) + + @abc.abstractmethod + def get_train_input(self, prev, i): + """Returns a sample to be used to predict a character during training. + + This function is used as a loop_function for an RNN decoder. + + Args: + prev: output tensor from previous step of the RNN. A tensor with shape: + [batch_size, num_char_classes]. + i: index of a character in the output sequence. + + Returns: + A tensor with shape [batch_size, ?] - depth depends on implementation + details. + """ + pass + + @abc.abstractmethod + def get_eval_input(self, prev, i): + """Returns a sample to be used to predict a character during inference. + + This function is used as a loop_function for an RNN decoder. + + Args: + prev: output tensor from previous step of the RNN. A tensor with shape: + [batch_size, num_char_classes]. + i: index of a character in the output sequence. + + Returns: + A tensor with shape [batch_size, ?] - depth depends on implementation + details. + """ + raise AssertionError('Not implemented') + + @abc.abstractmethod + def unroll_cell(self, decoder_inputs, initial_state, loop_function, cell): + """Unrolls an RNN cell for all inputs. + + This is a placeholder to call some RNN decoder. It has an interface similar + to tf.seq2seq.rnn_decode. + + Args: + decoder_inputs: A list of 2D Tensors* [batch_size x input_size]. In fact, + most of existing decoders in presence of a loop_function use only the + first element to determine batch_size and length of the list to + determine number of steps. + initial_state: 2D Tensor with shape [batch_size x cell.state_size]. 
+ loop_function: a function that will be applied to the i-th output in order to + generate the i+1-st input (see self.get_input). + cell: rnn_cell.RNNCell defining the cell function and size. + + Returns: + A tuple of the form (outputs, state), where: + outputs: A list of character logits of the same length as + decoder_inputs of 2D Tensors with shape [batch_size x num_characters]. + state: The state of each cell at the final time-step. + It is a 2D Tensor of shape [batch_size x cell.state_size]. + """ + pass + + def is_training(self): + """Returns True if the layer is created for the training stage.""" + return self._labels_one_hot is not None + + def char_logit(self, inputs, char_index): + """Creates logits for a character if required. + + Args: + inputs: A tensor with shape [batch_size, ?] (depth is implementation + dependent). + char_index: An integer index of a character in the output sequence. + + Returns: + A tensor with shape [batch_size, num_char_classes] + """ + if char_index not in self._char_logits: + self._char_logits[char_index] = tf.nn.xw_plus_b(inputs, self._softmax_w, + self._softmax_b) + return self._char_logits[char_index] + + def char_one_hot(self, logit): + """Creates one hot encoding for a logit of a character. + + Args: + logit: A tensor with shape [batch_size, num_char_classes]. + + Returns: + A tensor with shape [batch_size, num_char_classes] + """ + prediction = tf.argmax(logit, dimension=1) + return slim.one_hot_encoding(prediction, self._params.num_char_classes) + + def get_input(self, prev, i): + """A wrapper for get_train_input and get_eval_input. + + Args: + prev: output tensor from previous step of the RNN. A tensor with shape: + [batch_size, num_char_classes]. + i: index of a character in the output sequence. + + Returns: + A tensor with shape [batch_size, ?] - depth depends on implementation + details. 
+ """ + if self.is_training(): + return self.get_train_input(prev, i) + else: + return self.get_eval_input(prev, i) + + def create_logits(self): + """Creates character sequence logits for a net specified in the constructor. + + A "main" method for the sequence layer which glues together all pieces. + + Returns: + A tensor with shape [batch_size, seq_length, num_char_classes]. + """ + with tf.variable_scope('LSTM'): + first_label = self.get_input(prev=None, i=0) + decoder_inputs = [first_label] + [None] * (self._params.seq_length - 1) + lstm_cell = tf.contrib.rnn.LSTMCell( + self._mparams.num_lstm_units, + use_peepholes=False, + cell_clip=self._mparams.lstm_state_clip_value, + state_is_tuple=True, + initializer=orthogonal_initializer) + lstm_outputs, _ = self.unroll_cell( + decoder_inputs=decoder_inputs, + initial_state=lstm_cell.zero_state(self._batch_size, tf.float32), + loop_function=self.get_input, + cell=lstm_cell) + + with tf.variable_scope('logits'): + logits_list = [ + tf.expand_dims(self.char_logit(logit, i), dim=1) + for i, logit in enumerate(lstm_outputs) + ] + + return tf.concat(logits_list, 1) + + +class NetSlice(SequenceLayerBase): + """A layer which uses a subset of image features to predict each character. + """ + + def __init__(self, *args, **kwargs): + super(NetSlice, self).__init__(*args, **kwargs) + self._zero_label = tf.zeros( + [self._batch_size, self._params.num_char_classes]) + + def get_image_feature(self, char_index): + """Returns a subset of image features for a character. + + Args: + char_index: an index of a character. + + Returns: + A tensor with shape [batch_size, ?]. The output depth depends on the + depth of input net. + """ + batch_size, features_num, _ = [d.value for d in self._net.get_shape()] + slice_len = int(features_num / self._params.seq_length) + # In case when features_num != seq_length, we just pick a subset of image + # features, this choice is arbitrary and there is no intuitive geometrical + # interpretation. 
If features_num is not divisible by seq_length, there will + # be unused image features. + net_slice = self._net[:, char_index:char_index + slice_len, :] + feature = tf.reshape(net_slice, [batch_size, -1]) + logging.debug('Image feature: %s', feature) + return feature + + def get_eval_input(self, prev, i): + """See SequenceLayerBase.get_eval_input for details.""" + del prev + return self.get_image_feature(i) + + def get_train_input(self, prev, i): + """See SequenceLayerBase.get_train_input for details.""" + return self.get_eval_input(prev, i) + + def unroll_cell(self, decoder_inputs, initial_state, loop_function, cell): + """See SequenceLayerBase.unroll_cell for details.""" + return tf.contrib.legacy_seq2seq.rnn_decoder( + decoder_inputs=decoder_inputs, + initial_state=initial_state, + cell=cell, + loop_function=self.get_input) + + +class NetSliceWithAutoregression(NetSlice): + """A layer similar to NetSlice, but it also uses auto regression. + + "Auto regression" means that we use the network output for the previous + character as a part of the input for the current character. 
+ """ + + def __init__(self, *args, **kwargs): + super(NetSliceWithAutoregression, self).__init__(*args, **kwargs) + + def get_eval_input(self, prev, i): + """See SequenceLayerBase.get_eval_input for details.""" + if i == 0: + prev = self._zero_label + else: + logit = self.char_logit(prev, char_index=i - 1) + prev = self.char_one_hot(logit) + image_feature = self.get_image_feature(char_index=i) + return tf.concat([image_feature, prev], 1) + + def get_train_input(self, prev, i): + """See SequenceLayerBase.get_train_input for details.""" + if i == 0: + prev = self._zero_label + else: + prev = self._labels_one_hot[:, i - 1, :] + image_feature = self.get_image_feature(i) + return tf.concat([image_feature, prev], 1) + + +class Attention(SequenceLayerBase): + """A layer which uses attention mechanism to select image features.""" + + def __init__(self, *args, **kwargs): + super(Attention, self).__init__(*args, **kwargs) + self._zero_label = tf.zeros( + [self._batch_size, self._params.num_char_classes]) + + def get_eval_input(self, prev, i): + """See SequenceLayerBase.get_eval_input for details.""" + del prev, i + # The attention_decoder will fetch image features from the net, no need for + # extra inputs. 
+ return self._zero_label + + def get_train_input(self, prev, i): + """See SequenceLayerBase.get_train_input for details.""" + return self.get_eval_input(prev, i) + + def unroll_cell(self, decoder_inputs, initial_state, loop_function, cell): + return tf.contrib.legacy_seq2seq.attention_decoder( + decoder_inputs=decoder_inputs, + initial_state=initial_state, + attention_states=self._net, + cell=cell, + loop_function=self.get_input) + + +class AttentionWithAutoregression(Attention): + """A layer which uses both attention and auto regression.""" + + def __init__(self, *args, **kwargs): + super(AttentionWithAutoregression, self).__init__(*args, **kwargs) + + def get_train_input(self, prev, i): + """See SequenceLayerBase.get_train_input for details.""" + if i == 0: + return self._zero_label + else: + # TODO(gorban): update to gradually introduce gt labels. + return self._labels_one_hot[:, i - 1, :] + + def get_eval_input(self, prev, i): + """See SequenceLayerBase.get_eval_input for details.""" + if i == 0: + return self._zero_label + else: + logit = self.char_logit(prev, char_index=i - 1) + return self.char_one_hot(logit) + + +def get_layer_class(use_attention, use_autoregression): + """A convenience function to get a layer class based on requirements. + + Args: + use_attention: if True a returned class will use attention. + use_autoregression: if True a returned class will use auto regression. + + Returns: + One of available sequence layers (child classes for SequenceLayerBase). 
+ """ + if use_attention and use_autoregression: + layer_class = AttentionWithAutoregression + elif use_attention and not use_autoregression: + layer_class = Attention + elif not use_attention and not use_autoregression: + layer_class = NetSlice + elif not use_attention and use_autoregression: + layer_class = NetSliceWithAutoregression + else: + raise AssertionError('Unsupported sequence layer class') + + logging.debug('Use %s as a layer class', layer_class.__name__) + return layer_class diff --git a/attention_ocr/python/sequence_layers_test.py b/attention_ocr/python/sequence_layers_test.py new file mode 100644 index 0000000000000000000000000000000000000000..fd41e2d824c014084129707631d45de334ec741b --- /dev/null +++ b/attention_ocr/python/sequence_layers_test.py @@ -0,0 +1,112 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for sequence_layers.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np +import tensorflow as tf +from tensorflow.contrib import slim + +import model +import sequence_layers + + +def fake_net(batch_size, num_features, feature_size): + return tf.convert_to_tensor( + np.random.uniform(size=(batch_size, num_features, feature_size)), + dtype=tf.float32) + + +def fake_labels(batch_size, seq_length, num_char_classes): + labels_np = tf.convert_to_tensor( + np.random.randint( + low=0, high=num_char_classes, size=(batch_size, seq_length))) + return slim.one_hot_encoding(labels_np, num_classes=num_char_classes) + + +def create_layer(layer_class, batch_size, seq_length, num_char_classes): + model_params = model.ModelParams( + num_char_classes=num_char_classes, + seq_length=seq_length, + num_views=1, + null_code=num_char_classes) + net = fake_net( + batch_size=batch_size, num_features=seq_length * 5, feature_size=6) + labels_one_hot = fake_labels(batch_size, seq_length, num_char_classes) + layer_params = sequence_layers.SequenceLayerParams( + num_lstm_units=10, weight_decay=0.00004, lstm_state_clip_value=10.0) + return layer_class(net, labels_one_hot, model_params, layer_params) + + +class SequenceLayersTest(tf.test.TestCase): + def test_net_slice_char_logits_with_correct_shape(self): + batch_size = 2 + seq_length = 4 + num_char_classes = 3 + + layer = create_layer(sequence_layers.NetSlice, batch_size, seq_length, + num_char_classes) + char_logits = layer.create_logits() + + self.assertEqual( + tf.TensorShape([batch_size, seq_length, num_char_classes]), + char_logits.get_shape()) + + def test_net_slice_with_autoregression_char_logits_with_correct_shape(self): + batch_size = 2 + seq_length = 4 + num_char_classes = 3 + + layer = create_layer(sequence_layers.NetSliceWithAutoregression, + batch_size, 
seq_length, num_char_classes) + char_logits = layer.create_logits() + + self.assertEqual( + tf.TensorShape([batch_size, seq_length, num_char_classes]), + char_logits.get_shape()) + + def test_attention_char_logits_with_correct_shape(self): + batch_size = 2 + seq_length = 4 + num_char_classes = 3 + + layer = create_layer(sequence_layers.Attention, batch_size, seq_length, + num_char_classes) + char_logits = layer.create_logits() + + self.assertEqual( + tf.TensorShape([batch_size, seq_length, num_char_classes]), + char_logits.get_shape()) + + def test_attention_with_autoregression_char_logits_with_correct_shape(self): + batch_size = 2 + seq_length = 4 + num_char_classes = 3 + + layer = create_layer(sequence_layers.AttentionWithAutoregression, + batch_size, seq_length, num_char_classes) + char_logits = layer.create_logits() + + self.assertEqual( + tf.TensorShape([batch_size, seq_length, num_char_classes]), + char_logits.get_shape()) + + +if __name__ == '__main__': + tf.test.main() diff --git a/attention_ocr/python/train.py b/attention_ocr/python/train.py new file mode 100644 index 0000000000000000000000000000000000000000..fa91fb73b412287889f05d0af5875e269f1ce367 --- /dev/null +++ b/attention_ocr/python/train.py @@ -0,0 +1,209 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Script to train the Attention OCR model. 
+ +A simple usage example: +python train.py +""" +import collections +import logging +import tensorflow as tf +from tensorflow.contrib import slim +from tensorflow import app +from tensorflow.python.platform import flags +from tensorflow.contrib.tfprof import model_analyzer + +import data_provider +import common_flags + +FLAGS = flags.FLAGS +common_flags.define() + +# yapf: disable +flags.DEFINE_integer('task', 0, + 'The Task ID. This value is used when training with ' + 'multiple workers to identify each worker.') + +flags.DEFINE_integer('ps_tasks', 0, + 'The number of parameter servers. If the value is 0, then' + ' the parameters are handled locally by the worker.') + +flags.DEFINE_integer('save_summaries_secs', 60, + 'The frequency with which summaries are saved, in ' + 'seconds.') + +flags.DEFINE_integer('save_interval_secs', 600, + 'Frequency in seconds of saving the model.') + +flags.DEFINE_integer('max_number_of_steps', int(1e10), + 'The maximum number of gradient steps.') + +flags.DEFINE_string('checkpoint_inception', '', + 'Checkpoint to recover inception weights from.') + +flags.DEFINE_float('clip_gradient_norm', 2.0, + 'If greater than 0 then the gradients would be clipped by ' + 'it.') + +flags.DEFINE_bool('sync_replicas', False, + 'If True will synchronize replicas during training.') + +flags.DEFINE_integer('replicas_to_aggregate', 1, + 'The number of gradients updates before updating params.') + +flags.DEFINE_integer('total_num_replicas', 1, + 'Total number of worker replicas.') + +flags.DEFINE_integer('startup_delay_steps', 15, + 'Number of training steps between replicas startup.') + +flags.DEFINE_boolean('reset_train_dir', False, + 'If true will delete all files in the train_log_dir') + +flags.DEFINE_boolean('show_graph_stats', False, + 'Output model size stats to stderr.') +# yapf: enable + +TrainingHParams = collections.namedtuple('TrainingHParams', [ + 'learning_rate', + 'optimizer', + 'momentum', + 'use_augment_input', +]) + + +def 
get_training_hparams(): + return TrainingHParams( + learning_rate=FLAGS.learning_rate, + optimizer=FLAGS.optimizer, + momentum=FLAGS.momentum, + use_augment_input=FLAGS.use_augment_input) + + +def create_optimizer(hparams): + """Creates an optimizer based on the specified flags.""" + if hparams.optimizer == 'momentum': + optimizer = tf.train.MomentumOptimizer( + hparams.learning_rate, momentum=hparams.momentum) + elif hparams.optimizer == 'adam': + optimizer = tf.train.AdamOptimizer(hparams.learning_rate) + elif hparams.optimizer == 'adadelta': + optimizer = tf.train.AdadeltaOptimizer(hparams.learning_rate) + elif hparams.optimizer == 'adagrad': + optimizer = tf.train.AdagradOptimizer(hparams.learning_rate) + elif hparams.optimizer == 'rmsprop': + optimizer = tf.train.RMSPropOptimizer( + hparams.learning_rate, momentum=hparams.momentum) + return optimizer + + +def train(loss, init_fn, hparams): + """Wraps slim.learning.train to run a training loop. + + Args: + loss: a loss tensor + init_fn: A callable to be executed after all other initialization is done. 
+ hparams: a model hyper parameters + """ + optimizer = create_optimizer(hparams) + + if FLAGS.sync_replicas: + replica_id = tf.constant(FLAGS.task, tf.int32, shape=()) + optimizer = tf.LegacySyncReplicasOptimizer( + opt=optimizer, + replicas_to_aggregate=FLAGS.replicas_to_aggregate, + replica_id=replica_id, + total_num_replicas=FLAGS.total_num_replicas) + sync_optimizer = optimizer + startup_delay_steps = 0 + else: + startup_delay_steps = 0 + sync_optimizer = None + + train_op = slim.learning.create_train_op( + loss, + optimizer, + summarize_gradients=True, + clip_gradient_norm=FLAGS.clip_gradient_norm) + + slim.learning.train( + train_op=train_op, + logdir=FLAGS.train_log_dir, + graph=loss.graph, + master=FLAGS.master, + is_chief=(FLAGS.task == 0), + number_of_steps=FLAGS.max_number_of_steps, + save_summaries_secs=FLAGS.save_summaries_secs, + save_interval_secs=FLAGS.save_interval_secs, + startup_delay_steps=startup_delay_steps, + sync_optimizer=sync_optimizer, + init_fn=init_fn) + + +def prepare_training_dir(): + if not tf.gfile.Exists(FLAGS.train_log_dir): + logging.info('Create a new training directory %s', FLAGS.train_log_dir) + tf.gfile.MakeDirs(FLAGS.train_log_dir) + else: + if FLAGS.reset_train_dir: + logging.info('Reset the training directory %s', FLAGS.train_log_dir) + tf.gfile.DeleteRecursively(FLAGS.train_log_dir) + tf.gfile.MakeDirs(FLAGS.train_log_dir) + else: + logging.info('Use already existing training directory %s', + FLAGS.train_log_dir) + + +def calculate_graph_metrics(): + param_stats = model_analyzer.print_model_analysis( + tf.get_default_graph(), + tfprof_options=model_analyzer.TRAINABLE_VARS_PARAMS_STAT_OPTIONS) + return param_stats.total_parameters + + +def main(_): + prepare_training_dir() + + dataset = common_flags.create_dataset(split_name=FLAGS.split_name) + model = common_flags.create_model(dataset.num_char_classes, + dataset.max_sequence_length, + dataset.num_of_views, dataset.null_code) + hparams = get_training_hparams() + + # If 
ps_tasks is zero, the local device is used. When using multiple + # (non-local) replicas, the ReplicaDeviceSetter distributes the variables + # across the different devices. + device_setter = tf.train.replica_device_setter( + FLAGS.ps_tasks, merge_devices=True) + with tf.device(device_setter): + data = data_provider.get_data( + dataset, + FLAGS.batch_size, + augment=hparams.use_augment_input, + central_crop_size=common_flags.get_crop_size()) + endpoints = model.create_base(data.images, data.labels_one_hot) + total_loss = model.create_loss(data, endpoints) + model.create_summaries(data, endpoints, dataset.charset, is_training=True) + init_fn = model.create_init_fn_to_restore(FLAGS.checkpoint, + FLAGS.checkpoint_inception) + if FLAGS.show_graph_stats: + logging.info('Total number of weights in the graph: %s', + calculate_graph_metrics()) + train(total_loss, init_fn, hparams) + + +if __name__ == '__main__': + app.run() diff --git a/attention_ocr/python/utils.py b/attention_ocr/python/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..10d93ad21e1444736bf4562ef0df1c939617a5c1 --- /dev/null +++ b/attention_ocr/python/utils.py @@ -0,0 +1,80 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Functions to support building models for StreetView text transcription.""" + +import tensorflow as tf +from tensorflow.contrib import slim + + +def logits_to_log_prob(logits): + """Computes log probabilities using numerically stable tricks. + + This uses two numerical stability tricks: + 1) softmax(x) = softmax(x - c) where c is a constant applied to all + arguments. If we set c = max(x) then the softmax is more numerically + stable. + 2) log softmax(x) is not numerically stable, but we can stabilize it + by using the identity log softmax(x) = x - log sum exp(x) + + Args: + logits: Tensor of arbitrary shape whose last dimension contains logits. + + Returns: + A tensor of the same shape as the input, but with corresponding log + probabilities. + """ + + with tf.variable_scope('log_probabilities'): + reduction_indices = len(logits.shape.as_list()) - 1 + max_logits = tf.reduce_max( + logits, reduction_indices=reduction_indices, keep_dims=True) + safe_logits = tf.subtract(logits, max_logits) + sum_exp = tf.reduce_sum( + tf.exp(safe_logits), + reduction_indices=reduction_indices, + keep_dims=True) + log_probs = tf.subtract(safe_logits, tf.log(sum_exp)) + return log_probs + + +def variables_to_restore(scope=None, strip_scope=False): + """Returns a dictionary of variables to restore for the specified scope. + + It is assumed that each variable name starts with the method's scope (a prefix + returned by _method_scope function). + + Args: + scope: an optional scope prefix; if set, only variables under it are + returned. + strip_scope: if True will return variable names without the scope prefix. + If scope is None will return names unchanged. + + Returns: + a dictionary mapping variable names to variables for restore. 
+ """ + if scope: + variable_map = {} + method_variables = slim.get_variables_to_restore(include=[scope]) + for var in method_variables: + if strip_scope: + var_name = var.op.name[len(scope) + 1:] + else: + var_name = var.op.name + variable_map[var_name] = var + + return variable_map + else: + return {v.op.name: v for v in slim.get_variables_to_restore()} diff --git a/autoencoder/AdditiveGaussianNoiseAutoencoderRunner.py b/autoencoder/AdditiveGaussianNoiseAutoencoderRunner.py index 66fac00df4acbe6f469f19151eb2324f9e2242a2..e91176634af216e755969b1a7382d3b67a8e308e 100644 --- a/autoencoder/AdditiveGaussianNoiseAutoencoderRunner.py +++ b/autoencoder/AdditiveGaussianNoiseAutoencoderRunner.py @@ -4,7 +4,7 @@ import sklearn.preprocessing as prep import tensorflow as tf from tensorflow.examples.tutorials.mnist import input_data -from autoencoder.autoencoder_models.DenoisingAutoencoder import AdditiveGaussianNoiseAutoencoder +from autoencoder_models.DenoisingAutoencoder import AdditiveGaussianNoiseAutoencoder mnist = input_data.read_data_sets('MNIST_data', one_hot = True) @@ -45,7 +45,6 @@ for epoch in range(training_epochs): # Display logs per epoch step if epoch % display_step == 0: - print "Epoch:", '%04d' % (epoch + 1), \ - "cost=", "{:.9f}".format(avg_cost) + print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost)) -print "Total cost: " + str(autoencoder.calc_total_cost(X_test)) +print("Total cost: " + str(autoencoder.calc_total_cost(X_test))) diff --git a/autoencoder/AutoencoderRunner.py b/autoencoder/AutoencoderRunner.py index 3eab056126cd8716c607fff8c172053fd2579dd6..4d708742aba1d988ea10a8c4459f9c68004b6405 100644 --- a/autoencoder/AutoencoderRunner.py +++ b/autoencoder/AutoencoderRunner.py @@ -4,7 +4,7 @@ import sklearn.preprocessing as prep import tensorflow as tf from tensorflow.examples.tutorials.mnist import input_data -from autoencoder.autoencoder_models.Autoencoder import Autoencoder +from autoencoder_models.Autoencoder import Autoencoder 
mnist = input_data.read_data_sets('MNIST_data', one_hot = True) @@ -44,7 +44,6 @@ for epoch in range(training_epochs): # Display logs per epoch step if epoch % display_step == 0: - print "Epoch:", '%04d' % (epoch + 1), \ - "cost=", "{:.9f}".format(avg_cost) + print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost)) -print "Total cost: " + str(autoencoder.calc_total_cost(X_test)) +print("Total cost: " + str(autoencoder.calc_total_cost(X_test))) diff --git a/autoencoder/MaskingNoiseAutoencoderRunner.py b/autoencoder/MaskingNoiseAutoencoderRunner.py index d5f8cc1db494606ada63c23965b48a792fd10bae..4f5db16854d9c1acf860362f6aac13e132f3990a 100644 --- a/autoencoder/MaskingNoiseAutoencoderRunner.py +++ b/autoencoder/MaskingNoiseAutoencoderRunner.py @@ -4,7 +4,7 @@ import sklearn.preprocessing as prep import tensorflow as tf from tensorflow.examples.tutorials.mnist import input_data -from autoencoder.autoencoder_models.DenoisingAutoencoder import MaskingNoiseAutoencoder +from autoencoder_models.DenoisingAutoencoder import MaskingNoiseAutoencoder mnist = input_data.read_data_sets('MNIST_data', one_hot = True) @@ -43,7 +43,6 @@ for epoch in range(training_epochs): avg_cost += cost / n_samples * batch_size if epoch % display_step == 0: - print "Epoch:", '%04d' % (epoch + 1), \ - "cost=", "{:.9f}".format(avg_cost) + print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost)) -print "Total cost: " + str(autoencoder.calc_total_cost(X_test)) +print("Total cost: " + str(autoencoder.calc_total_cost(X_test))) diff --git a/autoencoder/Utils.py b/autoencoder/Utils.py deleted file mode 100644 index 2034f6a7231665a67552a762690fb99988722062..0000000000000000000000000000000000000000 --- a/autoencoder/Utils.py +++ /dev/null @@ -1,9 +0,0 @@ -import numpy as np -import tensorflow as tf - -def xavier_init(fan_in, fan_out, constant = 1): - low = -constant * np.sqrt(6.0 / (fan_in + fan_out)) - high = constant * np.sqrt(6.0 / (fan_in + fan_out)) - return 
tf.random_uniform((fan_in, fan_out), - minval = low, maxval = high, - dtype = tf.float32) diff --git a/autoencoder/VariationalAutoencoderRunner.py b/autoencoder/VariationalAutoencoderRunner.py index f6e73f16f92eabc26bc15708debf7c4155680c37..1142681798061d13bd19456f1193bd7ab0b80bcf 100644 --- a/autoencoder/VariationalAutoencoderRunner.py +++ b/autoencoder/VariationalAutoencoderRunner.py @@ -4,7 +4,7 @@ import sklearn.preprocessing as prep import tensorflow as tf from tensorflow.examples.tutorials.mnist import input_data -from autoencoder.autoencoder_models.VariationalAutoencoder import VariationalAutoencoder +from autoencoder_models.VariationalAutoencoder import VariationalAutoencoder mnist = input_data.read_data_sets('MNIST_data', one_hot = True) @@ -47,7 +47,6 @@ for epoch in range(training_epochs): # Display logs per epoch step if epoch % display_step == 0: - print "Epoch:", '%04d' % (epoch + 1), \ - "cost=", "{:.9f}".format(avg_cost) + print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost)) -print "Total cost: " + str(autoencoder.calc_total_cost(X_test)) +print("Total cost: " + str(autoencoder.calc_total_cost(X_test))) diff --git a/autoencoder/autoencoder_models/Autoencoder.py b/autoencoder/autoencoder_models/Autoencoder.py index e6d61fec352f6fcd3a87465d2652fb78df4c46a8..cde14aa4a993cb6997eeb99e8af31fb3d438cd28 100644 --- a/autoencoder/autoencoder_models/Autoencoder.py +++ b/autoencoder/autoencoder_models/Autoencoder.py @@ -1,6 +1,4 @@ import tensorflow as tf -import numpy as np -import autoencoder.Utils class Autoencoder(object): @@ -18,7 +16,7 @@ class Autoencoder(object): self.reconstruction = tf.add(tf.matmul(self.hidden, self.weights['w2']), self.weights['b2']) # cost - self.cost = 0.5 * tf.reduce_sum(tf.pow(tf.sub(self.reconstruction, self.x), 2.0)) + self.cost = 0.5 * tf.reduce_sum(tf.pow(tf.subtract(self.reconstruction, self.x), 2.0)) self.optimizer = optimizer.minimize(self.cost) init = tf.global_variables_initializer() @@ -28,7 +26,8 
@@ class Autoencoder(object): def _initialize_weights(self): all_weights = dict() - all_weights['w1'] = tf.Variable(autoencoder.Utils.xavier_init(self.n_input, self.n_hidden)) + all_weights['w1'] = tf.get_variable("w1", shape=[self.n_input, self.n_hidden], + initializer=tf.contrib.layers.xavier_initializer()) all_weights['b1'] = tf.Variable(tf.zeros([self.n_hidden], dtype=tf.float32)) all_weights['w2'] = tf.Variable(tf.zeros([self.n_hidden, self.n_input], dtype=tf.float32)) all_weights['b2'] = tf.Variable(tf.zeros([self.n_input], dtype=tf.float32)) @@ -46,7 +45,7 @@ class Autoencoder(object): def generate(self, hidden = None): if hidden is None: - hidden = np.random.normal(size=self.weights["b1"]) + hidden = self.sess.run(tf.random_normal([1, self.n_hidden])) return self.sess.run(self.reconstruction, feed_dict={self.hidden: hidden}) def reconstruct(self, X): diff --git a/autoencoder/autoencoder_models/DenoisingAutoencoder.py b/autoencoder/autoencoder_models/DenoisingAutoencoder.py index 05c57cfb82ecacbf1104da0f2f932837057c2ecf..22b5dcb44a4079b80bfcfc16e3dcda5b21ca8c1b 100644 --- a/autoencoder/autoencoder_models/DenoisingAutoencoder.py +++ b/autoencoder/autoencoder_models/DenoisingAutoencoder.py @@ -1,7 +1,4 @@ import tensorflow as tf -import numpy as np -import autoencoder.Utils - class AdditiveGaussianNoiseAutoencoder(object): def __init__(self, n_input, n_hidden, transfer_function = tf.nn.softplus, optimizer = tf.train.AdamOptimizer(), @@ -22,7 +19,7 @@ class AdditiveGaussianNoiseAutoencoder(object): self.reconstruction = tf.add(tf.matmul(self.hidden, self.weights['w2']), self.weights['b2']) # cost - self.cost = 0.5 * tf.reduce_sum(tf.pow(tf.sub(self.reconstruction, self.x), 2.0)) + self.cost = 0.5 * tf.reduce_sum(tf.pow(tf.subtract(self.reconstruction, self.x), 2.0)) self.optimizer = optimizer.minimize(self.cost) init = tf.global_variables_initializer() @@ -31,7 +28,8 @@ class AdditiveGaussianNoiseAutoencoder(object): def _initialize_weights(self): all_weights = 
dict() - all_weights['w1'] = tf.Variable(autoencoder.Utils.xavier_init(self.n_input, self.n_hidden)) + all_weights['w1'] = tf.get_variable("w1", shape=[self.n_input, self.n_hidden], + initializer=tf.contrib.layers.xavier_initializer()) all_weights['b1'] = tf.Variable(tf.zeros([self.n_hidden], dtype = tf.float32)) all_weights['w2'] = tf.Variable(tf.zeros([self.n_hidden, self.n_input], dtype = tf.float32)) all_weights['b2'] = tf.Variable(tf.zeros([self.n_input], dtype = tf.float32)) @@ -53,9 +51,9 @@ class AdditiveGaussianNoiseAutoencoder(object): self.scale: self.training_scale }) - def generate(self, hidden = None): + def generate(self, hidden=None): if hidden is None: - hidden = np.random.normal(size = self.weights["b1"]) + hidden = self.sess.run(tf.random_normal([1, self.n_hidden])) return self.sess.run(self.reconstruction, feed_dict = {self.hidden: hidden}) def reconstruct(self, X): @@ -89,7 +87,7 @@ class MaskingNoiseAutoencoder(object): self.reconstruction = tf.add(tf.matmul(self.hidden, self.weights['w2']), self.weights['b2']) # cost - self.cost = 0.5 * tf.reduce_sum(tf.pow(tf.sub(self.reconstruction, self.x), 2.0)) + self.cost = 0.5 * tf.reduce_sum(tf.pow(tf.subtract(self.reconstruction, self.x), 2.0)) self.optimizer = optimizer.minimize(self.cost) init = tf.global_variables_initializer() @@ -98,7 +96,8 @@ class MaskingNoiseAutoencoder(object): def _initialize_weights(self): all_weights = dict() - all_weights['w1'] = tf.Variable(autoencoder.Utils.xavier_init(self.n_input, self.n_hidden)) + all_weights['w1'] = tf.get_variable("w1", shape=[self.n_input, self.n_hidden], + initializer=tf.contrib.layers.xavier_initializer()) all_weights['b1'] = tf.Variable(tf.zeros([self.n_hidden], dtype = tf.float32)) all_weights['w2'] = tf.Variable(tf.zeros([self.n_hidden, self.n_input], dtype = tf.float32)) all_weights['b2'] = tf.Variable(tf.zeros([self.n_input], dtype = tf.float32)) @@ -115,9 +114,9 @@ class MaskingNoiseAutoencoder(object): def transform(self, X): return 
self.sess.run(self.hidden, feed_dict = {self.x: X, self.keep_prob: 1.0}) - def generate(self, hidden = None): + def generate(self, hidden=None): if hidden is None: - hidden = np.random.normal(size = self.weights["b1"]) + hidden = self.sess.run(tf.random_normal([1, self.n_hidden])) return self.sess.run(self.reconstruction, feed_dict = {self.hidden: hidden}) def reconstruct(self, X): diff --git a/autoencoder/autoencoder_models/VariationalAutoencoder.py b/autoencoder/autoencoder_models/VariationalAutoencoder.py index 05e9f4ed9aa5d629fbd79e59258b5c6fdc66a59f..8767891a92efc6f1bf058f8b50b23c315eae8d4e 100644 --- a/autoencoder/autoencoder_models/VariationalAutoencoder.py +++ b/autoencoder/autoencoder_models/VariationalAutoencoder.py @@ -1,6 +1,4 @@ import tensorflow as tf -import numpy as np -import autoencoder.Utils class VariationalAutoencoder(object): @@ -17,13 +15,13 @@ class VariationalAutoencoder(object): self.z_log_sigma_sq = tf.add(tf.matmul(self.x, self.weights['log_sigma_w1']), self.weights['log_sigma_b1']) # sample from gaussian distribution - eps = tf.random_normal(tf.pack([tf.shape(self.x)[0], self.n_hidden]), 0, 1, dtype = tf.float32) - self.z = tf.add(self.z_mean, tf.mul(tf.sqrt(tf.exp(self.z_log_sigma_sq)), eps)) + eps = tf.random_normal(tf.stack([tf.shape(self.x)[0], self.n_hidden]), 0, 1, dtype = tf.float32) + self.z = tf.add(self.z_mean, tf.multiply(tf.sqrt(tf.exp(self.z_log_sigma_sq)), eps)) self.reconstruction = tf.add(tf.matmul(self.z, self.weights['w2']), self.weights['b2']) # cost - reconstr_loss = 0.5 * tf.reduce_sum(tf.pow(tf.sub(self.reconstruction, self.x), 2.0)) + reconstr_loss = 0.5 * tf.reduce_sum(tf.pow(tf.subtract(self.reconstruction, self.x), 2.0)) latent_loss = -0.5 * tf.reduce_sum(1 + self.z_log_sigma_sq - tf.square(self.z_mean) - tf.exp(self.z_log_sigma_sq), 1) @@ -36,8 +34,10 @@ class VariationalAutoencoder(object): def _initialize_weights(self): all_weights = dict() - all_weights['w1'] = 
tf.Variable(autoencoder.Utils.xavier_init(self.n_input, self.n_hidden)) - all_weights['log_sigma_w1'] = tf.Variable(autoencoder.Utils.xavier_init(self.n_input, self.n_hidden)) + all_weights['w1'] = tf.get_variable("w1", shape=[self.n_input, self.n_hidden], + initializer=tf.contrib.layers.xavier_initializer()) + all_weights['log_sigma_w1'] = tf.get_variable("log_sigma_w1", shape=[self.n_input, self.n_hidden], + initializer=tf.contrib.layers.xavier_initializer()) all_weights['b1'] = tf.Variable(tf.zeros([self.n_hidden], dtype=tf.float32)) all_weights['log_sigma_b1'] = tf.Variable(tf.zeros([self.n_hidden], dtype=tf.float32)) all_weights['w2'] = tf.Variable(tf.zeros([self.n_hidden, self.n_input], dtype=tf.float32)) @@ -56,8 +56,8 @@ class VariationalAutoencoder(object): def generate(self, hidden = None): if hidden is None: - hidden = np.random.normal(size=self.weights["b1"]) - return self.sess.run(self.reconstruction, feed_dict={self.z_mean: hidden}) + hidden = self.sess.run(tf.random_normal([1, self.n_hidden])) + return self.sess.run(self.reconstruction, feed_dict={self.z: hidden}) def reconstruct(self, X): return self.sess.run(self.reconstruction, feed_dict={self.x: X}) diff --git a/cognitive_mapping_and_planning/.gitignore b/cognitive_mapping_and_planning/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..cbc6a8f0271075171ffdf3c2bc5fb9c528b08fc6 --- /dev/null +++ b/cognitive_mapping_and_planning/.gitignore @@ -0,0 +1,4 @@ +deps +*.pyc +lib*.so +lib*.so* diff --git a/cognitive_mapping_and_planning/README.md b/cognitive_mapping_and_planning/README.md new file mode 100644 index 0000000000000000000000000000000000000000..ce69d34745368159d36ee3421ce1ed9de468cf2b --- /dev/null +++ b/cognitive_mapping_and_planning/README.md @@ -0,0 +1,122 @@ +# Cognitive Mapping and Planning for Visual Navigation +**Saurabh Gupta, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik** + +**Computer Vision and Pattern Recognition (CVPR) 2017.** + 
+**[ArXiv](https://arxiv.org/abs/1702.03920), +[Project Website](https://sites.google.com/corp/view/cognitive-mapping-and-planning/)** + +### Citing +If you find this code base and these models useful in your research, please consider +citing the following paper: + ``` + @inproceedings{gupta2017cognitive, + title={Cognitive Mapping and Planning for Visual Navigation}, + author={Gupta, Saurabh and Davidson, James and Levine, Sergey and + Sukthankar, Rahul and Malik, Jitendra}, + booktitle={CVPR}, + year={2017} + } + ``` + +### Contents +1. [Requirements: software](#requirements-software) +2. [Requirements: data](#requirements-data) +3. [Test Pre-trained Models](#test-pre-trained-models) +4. [Train Your Own Models](#train-your-own-models) + +### Requirements: software +1. Python Virtual Env Setup: All code is implemented in Python but depends on a + small number of Python packages and a couple of C libraries. We recommend + using a virtual environment for installing these Python packages and Python + bindings for these C libraries. + ```Shell + VENV_DIR=venv + pip install virtualenv + virtualenv $VENV_DIR + source $VENV_DIR/bin/activate + + # You may need to upgrade pip for installing opencv-python. + pip install --upgrade pip + # Install simple dependencies. + pip install -r requirements.txt + + # Patch bugs in dependencies. + sh patches/apply_patches.sh + ``` + +2. Install [TensorFlow](https://www.tensorflow.org/) inside this virtual + environment. This is typically done with `pip install --upgrade tensorflow-gpu`. + +3. Swiftshader: We use + [Swiftshader](https://github.com/google/swiftshader.git), a CPU-based + renderer, to render the meshes. To use a different renderer, + replace `SwiftshaderRenderer` in `render/swiftshader_renderer.py` with + bindings to your renderer.
+ ```Shell + mkdir -p deps + git clone --recursive https://github.com/google/swiftshader.git deps/swiftshader-src + cd deps/swiftshader-src && git checkout 91da6b00584afd7dcaed66da88e2b617429b3950 + mkdir build && cd build && cmake .. && make -j 16 libEGL libGLESv2 + cd ../../../ + cp deps/swiftshader-src/build/libEGL* libEGL.so.1 + cp deps/swiftshader-src/build/libGLESv2* libGLESv2.so.2 + ``` + +4. PyAssimp: We use [PyAssimp](https://github.com/assimp/assimp.git) to load + meshes. To use a different library for loading meshes, replace + `Shape` in `render/swiftshader_renderer.py` with bindings to your library for + loading meshes. + ```Shell + mkdir -p deps + git clone https://github.com/assimp/assimp.git deps/assimp-src + cd deps/assimp-src + git checkout 2afeddd5cb63d14bc77b53740b38a54a97d94ee8 + cmake CMakeLists.txt -G 'Unix Makefiles' && make -j 16 + cd port/PyAssimp && python setup.py install + cd ../../../.. + cp deps/assimp-src/lib/libassimp* . + ``` + +5. graph-tool: We use the [graph-tool](https://git.skewed.de/count0/graph-tool) + library for graph processing. + ```Shell + mkdir -p deps + # If the following git clone command fails, you can also download the source + # from https://downloads.skewed.de/graph-tool/graph-tool-2.2.44.tar.bz2 + git clone https://git.skewed.de/count0/graph-tool deps/graph-tool-src + cd deps/graph-tool-src && git checkout 178add3a571feb6666f4f119027705d95d2951ab + bash autogen.sh + ./configure --disable-cairo --disable-sparsehash --prefix=$HOME/.local + make -j 16 + make install + cd ../../ + ``` + +### Requirements: data +1. Download the Stanford 3D Indoor Spaces Dataset (S3DIS Dataset) and ImageNet + pre-trained models for initializing different models. Follow the instructions in + `data/README.md`. + +### Test Pre-trained Models +1. Download pre-trained models using + `scripts/scripts_download_pretrained_models.sh`. + +2. Test models using `scripts/script_test_pretrained_models.sh`.
+ +### Train Your Own Models +All models were trained asynchronously with 16 workers, each using data +from a single floor. The default hyper-parameters correspond to this setting. +See [distributed training with +TensorFlow](https://www.tensorflow.org/deploy/distributed) for setting up +distributed training. Training with a single worker is possible with the current +code base but will require some minor changes to allow each worker to load all +training environments. + +### Contact +For questions or issues, open an issue on the tensorflow/models [issues +tracker](https://github.com/tensorflow/models/issues). Please assign issues to +@s-gupta. + +### Credits +This code was written by Saurabh Gupta (@s-gupta). diff --git a/cognitive_mapping_and_planning/__init__.py b/cognitive_mapping_and_planning/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cognitive_mapping_and_planning/cfgs/__init__.py b/cognitive_mapping_and_planning/cfgs/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cognitive_mapping_and_planning/cfgs/config_cmp.py b/cognitive_mapping_and_planning/cfgs/config_cmp.py new file mode 100644 index 0000000000000000000000000000000000000000..715eee2b973cb66f816ecdb65bbcc3abdd8a9483 --- /dev/null +++ b/cognitive_mapping_and_planning/cfgs/config_cmp.py @@ -0,0 +1,283 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +import os, sys +import numpy as np +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +import logging +import src.utils as utils +import cfgs.config_common as cc + + +import tensorflow as tf + + +rgb_resnet_v2_50_path = 'data/init_models/resnet_v2_50/model.ckpt-5136169' +d_resnet_v2_50_path = 'data/init_models/distill_rgb_to_d_resnet_v2_50/model.ckpt-120002' + +def get_default_args(): + summary_args = utils.Foo(display_interval=1, test_iters=26, + arop_full_summary_iters=14) + + control_args = utils.Foo(train=False, test=False, + force_batchnorm_is_training_at_test=False, + reset_rng_seed=False, only_eval_when_done=False, + test_mode=None) + return summary_args, control_args + +def get_default_cmp_args(): + batch_norm_param = {'center': True, 'scale': True, + 'activation_fn':tf.nn.relu} + + mapper_arch_args = utils.Foo( + dim_reduce_neurons=64, + fc_neurons=[1024, 1024], + fc_out_size=8, + fc_out_neurons=64, + encoder='resnet_v2_50', + deconv_neurons=[64, 32, 16, 8, 4, 2], + deconv_strides=[2, 2, 2, 2, 2, 2], + deconv_layers_per_block=2, + deconv_kernel_size=4, + fc_dropout=0.5, + combine_type='wt_avg_logits', + batch_norm_param=batch_norm_param) + + readout_maps_arch_args = utils.Foo( + num_neurons=[], + strides=[], + kernel_size=None, + layers_per_block=None) + + arch_args = utils.Foo( + vin_val_neurons=8, vin_action_neurons=8, vin_ks=3, vin_share_wts=False, + pred_neurons=[64, 64], pred_batch_norm_param=batch_norm_param, + conv_on_value_map=0, fr_neurons=16, fr_ver='v2', fr_inside_neurons=64, + fr_stride=1, crop_remove_each=30, value_crop_size=4, + action_sample_type='sample', action_sample_combine_type='one_or_other', + sample_gt_prob_type='inverse_sigmoid_decay', dagger_sample_bn_false=True, + vin_num_iters=36, isd_k=750., 
use_agent_loc=False, multi_scale=True, + readout_maps=False, rom_arch=readout_maps_arch_args) + + return arch_args, mapper_arch_args + +def get_arch_vars(arch_str): + if arch_str == '': vals = [] + else: vals = arch_str.split('_') + ks = ['var1', 'var2', 'var3'] + ks = ks[:len(vals)] + + # Exp Ver. + if len(vals) == 0: ks.append('var1'); vals.append('v0') + # custom arch. + if len(vals) == 1: ks.append('var2'); vals.append('') + # map scape for projection baseline. + if len(vals) == 2: ks.append('var3'); vals.append('fr2') + + assert(len(vals) == 3) + + vars = utils.Foo() + for k, v in zip(ks, vals): + setattr(vars, k, v) + + logging.error('arch_vars: %s', vars) + return vars + +def process_arch_str(args, arch_str): + # This function modifies args. + args.arch, args.mapper_arch = get_default_cmp_args() + + arch_vars = get_arch_vars(arch_str) + + args.navtask.task_params.outputs.ego_maps = True + args.navtask.task_params.outputs.ego_goal_imgs = True + args.navtask.task_params.outputs.egomotion = True + args.navtask.task_params.toy_problem = False + + if arch_vars.var1 == 'lmap': + args = process_arch_learned_map(args, arch_vars) + + elif arch_vars.var1 == 'pmap': + args = process_arch_projected_map(args, arch_vars) + + else: + logging.fatal('arch_vars.var1 should be lmap or pmap, but is %s', arch_vars.var1) + assert(False) + + return args + +def process_arch_learned_map(args, arch_vars): + # Multiscale vision based system. 
+ args.navtask.task_params.input_type = 'vision' + args.navtask.task_params.outputs.images = True + + if args.navtask.camera_param.modalities[0] == 'rgb': + args.solver.pretrained_path = rgb_resnet_v2_50_path + elif args.navtask.camera_param.modalities[0] == 'depth': + args.solver.pretrained_path = d_resnet_v2_50_path + + if arch_vars.var2 == 'Ssc': + sc = 1./args.navtask.task_params.step_size + args.arch.vin_num_iters = 40 + args.navtask.task_params.map_scales = [sc] + max_dist = args.navtask.task_params.max_dist * \ + args.navtask.task_params.num_goals + args.navtask.task_params.map_crop_sizes = [2*max_dist] + + args.arch.fr_stride = 1 + args.arch.vin_action_neurons = 8 + args.arch.vin_val_neurons = 3 + args.arch.fr_inside_neurons = 32 + + args.mapper_arch.pad_map_with_zeros_each = [24] + args.mapper_arch.deconv_neurons = [64, 32, 16] + args.mapper_arch.deconv_strides = [1, 2, 1] + + elif (arch_vars.var2 == 'Msc' or arch_vars.var2 == 'MscROMms' or + arch_vars.var2 == 'MscROMss' or arch_vars.var2 == 'MscNoVin'): + # Code for multi-scale planner. + args.arch.vin_num_iters = 8 + args.arch.crop_remove_each = 4 + args.arch.value_crop_size = 8 + + sc = 1./args.navtask.task_params.step_size + max_dist = args.navtask.task_params.max_dist * \ + args.navtask.task_params.num_goals + n_scales = np.log2(float(max_dist) / float(args.arch.vin_num_iters)) + n_scales = int(np.ceil(n_scales)+1) + + args.navtask.task_params.map_scales = \ + list(sc*(0.5**(np.arange(n_scales))[::-1])) + args.navtask.task_params.map_crop_sizes = [16 for x in range(n_scales)] + + args.arch.fr_stride = 1 + args.arch.vin_action_neurons = 8 + args.arch.vin_val_neurons = 3 + args.arch.fr_inside_neurons = 32 + + args.mapper_arch.pad_map_with_zeros_each = [0 for _ in range(n_scales)] + args.mapper_arch.deconv_neurons = [64*n_scales, 32*n_scales, 16*n_scales] + args.mapper_arch.deconv_strides = [1, 2, 1] + + if arch_vars.var2 == 'MscNoVin': + # No planning version. 
+ args.arch.fr_stride = [1, 2, 1, 2] + args.arch.vin_action_neurons = None + args.arch.vin_val_neurons = 16 + args.arch.fr_inside_neurons = 32 + + args.arch.crop_remove_each = 0 + args.arch.value_crop_size = 4 + args.arch.vin_num_iters = 0 + + elif arch_vars.var2 == 'MscROMms' or arch_vars.var2 == 'MscROMss': + # Code with read outs, MscROMms flattens and reads out, + # MscROMss does not flatten and produces output at multiple scales. + args.navtask.task_params.outputs.readout_maps = True + args.navtask.task_params.map_resize_method = 'antialiasing' + args.arch.readout_maps = True + + if arch_vars.var2 == 'MscROMms': + args.arch.rom_arch.num_neurons = [64, 1] + args.arch.rom_arch.kernel_size = 4 + args.arch.rom_arch.strides = [2,2] + args.arch.rom_arch.layers_per_block = 2 + + args.navtask.task_params.readout_maps_crop_sizes = [64] + args.navtask.task_params.readout_maps_scales = [sc] + + elif arch_vars.var2 == 'MscROMss': + args.arch.rom_arch.num_neurons = \ + [64, len(args.navtask.task_params.map_scales)] + args.arch.rom_arch.kernel_size = 4 + args.arch.rom_arch.strides = [1,1] + args.arch.rom_arch.layers_per_block = 1 + + args.navtask.task_params.readout_maps_crop_sizes = \ + args.navtask.task_params.map_crop_sizes + args.navtask.task_params.readout_maps_scales = \ + args.navtask.task_params.map_scales + + else: + logging.fatal('arch_vars.var2 not one of Msc, MscROMms, MscROMss, MscNoVin.') + assert(False) + + map_channels = args.mapper_arch.deconv_neurons[-1] / \ + (2*len(args.navtask.task_params.map_scales)) + args.navtask.task_params.map_channels = map_channels + + return args + +def process_arch_projected_map(args, arch_vars): + # Single scale vision based system which does not use a mapper but instead + # uses an analytically estimated map. 
+ ds = int(arch_vars.var3[2]) + args.navtask.task_params.input_type = 'analytical_counts' + args.navtask.task_params.outputs.analytical_counts = True + + assert(args.navtask.task_params.modalities[0] == 'depth') + args.navtask.camera_param.img_channels = None + + analytical_counts = utils.Foo(map_sizes=[512/ds], + xy_resolution=[5.*ds], + z_bins=[[-10, 10, 150, 200]], + non_linearity=[arch_vars.var2]) + args.navtask.task_params.analytical_counts = analytical_counts + + sc = 1./ds + args.arch.vin_num_iters = 36 + args.navtask.task_params.map_scales = [sc] + args.navtask.task_params.map_crop_sizes = [512/ds] + + args.arch.fr_stride = [1,2] + args.arch.vin_action_neurons = 8 + args.arch.vin_val_neurons = 3 + args.arch.fr_inside_neurons = 32 + + map_channels = len(analytical_counts.z_bins[0]) + 1 + args.navtask.task_params.map_channels = map_channels + args.solver.freeze_conv = False + + return args + +def get_args_for_config(config_name): + args = utils.Foo() + + args.summary, args.control = get_default_args() + + exp_name, mode_str = config_name.split('+') + arch_str, solver_str, navtask_str = exp_name.split('.') + logging.error('config_name: %s', config_name) + logging.error('arch_str: %s', arch_str) + logging.error('navtask_str: %s', navtask_str) + logging.error('solver_str: %s', solver_str) + logging.error('mode_str: %s', mode_str) + + args.solver = cc.process_solver_str(solver_str) + args.navtask = cc.process_navtask_str(navtask_str) + + args = process_arch_str(args, arch_str) + args.arch.isd_k = args.solver.isd_k + + # Train, test, etc. 
+ mode, imset = mode_str.split('_') + args = cc.adjust_args_for_mode(args, mode) + args.navtask.building_names = args.navtask.dataset.get_split(imset) + args.control.test_name = '{:s}_on_{:s}'.format(mode, imset) + + # Log the arguments + logging.error('%s', args) + return args diff --git a/cognitive_mapping_and_planning/cfgs/config_common.py b/cognitive_mapping_and_planning/cfgs/config_common.py new file mode 100644 index 0000000000000000000000000000000000000000..440bf5b72f87a1eeca38e22f33b22e82de7345c0 --- /dev/null +++ b/cognitive_mapping_and_planning/cfgs/config_common.py @@ -0,0 +1,261 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +import os +import numpy as np +import logging +import src.utils as utils +import datasets.nav_env_config as nec +from datasets import factory + +def adjust_args_for_mode(args, mode): + if mode == 'train': + args.control.train = True + + elif mode == 'val1': + # Same settings as for training, to make sure nothing wonky is happening + # there. + args.control.test = True + args.control.test_mode = 'val' + args.navtask.task_params.batch_size = 32 + + elif mode == 'val2': + # No data augmentation, not sampling but taking the argmax action, not + # sampling from the ground truth at all. 
+ args.control.test = True + args.arch.action_sample_type = 'argmax' + args.arch.sample_gt_prob_type = 'zero' + args.navtask.task_params.data_augment = \ + utils.Foo(lr_flip=0, delta_angle=0, delta_xy=0, relight=False, + relight_fast=False, structured=False) + args.control.test_mode = 'val' + args.navtask.task_params.batch_size = 32 + + elif mode == 'bench': + # Actually testing the agent in settings that are kept the same between + # different runs. + args.navtask.task_params.batch_size = 16 + args.control.test = True + args.arch.action_sample_type = 'argmax' + args.arch.sample_gt_prob_type = 'zero' + args.navtask.task_params.data_augment = \ + utils.Foo(lr_flip=0, delta_angle=0, delta_xy=0, relight=False, + relight_fast=False, structured=False) + args.summary.test_iters = 250 + args.control.only_eval_when_done = True + args.control.reset_rng_seed = True + args.control.test_mode = 'test' + else: + logging.fatal('Unknown mode: %s.', mode) + assert(False) + return args + +def get_solver_vars(solver_str): + if solver_str == '': vals = []; + else: vals = solver_str.split('_') + ks = ['clip', 'dlw', 'long', 'typ', 'rlw', 'isdk', 'adam_eps', 'init_lr']; + ks = ks[:len(vals)] + + # Gradient clipping or not. + if len(vals) == 0: ks.append('clip'); vals.append('noclip'); + # data loss weight. + if len(vals) == 1: ks.append('dlw'); vals.append('dlw20') + # how long to train for. + if len(vals) == 2: ks.append('long'); vals.append('nolong') + # Adam + if len(vals) == 3: ks.append('typ'); vals.append('adam2') + # reg loss wt + if len(vals) == 4: ks.append('rlw'); vals.append('rlw1') + # isd_k + if len(vals) == 5: ks.append('isdk'); vals.append('isdk415') # 415, inflexion at 2.5k.
+ # adam eps + if len(vals) == 6: ks.append('adam_eps'); vals.append('aeps1en8') + # init lr + if len(vals) == 7: ks.append('init_lr'); vals.append('lr1en3') + + assert(len(vals) == 8) + + vars = utils.Foo() + for k, v in zip(ks, vals): + setattr(vars, k, v) + logging.error('solver_vars: %s', vars) + return vars + +def process_solver_str(solver_str): + solver = utils.Foo( + seed=0, learning_rate_decay=None, clip_gradient_norm=None, max_steps=None, + initial_learning_rate=None, momentum=None, steps_per_decay=None, + logdir=None, sync=False, adjust_lr_sync=True, wt_decay=0.0001, + data_loss_wt=None, reg_loss_wt=None, freeze_conv=True, num_workers=1, + task=0, ps_tasks=0, master='local', typ=None, momentum2=None, + adam_eps=None) + + # Clobber with overrides from solver str. + solver_vars = get_solver_vars(solver_str) + + solver.data_loss_wt = float(solver_vars.dlw[3:].replace('x', '.')) + solver.adam_eps = float(solver_vars.adam_eps[4:].replace('x', '.').replace('n', '-')) + solver.initial_learning_rate = float(solver_vars.init_lr[2:].replace('x', '.').replace('n', '-')) + solver.reg_loss_wt = float(solver_vars.rlw[3:].replace('x', '.')) + solver.isd_k = float(solver_vars.isdk[4:].replace('x', '.')) + + long = solver_vars.long + if long == 'long': + solver.steps_per_decay = 40000 + solver.max_steps = 120000 + elif long == 'long2': + solver.steps_per_decay = 80000 + solver.max_steps = 120000 + elif long == 'nolong' or long == 'nol': + solver.steps_per_decay = 20000 + solver.max_steps = 60000 + else: + logging.fatal('solver_vars.long should be long, long2, nolong or nol.') + assert(False) + + clip = solver_vars.clip + if clip == 'noclip' or clip == 'nocl': + solver.clip_gradient_norm = 0 + elif clip[:4] == 'clip': + solver.clip_gradient_norm = float(clip[4:].replace('x', '.')) + else: + logging.fatal('Unknown solver_vars.clip: %s', clip) + assert(False) + + typ = solver_vars.typ + if typ == 'adam': + solver.typ = 'adam' + solver.momentum = 0.9 + solver.momentum2 = 
0.999 + solver.learning_rate_decay = 1.0 + elif typ == 'adam2': + solver.typ = 'adam' + solver.momentum = 0.9 + solver.momentum2 = 0.999 + solver.learning_rate_decay = 0.1 + elif typ == 'sgd': + solver.typ = 'sgd' + solver.momentum = 0.99 + solver.momentum2 = None + solver.learning_rate_decay = 0.1 + else: + logging.fatal('Unknown solver_vars.typ: %s', typ) + assert(False) + + logging.error('solver: %s', solver) + return solver + +def get_navtask_vars(navtask_str): + if navtask_str == '': vals = [] + else: vals = navtask_str.split('_') + + ks_all = ['dataset_name', 'modality', 'task', 'history', 'max_dist', + 'num_steps', 'step_size', 'n_ori', 'aux_views', 'data_aug'] + ks = ks_all[:len(vals)] + + # All data or not. + if len(vals) == 0: ks.append('dataset_name'); vals.append('sbpd') + # modality + if len(vals) == 1: ks.append('modality'); vals.append('rgb') + # semantic task? + if len(vals) == 2: ks.append('task'); vals.append('r2r') + # number of history frames. + if len(vals) == 3: ks.append('history'); vals.append('h0') + # max steps + if len(vals) == 4: ks.append('max_dist'); vals.append('32') + # num steps + if len(vals) == 5: ks.append('num_steps'); vals.append('40') + # step size + if len(vals) == 6: ks.append('step_size'); vals.append('8') + # n_ori + if len(vals) == 7: ks.append('n_ori'); vals.append('4') + # Auxiliary views. + if len(vals) == 8: ks.append('aux_views'); vals.append('nv0') + # Normal data augmentation as opposed to structured data augmentation (if set + # to straug. + if len(vals) == 9: ks.append('data_aug'); vals.append('straug') + + assert(len(vals) == 10) + for i in range(len(ks)): + assert(ks[i] == ks_all[i]) + + vars = utils.Foo() + for k, v in zip(ks, vals): + setattr(vars, k, v) + logging.error('navtask_vars: %s', vals) + return vars + +def process_navtask_str(navtask_str): + navtask = nec.nav_env_base_config() + + # Clobber with overrides from strings. 
+ navtask_vars = get_navtask_vars(navtask_str) + + navtask.task_params.n_ori = int(navtask_vars.n_ori) + navtask.task_params.max_dist = int(navtask_vars.max_dist) + navtask.task_params.num_steps = int(navtask_vars.num_steps) + navtask.task_params.step_size = int(navtask_vars.step_size) + navtask.task_params.data_augment.delta_xy = int(navtask_vars.step_size)/2. + n_aux_views_each = int(navtask_vars.aux_views[2]) + aux_delta_thetas = np.concatenate((np.arange(n_aux_views_each) + 1, + -1 -np.arange(n_aux_views_each))) + aux_delta_thetas = aux_delta_thetas*np.deg2rad(navtask.camera_param.fov) + navtask.task_params.aux_delta_thetas = aux_delta_thetas + + if navtask_vars.data_aug == 'aug': + navtask.task_params.data_augment.structured = False + elif navtask_vars.data_aug == 'straug': + navtask.task_params.data_augment.structured = True + else: + logging.fatal('Unknown navtask_vars.data_aug %s.', navtask_vars.data_aug) + assert(False) + + navtask.task_params.num_history_frames = int(navtask_vars.history[1:]) + navtask.task_params.n_views = 1+navtask.task_params.num_history_frames + + navtask.task_params.goal_channels = int(navtask_vars.n_ori) + + if navtask_vars.task == 'hard': + navtask.task_params.type = 'rng_rejection_sampling_many' + navtask.task_params.rejection_sampling_M = 2000 + navtask.task_params.min_dist = 10 + elif navtask_vars.task == 'r2r': + navtask.task_params.type = 'room_to_room_many' + elif navtask_vars.task == 'ST': + # Semantic task at hand. 
+ navtask.task_params.goal_channels = \ + len(navtask.task_params.semantic_task.class_map_names) + navtask.task_params.rel_goal_loc_dim = \ + len(navtask.task_params.semantic_task.class_map_names) + navtask.task_params.type = 'to_nearest_obj_acc' + else: + logging.fatal('navtask_vars.task: should be hard or r2r, ST') + assert(False) + + if navtask_vars.modality == 'rgb': + navtask.camera_param.modalities = ['rgb'] + navtask.camera_param.img_channels = 3 + elif navtask_vars.modality == 'd': + navtask.camera_param.modalities = ['depth'] + navtask.camera_param.img_channels = 2 + + navtask.task_params.img_height = navtask.camera_param.height + navtask.task_params.img_width = navtask.camera_param.width + navtask.task_params.modalities = navtask.camera_param.modalities + navtask.task_params.img_channels = navtask.camera_param.img_channels + navtask.task_params.img_fov = navtask.camera_param.fov + + navtask.dataset = factory.get_dataset(navtask_vars.dataset_name) + return navtask diff --git a/cognitive_mapping_and_planning/cfgs/config_distill.py b/cognitive_mapping_and_planning/cfgs/config_distill.py new file mode 100644 index 0000000000000000000000000000000000000000..a6f7985f8f003bc48800153239817d6ecbd53662 --- /dev/null +++ b/cognitive_mapping_and_planning/cfgs/config_distill.py @@ -0,0 +1,114 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +import pprint +import copy +import os +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +import logging +import src.utils as utils +import cfgs.config_common as cc + + +import tensorflow as tf + +rgb_resnet_v2_50_path = 'cache/resnet_v2_50_inception_preprocessed/model.ckpt-5136169' + +def get_default_args(): + robot = utils.Foo(radius=15, base=10, height=140, sensor_height=120, + camera_elevation_degree=-15) + + camera_param = utils.Foo(width=225, height=225, z_near=0.05, z_far=20.0, + fov=60., modalities=['rgb', 'depth']) + + env = utils.Foo(padding=10, resolution=5, num_point_threshold=2, + valid_min=-10, valid_max=200, n_samples_per_face=200) + + data_augment = utils.Foo(lr_flip=0, delta_angle=1, delta_xy=4, relight=False, + relight_fast=False, structured=False) + + task_params = utils.Foo(num_actions=4, step_size=4, num_steps=0, + batch_size=32, room_seed=0, base_class='Building', + task='mapping', n_ori=6, data_augment=data_augment, + output_transform_to_global_map=False, + output_canonical_map=False, + output_incremental_transform=False, + output_free_space=False, move_type='shortest_path', + toy_problem=0) + + buildinger_args = utils.Foo(building_names=['area1_gates_wingA_floor1_westpart'], + env_class=None, robot=robot, + task_params=task_params, env=env, + camera_param=camera_param) + + solver_args = utils.Foo(seed=0, learning_rate_decay=0.1, + clip_gradient_norm=0, max_steps=120000, + initial_learning_rate=0.001, momentum=0.99, + steps_per_decay=40000, logdir=None, sync=False, + adjust_lr_sync=True, wt_decay=0.0001, + data_loss_wt=1.0, reg_loss_wt=1.0, + num_workers=1, task=0, ps_tasks=0, master='local') + + summary_args = utils.Foo(display_interval=1, test_iters=100) + + control_args = utils.Foo(train=False, test=False, + force_batchnorm_is_training_at_test=False) + + arch_args = utils.Foo(rgb_encoder='resnet_v2_50', 
d_encoder='resnet_v2_50') + + return utils.Foo(solver=solver_args, + summary=summary_args, control=control_args, arch=arch_args, + buildinger=buildinger_args) + +def get_vars(config_name): + vars = config_name.split('_') + if len(vars) == 1: # All data or not. + vars.append('noall') + if len(vars) == 2: # n_ori + vars.append('4') + logging.error('vars: %s', vars) + return vars + +def get_args_for_config(config_name): + args = get_default_args() + config_name, mode = config_name.split('+') + vars = get_vars(config_name) + + logging.info('config_name: %s, mode: %s', config_name, mode) + + args.buildinger.task_params.n_ori = int(vars[2]) + args.solver.freeze_conv = True + args.solver.pretrained_path = rgb_resnet_v2_50_path + args.buildinger.task_params.img_channels = 5 + args.solver.data_loss_wt = 0.00001 + + if vars[0] == 'v0': + pass + else: + logging.error('config_name: %s undefined', config_name) + + args.buildinger.task_params.height = args.buildinger.camera_param.height + args.buildinger.task_params.width = args.buildinger.camera_param.width + args.buildinger.task_params.modalities = args.buildinger.camera_param.modalities + + if vars[1] == 'all': + args = cc.get_args_for_mode_building_all(args, mode) + elif vars[1] == 'noall': + args = cc.get_args_for_mode_building(args, mode) + + # Log the arguments + logging.error('%s', args) + return args diff --git a/cognitive_mapping_and_planning/cfgs/config_vision_baseline.py b/cognitive_mapping_and_planning/cfgs/config_vision_baseline.py new file mode 100644 index 0000000000000000000000000000000000000000..3cc64fe594ab025fbcfb41543302fa42c7fc0074 --- /dev/null +++ b/cognitive_mapping_and_planning/cfgs/config_vision_baseline.py @@ -0,0 +1,173 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +import pprint +import os +import numpy as np +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +import logging +import src.utils as utils +import cfgs.config_common as cc +import datasets.nav_env_config as nec + + +import tensorflow as tf + +FLAGS = flags.FLAGS + +get_solver_vars = cc.get_solver_vars +get_navtask_vars = cc.get_navtask_vars + + +rgb_resnet_v2_50_path = 'data/init_models/resnet_v2_50/model.ckpt-5136169' +d_resnet_v2_50_path = 'data/init_models/distill_rgb_to_d_resnet_v2_50/model.ckpt-120002' + +def get_default_args(): + summary_args = utils.Foo(display_interval=1, test_iters=26, + arop_full_summary_iters=14) + + control_args = utils.Foo(train=False, test=False, + force_batchnorm_is_training_at_test=False, + reset_rng_seed=False, only_eval_when_done=False, + test_mode=None) + return summary_args, control_args + +def get_default_baseline_args(): + batch_norm_param = {'center': True, 'scale': True, + 'activation_fn':tf.nn.relu} + arch_args = utils.Foo( + pred_neurons=[], goal_embed_neurons=[], img_embed_neurons=[], + batch_norm_param=batch_norm_param, dim_reduce_neurons=64, combine_type='', + encoder='resnet_v2_50', action_sample_type='sample', + action_sample_combine_type='one_or_other', + sample_gt_prob_type='inverse_sigmoid_decay', dagger_sample_bn_false=True, + isd_k=750., use_visit_count=False, lstm_output=False, lstm_ego=False, + lstm_img=False, fc_dropout=0.0, embed_goal_for_state=False, + 
lstm_output_init_state_from_goal=False) + return arch_args + +def get_arch_vars(arch_str): + if arch_str == '': vals = [] + else: vals = arch_str.split('_') + + ks = ['ver', 'lstm_dim', 'dropout'] + + # Exp Ver + if len(vals) == 0: vals.append('v0') + # LSTM dimensions + if len(vals) == 1: vals.append('lstm2048') + # Dropout + if len(vals) == 2: vals.append('noDO') + + assert(len(vals) == 3) + + vars = utils.Foo() + for k, v in zip(ks, vals): + setattr(vars, k, v) + + logging.error('arch_vars: %s', vars) + return vars + +def process_arch_str(args, arch_str): + # This function modifies args. + args.arch = get_default_baseline_args() + arch_vars = get_arch_vars(arch_str) + + args.navtask.task_params.outputs.rel_goal_loc = True + args.navtask.task_params.input_type = 'vision' + args.navtask.task_params.outputs.images = True + + if args.navtask.camera_param.modalities[0] == 'rgb': + args.solver.pretrained_path = rgb_resnet_v2_50_path + elif args.navtask.camera_param.modalities[0] == 'depth': + args.solver.pretrained_path = d_resnet_v2_50_path + else: + logging.fatal('Neither of rgb or d') + + if arch_vars.dropout == 'DO': + args.arch.fc_dropout = 0.5 + + args.tfcode = 'B' + + exp_ver = arch_vars.ver + if exp_ver == 'v0': + # Multiplicative interaction between goal loc and image features. + args.arch.combine_type = 'multiply' + args.arch.pred_neurons = [256, 256] + args.arch.goal_embed_neurons = [64, 8] + args.arch.img_embed_neurons = [1024, 512, 256*8] + + elif exp_ver == 'v1': + # Additive interaction between goal and image features. + args.arch.combine_type = 'add' + args.arch.pred_neurons = [256, 256] + args.arch.goal_embed_neurons = [64, 256] + args.arch.img_embed_neurons = [1024, 512, 256] + + elif exp_ver == 'v2': + # LSTM at the output on top of multiple interactions. 
+ args.arch.combine_type = 'multiply' + args.arch.goal_embed_neurons = [64, 8] + args.arch.img_embed_neurons = [1024, 512, 256*8] + args.arch.lstm_output = True + args.arch.lstm_output_dim = int(arch_vars.lstm_dim[4:]) + args.arch.pred_neurons = [256] # The other is inside the LSTM. + + elif exp_ver == 'v0blind': + # LSTM only on the goal location. + args.arch.combine_type = 'goalonly' + args.arch.goal_embed_neurons = [64, 256] + args.arch.img_embed_neurons = [2] # I dont know what it will do otherwise. + args.arch.lstm_output = True + args.arch.lstm_output_dim = 256 + args.arch.pred_neurons = [256] # The other is inside the LSTM. + + else: + logging.fatal('exp_ver: %s undefined', exp_ver) + assert(False) + + # Log the arguments + logging.error('%s', args) + return args + +def get_args_for_config(config_name): + args = utils.Foo() + + args.summary, args.control = get_default_args() + + exp_name, mode_str = config_name.split('+') + arch_str, solver_str, navtask_str = exp_name.split('.') + logging.error('config_name: %s', config_name) + logging.error('arch_str: %s', arch_str) + logging.error('navtask_str: %s', navtask_str) + logging.error('solver_str: %s', solver_str) + logging.error('mode_str: %s', mode_str) + + args.solver = cc.process_solver_str(solver_str) + args.navtask = cc.process_navtask_str(navtask_str) + + args = process_arch_str(args, arch_str) + args.arch.isd_k = args.solver.isd_k + + # Train, test, etc. 
+ mode, imset = mode_str.split('_') + args = cc.adjust_args_for_mode(args, mode) + args.navtask.building_names = args.navtask.dataset.get_split(imset) + args.control.test_name = '{:s}_on_{:s}'.format(mode, imset) + + # Log the arguments + logging.error('%s', args) + return args diff --git a/cognitive_mapping_and_planning/data/.gitignore b/cognitive_mapping_and_planning/data/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..2b6d5e46652d14a9c0a8025dbcccfc2dd4376e4a --- /dev/null +++ b/cognitive_mapping_and_planning/data/.gitignore @@ -0,0 +1,3 @@ +stanford_building_parser_dataset_raw +stanford_building_parser_dataset +init_models diff --git a/cognitive_mapping_and_planning/data/README.md b/cognitive_mapping_and_planning/data/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a8928345351dac19c0e12fd33f99dd2aa600e23b --- /dev/null +++ b/cognitive_mapping_and_planning/data/README.md @@ -0,0 +1,33 @@ +This directory contains the data needed for training and benchmarking various +navigation models. + +1. Download the data from the [dataset website] + (http://buildingparser.stanford.edu/dataset.html). + 1. [Raw meshes](https://goo.gl/forms/2YSPaO2UKmn5Td5m2). We need the meshes + which are in the noXYZ folder. Download the tar files and place them in + the `stanford_building_parser_dataset_raw` folder. You need to download + `area_1_noXYZ.tar`, `area_3_noXYZ.tar`, `area_5a_noXYZ.tar`, + `area_5b_noXYZ.tar`, `area_6_noXYZ.tar` for training and + `area_4_noXYZ.tar` for evaluation. + 2. [Annotations](https://goo.gl/forms/4SoGp4KtH1jfRqEj2) for setting up + tasks. We will need the file called `Stanford3dDataset_v1.2.zip`. Place + the file in the directory `stanford_building_parser_dataset_raw`. + +2. Preprocess the data. + 1. Extract meshes using `scripts/script_preprocess_meshes_S3DIS.sh`. 
After + this `ls data/stanford_building_parser_dataset/mesh` should have 6 + folders `area1`, `area3`, `area4`, `area5a`, `area5b`, `area6`, with + textures and obj files within each directory. + 2. Extract room information and semantics from the zip file using + `scripts/script_preprocess_annoations_S3DIS.sh`. After this there should + be `room-dimension` and `class-maps` folders in + `data/stanford_building_parser_dataset`. (If this script + crashes because of an exception in np.loadtxt while processing + `Area_5/office_19/Annotations/ceiling_1.txt`, there is a special + character on line 323474 that should be removed manually.) + +3. Download ImageNet pre-trained models. We used ResNet-v2-50 for representing + images. For RGB images this is pre-trained on ImageNet. For depth images we + [distill](https://arxiv.org/abs/1507.00448) the RGB model to depth images + using paired RGB-D images. Both of these models are available through + `scripts/script_download_init_models.sh` diff --git a/cognitive_mapping_and_planning/datasets/__init__.py b/cognitive_mapping_and_planning/datasets/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cognitive_mapping_and_planning/datasets/factory.py b/cognitive_mapping_and_planning/datasets/factory.py new file mode 100644 index 0000000000000000000000000000000000000000..3f7b5c0a602dbacf9619dc1c2ec98e94200428b6 --- /dev/null +++ b/cognitive_mapping_and_planning/datasets/factory.py @@ -0,0 +1,113 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r"""Wrapper for selecting the navigation environment that we want to train and +test on. +""" +import numpy as np +import os, glob +import platform + +import logging +from tensorflow.python.platform import app +from tensorflow.python.platform import flags + +import render.swiftshader_renderer as renderer +import src.file_utils as fu +import src.utils as utils + +def get_dataset(dataset_name): + if dataset_name == 'sbpd': + dataset = StanfordBuildingParserDataset(dataset_name) + else: + logging.fatal('Not one of sbpd') + return dataset + +class Loader(): + def get_data_dir(self): + pass + + def get_meta_data(self, file_name, data_dir=None): + if data_dir is None: + data_dir = self.get_data_dir() + full_file_name = os.path.join(data_dir, 'meta', file_name) + assert(fu.exists(full_file_name)), \ + '{:s} does not exist'.format(full_file_name) + ext = os.path.splitext(full_file_name)[1] + if ext == '.txt': + ls = [] + with fu.fopen(full_file_name, 'r') as f: + for l in f: + ls.append(l.rstrip()) + elif ext == '.pkl': + ls = utils.load_variables(full_file_name) + return ls + + def load_building(self, name, data_dir=None): + if data_dir is None: + data_dir = self.get_data_dir() + out = {} + out['name'] = name + out['data_dir'] = data_dir + out['room_dimension_file'] = os.path.join(data_dir, 'room-dimension', + name+'.pkl') + out['class_map_folder'] = os.path.join(data_dir, 'class-maps') + return out + + def load_building_meshes(self, building): + dir_name = 
os.path.join(building['data_dir'], 'mesh', building['name']) + mesh_file_name = glob.glob1(dir_name, '*.obj')[0] + mesh_file_name_full = os.path.join(dir_name, mesh_file_name) + logging.error('Loading building from obj file: %s', mesh_file_name_full) + shape = renderer.Shape(mesh_file_name_full, load_materials=True, + name_prefix=building['name']+'_') + return [shape] + +class StanfordBuildingParserDataset(Loader): + def __init__(self, ver): + self.ver = ver + self.data_dir = None + + def get_data_dir(self): + if self.data_dir is None: + self.data_dir = 'data/stanford_building_parser_dataset/' + return self.data_dir + + def get_benchmark_sets(self): + return self._get_benchmark_sets() + + def get_split(self, split_name): + if self.ver == 'sbpd': + return self._get_split(split_name) + else: + logging.fatal('Unknown version.') + + def _get_benchmark_sets(self): + sets = ['train1', 'val', 'test'] + return sets + + def _get_split(self, split_name): + train = ['area1', 'area5a', 'area5b', 'area6'] + train1 = ['area1'] + val = ['area3'] + test = ['area4'] + + sets = {} + sets['train'] = train + sets['train1'] = train1 + sets['val'] = val + sets['test'] = test + sets['all'] = sorted(list(set(train + val + test))) + return sets[split_name] diff --git a/cognitive_mapping_and_planning/datasets/nav_env.py b/cognitive_mapping_and_planning/datasets/nav_env.py new file mode 100644 index 0000000000000000000000000000000000000000..5710e26dcb113121d99400cb060104224dd91749 --- /dev/null +++ b/cognitive_mapping_and_planning/datasets/nav_env.py @@ -0,0 +1,1465 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r"""Navigation Environment. Includes the following classes along with some +helper functions. + Building: Loads buildings, computes traversibility, exposes functionality for + rendering images. + + GridWorld: Base class which implements functionality for moving an agent on a + grid world. + + NavigationEnv: Base class which generates navigation problems on a grid world. + + VisualNavigationEnv: Builds upon NavigationEnv and Building to provide + interface that is used externally to train the agent. + + MeshMapper: Class used for distilling the model, testing the mapper. + + BuildingMultiplexer: Wrapper class that instantiates a VisualNavigationEnv for + each building and multiplexes between them as needed. 
+""" + +import numpy as np +import os +import re +import matplotlib.pyplot as plt + +import graph_tool as gt +import graph_tool.topology + +from tensorflow.python.platform import gfile +import logging +import src.file_utils as fu +import src.utils as utils +import src.graph_utils as gu +import src.map_utils as mu +import src.depth_utils as du +import render.swiftshader_renderer as sru +from render.swiftshader_renderer import SwiftshaderRenderer +import cv2 + +label_nodes_with_class = gu.label_nodes_with_class +label_nodes_with_class_geodesic = gu.label_nodes_with_class_geodesic +get_distance_node_list = gu.get_distance_node_list +convert_to_graph_tool = gu.convert_to_graph_tool +generate_graph = gu.generate_graph +get_hardness_distribution = gu.get_hardness_distribution +rng_next_goal_rejection_sampling = gu.rng_next_goal_rejection_sampling +rng_next_goal = gu.rng_next_goal +rng_room_to_room = gu.rng_room_to_room +rng_target_dist_field = gu.rng_target_dist_field + +compute_traversibility = mu.compute_traversibility +make_map = mu.make_map +resize_maps = mu.resize_maps +pick_largest_cc = mu.pick_largest_cc +get_graph_origin_loc = mu.get_graph_origin_loc +generate_egocentric_maps = mu.generate_egocentric_maps +generate_goal_images = mu.generate_goal_images +get_map_to_predict = mu.get_map_to_predict + +bin_points = du.bin_points +make_geocentric = du.make_geocentric +get_point_cloud_from_z = du.get_point_cloud_from_z +get_camera_matrix = du.get_camera_matrix + +def _get_semantic_maps(folder_name, building_name, map, flip): + # Load file from the cache. 
+ file_name = '{:s}_{:d}_{:d}_{:d}_{:d}_{:d}_{:d}.pkl' + file_name = file_name.format(building_name, map.size[0], map.size[1], + map.origin[0], map.origin[1], map.resolution, + flip) + file_name = os.path.join(folder_name, file_name) + logging.info('Loading semantic maps from %s.', file_name) + + if fu.exists(file_name): + a = utils.load_variables(file_name) + maps = a['maps'] #HxWx#C + cats = a['cats'] + else: + logging.error('file_name: %s not found.', file_name) + maps = None + cats = None + return maps, cats + +def _select_classes(all_maps, all_cats, cats_to_use): + inds = [] + for c in cats_to_use: + ind = all_cats.index(c) + inds.append(ind) + out_maps = all_maps[:,:,inds] + return out_maps + +def _get_room_dimensions(file_name, resolution, origin, flip=False): + if fu.exists(file_name): + a = utils.load_variables(file_name)['room_dimension'] + names = a.keys() + dims = np.concatenate(a.values(), axis=0).reshape((-1,6)) + ind = np.argsort(names) + dims = dims[ind,:] + names = [names[x] for x in ind] + if flip: + dims_new = dims*1 + dims_new[:,1] = -dims[:,4] + dims_new[:,4] = -dims[:,1] + dims = dims_new*1 + + dims = dims*100. + dims[:,0] = dims[:,0] - origin[0] + dims[:,1] = dims[:,1] - origin[1] + dims[:,3] = dims[:,3] - origin[0] + dims[:,4] = dims[:,4] - origin[1] + dims = dims / resolution + out = {'names': names, 'dims': dims} + else: + out = None + return out + +def _filter_rooms(room_dims, room_regex): + pattern = re.compile(room_regex) + ind = [] + for i, name in enumerate(room_dims['names']): + if pattern.match(name): + ind.append(i) + new_room_dims = {} + new_room_dims['names'] = [room_dims['names'][i] for i in ind] + new_room_dims['dims'] = room_dims['dims'][ind,:]*1 + return new_room_dims + +def _label_nodes_with_room_id(xyt, room_dims): + # Label the room with the ID into things. 
+ node_room_id = -1*np.ones((xyt.shape[0], 1)) + dims = room_dims['dims'] + for x, name in enumerate(room_dims['names']): + all_ = np.concatenate((xyt[:,[0]] >= dims[x,0], + xyt[:,[0]] <= dims[x,3], + xyt[:,[1]] >= dims[x,1], + xyt[:,[1]] <= dims[x,4]), axis=1) + node_room_id[np.all(all_, axis=1), 0] = x + return node_room_id + +def get_path_ids(start_node_id, end_node_id, pred_map): + id = start_node_id + path = [id] + while id != end_node_id: + id = pred_map[id] + path.append(id) + return path + +def image_pre(images, modalities): + # Assumes images are ...xHxWxC. + # We always assume images are RGB followed by Depth. + if 'depth' in modalities: + d = images[...,-1][...,np.newaxis]*1. + d[d < 0.01] = np.NaN; isnan = np.isnan(d); + d = 100./d; d[isnan] = 0.; + images = np.concatenate((images[...,:-1], d, isnan), axis=images.ndim-1) + if 'rgb' in modalities: + images[...,:3] = images[...,:3]*1. - 128 + return images + +def _get_relative_goal_loc(goal_loc, loc, theta): + r = np.sqrt(np.sum(np.square(goal_loc - loc), axis=1)) + t = np.arctan2(goal_loc[:,1] - loc[:,1], goal_loc[:,0] - loc[:,0]) + t = t-theta[:,0] + np.pi/2 + return np.expand_dims(r,axis=1), np.expand_dims(t, axis=1) + +def _gen_perturbs(rng, batch_size, num_steps, lr_flip, delta_angle, delta_xy, + structured): + perturbs = [] + for i in range(batch_size): + # Doing things one by one for each episode in this batch. This way this + # remains replicatable even when we change the batch size. + p = np.zeros((num_steps+1, 4)) + if lr_flip: + # Flip the whole trajectory. 
+ p[:,3] = rng.rand(1)-0.5 + if delta_angle > 0: + if structured: + p[:,2] = (rng.rand(1)-0.5)* delta_angle + else: + p[:,2] = (rng.rand(p.shape[0])-0.5)* delta_angle + if delta_xy > 0: + if structured: + p[:,:2] = (rng.rand(1, 2)-0.5)*delta_xy + else: + p[:,:2] = (rng.rand(p.shape[0], 2)-0.5)*delta_xy + perturbs.append(p) + return perturbs + +def get_multiplexer_class(args, task_number): + assert(args.task_params.base_class == 'Building') + logging.info('Returning BuildingMultiplexer') + R = BuildingMultiplexer(args, task_number) + return R + +class GridWorld(): + def __init__(self): + """Class members that will be assigned by any class that actually uses this + class.""" + self.restrict_to_largest_cc = None + self.robot = None + self.env = None + self.category_list = None + self.traversible = None + + def get_loc_axis(self, node, delta_theta, perturb=None): + """Based on the node orientation returns X, and Y axis. Used to sample the + map in egocentric coordinate frame. + """ + if type(node) == tuple: + node = np.array([node]) + if perturb is None: + perturb = np.zeros((node.shape[0], 4)) + xyt = self.to_actual_xyt_vec(node) + x = xyt[:,[0]] + perturb[:,[0]] + y = xyt[:,[1]] + perturb[:,[1]] + t = xyt[:,[2]] + perturb[:,[2]] + theta = t*delta_theta + loc = np.concatenate((x,y), axis=1) + x_axis = np.concatenate((np.cos(theta), np.sin(theta)), axis=1) + y_axis = np.concatenate((np.cos(theta+np.pi/2.), np.sin(theta+np.pi/2.)), + axis=1) + # Flip the sampled map where need be. + y_axis[np.where(perturb[:,3] > 0)[0], :] *= -1. + return loc, x_axis, y_axis, theta + + def to_actual_xyt(self, pqr): + """Converts from node to location on the map.""" + (p, q, r) = pqr + if self.task.n_ori == 6: + out = (p - q * 0.5 + self.task.origin_loc[0], + q * np.sqrt(3.) / 2. 
+ self.task.origin_loc[1], r) + elif self.task.n_ori == 4: + out = (p + self.task.origin_loc[0], + q + self.task.origin_loc[1], r) + return out + + def to_actual_xyt_vec(self, pqr): + """Converts from node array to location array on the map.""" + p = pqr[:,0][:, np.newaxis] + q = pqr[:,1][:, np.newaxis] + r = pqr[:,2][:, np.newaxis] + if self.task.n_ori == 6: + out = np.concatenate((p - q * 0.5 + self.task.origin_loc[0], + q * np.sqrt(3.) / 2. + self.task.origin_loc[1], + r), axis=1) + elif self.task.n_ori == 4: + out = np.concatenate((p + self.task.origin_loc[0], + q + self.task.origin_loc[1], + r), axis=1) + return out + + def raw_valid_fn_vec(self, xyt): + """Returns if the given set of nodes is valid or not.""" + height = self.traversible.shape[0] + width = self.traversible.shape[1] + x = np.round(xyt[:,[0]]).astype(np.int32) + y = np.round(xyt[:,[1]]).astype(np.int32) + is_inside = np.all(np.concatenate((x >= 0, y >= 0, + x < width, y < height), axis=1), axis=1) + x = np.minimum(np.maximum(x, 0), width-1) + y = np.minimum(np.maximum(y, 0), height-1) + ind = np.ravel_multi_index((y,x), self.traversible.shape) + is_traversible = self.traversible.ravel()[ind] + + is_valid = np.all(np.concatenate((is_inside[:,np.newaxis], is_traversible), + axis=1), axis=1) + return is_valid + + + def valid_fn_vec(self, pqr): + """Returns if the given set of nodes is valid or not.""" + xyt = self.to_actual_xyt_vec(np.array(pqr)) + height = self.traversible.shape[0] + width = self.traversible.shape[1] + x = np.round(xyt[:,[0]]).astype(np.int32) + y = np.round(xyt[:,[1]]).astype(np.int32) + is_inside = np.all(np.concatenate((x >= 0, y >= 0, + x < width, y < height), axis=1), axis=1) + x = np.minimum(np.maximum(x, 0), width-1) + y = np.minimum(np.maximum(y, 0), height-1) + ind = np.ravel_multi_index((y,x), self.traversible.shape) + is_traversible = self.traversible.ravel()[ind] + + is_valid = np.all(np.concatenate((is_inside[:,np.newaxis], is_traversible), + axis=1), axis=1) + return 
is_valid + + def get_feasible_actions(self, node_ids): + """Returns the feasible set of actions from the current node.""" + a = np.zeros((len(node_ids), self.task_params.num_actions), dtype=np.int32) + gtG = self.task.gtG + next_node = [] + for i, c in enumerate(node_ids): + neigh = gtG.vertex(c).out_neighbours() + neigh_edge = gtG.vertex(c).out_edges() + nn = {} + for n, e in zip(neigh, neigh_edge): + _ = gtG.ep['action'][e] + a[i,_] = 1 + nn[_] = int(n) + next_node.append(nn) + return a, next_node + + def take_action(self, current_node_ids, action): + """Returns the new node after taking the action action. Stays at the current + node if the action is invalid.""" + actions, next_node_ids = self.get_feasible_actions(current_node_ids) + new_node_ids = [] + for i, (c,a) in enumerate(zip(current_node_ids, action)): + if actions[i,a] == 1: + new_node_ids.append(next_node_ids[i][a]) + else: + new_node_ids.append(c) + return new_node_ids + + def set_r_obj(self, r_obj): + """Sets the SwiftshaderRenderer object used for rendering.""" + self.r_obj = r_obj + +class Building(GridWorld): + def __init__(self, building_name, robot, env, + category_list=None, small=False, flip=False, logdir=None, + building_loader=None): + + self.restrict_to_largest_cc = True + self.robot = robot + self.env = env + self.logdir = logdir + + # Load the building meta data. + building = building_loader.load_building(building_name) + if small: + building['mesh_names'] = building['mesh_names'][:5] + + # New code. + shapess = building_loader.load_building_meshes(building) + if flip: + for shapes in shapess: + shapes.flip_shape() + + vs = [] + for shapes in shapess: + vs.append(shapes.get_vertices()[0]) + vs = np.concatenate(vs, axis=0) + map = make_map(env.padding, env.resolution, vertex=vs, sc=100.) 
+ map = compute_traversibility( + map, robot.base, robot.height, robot.radius, env.valid_min, + env.valid_max, env.num_point_threshold, shapess=shapess, sc=100., + n_samples_per_face=env.n_samples_per_face) + + room_dims = _get_room_dimensions(building['room_dimension_file'], + env.resolution, map.origin, flip=flip) + class_maps, class_map_names = _get_semantic_maps( + building['class_map_folder'], building_name, map, flip) + + self.class_maps = class_maps + self.class_map_names = class_map_names + self.building = building + self.shapess = shapess + self.map = map + self.traversible = map.traversible*1 + self.building_name = building_name + self.room_dims = room_dims + self.flipped = flip + self.renderer_entitiy_ids = [] + + if self.restrict_to_largest_cc: + self.traversible = pick_largest_cc(self.traversible) + + def load_building_into_scene(self): + # Loads the scene. + self.renderer_entitiy_ids += self.r_obj.load_shapes(self.shapess) + # Free up memory, we dont need the mesh or the materials anymore. + self.shapess = None + + def add_entity_at_nodes(self, nodes, height, shape): + xyt = self.to_actual_xyt_vec(nodes) + nxy = xyt[:,:2]*1. + nxy = nxy * self.map.resolution + nxy = nxy + self.map.origin + Ts = np.concatenate((nxy, nxy[:,:1]), axis=1) + Ts[:,2] = height; Ts = Ts / 100.; + + # Merge all the shapes into a single shape and add that shape. 
+ shape.replicate_shape(Ts) + entity_ids = self.r_obj.load_shapes([shape]) + self.renderer_entitiy_ids += entity_ids + return entity_ids + + def add_shapes(self, shapes): + scene = self.r_obj.viz.scene() + for shape in shapes: + scene.AddShape(shape) + + def add_materials(self, materials): + scene = self.r_obj.viz.scene() + for material in materials: + scene.AddOrUpdateMaterial(material) + + def set_building_visibility(self, visibility): + self.r_obj.set_entity_visible(self.renderer_entitiy_ids, visibility) + + def render_nodes(self, nodes, perturb=None, aux_delta_theta=0.): + self.set_building_visibility(True) + if perturb is None: + perturb = np.zeros((len(nodes), 4)) + + imgs = [] + r = 2 + elevation_z = r * np.tan(np.deg2rad(self.robot.camera_elevation_degree)) + + for i in range(len(nodes)): + xyt = self.to_actual_xyt(nodes[i]) + lookat_theta = 3.0 * np.pi / 2.0 - (xyt[2]+perturb[i,2]+aux_delta_theta) * (self.task.delta_theta) + nxy = np.array([xyt[0]+perturb[i,0], xyt[1]+perturb[i,1]]).reshape(1, -1) + nxy = nxy * self.map.resolution + nxy = nxy + self.map.origin + camera_xyz = np.zeros((1, 3)) + camera_xyz[...] = [nxy[0, 0], nxy[0, 1], self.robot.sensor_height] + camera_xyz = camera_xyz / 100. 
+ lookat_xyz = np.array([-r * np.sin(lookat_theta), + -r * np.cos(lookat_theta), elevation_z]) + lookat_xyz = lookat_xyz + camera_xyz[0, :] + self.r_obj.position_camera(camera_xyz[0, :].tolist(), + lookat_xyz.tolist(), [0.0, 0.0, 1.0]) + img = self.r_obj.render(take_screenshot=True, output_type=0) + img = [x for x in img if x is not None] + img = np.concatenate(img, axis=2).astype(np.float32) + if perturb[i,3]>0: + img = img[:,::-1,:] + imgs.append(img) + + self.set_building_visibility(False) + return imgs + + +class MeshMapper(Building): + def __init__(self, robot, env, task_params, building_name, category_list, + flip, logdir=None, building_loader=None): + Building.__init__(self, building_name, robot, env, category_list, + small=task_params.toy_problem, flip=flip, logdir=logdir, + building_loader=building_loader) + self.task_params = task_params + self.task = None + self._preprocess_for_task(self.task_params.building_seed) + + def _preprocess_for_task(self, seed): + if self.task is None or self.task.seed != seed: + rng = np.random.RandomState(seed) + origin_loc = get_graph_origin_loc(rng, self.traversible) + self.task = utils.Foo(seed=seed, origin_loc=origin_loc, + n_ori=self.task_params.n_ori) + G = generate_graph(self.valid_fn_vec, + self.task_params.step_size, self.task.n_ori, + (0, 0, 0)) + gtG, nodes, nodes_to_id = convert_to_graph_tool(G) + self.task.gtG = gtG + self.task.nodes = nodes + self.task.delta_theta = 2.0*np.pi/(self.task.n_ori*1.) 
+ self.task.nodes_to_id = nodes_to_id + logging.info('Building %s, #V=%d, #E=%d', self.building_name, + self.task.nodes.shape[0], self.task.gtG.num_edges()) + + if self.logdir is not None: + write_traversible = cv2.applyColorMap(self.traversible.astype(np.uint8)*255, cv2.COLORMAP_JET) + img_path = os.path.join(self.logdir, + '{:s}_{:d}_graph.png'.format(self.building_name, + seed)) + node_xyt = self.to_actual_xyt_vec(self.task.nodes) + plt.set_cmap('jet'); + fig, ax = utils.subplot(plt, (1,1), (12,12)) + ax.plot(node_xyt[:,0], node_xyt[:,1], 'm.') + ax.imshow(self.traversible, origin='lower'); + ax.set_axis_off(); ax.axis('equal'); + ax.set_title('{:s}, {:d}, {:d}'.format(self.building_name, + self.task.nodes.shape[0], + self.task.gtG.num_edges())) + if self.room_dims is not None: + for i, r in enumerate(self.room_dims['dims']*1): + min_ = r[:3]*1 + max_ = r[3:]*1 + xmin, ymin, zmin = min_ + xmax, ymax, zmax = max_ + + ax.plot([xmin, xmax, xmax, xmin, xmin], + [ymin, ymin, ymax, ymax, ymin], 'g') + with fu.fopen(img_path, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + plt.close(fig) + + + def _gen_rng(self, rng): + # instances is a list of list of node_ids. 
+ if self.task_params.move_type == 'circle': + _, _, _, _, paths = rng_target_dist_field(self.task_params.batch_size, + self.task.gtG, rng, 0, 1, + compute_path=True) + instances_ = paths + + instances = [] + for instance_ in instances_: + instance = instance_ + for i in range(self.task_params.num_steps): + instance.append(self.take_action([instance[-1]], [1])[0]) + instances.append(instance) + + elif self.task_params.move_type == 'shortest_path': + _, _, _, _, paths = rng_target_dist_field(self.task_params.batch_size, + self.task.gtG, rng, + self.task_params.num_steps, + self.task_params.num_steps+1, + compute_path=True) + instances = paths + + elif self.task_params.move_type == 'circle+forward': + _, _, _, _, paths = rng_target_dist_field(self.task_params.batch_size, + self.task.gtG, rng, 0, 1, + compute_path=True) + instances_ = paths + instances = [] + for instance_ in instances_: + instance = instance_ + for i in range(self.task_params.n_ori-1): + instance.append(self.take_action([instance[-1]], [1])[0]) + while len(instance) <= self.task_params.num_steps: + while self.take_action([instance[-1]], [3])[0] == instance[-1] and len(instance) <= self.task_params.num_steps: + instance.append(self.take_action([instance[-1]], [2])[0]) + if len(instance) <= self.task_params.num_steps: + instance.append(self.take_action([instance[-1]], [3])[0]) + instances.append(instance) + + # Do random perturbation if needed. + perturbs = _gen_perturbs(rng, self.task_params.batch_size, + self.task_params.num_steps, + self.task_params.data_augment.lr_flip, + self.task_params.data_augment.delta_angle, + self.task_params.data_augment.delta_xy, + self.task_params.data_augment.structured) + return instances, perturbs + + def worker(self, instances, perturbs): + # Output the images and the free space. + + # Make the instances be all the same length. 
+ for i in range(len(instances)): + for j in range(self.task_params.num_steps - len(instances[i]) + 1): + instances[i].append(instances[i][-1]) + if perturbs[i].shape[0] < self.task_params.num_steps+1: + p = np.zeros((self.task_params.num_steps+1, 4)) + p[:perturbs[i].shape[0], :] = perturbs[i] + p[perturbs[i].shape[0]:, :] = perturbs[i][-1,:] + perturbs[i] = p + + instances_ = [] + for instance in instances: + instances_ = instances_ + instance + perturbs_ = np.concatenate(perturbs, axis=0) + + instances_nodes = self.task.nodes[instances_,:] + instances_nodes = [tuple(x) for x in instances_nodes] + + imgs_ = self.render_nodes(instances_nodes, perturbs_) + imgs = []; next = 0; + for instance in instances: + img_i = [] + for _ in instance: + img_i.append(imgs_[next]) + next = next+1 + imgs.append(img_i) + imgs = np.array(imgs) + + # Render out the maps in the egocentric view for all nodes and not just the + # last node. + all_nodes = [] + for x in instances: + all_nodes = all_nodes + x + all_perturbs = np.concatenate(perturbs, axis=0) + loc, x_axis, y_axis, theta = self.get_loc_axis( + self.task.nodes[all_nodes, :]*1, delta_theta=self.task.delta_theta, + perturb=all_perturbs) + fss = None + valids = None + loc_on_map = None + theta_on_map = None + cum_fs = None + cum_valid = None + incremental_locs = None + incremental_thetas = None + + if self.task_params.output_free_space: + fss, valids = get_map_to_predict(loc, x_axis, y_axis, + map=self.traversible*1., + map_size=self.task_params.map_size) + fss = np.array(fss) > 0.5 + fss = np.reshape(fss, [self.task_params.batch_size, + self.task_params.num_steps+1, + self.task_params.map_size, + self.task_params.map_size]) + valids = np.reshape(np.array(valids), fss.shape) + + if self.task_params.output_transform_to_global_map: + # Output the transform to the global map. 
+ loc_on_map = np.reshape(loc*1, [self.task_params.batch_size, + self.task_params.num_steps+1, -1]) + # Converting to location wrt to first location so that warping happens + # properly. + theta_on_map = np.reshape(theta*1, [self.task_params.batch_size, + self.task_params.num_steps+1, -1]) + + if self.task_params.output_incremental_transform: + # Output the transform to the global map. + incremental_locs_ = np.reshape(loc*1, [self.task_params.batch_size, + self.task_params.num_steps+1, -1]) + incremental_locs_[:,1:,:] -= incremental_locs_[:,:-1,:] + t0 = -np.pi/2+np.reshape(theta*1, [self.task_params.batch_size, + self.task_params.num_steps+1, -1]) + t = t0*1 + incremental_locs = incremental_locs_*1 + incremental_locs[:,:,0] = np.sum(incremental_locs_ * np.concatenate((np.cos(t), np.sin(t)), axis=-1), axis=-1) + incremental_locs[:,:,1] = np.sum(incremental_locs_ * np.concatenate((np.cos(t+np.pi/2), np.sin(t+np.pi/2)), axis=-1), axis=-1) + incremental_locs[:,0,:] = incremental_locs_[:,0,:] + # print incremental_locs_[0,:,:], incremental_locs[0,:,:], t0[0,:,:] + + incremental_thetas = np.reshape(theta*1, [self.task_params.batch_size, + self.task_params.num_steps+1, + -1]) + incremental_thetas[:,1:,:] += -incremental_thetas[:,:-1,:] + + if self.task_params.output_canonical_map: + loc_ = loc[0::(self.task_params.num_steps+1), :] + x_axis = np.zeros_like(loc_); x_axis[:,1] = 1 + y_axis = np.zeros_like(loc_); y_axis[:,0] = -1 + cum_fs, cum_valid = get_map_to_predict(loc_, x_axis, y_axis, + map=self.traversible*1., + map_size=self.task_params.map_size) + cum_fs = np.array(cum_fs) > 0.5 + cum_fs = np.reshape(cum_fs, [self.task_params.batch_size, 1, + self.task_params.map_size, + self.task_params.map_size]) + cum_valid = np.reshape(np.array(cum_valid), cum_fs.shape) + + + inputs = {'fs_maps': fss, + 'valid_maps': valids, + 'imgs': imgs, + 'loc_on_map': loc_on_map, + 'theta_on_map': theta_on_map, + 'cum_fs_maps': cum_fs, + 'cum_valid_maps': cum_valid, + 'incremental_thetas': 
incremental_thetas, + 'incremental_locs': incremental_locs} + return inputs + + def pre(self, inputs): + inputs['imgs'] = image_pre(inputs['imgs'], self.task_params.modalities) + if inputs['loc_on_map'] is not None: + inputs['loc_on_map'] = inputs['loc_on_map'] - inputs['loc_on_map'][:,[0],:] + if inputs['theta_on_map'] is not None: + inputs['theta_on_map'] = np.pi/2. - inputs['theta_on_map'] + return inputs + +def _nav_env_reset_helper(type, rng, nodes, batch_size, gtG, max_dist, + num_steps, num_goals, data_augment, **kwargs): + """Generates and returns a new episode.""" + max_compute = max_dist + 4*num_steps + if type == 'general': + start_node_ids, end_node_ids, dist, pred_map, paths = \ + rng_target_dist_field(batch_size, gtG, rng, max_dist, max_compute, + nodes=nodes, compute_path=False) + target_class = None + + elif type == 'room_to_room_many': + goal_node_ids = []; dists = []; + node_room_ids = kwargs['node_room_ids'] + # Sample the first one + start_node_ids_, end_node_ids_, dist_, _, _ = rng_room_to_room( + batch_size, gtG, rng, max_dist, max_compute, + node_room_ids=node_room_ids, nodes=nodes) + start_node_ids = start_node_ids_ + goal_node_ids.append(end_node_ids_) + dists.append(dist_) + for n in range(num_goals-1): + start_node_ids_, end_node_ids_, dist_, _, _ = rng_next_goal( + goal_node_ids[n], batch_size, gtG, rng, max_dist, + max_compute, node_room_ids=node_room_ids, nodes=nodes, + dists_from_start_node=dists[n]) + goal_node_ids.append(end_node_ids_) + dists.append(dist_) + target_class = None + + elif type == 'rng_rejection_sampling_many': + num_goals = num_goals + goal_node_ids = []; dists = []; + + n_ori = kwargs['n_ori'] + step_size = kwargs['step_size'] + min_dist = kwargs['min_dist'] + sampling_distribution = kwargs['sampling_distribution'] + target_distribution = kwargs['target_distribution'] + rejection_sampling_M = kwargs['rejection_sampling_M'] + distribution_bins = kwargs['distribution_bins'] + + for n in range(num_goals): + if n == 0: 
input_nodes = None + else: input_nodes = goal_node_ids[n-1] + start_node_ids_, end_node_ids_, dist_, _, _, _, _ = rng_next_goal_rejection_sampling( + input_nodes, batch_size, gtG, rng, max_dist, min_dist, + max_compute, sampling_distribution, target_distribution, nodes, + n_ori, step_size, distribution_bins, rejection_sampling_M) + if n == 0: start_node_ids = start_node_ids_ + goal_node_ids.append(end_node_ids_) + dists.append(dist_) + target_class = None + + elif type == 'room_to_room_back': + num_goals = num_goals + assert(num_goals == 2), 'num_goals must be 2.' + goal_node_ids = []; dists = []; + node_room_ids = kwargs['node_room_ids'] + # Sample the first one. + start_node_ids_, end_node_ids_, dist_, _, _ = rng_room_to_room( + batch_size, gtG, rng, max_dist, max_compute, + node_room_ids=node_room_ids, nodes=nodes) + start_node_ids = start_node_ids_ + goal_node_ids.append(end_node_ids_) + dists.append(dist_) + + # Set second goal to be starting position, and compute distance to the start node. + goal_node_ids.append(start_node_ids) + dist = [] + for i in range(batch_size): + dist_ = gt.topology.shortest_distance( + gt.GraphView(gtG, reversed=True), + source=gtG.vertex(start_node_ids[i]), target=None) + dist_ = np.array(dist_.get_array()) + dist.append(dist_) + dists.append(dist) + target_class = None + + elif type[:14] == 'to_nearest_obj': + # Generate an episode by sampling one of the target classes (with + # probability proportional to the number of nodes in the world). + # With the sampled class sample a node that is within some distance from + # the sampled class. + class_nodes = kwargs['class_nodes'] + sampling = kwargs['sampling'] + dist_to_class = kwargs['dist_to_class'] + + assert(num_goals == 1), 'Only supports a single goal.' 
+ ind = rng.choice(class_nodes.shape[0], size=batch_size) + target_class = class_nodes[ind,1] + start_node_ids = []; dists = []; goal_node_ids = []; + + for t in target_class: + if sampling == 'uniform': + max_dist = max_dist + cnts = np.bincount(dist_to_class[t], minlength=max_dist+1)*1. + cnts[max_dist+1:] = 0 + p_each = 1./ cnts / (max_dist+1.) + p_each[cnts == 0] = 0 + p = p_each[dist_to_class[t]]*1.; p = p/np.sum(p) + start_node_id = rng.choice(p.shape[0], size=1, p=p)[0] + else: + logging.fatal('Sampling not one of uniform.') + start_node_ids.append(start_node_id) + dists.append(dist_to_class[t]) + # Dummy goal node, same as the start node, so that vis is better. + goal_node_ids.append(start_node_id) + dists = [dists] + goal_node_ids = [goal_node_ids] + + return start_node_ids, goal_node_ids, dists, target_class + + +class NavigationEnv(GridWorld, Building): + """Wrapper around GridWorld which sets up navigation tasks. + """ + def _debug_save_hardness(self, seed): + out_path = os.path.join(self.logdir, '{:s}_{:d}_hardness.png'.format(self.building_name, seed)) + batch_size = 4000 + rng = np.random.RandomState(0) + start_node_ids, end_node_ids, dists, pred_maps, paths, hardnesss, gt_dists = \ + rng_next_goal_rejection_sampling( + None, batch_size, self.task.gtG, rng, self.task_params.max_dist, + self.task_params.min_dist, self.task_params.max_dist, + self.task.sampling_distribution, self.task.target_distribution, + self.task.nodes, self.task_params.n_ori, self.task_params.step_size, + self.task.distribution_bins, self.task.rejection_sampling_M) + bins = self.task.distribution_bins + n_bins = self.task.n_bins + with plt.style.context('ggplot'): + fig, axes = utils.subplot(plt, (1,2), (10,10)) + ax = axes[0] + _ = ax.hist(hardnesss, bins=bins, weights=np.ones_like(hardnesss)/len(hardnesss)) + ax.plot(bins[:-1]+0.5/n_bins, self.task.target_distribution, 'g') + ax.plot(bins[:-1]+0.5/n_bins, self.task.sampling_distribution, 'b') + ax.grid('on') + + ax = axes[1] + _ 
= ax.hist(gt_dists, bins=np.arange(self.task_params.max_dist+1)) + ax.grid('on') + ax.set_title('Mean: {:0.2f}, Median: {:0.2f}'.format(np.mean(gt_dists), + np.median(gt_dists))) + with fu.fopen(out_path, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + + def _debug_save_map_nodes(self, seed): + """Saves traversible space along with nodes generated on the graph. Takes + the seed as input.""" + img_path = os.path.join(self.logdir, '{:s}_{:d}_graph.png'.format(self.building_name, seed)) + node_xyt = self.to_actual_xyt_vec(self.task.nodes) + plt.set_cmap('jet'); + fig, ax = utils.subplot(plt, (1,1), (12,12)) + ax.plot(node_xyt[:,0], node_xyt[:,1], 'm.') + ax.set_axis_off(); ax.axis('equal'); + + if self.room_dims is not None: + for i, r in enumerate(self.room_dims['dims']*1): + min_ = r[:3]*1 + max_ = r[3:]*1 + xmin, ymin, zmin = min_ + xmax, ymax, zmax = max_ + + ax.plot([xmin, xmax, xmax, xmin, xmin], + [ymin, ymin, ymax, ymax, ymin], 'g') + ax.imshow(self.traversible, origin='lower'); + with fu.fopen(img_path, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + + def _debug_semantic_maps(self, seed): + """Saves traversible space along with nodes generated on the graph. Takes + the seed as input.""" + for i, cls in enumerate(self.task_params.semantic_task.class_map_names): + img_path = os.path.join(self.logdir, '{:s}_flip{:d}_{:s}_graph.png'.format(self.building_name, seed, cls)) + maps = self.traversible*1. 
+ maps += 0.5*(self.task.class_maps_dilated[:,:,i]) + write_traversible = (maps*1.+1.)/3.0 + write_traversible = (write_traversible*255.).astype(np.uint8)[:,:,np.newaxis] + write_traversible = write_traversible + np.zeros((1,1,3), dtype=np.uint8) + fu.write_image(img_path, write_traversible[::-1,:,:]) + + def _preprocess_for_task(self, seed): + """Sets up the task field for doing navigation on the grid world.""" + if self.task is None or self.task.seed != seed: + rng = np.random.RandomState(seed) + origin_loc = get_graph_origin_loc(rng, self.traversible) + self.task = utils.Foo(seed=seed, origin_loc=origin_loc, + n_ori=self.task_params.n_ori) + G = generate_graph(self.valid_fn_vec, self.task_params.step_size, + self.task.n_ori, (0, 0, 0)) + gtG, nodes, nodes_to_id = convert_to_graph_tool(G) + self.task.gtG = gtG + self.task.nodes = nodes + self.task.delta_theta = 2.0*np.pi/(self.task.n_ori*1.) + self.task.nodes_to_id = nodes_to_id + + logging.info('Building %s, #V=%d, #E=%d', self.building_name, + self.task.nodes.shape[0], self.task.gtG.num_edges()) + type = self.task_params.type + if type == 'general': + # Do nothing + _ = None + + elif type == 'room_to_room_many' or type == 'room_to_room_back': + if type == 'room_to_room_back': + assert(self.task_params.num_goals == 2), 'num_goals must be 2.' + + self.room_dims = _filter_rooms(self.room_dims, self.task_params.room_regex) + xyt = self.to_actual_xyt_vec(self.task.nodes) + self.task.node_room_ids = _label_nodes_with_room_id(xyt, self.room_dims) + self.task.reset_kwargs = {'node_room_ids': self.task.node_room_ids} + + elif type == 'rng_rejection_sampling_many': + n_bins = 20 + rejection_sampling_M = self.task_params.rejection_sampling_M + min_dist = self.task_params.min_dist + bins = np.arange(n_bins+1)/(n_bins*1.) + target_d = np.zeros(n_bins); target_d[...] 
= 1./n_bins; + + sampling_d = get_hardness_distribution( + self.task.gtG, self.task_params.max_dist, self.task_params.min_dist, + np.random.RandomState(0), 4000, bins, self.task.nodes, + self.task_params.n_ori, self.task_params.step_size) + + self.task.reset_kwargs = {'distribution_bins': bins, + 'target_distribution': target_d, + 'sampling_distribution': sampling_d, + 'rejection_sampling_M': rejection_sampling_M, + 'n_bins': n_bins, + 'n_ori': self.task_params.n_ori, + 'step_size': self.task_params.step_size, + 'min_dist': self.task_params.min_dist} + self.task.n_bins = n_bins + self.task.distribution_bins = bins + self.task.target_distribution = target_d + self.task.sampling_distribution = sampling_d + self.task.rejection_sampling_M = rejection_sampling_M + + if self.logdir is not None: + self._debug_save_hardness(seed) + + elif type[:14] == 'to_nearest_obj': + self.room_dims = _filter_rooms(self.room_dims, self.task_params.room_regex) + xyt = self.to_actual_xyt_vec(self.task.nodes) + + self.class_maps = _select_classes(self.class_maps, + self.class_map_names, + self.task_params.semantic_task.class_map_names)*1 + self.class_map_names = self.task_params.semantic_task.class_map_names + nodes_xyt = self.to_actual_xyt_vec(np.array(self.task.nodes)) + + tt = utils.Timer(); tt.tic(); + if self.task_params.type == 'to_nearest_obj_acc': + self.task.class_maps_dilated, self.task.node_class_label = label_nodes_with_class_geodesic( + nodes_xyt, self.class_maps, + self.task_params.semantic_task.pix_distance+8, self.map.traversible, + ff_cost=1., fo_cost=1., oo_cost=4., connectivity=8.) 
+ + dists = [] + for i in range(len(self.class_map_names)): + class_nodes_ = np.where(self.task.node_class_label[:,i])[0] + dists.append(get_distance_node_list(gtG, source_nodes=class_nodes_, direction='to')) + self.task.dist_to_class = dists + a_, b_ = np.where(self.task.node_class_label) + self.task.class_nodes = np.concatenate((a_[:,np.newaxis], b_[:,np.newaxis]), axis=1) + + if self.logdir is not None: + self._debug_semantic_maps(seed) + + self.task.reset_kwargs = {'sampling': self.task_params.semantic_task.sampling, + 'class_nodes': self.task.class_nodes, + 'dist_to_class': self.task.dist_to_class} + + if self.logdir is not None: + self._debug_save_map_nodes(seed) + + def reset(self, rngs): + rng = rngs[0]; rng_perturb = rngs[1]; + nodes = self.task.nodes + tp = self.task_params + + start_node_ids, goal_node_ids, dists, target_class = \ + _nav_env_reset_helper(tp.type, rng, self.task.nodes, tp.batch_size, + self.task.gtG, tp.max_dist, tp.num_steps, + tp.num_goals, tp.data_augment, + **(self.task.reset_kwargs)) + + start_nodes = [tuple(nodes[_,:]) for _ in start_node_ids] + goal_nodes = [[tuple(nodes[_,:]) for _ in __] for __ in goal_node_ids] + data_augment = tp.data_augment + perturbs = _gen_perturbs(rng_perturb, tp.batch_size, + (tp.num_steps+1)*tp.num_goals, + data_augment.lr_flip, data_augment.delta_angle, + data_augment.delta_xy, data_augment.structured) + perturbs = np.array(perturbs) # batch x steps x 4 + end_perturbs = perturbs[:,-(tp.num_goals):,:]*1 # fixed perturb for the goal. 
+ perturbs = perturbs[:,:-(tp.num_goals),:]*1 + + history = -np.ones((tp.batch_size, tp.num_steps*tp.num_goals), dtype=np.int32) + self.episode = utils.Foo( + start_nodes=start_nodes, start_node_ids=start_node_ids, + goal_nodes=goal_nodes, goal_node_ids=goal_node_ids, dist_to_goal=dists, + perturbs=perturbs, goal_perturbs=end_perturbs, history=history, + target_class=target_class, history_frames=[]) + return start_node_ids + + def take_action(self, current_node_ids, action, step_number): + """In addition to returning the action, also returns the reward that the + agent receives.""" + goal_number = step_number / self.task_params.num_steps + new_node_ids = GridWorld.take_action(self, current_node_ids, action) + rewards = [] + for i, n in enumerate(new_node_ids): + reward = 0 + if n == self.episode.goal_node_ids[goal_number][i]: + reward = self.task_params.reward_at_goal + reward = reward - self.task_params.reward_time_penalty + rewards.append(reward) + return new_node_ids, rewards + + + def get_optimal_action(self, current_node_ids, step_number): + """Returns the optimal action from the current node.""" + goal_number = step_number / self.task_params.num_steps + gtG = self.task.gtG + a = np.zeros((len(current_node_ids), self.task_params.num_actions), dtype=np.int32) + d_dict = self.episode.dist_to_goal[goal_number] + for i, c in enumerate(current_node_ids): + neigh = gtG.vertex(c).out_neighbours() + neigh_edge = gtG.vertex(c).out_edges() + ds = np.array([d_dict[i][int(x)] for x in neigh]) + ds_min = np.min(ds) + for i_, e in enumerate(neigh_edge): + if ds[i_] == ds_min: + _ = gtG.ep['action'][e] + a[i, _] = 1 + return a + + def get_targets(self, current_node_ids, step_number): + """Returns the target actions from the current node.""" + action = self.get_optimal_action(current_node_ids, step_number) + action = np.expand_dims(action, axis=1) + return vars(utils.Foo(action=action)) + + def get_targets_name(self): + """Returns the list of names of the targets.""" + return 
['action'] + + def cleanup(self): + self.episode = None + +class VisualNavigationEnv(NavigationEnv): + """Class for doing visual navigation in environments. Functions for computing + features on states, etc. + """ + def __init__(self, robot, env, task_params, category_list=None, + building_name=None, flip=False, logdir=None, + building_loader=None, r_obj=None): + tt = utils.Timer() + tt.tic() + Building.__init__(self, building_name, robot, env, category_list, + small=task_params.toy_problem, flip=flip, logdir=logdir, + building_loader=building_loader) + + self.set_r_obj(r_obj) + self.task_params = task_params + self.task = None + self.episode = None + self._preprocess_for_task(self.task_params.building_seed) + if hasattr(self.task_params, 'map_scales'): + self.task.scaled_maps = resize_maps( + self.traversible.astype(np.float32)*1, self.task_params.map_scales, + self.task_params.map_resize_method) + else: + logging.fatal('VisualNavigationEnv does not support scale_f anymore.') + self.task.readout_maps_scaled = resize_maps( + self.traversible.astype(np.float32)*1, + self.task_params.readout_maps_scales, + self.task_params.map_resize_method) + tt.toc(log_at=1, log_str='VisualNavigationEnv __init__: ') + + def get_weight(self): + return self.task.nodes.shape[0] + + def get_common_data(self): + goal_nodes = self.episode.goal_nodes + start_nodes = self.episode.start_nodes + perturbs = self.episode.perturbs + goal_perturbs = self.episode.goal_perturbs + target_class = self.episode.target_class + + goal_locs = []; rel_goal_locs = []; + for i in range(len(goal_nodes)): + end_nodes = goal_nodes[i] + goal_loc, _, _, goal_theta = self.get_loc_axis( + np.array(end_nodes), delta_theta=self.task.delta_theta, + perturb=goal_perturbs[:,i,:]) + + # Compute the relative location to all goals from the starting location. 
+ loc, _, _, theta = self.get_loc_axis(np.array(start_nodes), + delta_theta=self.task.delta_theta, + perturb=perturbs[:,0,:]) + r_goal, t_goal = _get_relative_goal_loc(goal_loc*1., loc, theta) + rel_goal_loc = np.concatenate((r_goal*np.cos(t_goal), r_goal*np.sin(t_goal), + np.cos(goal_theta-theta), + np.sin(goal_theta-theta)), axis=1) + rel_goal_locs.append(np.expand_dims(rel_goal_loc, axis=1)) + goal_locs.append(np.expand_dims(goal_loc, axis=1)) + + map = self.traversible*1. + maps = np.repeat(np.expand_dims(np.expand_dims(map, axis=0), axis=0), + self.task_params.batch_size, axis=0)*1 + if self.task_params.type[:14] == 'to_nearest_obj': + for i in range(self.task_params.batch_size): + maps[i,0,:,:] += 0.5*(self.task.class_maps_dilated[:,:,target_class[i]]) + + rel_goal_locs = np.concatenate(rel_goal_locs, axis=1) + goal_locs = np.concatenate(goal_locs, axis=1) + maps = np.expand_dims(maps, axis=-1) + + if self.task_params.type[:14] == 'to_nearest_obj': + rel_goal_locs = np.zeros((self.task_params.batch_size, 1, + len(self.task_params.semantic_task.class_map_names)), + dtype=np.float32) + goal_locs = np.zeros((self.task_params.batch_size, 1, 2), + dtype=np.float32) + for i in range(self.task_params.batch_size): + t = target_class[i] + rel_goal_locs[i,0,t] = 1. + goal_locs[i,0,0] = t + goal_locs[i,0,1] = np.NaN + + return vars(utils.Foo(orig_maps=maps, goal_loc=goal_locs, + rel_goal_loc_at_start=rel_goal_locs)) + + def pre_common_data(self, inputs): + return inputs + + + def get_features(self, current_node_ids, step_number): + task_params = self.task_params + goal_number = step_number / self.task_params.num_steps + end_nodes = self.task.nodes[self.episode.goal_node_ids[goal_number],:]*1 + current_nodes = self.task.nodes[current_node_ids,:]*1 + end_perturbs = self.episode.goal_perturbs[:,goal_number,:][:,np.newaxis,:] + perturbs = self.episode.perturbs + target_class = self.episode.target_class + + # Append to history. 
+ self.episode.history[:,step_number] = np.array(current_node_ids) + + # Render out the images from current node. + outs = {} + + if self.task_params.outputs.images: + imgs_all = [] + imgs = self.render_nodes([tuple(x) for x in current_nodes], + perturb=perturbs[:,step_number,:]) + imgs_all.append(imgs) + aux_delta_thetas = self.task_params.aux_delta_thetas + for i in range(len(aux_delta_thetas)): + imgs = self.render_nodes([tuple(x) for x in current_nodes], + perturb=perturbs[:,step_number,:], + aux_delta_theta=aux_delta_thetas[i]) + imgs_all.append(imgs) + imgs_all = np.array(imgs_all) # A x B x H x W x C + imgs_all = np.transpose(imgs_all, axes=[1,0,2,3,4]) + imgs_all = np.expand_dims(imgs_all, axis=1) # B x N x A x H x W x C + if task_params.num_history_frames > 0: + if step_number == 0: + # Append the same frame 4 times + for i in range(task_params.num_history_frames+1): + self.episode.history_frames.insert(0, imgs_all*1.) + self.episode.history_frames.insert(0, imgs_all) + self.episode.history_frames.pop() + imgs_all_with_history = np.concatenate(self.episode.history_frames, axis=2) + else: + imgs_all_with_history = imgs_all + outs['imgs'] = imgs_all_with_history # B x N x A x H x W x C + + if self.task_params.outputs.node_ids: + outs['node_ids'] = np.array(current_node_ids).reshape((-1,1,1)) + outs['perturbs'] = np.expand_dims(perturbs[:,step_number, :]*1., axis=1) + + if self.task_params.outputs.analytical_counts: + assert(self.task_params.modalities == ['depth']) + d = image_pre(outs['imgs']*1., self.task_params.modalities) + cm = get_camera_matrix(self.task_params.img_width, + self.task_params.img_height, + self.task_params.img_fov) + XYZ = get_point_cloud_from_z(100./d[...,0], cm) + XYZ = make_geocentric(XYZ*100., self.robot.sensor_height, + self.robot.camera_elevation_degree) + for i in range(len(self.task_params.analytical_counts.map_sizes)): + non_linearity = self.task_params.analytical_counts.non_linearity[i] + count, isvalid = bin_points(XYZ*1., + 
map_size=self.task_params.analytical_counts.map_sizes[i], + xy_resolution=self.task_params.analytical_counts.xy_resolution[i], + z_bins=self.task_params.analytical_counts.z_bins[i]) + assert(count.shape[2] == 1), 'only works for n_views equal to 1.' + count = count[:,:,0,:,:,:] + isvalid = isvalid[:,:,0,:,:,:] + if non_linearity == 'none': + None + elif non_linearity == 'min10': + count = np.minimum(count, 10.) + elif non_linearity == 'sqrt': + count = np.sqrt(count) + else: + logging.fatal('Undefined non_linearity.') + outs['analytical_counts_{:d}'.format(i)] = count + + # Compute the goal location in the coordinate frame of the robot. + if self.task_params.outputs.rel_goal_loc: + if self.task_params.type[:14] != 'to_nearest_obj': + loc, _, _, theta = self.get_loc_axis(current_nodes, + delta_theta=self.task.delta_theta, + perturb=perturbs[:,step_number,:]) + goal_loc, _, _, goal_theta = self.get_loc_axis(end_nodes, + delta_theta=self.task.delta_theta, + perturb=end_perturbs[:,0,:]) + r_goal, t_goal = _get_relative_goal_loc(goal_loc, loc, theta) + + rel_goal_loc = np.concatenate((r_goal*np.cos(t_goal), r_goal*np.sin(t_goal), + np.cos(goal_theta-theta), + np.sin(goal_theta-theta)), axis=1) + outs['rel_goal_loc'] = np.expand_dims(rel_goal_loc, axis=1) + elif self.task_params.type[:14] == 'to_nearest_obj': + rel_goal_loc = np.zeros((self.task_params.batch_size, 1, + len(self.task_params.semantic_task.class_map_names)), + dtype=np.float32) + for i in range(self.task_params.batch_size): + t = target_class[i] + rel_goal_loc[i,0,t] = 1. + outs['rel_goal_loc'] = rel_goal_loc + + # Location on map to plot the trajectory during validation. 
+ if self.task_params.outputs.loc_on_map: + loc, x_axis, y_axis, theta = self.get_loc_axis(current_nodes, + delta_theta=self.task.delta_theta, + perturb=perturbs[:,step_number,:]) + outs['loc_on_map'] = np.expand_dims(loc, axis=1) + + # Compute gt_dist to goal + if self.task_params.outputs.gt_dist_to_goal: + gt_dist_to_goal = np.zeros((len(current_node_ids), 1), dtype=np.float32) + for i, n in enumerate(current_node_ids): + gt_dist_to_goal[i,0] = self.episode.dist_to_goal[goal_number][i][n] + outs['gt_dist_to_goal'] = np.expand_dims(gt_dist_to_goal, axis=1) + + # Free space in front of you, map and goal as images. + if self.task_params.outputs.ego_maps: + loc, x_axis, y_axis, theta = self.get_loc_axis(current_nodes, + delta_theta=self.task.delta_theta, + perturb=perturbs[:,step_number,:]) + maps = generate_egocentric_maps(self.task.scaled_maps, + self.task_params.map_scales, + self.task_params.map_crop_sizes, loc, + x_axis, y_axis, theta) + + for i in range(len(self.task_params.map_scales)): + outs['ego_maps_{:d}'.format(i)] = \ + np.expand_dims(np.expand_dims(maps[i], axis=1), axis=-1) + + if self.task_params.outputs.readout_maps: + loc, x_axis, y_axis, theta = self.get_loc_axis(current_nodes, + delta_theta=self.task.delta_theta, + perturb=perturbs[:,step_number,:]) + maps = generate_egocentric_maps(self.task.readout_maps_scaled, + self.task_params.readout_maps_scales, + self.task_params.readout_maps_crop_sizes, + loc, x_axis, y_axis, theta) + for i in range(len(self.task_params.readout_maps_scales)): + outs['readout_maps_{:d}'.format(i)] = \ + np.expand_dims(np.expand_dims(maps[i], axis=1), axis=-1) + + # Images for the goal. 
+ if self.task_params.outputs.ego_goal_imgs: + if self.task_params.type[:14] != 'to_nearest_obj': + loc, x_axis, y_axis, theta = self.get_loc_axis(current_nodes, + delta_theta=self.task.delta_theta, + perturb=perturbs[:,step_number,:]) + goal_loc, _, _, _ = self.get_loc_axis(end_nodes, + delta_theta=self.task.delta_theta, + perturb=end_perturbs[:,0,:]) + rel_goal_orientation = np.mod( + np.int32(current_nodes[:,2:] - end_nodes[:,2:]), self.task_params.n_ori) + goal_dist, goal_theta = _get_relative_goal_loc(goal_loc, loc, theta) + goals = generate_goal_images(self.task_params.map_scales, + self.task_params.map_crop_sizes, + self.task_params.n_ori, goal_dist, + goal_theta, rel_goal_orientation) + for i in range(len(self.task_params.map_scales)): + outs['ego_goal_imgs_{:d}'.format(i)] = np.expand_dims(goals[i], axis=1) + + elif self.task_params.type[:14] == 'to_nearest_obj': + for i in range(len(self.task_params.map_scales)): + num_classes = len(self.task_params.semantic_task.class_map_names) + outs['ego_goal_imgs_{:d}'.format(i)] = np.zeros((self.task_params.batch_size, 1, + self.task_params.map_crop_sizes[i], + self.task_params.map_crop_sizes[i], + self.task_params.goal_channels)) + for i in range(self.task_params.batch_size): + t = target_class[i] + for j in range(len(self.task_params.map_scales)): + outs['ego_goal_imgs_{:d}'.format(j)][i,:,:,:,t] = 1. + + # Incremental locs and theta (for map warping), always in the original scale + # of the map; the subsequent steps in the tf code scale appropriately. + # Scaling is done by just multiplying incremental_locs appropriately. 
+ if self.task_params.outputs.egomotion: + if step_number == 0: + # Zero Ego Motion + incremental_locs = np.zeros((self.task_params.batch_size, 1, 2), dtype=np.float32) + incremental_thetas = np.zeros((self.task_params.batch_size, 1, 1), dtype=np.float32) + else: + previous_nodes = self.task.nodes[self.episode.history[:,step_number-1], :]*1 + loc, _, _, theta = self.get_loc_axis(current_nodes, + delta_theta=self.task.delta_theta, + perturb=perturbs[:,step_number,:]) + previous_loc, _, _, previous_theta = self.get_loc_axis( + previous_nodes, delta_theta=self.task.delta_theta, + perturb=perturbs[:,step_number-1,:]) + + incremental_locs_ = np.reshape(loc-previous_loc, [self.task_params.batch_size, 1, -1]) + + t = -np.pi/2+np.reshape(theta*1, [self.task_params.batch_size, 1, -1]) + incremental_locs = incremental_locs_*1 + incremental_locs[:,:,0] = np.sum(incremental_locs_ * + np.concatenate((np.cos(t), np.sin(t)), + axis=-1), axis=-1) + incremental_locs[:,:,1] = np.sum(incremental_locs_ * + np.concatenate((np.cos(t+np.pi/2), + np.sin(t+np.pi/2)), + axis=-1), axis=-1) + incremental_thetas = np.reshape(theta-previous_theta, + [self.task_params.batch_size, 1, -1]) + outs['incremental_locs'] = incremental_locs + outs['incremental_thetas'] = incremental_thetas + + if self.task_params.outputs.visit_count: + # Output the visit count for this state, how many times has the current + # state been visited, and how far in the history was the last visit + # (except this one) + visit_count = np.zeros((self.task_params.batch_size, 1), dtype=np.int32) + last_visit = -np.ones((self.task_params.batch_size, 1), dtype=np.int32) + if step_number >= 1: + h = self.episode.history[:,:(step_number)] + visit_count[:,0] = np.sum(h == np.array(current_node_ids).reshape([-1,1]), + axis=1) + last_visit[:,0] = np.argmax(h[:,::-1] == np.array(current_node_ids).reshape([-1,1]), + axis=1) + 1 + last_visit[visit_count == 0] = -1 # -1 if not visited. 
+ outs['visit_count'] = np.expand_dims(visit_count, axis=1) + outs['last_visit'] = np.expand_dims(last_visit, axis=1) + return outs + + def get_features_name(self): + f = [] + if self.task_params.outputs.images: + f.append('imgs') + if self.task_params.outputs.rel_goal_loc: + f.append('rel_goal_loc') + if self.task_params.outputs.loc_on_map: + f.append('loc_on_map') + if self.task_params.outputs.gt_dist_to_goal: + f.append('gt_dist_to_goal') + if self.task_params.outputs.ego_maps: + for i in range(len(self.task_params.map_scales)): + f.append('ego_maps_{:d}'.format(i)) + if self.task_params.outputs.readout_maps: + for i in range(len(self.task_params.readout_maps_scales)): + f.append('readout_maps_{:d}'.format(i)) + if self.task_params.outputs.ego_goal_imgs: + for i in range(len(self.task_params.map_scales)): + f.append('ego_goal_imgs_{:d}'.format(i)) + if self.task_params.outputs.egomotion: + f.append('incremental_locs') + f.append('incremental_thetas') + if self.task_params.outputs.visit_count: + f.append('visit_count') + f.append('last_visit') + if self.task_params.outputs.analytical_counts: + for i in range(len(self.task_params.analytical_counts.map_sizes)): + f.append('analytical_counts_{:d}'.format(i)) + if self.task_params.outputs.node_ids: + f.append('node_ids') + f.append('perturbs') + return f + + def pre_features(self, inputs): + if self.task_params.outputs.images: + inputs['imgs'] = image_pre(inputs['imgs'], self.task_params.modalities) + return inputs + +class BuildingMultiplexer(): + def __init__(self, args, task_number): + params = vars(args) + for k in params.keys(): + setattr(self, k, params[k]) + self.task_number = task_number + self._pick_data(task_number) + logging.info('Env Class: %s.', self.env_class) + if self.task_params.task == 'planning': + self._setup_planner() + elif self.task_params.task == 'mapping': + self._setup_mapper() + elif self.task_params.task == 'map+plan': + self._setup_mapper() + else: + logging.error('Undefined task: 
%s', self.task_params.task) + + def _pick_data(self, task_number): + logging.error('Input Building Names: %s', self.building_names) + self.flip = [np.mod(task_number / len(self.building_names), 2) == 1] + id = np.mod(task_number, len(self.building_names)) + self.building_names = [self.building_names[id]] + self.task_params.building_seed = task_number + logging.error('BuildingMultiplexer: Picked Building Name: %s', self.building_names) + self.building_names = self.building_names[0].split('+') + self.flip = [self.flip[0] for _ in self.building_names] + logging.error('BuildingMultiplexer: Picked Building Name: %s', self.building_names) + logging.error('BuildingMultiplexer: Flipping Buildings: %s', self.flip) + logging.error('BuildingMultiplexer: Set building_seed: %d', self.task_params.building_seed) + self.num_buildings = len(self.building_names) + logging.error('BuildingMultiplexer: Num buildings: %d', self.num_buildings) + + def _setup_planner(self): + # Load building env class. + self.buildings = [] + for i, building_name in enumerate(self.building_names): + b = self.env_class(robot=self.robot, env=self.env, + task_params=self.task_params, + building_name=building_name, flip=self.flip[i], + logdir=self.logdir, building_loader=self.dataset) + self.buildings.append(b) + + def _setup_mapper(self): + # Set up the renderer. + cp = self.camera_param + rgb_shader, d_shader = sru.get_shaders(cp.modalities) + r_obj = SwiftshaderRenderer() + r_obj.init_display(width=cp.width, height=cp.height, fov=cp.fov, + z_near=cp.z_near, z_far=cp.z_far, rgb_shader=rgb_shader, + d_shader=d_shader) + self.r_obj = r_obj + r_obj.clear_scene() + + # Load building env class. 
+ self.buildings = [] + wt = [] + for i, building_name in enumerate(self.building_names): + b = self.env_class(robot=self.robot, env=self.env, + task_params=self.task_params, + building_name=building_name, flip=self.flip[i], + logdir=self.logdir, building_loader=self.dataset, + r_obj=r_obj) + wt.append(b.get_weight()) + b.load_building_into_scene() + b.set_building_visibility(False) + self.buildings.append(b) + wt = np.array(wt).astype(np.float32) + wt = wt / np.sum(wt+0.0001) + self.building_sampling_weights = wt + + def sample_building(self, rng): + if self.num_buildings == 1: + building_id = rng.choice(range(len(self.building_names))) + else: + building_id = rng.choice(self.num_buildings, + p=self.building_sampling_weights) + b = self.buildings[building_id] + instances = b._gen_rng(rng) + self._building_id = building_id + return self.buildings[building_id], instances + + def sample_env(self, rngs): + rng = rngs[0]; + if self.num_buildings == 1: + building_id = rng.choice(range(len(self.building_names))) + else: + building_id = rng.choice(self.num_buildings, + p=self.building_sampling_weights) + return self.buildings[building_id] + + def pre(self, inputs): + return self.buildings[self._building_id].pre(inputs) + + def __del__(self): + self.r_obj.clear_scene() + logging.error('Clearing scene.') diff --git a/cognitive_mapping_and_planning/datasets/nav_env_config.py b/cognitive_mapping_and_planning/datasets/nav_env_config.py new file mode 100644 index 0000000000000000000000000000000000000000..3d71c5767c4dc0ed9f05cce5c1790f11ede3778a --- /dev/null +++ b/cognitive_mapping_and_planning/datasets/nav_env_config.py @@ -0,0 +1,127 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Configs for Stanford navigation environment. + +Base config for Stanford navigation environment. +""" +import numpy as np +import src.utils as utils +import datasets.nav_env as nav_env + +def nav_env_base_config(): + """Returns the base config for Stanford navigation environment. + + Returns: + Base config for Stanford navigation environment. + """ + robot = utils.Foo(radius=15, + base=10, + height=140, + sensor_height=120, + camera_elevation_degree=-15) + + env = utils.Foo(padding=10, + resolution=5, + num_point_threshold=2, + valid_min=-10, + valid_max=200, + n_samples_per_face=200) + + camera_param = utils.Foo(width=225, + height=225, + z_near=0.05, + z_far=20.0, + fov=60., + modalities=['rgb'], + img_channels=3) + + data_augment = utils.Foo(lr_flip=0, + delta_angle=0.5, + delta_xy=4, + relight=True, + relight_fast=False, + structured=False) # If True, uses the same perturbation for the whole episode. + + outputs = utils.Foo(images=True, + rel_goal_loc=False, + loc_on_map=True, + gt_dist_to_goal=True, + ego_maps=False, + ego_goal_imgs=False, + egomotion=False, + visit_count=False, + analytical_counts=False, + node_ids=True, + readout_maps=False) + + # class_map_names=['board', 'chair', 'door', 'sofa', 'table'] + class_map_names = ['chair', 'door', 'table'] + semantic_task = utils.Foo(class_map_names=class_map_names, pix_distance=16, + sampling='uniform') + + # Time per iteration for cmp is 0.82 seconds per episode with 3.4s overhead per batch. 
+ task_params = utils.Foo(max_dist=32, + step_size=8, + num_steps=40, + num_actions=4, + batch_size=4, + building_seed=0, + num_goals=1, + img_height=None, + img_width=None, + img_channels=None, + modalities=None, + outputs=outputs, + map_scales=[1.], + map_crop_sizes=[64], + rel_goal_loc_dim=4, + base_class='Building', + task='map+plan', + n_ori=4, + type='room_to_room_many', + data_augment=data_augment, + room_regex='^((?!hallway).)*$', + toy_problem=False, + map_channels=1, + gt_coverage=False, + input_type='maps', + full_information=False, + aux_delta_thetas=[], + semantic_task=semantic_task, + num_history_frames=0, + node_ids_dim=1, + perturbs_dim=4, + map_resize_method='linear_noantialiasing', + readout_maps_channels=1, + readout_maps_scales=[], + readout_maps_crop_sizes=[], + n_views=1, + reward_time_penalty=0.1, + reward_at_goal=1., + discount_factor=0.99, + rejection_sampling_M=100, + min_dist=None) + + navtask_args = utils.Foo( + building_names=['area1_gates_wingA_floor1_westpart'], + env_class=nav_env.VisualNavigationEnv, + robot=robot, + task_params=task_params, + env=env, + camera_param=camera_param, + cache_rooms=True) + return navtask_args + diff --git a/cognitive_mapping_and_planning/matplotlibrc b/cognitive_mapping_and_planning/matplotlibrc new file mode 100644 index 0000000000000000000000000000000000000000..ed5097572ae68680d0c9afdf510968e1c3d175d4 --- /dev/null +++ b/cognitive_mapping_and_planning/matplotlibrc @@ -0,0 +1 @@ +backend : agg diff --git a/cognitive_mapping_and_planning/output/.gitignore b/cognitive_mapping_and_planning/output/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..a767cafbbd864d0baf76530294598e4c2be60a24 --- /dev/null +++ b/cognitive_mapping_and_planning/output/.gitignore @@ -0,0 +1 @@ +* diff --git a/cognitive_mapping_and_planning/output/README.md b/cognitive_mapping_and_planning/output/README.md new file mode 100644 index 
0000000000000000000000000000000000000000..7518c3874390da7e2aa65a89ccdec035ca7610e8 --- /dev/null +++ b/cognitive_mapping_and_planning/output/README.md @@ -0,0 +1,16 @@ +### Pre-Trained Models + +We provide the following pre-trained models: + +Config Name | Checkpoint | Mean Dist. | 50%ile Dist. | 75%ile Dist. | Success %age | +:-: | :-: | :-: | :-: | :-: | :-: | +cmp.lmap_Msc.clip5.sbpd_d_r2r | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_d_r2r.tar) | 4.79 | 0 | 1 | 78.9 | +cmp.lmap_Msc.clip5.sbpd_rgb_r2r | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_rgb_r2r.tar) | 7.74 | 0 | 14 | 62.4 | +cmp.lmap_Msc.clip5.sbpd_d_ST | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_d_ST.tar) | 10.67 | 9 | 19 | 39.7 | +cmp.lmap_Msc.clip5.sbpd_rgb_ST | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_rgb_ST.tar) | 11.27 | 10 | 19 | 35.6 | +cmp.lmap_Msc.clip5.sbpd_d_r2r_h0_64_80 | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/cmp.lmap_Msc.clip5.sbpd_d_r2r_h0_64_80.tar) | 11.6 | 0 | 19 | 66.9 | +bl.v2.noclip.sbpd_d_r2r | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_d_r2r.tar) | 5.90 | 0 | 6 | 71.2 | +bl.v2.noclip.sbpd_rgb_r2r | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_rgb_r2r.tar) | 10.21 | 1 | 21 | 53.4 | +bl.v2.noclip.sbpd_d_ST | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_d_ST.tar) | 13.29 | 14 | 23 | 28.0 | +bl.v2.noclip.sbpd_rgb_ST | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_rgb_ST.tar) | 13.37 | 13 | 20 | 24.2 | 
+bl.v2.noclip.sbpd_d_r2r_h0_64_80 | [ckpt](http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/bl.v2.noclip.sbpd_d_r2r_h0_64_80.tar) | 15.30 | 0 | 29 | 57.9 | diff --git a/cognitive_mapping_and_planning/patches/GLES2_2_0.py.patch b/cognitive_mapping_and_planning/patches/GLES2_2_0.py.patch new file mode 100644 index 0000000000000000000000000000000000000000..de1be442d5b9fff44862d37b9329e32face2b663 --- /dev/null +++ b/cognitive_mapping_and_planning/patches/GLES2_2_0.py.patch @@ -0,0 +1,14 @@ +10c10 +< from OpenGL import platform, constant, arrays +--- +> from OpenGL import platform, constant, arrays, contextdata +249a250 +> from OpenGL._bytes import _NULL_8_BYTE +399c400 +< array = ArrayDatatype.asArray( pointer, type ) +--- +> array = arrays.ArrayDatatype.asArray( pointer, type ) +405c406 +< ArrayDatatype.voidDataPointer( array ) +--- +> arrays.ArrayDatatype.voidDataPointer( array ) diff --git a/cognitive_mapping_and_planning/patches/apply_patches.sh b/cognitive_mapping_and_planning/patches/apply_patches.sh new file mode 100644 index 0000000000000000000000000000000000000000..4a786058258decdfb381eff25684183d92788ebe --- /dev/null +++ b/cognitive_mapping_and_planning/patches/apply_patches.sh @@ -0,0 +1,18 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +echo $VIRTUAL_ENV +patch $VIRTUAL_ENV/local/lib/python2.7/site-packages/OpenGL/GLES2/VERSION/GLES2_2_0.py patches/GLES2_2_0.py.patch +patch $VIRTUAL_ENV/local/lib/python2.7/site-packages/OpenGL/platform/ctypesloader.py patches/ctypesloader.py.patch diff --git a/cognitive_mapping_and_planning/patches/ctypesloader.py.patch b/cognitive_mapping_and_planning/patches/ctypesloader.py.patch new file mode 100644 index 0000000000000000000000000000000000000000..27dd43b18010dc5fdcd605b9a5d470abaa19151f --- /dev/null +++ b/cognitive_mapping_and_planning/patches/ctypesloader.py.patch @@ -0,0 +1,15 @@ +45c45,46 +< return dllType( name, mode ) +--- +> print './' + name +> return dllType( './' + name, mode ) +47,48c48,53 +< err.args += (name,fullName) +< raise +--- +> try: +> print name +> return dllType( name, mode ) +> except: +> err.args += (name,fullName) +> raise diff --git a/cognitive_mapping_and_planning/render/__init__.py b/cognitive_mapping_and_planning/render/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cognitive_mapping_and_planning/render/depth_rgb_encoded.fp b/cognitive_mapping_and_planning/render/depth_rgb_encoded.fp new file mode 100644 index 0000000000000000000000000000000000000000..23e93d27f585e93896799f177888e9c50fa03eed --- /dev/null +++ b/cognitive_mapping_and_planning/render/depth_rgb_encoded.fp @@ -0,0 +1,30 @@ +// This shader computes per-pixel depth (-z coordinate in the camera space, or +// orthogonal distance to the camera plane). The result is multiplied by the +// `kFixedPointFraction` constant and is encoded to RGB channels as an integer +// (R being the least significant byte). 
+ +#ifdef GL_ES +#ifdef GL_FRAGMENT_PRECISION_HIGH +precision highp float; +#else +precision mediump float; +#endif +#endif + +const float kFixedPointFraction = 1000.0; + +varying float vDepth; + +void main(void) { + float d = vDepth; + + // Encode the depth to RGB. + d *= (kFixedPointFraction / 255.0); + gl_FragColor.r = mod(d, 1.0); + d = (d - gl_FragColor.r) / 255.0; + gl_FragColor.g = mod(d, 1.0); + d = (d - gl_FragColor.g) / 255.0; + gl_FragColor.b = mod(d, 1.0); + + gl_FragColor.a = 1.0; +} diff --git a/cognitive_mapping_and_planning/render/depth_rgb_encoded.vp b/cognitive_mapping_and_planning/render/depth_rgb_encoded.vp new file mode 100644 index 0000000000000000000000000000000000000000..2db74f14aa7f253b8f544ec1ab519129f13426a0 --- /dev/null +++ b/cognitive_mapping_and_planning/render/depth_rgb_encoded.vp @@ -0,0 +1,15 @@ +uniform mat4 uViewMatrix; +uniform mat4 uProjectionMatrix; + +attribute vec3 aPosition; + +varying float vDepth; + +void main(void) { + vec4 worldPosition = vec4(aPosition, 1.0); + vec4 viewPosition = uViewMatrix * worldPosition; + gl_Position = uProjectionMatrix * viewPosition; + + // Orthogonal depth is simply -z in the camera space. 
+ vDepth = -viewPosition.z; +} diff --git a/cognitive_mapping_and_planning/render/rgb_flat_color.fp b/cognitive_mapping_and_planning/render/rgb_flat_color.fp new file mode 100644 index 0000000000000000000000000000000000000000..c8c24d76103793d9cfa9166517177cb332d1a92c --- /dev/null +++ b/cognitive_mapping_and_planning/render/rgb_flat_color.fp @@ -0,0 +1,11 @@ +precision highp float; +varying vec4 vColor; +varying vec2 vTextureCoord; + +uniform sampler2D uTexture; + +void main(void) { + vec4 color = vColor; + color = texture2D(uTexture, vTextureCoord); + gl_FragColor = color; +} diff --git a/cognitive_mapping_and_planning/render/rgb_flat_color.vp b/cognitive_mapping_and_planning/render/rgb_flat_color.vp new file mode 100644 index 0000000000000000000000000000000000000000..ebc79173405f7449921fd40f778fe3695aab5ea8 --- /dev/null +++ b/cognitive_mapping_and_planning/render/rgb_flat_color.vp @@ -0,0 +1,18 @@ +uniform mat4 uViewMatrix; +uniform mat4 uProjectionMatrix; +uniform vec4 uColor; + +attribute vec4 aColor; +attribute vec3 aPosition; +attribute vec2 aTextureCoord; + +varying vec4 vColor; +varying vec2 vTextureCoord; + +void main(void) { + vec4 worldPosition = vec4(aPosition, 1.0); + gl_Position = uProjectionMatrix * (uViewMatrix * worldPosition); + + vColor = aColor * uColor; + vTextureCoord = aTextureCoord; +} diff --git a/cognitive_mapping_and_planning/render/swiftshader_renderer.py b/cognitive_mapping_and_planning/render/swiftshader_renderer.py new file mode 100644 index 0000000000000000000000000000000000000000..74b1be72c11a2877231a66886d02babfd4793ce8 --- /dev/null +++ b/cognitive_mapping_and_planning/render/swiftshader_renderer.py @@ -0,0 +1,427 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r"""Implements loading and rendering of meshes. Contains 2 classes: + Shape: Class that exposes high level functions for loading and manipulating + shapes. This currently is bound to assimp + (https://github.com/assimp/assimp). If you want to interface to a different + library, reimplement this class with bindings to your mesh loading library. + + SwiftshaderRenderer: Class that renders Shapes. Currently this uses python + bindings to OpenGL (EGL), bindings to an alternate renderer may be implemented + here. 
+""" + +import numpy as np, os +import cv2, ctypes, logging, os, numpy as np +import pyassimp as assimp +from OpenGL.GLES2 import * +from OpenGL.EGL import * +import src.rotation_utils as ru + +__version__ = 'swiftshader_renderer' + +def get_shaders(modalities): + rgb_shader = 'rgb_flat_color' if 'rgb' in modalities else None + d_shader = 'depth_rgb_encoded' if 'depth' in modalities else None + return rgb_shader, d_shader + +def sample_points_on_faces(vs, fs, rng, n_samples_per_face): + idx = np.repeat(np.arange(fs.shape[0]), n_samples_per_face) + + r = rng.rand(idx.size, 2) + r1 = r[:,:1]; r2 = r[:,1:]; sqrt_r1 = np.sqrt(r1); + + v1 = vs[fs[idx, 0], :]; v2 = vs[fs[idx, 1], :]; v3 = vs[fs[idx, 2], :]; + pts = (1-sqrt_r1)*v1 + sqrt_r1*(1-r2)*v2 + sqrt_r1*r2*v3 + + v1 = vs[fs[:,0], :]; v2 = vs[fs[:, 1], :]; v3 = vs[fs[:, 2], :]; + ar = 0.5*np.sqrt(np.sum(np.cross(v1-v3, v2-v3)**2, 1)) + + return pts, ar, idx + +class Shape(): + def get_pyassimp_load_options(self): + load_flags = assimp.postprocess.aiProcess_Triangulate; + load_flags = load_flags | assimp.postprocess.aiProcess_SortByPType; + load_flags = load_flags | assimp.postprocess.aiProcess_OptimizeMeshes; + load_flags = load_flags | assimp.postprocess.aiProcess_RemoveRedundantMaterials; + load_flags = load_flags | assimp.postprocess.aiProcess_FindDegenerates; + load_flags = load_flags | assimp.postprocess.aiProcess_GenSmoothNormals; + load_flags = load_flags | assimp.postprocess.aiProcess_JoinIdenticalVertices; + load_flags = load_flags | assimp.postprocess.aiProcess_ImproveCacheLocality; + load_flags = load_flags | assimp.postprocess.aiProcess_GenUVCoords; + load_flags = load_flags | assimp.postprocess.aiProcess_FindInvalidData; + return load_flags + + def __init__(self, obj_file, material_file=None, load_materials=True, + name_prefix='', name_suffix=''): + if material_file is not None: + logging.error('Ignoring material file input, reading them off obj file.') + load_flags = self.get_pyassimp_load_options() + 
scene = assimp.load(obj_file, processing=load_flags) + filter_ind = self._filter_triangles(scene.meshes) + self.meshes = [scene.meshes[i] for i in filter_ind] + for m in self.meshes: + m.name = name_prefix + m.name + name_suffix + + dir_name = os.path.dirname(obj_file) + # Load materials + materials = None + if load_materials: + materials = [] + for m in self.meshes: + file_name = os.path.join(dir_name, m.material.properties[('file', 1)]) + assert(os.path.exists(file_name)), \ + 'Texture file {:s} does not exist.'.format(file_name) + img_rgb = cv2.imread(file_name)[::-1,:,::-1] + if img_rgb.shape[0] != img_rgb.shape[1]: + logging.warn('Texture image not square.') + sz = np.maximum(img_rgb.shape[0], img_rgb.shape[1]) + sz = int(np.power(2., np.ceil(np.log2(sz)))) + img_rgb = cv2.resize(img_rgb, (sz,sz), interpolation=cv2.INTER_LINEAR) + else: + sz = img_rgb.shape[0] + sz_ = int(np.power(2., np.ceil(np.log2(sz)))) + if sz != sz_: + logging.warn('Texture image not square of power of 2 size. 
' + + 'Changing size from %d to %d.', sz, sz_) + sz = sz_ + img_rgb = cv2.resize(img_rgb, (sz,sz), interpolation=cv2.INTER_LINEAR) + materials.append(img_rgb) + self.scene = scene + self.materials = materials + + def _filter_triangles(self, meshes): + select = [] + for i in range(len(meshes)): + if meshes[i].primitivetypes == 4: + select.append(i) + return select + + def flip_shape(self): + for m in self.meshes: + m.vertices[:,1] = -m.vertices[:,1] + bb = m.faces*1 + bb[:,1] = m.faces[:,2] + bb[:,2] = m.faces[:,1] + m.faces = bb + # m.vertices[:,[0,1]] = m.vertices[:,[1,0]] + + def get_vertices(self): + vs = [] + for m in self.meshes: + vs.append(m.vertices) + vss = np.concatenate(vs, axis=0) + return vss, vs + + def get_faces(self): + vs = [] + for m in self.meshes: + v = m.faces + vs.append(v) + return vs + + def get_number_of_meshes(self): + return len(self.meshes) + + def scale(self, sx=1., sy=1., sz=1.): + pass + + def sample_points_on_face_of_shape(self, i, n_samples_per_face, sc): + v = self.meshes[i].vertices*sc + f = self.meshes[i].faces + p, face_areas, face_idx = sample_points_on_faces( + v, f, np.random.RandomState(0), n_samples_per_face) + return p, face_areas, face_idx + + def __del__(self): + scene = self.scene + assimp.release(scene) + +class SwiftshaderRenderer(): + def __init__(self): + self.entities = {} + + def init_display(self, width, height, fov, z_near, z_far, rgb_shader, + d_shader): + self.init_renderer_egl(width, height) + dir_path = os.path.dirname(os.path.realpath(__file__)) + if d_shader is not None and rgb_shader is not None: + logging.fatal('Does not support setting both rgb_shader and d_shader.') + + if d_shader is not None: + assert rgb_shader is None + shader = d_shader + self.modality = 'depth' + + if rgb_shader is not None: + assert d_shader is None + shader = rgb_shader + self.modality = 'rgb' + + self.create_shaders(os.path.join(dir_path, shader+'.vp'), + os.path.join(dir_path, shader + '.fp')) + aspect = width*1./(height*1.) 
+ self.set_camera(fov, z_near, z_far, aspect) + + def init_renderer_egl(self, width, height): + major,minor = ctypes.c_long(),ctypes.c_long() + logging.info('init_renderer_egl: EGL_DEFAULT_DISPLAY: %s', EGL_DEFAULT_DISPLAY) + + egl_display = eglGetDisplay(EGL_DEFAULT_DISPLAY) + logging.info('init_renderer_egl: egl_display: %s', egl_display) + + eglInitialize(egl_display, major, minor) + logging.info('init_renderer_egl: EGL_OPENGL_API, EGL_OPENGL_ES_API: %s, %s', + EGL_OPENGL_API, EGL_OPENGL_ES_API) + eglBindAPI(EGL_OPENGL_ES_API) + + num_configs = ctypes.c_long() + configs = (EGLConfig*1)() + local_attributes = [EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8, + EGL_DEPTH_SIZE, 16, EGL_SURFACE_TYPE, EGL_PBUFFER_BIT, + EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT, EGL_NONE,] + logging.error('init_renderer_egl: local attributes: %s', local_attributes) + local_attributes = arrays.GLintArray.asArray(local_attributes) + success = eglChooseConfig(egl_display, local_attributes, configs, 1, num_configs) + logging.error('init_renderer_egl: eglChooseConfig success, num_configs: %d, %d', success, num_configs.value) + egl_config = configs[0] + + + context_attributes = [EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE] + context_attributes = arrays.GLintArray.asArray(context_attributes) + egl_context = eglCreateContext(egl_display, egl_config, EGL_NO_CONTEXT, context_attributes) + + buffer_attributes = [EGL_WIDTH, width, EGL_HEIGHT, height, EGL_NONE] + buffer_attributes = arrays.GLintArray.asArray(buffer_attributes) + egl_surface = eglCreatePbufferSurface(egl_display, egl_config, buffer_attributes) + + + eglMakeCurrent(egl_display, egl_surface, egl_surface, egl_context) + logging.error("init_renderer_egl: egl_display: %s egl_surface: %s, egl_context: %s", egl_display, egl_surface, egl_context) + + glViewport(0, 0, width, height); + + self.egl_display = egl_display + self.egl_surface = egl_surface + self.egl_config = egl_config + self.egl_mapping = {} + self.render_timer = None + 
self.load_timer = None + self.height = height + self.width = width + + def create_shaders(self, v_shader_file, f_shader_file): + v_shader = glCreateShader(GL_VERTEX_SHADER) + with open(v_shader_file, 'r') as f: + ls = '' + for l in f: + ls = ls + l + glShaderSource(v_shader, ls) + glCompileShader(v_shader); + assert(glGetShaderiv(v_shader, GL_COMPILE_STATUS) == 1) + + f_shader = glCreateShader(GL_FRAGMENT_SHADER) + with open(f_shader_file, 'r') as f: + ls = '' + for l in f: + ls = ls + l + glShaderSource(f_shader, ls) + glCompileShader(f_shader); + assert(glGetShaderiv(f_shader, GL_COMPILE_STATUS) == 1) + + egl_program = glCreateProgram(); + assert(egl_program) + glAttachShader(egl_program, v_shader) + glAttachShader(egl_program, f_shader) + glLinkProgram(egl_program); + assert(glGetProgramiv(egl_program, GL_LINK_STATUS) == 1) + glUseProgram(egl_program) + + glBindAttribLocation(egl_program, 0, "aPosition") + glBindAttribLocation(egl_program, 1, "aColor") + glBindAttribLocation(egl_program, 2, "aTextureCoord") + + self.egl_program = egl_program + self.egl_mapping['vertexs'] = 0 + self.egl_mapping['vertexs_color'] = 1 + self.egl_mapping['vertexs_tc'] = 2 + + glClearColor(0.0, 0.0, 0.0, 1.0); + # glEnable(GL_CULL_FACE); glCullFace(GL_BACK); + glEnable(GL_DEPTH_TEST); + + glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT) + + def set_camera(self, fov_vertical, z_near, z_far, aspect): + width = 2*np.tan(np.deg2rad(fov_vertical)/2.0)*z_near*aspect; + height = 2*np.tan(np.deg2rad(fov_vertical)/2.0)*z_near; + egl_program = self.egl_program + c = np.eye(4, dtype=np.float32) + c[3,3] = 0 + c[3,2] = -1 + c[2,2] = -(z_near+z_far)/(z_far-z_near) + c[2,3] = -2.0*(z_near*z_far)/(z_far-z_near) + c[0,0] = 2.0*z_near/width + c[1,1] = 2.0*z_near/height + c = c.T + + projection_matrix_o = glGetUniformLocation(egl_program, 'uProjectionMatrix') + projection_matrix = np.eye(4, dtype=np.float32) + projection_matrix[...] 
= c + projection_matrix = np.reshape(projection_matrix, (-1)) + glUniformMatrix4fv(projection_matrix_o, 1, GL_FALSE, projection_matrix) + + + def load_default_object(self): + v = np.array([[0.0, 0.5, 0.0, 1.0, 1.0, 0.0, 1.0], + [-0.5, -0.5, 0.0, 1.0, 0.0, 1.0, 1.0], + [0.5, -0.5, 0.0, 1.0, 1.0, 1.0, 1.0]], dtype=np.float32) + v = np.concatenate((v,v+0.1), axis=0) + v = np.ascontiguousarray(v, dtype=np.float32) + + vbo = glGenBuffers(1) + glBindBuffer (GL_ARRAY_BUFFER, vbo) + glBufferData (GL_ARRAY_BUFFER, v.dtype.itemsize*v.size, v, GL_STATIC_DRAW) + glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 28, ctypes.c_void_p(0)) + glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, 28, ctypes.c_void_p(12)) + glEnableVertexAttribArray(0); + glEnableVertexAttribArray(1); + + self.num_to_render = 6; + + def _actual_render(self): + for entity_id, entity in self.entities.iteritems(): + if entity['visible']: + vbo = entity['vbo'] + tbo = entity['tbo'] + num = entity['num'] + + glBindBuffer(GL_ARRAY_BUFFER, vbo) + glVertexAttribPointer(self.egl_mapping['vertexs'], 3, GL_FLOAT, GL_FALSE, + 20, ctypes.c_void_p(0)) + glVertexAttribPointer(self.egl_mapping['vertexs_tc'], 2, GL_FLOAT, + GL_FALSE, 20, ctypes.c_void_p(12)) + glEnableVertexAttribArray(self.egl_mapping['vertexs']); + glEnableVertexAttribArray(self.egl_mapping['vertexs_tc']); + + glBindTexture(GL_TEXTURE_2D, tbo) + glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR); + glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR); + glDrawArrays(GL_TRIANGLES, 0, num) + + def render(self, take_screenshot=False, output_type=0): + # self.render_timer.tic() + self._actual_render() + # self.render_timer.toc(log_at=1000, log_str='render timer', type='time') + + np_rgb_img = None + np_d_img = None + c = 1000. 
+ if take_screenshot: + if self.modality == 'rgb': + screenshot_rgba = np.zeros((self.height, self.width, 4), dtype=np.uint8) + glReadPixels(0, 0, self.width, self.height, GL_RGBA, GL_UNSIGNED_BYTE, screenshot_rgba) + np_rgb_img = screenshot_rgba[::-1,:,:3]; + + if self.modality == 'depth': + screenshot_d = np.zeros((self.height, self.width, 4), dtype=np.uint8) + glReadPixels(0, 0, self.width, self.height, GL_RGBA, GL_UNSIGNED_BYTE, screenshot_d) + np_d_img = screenshot_d[::-1,:,:3]; + np_d_img = np_d_img[:,:,2]*(255.*255./c) + np_d_img[:,:,1]*(255./c) + np_d_img[:,:,0]*(1./c) + np_d_img = np_d_img.astype(np.float32) + np_d_img[np_d_img == 0] = np.NaN + np_d_img = np_d_img[:,:,np.newaxis] + + glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT) + return np_rgb_img, np_d_img + + def _load_mesh_into_gl(self, mesh, material): + vvt = np.concatenate((mesh.vertices, mesh.texturecoords[0,:,:2]), axis=1) + vvt = np.ascontiguousarray(vvt[mesh.faces.reshape((-1)),:], dtype=np.float32) + num = vvt.shape[0] + vvt = np.reshape(vvt, (-1)) + + vbo = glGenBuffers(1) + glBindBuffer(GL_ARRAY_BUFFER, vbo) + glBufferData(GL_ARRAY_BUFFER, vvt.dtype.itemsize*vvt.size, vvt, GL_STATIC_DRAW) + + tbo = glGenTextures(1) + glBindTexture(GL_TEXTURE_2D, tbo) + glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, material.shape[1], + material.shape[0], 0, GL_RGB, GL_UNSIGNED_BYTE, + np.reshape(material, (-1))) + return num, vbo, tbo + + def load_shapes(self, shapes): + entities = self.entities + entity_ids = [] + for i, shape in enumerate(shapes): + for j in range(len(shape.meshes)): + name = shape.meshes[j].name + assert name not in entities, '{:s} entity already exists.'.format(name) + num, vbo, tbo = self._load_mesh_into_gl(shape.meshes[j], shape.materials[j]) + entities[name] = {'num': num, 'vbo': vbo, 'tbo': tbo, 'visible': False} + entity_ids.append(name) + return entity_ids + + def set_entity_visible(self, entity_ids, visibility): + for entity_id in entity_ids: + self.entities[entity_id]['visible'] = 
visibility + + def position_camera(self, camera_xyz, lookat_xyz, up): + camera_xyz = np.array(camera_xyz) + lookat_xyz = np.array(lookat_xyz) + up = np.array(up) + lookat_to = lookat_xyz - camera_xyz + lookat_from = np.array([0, 1., 0.]) + up_from = np.array([0, 0., 1.]) + up_to = up * 1. + # np.set_printoptions(precision=2, suppress=True) + # print up_from, lookat_from, up_to, lookat_to + r = ru.rotate_camera_to_point_at(up_from, lookat_from, up_to, lookat_to) + R = np.eye(4, dtype=np.float32) + R[:3,:3] = r + + t = np.eye(4, dtype=np.float32) + t[:3,3] = -camera_xyz + + view_matrix = np.dot(R.T, t) + flip_yz = np.eye(4, dtype=np.float32) + flip_yz[1,1] = 0; flip_yz[2,2] = 0; flip_yz[1,2] = 1; flip_yz[2,1] = -1; + view_matrix = np.dot(flip_yz, view_matrix) + view_matrix = view_matrix.T + # print np.concatenate((R, t, view_matrix), axis=1) + view_matrix = np.reshape(view_matrix, (-1)) + view_matrix_o = glGetUniformLocation(self.egl_program, 'uViewMatrix') + glUniformMatrix4fv(view_matrix_o, 1, GL_FALSE, view_matrix) + return None, None #camera_xyz, q + + def clear_scene(self): + keys = self.entities.keys() + for entity_id in keys: + entity = self.entities.pop(entity_id, None) + vbo = entity['vbo'] + tbo = entity['tbo'] + num = entity['num'] + glDeleteBuffers(1, [vbo]) + glDeleteTextures(1, [tbo]) + + def __del__(self): + self.clear_scene() + eglMakeCurrent(self.egl_display, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT) + eglDestroySurface(self.egl_display, self.egl_surface) + eglTerminate(self.egl_display) diff --git a/cognitive_mapping_and_planning/requirements.txt b/cognitive_mapping_and_planning/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..306c807a6c9fd9404afa1c05108e5e835e84edc6 --- /dev/null +++ b/cognitive_mapping_and_planning/requirements.txt @@ -0,0 +1,9 @@ +numpy +pillow +PyOpenGL +PyOpenGL-accelerate +six +networkx +scikit-image +scipy +opencv-python diff --git a/cognitive_mapping_and_planning/scripts/__init__.py 
b/cognitive_mapping_and_planning/scripts/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cognitive_mapping_and_planning/scripts/script_distill.py b/cognitive_mapping_and_planning/scripts/script_distill.py new file mode 100644 index 0000000000000000000000000000000000000000..010c690412ed28011146ab44109dc099d02324e7 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_distill.py @@ -0,0 +1,177 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r""" Script to setup the grid moving agent. 
+ +blaze build --define=ION_GFX_OGLES20=1 -c opt --copt=-mavx --config=cuda_clang \ + learning/brain/public/tensorflow_std_server{,_gpu} \ + experimental/users/saurabhgupta/navigation/cmp/scripts/script_distill.par \ + experimental/users/saurabhgupta/navigation/cmp/scripts/script_distill + + +./blaze-bin/experimental/users/saurabhgupta/navigation/cmp/scripts/script_distill \ + --logdir=/cns/iq-d/home/saurabhgupta/output/stanford-distill/local/v0/ \ + --config_name 'v0+train' --gfs_user robot-intelligence-gpu + +""" +import sys, os, numpy as np +import copy +import argparse, pprint +import time +import cProfile + + +import tensorflow as tf +from tensorflow.contrib import slim +from tensorflow.python.framework import ops +from tensorflow.contrib.framework.python.ops import variables + +import logging +from tensorflow.python.platform import gfile +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +from cfgs import config_distill +from tfcode import tf_utils +import src.utils as utils +import src.file_utils as fu +import tfcode.distillation as distill +import datasets.nav_env as nav_env + +FLAGS = flags.FLAGS + +flags.DEFINE_string('master', 'local', + 'The name of the TensorFlow master to use.') +flags.DEFINE_integer('ps_tasks', 0, 'The number of parameter servers. If the ' + 'value is 0, then the parameters are handled locally by ' + 'the worker.') +flags.DEFINE_integer('task', 0, 'The Task ID. 
This value is used when training ' + 'with multiple workers to identify each worker.') + +flags.DEFINE_integer('num_workers', 1, '') + +flags.DEFINE_string('config_name', '', '') + +flags.DEFINE_string('logdir', '', '') + +def main(_): + args = config_distill.get_args_for_config(FLAGS.config_name) + args.logdir = FLAGS.logdir + args.solver.num_workers = FLAGS.num_workers + args.solver.task = FLAGS.task + args.solver.ps_tasks = FLAGS.ps_tasks + args.solver.master = FLAGS.master + + args.buildinger.env_class = nav_env.MeshMapper + fu.makedirs(args.logdir) + args.buildinger.logdir = args.logdir + R = nav_env.get_multiplexor_class(args.buildinger, args.solver.task) + + if False: + pr = cProfile.Profile() + pr.enable() + rng = np.random.RandomState(0) + for i in range(1): + b, instances_perturbs = R.sample_building(rng) + inputs = b.worker(*(instances_perturbs)) + for j in range(inputs['imgs'].shape[0]): + p = os.path.join('tmp', '{:d}.png'.format(j)) + img = inputs['imgs'][j,0,:,:,:3]*1 + img = (img).astype(np.uint8) + fu.write_image(p, img) + print(inputs['imgs'].shape) + inputs = R.pre(inputs) + pr.disable() + pr.print_stats(2) + + if args.control.train: + if not gfile.Exists(args.logdir): + gfile.MakeDirs(args.logdir) + + m = utils.Foo() + m.tf_graph = tf.Graph() + + config = tf.ConfigProto() + config.device_count['GPU'] = 1 + config.gpu_options.allow_growth = True + config.gpu_options.per_process_gpu_memory_fraction = 0.8 + + with m.tf_graph.as_default(): + with tf.device(tf.train.replica_device_setter(args.solver.ps_tasks)): + m = distill.setup_to_run(m, args, is_training=True, + batch_norm_is_training=True) + + train_step_kwargs = distill.setup_train_step_kwargs_mesh( + m, R, os.path.join(args.logdir, 'train'), + rng_seed=args.solver.task, is_chief=args.solver.task==0, iters=1, + train_display_interval=args.summary.display_interval) + + final_loss = slim.learning.train( + train_op=m.train_op, + logdir=args.logdir, + master=args.solver.master, + 
is_chief=args.solver.task == 0, + number_of_steps=args.solver.max_steps, + train_step_fn=tf_utils.train_step_custom, + train_step_kwargs=train_step_kwargs, + global_step=m.global_step_op, + init_op=m.init_op, + init_fn=m.init_fn, + sync_optimizer=m.sync_optimizer, + saver=m.saver_op, + summary_op=None, session_config=config) + + if args.control.test: + m = utils.Foo() + m.tf_graph = tf.Graph() + checkpoint_dir = os.path.join(format(args.logdir)) + with m.tf_graph.as_default(): + m = distill.setup_to_run(m, args, is_training=False, + batch_norm_is_training=args.control.force_batchnorm_is_training_at_test) + + train_step_kwargs = distill.setup_train_step_kwargs_mesh( + m, R, os.path.join(args.logdir, args.control.test_name), + rng_seed=args.solver.task+1, is_chief=args.solver.task==0, + iters=args.summary.test_iters, train_display_interval=None) + + sv = slim.learning.supervisor.Supervisor( + graph=ops.get_default_graph(), logdir=None, init_op=m.init_op, + summary_op=None, summary_writer=None, global_step=None, saver=m.saver_op) + + last_checkpoint = None + while True: + last_checkpoint = slim.evaluation.wait_for_new_checkpoint(checkpoint_dir, last_checkpoint) + checkpoint_iter = int(os.path.basename(last_checkpoint).split('-')[1]) + start = time.time() + logging.info('Starting evaluation at %s using checkpoint %s.', + time.strftime('%Y-%m-%d-%H:%M:%S', time.localtime()), + last_checkpoint) + + config = tf.ConfigProto() + config.device_count['GPU'] = 1 + config.gpu_options.allow_growth = True + config.gpu_options.per_process_gpu_memory_fraction = 0.8 + + with sv.managed_session(args.solver.master,config=config, + start_standard_services=False) as sess: + sess.run(m.init_op) + sv.saver.restore(sess, last_checkpoint) + sv.start_queue_runners(sess) + vals, _ = tf_utils.train_step_custom( + sess, None, m.global_step_op, train_step_kwargs, mode='val') + if checkpoint_iter >= args.solver.max_steps: + break + +if __name__ == '__main__': + app.run() diff --git 
a/cognitive_mapping_and_planning/scripts/script_download_init_models.sh b/cognitive_mapping_and_planning/scripts/script_download_init_models.sh new file mode 100644 index 0000000000000000000000000000000000000000..1900bd0b03566d29dac8a8de5f4fce623be98a92 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_download_init_models.sh @@ -0,0 +1,18 @@ +# Script to download models to initialize the RGB and D models for training. We +# use ResNet-v2-50 for both modalities. + +mkdir -p data/init_models +cd data/init_models + +# RGB Models are initialized by pre-training on ImageNet. +mkdir -p resnet_v2_50 +RGB_URL="http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz" +wget $RGB_URL +tar -xf resnet_v2_50_2017_04_14.tar.gz -C resnet_v2_50 + +# Depth models are initialized by distilling the RGB model to D images using +# Cross-Modal Distillation (https://arxiv.org/abs/1507.00448). +mkdir -p distill_rgb_to_d_resnet_v2_50 +D_URL="http://download.tensorflow.org/models/cognitive_mapping_and_planning/2017_04_16/distill_rgb_to_d_resnet_v2_50.tar" +wget $D_URL +tar -xf distill_rgb_to_d_resnet_v2_50.tar -C distill_rgb_to_d_resnet_v2_50 diff --git a/cognitive_mapping_and_planning/scripts/script_env_vis.py b/cognitive_mapping_and_planning/scripts/script_env_vis.py new file mode 100644 index 0000000000000000000000000000000000000000..03222dfab3f25d2eecec8c9a66903999b194b405 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_env_vis.py @@ -0,0 +1,186 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""A simple python function to walk in the environments that we have created. +PYTHONPATH='.' PYOPENGL_PLATFORM=egl python scripts/script_env_vis.py \ + --dataset_name sbpd --building_name area3 +""" +import sys +import numpy as np +import matplotlib +matplotlib.use('TkAgg') +from PIL import ImageTk, Image +import Tkinter as tk +import logging +from tensorflow.python.platform import app +from tensorflow.python.platform import flags + +import datasets.nav_env_config as nec +import datasets.nav_env as nav_env +import cv2 +from datasets import factory +import render.swiftshader_renderer as renderer + +SwiftshaderRenderer = renderer.SwiftshaderRenderer +VisualNavigationEnv = nav_env.VisualNavigationEnv + +FLAGS = flags.FLAGS +flags.DEFINE_string('dataset_name', 'sbpd', 'Name of the dataset.') +flags.DEFINE_float('fov', 60., 'Field of view') +flags.DEFINE_integer('image_size', 512, 'Size of the image.') +flags.DEFINE_string('building_name', '', 'Name of the building.') + +def get_args(): + navtask = nec.nav_env_base_config() + navtask.task_params.type = 'rng_rejection_sampling_many' + navtask.task_params.rejection_sampling_M = 2000 + navtask.task_params.min_dist = 10 + sz = FLAGS.image_size + navtask.camera_param.fov = FLAGS.fov + navtask.camera_param.height = sz + navtask.camera_param.width = sz + navtask.task_params.img_height = sz + navtask.task_params.img_width = sz + + # navtask.task_params.semantic_task.class_map_names = ['chair', 'door', 'table'] + #
navtask.task_params.type = 'to_nearest_obj_acc' + + logging.info('navtask: %s', navtask) + return navtask + +def load_building(dataset_name, building_name): + dataset = factory.get_dataset(dataset_name) + + navtask = get_args() + cp = navtask.camera_param + rgb_shader, d_shader = renderer.get_shaders(cp.modalities) + r_obj = SwiftshaderRenderer() + r_obj.init_display(width=cp.width, height=cp.height, + fov=cp.fov, z_near=cp.z_near, z_far=cp.z_far, + rgb_shader=rgb_shader, d_shader=d_shader) + r_obj.clear_scene() + b = VisualNavigationEnv(robot=navtask.robot, env=navtask.env, + task_params=navtask.task_params, + building_name=building_name, flip=False, + logdir=None, building_loader=dataset, + r_obj=r_obj) + b.load_building_into_scene() + b.set_building_visibility(False) + return b + +def walk_through(b): + # init agent at a random location in the environment. + init_env_state = b.reset([np.random.RandomState(0), np.random.RandomState(0)]) + + global current_node + rng = np.random.RandomState(0) + current_node = rng.choice(b.task.nodes.shape[0]) + + root = tk.Tk() + image = b.render_nodes(b.task.nodes[[current_node],:])[0] + print image.shape + image = image.astype(np.uint8) + im = Image.fromarray(image) + im = ImageTk.PhotoImage(im) + panel = tk.Label(root, image=im) + + map_size = b.traversible.shape + sc = np.max(map_size)/256. 
+ loc = np.array([[map_size[1]/2., map_size[0]/2.]]) + x_axis = np.zeros_like(loc); x_axis[:,1] = sc + y_axis = np.zeros_like(loc); y_axis[:,0] = -sc + cum_fs, cum_valid = nav_env.get_map_to_predict(loc, x_axis, y_axis, + map=b.traversible*1., + map_size=256) + cum_fs = cum_fs[0] + cum_fs = cv2.applyColorMap((cum_fs*255).astype(np.uint8), cv2.COLORMAP_JET) + im = Image.fromarray(cum_fs) + im = ImageTk.PhotoImage(im) + panel_overhead = tk.Label(root, image=im) + + def refresh(): + global current_node + image = b.render_nodes(b.task.nodes[[current_node],:])[0] + image = image.astype(np.uint8) + im = Image.fromarray(image) + im = ImageTk.PhotoImage(im) + panel.configure(image=im) + panel.image = im + + def left_key(event): + global current_node + current_node = b.take_action([current_node], [2], 1)[0][0] + refresh() + + def up_key(event): + global current_node + current_node = b.take_action([current_node], [3], 1)[0][0] + refresh() + + def right_key(event): + global current_node + current_node = b.take_action([current_node], [1], 1)[0][0] + refresh() + + def quit(event): + root.destroy() + + panel_overhead.grid(row=4, column=5, rowspan=1, columnspan=1, + sticky=tk.W+tk.E+tk.N+tk.S) + panel.bind('<Left>', left_key) + panel.bind('<Up>', up_key) + panel.bind('<Right>', right_key) + panel.bind('q', quit) + panel.focus_set() + panel.grid(row=0, column=0, rowspan=5, columnspan=5, + sticky=tk.W+tk.E+tk.N+tk.S) + root.mainloop() + +def simple_window(): + root = tk.Tk() + + image = np.zeros((128, 128, 3), dtype=np.uint8) + image[32:96, 32:96, 0] = 255 + im = Image.fromarray(image) + im = ImageTk.PhotoImage(im) + + image = np.zeros((128, 128, 3), dtype=np.uint8) + image[32:96, 32:96, 1] = 255 + im2 = Image.fromarray(image) + im2 = ImageTk.PhotoImage(im2) + + panel = tk.Label(root, image=im) + + def left_key(event): + panel.configure(image=im2) + panel.image = im2 + + def quit(event): + sys.exit() + + panel.bind('<Left>', left_key) + panel.bind('<Up>', left_key) + panel.bind('<Down>', left_key) +
panel.bind('q', quit) + panel.focus_set() + panel.pack(side = "bottom", fill = "both", expand = "yes") + root.mainloop() + +def main(_): + b = load_building(FLAGS.dataset_name, FLAGS.building_name) + walk_through(b) + +if __name__ == '__main__': + app.run() diff --git a/cognitive_mapping_and_planning/scripts/script_nav_agent_release.py b/cognitive_mapping_and_planning/scripts/script_nav_agent_release.py new file mode 100644 index 0000000000000000000000000000000000000000..dab2819a6fcf100cb2e385e45b7aa694c4c5f033 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_nav_agent_release.py @@ -0,0 +1,253 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r""" Script to train and test the grid navigation agent. +Usage: + 1. Testing a model. + CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 \ + PYTHONPATH='.' PYOPENGL_PLATFORM=egl python scripts/script_nav_agent_release.py \ + --config_name cmp.lmap_Msc.clip5.sbpd_d_r2r+bench_test \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_d_r2r + + 2. Training a model (locally). + CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 \ + PYTHONPATH='.' 
PYOPENGL_PLATFORM=egl python scripts/script_nav_agent_release.py \ + --config_name cmp.lmap_Msc.clip5.sbpd_d_r2r+train_train \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_d_r2r_ + + 3. Training a model (distributed). + # See https://www.tensorflow.org/deploy/distributed on how to setup distributed + # training. + CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 \ + PYTHONPATH='.' PYOPENGL_PLATFORM=egl python scripts/script_nav_agent_release.py \ + --config_name cmp.lmap_Msc.clip5.sbpd_d_r2r+train_train \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_d_r2r_ \ + --ps_tasks $num_ps --master $master_name --task $worker_id +""" + +import sys, os, numpy as np +import copy +import argparse, pprint +import time +import cProfile +import platform + + +import tensorflow as tf +from tensorflow.contrib import slim +from tensorflow.python.framework import ops +from tensorflow.contrib.framework.python.ops import variables + +import logging +from tensorflow.python.platform import gfile +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +from cfgs import config_cmp +from cfgs import config_vision_baseline +import datasets.nav_env as nav_env +import src.file_utils as fu +import src.utils as utils +import tfcode.cmp as cmp +from tfcode import tf_utils +from tfcode import vision_baseline_lstm + +FLAGS = flags.FLAGS + +flags.DEFINE_string('master', '', + 'The address of the tensorflow master') +flags.DEFINE_integer('ps_tasks', 0, 'The number of parameter servers. If the ' + 'value is 0, then the parameters are handled locally by ' + 'the worker.') +flags.DEFINE_integer('task', 0, 'The Task ID. 
This value is used when training ' + 'with multiple workers to identify each worker.') + +flags.DEFINE_integer('num_workers', 1, '') + +flags.DEFINE_string('config_name', '', '') + +flags.DEFINE_string('logdir', '', '') + +flags.DEFINE_integer('solver_seed', 0, '') + +flags.DEFINE_integer('delay_start_iters', 20, '') + +logging.basicConfig(level=logging.INFO) + +def main(_): + _launcher(FLAGS.config_name, FLAGS.logdir) + +def _launcher(config_name, logdir): + args = _setup_args(config_name, logdir) + + fu.makedirs(args.logdir) + + if args.control.train: + _train(args) + + if args.control.test: + _test(args) + +def get_args_for_config(config_name): + configs = config_name.split('.') + type = configs[0] + config_name = '.'.join(configs[1:]) + if type == 'cmp': + args = config_cmp.get_args_for_config(config_name) + args.setup_to_run = cmp.setup_to_run + args.setup_train_step_kwargs = cmp.setup_train_step_kwargs + + elif type == 'bl': + args = config_vision_baseline.get_args_for_config(config_name) + args.setup_to_run = vision_baseline_lstm.setup_to_run + args.setup_train_step_kwargs = vision_baseline_lstm.setup_train_step_kwargs + + else: + logging.fatal('Unknown type: {:s}'.format(type)) + return args + +def _setup_args(config_name, logdir): + args = get_args_for_config(config_name) + args.solver.num_workers = FLAGS.num_workers + args.solver.task = FLAGS.task + args.solver.ps_tasks = FLAGS.ps_tasks + args.solver.master = FLAGS.master + args.solver.seed = FLAGS.solver_seed + args.logdir = logdir + args.navtask.logdir = None + return args + +def _train(args): + container_name = "" + + R = lambda: nav_env.get_multiplexer_class(args.navtask, args.solver.task) + m = utils.Foo() + m.tf_graph = tf.Graph() + + config = tf.ConfigProto() + config.device_count['GPU'] = 1 + + with m.tf_graph.as_default(): + with tf.device(tf.train.replica_device_setter(args.solver.ps_tasks, + merge_devices=True)): + with tf.container(container_name): + m = args.setup_to_run(m, args, 
is_training=True, + batch_norm_is_training=True, summary_mode='train') + + train_step_kwargs = args.setup_train_step_kwargs( + m, R(), os.path.join(args.logdir, 'train'), rng_seed=args.solver.task, + is_chief=args.solver.task==0, + num_steps=args.navtask.task_params.num_steps*args.navtask.task_params.num_goals, iters=1, + train_display_interval=args.summary.display_interval, + dagger_sample_bn_false=args.arch.dagger_sample_bn_false) + + delay_start = (args.solver.task*(args.solver.task+1))/2 * FLAGS.delay_start_iters + logging.error('delaying start for task %d by %d steps.', + args.solver.task, delay_start) + + additional_args = {} + final_loss = slim.learning.train( + train_op=m.train_op, + logdir=args.logdir, + master=args.solver.master, + is_chief=args.solver.task == 0, + number_of_steps=args.solver.max_steps, + train_step_fn=tf_utils.train_step_custom_online_sampling, + train_step_kwargs=train_step_kwargs, + global_step=m.global_step_op, + init_op=m.init_op, + init_fn=m.init_fn, + sync_optimizer=m.sync_optimizer, + saver=m.saver_op, + startup_delay_steps=delay_start, + summary_op=None, session_config=config, **additional_args) + +def _test(args): + args.solver.master = '' + container_name = "" + checkpoint_dir = os.path.join(format(args.logdir)) + logging.error('Checkpoint_dir: %s', args.logdir) + + config = tf.ConfigProto(); + config.device_count['GPU'] = 1; + + m = utils.Foo() + m.tf_graph = tf.Graph() + + rng_data_seed = 0; rng_action_seed = 0; + R = lambda: nav_env.get_multiplexer_class(args.navtask, rng_data_seed) + with m.tf_graph.as_default(): + with tf.container(container_name): + m = args.setup_to_run( + m, args, is_training=False, + batch_norm_is_training=args.control.force_batchnorm_is_training_at_test, + summary_mode=args.control.test_mode) + train_step_kwargs = args.setup_train_step_kwargs( + m, R(), os.path.join(args.logdir, args.control.test_name), + rng_seed=rng_data_seed, is_chief=True, + 
num_steps=args.navtask.task_params.num_steps*args.navtask.task_params.num_goals, + iters=args.summary.test_iters, train_display_interval=None, + dagger_sample_bn_false=args.arch.dagger_sample_bn_false) + + saver = slim.learning.tf_saver.Saver(variables.get_variables_to_restore()) + + sv = slim.learning.supervisor.Supervisor( + graph=ops.get_default_graph(), logdir=None, init_op=m.init_op, + summary_op=None, summary_writer=None, global_step=None, saver=m.saver_op) + + last_checkpoint = None + reported = False + while True: + last_checkpoint_ = None + while last_checkpoint_ is None: + last_checkpoint_ = slim.evaluation.wait_for_new_checkpoint( + checkpoint_dir, last_checkpoint, seconds_to_sleep=10, timeout=60) + if last_checkpoint_ is None: break + + last_checkpoint = last_checkpoint_ + checkpoint_iter = int(os.path.basename(last_checkpoint).split('-')[1]) + + logging.info('Starting evaluation at %s using checkpoint %s.', + time.strftime('%Y-%m-%d-%H:%M:%S', time.localtime()), + last_checkpoint) + + if (args.control.only_eval_when_done == False or + checkpoint_iter >= args.solver.max_steps): + start = time.time() + logging.info('Starting evaluation at %s using checkpoint %s.', + time.strftime('%Y-%m-%d-%H:%M:%S', time.localtime()), + last_checkpoint) + + with sv.managed_session(args.solver.master, config=config, + start_standard_services=False) as sess: + sess.run(m.init_op) + sv.saver.restore(sess, last_checkpoint) + sv.start_queue_runners(sess) + if args.control.reset_rng_seed: + train_step_kwargs['rng_data'] = [np.random.RandomState(rng_data_seed), + np.random.RandomState(rng_data_seed)] + train_step_kwargs['rng_action'] = np.random.RandomState(rng_action_seed) + vals, _ = tf_utils.train_step_custom_online_sampling( + sess, None, m.global_step_op, train_step_kwargs, + mode=args.control.test_mode) + should_stop = False + + if checkpoint_iter >= args.solver.max_steps: + should_stop = True + + if should_stop: + break + +if __name__ == '__main__': + app.run() diff 
--git a/cognitive_mapping_and_planning/scripts/script_plot_trajectory.py b/cognitive_mapping_and_planning/scripts/script_plot_trajectory.py new file mode 100644 index 0000000000000000000000000000000000000000..81c4c899052884b2061cde554c27c43e9574d771 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_plot_trajectory.py @@ -0,0 +1,339 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r""" +Code for plotting trajectories in the top view, and also plot first person views +from saved trajectories. Does not run the network but only loads the mesh data +to plot the view points. + CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 + PYTHONPATH='.' 
PYOPENGL_PLATFORM=egl python scripts/script_plot_trajectory.py \ + --first_person --num_steps 40 \ + --config_name cmp.lmap_Msc.clip5.sbpd_d_r2r \ + --imset test --alsologtostderr --base_dir output --out_dir vis + +""" +import os, sys, numpy as np, copy +import matplotlib +matplotlib.use("Agg") +import matplotlib.pyplot as plt +import matplotlib.animation as animation +from matplotlib.gridspec import GridSpec + +import tensorflow as tf +from tensorflow.contrib import slim +import cv2 +import logging +from tensorflow.python.platform import gfile +from tensorflow.python.platform import app +from tensorflow.python.platform import flags + +from datasets import nav_env +import scripts.script_nav_agent_release as sna +import src.file_utils as fu +from src import graph_utils +from src import utils +FLAGS = flags.FLAGS + +flags.DEFINE_string('out_dir', 'vis', 'Directory where to store the output') +flags.DEFINE_string('type', '', 'Optional type.') +flags.DEFINE_bool('first_person', False, 'Visualize the first person view.') +flags.DEFINE_bool('top_view', False, 'Visualize the trajectory in the top view.') +flags.DEFINE_integer('num_steps', 40, 'Number of steps to run the model for.') +flags.DEFINE_string('imset', 'test', '') +flags.DEFINE_string('base_dir', 'output', 'Cache directory.') + +def _get_suffix_str(): + return '' + + +def _load_trajectory(): + base_dir = FLAGS.base_dir + config_name = FLAGS.config_name+_get_suffix_str() + + dir_name = os.path.join(base_dir, FLAGS.type, config_name) + logging.info('Waiting for snapshot in directory %s.', dir_name) + last_checkpoint = slim.evaluation.wait_for_new_checkpoint(dir_name, None) + checkpoint_iter = int(os.path.basename(last_checkpoint).split('-')[1]) + + # Load the distances. + a = utils.load_variables(os.path.join(dir_name, 'bench_on_'+FLAGS.imset, + 'all_locs_at_t_{:d}.pkl'.format(checkpoint_iter))) + return a + +def _compute_hardness(): + # Load the stanford data to compute the hardness. 
+ if FLAGS.type == '': + args = sna.get_args_for_config(FLAGS.config_name+'+bench_'+FLAGS.imset) + else: + args = sna.get_args_for_config(FLAGS.type+'.'+FLAGS.config_name+'+bench_'+FLAGS.imset) + + args.navtask.logdir = None + R = lambda: nav_env.get_multiplexer_class(args.navtask, 0) + R = R() + + rng_data = [np.random.RandomState(0), np.random.RandomState(0)] + + # Sample a room. + h_dists = [] + gt_dists = [] + for i in range(250): + e = R.sample_env(rng_data) + nodes = e.task.nodes + + # Initialize the agent. + init_env_state = e.reset(rng_data) + + gt_dist_to_goal = [e.episode.dist_to_goal[0][j][s] + for j, s in enumerate(e.episode.start_node_ids)] + + for j in range(args.navtask.task_params.batch_size): + start_node_id = e.episode.start_node_ids[j] + end_node_id =e.episode.goal_node_ids[0][j] + h_dist = graph_utils.heuristic_fn_vec( + nodes[[start_node_id],:], nodes[[end_node_id], :], + n_ori=args.navtask.task_params.n_ori, + step_size=args.navtask.task_params.step_size)[0][0] + gt_dist = e.episode.dist_to_goal[0][j][start_node_id] + h_dists.append(h_dist) + gt_dists.append(gt_dist) + + h_dists = np.array(h_dists) + gt_dists = np.array(gt_dists) + e = R.sample_env([np.random.RandomState(0), np.random.RandomState(0)]) + input = e.get_common_data() + orig_maps = input['orig_maps'][0,0,:,:,0] + return h_dists, gt_dists, orig_maps + +def plot_trajectory_first_person(dt, orig_maps, out_dir): + out_dir = os.path.join(out_dir, FLAGS.config_name+_get_suffix_str(), + FLAGS.imset) + fu.makedirs(out_dir) + + # Load the model so that we can render. 
+ plt.set_cmap('gray') + samples_per_action = 8; wait_at_action = 0; + + Writer = animation.writers['mencoder'] + writer = Writer(fps=3*(samples_per_action+wait_at_action), + metadata=dict(artist='anonymous'), bitrate=1800) + + args = sna.get_args_for_config(FLAGS.config_name + '+bench_'+FLAGS.imset) + args.navtask.logdir = None + navtask_ = copy.deepcopy(args.navtask) + navtask_.camera_param.modalities = ['rgb'] + navtask_.task_params.modalities = ['rgb'] + sz = 512 + navtask_.camera_param.height = sz + navtask_.camera_param.width = sz + navtask_.task_params.img_height = sz + navtask_.task_params.img_width = sz + R = lambda: nav_env.get_multiplexer_class(navtask_, 0) + R = R() + b = R.buildings[0] + + f = [0 for _ in range(wait_at_action)] + \ + [float(_)/samples_per_action for _ in range(samples_per_action)]; + + # Generate things for it to render. + inds_to_do = [] + inds_to_do += [1, 4, 10] #1291, 1268, 1273, 1289, 1302, 1426, 1413, 1449, 1399, 1390] + + for i in inds_to_do: + fig = plt.figure(figsize=(10,8)) + gs = GridSpec(3,4) + gs.update(wspace=0.05, hspace=0.05, left=0.0, top=0.97, right=1.0, bottom=0.) + ax = fig.add_subplot(gs[:,:-1]) + ax1 = fig.add_subplot(gs[0,-1]) + ax2 = fig.add_subplot(gs[1,-1]) + ax3 = fig.add_subplot(gs[2,-1]) + axes = [ax, ax1, ax2, ax3] + # ax = fig.add_subplot(gs[:,:]) + # axes = [ax] + for ax in axes: + ax.set_axis_off() + + node_ids = dt['all_node_ids'][i, :, 0]*1 + # Prune so that last node is not repeated more than 3 times? 
+ if np.all(node_ids[-4:] == node_ids[-1]): + while node_ids[-4] == node_ids[-1]: + node_ids = node_ids[:-1] + num_steps = np.minimum(FLAGS.num_steps, len(node_ids)) + + xyt = b.to_actual_xyt_vec(b.task.nodes[node_ids]) + xyt_diff = xyt[1:,:] - xyt[:-1:,:] + xyt_diff[:,2] = np.mod(xyt_diff[:,2], 4) + ind = np.where(xyt_diff[:,2] == 3)[0] + xyt_diff[ind, 2] = -1 + xyt_diff = np.expand_dims(xyt_diff, axis=1) + to_cat = [xyt_diff*_ for _ in f] + perturbs_all = np.concatenate(to_cat, axis=1) + perturbs_all = np.concatenate([perturbs_all, np.zeros_like(perturbs_all[:,:,:1])], axis=2) + node_ids_all = np.expand_dims(node_ids, axis=1)*1 + node_ids_all = np.concatenate([node_ids_all for _ in f], axis=1) + node_ids_all = np.reshape(node_ids_all[:-1,:], -1) + perturbs_all = np.reshape(perturbs_all, [-1, 4]) + imgs = b.render_nodes(b.task.nodes[node_ids_all,:], perturb=perturbs_all) + + # Get action at each node. + actions = [] + _, action_to_nodes = b.get_feasible_actions(node_ids) + for j in range(num_steps-1): + action_to_node = action_to_nodes[j] + node_to_action = dict(zip(action_to_node.values(), action_to_node.keys())) + actions.append(node_to_action[node_ids[j+1]]) + + def init_fn(): + return fig, + gt_dist_to_goal = [] + + # Render trajectories. + def worker(j): + # Plot the image. + step_number = j/(samples_per_action + wait_at_action) + img = imgs[j]; ax = axes[0]; ax.clear(); ax.set_axis_off(); + img = img.astype(np.uint8); ax.imshow(img); + tt = ax.set_title( + "First Person View\n" + + "Top corners show diagnostics (distance, agents' action) not input to agent.", + fontsize=12) + plt.setp(tt, color='white') + + # Distance to goal. + t = 'Dist to Goal:\n{:2d} steps'.format(int(dt['all_d_at_t'][i, step_number])) + t = ax.text(0.01, 0.99, t, + horizontalalignment='left', + verticalalignment='top', + fontsize=20, color='red', + transform=ax.transAxes, alpha=1.0) + t.set_bbox(dict(color='white', alpha=0.85, pad=-0.1)) + + # Action to take. 
+ action_latex = ['$\odot$ ', '$\curvearrowright$ ', '$\curvearrowleft$ ', '$\Uparrow$ '] + t = ax.text(0.99, 0.99, action_latex[actions[step_number]], + horizontalalignment='right', + verticalalignment='top', + fontsize=40, color='green', + transform=ax.transAxes, alpha=1.0) + t.set_bbox(dict(color='white', alpha=0.85, pad=-0.1)) + + + # Plot the map top view. + ax = axes[-1] + if j == 0: + # Plot the map + locs = dt['all_locs'][i,:num_steps,:] + goal_loc = dt['all_goal_locs'][i,:,:] + xymin = np.minimum(np.min(goal_loc, axis=0), np.min(locs, axis=0)) + xymax = np.maximum(np.max(goal_loc, axis=0), np.max(locs, axis=0)) + xy1 = (xymax+xymin)/2. - 0.7*np.maximum(np.max(xymax-xymin), 24) + xy2 = (xymax+xymin)/2. + 0.7*np.maximum(np.max(xymax-xymin), 24) + + ax.set_axis_on() + ax.patch.set_facecolor((0.333, 0.333, 0.333)) + ax.set_xticks([]); ax.set_yticks([]); + ax.imshow(orig_maps, origin='lower', vmin=-1.0, vmax=2.0) + ax.plot(goal_loc[:,0], goal_loc[:,1], 'g*', markersize=12) + + locs = dt['all_locs'][i,:1,:] + ax.plot(locs[:,0], locs[:,1], 'b.', markersize=12) + + ax.set_xlim([xy1[0], xy2[0]]) + ax.set_ylim([xy1[1], xy2[1]]) + + locs = dt['all_locs'][i,step_number,:] + locs = np.expand_dims(locs, axis=0) + ax.plot(locs[:,0], locs[:,1], 'r.', alpha=1.0, linewidth=0, markersize=4) + tt = ax.set_title('Trajectory in topview', fontsize=14) + plt.setp(tt, color='white') + return fig, + + line_ani = animation.FuncAnimation(fig, worker, + (num_steps-1)*(wait_at_action+samples_per_action), + interval=500, blit=True, init_func=init_fn) + tmp_file_name = 'tmp.mp4' + line_ani.save(tmp_file_name, writer=writer, savefig_kwargs={'facecolor':'black'}) + out_file_name = os.path.join(out_dir, 'vis_{:04d}.mp4'.format(i)) + print out_file_name + + if fu.exists(out_file_name): + gfile.Remove(out_file_name) + gfile.Copy(tmp_file_name, out_file_name) + gfile.Remove(tmp_file_name) + plt.close(fig) + +def plot_trajectory(dt, hardness, orig_maps, out_dir): + out_dir = 
os.path.join(out_dir, FLAGS.config_name+_get_suffix_str(), + FLAGS.imset) + fu.makedirs(out_dir) + out_file = os.path.join(out_dir, 'all_locs_at_t.pkl') + dt['hardness'] = hardness + utils.save_variables(out_file, dt.values(), dt.keys(), overwrite=True) + + #Plot trajectories onto the maps + plt.set_cmap('gray') + for i in range(4000): + goal_loc = dt['all_goal_locs'][i, :, :] + locs = np.concatenate((dt['all_locs'][i,:,:], + dt['all_locs'][i,:,:]), axis=0) + xymin = np.minimum(np.min(goal_loc, axis=0), np.min(locs, axis=0)) + xymax = np.maximum(np.max(goal_loc, axis=0), np.max(locs, axis=0)) + xy1 = (xymax+xymin)/2. - 1.*np.maximum(np.max(xymax-xymin), 24) + xy2 = (xymax+xymin)/2. + 1.*np.maximum(np.max(xymax-xymin), 24) + + fig, ax = utils.tight_imshow_figure(plt, figsize=(6,6)) + ax.set_axis_on() + ax.patch.set_facecolor((0.333, 0.333, 0.333)) + ax.set_xticks([]) + ax.set_yticks([]) + + all_locs = dt['all_locs'][i,:,:]*1 + uniq = np.where(np.any(all_locs[1:,:] != all_locs[:-1,:], axis=1))[0]+1 + uniq = np.sort(uniq).tolist() + uniq.insert(0,0) + uniq = np.array(uniq) + all_locs = all_locs[uniq, :] + + ax.plot(dt['all_locs'][i, 0, 0], + dt['all_locs'][i, 0, 1], 'b.', markersize=24) + ax.plot(dt['all_goal_locs'][i, 0, 0], + dt['all_goal_locs'][i, 0, 1], 'g*', markersize=19) + ax.plot(all_locs[:,0], all_locs[:,1], 'r', alpha=0.4, linewidth=2) + ax.scatter(all_locs[:,0], all_locs[:,1], + c=5+np.arange(all_locs.shape[0])*1./all_locs.shape[0], + cmap='Reds', s=30, linewidth=0) + ax.imshow(orig_maps, origin='lower', vmin=-1.0, vmax=2.0, aspect='equal') + ax.set_xlim([xy1[0], xy2[0]]) + ax.set_ylim([xy1[1], xy2[1]]) + + file_name = os.path.join(out_dir, 'trajectory_{:04d}.png'.format(i)) + print file_name + with fu.fopen(file_name, 'w') as f: + plt.savefig(f) + plt.close(fig) + + +def main(_): + a = _load_trajectory() + h_dists, gt_dists, orig_maps = _compute_hardness() + hardness = 1.-h_dists*1./ gt_dists + + if FLAGS.top_view: + plot_trajectory(a, hardness, orig_maps, 
out_dir=FLAGS.out_dir) + + if FLAGS.first_person: + plot_trajectory_first_person(a, orig_maps, out_dir=FLAGS.out_dir) + +if __name__ == '__main__': + app.run() diff --git a/cognitive_mapping_and_planning/scripts/script_preprocess_annoations_S3DIS.py b/cognitive_mapping_and_planning/scripts/script_preprocess_annoations_S3DIS.py new file mode 100644 index 0000000000000000000000000000000000000000..58f32d121acf4c638625079907b02161e808af68 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_preprocess_annoations_S3DIS.py @@ -0,0 +1,197 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +import os +import glob +import numpy as np +import logging +import cPickle +from datasets import nav_env +from datasets import factory +from src import utils +from src import map_utils as mu + +logging.basicConfig(level=logging.INFO) +DATA_DIR = 'data/stanford_building_parser_dataset_raw/' + +mkdir_if_missing = utils.mkdir_if_missing +save_variables = utils.save_variables + +def _get_semantic_maps(building_name, transform, map_, flip, cats): + rooms = get_room_in_building(building_name) + maps = [] + for cat in cats: + maps.append(np.zeros((map_.size[1], map_.size[0]))) + + for r in rooms: + room = load_room(building_name, r, category_list=cats) + classes = room['class_id'] + for i, cat in enumerate(cats): + c_ind = cats.index(cat) + ind = [_ for _, c in enumerate(classes) if c == c_ind] + if len(ind) > 0: + vs = [room['vertexs'][x]*1 for x in ind] + vs = np.concatenate(vs, axis=0) + if transform: + vs = np.array([vs[:,1], vs[:,0], vs[:,2]]).T + vs[:,0] = -vs[:,0] + vs[:,1] += 4.20 + vs[:,0] += 6.20 + vs = vs*100. 
+ if flip: + vs[:,1] = -vs[:,1] + maps[i] = maps[i] + \ + mu._project_to_map(map_, vs, ignore_points_outside_map=True) + return maps + +def _map_building_name(building_name): + b = int(building_name.split('_')[0][4]) + out_name = 'Area_{:d}'.format(b) + if b == 5: + if int(building_name.split('_')[0][5]) == 1: + transform = True + else: + transform = False + else: + transform = False + return out_name, transform + +def get_categories(): + cats = ['beam', 'board', 'bookcase', 'ceiling', 'chair', 'clutter', 'column', + 'door', 'floor', 'sofa', 'table', 'wall', 'window'] + return cats + +def _write_map_files(b_in, b_out, transform): + cats = get_categories() + + env = utils.Foo(padding=10, resolution=5, num_point_threshold=2, + valid_min=-10, valid_max=200, n_samples_per_face=200) + robot = utils.Foo(radius=15, base=10, height=140, sensor_height=120, + camera_elevation_degree=-15) + + building_loader = factory.get_dataset('sbpd') + for flip in [False, True]: + b = nav_env.Building(b_out, robot, env, flip=flip, + building_loader=building_loader) + logging.info("building_in: %s, building_out: %s, transform: %d", b_in, + b_out, transform) + maps = _get_semantic_maps(b_in, transform, b.map, flip, cats) + maps = np.transpose(np.array(maps), axes=[1,2,0]) + + # Load file from the cache. 
+ file_name = '{:s}_{:d}_{:d}_{:d}_{:d}_{:d}_{:d}.pkl' + file_name = file_name.format(b.building_name, b.map.size[0], b.map.size[1], + b.map.origin[0], b.map.origin[1], + b.map.resolution, flip) + out_file = os.path.join(DATA_DIR, 'processing', 'class-maps', file_name) + logging.info('Writing semantic maps to %s.', out_file) + save_variables(out_file, [maps, cats], ['maps', 'cats'], overwrite=True) + +def _transform_area5b(room_dimension): + for a in room_dimension.keys(): + r = room_dimension[a]*1 + r[[0,1,3,4]] = r[[1,0,4,3]] + r[[0,3]] = -r[[3,0]] + r[[1,4]] += 4.20 + r[[0,3]] += 6.20 + room_dimension[a] = r + return room_dimension + +def collect_room(building_name, room_name): + room_dir = os.path.join(DATA_DIR, 'Stanford3dDataset_v1.2', building_name, + room_name, 'Annotations') + files = glob.glob1(room_dir, '*.txt') + files = sorted(files, key=lambda s: s.lower()) + vertexs = []; colors = []; + for f in files: + file_name = os.path.join(room_dir, f) + logging.info(' %s', file_name) + a = np.loadtxt(file_name) + vertex = a[:,:3]*1. 
+ color = a[:,3:]*1 + color = color.astype(np.uint8) + vertexs.append(vertex) + colors.append(color) + files = [f.split('.')[0] for f in files] + out = {'vertexs': vertexs, 'colors': colors, 'names': files} + return out + +def load_room(building_name, room_name, category_list=None): + room = collect_room(building_name, room_name) + room['building_name'] = building_name + room['room_name'] = room_name + instance_id = range(len(room['names'])) + room['instance_id'] = instance_id + if category_list is not None: + name = [r.split('_')[0] for r in room['names']] + class_id = [] + for n in name: + if n in category_list: + class_id.append(category_list.index(n)) + else: + class_id.append(len(category_list)) + room['class_id'] = class_id + room['category_list'] = category_list + return room + +def get_room_in_building(building_name): + building_dir = os.path.join(DATA_DIR, 'Stanford3dDataset_v1.2', building_name) + rn = os.listdir(building_dir) + rn = [x for x in rn if os.path.isdir(os.path.join(building_dir, x))] + rn = sorted(rn, key=lambda s: s.lower()) + return rn + +def write_room_dimensions(b_in, b_out, transform): + rooms = get_room_in_building(b_in) + room_dimension = {} + for r in rooms: + room = load_room(b_in, r, category_list=None) + vertex = np.concatenate(room['vertexs'], axis=0) + room_dimension[r] = np.concatenate((np.min(vertex, axis=0), np.max(vertex, axis=0)), axis=0) + if transform == 1: + room_dimension = _transform_area5b(room_dimension) + + out_file = os.path.join(DATA_DIR, 'processing', 'room-dimension', b_out+'.pkl') + save_variables(out_file, [room_dimension], ['room_dimension'], overwrite=True) + +def write_room_dimensions_all(I): + mkdir_if_missing(os.path.join(DATA_DIR, 'processing', 'room-dimension')) + bs_in = ['Area_1', 'Area_2', 'Area_3', 'Area_4', 'Area_5', 'Area_5', 'Area_6'] + bs_out = ['area1', 'area2', 'area3', 'area4', 'area5a', 'area5b', 'area6'] + transforms = [0, 0, 0, 0, 0, 1, 0] + + for i in I: + b_in = bs_in[i] + b_out = 
bs_out[i] + t = transforms[i] + write_room_dimensions(b_in, b_out, t) + +def write_class_maps_all(I): + mkdir_if_missing(os.path.join(DATA_DIR, 'processing', 'class-maps')) + bs_in = ['Area_1', 'Area_2', 'Area_3', 'Area_4', 'Area_5', 'Area_5', 'Area_6'] + bs_out = ['area1', 'area2', 'area3', 'area4', 'area5a', 'area5b', 'area6'] + transforms = [0, 0, 0, 0, 0, 1, 0] + + for i in I: + b_in = bs_in[i] + b_out = bs_out[i] + t = transforms[i] + _write_map_files(b_in, b_out, t) + + +if __name__ == '__main__': + write_room_dimensions_all([0, 2, 3, 4, 5, 6]) + write_class_maps_all([0, 2, 3, 4, 5, 6]) + diff --git a/cognitive_mapping_and_planning/scripts/script_preprocess_annoations_S3DIS.sh b/cognitive_mapping_and_planning/scripts/script_preprocess_annoations_S3DIS.sh new file mode 100644 index 0000000000000000000000000000000000000000..1384fabe69259ccc514a14d62aee358d1909bffb --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_preprocess_annoations_S3DIS.sh @@ -0,0 +1,24 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +cd data/stanford_building_parser_dataset_raw +unzip Stanford3dDataset_v1.2.zip +cd ../../ +PYOPENGL_PLATFORM=egl PYTHONPATH='.' python scripts/script_preprocess_annoations_S3DIS.py + +mv data/stanford_building_parser_dataset_raw/processing/room-dimension data/stanford_building_parser_dataset/. 
+mv data/stanford_building_parser_dataset_raw/processing/class-maps data/stanford_building_parser_dataset/. + +echo "You may now delete data/stanford_building_parser_dataset_raw if needed." diff --git a/cognitive_mapping_and_planning/scripts/script_preprocess_meshes_S3DIS.sh b/cognitive_mapping_and_planning/scripts/script_preprocess_meshes_S3DIS.sh new file mode 100644 index 0000000000000000000000000000000000000000..557a4dde611d42e71d71dd1589abf96f55e6eec6 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_preprocess_meshes_S3DIS.sh @@ -0,0 +1,37 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +mkdir -p data/stanford_building_parser_dataset +mkdir -p data/stanford_building_parser_dataset/mesh +cd data/stanford_building_parser_dataset_raw + +# Untar the files and extract the meshes. +for t in "1" "3" "4" "5a" "5b" "6"; do + tar -xf area_"$t"_noXYZ.tar area_$t/3d/rgb_textures + mv area_$t/3d/rgb_textures ../stanford_building_parser_dataset/mesh/area$t + rmdir area_$t/3d + rmdir area_$t +done + +cd ../../ + +# Preprocess meshes to remove the group and chunk information. 
+cd data/stanford_building_parser_dataset/ +for t in "1" "3" "4" "5a" "5b" "6"; do + obj_name=`ls mesh/area$t/*.obj` + cp $obj_name "$obj_name".bck + cat $obj_name.bck | grep -v '^g' | grep -v '^o' > $obj_name +done +cd ../../ diff --git a/cognitive_mapping_and_planning/scripts/script_test_pretrained_models.sh b/cognitive_mapping_and_planning/scripts/script_test_pretrained_models.sh new file mode 100644 index 0000000000000000000000000000000000000000..a4299fff5346afb53783a61de5c3e84f102a6304 --- /dev/null +++ b/cognitive_mapping_and_planning/scripts/script_test_pretrained_models.sh @@ -0,0 +1,63 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +# Test CMP models. +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name cmp.lmap_Msc.clip5.sbpd_d_r2r+bench_test \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_d_r2r + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name cmp.lmap_Msc.clip5.sbpd_rgb_r2r+bench_test \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_rgb_r2r + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' 
PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name cmp.lmap_Msc.clip5.sbpd_d_ST+bench_test \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_d_ST + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name cmp.lmap_Msc.clip5.sbpd_rgb_ST+bench_test \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_rgb_ST + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name cmp.lmap_Msc.clip5.sbpd_d_r2r_h0_64_80+bench_test \ + --logdir output/cmp.lmap_Msc.clip5.sbpd_d_r2r_h0_64_80 + +# Test LSTM baseline models. +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name bl.v2.noclip.sbpd_d_r2r+bench_test \ + --logdir output/bl.v2.noclip.sbpd_d_r2r + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name bl.v2.noclip.sbpd_rgb_r2r+bench_test \ + --logdir output/bl.v2.noclip.sbpd_rgb_r2r + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name bl.v2.noclip.sbpd_d_ST+bench_test \ + --logdir output/bl.v2.noclip.sbpd_d_ST + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name bl.v2.noclip.sbpd_rgb_ST+bench_test \ + --logdir output/bl.v2.noclip.sbpd_rgb_ST + +CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' 
PYOPENGL_PLATFORM=egl \ + python scripts/script_nav_agent_release.py --config_name bl.v2.noclip.sbpd_d_r2r_h0_64_80+bench_test \ + --logdir output/bl.v2.noclip.sbpd_d_r2r_h0_64_80 + +# Visualize test trajectories in top view. +# CUDA_VISIBLE_DEVICES=0 LD_LIBRARY_PATH=/opt/cuda-8.0/lib64:/opt/cudnnv51/lib64 PYTHONPATH='.' PYOPENGL_PLATFORM=egl \ +# python scripts/script_plot_trajectory.py \ +# --first_person --num_steps 40 \ +# --config_name cmp.lmap_Msc.clip5.sbpd_d_r2r \ +# --imset test --alsologtostderr diff --git a/cognitive_mapping_and_planning/src/__init__.py b/cognitive_mapping_and_planning/src/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cognitive_mapping_and_planning/src/depth_utils.py b/cognitive_mapping_and_planning/src/depth_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..b1fb2f51e5caa08ac43c730d587a771576700242 --- /dev/null +++ b/cognitive_mapping_and_planning/src/depth_utils.py @@ -0,0 +1,96 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Utilities for processing depth images. +""" +import numpy as np +import src.rotation_utils as ru +import src.utils as utils + +def get_camera_matrix(width, height, fov): + """Returns a camera matrix from image size and fov.""" + xc = (width-1.) / 2. + zc = (height-1.) / 2. 
+ f = (width / 2.) / np.tan(np.deg2rad(fov / 2.)) + camera_matrix = utils.Foo(xc=xc, zc=zc, f=f) + return camera_matrix + +def get_point_cloud_from_z(Y, camera_matrix): + """Projects the depth image Y into a 3D point cloud. + Inputs: + Y is ...xHxW + camera_matrix + Outputs: + X is positive going right + Y is positive into the image + Z is positive up in the image + XYZ is ...xHxWx3 + """ + x, z = np.meshgrid(np.arange(Y.shape[-1]), + np.arange(Y.shape[-2]-1, -1, -1)) + for i in range(Y.ndim-2): + x = np.expand_dims(x, axis=0) + z = np.expand_dims(z, axis=0) + X = (x-camera_matrix.xc) * Y / camera_matrix.f + Z = (z-camera_matrix.zc) * Y / camera_matrix.f + XYZ = np.concatenate((X[...,np.newaxis], Y[...,np.newaxis], + Z[...,np.newaxis]), axis=X.ndim) + return XYZ + +def make_geocentric(XYZ, sensor_height, camera_elevation_degree): + """Transforms the point cloud into geocentric coordinate frame. + Input: + XYZ : ...x3 + sensor_height : height of the sensor + camera_elevation_degree : camera elevation to rectify. + Output: + XYZ : ...x3 + """ + R = ru.get_r_matrix([1.,0.,0.], angle=np.deg2rad(camera_elevation_degree)) + XYZ = np.matmul(XYZ.reshape(-1,3), R.T).reshape(XYZ.shape) + XYZ[...,2] = XYZ[...,2] + sensor_height + return XYZ + +def bin_points(XYZ_cms, map_size, z_bins, xy_resolution): + """Bins points into xy-z bins + XYZ_cms is ... x H x W x3 + Outputs is ... x map_size x map_size x (len(z_bins)+1) + """ + sh = XYZ_cms.shape + XYZ_cms = XYZ_cms.reshape([-1, sh[-3], sh[-2], sh[-1]]) + n_z_bins = len(z_bins)+1 + map_center = (map_size-1.)/2. 
+ counts = [] + isvalids = [] + for XYZ_cm in XYZ_cms: + isnotnan = np.logical_not(np.isnan(XYZ_cm[:,:,0])) + X_bin = np.round(XYZ_cm[:,:,0] / xy_resolution + map_center).astype(np.int32) + Y_bin = np.round(XYZ_cm[:,:,1] / xy_resolution + map_center).astype(np.int32) + Z_bin = np.digitize(XYZ_cm[:,:,2], bins=z_bins).astype(np.int32) + + isvalid = np.array([X_bin >= 0, X_bin < map_size, Y_bin >= 0, Y_bin < map_size, + Z_bin >= 0, Z_bin < n_z_bins, isnotnan]) + isvalid = np.all(isvalid, axis=0) + + ind = (Y_bin * map_size + X_bin) * n_z_bins + Z_bin + ind[np.logical_not(isvalid)] = 0 + count = np.bincount(ind.ravel(), isvalid.ravel().astype(np.int32), + minlength=map_size*map_size*n_z_bins) + count = np.reshape(count, [map_size, map_size, n_z_bins]) + counts.append(count) + isvalids.append(isvalid) + counts = np.array(counts).reshape(list(sh[:-3]) + [map_size, map_size, n_z_bins]) + isvalids = np.array(isvalids).reshape(list(sh[:-3]) + [sh[-3], sh[-2], 1]) + return counts, isvalids diff --git a/cognitive_mapping_and_planning/src/file_utils.py b/cognitive_mapping_and_planning/src/file_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..5bf0e4a2e0d1f11382476b586fc76eb3cb5c583e --- /dev/null +++ b/cognitive_mapping_and_planning/src/file_utils.py @@ -0,0 +1,41 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Utilities for manipulating files. +""" +import os +import PIL +from tensorflow.python.platform import gfile +import cv2 + +exists = lambda path: gfile.Exists(path) +fopen = lambda path, mode: gfile.Open(path, mode) +makedirs = lambda path: gfile.MakeDirs(path) +listdir = lambda path: gfile.ListDir(path) +copyfile = lambda a, b, o: gfile.Copy(a,b,o) + +def write_image(image_path, rgb): + ext = os.path.splitext(image_path)[1] + with gfile.GFile(image_path, 'w') as f: + img_str = cv2.imencode(ext, rgb[:,:,::-1])[1].tostring() + f.write(img_str) + +def read_image(image_path, type='rgb'): + with fopen(image_path, 'r') as f: + I = PIL.Image.open(f) + II = np.array(I) + if type == 'rgb': + II = II[:,:,:3] + return II diff --git a/cognitive_mapping_and_planning/src/graph_utils.py b/cognitive_mapping_and_planning/src/graph_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..d40eb62ca6eb47126074ccb243be5773fc92d83f --- /dev/null +++ b/cognitive_mapping_and_planning/src/graph_utils.py @@ -0,0 +1,550 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Various functions to manipulate graphs for computing distances. 
+""" +import skimage.morphology +import numpy as np +import networkx as nx +import itertools +import graph_tool as gt +import graph_tool.topology +import graph_tool.generation +import src.utils as utils + +# Compute shortest path from all nodes to or from all source nodes +def get_distance_node_list(gtG, source_nodes, direction, weights=None): + gtG_ = gt.Graph(gtG) + v = gtG_.add_vertex() + + if weights is not None: + weights = gtG_.edge_properties[weights] + + for s in source_nodes: + e = gtG_.add_edge(s, int(v)) + if weights is not None: + weights[e] = 0. + + if direction == 'to': + dist = gt.topology.shortest_distance( + gt.GraphView(gtG_, reversed=True), source=gtG_.vertex(int(v)), + target=None, weights=weights) + elif direction == 'from': + dist = gt.topology.shortest_distance( + gt.GraphView(gtG_, reversed=False), source=gtG_.vertex(int(v)), + target=None, weights=weights) + dist = np.array(dist.get_array()) + dist = dist[:-1] + if weights is None: + dist = dist-1 + return dist + +# Functions for semantically labelling nodes in the traversal graph. +def generate_lattice(sz_x, sz_y): + """Generates a lattice with sz_x vertices along x and sz_y vertices along y + direction Each of these vertices is step_size distance apart. Origin is at + (0,0). 
""" + g = gt.generation.lattice([sz_x, sz_y]) + x, y = np.meshgrid(np.arange(sz_x), np.arange(sz_y)) + x = np.reshape(x, [-1,1]); y = np.reshape(y, [-1,1]); + nodes = np.concatenate((x,y), axis=1) + return g, nodes + +def add_diagonal_edges(g, nodes, sz_x, sz_y, edge_len): + offset = [sz_x+1, sz_x-1] + for o in offset: + s = np.arange(nodes.shape[0]-o-1) + t = s + o + ind = np.all(np.abs(nodes[s,:] - nodes[t,:]) == np.array([[1,1]]), axis=1) + s = s[ind][:,np.newaxis] + t = t[ind][:,np.newaxis] + st = np.concatenate((s,t), axis=1) + for i in range(st.shape[0]): + e = g.add_edge(st[i,0], st[i,1], add_missing=False) + g.ep['wts'][e] = edge_len + +def convert_traversible_to_graph(traversible, ff_cost=1., fo_cost=1., + oo_cost=1., connectivity=4): + assert(connectivity == 4 or connectivity == 8) + + sz_x = traversible.shape[1] + sz_y = traversible.shape[0] + g, nodes = generate_lattice(sz_x, sz_y) + + # Assign costs. + edge_wts = g.new_edge_property('float') + g.edge_properties['wts'] = edge_wts + wts = np.ones(g.num_edges(), dtype=np.float32) + edge_wts.get_array()[:] = wts + + if connectivity == 8: + add_diagonal_edges(g, nodes, sz_x, sz_y, np.sqrt(2.)) + + se = np.array([[int(e.source()), int(e.target())] for e in g.edges()]) + s_xy = nodes[se[:,0]] + t_xy = nodes[se[:,1]] + s_t = np.ravel_multi_index((s_xy[:,1], s_xy[:,0]), traversible.shape) + t_t = np.ravel_multi_index((t_xy[:,1], t_xy[:,0]), traversible.shape) + s_t = traversible.ravel()[s_t] + t_t = traversible.ravel()[t_t] + + wts = np.zeros(g.num_edges(), dtype=np.float32) + wts[np.logical_and(s_t == True, t_t == True)] = ff_cost + wts[np.logical_and(s_t == False, t_t == False)] = oo_cost + wts[np.logical_xor(s_t, t_t)] = fo_cost + + edge_wts = g.edge_properties['wts'] + for i, e in enumerate(g.edges()): + edge_wts[e] = edge_wts[e] * wts[i] + # d = edge_wts.get_array()*1. 
+ # edge_wts.get_array()[:] = d*wts + return g, nodes + +def label_nodes_with_class(nodes_xyt, class_maps, pix): + """ + Returns: + class_maps__: one-hot class_map for each class. + node_class_label: one-hot class_map for each class, nodes_xyt.shape[0] x n_classes + """ + # Assign each pixel to a node. + selem = skimage.morphology.disk(pix) + class_maps_ = class_maps*1. + for i in range(class_maps.shape[2]): + class_maps_[:,:,i] = skimage.morphology.dilation(class_maps[:,:,i]*1, selem) + class_maps__ = np.argmax(class_maps_, axis=2) + class_maps__[np.max(class_maps_, axis=2) == 0] = -1 + + # For each node pick out the label from this class map. + x = np.round(nodes_xyt[:,[0]]).astype(np.int32) + y = np.round(nodes_xyt[:,[1]]).astype(np.int32) + ind = np.ravel_multi_index((y,x), class_maps__.shape) + node_class_label = class_maps__.ravel()[ind][:,0] + + # Convert to one hot versions. + class_maps_one_hot = np.zeros(class_maps.shape, dtype=np.bool) + node_class_label_one_hot = np.zeros((node_class_label.shape[0], class_maps.shape[2]), dtype=np.bool) + for i in range(class_maps.shape[2]): + class_maps_one_hot[:,:,i] = class_maps__ == i + node_class_label_one_hot[:,i] = node_class_label == i + return class_maps_one_hot, node_class_label_one_hot + +def label_nodes_with_class_geodesic(nodes_xyt, class_maps, pix, traversible, + ff_cost=1., fo_cost=1., oo_cost=1., + connectivity=4): + """Labels nodes in nodes_xyt with class labels using geodesic distance as + defined by traversible from class_maps. + Inputs: + nodes_xyt + class_maps: counts for each class. + pix: distance threshold to consider close enough to target. + traversible: binary map of whether traversible or not. + Output: + labels: For each node in nodes_xyt returns a label of the class or -1 is + unlabelled. + """ + g, nodes = convert_traversible_to_graph(traversible, ff_cost=ff_cost, + fo_cost=fo_cost, oo_cost=oo_cost, + connectivity=connectivity) + + class_dist = np.zeros_like(class_maps*1.) 
+ n_classes = class_maps.shape[2] + if False: + # Assign each pixel to a class based on number of points. + selem = skimage.morphology.disk(pix) + class_maps_ = class_maps*1. + class_maps__ = np.argmax(class_maps_, axis=2) + class_maps__[np.max(class_maps_, axis=2) == 0] = -1 + + # Label nodes with classes. + for i in range(n_classes): + # class_node_ids = np.where(class_maps__.ravel() == i)[0] + class_node_ids = np.where(class_maps[:,:,i].ravel() > 0)[0] + dist_i = get_distance_node_list(g, class_node_ids, 'to', weights='wts') + class_dist[:,:,i] = np.reshape(dist_i, class_dist[:,:,i].shape) + class_map_geodesic = (class_dist <= pix) + class_map_geodesic = np.reshape(class_map_geodesic, [-1, n_classes]) + + # For each node pick out the label from this class map. + x = np.round(nodes_xyt[:,[0]]).astype(np.int32) + y = np.round(nodes_xyt[:,[1]]).astype(np.int32) + ind = np.ravel_multi_index((y,x), class_dist[:,:,0].shape) + node_class_label = class_map_geodesic[ind[:,0],:] + class_map_geodesic = class_dist <= pix + return class_map_geodesic, node_class_label + +def _get_next_nodes_undirected(n, sc, n_ori): + nodes_to_add = [] + nodes_to_validate = [] + (p, q, r) = n + nodes_to_add.append((n, (p, q, r), 0)) + if n_ori == 4: + for _ in [1, 2, 3, 4]: + if _ == 1: + v = (p - sc, q, r) + elif _ == 2: + v = (p + sc, q, r) + elif _ == 3: + v = (p, q - sc, r) + elif _ == 4: + v = (p, q + sc, r) + nodes_to_validate.append((n, v, _)) + return nodes_to_add, nodes_to_validate + +def _get_next_nodes(n, sc, n_ori): + nodes_to_add = [] + nodes_to_validate = [] + (p, q, r) = n + for r_, a_ in zip([-1, 0, 1], [1, 0, 2]): + nodes_to_add.append((n, (p, q, np.mod(r+r_, n_ori)), a_)) + + if n_ori == 6: + if r == 0: + v = (p + sc, q, r) + elif r == 1: + v = (p + sc, q + sc, r) + elif r == 2: + v = (p, q + sc, r) + elif r == 3: + v = (p - sc, q, r) + elif r == 4: + v = (p - sc, q - sc, r) + elif r == 5: + v = (p, q - sc, r) + elif n_ori == 4: + if r == 0: + v = (p + sc, q, r) + elif r == 
1: + v = (p, q + sc, r) + elif r == 2: + v = (p - sc, q, r) + elif r == 3: + v = (p, q - sc, r) + nodes_to_validate.append((n,v,3)) + + return nodes_to_add, nodes_to_validate + +def generate_graph(valid_fn_vec=None, sc=1., n_ori=6, + starting_location=(0, 0, 0), vis=False, directed=True): + timer = utils.Timer() + timer.tic() + if directed: G = nx.DiGraph(directed=True) + else: G = nx.Graph() + G.add_node(starting_location) + new_nodes = G.nodes() + while len(new_nodes) != 0: + nodes_to_add = [] + nodes_to_validate = [] + for n in new_nodes: + if directed: + na, nv = _get_next_nodes(n, sc, n_ori) + else: + na, nv = _get_next_nodes_undirected(n, sc, n_ori) + nodes_to_add = nodes_to_add + na + if valid_fn_vec is not None: + nodes_to_validate = nodes_to_validate + nv + else: + nodes_to_add = nodes_to_add + nv + + # Validate nodes. + vs = [_[1] for _ in nodes_to_validate] + valids = valid_fn_vec(vs) + + for nva, valid in zip(nodes_to_validate, valids): + if valid: + nodes_to_add.append(nva) + + new_nodes = [] + for n,v,a in nodes_to_add: + if not G.has_node(v): + new_nodes.append(v) + G.add_edge(n, v, action=a) + + timer.toc(average=True, log_at=1, log_str='src.graph_utils.generate_graph') + return (G) + +def vis_G(G, ax, vertex_color='r', edge_color='b', r=None): + if edge_color is not None: + for e in G.edges(): + XYT = zip(*e) + x = XYT[-3] + y = XYT[-2] + t = XYT[-1] + if r is None or t[0] == r: + ax.plot(x, y, edge_color) + if vertex_color is not None: + XYT = zip(*G.nodes()) + x = XYT[-3] + y = XYT[-2] + t = XYT[-1] + ax.plot(x, y, vertex_color + '.') + +def convert_to_graph_tool(G): + timer = utils.Timer() + timer.tic() + gtG = gt.Graph(directed=G.is_directed()) + gtG.ep['action'] = gtG.new_edge_property('int') + + nodes_list = G.nodes() + nodes_array = np.array(nodes_list) + + nodes_id = np.zeros((nodes_array.shape[0],), dtype=np.int64) + + for i in range(nodes_array.shape[0]): + v = gtG.add_vertex() + nodes_id[i] = int(v) + + # d = {key: value for (key, value) 
in zip(nodes_list, nodes_id)} + d = dict(itertools.izip(nodes_list, nodes_id)) + + for src, dst, data in G.edges_iter(data=True): + e = gtG.add_edge(d[src], d[dst]) + gtG.ep['action'][e] = data['action'] + nodes_to_id = d + timer.toc(average=True, log_at=1, log_str='src.graph_utils.convert_to_graph_tool') + return gtG, nodes_array, nodes_to_id + + +def _rejection_sampling(rng, sampling_d, target_d, bins, hardness, M): + bin_ind = np.digitize(hardness, bins)-1 + i = 0 + ratio = target_d[bin_ind] / (M*sampling_d[bin_ind]) + while i < ratio.size and rng.rand() > ratio[i]: + i = i+1 + return i + +def heuristic_fn_vec(n1, n2, n_ori, step_size): + # n1 is a vector and n2 is a single point. + dx = (n1[:,0] - n2[0,0])/step_size + dy = (n1[:,1] - n2[0,1])/step_size + dt = n1[:,2] - n2[0,2] + dt = np.mod(dt, n_ori) + dt = np.minimum(dt, n_ori-dt) + + if n_ori == 6: + if dx*dy > 0: + d = np.maximum(np.abs(dx), np.abs(dy)) + else: + d = np.abs(dy-dx) + elif n_ori == 4: + d = np.abs(dx) + np.abs(dy) + + return (d + dt).reshape((-1,1)) + +def get_hardness_distribution(gtG, max_dist, min_dist, rng, trials, bins, nodes, + n_ori, step_size): + heuristic_fn = lambda node_ids, node_id: \ + heuristic_fn_vec(nodes[node_ids, :], nodes[[node_id], :], n_ori, step_size) + num_nodes = gtG.num_vertices() + gt_dists = []; h_dists = []; + for i in range(trials): + end_node_id = rng.choice(num_nodes) + gt_dist = gt.topology.shortest_distance(gt.GraphView(gtG, reversed=True), + source=gtG.vertex(end_node_id), + target=None, max_dist=max_dist) + gt_dist = np.array(gt_dist.get_array()) + ind = np.where(np.logical_and(gt_dist <= max_dist, gt_dist >= min_dist))[0] + gt_dist = gt_dist[ind] + h_dist = heuristic_fn(ind, end_node_id)[:,0] + gt_dists.append(gt_dist) + h_dists.append(h_dist) + gt_dists = np.concatenate(gt_dists) + h_dists = np.concatenate(h_dists) + hardness = 1. 
- h_dists*1./gt_dists + hist, _ = np.histogram(hardness, bins) + hist = hist.astype(np.float64) + hist = hist / np.sum(hist) + return hist + +def rng_next_goal_rejection_sampling(start_node_ids, batch_size, gtG, rng, + max_dist, min_dist, max_dist_to_compute, + sampling_d, target_d, + nodes, n_ori, step_size, bins, M): + sample_start_nodes = start_node_ids is None + dists = []; pred_maps = []; end_node_ids = []; start_node_ids_ = []; + hardnesss = []; gt_dists = []; + num_nodes = gtG.num_vertices() + for i in range(batch_size): + done = False + while not done: + if sample_start_nodes: + start_node_id = rng.choice(num_nodes) + else: + start_node_id = start_node_ids[i] + + gt_dist = gt.topology.shortest_distance( + gt.GraphView(gtG, reversed=False), source=start_node_id, target=None, + max_dist=max_dist) + gt_dist = np.array(gt_dist.get_array()) + ind = np.where(np.logical_and(gt_dist <= max_dist, gt_dist >= min_dist))[0] + ind = rng.permutation(ind) + gt_dist = gt_dist[ind]*1. + h_dist = heuristic_fn_vec(nodes[ind, :], nodes[[start_node_id], :], + n_ori, step_size)[:,0] + hardness = 1. - h_dist / gt_dist + sampled_ind = _rejection_sampling(rng, sampling_d, target_d, bins, + hardness, M) + if sampled_ind < ind.size: + # print sampled_ind + end_node_id = ind[sampled_ind] + hardness = hardness[sampled_ind] + gt_dist = gt_dist[sampled_ind] + done = True + + # Compute distance from end node to all nodes, to return. 
+ dist, pred_map = gt.topology.shortest_distance( + gt.GraphView(gtG, reversed=True), source=end_node_id, target=None, + max_dist=max_dist_to_compute, pred_map=True) + dist = np.array(dist.get_array()) + pred_map = np.array(pred_map.get_array()) + + hardnesss.append(hardness); dists.append(dist); pred_maps.append(pred_map); + start_node_ids_.append(start_node_id); end_node_ids.append(end_node_id); + gt_dists.append(gt_dist); + paths = None + return start_node_ids_, end_node_ids, dists, pred_maps, paths, hardnesss, gt_dists + + +def rng_next_goal(start_node_ids, batch_size, gtG, rng, max_dist, + max_dist_to_compute, node_room_ids, nodes=None, + compute_path=False, dists_from_start_node=None): + # Compute the distance field from the starting location, and then pick a + # destination in another room if possible otherwise anywhere outside this + # room. + dists = []; pred_maps = []; paths = []; end_node_ids = []; + for i in range(batch_size): + room_id = node_room_ids[start_node_ids[i]] + # Compute distances. + if dists_from_start_node is None: + dist, pred_map = gt.topology.shortest_distance( + gt.GraphView(gtG, reversed=False), source=gtG.vertex(start_node_ids[i]), + target=None, max_dist=max_dist_to_compute, pred_map=True) + dist = np.array(dist.get_array()) + else: + dist = dists_from_start_node[i] + + # Randomly sample nodes which are within max_dist. + near_ids = dist <= max_dist + near_ids = near_ids[:, np.newaxis] + # Check to see if there is a non-negative node which is close enough. 
+ non_same_room_ids = node_room_ids != room_id + non_hallway_ids = node_room_ids != -1 + good1_ids = np.logical_and(near_ids, np.logical_and(non_same_room_ids, non_hallway_ids)) + good2_ids = np.logical_and(near_ids, non_hallway_ids) + good3_ids = near_ids + if np.any(good1_ids): + end_node_id = rng.choice(np.where(good1_ids)[0]) + elif np.any(good2_ids): + end_node_id = rng.choice(np.where(good2_ids)[0]) + elif np.any(good3_ids): + end_node_id = rng.choice(np.where(good3_ids)[0]) + else: + logging.error('Did not find any good nodes.') + + # Compute distance to this new goal for doing distance queries. + dist, pred_map = gt.topology.shortest_distance( + gt.GraphView(gtG, reversed=True), source=gtG.vertex(end_node_id), + target=None, max_dist=max_dist_to_compute, pred_map=True) + dist = np.array(dist.get_array()) + pred_map = np.array(pred_map.get_array()) + + dists.append(dist) + pred_maps.append(pred_map) + end_node_ids.append(end_node_id) + + path = None + if compute_path: + path = get_path_ids(start_node_ids[i], end_node_ids[i], pred_map) + paths.append(path) + + return start_node_ids, end_node_ids, dists, pred_maps, paths + + +def rng_room_to_room(batch_size, gtG, rng, max_dist, max_dist_to_compute, + node_room_ids, nodes=None, compute_path=False): + # Sample one of the rooms, compute the distance field. Pick a destination in + # another room if possible otherwise anywhere outside this room. + dists = []; pred_maps = []; paths = []; start_node_ids = []; end_node_ids = []; + room_ids = np.unique(node_room_ids[node_room_ids[:,0] >= 0, 0]) + for i in range(batch_size): + room_id = rng.choice(room_ids) + end_node_id = rng.choice(np.where(node_room_ids[:,0] == room_id)[0]) + end_node_ids.append(end_node_id) + + # Compute distances. 
+ dist, pred_map = gt.topology.shortest_distance( + gt.GraphView(gtG, reversed=True), source=gtG.vertex(end_node_id), + target=None, max_dist=max_dist_to_compute, pred_map=True) + dist = np.array(dist.get_array()) + pred_map = np.array(pred_map.get_array()) + dists.append(dist) + pred_maps.append(pred_map) + + # Randomly sample nodes which are within max_dist. + near_ids = dist <= max_dist + near_ids = near_ids[:, np.newaxis] + + # Check to see if there is a non-negative node which is close enough. + non_same_room_ids = node_room_ids != room_id + non_hallway_ids = node_room_ids != -1 + good1_ids = np.logical_and(near_ids, np.logical_and(non_same_room_ids, non_hallway_ids)) + good2_ids = np.logical_and(near_ids, non_hallway_ids) + good3_ids = near_ids + if np.any(good1_ids): + start_node_id = rng.choice(np.where(good1_ids)[0]) + elif np.any(good2_ids): + start_node_id = rng.choice(np.where(good2_ids)[0]) + elif np.any(good3_ids): + start_node_id = rng.choice(np.where(good3_ids)[0]) + else: + logging.error('Did not find any good nodes.') + + start_node_ids.append(start_node_id) + + path = None + if compute_path: + path = get_path_ids(start_node_ids[i], end_node_ids[i], pred_map) + paths.append(path) + + return start_node_ids, end_node_ids, dists, pred_maps, paths + + +def rng_target_dist_field(batch_size, gtG, rng, max_dist, max_dist_to_compute, + nodes=None, compute_path=False): + # Sample a single node, compute distance to all nodes less than max_dist, + # sample nodes which are a particular distance away. 
+ dists = []; pred_maps = []; paths = []; start_node_ids = [] + end_node_ids = rng.choice(gtG.num_vertices(), size=(batch_size,), + replace=False).tolist() + + for i in range(batch_size): + dist, pred_map = gt.topology.shortest_distance( + gt.GraphView(gtG, reversed=True), source=gtG.vertex(end_node_ids[i]), + target=None, max_dist=max_dist_to_compute, pred_map=True) + dist = np.array(dist.get_array()) + pred_map = np.array(pred_map.get_array()) + dists.append(dist) + pred_maps.append(pred_map) + + # Randomly sample nodes which are within max_dist + near_ids = np.where(dist <= max_dist)[0] + start_node_id = rng.choice(near_ids, size=(1,), replace=False)[0] + start_node_ids.append(start_node_id) + + path = None + if compute_path: + path = get_path_ids(start_node_ids[i], end_node_ids[i], pred_map) + paths.append(path) + + return start_node_ids, end_node_ids, dists, pred_maps, paths diff --git a/cognitive_mapping_and_planning/src/map_utils.py b/cognitive_mapping_and_planning/src/map_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..1298bff24e798cb31bd40c106e603d5accd2b573 --- /dev/null +++ b/cognitive_mapping_and_planning/src/map_utils.py @@ -0,0 +1,244 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Various functions to compute the ground truth map for training etc.
+""" +import copy +import skimage.morphology +import numpy as np +import scipy.ndimage +import matplotlib.pyplot as plt +import PIL + +import src.utils as utils +import cv2 + +def _get_xy_bounding_box(vertex, padding): + """Returns the xy bounding box of the environment.""" + min_ = np.floor(np.min(vertex[:, :2], axis=0) - padding).astype(np.int) + max_ = np.ceil(np.max(vertex[:, :2], axis=0) + padding).astype(np.int) + return min_, max_ + +def _project_to_map(map, vertex, wt=None, ignore_points_outside_map=False): + """Projects points to map, returns how many points are present at each + location.""" + num_points = np.zeros((map.size[1], map.size[0])) + vertex_ = vertex[:, :2] - map.origin + vertex_ = np.round(vertex_ / map.resolution).astype(np.int) + if ignore_points_outside_map: + good_ind = np.all(np.array([vertex_[:,1] >= 0, vertex_[:,1] < map.size[1], + vertex_[:,0] >= 0, vertex_[:,0] < map.size[0]]), + axis=0) + vertex_ = vertex_[good_ind, :] + if wt is not None: + wt = wt[good_ind, :] + if wt is None: + np.add.at(num_points, (vertex_[:, 1], vertex_[:, 0]), 1) + else: + assert(wt.shape[0] == vertex.shape[0]), \ + 'number of weights should be same as vertices.' 
+ np.add.at(num_points, (vertex_[:, 1], vertex_[:, 0]), wt) + return num_points + +def make_map(padding, resolution, vertex=None, sc=1.): + """Returns a map structure.""" + min_, max_ = _get_xy_bounding_box(vertex*sc, padding=padding) + sz = np.ceil((max_ - min_ + 1) / resolution).astype(np.int32) + max_ = min_ + sz * resolution - 1 + map = utils.Foo(origin=min_, size=sz, max=max_, resolution=resolution, + padding=padding) + return map + +def _fill_holes(img, thresh): + """Fills holes less than thresh area (assumes 4 connectivity when computing + hole area.""" + l, n = scipy.ndimage.label(np.logical_not(img)) + img_ = img == True + cnts = np.bincount(l.reshape(-1)) + for i, cnt in enumerate(cnts): + if cnt < thresh: + l[l == i] = -1 + img_[l == -1] = True + return img_ + +def compute_traversibility(map, robot_base, robot_height, robot_radius, + valid_min, valid_max, num_point_threshold, shapess, + sc=100., n_samples_per_face=200): + """Returns a bit map with pixels that are traversible or not as long as the + robot center is inside this volume we are good colisions can be detected by + doing a line search on things, or walking from current location to final + location in the bitmap, or doing bwlabel on the traversibility map.""" + + tt = utils.Timer() + tt.tic() + num_obstcale_points = np.zeros((map.size[1], map.size[0])) + num_points = np.zeros((map.size[1], map.size[0])) + + for i, shapes in enumerate(shapess): + for j in range(shapes.get_number_of_meshes()): + p, face_areas, face_idx = shapes.sample_points_on_face_of_shape( + j, n_samples_per_face, sc) + wt = face_areas[face_idx]/n_samples_per_face + + ind = np.all(np.concatenate( + (p[:, [2]] > robot_base, + p[:, [2]] < robot_base + robot_height), axis=1),axis=1) + num_obstcale_points += _project_to_map(map, p[ind, :], wt[ind]) + + ind = np.all(np.concatenate( + (p[:, [2]] > valid_min, + p[:, [2]] < valid_max), axis=1),axis=1) + num_points += _project_to_map(map, p[ind, :], wt[ind]) + + selem = 
skimage.morphology.disk(robot_radius / map.resolution) + obstacle_free = skimage.morphology.binary_dilation( + _fill_holes(num_obstcale_points > num_point_threshold, 20), selem) != True + valid_space = _fill_holes(num_points > num_point_threshold, 20) + traversible = np.all(np.concatenate((obstacle_free[...,np.newaxis], + valid_space[...,np.newaxis]), axis=2), + axis=2) + # plt.imshow(np.concatenate((obstacle_free, valid_space, traversible), axis=1)) + # plt.show() + + map_out = copy.deepcopy(map) + map_out.num_obstcale_points = num_obstcale_points + map_out.num_points = num_points + map_out.traversible = traversible + map_out.obstacle_free = obstacle_free + map_out.valid_space = valid_space + tt.toc(log_at=1, log_str='src.map_utils.compute_traversibility: ') + return map_out + + +def resize_maps(map, map_scales, resize_method): + scaled_maps = [] + for i, sc in enumerate(map_scales): + if resize_method == 'antialiasing': + # Resize using open cv so that we can compute the size. + # Use PIL resize to use anti aliasing feature. + map_ = cv2.resize(map*1, None, None, fx=sc, fy=sc, interpolation=cv2.INTER_LINEAR) + w = map_.shape[1]; h = map_.shape[0] + + map_img = PIL.Image.fromarray((map*255).astype(np.uint8)) + map__img = map_img.resize((w,h), PIL.Image.ANTIALIAS) + map_ = np.asarray(map__img).astype(np.float32) + map_ = map_/255. 
+ map_ = np.minimum(map_, 1.0) + map_ = np.maximum(map_, 0.0) + elif resize_method == 'linear_noantialiasing': + map_ = cv2.resize(map*1, None, None, fx=sc, fy=sc, interpolation=cv2.INTER_LINEAR) + else: + logging.error('Unknown resizing method') + scaled_maps.append(map_) + return scaled_maps + + +def pick_largest_cc(traversible): + out = scipy.ndimage.label(traversible)[0] + cnt = np.bincount(out.reshape(-1))[1:] + return out == np.argmax(cnt) + 1 + +def get_graph_origin_loc(rng, traversible): + """Erode the traversibility mask so that we get points in the bulk of the + graph, and not end up with a situation where the graph is localized in the + corner of a cramped room. Output Locs is in the coordinate frame of the + map.""" + + aa = pick_largest_cc(skimage.morphology.binary_erosion(traversible == True, + selem=np.ones((15,15)))) + y, x = np.where(aa > 0) + ind = rng.choice(y.size) + locs = np.array([x[ind], y[ind]]) + locs = locs + rng.rand(*(locs.shape)) - 0.5 + return locs + + +def generate_egocentric_maps(scaled_maps, map_scales, map_crop_sizes, loc, + x_axis, y_axis, theta): + maps = [] + for i, (map_, sc, map_crop_size) in enumerate(zip(scaled_maps, map_scales, map_crop_sizes)): + maps_i = np.array(get_map_to_predict(loc*sc, x_axis, y_axis, map_, + map_crop_size, + interpolation=cv2.INTER_LINEAR)[0]) + maps_i[np.isnan(maps_i)] = 0 + maps.append(maps_i) + return maps + +def generate_goal_images(map_scales, map_crop_sizes, n_ori, goal_dist, + goal_theta, rel_goal_orientation): + goal_dist = goal_dist[:,0] + goal_theta = goal_theta[:,0] + rel_goal_orientation = rel_goal_orientation[:,0] + + goals = []; + # Generate the map images. + for i, (sc, map_crop_size) in enumerate(zip(map_scales, map_crop_sizes)): + goal_i = np.zeros((goal_dist.shape[0], map_crop_size, map_crop_size, n_ori), + dtype=np.float32) + x = goal_dist*np.cos(goal_theta)*sc + (map_crop_size-1.)/2. + y = goal_dist*np.sin(goal_theta)*sc + (map_crop_size-1.)/2. 
+ + for j in range(goal_dist.shape[0]): + gc = rel_goal_orientation[j] + x0 = np.floor(x[j]).astype(np.int32); x1 = x0 + 1; + y0 = np.floor(y[j]).astype(np.int32); y1 = y0 + 1; + if x0 >= 0 and x0 <= map_crop_size-1: + if y0 >= 0 and y0 <= map_crop_size-1: + goal_i[j, y0, x0, gc] = (x1-x[j])*(y1-y[j]) + if y1 >= 0 and y1 <= map_crop_size-1: + goal_i[j, y1, x0, gc] = (x1-x[j])*(y[j]-y0) + + if x1 >= 0 and x1 <= map_crop_size-1: + if y0 >= 0 and y0 <= map_crop_size-1: + goal_i[j, y0, x1, gc] = (x[j]-x0)*(y1-y[j]) + if y1 >= 0 and y1 <= map_crop_size-1: + goal_i[j, y1, x1, gc] = (x[j]-x0)*(y[j]-y0) + + goals.append(goal_i) + return goals + +def get_map_to_predict(src_locs, src_x_axiss, src_y_axiss, map, map_size, + interpolation=cv2.INTER_LINEAR): + fss = [] + valids = [] + + center = (map_size-1.0)/2.0 + dst_theta = np.pi/2.0 + dst_loc = np.array([center, center]) + dst_x_axis = np.array([np.cos(dst_theta), np.sin(dst_theta)]) + dst_y_axis = np.array([np.cos(dst_theta+np.pi/2), np.sin(dst_theta+np.pi/2)]) + + def compute_points(center, x_axis, y_axis): + points = np.zeros((3,2),dtype=np.float32) + points[0,:] = center + points[1,:] = center + x_axis + points[2,:] = center + y_axis + return points + + dst_points = compute_points(dst_loc, dst_x_axis, dst_y_axis) + for i in range(src_locs.shape[0]): + src_loc = src_locs[i,:] + src_x_axis = src_x_axiss[i,:] + src_y_axis = src_y_axiss[i,:] + src_points = compute_points(src_loc, src_x_axis, src_y_axis) + M = cv2.getAffineTransform(src_points, dst_points) + + fs = cv2.warpAffine(map, M, (map_size, map_size), None, flags=interpolation, + borderValue=np.NaN) + valid = np.invert(np.isnan(fs)) + valids.append(valid) + fss.append(fs) + return fss, valids + diff --git a/cognitive_mapping_and_planning/src/rotation_utils.py b/cognitive_mapping_and_planning/src/rotation_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..8d6d4f3cbdb1f808d210dce8b22fa3ba831d45a9 --- /dev/null +++ 
b/cognitive_mapping_and_planning/src/rotation_utils.py @@ -0,0 +1,73 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Utilities for generating and applying rotation matrices. +""" +import numpy as np + +ANGLE_EPS = 0.001 + + +def normalize(v): + return v / np.linalg.norm(v) + + +def get_r_matrix(ax_, angle): + ax = normalize(ax_) + if np.abs(angle) > ANGLE_EPS: + S_hat = np.array( + [[0.0, -ax[2], ax[1]], [ax[2], 0.0, -ax[0]], [-ax[1], ax[0], 0.0]], + dtype=np.float32) + R = np.eye(3) + np.sin(angle)*S_hat + \ + (1-np.cos(angle))*(np.linalg.matrix_power(S_hat, 2)) + else: + R = np.eye(3) + return R + + +def r_between(v_from_, v_to_): + v_from = normalize(v_from_) + v_to = normalize(v_to_) + ax = normalize(np.cross(v_from, v_to)) + angle = np.arccos(np.dot(v_from, v_to)) + return get_r_matrix(ax, angle) + + +def rotate_camera_to_point_at(up_from, lookat_from, up_to, lookat_to): + inputs = [up_from, lookat_from, up_to, lookat_to] + for i in range(4): + inputs[i] = normalize(np.array(inputs[i]).reshape((-1,))) + up_from, lookat_from, up_to, lookat_to = inputs + r1 = r_between(lookat_from, lookat_to) + + new_x = np.dot(r1, np.array([1, 0, 0]).reshape((-1, 1))).reshape((-1)) + to_x = normalize(np.cross(lookat_to, up_to)) + angle = np.arccos(np.dot(new_x, to_x)) + if angle > ANGLE_EPS: + if angle < np.pi - ANGLE_EPS: + ax = 
normalize(np.cross(new_x, to_x)) + flip = np.dot(lookat_to, ax) + if flip > 0: + r2 = get_r_matrix(lookat_to, angle) + elif flip < 0: + r2 = get_r_matrix(lookat_to, -1. * angle) + else: + # Angle of rotation is too close to 180 degrees, direction of rotation + # does not matter. + r2 = get_r_matrix(lookat_to, angle) + else: + r2 = np.eye(3) + return np.dot(r2, r1) + diff --git a/cognitive_mapping_and_planning/src/utils.py b/cognitive_mapping_and_planning/src/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..f58820c1f4cda35c0b38fb42f02d3f221924dc66 --- /dev/null +++ b/cognitive_mapping_and_planning/src/utils.py @@ -0,0 +1,168 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +r"""General Utilities. +""" + +import numpy as np, cPickle, os, time +import src.file_utils as fu +import logging + +class Timer(): + def __init__(self): + self.calls = 0. + self.start_time = 0. + self.time_per_call = 0. + self.total_time = 0. + self.last_log_time = 0. + + def tic(self): + self.start_time = time.time() + + def toc(self, average=True, log_at=-1, log_str='', type='calls'): + if self.start_time == 0: + logging.error('Timer not started by calling tic().') + t = time.time() + diff = time.time() - self.start_time + self.total_time += diff + self.calls += 1.
+ self.time_per_call = self.total_time/self.calls + + if type == 'calls' and log_at > 0 and np.mod(self.calls, log_at) == 0: + _ = [] + logging.info('%s: %f seconds.', log_str, self.time_per_call) + elif type == 'time' and log_at > 0 and t - self.last_log_time >= log_at: + _ = [] + logging.info('%s: %f seconds.', log_str, self.time_per_call) + self.last_log_time = t + + if average: + return self.time_per_call + else: + return diff + +class Foo(object): + def __init__(self, **kwargs): + self.__dict__.update(kwargs) + def __str__(self): + str_ = '' + for v in vars(self).keys(): + a = getattr(self, v) + if True: #isinstance(v, object): + str__ = str(a) + str__ = str__.replace('\n', '\n ') + else: + str__ = str(a) + str_ += '{:s}: {:s}'.format(v, str__) + str_ += '\n' + return str_ + + +def dict_equal(dict1, dict2): + assert(set(dict1.keys()) == set(dict2.keys())), "Sets of keys between 2 dictionaries are different." + for k in dict1.keys(): + assert(type(dict1[k]) == type(dict2[k])), "Type of key '{:s}' if different.".format(k) + if type(dict1[k]) == np.ndarray: + assert(dict1[k].dtype == dict2[k].dtype), "Numpy Type of key '{:s}' if different.".format(k) + assert(np.allclose(dict1[k], dict2[k])), "Value for key '{:s}' do not match.".format(k) + else: + assert(dict1[k] == dict2[k]), "Value for key '{:s}' do not match.".format(k) + return True + +def subplot(plt, Y_X, sz_y_sz_x = (10, 10)): + Y,X = Y_X + sz_y, sz_x = sz_y_sz_x + plt.rcParams['figure.figsize'] = (X*sz_x, Y*sz_y) + fig, axes = plt.subplots(Y, X) + plt.subplots_adjust(wspace=0.1, hspace=0.1) + return fig, axes + +def tic_toc_print(interval, string): + global tic_toc_print_time_old + if 'tic_toc_print_time_old' not in globals(): + tic_toc_print_time_old = time.time() + print string + else: + new_time = time.time() + if new_time - tic_toc_print_time_old > interval: + tic_toc_print_time_old = new_time; + print string + +def mkdir_if_missing(output_dir): + if not fu.exists(output_dir): + 
fu.makedirs(output_dir) + +def save_variables(pickle_file_name, var, info, overwrite = False): + if fu.exists(pickle_file_name) and overwrite == False: + raise Exception('{:s} exists and over write is false.'.format(pickle_file_name)) + # Construct the dictionary + assert(type(var) == list); assert(type(info) == list); + d = {} + for i in xrange(len(var)): + d[info[i]] = var[i] + with fu.fopen(pickle_file_name, 'w') as f: + cPickle.dump(d, f, cPickle.HIGHEST_PROTOCOL) + +def load_variables(pickle_file_name): + if fu.exists(pickle_file_name): + with fu.fopen(pickle_file_name, 'r') as f: + d = cPickle.load(f) + return d + else: + raise Exception('{:s} does not exists.'.format(pickle_file_name)) + +def voc_ap(rec, prec): + rec = rec.reshape((-1,1)) + prec = prec.reshape((-1,1)) + z = np.zeros((1,1)) + o = np.ones((1,1)) + mrec = np.vstack((z, rec, o)) + mpre = np.vstack((z, prec, z)) + for i in range(len(mpre)-2, -1, -1): + mpre[i] = max(mpre[i], mpre[i+1]) + + I = np.where(mrec[1:] != mrec[0:-1])[0]+1; + ap = 0; + for i in I: + ap = ap + (mrec[i] - mrec[i-1])*mpre[i]; + return ap + +def tight_imshow_figure(plt, figsize=None): + fig = plt.figure(figsize=figsize) + ax = plt.Axes(fig, [0,0,1,1]) + ax.set_axis_off() + fig.add_axes(ax) + return fig, ax + +def calc_pr(gt, out, wt=None): + if wt is None: + wt = np.ones((gt.size,1)) + + gt = gt.astype(np.float64).reshape((-1,1)) + wt = wt.astype(np.float64).reshape((-1,1)) + out = out.astype(np.float64).reshape((-1,1)) + + gt = gt*wt + tog = np.concatenate([gt, wt, out], axis=1)*1. 
+ ind = np.argsort(tog[:,2], axis=0)[::-1] + tog = tog[ind,:] + cumsumsortgt = np.cumsum(tog[:,0]) + cumsumsortwt = np.cumsum(tog[:,1]) + prec = cumsumsortgt / cumsumsortwt + rec = cumsumsortgt / np.sum(tog[:,0]) + + ap = voc_ap(rec, prec) + return ap, rec, prec + diff --git a/cognitive_mapping_and_planning/tfcode/__init__.py b/cognitive_mapping_and_planning/tfcode/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/cognitive_mapping_and_planning/tfcode/cmp.py b/cognitive_mapping_and_planning/tfcode/cmp.py new file mode 100644 index 0000000000000000000000000000000000000000..228ef90fddcd9ff41b26795544d93a1f18466158 --- /dev/null +++ b/cognitive_mapping_and_planning/tfcode/cmp.py @@ -0,0 +1,553 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Code for setting up the network for CMP. + +Sets up the mapper and the planner. 
+""" + +import sys, os, numpy as np +import matplotlib.pyplot as plt +import copy +import argparse, pprint +import time + + +import tensorflow as tf + +from tensorflow.contrib import slim +from tensorflow.contrib.slim import arg_scope + +import logging +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +from src import utils +import src.file_utils as fu +import tfcode.nav_utils as nu +import tfcode.cmp_utils as cu +import tfcode.cmp_summary as cmp_s +from tfcode import tf_utils + +value_iteration_network = cu.value_iteration_network +rotate_preds = cu.rotate_preds +deconv = cu.deconv +get_visual_frustum = cu.get_visual_frustum +fr_v2 = cu.fr_v2 + +setup_train_step_kwargs = nu.default_train_step_kwargs +compute_losses_multi_or = nu.compute_losses_multi_or + +get_repr_from_image = nu.get_repr_from_image + +_save_d_at_t = nu.save_d_at_t +_save_all = nu.save_all +_eval_ap = nu.eval_ap +_eval_dist = nu.eval_dist +_plot_trajectories = nu.plot_trajectories + +_vis_readout_maps = cmp_s._vis_readout_maps +_vis = cmp_s._vis +_summary_vis = cmp_s._summary_vis +_summary_readout_maps = cmp_s._summary_readout_maps +_add_summaries = cmp_s._add_summaries + +def _inputs(problem): + # Set up inputs. + with tf.name_scope('inputs'): + inputs = [] + inputs.append(('orig_maps', tf.float32, + (problem.batch_size, 1, None, None, 1))) + inputs.append(('goal_loc', tf.float32, + (problem.batch_size, problem.num_goals, 2))) + common_input_data, _ = tf_utils.setup_inputs(inputs) + + inputs = [] + if problem.input_type == 'vision': + # Multiple images from an array of cameras. 
+ inputs.append(('imgs', tf.float32, + (problem.batch_size, None, len(problem.aux_delta_thetas)+1, + problem.img_height, problem.img_width, + problem.img_channels))) + elif problem.input_type == 'analytical_counts': + for i in range(len(problem.map_crop_sizes)): + inputs.append(('analytical_counts_{:d}'.format(i), tf.float32, + (problem.batch_size, None, problem.map_crop_sizes[i], + problem.map_crop_sizes[i], problem.map_channels))) + + if problem.outputs.readout_maps: + for i in range(len(problem.readout_maps_crop_sizes)): + inputs.append(('readout_maps_{:d}'.format(i), tf.float32, + (problem.batch_size, None, + problem.readout_maps_crop_sizes[i], + problem.readout_maps_crop_sizes[i], + problem.readout_maps_channels))) + + for i in range(len(problem.map_crop_sizes)): + inputs.append(('ego_goal_imgs_{:d}'.format(i), tf.float32, + (problem.batch_size, None, problem.map_crop_sizes[i], + problem.map_crop_sizes[i], problem.goal_channels))) + for s in ['sum_num', 'sum_denom', 'max_denom']: + inputs.append(('running_'+s+'_{:d}'.format(i), tf.float32, + (problem.batch_size, 1, problem.map_crop_sizes[i], + problem.map_crop_sizes[i], problem.map_channels))) + + inputs.append(('incremental_locs', tf.float32, + (problem.batch_size, None, 2))) + inputs.append(('incremental_thetas', tf.float32, + (problem.batch_size, None, 1))) + inputs.append(('step_number', tf.int32, (1, None, 1))) + inputs.append(('node_ids', tf.int32, (problem.batch_size, None, + problem.node_ids_dim))) + inputs.append(('perturbs', tf.float32, (problem.batch_size, None, + problem.perturbs_dim))) + + # For plotting result plots + inputs.append(('loc_on_map', tf.float32, (problem.batch_size, None, 2))) + inputs.append(('gt_dist_to_goal', tf.float32, (problem.batch_size, None, 1))) + + step_input_data, _ = tf_utils.setup_inputs(inputs) + + inputs = [] + inputs.append(('action', tf.int32, (problem.batch_size, None, problem.num_actions))) + train_data, _ = tf_utils.setup_inputs(inputs) + 
train_data.update(step_input_data) + train_data.update(common_input_data) + return common_input_data, step_input_data, train_data + +def readout_general(multi_scale_belief, num_neurons, strides, layers_per_block, + kernel_size, batch_norm_is_training_op, wt_decay): + multi_scale_belief = tf.stop_gradient(multi_scale_belief) + with tf.variable_scope('readout_maps_deconv'): + x, outs = deconv(multi_scale_belief, batch_norm_is_training_op, + wt_decay=wt_decay, neurons=num_neurons, strides=strides, + layers_per_block=layers_per_block, kernel_size=kernel_size, + conv_fn=slim.conv2d_transpose, offset=0, + name='readout_maps_deconv') + probs = tf.sigmoid(x) + return x, probs + + +def running_combine(fss_logits, confs_probs, incremental_locs, + incremental_thetas, previous_sum_num, previous_sum_denom, + previous_max_denom, map_size, num_steps): + # fss_logits is B x N x H x W x C + # confs_logits is B x N x H x W x C + # incremental_locs is B x N x 2 + # incremental_thetas is B x N x 1 + # previous_sum_num etc is B x 1 x H x W x C + + with tf.name_scope('combine_{:d}'.format(num_steps)): + running_sum_nums_ = []; running_sum_denoms_ = []; + running_max_denoms_ = []; + + fss_logits_ = tf.unstack(fss_logits, axis=1, num=num_steps) + confs_probs_ = tf.unstack(confs_probs, axis=1, num=num_steps) + incremental_locs_ = tf.unstack(incremental_locs, axis=1, num=num_steps) + incremental_thetas_ = tf.unstack(incremental_thetas, axis=1, num=num_steps) + running_sum_num = tf.unstack(previous_sum_num, axis=1, num=1)[0] + running_sum_denom = tf.unstack(previous_sum_denom, axis=1, num=1)[0] + running_max_denom = tf.unstack(previous_max_denom, axis=1, num=1)[0] + + for i in range(num_steps): + # Rotate the previous running_num and running_denom + running_sum_num, running_sum_denom, running_max_denom = rotate_preds( + incremental_locs_[i], incremental_thetas_[i], map_size, + [running_sum_num, running_sum_denom, running_max_denom], + output_valid_mask=False)[0] + # print i, num_steps, 
running_sum_num.get_shape().as_list() + running_sum_num = running_sum_num + fss_logits_[i] * confs_probs_[i] + running_sum_denom = running_sum_denom + confs_probs_[i] + running_max_denom = tf.maximum(running_max_denom, confs_probs_[i]) + running_sum_nums_.append(running_sum_num) + running_sum_denoms_.append(running_sum_denom) + running_max_denoms_.append(running_max_denom) + + running_sum_nums = tf.stack(running_sum_nums_, axis=1) + running_sum_denoms = tf.stack(running_sum_denoms_, axis=1) + running_max_denoms = tf.stack(running_max_denoms_, axis=1) + return running_sum_nums, running_sum_denoms, running_max_denoms + +def get_map_from_images(imgs, mapper_arch, task_params, freeze_conv, wt_decay, + is_training, batch_norm_is_training_op, num_maps, + split_maps=True): + # Hit image with a resnet. + n_views = len(task_params.aux_delta_thetas) + 1 + out = utils.Foo() + + images_reshaped = tf.reshape(imgs, + shape=[-1, task_params.img_height, + task_params.img_width, + task_params.img_channels], name='re_image') + + x, out.vars_to_restore = get_repr_from_image( + images_reshaped, task_params.modalities, task_params.data_augment, + mapper_arch.encoder, freeze_conv, wt_decay, is_training) + + # Reshape into nice things so that these can be accumulated over time steps + # for faster backprop. + sh_before = x.get_shape().as_list() + out.encoder_output = tf.reshape(x, shape=[task_params.batch_size, -1, n_views] + sh_before[1:]) + x = tf.reshape(out.encoder_output, shape=[-1] + sh_before[1:]) + + # Add a layer to reduce dimensions for a fc layer. 
+ if mapper_arch.dim_reduce_neurons > 0: + ks = 1; neurons = mapper_arch.dim_reduce_neurons; + init_var = np.sqrt(2.0/(ks**2)/neurons) + batch_norm_param = mapper_arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + out.conv_feat = slim.conv2d(x, neurons, kernel_size=ks, stride=1, + normalizer_fn=slim.batch_norm, normalizer_params=batch_norm_param, + padding='SAME', scope='dim_reduce', + weights_regularizer=slim.l2_regularizer(wt_decay), + weights_initializer=tf.random_normal_initializer(stddev=init_var)) + reshape_conv_feat = slim.flatten(out.conv_feat) + sh = reshape_conv_feat.get_shape().as_list() + out.reshape_conv_feat = tf.reshape(reshape_conv_feat, shape=[-1, sh[1]*n_views]) + + with tf.variable_scope('fc'): + # Fully connected layers to compute the representation in top-view space. + fc_batch_norm_param = {'center': True, 'scale': True, + 'activation_fn':tf.nn.relu, + 'is_training': batch_norm_is_training_op} + f = out.reshape_conv_feat + out_neurons = (mapper_arch.fc_out_size**2)*mapper_arch.fc_out_neurons + neurons = mapper_arch.fc_neurons + [out_neurons] + f, _ = tf_utils.fc_network(f, neurons=neurons, wt_decay=wt_decay, + name='fc', offset=0, + batch_norm_param=fc_batch_norm_param, + is_training=is_training, + dropout_ratio=mapper_arch.fc_dropout) + f = tf.reshape(f, shape=[-1, mapper_arch.fc_out_size, + mapper_arch.fc_out_size, + mapper_arch.fc_out_neurons], name='re_fc') + + # Use pool5 to predict the free space map via deconv layers. + with tf.variable_scope('deconv'): + x, outs = deconv(f, batch_norm_is_training_op, wt_decay=wt_decay, + neurons=mapper_arch.deconv_neurons, + strides=mapper_arch.deconv_strides, + layers_per_block=mapper_arch.deconv_layers_per_block, + kernel_size=mapper_arch.deconv_kernel_size, + conv_fn=slim.conv2d_transpose, offset=0, name='deconv') + + # Reshape x the right way. 
+ sh = x.get_shape().as_list() + x = tf.reshape(x, shape=[task_params.batch_size, -1] + sh[1:]) + out.deconv_output = x + + # Separate out the map and the confidence predictions, pass the confidence + # through a sigmoid. + if split_maps: + with tf.name_scope('split'): + out_all = tf.split(value=x, axis=4, num_or_size_splits=2*num_maps) + out.fss_logits = out_all[:num_maps] + out.confs_logits = out_all[num_maps:] + with tf.name_scope('sigmoid'): + out.confs_probs = [tf.nn.sigmoid(x) for x in out.confs_logits] + return out + +def setup_to_run(m, args, is_training, batch_norm_is_training, summary_mode): + assert(args.arch.multi_scale), 'removed support for old single scale code.' + # Set up the model. + tf.set_random_seed(args.solver.seed) + task_params = args.navtask.task_params + + batch_norm_is_training_op = \ + tf.placeholder_with_default(batch_norm_is_training, shape=[], + name='batch_norm_is_training_op') + + # Setup the inputs + m.input_tensors = {} + m.train_ops = {} + m.input_tensors['common'], m.input_tensors['step'], m.input_tensors['train'] = \ + _inputs(task_params) + + m.init_fn = None + + if task_params.input_type == 'vision': + m.vision_ops = get_map_from_images( + m.input_tensors['step']['imgs'], args.mapper_arch, + task_params, args.solver.freeze_conv, + args.solver.wt_decay, is_training, batch_norm_is_training_op, + num_maps=len(task_params.map_crop_sizes)) + + # Load variables from snapshot if needed. + if args.solver.pretrained_path is not None: + m.init_fn = slim.assign_from_checkpoint_fn(args.solver.pretrained_path, + m.vision_ops.vars_to_restore) + + # Set up caching of vision features if needed. + if args.solver.freeze_conv: + m.train_ops['step_data_cache'] = [m.vision_ops.encoder_output] + else: + m.train_ops['step_data_cache'] = [] + + # Set up blobs that are needed for the computation in rest of the graph. 
+ m.ego_map_ops = m.vision_ops.fss_logits + m.coverage_ops = m.vision_ops.confs_probs + + # Zero pad these to make them same size as what the planner expects. + for i in range(len(m.ego_map_ops)): + if args.mapper_arch.pad_map_with_zeros_each[i] > 0: + paddings = np.zeros((5,2), dtype=np.int32) + paddings[2:4,:] = args.mapper_arch.pad_map_with_zeros_each[i] + paddings_op = tf.constant(paddings, dtype=tf.int32) + m.ego_map_ops[i] = tf.pad(m.ego_map_ops[i], paddings=paddings_op) + m.coverage_ops[i] = tf.pad(m.coverage_ops[i], paddings=paddings_op) + + elif task_params.input_type == 'analytical_counts': + m.ego_map_ops = []; m.coverage_ops = [] + for i in range(len(task_params.map_crop_sizes)): + ego_map_op = m.input_tensors['step']['analytical_counts_{:d}'.format(i)] + coverage_op = tf.cast(tf.greater_equal( + tf.reduce_max(ego_map_op, reduction_indices=[4], + keep_dims=True), 1), tf.float32) + coverage_op = tf.ones_like(ego_map_op) * coverage_op + m.ego_map_ops.append(ego_map_op) + m.coverage_ops.append(coverage_op) + m.train_ops['step_data_cache'] = [] + + num_steps = task_params.num_steps + num_goals = task_params.num_goals + + map_crop_size_ops = [] + for map_crop_size in task_params.map_crop_sizes: + map_crop_size_ops.append(tf.constant(map_crop_size, dtype=tf.int32, shape=(2,))) + + with tf.name_scope('check_size'): + is_single_step = tf.equal(tf.unstack(tf.shape(m.ego_map_ops[0]), num=5)[1], 1) + + fr_ops = []; value_ops = []; + fr_intermediate_ops = []; value_intermediate_ops = []; + crop_value_ops = []; + resize_crop_value_ops = []; + confs = []; occupancys = []; + + previous_value_op = None + updated_state = []; state_names = []; + + for i in range(len(task_params.map_crop_sizes)): + map_crop_size = task_params.map_crop_sizes[i] + with tf.variable_scope('scale_{:d}'.format(i)): + # Accumulate the map. 
+ fn = lambda ns: running_combine( + m.ego_map_ops[i], + m.coverage_ops[i], + m.input_tensors['step']['incremental_locs'] * task_params.map_scales[i], + m.input_tensors['step']['incremental_thetas'], + m.input_tensors['step']['running_sum_num_{:d}'.format(i)], + m.input_tensors['step']['running_sum_denom_{:d}'.format(i)], + m.input_tensors['step']['running_max_denom_{:d}'.format(i)], + map_crop_size, ns) + + running_sum_num, running_sum_denom, running_max_denom = \ + tf.cond(is_single_step, lambda: fn(1), lambda: fn(num_steps*num_goals)) + updated_state += [running_sum_num, running_sum_denom, running_max_denom] + state_names += ['running_sum_num_{:d}'.format(i), + 'running_sum_denom_{:d}'.format(i), + 'running_max_denom_{:d}'.format(i)] + + # Concat the accumulated map and goal + occupancy = running_sum_num / tf.maximum(running_sum_denom, 0.001) + conf = running_max_denom + # print occupancy.get_shape().as_list() + + # Concat occupancy, how much occupied and goal. + with tf.name_scope('concat'): + sh = [-1, map_crop_size, map_crop_size, task_params.map_channels] + occupancy = tf.reshape(occupancy, shape=sh) + conf = tf.reshape(conf, shape=sh) + + sh = [-1, map_crop_size, map_crop_size, task_params.goal_channels] + goal = tf.reshape(m.input_tensors['step']['ego_goal_imgs_{:d}'.format(i)], shape=sh) + to_concat = [occupancy, conf, goal] + + if previous_value_op is not None: + to_concat.append(previous_value_op) + + x = tf.concat(to_concat, 3) + + # Pass the map, previous rewards and the goal through a few convolutional + # layers to get fR. 
+ fr_op, fr_intermediate_op = fr_v2( + x, output_neurons=args.arch.fr_neurons, + inside_neurons=args.arch.fr_inside_neurons, + is_training=batch_norm_is_training_op, name='fr', + wt_decay=args.solver.wt_decay, stride=args.arch.fr_stride) + + # Do Value Iteration on the fR + if args.arch.vin_num_iters > 0: + value_op, value_intermediate_op = value_iteration_network( + fr_op, num_iters=args.arch.vin_num_iters, + val_neurons=args.arch.vin_val_neurons, + action_neurons=args.arch.vin_action_neurons, + kernel_size=args.arch.vin_ks, share_wts=args.arch.vin_share_wts, + name='vin', wt_decay=args.solver.wt_decay) + else: + value_op = fr_op + value_intermediate_op = [] + + # Crop out and upsample the previous value map. + remove = args.arch.crop_remove_each + if remove > 0: + crop_value_op = value_op[:, remove:-remove, remove:-remove,:] + else: + crop_value_op = value_op + crop_value_op = tf.reshape(crop_value_op, shape=[-1, args.arch.value_crop_size, + args.arch.value_crop_size, + args.arch.vin_val_neurons]) + if i < len(task_params.map_crop_sizes)-1: + # Reshape it to shape of the next scale. + previous_value_op = tf.image.resize_bilinear(crop_value_op, + map_crop_size_ops[i+1], + align_corners=True) + resize_crop_value_ops.append(previous_value_op) + + occupancys.append(occupancy) + confs.append(conf) + value_ops.append(value_op) + crop_value_ops.append(crop_value_op) + fr_ops.append(fr_op) + fr_intermediate_ops.append(fr_intermediate_op) + + m.value_ops = value_ops + m.value_intermediate_ops = value_intermediate_ops + m.fr_ops = fr_ops + m.fr_intermediate_ops = fr_intermediate_ops + m.final_value_op = crop_value_op + m.crop_value_ops = crop_value_ops + m.resize_crop_value_ops = resize_crop_value_ops + m.confs = confs + m.occupancys = occupancys + + sh = [-1, args.arch.vin_val_neurons*((args.arch.value_crop_size)**2)] + m.value_features_op = tf.reshape(m.final_value_op, sh, name='reshape_value_op') + + # Determine what action to take. 
+ with tf.variable_scope('action_pred'): + batch_norm_param = args.arch.pred_batch_norm_param + if batch_norm_param is not None: + batch_norm_param['is_training'] = batch_norm_is_training_op + m.action_logits_op, _ = tf_utils.fc_network( + m.value_features_op, neurons=args.arch.pred_neurons, + wt_decay=args.solver.wt_decay, name='pred', offset=0, + num_pred=task_params.num_actions, + batch_norm_param=batch_norm_param) + m.action_prob_op = tf.nn.softmax(m.action_logits_op) + + init_state = tf.constant(0., dtype=tf.float32, shape=[ + task_params.batch_size, 1, map_crop_size, map_crop_size, + task_params.map_channels]) + + m.train_ops['state_names'] = state_names + m.train_ops['updated_state'] = updated_state + m.train_ops['init_state'] = [init_state for _ in updated_state] + + m.train_ops['step'] = m.action_prob_op + m.train_ops['common'] = [m.input_tensors['common']['orig_maps'], + m.input_tensors['common']['goal_loc']] + m.train_ops['batch_norm_is_training_op'] = batch_norm_is_training_op + m.loss_ops = []; m.loss_ops_names = []; + + if args.arch.readout_maps: + with tf.name_scope('readout_maps'): + all_occupancys = tf.concat(m.occupancys + m.confs, 3) + readout_maps, probs = readout_general( + all_occupancys, num_neurons=args.arch.rom_arch.num_neurons, + strides=args.arch.rom_arch.strides, + layers_per_block=args.arch.rom_arch.layers_per_block, + kernel_size=args.arch.rom_arch.kernel_size, + batch_norm_is_training_op=batch_norm_is_training_op, + wt_decay=args.solver.wt_decay) + + gt_ego_maps = [m.input_tensors['step']['readout_maps_{:d}'.format(i)] + for i in range(len(task_params.readout_maps_crop_sizes))] + m.readout_maps_gt = tf.concat(gt_ego_maps, 4) + gt_shape = tf.shape(m.readout_maps_gt) + m.readout_maps_logits = tf.reshape(readout_maps, gt_shape) + m.readout_maps_probs = tf.reshape(probs, gt_shape) + + # Add a loss op + m.readout_maps_loss_op = tf.losses.sigmoid_cross_entropy( + tf.reshape(m.readout_maps_gt, [-1, len(task_params.readout_maps_crop_sizes)]), 
+ tf.reshape(readout_maps, [-1, len(task_params.readout_maps_crop_sizes)]), + scope='loss') + m.readout_maps_loss_op = 10.*m.readout_maps_loss_op + + ewma_decay = 0.99 if is_training else 0.0 + weight = tf.ones_like(m.input_tensors['train']['action'], dtype=tf.float32, + name='weight') + m.reg_loss_op, m.data_loss_op, m.total_loss_op, m.acc_ops = \ + compute_losses_multi_or(m.action_logits_op, + m.input_tensors['train']['action'], weights=weight, + num_actions=task_params.num_actions, + data_loss_wt=args.solver.data_loss_wt, + reg_loss_wt=args.solver.reg_loss_wt, + ewma_decay=ewma_decay) + + if args.arch.readout_maps: + m.total_loss_op = m.total_loss_op + m.readout_maps_loss_op + m.loss_ops += [m.readout_maps_loss_op] + m.loss_ops_names += ['readout_maps_loss'] + + m.loss_ops += [m.reg_loss_op, m.data_loss_op, m.total_loss_op] + m.loss_ops_names += ['reg_loss', 'data_loss', 'total_loss'] + + if args.solver.freeze_conv: + vars_to_optimize = list(set(tf.trainable_variables()) - + set(m.vision_ops.vars_to_restore)) + else: + vars_to_optimize = None + + m.lr_op, m.global_step_op, m.train_op, m.should_stop_op, m.optimizer, \ + m.sync_optimizer = tf_utils.setup_training( + m.total_loss_op, + args.solver.initial_learning_rate, + args.solver.steps_per_decay, + args.solver.learning_rate_decay, + args.solver.momentum, + args.solver.max_steps, + args.solver.sync, + args.solver.adjust_lr_sync, + args.solver.num_workers, + args.solver.task, + vars_to_optimize=vars_to_optimize, + clip_gradient_norm=args.solver.clip_gradient_norm, + typ=args.solver.typ, momentum2=args.solver.momentum2, + adam_eps=args.solver.adam_eps) + + if args.arch.sample_gt_prob_type == 'inverse_sigmoid_decay': + m.sample_gt_prob_op = tf_utils.inverse_sigmoid_decay(args.arch.isd_k, + m.global_step_op) + elif args.arch.sample_gt_prob_type == 'zero': + m.sample_gt_prob_op = tf.constant(-1.0, dtype=tf.float32) + + elif args.arch.sample_gt_prob_type.split('_')[0] == 'step': + step = 
int(args.arch.sample_gt_prob_type.split('_')[1]) + m.sample_gt_prob_op = tf_utils.step_gt_prob( + step, m.input_tensors['step']['step_number'][0,0,0]) + + m.sample_action_type = args.arch.action_sample_type + m.sample_action_combine_type = args.arch.action_sample_combine_type + + m.summary_ops = { + summary_mode: _add_summaries(m, args, summary_mode, + args.summary.arop_full_summary_iters)} + + m.init_op = tf.group(tf.global_variables_initializer(), + tf.local_variables_initializer()) + m.saver_op = tf.train.Saver(keep_checkpoint_every_n_hours=4, + write_version=tf.train.SaverDef.V2) + return m diff --git a/cognitive_mapping_and_planning/tfcode/cmp_summary.py b/cognitive_mapping_and_planning/tfcode/cmp_summary.py new file mode 100644 index 0000000000000000000000000000000000000000..55313bfbd52a9e079e1de5093ae1882a9bf1d858 --- /dev/null +++ b/cognitive_mapping_and_planning/tfcode/cmp_summary.py @@ -0,0 +1,213 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Code for setting up summaries for CMP. 
+""" + +import sys, os, numpy as np +import matplotlib.pyplot as plt + + +import tensorflow as tf + +from tensorflow.contrib import slim +from tensorflow.contrib.slim import arg_scope + +import logging +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +from src import utils +import src.file_utils as fu +import tfcode.nav_utils as nu + +def _vis_readout_maps(outputs, global_step, output_dir, metric_summary, N): + # outputs is [gt_map, pred_map]: + if N >= 0: + outputs = outputs[:N] + N = len(outputs) + + plt.set_cmap('jet') + fig, axes = utils.subplot(plt, (N, outputs[0][0].shape[4]*2), (5,5)) + axes = axes.ravel()[::-1].tolist() + for i in range(N): + gt_map, pred_map = outputs[i] + for j in [0]: + for k in range(gt_map.shape[4]): + # Display something like the midpoint of the trajectory. + id = np.int(gt_map.shape[1]/2) + + ax = axes.pop(); + ax.imshow(gt_map[j,id,:,:,k], origin='lower', interpolation='none', + vmin=0., vmax=1.) + ax.set_axis_off(); + if i == 0: ax.set_title('gt_map') + + ax = axes.pop(); + ax.imshow(pred_map[j,id,:,:,k], origin='lower', interpolation='none', + vmin=0., vmax=1.) + ax.set_axis_off(); + if i == 0: ax.set_title('pred_map') + + file_name = os.path.join(output_dir, 'readout_map_{:d}.png'.format(global_step)) + with fu.fopen(file_name, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + plt.close(fig) + +def _vis(outputs, global_step, output_dir, metric_summary, N): + # Plot the value map and goal for various maps to see whether the model is + # learning anything useful. + # + # outputs is [values, goals, maps, occupancy, conf].
+ # + if N >= 0: + outputs = outputs[:N] + N = len(outputs) + + plt.set_cmap('jet') + fig, axes = utils.subplot(plt, (N, outputs[0][0].shape[4]*5), (5,5)) + axes = axes.ravel()[::-1].tolist() + for i in range(N): + values, goals, maps, occupancy, conf = outputs[i] + for j in [0]: + for k in range(values.shape[4]): + # Display something like the midpoint of the trajectory. + id = np.int(values.shape[1]/2) + + ax = axes.pop(); + ax.imshow(goals[j,id,:,:,k], origin='lower', interpolation='none') + ax.set_axis_off(); + if i == 0: ax.set_title('goal') + + ax = axes.pop(); + ax.imshow(occupancy[j,id,:,:,k], origin='lower', interpolation='none') + ax.set_axis_off(); + if i == 0: ax.set_title('occupancy') + + ax = axes.pop(); + ax.imshow(conf[j,id,:,:,k], origin='lower', interpolation='none', + vmin=0., vmax=1.) + ax.set_axis_off(); + if i == 0: ax.set_title('conf') + + ax = axes.pop(); + ax.imshow(values[j,id,:,:,k], origin='lower', interpolation='none') + ax.set_axis_off(); + if i == 0: ax.set_title('value') + + ax = axes.pop(); + ax.imshow(maps[j,id,:,:,k], origin='lower', interpolation='none') + ax.set_axis_off(); + if i == 0: ax.set_title('incr map') + + file_name = os.path.join(output_dir, 'value_vis_{:d}.png'.format(global_step)) + with fu.fopen(file_name, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + plt.close(fig) + +def _summary_vis(m, batch_size, num_steps, arop_full_summary_iters): + arop = []; arop_summary_iters = []; arop_eval_fns = []; + vis_value_ops = []; vis_goal_ops = []; vis_map_ops = []; + vis_occupancy_ops = []; vis_conf_ops = []; + for i, val_op in enumerate(m.value_ops): + vis_value_op = tf.reduce_mean(tf.abs(val_op), axis=3, keep_dims=True) + vis_value_ops.append(vis_value_op) + + vis_occupancy_op = tf.reduce_mean(tf.abs(m.occupancys[i]), 3, True) + vis_occupancy_ops.append(vis_occupancy_op) + + vis_conf_op = tf.reduce_max(tf.abs(m.confs[i]), axis=3, keep_dims=True) + vis_conf_ops.append(vis_conf_op) + + 
ego_goal_imgs_i_op = m.input_tensors['step']['ego_goal_imgs_{:d}'.format(i)] + vis_goal_op = tf.reduce_max(ego_goal_imgs_i_op, 4, True) + vis_goal_ops.append(vis_goal_op) + + vis_map_op = tf.reduce_mean(tf.abs(m.ego_map_ops[i]), 4, True) + vis_map_ops.append(vis_map_op) + + vis_goal_ops = tf.concat(vis_goal_ops, 4) + vis_map_ops = tf.concat(vis_map_ops, 4) + vis_value_ops = tf.concat(vis_value_ops, 3) + vis_occupancy_ops = tf.concat(vis_occupancy_ops, 3) + vis_conf_ops = tf.concat(vis_conf_ops, 3) + + sh = tf.unstack(tf.shape(vis_value_ops))[1:] + vis_value_ops = tf.reshape(vis_value_ops, shape=[batch_size, -1] + sh) + + sh = tf.unstack(tf.shape(vis_conf_ops))[1:] + vis_conf_ops = tf.reshape(vis_conf_ops, shape=[batch_size, -1] + sh) + + sh = tf.unstack(tf.shape(vis_occupancy_ops))[1:] + vis_occupancy_ops = tf.reshape(vis_occupancy_ops, shape=[batch_size,-1] + sh) + + # Save memory, only return time steps that need to be visualized, factor of + # 32 CPU memory saving. + id = np.int(num_steps/2) + vis_goal_ops = tf.expand_dims(vis_goal_ops[:,id,:,:,:], axis=1) + vis_map_ops = tf.expand_dims(vis_map_ops[:,id,:,:,:], axis=1) + vis_value_ops = tf.expand_dims(vis_value_ops[:,id,:,:,:], axis=1) + vis_conf_ops = tf.expand_dims(vis_conf_ops[:,id,:,:,:], axis=1) + vis_occupancy_ops = tf.expand_dims(vis_occupancy_ops[:,id,:,:,:], axis=1) + + arop += [[vis_value_ops, vis_goal_ops, vis_map_ops, vis_occupancy_ops, + vis_conf_ops]] + arop_summary_iters += [arop_full_summary_iters] + arop_eval_fns += [_vis] + return arop, arop_summary_iters, arop_eval_fns + +def _summary_readout_maps(m, num_steps, arop_full_summary_iters): + arop = []; arop_summary_iters = []; arop_eval_fns = []; + id = np.int(num_steps-1) + vis_readout_maps_gt = m.readout_maps_gt + vis_readout_maps_prob = tf.reshape(m.readout_maps_probs, + shape=tf.shape(vis_readout_maps_gt)) + vis_readout_maps_gt = tf.expand_dims(vis_readout_maps_gt[:,id,:,:,:], 1) + vis_readout_maps_prob = 
tf.expand_dims(vis_readout_maps_prob[:,id,:,:,:], 1) + arop += [[vis_readout_maps_gt, vis_readout_maps_prob]] + arop_summary_iters += [arop_full_summary_iters] + arop_eval_fns += [_vis_readout_maps] + return arop, arop_summary_iters, arop_eval_fns + +def _add_summaries(m, args, summary_mode, arop_full_summary_iters): + task_params = args.navtask.task_params + + summarize_ops = [m.lr_op, m.global_step_op, m.sample_gt_prob_op] + \ + m.loss_ops + m.acc_ops + summarize_names = ['lr', 'global_step', 'sample_gt_prob_op'] + \ + m.loss_ops_names + ['acc_{:d}'.format(i) for i in range(len(m.acc_ops))] + to_aggregate = [0, 0, 0] + [1]*len(m.loss_ops_names) + [1]*len(m.acc_ops) + + scope_name = 'summary' + with tf.name_scope(scope_name): + s_ops = nu.add_default_summaries(summary_mode, arop_full_summary_iters, + summarize_ops, summarize_names, + to_aggregate, m.action_prob_op, + m.input_tensors, scope_name=scope_name) + if summary_mode == 'val': + arop, arop_summary_iters, arop_eval_fns = _summary_vis( + m, task_params.batch_size, task_params.num_steps, + arop_full_summary_iters) + s_ops.additional_return_ops += arop + s_ops.arop_summary_iters += arop_summary_iters + s_ops.arop_eval_fns += arop_eval_fns + + if args.arch.readout_maps: + arop, arop_summary_iters, arop_eval_fns = _summary_readout_maps( + m, task_params.num_steps, arop_full_summary_iters) + s_ops.additional_return_ops += arop + s_ops.arop_summary_iters += arop_summary_iters + s_ops.arop_eval_fns += arop_eval_fns + + return s_ops diff --git a/cognitive_mapping_and_planning/tfcode/cmp_utils.py b/cognitive_mapping_and_planning/tfcode/cmp_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..6d87c697b4b29128c8b8a42caac27aeb4d657ec6 --- /dev/null +++ b/cognitive_mapping_and_planning/tfcode/cmp_utils.py @@ -0,0 +1,164 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Utility functions for setting up the CMP graph. +""" + +import os, numpy as np +import matplotlib.pyplot as plt + + +import tensorflow as tf + +from tensorflow.contrib import slim +from tensorflow.contrib.slim import arg_scope +import logging +from src import utils +import src.file_utils as fu +from tfcode import tf_utils + +resnet_v2 = tf_utils.resnet_v2 +custom_residual_block = tf_utils.custom_residual_block + +def value_iteration_network( + fr, num_iters, val_neurons, action_neurons, kernel_size, share_wts=False, + name='vin', wt_decay=0.0001, activation_fn=None, shape_aware=False): + """ + Constructs a Value Iteration Network using convolutions and max pooling + across channels. + Input: + fr: NxHxWxC + val_neurons: Number of channels for maintaining the value. + action_neurons: Computes action_neurons * val_neurons at each iteration to + max pool over.
+ Output: + value image: NxHxWx(val_neurons) + """ + init_var = np.sqrt(2.0/(kernel_size**2)/(val_neurons*action_neurons)) + vals = [] + with tf.variable_scope(name) as varscope: + if shape_aware == False: + fr_shape = tf.unstack(tf.shape(fr)) + val_shape = tf.stack(fr_shape[:-1] + [val_neurons]) + val = tf.zeros(val_shape, name='val_init') + else: + val = tf.expand_dims(tf.zeros_like(fr[:,:,:,0]), dim=-1) * \ + tf.constant(0., dtype=tf.float32, shape=[1,1,1,val_neurons]) + val_shape = tf.shape(val) + vals.append(val) + for i in range(num_iters): + if share_wts: + # The first Value Iteration may be special, so it can have its own + # parameters. + scope = 'conv' + if i == 0: scope = 'conv_0' + if i > 1: varscope.reuse_variables() + else: + scope = 'conv_{:d}'.format(i) + val = slim.conv2d(tf.concat([val, fr], 3, name='concat_{:d}'.format(i)), + num_outputs=action_neurons*val_neurons, + kernel_size=kernel_size, stride=1, activation_fn=activation_fn, + scope=scope, normalizer_fn=None, + weights_regularizer=slim.l2_regularizer(wt_decay), + weights_initializer=tf.random_normal_initializer(stddev=init_var), + biases_initializer=tf.zeros_initializer()) + val = tf.reshape(val, [-1, action_neurons*val_neurons, 1, 1], + name='re_{:d}'.format(i)) + val = slim.max_pool2d(val, kernel_size=[action_neurons,1], + stride=[action_neurons,1], padding='VALID', + scope='val_{:d}'.format(i)) + val = tf.reshape(val, val_shape, name='unre_{:d}'.format(i)) + vals.append(val) + return val, vals + + +def rotate_preds(loc_on_map, relative_theta, map_size, preds, + output_valid_mask): + with tf.name_scope('rotate'): + flow_op = tf_utils.get_flow(loc_on_map, relative_theta, map_size=map_size) + if type(preds) != list: + rotated_preds, valid_mask_warps = tf_utils.dense_resample(preds, flow_op, + output_valid_mask) + else: + rotated_preds = [] ;valid_mask_warps = [] + for pred in preds: + rotated_pred, valid_mask_warp = tf_utils.dense_resample(pred, flow_op, + output_valid_mask) +
rotated_preds.append(rotated_pred) + valid_mask_warps.append(valid_mask_warp) + return rotated_preds, valid_mask_warps + +def get_visual_frustum(map_size, shape_like, expand_dims=[0,0]): + with tf.name_scope('visual_frustum'): + l = np.tril(np.ones(map_size)) ;l = l + l[:,::-1] + l = (l == 2).astype(np.float32) + for e in expand_dims: + l = np.expand_dims(l, axis=e) + confs_probs = tf.constant(l, dtype=tf.float32) + confs_probs = tf.ones_like(shape_like, dtype=tf.float32) * confs_probs + return confs_probs + +def deconv(x, is_training, wt_decay, neurons, strides, layers_per_block, + kernel_size, conv_fn, name, offset=0): + """Generates an upsampling network with residual connections. + """ + batch_norm_param = {'center': True, 'scale': True, + 'activation_fn': tf.nn.relu, + 'is_training': is_training} + outs = [] + for i, (neuron, stride) in enumerate(zip(neurons, strides)): + for s in range(layers_per_block): + scope = '{:s}_{:d}_{:d}'.format(name, i+1+offset,s+1) + x = custom_residual_block(x, neuron, kernel_size, stride, scope, + is_training, wt_decay, use_residual=True, + residual_stride_conv=True, conv_fn=conv_fn, + batch_norm_param=batch_norm_param) + stride = 1 + outs.append((x,True)) + return x, outs + +def fr_v2(x, output_neurons, inside_neurons, is_training, name='fr', + wt_decay=0.0001, stride=1, updates_collections=tf.GraphKeys.UPDATE_OPS): + """Performs fusion of information between the map and the reward map.
+ Inputs + x: NxHxWxC1 + + Outputs + fr map: NxHxWx(output_neurons) + """ + if type(stride) != list: + stride = [stride] + with slim.arg_scope(resnet_v2.resnet_utils.resnet_arg_scope( + is_training=is_training, weight_decay=wt_decay)): + with slim.arg_scope([slim.batch_norm], updates_collections=updates_collections) as arg_sc: + # Change the updates_collections for the conv normalizer_params to None + for i in range(len(arg_sc.keys())): + if 'convolution' in arg_sc.keys()[i]: + arg_sc.values()[i]['normalizer_params']['updates_collections'] = updates_collections + with slim.arg_scope(arg_sc): + bottleneck = resnet_v2.bottleneck + blocks = [] + for i, s in enumerate(stride): + b = resnet_v2.resnet_utils.Block( + 'block{:d}'.format(i + 1), bottleneck, [{ + 'depth': output_neurons, + 'depth_bottleneck': inside_neurons, + 'stride': stride[i] + }]) + blocks.append(b) + x, outs = resnet_v2.resnet_v2(x, blocks, num_classes=None, global_pool=False, + output_stride=None, include_root_block=False, + reuse=False, scope=name) + return x, outs diff --git a/cognitive_mapping_and_planning/tfcode/nav_utils.py b/cognitive_mapping_and_planning/tfcode/nav_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..2f764f33df91a80f6539dcbae1e0fa7093becd29 --- /dev/null +++ b/cognitive_mapping_and_planning/tfcode/nav_utils.py @@ -0,0 +1,435 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Various losses for training navigation agents. + +Defines various loss functions for navigation agents, +compute_losses_multi_or. +""" + +import os, numpy as np +import matplotlib.pyplot as plt + + +import tensorflow as tf + +from tensorflow.contrib import slim +from tensorflow.contrib.slim import arg_scope +from tensorflow.contrib.slim.nets import resnet_v2 +from tensorflow.python.training import moving_averages +import logging +from src import utils +import src.file_utils as fu +from tfcode import tf_utils + + +def compute_losses_multi_or(logits, actions_one_hot, weights=None, + num_actions=-1, data_loss_wt=1., reg_loss_wt=1., + ewma_decay=0.99, reg_loss_op=None): + assert(num_actions > 0), 'num_actions must be specified and must be > 0.' + + with tf.name_scope('loss'): + if weights is None: + weights = tf.ones_like(actions_one_hot, dtype=tf.float32, name='weight') + + actions_one_hot = tf.cast(tf.reshape(actions_one_hot, [-1, num_actions], + 're_actions_one_hot'), tf.float32) + weights = tf.reduce_sum(tf.reshape(weights, [-1, num_actions], 're_weight'), + reduction_indices=1) + total = tf.reduce_sum(weights) + + action_prob = tf.nn.softmax(logits) + action_prob = tf.reduce_sum(tf.multiply(action_prob, actions_one_hot), + reduction_indices=1) + example_loss = -tf.log(tf.maximum(tf.constant(1e-4), action_prob)) + + data_loss_op = tf.reduce_sum(example_loss * weights) / total + if reg_loss_op is None: + if reg_loss_wt > 0: + reg_loss_op = tf.add_n(tf.losses.get_regularization_losses()) + else: + reg_loss_op = tf.constant(0.)
+ + if reg_loss_wt > 0: + total_loss_op = data_loss_wt*data_loss_op + reg_loss_wt*reg_loss_op + else: + total_loss_op = data_loss_wt*data_loss_op + + is_correct = tf.cast(tf.greater(action_prob, 0.5, name='pred_class'), tf.float32) + acc_op = tf.reduce_sum(is_correct*weights) / total + + ewma_acc_op = moving_averages.weighted_moving_average( + acc_op, ewma_decay, weight=total, name='ewma_acc') + + acc_ops = [ewma_acc_op] + + return reg_loss_op, data_loss_op, total_loss_op, acc_ops + + +def get_repr_from_image(images_reshaped, modalities, data_augment, encoder, + freeze_conv, wt_decay, is_training): + # Pass image through lots of convolutional layers, to obtain pool5 + if modalities == ['rgb']: + with tf.name_scope('pre_rgb'): + x = (images_reshaped + 128.) / 255. # Convert to brightness between 0 and 1. + if data_augment.relight and is_training: + x = tf_utils.distort_image(x, fast_mode=data_augment.relight_fast) + x = (x-0.5)*2.0 + scope_name = encoder + elif modalities == ['depth']: + with tf.name_scope('pre_d'): + d_image = images_reshaped + x = 2*(d_image[...,0] - 80.0)/100.0 + y = d_image[...,1] + d_image = tf.concat([tf.expand_dims(x, -1), tf.expand_dims(y, -1)], 3) + x = d_image + scope_name = 'd_'+encoder + + resnet_is_training = is_training and (not freeze_conv) + with slim.arg_scope(resnet_v2.resnet_utils.resnet_arg_scope(resnet_is_training)): + fn = getattr(tf_utils, encoder) + x, end_points = fn(x, num_classes=None, global_pool=False, + output_stride=None, reuse=None, + scope=scope_name) + vars_ = slim.get_variables_to_restore() + + conv_feat = x + return conv_feat, vars_ + +def default_train_step_kwargs(m, obj, logdir, rng_seed, is_chief, num_steps, + iters, train_display_interval, + dagger_sample_bn_false): + train_step_kwargs = {} + train_step_kwargs['obj'] = obj + train_step_kwargs['m'] = m + + # rng_data has 2 independent rngs, one for sampling episodes and one for + # sampling perturbs (so that we can make results reproducible).
+ train_step_kwargs['rng_data'] = [np.random.RandomState(rng_seed), + np.random.RandomState(rng_seed)] + train_step_kwargs['rng_action'] = np.random.RandomState(rng_seed) + if is_chief: + train_step_kwargs['writer'] = tf.summary.FileWriter(logdir) #, m.tf_graph) + else: + train_step_kwargs['writer'] = None + train_step_kwargs['iters'] = iters + train_step_kwargs['train_display_interval'] = train_display_interval + train_step_kwargs['num_steps'] = num_steps + train_step_kwargs['logdir'] = logdir + train_step_kwargs['dagger_sample_bn_false'] = dagger_sample_bn_false + return train_step_kwargs + +# Utilities for visualizing and analysing validation output. +def save_d_at_t(outputs, global_step, output_dir, metric_summary, N): + """Save distance to goal at all time steps. + + Args: + outputs : [gt_dist_to_goal]. + global_step : number of iterations. + output_dir : output directory. + metric_summary : to append scalars to summary. + N : number of outputs to process. + + """ + d_at_t = np.concatenate(map(lambda x: x[0][:,:,0]*1, outputs), axis=0) + fig, axes = utils.subplot(plt, (1,1), (5,5)) + axes.plot(np.arange(d_at_t.shape[1]), np.mean(d_at_t, axis=0), 'r.') + axes.set_xlabel('time step') + axes.set_ylabel('dist to next goal') + axes.grid('on') + file_name = os.path.join(output_dir, 'dist_at_t_{:d}.png'.format(global_step)) + with fu.fopen(file_name, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + file_name = os.path.join(output_dir, 'dist_at_t_{:d}.pkl'.format(global_step)) + utils.save_variables(file_name, [d_at_t], ['d_at_t'], overwrite=True) + plt.close(fig) + return None + +def save_all(outputs, global_step, output_dir, metric_summary, N): + """Save numerous statistics. + + Args: + outputs : [locs, goal_loc, gt_dist_to_goal, node_ids, perturbs] + global_step : number of iterations. + output_dir : output directory. + metric_summary : to append scalars to summary. + N : number of outputs to process. 
+ """ + all_locs = np.concatenate(map(lambda x: x[0], outputs), axis=0) + all_goal_locs = np.concatenate(map(lambda x: x[1], outputs), axis=0) + all_d_at_t = np.concatenate(map(lambda x: x[2][:,:,0]*1, outputs), axis=0) + all_node_ids = np.concatenate(map(lambda x: x[3], outputs), axis=0) + all_perturbs = np.concatenate(map(lambda x: x[4], outputs), axis=0) + + file_name = os.path.join(output_dir, 'all_locs_at_t_{:d}.pkl'.format(global_step)) + vars = [all_locs, all_goal_locs, all_d_at_t, all_node_ids, all_perturbs] + var_names = ['all_locs', 'all_goal_locs', 'all_d_at_t', 'all_node_ids', 'all_perturbs'] + utils.save_variables(file_name, vars, var_names, overwrite=True) + return None + +def eval_ap(outputs, global_step, output_dir, metric_summary, N, num_classes=4): + """Processes the collected outputs to compute AP for action prediction. + + Args: + outputs : [logits, labels] + global_step : global_step. + output_dir : where to store results. + metric_summary : summary object to add summaries to. + N : number of outputs to process. + num_classes : number of classes to compute AP over, and to reshape tensors. + """ + if N >= 0: + outputs = outputs[:N] + logits = np.concatenate(map(lambda x: x[0], outputs), axis=0).reshape((-1, num_classes)) + labels = np.concatenate(map(lambda x: x[1], outputs), axis=0).reshape((-1, num_classes)) + aps = [] + for i in range(logits.shape[1]): + ap, rec, prec = utils.calc_pr(labels[:,i], logits[:,i]) + ap = ap[0] + tf_utils.add_value_to_summary(metric_summary, 'aps/ap_{:d}: '.format(i), ap) + aps.append(ap) + return aps + +def eval_dist(outputs, global_step, output_dir, metric_summary, N): + """Processes the collected outputs during validation to + 1. Plot the distance over time curve. + 2. Compute mean and median distances. + 3. Plots histogram of end distances. + + Args: + outputs : [locs, goal_loc, gt_dist_to_goal]. + global_step : global_step. + output_dir : where to store results. 
+ metric_summary : summary object to add summaries to. + N : number of outputs to process. + """ + SUCCESS_THRESH = 3 + if N >= 0: + outputs = outputs[:N] + + # Plot distance at time t. + d_at_t = [] + for i in range(len(outputs)): + locs, goal_loc, gt_dist_to_goal = outputs[i] + d_at_t.append(gt_dist_to_goal[:,:,0]*1) + + # Plot the distance. + fig, axes = utils.subplot(plt, (1,1), (5,5)) + d_at_t = np.concatenate(d_at_t, axis=0) + axes.plot(np.arange(d_at_t.shape[1]), np.mean(d_at_t, axis=0), 'r.') + axes.set_xlabel('time step') + axes.set_ylabel('dist to next goal') + axes.grid('on') + file_name = os.path.join(output_dir, 'dist_at_t_{:d}.png'.format(global_step)) + with fu.fopen(file_name, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + file_name = os.path.join(output_dir, 'dist_at_t_{:d}.pkl'.format(global_step)) + utils.save_variables(file_name, [d_at_t], ['d_at_t'], overwrite=True) + plt.close(fig) + + # Plot the trajectories and the init_distance and final distance. + d_inits = [] + d_ends = [] + for i in range(len(outputs)): + locs, goal_loc, gt_dist_to_goal = outputs[i] + d_inits.append(gt_dist_to_goal[:,0,0]*1) + d_ends.append(gt_dist_to_goal[:,-1,0]*1) + + # Plot the distance. 
+ fig, axes = utils.subplot(plt, (1,1), (5,5)) + d_inits = np.concatenate(d_inits, axis=0) + d_ends = np.concatenate(d_ends, axis=0) + axes.plot(d_inits+np.random.rand(*(d_inits.shape))-0.5, + d_ends+np.random.rand(*(d_ends.shape))-0.5, '.', mec='red', mew=1.0) + axes.set_xlabel('init dist'); axes.set_ylabel('final dist'); + axes.grid('on'); axes.axis('equal'); + title_str = 'mean: {:0.1f}, 50: {:0.1f}, 75: {:0.2f}, s: {:0.1f}' + title_str = title_str.format( + np.mean(d_ends), np.median(d_ends), np.percentile(d_ends, q=75), + 100*(np.mean(d_ends <= SUCCESS_THRESH))) + axes.set_title(title_str) + file_name = os.path.join(output_dir, 'dist_{:d}.png'.format(global_step)) + with fu.fopen(file_name, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + + file_name = os.path.join(output_dir, 'dist_{:d}.pkl'.format(global_step)) + utils.save_variables(file_name, [d_inits, d_ends], ['d_inits', 'd_ends'], + overwrite=True) + plt.close(fig) + + # Plot the histogram of the end_distance. 
+ with plt.style.context('seaborn-white'): + d_ends_ = np.sort(d_ends) + d_inits_ = np.sort(d_inits) + leg = []; + fig, ax = utils.subplot(plt, (1,1), (5,5)) + ax.grid('on') + ax.set_xlabel('Distance from goal'); ax.xaxis.label.set_fontsize(16); + ax.set_ylabel('Fraction of data'); ax.yaxis.label.set_fontsize(16); + ax.plot(d_ends_, np.arange(d_ends_.size)*1./d_ends_.size, 'r') + ax.plot(d_inits_, np.arange(d_inits_.size)*1./d_inits_.size, 'k') + leg.append('Final'); leg.append('Init'); + ax.legend(leg, fontsize='x-large'); + ax.set_axis_on() + title_str = 'mean: {:0.1f}, 50: {:0.1f}, 75: {:0.2f}, s: {:0.1f}' + title_str = title_str.format( + np.mean(d_ends), np.median(d_ends), np.percentile(d_ends, q=75), + 100*(np.mean(d_ends <= SUCCESS_THRESH))) + ax.set_title(title_str) + file_name = os.path.join(output_dir, 'dist_hist_{:d}.png'.format(global_step)) + with fu.fopen(file_name, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + + # Log distance metrics. 
+ tf_utils.add_value_to_summary(metric_summary, 'dists/success_init: ', + 100*(np.mean(d_inits <= SUCCESS_THRESH))) + tf_utils.add_value_to_summary(metric_summary, 'dists/success_end: ', + 100*(np.mean(d_ends <= SUCCESS_THRESH))) + tf_utils.add_value_to_summary(metric_summary, 'dists/dist_init (75): ', + np.percentile(d_inits, q=75)) + tf_utils.add_value_to_summary(metric_summary, 'dists/dist_end (75): ', + np.percentile(d_ends, q=75)) + tf_utils.add_value_to_summary(metric_summary, 'dists/dist_init (median): ', + np.median(d_inits)) + tf_utils.add_value_to_summary(metric_summary, 'dists/dist_end (median): ', + np.median(d_ends)) + tf_utils.add_value_to_summary(metric_summary, 'dists/dist_init (mean): ', + np.mean(d_inits)) + tf_utils.add_value_to_summary(metric_summary, 'dists/dist_end (mean): ', + np.mean(d_ends)) + return np.median(d_inits), np.median(d_ends), np.mean(d_inits), np.mean(d_ends), \ + np.percentile(d_inits, q=75), np.percentile(d_ends, q=75), \ + 100*(np.mean(d_inits <= SUCCESS_THRESH)), 100*(np.mean(d_ends <= SUCCESS_THRESH)) + +def plot_trajectories(outputs, global_step, output_dir, metric_summary, N): + """Processes the collected outputs during validation to plot the trajectories + in the top view. + + Args: + outputs : [locs, orig_maps, goal_loc]. + global_step : global_step. + output_dir : where to store results. + metric_summary : summary object to add summaries to. + N : number of outputs to process. + """ + if N >= 0: + outputs = outputs[:N] + N = len(outputs) + + plt.set_cmap('gray') + fig, axes = utils.subplot(plt, (N, outputs[0][1].shape[0]), (5,5)) + axes = axes.ravel()[::-1].tolist() + for i in range(N): + locs, orig_maps, goal_loc = outputs[i] + is_semantic = np.isnan(goal_loc[0,0,1]) + for j in range(orig_maps.shape[0]): + ax = axes.pop(); + ax.plot(locs[j,0,0], locs[j,0,1], 'ys') + # Plot one by one, so that they come in different colors. 
+ for k in range(goal_loc.shape[1]): + if not is_semantic: + ax.plot(goal_loc[j,k,0], goal_loc[j,k,1], 's') + if False: + ax.plot(locs[j,:,0], locs[j,:,1], 'r.', ms=3) + ax.imshow(orig_maps[j,0,:,:,0], origin='lower') + ax.set_axis_off(); + else: + ax.scatter(locs[j,:,0], locs[j,:,1], c=np.arange(locs.shape[1]), + cmap='jet', s=10, lw=0) + ax.imshow(orig_maps[j,0,:,:,0], origin='lower', vmin=-1.0, vmax=2.0) + if not is_semantic: + xymin = np.minimum(np.min(goal_loc[j,:,:], axis=0), np.min(locs[j,:,:], axis=0)) + xymax = np.maximum(np.max(goal_loc[j,:,:], axis=0), np.max(locs[j,:,:], axis=0)) + else: + xymin = np.min(locs[j,:,:], axis=0) + xymax = np.max(locs[j,:,:], axis=0) + xy1 = (xymax+xymin)/2. - np.maximum(np.max(xymax-xymin), 12) + xy2 = (xymax+xymin)/2. + np.maximum(np.max(xymax-xymin), 12) + ax.set_xlim([xy1[0], xy2[0]]) + ax.set_ylim([xy1[1], xy2[1]]) + ax.set_axis_off() + file_name = os.path.join(output_dir, 'trajectory_{:d}.png'.format(global_step)) + with fu.fopen(file_name, 'w') as f: + fig.savefig(f, bbox_inches='tight', transparent=True, pad_inches=0) + plt.close(fig) + return None + +def add_default_summaries(mode, arop_full_summary_iters, summarize_ops, + summarize_names, to_aggregate, action_prob_op, + input_tensors, scope_name): + assert(mode == 'train' or mode == 'val' or mode == 'test'), \ + 'add_default_summaries mode must be one of train, val, or test.' 
+ + s_ops = tf_utils.get_default_summary_ops() + + if mode == 'train': + s_ops.summary_ops, s_ops.print_summary_ops, additional_return_ops, \ + arop_summary_iters, arop_eval_fns = tf_utils.simple_summaries( + summarize_ops, summarize_names, mode, to_aggregate=False, + scope_name=scope_name) + s_ops.additional_return_ops += additional_return_ops + s_ops.arop_summary_iters += arop_summary_iters + s_ops.arop_eval_fns += arop_eval_fns + elif mode == 'val': + s_ops.summary_ops, s_ops.print_summary_ops, additional_return_ops, \ + arop_summary_iters, arop_eval_fns = tf_utils.simple_summaries( + summarize_ops, summarize_names, mode, to_aggregate=to_aggregate, + scope_name=scope_name) + s_ops.additional_return_ops += additional_return_ops + s_ops.arop_summary_iters += arop_summary_iters + s_ops.arop_eval_fns += arop_eval_fns + + elif mode == 'test': + s_ops.summary_ops, s_ops.print_summary_ops, additional_return_ops, \ + arop_summary_iters, arop_eval_fns = tf_utils.simple_summaries( + [], [], mode, to_aggregate=[], scope_name=scope_name) + s_ops.additional_return_ops += additional_return_ops + s_ops.arop_summary_iters += arop_summary_iters + s_ops.arop_eval_fns += arop_eval_fns + + + if mode == 'val': + arop = s_ops.additional_return_ops + arop += [[action_prob_op, input_tensors['train']['action']]] + arop += [[input_tensors['step']['loc_on_map'], + input_tensors['common']['goal_loc'], + input_tensors['step']['gt_dist_to_goal']]] + arop += [[input_tensors['step']['loc_on_map'], + input_tensors['common']['orig_maps'], + input_tensors['common']['goal_loc']]] + s_ops.arop_summary_iters += [-1, arop_full_summary_iters, + arop_full_summary_iters] + s_ops.arop_eval_fns += [eval_ap, eval_dist, plot_trajectories] + + elif mode == 'test': + arop = s_ops.additional_return_ops + arop += [[input_tensors['step']['loc_on_map'], + input_tensors['common']['goal_loc'], + input_tensors['step']['gt_dist_to_goal']]] + arop += [[input_tensors['step']['gt_dist_to_goal']]] + arop += 
[[input_tensors['step']['loc_on_map'], + input_tensors['common']['goal_loc'], + input_tensors['step']['gt_dist_to_goal'], + input_tensors['step']['node_ids'], + input_tensors['step']['perturbs']]] + arop += [[input_tensors['step']['loc_on_map'], + input_tensors['common']['orig_maps'], + input_tensors['common']['goal_loc']]] + s_ops.arop_summary_iters += [-1, -1, -1, arop_full_summary_iters] + s_ops.arop_eval_fns += [eval_dist, save_d_at_t, save_all, + plot_trajectories] + return s_ops + + diff --git a/cognitive_mapping_and_planning/tfcode/tf_utils.py b/cognitive_mapping_and_planning/tfcode/tf_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..5f96d8ff5ce7473f0ec49096abcbac274e6c4fcc --- /dev/null +++ b/cognitive_mapping_and_planning/tfcode/tf_utils.py @@ -0,0 +1,840 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +import numpy as np +import sys +import tensorflow as tf +import src.utils as utils +import logging +from tensorflow.contrib import slim +from tensorflow.contrib.metrics.python.ops import confusion_matrix_ops +from tensorflow.contrib.slim import arg_scope +from tensorflow.contrib.slim.nets import resnet_v2 +from tensorflow.python.framework import dtypes +from tensorflow.python.ops import array_ops +from tensorflow.python.ops import check_ops +from tensorflow.python.ops import math_ops +from tensorflow.python.ops import variable_scope +sys.path.insert(0, '../slim') +from preprocessing import inception_preprocessing as ip + +resnet_v2_50 = resnet_v2.resnet_v2_50 + + +def custom_residual_block(x, neurons, kernel_size, stride, name, is_training, + wt_decay=0.0001, use_residual=True, + residual_stride_conv=True, conv_fn=slim.conv2d, + batch_norm_param=None): + + # batch norm x and relu + init_var = np.sqrt(2.0/(kernel_size**2)/neurons) + with arg_scope([conv_fn], + weights_regularizer=slim.l2_regularizer(wt_decay), + weights_initializer=tf.random_normal_initializer(stddev=init_var), + biases_initializer=tf.zeros_initializer()): + + if batch_norm_param is None: + batch_norm_param = {'center': True, 'scale': False, + 'activation_fn':tf.nn.relu, + 'is_training': is_training} + + y = slim.batch_norm(x, scope=name+'_bn', **batch_norm_param) + + y = conv_fn(y, num_outputs=neurons, kernel_size=kernel_size, stride=stride, + activation_fn=None, scope=name+'_1', + normalizer_fn=slim.batch_norm, + normalizer_params=batch_norm_param) + + y = conv_fn(y, num_outputs=neurons, kernel_size=kernel_size, + stride=1, activation_fn=None, scope=name+'_2') + + if use_residual: + if stride != 1 or x.get_shape().as_list()[-1] != neurons: + batch_norm_param_ = dict(batch_norm_param) + batch_norm_param_['activation_fn'] = None + x = conv_fn(x, num_outputs=neurons, kernel_size=1, + stride=stride if 
+ residual_stride_conv else 1, + activation_fn=None, scope=name+'_0_1x1', + normalizer_fn=slim.batch_norm, + normalizer_params=batch_norm_param_) + if not residual_stride_conv: + x = slim.avg_pool2d(x, 1, stride=stride, scope=name+'_0_avg') + + y = tf.add(x, y, name=name+'_add') + + return y + +def step_gt_prob(step, step_number_op): + # Switch the sampling probability from 1 to -1 once step_number_op reaches step. + with tf.name_scope('step_gt_prob'): + out = tf.cond(tf.less(step_number_op, step), + lambda: tf.constant(1.), lambda: tf.constant(-1.)) + return out + +def inverse_sigmoid_decay(k, global_step_op): + with tf.name_scope('inverse_sigmoid_decay'): + k = tf.constant(k, dtype=tf.float32) + tmp = k*tf.exp(-tf.cast(global_step_op, tf.float32)/k) + tmp = tmp / (1. + tmp) + return tmp + +def dense_resample(im, flow_im, output_valid_mask, name='dense_resample'): + """ Resample reward at particular locations. + Args: + im: ...xHxWxC matrix to sample from. + flow_im: ...xHxWx2 matrix, samples the image using absolute offsets as given + by the flow_im. + """ + with tf.name_scope(name): + valid_mask = None + + x, y = tf.unstack(flow_im, axis=-1) + x = tf.cast(tf.reshape(x, [-1]), tf.float32) + y = tf.cast(tf.reshape(y, [-1]), tf.float32) + + # constants + shape = tf.unstack(tf.shape(im)) + channels = shape[-1] + width = shape[-2] + height = shape[-3] + num_batch = tf.cast(tf.reduce_prod(tf.stack(shape[:-3])), 'int32') + zero = tf.constant(0, dtype=tf.int32) + + # Round up and down. 
+ x0 = tf.cast(tf.floor(x), 'int32'); x1 = x0 + 1; + y0 = tf.cast(tf.floor(y), 'int32'); y1 = y0 + 1; + + if output_valid_mask: + valid_mask = tf.logical_and( + tf.logical_and(tf.less_equal(x, tf.cast(width, tf.float32)-1.), tf.greater_equal(x, 0.)), + tf.logical_and(tf.less_equal(y, tf.cast(height, tf.float32)-1.), tf.greater_equal(y, 0.))) + valid_mask = tf.reshape(valid_mask, shape=shape[:-1] + [1]) + + x0 = tf.clip_by_value(x0, zero, width-1) + x1 = tf.clip_by_value(x1, zero, width-1) + y0 = tf.clip_by_value(y0, zero, height-1) + y1 = tf.clip_by_value(y1, zero, height-1) + + dim2 = width; dim1 = width * height; + + # Create base index + base = tf.reshape(tf.range(num_batch) * dim1, shape=[-1,1]) + base = tf.reshape(tf.tile(base, [1, height*width]), shape=[-1]) + + base_y0 = base + y0 * dim2 + base_y1 = base + y1 * dim2 + idx_a = base_y0 + x0 + idx_b = base_y1 + x0 + idx_c = base_y0 + x1 + idx_d = base_y1 + x1 + + # use indices to lookup pixels in the flat image and restore channels dim + sh = tf.stack([tf.constant(-1,dtype=tf.int32), channels]) + im_flat = tf.cast(tf.reshape(im, sh), dtype=tf.float32) + pixel_a = tf.gather(im_flat, idx_a) + pixel_b = tf.gather(im_flat, idx_b) + pixel_c = tf.gather(im_flat, idx_c) + pixel_d = tf.gather(im_flat, idx_d) + + # and finally calculate interpolated values + x1_f = tf.to_float(x1) + y1_f = tf.to_float(y1) + + wa = tf.expand_dims(((x1_f - x) * (y1_f - y)), 1) + wb = tf.expand_dims((x1_f - x) * (1.0 - (y1_f - y)), 1) + wc = tf.expand_dims(((1.0 - (x1_f - x)) * (y1_f - y)), 1) + wd = tf.expand_dims(((1.0 - (x1_f - x)) * (1.0 - (y1_f - y))), 1) + + output = tf.add_n([wa * pixel_a, wb * pixel_b, wc * pixel_c, wd * pixel_d]) + output = tf.reshape(output, shape=tf.shape(im)) + return output, valid_mask + +def get_flow(t, theta, map_size, name_scope='gen_flow'): + """ + Rotates the map by theta and translates the rotated map by t. + + Assume that the robot rotates by an angle theta and then moves forward by + translation t. 
+ This function returns the flow field. For every pixel in + the new image it tells us which pixel in the original image it came from: + NewI(x, y) = OldI(flow_x(x,y), flow_y(x,y)). + + Assume there is a point p in the original image. The robot rotates by R and moves + forward by t. p1 = Rt*p; p2 = p1 - t (the world moves in the opposite direction). + So, p2 = Rt*p - t; thus p2 came from R*(p2+t), which is what this function + calculates. + + t: ... x 2 (translation for B batches of N motions each). + theta: ... x 1 (rotation for B batches of N motions each). + + Output: ... x map_size x map_size x 2 + """ + + with tf.name_scope(name_scope): + tx, ty = tf.unstack(tf.reshape(t, shape=[-1, 1, 1, 1, 2]), axis=4) + theta = tf.reshape(theta, shape=[-1, 1, 1, 1]) + c = tf.constant((map_size-1.)/2., dtype=tf.float32) + + x, y = np.meshgrid(np.arange(map_size), np.arange(map_size)) + x = tf.constant(x[np.newaxis, :, :, np.newaxis], dtype=tf.float32, name='x', + shape=[1, map_size, map_size, 1]) + y = tf.constant(y[np.newaxis, :, :, np.newaxis], dtype=tf.float32, name='y', + shape=[1,map_size, map_size, 1]) + + x = x-(-tx+c) + y = y-(-ty+c) + + sin_theta = tf.sin(theta) + cos_theta = tf.cos(theta) + xr = cos_theta*x - sin_theta*y + yr = sin_theta*x + cos_theta*y + + xr = xr + c + yr = yr + c + + flow = tf.stack([xr, yr], axis=-1) + sh = tf.unstack(tf.shape(t), axis=0) + sh = tf.stack(sh[:-1]+[tf.constant(_, dtype=tf.int32) for _ in [map_size, map_size, 2]]) + flow = tf.reshape(flow, shape=sh) + return flow + +def distort_image(im, fast_mode=False): + # All images in the same batch are transformed the same way, but over + # iterations you see different distortions. + # im should be float with values between 0 and 1. 
+ im_ = tf.reshape(im, shape=(-1,1,3)) + im_ = ip.apply_with_random_selector( + im_, lambda x, ordering: ip.distort_color(x, ordering, fast_mode), + num_cases=4) + im_ = tf.reshape(im_, tf.shape(im)) + return im_ + +def fc_network(x, neurons, wt_decay, name, num_pred=None, offset=0, + batch_norm_param=None, dropout_ratio=0.0, is_training=None): + if dropout_ratio > 0: + assert(is_training is not None), \ + 'is_training needs to be defined when training with dropout.' + + repr = [] + for i, neuron in enumerate(neurons): + init_var = np.sqrt(2.0/neuron) + if batch_norm_param is not None: + x = slim.fully_connected(x, neuron, activation_fn=None, + weights_initializer=tf.random_normal_initializer(stddev=init_var), + weights_regularizer=slim.l2_regularizer(wt_decay), + normalizer_fn=slim.batch_norm, + normalizer_params=batch_norm_param, + biases_initializer=tf.zeros_initializer(), + scope='{:s}_{:d}'.format(name, offset+i)) + else: + x = slim.fully_connected(x, neuron, activation_fn=tf.nn.relu, + weights_initializer=tf.random_normal_initializer(stddev=init_var), + weights_regularizer=slim.l2_regularizer(wt_decay), + biases_initializer=tf.zeros_initializer(), + scope='{:s}_{:d}'.format(name, offset+i)) + if dropout_ratio > 0: + x = slim.dropout(x, keep_prob=1-dropout_ratio, is_training=is_training, + scope='{:s}_{:d}'.format('dropout_'+name, offset+i)) + repr.append(x) + + if num_pred is not None: + init_var = np.sqrt(2.0/num_pred) + x = slim.fully_connected(x, num_pred, + weights_regularizer=slim.l2_regularizer(wt_decay), + weights_initializer=tf.random_normal_initializer(stddev=init_var), + biases_initializer=tf.zeros_initializer(), + activation_fn=None, + scope='{:s}_pred'.format(name)) + return x, repr + +def concat_state_x_list(f, names): + af = {} + for i, k in enumerate(names): + af[k] = np.concatenate([x[i] for x in f], axis=1) + return af + +def concat_state_x(f, names): + af = {} + for k in names: + af[k] = np.concatenate([x[k] for x in f], axis=1) + # af[k] = 
np.swapaxes(af[k], 0, 1) + return af + +def sample_action(rng, action_probs, optimal_action, sample_gt_prob, + type='sample', combine_type='one_or_other'): + optimal_action_ = optimal_action/np.sum(optimal_action+0., 1, keepdims=True) + action_probs_ = action_probs/np.sum(action_probs+0.001, 1, keepdims=True) + batch_size = action_probs_.shape[0] + + action = np.zeros((batch_size), dtype=np.int32) + action_sample_wt = np.zeros((batch_size), dtype=np.float32) + if combine_type == 'add': + sample_gt_prob_ = np.minimum(np.maximum(sample_gt_prob, 0.), 1.) + + for i in range(batch_size): + if combine_type == 'one_or_other': + sample_gt = rng.rand() < sample_gt_prob + if sample_gt: distr_ = optimal_action_[i,:]*1. + else: distr_ = action_probs_[i,:]*1. + elif combine_type == 'add': + distr_ = optimal_action_[i,:]*sample_gt_prob_ + \ + (1.-sample_gt_prob_)*action_probs_[i,:] + distr_ = distr_ / np.sum(distr_) + + if type == 'sample': + action[i] = np.argmax(rng.multinomial(1, distr_, size=1)) + elif type == 'argmax': + action[i] = np.argmax(distr_) + action_sample_wt[i] = action_probs_[i, action[i]] / distr_[action[i]] + return action, action_sample_wt + +def train_step_custom_online_sampling(sess, train_op, global_step, + train_step_kwargs, mode='train'): + m = train_step_kwargs['m'] + obj = train_step_kwargs['obj'] + rng_data = train_step_kwargs['rng_data'] + rng_action = train_step_kwargs['rng_action'] + writer = train_step_kwargs['writer'] + iters = train_step_kwargs['iters'] + num_steps = train_step_kwargs['num_steps'] + logdir = train_step_kwargs['logdir'] + dagger_sample_bn_false = train_step_kwargs['dagger_sample_bn_false'] + train_display_interval = train_step_kwargs['train_display_interval'] + if 'outputs' not in m.train_ops: + m.train_ops['outputs'] = [] + + s_ops = m.summary_ops[mode] + val_additional_ops = [] + + # Print all variables here. 
+ if False: + v = tf.get_collection(tf.GraphKeys.VARIABLES) + v_op = [_.value() for _ in v] + v_op_value = sess.run(v_op) + + filter = lambda x, y: 'Adam' in x.name + # filter = lambda x, y: np.is_any_nan(y) + ind = [i for i, (_, __) in enumerate(zip(v, v_op_value)) if filter(_, __)] + v = [v[i] for i in ind] + v_op_value = [v_op_value[i] for i in ind] + + for i in range(len(v)): + logging.info('XXXX: variable: %30s, is_any_nan: %5s, norm: %f.', + v[i].name, np.any(np.isnan(v_op_value[i])), + np.linalg.norm(v_op_value[i])) + + tt = utils.Timer() + for i in range(iters): + tt.tic() + # Sample a room. + e = obj.sample_env(rng_data) + + # Initialize the agent. + init_env_state = e.reset(rng_data) + + # Get and process the common data. + input = e.get_common_data() + input = e.pre_common_data(input) + feed_dict = prepare_feed_dict(m.input_tensors['common'], input) + if dagger_sample_bn_false: + feed_dict[m.train_ops['batch_norm_is_training_op']] = False + common_data = sess.run(m.train_ops['common'], feed_dict=feed_dict) + + states = [] + state_features = [] + state_targets = [] + net_state_to_input = [] + step_data_cache = [] + executed_actions = [] + rewards = [] + action_sample_wts = [] + states.append(init_env_state) + + net_state = sess.run(m.train_ops['init_state'], feed_dict=feed_dict) + net_state = dict(zip(m.train_ops['state_names'], net_state)) + net_state_to_input.append(net_state) + for j in range(num_steps): + f = e.get_features(states[j], j) + f = e.pre_features(f) + f.update(net_state) + f['step_number'] = np.ones((1,1,1), dtype=np.int32)*j + state_features.append(f) + + feed_dict = prepare_feed_dict(m.input_tensors['step'], state_features[-1]) + optimal_action = e.get_optimal_action(states[j], j) + for x, v in zip(m.train_ops['common'], common_data): + feed_dict[x] = v + if dagger_sample_bn_false: + feed_dict[m.train_ops['batch_norm_is_training_op']] = False + outs = sess.run([m.train_ops['step'], m.sample_gt_prob_op, + m.train_ops['step_data_cache'], + 
m.train_ops['updated_state'], + m.train_ops['outputs']], feed_dict=feed_dict) + action_probs = outs[0] + sample_gt_prob = outs[1] + step_data_cache.append(dict(zip(m.train_ops['step_data_cache'], outs[2]))) + net_state = outs[3] + if hasattr(e, 'update_state'): + outputs = outs[4] + outputs = dict(zip(m.train_ops['output_names'], outputs)) + e.update_state(outputs, j) + state_targets.append(e.get_targets(states[j], j)) + + if j < num_steps-1: + # Sample from action_probs and optimal action. + action, action_sample_wt = sample_action( + rng_action, action_probs, optimal_action, sample_gt_prob, + m.sample_action_type, m.sample_action_combine_type) + next_state, reward = e.take_action(states[j], action, j) + executed_actions.append(action) + states.append(next_state) + rewards.append(reward) + action_sample_wts.append(action_sample_wt) + net_state = dict(zip(m.train_ops['state_names'], net_state)) + net_state_to_input.append(net_state) + + # Concatenate things together for training. + rewards = np.array(rewards).T + action_sample_wts = np.array(action_sample_wts).T + executed_actions = np.array(executed_actions).T + all_state_targets = concat_state_x(state_targets, e.get_targets_name()) + all_state_features = concat_state_x(state_features, + e.get_features_name()+['step_number']) + # all_state_net = concat_state_x(net_state_to_input, + # m.train_ops['state_names']) + all_step_data_cache = concat_state_x(step_data_cache, + m.train_ops['step_data_cache']) + + dict_train = dict(input) + dict_train.update(all_state_features) + dict_train.update(all_state_targets) + # dict_train.update(all_state_net) + dict_train.update(net_state_to_input[0]) + dict_train.update(all_step_data_cache) + dict_train.update({'rewards': rewards, + 'action_sample_wts': action_sample_wts, + 'executed_actions': executed_actions}) + feed_dict = prepare_feed_dict(m.input_tensors['train'], dict_train) + for x in m.train_ops['step_data_cache']: + feed_dict[x] = all_step_data_cache[x] + if mode == 
'train': + n_step = sess.run(global_step) + + if np.mod(n_step, train_display_interval) == 0: + total_loss, np_global_step, summary, print_summary = sess.run( + [train_op, global_step, s_ops.summary_ops, s_ops.print_summary_ops], + feed_dict=feed_dict) + logging.error("") + else: + total_loss, np_global_step, summary = sess.run( + [train_op, global_step, s_ops.summary_ops], feed_dict=feed_dict) + + if writer is not None and summary is not None: + writer.add_summary(summary, np_global_step) + + should_stop = sess.run(m.should_stop_op) + + if mode != 'train': + arop = [[] for j in range(len(s_ops.additional_return_ops))] + for j in range(len(s_ops.additional_return_ops)): + if s_ops.arop_summary_iters[j] < 0 or i < s_ops.arop_summary_iters[j]: + arop[j] = s_ops.additional_return_ops[j] + val = sess.run(arop, feed_dict=feed_dict) + val_additional_ops.append(val) + tt.toc(log_at=60, log_str='val timer {:d} / {:d}: '.format(i, iters), + type='time') + + if mode != 'train': + # Write the default val summaries. 
+ summary, print_summary, np_global_step = sess.run( + [s_ops.summary_ops, s_ops.print_summary_ops, global_step]) + if writer is not None and summary is not None: + writer.add_summary(summary, np_global_step) + + # write custom validation ops + val_summarys = [] + val_additional_ops = zip(*val_additional_ops) + if len(s_ops.arop_eval_fns) > 0: + val_metric_summary = tf.summary.Summary() + for i in range(len(s_ops.arop_eval_fns)): + val_summary = None + if s_ops.arop_eval_fns[i] is not None: + val_summary = s_ops.arop_eval_fns[i](val_additional_ops[i], + np_global_step, logdir, + val_metric_summary, + s_ops.arop_summary_iters[i]) + val_summarys.append(val_summary) + if writer is not None: + writer.add_summary(val_metric_summary, np_global_step) + + # Return the additional val_ops + total_loss = (val_additional_ops, val_summarys) + should_stop = None + + return total_loss, should_stop + +def train_step_custom_v2(sess, train_op, global_step, train_step_kwargs, + mode='train'): + m = train_step_kwargs['m'] + obj = train_step_kwargs['obj'] + rng = train_step_kwargs['rng'] + writer = train_step_kwargs['writer'] + iters = train_step_kwargs['iters'] + logdir = train_step_kwargs['logdir'] + train_display_interval = train_step_kwargs['train_display_interval'] + + s_ops = m.summary_ops[mode] + val_additional_ops = [] + + # Print all variables here. 
+ if False: + v = tf.get_collection(tf.GraphKeys.VARIABLES) + v_op = [_.value() for _ in v] + v_op_value = sess.run(v_op) + + filter = lambda x, y: 'Adam' in x.name + # filter = lambda x, y: np.is_any_nan(y) + ind = [i for i, (_, __) in enumerate(zip(v, v_op_value)) if filter(_, __)] + v = [v[i] for i in ind] + v_op_value = [v_op_value[i] for i in ind] + + for i in range(len(v)): + logging.info('XXXX: variable: %30s, is_any_nan: %5s, norm: %f.', + v[i].name, np.any(np.isnan(v_op_value[i])), + np.linalg.norm(v_op_value[i])) + + tt = utils.Timer() + for i in range(iters): + tt.tic() + e = obj.sample_env(rng) + rngs = e.gen_rng(rng) + input_data = e.gen_data(*rngs) + input_data = e.pre_data(input_data) + feed_dict = prepare_feed_dict(m.input_tensors, input_data) + + if mode == 'train': + n_step = sess.run(global_step) + + if np.mod(n_step, train_display_interval) == 0: + total_loss, np_global_step, summary, print_summary = sess.run( + [train_op, global_step, s_ops.summary_ops, s_ops.print_summary_ops], + feed_dict=feed_dict) + else: + total_loss, np_global_step, summary = sess.run( + [train_op, global_step, s_ops.summary_ops], + feed_dict=feed_dict) + + if writer is not None and summary is not None: + writer.add_summary(summary, np_global_step) + + should_stop = sess.run(m.should_stop_op) + + if mode != 'train': + arop = [[] for j in range(len(s_ops.additional_return_ops))] + for j in range(len(s_ops.additional_return_ops)): + if s_ops.arop_summary_iters[j] < 0 or i < s_ops.arop_summary_iters[j]: + arop[j] = s_ops.additional_return_ops[j] + val = sess.run(arop, feed_dict=feed_dict) + val_additional_ops.append(val) + tt.toc(log_at=60, log_str='val timer {:d} / {:d}: '.format(i, iters), + type='time') + + if mode != 'train': + # Write the default val summaries. 
+ summary, print_summary, np_global_step = sess.run( + [s_ops.summary_ops, s_ops.print_summary_ops, global_step]) + if writer is not None and summary is not None: + writer.add_summary(summary, np_global_step) + + # write custom validation ops + val_summarys = [] + val_additional_ops = zip(*val_additional_ops) + if len(s_ops.arop_eval_fns) > 0: + val_metric_summary = tf.summary.Summary() + for i in range(len(s_ops.arop_eval_fns)): + val_summary = None + if s_ops.arop_eval_fns[i] is not None: + val_summary = s_ops.arop_eval_fns[i](val_additional_ops[i], + np_global_step, logdir, + val_metric_summary, + s_ops.arop_summary_iters[i]) + val_summarys.append(val_summary) + if writer is not None: + writer.add_summary(val_metric_summary, np_global_step) + + # Return the additional val_ops + total_loss = (val_additional_ops, val_summarys) + should_stop = None + + return total_loss, should_stop + +def train_step_custom(sess, train_op, global_step, train_step_kwargs, + mode='train'): + m = train_step_kwargs['m'] + params = train_step_kwargs['params'] + rng = train_step_kwargs['rng'] + writer = train_step_kwargs['writer'] + iters = train_step_kwargs['iters'] + gen_rng = train_step_kwargs['gen_rng'] + logdir = train_step_kwargs['logdir'] + gen_data = train_step_kwargs['gen_data'] + pre_data = train_step_kwargs['pre_data'] + train_display_interval = train_step_kwargs['train_display_interval'] + + val_additional_ops = [] + # Print all variables here. 
+ if False: + v = tf.get_collection(tf.GraphKeys.VARIABLES) + for _ in v: + val = sess.run(_.value()) + logging.info('variable: %30s, is_any_nan: %5s, norm: %f.', _.name, + np.any(np.isnan(val)), np.linalg.norm(val)) + + for i in range(iters): + rngs = gen_rng(params, rng) + input_data = gen_data(params, *rngs) + input_data = pre_data(params, input_data) + feed_dict = prepare_feed_dict(m.input_tensors, input_data) + + if mode == 'train': + n_step = sess.run(global_step) + + if np.mod(n_step, train_display_interval) == 0: + total_loss, np_global_step, summary, print_summary = sess.run( + [train_op, global_step, m.summary_op[mode], m.print_summary_op[mode]], + feed_dict=feed_dict) + else: + total_loss, np_global_step, summary = sess.run( + [train_op, global_step, m.summary_op[mode]], + feed_dict=feed_dict) + + if writer is not None: + writer.add_summary(summary, np_global_step) + + should_stop = sess.run(m.should_stop_op) + + if mode == 'val': + val = sess.run(m.agg_update_op[mode] + m.additional_return_op[mode], + feed_dict=feed_dict) + val_additional_ops.append(val[len(m.agg_update_op[mode]):]) + + if mode == 'val': + summary, print_summary, np_global_step = sess.run( + [m.summary_op[mode], m.print_summary_op[mode], global_step]) + if writer is not None: + writer.add_summary(summary, np_global_step) + sess.run([m.agg_reset_op[mode]]) + + # write custom validation ops + if m.eval_metrics_fn[mode] is not None: + val_metric_summary = m.eval_metrics_fn[mode](val_additional_ops, + np_global_step, logdir) + if writer is not None: + writer.add_summary(val_metric_summary, np_global_step) + + total_loss = val_additional_ops + should_stop = None + + return total_loss, should_stop + +def setup_training(loss_op, initial_learning_rate, steps_per_decay, + learning_rate_decay, momentum, max_steps, + sync=False, adjust_lr_sync=True, + num_workers=1, replica_id=0, vars_to_optimize=None, + clip_gradient_norm=0, typ=None, momentum2=0.999, + adam_eps=1e-8): + if sync and 
adjust_lr_sync: + initial_learning_rate = initial_learning_rate * num_workers + max_steps = np.int(max_steps / num_workers) + steps_per_decay = np.int(steps_per_decay / num_workers) + + global_step_op = slim.get_or_create_global_step() + lr_op = tf.train.exponential_decay(initial_learning_rate, + global_step_op, steps_per_decay, learning_rate_decay, staircase=True) + if typ == 'sgd': + optimizer = tf.train.MomentumOptimizer(lr_op, momentum) + elif typ == 'adam': + optimizer = tf.train.AdamOptimizer(learning_rate=lr_op, beta1=momentum, + beta2=momentum2, epsilon=adam_eps) + + if sync: + + sync_optimizer = tf.train.SyncReplicasOptimizer(optimizer, + replicas_to_aggregate=num_workers, + replica_id=replica_id, + total_num_replicas=num_workers) + train_op = slim.learning.create_train_op(loss_op, sync_optimizer, + variables_to_train=vars_to_optimize, + clip_gradient_norm=clip_gradient_norm) + else: + sync_optimizer = None + train_op = slim.learning.create_train_op(loss_op, optimizer, + variables_to_train=vars_to_optimize, + clip_gradient_norm=clip_gradient_norm) + should_stop_op = tf.greater_equal(global_step_op, max_steps) + return lr_op, global_step_op, train_op, should_stop_op, optimizer, sync_optimizer + +def add_value_to_summary(metric_summary, tag, val, log=True, tag_str=None): + """Adds a scalar summary to the summary object. 
Optionally also logs to + logging.""" + new_value = metric_summary.value.add(); + new_value.tag = tag + new_value.simple_value = val + if log: + if tag_str is None: + tag_str = tag + '%f' + logging.info(tag_str, val) + +def add_scalar_summary_op(tensor, name=None, + summary_key='summaries', print_summary_key='print_summaries', prefix=''): + collections = [] + op = tf.summary.scalar(name, tensor, collections=collections) + if summary_key != print_summary_key: + tf.add_to_collection(summary_key, op) + + op = tf.Print(op, [tensor], ' {:-<25s}: '.format(name) + prefix) + tf.add_to_collection(print_summary_key, op) + return op + +def setup_inputs(inputs): + input_tensors = {} + input_shapes = {} + for (name, typ, sz) in inputs: + _ = tf.placeholder(typ, shape=sz, name=name) + input_tensors[name] = _ + input_shapes[name] = sz + return input_tensors, input_shapes + +def prepare_feed_dict(input_tensors, inputs): + feed_dict = {} + for n in input_tensors.keys(): + feed_dict[input_tensors[n]] = inputs[n].astype(input_tensors[n].dtype.as_numpy_dtype) + return feed_dict + +def simple_add_summaries(summarize_ops, summarize_names, + summary_key='summaries', + print_summary_key='print_summaries', prefix=''): + for op, name, in zip(summarize_ops, summarize_names): + add_scalar_summary_op(op, name, summary_key, print_summary_key, prefix) + + summary_op = tf.summary.merge_all(summary_key) + print_summary_op = tf.summary.merge_all(print_summary_key) + return summary_op, print_summary_op + +def add_summary_ops(m, summarize_ops, summarize_names, to_aggregate=None, + summary_key='summaries', + print_summary_key='print_summaries', prefix=''): + if type(to_aggregate) != list: + to_aggregate = [to_aggregate for _ in summarize_ops] + + # set up aggregating metrics + if np.any(to_aggregate): + agg_ops = [] + for op, name, to_agg in zip(summarize_ops, summarize_names, to_aggregate): + if to_agg: + # agg_ops.append(slim.metrics.streaming_mean(op, return_reset_op=True)) + 
agg_ops.append(tf.contrib.metrics.streaming_mean(op)) + # agg_ops.append(tf.contrib.metrics.streaming_mean(op, return_reset_op=True)) + else: + agg_ops.append([None, None, None]) + + # agg_values_op, agg_update_op, agg_reset_op = zip(*agg_ops) + # agg_update_op = [x for x in agg_update_op if x is not None] + # agg_reset_op = [x for x in agg_reset_op if x is not None] + agg_values_op, agg_update_op = zip(*agg_ops) + agg_update_op = [x for x in agg_update_op if x is not None] + agg_reset_op = [tf.no_op()] + else: + agg_values_op = [None for _ in to_aggregate] + agg_update_op = [tf.no_op()] + agg_reset_op = [tf.no_op()] + + for op, name, to_agg, agg_op in zip(summarize_ops, summarize_names, to_aggregate, agg_values_op): + if to_agg: + add_scalar_summary_op(agg_op, name, summary_key, print_summary_key, prefix) + else: + add_scalar_summary_op(op, name, summary_key, print_summary_key, prefix) + + summary_op = tf.summary.merge_all(summary_key) + print_summary_op = tf.summary.merge_all(print_summary_key) + return summary_op, print_summary_op, agg_update_op, agg_reset_op + + + +def accum_val_ops(outputs, names, global_step, output_dir, metric_summary, N): + """Processes the collected outputs to compute AP for action prediction. + + Args: + outputs : List of scalar ops to summarize. + names : Name of the scalar ops. + global_step : global_step. + output_dir : where to store results. + metric_summary : summary object to add summaries to. + N : number of outputs to process. 
+ """ + outs = [] + if N >= 0: + outputs = outputs[:N] + for i in range(len(outputs[0])): + scalar = np.array([x[i] for x in outputs]) + assert(scalar.ndim == 1) + add_value_to_summary(metric_summary, names[i], np.mean(scalar), + tag_str='{:>27s}: [{:s}]: %f'.format(names[i], '')) + outs.append(np.mean(scalar)) + return outs + +def get_default_summary_ops(): + return utils.Foo(summary_ops=None, print_summary_ops=None, + additional_return_ops=[], arop_summary_iters=[], + arop_eval_fns=[]) + + +def simple_summaries(summarize_ops, summarize_names, mode, to_aggregate=False, + scope_name='summary'): + + if type(to_aggregate) != list: + to_aggregate = [to_aggregate for _ in summarize_ops] + + summary_key = '{:s}_summaries'.format(mode) + print_summary_key = '{:s}_print_summaries'.format(mode) + prefix=' [{:s}]: '.format(mode) + + # Default ops for things that don't need to be aggregated. + if not np.all(to_aggregate): + for op, name, to_agg in zip(summarize_ops, summarize_names, to_aggregate): + if not to_agg: + add_scalar_summary_op(op, name, summary_key, print_summary_key, prefix) + summary_ops = tf.summary.merge_all(summary_key) + print_summary_ops = tf.summary.merge_all(print_summary_key) + else: + summary_ops = tf.no_op() + print_summary_ops = tf.no_op() + + # Ops for things that need to be aggregated. 
+ if np.any(to_aggregate): + additional_return_ops = [[summarize_ops[i] + for i, x in enumerate(to_aggregate )if x]] + arop_summary_iters = [-1] + s_names = ['{:s}/{:s}'.format(scope_name, summarize_names[i]) + for i, x in enumerate(to_aggregate) if x] + fn = lambda outputs, global_step, output_dir, metric_summary, N: \ + accum_val_ops(outputs, s_names, global_step, output_dir, metric_summary, + N) + arop_eval_fns = [fn] + else: + additional_return_ops = [] + arop_summary_iters = [] + arop_eval_fns = [] + return summary_ops, print_summary_ops, additional_return_ops, \ + arop_summary_iters, arop_eval_fns diff --git a/cognitive_mapping_and_planning/tfcode/vision_baseline_lstm.py b/cognitive_mapping_and_planning/tfcode/vision_baseline_lstm.py new file mode 100644 index 0000000000000000000000000000000000000000..1b9d68772419a91008d0386fe2ffd62155cf699b --- /dev/null +++ b/cognitive_mapping_and_planning/tfcode/vision_baseline_lstm.py @@ -0,0 +1,533 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +import numpy as np + + +import tensorflow as tf + +from tensorflow.contrib import slim + +import logging +from tensorflow.python.platform import app +from tensorflow.python.platform import flags +from src import utils +import src.file_utils as fu +import tfcode.nav_utils as nu +from tfcode import tf_utils + +setup_train_step_kwargs = nu.default_train_step_kwargs +compute_losses_multi_or = nu.compute_losses_multi_or +get_repr_from_image = nu.get_repr_from_image + +_save_d_at_t = nu.save_d_at_t +_save_all = nu.save_all +_eval_ap = nu.eval_ap +_eval_dist = nu.eval_dist +_plot_trajectories = nu.plot_trajectories + +def lstm_online(cell_fn, num_steps, inputs, state, varscope): + # inputs is B x num_steps x C, C channels. + # state is 2 tuple with B x 1 x C1, B x 1 x C2 + # Output state is always B x 1 x C + inputs = tf.unstack(inputs, axis=1, num=num_steps) + state = tf.unstack(state, axis=1, num=1)[0] + outputs = [] + + if num_steps > 1: + varscope.reuse_variables() + + for s in range(num_steps): + output, state = cell_fn(inputs[s], state) + outputs.append(output) + outputs = tf.stack(outputs, axis=1) + state = tf.stack([state], axis=1) + return outputs, state + +def _inputs(problem, lstm_states, lstm_state_dims): + # Set up inputs. + with tf.name_scope('inputs'): + n_views = problem.n_views + + inputs = [] + inputs.append(('orig_maps', tf.float32, + (problem.batch_size, 1, None, None, 1))) + inputs.append(('goal_loc', tf.float32, + (problem.batch_size, problem.num_goals, 2))) + + # For initing LSTM. 
+ inputs.append(('rel_goal_loc_at_start', tf.float32, + (problem.batch_size, problem.num_goals, + problem.rel_goal_loc_dim))) + common_input_data, _ = tf_utils.setup_inputs(inputs) + + inputs = [] + inputs.append(('imgs', tf.float32, (problem.batch_size, None, n_views, + problem.img_height, problem.img_width, + problem.img_channels))) + # Goal location as a tuple of delta location and delta theta. + inputs.append(('rel_goal_loc', tf.float32, (problem.batch_size, None, + problem.rel_goal_loc_dim))) + if problem.outputs.visit_count: + inputs.append(('visit_count', tf.int32, (problem.batch_size, None, 1))) + inputs.append(('last_visit', tf.int32, (problem.batch_size, None, 1))) + + for i, (state, dim) in enumerate(zip(lstm_states, lstm_state_dims)): + inputs.append((state, tf.float32, (problem.batch_size, 1, dim))) + + if problem.outputs.egomotion: + inputs.append(('incremental_locs', tf.float32, + (problem.batch_size, None, 2))) + inputs.append(('incremental_thetas', tf.float32, + (problem.batch_size, None, 1))) + + inputs.append(('step_number', tf.int32, (1, None, 1))) + inputs.append(('node_ids', tf.int32, (problem.batch_size, None, + problem.node_ids_dim))) + inputs.append(('perturbs', tf.float32, (problem.batch_size, None, + problem.perturbs_dim))) + + # For plotting result plots + inputs.append(('loc_on_map', tf.float32, (problem.batch_size, None, 2))) + inputs.append(('gt_dist_to_goal', tf.float32, (problem.batch_size, None, 1))) + step_input_data, _ = tf_utils.setup_inputs(inputs) + + inputs = [] + inputs.append(('executed_actions', tf.int32, (problem.batch_size, None))) + inputs.append(('rewards', tf.float32, (problem.batch_size, None))) + inputs.append(('action_sample_wts', tf.float32, (problem.batch_size, None))) + inputs.append(('action', tf.int32, (problem.batch_size, None, + problem.num_actions))) + train_data, _ = tf_utils.setup_inputs(inputs) + train_data.update(step_input_data) + train_data.update(common_input_data) + return common_input_data, 
step_input_data, train_data + + +def _add_summaries(m, summary_mode, arop_full_summary_iters): + summarize_ops = [m.lr_op, m.global_step_op, m.sample_gt_prob_op, + m.total_loss_op, m.data_loss_op, m.reg_loss_op] + m.acc_ops + summarize_names = ['lr', 'global_step', 'sample_gt_prob_op', 'total_loss', + 'data_loss', 'reg_loss'] + \ + ['acc_{:d}'.format(i) for i in range(len(m.acc_ops))] + to_aggregate = [0, 0, 0, 1, 1, 1] + [1]*len(m.acc_ops) + + scope_name = 'summary' + with tf.name_scope(scope_name): + s_ops = nu.add_default_summaries(summary_mode, arop_full_summary_iters, + summarize_ops, summarize_names, + to_aggregate, m.action_prob_op, + m.input_tensors, scope_name=scope_name) + m.summary_ops = {summary_mode: s_ops} + +# is_training is forwarded to fc_network below; callers pass it by keyword. +def visit_count_fc(visit_count, last_visit, embed_neurons, wt_decay, fc_dropout, + is_training): + with tf.variable_scope('embed_visit_count'): + visit_count = tf.reshape(visit_count, shape=[-1]) + last_visit = tf.reshape(last_visit, shape=[-1]) + + visit_count = tf.clip_by_value(visit_count, clip_value_min=-1, + clip_value_max=15) + last_visit = tf.clip_by_value(last_visit, clip_value_min=-1, + clip_value_max=15) + visit_count = tf.one_hot(visit_count, depth=16, axis=1, dtype=tf.float32, + on_value=10., off_value=0.) 
+ f = tf.concat_v2([visit_count, last_visit], 1) + x, _ = tf_utils.fc_network( + f, neurons=embed_neurons, wt_decay=wt_decay, name='visit_count_embed', + offset=0, batch_norm_param=None, dropout_ratio=fc_dropout, + is_training=is_training) + return x + +def lstm_setup(name, x, batch_size, is_single_step, lstm_dim, lstm_out, + num_steps, state_input_op): + # returns state_name, state_init_op, updated_state_op, out_op + with tf.name_scope('reshape_'+name): + sh = x.get_shape().as_list() + x = tf.reshape(x, shape=[batch_size, -1, sh[-1]]) + + with tf.variable_scope(name) as varscope: + cell = tf.contrib.rnn.LSTMCell( + num_units=lstm_dim, forget_bias=1.0, state_is_tuple=False, + num_proj=lstm_out, use_peepholes=True, + initializer=tf.random_uniform_initializer(-0.01, 0.01, seed=0), + cell_clip=None, proj_clip=None) + + sh = [batch_size, 1, lstm_dim+lstm_out] + state_init_op = tf.constant(0., dtype=tf.float32, shape=sh) + + fn = lambda ns: lstm_online(cell, ns, x, state_input_op, varscope) + out_op, updated_state_op = tf.cond(is_single_step, lambda: fn(1), lambda: + fn(num_steps)) + + return name, state_init_op, updated_state_op, out_op + +def combine_setup(name, combine_type, embed_img, embed_goal, num_img_neuorons=None, + num_goal_neurons=None): + with tf.name_scope(name + '_' + combine_type): + if combine_type == 'add': + # Element-wise sum of the features from goal and image + out = embed_img + embed_goal + + elif combine_type == 'multiply': + # Multiply things together + # Integer division keeps the reshape dimension an int. + re_embed_img = tf.reshape( + embed_img, shape=[-1, num_img_neuorons // num_goal_neurons, + num_goal_neurons]) + re_embed_goal = tf.reshape(embed_goal, shape=[-1, num_goal_neurons, 1]) + x = tf.matmul(re_embed_img, re_embed_goal, transpose_a=False, transpose_b=False) + out = slim.flatten(x) + elif combine_type == 'none' or combine_type == 'imgonly': + out = embed_img + elif combine_type == 'goalonly': + out = embed_goal + else: + logging.fatal('Undefined combine_type: %s', combine_type) + return out + + +def 
preprocess_egomotion(locs, thetas): + with tf.name_scope('pre_ego'): + pre_ego = tf.concat_v2([locs, tf.sin(thetas), tf.cos(thetas)], 2) + sh = pre_ego.get_shape().as_list() + pre_ego = tf.reshape(pre_ego, [-1, sh[-1]]) + return pre_ego + +def setup_to_run(m, args, is_training, batch_norm_is_training, summary_mode): + # Set up the model. + tf.set_random_seed(args.solver.seed) + task_params = args.navtask.task_params + num_steps = task_params.num_steps + num_goals = task_params.num_goals + num_actions = task_params.num_actions + num_actions_ = num_actions + + n_views = task_params.n_views + + batch_norm_is_training_op = \ + tf.placeholder_with_default(batch_norm_is_training, shape=[], + name='batch_norm_is_training_op') + # Setup the inputs + m.input_tensors = {} + lstm_states = []; lstm_state_dims = []; + state_names = []; updated_state_ops = []; init_state_ops = []; + if args.arch.lstm_output: + lstm_states += ['lstm_output'] + lstm_state_dims += [args.arch.lstm_output_dim+task_params.num_actions] + if args.arch.lstm_ego: + lstm_states += ['lstm_ego'] + lstm_state_dims += [args.arch.lstm_ego_dim + args.arch.lstm_ego_out] + lstm_states += ['lstm_img'] + lstm_state_dims += [args.arch.lstm_img_dim + args.arch.lstm_img_out] + elif args.arch.lstm_img: + # An LSTM only on the image + lstm_states += ['lstm_img'] + lstm_state_dims += [args.arch.lstm_img_dim + args.arch.lstm_img_out] + else: + # No LSTMs involved here. 
+ None + + m.input_tensors['common'], m.input_tensors['step'], m.input_tensors['train'] = \ + _inputs(task_params, lstm_states, lstm_state_dims) + + with tf.name_scope('check_size'): + is_single_step = tf.equal(tf.unstack(tf.shape(m.input_tensors['step']['imgs']), + num=6)[1], 1) + + images_reshaped = tf.reshape(m.input_tensors['step']['imgs'], + shape=[-1, task_params.img_height, task_params.img_width, + task_params.img_channels], name='re_image') + + rel_goal_loc_reshaped = tf.reshape(m.input_tensors['step']['rel_goal_loc'], + shape=[-1, task_params.rel_goal_loc_dim], name='re_rel_goal_loc') + + x, vars_ = get_repr_from_image( + images_reshaped, task_params.modalities, task_params.data_augment, + args.arch.encoder, args.solver.freeze_conv, args.solver.wt_decay, + is_training) + + # Reshape into nice things so that these can be accumulated over time steps + # for faster backprop. + sh_before = x.get_shape().as_list() + m.encoder_output = tf.reshape( + x, shape=[task_params.batch_size, -1, n_views] + sh_before[1:]) + x = tf.reshape(m.encoder_output, shape=[-1] + sh_before[1:]) + + # Add a layer to reduce dimensions for a fc layer. + if args.arch.dim_reduce_neurons > 0: + ks = 1; neurons = args.arch.dim_reduce_neurons; + init_var = np.sqrt(2.0/(ks**2)/neurons) + batch_norm_param = args.arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + m.conv_feat = slim.conv2d( + x, neurons, kernel_size=ks, stride=1, normalizer_fn=slim.batch_norm, + normalizer_params=batch_norm_param, padding='SAME', scope='dim_reduce', + weights_regularizer=slim.l2_regularizer(args.solver.wt_decay), + weights_initializer=tf.random_normal_initializer(stddev=init_var)) + reshape_conv_feat = slim.flatten(m.conv_feat) + sh = reshape_conv_feat.get_shape().as_list() + m.reshape_conv_feat = tf.reshape(reshape_conv_feat, + shape=[-1, sh[1]*n_views]) + + # Restore these from a checkpoint. 
+ if args.solver.pretrained_path is not None: + m.init_fn = slim.assign_from_checkpoint_fn(args.solver.pretrained_path, + vars_) + else: + m.init_fn = None + + # Hit the goal_location with a bunch of fully connected layers, to embed it + # into some space. + with tf.variable_scope('embed_goal'): + batch_norm_param = args.arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + m.embed_goal, _ = tf_utils.fc_network( + rel_goal_loc_reshaped, neurons=args.arch.goal_embed_neurons, + wt_decay=args.solver.wt_decay, name='goal_embed', offset=0, + batch_norm_param=batch_norm_param, dropout_ratio=args.arch.fc_dropout, + is_training=is_training) + + if args.arch.embed_goal_for_state: + with tf.variable_scope('embed_goal_for_state'): + batch_norm_param = args.arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + m.embed_goal_for_state, _ = tf_utils.fc_network( + m.input_tensors['common']['rel_goal_loc_at_start'][:,0,:], + neurons=args.arch.goal_embed_neurons, wt_decay=args.solver.wt_decay, + name='goal_embed', offset=0, batch_norm_param=batch_norm_param, + dropout_ratio=args.arch.fc_dropout, is_training=is_training) + + # Hit the image features with a bunch of fully connected layers, to embed + # them into some space. + with tf.variable_scope('embed_img'): + batch_norm_param = args.arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + m.embed_img, _ = tf_utils.fc_network( + m.reshape_conv_feat, neurons=args.arch.img_embed_neurons, + wt_decay=args.solver.wt_decay, name='img_embed', offset=0, + batch_norm_param=batch_norm_param, dropout_ratio=args.arch.fc_dropout, + is_training=is_training) + + # For lstm_ego, and lstm_image, embed the ego motion, accumulate it into an + # LSTM, combine with image features and accumulate those in an LSTM. Finally + # combine what you get from the image LSTM with the goal to output an action. 
+ if args.arch.lstm_ego: + ego_reshaped = preprocess_egomotion(m.input_tensors['step']['incremental_locs'], + m.input_tensors['step']['incremental_thetas']) + with tf.variable_scope('embed_ego'): + batch_norm_param = args.arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + m.embed_ego, _ = tf_utils.fc_network( + ego_reshaped, neurons=args.arch.ego_embed_neurons, + wt_decay=args.solver.wt_decay, name='ego_embed', offset=0, + batch_norm_param=batch_norm_param, dropout_ratio=args.arch.fc_dropout, + is_training=is_training) + + state_name, state_init_op, updated_state_op, out_op = lstm_setup( + 'lstm_ego', m.embed_ego, task_params.batch_size, is_single_step, + args.arch.lstm_ego_dim, args.arch.lstm_ego_out, num_steps*num_goals, + m.input_tensors['step']['lstm_ego']) + state_names += [state_name] + init_state_ops += [state_init_op] + updated_state_ops += [updated_state_op] + + # Combine the output with the vision features. + m.img_ego_op = combine_setup('img_ego', args.arch.combine_type_ego, + m.embed_img, out_op, + args.arch.img_embed_neurons[-1], + args.arch.lstm_ego_out) + + # LSTM on these vision features. + state_name, state_init_op, updated_state_op, out_op = lstm_setup( + 'lstm_img', m.img_ego_op, task_params.batch_size, is_single_step, + args.arch.lstm_img_dim, args.arch.lstm_img_out, num_steps*num_goals, + m.input_tensors['step']['lstm_img']) + state_names += [state_name] + init_state_ops += [state_init_op] + updated_state_ops += [updated_state_op] + + m.img_for_goal = out_op + num_img_for_goal_neurons = args.arch.lstm_img_out + + elif args.arch.lstm_img: + # LSTM on just the image features. 
+ state_name, state_init_op, updated_state_op, out_op = lstm_setup( + 'lstm_img', m.embed_img, task_params.batch_size, is_single_step, + args.arch.lstm_img_dim, args.arch.lstm_img_out, num_steps*num_goals, + m.input_tensors['step']['lstm_img']) + state_names += [state_name] + init_state_ops += [state_init_op] + updated_state_ops += [updated_state_op] + m.img_for_goal = out_op + num_img_for_goal_neurons = args.arch.lstm_img_out + + else: + m.img_for_goal = m.embed_img + num_img_for_goal_neurons = args.arch.img_embed_neurons[-1] + + + if args.arch.use_visit_count: + m.embed_visit_count = visit_count_fc( + m.input_tensors['step']['visit_count'], + m.input_tensors['step']['last_visit'], args.arch.goal_embed_neurons, + args.solver.wt_decay, args.arch.fc_dropout, is_training=is_training) + m.embed_goal = m.embed_goal + m.embed_visit_count + + m.combined_f = combine_setup('img_goal', args.arch.combine_type, + m.img_for_goal, m.embed_goal, + num_img_for_goal_neurons, + args.arch.goal_embed_neurons[-1]) + + # LSTM on the combined representation. + if args.arch.lstm_output: + name = 'lstm_output' + # A few fully connected layers here. + with tf.variable_scope('action_pred'): + batch_norm_param = args.arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + x, _ = tf_utils.fc_network( + m.combined_f, neurons=args.arch.pred_neurons, + wt_decay=args.solver.wt_decay, name='pred', offset=0, + batch_norm_param=batch_norm_param, dropout_ratio=args.arch.fc_dropout) + + if args.arch.lstm_output_init_state_from_goal: + # Use the goal embedding to initialize the LSTM state. + # UGLY CLUGGY HACK: if this is doing computation for a single time step + # then this will not involve back prop, so we can use the state input from + # the feed dict, otherwise we compute the state representation from the + # goal and feed that in. Necessary for using goal location to generate the + # state representation. 
+ m.embed_goal_for_state = tf.expand_dims(m.embed_goal_for_state, dim=1) + state_op = tf.cond(is_single_step, lambda: m.input_tensors['step'][name], + lambda: m.embed_goal_for_state) + state_name, state_init_op, updated_state_op, out_op = lstm_setup( + name, x, task_params.batch_size, is_single_step, + args.arch.lstm_output_dim, + num_actions_, + num_steps*num_goals, state_op) + init_state_ops += [m.embed_goal_for_state] + else: + state_op = m.input_tensors['step'][name] + state_name, state_init_op, updated_state_op, out_op = lstm_setup( + name, x, task_params.batch_size, is_single_step, + args.arch.lstm_output_dim, + num_actions_, num_steps*num_goals, state_op) + init_state_ops += [state_init_op] + + state_names += [state_name] + updated_state_ops += [updated_state_op] + + out_op = tf.reshape(out_op, shape=[-1, num_actions_]) + if num_actions_ > num_actions: + m.action_logits_op = out_op[:,:num_actions] + m.baseline_op = out_op[:,num_actions:] + else: + m.action_logits_op = out_op + m.baseline_op = None + m.action_prob_op = tf.nn.softmax(m.action_logits_op) + + else: + # A few fully connected layers here. 
+ with tf.variable_scope('action_pred'): + batch_norm_param = args.arch.batch_norm_param + batch_norm_param['is_training'] = batch_norm_is_training_op + out_op, _ = tf_utils.fc_network( + m.combined_f, neurons=args.arch.pred_neurons, + wt_decay=args.solver.wt_decay, name='pred', offset=0, + num_pred=num_actions_, + batch_norm_param=batch_norm_param, + dropout_ratio=args.arch.fc_dropout, is_training=is_training) + if num_actions_ > num_actions: + m.action_logits_op = out_op[:,:num_actions] + m.baseline_op = out_op[:,num_actions:] + else: + m.action_logits_op = out_op + m.baseline_op = None + m.action_prob_op = tf.nn.softmax(m.action_logits_op) + + m.train_ops = {} + m.train_ops['step'] = m.action_prob_op + m.train_ops['common'] = [m.input_tensors['common']['orig_maps'], + m.input_tensors['common']['goal_loc'], + m.input_tensors['common']['rel_goal_loc_at_start']] + m.train_ops['state_names'] = state_names + m.train_ops['init_state'] = init_state_ops + m.train_ops['updated_state'] = updated_state_ops + m.train_ops['batch_norm_is_training_op'] = batch_norm_is_training_op + + # Flat list of ops which cache the step data. 
+ m.train_ops['step_data_cache'] = [tf.no_op()] + + if args.solver.freeze_conv: + m.train_ops['step_data_cache'] = [m.encoder_output] + else: + m.train_ops['step_data_cache'] = [] + + ewma_decay = 0.99 if is_training else 0.0 + weight = tf.ones_like(m.input_tensors['train']['action'], dtype=tf.float32, + name='weight') + + m.reg_loss_op, m.data_loss_op, m.total_loss_op, m.acc_ops = \ + compute_losses_multi_or( + m.action_logits_op, m.input_tensors['train']['action'], + weights=weight, num_actions=num_actions, + data_loss_wt=args.solver.data_loss_wt, + reg_loss_wt=args.solver.reg_loss_wt, ewma_decay=ewma_decay) + + + if args.solver.freeze_conv: + vars_to_optimize = list(set(tf.trainable_variables()) - set(vars_)) + else: + vars_to_optimize = None + + m.lr_op, m.global_step_op, m.train_op, m.should_stop_op, m.optimizer, \ + m.sync_optimizer = tf_utils.setup_training( + m.total_loss_op, + args.solver.initial_learning_rate, + args.solver.steps_per_decay, + args.solver.learning_rate_decay, + args.solver.momentum, + args.solver.max_steps, + args.solver.sync, + args.solver.adjust_lr_sync, + args.solver.num_workers, + args.solver.task, + vars_to_optimize=vars_to_optimize, + clip_gradient_norm=args.solver.clip_gradient_norm, + typ=args.solver.typ, momentum2=args.solver.momentum2, + adam_eps=args.solver.adam_eps) + + + if args.arch.sample_gt_prob_type == 'inverse_sigmoid_decay': + m.sample_gt_prob_op = tf_utils.inverse_sigmoid_decay(args.arch.isd_k, + m.global_step_op) + elif args.arch.sample_gt_prob_type == 'zero': + m.sample_gt_prob_op = tf.constant(-1.0, dtype=tf.float32) + elif args.arch.sample_gt_prob_type.split('_')[0] == 'step': + step = int(args.arch.sample_gt_prob_type.split('_')[1]) + m.sample_gt_prob_op = tf_utils.step_gt_prob( + step, m.input_tensors['step']['step_number'][0,0,0]) + + m.sample_action_type = args.arch.action_sample_type + m.sample_action_combine_type = args.arch.action_sample_combine_type + _add_summaries(m, summary_mode, 
args.summary.arop_full_summary_iters) + + m.init_op = tf.group(tf.global_variables_initializer(), + tf.local_variables_initializer()) + m.saver_op = tf.train.Saver(keep_checkpoint_every_n_hours=4, + write_version=tf.train.SaverDef.V2) + + return m diff --git a/compression/README.md b/compression/README.md index 4b95961b2d20af864cc91d192e9365dbf2c2625f..2ae52f6fc013fe0551d002ad29fb29b56586850e 100644 --- a/compression/README.md +++ b/compression/README.md @@ -1,107 +1,14 @@ -# Image Compression with Neural Networks +# Compression with Neural Networks -This is a [TensorFlow](http://www.tensorflow.org/) model for compressing and -decompressing images using an already trained Residual GRU model as descibed -in [Full Resolution Image Compression with Recurrent Neural Networks] -(https://arxiv.org/abs/1608.05148). Please consult the paper for more details -on the architecture and compression results. +This is a [TensorFlow](http://www.tensorflow.org/) model repo containing +research on compression with neural networks. This repo currently contains +code for the following papers: -This code will allow you to perform the lossy compression on an model -already trained on compression. This code doesn't not currently contain the -Entropy Coding portions of our paper. - - -## Prerequisites -The only software requirements for running the encoder and decoder is having -Tensorflow installed. You will also need to [download] -(http://download.tensorflow.org/models/compression_residual_gru-2016-08-23.tar.gz) -and extract the model residual_gru.pb. - -If you want to generate the perceptual similarity under MS-SSIM, you will also -need to [Install SciPy](https://www.scipy.org/install.html). - -## Encoding -The Residual GRU network is fully convolutional, but requires the images -height and width in pixels by a multiple of 32. There is an image in this folder -called example.png that is 768x1024 if one is needed for testing. 
We also -rely on TensorFlow's built in decoding ops, which support only PNG and JPEG at -time of release. - -To encode an image, simply run the following command: - -`python encoder.py --input_image=/your/image/here.png ---output_codes=output_codes.npz --iteration=15 ---model=/path/to/model/residual_gru.pb -` - -The iteration parameter specifies the lossy-quality to target for compression. -The quality can be [0-15], where 0 corresponds to a target of 1/8 (bits per -pixel) bpp and every increment results in an additional 1/8 bpp. - -| Iteration | BPP | Compression Ratio | -|---: |---: |---: | -|0 | 0.125 | 192:1| -|1 | 0.250 | 96:1| -|2 | 0.375 | 64:1| -|3 | 0.500 | 48:1| -|4 | 0.625 | 38.4:1| -|5 | 0.750 | 32:1| -|6 | 0.875 | 27.4:1| -|7 | 1.000 | 24:1| -|8 | 1.125 | 21.3:1| -|9 | 1.250 | 19.2:1| -|10 | 1.375 | 17.4:1| -|11 | 1.500 | 16:1| -|12 | 1.625 | 14.7:1| -|13 | 1.750 | 13.7:1| -|14 | 1.875 | 12.8:1| -|15 | 2.000 | 12:1| - -The output_codes file contains the numpy shape and a flattened, bit-packed -array of the codes. These can be inspected in python by using numpy.load(). - - -## Decoding -After generating codes for an image, the lossy reconstructions for that image -can be done as follows: - -`python decoder.py --input_codes=codes.npz --output_directory=/tmp/decoded/ ---model=residual_gru.pb` - -The output_directory will contain images decoded at each quality level. - - -## Comparing Similarity -One of our primary metrics for comparing how similar two images are -is MS-SSIM. - -To generate these metrics on your images you can run: -`python msssim.py --original_image=/path/to/your/image.png ---compared_image=/tmp/decoded/image_15.png` - - -## Results -CSV results containing the post-entropy bitrates and MS-SSIM over Kodak can -are available for reference. Each row of the CSV represents each of the Kodak -images in their dataset number (1-24). Each column of the CSV represents each -iteration of the model (1-16). 
- -[Post Entropy Bitrates](https://storage.googleapis.com/compression-ml/residual_gru_results/bitrate.csv) - -[MS-SSIM](https://storage.googleapis.com/compression-ml/residual_gru_results/msssim.csv) - - -## FAQ - -#### How do I train my own compression network? -We currently don't provide the code to build and train a compression -graph from scratch. - -#### I get an InvalidArgumentError: Incompatible shapes. -This is usually due to the fact that our network only supports images that are -both height and width divisible by 32 pixel. Try padding your images to 32 -pixel boundaries. +[Full Resolution Image Compression with Recurrent Neural Networks](https://arxiv.org/abs/1608.05148) +## Organization +[Image Encoder](image_encoder/): Encoding and decoding images into their binary representation. +[Entropy Coder](entropy_coder/): Lossless compression of the binary representation. ## Contact Info Model repository maintained by Nick Johnston ([nickj-google](https://github.com/nickj-google)). diff --git a/compression/entropy_coder/README.md b/compression/entropy_coder/README.md new file mode 100644 index 0000000000000000000000000000000000000000..59e889990aab71e12ed13122c9b5a796a048402a --- /dev/null +++ b/compression/entropy_coder/README.md @@ -0,0 +1,109 @@ +# Neural net based entropy coding + +This is a [TensorFlow](http://www.tensorflow.org/) model for additional +lossless compression of bitstreams generated by neural net based image +encoders as described in +[https://arxiv.org/abs/1703.10114](https://arxiv.org/abs/1703.10114). + +To be more specific, the entropy coder aims at compressing further binary +codes which have a 3D tensor structure with: + +* the first two dimensions of the tensors corresponding to the height and +the width of the binary codes, +* the last dimension being the depth of the codes. The last dimension can be +sliced into N groups of K, where each additional group is used by the image +decoder to add more details to the reconstructed image. 
+ +The code in this directory only contains the underlying code probability model +but does not perform the actual compression using arithmetic coding. +The code probability model is enough to compute the theoretical compression +ratio. + + +## Prerequisites +The only software requirement for running the entropy coder is having +TensorFlow installed. + +You will also need to add the parent directory of `entropy_coder` (the top-level +`compression` directory) to your `PYTHONPATH`, for example: + +`export PYTHONPATH=${PYTHONPATH}:/tmp/models/compression` + + +## Training the entropy coder + +### Synthetic dataset +If you do not have a training dataset, there is a simple code generative model +that you can use to generate a dataset and play with the entropy coder. +The generative model is located under dataset/gen\_synthetic\_dataset.py. Note +that this simple generative model is not going to give good results on real +images as it is not supposed to be close to the statistics of the binary +representation of encoded images. Consider it as a toy dataset, no more, no +less. + +To generate a synthetic dataset with 20000 samples: + +`mkdir -p /tmp/dataset` + +`python ./dataset/gen_synthetic_dataset.py --dataset_dir=/tmp/dataset/ +--count=20000` + +Note that the generator has not been optimized at all, so generating the synthetic +dataset is currently pretty slow. + +### Training + +If you just want to play with the entropy coder trainer, here is the command +line that can be used to train the entropy coder on the synthetic dataset: + +`mkdir -p /tmp/entropy_coder_train` + +`python ./core/entropy_coder_train.py --task=0 +--train_dir=/tmp/entropy_coder_train/ +--model=progressive +--model_config=./configs/synthetic/model_config.json +--train_config=./configs/synthetic/train_config.json +--input_config=./configs/synthetic/input_config.json +` + +Training is configured using three JSON files: + +* One file is used to configure the underlying entropy coder model. 
+ Currently, only the *progressive* model is supported. + This model takes 2 mandatory parameters and an optional one: + * `layer_depth`: the number of bits per layer (a.k.a. iteration). + Background: the image decoder uses each layer to add more detail + to the image. + * `layer_count`: the maximum number of layers that should be supported + by the model. This should be equal to or greater than the maximum number + of layers in the input binary codes. + * `coded_layer_count` (optional): can be used to consider only partial codes, + keeping only the first `coded_layer_count` layers and ignoring the + remaining layers. If left empty, the binary codes are left unchanged. +* One file to configure the training (batch size, learning rate, decay rate, + and number of samples per decay step). Note that this + file is only used during training and is not needed during inference. +* One file to specify the input dataset to use during training. + The dataset is stored as `tf.Example` records in a TFRecord file. + + +## Inference: file size after entropy coding + +### Using a synthetic sample + +Here is the command line to generate a single synthetic sample formatted +in the same way as what is provided by the image encoder: + +`python ./dataset/gen_synthetic_single.py +--sample_filename=/tmp/dataset/sample_0000.npz` + +To actually compute the additional compression ratio using the entropy coder +trained in the previous step: + +`python ./core/entropy_coder_single.py +--model=progressive +--model_config=./configs/synthetic/model_config.json +--input_codes=/tmp/dataset/sample_0000.npz +--checkpoint=/tmp/entropy_coder_train/model.ckpt-209078` + +where the checkpoint number should be adjusted accordingly. 
diff --git a/compression/entropy_coder/__init__.py b/compression/entropy_coder/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/compression/entropy_coder/all_models/__init__.py b/compression/entropy_coder/all_models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/tutorials/rnn/rnn_cell.py b/compression/entropy_coder/all_models/all_models.py similarity index 67% rename from tutorials/rnn/rnn_cell.py rename to compression/entropy_coder/all_models/all_models.py index 47beb5e5a9be76e398cd1b605264111427173862..e376dac737667a348065eec622920b0a81ed1ac9 100644 --- a/tutorials/rnn/rnn_cell.py +++ b/compression/entropy_coder/all_models/all_models.py @@ -1,4 +1,4 @@ -# Copyright 2015 The TensorFlow Authors. All Rights Reserved. +# Copyright 2017 The TensorFlow Authors All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,10 +12,8 @@ # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== -"""Import rnn_cell python ops for backward compatibility.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function +"""Import and register all the entropy coder models.""" -raise ImportError("This module is deprecated. 
Use tf.contrib.rnn instead.") +# pylint: disable=unused-import +from entropy_coder.progressive import progressive diff --git a/compression/entropy_coder/all_models/all_models_test.py b/compression/entropy_coder/all_models/all_models_test.py new file mode 100644 index 0000000000000000000000000000000000000000..b8aff504a0a00d579d1b2768164b78b6c095b235 --- /dev/null +++ b/compression/entropy_coder/all_models/all_models_test.py @@ -0,0 +1,68 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Basic test of all registered models.""" + +import tensorflow as tf + +# pylint: disable=unused-import +import all_models +# pylint: enable=unused-import +from entropy_coder.model import model_factory + + +class AllModelsTest(tf.test.TestCase): + + def testBuildModelForTraining(self): + factory = model_factory.GetModelRegistry() + model_names = factory.GetAvailableModels() + + for m in model_names: + tf.reset_default_graph() + + global_step = tf.Variable(tf.zeros([], dtype=tf.int64), + trainable=False, + name='global_step') + + optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1) + + batch_size = 3 + height = 40 + width = 20 + depth = 5 + binary_codes = tf.placeholder(dtype=tf.float32, + shape=[batch_size, height, width, depth]) + + # Create a model with the default configuration. 
+ print('Creating model: {}'.format(m)) + model = factory.CreateModel(m) + model.Initialize(global_step, + optimizer, + model.GetConfigStringForUnitTest()) + self.assertTrue(model.loss is None, 'model: {}'.format(m)) + self.assertTrue(model.train_op is None, 'model: {}'.format(m)) + self.assertTrue(model.average_code_length is None, 'model: {}'.format(m)) + + # Build the Tensorflow graph corresponding to the model. + model.BuildGraph(binary_codes) + self.assertTrue(model.loss is not None, 'model: {}'.format(m)) + self.assertTrue(model.average_code_length is not None, + 'model: {}'.format(m)) + if model.train_op is None: + print('Model {} is not trainable'.format(m)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/compression/entropy_coder/configs/gru_prime3/model_config.json b/compression/entropy_coder/configs/gru_prime3/model_config.json new file mode 100644 index 0000000000000000000000000000000000000000..cf63a4c454df5c47c732c5eaeea481b2aa714665 --- /dev/null +++ b/compression/entropy_coder/configs/gru_prime3/model_config.json @@ -0,0 +1,4 @@ +{ + "layer_count": 16, + "layer_depth": 32 +} diff --git a/compression/entropy_coder/configs/synthetic/input_config.json b/compression/entropy_coder/configs/synthetic/input_config.json new file mode 100644 index 0000000000000000000000000000000000000000..18455e65120cd45cb04106ed8b6b2d6641e1d49a --- /dev/null +++ b/compression/entropy_coder/configs/synthetic/input_config.json @@ -0,0 +1,4 @@ +{ + "data": "/tmp/dataset/synthetic_dataset", + "unique_code_size": true +} diff --git a/compression/entropy_coder/configs/synthetic/model_config.json b/compression/entropy_coder/configs/synthetic/model_config.json new file mode 100644 index 0000000000000000000000000000000000000000..c6f1f3e11547a75c05019e24c59a7fc6d2a29e3b --- /dev/null +++ b/compression/entropy_coder/configs/synthetic/model_config.json @@ -0,0 +1,4 @@ +{ + "layer_depth": 2, + "layer_count": 8 +} diff --git 
a/compression/entropy_coder/configs/synthetic/train_config.json b/compression/entropy_coder/configs/synthetic/train_config.json new file mode 100644 index 0000000000000000000000000000000000000000..79e4909fd3f93df983d79890e25b7b61ba14aa40 --- /dev/null +++ b/compression/entropy_coder/configs/synthetic/train_config.json @@ -0,0 +1,6 @@ +{ + "batch_size": 4, + "learning_rate": 0.1, + "decay_rate": 0.9, + "samples_per_decay": 20000 +} diff --git a/compression/entropy_coder/core/code_loader.py b/compression/entropy_coder/core/code_loader.py new file mode 100644 index 0000000000000000000000000000000000000000..603ab724afb0e6c4e94db9c121d7799eaf30fa02 --- /dev/null +++ b/compression/entropy_coder/core/code_loader.py @@ -0,0 +1,73 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Load binary codes stored as tf.Example in a TFRecord table.""" + +import tensorflow as tf + + +def ReadFirstCode(dataset): + """Read the first example from a binary code RecordIO table.""" + for record in tf.python_io.tf_record_iterator(dataset): + tf_example = tf.train.Example() + tf_example.ParseFromString(record) + break + return tf_example + + +def LoadBinaryCode(input_config, batch_size): + """Load a batch of binary codes from a tf.Example dataset. + + Args: + input_config: An InputConfig proto containing the input configuration. 
+ batch_size: Output batch size of examples. + + Returns: + A batched tensor of binary codes. + """ + data = input_config.data + + # TODO: Possibly use multiple files (instead of just one). + file_list = [data] + filename_queue = tf.train.string_input_producer(file_list, + capacity=4) + reader = tf.TFRecordReader() + _, values = reader.read(filename_queue) + + serialized_example = tf.reshape(values, shape=[1]) + serialized_features = { + 'code_shape': tf.FixedLenFeature([3], + dtype=tf.int64), + 'code': tf.VarLenFeature(tf.float32), + } + example = tf.parse_example(serialized_example, serialized_features) + + # 3D shape: height x width x binary_code_depth + z = example['code_shape'] + code_shape = tf.reshape(tf.cast(z, tf.int32), [3]) + # Un-flatten the binary codes. + code = tf.reshape(tf.sparse_tensor_to_dense(example['code']), code_shape) + + queue_size = 10 + queue = tf.PaddingFIFOQueue( + queue_size + 3 * batch_size, + dtypes=[code.dtype], + shapes=[[None, None, None]]) + enqueue_op = queue.enqueue([code]) + dequeue_code = queue.dequeue_many(batch_size) + queue_runner = tf.train.queue_runner.QueueRunner(queue, [enqueue_op]) + tf.add_to_collection(tf.GraphKeys.QUEUE_RUNNERS, queue_runner) + + return dequeue_code diff --git a/compression/entropy_coder/core/config_helper.py b/compression/entropy_coder/core/config_helper.py new file mode 100644 index 0000000000000000000000000000000000000000..a7d949e329b93f33d330d1ba494f71ae1704fa3f --- /dev/null +++ b/compression/entropy_coder/core/config_helper.py @@ -0,0 +1,52 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Helper functions used in both train and inference.""" + +import json +import os.path + +import tensorflow as tf + + +def GetConfigString(config_file): + config_string = '' + if config_file is not None: + config_string = open(config_file).read() + return config_string + + +class InputConfig(object): + + def __init__(self, config_string): + config = json.loads(config_string) + self.data = config["data"] + self.unique_code_size = config["unique_code_size"] + + +class TrainConfig(object): + + def __init__(self, config_string): + config = json.loads(config_string) + self.batch_size = config["batch_size"] + self.learning_rate = config["learning_rate"] + self.decay_rate = config["decay_rate"] + self.samples_per_decay = config["samples_per_decay"] + + +def SaveConfig(directory, filename, config_string): + path = os.path.join(directory, filename) + with tf.gfile.Open(path, mode='w') as f: + f.write(config_string) diff --git a/compression/entropy_coder/core/entropy_coder_single.py b/compression/entropy_coder/core/entropy_coder_single.py new file mode 100644 index 0000000000000000000000000000000000000000..40a1317c91c77423d2f6f1cad385f4fcbf98df8c --- /dev/null +++ b/compression/entropy_coder/core/entropy_coder_single.py @@ -0,0 +1,116 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Compute the additional compression ratio after entropy coding.""" + +import io +import os + +import numpy as np +import tensorflow as tf + +import config_helper + +# pylint: disable=unused-import +from entropy_coder.all_models import all_models +# pylint: enable=unused-import +from entropy_coder.model import model_factory + + +# Checkpoint used to restore the model parameters. +tf.app.flags.DEFINE_string('checkpoint', None, + """Model checkpoint.""") + +# Model selection and configuration. +tf.app.flags.DEFINE_string('model', None, """Underlying encoder model.""") +tf.app.flags.DEFINE_string('model_config', None, + """Path to the model config JSON file.""") + +# File holding the binary codes. 
+tf.flags.DEFINE_string('input_codes', None, 'Location of binary code file.') + +FLAGS = tf.flags.FLAGS + + +def main(_): + if (FLAGS.input_codes is None or FLAGS.model is None): + print('\nUsage: python entropy_coder_single.py --model=progressive ' + '--model_config=model_config.json ' + '--input_codes=codes.npz --checkpoint=model.ckpt\n\n') + return + + #if FLAGS.iteration < -1 or FLAGS.iteration > 15: + # print ('\n--iteration must be between 0 and 15 inclusive, or -1 to infer ' + # 'from file.\n') + # return + #iteration = FLAGS.iteration + + if not tf.gfile.Exists(FLAGS.input_codes): + print('\nInput codes not found.\n') + return + + with tf.gfile.FastGFile(FLAGS.input_codes, 'rb') as code_file: + contents = code_file.read() + loaded_codes = np.load(io.BytesIO(contents)) + assert 'codes' in loaded_codes.files and 'shape' in loaded_codes.files + loaded_shape = loaded_codes['shape'] + loaded_array = loaded_codes['codes'] + + # Unpack and recover code shapes. + unpacked_codes = np.reshape(np.unpackbits(loaded_array) + [:np.prod(loaded_shape)], + loaded_shape) + + numpy_int_codes = unpacked_codes.transpose([1, 2, 3, 0, 4]) + numpy_int_codes = numpy_int_codes.reshape([numpy_int_codes.shape[0], + numpy_int_codes.shape[1], + numpy_int_codes.shape[2], + -1]) + numpy_codes = numpy_int_codes.astype(np.float32) * 2.0 - 1.0 + + with tf.Graph().as_default() as graph: + # TF tensor to hold the binary codes to losslessly compress. + batch_size = 1 + codes = tf.placeholder(tf.float32, shape=numpy_codes.shape) + + # Create the entropy coder model. + global_step = None + optimizer = None + model = model_factory.GetModelRegistry().CreateModel(FLAGS.model) + model_config_string = config_helper.GetConfigString(FLAGS.model_config) + model.Initialize(global_step, optimizer, model_config_string) + model.BuildGraph(codes) + + saver = tf.train.Saver(sharded=True, keep_checkpoint_every_n_hours=12.0) + + with tf.Session(graph=graph) as sess: + # Initialize local variables. 
+ sess.run(tf.local_variables_initializer()) + + # Restore model variables. + saver.restore(sess, FLAGS.checkpoint) + + tf_tensors = { + 'code_length': model.average_code_length + } + feed_dict = {codes: numpy_codes} + np_tensors = sess.run(tf_tensors, feed_dict=feed_dict) + + print('Additional compression ratio: {}'.format( + np_tensors['code_length'])) + + +if __name__ == '__main__': + tf.app.run() diff --git a/compression/entropy_coder/core/entropy_coder_train.py b/compression/entropy_coder/core/entropy_coder_train.py new file mode 100644 index 0000000000000000000000000000000000000000..248935e3c9504e6945745d6fe97ff6dcccf0d639 --- /dev/null +++ b/compression/entropy_coder/core/entropy_coder_train.py @@ -0,0 +1,184 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Train an entropy coder model.""" + +import time + +import tensorflow as tf + +import code_loader +import config_helper + +# pylint: disable=unused-import +from entropy_coder.all_models import all_models +# pylint: enable=unused-import +from entropy_coder.model import model_factory + + +FLAGS = tf.app.flags.FLAGS + +# Hardware resources configuration. 
+tf.app.flags.DEFINE_string('master', '', + """Name of the TensorFlow master to use.""") +tf.app.flags.DEFINE_string('train_dir', None, + """Directory where to write event logs.""") +tf.app.flags.DEFINE_integer('task', None, + """Task id of the replica running the training.""") +tf.app.flags.DEFINE_integer('ps_tasks', 0, """Number of tasks in the ps job. + If 0 no ps job is used.""") + +# Model selection and configuration. +tf.app.flags.DEFINE_string('model', None, """Underlying encoder model.""") +tf.app.flags.DEFINE_string('model_config', None, + """Path to the model config JSON file.""") + +# Training data and parameters configuration. +tf.app.flags.DEFINE_string('input_config', None, + """Path to the training input config file.""") +tf.app.flags.DEFINE_string('train_config', None, + """Path to the training experiment config file.""") + + +def train(): + if FLAGS.train_dir is None: + raise ValueError('Parameter train_dir must be provided') + if FLAGS.task is None: + raise ValueError('Parameter task must be provided') + if FLAGS.model is None: + raise ValueError('Parameter model must be provided') + + input_config_string = config_helper.GetConfigString(FLAGS.input_config) + input_config = config_helper.InputConfig(input_config_string) + + # Training parameters. + train_config_string = config_helper.GetConfigString(FLAGS.train_config) + train_config = config_helper.TrainConfig(train_config_string) + + batch_size = train_config.batch_size + initial_learning_rate = train_config.learning_rate + decay_rate = train_config.decay_rate + samples_per_decay = train_config.samples_per_decay + + # Parameters for learning-rate decay. + # The formula is decay_rate ** floor(steps / decay_steps). 
+ decay_steps = samples_per_decay / batch_size + decay_steps = max(decay_steps, 1) + + first_code = code_loader.ReadFirstCode(input_config.data) + first_code_height = ( + first_code.features.feature['code_shape'].int64_list.value[0]) + first_code_width = ( + first_code.features.feature['code_shape'].int64_list.value[1]) + max_bit_depth = ( + first_code.features.feature['code_shape'].int64_list.value[2]) + print('Maximum code depth: {}'.format(max_bit_depth)) + + with tf.Graph().as_default(): + ps_ops = ["Variable", "VariableV2", "AutoReloadVariable", "VarHandleOp"] + with tf.device(tf.train.replica_device_setter(FLAGS.ps_tasks, + ps_ops=ps_ops)): + codes = code_loader.LoadBinaryCode( + input_config=input_config, + batch_size=batch_size) + if input_config.unique_code_size: + print('Input code size: {} x {}'.format(first_code_height, + first_code_width)) + codes.set_shape( + [batch_size, first_code_height, first_code_width, max_bit_depth]) + else: + codes.set_shape([batch_size, None, None, max_bit_depth]) + codes_effective_shape = tf.shape(codes) + + global_step = tf.contrib.framework.create_global_step() + + # Apply learning-rate decay. + learning_rate = tf.train.exponential_decay( + learning_rate=initial_learning_rate, + global_step=global_step, + decay_steps=decay_steps, + decay_rate=decay_rate, + staircase=True) + tf.summary.scalar('Learning Rate', learning_rate) + optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, + epsilon=1.0) + + # Create the entropy coder model. + model = model_factory.GetModelRegistry().CreateModel(FLAGS.model) + model_config_string = config_helper.GetConfigString(FLAGS.model_config) + model.Initialize(global_step, optimizer, model_config_string) + model.BuildGraph(codes) + + summary_op = tf.summary.merge_all() + + # Verify that the model can actually be trained. 
+ if model.train_op is None: + raise ValueError('Input model {} is not trainable'.format(FLAGS.model)) + + # We disable the summary thread run by Supervisor class by passing + # summary_op=None. We still pass save_summaries_secs because it is used by + # the global step counter thread. + is_chief = (FLAGS.task == 0) + sv = tf.train.Supervisor(logdir=FLAGS.train_dir, + is_chief=is_chief, + global_step=global_step, + # saver=model.saver, + summary_op=None, + save_summaries_secs=120, + save_model_secs=600, + recovery_wait_secs=30) + + sess = sv.PrepareSession(FLAGS.master) + sv.StartQueueRunners(sess) + + step = sess.run(global_step) + print('Trainer initial step: {}.'.format(step)) + + # Once everything has been setup properly, save the configs. + if is_chief: + config_helper.SaveConfig(FLAGS.train_dir, 'input_config.json', + input_config_string) + config_helper.SaveConfig(FLAGS.train_dir, 'model_config.json', + model_config_string) + config_helper.SaveConfig(FLAGS.train_dir, 'train_config.json', + train_config_string) + + # Train the model. + next_summary_time = time.time() + while not sv.ShouldStop(): + feed_dict = None + + # Once in a while, update the summaries on the chief worker. 
+ if is_chief and next_summary_time < time.time(): + summary_str = sess.run(summary_op, feed_dict=feed_dict) + sv.SummaryComputed(sess, summary_str) + next_summary_time = time.time() + sv.save_summaries_secs + else: + tf_tensors = { + 'train': model.train_op, + 'code_length': model.average_code_length + } + np_tensors = sess.run(tf_tensors, feed_dict=feed_dict) + print(np_tensors['code_length']) + + sv.Stop() + + +def main(argv=None): # pylint: disable=unused-argument + train() + + +if __name__ == '__main__': + tf.app.run() diff --git a/compression/entropy_coder/dataset/gen_synthetic_dataset.py b/compression/entropy_coder/dataset/gen_synthetic_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..aa511b530692c3f7d9e57756473f18850f632beb --- /dev/null +++ b/compression/entropy_coder/dataset/gen_synthetic_dataset.py @@ -0,0 +1,88 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Generate a synthetic dataset.""" + +import os + +import numpy as np +import tensorflow as tf + +import synthetic_model + + +FLAGS = tf.app.flags.FLAGS + +tf.app.flags.DEFINE_string( + 'dataset_dir', None, + """Directory where to write the dataset and the configs.""") +tf.app.flags.DEFINE_integer( + 'count', 1000, + """Number of samples to generate.""") + + +def int64_feature(values): + """Returns a TF-Feature of int64s. 
+ + Args: + values: A scalar or list of values. + + Returns: + A TF-Feature. + """ + if not isinstance(values, (tuple, list)): + values = [values] + return tf.train.Feature(int64_list=tf.train.Int64List(value=values)) + + +def float_feature(values): + """Returns a TF-Feature of floats. + + Args: + values: A scalar or list of values. + + Returns: + A TF-Feature. + """ + if not isinstance(values, (tuple, list)): + values = [values] + return tf.train.Feature(float_list=tf.train.FloatList(value=values)) + + +def AddToTFRecord(code, tfrecord_writer): + example = tf.train.Example(features=tf.train.Features(feature={ + 'code_shape': int64_feature(code.shape), + 'code': float_feature(code.flatten().tolist()), + })) + tfrecord_writer.write(example.SerializeToString()) + + +def GenerateDataset(filename, count, code_shape): + with tf.python_io.TFRecordWriter(filename) as tfrecord_writer: + for _ in xrange(count): + code = synthetic_model.GenerateSingleCode(code_shape) + # Convert {0,1} codes to {-1,+1} codes. + code = 2.0 * code - 1.0 + AddToTFRecord(code, tfrecord_writer) + + +def main(argv=None): # pylint: disable=unused-argument + GenerateDataset(os.path.join(FLAGS.dataset_dir, 'synthetic_dataset'), + FLAGS.count, + [35, 48, 8]) + + +if __name__ == '__main__': + tf.app.run() diff --git a/compression/entropy_coder/dataset/gen_synthetic_single.py b/compression/entropy_coder/dataset/gen_synthetic_single.py new file mode 100644 index 0000000000000000000000000000000000000000..b8c3821c38b6a0b95f01ad7ffb283cca4beb34b3 --- /dev/null +++ b/compression/entropy_coder/dataset/gen_synthetic_single.py @@ -0,0 +1,72 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Generate a single synthetic sample.""" + +import io +import os + +import numpy as np +import tensorflow as tf + +import synthetic_model + + +FLAGS = tf.app.flags.FLAGS + +tf.app.flags.DEFINE_string( + 'sample_filename', None, + """Output file to store the generated binary code.""") + + +def GenerateSample(filename, code_shape, layer_depth): + # {0, +1} binary codes. + # No conversion since the output file is expected to store + # {0, +1} codes (and not {-1, +1} codes). + code = synthetic_model.GenerateSingleCode(code_shape) + code = np.round(code) + + # Reformat the code so as to be compatible with what is generated + # by the image encoder. + # The image encoder generates a tensor of size: + # iteration_count x batch_size x height x width x iteration_depth. + # Here: batch_size = 1 + if code_shape[-1] % layer_depth != 0: + raise ValueError('Number of layers is not an integer') + height = code_shape[0] + width = code_shape[1] + code = code.reshape([1, height, width, -1, layer_depth]) + code = np.transpose(code, [3, 0, 1, 2, 4]) + + int_codes = code.astype(np.int8) + exported_codes = np.packbits(int_codes.reshape(-1)) + + output = io.BytesIO() + np.savez_compressed(output, shape=int_codes.shape, codes=exported_codes) + with tf.gfile.FastGFile(filename, 'wb') as code_file: + code_file.write(output.getvalue()) + + +def main(argv=None): # pylint: disable=unused-argument + # Note: the height and the width are different from the training dataset. 
+ # The main purpose is to show that the entropy coder model is fully + # convolutional and can be used on any image size. + layer_depth = 2 + GenerateSample(FLAGS.sample_filename, [31, 36, 8], layer_depth) + + +if __name__ == '__main__': + tf.app.run() + diff --git a/compression/entropy_coder/dataset/synthetic_model.py b/compression/entropy_coder/dataset/synthetic_model.py new file mode 100644 index 0000000000000000000000000000000000000000..4811208386dd9ba72df03a3b01afb90aa0ee58a5 --- /dev/null +++ b/compression/entropy_coder/dataset/synthetic_model.py @@ -0,0 +1,74 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Binary code sample generator.""" + +import numpy as np + + +_CRC_LINE = [ + [0, 1, 0], + [1, 1, 0], + [1, 0, 0] +] + +_CRC_DEPTH = [1, 1, 0, 1] + + +def ComputeLineCrc(code, width, y, x, d): + crc = 0 + for dy in xrange(len(_CRC_LINE)): + i = y - 1 - dy + if i < 0: + continue + for dx in xrange(len(_CRC_LINE[dy])): + j = x - 2 + dx + if j < 0 or j >= width: + continue + crc += 1 if (code[i, j, d] != _CRC_LINE[dy][dx]) else 0 + return crc + + +def ComputeDepthCrc(code, y, x, d): + crc = 0 + for delta in xrange(len(_CRC_DEPTH)): + k = d - 1 - delta + if k < 0: + continue + crc += 1 if (code[y, x, k] != _CRC_DEPTH[delta]) else 0 + return crc + + +def GenerateSingleCode(code_shape): + code = np.zeros(code_shape, dtype=np.int) + + keep_value_proba = 0.8 + + height = code_shape[0] + width = code_shape[1] + depth = code_shape[2] + + for d in xrange(depth): + for y in xrange(height): + for x in xrange(width): + v1 = ComputeLineCrc(code, width, y, x, d) + v2 = ComputeDepthCrc(code, y, x, d) + v = 1 if (v1 + v2 >= 6) else 0 + if np.random.rand() < keep_value_proba: + code[y, x, d] = v + else: + code[y, x, d] = 1 - v + + return code diff --git a/compression/entropy_coder/lib/__init__.py b/compression/entropy_coder/lib/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/compression/entropy_coder/lib/block_base.py b/compression/entropy_coder/lib/block_base.py new file mode 100644 index 0000000000000000000000000000000000000000..615dff82829dbbcab46c7217cd35f6259de01161 --- /dev/null +++ b/compression/entropy_coder/lib/block_base.py @@ -0,0 +1,258 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Base class for Tensorflow building blocks.""" + +import collections +import contextlib +import itertools + +import tensorflow as tf + +_block_stacks = collections.defaultdict(lambda: []) + + +class BlockBase(object): + """Base class for transform wrappers of Tensorflow. + + To implement a Tensorflow transform block, inherit this class. + + 1. To create a variable, use NewVar() method. Do not overload this method! + For example, use as follows. + a_variable = self.NewVar(initial_value) + + 2. All Tensorflow-related code must be done inside 'with self._BlockScope().' + Otherwise, name scoping and block hierarchy will not work. An exception + is _Apply() method, which is already called inside the context manager + by __call__() method. + + 3. Override and implement _Apply() method. This method is called by + __call__() method. + + The users would use blocks like the following. + nn1 = NN(128, bias=Bias(0), act=tf.nn.relu) + y = nn1(x) + + Some things to consider. + + - Use lazy-initialization if possible. That is, initialize at first Apply() + rather than at __init__(). + + Note: if needed, the variables can be created on a specific parameter + server by creating blocks in a scope like: + with g.device(device): + linear = Linear(...) + """ + + def __init__(self, name): + self._variables = [] + self._subblocks = [] + self._called = False + + # Intentionally distinguishing empty string and None. + # If name is an empty string, then do not use name scope. 
+ self.name = name if name is not None else self.__class__.__name__ + self._graph = tf.get_default_graph() + + if self.name: + # Capture the scope string at init time. + with self._graph.name_scope(self.name) as scope: + self._scope_str = scope + else: + self._scope_str = '' + + # Maintain the hierarchy of blocks. + self._stack = _block_stacks[self._graph] + if self.__class__ is BlockBase: + # This code is only executed to create the root, which starts in the + # initialized state. + assert not self._stack + self._parent = None + self._called = True # The root is initialized. + return + + # Create a fake root if a root is not already present. + if not self._stack: + self._stack.append(BlockBase('NoOpRoot')) + + self._parent = self._stack[-1] + self._parent._subblocks.append(self) # pylint: disable=protected-access + + def __repr__(self): + return '"{}" ({})'.format(self._scope_str, self.__class__.__name__) + + @contextlib.contextmanager + def _OptionalNameScope(self, scope_str): + if scope_str: + with self._graph.name_scope(scope_str): + yield + else: + yield + + @contextlib.contextmanager + def _BlockScope(self): + """Context manager that handles graph, namescope, and nested blocks.""" + self._stack.append(self) + + try: + with self._graph.as_default(): + with self._OptionalNameScope(self._scope_str): + yield self + finally: # Pop from the stack whether or not an exception is raised. + # The following line is executed when leaving 'with self._BlockScope()' + self._stack.pop() + + def __call__(self, *args, **kwargs): + assert self._stack is _block_stacks[self._graph] + + with self._BlockScope(): + ret = self._Apply(*args, **kwargs) + + self._called = True + return ret + + def _Apply(self, *args, **kwargs): + """Implementation of __call__().""" + raise NotImplementedError() + + # Redirect all variable creation to this single function, so that we can + # switch to a better variable creation scheme. 
+ def NewVar(self, value, **kwargs): + """Creates a new variable. + + This function creates a variable, then returns a local copy created by + Identity operation. To get the Variable class object, use LookupRef() + method. + + Note that each time a Variable class object is used as an input to an + operation, Tensorflow will create a new Send/Recv pair. This hurts + performance. + + If not for assign operations, use the local copy returned by this method. + + Args: + value: Initialization value of the variable. The shape and the data type + of the variable are determined by this initial value. + **kwargs: Extra named arguments passed to Variable.__init__(). + + Returns: + A local copy of the new variable. + """ + v = tf.Variable(value, **kwargs) + + self._variables.append(v) + return v + + @property + def initialized(self): + """Returns whether the block is initialized. + + By default, BlockBase assumes that a block is initialized when __call__() + is executed for the first time. If this is an incorrect assumption for some + subclasses, override this property in those subclasses. + + Returns: + True if initialized, False otherwise. + """ + return self._called + + def AssertInitialized(self): + """Asserts initialized property.""" + if not self.initialized: + raise RuntimeError('{} has not been initialized.'.format(self)) + + def VariableList(self): + """Returns the list of all tensorflow variables used inside this block.""" + variables = list(itertools.chain( + itertools.chain.from_iterable( + t.VariableList() for t in self._subblocks), + self._VariableList())) + return variables + + def _VariableList(self): + """Returns the list of all tensorflow variables owned by this block.""" + self.AssertInitialized() + return self._variables + + def CreateWeightLoss(self): + """Returns L2 loss list of (almost) all variables used inside this block. + + When this method needs to be overridden, there are two choices. + + 1. 
Override CreateWeightLoss() to change the weight loss of all variables + that belong to this block, both directly and indirectly. + 2. Override _CreateWeightLoss() to change the weight loss of all + variables that directly belong to this block but not to the sub-blocks. + + Returns: + A list of L2 loss tensors, one per variable (possibly empty). + """ + losses = list(itertools.chain( + itertools.chain.from_iterable( + t.CreateWeightLoss() for t in self._subblocks), + self._CreateWeightLoss())) + return losses + + def _CreateWeightLoss(self): + """Returns weight loss list of variables that belong to this block.""" + self.AssertInitialized() + with self._BlockScope(): + return [tf.nn.l2_loss(v) for v in self._variables] + + def CreateUpdateOps(self): + """Creates update operations for this block and its sub-blocks.""" + ops = list(itertools.chain( + itertools.chain.from_iterable( + t.CreateUpdateOps() for t in self._subblocks), + self._CreateUpdateOps())) + return ops + + def _CreateUpdateOps(self): + """Creates update operations for this block.""" + self.AssertInitialized() + return [] + + def MarkAsNonTrainable(self): + """Marks all the variables of this block as non-trainable. + + All the variables owned directly or indirectly (through subblocks) are + marked as non-trainable. + + This function along with CheckpointInitOp can be used to load a pretrained + model that makes up only one part of the whole graph. 
+ """ + assert self._called + + all_variables = self.VariableList() + collection = tf.get_collection_ref(tf.GraphKeys.TRAINABLE_VARIABLES) + for v in all_variables: + if v in collection: + collection.remove(v) + + +def CreateWeightLoss(): + """Returns all weight losses from the blocks in the graph.""" + stack = _block_stacks[tf.get_default_graph()] + if not stack: + return [] + return stack[0].CreateWeightLoss() + + +def CreateBlockUpdates(): + """Combines all updates from the blocks in the graph.""" + stack = _block_stacks[tf.get_default_graph()] + if not stack: + return [] + return stack[0].CreateUpdateOps() diff --git a/compression/entropy_coder/lib/block_util.py b/compression/entropy_coder/lib/block_util.py new file mode 100644 index 0000000000000000000000000000000000000000..957f8d603130d8dfa5c2523cce07a926cd8fe330 --- /dev/null +++ b/compression/entropy_coder/lib/block_util.py @@ -0,0 +1,100 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Utility functions for blocks.""" + +from __future__ import division +from __future__ import unicode_literals + +import math + +import numpy as np +import tensorflow as tf + + +class RsqrtInitializer(object): + """Gaussian initializer with standard deviation 1/sqrt(n). + + Note that tf.truncated_normal is used internally. 
Therefore any random sample + outside two-sigma will be discarded and re-sampled. + """ + + def __init__(self, dims=(0,), **kwargs): + """Creates an initializer. + + Args: + dims: Dimension(s) index to compute standard deviation: + 1.0 / sqrt(product(shape[dims])) + **kwargs: Extra keyword arguments to pass to tf.truncated_normal. + """ + if isinstance(dims, (int, long)): + self._dims = [dims] + else: + self._dims = dims + self._kwargs = kwargs + + def __call__(self, shape, dtype): + stddev = 1.0 / np.sqrt(np.prod([shape[x] for x in self._dims])) + return tf.truncated_normal( + shape=shape, dtype=dtype, stddev=stddev, **self._kwargs) + + +class RectifierInitializer(object): + """Gaussian initializer with standard deviation sqrt(2/fan_in). + + Note that tf.random_normal is used internally to ensure the expected weight + distribution. This is intended to be used with ReLU activations, especially + in ResNets. + + For details please refer to: + Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet + Classification + """ + + def __init__(self, dims=(0,), scale=2.0, **kwargs): + """Creates an initializer. + + Args: + dims: Dimension(s) index to compute standard deviation: + sqrt(scale / product(shape[dims])) + scale: A constant scaling for the initialization used as + sqrt(scale / product(shape[dims])). + **kwargs: Extra keyword arguments to pass to tf.random_normal. + """ + if isinstance(dims, (int, long)): + self._dims = [dims] + else: + self._dims = dims + self._kwargs = kwargs + self._scale = scale + + def __call__(self, shape, dtype): + stddev = np.sqrt(self._scale / np.prod([shape[x] for x in self._dims])) + return tf.random_normal( + shape=shape, dtype=dtype, stddev=stddev, **self._kwargs) + + +class GaussianInitializer(object): + """Gaussian initializer with a given standard deviation. + + Note that tf.truncated_normal is used internally. Therefore any random sample + outside two-sigma will be discarded and re-sampled. 
+ """ + + def __init__(self, stddev=1.0): + self._stddev = stddev + + def __call__(self, shape, dtype): + return tf.truncated_normal(shape=shape, dtype=dtype, stddev=self._stddev) diff --git a/compression/entropy_coder/lib/blocks.py b/compression/entropy_coder/lib/blocks.py new file mode 100644 index 0000000000000000000000000000000000000000..002384eb07045f1cad963d217a205ade51ba03b6 --- /dev/null +++ b/compression/entropy_coder/lib/blocks.py @@ -0,0 +1,24 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +from block_base import * +from block_util import * +from blocks_binarizer import * +from blocks_entropy_coding import * +from blocks_lstm import * +from blocks_masked_conv2d import * +from blocks_masked_conv2d_lstm import * +from blocks_operator import * +from blocks_std import * diff --git a/compression/entropy_coder/lib/blocks_binarizer.py b/compression/entropy_coder/lib/blocks_binarizer.py new file mode 100644 index 0000000000000000000000000000000000000000..8206731610613af2cf3ec15210fd5b9977f4a916 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_binarizer.py @@ -0,0 +1,35 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Activation and weight binarizer implementations.""" + +import math + +import numpy as np +import tensorflow as tf + + +def ConvertSignCodeToZeroOneCode(x): + """Conversion from codes {-1, +1} to codes {0, 1}.""" + return 0.5 * (x + 1.0) + + +def ConvertZeroOneCodeToSignCode(x): + """Convert from codes {0, 1} to codes {-1, +1}.""" + return 2.0 * x - 1.0 + + +def CheckZeroOneCode(x): + return tf.reduce_all(tf.equal(x * (x - 1.0), 0)) diff --git a/compression/entropy_coder/lib/blocks_entropy_coding.py b/compression/entropy_coder/lib/blocks_entropy_coding.py new file mode 100644 index 0000000000000000000000000000000000000000..6ee5d97926c1b50b12cb9853d16caa25ba31e8d7 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_entropy_coding.py @@ -0,0 +1,49 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Set of blocks related to entropy coding.""" + +import math + +import tensorflow as tf + +import block_base + +# pylint does not recognize block_base.BlockBase.__call__(). +# pylint: disable=not-callable + + +class CodeLength(block_base.BlockBase): + """Theoretical bound for a code length given a probability distribution. + """ + + def __init__(self, name=None): + super(CodeLength, self).__init__(name) + + def _Apply(self, c, p): + """Theoretical bound of the coded length given a probability distribution. + + Args: + c: The binary codes, with values in {0, 1}. + p: The probability that the code equals 1, i.e. P(code == 1). + + Returns: + The average code length, in bits. + Note: the average code length can be greater than 1 bit (e.g. when + encoding the least likely symbol). + """ + entropy = ((1.0 - c) * tf.log(1.0 - p) + c * tf.log(p)) / (-math.log(2)) + entropy = tf.reduce_mean(entropy) + return entropy diff --git a/compression/entropy_coder/lib/blocks_entropy_coding_test.py b/compression/entropy_coder/lib/blocks_entropy_coding_test.py new file mode 100644 index 0000000000000000000000000000000000000000..5209865f5991598ee873ed24a4be572e3f9fc515 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_entropy_coding_test.py @@ -0,0 +1,56 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for basic tensorflow blocks_entropy_coding.""" + +from __future__ import division +from __future__ import unicode_literals + +import math + +import numpy as np +import tensorflow as tf + +import blocks_entropy_coding + + +class BlocksEntropyCodingTest(tf.test.TestCase): + + def testCodeLength(self): + shape = [2, 4] + proba_feed = [[0.65, 0.25, 0.70, 0.10], + [0.28, 0.20, 0.44, 0.54]] + symbol_feed = [[1.0, 0.0, 1.0, 0.0], + [0.0, 0.0, 0.0, 1.0]] + mean_code_length = - ( + (math.log(0.65) + math.log(0.75) + math.log(0.70) + math.log(0.90) + + math.log(0.72) + math.log(0.80) + math.log(0.56) + math.log(0.54)) / + math.log(2.0)) / (shape[0] * shape[1]) + + symbol = tf.placeholder(dtype=tf.float32, shape=shape) + proba = tf.placeholder(dtype=tf.float32, shape=shape) + code_length_calculator = blocks_entropy_coding.CodeLength() + code_length = code_length_calculator(symbol, proba) + + with self.test_session(): + tf.global_variables_initializer().run() + code_length_eval = code_length.eval( + feed_dict={symbol: symbol_feed, proba: proba_feed}) + + self.assertAllClose(mean_code_length, code_length_eval) + + +if __name__ == '__main__': + tf.test.main() diff --git a/compression/entropy_coder/lib/blocks_lstm.py b/compression/entropy_coder/lib/blocks_lstm.py new file mode 100644 index 0000000000000000000000000000000000000000..6e474e3e3fcb6eeb3f18daf320e21a3acc88a2bf --- /dev/null +++ b/compression/entropy_coder/lib/blocks_lstm.py @@ -0,0 +1,263 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Blocks of LSTM and its variants.""" + +import numpy as np +import tensorflow as tf + +import block_base +import block_util +import blocks_std + +# pylint does not recognize block_base.BlockBase.__call__(). +# pylint: disable=not-callable + + +def LSTMBiasInit(shape, dtype): + """Returns ones for forget-gate, and zeros for the others.""" + shape = np.array(shape) + + # Check internal consistencies. + assert shape.shape == (1,), shape + assert shape[0] % 4 == 0, shape + + n = shape[0] // 4 + ones = tf.fill([n], tf.constant(1, dtype=dtype)) + zeros = tf.fill([3 * n], tf.constant(0, dtype=dtype)) + return tf.concat([ones, zeros], 0) + + +class LSTMBase(block_base.BlockBase): + """Base class for LSTM implementations. + + These LSTM implementations use the pattern found in [1]. No peephole + connection, i.e., cell content is not used in recurrence computation. + Hidden units are also output units. + + [1] Zaremba, Sutskever, Vinyals. Recurrent Neural Network Regularization, + 2015. arxiv:1409.2329. + """ + + def __init__(self, output_shape, name): + """Initializes LSTMBase class object. + + Args: + output_shape: List representing the LSTM output shape. This argument + does not include batch dimension. For example, if the LSTM output has + shape [batch, depth], then pass [depth]. + name: Name of this block. 
+ """ + super(LSTMBase, self).__init__(name) + + with self._BlockScope(): + self._output_shape = [None] + list(output_shape) + self._hidden = None + self._cell = None + + @property + def hidden(self): + """Returns the hidden units of this LSTM.""" + return self._hidden + + @hidden.setter + def hidden(self, value): + """Assigns to the hidden units of this LSTM. + + Args: + value: The new value for the hidden units. If None, the hidden units are + considered to be filled with zeros. + """ + if value is not None: + value.get_shape().assert_is_compatible_with(self._output_shape) + self._hidden = value + + @property + def cell(self): + """Returns the cell units of this LSTM.""" + return self._cell + + @cell.setter + def cell(self, value): + """Assigns to the cell units of this LSTM. + + Args: + value: The new value for the cell units. If None, the cell units are + considered to be filled with zeros. + """ + if value is not None: + value.get_shape().assert_is_compatible_with(self._output_shape) + self._cell = value + + # Consider moving bias terms to the base, and require this method to be + # linear. + def _TransformInputs(self, _): + """Transforms the input units to (4 * depth) units. + + The forget-gate, input-gate, output-gate, and cell update is computed as + f, i, j, o = T(h) + R(x) + where h is hidden units, x is input units, and T, R are transforms of + h, x, respectively. + + This method implements R. Note that T is strictly linear, so if LSTM is + going to use bias, this method must include the bias to the transformation. + + Subclasses must implement this method. See _Apply() for more details. + """ + raise NotImplementedError() + + def _TransformHidden(self, _): + """Transforms the hidden units to (4 * depth) units. + + The forget-gate, input-gate, output-gate, and cell update is computed as + f, i, j, o = T(h) + R(x) + where h is hidden units, x is input units, and T, R are transforms of + h, x, respectively. + + This method implements T in the equation. 
The method must implement a + strictly linear transformation. For example, it may use MatMul or Conv2D, + but must not add bias. This is because when hidden units are zeros, then + the LSTM implementation will skip calling this method, instead of passing + zeros to this function. + + Subclasses must implement this method. See _Apply() for more details. + """ + raise NotImplementedError() + + def _Apply(self, *args): + xtransform = self._TransformInputs(*args) + depth_axis = len(self._output_shape) - 1 + + if self.hidden is not None: + htransform = self._TransformHidden(self.hidden) + f, i, j, o = tf.split( + value=htransform + xtransform, num_or_size_splits=4, axis=depth_axis) + else: + f, i, j, o = tf.split( + value=xtransform, num_or_size_splits=4, axis=depth_axis) + + if self.cell is not None: + self.cell = tf.sigmoid(f) * self.cell + tf.sigmoid(i) * tf.tanh(j) + else: + self.cell = tf.sigmoid(i) * tf.tanh(j) + + self.hidden = tf.sigmoid(o) * tf.tanh(self.cell) + return self.hidden + + +class LSTM(LSTMBase): + """Efficient LSTM implementation used in [1]. + + [1] Zaremba, Sutskever, Vinyals. Recurrent Neural Network Regularization, + 2015. arxiv:1409.2329. + """ + + def __init__(self, + depth, + bias=LSTMBiasInit, + initializer=block_util.RsqrtInitializer(), + name=None): + super(LSTM, self).__init__([depth], name) + + with self._BlockScope(): + self._depth = depth + self._nn = blocks_std.NN( + 4 * depth, bias=bias, act=None, initializer=initializer) + self._hidden_linear = blocks_std.Linear( + 4 * depth, initializer=initializer) + + def _TransformInputs(self, *args): + return self._nn(*args) + + def _TransformHidden(self, h): + return self._hidden_linear(h) + + +class Conv2DLSTM(LSTMBase): + """Convolutional LSTM implementation with optimizations inspired by [1]. + + Note that when using the batch normalization feature, the bias initializer + will not be used, since BN effectively cancels its effect out. + + [1] Zaremba, Sutskever, Vinyals. 
Recurrent Neural Network Regularization, + 2015. arxiv:1409.2329. + """ + + def __init__(self, + depth, + filter_size, + hidden_filter_size, + strides, + padding, + bias=LSTMBiasInit, + initializer=block_util.RsqrtInitializer(dims=(0, 1, 2)), + use_moving_average=False, + name=None): + super(Conv2DLSTM, self).__init__([None, None, depth], name) + self._iter = 0 + + with self._BlockScope(): + self._input_conv = blocks_std.Conv2D( + 4 * depth, + filter_size, + strides, + padding, + bias=None, + act=None, + initializer=initializer, + name='input_conv2d') + + self._hidden_conv = blocks_std.Conv2D( + 4 * depth, + hidden_filter_size, + [1, 1], + 'SAME', + bias=None, + act=None, + initializer=initializer, + name='hidden_conv2d') + + if bias is not None: + self._bias = blocks_std.BiasAdd(bias, name='biases') + else: + self._bias = blocks_std.PassThrough() + + def _TransformInputs(self, x): + return self._bias(self._input_conv(x)) + + def _TransformHidden(self, h): + return self._hidden_conv(h) + + def _Apply(self, *args): + xtransform = self._TransformInputs(*args) + depth_axis = len(self._output_shape) - 1 + + if self.hidden is not None: + htransform = self._TransformHidden(self.hidden) + f, i, j, o = tf.split( + value=htransform + xtransform, num_or_size_splits=4, axis=depth_axis) + else: + f, i, j, o = tf.split( + value=xtransform, num_or_size_splits=4, axis=depth_axis) + + if self.cell is not None: + self.cell = tf.sigmoid(f) * self.cell + tf.sigmoid(i) * tf.tanh(j) + else: + self.cell = tf.sigmoid(i) * tf.tanh(j) + + self.hidden = tf.sigmoid(o) * tf.tanh(self.cell) + + self._iter += 1 + return self.hidden diff --git a/compression/entropy_coder/lib/blocks_lstm_test.py b/compression/entropy_coder/lib/blocks_lstm_test.py new file mode 100644 index 0000000000000000000000000000000000000000..03c32dc136effda11163f2e35c5a48496f0187c0 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_lstm_test.py @@ -0,0 +1,113 @@ +# Copyright 2017 The TensorFlow Authors All Rights 
Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for LSTM tensorflow blocks.""" +from __future__ import division + +import numpy as np +import tensorflow as tf + +import block_base +import blocks_std +import blocks_lstm + + +class BlocksLSTMTest(tf.test.TestCase): + + def CheckUnary(self, y, op_type): + self.assertEqual(op_type, y.op.type) + self.assertEqual(1, len(y.op.inputs)) + return y.op.inputs[0] + + def CheckBinary(self, y, op_type): + self.assertEqual(op_type, y.op.type) + self.assertEqual(2, len(y.op.inputs)) + return y.op.inputs + + def testLSTM(self): + lstm = blocks_lstm.LSTM(10) + lstm.hidden = tf.zeros(shape=[10, 10], dtype=tf.float32) + lstm.cell = tf.zeros(shape=[10, 10], dtype=tf.float32) + x = tf.placeholder(dtype=tf.float32, shape=[10, 11]) + y = lstm(x) + + o, tanhc = self.CheckBinary(y, 'Mul') + self.assertEqual(self.CheckUnary(o, 'Sigmoid').name, 'LSTM/split:3') + + self.assertIs(lstm.cell, self.CheckUnary(tanhc, 'Tanh')) + fc, ij = self.CheckBinary(lstm.cell, 'Add') + + f, _ = self.CheckBinary(fc, 'Mul') + self.assertEqual(self.CheckUnary(f, 'Sigmoid').name, 'LSTM/split:0') + + i, j = self.CheckBinary(ij, 'Mul') + self.assertEqual(self.CheckUnary(i, 'Sigmoid').name, 'LSTM/split:1') + j = self.CheckUnary(j, 'Tanh') + self.assertEqual(j.name, 'LSTM/split:2') + + def testLSTMBiasInit(self): + lstm = blocks_lstm.LSTM(9) + x = 
tf.placeholder(dtype=tf.float32, shape=[15, 7]) + lstm(x) + b = lstm._nn._bias + + with self.test_session(): + tf.global_variables_initializer().run() + bias_var = b._bias.eval() + + comp = ([1.0] * 9) + ([0.0] * 27) + self.assertAllEqual(bias_var, comp) + + def testConv2DLSTM(self): + lstm = blocks_lstm.Conv2DLSTM(depth=10, + filter_size=[1, 1], + hidden_filter_size=[1, 1], + strides=[1, 1], + padding='SAME') + lstm.hidden = tf.zeros(shape=[10, 11, 11, 10], dtype=tf.float32) + lstm.cell = tf.zeros(shape=[10, 11, 11, 10], dtype=tf.float32) + x = tf.placeholder(dtype=tf.float32, shape=[10, 11, 11, 1]) + y = lstm(x) + + o, tanhc = self.CheckBinary(y, 'Mul') + self.assertEqual(self.CheckUnary(o, 'Sigmoid').name, 'Conv2DLSTM/split:3') + + self.assertIs(lstm.cell, self.CheckUnary(tanhc, 'Tanh')) + fc, ij = self.CheckBinary(lstm.cell, 'Add') + + f, _ = self.CheckBinary(fc, 'Mul') + self.assertEqual(self.CheckUnary(f, 'Sigmoid').name, 'Conv2DLSTM/split:0') + + i, j = self.CheckBinary(ij, 'Mul') + self.assertEqual(self.CheckUnary(i, 'Sigmoid').name, 'Conv2DLSTM/split:1') + j = self.CheckUnary(j, 'Tanh') + self.assertEqual(j.name, 'Conv2DLSTM/split:2') + + def testConv2DLSTMBiasInit(self): + lstm = blocks_lstm.Conv2DLSTM(9, 1, 1, [1, 1], 'SAME') + x = tf.placeholder(dtype=tf.float32, shape=[1, 7, 7, 7]) + lstm(x) + b = lstm._bias + + with self.test_session(): + tf.global_variables_initializer().run() + bias_var = b._bias.eval() + + comp = ([1.0] * 9) + ([0.0] * 27) + self.assertAllEqual(bias_var, comp) + + +if __name__ == '__main__': + tf.test.main() diff --git a/compression/entropy_coder/lib/blocks_masked_conv2d.py b/compression/entropy_coder/lib/blocks_masked_conv2d.py new file mode 100644 index 0000000000000000000000000000000000000000..395af334953676215849683b9b275c64ae967b38 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_masked_conv2d.py @@ -0,0 +1,225 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Define some typical masked 2D convolutions.""" + +import numpy as np +import tensorflow as tf + +import block_util +import blocks_std + +# pylint does not recognize block_base.BlockBase.__call__(). +# pylint: disable=not-callable + + +class RasterScanConv2D(blocks_std.Conv2DBase): + """Conv2D with no dependency on future pixels (in raster scan order). + + For example, assuming a 5 x 5 kernel, the kernel is applied a spatial mask: + T T T T T + T T T T T + T T x F F + F F F F F + F F F F F + where 'T' are pixels which are available when computing the convolution + for pixel 'x'. All the pixels marked with 'F' are not available. + 'x' itself is not available if strict_order is True, otherwise, it is + available. 
+ """ + + def __init__(self, depth, filter_size, strides, padding, + strict_order=True, + bias=None, act=None, initializer=None, name=None): + super(RasterScanConv2D, self).__init__( + depth, filter_size, strides, padding, bias, act, name=name) + + if (filter_size[0] % 2) != 1 or (filter_size[1] % 2) != 1: + raise ValueError('Kernel size should be odd.') + + with self._BlockScope(): + if initializer is None: + initializer = block_util.RsqrtInitializer(dims=(0, 1, 2)) + self._initializer = initializer + self._strict_order = strict_order + + def _CreateKernel(self, shape, dtype): + init = self._initializer(shape, dtype) + kernel = self.NewVar(init) + + mask = np.ones(shape[:2], dtype=dtype.as_numpy_dtype) + center = shape[:2] // 2 + mask[center[0] + 1:, :] = 0 + if not self._strict_order: + mask[center[0], center[1] + 1:] = 0 + else: + mask[center[0], center[1]:] = 0 + mask = mask.reshape(mask.shape + (1, 1)) + + return tf.convert_to_tensor(mask, dtype) * kernel + + +class DepthOrderConv2D(blocks_std.Conv2DBase): + """Conv2D with no dependency on higher depth dimensions. + + More precisely, the output depth #n has only dependencies on input depths #k + for k < n (if strict_order is True) or for k <= n (if strict_order is False). 
+ """ + + def __init__(self, depth, filter_size, strides, padding, + strict_order=True, + bias=None, act=None, initializer=None, name=None): + super(DepthOrderConv2D, self).__init__( + depth, filter_size, strides, padding, bias, act, name=name) + + with self._BlockScope(): + if initializer is None: + initializer = block_util.RsqrtInitializer(dims=(0, 1, 2)) + self._initializer = initializer + self._strict_order = strict_order + + def _CreateKernel(self, shape, dtype): + init = self._initializer(shape, dtype) + kernel = self.NewVar(init) + + mask = np.ones(shape[2:], dtype=dtype.as_numpy_dtype) + depth_output = shape[3] + for d in xrange(depth_output): + if self._strict_order: + mask[d:, d] = 0 + else: + mask[d + 1:, d] = 0 + mask = mask.reshape((1, 1) + mask.shape) + + return tf.convert_to_tensor(mask, dtype) * kernel + + +class GroupRasterScanConv2D(blocks_std.Conv2DBase): + """Conv2D with no dependency on future pixels (in raster scan order). + + This version only introduces dependencies on previous pixels in raster scan + order. It can also introduce some dependencies on previous depth positions + of the current pixel (current pixel = center pixel of the kernel) in the + following way: + the depth dimension of the input is split into Ki groups of size + |input_group_size|, the output dimension is split into Ko groups of size + |output_group_size| (usually Ki == Ko). Each output group ko of the current + pixel position can only depend on previous input groups ki + (i.e. ki < ko if strict_order is True or ki <= ko if strict_order is False). + + Notes: + - Block RasterScanConv2D is a special case of GroupRasterScanConv2D + where Ki == Ko == 1 (i.e. input_group_size == input_depth and + output_group_size == output_depth). + - For 1x1 convolution, block DepthOrderConv2D is a special case of + GroupRasterScanConv2D where input_group_size == 1 and + output_group_size == 1. 
+ """ + + def __init__(self, depth, filter_size, strides, padding, + strict_order=True, + input_group_size=1, + output_group_size=1, + bias=None, act=None, initializer=None, name=None): + super(GroupRasterScanConv2D, self).__init__( + depth, filter_size, strides, padding, bias, act, name=name) + + if (filter_size[0] % 2) != 1 or (filter_size[1] % 2) != 1: + raise ValueError('Kernel size should be odd.') + + with self._BlockScope(): + if initializer is None: + initializer = block_util.RsqrtInitializer(dims=(0, 1, 2)) + self._initializer = initializer + self._input_group_size = input_group_size + self._output_group_size = output_group_size + self._strict_order = strict_order + + if depth % self._output_group_size != 0: + raise ValueError( + 'Invalid depth group size: {} for depth {}'.format( + self._output_group_size, depth)) + self._output_group_count = depth // self._output_group_size + + def _CreateKernel(self, shape, dtype): + init = self._initializer(shape, dtype) + kernel = self.NewVar(init) + + depth_input = shape[2] + if depth_input % self._input_group_size != 0: + raise ValueError( + 'Invalid depth group size: {} for depth {}'.format( + self._input_group_size, depth_input)) + input_group_count = depth_input // self._input_group_size + output_group_count = self._output_group_count + + # Set the mask to 0 for future pixels in raster scan order. + center = shape[:2] // 2 + mask = np.ones([shape[0], shape[1], + input_group_count, self._input_group_size, + output_group_count, self._output_group_size], + dtype=dtype.as_numpy_dtype) + mask[center[0] + 1:, :, :, :, :, :] = 0 + mask[center[0], center[1] + 1:, :, :, :, :] = 0 + + # Adjust the mask for the current position (the center position). 
+ depth_output = shape[3] + for d in xrange(output_group_count): + mask[center[0], center[1], d + 1:, :, d:d + 1, :] = 0 + if self._strict_order: + mask[center[0], center[1], d, :, d:d + 1, :] = 0 + + mask = mask.reshape([shape[0], shape[1], depth_input, depth_output]) + return tf.convert_to_tensor(mask, dtype) * kernel + + +class InFillingConv2D(blocks_std.Conv2DBase): + """Conv2D with kernel having no dependency on the current pixel. + + For example, assuming a 5 x 5 kernel, the kernel is applied a spatial mask: + T T T T T + T T T T T + T T x T T + T T T T T + T T T T T + where 'T' marks a pixel which is available when computing the convolution + for pixel 'x'. 'x' itself is not available. + """ + + def __init__(self, depth, filter_size, strides, padding, + bias=None, act=None, initializer=None, name=None): + super(InFillingConv2D, self).__init__( + depth, filter_size, strides, padding, bias, act, name=name) + + if (filter_size[0] % 2) != 1 or (filter_size[1] % 2) != 1: + raise ValueError('Kernel size should be odd.') + if filter_size[0] == 1 and filter_size[1] == 1: + raise ValueError('Kernel size should be larger than 1x1.') + + with self._BlockScope(): + if initializer is None: + initializer = block_util.RsqrtInitializer(dims=(0, 1, 2)) + self._initializer = initializer + + def _CreateKernel(self, shape, dtype): + init = self._initializer(shape, dtype) + kernel = self.NewVar(init) + + mask = np.ones(shape[:2], dtype=dtype.as_numpy_dtype) + center = shape[:2] // 2 + mask[center[0], center[1]] = 0 + mask = mask.reshape(mask.shape + (1, 1)) + + return tf.convert_to_tensor(mask, dtype) * kernel diff --git a/compression/entropy_coder/lib/blocks_masked_conv2d_lstm.py b/compression/entropy_coder/lib/blocks_masked_conv2d_lstm.py new file mode 100644 index 0000000000000000000000000000000000000000..2d6dfeffcaff1289adf3bdec33cb0560db6b0416 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_masked_conv2d_lstm.py @@ -0,0 +1,79 @@ +# Copyright 2017 The TensorFlow 
Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Masked conv2d LSTM.""" + +import block_base +import block_util +import blocks_masked_conv2d +import blocks_lstm +import blocks_std + +# pylint: disable=not-callable + + +class RasterScanConv2DLSTM(blocks_lstm.LSTMBase): + """Convolutional LSTM implementation with optimizations inspired by [1]. + + Note that when using the batch normalization feature, the bias initializer + will not be used, since BN effectively cancels its effect out. + + [1] Zaremba, Sutskever, Vinyals. Recurrent Neural Network Regularization, + 2015. arxiv:1409.2329. 
+ """ + + def __init__(self, + depth, + filter_size, + hidden_filter_size, + strides, + padding, + bias=blocks_lstm.LSTMBiasInit, + initializer=block_util.RsqrtInitializer(dims=(0, 1, 2)), + name=None): + super(RasterScanConv2DLSTM, self).__init__([None, None, depth], name) + + with self._BlockScope(): + self._input_conv = blocks_masked_conv2d.RasterScanConv2D( + 4 * depth, + filter_size, + strides, + padding, + strict_order=False, + bias=None, + act=None, + initializer=initializer, + name='input_conv2d') + + self._hidden_conv = blocks_std.Conv2D( + 4 * depth, + hidden_filter_size, + [1, 1], + 'SAME', + bias=None, + act=None, + initializer=initializer, + name='hidden_conv2d') + + if bias is not None: + self._bias = blocks_std.BiasAdd(bias, name='biases') + else: + self._bias = blocks_std.PassThrough() + + def _TransformInputs(self, x): + return self._bias(self._input_conv(x)) + + def _TransformHidden(self, h): + return self._hidden_conv(h) diff --git a/compression/entropy_coder/lib/blocks_masked_conv2d_test.py b/compression/entropy_coder/lib/blocks_masked_conv2d_test.py new file mode 100644 index 0000000000000000000000000000000000000000..adb546778e526bfb99fda3bb3e6a4432d0082161 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_masked_conv2d_test.py @@ -0,0 +1,206 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests of the 2D masked convolution blocks.""" + +from __future__ import division +from __future__ import unicode_literals + +import numpy as np +import tensorflow as tf + +import blocks_masked_conv2d + + +class MaskedConv2DTest(tf.test.TestCase): + + def testRasterScanKernel(self): + kernel_size = 5 + input_depth = 1 + output_depth = 1 + kernel_shape = [kernel_size, kernel_size, input_depth, output_depth] + + # pylint: disable=bad-whitespace + kernel_feed = [[ 1.0, 2.0, 3.0, 4.0, 5.0], + [ 6.0, 7.0, 8.0, 9.0, 10.0], + [11.0, 12.0, 13.0, 14.0, 15.0], + [16.0, 17.0, 18.0, 19.0, 20.0], + [21.0, 22.0, 23.0, 24.0, 25.0]] + kernel_feed = np.reshape(kernel_feed, kernel_shape) + kernel_expected = [[ 1.0, 2.0, 3.0, 4.0, 5.0], + [ 6.0, 7.0, 8.0, 9.0, 10.0], + [11.0, 12.0, 0.0, 0.0, 0.0], + [ 0.0, 0.0, 0.0, 0.0, 0.0], + [ 0.0, 0.0, 0.0, 0.0, 0.0]] + kernel_expected = np.reshape(kernel_expected, kernel_shape) + # pylint: enable=bad-whitespace + + init_kernel = lambda s, t: tf.constant(kernel_feed, dtype=t, shape=s) + masked_conv2d = blocks_masked_conv2d.RasterScanConv2D( + output_depth, [kernel_size] * 2, [1] * 2, 'SAME', + initializer=init_kernel) + x = tf.placeholder(dtype=tf.float32, shape=[10] * 3 + [input_depth]) + _ = masked_conv2d(x) + + with self.test_session(): + tf.global_variables_initializer().run() + kernel_value = masked_conv2d._kernel.eval() + + self.assertAllEqual(kernel_expected, kernel_value) + + def testDepthOrderKernel(self): + kernel_size = 1 + input_depth = 7 + output_depth = input_depth + kernel_shape = [kernel_size, kernel_size, input_depth, output_depth] + + kernel_feed = np.ones(kernel_shape) + x_shape = [5] * 3 + [input_depth] + x_feed = np.ones(x_shape) + y_expected = np.zeros(x_shape[0:3] + [output_depth]) + y_expected[:, :, :] = np.arange(output_depth) + + init_kernel = lambda s, t: tf.constant(kernel_feed, dtype=t, shape=s) + masked_conv2d = 
blocks_masked_conv2d.DepthOrderConv2D( + output_depth, [kernel_size] * 2, [1] * 2, 'SAME', + strict_order=True, + initializer=init_kernel) + x = tf.placeholder(dtype=tf.float32, shape=x_shape) + y = masked_conv2d(x) + + with self.test_session(): + tf.global_variables_initializer().run() + y_value = y.eval(feed_dict={x: x_feed}) + + self.assertAllEqual(y_expected, y_value) + + def testGroupRasterScanKernel(self): + kernel_size = 3 + input_depth = 4 + input_group_size = 2 + output_depth = 2 + output_group_size = 1 + kernel_shape = [kernel_size, kernel_size, input_depth, output_depth] + kernel_feed = np.ones(shape=kernel_shape) + + height = 5 + width = 5 + x_shape = [1, height, width, input_depth] + x_feed = np.ones(shape=x_shape) + + # pylint: disable=bad-whitespace + y_expected = [ + [[ 0, 2], [ 4, 6], [ 4, 6], [ 4, 6], [ 4, 6]], + [[ 8, 10], [16, 18], [16, 18], [16, 18], [12, 14]], + [[ 8, 10], [16, 18], [16, 18], [16, 18], [12, 14]], + [[ 8, 10], [16, 18], [16, 18], [16, 18], [12, 14]], + [[ 8, 10], [16, 18], [16, 18], [16, 18], [12, 14]], + ] + y_expected = np.reshape(y_expected, [1, height, width, output_depth]) + # pylint: enable=bad-whitespace + + init_kernel = lambda s, t: tf.constant(kernel_feed, dtype=t, shape=s) + masked_conv2d = blocks_masked_conv2d.GroupRasterScanConv2D( + output_depth, [kernel_size] * 2, [1] * 2, 'SAME', + strict_order=True, + input_group_size=input_group_size, + output_group_size=output_group_size, + initializer=init_kernel) + x = tf.placeholder(dtype=tf.float32, shape=x_shape) + y = masked_conv2d(x) + + with self.test_session(): + tf.global_variables_initializer().run() + y_value = y.eval(feed_dict={x: x_feed}) + + self.assertAllEqual(y_expected, y_value) + + def testInFillingKernel(self): + kernel_size = 5 + input_depth = 1 + output_depth = 1 + kernel_shape = [kernel_size, kernel_size, input_depth, output_depth] + + # pylint: disable=bad-whitespace + kernel_feed = [[ 1.0, 2.0, 3.0, 4.0, 5.0], + [ 6.0, 7.0, 8.0, 9.0, 10.0], + [11.0, 
12.0, 13.0, 14.0, 15.0], + [16.0, 17.0, 18.0, 19.0, 20.0], + [21.0, 22.0, 23.0, 24.0, 25.0]] + kernel_feed = np.reshape(kernel_feed, kernel_shape) + kernel_expected = [[ 1.0, 2.0, 3.0, 4.0, 5.0], + [ 6.0, 7.0, 8.0, 9.0, 10.0], + [11.0, 12.0, 0.0, 14.0, 15.0], + [16.0, 17.0, 18.0, 19.0, 20.0], + [21.0, 22.0, 23.0, 24.0, 25.0]] + kernel_expected = np.reshape(kernel_expected, kernel_shape) + # pylint: enable=bad-whitespace + + init_kernel = lambda s, t: tf.constant(kernel_feed, dtype=t, shape=s) + masked_conv2d = blocks_masked_conv2d.InFillingConv2D( + output_depth, [kernel_size] * 2, [1] * 2, 'SAME', + initializer=init_kernel) + x = tf.placeholder(dtype=tf.float32, shape=[10] * 3 + [input_depth]) + _ = masked_conv2d(x) + + with self.test_session(): + tf.global_variables_initializer().run() + kernel_value = masked_conv2d._kernel.eval() + + self.assertAllEqual(kernel_expected, kernel_value) + + def testConv2DMaskedNumerics(self): + kernel_size = 5 + input_shape = [1, 10, 10, 1] + filter_shape = [kernel_size, kernel_size, 1, 1] + strides = [1, 1, 1, 1] + output_shape = [1, 10, 10, 1] + + conv = blocks_masked_conv2d.RasterScanConv2D( + depth=filter_shape[-1], + filter_size=filter_shape[0:2], + strides=strides[1:3], + padding='SAME', + initializer=tf.constant_initializer(value=1.0)) + x = tf.placeholder(dtype=tf.float32, shape=input_shape) + y = conv(x) + + x_feed = - np.ones(input_shape, dtype=float) + y_expected = np.ones(output_shape, dtype=float) + for i in xrange(input_shape[1]): + for j in xrange(input_shape[2]): + x_feed[0, i, j, 0] = 10 * (j + 1) + i + v = 0 + ki_start = max(i - kernel_size // 2, 0) + kj_start = max(j - kernel_size // 2, 0) + kj_end = min(j + kernel_size // 2, input_shape[2] - 1) + for ki in range(ki_start, i + 1): + for kj in range(kj_start, kj_end + 1): + if ki > i: + continue + if ki == i and kj >= j: + continue + v += 10 * (kj + 1) + ki + y_expected[0, i, j, 0] = v + + with self.test_session(): + tf.global_variables_initializer().run() + 
y_value = y.eval(feed_dict={x: x_feed}) + + self.assertAllEqual(y_expected, y_value) + + +if __name__ == '__main__': + tf.test.main() diff --git a/compression/entropy_coder/lib/blocks_operator.py b/compression/entropy_coder/lib/blocks_operator.py new file mode 100644 index 0000000000000000000000000000000000000000..e35e37b27aa416ed48f91eda866d372601741cba --- /dev/null +++ b/compression/entropy_coder/lib/blocks_operator.py @@ -0,0 +1,87 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Common blocks which work as operators on other blocks.""" + +import tensorflow as tf + +import block_base + +# pylint: disable=not-callable + + +class CompositionOperator(block_base.BlockBase): + """Composition of several blocks.""" + + def __init__(self, block_list, name=None): + """Initialization of the composition operator. + + Args: + block_list: List of blocks.BlockBase that are chained to create + a new blocks.BlockBase. + name: Name of this block. 
+ """ + super(CompositionOperator, self).__init__(name) + self._blocks = block_list + + def _Apply(self, x): + """Apply successively all the blocks on the given input tensor.""" + h = x + for layer in self._blocks: + h = layer(h) + return h + + +class LineOperator(block_base.BlockBase): + """Repeat the same block over all the lines of an input tensor.""" + + def __init__(self, block, name=None): + super(LineOperator, self).__init__(name) + self._block = block + + def _Apply(self, x): + height = x.get_shape()[1].value + if height is None: + raise ValueError('Unknown tensor height') + all_line_x = tf.split(value=x, num_or_size_splits=height, axis=1) + + y = [] + for line_x in all_line_x: + y.append(self._block(line_x)) + y = tf.concat(values=y, axis=1) + + return y + + +class TowerOperator(block_base.BlockBase): + """Parallel execution with concatenation of several blocks.""" + + def __init__(self, block_list, dim=3, name=None): + """Initialization of the parallel exec + concat (Tower). + + Args: + block_list: List of blocks.BlockBase that are applied in parallel to the + same input; their outputs are concatenated. + dim: the dimension on which to concat. + name: Name of this block. + """ + super(TowerOperator, self).__init__(name) + self._blocks = block_list + self._concat_dim = dim + + def _Apply(self, x): + """Apply every block to the same input and concatenate the outputs.""" + outputs = [layer(x) for layer in self._blocks] + return tf.concat(outputs, self._concat_dim) diff --git a/compression/entropy_coder/lib/blocks_operator_test.py b/compression/entropy_coder/lib/blocks_operator_test.py new file mode 100644 index 0000000000000000000000000000000000000000..8b6d80da1d09102585e4725dd5c59f48d48eafcd --- /dev/null +++ b/compression/entropy_coder/lib/blocks_operator_test.py @@ -0,0 +1,64 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests of the block operators.""" + +import numpy as np +import tensorflow as tf + +import block_base +import blocks_operator + + +class AddOneBlock(block_base.BlockBase): + + def __init__(self, name=None): + super(AddOneBlock, self).__init__(name) + + def _Apply(self, x): + return x + 1.0 + + +class SquareBlock(block_base.BlockBase): + + def __init__(self, name=None): + super(SquareBlock, self).__init__(name) + + def _Apply(self, x): + return x * x + + +class BlocksOperatorTest(tf.test.TestCase): + + def testComposition(self): + x_value = np.array([[1.0, 2.0, 3.0], + [-1.0, -2.0, -3.0]]) + y_expected_value = np.array([[4.0, 9.0, 16.0], + [0.0, 1.0, 4.0]]) + + x = tf.placeholder(dtype=tf.float32, shape=[2, 3]) + complex_block = blocks_operator.CompositionOperator( + [AddOneBlock(), + SquareBlock()]) + y = complex_block(x) + + with self.test_session(): + y_value = y.eval(feed_dict={x: x_value}) + + self.assertAllClose(y_expected_value, y_value) + + +if __name__ == '__main__': + tf.test.main() diff --git a/compression/entropy_coder/lib/blocks_std.py b/compression/entropy_coder/lib/blocks_std.py new file mode 100644 index 0000000000000000000000000000000000000000..2c617485342452f500d4b1b0b18e33b07d51e487 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_std.py @@ -0,0 +1,363 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Basic blocks for building tensorflow models.""" + +import numpy as np +import tensorflow as tf + +import block_base +import block_util + +# pylint does not recognize block_base.BlockBase.__call__(). +# pylint: disable=not-callable + + +def HandleConvPaddingModes(x, padding, kernel_shape, strides): + """Returns an updated tensor and padding type for REFLECT and SYMMETRIC. + + Args: + x: A 4D tensor with shape [batch_size, height, width, depth]. + padding: Padding mode (SAME, VALID, REFLECT, or SYMMETRIC). + kernel_shape: Shape of convolution kernel that will be applied. + strides: Convolution stride that will be used. + + Returns: + x and padding after adjustments for REFLECT and SYMMETRIC. + """ + # For 1x1 convolution, all padding modes are the same. + if np.all(kernel_shape[:2] == 1): + return x, 'VALID' + + if padding == 'REFLECT' or padding == 'SYMMETRIC': + # We manually compute the amount of padding as if the mode were 'SAME'. + # From the TensorFlow kernel, the formulas are as follows. + # output_shape = ceil(input_shape / strides) + # paddings = (output_shape - 1) * strides + filter_size - input_shape + # Let x, y, s be shorthand notation for input_shape, output_shape, and + # strides, respectively. Let (x - 1) = sn + r where 0 <= r < s. Note that + # y - 1 = ceil(x / s) - 1 = floor((x - 1) / s) = n + # provided that x > 0.
Therefore + # paddings = n * s + filter_size - (sn + r + 1) + # = filter_size - r - 1. + input_shape = x.get_shape() # shape at graph construction time + img_shape = tf.shape(x)[1:3] # image shape (no batch) at run time + remainder = tf.mod(img_shape - 1, strides[1:3]) + pad_sizes = kernel_shape[:2] - remainder - 1 + + pad_rows = pad_sizes[0] + pad_cols = pad_sizes[1] + pad = tf.stack([[0, 0], tf.stack([pad_rows // 2, (pad_rows + 1) // 2]), + tf.stack([pad_cols // 2, (pad_cols + 1) // 2]), [0, 0]]) + + # Manually pad the input and switch the padding mode to 'VALID'. + x = tf.pad(x, pad, mode=padding) + x.set_shape([input_shape[0], x.get_shape()[1], + x.get_shape()[2], input_shape[3]]) + padding = 'VALID' + + return x, padding + + +class PassThrough(block_base.BlockBase): + """A dummy transform block that does nothing.""" + + def __init__(self): + # Pass an empty string to disable name scoping. + super(PassThrough, self).__init__(name='') + + def _Apply(self, inp): + return inp + + @property + def initialized(self): + """Always returns True.""" + return True + + +class Bias(object): + """An initialization helper class for BiasAdd block below.""" + + def __init__(self, value=0): + self.value = value + + +class BiasAdd(block_base.BlockBase): + """A tf.nn.bias_add wrapper. + + This wrapper may act as a PassThrough block depending on the initializer + provided, which simplifies optional bias application in NN blocks, etc. + See __init__() for the details. + """ + + def __init__(self, initializer=Bias(0), name=None): + """Initializes BiasAdd block. + + The |initializer| parameter has two special cases. + + 1. If initializer is None, then this block works as a PassThrough. + 2. If initializer is a Bias class object, then tf.constant_initializer is + used with the stored value. + + Args: + initializer: An initializer for the bias variable. + name: Name of this block.
+ """ + super(BiasAdd, self).__init__(name) + + with self._BlockScope(): + if isinstance(initializer, Bias): + self._initializer = tf.constant_initializer(value=initializer.value) + else: + self._initializer = initializer + + self._bias = None + + def _Apply(self, x): + if not self._bias: + init = self._initializer([int(x.get_shape()[-1])], x.dtype) + self._bias = self.NewVar(init) + + return tf.nn.bias_add(x, self._bias) + + def CreateWeightLoss(self): + return [] + + +class LinearBase(block_base.BlockBase): + """A matmul wrapper. + + Returns input * W, where matrix W can be customized through derivation. + """ + + def __init__(self, depth, name=None): + super(LinearBase, self).__init__(name) + + with self._BlockScope(): + self._depth = depth + self._matrix = None + + def _CreateKernel(self, shape, dtype): + raise NotImplementedError('This method must be sub-classed.') + + def _Apply(self, x): + if not self._matrix: + shape = [int(x.get_shape()[-1]), self._depth] + self._matrix = self._CreateKernel(shape, x.dtype) + + return tf.matmul(x, self._matrix) + + +class Linear(LinearBase): + """A matmul wrapper. + + Returns input * W, where matrix W is learned. + """ + + def __init__(self, + depth, + initializer=block_util.RsqrtInitializer(), + name=None): + super(Linear, self).__init__(depth, name) + + with self._BlockScope(): + self._initializer = initializer + + def _CreateKernel(self, shape, dtype): + init = self._initializer(shape, dtype) + return self.NewVar(init) + + +class NN(block_base.BlockBase): + """A neural network layer wrapper. + + Returns act(input * W + b), where matrix W, bias b are learned, and act is an + optional activation function (i.e., nonlinearity). + + This transform block can handle multiple inputs. If x_1, x_2, ..., x_m are + the inputs, then returns act(x_1 * W_1 + ... + x_m * W_m + b). + + Attributes: + nunits: The dimension of the output. 
+ """ + + def __init__(self, + depth, + bias=Bias(0), + act=None, # e.g., tf.nn.relu + initializer=block_util.RsqrtInitializer(), + linear_block_factory=(lambda d, i: Linear(d, initializer=i)), + name=None): + """Initializes NN block. + + Args: + depth: The depth of the output. + bias: An initializer for the bias, or a Bias class object. If None, there + will be no bias term for this NN block. See BiasAdd block. + act: Optional activation function. If None, no activation is applied. + initializer: The initialization method for the matrix weights. + linear_block_factory: A function used to create a linear block. + name: The name of this block. + """ + super(NN, self).__init__(name) + + with self._BlockScope(): + self._linear_block_factory = linear_block_factory + self._depth = depth + self._initializer = initializer + self._matrices = None + + self._bias = BiasAdd(bias) if bias else PassThrough() + self._act = act if act else PassThrough() + + def _Apply(self, *args): + if not self._matrices: + self._matrices = [ + self._linear_block_factory(self._depth, self._initializer) + for _ in args] + + if len(self._matrices) != len(args): + raise ValueError('{} expected {} inputs, but observed {} inputs'.format( + self.name, len(self._matrices), len(args))) + + if len(args) > 1: + y = tf.add_n([m(x) for m, x in zip(self._matrices, args)]) + else: + y = self._matrices[0](args[0]) + + return self._act(self._bias(y)) + + +class Conv2DBase(block_base.BlockBase): + """A tf.nn.conv2d operator.""" + + def __init__(self, depth, filter_size, strides, padding, + bias=None, act=None, atrous_rate=None, conv=tf.nn.conv2d, + name=None): + """Initializes a Conv2DBase block. + + Arguments: + depth: The output depth of the block (i.e. #filters); if negative, the + output depth will be set to be the same as the input depth. + filter_size: The size of the 2D filter. If it's specified as an integer, + it's going to create a square filter. 
Otherwise, this is a tuple + specifying the height x width of the filter. + strides: A tuple specifying the y and x stride. + padding: One of the valid padding modes allowed by tf.nn.conv2d, or + 'REFLECT'/'SYMMETRIC' for mirror padding. + bias: An initializer for the bias, or a Bias class object. If None, there + will be no bias in this block. See BiasAdd block. + act: Optional activation function applied to the output. + atrous_rate: optional input rate for ATrous convolution. If not None, this + will be used and the strides will be ignored. + conv: The convolution function to use (e.g. tf.nn.conv2d). + name: The name for this conv2d op. + """ + super(Conv2DBase, self).__init__(name) + + with self._BlockScope(): + self._act = act if act else PassThrough() + self._bias = BiasAdd(bias) if bias else PassThrough() + + self._kernel_shape = np.zeros((4,), dtype=np.int32) + self._kernel_shape[:2] = filter_size + self._kernel_shape[3] = depth + + self._strides = np.ones((4,), dtype=np.int32) + self._strides[1:3] = strides + self._strides = list(self._strides) + + self._padding = padding + + self._kernel = None + self._conv = conv + + self._atrous_rate = atrous_rate + + def _CreateKernel(self, shape, dtype): + raise NotImplementedError('This method must be sub-classed') + + def _Apply(self, x): + """Apply the self._conv op. + + Arguments: + x: input tensor. It needs to be a 4D tensor of the form + [batch, height, width, channels]. + Returns: + The output of the convolution of x with the current convolutional + kernel. + Raises: + ValueError: if number of channels is not defined at graph construction. + """ + input_shape = x.get_shape().with_rank(4) + input_shape[3:].assert_is_fully_defined() # channels must be defined + if self._kernel is None: + assert self._kernel_shape[2] == 0, self._kernel_shape + self._kernel_shape[2] = input_shape[3].value + if self._kernel_shape[3] < 0: + # Make output depth be the same as input depth. 
+ self._kernel_shape[3] = self._kernel_shape[2] + self._kernel = self._CreateKernel(self._kernel_shape, x.dtype) + + x, padding = HandleConvPaddingModes( + x, self._padding, self._kernel_shape, self._strides) + if self._atrous_rate is None: + x = self._conv(x, self._kernel, strides=self._strides, padding=padding) + else: + x = self._conv(x, self._kernel, rate=self._atrous_rate, padding=padding) + + if self._padding != 'VALID': + # Manually update shape. Known shape information can be lost by tf.pad(). + height = (1 + (input_shape[1].value - 1) // self._strides[1] + if input_shape[1].value else None) + width = (1 + (input_shape[2].value - 1) // self._strides[2] + if input_shape[2].value else None) + shape = x.get_shape() + x.set_shape([shape[0], height, width, shape[3]]) + + return self._act(self._bias(x)) + + +class Conv2D(Conv2DBase): + """A tf.nn.conv2d operator.""" + + def __init__(self, depth, filter_size, strides, padding, + bias=None, act=None, initializer=None, name=None): + """Initializes a Conv2D block. + + Arguments: + depth: The output depth of the block (i.e., #filters) + filter_size: The size of the 2D filter. If it's specified as an integer, + it's going to create a square filter. Otherwise, this is a tuple + specifying the height x width of the filter. + strides: A tuple specifying the y and x stride. + padding: One of the valid padding modes allowed by tf.nn.conv2d, or + 'REFLECT'/'SYMMETRIC' for mirror padding. + bias: An initializer for the bias, or a Bias class object. If None, there + will be no bias in this block. See BiasAdd block. + act: Optional activation function applied to the output. + initializer: Optional initializer for weights. + name: The name for this conv2d op. 
+ """ + super(Conv2D, self).__init__(depth, filter_size, strides, padding, bias, + act, conv=tf.nn.conv2d, name=name) + + with self._BlockScope(): + if initializer is None: + initializer = block_util.RsqrtInitializer(dims=(0, 1, 2)) + self._initializer = initializer + + def _CreateKernel(self, shape, dtype): + return self.NewVar(self._initializer(shape, dtype)) diff --git a/compression/entropy_coder/lib/blocks_std_test.py b/compression/entropy_coder/lib/blocks_std_test.py new file mode 100644 index 0000000000000000000000000000000000000000..7e8d42cf1020dabaeb58ca52049610ce74245092 --- /dev/null +++ b/compression/entropy_coder/lib/blocks_std_test.py @@ -0,0 +1,339 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for basic tensorflow blocks_std.""" + +from __future__ import division +from __future__ import unicode_literals + +import math +import os + +import numpy as np +import tensorflow as tf + +import blocks_std + + +def _NumpyConv2D(x, f, strides, padding, rate=1): + assert strides[0] == 1 and strides[3] == 1, strides + + if rate > 1: + f_shape = f.shape + expand_f = np.zeros([f_shape[0], ((f_shape[1] - 1) * rate + 1), + f_shape[2], f_shape[3]]) + expand_f[:, [y * rate for y in range(f_shape[1])], :, :] = f + f = np.zeros([((f_shape[0] - 1) * rate + 1), expand_f.shape[1], + f_shape[2], f_shape[3]]) + f[[y * rate for y in range(f_shape[0])], :, :, :] = expand_f + + if padding != 'VALID': + assert x.shape[1] > 0 and x.shape[2] > 0, x.shape + # Compute the number of padded rows and cols. + # See Conv2D block comments for a math explanation. + remainder = ((x.shape[1] - 1) % strides[1], (x.shape[2] - 1) % strides[2]) + pad_rows = f.shape[0] - remainder[0] - 1 + pad_cols = f.shape[1] - remainder[1] - 1 + pad = ((0, 0), + (pad_rows // 2, (pad_rows + 1) // 2), + (pad_cols // 2, (pad_cols + 1) // 2), + (0, 0)) + + # Pad the input using numpy.pad(). + mode = None + if padding == 'SAME': + mode = str('constant') + if padding == 'REFLECT': + mode = str('reflect') + if padding == 'SYMMETRIC': + mode = str('symmetric') + x = np.pad(x, pad, mode=mode) + + # Since x is now properly padded, proceed as if padding mode is VALID. + x_window = np.empty( + (x.shape[0], + int(math.ceil((x.shape[1] - f.shape[0] + 1) / strides[1])), + int(math.ceil((x.shape[2] - f.shape[1] + 1) / strides[2])), + np.prod(f.shape[:3]))) + + # The output at pixel location (i, j) is the result of linear transformation + # applied to the window whose top-left corner is at + # (i * row_stride, j * col_stride). 
+ for i in xrange(x_window.shape[1]): + k = i * strides[1] + for j in xrange(x_window.shape[2]): + l = j * strides[2] + x_window[:, i, j, :] = x[:, + k:(k + f.shape[0]), + l:(l + f.shape[1]), + :].reshape((x_window.shape[0], -1)) + + y = np.tensordot(x_window, f.reshape((-1, f.shape[3])), axes=1) + return y + + +class BlocksStdTest(tf.test.TestCase): + + def CheckUnary(self, y, op_type): + self.assertEqual(op_type, y.op.type) + self.assertEqual(1, len(y.op.inputs)) + return y.op.inputs[0] + + def CheckBinary(self, y, op_type): + self.assertEqual(op_type, y.op.type) + self.assertEqual(2, len(y.op.inputs)) + return y.op.inputs + + def testPassThrough(self): + p = blocks_std.PassThrough() + x = tf.placeholder(dtype=tf.float32, shape=[1]) + self.assertIs(p(x), x) + + def CheckBiasAdd(self, y, b): + x, u = self.CheckBinary(y, 'BiasAdd') + self.assertIs(u, b._bias.value()) + self.assertEqual(x.dtype, u.dtype.base_dtype) + return x + + def testBiasAdd(self): + b = blocks_std.BiasAdd() + x = tf.placeholder(dtype=tf.float32, shape=[4, 8]) + y = b(x) + self.assertEqual(b._bias.get_shape(), x.get_shape()[-1:]) + self.assertIs(x, self.CheckBiasAdd(y, b)) + + def testBiasRankTest(self): + b = blocks_std.BiasAdd() + x = tf.placeholder(dtype=tf.float32, shape=[10]) + with self.assertRaises(ValueError): + b(x) + + def CheckLinear(self, y, m): + x, w = self.CheckBinary(y, 'MatMul') + self.assertIs(w, m._matrix.value()) + self.assertEqual(x.dtype, w.dtype.base_dtype) + return x + + def testLinear(self): + m = blocks_std.Linear(10) + x = tf.placeholder(dtype=tf.float32, shape=[8, 9]) + y = m(x) + self.assertEqual(m._matrix.get_shape(), [9, 10]) + self.assertIs(x, self.CheckLinear(y, m)) + + def testLinearShared(self): + # Create a linear map which is applied twice on different inputs + # (i.e. the weights of the map are shared). 
+ linear_map = blocks_std.Linear(6) + x1 = tf.random_normal(shape=[1, 5]) + x2 = tf.random_normal(shape=[1, 5]) + xs = x1 + x2 + + # Apply the transform with the same weights. + y1 = linear_map(x1) + y2 = linear_map(x2) + ys = linear_map(xs) + + with self.test_session() as sess: + # Initialize all the variables of the graph. + tf.global_variables_initializer().run() + + y1_res, y2_res, ys_res = sess.run([y1, y2, ys]) + self.assertAllClose(y1_res + y2_res, ys_res) + + def CheckNN(self, y, nn, act=None): + if act: + pre_act = self.CheckUnary(y, act) + else: + pre_act = y + + if not isinstance(nn._bias, blocks_std.PassThrough): + pre_bias = self.CheckBiasAdd(pre_act, nn._bias) + else: + pre_bias = pre_act + + if len(nn._matrices) > 1: + self.assertEqual('AddN', pre_bias.op.type) + pre_bias = pre_bias.op.inputs + else: + pre_bias = [pre_bias] + + self.assertEqual(len(pre_bias), len(nn._matrices)) + return [self.CheckLinear(u, m) for u, m in zip(pre_bias, nn._matrices)] + + def testNNWithoutActWithoutBias(self): + nn = blocks_std.NN(10, act=None, bias=None) + x = tf.placeholder(dtype=tf.float32, shape=[5, 7]) + y = nn(x) + self.assertIs(x, self.CheckNN(y, nn)[0]) + + def testNNWithoutBiasWithAct(self): + nn = blocks_std.NN(10, act=tf.nn.relu, bias=None) + x = tf.placeholder(dtype=tf.float32, shape=[5, 7]) + y = nn(x) + self.assertIs(x, self.CheckNN(y, nn, 'Relu')[0]) + + def testNNWithBiasWithoutAct(self): + nn = blocks_std.NN(10, bias=blocks_std.Bias(0), act=None) + x = tf.placeholder(dtype=tf.float32, shape=[5, 7]) + y = nn(x) + self.assertIs(x, self.CheckNN(y, nn)[0]) + + def testNNWithBiasWithAct(self): + nn = blocks_std.NN(10, bias=blocks_std.Bias(0), act=tf.square) + x = tf.placeholder(dtype=tf.float32, shape=[5, 7]) + y = nn(x) + self.assertIs(x, self.CheckNN(y, nn, 'Square')[0]) + + def testNNMultipleInputs(self): + nn = blocks_std.NN(10, bias=blocks_std.Bias(0), act=tf.tanh) + x = [tf.placeholder(dtype=tf.float32, shape=[5, 7]), + 
tf.placeholder(dtype=tf.float32, shape=[5, 3]), + tf.placeholder(dtype=tf.float32, shape=[5, 5])] + y = nn(*x) + xs = self.CheckNN(y, nn, 'Tanh') + self.assertEqual(len(x), len(xs)) + for u, v in zip(x, xs): + self.assertIs(u, v) + + def testConv2DSAME(self): + np.random.seed(142536) + + x_shape = [4, 16, 11, 5] + f_shape = [4, 3, 5, 6] + strides = [1, 2, 2, 1] + padding = 'SAME' + + conv = blocks_std.Conv2D(depth=f_shape[-1], + filter_size=f_shape[0:2], + strides=strides[1:3], + padding=padding, + act=None, + bias=None) + x_value = np.random.normal(size=x_shape) + x = tf.convert_to_tensor(x_value, dtype=tf.float32) + y = conv(x) + + with self.test_session(): + tf.global_variables_initializer().run() + f_value = conv._kernel.eval() + y_value = y.eval() + + y_expected = _NumpyConv2D(x_value, f_value, + strides=strides, padding=padding) + self.assertAllClose(y_expected, y_value) + + def testConv2DValid(self): + np.random.seed(253647) + + x_shape = [4, 11, 12, 5] + f_shape = [5, 2, 5, 5] + strides = [1, 2, 2, 1] + padding = 'VALID' + + conv = blocks_std.Conv2D(depth=f_shape[-1], + filter_size=f_shape[0:2], + strides=strides[1:3], + padding=padding, + act=None, + bias=None) + x_value = np.random.normal(size=x_shape) + x = tf.convert_to_tensor(x_value, dtype=tf.float32) + y = conv(x) + + with self.test_session(): + tf.global_variables_initializer().run() + f_value = conv._kernel.eval() + y_value = y.eval() + + y_expected = _NumpyConv2D(x_value, f_value, + strides=strides, padding=padding) + self.assertAllClose(y_expected, y_value) + + def testConv2DSymmetric(self): + np.random.seed(364758) + + x_shape = [4, 10, 12, 6] + f_shape = [3, 4, 6, 5] + strides = [1, 1, 1, 1] + padding = 'SYMMETRIC' + + conv = blocks_std.Conv2D(depth=f_shape[-1], + filter_size=f_shape[0:2], + strides=strides[1:3], + padding=padding, + act=None, + bias=None) + x_value = np.random.normal(size=x_shape) + x = tf.convert_to_tensor(x_value, dtype=tf.float32) + y = conv(x) + + with self.test_session(): 
+ tf.global_variables_initializer().run() + f_value = conv._kernel.eval() + y_value = y.eval() + + y_expected = _NumpyConv2D(x_value, f_value, + strides=strides, padding=padding) + self.assertAllClose(y_expected, y_value) + + def testConv2DReflect(self): + np.random.seed(768798) + + x_shape = [4, 10, 12, 6] + f_shape = [3, 4, 6, 5] + strides = [1, 2, 2, 1] + padding = 'REFLECT' + + conv = blocks_std.Conv2D(depth=f_shape[-1], + filter_size=f_shape[0:2], + strides=strides[1:3], + padding=padding, + act=None, + bias=None) + x_value = np.random.normal(size=x_shape) + x = tf.convert_to_tensor(x_value, dtype=tf.float32) + y = conv(x) + + with self.test_session(): + tf.global_variables_initializer().run() + f_value = conv._kernel.eval() + y_value = y.eval() + + y_expected = _NumpyConv2D(x_value, f_value, + strides=strides, padding=padding) + self.assertAllClose(y_expected, y_value) + + def testConv2DBias(self): + input_shape = [19, 14, 14, 64] + filter_shape = [3, 7, 64, 128] + strides = [1, 2, 2, 1] + output_shape = [19, 6, 4, 128] + + conv = blocks_std.Conv2D(depth=filter_shape[-1], + filter_size=filter_shape[0:2], + strides=strides[1:3], + padding='VALID', + act=None, + bias=blocks_std.Bias(1)) + x = tf.placeholder(dtype=tf.float32, shape=input_shape) + + y = conv(x) + self.CheckBiasAdd(y, conv._bias) + self.assertEqual(output_shape, y.get_shape().as_list()) + + +if __name__ == '__main__': + tf.test.main() diff --git a/compression/entropy_coder/model/__init__.py b/compression/entropy_coder/model/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/compression/entropy_coder/model/entropy_coder_model.py b/compression/entropy_coder/model/entropy_coder_model.py new file mode 100644 index 0000000000000000000000000000000000000000..67f7eb5bc05f3df7363529c19fa77d176caaabc1 --- /dev/null +++ b/compression/entropy_coder/model/entropy_coder_model.py @@ -0,0 +1,55 @@ +# Copyright 2017 The TensorFlow 
Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Entropy coder model.""" + + +class EntropyCoderModel(object): + """Entropy coder model.""" + + def __init__(self): + # Loss used for training the model. + self.loss = None + + # Tensorflow op to run to train the model. + self.train_op = None + + # Tensor corresponding to the average code length of the input bit field + # tensor. The average code length is a number of output bits per input bit. + # To get an effective compression, this number should be between 0.0 + # and 1.0 (1.0 corresponds to no compression). + self.average_code_length = None + + def Initialize(self, global_step, optimizer, config_string): + raise NotImplementedError() + + def BuildGraph(self, input_codes): + """Build the Tensorflow graph corresponding to the entropy coder model. + + Args: + input_codes: Tensor of size: batch_size x height x width x bit_depth + corresponding to the codes to compress. + The input codes are {-1, +1} codes. + """ + # TODO: + # - consider switching to {0, 1} codes. + # - consider passing an extra tensor which gives for each (b, y, x) + # what is the actual depth (which would allow to use more or less bits + # for each (y, x) location. 
+ raise NotImplementedError() + + def GetConfigStringForUnitTest(self): + """Returns a default model configuration to be used for unit tests.""" + return None diff --git a/compression/entropy_coder/model/model_factory.py b/compression/entropy_coder/model/model_factory.py new file mode 100644 index 0000000000000000000000000000000000000000..e6f9902f3bb720e76f228f2774a9eaf7774ef191 --- /dev/null +++ b/compression/entropy_coder/model/model_factory.py @@ -0,0 +1,53 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Entropy coder model registrar.""" + + +class ModelFactory(object): + """Factory of encoder/decoder models.""" + + def __init__(self): + self._model_dictionary = dict() + + def RegisterModel(self, + entropy_coder_model_name, + entropy_coder_model_factory): + self._model_dictionary[entropy_coder_model_name] = ( + entropy_coder_model_factory) + + def CreateModel(self, model_name): + current_model_factory = self._model_dictionary[model_name] + return current_model_factory() + + def GetAvailableModels(self): + return self._model_dictionary.keys() + + +_model_registry = ModelFactory() + + +def GetModelRegistry(): + return _model_registry + + +class RegisterEntropyCoderModel(object): + + def __init__(self, model_name): + self._model_name = model_name + + def __call__(self, f): + _model_registry.RegisterModel(self._model_name, f) + return f diff --git a/compression/entropy_coder/progressive/__init__.py b/compression/entropy_coder/progressive/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/compression/entropy_coder/progressive/progressive.py b/compression/entropy_coder/progressive/progressive.py new file mode 100644 index 0000000000000000000000000000000000000000..98777d8d5e7a7c72aba8aa11673c46830f6ef7d2 --- /dev/null +++ b/compression/entropy_coder/progressive/progressive.py @@ -0,0 +1,241 @@ +# Copyright 2017 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Code probability model used for entropy coding.""" + +import json + +import tensorflow as tf + +from entropy_coder.lib import blocks +from entropy_coder.model import entropy_coder_model +from entropy_coder.model import model_factory + +# pylint: disable=not-callable + + +class BrnnPredictor(blocks.BlockBase): + """BRNN prediction applied on one layer.""" + + def __init__(self, code_depth, name=None): + super(BrnnPredictor, self).__init__(name) + + with self._BlockScope(): + hidden_depth = 2 * code_depth + + # What is coming from the previous layer/iteration + # is going through a regular Conv2D layer as opposed to the binary codes + # of the current layer/iteration which are going through a masked + # convolution. + self._adaptation0 = blocks.RasterScanConv2D( + hidden_depth, [7, 7], [1, 1], 'SAME', + strict_order=True, + bias=blocks.Bias(0), act=tf.tanh) + self._adaptation1 = blocks.Conv2D( + hidden_depth, [3, 3], [1, 1], 'SAME', + bias=blocks.Bias(0), act=tf.tanh) + self._predictor = blocks.CompositionOperator([ + blocks.LineOperator( + blocks.RasterScanConv2DLSTM( + depth=hidden_depth, + filter_size=[1, 3], + hidden_filter_size=[1, 3], + strides=[1, 1], + padding='SAME')), + blocks.Conv2D(hidden_depth, [1, 1], [1, 1], 'SAME', + bias=blocks.Bias(0), act=tf.tanh), + blocks.Conv2D(code_depth, [1, 1], [1, 1], 'SAME', + bias=blocks.Bias(0), act=tf.tanh) + ]) + + def _Apply(self, x, s): + # Code estimation using both: + # - the state from the previous iteration/layer, + # - the binary codes that are before in raster scan order. 
+ h = tf.concat(values=[self._adaptation0(x), self._adaptation1(s)], axis=3) + + estimated_codes = self._predictor(h) + + return estimated_codes + + +class LayerPrediction(blocks.BlockBase): + """Binary code prediction for one layer.""" + + def __init__(self, layer_count, code_depth, name=None): + super(LayerPrediction, self).__init__(name) + + self._layer_count = layer_count + + # No previous layer. + self._layer_state = None + self._current_layer = 0 + + with self._BlockScope(): + # Layers used to do the conditional code prediction. + self._brnn_predictors = [] + for _ in xrange(layer_count): + self._brnn_predictors.append(BrnnPredictor(code_depth)) + + # Layers used to generate the input of the LSTM operating on the + # iteration/depth domain. + hidden_depth = 2 * code_depth + self._state_blocks = [] + for _ in xrange(layer_count): + self._state_blocks.append(blocks.CompositionOperator([ + blocks.Conv2D( + hidden_depth, [3, 3], [1, 1], 'SAME', + bias=blocks.Bias(0), act=tf.tanh), + blocks.Conv2D( + code_depth, [3, 3], [1, 1], 'SAME', + bias=blocks.Bias(0), act=tf.tanh) + ])) + + # Memory of the RNN is equivalent to the size of 2 layers of binary + # codes. + hidden_depth = 2 * code_depth + self._layer_rnn = blocks.CompositionOperator([ + blocks.Conv2DLSTM( + depth=hidden_depth, + filter_size=[1, 1], + hidden_filter_size=[1, 1], + strides=[1, 1], + padding='SAME'), + blocks.Conv2D(hidden_depth, [1, 1], [1, 1], 'SAME', + bias=blocks.Bias(0), act=tf.tanh), + blocks.Conv2D(code_depth, [1, 1], [1, 1], 'SAME', + bias=blocks.Bias(0), act=tf.tanh) + ]) + + def _Apply(self, x): + assert self._current_layer < self._layer_count + + # Layer state is set to 0 when there is no previous iteration. + if self._layer_state is None: + self._layer_state = tf.zeros_like(x, dtype=tf.float32) + + # Code estimation using both: + # - the state from the previous iteration/layer, + # - the binary codes that are before in raster scan order. 
+ estimated_codes = self._brnn_predictors[self._current_layer]( + x, self._layer_state) + + # Compute the updated layer state. + h = self._state_blocks[self._current_layer](x) + self._layer_state = self._layer_rnn(h) + self._current_layer += 1 + + return estimated_codes + + +class ProgressiveModel(entropy_coder_model.EntropyCoderModel): + """Progressive BRNN entropy coder model.""" + + def __init__(self): + super(ProgressiveModel, self).__init__() + + def Initialize(self, global_step, optimizer, config_string): + if config_string is None: + raise ValueError('The progressive model requires a configuration.') + config = json.loads(config_string) + if 'coded_layer_count' not in config: + config['coded_layer_count'] = 0 + + self._config = config + self._optimizer = optimizer + self._global_step = global_step + + def BuildGraph(self, input_codes): + """Build the graph corresponding to the progressive BRNN model.""" + layer_depth = self._config['layer_depth'] + layer_count = self._config['layer_count'] + + code_shape = input_codes.get_shape() + code_depth = code_shape[-1].value + if self._config['coded_layer_count'] > 0: + prefix_depth = self._config['coded_layer_count'] * layer_depth + if code_depth < prefix_depth: + raise ValueError('Invalid prefix depth: {} VS {}'.format( + prefix_depth, code_depth)) + input_codes = input_codes[:, :, :, :prefix_depth] + + code_shape = input_codes.get_shape() + code_depth = code_shape[-1].value + if code_depth % layer_depth != 0: + raise ValueError( + 'Code depth must be a multiple of the layer depth: {} vs {}'.format( + code_depth, layer_depth)) + code_layer_count = code_depth // layer_depth + if code_layer_count > layer_count: + raise ValueError('Input codes have too many layers: {}, max={}'.format( + code_layer_count, layer_count)) + + # Block used to estimate binary codes. + layer_prediction = LayerPrediction(layer_count, layer_depth) + + # Block used to compute code lengths. 
+ code_length_block = blocks.CodeLength() + + # Loop over all the layers. + code_length = [] + code_layers = tf.split( + value=input_codes, num_or_size_splits=code_layer_count, axis=3) + for k in xrange(code_layer_count): + x = code_layers[k] + predicted_x = layer_prediction(x) + # Saturate the prediction to avoid infinite code length. + epsilon = 0.001 + predicted_x = tf.clip_by_value( + predicted_x, -1 + epsilon, +1 - epsilon) + code_length.append(code_length_block( + blocks.ConvertSignCodeToZeroOneCode(x), + blocks.ConvertSignCodeToZeroOneCode(predicted_x))) + tf.summary.scalar('code_length_layer_{:02d}'.format(k), code_length[-1]) + code_length = tf.stack(code_length) + self.loss = tf.reduce_mean(code_length) + tf.summary.scalar('loss', self.loss) + + # Loop over all the remaining layers just to make sure they are + # instantiated. Otherwise, loading model params could fail. + dummy_x = tf.zeros_like(code_layers[0]) + for _ in xrange(layer_count - code_layer_count): + dummy_predicted_x = layer_prediction(dummy_x) + + # Average bitrate over total_line_count. 
+ self.average_code_length = tf.reduce_mean(code_length) + + if self._optimizer: + optim_op = self._optimizer.minimize(self.loss, + global_step=self._global_step) + block_updates = blocks.CreateBlockUpdates() + if block_updates: + with tf.get_default_graph().control_dependencies([optim_op]): + self.train_op = tf.group(*block_updates) + else: + self.train_op = optim_op + else: + self.train_op = None + + def GetConfigStringForUnitTest(self): + s = '{\n' + s += '"layer_depth": 1,\n' + s += '"layer_count": 8\n' + s += '}\n' + return s + + +@model_factory.RegisterEntropyCoderModel('progressive') +def CreateProgressiveModel(): + return ProgressiveModel() diff --git a/compression/image_encoder/README.md b/compression/image_encoder/README.md new file mode 100644 index 0000000000000000000000000000000000000000..916820e2062567c96e51229d9881ce731f2a94fa --- /dev/null +++ b/compression/image_encoder/README.md @@ -0,0 +1,105 @@ +# Image Compression with Neural Networks + +This is a [TensorFlow](http://www.tensorflow.org/) model for compressing and +decompressing images using an already trained Residual GRU model as described +in [Full Resolution Image Compression with Recurrent Neural Networks](https://arxiv.org/abs/1608.05148). Please consult the paper for more details +on the architecture and compression results. + +This code allows you to perform lossy compression using a model +already trained for compression. This code does not currently contain the +Entropy Coding portions of our paper. + + +## Prerequisites +The only software requirement for running the encoder and decoder is an +installation of TensorFlow. You will also need to [download](http://download.tensorflow.org/models/compression_residual_gru-2016-08-23.tar.gz) +and extract the model residual_gru.pb. + +If you want to generate the perceptual similarity under MS-SSIM, you will also +need to [install SciPy](https://www.scipy.org/install.html).
+ +## Encoding +The Residual GRU network is fully convolutional, but requires the image's +height and width in pixels to be multiples of 32. There is an image in this folder +called example.png that is 768x1024 if one is needed for testing. We also +rely on TensorFlow's built-in decoding ops, which support only PNG and JPEG at +time of release. + +To encode an image, simply run the following command: + +`python encoder.py --input_image=/your/image/here.png +--output_codes=output_codes.npz --iteration=15 +--model=/path/to/model/residual_gru.pb +` + +The iteration parameter specifies the lossy quality to target for compression. +The iteration can be any integer in [0, 15], where 0 corresponds to a target of 1/8 bits per +pixel (bpp) and every increment results in an additional 1/8 bpp. + +| Iteration | BPP | Compression Ratio | +|---: |---: |---: | +|0 | 0.125 | 192:1| +|1 | 0.250 | 96:1| +|2 | 0.375 | 64:1| +|3 | 0.500 | 48:1| +|4 | 0.625 | 38.4:1| +|5 | 0.750 | 32:1| +|6 | 0.875 | 27.4:1| +|7 | 1.000 | 24:1| +|8 | 1.125 | 21.3:1| +|9 | 1.250 | 19.2:1| +|10 | 1.375 | 17.4:1| +|11 | 1.500 | 16:1| +|12 | 1.625 | 14.7:1| +|13 | 1.750 | 13.7:1| +|14 | 1.875 | 12.8:1| +|15 | 2.000 | 12:1| + +The output_codes file contains the numpy shape and a flattened, bit-packed +array of the codes. These can be inspected in Python by using numpy.load(). + + +## Decoding +After generating codes for an image, the lossy reconstructions for that image +can be generated as follows: + +`python decoder.py --input_codes=codes.npz --output_directory=/tmp/decoded/ +--model=residual_gru.pb` + +The output_directory will contain images decoded at each quality level. + + +## Comparing Similarity +One of our primary metrics for comparing how similar two images are +is MS-SSIM.
+ +To generate these metrics on your images you can run: +`python msssim.py --original_image=/path/to/your/image.png +--compared_image=/tmp/decoded/image_15.png` + + +## Results +CSV results containing the post-entropy bitrates and MS-SSIM over Kodak +are available for reference. Each row of the CSV corresponds to one Kodak +image, ordered by its dataset number (1-24). Each column of the CSV corresponds to one +iteration of the model (1-16). + +[Post Entropy Bitrates](https://storage.googleapis.com/compression-ml/residual_gru_results/bitrate.csv) + +[MS-SSIM](https://storage.googleapis.com/compression-ml/residual_gru_results/msssim.csv) + + +## FAQ + +#### How do I train my own compression network? +We currently don't provide the code to build and train a compression +graph from scratch. + +#### I get an InvalidArgumentError: Incompatible shapes. +This is usually because our network only supports images whose +height and width are both divisible by 32 pixels. Try padding your images to 32-pixel +boundaries. + + +## Contact Info +Model repository maintained by Nick Johnston ([nickj-google](https://github.com/nickj-google)).
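The two numeric rules stated in the image_encoder README above, the 32-pixel size requirement and the 1/8 bpp step per iteration, can be sketched in a few lines of Python. This is an illustrative sketch only and is not part of the released scripts: the helper names `pad_amounts`, `target_bpp`, and `compression_ratio` are hypothetical, and the 24 bits-per-pixel source depth is an assumption inferred from the table (0.125 bpp is listed as 192:1).

```python
def pad_amounts(height, width, multiple=32):
    """Return (extra_rows, extra_cols) needed to reach the next multiple.

    Hypothetical helper: the README only says to pad to 32-pixel boundaries.
    """
    return (-height) % multiple, (-width) % multiple


def target_bpp(iteration):
    """Nominal bits per pixel for an iteration in [0, 15], per the README table."""
    if not 0 <= iteration <= 15:
        raise ValueError('iteration must be in [0, 15]')
    return (iteration + 1) / 8.0


def compression_ratio(iteration, source_bpp=24.0):
    """Ratio of source bits to coded bits.

    Assumption: the source is 24-bit RGB, which matches the table's 192:1
    entry at 0.125 bpp.
    """
    return source_bpp / target_bpp(iteration)


# example.png is described as 768x1024, so it needs no padding.
print(pad_amounts(768, 1024))
# A 100x70 image would need extra rows and columns before encoding.
print(pad_amounts(100, 70))
print(target_bpp(15), compression_ratio(15))
```

How the extra rows and columns are filled (for example by repeating edge pixels) is left open here, since the README does not specify a padding mode.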
diff --git a/compression/decoder.py b/compression/image_encoder/decoder.py old mode 100755 new mode 100644 similarity index 100% rename from compression/decoder.py rename to compression/image_encoder/decoder.py diff --git a/compression/encoder.py b/compression/image_encoder/encoder.py old mode 100755 new mode 100644 similarity index 100% rename from compression/encoder.py rename to compression/image_encoder/encoder.py diff --git a/compression/example.png b/compression/image_encoder/example.png similarity index 100% rename from compression/example.png rename to compression/image_encoder/example.png diff --git a/compression/msssim.py b/compression/image_encoder/msssim.py old mode 100755 new mode 100644 similarity index 100% rename from compression/msssim.py rename to compression/image_encoder/msssim.py diff --git a/differential_privacy/README.md b/differential_privacy/README.md index 9cda93aa18c06b51f2671e56b731adcf746189b9..4bd6c22c99830a329db4ae887d8243d0c1b8f931 100644 --- a/differential_privacy/README.md +++ b/differential_privacy/README.md @@ -3,7 +3,7 @@ Open Sourced By: Xin Pan (xpan@google.com, github: panyx0718) -###Introduction for dp_sgd/README.md +### Introduction for [dp_sgd/README.md](dp_sgd/README.md) Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires @@ -18,7 +18,7 @@ manageable cost in software complexity, training efficiency, and model quality. 
paper: https://arxiv.org/abs/1607.00133 -###Introduction for multiple_teachers/README.md +### Introduction for [multiple_teachers/README.md](multiple_teachers/README.md) This repository contains code to create a setup for learning privacy-preserving student models by transferring knowledge from an ensemble of teachers trained diff --git a/differential_privacy/dp_sgd/README.md b/differential_privacy/dp_sgd/README.md index 887a13e8fbb61633ab6f869c60dc65ec2bcbf6bb..6c0846748b3516a12ccc126ef1bea843b6635914 100644 --- a/differential_privacy/dp_sgd/README.md +++ b/differential_privacy/dp_sgd/README.md @@ -8,14 +8,14 @@ Open Sourced By: Xin Pan (xpan@google.com, github: panyx0718) -Machine learning techniques based on neural networks are achieving remarkable -results in a wide variety of domains. Often, the training of models requires -large, representative datasets, which may be crowdsourced and contain sensitive -information. The models should not expose private information in these datasets. -Addressing this goal, we develop new algorithmic techniques for learning and a -refined analysis of privacy costs within the framework of differential privacy. -Our implementation and experiments demonstrate that we can train deep neural -networks with non-convex objectives, under a modest privacy budget, and at a +Machine learning techniques based on neural networks are achieving remarkable +results in a wide variety of domains. Often, the training of models requires +large, representative datasets, which may be crowdsourced and contain sensitive +information. The models should not expose private information in these datasets. +Addressing this goal, we develop new algorithmic techniques for learning and a +refined analysis of privacy costs within the framework of differential privacy. 
+Our implementation and experiments demonstrate that we can train deep neural +networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality. paper: https://arxiv.org/abs/1607.00133 @@ -46,7 +46,7 @@ https://github.com/panyx0718/models/tree/master/slim # Download the data to the data/ directory. # List the codes. -ls -R differential_privacy/ +$ ls -R differential_privacy/ differential_privacy/: dp_sgd __init__.py privacy_accountant README.md @@ -72,16 +72,16 @@ differential_privacy/privacy_accountant/tf: accountant.py accountant_test.py BUILD # List the data. -ls -R data/ +$ ls -R data/ ./data: mnist_test.tfrecord mnist_train.tfrecord # Build the codes. -bazel build -c opt differential_privacy/... +$ bazel build -c opt differential_privacy/... # Run the mnist differntial privacy training codes. -bazel-bin/differential_privacy/dp_sgd/dp_mnist/dp_mnist \ +$ bazel-bin/differential_privacy/dp_sgd/dp_mnist/dp_mnist \ --training_data_path=data/mnist_train.tfrecord \ --eval_data_path=data/mnist_test.tfrecord \ --save_path=/tmp/mnist_dir @@ -102,6 +102,6 @@ train_accuracy: 0.53 eval_accuracy: 0.53 ... -ls /tmp/mnist_dir/ +$ ls /tmp/mnist_dir/ checkpoint ckpt ckpt.meta results-0.json ``` diff --git a/differential_privacy/dp_sgd/dp_mnist/dp_mnist.py b/differential_privacy/dp_sgd/dp_mnist/dp_mnist.py index 6e9a6274916705deba95dd27be7d6b8096953f3b..6c2cc49b51afa0a2381308425cce1509e0be3d4d 100644 --- a/differential_privacy/dp_sgd/dp_mnist/dp_mnist.py +++ b/differential_privacy/dp_sgd/dp_mnist/dp_mnist.py @@ -273,7 +273,7 @@ def Train(mnist_train_file, mnist_test_file, network_parameters, num_steps, images, network_parameters) cost = tf.nn.softmax_cross_entropy_with_logits( - logits, tf.one_hot(labels, 10)) + logits=logits, labels=tf.one_hot(labels, 10)) # The actual cost is the average across the examples. 
cost = tf.reduce_sum(cost, [0]) / batch_size @@ -343,7 +343,7 @@ def Train(mnist_train_file, mnist_test_file, network_parameters, num_steps, # We need to maintain the intialization sequence. for v in tf.trainable_variables(): - sess.run(tf.initialize_variables([v])) + sess.run(tf.variables_initializer([v])) sess.run(tf.global_variables_initializer()) sess.run(init_ops) diff --git a/differential_privacy/dp_sgd/dp_optimizer/dp_pca.py b/differential_privacy/dp_sgd/dp_optimizer/dp_pca.py index 6c2dc6c0ae71b604472e9d213a6eb39fd63f821b..7bea2133eae4232f914795a2d5bed5e98b59a74b 100644 --- a/differential_privacy/dp_sgd/dp_optimizer/dp_pca.py +++ b/differential_privacy/dp_sgd/dp_optimizer/dp_pca.py @@ -27,7 +27,7 @@ def ComputeDPPrincipalProjection(data, projection_dims, Args: data: the input data, each row is a data vector. projection_dims: the projection dimension. - sanitizer: the sanitizer used for acheiving privacy. + sanitizer: the sanitizer used for achieving privacy. eps_delta: (eps, delta) pair. sigma: if not None, use noise sigma; otherwise compute it using eps_delta pair. 
diff --git a/differential_privacy/dp_sgd/dp_optimizer/utils.py b/differential_privacy/dp_sgd/dp_optimizer/utils.py index f751b7a518e10677ba93e80af273766f1a342146..5fa57c82cf55e688cbfad7d2020b7b2bccf14401 100644 --- a/differential_privacy/dp_sgd/dp_optimizer/utils.py +++ b/differential_privacy/dp_sgd/dp_optimizer/utils.py @@ -233,10 +233,11 @@ def BatchClipByL2norm(t, upper_bound, name=None): """ assert upper_bound > 0 - with tf.op_scope([t, upper_bound], name, "batch_clip_by_l2norm") as name: + with tf.name_scope(values=[t, upper_bound], name=name, + default_name="batch_clip_by_l2norm") as name: saved_shape = tf.shape(t) batch_size = tf.slice(saved_shape, [0], [1]) - t2 = tf.reshape(t, tf.concat(0, [batch_size, [-1]])) + t2 = tf.reshape(t, tf.concat(axis=0, values=[batch_size, [-1]])) upper_bound_inv = tf.fill(tf.slice(saved_shape, [0], [1]), tf.constant(1.0/upper_bound)) # Add a small number to avoid divide by 0 @@ -264,9 +265,10 @@ def SoftThreshold(t, threshold_ratio, name=None): """ assert threshold_ratio >= 0 - with tf.op_scope([t, threshold_ratio], name, "soft_thresholding") as name: + with tf.name_scope(values=[t, threshold_ratio], name=name, + default_name="soft_thresholding") as name: saved_shape = tf.shape(t) - t2 = tf.reshape(t, tf.concat(0, [tf.slice(saved_shape, [0], [1]), -1])) + t2 = tf.reshape(t, tf.concat(axis=0, values=[tf.slice(saved_shape, [0], [1]), -1])) t_abs = tf.abs(t2) t_x = tf.sign(t2) * tf.nn.relu(t_abs - (tf.reduce_mean(t_abs, [0], @@ -286,7 +288,8 @@ def AddGaussianNoise(t, sigma, name=None): the noisy tensor. 
""" - with tf.op_scope([t, sigma], name, "add_gaussian_noise") as name: + with tf.name_scope(values=[t, sigma], name=name, + default_name="add_gaussian_noise") as name: noisy_t = t + tf.random_normal(tf.shape(t), stddev=sigma) return noisy_t diff --git a/differential_privacy/dp_sgd/per_example_gradients/per_example_gradients.py b/differential_privacy/dp_sgd/per_example_gradients/per_example_gradients.py index 4931e2751b856d7cf425379eee24590c53b90fb5..82b3ae2da207bca83c23835bdaf84e7f049e5e64 100644 --- a/differential_privacy/dp_sgd/per_example_gradients/per_example_gradients.py +++ b/differential_privacy/dp_sgd/per_example_gradients/per_example_gradients.py @@ -189,7 +189,7 @@ class MatMulPXG(object): z_grads, = z_grads x_expanded = tf.expand_dims(x, 2) z_grads_expanded = tf.expand_dims(z_grads, 1) - return tf.mul(x_expanded, z_grads_expanded) + return tf.multiply(x_expanded, z_grads_expanded) pxg_registry.Register("MatMul", MatMulPXG) @@ -245,7 +245,7 @@ class Conv2DPXG(object): num_x = int(conv_x.get_shape()[0]) assert num_x == 1, num_x assert len(conv_px) == batch_size - conv = tf.concat(0, conv_px) + conv = tf.concat(axis=0, values=conv_px) assert int(conv.get_shape()[0]) == batch_size return conv, w_px @@ -274,7 +274,7 @@ class Conv2DPXG(object): self.colocate_gradients_with_ops, gate_gradients=self.gate_gradients) - return tf.pack(gradients_list) + return tf.stack(gradients_list) pxg_registry.Register("Conv2D", Conv2DPXG) diff --git a/differential_privacy/multiple_teachers/analysis.py b/differential_privacy/multiple_teachers/analysis.py index 30fb865ffa3c12e788c5db474fdd737f5a0f2188..44647cdfaa10fc2d23ee7d249a2be9a6d07fefdd 100644 --- a/differential_privacy/multiple_teachers/analysis.py +++ b/differential_privacy/multiple_teachers/analysis.py @@ -216,10 +216,10 @@ def main(unused_argv): # If we are reproducing results from paper https://arxiv.org/abs/1610.05755, # download the required binaries with label information. 
################################################################## - + # Binaries for MNIST results paper_binaries_mnist = \ - ["https://github.com/npapernot/multiple-teachers-for-privacy/blob/master/mnist_250_teachers_labels.npy?raw=true", + ["https://github.com/npapernot/multiple-teachers-for-privacy/blob/master/mnist_250_teachers_labels.npy?raw=true", "https://github.com/npapernot/multiple-teachers-for-privacy/blob/master/mnist_250_teachers_100_indices_used_by_student.npy?raw=true"] if FLAGS.counts_file == "mnist_250_teachers_labels.npy" \ or FLAGS.indices_file == "mnist_250_teachers_100_indices_used_by_student.npy": @@ -254,7 +254,7 @@ def main(unused_argv): total_log_mgf_nm = np.array([0.0 for _ in l_list]) total_ss_nm = np.array([0.0 for _ in l_list]) noise_eps = FLAGS.noise_eps - + for i in indices: total_log_mgf_nm += np.array( [logmgf_from_counts(counts_mat[i], noise_eps, l) @@ -287,7 +287,7 @@ def main(unused_argv): if min(eps_list_nm) == eps_list_nm[-1]: print "Warning: May not have used enough values of l" - # Data indpendent bound, as mechanism is + # Data independent bound, as mechanism is # 2*noise_eps DP. 
data_ind_log_mgf = np.array([0.0 for _ in l_list]) data_ind_log_mgf += num_examples * np.array( diff --git a/differential_privacy/multiple_teachers/deep_cnn.py b/differential_privacy/multiple_teachers/deep_cnn.py index afc46eec1e8c62b12da8c52ac661025acf23d558..cc34d0a2f3ea7907a439faf178b1bb04467821dd 100644 --- a/differential_privacy/multiple_teachers/deep_cnn.py +++ b/differential_privacy/multiple_teachers/deep_cnn.py @@ -75,7 +75,7 @@ def _variable_with_weight_decay(name, shape, stddev, wd): var = _variable_on_cpu(name, shape, tf.truncated_normal_initializer(stddev=stddev)) if wd is not None: - weight_decay = tf.mul(tf.nn.l2_loss(var), wd, name='weight_loss') + weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss') tf.add_to_collection('losses', weight_decay) return var @@ -84,7 +84,7 @@ def inference(images, dropout=False): """Build the CNN model. Args: images: Images returned from distorted_inputs() or inputs(). - dropout: Boolean controling whether to use dropout or not + dropout: Boolean controlling whether to use dropout or not Returns: Logits """ @@ -95,9 +95,9 @@ def inference(images, dropout=False): # conv1 with tf.variable_scope('conv1') as scope: - kernel = _variable_with_weight_decay('weights', + kernel = _variable_with_weight_decay('weights', shape=first_conv_shape, - stddev=1e-4, + stddev=1e-4, wd=0.0) conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME') biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0)) @@ -108,25 +108,25 @@ def inference(images, dropout=False): # pool1 - pool1 = tf.nn.max_pool(conv1, - ksize=[1, 3, 3, 1], + pool1 = tf.nn.max_pool(conv1, + ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], - padding='SAME', + padding='SAME', name='pool1') - + # norm1 - norm1 = tf.nn.lrn(pool1, - 4, - bias=1.0, - alpha=0.001 / 9.0, + norm1 = tf.nn.lrn(pool1, + 4, + bias=1.0, + alpha=0.001 / 9.0, beta=0.75, name='norm1') # conv2 with tf.variable_scope('conv2') as scope: - kernel = 
_variable_with_weight_decay('weights', + kernel = _variable_with_weight_decay('weights', shape=[5, 5, 64, 128], - stddev=1e-4, + stddev=1e-4, wd=0.0) conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME') biases = _variable_on_cpu('biases', [128], tf.constant_initializer(0.1)) @@ -137,18 +137,18 @@ def inference(images, dropout=False): # norm2 - norm2 = tf.nn.lrn(conv2, - 4, - bias=1.0, - alpha=0.001 / 9.0, + norm2 = tf.nn.lrn(conv2, + 4, + bias=1.0, + alpha=0.001 / 9.0, beta=0.75, name='norm2') - + # pool2 - pool2 = tf.nn.max_pool(norm2, + pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1], - strides=[1, 2, 2, 1], - padding='SAME', + strides=[1, 2, 2, 1], + padding='SAME', name='pool2') # local3 @@ -156,9 +156,9 @@ def inference(images, dropout=False): # Move everything into depth so we can perform a single matrix multiply. reshape = tf.reshape(pool2, [FLAGS.batch_size, -1]) dim = reshape.get_shape()[1].value - weights = _variable_with_weight_decay('weights', + weights = _variable_with_weight_decay('weights', shape=[dim, 384], - stddev=0.04, + stddev=0.04, wd=0.004) biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1)) local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name) @@ -167,9 +167,9 @@ def inference(images, dropout=False): # local4 with tf.variable_scope('local4') as scope: - weights = _variable_with_weight_decay('weights', + weights = _variable_with_weight_decay('weights', shape=[384, 192], - stddev=0.04, + stddev=0.04, wd=0.004) biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1)) local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name) @@ -178,11 +178,11 @@ def inference(images, dropout=False): # compute logits with tf.variable_scope('softmax_linear') as scope: - weights = _variable_with_weight_decay('weights', + weights = _variable_with_weight_decay('weights', [192, FLAGS.nb_labels], - stddev=1/192.0, + stddev=1/192.0, wd=0.0) - biases = _variable_on_cpu('biases', + 
biases = _variable_on_cpu('biases', [FLAGS.nb_labels], tf.constant_initializer(0.0)) logits = tf.add(tf.matmul(local4, weights), biases, name=scope.name) @@ -194,7 +194,7 @@ def inference_deeper(images, dropout=False): """Build a deeper CNN model. Args: images: Images returned from distorted_inputs() or inputs(). - dropout: Boolean controling whether to use dropout or not + dropout: Boolean controlling whether to use dropout or not Returns: Logits """ @@ -386,7 +386,7 @@ def train_op_fun(total_loss, global_step): """ # Variables that affect learning rate. nb_ex_per_train_epoch = int(60000 / FLAGS.nb_teachers) - + num_batches_per_epoch = nb_ex_per_train_epoch / FLAGS.batch_size decay_steps = int(num_batches_per_epoch * FLAGS.epochs_per_decay) @@ -398,7 +398,7 @@ def train_op_fun(total_loss, global_step): decay_steps, LEARNING_RATE_DECAY_FACTOR, staircase=True) - tf.scalar_summary('learning_rate', lr) + tf.summary.scalar('learning_rate', lr) # Generate moving averages of all losses and associated summaries. loss_averages_op = moving_av(total_loss) @@ -413,7 +413,7 @@ def train_op_fun(total_loss, global_step): # Add histograms for trainable variables. for var in tf.trainable_variables(): - tf.histogram_summary(var.op.name, var) + tf.summary.histogram(var.op.name, var) # Track the moving averages of all trainable variables. variable_averages = tf.train.ExponentialMovingAverage( @@ -485,7 +485,7 @@ def train(images, labels, ckpt_path, dropout=False): train_op = train_op_fun(loss, global_step) # Create a saver. 
- saver = tf.train.Saver(tf.all_variables()) + saver = tf.train.Saver(tf.global_variables()) print("Graph constructed and saver created") diff --git a/differential_privacy/multiple_teachers/input.py b/differential_privacy/multiple_teachers/input.py index e57da68782a425660ca020469f520bfbe96a1aca..bc8dec915b2a0f836e501455704016f4b1e4eff1 100644 --- a/differential_privacy/multiple_teachers/input.py +++ b/differential_privacy/multiple_teachers/input.py @@ -47,7 +47,7 @@ def create_dir_if_needed(dest_directory): def maybe_download(file_urls, directory): """ Download a set of files in temporary local folder - :param directory: the directory where to download + :param directory: the directory where to download :return: a tuple of filepaths corresponding to the files given as input """ # Create directory if doesn't exist @@ -73,7 +73,7 @@ def maybe_download(file_urls, directory): result.append(filepath) # Test if file already exists - if not gfile.Exists(filepath): + if not tf.gfile.Exists(filepath): def _progress(count, block_size, total_size): sys.stdout.write('\r>> Downloading %s %.1f%%' % (filename, float(count * block_size) / float(total_size) * 100.0)) @@ -124,7 +124,7 @@ def extract_svhn(local_url): :return: """ - with gfile.Open(local_url, mode='r') as file_obj: + with tf.gfile.Open(local_url, mode='r') as file_obj: # Load MATLAB matrix using scipy IO dict = loadmat(file_obj) diff --git a/differential_privacy/multiple_teachers/train_teachers.py b/differential_privacy/multiple_teachers/train_teachers.py index 16e55b151695d357d21f4c243e32417338cd2447..fdb7634f4d8f29d8292642bf6fe050fcd082854f 100644 --- a/differential_privacy/multiple_teachers/train_teachers.py +++ b/differential_privacy/multiple_teachers/train_teachers.py @@ -64,11 +64,11 @@ def train_teacher(dataset, nb_teachers, teacher_id): else: print("Check value of dataset flag") return False - + # Retrieve subset of data for this teacher - data, labels = input.partition_dataset(train_data, - train_labels, - 
nb_teachers, + data, labels = input.partition_dataset(train_data, + train_labels, + nb_teachers, teacher_id) print("Length of training data: " + str(len(labels))) diff --git a/differential_privacy/privacy_accountant/tf/accountant.py b/differential_privacy/privacy_accountant/tf/accountant.py index e1aab7c5cb783e88542ccc1435376f1fba20139f..bde3607a383dc7f5ab582ee7d120e3827320ec41 100644 --- a/differential_privacy/privacy_accountant/tf/accountant.py +++ b/differential_privacy/privacy_accountant/tf/accountant.py @@ -152,7 +152,7 @@ class MomentsAccountant(object): We further assume that at each step, the mechanism operates on a random sample with sampling probability q = batch_size / total_examples. Then E[exp(L X)] = E[(Pr[M(D)==x / Pr[M(D')==x])^L] - By distinguishign two cases of wether D < D' or D' < D, we have + By distinguishing two cases of whether D < D' or D' < D, we have that E[exp(L X)] <= max (I1, I2) where @@ -361,12 +361,12 @@ class GaussianMomentsAccountant(MomentsAccountant): exponents = tf.constant([j * (j + 1.0 - 2.0 * s) / (2.0 * sigma * sigma) for j in range(t + 1)], dtype=tf.float64) # x[i, j] = binomial[i, j] * signs[i, j] = (i choose j) * (-1)^{i-j} - x = tf.mul(binomial, signs) + x = tf.multiply(binomial, signs) # y[i, j] = x[i, j] * exp(exponents[j]) # = (i choose j) * (-1)^{i-j} * exp(j(j-1)/(2 sigma^2)) # Note: this computation is done by broadcasting pointwise multiplication # between [t+1, t+1] tensor and [t+1] tensor. 
- y = tf.mul(x, tf.exp(exponents)) + y = tf.multiply(x, tf.exp(exponents)) # z[i] = sum_j y[i, j] # = sum_j (i choose j) * (-1)^{i-j} * exp(j(j-1)/(2 sigma^2)) z = tf.reduce_sum(y, 1) diff --git a/domain_adaptation/README.md b/domain_adaptation/README.md new file mode 100644 index 0000000000000000000000000000000000000000..b1b639d50208d3cd8db0fd6b7220117aee1ea64d --- /dev/null +++ b/domain_adaptation/README.md @@ -0,0 +1,69 @@ +# Domain Separation Networks + + +## Introduction +This is the code used for the "Domain Separation Networks" paper +by Bousmalis K., Trigeorgis G., et al., which was presented at NIPS 2016. The +paper can be found here: https://arxiv.org/abs/1608.06019. + +## Contact +This code was open-sourced by [Konstantinos Bousmalis](https://github.com/bousmalis) (konstantinos@google.com). + +## Installation +You will need to have the following installed on your machine before trying out the DSN code. + +* TensorFlow: https://www.tensorflow.org/install/ +* Bazel: https://bazel.build/ + +## Important Note +Although we are making the code available, you are only able to use the MNIST +provider for now. We will soon provide a script to download and convert MNIST-M +as well. Check back here in a few weeks or wait for a relevant announcement from +[@bousmalis](https://twitter.com/bousmalis). + +## Running the code for adapting MNIST to MNIST-M +In order to run the MNIST to MNIST-M experiments with DANNs and/or DANNs with +domain separation (DSNs) you will need to set the directory you used to download +MNIST and MNIST-M: + +``` +$ export DSN_DATA_DIR=/your/dir +``` + +Add models and models/slim to your `$PYTHONPATH`: + +``` +$ export PYTHONPATH=$PYTHONPATH:$PWD:$PWD/slim +``` + +Then you need to build the binaries with Bazel: + +``` +$ bazel build -c opt domain_adaptation/domain_separation/... 
+``` + +You can then train with the following command: + +``` +$ ./bazel-bin/domain_adaptation/domain_separation/dsn_train \ + --similarity_loss=dann_loss \ + --basic_tower=dann_mnist \ + --source_dataset=mnist \ + --target_dataset=mnist_m \ + --learning_rate=0.0117249 \ + --gamma_weight=0.251175 \ + --weight_decay=1e-6 \ + --layers_to_regularize=fc3 \ + --nouse_separation \ + --master="" \ + --dataset_dir=${DSN_DATA_DIR} \ + -v --use_logging +``` + +Evaluation can be invoked with the following command: + +``` +$ ./bazel-bin/domain_adaptation/domain_separation/dsn_eval \ + -v --dataset mnist_m --split test --num_examples=9001 \ + --dataset_dir=${DSN_DATA_DIR} +``` diff --git a/domain_adaptation/WORKSPACE b/domain_adaptation/WORKSPACE new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/domain_adaptation/__init__.py b/domain_adaptation/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/domain_adaptation/datasets/BUILD b/domain_adaptation/datasets/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..94d2c366ea2ee04d7ece6e3895934af6b9bf041d --- /dev/null +++ b/domain_adaptation/datasets/BUILD @@ -0,0 +1,35 @@ +# Domain Adaptation Scenarios Datasets + +package( + default_visibility = [ + ":internal", + ], +) + +licenses(["notice"]) # Apache 2.0 + +exports_files(["LICENSE"]) + +package_group( + name = "internal", + packages = [ + "//domain_adaptation/...", + ], +) + +py_library( + name = "dataset_factory", + srcs = ["dataset_factory.py"], + deps = [ + ":mnist_m", + "//slim:mnist", + ], +) + +py_binary( + name = "mnist_m", + srcs = ["mnist_m.py"], + deps = [ + "//slim:dataset_utils", + ], +) diff --git a/domain_adaptation/datasets/__init__.py b/domain_adaptation/datasets/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff 
--git a/domain_adaptation/datasets/dataset_factory.py b/domain_adaptation/datasets/dataset_factory.py new file mode 100644 index 0000000000000000000000000000000000000000..0b09931d1afd43504fe03234c6373ab85b2cc4b4 --- /dev/null +++ b/domain_adaptation/datasets/dataset_factory.py @@ -0,0 +1,106 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""A factory-pattern class which returns image/label pairs.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import tensorflow as tf + +from slim.datasets import mnist +from domain_adaptation.datasets import mnist_m + +slim = tf.contrib.slim + + +def get_dataset(dataset_name, + split_name, + dataset_dir, + file_pattern=None, + reader=None): + """Given a dataset name and a split_name returns a Dataset. + + Args: + dataset_name: String, the name of the dataset. + split_name: A train/test split name. + dataset_dir: The directory where the dataset files are stored. + file_pattern: The file pattern to use for matching the dataset source files. + reader: The subclass of tf.ReaderBase. If left as `None`, then the default + reader defined by each dataset is used. + + Returns: + A tf-slim `Dataset` class. + + Raises: + ValueError: if `dataset_name` isn't recognized. 
+ """ + dataset_name_to_module = {'mnist': mnist, 'mnist_m': mnist_m} + if dataset_name not in dataset_name_to_module: + raise ValueError('Name of dataset unknown %s.' % dataset_name) + + return dataset_name_to_module[dataset_name].get_split(split_name, dataset_dir, + file_pattern, reader) + + +def provide_batch(dataset_name, split_name, dataset_dir, num_readers, + batch_size, num_preprocessing_threads): + """Provides a batch of images and corresponding labels. + + Args: + dataset_name: String, the name of the dataset. + split_name: A train/test split name. + dataset_dir: The directory where the dataset files are stored. + num_readers: The number of readers used by DatasetDataProvider. + batch_size: The size of the batch requested. + num_preprocessing_threads: The number of preprocessing threads for + tf.train.batch. + file_pattern: The file pattern to use for matching the dataset source files. + reader: The subclass of tf.ReaderBase. If left as `None`, then the default + reader defined by each dataset is used. + + Returns: + A batch of + images: tensor of [batch_size, height, width, channels]. + labels: dictionary of labels. + """ + dataset = get_dataset(dataset_name, split_name, dataset_dir) + provider = slim.dataset_data_provider.DatasetDataProvider( + dataset, + num_readers=num_readers, + common_queue_capacity=20 * batch_size, + common_queue_min=10 * batch_size) + [image, label] = provider.get(['image', 'label']) + + # Convert images to float32 + image = tf.image.convert_image_dtype(image, tf.float32) + image -= 0.5 + image *= 2 + + # Load the data. + labels = {} + images, labels['classes'] = tf.train.batch( + [image, label], + batch_size=batch_size, + num_threads=num_preprocessing_threads, + capacity=5 * batch_size) + labels['classes'] = slim.one_hot_encoding(labels['classes'], + dataset.num_classes) + + # Convert mnist to RGB and 32x32 so that it can match mnist_m. 
+ if dataset_name == 'mnist': + images = tf.image.grayscale_to_rgb(images) + images = tf.image.resize_images(images, [32, 32]) + return images, labels diff --git a/domain_adaptation/datasets/mnist_m.py b/domain_adaptation/datasets/mnist_m.py new file mode 100644 index 0000000000000000000000000000000000000000..134f10279853a0b64154eaa28dbe2586bf1f2334 --- /dev/null +++ b/domain_adaptation/datasets/mnist_m.py @@ -0,0 +1,94 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Provides data for the MNIST-M dataset. +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os +import tensorflow as tf + +from slim.datasets import dataset_utils + +slim = tf.contrib.slim + +_FILE_PATTERN = 'mnist_m_%s.tfrecord' + +_SPLITS_TO_SIZES = {'train': 58001, 'valid': 1000, 'test': 9001} + +_NUM_CLASSES = 10 + +_ITEMS_TO_DESCRIPTIONS = { + 'image': 'A [32 x 32 x 3] RGB image.', + 'label': 'A single integer between 0 and 9', +} + + +def get_split(split_name, dataset_dir, file_pattern=None, reader=None): + """Gets a dataset tuple with instructions for reading MNIST-M. + + Args: + split_name: A train/test split name. + dataset_dir: The base directory of the dataset sources. + + Returns: + A `Dataset` namedtuple. 
+ + Raises: + ValueError: if `split_name` is not a valid train/test split. + """ + if split_name not in _SPLITS_TO_SIZES: + raise ValueError('split name %s was not recognized.' % split_name) + + if not file_pattern: + file_pattern = _FILE_PATTERN + file_pattern = os.path.join(dataset_dir, file_pattern % split_name) + + # Allowing None in the signature so that dataset_factory can use the default. + if reader is None: + reader = tf.TFRecordReader + + keys_to_features = { + 'image/encoded': + tf.FixedLenFeature((), tf.string, default_value=''), + 'image/format': + tf.FixedLenFeature((), tf.string, default_value='png'), + 'image/class/label': + tf.FixedLenFeature( + [1], tf.int64, default_value=tf.zeros([1], dtype=tf.int64)), + } + + items_to_handlers = { + 'image': slim.tfexample_decoder.Image(shape=[32, 32, 3], channels=3), + 'label': slim.tfexample_decoder.Tensor('image/class/label', shape=[]), + } + + decoder = slim.tfexample_decoder.TFExampleDecoder( + keys_to_features, items_to_handlers) + + labels_to_names = None + if dataset_utils.has_labels(dataset_dir): + labels_to_names = dataset_utils.read_label_file(dataset_dir) + + return slim.dataset.Dataset( + data_sources=file_pattern, + reader=reader, + decoder=decoder, + num_samples=_SPLITS_TO_SIZES[split_name], + num_classes=_NUM_CLASSES, + items_to_descriptions=_ITEMS_TO_DESCRIPTIONS, + labels_to_names=labels_to_names) diff --git a/domain_adaptation/domain_separation/BUILD b/domain_adaptation/domain_separation/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..14dceda27e49d74eaaaeae21676183b78c72b9c2 --- /dev/null +++ b/domain_adaptation/domain_separation/BUILD @@ -0,0 +1,157 @@ +# Domain Separation Networks + +package( + default_visibility = [ + ":internal", + ], +) + +licenses(["notice"]) # Apache 2.0 + +exports_files(["LICENSE"]) + +package_group( + name = "internal", + packages = [ + "//domain_adaptation/...", + ], +) + +py_library( + name = "models", + srcs = [ + "models.py", + ], + 
deps = [ + ":utils", + ], +) + +py_library( + name = "losses", + srcs = [ + "losses.py", + ], + deps = [ + ":grl_op_grads_py", + ":grl_op_shapes_py", + ":grl_ops", + ":utils", + ], +) + +py_test( + name = "losses_test", + srcs = [ + "losses_test.py", + ], + deps = [ + ":losses", + ":utils", + ], +) + +py_library( + name = "dsn", + srcs = [ + "dsn.py", + ], + deps = [ + ":grl_op_grads_py", + ":grl_op_shapes_py", + ":grl_ops", + ":losses", + ":models", + ":utils", + ], +) + +py_test( + name = "dsn_test", + srcs = [ + "dsn_test.py", + ], + deps = [ + ":dsn", + ], +) + +py_binary( + name = "dsn_train", + srcs = [ + "dsn_train.py", + ], + deps = [ + ":dsn", + ":models", + "//domain_adaptation/datasets:dataset_factory", + ], +) + +py_binary( + name = "dsn_eval", + srcs = [ + "dsn_eval.py", + ], + deps = [ + ":dsn", + ":models", + "//domain_adaptation/datasets:dataset_factory", + ], +) + +py_test( + name = "models_test", + srcs = [ + "models_test.py", + ], + deps = [ + ":models", + "//domain_adaptation/datasets:dataset_factory", + ], +) + +py_library( + name = "utils", + srcs = [ + "utils.py", + ], + deps = [ + ], +) + +py_library( + name = "grl_op_grads_py", + srcs = [ + "grl_op_grads.py", + ], + deps = [ + ":grl_ops", + ], +) + +py_library( + name = "grl_op_shapes_py", + srcs = [ + "grl_op_shapes.py", + ], + deps = [ + ], +) + +py_library( + name = "grl_ops", + srcs = ["grl_ops.py"], + data = ["_grl_ops.so"], +) + +py_test( + name = "grl_ops_test", + size = "small", + srcs = ["grl_ops_test.py"], + deps = [ + ":grl_op_grads_py", + ":grl_op_shapes_py", + ":grl_ops", + ], +) diff --git a/domain_adaptation/domain_separation/__init__.py b/domain_adaptation/domain_separation/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/domain_adaptation/domain_separation/_grl_ops.so b/domain_adaptation/domain_separation/_grl_ops.so new file mode 100755 index 
0000000000000000000000000000000000000000..4c35473760a76dcb743d58f45eddccecb5f5161e Binary files /dev/null and b/domain_adaptation/domain_separation/_grl_ops.so differ diff --git a/domain_adaptation/domain_separation/dsn.py b/domain_adaptation/domain_separation/dsn.py new file mode 100644 index 0000000000000000000000000000000000000000..3018e8a791840ae465bad493913235cc04c31cff --- /dev/null +++ b/domain_adaptation/domain_separation/dsn.py @@ -0,0 +1,355 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Functions to create a DSN model and add the different losses to it. + +Specifically, in this file we define the: + - Shared Encoding Similarity Loss Module, with: + - The MMD Similarity method + - The Correlation Similarity method + - The Gradient Reversal (Domain-Adversarial) method + - Difference Loss Module + - Reconstruction Loss Module + - Task Loss Module +""" +from functools import partial + +import tensorflow as tf + +import losses +import models +import utils + +slim = tf.contrib.slim + + +################################################################################ +# HELPER FUNCTIONS +################################################################################ +def dsn_loss_coefficient(params): + """The global_step-dependent weight that specifies when to kick in DSN losses. 
+ + Args: + params: A dictionary of parameters. Expecting 'domain_separation_startpoint' + + Returns: + A weight that effectively enables or disables the DSN-related losses, + i.e. similarity, difference, and reconstruction losses. + """ + return tf.where( + tf.less(slim.get_or_create_global_step(), + params['domain_separation_startpoint']), 1e-10, 1.0) + + +################################################################################ +# MODEL CREATION +################################################################################ +def create_model(source_images, source_labels, domain_selection_mask, + target_images, target_labels, similarity_loss, params, + basic_tower_name): + """Creates a DSN model. + + Args: + source_images: images from the source domain, a tensor of size + [batch_size, height, width, channels] + source_labels: a dictionary with the name, tensor pairs. 'classes' is one- + hot for the number of classes. + domain_selection_mask: a boolean tensor of size [batch_size, ] which denotes + the labeled images that belong to the source domain. + target_images: images from the target domain, a tensor of size + [batch_size, height, width, channels]. + target_labels: a dictionary with the name, tensor pairs. + similarity_loss: The type of method to use for encouraging + the codes from the shared encoder to be similar. + params: A dictionary of parameters. Expecting 'weight_decay', + 'layers_to_regularize', 'use_separation', 'domain_separation_startpoint', + 'alpha_weight', 'beta_weight', 'gamma_weight', 'recon_loss_name', + 'decoder_name', 'encoder_name' + basic_tower_name: the name of the tower to use for the shared encoder. + + Raises: + ValueError: if the arch is not one of the available architectures. + """ + network = getattr(models, basic_tower_name) + num_classes = source_labels['classes'].get_shape().as_list()[1] + + # Make sure we are using the appropriate number of classes. 
+ network = partial(network, num_classes=num_classes) + + # Add the classification/pose estimation loss to the source domain. + source_endpoints = add_task_loss(source_images, source_labels, network, + params) + + if similarity_loss == 'none': + # No domain adaptation, we can stop here. + return + + with tf.variable_scope('towers', reuse=True): + target_logits, target_endpoints = network( + target_images, weight_decay=params['weight_decay'], prefix='target') + + # Plot target accuracy of the train set. + target_accuracy = utils.accuracy( + tf.argmax(target_logits, 1), tf.argmax(target_labels['classes'], 1)) + + if 'quaternions' in target_labels: + target_quaternion_loss = losses.log_quaternion_loss( + target_labels['quaternions'], target_endpoints['quaternion_pred'], + params) + tf.summary.scalar('eval/Target quaternions', target_quaternion_loss) + + tf.summary.scalar('eval/Target accuracy', target_accuracy) + + source_shared = source_endpoints[params['layers_to_regularize']] + target_shared = target_endpoints[params['layers_to_regularize']] + + # When using the semisupervised model we include labeled target data in the + # source classifier. We do not want to include these target domain samples + # when we use the similarity loss. + indices = tf.range(0, source_shared.get_shape().as_list()[0]) + indices = tf.boolean_mask(indices, domain_selection_mask) + add_similarity_loss(similarity_loss, + tf.gather(source_shared, indices), + tf.gather(target_shared, indices), params) + + if params['use_separation']: + add_autoencoders( + source_images, + source_shared, + target_images, + target_shared, + params=params,) + + +def add_similarity_loss(method_name, + source_samples, + target_samples, + params, + scope=None): + """Adds a loss encouraging the shared encoding from each domain to be similar. + + Args: + method_name: the name of the encoding similarity method to use. Valid + options include `dann_loss`, `mmd_loss` or `correlation_loss`. 
+ source_samples: a tensor of shape [num_samples, num_features]. + target_samples: a tensor of shape [num_samples, num_features]. + params: a dictionary of parameters. Expecting 'gamma_weight'. + scope: optional name scope for summary tags. + Raises: + ValueError: if `method_name` is not recognized. + """ + weight = dsn_loss_coefficient(params) * params['gamma_weight'] + method = getattr(losses, method_name) + method(source_samples, target_samples, weight, scope) + + +def add_reconstruction_loss(recon_loss_name, images, recons, weight, domain): + """Adds a reconstruction loss. + + Args: + recon_loss_name: The name of the reconstruction loss. + images: A `Tensor` of size [batch_size, height, width, 3]. + recons: A `Tensor` whose size matches `images`. + weight: A scalar coefficient for the loss. + domain: The name of the domain being reconstructed. + + Raises: + ValueError: If `recon_loss_name` is not recognized. + """ + if recon_loss_name == 'sum_of_pairwise_squares': + loss_fn = tf.contrib.losses.mean_pairwise_squared_error + elif recon_loss_name == 'sum_of_squares': + loss_fn = tf.contrib.losses.mean_squared_error + else: + raise ValueError('recon_loss_name value [%s] not recognized.' % + recon_loss_name) + + loss = loss_fn(recons, images, weight) + assert_op = tf.Assert(tf.is_finite(loss), [loss]) + with tf.control_dependencies([assert_op]): + tf.summary.scalar('losses/%s Recon Loss' % domain, loss) + + +def add_autoencoders(source_data, source_shared, target_data, target_shared, + params): + """Adds the encoders/decoders for our domain separation model w/ incoherence. + + Args: + source_data: images from the source domain, a tensor of size + [batch_size, height, width, channels] + source_shared: a tensor with first dimension batch_size + target_data: images from the target domain, a tensor of size + [batch_size, height, width, channels] + target_shared: a tensor with first dimension batch_size + params: A dictionary of parameters. 
Expecting 'layers_to_regularize', + 'beta_weight', 'alpha_weight', 'recon_loss_name', 'decoder_name', + 'encoder_name', 'weight_decay' + """ + + def normalize_images(images): + images -= tf.reduce_min(images) + return images / tf.reduce_max(images) + + def concat_operation(shared_repr, private_repr): + return shared_repr + private_repr + + mu = dsn_loss_coefficient(params) + + # The layer to concatenate the networks at. + concat_layer = params['layers_to_regularize'] + + # The coefficient for modulating the private/shared difference loss. + difference_loss_weight = params['beta_weight'] * mu + + # The reconstruction weight. + recon_loss_weight = params['alpha_weight'] * mu + + # The reconstruction loss to use. + recon_loss_name = params['recon_loss_name'] + + # The decoder/encoder to use. + decoder_name = params['decoder_name'] + encoder_name = params['encoder_name'] + + _, height, width, _ = source_data.get_shape().as_list() + code_size = source_shared.get_shape().as_list()[-1] + weight_decay = params['weight_decay'] + + encoder_fn = getattr(models, encoder_name) + # Target Auto-encoding. + with tf.variable_scope('source_encoder'): + source_endpoints = encoder_fn( + source_data, code_size, weight_decay=weight_decay) + + with tf.variable_scope('target_encoder'): + target_endpoints = encoder_fn( + target_data, code_size, weight_decay=weight_decay) + + decoder_fn = getattr(models, decoder_name) + + decoder = partial( + decoder_fn, + height=height, + width=width, + channels=source_data.get_shape().as_list()[-1], + weight_decay=weight_decay) + + # Source Auto-encoding. 
+ source_private = source_endpoints[concat_layer] + target_private = target_endpoints[concat_layer] + with tf.variable_scope('decoder'): + source_recons = decoder(concat_operation(source_shared, source_private)) + + with tf.variable_scope('decoder', reuse=True): + source_private_recons = decoder( + concat_operation(tf.zeros_like(source_private), source_private)) + source_shared_recons = decoder( + concat_operation(source_shared, tf.zeros_like(source_shared))) + + with tf.variable_scope('decoder', reuse=True): + target_recons = decoder(concat_operation(target_shared, target_private)) + target_shared_recons = decoder( + concat_operation(target_shared, tf.zeros_like(target_shared))) + target_private_recons = decoder( + concat_operation(tf.zeros_like(target_private), target_private)) + + losses.difference_loss( + source_private, + source_shared, + weight=difference_loss_weight, + name='Source') + losses.difference_loss( + target_private, + target_shared, + weight=difference_loss_weight, + name='Target') + + add_reconstruction_loss(recon_loss_name, source_data, source_recons, + recon_loss_weight, 'source') + add_reconstruction_loss(recon_loss_name, target_data, target_recons, + recon_loss_weight, 'target') + + # Add summaries + source_reconstructions = tf.concat( + axis=2, + values=map(normalize_images, [ + source_data, source_recons, source_shared_recons, + source_private_recons + ])) + target_reconstructions = tf.concat( + axis=2, + values=map(normalize_images, [ + target_data, target_recons, target_shared_recons, + target_private_recons + ])) + tf.summary.image( + 'Source Images:Recons:RGB', + source_reconstructions[:, :, :, :3], + max_outputs=10) + tf.summary.image( + 'Target Images:Recons:RGB', + target_reconstructions[:, :, :, :3], + max_outputs=10) + + if source_reconstructions.get_shape().as_list()[3] == 4: + tf.summary.image( + 'Source Images:Recons:Depth', + source_reconstructions[:, :, :, 3:4], + max_outputs=10) + tf.summary.image( + 'Target 
Images:Recons:Depth', + target_reconstructions[:, :, :, 3:4], + max_outputs=10) + + +def add_task_loss(source_images, source_labels, basic_tower, params): + """Adds a classification and/or pose estimation loss to the model. + + Args: + source_images: images from the source domain, a tensor of size + [batch_size, height, width, channels] + source_labels: a dictionary of label tensors from the source domain; must + contain 'classes' (one-hot) and may contain 'quaternions'. + basic_tower: a function that creates the single tower of the model. + params: A dictionary of parameters. Expecting 'weight_decay', 'pose_weight'. + Returns: + The source endpoints. + + Raises: + RuntimeError: if basic tower does not support pose estimation. + """ + with tf.variable_scope('towers'): + source_logits, source_endpoints = basic_tower( + source_images, weight_decay=params['weight_decay'], prefix='Source') + + if 'quaternions' in source_labels: # We have pose estimation as well + if 'quaternion_pred' not in source_endpoints: + raise RuntimeError('Please use a model for estimation e.g.
pose_mini') + + loss = losses.log_quaternion_loss(source_labels['quaternions'], + source_endpoints['quaternion_pred'], + params) + + assert_op = tf.Assert(tf.is_finite(loss), [loss]) + with tf.control_dependencies([assert_op]): + quaternion_loss = loss + tf.summary.histogram('log_quaternion_loss_hist', quaternion_loss) + slim.losses.add_loss(quaternion_loss * params['pose_weight']) + tf.summary.scalar('losses/quaternion_loss', quaternion_loss) + + classification_loss = tf.losses.softmax_cross_entropy( + source_labels['classes'], source_logits) + + tf.summary.scalar('losses/classification_loss', classification_loss) + return source_endpoints diff --git a/domain_adaptation/domain_separation/dsn_eval.py b/domain_adaptation/domain_separation/dsn_eval.py new file mode 100644 index 0000000000000000000000000000000000000000..c52c9845ec767fdf49ec6aa4613778250db82902 --- /dev/null +++ b/domain_adaptation/domain_separation/dsn_eval.py @@ -0,0 +1,160 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +# pylint: disable=line-too-long +"""Evaluation for Domain Separation Networks (DSNs).""" +# pylint: enable=line-too-long +import math + +import numpy as np +import tensorflow as tf + +from domain_adaptation.datasets import dataset_factory +from domain_adaptation.domain_separation import losses +from domain_adaptation.domain_separation import models + +slim = tf.contrib.slim + +FLAGS = tf.app.flags.FLAGS + +tf.app.flags.DEFINE_integer('batch_size', 32, + 'The number of images in each batch.') + +tf.app.flags.DEFINE_string('master', '', + 'BNS name of the TensorFlow master to use.') + +tf.app.flags.DEFINE_string('checkpoint_dir', '/tmp/da/', + 'Directory where the model was written to.') + +tf.app.flags.DEFINE_string( + 'eval_dir', '/tmp/da/', + 'Directory where we should write the tf summaries to.') + +tf.app.flags.DEFINE_string('dataset_dir', None, + 'The directory where the dataset files are stored.') + +tf.app.flags.DEFINE_string('dataset', 'mnist_m', + 'Which dataset to test on: "mnist", "mnist_m".') + +tf.app.flags.DEFINE_string('split', 'valid', + 'Which portion to test on: "valid", "test".') + +tf.app.flags.DEFINE_integer('num_examples', 1000, 'Number of test examples.') + +tf.app.flags.DEFINE_string('basic_tower', 'dann_mnist', + 'The basic tower building block.') + +tf.app.flags.DEFINE_bool('enable_precision_recall', False, + 'If True, precision and recall for each class will ' + 'be added to the metrics.') + +tf.app.flags.DEFINE_bool('use_logging', False, 'Debugging messages.') + + +def quaternion_metric(predictions, labels): + params = {'batch_size': FLAGS.batch_size, 'use_logging': False} + logcost = losses.log_quaternion_loss_batch(predictions, labels, params) + return slim.metrics.streaming_mean(logcost) + + +def angle_diff(true_q, pred_q): + angles = 2 * ( + 180.0 / + np.pi) * np.arccos(np.abs(np.sum(np.multiply(pred_q, true_q), axis=1))) + return angles + + +def 
provide_batch_fn(): + """ The provide_batch function to use. """ + return dataset_factory.provide_batch + + +def main(_): + g = tf.Graph() + with g.as_default(): + # Load the data. + images, labels = provide_batch_fn()( + FLAGS.dataset, FLAGS.split, FLAGS.dataset_dir, 4, FLAGS.batch_size, 4) + + num_classes = labels['classes'].get_shape().as_list()[1] + + tf.summary.image('eval_images', images, max_outputs=3) + + # Define the model: + with tf.variable_scope('towers'): + basic_tower = getattr(models, FLAGS.basic_tower) + predictions, endpoints = basic_tower( + images, + num_classes=num_classes, + is_training=False, + batch_norm_params=None) + metric_names_to_values = {} + + # Define the metrics: + if 'quaternions' in labels: # Also have to evaluate pose estimation! + quaternion_loss = quaternion_metric(labels['quaternions'], + endpoints['quaternion_pred']) + + angle_errors, = tf.py_func( + angle_diff, [labels['quaternions'], endpoints['quaternion_pred']], + [tf.float32]) + + metric_names_to_values[ + 'Angular mean error'] = slim.metrics.streaming_mean(angle_errors) + metric_names_to_values['Quaternion Loss'] = quaternion_loss + + accuracy = tf.contrib.metrics.streaming_accuracy( + tf.argmax(predictions, 1), tf.argmax(labels['classes'], 1)) + + predictions = tf.argmax(predictions, 1) + labels = tf.argmax(labels['classes'], 1) + metric_names_to_values['Accuracy'] = accuracy + + if FLAGS.enable_precision_recall: + for i in xrange(num_classes): + index_map = tf.one_hot(i, depth=num_classes) + name = 'PR/Precision_{}'.format(i) + metric_names_to_values[name] = slim.metrics.streaming_precision( + tf.gather(index_map, predictions), tf.gather(index_map, labels)) + name = 'PR/Recall_{}'.format(i) + metric_names_to_values[name] = slim.metrics.streaming_recall( + tf.gather(index_map, predictions), tf.gather(index_map, labels)) + + names_to_values, names_to_updates = slim.metrics.aggregate_metric_map( + metric_names_to_values) + + # Create the summary ops such that they also 
print out to std output: + summary_ops = [] + for metric_name, metric_value in names_to_values.iteritems(): + op = tf.summary.scalar(metric_name, metric_value) + op = tf.Print(op, [metric_value], metric_name) + summary_ops.append(op) + + # This ensures that we make a single pass over all of the data. + num_batches = math.ceil(FLAGS.num_examples / float(FLAGS.batch_size)) + + # Setup the global step. + slim.get_or_create_global_step() + slim.evaluation.evaluation_loop( + FLAGS.master, + checkpoint_dir=FLAGS.checkpoint_dir, + logdir=FLAGS.eval_dir, + num_evals=num_batches, + eval_op=names_to_updates.values(), + summary_op=tf.summary.merge(summary_ops)) + + +if __name__ == '__main__': + tf.app.run() diff --git a/domain_adaptation/domain_separation/dsn_test.py b/domain_adaptation/domain_separation/dsn_test.py new file mode 100644 index 0000000000000000000000000000000000000000..3d687398a9b9356455f739417bc96ddb2ca5ad40 --- /dev/null +++ b/domain_adaptation/domain_separation/dsn_test.py @@ -0,0 +1,157 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +"""Tests for DSN model assembly functions.""" + +import numpy as np +import tensorflow as tf + +import dsn + + +class HelperFunctionsTest(tf.test.TestCase): + + def testBasicDomainSeparationStartPoint(self): + with self.test_session() as sess: + # Test for when global_step < domain_separation_startpoint + step = tf.contrib.slim.get_or_create_global_step() + sess.run(tf.global_variables_initializer()) # global_step = 0 + params = {'domain_separation_startpoint': 2} + weight = dsn.dsn_loss_coefficient(params) + weight_np = sess.run(weight) + self.assertAlmostEqual(weight_np, 1e-10) + + step_op = tf.assign_add(step, 1) + step_np = sess.run(step_op) # global_step = 1 + weight = dsn.dsn_loss_coefficient(params) + weight_np = sess.run(weight) + self.assertAlmostEqual(weight_np, 1e-10) + + # Test for when global_step >= domain_separation_startpoint + step_np = sess.run(step_op) # global_step = 2 + tf.logging.info(step_np) + weight = dsn.dsn_loss_coefficient(params) + weight_np = sess.run(weight) + self.assertAlmostEqual(weight_np, 1.0) + + +class DsnModelAssemblyTest(tf.test.TestCase): + + def _testBuildDefaultModel(self): + images = tf.to_float(np.random.rand(32, 28, 28, 1)) + labels = {} + labels['classes'] = tf.one_hot( + tf.to_int32(np.random.randint(0, 9, (32))), 10) + + params = { + 'use_separation': True, + 'layers_to_regularize': 'fc3', + 'weight_decay': 0.0, + 'ps_tasks': 1, + 'domain_separation_startpoint': 1, + 'alpha_weight': 1, + 'beta_weight': 1, + 'gamma_weight': 1, + 'recon_loss_name': 'sum_of_squares', + 'decoder_name': 'small_decoder', + 'encoder_name': 'default_encoder', + } + return images, labels, params + + def testBuildModelDann(self): + images, labels, params = self._testBuildDefaultModel() + + with self.test_session(): + dsn.create_model(images, labels, + tf.cast(tf.ones([32,]), tf.bool), images, labels, + 'dann_loss', params, 'dann_mnist') + loss_tensors = 
tf.contrib.losses.get_losses() + self.assertEqual(len(loss_tensors), 6) + + def testBuildModelDannSumOfPairwiseSquares(self): + images, labels, params = self._testBuildDefaultModel() + + with self.test_session(): + dsn.create_model(images, labels, + tf.cast(tf.ones([32,]), tf.bool), images, labels, + 'dann_loss', params, 'dann_mnist') + loss_tensors = tf.contrib.losses.get_losses() + self.assertEqual(len(loss_tensors), 6) + + def testBuildModelDannMultiPSTasks(self): + images, labels, params = self._testBuildDefaultModel() + params['ps_tasks'] = 10 + with self.test_session(): + dsn.create_model(images, labels, + tf.cast(tf.ones([32,]), tf.bool), images, labels, + 'dann_loss', params, 'dann_mnist') + loss_tensors = tf.contrib.losses.get_losses() + self.assertEqual(len(loss_tensors), 6) + + def testBuildModelMmd(self): + images, labels, params = self._testBuildDefaultModel() + + with self.test_session(): + dsn.create_model(images, labels, + tf.cast(tf.ones([32,]), tf.bool), images, labels, + 'mmd_loss', params, 'dann_mnist') + loss_tensors = tf.contrib.losses.get_losses() + self.assertEqual(len(loss_tensors), 6) + + def testBuildModelCorr(self): + images, labels, params = self._testBuildDefaultModel() + + with self.test_session(): + dsn.create_model(images, labels, + tf.cast(tf.ones([32,]), tf.bool), images, labels, + 'correlation_loss', params, 'dann_mnist') + loss_tensors = tf.contrib.losses.get_losses() + self.assertEqual(len(loss_tensors), 6) + + def testBuildModelNoDomainAdaptation(self): + images, labels, params = self._testBuildDefaultModel() + params['use_separation'] = False + with self.test_session(): + dsn.create_model(images, labels, + tf.cast(tf.ones([32,]), tf.bool), images, labels, 'none', + params, 'dann_mnist') + loss_tensors = tf.contrib.losses.get_losses() + self.assertEqual(len(loss_tensors), 1) + self.assertEqual(len(tf.contrib.losses.get_regularization_losses()), 0) + + def testBuildModelNoAdaptationWeightDecay(self): + images, labels, params = 
self._testBuildDefaultModel() + params['use_separation'] = False + params['weight_decay'] = 1e-5 + with self.test_session(): + dsn.create_model(images, labels, + tf.cast(tf.ones([32,]), tf.bool), images, labels, 'none', + params, 'dann_mnist') + loss_tensors = tf.contrib.losses.get_losses() + self.assertEqual(len(loss_tensors), 1) + self.assertTrue(len(tf.contrib.losses.get_regularization_losses()) >= 1) + + def testBuildModelNoSeparation(self): + images, labels, params = self._testBuildDefaultModel() + params['use_separation'] = False + with self.test_session(): + dsn.create_model(images, labels, + tf.cast(tf.ones([32,]), tf.bool), images, labels, + 'dann_loss', params, 'dann_mnist') + loss_tensors = tf.contrib.losses.get_losses() + self.assertEqual(len(loss_tensors), 2) + + +if __name__ == '__main__': + tf.test.main() diff --git a/domain_adaptation/domain_separation/dsn_train.py b/domain_adaptation/domain_separation/dsn_train.py new file mode 100644 index 0000000000000000000000000000000000000000..5e364ad3037b041125a3523370b3b040478f0d8e --- /dev/null +++ b/domain_adaptation/domain_separation/dsn_train.py @@ -0,0 +1,278 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Training for Domain Separation Networks (DSNs).""" +from __future__ import division + +import tensorflow as tf + +from domain_adaptation.datasets import dataset_factory +import dsn + +slim = tf.contrib.slim +FLAGS = tf.app.flags.FLAGS + +tf.app.flags.DEFINE_integer('batch_size', 32, + 'The number of images in each batch.') + +tf.app.flags.DEFINE_string('source_dataset', 'pose_synthetic', + 'Source dataset to train on.') + +tf.app.flags.DEFINE_string('target_dataset', 'pose_real', + 'Target dataset to train on.') + +tf.app.flags.DEFINE_string('target_labeled_dataset', 'none', + 'Target dataset to train on.') + +tf.app.flags.DEFINE_string('dataset_dir', None, + 'The directory where the dataset files are stored.') + +tf.app.flags.DEFINE_string('master', '', + 'BNS name of the TensorFlow master to use.') + +tf.app.flags.DEFINE_string('train_log_dir', '/tmp/da/', + 'Directory where to write event logs.') + +tf.app.flags.DEFINE_string( + 'layers_to_regularize', 'fc3', + 'Comma-separated list of layer names to use MMD regularization on.') + +tf.app.flags.DEFINE_float('learning_rate', .01, 'The learning rate') + +tf.app.flags.DEFINE_float('alpha_weight', 1e-6, + 'The coefficient for scaling the reconstruction ' + 'loss.') + +tf.app.flags.DEFINE_float( + 'beta_weight', 1e-6, + 'The coefficient for scaling the private/shared difference loss.') + +tf.app.flags.DEFINE_float( + 'gamma_weight', 1e-6, + 'The coefficient for scaling the shared encoding similarity loss.') + +tf.app.flags.DEFINE_float('pose_weight', 0.125, + 'The coefficient for scaling the pose loss.') + +tf.app.flags.DEFINE_float( + 'weight_decay', 1e-6, + 'The coefficient for the L2 regularization applied for all weights.') + +tf.app.flags.DEFINE_integer( + 'save_summaries_secs', 60, + 'The frequency with which summaries are saved, in seconds.') + +tf.app.flags.DEFINE_integer( + 'save_interval_secs', 60, + 'The frequency with 
which the model is saved, in seconds.') + +tf.app.flags.DEFINE_integer( + 'max_number_of_steps', None, + 'The maximum number of gradient steps. Use None to train indefinitely.') + +tf.app.flags.DEFINE_integer( + 'domain_separation_startpoint', 1, + 'The global step to add the domain separation losses.') + +tf.app.flags.DEFINE_integer( + 'bipartite_assignment_top_k', 3, + 'The number of top-k matches to use in bipartite matching adaptation.') + +tf.app.flags.DEFINE_float('decay_rate', 0.95, 'Learning rate decay factor.') + +tf.app.flags.DEFINE_integer('decay_steps', 20000, 'Learning rate decay steps.') + +tf.app.flags.DEFINE_float('momentum', 0.9, 'The momentum value.') + +tf.app.flags.DEFINE_bool('use_separation', False, + 'Use our domain separation model.') + +tf.app.flags.DEFINE_bool('use_logging', False, 'Debugging messages.') + +tf.app.flags.DEFINE_integer( + 'ps_tasks', 0, + 'The number of parameter servers. If the value is 0, then the parameters ' + 'are handled locally by the worker.') + +tf.app.flags.DEFINE_integer( + 'num_readers', 4, + 'The number of parallel readers that read data from the dataset.') + +tf.app.flags.DEFINE_integer('num_preprocessing_threads', 4, + 'The number of threads used to create the batches.') + +tf.app.flags.DEFINE_integer( + 'task', 0, + 'The Task ID. 
This value is used when training with multiple workers to ' + 'identify each worker.') + +tf.app.flags.DEFINE_string('decoder_name', 'small_decoder', + 'The decoder to use.') +tf.app.flags.DEFINE_string('encoder_name', 'default_encoder', + 'The encoder to use.') + +################################################################################ +# Flags that control the architecture and losses +################################################################################ +tf.app.flags.DEFINE_string( + 'similarity_loss', 'grl', + 'The method to use for encouraging the common encoder codes to be ' + 'similar, one of "grl", "mmd", "corr".') + +tf.app.flags.DEFINE_string('recon_loss_name', 'sum_of_pairwise_squares', + 'The name of the reconstruction loss.') + +tf.app.flags.DEFINE_string('basic_tower', 'pose_mini', + 'The basic tower building block.') + +def provide_batch_fn(): + """ The provide_batch function to use. """ + return dataset_factory.provide_batch + +def main(_): + model_params = { + 'use_separation': FLAGS.use_separation, + 'domain_separation_startpoint': FLAGS.domain_separation_startpoint, + 'layers_to_regularize': FLAGS.layers_to_regularize, + 'alpha_weight': FLAGS.alpha_weight, + 'beta_weight': FLAGS.beta_weight, + 'gamma_weight': FLAGS.gamma_weight, + 'pose_weight': FLAGS.pose_weight, + 'recon_loss_name': FLAGS.recon_loss_name, + 'decoder_name': FLAGS.decoder_name, + 'encoder_name': FLAGS.encoder_name, + 'weight_decay': FLAGS.weight_decay, + 'batch_size': FLAGS.batch_size, + 'use_logging': FLAGS.use_logging, + 'ps_tasks': FLAGS.ps_tasks, + 'task': FLAGS.task, + } + g = tf.Graph() + with g.as_default(): + with tf.device(tf.train.replica_device_setter(FLAGS.ps_tasks)): + # Load the data. 
+ source_images, source_labels = provide_batch_fn()( + FLAGS.source_dataset, 'train', FLAGS.dataset_dir, FLAGS.num_readers, + FLAGS.batch_size, FLAGS.num_preprocessing_threads) + target_images, target_labels = provide_batch_fn()( + FLAGS.target_dataset, 'train', FLAGS.dataset_dir, FLAGS.num_readers, + FLAGS.batch_size, FLAGS.num_preprocessing_threads) + + # In the unsupervised case all the samples in the labeled + # domain are from the source domain. + domain_selection_mask = tf.fill((source_images.get_shape().as_list()[0],), + True) + + # When using the semisupervised model we include labeled target data in + # the source labelled data. + if FLAGS.target_labeled_dataset != 'none': + # 1000 is the maximum number of labelled target samples that exists in + # the datasets. + target_semi_images, target_semi_labels = provide_batch_fn()( + FLAGS.target_labeled_dataset, 'train', FLAGS.batch_size) + + # Calculate the proportion of source domain samples in the semi- + # supervised setting, so that the proportion is set accordingly in the + # batches. 
+ proportion = float(source_labels['num_train_samples']) / ( + source_labels['num_train_samples'] + + target_semi_labels['num_train_samples']) + + rnd_tensor = tf.random_uniform( + (target_semi_images.get_shape().as_list()[0],)) + + domain_selection_mask = rnd_tensor < proportion + source_images = tf.where(domain_selection_mask, source_images, + target_semi_images) + source_class_labels = tf.where(domain_selection_mask, + source_labels['classes'], + target_semi_labels['classes']) + + if 'quaternions' in source_labels: + source_pose_labels = tf.where(domain_selection_mask, + source_labels['quaternions'], + target_semi_labels['quaternions']) + (source_images, source_class_labels, source_pose_labels, + domain_selection_mask) = tf.train.shuffle_batch( + [ + source_images, source_class_labels, source_pose_labels, + domain_selection_mask + ], + FLAGS.batch_size, + 50000, + 5000, + num_threads=1, + enqueue_many=True) + + else: + (source_images, source_class_labels, + domain_selection_mask) = tf.train.shuffle_batch( + [source_images, source_class_labels, domain_selection_mask], + FLAGS.batch_size, + 50000, + 5000, + num_threads=1, + enqueue_many=True) + source_labels = {} + source_labels['classes'] = source_class_labels + if 'quaternions' in source_labels: + source_labels['quaternions'] = source_pose_labels + + slim.get_or_create_global_step() + tf.summary.image('source_images', source_images, max_outputs=3) + tf.summary.image('target_images', target_images, max_outputs=3) + + dsn.create_model( + source_images, + source_labels, + domain_selection_mask, + target_images, + target_labels, + FLAGS.similarity_loss, + model_params, + basic_tower_name=FLAGS.basic_tower) + + # Configure the optimization scheme: + learning_rate = tf.train.exponential_decay( + FLAGS.learning_rate, + slim.get_or_create_global_step(), + FLAGS.decay_steps, + FLAGS.decay_rate, + staircase=True, + name='learning_rate') + + tf.summary.scalar('learning_rate', learning_rate) + 
tf.summary.scalar('total_loss', tf.losses.get_total_loss()) + + opt = tf.train.MomentumOptimizer(learning_rate, FLAGS.momentum) + tf.logging.set_verbosity(tf.logging.INFO) + # Run training. + loss_tensor = slim.learning.create_train_op( + slim.losses.get_total_loss(), + opt, + summarize_gradients=True, + colocate_gradients_with_ops=True) + slim.learning.train( + train_op=loss_tensor, + logdir=FLAGS.train_log_dir, + master=FLAGS.master, + is_chief=FLAGS.task == 0, + number_of_steps=FLAGS.max_number_of_steps, + save_summaries_secs=FLAGS.save_summaries_secs, + save_interval_secs=FLAGS.save_interval_secs) + + +if __name__ == '__main__': + tf.app.run() diff --git a/domain_adaptation/domain_separation/grl_op_grads.py b/domain_adaptation/domain_separation/grl_op_grads.py new file mode 100644 index 0000000000000000000000000000000000000000..fcd85ba2b5e7912bffe646a73558af8184812ea6 --- /dev/null +++ b/domain_adaptation/domain_separation/grl_op_grads.py @@ -0,0 +1,34 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Gradients for operators defined in grl_ops.py.""" +import tensorflow as tf + + +@tf.RegisterGradient("GradientReversal") +def _GradientReversalGrad(_, grad): + """The gradients for `gradient_reversal`. 
+ + Args: + _: The `gradient_reversal` `Operation` that we are differentiating, + which we can use to find the inputs and outputs of the original op. + grad: Gradient with respect to the output of the `gradient_reversal` op. + + Returns: + Gradient with respect to the input of `gradient_reversal`, which is simply + the negative of `grad`. + + """ + return tf.negative(grad) diff --git a/domain_adaptation/domain_separation/grl_op_kernels.cc b/domain_adaptation/domain_separation/grl_op_kernels.cc new file mode 100644 index 0000000000000000000000000000000000000000..ba30128f11e9e88c702d3a80593d930519f346fe --- /dev/null +++ b/domain_adaptation/domain_separation/grl_op_kernels.cc @@ -0,0 +1,47 @@ +/* Copyright 2016 The TensorFlow Authors All Rights Reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +==============================================================================*/ + +// This file contains the implementations of the ops registered in +// grl_ops.cc. + +#include "tensorflow/core/framework/op_kernel.h" +#include "tensorflow/core/framework/types.pb.h" + +namespace tensorflow { + +// The gradient reversal op is used in domain adversarial training. It behaves +// as the identity op during forward propagation, and multiplies the incoming +// gradient by -1 during backward propagation.
+class GradientReversalOp : public OpKernel { + public: + explicit GradientReversalOp(OpKernelConstruction* context) + : OpKernel(context) {} + + // Gradient reversal op behaves as the identity op during forward + // propagation. Compute() function copied from the IdentityOp::Compute() + // function here: third_party/tensorflow/core/kernels/identity_op.h. + void Compute(OpKernelContext* context) override { + if (IsRefType(context->input_dtype(0))) { + context->forward_ref_input_to_ref_output(0, 0); + } else { + context->set_output(0, context->input(0)); + } + } +}; + +REGISTER_KERNEL_BUILDER(Name("GradientReversal").Device(DEVICE_CPU), + GradientReversalOp); + +} // namespace tensorflow diff --git a/domain_adaptation/domain_separation/grl_op_shapes.py b/domain_adaptation/domain_separation/grl_op_shapes.py new file mode 100644 index 0000000000000000000000000000000000000000..52773c680af265beca9125e48bf68152b8a34e56 --- /dev/null +++ b/domain_adaptation/domain_separation/grl_op_shapes.py @@ -0,0 +1,16 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Shape inference for operators defined in grl_ops.cc.""" diff --git a/domain_adaptation/domain_separation/grl_ops.cc b/domain_adaptation/domain_separation/grl_ops.cc new file mode 100644 index 0000000000000000000000000000000000000000..d441c2b484215605db65a043be6cfa0ab90da2c3 --- /dev/null +++ b/domain_adaptation/domain_separation/grl_ops.cc @@ -0,0 +1,36 @@ +/* Copyright 2016 The TensorFlow Authors All Rights Reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +==============================================================================*/ + +// Contains custom ops. + +#include "tensorflow/core/framework/common_shape_fns.h" +#include "tensorflow/core/framework/op.h" + +namespace tensorflow { + +// This custom op is used by adversarial training. +REGISTER_OP("GradientReversal") + .Input("input: float") + .Output("output: float") + .SetShapeFn(shape_inference::UnchangedShape) + .Doc(R"doc( +This op copies the input to the output during forward propagation, and +negates the gradient during backward propagation. + +input: Tensor. +output: Tensor, copied from input. 
+)doc"); + +} // namespace tensorflow diff --git a/domain_adaptation/domain_separation/grl_ops.py b/domain_adaptation/domain_separation/grl_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..50447247b10caf3e41f3c0fb1c6f943dd3d9de6e --- /dev/null +++ b/domain_adaptation/domain_separation/grl_ops.py @@ -0,0 +1,28 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""GradientReversal op Python library.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os.path + +import tensorflow as tf + +tf.logging.info(tf.resource_loader.get_data_files_path()) +_grl_ops_module = tf.load_op_library( + os.path.join(tf.resource_loader.get_data_files_path(), + '_grl_ops.so')) +gradient_reversal = _grl_ops_module.gradient_reversal diff --git a/domain_adaptation/domain_separation/grl_ops_test.py b/domain_adaptation/domain_separation/grl_ops_test.py new file mode 100644 index 0000000000000000000000000000000000000000..b431a6c02b60ade92a653d2ee8108c0586c70fbb --- /dev/null +++ b/domain_adaptation/domain_separation/grl_ops_test.py @@ -0,0 +1,73 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for grl_ops.""" + +#from models.domain_adaptation.domain_separation import grl_op_grads # pylint: disable=unused-import +#from models.domain_adaptation.domain_separation import grl_op_shapes # pylint: disable=unused-import +import tensorflow as tf + +import grl_op_grads +import grl_ops + +FLAGS = tf.app.flags.FLAGS + + +class GRLOpsTest(tf.test.TestCase): + + def testGradientReversalOp(self): + with tf.Graph().as_default(): + with self.test_session(): + # Test that in forward prop, gradient reversal op acts as the + # identity operation. + examples = tf.constant([5.0, 4.0, 3.0, 2.0, 1.0]) + output = grl_ops.gradient_reversal(examples) + expected_output = examples + self.assertAllEqual(output.eval(), expected_output.eval()) + + # Test that shape inference works as expected. + self.assertAllEqual(output.get_shape(), expected_output.get_shape()) + + # Test that in backward prop, gradient reversal op multiplies + # gradients by -1. + examples = tf.constant([[1.0]]) + w = tf.get_variable(name='w', shape=[1, 1]) + b = tf.get_variable(name='b', shape=[1]) + init_op = tf.global_variables_initializer() + init_op.run() + features = tf.nn.xw_plus_b(examples, w, b) + # Construct two outputs: features layer passes directly to output1, but + # features layer passes through a gradient reversal layer before + # reaching output2. 
+ output1 = features + output2 = grl_ops.gradient_reversal(features) + gold = tf.constant([1.0]) + loss1 = gold - output1 + loss2 = gold - output2 + opt = tf.train.GradientDescentOptimizer(learning_rate=0.01) + grads_and_vars_1 = opt.compute_gradients(loss1, + tf.trainable_variables()) + grads_and_vars_2 = opt.compute_gradients(loss2, + tf.trainable_variables()) + self.assertAllEqual(len(grads_and_vars_1), len(grads_and_vars_2)) + for i in range(len(grads_and_vars_1)): + g1 = grads_and_vars_1[i][0] + g2 = grads_and_vars_2[i][0] + # Verify that gradients of loss1 are the negative of gradients of + # loss2. + self.assertAllEqual(tf.negative(g1).eval(), g2.eval()) + +if __name__ == '__main__': + tf.test.main() diff --git a/domain_adaptation/domain_separation/losses.py b/domain_adaptation/domain_separation/losses.py new file mode 100644 index 0000000000000000000000000000000000000000..0d882340de10e4dd64d44f9357e8bfc5b1dd4712 --- /dev/null +++ b/domain_adaptation/domain_separation/losses.py @@ -0,0 +1,290 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Domain Adaptation Loss Functions. + +The following domain adaptation loss functions are defined: + +- Maximum Mean Discrepancy (MMD). + Relevant paper: + Gretton, Arthur, et al., + "A kernel two-sample test." 
+ The Journal of Machine Learning Research, 2012 + +- Correlation Loss on a batch. +""" +from functools import partial +import tensorflow as tf + +import grl_op_grads # pylint: disable=unused-import +import grl_op_shapes # pylint: disable=unused-import +import grl_ops +import utils +slim = tf.contrib.slim + + +################################################################################ +# SIMILARITY LOSS +################################################################################ +def maximum_mean_discrepancy(x, y, kernel=utils.gaussian_kernel_matrix): + r"""Computes the Maximum Mean Discrepancy (MMD) of two samples: x and y. + + Maximum Mean Discrepancy (MMD) is a distance-measure between the samples of + the distributions of x and y. Here we use the kernel two sample estimate + using the empirical mean of the two distributions. + + MMD^2(P, Q) = || \E{\phi(x)} - \E{\phi(y)} ||^2 + = \E{ K(x, x) } + \E{ K(y, y) } - 2 \E{ K(x, y) }, + + where K = <\phi(x), \phi(y)>, + is the desired kernel function, in this case a radial basis kernel. + + Args: + x: a tensor of shape [num_samples, num_features] + y: a tensor of shape [num_samples, num_features] + kernel: a function which computes the kernel in MMD. Defaults to the + GaussianKernelMatrix. + + Returns: + a scalar denoting the squared maximum mean discrepancy loss. + """ + with tf.name_scope('MaximumMeanDiscrepancy'): + # \E{ K(x, x) } + \E{ K(y, y) } - 2 \E{ K(x, y) } + cost = tf.reduce_mean(kernel(x, x)) + cost += tf.reduce_mean(kernel(y, y)) + cost -= 2 * tf.reduce_mean(kernel(x, y)) + + # We do not allow the loss to become negative. + cost = tf.where(cost > 0, cost, 0, name='value') + return cost + + +def mmd_loss(source_samples, target_samples, weight, scope=None): + """Adds a similarity loss term, the MMD between two representations. + + This Maximum Mean Discrepancy (MMD) loss is calculated with a number of + different Gaussian kernels. 
+ + Args: + source_samples: a tensor of shape [num_samples, num_features]. + target_samples: a tensor of shape [num_samples, num_features]. + weight: the weight of the MMD loss. + scope: optional name scope for summary tags. + + Returns: + a scalar tensor representing the MMD loss value. + """ + sigmas = [ + 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 5, 10, 15, 20, 25, 30, 35, 100, + 1e3, 1e4, 1e5, 1e6 + ] + gaussian_kernel = partial( + utils.gaussian_kernel_matrix, sigmas=tf.constant(sigmas)) + + loss_value = maximum_mean_discrepancy( + source_samples, target_samples, kernel=gaussian_kernel) + loss_value = tf.maximum(1e-4, loss_value) * weight + assert_op = tf.Assert(tf.is_finite(loss_value), [loss_value]) + with tf.control_dependencies([assert_op]): + tag = 'MMD Loss' + if scope: + tag = scope + tag + tf.summary.scalar(tag, loss_value) + tf.losses.add_loss(loss_value) + + return loss_value + + +def correlation_loss(source_samples, target_samples, weight, scope=None): + """Adds a similarity loss term, the correlation between two representations. + + Args: + source_samples: a tensor of shape [num_samples, num_features] + target_samples: a tensor of shape [num_samples, num_features] + weight: a scalar weight for the loss. + scope: optional name scope for summary tags. + + Returns: + a scalar tensor representing the correlation loss value. 
+ """ + with tf.name_scope('corr_loss'): + source_samples -= tf.reduce_mean(source_samples, 0) + target_samples -= tf.reduce_mean(target_samples, 0) + + source_samples = tf.nn.l2_normalize(source_samples, 1) + target_samples = tf.nn.l2_normalize(target_samples, 1) + + source_cov = tf.matmul(tf.transpose(source_samples), source_samples) + target_cov = tf.matmul(tf.transpose(target_samples), target_samples) + + corr_loss = tf.reduce_mean(tf.square(source_cov - target_cov)) * weight + + assert_op = tf.Assert(tf.is_finite(corr_loss), [corr_loss]) + with tf.control_dependencies([assert_op]): + tag = 'Correlation Loss' + if scope: + tag = scope + tag + tf.summary.scalar(tag, corr_loss) + tf.losses.add_loss(corr_loss) + + return corr_loss + + +def dann_loss(source_samples, target_samples, weight, scope=None): + """Adds the domain adversarial (DANN) loss. + + Args: + source_samples: a tensor of shape [num_samples, num_features]. + target_samples: a tensor of shape [num_samples, num_features]. + weight: the weight of the loss. + scope: optional name scope for summary tags. + + Returns: + a scalar tensor representing the domain adversarial loss value. + """ + with tf.variable_scope('dann'): + batch_size = tf.shape(source_samples)[0] + samples = tf.concat(axis=0, values=[source_samples, target_samples]) + samples = slim.flatten(samples) + + domain_selection_mask = tf.concat( + axis=0, values=[tf.zeros((batch_size, 1)), tf.ones((batch_size, 1))]) + + # Perform the gradient reversal and be careful with the shape. 
+ grl = grl_ops.gradient_reversal(samples) + grl = tf.reshape(grl, (-1, samples.get_shape().as_list()[1])) + + grl = slim.fully_connected(grl, 100, scope='fc1') + logits = slim.fully_connected(grl, 1, activation_fn=None, scope='fc2') + + domain_predictions = tf.sigmoid(logits) + + domain_loss = tf.losses.log_loss( + domain_selection_mask, domain_predictions, weights=weight) + + domain_accuracy = utils.accuracy( + tf.round(domain_predictions), domain_selection_mask) + + assert_op = tf.Assert(tf.is_finite(domain_loss), [domain_loss]) + with tf.control_dependencies([assert_op]): + tag_loss = 'losses/domain_loss' + tag_accuracy = 'losses/domain_accuracy' + if scope: + tag_loss = scope + tag_loss + tag_accuracy = scope + tag_accuracy + + tf.summary.scalar(tag_loss, domain_loss) + tf.summary.scalar(tag_accuracy, domain_accuracy) + + return domain_loss + + +################################################################################ +# DIFFERENCE LOSS +################################################################################ +def difference_loss(private_samples, shared_samples, weight=1.0, name=''): + """Adds the difference loss between the private and shared representations. + + Args: + private_samples: a tensor of shape [num_samples, num_features]. + shared_samples: a tensor of shape [num_samples, num_features]. + weight: the weight of the incoherence loss. + name: the name of the tf summary. 
+ """ + private_samples -= tf.reduce_mean(private_samples, 0) + shared_samples -= tf.reduce_mean(shared_samples, 0) + + private_samples = tf.nn.l2_normalize(private_samples, 1) + shared_samples = tf.nn.l2_normalize(shared_samples, 1) + + correlation_matrix = tf.matmul( + private_samples, shared_samples, transpose_a=True) + + cost = tf.reduce_mean(tf.square(correlation_matrix)) * weight + cost = tf.where(cost > 0, cost, 0, name='value') + + tf.summary.scalar('losses/Difference Loss {}'.format(name), + cost) + assert_op = tf.Assert(tf.is_finite(cost), [cost]) + with tf.control_dependencies([assert_op]): + tf.losses.add_loss(cost) + + +################################################################################ +# TASK LOSS +################################################################################ +def log_quaternion_loss_batch(predictions, labels, params): + """A helper function to compute the error between quaternions. + + Args: + predictions: A Tensor of size [batch_size, 4]. + labels: A Tensor of size [batch_size, 4]. + params: A dictionary of parameters. Expecting 'use_logging', 'batch_size'. + + Returns: + A Tensor of size [batch_size], denoting the error between the quaternions. 
+ """ + use_logging = params['use_logging'] + assertions = [] + if use_logging: + assertions.append( + tf.Assert( + tf.reduce_all( + tf.less( + tf.abs(tf.reduce_sum(tf.square(predictions), [1]) - 1), + 1e-4)), + ['The l2 norm of each prediction quaternion vector should be 1.'])) + assertions.append( + tf.Assert( + tf.reduce_all( + tf.less( + tf.abs(tf.reduce_sum(tf.square(labels), [1]) - 1), 1e-4)), + ['The l2 norm of each label quaternion vector should be 1.'])) + + with tf.control_dependencies(assertions): + product = tf.multiply(predictions, labels) + internal_dot_products = tf.reduce_sum(product, [1]) + + if use_logging: + internal_dot_products = tf.Print( + internal_dot_products, + [internal_dot_products, tf.shape(internal_dot_products)], + 'internal_dot_products:') + + logcost = tf.log(1e-4 + 1 - tf.abs(internal_dot_products)) + return logcost + + +def log_quaternion_loss(predictions, labels, params): + """A helper function to compute the mean error between batches of quaternions. + + The caller is expected to add the loss to the graph. + + Args: + predictions: A Tensor of size [batch_size, 4]. + labels: A Tensor of size [batch_size, 4]. + params: A dictionary of parameters. Expecting 'use_logging', 'batch_size'. + + Returns: + A Tensor of size 1, denoting the mean error between batches of quaternions. 
+ """ + use_logging = params['use_logging'] + logcost = log_quaternion_loss_batch(predictions, labels, params) + logcost = tf.reduce_sum(logcost, [0]) + batch_size = params['batch_size'] + logcost = tf.multiply(logcost, 1.0 / batch_size, name='log_quaternion_loss') + if use_logging: + logcost = tf.Print( + logcost, [logcost], '[logcost]', name='log_quaternion_loss_print') + return logcost diff --git a/domain_adaptation/domain_separation/losses_test.py b/domain_adaptation/domain_separation/losses_test.py new file mode 100644 index 0000000000000000000000000000000000000000..46e50301be56f5977adcb3fb00587f076934b785 --- /dev/null +++ b/domain_adaptation/domain_separation/losses_test.py @@ -0,0 +1,110 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +"""Tests for DSN losses.""" +from functools import partial + +import numpy as np +import tensorflow as tf + +import losses +import utils + + +def MaximumMeanDiscrepancySlow(x, y, sigmas): + num_samples = x.get_shape().as_list()[0] + + def AverageGaussianKernel(x, y, sigmas): + result = 0 + for sigma in sigmas: + dist = tf.reduce_sum(tf.square(x - y)) + result += tf.exp((-1.0 / (2.0 * sigma)) * dist) + return result / num_samples**2 + + total = 0 + + for i in range(num_samples): + for j in range(num_samples): + total += AverageGaussianKernel(x[i, :], x[j, :], sigmas) + total += AverageGaussianKernel(y[i, :], y[j, :], sigmas) + total += -2 * AverageGaussianKernel(x[i, :], y[j, :], sigmas) + + return total + + +class LogQuaternionLossTest(tf.test.TestCase): + + def test_log_quaternion_loss_batch(self): + with self.test_session(): + predictions = tf.random_uniform((10, 4), seed=1) + predictions = tf.nn.l2_normalize(predictions, 1) + labels = tf.random_uniform((10, 4), seed=1) + labels = tf.nn.l2_normalize(labels, 1) + params = {'batch_size': 10, 'use_logging': False} + x = losses.log_quaternion_loss_batch(predictions, labels, params) + self.assertTrue(((10,) == tf.shape(x).eval()).all()) + + +class MaximumMeanDiscrepancyTest(tf.test.TestCase): + + def test_mmd_name(self): + with self.test_session(): + x = tf.random_uniform((2, 3), seed=1) + kernel = partial(utils.gaussian_kernel_matrix, sigmas=tf.constant([1.])) + loss = losses.maximum_mean_discrepancy(x, x, kernel) + + self.assertEquals(loss.op.name, 'MaximumMeanDiscrepancy/value') + + def test_mmd_is_zero_when_inputs_are_same(self): + with self.test_session(): + x = tf.random_uniform((2, 3), seed=1) + kernel = partial(utils.gaussian_kernel_matrix, sigmas=tf.constant([1.])) + self.assertEquals(0, losses.maximum_mean_discrepancy(x, x, kernel).eval()) + + def test_fast_mmd_is_similar_to_slow_mmd(self): + with self.test_session(): + x = 
tf.constant(np.random.normal(size=(2, 3)), tf.float32) + y = tf.constant(np.random.rand(2, 3), tf.float32) + + cost_old = MaximumMeanDiscrepancySlow(x, y, [1.]).eval() + kernel = partial(utils.gaussian_kernel_matrix, sigmas=tf.constant([1.])) + cost_new = losses.maximum_mean_discrepancy(x, y, kernel).eval() + + self.assertAlmostEqual(cost_old, cost_new, delta=1e-5) + + def test_multiple_sigmas(self): + with self.test_session(): + x = tf.constant(np.random.normal(size=(2, 3)), tf.float32) + y = tf.constant(np.random.rand(2, 3), tf.float32) + + sigmas = tf.constant([2., 5., 10, 20, 30]) + kernel = partial(utils.gaussian_kernel_matrix, sigmas=sigmas) + cost_old = MaximumMeanDiscrepancySlow(x, y, [2., 5., 10, 20, 30]).eval() + cost_new = losses.maximum_mean_discrepancy(x, y, kernel=kernel).eval() + + self.assertAlmostEqual(cost_old, cost_new, delta=1e-5) + + def test_mmd_is_zero_when_distributions_are_same(self): + + with self.test_session(): + x = tf.random_uniform((1000, 10), seed=1) + y = tf.random_uniform((1000, 10), seed=3) + + kernel = partial(utils.gaussian_kernel_matrix, sigmas=tf.constant([100.])) + loss = losses.maximum_mean_discrepancy(x, y, kernel=kernel).eval() + + self.assertAlmostEqual(0, loss, delta=1e-4) + +if __name__ == '__main__': + tf.test.main() diff --git a/domain_adaptation/domain_separation/models.py b/domain_adaptation/domain_separation/models.py new file mode 100644 index 0000000000000000000000000000000000000000..04ccaf82eb9b31a6ea78871204c7df70eca3fbfd --- /dev/null +++ b/domain_adaptation/domain_separation/models.py @@ -0,0 +1,443 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Contains different architectures for the different DSN parts. + +We define here the modules that can be used in the different parts of the DSN +model. +- shared encoder (dsn_cropped_linemod, dann_xxxx) +- private encoder (default_encoder) +- decoder (large_decoder, gtsrb_decoder, small_decoder) +""" +import tensorflow as tf + +#from models.domain_adaptation.domain_separation +import utils + +slim = tf.contrib.slim + + +def default_batch_norm_params(is_training=False): + """Returns default batch normalization parameters for DSNs. + + Args: + is_training: whether or not the model is training. + + Returns: + a dictionary that maps batch norm parameter names (strings) to values. + """ + return { + # Decay for the moving averages. + 'decay': 0.5, + # epsilon to prevent 0s in variance. + 'epsilon': 0.001, + 'is_training': is_training + } + + +################################################################################ +# PRIVATE ENCODERS +################################################################################ +def default_encoder(images, code_size, batch_norm_params=None, + weight_decay=0.0): + """Encodes the given images to codes of the given size. + + Args: + images: a tensor of size [batch_size, height, width, 1]. + code_size: the number of hidden units in the code layer of the classifier. + batch_norm_params: a dictionary that maps batch norm parameter names to + values. + weight_decay: the value for the weight decay coefficient. 
+ + Returns: + end_points: a dictionary of layer tensors; the code is under key 'fc3'. + """ + end_points = {} + with slim.arg_scope( + [slim.conv2d, slim.fully_connected], + weights_regularizer=slim.l2_regularizer(weight_decay), + activation_fn=tf.nn.relu, + normalizer_fn=slim.batch_norm, + normalizer_params=batch_norm_params): + with slim.arg_scope([slim.conv2d], kernel_size=[5, 5], padding='SAME'): + net = slim.conv2d(images, 32, scope='conv1') + net = slim.max_pool2d(net, [2, 2], 2, scope='pool1') + net = slim.conv2d(net, 64, scope='conv2') + net = slim.max_pool2d(net, [2, 2], 2, scope='pool2') + + net = slim.flatten(net) + end_points['flatten'] = net + net = slim.fully_connected(net, code_size, scope='fc1') + end_points['fc3'] = net + return end_points + + +################################################################################ +# DECODERS +################################################################################ +def large_decoder(codes, + height, + width, + channels, + batch_norm_params=None, + weight_decay=0.0): + """Decodes the codes to a fixed output size. + + Args: + codes: a tensor of size [batch_size, code_size]. + height: the height of the output images. + width: the width of the output images. + channels: the number of the output channels. + batch_norm_params: a dictionary that maps batch norm parameter names to + values. + weight_decay: the value for the weight decay coefficient. + + Returns: + recons: the reconstruction tensor of shape + [batch_size, height, width, channels]. 
+ """ + with slim.arg_scope( + [slim.conv2d, slim.fully_connected], + weights_regularizer=slim.l2_regularizer(weight_decay), + activation_fn=tf.nn.relu, + normalizer_fn=slim.batch_norm, + normalizer_params=batch_norm_params): + net = slim.fully_connected(codes, 600, scope='fc1') + batch_size = net.get_shape().as_list()[0] + net = tf.reshape(net, [batch_size, 10, 10, 6]) + + net = slim.conv2d(net, 32, [5, 5], scope='conv1_1') + + net = tf.image.resize_nearest_neighbor(net, (16, 16)) + + net = slim.conv2d(net, 32, [5, 5], scope='conv2_1') + + net = tf.image.resize_nearest_neighbor(net, (32, 32)) + + net = slim.conv2d(net, 32, [5, 5], scope='conv3_2') + + output_size = [height, width] + net = tf.image.resize_nearest_neighbor(net, output_size) + + with slim.arg_scope([slim.conv2d], kernel_size=[3, 3]): + net = slim.conv2d(net, channels, activation_fn=None, scope='conv4_1') + + return net + + +def gtsrb_decoder(codes, + height, + width, + channels, + batch_norm_params=None, + weight_decay=0.0): + """Decodes the codes to a fixed output size. This decoder is specific to GTSRB. + + Args: + codes: a tensor of size [batch_size, 100]. + height: the height of the output images. + width: the width of the output images. + channels: the number of the output channels. + batch_norm_params: a dictionary that maps batch norm parameter names to + values. + weight_decay: the value for the weight decay coefficient. + + Returns: + recons: the reconstruction tensor of shape + [batch_size, height, width, channels]. + + Raises: + ValueError: When the input code size is not 100. 
+ """ + batch_size, code_size = codes.get_shape().as_list() + if code_size != 100: + raise ValueError('The code size used as an input to the GTSRB decoder is ' + 'expected to be 100.') + + with slim.arg_scope( + [slim.conv2d, slim.fully_connected], + weights_regularizer=slim.l2_regularizer(weight_decay), + activation_fn=tf.nn.relu, + normalizer_fn=slim.batch_norm, + normalizer_params=batch_norm_params): + net = codes + net = tf.reshape(net, [batch_size, 10, 10, 1]) + net = slim.conv2d(net, 32, [3, 3], scope='conv1_1') + + # First upsampling 20x20 + net = tf.image.resize_nearest_neighbor(net, [20, 20]) + + net = slim.conv2d(net, 32, [3, 3], scope='conv2_1') + + output_size = [height, width] + # Final upsampling 40 x 40 + net = tf.image.resize_nearest_neighbor(net, output_size) + + with slim.arg_scope([slim.conv2d], kernel_size=[3, 3]): + net = slim.conv2d(net, 16, scope='conv3_1') + net = slim.conv2d(net, channels, activation_fn=None, scope='conv3_2') + + return net + + +def small_decoder(codes, + height, + width, + channels, + batch_norm_params=None, + weight_decay=0.0): + """Decodes the codes to a fixed output size. + + Args: + codes: a tensor of size [batch_size, code_size]. + height: the height of the output images. + width: the width of the output images. + channels: the number of the output channels. + batch_norm_params: a dictionary that maps batch norm parameter names to + values. + weight_decay: the value for the weight decay coefficient. + + Returns: + recons: the reconstruction tensor of shape + [batch_size, height, width, channels]. 
+ """ + with slim.arg_scope( + [slim.conv2d, slim.fully_connected], + weights_regularizer=slim.l2_regularizer(weight_decay), + activation_fn=tf.nn.relu, + normalizer_fn=slim.batch_norm, + normalizer_params=batch_norm_params): + net = slim.fully_connected(codes, 300, scope='fc1') + batch_size = net.get_shape().as_list()[0] + net = tf.reshape(net, [batch_size, 10, 10, 3]) + + net = slim.conv2d(net, 16, [3, 3], scope='conv1_1') + net = slim.conv2d(net, 16, [3, 3], scope='conv1_2') + + output_size = [height, width] + net = tf.image.resize_nearest_neighbor(net, output_size) + + with slim.arg_scope([slim.conv2d], kernel_size=[3, 3]): + net = slim.conv2d(net, 16, scope='conv2_1') + net = slim.conv2d(net, channels, activation_fn=None, scope='conv2_2') + + return net + + +################################################################################ +# SHARED ENCODERS +################################################################################ +def dann_mnist(images, + weight_decay=0.0, + prefix='model', + num_classes=10, + **kwargs): + """Creates the convolutional MNIST model. + + Note that this model implements the architecture for MNIST proposed in: + Y. Ganin et al., Domain-Adversarial Training of Neural Networks (DANN), + JMLR 2015 + + Args: + images: the MNIST digits, a tensor of size [batch_size, 28, 28, 1]. + weight_decay: the value for the weight decay coefficient. + prefix: name of the model to use when prefixing tags. + num_classes: the number of output classes to use. + **kwargs: Placeholder for keyword arguments used by other shared encoders. + + Returns: + the output logits, a tensor of size [batch_size, num_classes]. + a dictionary with key/values the layer names and tensors. 
+ """ + end_points = {} + + with slim.arg_scope( + [slim.conv2d, slim.fully_connected], + weights_regularizer=slim.l2_regularizer(weight_decay), + activation_fn=tf.nn.relu,): + with slim.arg_scope([slim.conv2d], padding='SAME'): + end_points['conv1'] = slim.conv2d(images, 32, [5, 5], scope='conv1') + end_points['pool1'] = slim.max_pool2d( + end_points['conv1'], [2, 2], 2, scope='pool1') + end_points['conv2'] = slim.conv2d( + end_points['pool1'], 48, [5, 5], scope='conv2') + end_points['pool2'] = slim.max_pool2d( + end_points['conv2'], [2, 2], 2, scope='pool2') + end_points['fc3'] = slim.fully_connected( + slim.flatten(end_points['pool2']), 100, scope='fc3') + end_points['fc4'] = slim.fully_connected( + slim.flatten(end_points['fc3']), 100, scope='fc4') + + logits = slim.fully_connected( + end_points['fc4'], num_classes, activation_fn=None, scope='fc5') + + return logits, end_points + + +def dann_svhn(images, + weight_decay=0.0, + prefix='model', + num_classes=10, + **kwargs): + """Creates the convolutional SVHN model. + + Note that this model implements the architecture for SVHN proposed in: + Y. Ganin et al., Domain-Adversarial Training of Neural Networks (DANN), + JMLR 2015 + + Args: + images: the SVHN digits, a tensor of size [batch_size, 32, 32, 3]. + weight_decay: the value for the weight decay coefficient. + prefix: name of the model to use when prefixing tags. + num_classes: the number of output classes to use. + **kwargs: Placeholder for keyword arguments used by other shared encoders. + + Returns: + the output logits, a tensor of size [batch_size, num_classes]. + a dictionary with key/values the layer names and tensors. 
+ """ + + end_points = {} + + with slim.arg_scope( + [slim.conv2d, slim.fully_connected], + weights_regularizer=slim.l2_regularizer(weight_decay), + activation_fn=tf.nn.relu,): + with slim.arg_scope([slim.conv2d], padding='SAME'): + + end_points['conv1'] = slim.conv2d(images, 64, [5, 5], scope='conv1') + end_points['pool1'] = slim.max_pool2d( + end_points['conv1'], [3, 3], 2, scope='pool1') + end_points['conv2'] = slim.conv2d( + end_points['pool1'], 64, [5, 5], scope='conv2') + end_points['pool2'] = slim.max_pool2d( + end_points['conv2'], [3, 3], 2, scope='pool2') + end_points['conv3'] = slim.conv2d( + end_points['pool2'], 128, [5, 5], scope='conv3') + + end_points['fc3'] = slim.fully_connected( + slim.flatten(end_points['conv3']), 3072, scope='fc3') + end_points['fc4'] = slim.fully_connected( + slim.flatten(end_points['fc3']), 2048, scope='fc4') + + logits = slim.fully_connected( + end_points['fc4'], num_classes, activation_fn=None, scope='fc5') + + return logits, end_points + + +def dann_gtsrb(images, + weight_decay=0.0, + prefix='model', + num_classes=43, + **kwargs): + """Creates the convolutional GTSRB model. + + Note that this model implements the architecture for GTSRB proposed in: + Y. Ganin et al., Domain-Adversarial Training of Neural Networks (DANN), + JMLR 2015 + + Args: + images: the GTSRB images, a tensor of size [batch_size, 40, 40, 3]. + weight_decay: the value for the weight decay coefficient. + prefix: name of the model to use when prefixing tags. + num_classes: the number of output classes to use. + **kwargs: Placeholder for keyword arguments used by other shared encoders. + + Returns: + the output logits, a tensor of size [batch_size, num_classes]. + a dictionary with key/values the layer names and tensors. 
+ """ + + end_points = {} + + with slim.arg_scope( + [slim.conv2d, slim.fully_connected], + weights_regularizer=slim.l2_regularizer(weight_decay), + activation_fn=tf.nn.relu,): + with slim.arg_scope([slim.conv2d], padding='SAME'): + + end_points['conv1'] = slim.conv2d(images, 96, [5, 5], scope='conv1') + end_points['pool1'] = slim.max_pool2d( + end_points['conv1'], [2, 2], 2, scope='pool1') + end_points['conv2'] = slim.conv2d( + end_points['pool1'], 144, [3, 3], scope='conv2') + end_points['pool2'] = slim.max_pool2d( + end_points['conv2'], [2, 2], 2, scope='pool2') + end_points['conv3'] = slim.conv2d( + end_points['pool2'], 256, [5, 5], scope='conv3') + end_points['pool3'] = slim.max_pool2d( + end_points['conv3'], [2, 2], 2, scope='pool3') + + end_points['fc3'] = slim.fully_connected( + slim.flatten(end_points['pool3']), 512, scope='fc3') + + logits = slim.fully_connected( + end_points['fc3'], num_classes, activation_fn=None, scope='fc4') + + return logits, end_points + + +def dsn_cropped_linemod(images, + weight_decay=0.0, + prefix='model', + num_classes=11, + batch_norm_params=None, + is_training=False): + """Creates the convolutional pose estimation model for Cropped Linemod. + + Args: + images: the Cropped Linemod samples, a tensor of size + [batch_size, 64, 64, 4]. + weight_decay: the value for the weight decay coefficient. + prefix: name of the model to use when prefixing tags. + num_classes: the number of output classes to use. + batch_norm_params: a dictionary that maps batch norm parameter names to + values. + is_training: specifies whether or not we're currently training the model. + This variable will determine the behaviour of the dropout layer. + + Returns: + the output logits, a tensor of size [batch_size, num_classes]. + a dictionary with key/values the layer names and tensors. 
+ """ + + end_points = {} + + tf.summary.image('{}/input_images'.format(prefix), images) + with slim.arg_scope( + [slim.conv2d, slim.fully_connected], + weights_regularizer=slim.l2_regularizer(weight_decay), + activation_fn=tf.nn.relu, + normalizer_fn=slim.batch_norm if batch_norm_params else None, + normalizer_params=batch_norm_params): + with slim.arg_scope([slim.conv2d], padding='SAME'): + end_points['conv1'] = slim.conv2d(images, 32, [5, 5], scope='conv1') + end_points['pool1'] = slim.max_pool2d( + end_points['conv1'], [2, 2], 2, scope='pool1') + end_points['conv2'] = slim.conv2d( + end_points['pool1'], 64, [5, 5], scope='conv2') + end_points['pool2'] = slim.max_pool2d( + end_points['conv2'], [2, 2], 2, scope='pool2') + net = slim.flatten(end_points['pool2']) + end_points['fc3'] = slim.fully_connected(net, 128, scope='fc3') + net = slim.dropout( + end_points['fc3'], 0.5, is_training=is_training, scope='dropout') + + with tf.variable_scope('quaternion_prediction'): + predicted_quaternion = slim.fully_connected( + net, 4, activation_fn=tf.nn.tanh) + predicted_quaternion = tf.nn.l2_normalize(predicted_quaternion, 1) + logits = slim.fully_connected( + net, num_classes, activation_fn=None, scope='fc4') + end_points['quaternion_pred'] = predicted_quaternion + + return logits, end_points diff --git a/domain_adaptation/domain_separation/models_test.py b/domain_adaptation/domain_separation/models_test.py new file mode 100644 index 0000000000000000000000000000000000000000..69d1a27259022569cc5865e49dd6bba5675d834f --- /dev/null +++ b/domain_adaptation/domain_separation/models_test.py @@ -0,0 +1,167 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Tests for DSN components.""" + +import numpy as np +import tensorflow as tf + +#from models.domain_adaptation.domain_separation +import models + + +class SharedEncodersTest(tf.test.TestCase): + + def _testSharedEncoder(self, + input_shape=[5, 28, 28, 1], + model=models.dann_mnist, + is_training=True): + images = tf.to_float(np.random.rand(*input_shape)) + + with self.test_session() as sess: + logits, _ = model(images) + sess.run(tf.global_variables_initializer()) + logits_np = sess.run(logits) + return logits_np + + def testBuildGRLMnistModel(self): + logits = self._testSharedEncoder(model=getattr(models, + 'dann_mnist')) + self.assertEqual(logits.shape, (5, 10)) + self.assertTrue(np.any(logits)) + + def testBuildGRLSvhnModel(self): + logits = self._testSharedEncoder(model=getattr(models, + 'dann_svhn')) + self.assertEqual(logits.shape, (5, 10)) + self.assertTrue(np.any(logits)) + + def testBuildGRLGtsrbModel(self): + logits = self._testSharedEncoder([5, 40, 40, 3], + getattr(models, 'dann_gtsrb')) + self.assertEqual(logits.shape, (5, 43)) + self.assertTrue(np.any(logits)) + + def testBuildPoseModel(self): + logits = self._testSharedEncoder([5, 64, 64, 4], + getattr(models, 'dsn_cropped_linemod')) + self.assertEqual(logits.shape, (5, 11)) + self.assertTrue(np.any(logits)) + + def testBuildPoseModelWithBatchNorm(self): + images = tf.to_float(np.random.rand(10, 64, 64, 4)) + + with self.test_session() as sess: + logits, _ = getattr(models, 'dsn_cropped_linemod')( + images, 
batch_norm_params=models.default_batch_norm_params(True)) + sess.run(tf.global_variables_initializer()) + logits_np = sess.run(logits) + self.assertEqual(logits_np.shape, (10, 11)) + self.assertTrue(np.any(logits_np)) + + +class EncoderTest(tf.test.TestCase): + + def _testEncoder(self, batch_norm_params=None, channels=1): + images = tf.to_float(np.random.rand(10, 28, 28, channels)) + + with self.test_session() as sess: + end_points = models.default_encoder( + images, 128, batch_norm_params=batch_norm_params) + sess.run(tf.global_variables_initializer()) + private_code = sess.run(end_points['fc3']) + self.assertEqual(private_code.shape, (10, 128)) + self.assertTrue(np.any(private_code)) + self.assertTrue(np.all(np.isfinite(private_code))) + + def testEncoder(self): + self._testEncoder() + + def testEncoderMultiChannel(self): + self._testEncoder(None, 4) + + def testEncoderIsTrainingBatchNorm(self): + self._testEncoder(models.default_batch_norm_params(True)) + + def testEncoderBatchNorm(self): + self._testEncoder(models.default_batch_norm_params(False)) + + +class DecoderTest(tf.test.TestCase): + + def _testDecoder(self, + height=64, + width=64, + channels=4, + batch_norm_params=None, + decoder=models.small_decoder): + codes = tf.to_float(np.random.rand(32, 100)) + + with self.test_session() as sess: + output = decoder( + codes, + height=height, + width=width, + channels=channels, + batch_norm_params=batch_norm_params) + sess.run(tf.global_variables_initializer()) + output_np = sess.run(output) + self.assertEqual(output_np.shape, (32, height, width, channels)) + self.assertTrue(np.any(output_np)) + self.assertTrue(np.all(np.isfinite(output_np))) + + def testSmallDecoder(self): + self._testDecoder(28, 28, 4, None, getattr(models, 'small_decoder')) + + def testSmallDecoderThreeChannels(self): + self._testDecoder(28, 28, 3) + + def testSmallDecoderBatchNorm(self): + self._testDecoder(28, 28, 4, models.default_batch_norm_params(False)) + + def 
testSmallDecoderIsTrainingBatchNorm(self): + self._testDecoder(28, 28, 4, models.default_batch_norm_params(True)) + + def testLargeDecoder(self): + self._testDecoder(32, 32, 4, None, getattr(models, 'large_decoder')) + + def testLargeDecoderThreeChannels(self): + self._testDecoder(32, 32, 3, None, getattr(models, 'large_decoder')) + + def testLargeDecoderBatchNorm(self): + self._testDecoder(32, 32, 4, + models.default_batch_norm_params(False), + getattr(models, 'large_decoder')) + + def testLargeDecoderIsTrainingBatchNorm(self): + self._testDecoder(32, 32, 4, + models.default_batch_norm_params(True), + getattr(models, 'large_decoder')) + + def testGtsrbDecoder(self): + self._testDecoder(40, 40, 3, None, getattr(models, 'large_decoder')) + + def testGtsrbDecoderBatchNorm(self): + self._testDecoder(40, 40, 4, + models.default_batch_norm_params(False), + getattr(models, 'gtsrb_decoder')) + + def testGtsrbDecoderIsTrainingBatchNorm(self): + self._testDecoder(40, 40, 4, + models.default_batch_norm_params(True), + getattr(models, 'gtsrb_decoder')) + + +if __name__ == '__main__': + tf.test.main() diff --git a/domain_adaptation/domain_separation/utils.py b/domain_adaptation/domain_separation/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..e144ee86120bd58eb06b710fb35f3f58b5a05343 --- /dev/null +++ b/domain_adaptation/domain_separation/utils.py @@ -0,0 +1,183 @@ +# Copyright 2016 The TensorFlow Authors All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Auxiliary functions for domain adaptation related losses. +""" +import math +import tensorflow as tf + + +def create_summaries(end_points, prefix='', max_images=3, use_op_name=False): + """Creates a tf summary per endpoint. + + If the endpoint is a 4 dimensional tensor it displays it as an image + otherwise if it is a two dimensional one it creates a histogram summary. + + Args: + end_points: a dictionary of name, tf tensor pairs. + prefix: an optional string to prefix the summary with. + max_images: the maximum number of images to display per summary. + use_op_name: Use the op name as opposed to the shorter end_points key. + """ + for layer_name in end_points: + if use_op_name: + name = end_points[layer_name].op.name + else: + name = layer_name + if len(end_points[layer_name].get_shape().as_list()) == 4: + # if it's an actual image do not attempt to reshape it + if end_points[layer_name].get_shape().as_list()[-1] == 1 or end_points[ + layer_name].get_shape().as_list()[-1] == 3: + visualization_image = end_points[layer_name] + else: + visualization_image = reshape_feature_maps(end_points[layer_name]) + tf.summary.image( + '{}/{}'.format(prefix, name), + visualization_image, + max_outputs=max_images) + elif len(end_points[layer_name].get_shape().as_list()) == 3: + images = tf.expand_dims(end_points[layer_name], 3) + tf.summary.image( + '{}/{}'.format(prefix, name), + images, + max_outputs=max_images) + elif len(end_points[layer_name].get_shape().as_list()) == 2: + tf.summary.histogram('{}/{}'.format(prefix, name), end_points[layer_name]) + + +def reshape_feature_maps(features_tensor): + """Reshape activations for tf.summary.image visualization. + + Arguments: + features_tensor: a tensor of activations with a square number of feature + maps, eg 4, 9, 16, etc. 
+ Returns: + A composite image with all the feature maps that can be passed as an + argument to tf.summary.image. + """ + assert len(features_tensor.get_shape().as_list()) == 4 + num_filters = features_tensor.get_shape().as_list()[-1] + assert num_filters > 0 + num_filters_sqrt = math.sqrt(num_filters) + assert num_filters_sqrt.is_integer( + ), 'Number of filters should be a square number but got {}'.format( + num_filters) + num_filters_sqrt = int(num_filters_sqrt) + conv_summary = tf.unstack(features_tensor, axis=3) + conv_one_row = tf.concat(axis=2, values=conv_summary[0:num_filters_sqrt]) + ind = 1 + conv_final = conv_one_row + for ind in range(1, num_filters_sqrt): + conv_one_row = tf.concat(axis=2, + values=conv_summary[ + ind * num_filters_sqrt + 0:ind * num_filters_sqrt + num_filters_sqrt]) + conv_final = tf.concat( + axis=1, values=[tf.squeeze(conv_final), tf.squeeze(conv_one_row)]) + conv_final = tf.expand_dims(conv_final, -1) + return conv_final + + +def accuracy(predictions, labels): + """Calculates the classification accuracy. + + Args: + predictions: the predicted values, a tensor whose size matches 'labels'. + labels: the ground truth values, a tensor of any size. + + Returns: + a tensor whose value on evaluation returns the total accuracy. + """ + return tf.reduce_mean(tf.cast(tf.equal(predictions, labels), tf.float32)) + + +def compute_upsample_values(input_tensor, upsample_height, upsample_width): + """Compute values for an upsampling op (ops.BatchCropAndResize). + + Args: + input_tensor: image tensor with shape [batch, height, width, in_channels] + upsample_height: integer + upsample_width: integer + + Returns: + grid_centers: tensor with shape [batch, 2] + crop_sizes: tensor with shape [batch, 2] + output_height: integer + output_width: integer + """ + batch, input_height, input_width, _ = input_tensor.shape + + height_half = input_height / 2. + width_half = input_width / 2. 
+ grid_centers = tf.constant(batch * [[height_half, width_half]]) + crop_sizes = tf.constant(batch * [[input_height, input_width]]) + output_height = input_height * upsample_height + output_width = input_width * upsample_width + + return grid_centers, tf.to_float(crop_sizes), output_height, output_width + + +def compute_pairwise_distances(x, y): + """Computes the squared pairwise Euclidean distances between x and y. + + Args: + x: a tensor of shape [num_x_samples, num_features] + y: a tensor of shape [num_y_samples, num_features] + + Returns: + a distance matrix of dimensions [num_x_samples, num_y_samples]. + + Raises: + ValueError: if the inputs do not match the specified dimensions. + """ + + if not len(x.get_shape()) == len(y.get_shape()) == 2: + raise ValueError('Both inputs should be matrices.') + + if x.get_shape().as_list()[1] != y.get_shape().as_list()[1]: + raise ValueError('The number of features should be the same.') + + norm = lambda x: tf.reduce_sum(tf.square(x), 1) + + # By making the `inner' dimensions of the two matrices equal to 1 using + # broadcasting then we are essentially subtracting every pair of rows + # of x and y. + # x will be num_samples x num_features x 1, + # and y will be 1 x num_features x num_samples (after broadcasting). + # After the subtraction we will get a + # num_x_samples x num_features x num_y_samples matrix. + # The resulting dist will be of shape num_y_samples x num_x_samples, + # and thus we need to transpose it again. + return tf.transpose(norm(tf.expand_dims(x, 2) - tf.transpose(y))) + + +def gaussian_kernel_matrix(x, y, sigmas): + r"""Computes a Gaussian Radial Basis Kernel between the samples of x and y. + + We create a sum of multiple gaussian kernels each having a width sigma_i. + + Args: + x: a tensor of shape [num_samples, num_features] + y: a tensor of shape [num_samples, num_features] + sigmas: a tensor of floats which denote the widths of each of the + gaussians in the kernel. 
+ Returns: + A tensor of shape [num_samples{x}, num_samples{y}] with the RBF kernel. + """ + beta = 1. / (2. * (tf.expand_dims(sigmas, 1))) + + dist = compute_pairwise_distances(x, y) + + s = tf.matmul(beta, tf.reshape(dist, (1, -1))) + + return tf.reshape(tf.reduce_sum(tf.exp(-s), 0), tf.shape(dist)) diff --git a/im2txt/README.md b/im2txt/README.md index 1cf151b966c26b77195d69b8f341d2c42572906b..223cf91fba52643e77116b4f6149bbd2bb8ba1c3 100644 --- a/im2txt/README.md +++ b/im2txt/README.md @@ -12,9 +12,9 @@ Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. Full text available at: http://arxiv.org/abs/1609.06647 ## Contact -***Author:*** Chris Shallue (shallue@google.com). +***Author:*** Chris Shallue -***Pull requests and issues:*** @cshallue. +***Pull requests and issues:*** @cshallue ## Contents * [Model Overview](#model-overview) @@ -37,9 +37,7 @@ Full text available at: http://arxiv.org/abs/1609.06647 The *Show and Tell* model is a deep neural network that learns how to describe the content of images. For example: -
![Example captions](g3doc/example_captions.jpg) -
### Architecture @@ -66,9 +64,7 @@ learned during training. The following diagram illustrates the model architecture. -
![Show and Tell Architecture](g3doc/show_and_tell_architecture.png) -
In this diagram, \{*s*0, *s*1, ..., *s**N*-1\} are the words of the caption and \{*w**e**s*0, @@ -118,12 +114,12 @@ approximately 10 times slower. ### Install Required Packages First ensure that you have installed the following required packages: -* **Bazel** ([instructions](http://bazel.io/docs/install.html)). -* **TensorFlow** r0.12 or greater ([instructions](https://www.tensorflow.org/versions/master/get_started/os_setup.html)). -* **NumPy** ([instructions](http://www.scipy.org/install.html)). +* **Bazel** ([instructions](http://bazel.io/docs/install.html)) +* **TensorFlow** 1.0 or greater ([instructions](https://www.tensorflow.org/install/)) +* **NumPy** ([instructions](http://www.scipy.org/install.html)) * **Natural Language Toolkit (NLTK)**: - * First install NLTK ([instructions](http://www.nltk.org/install.html)). - * Then install the NLTK data ([instructions](http://www.nltk.org/data.html)). + * First install NLTK ([instructions](http://www.nltk.org/install.html)) + * Then install the NLTK data ([instructions](http://www.nltk.org/data.html)) ### Prepare the Training Data @@ -137,8 +133,7 @@ Each caption is a list of words. During preprocessing, a dictionary is created that assigns each word in the vocabulary to an integer-valued id. Each caption is encoded as a list of integer word ids in the `tf.SequenceExample` protos. -We have provided a script to download and preprocess the [MSCOCO] -(http://mscoco.org/) image captioning data set into this format. Downloading +We have provided a script to download and preprocess the [MSCOCO](http://mscoco.org/) image captioning data set into this format. Downloading and preprocessing the data may take several hours depending on your network and computer speed. Please be patient. @@ -150,7 +145,8 @@ available space for storing the downloaded and processed data. MSCOCO_DIR="${HOME}/im2txt/data/mscoco" # Build the preprocessing script. 
-bazel build im2txt/download_and_preprocess_mscoco +cd tensorflow-models/im2txt +bazel build //im2txt:download_and_preprocess_mscoco # Run the preprocessing script. bazel-bin/im2txt/download_and_preprocess_mscoco "${MSCOCO_DIR}" @@ -216,7 +212,8 @@ INCEPTION_CHECKPOINT="${HOME}/im2txt/data/inception_v3.ckpt" MODEL_DIR="${HOME}/im2txt/model" # Build the model. -bazel build -c opt im2txt/... +cd tensorflow-models/im2txt +bazel build -c opt //im2txt/... # Run the training script. bazel-bin/im2txt/train \ @@ -266,8 +263,7 @@ tensorboard --logdir="${MODEL_DIR}" ### Fine Tune the Inception v3 Model Your model will already be able to generate reasonable captions after the first -phase of training. Try it out! (See [Generating Captions] -(#generating-captions)). +phase of training. Try it out! (See [Generating Captions](#generating-captions)). You can further improve the performance of the model by running a second training phase to jointly fine-tune the parameters of the *Inception v3* @@ -296,8 +292,12 @@ Your trained *Show and Tell* model can generate captions for any JPEG image! The following command line will generate captions for an image from the test set. ```shell -# Directory containing model checkpoints. -CHECKPOINT_DIR="${HOME}/im2txt/model/train" +# Path to checkpoint file or a directory containing checkpoint files. Passing +# a directory will only work if there is also a file named 'checkpoint' which +# lists the available checkpoints in the directory. It will not work if you +# point to a directory with just a copy of a model checkpoint: in that case, +# you will need to pass the checkpoint path explicitly. +CHECKPOINT_PATH="${HOME}/im2txt/model/train" # Vocabulary file generated by the preprocessing script. VOCAB_FILE="${HOME}/im2txt/data/mscoco/word_counts.txt" @@ -306,7 +306,8 @@ VOCAB_FILE="${HOME}/im2txt/data/mscoco/word_counts.txt" IMAGE_FILE="${HOME}/im2txt/data/mscoco/raw-data/val2014/COCO_val2014_000000224477.jpg" # Build the inference binary. 
-bazel build -c opt im2txt/run_inference +cd tensorflow-models/im2txt +bazel build -c opt //im2txt:run_inference # Ignore GPU devices (only necessary if your GPU is currently memory # constrained, for example, by running the training script). @@ -314,7 +315,7 @@ export CUDA_VISIBLE_DEVICES="" # Run inference to generate captions. bazel-bin/im2txt/run_inference \ - --checkpoint_path=${CHECKPOINT_DIR} \ + --checkpoint_path=${CHECKPOINT_PATH} \ --vocab_file=${VOCAB_FILE} \ --input_files=${IMAGE_FILE} ``` @@ -333,6 +334,4 @@ expected. Here is the image: -
![Surfer](g3doc/COCO_val2014_000000224477.jpg) -
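The `CHECKPOINT_PATH` comment in the im2txt README change above distinguishes between passing a checkpoint file and passing a directory that contains a `checkpoint` index file. The following is a minimal sketch of that resolution logic in plain Python, not the code used by `run_inference`. The helper name `resolve_checkpoint_path` is hypothetical, and the parsing assumes the index file holds a line of the form `model_checkpoint_path: "model.ckpt-1000"`. In TensorFlow code, `tf.train.latest_checkpoint` (already used in `evaluate.py`) performs the directory lookup.

```python
import os
import re


def resolve_checkpoint_path(path):
  """Returns an explicit checkpoint path for a file path or a directory.

  Hypothetical helper for illustration only; it is not part of im2txt.
  """
  if not os.path.isdir(path):
    # Not a directory: treat the value as an explicit checkpoint path.
    return path

  index_file = os.path.join(path, "checkpoint")
  if not os.path.isfile(index_file):
    # A directory holding only a copied checkpoint has no index file.
    raise ValueError(
        "No 'checkpoint' index file in %s; pass the checkpoint path "
        "explicitly." % path)

  with open(index_file) as f:
    for line in f:
      # Assumed line format: model_checkpoint_path: "model.ckpt-1000"
      match = re.match(r'\s*model_checkpoint_path:\s*"(.*)"\s*$', line)
      if match:
        latest = match.group(1)
        # Relative entries are resolved against the directory itself.
        if os.path.isabs(latest):
          return latest
        return os.path.join(path, latest)

  raise ValueError("No model_checkpoint_path entry in %s." % index_file)
```

Raising an error for a directory without an index file mirrors the README's advice: in that case the caller has to name the checkpoint explicitly rather than rely on discovery.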
diff --git a/im2txt/im2txt/data/build_mscoco_data.py b/im2txt/im2txt/data/build_mscoco_data.py index be7311dafcb468663a7839a7f921bc80afc7cd59..cb70024759c9ff76149a4d710d233823fb004dd1 100644 --- a/im2txt/im2txt/data/build_mscoco_data.py +++ b/im2txt/im2txt/data/build_mscoco_data.py @@ -424,7 +424,7 @@ def _load_and_process_metadata(captions_file, image_dir): (len(id_to_filename), captions_file)) # Process the captions and combine the data into a list of ImageMetadata. - print("Proccessing captions.") + print("Processing captions.") image_metadata = [] num_captions = 0 for image_id, base_filename in id_to_filename: diff --git a/im2txt/im2txt/evaluate.py b/im2txt/im2txt/evaluate.py index 3ff6e5932dd7220ce1fc220bd8a063e5ee166116..61d2ce1b9f8c53aab7500915ad382bfa954a18b8 100644 --- a/im2txt/im2txt/evaluate.py +++ b/im2txt/im2txt/evaluate.py @@ -62,7 +62,7 @@ def evaluate_model(sess, model, global_step, summary_writer, summary_op): sess: Session object. model: Instance of ShowAndTellModel; the model to evaluate. global_step: Integer; global step of the model checkpoint. - summary_writer: Instance of SummaryWriter. + summary_writer: Instance of FileWriter. summary_op: Op for generating model summaries. """ # Log model summaries on a single batch. @@ -91,7 +91,7 @@ def evaluate_model(sess, model, global_step, summary_writer, summary_op): perplexity = math.exp(sum_losses / sum_weights) tf.logging.info("Perplexity = %f (%.2g sec)", perplexity, eval_time) - # Log perplexity to the SummaryWriter. + # Log perplexity to the FileWriter. summary = tf.Summary() value = summary.value.add() value.simple_value = perplexity @@ -110,7 +110,7 @@ def run_once(model, saver, summary_writer, summary_op): Args: model: Instance of ShowAndTellModel; the model to evaluate. saver: Instance of tf.train.Saver for restoring model Variables. - summary_writer: Instance of SummaryWriter. + summary_writer: Instance of FileWriter. summary_op: Op for generating model summaries. 
""" model_path = tf.train.latest_checkpoint(FLAGS.checkpoint_dir) @@ -171,8 +171,8 @@ def run(): saver = tf.train.Saver() # Create the summary operation and the summary writer. - summary_op = tf.merge_all_summaries() - summary_writer = tf.train.SummaryWriter(eval_dir) + summary_op = tf.summary.merge_all() + summary_writer = tf.summary.FileWriter(eval_dir) g.finalize() diff --git a/im2txt/im2txt/run_inference.py b/im2txt/im2txt/run_inference.py index 00acd464bf134f77aac1c650768f303f67f5bb2e..672ab9a34abdd6704d4820a92745afc5af1c6f72 100644 --- a/im2txt/im2txt/run_inference.py +++ b/im2txt/im2txt/run_inference.py @@ -39,6 +39,8 @@ tf.flags.DEFINE_string("input_files", "", "File pattern or comma-separated list of file patterns " "of image files.") +tf.logging.set_verbosity(tf.logging.INFO) + def main(_): # Build the inference graph. diff --git a/im2txt/im2txt/show_and_tell_model.py b/im2txt/im2txt/show_and_tell_model.py index 1292ea3e6e50a770c14ced5c666e02ff1fe2efd2..0ac29e7fdb80fbefe3594eabc972648a3fb32312 100644 --- a/im2txt/im2txt/show_and_tell_model.py +++ b/im2txt/im2txt/show_and_tell_model.py @@ -264,7 +264,7 @@ class ShowAndTellModel(object): if self.mode == "inference": # In inference mode, use concatenated states for convenient feeding and # fetching. - tf.concat(initial_state, 1, name="initial_state") + tf.concat(axis=1, values=initial_state, name="initial_state") # Placeholder for feeding a batch of concatenated states. state_feed = tf.placeholder(dtype=tf.float32, @@ -274,11 +274,11 @@ class ShowAndTellModel(object): # Run a single LSTM step. lstm_outputs, state_tuple = lstm_cell( - inputs=tf.squeeze(self.seq_embeddings, squeeze_dims=[1]), + inputs=tf.squeeze(self.seq_embeddings, axis=[1]), state=state_tuple) # Concatentate the resulting state. - tf.concat(state_tuple, 1, name="state") + tf.concat(axis=1, values=state_tuple, name="state") else: # Run the batch of sequence embeddings through the LSTM. 
sequence_length = tf.reduce_sum(self.input_mask, 1) diff --git a/inception/README.md b/inception/README.md index 9c9bde9e06f6ac1608467b3a2980971770b23f07..f4731213755714b01b49036bb8a745cf354df9dd 100644 --- a/inception/README.md +++ b/inception/README.md @@ -18,12 +18,15 @@ evaluation with a computational cost of 5 billion multiply-adds per inference and with using less than 25 million parameters. Below is a visualization of the model architecture. -
![Inception-v3 Architecture](g3doc/inception_v3_architecture.png) -
## Description of Code +**NOTE**: For the most part, you will find a newer version of this code at [models/slim](https://github.com/tensorflow/models/tree/master/slim). In particular: + +* `inception_train.py` and `imagenet_train.py` should no longer be used. The slim editions for running on multiple GPUs are the current best examples. +* `inception_distributed_train.py` and `imagenet_distributed_train.py` are still valid examples of distributed training. + The code base provides three core binaries for: * Training an Inception v3 network from scratch across multiple GPUs and/or @@ -34,13 +37,12 @@ The code base provides three core binaries for: errors to fine tune the network weights. The training procedure employs synchronous stochastic gradient descent across -multiple GPUs. The user may specify the number of GPUs they wish harness. The +multiple GPUs. The user may specify the number of GPUs they wish to harness. The synchronous training performs *batch-splitting* by dividing a given batch across multiple GPUs. The training set up is nearly identical to the section [Training a Model Using -Multiple GPU Cards] -(https://www.tensorflow.org/tutorials/deep_cnn/index.html#training-a-model-using-multiple-gpu-cards) +Multiple GPU Cards](https://www.tensorflow.org/tutorials/deep_cnn/index.html#launching_and_training_the_model_on_multiple_gpu_cards) where we have substituted the CIFAR-10 model architecture with Inception v3. The primary differences with that setup are: @@ -49,18 +51,12 @@ primary differences with that setup are: * Specify the model architecture using a (still experimental) higher level language called TensorFlow-Slim. -For more details about TensorFlow-Slim, please see the [Slim README] -(inception/slim/README.md). Please note that this higher-level language is still +For more details about TensorFlow-Slim, please see the [Slim README](inception/slim/README.md). 
Please note that this higher-level language is still *experimental* and the API may change over time depending on usage and subsequent research. ## Getting Started -**NOTE** Before doing anything, we first need to build TensorFlow from source, -and installed as a PIP package. Please follow the instructions at [Installing -From Source] -(https://www.tensorflow.org/get_started/os_setup.html#create-the-pip-package-and-install). - Before you run the training script for the first time, you will need to download and convert the ImageNet data to native TFRecord format. The TFRecord format consists of a set of sharded files where each entry is a serialized `tf.Example` @@ -73,8 +69,7 @@ downloading and converting ImageNet data to TFRecord format. Downloading and preprocessing the data may take several hours (up to half a day) depending on your network and computer speed. Please be patient. -To begin, you will need to sign up for an account with [ImageNet] -(http://image-net.org) to gain access to the data. Look for the sign up page, +To begin, you will need to sign up for an account with [ImageNet](http://image-net.org) to gain access to the data. Look for the sign up page, create an account and request an access key to download the data. After you have `USERNAME` and `PASSWORD`, you are ready to run our script. Make @@ -91,7 +86,8 @@ you will not need to interact with the script again. DATA_DIR=$HOME/imagenet-data # build the preprocessing script. -bazel build inception/download_and_preprocess_imagenet +cd tensorflow-models/inception +bazel build //inception:download_and_preprocess_imagenet # run it bazel-bin/inception/download_and_preprocess_imagenet "${DATA_DIR}" @@ -103,9 +99,9 @@ The final line of the output script should read: 2016-02-17 14:30:17.287989: Finished writing all 1281167 images in data set. ``` -When the script finishes you will find 1024 and 128 training and validation -files in the `DATA_DIR`. 
The files will match the patterns `train-????-of-1024` -and `validation-?????-of-00128`, respectively. +When the script finishes, you will find 1024 training files and 128 validation +files in the `DATA_DIR`. The files will match the patterns +`train-?????-of-01024` and `validation-?????-of-00128`, respectively. [Congratulations!](https://www.youtube.com/watch?v=9bZkp7q19f0) You are now ready to train or evaluate with the ImageNet data set. @@ -116,15 +112,12 @@ ready to train or evaluate with the ImageNet data set. intensive task and depending on your compute setup may take several days or even weeks. -*Before proceeding* please read the [Convolutional Neural Networks] -(https://www.tensorflow.org/tutorials/deep_cnn/index.html) tutorial in -particular focus on [Training a Model Using Multiple GPU Cards] -(https://www.tensorflow.org/tutorials/deep_cnn/index.html#training-a-model-using-multiple-gpu-cards) -. The model training method is nearly identical to that described in the +*Before proceeding* please read the [Convolutional Neural Networks](https://www.tensorflow.org/tutorials/deep_cnn/index.html) tutorial; in +particular, focus on [Training a Model Using Multiple GPU Cards](https://www.tensorflow.org/tutorials/deep_cnn/index.html#launching_and_training_the_model_on_multiple_gpu_cards). The model training method is nearly identical to that described in the CIFAR-10 multi-GPU model training. Briefly, the model training -* Places an individual model replica on each GPU. Split the batch across the - GPUs. +* Places an individual model replica on each GPU. +* Splits the batch across the GPUs. * Updates model parameters synchronously by waiting for all GPUs to finish processing a batch of data. @@ -161,7 +154,8 @@ To train this model, you simply need to specify the following: ```shell # Build the model. Note that we need to make sure the TensorFlow is ready to # use before this as this command will not build TensorFlow. 
-bazel build inception/imagenet_train +cd tensorflow-models/inception +bazel build //inception:imagenet_train # run it bazel-bin/inception/imagenet_train --num_gpus=1 --batch_size=32 --train_dir=/tmp/imagenet_train --data_dir=/tmp/imagenet_data @@ -197,7 +191,8 @@ GPU cards. ```shell # Build the model. Note that we need to make sure the TensorFlow is ready to # use before this as this command will not build TensorFlow. -bazel build inception/imagenet_train +cd tensorflow-models/inception +bazel build //inception:imagenet_train # run it bazel-bin/inception/imagenet_train --num_gpus=2 --batch_size=64 --train_dir=/tmp/imagenet_train @@ -250,11 +245,9 @@ We term each machine that maintains model parameters a `ps`, short for `ps` as the model parameters may be sharded across multiple machines. Variables may be updated with synchronous or asynchronous gradient updates. One -may construct a an [`Optimizer`] -(https://www.tensorflow.org/api_docs/python/train.html#optimizers) in TensorFlow -that constructs the necessary graph for either case diagrammed below from -TensorFlow [Whitepaper] -(http://download.tensorflow.org/paper/whitepaper2015.pdf): +may construct an [`Optimizer`](https://www.tensorflow.org/api_docs/python/train.html#optimizers) in TensorFlow +that constructs the necessary graph for either case diagrammed below from the +TensorFlow [Whitepaper](http://download.tensorflow.org/paper/whitepaper2015.pdf):
"${LABELS # Generate the validation data set. while read LABEL; do - VALIDATION_DIR_FOR_LABEL="${VALIDATION_DIRECTORY}${LABEL}" - TRAIN_DIR_FOR_LABEL="${TRAIN_DIRECTORY}${LABEL}" + VALIDATION_DIR_FOR_LABEL="${VALIDATION_DIRECTORY}/${LABEL}" + TRAIN_DIR_FOR_LABEL="${TRAIN_DIRECTORY}/${LABEL}" # Move the first randomly selected 100 images to the validation set. mkdir -p "${VALIDATION_DIR_FOR_LABEL}" VALIDATION_IMAGES=$(ls -1 "${TRAIN_DIR_FOR_LABEL}" | shuf | head -100) for IMAGE in ${VALIDATION_IMAGES}; do - mv -f "${TRAIN_DIRECTORY}${LABEL}/${IMAGE}" "${VALIDATION_DIR_FOR_LABEL}" + mv -f "${TRAIN_DIRECTORY}/${LABEL}/${IMAGE}" "${VALIDATION_DIR_FOR_LABEL}" done done < "${LABELS_FILE}" diff --git a/inception/inception/data/download_and_preprocess_flowers_mac.sh b/inception/inception/data/download_and_preprocess_flowers_mac.sh index 794301b4e8498315408da35a1d353563c6c0ac0d..154905635b19aeaaea087a8e76afda9b8c624d59 100644 --- a/inception/inception/data/download_and_preprocess_flowers_mac.sh +++ b/inception/inception/data/download_and_preprocess_flowers_mac.sh @@ -35,7 +35,7 @@ set -e if [ -z "$1" ]; then - echo "usage download_and_preprocess_flowers.sh [data dir]" + echo "Usage: download_and_preprocess_flowers.sh [data dir]" exit fi @@ -53,7 +53,7 @@ cd "${DATA_DIR}" TARBALL="flower_photos.tgz" if [ ! -f ${TARBALL} ]; then echo "Downloading flower data set." - wget -O ${TARBALL} "${DATA_URL}" + curl -o ${TARBALL} "${DATA_URL}" else echo "Skipping download of flower data." fi diff --git a/inception/inception/data/download_and_preprocess_imagenet.sh b/inception/inception/data/download_and_preprocess_imagenet.sh index 682ade70fe350b742211b5093c9cf33e9fc19119..6faae831075d4f6bfdc8bf8797219f7a0e4c1797 100755 --- a/inception/inception/data/download_and_preprocess_imagenet.sh +++ b/inception/inception/data/download_and_preprocess_imagenet.sh @@ -26,7 +26,7 @@ # data_dir/train-00000-of-01024 # data_dir/train-00001-of-01024 # ... 
-# data_dir/train-00127-of-01024 +# data_dir/train-01023-of-01024 # # and # @@ -49,7 +49,7 @@ set -e if [ -z "$1" ]; then - echo "usage download_and_preprocess_imagenet.sh [data dir]" + echo "Usage: download_and_preprocess_imagenet.sh [data dir]" exit fi @@ -84,7 +84,7 @@ BOUNDING_BOX_FILE="${SCRATCH_DIR}/imagenet_2012_bounding_boxes.csv" BOUNDING_BOX_DIR="${SCRATCH_DIR}bounding_boxes/" "${BOUNDING_BOX_SCRIPT}" "${BOUNDING_BOX_DIR}" "${LABELS_FILE}" \ - | sort >"${BOUNDING_BOX_FILE}" + | sort > "${BOUNDING_BOX_FILE}" echo "Finished downloading and preprocessing the ImageNet data." # Build the TFRecords version of the ImageNet data. diff --git a/inception/inception/data/download_imagenet.sh b/inception/inception/data/download_imagenet.sh index 2611c538f3aa238a1c23390c26883b76cbe6c131..49b3b7d5609d92392420b015b5509077dc560e8d 100755 --- a/inception/inception/data/download_imagenet.sh +++ b/inception/inception/data/download_imagenet.sh @@ -24,7 +24,7 @@ # downloading the raw images. # # usage: -# ./download_imagenet.sh [dirname] +# ./download_imagenet.sh [dir name] [synsets file] set -e if [ "x$IMAGENET_ACCESS_KEY" == x -o "x$IMAGENET_USERNAME" == x ]; then @@ -40,7 +40,6 @@ fi OUTDIR="${1:-./imagenet-data}" SYNSETS_FILE="${2:-./synsets.txt}" -SYNSETS_FILE="${PWD}/${SYNSETS_FILE}" echo "Saving downloaded files to $OUTDIR" mkdir -p "${OUTDIR}" diff --git a/inception/inception/data/process_bounding_boxes.py b/inception/inception/data/process_bounding_boxes.py index c1a974449b63474cb906043f323cc831b93cfbb8..5e9fd786e40b6d95b89fcc9f9774aa7f132c1a6f 100755 --- a/inception/inception/data/process_bounding_boxes.py +++ b/inception/inception/data/process_bounding_boxes.py @@ -102,7 +102,9 @@ def GetItem(name, root, index=0): def GetInt(name, root, index=0): - return int(GetItem(name, root, index)) + # In some XML annotation files, the point values are not integers, but floats. + # So we add a float function to avoid ValueError. 
+ return int(float(GetItem(name, root, index))) def FindNumberBoundingBoxes(root): diff --git a/inception/inception/image_processing.py b/inception/inception/image_processing.py index 6d8b992ea61e300425fcefa37f8521d77ab0fbd6..df1681205f02cedfa8a9613f3e9f6f3f9cb33a8b 100644 --- a/inception/inception/image_processing.py +++ b/inception/inception/image_processing.py @@ -142,11 +142,12 @@ def decode_jpeg(image_buffer, scope=None): Args: image_buffer: scalar string Tensor. - scope: Optional scope for op_scope. + scope: Optional scope for name_scope. Returns: 3-D float Tensor with values ranging from [0, 1). """ - with tf.op_scope([image_buffer], scope, 'decode_jpeg'): + with tf.name_scope(values=[image_buffer], name=scope, + default_name='decode_jpeg'): # Decode the string as an RGB JPEG. # Note that the resulting image contains an unknown height and width # that is set dynamically by decode_jpeg. In other words, the height @@ -171,11 +172,11 @@ def distort_color(image, thread_id=0, scope=None): Args: image: Tensor containing single image. thread_id: preprocessing thread ID. - scope: Optional scope for op_scope. + scope: Optional scope for name_scope. Returns: color-distorted image """ - with tf.op_scope([image], scope, 'distort_color'): + with tf.name_scope(values=[image], name=scope, default_name='distort_color'): color_ordering = thread_id % 2 if color_ordering == 0: @@ -209,11 +210,12 @@ def distort_image(image, height, width, bbox, thread_id=0, scope=None): where each coordinate is [0, 1) and the coordinates are arranged as [ymin, xmin, ymax, xmax]. thread_id: integer indicating the preprocessing thread. - scope: Optional scope for op_scope. + scope: Optional scope for name_scope. Returns: 3-D float Tensor of distorted image used for training. 
""" - with tf.op_scope([image, height, width, bbox], scope, 'distort_image'): + with tf.name_scope(values=[image, height, width, bbox], name=scope, + default_name='distort_image'): # Each bounding box has shape [1, num_boxes, box coords] and # the coordinates are ordered [ymin, xmin, ymax, xmax]. @@ -221,7 +223,7 @@ def distort_image(image, height, width, bbox, thread_id=0, scope=None): if not thread_id: image_with_box = tf.image.draw_bounding_boxes(tf.expand_dims(image, 0), bbox) - tf.image_summary('image_with_bounding_boxes', image_with_box) + tf.summary.image('image_with_bounding_boxes', image_with_box) # A large fraction of image datasets contain a human-annotated bounding # box delineating the region of the image containing the object of interest. @@ -242,7 +244,7 @@ def distort_image(image, height, width, bbox, thread_id=0, scope=None): if not thread_id: image_with_distorted_box = tf.image.draw_bounding_boxes( tf.expand_dims(image, 0), distort_bbox) - tf.image_summary('images_with_distorted_bounding_box', + tf.summary.image('images_with_distorted_bounding_box', image_with_distorted_box) # Crop the image to the specified bounding box. @@ -259,7 +261,7 @@ def distort_image(image, height, width, bbox, thread_id=0, scope=None): # the third dimension. distorted_image.set_shape([height, width, 3]) if not thread_id: - tf.image_summary('cropped_resized_image', + tf.summary.image('cropped_resized_image', tf.expand_dims(distorted_image, 0)) # Randomly flip the image horizontally. @@ -269,7 +271,7 @@ def distort_image(image, height, width, bbox, thread_id=0, scope=None): distorted_image = distort_color(distorted_image, thread_id) if not thread_id: - tf.image_summary('final_distorted_image', + tf.summary.image('final_distorted_image', tf.expand_dims(distorted_image, 0)) return distorted_image @@ -281,11 +283,12 @@ def eval_image(image, height, width, scope=None): image: 3-D float Tensor height: integer width: integer - scope: Optional scope for op_scope. 
+ scope: Optional scope for name_scope. Returns: 3-D float Tensor of prepared image. """ - with tf.op_scope([image, height, width], scope, 'eval_image'): + with tf.name_scope(values=[image, height, width], name=scope, + default_name='eval_image'): # Crop the central region of the image with an area containing 87.5% of # the original image. image = tf.image.central_crop(image, central_fraction=0.875) @@ -328,8 +331,8 @@ def image_preprocessing(image_buffer, bbox, train, thread_id=0): image = eval_image(image, height, width) # Finally, rescale to [-1,1] instead of [0, 1) - image = tf.sub(image, 0.5) - image = tf.mul(image, 2.0) + image = tf.subtract(image, 0.5) + image = tf.multiply(image, 2.0) return image @@ -394,7 +397,7 @@ def parse_example_proto(example_serialized): ymax = tf.expand_dims(features['image/object/bbox/ymax'].values, 0) # Note that we impose an ordering of (y, x) just to make life difficult. - bbox = tf.concat(0, [ymin, xmin, ymax, xmax]) + bbox = tf.concat(axis=0, values=[ymin, xmin, ymax, xmax]) # Force the variable number of bounding boxes into the shape # [1, num_boxes, coords]. @@ -505,6 +508,6 @@ def batch_inputs(dataset, batch_size, train, num_preprocess_threads=None, images = tf.reshape(images, shape=[batch_size, height, width, depth]) # Display the training images in the visualizer. 
- tf.image_summary('images', images) + tf.summary.image('images', images) return images, tf.reshape(label_index_batch, [batch_size]) diff --git a/inception/inception/imagenet_distributed_train.py b/inception/inception/imagenet_distributed_train.py index 1c3ee3ab8eb676d6083f1638cf4a2fa7730a9183..f3615e012f042649b52e37aeaeeb2c3efc07f92c 100644 --- a/inception/inception/imagenet_distributed_train.py +++ b/inception/inception/imagenet_distributed_train.py @@ -45,7 +45,8 @@ def main(unused_args): {'ps': ps_hosts, 'worker': worker_hosts}, job_name=FLAGS.job_name, - task_index=FLAGS.task_id) + task_index=FLAGS.task_id, + protocol=FLAGS.protocol) if FLAGS.job_name == 'ps': # `ps` jobs wait for incoming connections from the workers. diff --git a/inception/inception/imagenet_eval.py b/inception/inception/imagenet_eval.py index 5444f192786822695f3caaf219d4a72bb6e874df..e6f8bac2ee71021914715172296d63dd56b5a6f9 100644 --- a/inception/inception/imagenet_eval.py +++ b/inception/inception/imagenet_eval.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== -"""A binary to evaluate Inception on the flowers data set. +"""A binary to evaluate Inception on the ImageNet data set. Note that using the supplied pre-trained inception checkpoint, the eval should achieve: diff --git a/inception/inception/inception_distributed_train.py b/inception/inception/inception_distributed_train.py index 0cfbd97ee87eafbc42df3e3000f8eae9aa2f4bac..c1a589acb5fe386fd648ae3fae926ee927c0ca79 100644 --- a/inception/inception/inception_distributed_train.py +++ b/inception/inception/inception_distributed_train.py @@ -42,6 +42,9 @@ tf.app.flags.DEFINE_string('worker_hosts', '', """Comma-separated list of hostname:port for the """ """worker jobs. e.g. 
""" """'machine1:2222,machine2:1111,machine2:2222'""") +tf.app.flags.DEFINE_string('protocol', 'grpc', + """Communication protocol to use in distributed """ + """execution (default grpc) """) tf.app.flags.DEFINE_string('train_dir', '/tmp/imagenet_train', """Directory where to write event logs """ @@ -52,11 +55,11 @@ tf.app.flags.DEFINE_boolean('log_device_placement', False, 'Whether to log device placement.') # Task ID is used to select the chief and also to access the local_step for -# each replica to check staleness of the gradients in sync_replicas_optimizer. +# each replica to check staleness of the gradients in SyncReplicasOptimizer. tf.app.flags.DEFINE_integer( 'task_id', 0, 'Task ID of the worker/replica running the training.') -# More details can be found in the sync_replicas_optimizer class: +# More details can be found in the SyncReplicasOptimizer class: # tensorflow/python/training/sync_replicas_optimizer.py tf.app.flags.DEFINE_integer('num_replicas_to_aggregate', -1, """Number of gradients to collect before """ @@ -89,7 +92,7 @@ RMSPROP_EPSILON = 1.0 # Epsilon term for RMSProp. def train(target, dataset, cluster_spec): """Train Inception on a dataset for a number of steps.""" - # Number of workers and parameter servers are infered from the workers and ps + # Number of workers and parameter servers are inferred from the workers and ps # hosts string. num_workers = len(cluster_spec.as_dict()['worker']) num_parameter_servers = len(cluster_spec.as_dict()['ps']) @@ -133,7 +136,7 @@ def train(target, dataset, cluster_spec): FLAGS.learning_rate_decay_factor, staircase=True) # Add a summary to track the learning rate. - tf.scalar_summary('learning_rate', lr) + tf.summary.scalar('learning_rate', lr) # Create an optimizer that performs gradient descent. 
opt = tf.train.RMSPropOptimizer(lr, @@ -171,8 +174,8 @@ def train(target, dataset, cluster_spec): loss_name = l.op.name # Name each loss as '(raw)' and name the moving average version of the # loss as the original loss name. - tf.scalar_summary(loss_name + ' (raw)', l) - tf.scalar_summary(loss_name, loss_averages.average(l)) + tf.summary.scalar(loss_name + ' (raw)', l) + tf.summary.scalar(loss_name, loss_averages.average(l)) # Add dependency to compute loss_averages. with tf.control_dependencies([loss_averages_op]): @@ -191,13 +194,12 @@ def train(target, dataset, cluster_spec): # Add histograms for model variables. for var in variables_to_average: - tf.histogram_summary(var.op.name, var) + tf.summary.histogram(var.op.name, var) # Create synchronous replica optimizer. opt = tf.train.SyncReplicasOptimizer( opt, replicas_to_aggregate=num_replicas_to_aggregate, - replica_id=FLAGS.task_id, total_num_replicas=num_workers, variable_averages=exp_moving_averager, variables_to_average=variables_to_average) @@ -215,25 +217,23 @@ def train(target, dataset, cluster_spec): # Add histograms for gradients. for grad, var in grads: if grad is not None: - tf.histogram_summary(var.op.name + '/gradients', grad) + tf.summary.histogram(var.op.name + '/gradients', grad) apply_gradients_op = opt.apply_gradients(grads, global_step=global_step) with tf.control_dependencies([apply_gradients_op]): train_op = tf.identity(total_loss, name='train_op') - # Get chief queue_runners, init_tokens and clean_up_op, which is used to - # synchronize replicas. - # More details can be found in sync_replicas_optimizer. + # Get chief queue_runners and init_tokens, which is used to synchronize + # replicas. More details can be found in SyncReplicasOptimizer. chief_queue_runners = [opt.get_chief_queue_runner()] init_tokens_op = opt.get_init_tokens_op() - clean_up_op = opt.get_clean_up_op() # Create a saver. saver = tf.train.Saver() # Build the summary operation based on the TF collection of Summaries. 
- summary_op = tf.merge_all_summaries() + summary_op = tf.summary.merge_all() # Build an initialization operation to run below. init_op = tf.global_variables_initializer() @@ -301,8 +301,7 @@ def train(target, dataset, cluster_spec): next_summary_time += FLAGS.save_summaries_secs except: if is_chief: - tf.logging.info('About to execute sync_clean_up_op!') - sess.run(clean_up_op) + tf.logging.info('Chief got exception while running!') raise # Stop the supervisor. This also waits for service threads to finish. diff --git a/inception/inception/inception_eval.py b/inception/inception/inception_eval.py index b91b2f9f059b6f56a7ab84f1d165a9b18de90280..e7cfc3c399dd82a915b3a49c7ddd4a8565292f69 100644 --- a/inception/inception/inception_eval.py +++ b/inception/inception/inception_eval.py @@ -77,7 +77,7 @@ def _eval_once(saver, summary_writer, top_1_op, top_5_op, summary_op): # /my-favorite-path/imagenet_train/model.ckpt-0, # extract global_step from it. global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1] - print('Succesfully loaded model from %s at step=%s.' % + print('Successfully loaded model from %s at step=%s.' % (ckpt.model_checkpoint_path, global_step)) else: print('No checkpoint file found') @@ -158,10 +158,10 @@ def evaluate(dataset): saver = tf.train.Saver(variables_to_restore) # Build the summary operation based on the TF collection of Summaries. 
- summary_op = tf.merge_all_summaries() + summary_op = tf.summary.merge_all() graph_def = tf.get_default_graph().as_graph_def() - summary_writer = tf.train.SummaryWriter(FLAGS.eval_dir, + summary_writer = tf.summary.FileWriter(FLAGS.eval_dir, graph_def=graph_def) while True: diff --git a/inception/inception/inception_model.py b/inception/inception/inception_model.py index b15615dd8bd25e30608fe1b80251a269d5b129d9..fedae13ae712f09d23ff020b161d86e87ee46e95 100644 --- a/inception/inception/inception_model.py +++ b/inception/inception/inception_model.py @@ -115,7 +115,7 @@ def loss(logits, labels, batch_size=None): # shape [FLAGS.batch_size, num_classes]. sparse_labels = tf.reshape(labels, [batch_size, 1]) indices = tf.reshape(tf.range(batch_size), [batch_size, 1]) - concated = tf.concat(1, [indices, sparse_labels]) + concated = tf.concat(axis=1, values=[indices, sparse_labels]) num_classes = logits[0].get_shape()[-1].value dense_labels = tf.sparse_to_dense(concated, [batch_size, num_classes], @@ -147,8 +147,8 @@ def _activation_summary(x): # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training # session. This helps the clarity of presentation on tensorboard. tensor_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', x.op.name) - tf.contrib.deprecated.histogram_summary(tensor_name + '/activations', x) - tf.contrib.deprecated.scalar_summary(tensor_name + '/sparsity', tf.nn.zero_fraction(x)) + tf.summary.histogram(tensor_name + '/activations', x) + tf.summary.scalar(tensor_name + '/sparsity', tf.nn.zero_fraction(x)) def _activation_summaries(endpoints): diff --git a/inception/inception/inception_train.py b/inception/inception/inception_train.py index 3794184d2efe34758220f9b382bafddeba694ed1..e1c32713b2012aec8a18637ec5dd79a1cc84d90f 100644 --- a/inception/inception/inception_train.py +++ b/inception/inception/inception_train.py @@ -12,7 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# ============================================================================== -"""A library to train Inception using multiple GPU's with synchronous updates. +"""A library to train Inception using multiple GPUs with synchronous updates. """ from __future__ import absolute_import from __future__ import division @@ -83,7 +83,7 @@ def _tower_loss(images, labels, num_classes, scope, reuse_variables=None): """Calculate the total loss on a single tower running the ImageNet model. We perform 'batch splitting'. This means that we cut up a batch across - multiple GPU's. For instance, if the batch size = 32 and num_gpus = 2, + multiple GPUs. For instance, if the batch size = 32 and num_gpus = 2, then each tower will operate on an batch of 16 images. Args: @@ -132,8 +132,8 @@ def _tower_loss(images, labels, num_classes, scope, reuse_variables=None): loss_name = re.sub('%s_[0-9]*/' % inception.TOWER_NAME, '', l.op.name) # Name each loss as '(raw)' and name the moving average version of the loss # as the original loss name. - tf.scalar_summary(loss_name +' (raw)', l) - tf.scalar_summary(loss_name, loss_averages.average(l)) + tf.summary.scalar(loss_name +' (raw)', l) + tf.summary.scalar(loss_name, loss_averages.average(l)) with tf.control_dependencies([loss_averages_op]): total_loss = tf.identity(total_loss) @@ -166,7 +166,7 @@ def _average_gradients(tower_grads): grads.append(expanded_g) # Average over the 'tower' dimension. - grad = tf.concat(0, grads) + grad = tf.concat(axis=0, values=grads) grad = tf.reduce_mean(grad, 0) # Keep in mind that the Variables are redundant because they are shared @@ -223,8 +223,8 @@ def train(dataset): num_classes = dataset.num_classes() + 1 # Split the batch of images and labels for towers. 
- images_splits = tf.split(0, FLAGS.num_gpus, images) - labels_splits = tf.split(0, FLAGS.num_gpus, labels) + images_splits = tf.split(axis=0, num_or_size_splits=FLAGS.num_gpus, value=images) + labels_splits = tf.split(axis=0, num_or_size_splits=FLAGS.num_gpus, value=labels) # Calculate the gradients for each model tower. tower_grads = [] @@ -268,20 +268,20 @@ def train(dataset): summaries.extend(input_summaries) # Add a summary to track the learning rate. - summaries.append(tf.scalar_summary('learning_rate', lr)) + summaries.append(tf.summary.scalar('learning_rate', lr)) # Add histograms for gradients. for grad, var in grads: if grad is not None: summaries.append( - tf.histogram_summary(var.op.name + '/gradients', grad)) + tf.summary.histogram(var.op.name + '/gradients', grad)) # Apply the gradients to adjust the shared variables. apply_gradient_op = opt.apply_gradients(grads, global_step=global_step) # Add histograms for trainable variables. for var in tf.trainable_variables(): - summaries.append(tf.histogram_summary(var.op.name, var)) + summaries.append(tf.summary.histogram(var.op.name, var)) # Track the moving averages of all trainable variables. # Note that we maintain a "double-average" of the BatchNormalization @@ -290,7 +290,7 @@ def train(dataset): variable_averages = tf.train.ExponentialMovingAverage( inception.MOVING_AVERAGE_DECAY, global_step) - # Another possiblility is to use tf.slim.get_variables(). + # Another possibility is to use tf.slim.get_variables(). variables_to_average = (tf.trainable_variables() + tf.moving_average_variables()) variables_averages_op = variable_averages.apply(variables_to_average) @@ -301,10 +301,10 @@ def train(dataset): batchnorm_updates_op) # Create a saver. - saver = tf.train.Saver(tf.all_variables()) + saver = tf.train.Saver(tf.global_variables()) # Build the summary operation from the last tower summaries. 
- summary_op = tf.merge_summary(summaries) + summary_op = tf.summary.merge(summaries) # Build an initialization operation to run below. init = tf.global_variables_initializer() @@ -329,9 +329,9 @@ def train(dataset): # Start the queue runners. tf.train.start_queue_runners(sess=sess) - summary_writer = tf.train.SummaryWriter( + summary_writer = tf.summary.FileWriter( FLAGS.train_dir, - graph_def=sess.graph.as_graph_def(add_shapes=True)) + graph=sess.graph) for step in range(FLAGS.max_steps): start_time = time.time() diff --git a/inception/inception/slim/README.md b/inception/inception/slim/README.md index 1fda0fed0a91b2feac4696850cb155769db55f13..bfc6e70b78c312b8dd61f86ef9efffa227c59ac2 100644 --- a/inception/inception/slim/README.md +++ b/inception/inception/slim/README.md @@ -319,7 +319,7 @@ their use, consider the following example. def MyNewOp(inputs): varA = ... varB = ... - outputs = tf.mul(varA, inputs) + varB + outputs = tf.multiply(varA, inputs) + varB return outputs ``` @@ -445,15 +445,15 @@ defined with just the following snippet: ```python with arg_scope([slim.ops.conv2d, slim.ops.fc], stddev=0.01, weight_decay=0.0005): - net = slim.ops.repeat_op(1, inputs, slim.ops.conv2d, 64, [3, 3], scope='conv1') + net = slim.ops.repeat_op(2, inputs, slim.ops.conv2d, 64, [3, 3], scope='conv1') net = slim.ops.max_pool(net, [2, 2], scope='pool1') - net = slim.ops.repeat_op(1, net, slim.ops.conv2d, 128, [3, 3], scope='conv2') + net = slim.ops.repeat_op(2, net, slim.ops.conv2d, 128, [3, 3], scope='conv2') net = slim.ops.max_pool(net, [2, 2], scope='pool2') - net = slim.ops.repeat_op(2, net, slim.ops.conv2d, 256, [3, 3], scope='conv3') + net = slim.ops.repeat_op(3, net, slim.ops.conv2d, 256, [3, 3], scope='conv3') net = slim.ops.max_pool(net, [2, 2], scope='pool3') - net = slim.ops.repeat_op(2, net, slim.ops.conv2d, 512, [3, 3], scope='conv4') + net = slim.ops.repeat_op(3, net, slim.ops.conv2d, 512, [3, 3], scope='conv4') net = slim.ops.max_pool(net, [2, 2], 
scope='pool4') - net = slim.ops.repeat_op(2, net, slim.ops.conv2d, 512, [3, 3], scope='conv5') + net = slim.ops.repeat_op(3, net, slim.ops.conv2d, 512, [3, 3], scope='conv5') net = slim.ops.max_pool(net, [2, 2], scope='pool5') net = slim.ops.flatten(net, scope='flatten5') net = slim.ops.fc(net, 4096, scope='fc6') diff --git a/inception/inception/slim/inception_model.py b/inception/inception/slim/inception_model.py index e42a7be75b191c952963eb1f1969b231167be228..6136ab1ba68716f4f135110a4d5c518b732b23df 100644 --- a/inception/inception/slim/inception_model.py +++ b/inception/inception/slim/inception_model.py @@ -122,7 +122,7 @@ def inception_v3(inputs, with tf.variable_scope('branch_pool'): branch_pool = ops.avg_pool(net, [3, 3]) branch_pool = ops.conv2d(branch_pool, 32, [1, 1]) - net = tf.concat([branch1x1, branch5x5, branch3x3dbl, branch_pool], 3) + net = tf.concat(axis=3, values=[branch1x1, branch5x5, branch3x3dbl, branch_pool]) end_points['mixed_35x35x256a'] = net # mixed_1: 35 x 35 x 288. with tf.variable_scope('mixed_35x35x288a'): @@ -138,7 +138,7 @@ def inception_v3(inputs, with tf.variable_scope('branch_pool'): branch_pool = ops.avg_pool(net, [3, 3]) branch_pool = ops.conv2d(branch_pool, 64, [1, 1]) - net = tf.concat([branch1x1, branch5x5, branch3x3dbl, branch_pool], 3) + net = tf.concat(axis=3, values=[branch1x1, branch5x5, branch3x3dbl, branch_pool]) end_points['mixed_35x35x288a'] = net # mixed_2: 35 x 35 x 288. with tf.variable_scope('mixed_35x35x288b'): @@ -154,7 +154,7 @@ def inception_v3(inputs, with tf.variable_scope('branch_pool'): branch_pool = ops.avg_pool(net, [3, 3]) branch_pool = ops.conv2d(branch_pool, 64, [1, 1]) - net = tf.concat([branch1x1, branch5x5, branch3x3dbl, branch_pool], 3) + net = tf.concat(axis=3, values=[branch1x1, branch5x5, branch3x3dbl, branch_pool]) end_points['mixed_35x35x288b'] = net # mixed_3: 17 x 17 x 768. 
with tf.variable_scope('mixed_17x17x768a'): @@ -167,7 +167,7 @@ def inception_v3(inputs, stride=2, padding='VALID') with tf.variable_scope('branch_pool'): branch_pool = ops.max_pool(net, [3, 3], stride=2, padding='VALID') - net = tf.concat([branch3x3, branch3x3dbl, branch_pool], 3) + net = tf.concat(axis=3, values=[branch3x3, branch3x3dbl, branch_pool]) end_points['mixed_17x17x768a'] = net # mixed4: 17 x 17 x 768. with tf.variable_scope('mixed_17x17x768b'): @@ -186,7 +186,7 @@ def inception_v3(inputs, with tf.variable_scope('branch_pool'): branch_pool = ops.avg_pool(net, [3, 3]) branch_pool = ops.conv2d(branch_pool, 192, [1, 1]) - net = tf.concat([branch1x1, branch7x7, branch7x7dbl, branch_pool], 3) + net = tf.concat(axis=3, values=[branch1x1, branch7x7, branch7x7dbl, branch_pool]) end_points['mixed_17x17x768b'] = net # mixed_5: 17 x 17 x 768. with tf.variable_scope('mixed_17x17x768c'): @@ -205,7 +205,7 @@ def inception_v3(inputs, with tf.variable_scope('branch_pool'): branch_pool = ops.avg_pool(net, [3, 3]) branch_pool = ops.conv2d(branch_pool, 192, [1, 1]) - net = tf.concat([branch1x1, branch7x7, branch7x7dbl, branch_pool], 3) + net = tf.concat(axis=3, values=[branch1x1, branch7x7, branch7x7dbl, branch_pool]) end_points['mixed_17x17x768c'] = net # mixed_6: 17 x 17 x 768. with tf.variable_scope('mixed_17x17x768d'): @@ -224,7 +224,7 @@ def inception_v3(inputs, with tf.variable_scope('branch_pool'): branch_pool = ops.avg_pool(net, [3, 3]) branch_pool = ops.conv2d(branch_pool, 192, [1, 1]) - net = tf.concat([branch1x1, branch7x7, branch7x7dbl, branch_pool], 3) + net = tf.concat(axis=3, values=[branch1x1, branch7x7, branch7x7dbl, branch_pool]) end_points['mixed_17x17x768d'] = net # mixed_7: 17 x 17 x 768. 
with tf.variable_scope('mixed_17x17x768e'): @@ -243,7 +243,7 @@ def inception_v3(inputs, with tf.variable_scope('branch_pool'): branch_pool = ops.avg_pool(net, [3, 3]) branch_pool = ops.conv2d(branch_pool, 192, [1, 1]) - net = tf.concat([branch1x1, branch7x7, branch7x7dbl, branch_pool], 3) + net = tf.concat(axis=3, values=[branch1x1, branch7x7, branch7x7dbl, branch_pool]) end_points['mixed_17x17x768e'] = net # Auxiliary Head logits aux_logits = tf.identity(end_points['mixed_17x17x768e']) @@ -276,7 +276,7 @@ def inception_v3(inputs, stride=2, padding='VALID') with tf.variable_scope('branch_pool'): branch_pool = ops.max_pool(net, [3, 3], stride=2, padding='VALID') - net = tf.concat([branch3x3, branch7x7x3, branch_pool], 3) + net = tf.concat(axis=3, values=[branch3x3, branch7x7x3, branch_pool]) end_points['mixed_17x17x1280a'] = net # mixed_9: 8 x 8 x 2048. with tf.variable_scope('mixed_8x8x2048a'): @@ -284,17 +284,17 @@ def inception_v3(inputs, branch1x1 = ops.conv2d(net, 320, [1, 1]) with tf.variable_scope('branch3x3'): branch3x3 = ops.conv2d(net, 384, [1, 1]) - branch3x3 = tf.concat([ops.conv2d(branch3x3, 384, [1, 3]), - ops.conv2d(branch3x3, 384, [3, 1])], 3) + branch3x3 = tf.concat(axis=3, values=[ops.conv2d(branch3x3, 384, [1, 3]), + ops.conv2d(branch3x3, 384, [3, 1])]) with tf.variable_scope('branch3x3dbl'): branch3x3dbl = ops.conv2d(net, 448, [1, 1]) branch3x3dbl = ops.conv2d(branch3x3dbl, 384, [3, 3]) - branch3x3dbl = tf.concat([ops.conv2d(branch3x3dbl, 384, [1, 3]), - ops.conv2d(branch3x3dbl, 384, [3, 1])], 3) + branch3x3dbl = tf.concat(axis=3, values=[ops.conv2d(branch3x3dbl, 384, [1, 3]), + ops.conv2d(branch3x3dbl, 384, [3, 1])]) with tf.variable_scope('branch_pool'): branch_pool = ops.avg_pool(net, [3, 3]) branch_pool = ops.conv2d(branch_pool, 192, [1, 1]) - net = tf.concat([branch1x1, branch3x3, branch3x3dbl, branch_pool], 3) + net = tf.concat(axis=3, values=[branch1x1, branch3x3, branch3x3dbl, branch_pool]) end_points['mixed_8x8x2048a'] = net # mixed_10: 
8 x 8 x 2048. with tf.variable_scope('mixed_8x8x2048b'): @@ -302,17 +302,17 @@ def inception_v3(inputs, branch1x1 = ops.conv2d(net, 320, [1, 1]) with tf.variable_scope('branch3x3'): branch3x3 = ops.conv2d(net, 384, [1, 1]) - branch3x3 = tf.concat([ops.conv2d(branch3x3, 384, [1, 3]), - ops.conv2d(branch3x3, 384, [3, 1])], 3) + branch3x3 = tf.concat(axis=3, values=[ops.conv2d(branch3x3, 384, [1, 3]), + ops.conv2d(branch3x3, 384, [3, 1])]) with tf.variable_scope('branch3x3dbl'): branch3x3dbl = ops.conv2d(net, 448, [1, 1]) branch3x3dbl = ops.conv2d(branch3x3dbl, 384, [3, 3]) - branch3x3dbl = tf.concat([ops.conv2d(branch3x3dbl, 384, [1, 3]), - ops.conv2d(branch3x3dbl, 384, [3, 1])], 3) + branch3x3dbl = tf.concat(axis=3, values=[ops.conv2d(branch3x3dbl, 384, [1, 3]), + ops.conv2d(branch3x3dbl, 384, [3, 1])]) with tf.variable_scope('branch_pool'): branch_pool = ops.avg_pool(net, [3, 3]) branch_pool = ops.conv2d(branch_pool, 192, [1, 1]) - net = tf.concat([branch1x1, branch3x3, branch3x3dbl, branch_pool], 3) + net = tf.concat(axis=3, values=[branch1x1, branch3x3, branch3x3dbl, branch_pool]) end_points['mixed_8x8x2048b'] = net # Final pooling and prediction with tf.variable_scope('logits'): diff --git a/inception/inception/slim/ops.py b/inception/inception/slim/ops.py index badc64a3bb796900a646c1a4e74bd0910f0251d1..54fda4eb81f3a138d9bb2748c21164b88570ede9 100644 --- a/inception/inception/slim/ops.py +++ b/inception/inception/slim/ops.py @@ -15,7 +15,7 @@ """Contains convenience wrappers for typical Neural Network TensorFlow layers. Additionally it maintains a collection with update_ops that need to be - updated after the ops have been computed, for exmaple to update moving means + updated after the ops have been computed, for example to update moving means and moving variances of batch_norm. 
Ops that have different behavior during training or eval have an is_training @@ -331,9 +331,9 @@ def one_hot_encoding(labels, num_classes, scope=None): batch_size = labels.get_shape()[0] indices = tf.expand_dims(tf.range(0, batch_size), 1) labels = tf.cast(tf.expand_dims(labels, 1), indices.dtype) - concated = tf.concat([indices, labels], 1) + concated = tf.concat(axis=1, values=[indices, labels]) onehot_labels = tf.sparse_to_dense( - concated, tf.pack([batch_size, num_classes]), 1.0, 0.0) + concated, tf.stack([batch_size, num_classes]), 1.0, 0.0) onehot_labels.set_shape([batch_size, num_classes]) return onehot_labels diff --git a/inception/inception/slim/ops_test.py b/inception/inception/slim/ops_test.py index 0978e0ef3783ed50e618cb70504a4619d127b2c9..13dc5d9aacf6e283540a406d419a67d2d7215161 100644 --- a/inception/inception/slim/ops_test.py +++ b/inception/inception/slim/ops_test.py @@ -21,8 +21,6 @@ from __future__ import print_function import numpy as np import tensorflow as tf -from tensorflow.python.ops import control_flow_ops - from inception.slim import ops from inception.slim import scopes from inception.slim import variables @@ -420,7 +418,7 @@ class DropoutTest(tf.test.TestCase): with self.test_session(): images = tf.random_uniform((5, height, width, 3), seed=1) output = ops.dropout(images) - self.assertEquals(output.op.name, 'Dropout/dropout/mul_1') + self.assertEquals(output.op.name, 'Dropout/dropout/mul') output.get_shape().assert_is_compatible_with(images.get_shape()) def testCreateDropoutNoTraining(self): @@ -601,8 +599,7 @@ class BatchNormTest(tf.test.TestCase): output = ops.batch_norm(images, decay=0.1) update_ops = tf.get_collection(ops.UPDATE_OPS_COLLECTION) with tf.control_dependencies(update_ops): - barrier = tf.no_op(name='gradient_barrier') - output = control_flow_ops.with_dependencies([barrier], output) + output = tf.identity(output) # Initialize all variables sess.run(tf.global_variables_initializer()) moving_mean = 
variables.get_variables('BatchNorm/moving_mean')[0] @@ -631,8 +628,7 @@ class BatchNormTest(tf.test.TestCase): output = ops.batch_norm(images, decay=0.1, is_training=False) update_ops = tf.get_collection(ops.UPDATE_OPS_COLLECTION) with tf.control_dependencies(update_ops): - barrier = tf.no_op(name='gradient_barrier') - output = control_flow_ops.with_dependencies([barrier], output) + output = tf.identity(output) # Initialize all variables sess.run(tf.global_variables_initializer()) moving_mean = variables.get_variables('BatchNorm/moving_mean')[0] @@ -665,8 +661,7 @@ class BatchNormTest(tf.test.TestCase): output = ops.batch_norm(images, decay=0.1, is_training=False) update_ops = tf.get_collection(ops.UPDATE_OPS_COLLECTION) with tf.control_dependencies(update_ops): - barrier = tf.no_op(name='gradient_barrier') - output = control_flow_ops.with_dependencies([barrier], output) + output = tf.identity(output) # Initialize all variables sess.run(tf.global_variables_initializer()) moving_mean = variables.get_variables('BatchNorm/moving_mean')[0] diff --git a/inception/inception/slim/variables.py b/inception/inception/slim/variables.py index 03f2c83e273c6c4aa30852dee4e589ee6dad4cf9..1d967b79e9563724b1114995a732cfd4dd486afd 100644 --- a/inception/inception/slim/variables.py +++ b/inception/inception/slim/variables.py @@ -240,7 +240,7 @@ def global_step(device=''): # Get the device for the variable. 
with tf.device(variable_device(device, 'global_step')): return tf.get_variable('global_step', shape=[], dtype=tf.int64, - initializer=tf.zeros_initializer, + initializer=tf.zeros_initializer(), trainable=False, collections=collections) diff --git a/learning_to_remember_rare_events/README.md b/learning_to_remember_rare_events/README.md new file mode 100644 index 0000000000000000000000000000000000000000..f06c191ac37e37708c0eeccd39bfddb958c58134 --- /dev/null +++ b/learning_to_remember_rare_events/README.md @@ -0,0 +1,55 @@ +Code for the Memory Module as described +in "Learning to Remember Rare Events" by +Lukasz Kaiser, Ofir Nachum, Aurko Roy, and Samy Bengio +published as a conference paper at ICLR 2017. + +Requirements: +* TensorFlow (see tensorflow.org for how to install) +* Some basic command-line utilities (git, unzip). + +Description: + +The general memory module is located in memory.py. +Some code is provided to see the memory module in +action on the standard Omniglot dataset. +Download and setup the dataset using data_utils.py +and then run the training script train.py +(see example commands below). + +Note that the structure and parameters of the model +are optimized for the data preparation as provided. 
+ +Quick Start: + +First download and set-up Omniglot data by running + +``` +python data_utils.py +``` + +Then run the training script: + +``` +python train.py --memory_size=8192 \ + --batch_size=16 --validation_length=50 \ + --episode_width=5 --episode_length=30 +``` + +The first validation batch may look like this (although it is noisy): +``` +0-shot: 0.040, 1-shot: 0.404, 2-shot: 0.516, 3-shot: 0.604, + 4-shot: 0.656, 5-shot: 0.684 +``` +At step 500 you may see something like this: +``` +0-shot: 0.036, 1-shot: 0.836, 2-shot: 0.900, 3-shot: 0.940, + 4-shot: 0.944, 5-shot: 0.916 +``` +At step 4000 you may see something like this: +``` +0-shot: 0.044, 1-shot: 0.960, 2-shot: 1.000, 3-shot: 0.988, + 4-shot: 0.972, 5-shot: 0.992 +``` + +Maintained by Ofir Nachum (ofirnachum) and +Lukasz Kaiser (lukaszkaiser). diff --git a/learning_to_remember_rare_events/data_utils.py b/learning_to_remember_rare_events/data_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..da83d1fe0ae02e4a559ae54b539b6cd89fe0124a --- /dev/null +++ b/learning_to_remember_rare_events/data_utils.py @@ -0,0 +1,242 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# ============================================================================== +"""Data loading and other utilities. + +Use this file to first copy over and pre-process the Omniglot dataset. 
+Simply call + python data_utils.py +""" + +import cPickle as pickle +import logging +import os +import subprocess + +import numpy as np +from scipy.misc import imresize +from scipy.misc import imrotate +from scipy.ndimage import imread +import tensorflow as tf + + +MAIN_DIR = '' +REPO_LOCATION = 'https://github.com/brendenlake/omniglot.git' +REPO_DIR = os.path.join(MAIN_DIR, 'omniglot') +DATA_DIR = os.path.join(REPO_DIR, 'python') +TRAIN_DIR = os.path.join(DATA_DIR, 'images_background') +TEST_DIR = os.path.join(DATA_DIR, 'images_evaluation') +DATA_FILE_FORMAT = os.path.join(MAIN_DIR, '%s_omni.pkl') + +TRAIN_ROTATIONS = True # augment training data with rotations +TEST_ROTATIONS = False # augment testing data with rotations +IMAGE_ORIGINAL_SIZE = 105 +IMAGE_NEW_SIZE = 28 + + +def get_data(): + """Get data in form suitable for episodic training. + + Returns: + Train and test data as dictionaries mapping + label to list of examples. + """ + with tf.gfile.GFile(DATA_FILE_FORMAT % 'train') as f: + processed_train_data = pickle.load(f) + with tf.gfile.GFile(DATA_FILE_FORMAT % 'test') as f: + processed_test_data = pickle.load(f) + + train_data = {} + test_data = {} + + for data, processed_data in zip([train_data, test_data], + [processed_train_data, processed_test_data]): + for image, label in zip(processed_data['images'], + processed_data['labels']): + if label not in data: + data[label] = [] + data[label].append(image.reshape([-1]).astype('float32')) + + intersection = set(train_data.keys()) & set(test_data.keys()) + assert not intersection, 'Train and test data intersect.' + ok_num_examples = [len(ll) == 20 for _, ll in train_data.iteritems()] + assert all(ok_num_examples), 'Bad number of examples in train data.' + ok_num_examples = [len(ll) == 20 for _, ll in test_data.iteritems()] + assert all(ok_num_examples), 'Bad number of examples in test data.' 
+ + logging.info('Number of labels in train data: %d.', len(train_data)) + logging.info('Number of labels in test data: %d.', len(test_data)) + + return train_data, test_data + + +def crawl_directory(directory, augment_with_rotations=False, + first_label=0): + """Crawls data directory and returns stuff.""" + label_idx = first_label + images = [] + labels = [] + info = [] + + # traverse root directory + for root, _, files in os.walk(directory): + logging.info('Reading files from %s', root) + fileflag = 0 + for file_name in files: + full_file_name = os.path.join(root, file_name) + img = imread(full_file_name, flatten=True) + for i, angle in enumerate([0, 90, 180, 270]): + if not augment_with_rotations and i > 0: + break + + images.append(imrotate(img, angle)) + labels.append(label_idx + i) + info.append(full_file_name) + + fileflag = 1 + + if fileflag: + label_idx += 4 if augment_with_rotations else 1 + + return images, labels, info + + +def resize_images(images, new_width, new_height): + """Resize images to new dimensions.""" + resized_images = np.zeros([images.shape[0], new_width, new_height], + dtype=np.float32) + + for i in range(images.shape[0]): + resized_images[i, :, :] = imresize(images[i, :, :], + [new_width, new_height], + interp='bilinear', + mode=None) + return resized_images + + +def write_datafiles(directory, write_file, + resize=True, rotate=False, + new_width=IMAGE_NEW_SIZE, new_height=IMAGE_NEW_SIZE, + first_label=0): + """Load and preprocess images from a directory and write them to a file. + + Args: + directory: Directory of alphabet sub-directories. + write_file: Filename to write to. + resize: Whether to resize the images. + rotate: Whether to augment the dataset with rotations. + new_width: New resize width. + new_height: New resize height. + first_label: Label to start with. + + Returns: + Number of new labels created. 
+ """ + + # these are the default sizes for Omniglot: + imgwidth = IMAGE_ORIGINAL_SIZE + imgheight = IMAGE_ORIGINAL_SIZE + + logging.info('Reading the data.') + images, labels, info = crawl_directory(directory, + augment_with_rotations=rotate, + first_label=first_label) + + images_np = np.zeros([len(images), imgwidth, imgheight], dtype=np.bool) + labels_np = np.zeros([len(labels)], dtype=np.uint32) + for i in xrange(len(images)): + images_np[i, :, :] = images[i] + labels_np[i] = labels[i] + + if resize: + logging.info('Resizing images.') + resized_images = resize_images(images_np, new_width, new_height) + + logging.info('Writing resized data in float32 format.') + data = {'images': resized_images, + 'labels': labels_np, + 'info': info} + with tf.gfile.GFile(write_file, 'w') as f: + pickle.dump(data, f) + else: + logging.info('Writing original sized data in boolean format.') + data = {'images': images_np, + 'labels': labels_np, + 'info': info} + with tf.gfile.GFile(write_file, 'w') as f: + pickle.dump(data, f) + + return len(np.unique(labels_np)) + + +def maybe_download_data(): + """Download Omniglot repo if it does not exist.""" + if os.path.exists(REPO_DIR): + logging.info('It appears that Git repo already exists.') + else: + logging.info('It appears that Git repo does not exist.') + logging.info('Cloning now.') + + subprocess.check_output('git clone %s' % REPO_LOCATION, shell=True) + + if os.path.exists(TRAIN_DIR): + logging.info('It appears that train data has already been unzipped.') + else: + logging.info('It appears that train data has not been unzipped.') + logging.info('Unzipping now.') + + subprocess.check_output('unzip %s.zip -d %s' % (TRAIN_DIR, DATA_DIR), + shell=True) + + if os.path.exists(TEST_DIR): + logging.info('It appears that test data has already been unzipped.') + else: + logging.info('It appears that test data has not been unzipped.') + logging.info('Unzipping now.') + + subprocess.check_output('unzip %s.zip -d %s' % (TEST_DIR, DATA_DIR), + 
shell=True) + + +def preprocess_omniglot(): + """Download and prepare raw Omniglot data. + + Downloads the data from GitHub if it does not exist. + Then load the images, augment with rotations if desired. + Resize the images and write them to a pickle file. + """ + + maybe_download_data() + + directory = TRAIN_DIR + write_file = DATA_FILE_FORMAT % 'train' + num_labels = write_datafiles( + directory, write_file, resize=True, rotate=TRAIN_ROTATIONS, + new_width=IMAGE_NEW_SIZE, new_height=IMAGE_NEW_SIZE) + + directory = TEST_DIR + write_file = DATA_FILE_FORMAT % 'test' + write_datafiles(directory, write_file, resize=True, rotate=TEST_ROTATIONS, + new_width=IMAGE_NEW_SIZE, new_height=IMAGE_NEW_SIZE, + first_label=num_labels) + + +def main(unused_argv): + logging.basicConfig(level=logging.INFO) + preprocess_omniglot() + + +if __name__ == '__main__': + tf.app.run() diff --git a/learning_to_remember_rare_events/memory.py b/learning_to_remember_rare_events/memory.py new file mode 100644 index 0000000000000000000000000000000000000000..573c4fd25ec3118696773d47ac2cb14017ae41a0 --- /dev/null +++ b/learning_to_remember_rare_events/memory.py @@ -0,0 +1,386 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# ============================================================================== +"""Memory module for storing "nearest neighbors". 
+ +Implements a key-value memory for generalized one-shot learning +as described in the paper +"Learning to Remember Rare Events" +by Lukasz Kaiser, Ofir Nachum, Aurko Roy, Samy Bengio, +published as a conference paper at ICLR 2017. +""" + +import numpy as np +import tensorflow as tf + + +class Memory(object): + """Memory module.""" + + def __init__(self, key_dim, memory_size, vocab_size, + choose_k=256, alpha=0.1, correct_in_top=1, age_noise=8.0, + var_cache_device='', nn_device=''): + self.key_dim = key_dim + self.memory_size = memory_size + self.vocab_size = vocab_size + self.choose_k = min(choose_k, memory_size) + self.alpha = alpha + self.correct_in_top = correct_in_top + self.age_noise = age_noise + self.var_cache_device = var_cache_device # Variables are cached here. + self.nn_device = nn_device # Device to perform nearest neighbour matmul. + + caching_device = var_cache_device if var_cache_device else None + self.update_memory = tf.constant(True) # Can be fed "false" if needed. + self.mem_keys = tf.get_variable( + 'memkeys', [self.memory_size, self.key_dim], trainable=False, + initializer=tf.random_uniform_initializer(-0.0, 0.0), + caching_device=caching_device) + self.mem_vals = tf.get_variable( + 'memvals', [self.memory_size], dtype=tf.int32, trainable=False, + initializer=tf.constant_initializer(0, tf.int32), + caching_device=caching_device) + self.mem_age = tf.get_variable( + 'memage', [self.memory_size], dtype=tf.float32, trainable=False, + initializer=tf.constant_initializer(0.0), caching_device=caching_device) + self.recent_idx = tf.get_variable( + 'recent_idx', [self.vocab_size], dtype=tf.int32, trainable=False, + initializer=tf.constant_initializer(0, tf.int32)) + + # variable for projecting query vector into memory key + self.query_proj = tf.get_variable( + 'memory_query_proj', [self.key_dim, self.key_dim], dtype=tf.float32, + initializer=tf.truncated_normal_initializer(0, 0.01), + caching_device=caching_device) + + def get(self): + return 
self.mem_keys, self.mem_vals, self.mem_age, self.recent_idx + + def set(self, k, v, a, r=None): + return tf.group( + self.mem_keys.assign(k), + self.mem_vals.assign(v), + self.mem_age.assign(a), + (self.recent_idx.assign(r) if r is not None else tf.group())) + + def clear(self): + return tf.variables_initializer([self.mem_keys, self.mem_vals, self.mem_age, + self.recent_idx]) + + def get_hint_pool_idxs(self, normalized_query): + """Get small set of idxs to compute nearest neighbor queries on. + + This is an expensive look-up on the whole memory that is used to + avoid more expensive operations later on. + + Args: + normalized_query: A Tensor of shape [None, key_dim]. + + Returns: + A Tensor of shape [None, choose_k] of indices in memory + that are closest to the queries. + + """ + # look up in large memory, no gradients + with tf.device(self.nn_device): + similarities = tf.matmul(tf.stop_gradient(normalized_query), + self.mem_keys, transpose_b=True, name='nn_mmul') + _, hint_pool_idxs = tf.nn.top_k( + tf.stop_gradient(similarities), k=self.choose_k, name='nn_topk') + return hint_pool_idxs + + def make_update_op(self, upd_idxs, upd_keys, upd_vals, + batch_size, use_recent_idx, intended_output): + """Function that creates all the update ops.""" + mem_age_incr = self.mem_age.assign_add(tf.ones([self.memory_size], + dtype=tf.float32)) + with tf.control_dependencies([mem_age_incr]): + mem_age_upd = tf.scatter_update( + self.mem_age, upd_idxs, tf.zeros([batch_size], dtype=tf.float32)) + + mem_key_upd = tf.scatter_update( + self.mem_keys, upd_idxs, upd_keys) + mem_val_upd = tf.scatter_update( + self.mem_vals, upd_idxs, upd_vals) + + if use_recent_idx: + recent_idx_upd = tf.scatter_update( + self.recent_idx, intended_output, upd_idxs) + else: + recent_idx_upd = tf.group() + + return tf.group(mem_age_upd, mem_key_upd, mem_val_upd, recent_idx_upd) + + def query(self, query_vec, intended_output, use_recent_idx=True): + """Queries memory for nearest neighbor. 
+ + Args: + query_vec: A batch of vectors to query (embedding of input to model). + intended_output: The values that would be the correct output of the + memory. + use_recent_idx: Whether to always insert at least one instance of a + correct memory fetch. + + Returns: + A tuple (result, mask, teacher_loss). + result: The result of the memory look up. + mask: The affinity of the query to the result. + teacher_loss: The loss for training the memory module. + """ + + batch_size = tf.shape(query_vec)[0] + output_given = intended_output is not None + + # prepare query for memory lookup + query_vec = tf.matmul(query_vec, self.query_proj) + normalized_query = tf.nn.l2_normalize(query_vec, dim=1) + + hint_pool_idxs = self.get_hint_pool_idxs(normalized_query) + + if output_given and use_recent_idx: # add at least one correct memory + most_recent_hint_idx = tf.gather(self.recent_idx, intended_output) + hint_pool_idxs = tf.concat( + axis=1, + values=[hint_pool_idxs, tf.expand_dims(most_recent_hint_idx, 1)]) + choose_k = tf.shape(hint_pool_idxs)[1] + + with tf.device(self.var_cache_device): + # create small memory and look up with gradients + my_mem_keys = tf.stop_gradient(tf.gather(self.mem_keys, hint_pool_idxs, + name='my_mem_keys_gather')) + similarities = tf.matmul(tf.expand_dims(normalized_query, 1), + my_mem_keys, adjoint_b=True, name='batch_mmul') + hint_pool_sims = tf.squeeze(similarities, [1], name='hint_pool_sims') + hint_pool_mem_vals = tf.gather(self.mem_vals, hint_pool_idxs, + name='hint_pool_mem_vals') + # Calculate softmax mask on the top-k if requested. + # Softmax temperature. Say we have K elements at dist x and one at (x+a). + # Softmax of the last is e^tm(x+a)/Ke^tm*x + e^tm(x+a) = e^tm*a/K+e^tm*a. + # To make that 20% we'd need to have e^tm*a ~= 0.2K, so tm = log(0.2K)/a. 
+ softmax_temp = max(1.0, np.log(0.2 * self.choose_k) / self.alpha) + mask = tf.nn.softmax(hint_pool_sims[:, :choose_k - 1] * softmax_temp) + + # prepare hints from the teacher on hint pool + teacher_hints = tf.to_float( + tf.abs(tf.expand_dims(intended_output, 1) - hint_pool_mem_vals)) + teacher_hints = 1.0 - tf.minimum(1.0, teacher_hints) + + teacher_vals, teacher_hint_idxs = tf.nn.top_k( + hint_pool_sims * teacher_hints, k=1) + neg_teacher_vals, _ = tf.nn.top_k( + hint_pool_sims * (1 - teacher_hints), k=1) + + # bring back idxs to full memory + teacher_idxs = tf.gather( + tf.reshape(hint_pool_idxs, [-1]), + teacher_hint_idxs[:, 0] + choose_k * tf.range(batch_size)) + + # zero-out teacher_vals if there are no hints + teacher_vals *= ( + 1 - tf.to_float(tf.equal(0.0, tf.reduce_sum(teacher_hints, 1)))) + + # prepare returned values + nearest_neighbor = tf.to_int32( + tf.argmax(hint_pool_sims[:, :choose_k - 1], 1)) + no_teacher_idxs = tf.gather( + tf.reshape(hint_pool_idxs, [-1]), + nearest_neighbor + choose_k * tf.range(batch_size)) + + # we'll determine whether to do an update to memory based on whether + # memory was queried correctly + sliced_hints = tf.slice(teacher_hints, [0, 0], [-1, self.correct_in_top]) + incorrect_memory_lookup = tf.equal(0.0, tf.reduce_sum(sliced_hints, 1)) + + # loss based on triplet loss + teacher_loss = (tf.nn.relu(neg_teacher_vals - teacher_vals + self.alpha) + - self.alpha) + + with tf.device(self.var_cache_device): + result = tf.gather(self.mem_vals, tf.reshape(no_teacher_idxs, [-1])) + + # prepare memory updates + update_keys = normalized_query + update_vals = intended_output + + fetched_idxs = teacher_idxs # correctly fetched from memory + with tf.device(self.var_cache_device): + fetched_keys = tf.gather(self.mem_keys, fetched_idxs, name='fetched_keys') + fetched_vals = tf.gather(self.mem_vals, fetched_idxs, name='fetched_vals') + + # do memory updates here + fetched_keys_upd = update_keys + fetched_keys # Momentum-like update + 
fetched_keys_upd = tf.nn.l2_normalize(fetched_keys_upd, dim=1) + # Randomize age a bit, e.g., to select different ones in parallel workers. + mem_age_with_noise = self.mem_age + tf.random_uniform( + [self.memory_size], - self.age_noise, self.age_noise) + + _, oldest_idxs = tf.nn.top_k(mem_age_with_noise, k=batch_size, sorted=False) + + with tf.control_dependencies([result]): + upd_idxs = tf.where(incorrect_memory_lookup, + oldest_idxs, + fetched_idxs) + # upd_idxs = tf.Print(upd_idxs, [upd_idxs], "UPD IDX", summarize=8) + upd_keys = tf.where(incorrect_memory_lookup, + update_keys, + fetched_keys_upd) + upd_vals = tf.where(incorrect_memory_lookup, + update_vals, + fetched_vals) + + def make_update_op(): + return self.make_update_op(upd_idxs, upd_keys, upd_vals, + batch_size, use_recent_idx, intended_output) + + update_op = tf.cond(self.update_memory, make_update_op, tf.no_op) + + with tf.control_dependencies([update_op]): + result = tf.identity(result) + mask = tf.identity(mask) + teacher_loss = tf.identity(teacher_loss) + + return result, mask, tf.reduce_mean(teacher_loss) + + +class LSHMemory(Memory): + """Memory employing locality sensitive hashing. + + Note: Not fully tested. 
+ """ + + def __init__(self, key_dim, memory_size, vocab_size, + choose_k=256, alpha=0.1, correct_in_top=1, age_noise=8.0, + var_cache_device='', nn_device='', + num_hashes=None, num_libraries=None): + super(LSHMemory, self).__init__( + key_dim, memory_size, vocab_size, + choose_k=choose_k, alpha=alpha, correct_in_top=1, age_noise=age_noise, + var_cache_device=var_cache_device, nn_device=nn_device) + + self.num_libraries = num_libraries or int(self.choose_k ** 0.5) + self.num_per_hash_slot = max(1, self.choose_k // self.num_libraries) + self.num_hashes = (num_hashes or + int(np.log2(self.memory_size / self.num_per_hash_slot))) + self.num_hashes = min(max(self.num_hashes, 1), 20) + self.num_hash_slots = 2 ** self.num_hashes + + # hashing vectors + self.hash_vecs = [ + tf.get_variable( + 'hash_vecs%d' % i, [self.num_hashes, self.key_dim], + dtype=tf.float32, trainable=False, + initializer=tf.truncated_normal_initializer(0, 1)) + for i in xrange(self.num_libraries)] + + # map representing which hash slots map to which mem keys + self.hash_slots = [ + tf.get_variable( + 'hash_slots%d' % i, [self.num_hash_slots, self.num_per_hash_slot], + dtype=tf.int32, trainable=False, + initializer=tf.random_uniform_initializer(maxval=self.memory_size, + dtype=tf.int32)) + for i in xrange(self.num_libraries)] + + def get(self): # not implemented + return self.mem_keys, self.mem_vals, self.mem_age, self.recent_idx + + def set(self, k, v, a, r=None): # not implemented + return tf.group( + self.mem_keys.assign(k), + self.mem_vals.assign(v), + self.mem_age.assign(a), + (self.recent_idx.assign(r) if r is not None else tf.group())) + + def clear(self): + return tf.variables_initializer([self.mem_keys, self.mem_vals, self.mem_age, + self.recent_idx] + self.hash_slots) + + def get_hash_slots(self, query): + """Gets hashed-to buckets for batch of queries. + + Args: + query: 2-d Tensor of query vectors. + + Returns: + A list of hashed-to buckets for each hash function. 
+ """ + + binary_hash = [ + tf.less(tf.matmul(query, self.hash_vecs[i], transpose_b=True), 0) + for i in xrange(self.num_libraries)] + hash_slot_idxs = [ + tf.reduce_sum( + tf.to_int32(binary_hash[i]) * + tf.constant([[2 ** i for i in xrange(self.num_hashes)]], + dtype=tf.int32), 1) + for i in xrange(self.num_libraries)] + return hash_slot_idxs + + def get_hint_pool_idxs(self, normalized_query): + """Get small set of idxs to compute nearest neighbor queries on. + + This is an expensive look-up on the whole memory that is used to + avoid more expensive operations later on. + + Args: + normalized_query: A Tensor of shape [None, key_dim]. + + Returns: + A Tensor of shape [None, choose_k] of indices in memory + that are closest to the queries. + + """ + # get hash of query vecs + hash_slot_idxs = self.get_hash_slots(normalized_query) + + # grab mem idxs in the hash slots + hint_pool_idxs = [ + tf.maximum(tf.minimum( + tf.gather(self.hash_slots[i], idxs), + self.memory_size - 1), 0) + for i, idxs in enumerate(hash_slot_idxs)] + + return tf.concat(axis=1, values=hint_pool_idxs) + + def make_update_op(self, upd_idxs, upd_keys, upd_vals, + batch_size, use_recent_idx, intended_output): + """Function that creates all the update ops.""" + base_update_op = super(LSHMemory, self).make_update_op( + upd_idxs, upd_keys, upd_vals, + batch_size, use_recent_idx, intended_output) + + # compute hash slots to be updated + hash_slot_idxs = self.get_hash_slots(upd_keys) + + # make updates + update_ops = [] + with tf.control_dependencies([base_update_op]): + for i, slot_idxs in enumerate(hash_slot_idxs): + # for each slot, choose which entry to replace + entry_idx = tf.random_uniform([batch_size], + maxval=self.num_per_hash_slot, + dtype=tf.int32) + entry_mul = 1 - tf.one_hot(entry_idx, self.num_per_hash_slot, + dtype=tf.int32) + entry_add = (tf.expand_dims(upd_idxs, 1) * + tf.one_hot(entry_idx, self.num_per_hash_slot, + dtype=tf.int32)) + + mul_op = tf.scatter_mul(self.hash_slots[i], 
slot_idxs, entry_mul) + with tf.control_dependencies([mul_op]): + add_op = tf.scatter_add(self.hash_slots[i], slot_idxs, entry_add) + update_ops.append(add_op) + + return tf.group(*update_ops) diff --git a/learning_to_remember_rare_events/model.py b/learning_to_remember_rare_events/model.py new file mode 100644 index 0000000000000000000000000000000000000000..ed34603feee3be961cb255b382e3bb1e4e816323 --- /dev/null +++ b/learning_to_remember_rare_events/model.py @@ -0,0 +1,308 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# ============================================================================== +"""Model using memory component. + +The model embeds images using a standard CNN architecture. +These embeddings are used as keys to the memory component, +which returns nearest neighbors. 
+""" + +import tensorflow as tf + +import memory + +FLAGS = tf.flags.FLAGS + + +class BasicClassifier(object): + + def __init__(self, output_dim): + self.output_dim = output_dim + + def core_builder(self, memory_val, x, y): + del x, y + y_pred = memory_val + loss = 0.0 + + return loss, y_pred + + +class LeNet(object): + """Standard CNN architecture.""" + + def __init__(self, image_size, num_channels, hidden_dim): + self.image_size = image_size + self.num_channels = num_channels + self.hidden_dim = hidden_dim + self.matrix_init = tf.truncated_normal_initializer(stddev=0.1) + self.vector_init = tf.constant_initializer(0.0) + + def core_builder(self, x): + """Embeds x using standard CNN architecture. + + Args: + x: Batch of images as a 2-d Tensor [batch_size, -1]. + + Returns: + A 2-d Tensor [batch_size, hidden_dim] of embedded images. + """ + + ch1 = 32 * 2 # number of channels in 1st layer + ch2 = 64 * 2 # number of channels in 2nd layer + conv1_weights = tf.get_variable('conv1_w', + [3, 3, self.num_channels, ch1], + initializer=self.matrix_init) + conv1_biases = tf.get_variable('conv1_b', [ch1], + initializer=self.vector_init) + conv1a_weights = tf.get_variable('conv1a_w', + [3, 3, ch1, ch1], + initializer=self.matrix_init) + conv1a_biases = tf.get_variable('conv1a_b', [ch1], + initializer=self.vector_init) + + conv2_weights = tf.get_variable('conv2_w', [3, 3, ch1, ch2], + initializer=self.matrix_init) + conv2_biases = tf.get_variable('conv2_b', [ch2], + initializer=self.vector_init) + conv2a_weights = tf.get_variable('conv2a_w', [3, 3, ch2, ch2], + initializer=self.matrix_init) + conv2a_biases = tf.get_variable('conv2a_b', [ch2], + initializer=self.vector_init) + + # fully connected + fc1_weights = tf.get_variable( + 'fc1_w', [self.image_size // 4 * self.image_size // 4 * ch2, + self.hidden_dim], initializer=self.matrix_init) + fc1_biases = tf.get_variable('fc1_b', [self.hidden_dim], + initializer=self.vector_init) + + # define model + x = tf.reshape(x, + [-1, 
self.image_size, self.image_size, self.num_channels]) + batch_size = tf.shape(x)[0] + + conv1 = tf.nn.conv2d(x, conv1_weights, + strides=[1, 1, 1, 1], padding='SAME') + relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases)) + conv1 = tf.nn.conv2d(relu1, conv1a_weights, + strides=[1, 1, 1, 1], padding='SAME') + relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1a_biases)) + + pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1], + strides=[1, 2, 2, 1], padding='SAME') + + conv2 = tf.nn.conv2d(pool1, conv2_weights, + strides=[1, 1, 1, 1], padding='SAME') + relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases)) + conv2 = tf.nn.conv2d(relu2, conv2a_weights, + strides=[1, 1, 1, 1], padding='SAME') + relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2a_biases)) + + pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1], + strides=[1, 2, 2, 1], padding='SAME') + + reshape = tf.reshape(pool2, [batch_size, -1]) + hidden = tf.matmul(reshape, fc1_weights) + fc1_biases + + return hidden + + +class Model(object): + """Model for coordinating between CNN embedder and Memory module.""" + + def __init__(self, input_dim, output_dim, rep_dim, memory_size, vocab_size, + learning_rate=0.0001, use_lsh=False): + self.input_dim = input_dim + self.output_dim = output_dim + self.rep_dim = rep_dim + self.memory_size = memory_size + self.vocab_size = vocab_size + self.learning_rate = learning_rate + self.use_lsh = use_lsh + + self.embedder = self.get_embedder() + self.memory = self.get_memory() + self.classifier = self.get_classifier() + + self.global_step = tf.contrib.framework.get_or_create_global_step() + + def get_embedder(self): + return LeNet(int(self.input_dim ** 0.5), 1, self.rep_dim) + + def get_memory(self): + cls = memory.LSHMemory if self.use_lsh else memory.Memory + return cls(self.rep_dim, self.memory_size, self.vocab_size) + + def get_classifier(self): + return BasicClassifier(self.output_dim) + + def core_builder(self, x, y, keep_prob, use_recent_idx=True): + embeddings = 
self.embedder.core_builder(x) + if keep_prob < 1.0: + embeddings = tf.nn.dropout(embeddings, keep_prob) + memory_val, _, teacher_loss = self.memory.query( + embeddings, y, use_recent_idx=use_recent_idx) + loss, y_pred = self.classifier.core_builder(memory_val, x, y) + + return loss + teacher_loss, y_pred + + def train(self, x, y): + loss, _ = self.core_builder(x, y, keep_prob=0.3) + gradient_ops = self.training_ops(loss) + return loss, gradient_ops + + def eval(self, x, y): + _, y_preds = self.core_builder(x, y, keep_prob=1.0, + use_recent_idx=False) + return y_preds + + def get_xy_placeholders(self): + return (tf.placeholder(tf.float32, [None, self.input_dim]), + tf.placeholder(tf.int32, [None])) + + def setup(self): + """Sets up all components of the computation graph.""" + + self.x, self.y = self.get_xy_placeholders() + + with tf.variable_scope('core', reuse=None): + self.loss, self.gradient_ops = self.train(self.x, self.y) + with tf.variable_scope('core', reuse=True): + self.y_preds = self.eval(self.x, self.y) + + # setup memory "reset" ops + (self.mem_keys, self.mem_vals, + self.mem_age, self.recent_idx) = self.memory.get() + self.mem_keys_reset = tf.placeholder(self.mem_keys.dtype, + tf.identity(self.mem_keys).shape) + self.mem_vals_reset = tf.placeholder(self.mem_vals.dtype, + tf.identity(self.mem_vals).shape) + self.mem_age_reset = tf.placeholder(self.mem_age.dtype, + tf.identity(self.mem_age).shape) + self.recent_idx_reset = tf.placeholder(self.recent_idx.dtype, + tf.identity(self.recent_idx).shape) + self.mem_reset_op = self.memory.set(self.mem_keys_reset, + self.mem_vals_reset, + self.mem_age_reset, + None) + + def training_ops(self, loss): + opt = self.get_optimizer() + params = tf.trainable_variables() + gradients = tf.gradients(loss, params) + clipped_gradients, _ = tf.clip_by_global_norm(gradients, 5.0) + return opt.apply_gradients(zip(clipped_gradients, params), + global_step=self.global_step) + + def get_optimizer(self): + return 
tf.train.AdamOptimizer(learning_rate=self.learning_rate, + epsilon=1e-4) + + def one_step(self, sess, x, y): + outputs = [self.loss, self.gradient_ops] + return sess.run(outputs, feed_dict={self.x: x, self.y: y}) + + def episode_step(self, sess, x, y, clear_memory=False): + """Performs training steps on episodic input. + + Args: + sess: A Tensorflow Session. + x: A list of batches of images defining the episode. + y: A list of batches of labels corresponding to x. + clear_memory: Whether to clear the memory before the episode. + + Returns: + List of losses the same length as the episode. + """ + + outputs = [self.loss, self.gradient_ops] + + if clear_memory: + self.clear_memory(sess) + + losses = [] + for xx, yy in zip(x, y): + out = sess.run(outputs, feed_dict={self.x: xx, self.y: yy}) + loss = out[0] + losses.append(loss) + + return losses + + def predict(self, sess, x, y=None): + """Predict the labels on a single batch of examples. + + Args: + sess: A Tensorflow Session. + x: A batch of images. + y: The labels for the images in x. + This allows for updating the memory. + + Returns: + Predicted y. + """ + + cur_memory = sess.run([self.mem_keys, self.mem_vals, + self.mem_age]) + + outputs = [self.y_preds] + if y is None: + ret = sess.run(outputs, feed_dict={self.x: x}) + else: + ret = sess.run(outputs, feed_dict={self.x: x, self.y: y}) + + sess.run([self.mem_reset_op], + feed_dict={self.mem_keys_reset: cur_memory[0], + self.mem_vals_reset: cur_memory[1], + self.mem_age_reset: cur_memory[2]}) + + return ret + + def episode_predict(self, sess, x, y, clear_memory=False): + """Predict the labels on an episode of examples. + + Args: + sess: A Tensorflow Session. + x: A list of batches of images. + y: A list of labels for the images in x. + This allows for updating the memory. + clear_memory: Whether to clear the memory before the episode. + + Returns: + List of predicted y. 
+ """ + + cur_memory = sess.run([self.mem_keys, self.mem_vals, + self.mem_age]) + + if clear_memory: + self.clear_memory(sess) + + outputs = [self.y_preds] + y_preds = [] + for xx, yy in zip(x, y): + out = sess.run(outputs, feed_dict={self.x: xx, self.y: yy}) + y_pred = out[0] + y_preds.append(y_pred) + + sess.run([self.mem_reset_op], + feed_dict={self.mem_keys_reset: cur_memory[0], + self.mem_vals_reset: cur_memory[1], + self.mem_age_reset: cur_memory[2]}) + + return y_preds + + def clear_memory(self, sess): + sess.run([self.memory.clear()]) diff --git a/learning_to_remember_rare_events/train.py b/learning_to_remember_rare_events/train.py new file mode 100644 index 0000000000000000000000000000000000000000..9145bc088c3ef64f1c8a4a8f4167543f50a800d2 --- /dev/null +++ b/learning_to_remember_rare_events/train.py @@ -0,0 +1,241 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# ============================================================================== +r"""Script for training model. 
+ +Simple command to get up and running: + python train.py --memory_size=8192 \ + --batch_size=16 --validation_length=50 \ + --episode_width=5 --episode_length=30 +""" + +import logging +import os +import random + +import numpy as np +import tensorflow as tf + +import data_utils +import model + +FLAGS = tf.flags.FLAGS + +tf.flags.DEFINE_integer('rep_dim', 128, + 'dimension of keys to use in memory') +tf.flags.DEFINE_integer('episode_length', 100, 'length of episode') +tf.flags.DEFINE_integer('episode_width', 5, + 'number of distinct labels in a single episode') +tf.flags.DEFINE_integer('memory_size', None, 'number of slots in memory. ' + 'Leave as None to default to episode length') +tf.flags.DEFINE_integer('batch_size', 16, 'batch size') +tf.flags.DEFINE_integer('num_episodes', 100000, 'number of training episodes') +tf.flags.DEFINE_integer('validation_frequency', 20, + 'every so many training episodes, ' + 'assess validation accuracy') +tf.flags.DEFINE_integer('validation_length', 10, + 'number of episodes to use to compute ' + 'validation accuracy') +tf.flags.DEFINE_integer('seed', 888, 'random seed for training sampling') +tf.flags.DEFINE_string('save_dir', '', 'directory to save model to') +tf.flags.DEFINE_bool('use_lsh', False, + 'use locality-sensitive hashing ' + '(NOTE: not fully tested)') + + +class Trainer(object): + """Class that takes care of training, validating, and checkpointing model.""" + + def __init__(self, train_data, valid_data, input_dim, output_dim=None): + self.train_data = train_data + self.valid_data = valid_data + self.input_dim = input_dim + + self.rep_dim = FLAGS.rep_dim + self.episode_length = FLAGS.episode_length + self.episode_width = FLAGS.episode_width + self.batch_size = FLAGS.batch_size + self.memory_size = (self.episode_length * self.batch_size + if FLAGS.memory_size is None else FLAGS.memory_size) + self.use_lsh = FLAGS.use_lsh + + self.output_dim = (output_dim if output_dim is not None + else self.episode_width) + + def 
get_model(self): + # vocab size is the number of distinct values that + # could go into the memory key-value storage + vocab_size = self.episode_width * self.batch_size + return model.Model( + self.input_dim, self.output_dim, self.rep_dim, self.memory_size, + vocab_size, use_lsh=self.use_lsh) + + def sample_episode_batch(self, data, + episode_length, episode_width, batch_size): + """Generates a random batch for training or validation. + + Structures each element of the batch as an 'episode'. + Each episode contains episode_length examples and + episode_width distinct labels. + + Args: + data: A dictionary mapping label to list of examples. + episode_length: Number of examples in each episode. + episode_width: Distinct number of labels in each episode. + batch_size: Batch size (number of episodes). + + Returns: + A tuple (x, y) where x is a list of batches of examples + with size episode_length and y is a list of batches of labels. + """ + + episodes_x = [[] for _ in xrange(episode_length)] + episodes_y = [[] for _ in xrange(episode_length)] + assert len(data) >= episode_width + keys = data.keys() + for b in xrange(batch_size): + episode_labels = random.sample(keys, episode_width) + remainder = episode_length % episode_width + remainders = [0] * (episode_width - remainder) + [1] * remainder + episode_x = [ + random.sample(data[lab], + r + (episode_length - remainder) / episode_width) + for lab, r in zip(episode_labels, remainders)] + episode = sum([[(x, i, ii) for ii, x in enumerate(xx)] + for i, xx in enumerate(episode_x)], []) + random.shuffle(episode) + # Arrange episode so that each distinct label is seen before moving to + # 2nd showing + episode.sort(key=lambda elem: elem[2]) + assert len(episode) == episode_length + for i in xrange(episode_length): + episodes_x[i].append(episode[i][0]) + episodes_y[i].append(episode[i][1] + b * episode_width) + + return ([np.array(xx).astype('float32') for xx in episodes_x], + [np.array(yy).astype('int32') for yy in 
episodes_y]) + + def compute_correct(self, ys, y_preds): + return np.mean(np.equal(y_preds, np.array(ys))) + + def individual_compute_correct(self, y, y_pred): + return y_pred == y + + def run(self): + """Performs training. + + Trains a model using episodic training. + Every so often, runs some evaluations on validation data. + """ + + train_data, valid_data = self.train_data, self.valid_data + input_dim, output_dim = self.input_dim, self.output_dim + rep_dim, episode_length = self.rep_dim, self.episode_length + episode_width, memory_size = self.episode_width, self.memory_size + batch_size = self.batch_size + + train_size = len(train_data) + valid_size = len(valid_data) + logging.info('train_size (number of labels) %d', train_size) + logging.info('valid_size (number of labels) %d', valid_size) + logging.info('input_dim %d', input_dim) + logging.info('output_dim %d', output_dim) + logging.info('rep_dim %d', rep_dim) + logging.info('episode_length %d', episode_length) + logging.info('episode_width %d', episode_width) + logging.info('memory_size %d', memory_size) + logging.info('batch_size %d', batch_size) + + assert all(len(v) >= float(episode_length) / episode_width + for v in train_data.itervalues()) + assert all(len(v) >= float(episode_length) / episode_width + for v in valid_data.itervalues()) + + output_dim = episode_width + self.model = self.get_model() + self.model.setup() + + sess = tf.Session() + sess.run(tf.global_variables_initializer()) + + saver = tf.train.Saver(max_to_keep=10) + ckpt = None + if FLAGS.save_dir: + ckpt = tf.train.get_checkpoint_state(FLAGS.save_dir) + if ckpt and ckpt.model_checkpoint_path: + logging.info('restoring from %s', ckpt.model_checkpoint_path) + saver.restore(sess, ckpt.model_checkpoint_path) + + logging.info('starting now') + losses = [] + random.seed(FLAGS.seed) + np.random.seed(FLAGS.seed) + for i in xrange(FLAGS.num_episodes): + x, y = self.sample_episode_batch( + train_data, episode_length, episode_width, batch_size) + 
outputs = self.model.episode_step(sess, x, y, clear_memory=True) + loss = outputs + losses.append(loss) + + if i % FLAGS.validation_frequency == 0: + logging.info('episode batch %d, avg train loss %f', + i, np.mean(losses)) + losses = [] + + # validation + correct = [] + correct_by_shot = dict((k, []) for k in xrange(self.episode_width + 1)) + for _ in xrange(FLAGS.validation_length): + x, y = self.sample_episode_batch( + valid_data, episode_length, episode_width, 1) + outputs = self.model.episode_predict( + sess, x, y, clear_memory=True) + y_preds = outputs + correct.append(self.compute_correct(np.array(y), y_preds)) + + # compute per-shot accuracies + seen_counts = [[0] * episode_width for _ in xrange(batch_size)] + # loop over episode steps + for yy, yy_preds in zip(y, y_preds): + # loop over batch examples + for k, (yyy, yyy_preds) in enumerate(zip(yy, yy_preds)): + yyy, yyy_preds = int(yyy), int(yyy_preds) + count = seen_counts[k][yyy % self.episode_width] + if count in correct_by_shot: + correct_by_shot[count].append( + self.individual_compute_correct(yyy, yyy_preds)) + seen_counts[k][yyy % self.episode_width] = count + 1 + + logging.info('validation overall accuracy %f', np.mean(correct)) + logging.info('%d-shot: %.3f, ' * (self.episode_width + 1), + *sum([[k, np.mean(correct_by_shot[k])] + for k in xrange(self.episode_width + 1)], [])) + + if saver and FLAGS.save_dir: + saved_file = saver.save(sess, + os.path.join(FLAGS.save_dir, 'model.ckpt'), + global_step=self.model.global_step) + logging.info('saved model to %s', saved_file) + + +def main(unused_argv): + train_data, valid_data = data_utils.get_data() + trainer = Trainer(train_data, valid_data, data_utils.IMAGE_NEW_SIZE ** 2) + trainer.run() + + +if __name__ == '__main__': + logging.basicConfig(level=logging.INFO) + tf.app.run() diff --git a/lm_1b/README.md b/lm_1b/README.md index 86203cd646c26e870aacebc5e1e06df709674b15..24de775c86b8b2d0b680d2188841ed9a138df462 100644 --- a/lm_1b/README.md +++ 
b/lm_1b/README.md @@ -73,7 +73,7 @@ LSTM-8192-2048 (50\% Dropout) | 32.2 | 3.3 How To Run -Pre-requesite: +Prerequisites: * Install TensorFlow. * Install Bazel. @@ -97,7 +97,7 @@ Pre-requesite: [link](http://download.tensorflow.org/models/LM_LSTM_CNN/vocab-2016-09-10.txt) * test dataset: link [link](http://download.tensorflow.org/models/LM_LSTM_CNN/test/news.en.heldout-00000-of-00050) -* It is recommended to run on modern desktop instead of laptop. +* It is recommended to run on a modern desktop instead of a laptop. ```shell # 1. Clone the code to your workspace. @@ -105,7 +105,7 @@ Pre-requesite: # 3. Create an empty WORKSPACE file in your workspace. # 4. Create an empty output directory in your workspace. # Example directory structure below: -ls -R +$ ls -R .: data lm_1b output WORKSPACE @@ -121,13 +121,13 @@ BUILD data_utils.py lm_1b_eval.py README.md ./output: # Build the codes. -bazel build -c opt lm_1b/... +$ bazel build -c opt lm_1b/... # Run sample mode: -bazel-bin/lm_1b/lm_1b_eval --mode sample \ - --prefix "I love that I" \ - --pbtxt data/graph-2016-09-10.pbtxt \ - --vocab_file data/vocab-2016-09-10.txt \ - --ckpt 'data/ckpt-*' +$ bazel-bin/lm_1b/lm_1b_eval --mode sample \ + --prefix "I love that I" \ + --pbtxt data/graph-2016-09-10.pbtxt \ + --vocab_file data/vocab-2016-09-10.txt \ + --ckpt 'data/ckpt-*' ...(omitted some TensorFlow output) I love I love that @@ -138,11 +138,11 @@ I love that I find that amazing ...(omitted) # Run eval mode: -bazel-bin/lm_1b/lm_1b_eval --mode eval \ - --pbtxt data/graph-2016-09-10.pbtxt \ - --vocab_file data/vocab-2016-09-10.txt \ - --input_data data/news.en.heldout-00000-of-00050 \ - --ckpt 'data/ckpt-*' +$ bazel-bin/lm_1b/lm_1b_eval --mode eval \ + --pbtxt data/graph-2016-09-10.pbtxt \ + --vocab_file data/vocab-2016-09-10.txt \ + --input_data data/news.en.heldout-00000-of-00050 \ + --ckpt 'data/ckpt-*' ...(omitted some TensorFlow output) Loaded step 14108582. 
# perplexity is high initially because words without context are harder to @@ -166,28 +166,28 @@ Eval Step: 4531, Average Perplexity: 29.285674. ...(omitted. At convergence, it should be around 30.) # Run dump_emb mode: -bazel-bin/lm_1b/lm_1b_eval --mode dump_emb \ - --pbtxt data/graph-2016-09-10.pbtxt \ - --vocab_file data/vocab-2016-09-10.txt \ - --ckpt 'data/ckpt-*' \ - --save_dir output +$ bazel-bin/lm_1b/lm_1b_eval --mode dump_emb \ + --pbtxt data/graph-2016-09-10.pbtxt \ + --vocab_file data/vocab-2016-09-10.txt \ + --ckpt 'data/ckpt-*' \ + --save_dir output ...(omitted some TensorFlow output) Finished softmax weights Finished word embedding 0/793471 Finished word embedding 1/793471 Finished word embedding 2/793471 ...(omitted) -ls output/ +$ ls output/ embeddings_softmax.npy ... # Run dump_lstm_emb mode: -bazel-bin/lm_1b/lm_1b_eval --mode dump_lstm_emb \ - --pbtxt data/graph-2016-09-10.pbtxt \ - --vocab_file data/vocab-2016-09-10.txt \ - --ckpt 'data/ckpt-*' \ - --sentence "I love who I am ." \ - --save_dir output -ls output/ +$ bazel-bin/lm_1b/lm_1b_eval --mode dump_lstm_emb \ + --pbtxt data/graph-2016-09-10.pbtxt \ + --vocab_file data/vocab-2016-09-10.txt \ + --ckpt 'data/ckpt-*' \ + --sentence "I love who I am ." 
\ + --save_dir output +$ ls output/ lstm_emb_step_0.npy lstm_emb_step_2.npy lstm_emb_step_4.npy lstm_emb_step_6.npy lstm_emb_step_1.npy lstm_emb_step_3.npy lstm_emb_step_5.npy diff --git a/lm_1b/lm_1b_eval.py b/lm_1b/lm_1b_eval.py index 65c48aa4a543091b4f0af27ce7927da206de4ca2..ce8634757558c135ba137a9b9e09a733977adc3a 100644 --- a/lm_1b/lm_1b_eval.py +++ b/lm_1b/lm_1b_eval.py @@ -19,6 +19,7 @@ import os import sys import numpy as np +from six.moves import xrange import tensorflow as tf from google.protobuf import text_format @@ -83,7 +84,7 @@ def _LoadModel(gd_file, ckpt_file): with tf.Graph().as_default(): sys.stderr.write('Recovering graph.\n') with tf.gfile.FastGFile(gd_file, 'r') as f: - s = f.read() + s = f.read().decode() gd = tf.GraphDef() text_format.Merge(s, gd) @@ -230,7 +231,7 @@ def _DumpEmb(vocab): sys.stderr.write('Finished softmax weights\n') all_embs = np.zeros([vocab.size, 1024]) - for i in range(vocab.size): + for i in xrange(vocab.size): input_dict = {t['inputs_in']: inputs, t['targets_in']: targets, t['target_weights_in']: weights} diff --git a/namignizer/data_utils.py b/namignizer/data_utils.py index fcb0f257fb21ffcbdd8ce6306755feef6be1d78d..4320215026ccf7a2b31ffd476c25a153ecd92b86 100644 --- a/namignizer/data_utils.py +++ b/namignizer/data_utils.py @@ -58,7 +58,7 @@ def _letter_to_number(letter): def namignizer_iterator(names, counts, batch_size, num_steps, epoch_size): """Takes a list of names and counts like those output from read_names, and makes an iterator yielding a batch_size by num_steps array of random names - separated by an end of name token. The names are choosen randomly according + separated by an end of name token. The names are chosen randomly according to their counts. 
The batch may end mid-name Args: diff --git a/namignizer/model.py b/namignizer/model.py index 10a2b3b510b1f16784c0a55bc8e81d4aab997ca3..72c5c5ecb61e8a92ec2e74b8cc7ca13bb6ace817 100644 --- a/namignizer/model.py +++ b/namignizer/model.py @@ -37,11 +37,14 @@ class NamignizerModel(object): self._weights = tf.placeholder(tf.float32, [batch_size * num_steps]) # lstm for our RNN cell (GRU supported too) - lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(size, forget_bias=0.0) - if is_training and config.keep_prob < 1: - lstm_cell = tf.nn.rnn_cell.DropoutWrapper( - lstm_cell, output_keep_prob=config.keep_prob) - cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_layers) + lstm_cells = [] + for layer in range(config.num_layers): + lstm_cell = tf.contrib.rnn.BasicLSTMCell(size, forget_bias=0.0) + if is_training and config.keep_prob < 1: + lstm_cell = tf.contrib.rnn.DropoutWrapper( + lstm_cell, output_keep_prob=config.keep_prob) + lstm_cells.append(lstm_cell) + cell = tf.contrib.rnn.MultiRNNCell(lstm_cells) self._initial_state = cell.zero_state(batch_size, tf.float32) @@ -61,11 +64,11 @@ class NamignizerModel(object): (cell_output, state) = cell(inputs[:, time_step, :], state) outputs.append(cell_output) - output = tf.reshape(tf.concat(1, outputs), [-1, size]) + output = tf.reshape(tf.concat(axis=1, values=outputs), [-1, size]) softmax_w = tf.get_variable("softmax_w", [size, vocab_size]) softmax_b = tf.get_variable("softmax_b", [vocab_size]) logits = tf.matmul(output, softmax_w) + softmax_b - loss = tf.nn.seq2seq.sequence_loss_by_example( + loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example( [logits], [tf.reshape(self._targets, [-1])], [self._weights]) @@ -77,7 +80,7 @@ class NamignizerModel(object): self._activations = tf.nn.softmax(logits) # ability to save the model - self.saver = tf.train.Saver(tf.all_variables()) + self.saver = tf.train.Saver(tf.global_variables()) if not is_training: return diff --git a/namignizer/names.py b/namignizer/names.py index 
0898ef9b29b9ff6f1ed6e9d4a936e600b0506933..253742716391f2f4b7a0c0cf4987e40a2aaa808f 100644 --- a/namignizer/names.py +++ b/namignizer/names.py @@ -14,7 +14,7 @@ """A library showing off sequence recognition and generation with the simple example of names. -We use recurrent neural nets to learn complex functions able to recogize and +We use recurrent neural nets to learn complex functions able to recognize and generate sequences of a given form. This can be used for natural language syntax recognition, dynamically generating maps or puzzles and of course baby name generation. @@ -122,7 +122,6 @@ def run_epoch(session, m, names, counts, epoch_size, eval_op, verbose=False): cost, _ = session.run([m.cost, eval_op], {m.input_data: x, m.targets: y, - m.initial_state: m.initial_state.eval(), m.weights: np.ones(m.batch_size * m.num_steps)}) costs += cost iters += m.num_steps @@ -201,7 +200,6 @@ def namignize(names, checkpoint_path, config): cost, loss, _ = session.run([m.cost, m.loss, tf.no_op()], {m.input_data: x, m.targets: y, - m.initial_state: m.initial_state.eval(), m.weights: np.concatenate(( np.ones(len(name)), np.zeros(m.batch_size * m.num_steps - len(name))))}) @@ -234,7 +232,6 @@ def namignator(checkpoint_path, config): activations, final_state, _ = session.run([m.activations, m.final_state, tf.no_op()], {m.input_data: np.zeros((1, 1)), m.targets: np.zeros((1, 1)), - m.initial_state: m.initial_state.eval(), m.weights: np.ones(1)}) # sample from our softmax activations @@ -254,9 +251,9 @@ def namignator(checkpoint_path, config): if __name__ == "__main__": - # train("data/SmallNames.txt", "model/namignizer", SmallConfig) + train("data/SmallNames.txt", "model/namignizer", SmallConfig) - # namignize(["mary", "ida", "gazorbazorb", "mmmhmm", "bob"], - # tf.train.latest_checkpoint("model"), SmallConfig) + namignize(["mary", "ida", "gazorbazorb", "mmmhmm", "bob"], + tf.train.latest_checkpoint("model"), SmallConfig) - # namignator(tf.train.latest_checkpoint("model"), 
SmallConfig) + namignator(tf.train.latest_checkpoint("model"), SmallConfig) diff --git a/neural_gpu/README.md b/neural_gpu/README.md index b73dd85ef7cea67b6c3ca681f52b89f0119d8f93..510f1c5e0aef697f503bc7b856e032db7e402be7 100644 --- a/neural_gpu/README.md +++ b/neural_gpu/README.md @@ -1,6 +1,6 @@ # NeuralGPU -Code for the Neural GPU model described in [[http://arxiv.org/abs/1511.08228]]. -The extended version was described in [[https://arxiv.org/abs/1610.08613]]. +Code for the Neural GPU model described in http://arxiv.org/abs/1511.08228. +The extended version was described in https://arxiv.org/abs/1610.08613. Requirements: * TensorFlow (see tensorflow.org for how to install) diff --git a/neural_gpu/neural_gpu.py b/neural_gpu/neural_gpu.py index ecf85f508e59d421eda36cd14a6021d29e00bfba..e8ba66e9d774f48cc4e5d7ccbd8a1c16f999f705 100644 --- a/neural_gpu/neural_gpu.py +++ b/neural_gpu/neural_gpu.py @@ -36,7 +36,7 @@ def conv_linear(args, kw, kh, nin, nout, rate, do_bias, bias_start, prefix): if len(args) == 1: arg = args[0] else: - arg = tf.concat(args, 3) + arg = tf.concat(axis=3, values=args) res = tf.nn.convolution(arg, k, dilation_rate=(rate, 1), padding="SAME") if not do_bias: return res with tf.device("/cpu:0"): @@ -71,14 +71,14 @@ def place_at14(decided, selected, it): """Place selected at it-th coordinate of decided, dim=1 of 4.""" slice1 = decided[:, :it, :, :] slice2 = decided[:, it + 1:, :, :] - return tf.concat([slice1, selected, slice2], 1) + return tf.concat(axis=1, values=[slice1, selected, slice2]) def place_at13(decided, selected, it): """Place selected at it-th coordinate of decided, dim=1 of 3.""" slice1 = decided[:, :it, :] slice2 = decided[:, it + 1:, :] - return tf.concat([slice1, selected, slice2], 1) + return tf.concat(axis=1, values=[slice1, selected, slice2]) def tanh_cutoff(x, cutoff): @@ -211,7 +211,7 @@ def reorder_beam(beam_size, batch_size, beam_val, output, is_first, # beam_val is [batch_size x beam_size]; let b = batch_size * beam_size 
# decided is len x b x a x b # output is b x out_size; step is b x len x a x b; - outputs = tf.split(tf.nn.log_softmax(output), beam_size, 0) + outputs = tf.split(axis=0, num_or_size_splits=beam_size, value=tf.nn.log_softmax(output)) all_beam_vals, all_beam_idx = [], [] beam_range = 1 if is_first else beam_size for i in xrange(beam_range): @@ -221,9 +221,9 @@ def reorder_beam(beam_size, batch_size, beam_val, output, is_first, cur_beam_val], "GREPO", summarize=8) all_beam_vals.append(top_out + tf.expand_dims(cur_beam_val, 1)) all_beam_idx.append(top_out_idx) - all_beam_idx = tf.reshape(tf.transpose(tf.concat(all_beam_idx, 1), [1, 0]), + all_beam_idx = tf.reshape(tf.transpose(tf.concat(axis=1, values=all_beam_idx), [1, 0]), [-1]) - top_beam, top_beam_idx = tf.nn.top_k(tf.concat(all_beam_vals, 1), k=beam_size) + top_beam, top_beam_idx = tf.nn.top_k(tf.concat(axis=1, values=all_beam_vals), k=beam_size) top_beam_idx = tf.Print(top_beam_idx, [top_beam, top_beam_idx], "GREP", summarize=8) reordered = [[] for _ in xrange(len(tensors_to_reorder) + 1)] @@ -236,8 +236,8 @@ def reorder_beam(beam_size, batch_size, beam_val, output, is_first, reordered[0].append(tf.gather(output, which_beam)) for i, t in enumerate(tensors_to_reorder): reordered[i + 1].append(tf.gather(t, which_beam)) - new_tensors = [tf.concat(t, 0) for t in reordered] - top_out_idx = tf.concat(top_out_idx, 0) + new_tensors = [tf.concat(axis=0, values=t) for t in reordered] + top_out_idx = tf.concat(axis=0, values=top_out_idx) return (top_beam, new_tensors[0], top_out_idx, new_tensors[1:]) @@ -266,9 +266,9 @@ class NeuralGPU(object): self.input = tf.placeholder(tf.int32, name="inp") self.target = tf.placeholder(tf.int32, name="tgt") self.prev_step = tf.placeholder(tf.float32, name="prev_step") - gpu_input = tf.split(self.input, num_gpus, 0) - gpu_target = tf.split(self.target, num_gpus, 0) - gpu_prev_step = tf.split(self.prev_step, num_gpus, 0) + gpu_input = tf.split(axis=0, num_or_size_splits=num_gpus, 
value=self.input) + gpu_target = tf.split(axis=0, num_or_size_splits=num_gpus, value=self.target) + gpu_prev_step = tf.split(axis=0, num_or_size_splits=num_gpus, value=self.prev_step) batch_size = tf.shape(gpu_input[0])[0] if backward: @@ -410,7 +410,7 @@ class NeuralGPU(object): out_write = output_ta.write(it, output_l[:batch_size, :, :, :]) output = tf.gather(target_emb_weights, out) output = tf.reshape(output, [-1, 1, nmaps]) - output = tf.concat([output] * height, 1) + output = tf.concat(axis=1, values=[output] * height) tgt = tgts[it, :, :, :] selected = tf.cond(tf.less(tf.random_uniform([]), self.sampling), lambda: output, lambda: tgt) @@ -419,7 +419,7 @@ class NeuralGPU(object): out_idx = place_at13( out_idx, tf.reshape(out, [beam_size * batch_size, 1, 1]), it) if mem_size > 0: - mem = tf.concat([mem] * height, 2) + mem = tf.concat(axis=2, values=[mem] * height) dec_write = place_at14(dec_write, mem, it_incr) return (step, dec_write, out_write, mloss + mem_loss, nupd_in + nupd, out_idx, beam_cost) @@ -459,7 +459,7 @@ class NeuralGPU(object): gpu_targets_tn) embedded_targets_tn = tf.transpose( embedded_targets_tn, [2, 0, 1, 3]) # len x b x 1 x nmaps - embedded_targets_tn = tf.concat([embedded_targets_tn] * height, 2) + embedded_targets_tn = tf.concat(axis=2, values=[embedded_targets_tn] * height) # First image comes from start by applying convolution and adding 0s. start = tf.transpose(start, [0, 2, 1, 3]) # Now b x len x h x vec_s @@ -478,8 +478,10 @@ class NeuralGPU(object): # This is just for running a baseline RNN seq2seq model. if do_rnn: self.after_enc_step.append(step) # Not meaningful here, but needed. 
- lstm_cell = tf.contrib.rnn.BasicLSTMCell(height * nmaps) - cell = tf.contrib.rnn.MultiRNNCell([lstm_cell] * nconvs) + def lstm_cell(): + return tf.contrib.rnn.BasicLSTMCell(height * nmaps) + cell = tf.contrib.rnn.MultiRNNCell( + [lstm_cell() for _ in range(nconvs)]) with tf.variable_scope("encoder"): encoder_outputs, encoder_state = tf.nn.dynamic_rnn( cell, tf.reshape(step, [batch_size, length, height * nmaps]), @@ -505,7 +507,7 @@ class NeuralGPU(object): attn_res = attention_query(attn_q, tf.get_variable( "attn_v", [height * nmaps], initializer=tf.random_uniform_initializer(-0.1, 0.1))) - concatenated = tf.reshape(tf.concat([cell_inp, attn_res], 1), + concatenated = tf.reshape(tf.concat(axis=1, values=[cell_inp, attn_res]), [batch_size, 2 * height * nmaps]) cell_inp = tf.layers.dense( concatenated, height * nmaps, name="attn_merge") @@ -519,14 +521,14 @@ class NeuralGPU(object): res = tf.gather(target_emb_weights, res) res *= tf.expand_dims(mask[:, 0], 1) output = tf.layers.dense( - tf.concat([output, res], 1), height * nmaps, name="rnnmem") + tf.concat(axis=1, values=[output, res]), height * nmaps, name="rnnmem") return new_state, output, mem_loss # pylint: enable=cell-var-from-loop gpu_targets = tf.squeeze(gpu_target[gpu], [1]) # b x len gpu_tgt_trans = tf.transpose(gpu_targets, [1, 0]) dec_zero = tf.zeros([batch_size, 1], dtype=tf.int32) - dec_inp = tf.concat([dec_zero, gpu_targets], 1) + dec_inp = tf.concat(axis=1, values=[dec_zero, gpu_targets]) dec_inp = dec_inp[:, :length] embedded_dec_inp = tf.gather(target_emb_weights, dec_inp) embedded_dec_inp_proj = tf.layers.dense( @@ -573,9 +575,9 @@ class NeuralGPU(object): height, vec_size]) # Prepare for beam search. - tgts = tf.concat([embedded_targets_tn] * beam_size, 1) + tgts = tf.concat(axis=1, values=[embedded_targets_tn] * beam_size) beam_cost = tf.zeros([batch_size, beam_size]) - step = tf.concat([step] * beam_size, 0) + step = tf.concat(axis=0, values=[step] * beam_size) # First step hard-coded. 
step, decided_t, output_ta, mem_loss, nupd, oi, bc = dec_step( step, 0, 0, decided_t, output_ta, tgts, 0.0, 0, out_idx, @@ -654,7 +656,7 @@ class NeuralGPU(object): % (gpu, time.time() - start_time)) self.updates = [] - self.after_enc_step = tf.concat(self.after_enc_step, 0) # Concat GPUs. + self.after_enc_step = tf.concat(axis=0, values=self.after_enc_step) # Concat GPUs. if backward: tf.get_variable_scope()._reuse = False tf.get_variable_scope().set_caching_device(None) @@ -667,10 +669,10 @@ class NeuralGPU(object): self.losses = [gpu_avg([gpu_losses[g][i] for g in xrange(num_gpus)]) for i in xrange(len(gpu_losses[0]))] - self.out_idx = tf.concat(gpu_out_idx, 0) + self.out_idx = tf.concat(axis=0, values=gpu_out_idx) self.grad_norms = [gpu_avg([gpu_grad_norms[g][i] for g in xrange(num_gpus)]) for i in xrange(len(gpu_grad_norms[0]))] - self.outputs = [tf.concat([gpu_outputs[g] for g in xrange(num_gpus)], 1)] + self.outputs = [tf.concat(axis=1, values=[gpu_outputs[g] for g in xrange(num_gpus)])] self.quantize_op = quantize_weights_op(512, 8) if backward: self.saver = tf.train.Saver(tf.global_variables(), max_to_keep=10) diff --git a/neural_gpu/wmt_utils.py b/neural_gpu/wmt_utils.py index bb6ddbbd17824a6f287edab54844bad023b82356..97c89fb296ce2d60aba0e091f443a66e7022ed88 100644 --- a/neural_gpu/wmt_utils.py +++ b/neural_gpu/wmt_utils.py @@ -60,7 +60,7 @@ def maybe_download(directory, filename, url): print "Downloading %s to %s" % (url, filepath) filepath, _ = urllib.request.urlretrieve(url, filepath) statinfo = os.stat(filepath) - print "Succesfully downloaded", filename, statinfo.st_size, "bytes" + print "Successfully downloaded", filename, statinfo.st_size, "bytes" return filepath diff --git a/neural_programmer/data_utils.py b/neural_programmer/data_utils.py old mode 100755 new mode 100644 index f43ce57daa822a11a559ce72b3a3183422c047fd..f6d50b4eef34c9c4f55c950603b06de07ea47da1 --- a/neural_programmer/data_utils.py +++ b/neural_programmer/data_utils.py @@ -223,7 
+223,7 @@ def list_join(a): def group_by_max(table, number): - #computes the most frequently occuring entry in a column + #computes the most frequently occurring entry in a column answer = [] for i in range(len(table)): temp = [] diff --git a/neural_programmer/model.py b/neural_programmer/model.py old mode 100755 new mode 100644 index 8c06c4f1f5b48bdf1295047cf0b9e085ff71b70c..fd0037a257e74736b896dd578e02778e6f59019e --- a/neural_programmer/model.py +++ b/neural_programmer/model.py @@ -121,21 +121,21 @@ class Graph(): if (self.utility.FLAGS.rnn_dropout > 0.0): question_hidden = question_hidden * rnn_dropout_mask hidden_vectors.append(tf.expand_dims(question_hidden, 0)) - hidden_vectors = tf.concat(0, hidden_vectors) + hidden_vectors = tf.concat(axis=0, values=hidden_vectors) return question_hidden, hidden_vectors def history_recurrent_step(self, curr_hprev, hprev): #A single RNN step for controller or history RNN return tf.tanh( tf.matmul( - tf.concat(1, [hprev, curr_hprev]), self.params[ + tf.concat(axis=1, values=[hprev, curr_hprev]), self.params[ "history_recurrent"])) + self.params["history_recurrent_bias"] def question_number_softmax(self, hidden_vectors): #Attention on quetsion to decide the question number to passed to comparison ops def compute_ans(op_embedding, comparison): op_embedding = tf.expand_dims(op_embedding, 0) - #dot product of operation embedding with hidden state to the left of the number occurence + #dot product of operation embedding with hidden state to the left of the number occurrence first = tf.transpose( tf.matmul(op_embedding, tf.transpose( @@ -150,13 +150,13 @@ class Graph(): tf.expand_dims( tf.transpose(self.batch_ordinal_question_one), 2 ), [1, 1, self.utility.FLAGS.embedding_dims]), 0)))) - question_number_softmax = tf.nn.softmax(tf.concat(1, [first, second])) + question_number_softmax = tf.nn.softmax(tf.concat(axis=1, values=[first, second])) if (self.mode == "test"): cond = tf.equal(question_number_softmax, tf.reshape( 
tf.reduce_max(question_number_softmax, 1), [self.batch_size, 1])) - question_number_softmax = tf.select( + question_number_softmax = tf.where( cond, tf.fill(tf.shape(question_number_softmax), 1.0), tf.fill(tf.shape(question_number_softmax), 0.0)) @@ -164,7 +164,7 @@ class Graph(): self.data_type) ans = tf.reshape( tf.reduce_sum(question_number_softmax * tf.concat( - 1, [self.batch_question_number, self.batch_question_number_one]), + axis=1, values=[self.batch_question_number, self.batch_question_number_one]), 1), [self.batch_size, 1]) return ans @@ -225,7 +225,7 @@ class Graph(): column_controller_vector = nn_utils.apply_dropout( column_controller_vector, self.utility.FLAGS.dropout, self.mode) self.full_column_hidden_vectors = tf.concat( - 1, [self.column_hidden_vectors, self.word_column_hidden_vectors]) + axis=1, values=[self.column_hidden_vectors, self.word_column_hidden_vectors]) self.full_column_hidden_vectors += self.summary_text_entry_embeddings self.full_column_hidden_vectors = nn_utils.apply_dropout( self.full_column_hidden_vectors, self.utility.FLAGS.dropout, self.mode) @@ -258,7 +258,7 @@ class Graph(): temp_ans.append(curr_prob) else: temp_ans.append(tf.zeros_like(curr_prob)) - temp_ans = tf.transpose(tf.concat(0, temp_ans)) + temp_ans = tf.transpose(tf.concat(axis=0, values=temp_ans)) answer += temp_ans return answer @@ -266,7 +266,7 @@ class Graph(): #converts soft selection to hard selection. 
used at test time cond = tf.equal( softmax, tf.reshape(tf.reduce_max(softmax, 1), [self.batch_size, 1])) - softmax = tf.select( + softmax = tf.where( cond, tf.fill(tf.shape(softmax), 1.0), tf.fill(tf.shape(softmax), 0.0)) softmax = tf.cast(softmax, self.data_type) return softmax @@ -297,7 +297,7 @@ class Graph(): curr_prob = curr_prob * tf.expand_dims((1 - sum_prob), 2) curr_prob = curr_prob * tf.expand_dims( tf.cast((1 - sum_prob) > 0.0, self.data_type), 2) - answer = tf.select(select_mask, curr_prob, answer) + answer = tf.where(select_mask, curr_prob, answer) sum_prob += tf.reduce_sum(curr_prob, 2) return answer @@ -335,11 +335,11 @@ class Graph(): 1) #BS * max_elements select_min = tf.reduce_sum(init_min * select_full_column_softmax, 1) #BS * max_elements - select_prev = tf.concat(1, [ + select_prev = tf.concat(axis=1, values=[ tf.slice(select, [0, 1], [self.batch_size, self.max_elements - 1]), tf.cast(tf.zeros([self.batch_size, 1]), self.data_type) ]) - select_next = tf.concat(1, [ + select_next = tf.concat(axis=1, values=[ tf.cast(tf.zeros([self.batch_size, 1]), self.data_type), tf.slice( select, [0, 0], [self.batch_size, self.max_elements - 1]) ]) @@ -352,11 +352,11 @@ class Graph(): length_content = 1 length_select = 13 length_print = 1 - values = tf.concat(1, [count]) + values = tf.concat(axis=1, values=[count]) softmax_content = tf.slice(softmax, [0, 0], [self.batch_size, length_content]) #compute scalar output - output = tf.reduce_sum(tf.mul(softmax_content, values), 1) + output = tf.reduce_sum(tf.multiply(softmax_content, values), 1) #compute lookup answer softmax_print = tf.slice(softmax, [0, length_content + length_select], [self.batch_size, length_print]) @@ -384,7 +384,7 @@ class Graph(): ] select = tf.reduce_sum( tf.tile(tf.expand_dims(softmax_select, 2), [1, 1, self.max_elements]) * - tf.concat(1, select_lists), 1) + tf.concat(axis=1, values=select_lists), 1) select = select * self.select_whole_mask return output, select @@ -396,11 +396,11 @@ class 
Graph(): self.batch_question_attention_mask) #batch_size * embedding_dims controller_vector = tf.nn.relu( tf.matmul(hprev, self.params["controller_prev"]) + tf.matmul( - tf.concat(1, [question_embedding, attention_vector]), self.params[ + tf.concat(axis=1, values=[question_embedding, attention_vector]), self.params[ "controller"])) column_controller_vector = tf.nn.relu( tf.matmul(hprev, self.params["column_controller_prev"]) + tf.matmul( - tf.concat(1, [question_embedding, attention_vector]), self.params[ + tf.concat(axis=1, values=[question_embedding, attention_vector]), self.params[ "column_controller"])) controller_vector = nn_utils.apply_dropout( controller_vector, self.utility.FLAGS.dropout, self.mode) @@ -413,7 +413,7 @@ class Graph(): tf.matmul(tf.transpose(self.params_unit), tf.transpose(softmax))) column_controller_vector = tf.nn.relu( tf.matmul( - tf.concat(1, [ + tf.concat(axis=1, values=[ column_controller_vector, weighted_op_representation ]), self.params["break_conditional"])) full_column_softmax = self.compute_column_softmax(column_controller_vector, @@ -429,7 +429,7 @@ class Graph(): def compute_lookup_error(self, val): #computes lookup error. 
cond = tf.equal(self.batch_print_answer, val) - inter = tf.select( + inter = tf.where( cond, self.init_print_error, tf.tile( tf.reshape(tf.constant(1e10, self.data_type), [1, 1, 1]), [ @@ -450,12 +450,12 @@ class Graph(): def error_computation(self): #computes the error of each example in a batch - math_error = 0.5 * tf.square(tf.sub(self.scalar_output, self.batch_answer)) + math_error = 0.5 * tf.square(tf.subtract(self.scalar_output, self.batch_answer)) #scale math error math_error = math_error / self.rows math_error = tf.minimum(math_error, self.utility.FLAGS.max_math_error * tf.ones(tf.shape(math_error), self.data_type)) - self.init_print_error = tf.select( + self.init_print_error = tf.where( self.batch_gold_select, -1 * tf.log(self.batch_lookup_answer + 1e-300 + self.invert_select_full_mask), -1 * tf.log(1 - self.batch_lookup_answer)) * self.select_full_mask @@ -466,24 +466,24 @@ class Graph(): print_error += self.compute_lookup_error(val + 0.0) print_error = print_error * self.utility.FLAGS.print_cost / self.num_entries if (self.mode == "train"): - error = tf.select( + error = tf.where( tf.logical_and( tf.not_equal(self.batch_answer, 0.0), tf.not_equal( tf.reduce_sum(tf.reduce_sum(self.batch_print_answer, 1), 1), 0.0)), self.soft_min(math_error, print_error), - tf.select( + tf.where( tf.not_equal(self.batch_answer, 0.0), math_error, print_error)) else: - error = tf.select( + error = tf.where( tf.logical_and( tf.equal(self.scalar_output, 0.0), tf.equal( tf.reduce_sum(tf.reduce_sum(self.batch_lookup_answer, 1), 1), 0.0)), tf.ones_like(math_error), - tf.select( + tf.where( tf.equal(self.scalar_output, 0.0), print_error, math_error)) return error @@ -558,7 +558,7 @@ class Graph(): input_col = tf.reduce_sum( tf.expand_dims(soft_column_softmax, 2) * self.full_column_hidden_vectors, 1) - history_input = tf.concat(1, [input_op, input_col]) + history_input = tf.concat(axis=1, values=[input_op, input_col]) history_input = nn_utils.apply_dropout( history_input, 
self.utility.FLAGS.dropout, self.mode) hprev = self.history_recurrent_step(history_input, hprev) @@ -567,7 +567,7 @@ class Graph(): self.scalar_output = output error = self.error_computation() cond = tf.less(error, 0.0001, name="cond") - correct_add = tf.select( + correct_add = tf.where( cond, tf.fill(tf.shape(cond), 1.0), tf.fill(tf.shape(cond), 0.0)) correct = tf.reduce_sum(correct_add) error = error / batch_size @@ -579,11 +579,11 @@ class Graph(): #Sets mask variables and performs batch processing self.batch_gold_select = self.batch_print_answer > 0.0 self.full_column_mask = tf.concat( - 1, [self.batch_number_column_mask, self.batch_word_column_mask]) + axis=1, values=[self.batch_number_column_mask, self.batch_word_column_mask]) self.full_processed_column = tf.concat( - 1, - [self.batch_processed_number_column, self.batch_processed_word_column]) - self.full_processed_sorted_index_column = tf.concat(1, [ + axis=1, + values=[self.batch_processed_number_column, self.batch_processed_word_column]) + self.full_processed_sorted_index_column = tf.concat(axis=1, values=[ self.batch_processed_sorted_index_number_column, self.batch_processed_sorted_index_word_column ]) @@ -603,7 +603,7 @@ class Graph(): tf.equal(self.batch_word_column_entry_mask, self.utility.dummy_token_id)), self.data_type) self.select_full_mask = tf.concat( - 1, [self.select_mask, self.select_word_mask]) + axis=1, values=[self.select_mask, self.select_word_mask]) self.select_whole_mask = tf.maximum( tf.reshape( tf.slice(self.select_mask, [0, 0, 0], @@ -614,7 +614,7 @@ class Graph(): [self.batch_size, 1, self.max_elements]), [self.batch_size, self.max_elements])) self.invert_select_full_mask = tf.cast( - tf.concat(1, [ + tf.concat(axis=1, values=[ tf.equal(self.batch_number_column, self.utility.FLAGS.pad_int), tf.equal(self.batch_word_column_entry_mask, self.utility.dummy_token_id) diff --git a/neural_programmer/neural_programmer.py b/neural_programmer/neural_programmer.py old mode 100755 new mode 
100644 diff --git a/neural_programmer/nn_utils.py b/neural_programmer/nn_utils.py old mode 100755 new mode 100644 diff --git a/neural_programmer/parameters.py b/neural_programmer/parameters.py old mode 100755 new mode 100644 diff --git a/neural_programmer/wiki_data.py b/neural_programmer/wiki_data.py old mode 100755 new mode 100644 diff --git a/next_frame_prediction/README.md b/next_frame_prediction/README.md index 09d32205e390de4d15fea9901bb3209723e161c3..d79a6d4c78a5f2f703fe59bbdf0c6df5f865fab8 100644 --- a/next_frame_prediction/README.md +++ b/next_frame_prediction/README.md @@ -12,17 +12,11 @@ Authors: Xin Pan (Github: panyx0718), Anelia Angelova Results: - ![Sample1](g3doc/cross_conv.png) - - + ![Sample2](g3doc/cross_conv2.png) - - ![Loss](g3doc/cross_conv3.png) - - Prerequisite: @@ -40,7 +34,7 @@ to tf.SequenceExample. How to run: ```shell -ls -R +$ ls -R .: data next_frame_prediction WORKSPACE @@ -58,18 +52,18 @@ cross_conv2.png cross_conv3.png cross_conv.png # Build everything. -bazel build -c opt next_frame_prediction/... +$ bazel build -c opt next_frame_prediction/... # The following example runs the generated 2d objects. # For Sprites dataset, image_size should be 60, norm_scale should be 255.0. # Batch size is normally 16~64, depending on your memory size. -# + # Run training. -bazel-bin/next_frame_prediction/cross_conv/train \ - --batch_size=1 \ - --data_filepattern=data/tfrecords \ - --image_size=64 \ - --log_root=/tmp/predict +$ bazel-bin/next_frame_prediction/cross_conv/train \ + --batch_size=1 \ + --data_filepattern=data/tfrecords \ + --image_size=64 \ + --log_root=/tmp/predict step: 1, loss: 24.428671 step: 2, loss: 19.211605 @@ -81,11 +75,11 @@ step: 7, loss: 1.747665 step: 8, loss: 1.572436 step: 9, loss: 1.586816 step: 10, loss: 1.434191 -# + # Run eval. 
-bazel-bin/next_frame_prediction/cross_conv/eval \ - --batch_size=1 \ - --data_filepattern=data/tfrecords_test \ - --image_size=64 \ - --log_root=/tmp/predict +$ bazel-bin/next_frame_prediction/cross_conv/eval \ + --batch_size=1 \ + --data_filepattern=data/tfrecords_test \ + --image_size=64 \ + --log_root=/tmp/predict ``` diff --git a/next_frame_prediction/cross_conv/model.py b/next_frame_prediction/cross_conv/model.py index d8d32392bcfe9720976b2cff1c022d8911599961..927382fd2ae5002d2c49d2f5972446287a7276c0 100644 --- a/next_frame_prediction/cross_conv/model.py +++ b/next_frame_prediction/cross_conv/model.py @@ -65,7 +65,7 @@ class CrossConvModel(object): diff = diff * 2.0 - self.params['scale'] diff_output = self.diff_output * 2.0 - self.params['scale'] concat_image = tf.concat( - 1, [image, image + diff_output, image + diff, diff_output]) + axis=1, values=[image, image + diff_output, image + diff, diff_output]) tf.summary.image('origin_predict_expect_predictdiff', concat_image) self.summary_op = tf.summary.merge_all() return self.loss @@ -113,7 +113,7 @@ class CrossConvModel(object): assert shape[1] == shape[2] and shape[1] == 128 batch_size = shape[0] - net = tf.concat(3, [image, diff]) + net = tf.concat(axis=3, values=[image, diff]) with tf.variable_scope('motion_encoder'): with slim.arg_scope([slim.conv2d], padding='VALID'): net = slim.conv2d(net, 96, [5, 5], stride=1) @@ -128,7 +128,7 @@ class CrossConvModel(object): z = tf.reshape(net, shape=[batch_size, -1]) self.z_mean, self.z_stddev_log = tf.split( - split_dim=1, num_split=2, value=z) + axis=1, num_or_size_splits=2, value=z) self.z_stddev = tf.exp(self.z_stddev_log) epsilon = tf.random_normal( @@ -174,7 +174,7 @@ class CrossConvModel(object): def _CrossConv(self, encoded_images): """Apply the motion kernel on the encoded_images.""" cross_conved_images = [] - kernels = tf.split(split_dim=3, num_split=4, value=self.kernel) + kernels = tf.split(axis=3, num_or_size_splits=4, value=self.kernel) for (i, 
encoded_image) in enumerate(encoded_images): with tf.variable_scope('cross_conv_%d' % i): kernel = kernels[i] @@ -187,7 +187,7 @@ class CrossConvModel(object): for j in xrange(len(encoded_image)): conved_image.append(self._CrossConvHelper( encoded_image[j], kernel[j])) - cross_conved_images.append(tf.concat(0, conved_image)) + cross_conved_images.append(tf.concat(axis=0, values=conved_image)) sys.stderr.write('cross_conved shape: %s\n' % cross_conved_images[-1].get_shape()) return cross_conved_images @@ -224,7 +224,7 @@ class CrossConvModel(object): nets.append(self._Deconv( cross_conved_image, 64, kernel_size=3, stride=stride)) - net = tf.concat(3, nets) + net = tf.concat(axis=3, values=nets) net = slim.conv2d(net, 128, [9, 9], padding='SAME', stride=1) net = slim.conv2d(net, 128, [1, 1], padding='SAME', stride=1) net = slim.conv2d(net, 3, [1, 1], padding='SAME', stride=1) diff --git a/next_frame_prediction/cross_conv/reader.py b/next_frame_prediction/cross_conv/reader.py index 58d69747b66d157898910ce24940fb81f002811a..cd3cd22047b9167b9b27b12b3ae8a049b3ed34f9 100644 --- a/next_frame_prediction/cross_conv/reader.py +++ b/next_frame_prediction/cross_conv/reader.py @@ -42,7 +42,7 @@ def SequenceToImageAndDiff(images): for i in xrange(0, len(resized_images)-1): diffs.append(resized_images[i+1] - resized_images[i]) image_diff_list.append( - (tf.concat(0, resized_images[:-1]), tf.concat(0, diffs))) + (tf.concat(axis=0, values=resized_images[:-1]), tf.concat(axis=0, values=diffs))) return image_diff_list diff --git a/real_nvp/real_nvp_multiscale_dataset.py b/real_nvp/real_nvp_multiscale_dataset.py index 8587261f9b5c86a864e55b13d68ab92f5931e606..a89dec8aa73012b41c4367cd1fe743af203dd8f0 100644 --- a/real_nvp/real_nvp_multiscale_dataset.py +++ b/real_nvp/real_nvp_multiscale_dataset.py @@ -332,7 +332,7 @@ def masked_conv_aff_coupling(input_, mask_in, dim, name, residual_blocks=residual_blocks, bottleneck=bottleneck, skip=skip) mask = tf.mod(mask_channel + mask, 2) - res = 
tf.split(res, 2, 3) + res = tf.split(axis=3, num_or_size_splits=2, value=res) shift, log_rescaling = res[-2], res[-1] scale = variable_on_cpu( "rescaling_scale", [], @@ -486,9 +486,9 @@ def conv_ch_aff_coupling(input_, dim, name, scope.reuse_variables() if change_bottom: - input_, canvas = tf.split(input_, 2, 3) + input_, canvas = tf.split(axis=3, num_or_size_splits=2, value=input_) else: - canvas, input_ = tf.split(input_, 2, 3) + canvas, input_ = tf.split(axis=3, num_or_size_splits=2, value=input_) shape = input_.get_shape().as_list() batch_size = shape[0] height = shape[1] @@ -509,7 +509,7 @@ def conv_ch_aff_coupling(input_, dim, name, train=train, weight_norm=weight_norm, residual_blocks=residual_blocks, bottleneck=bottleneck, skip=skip) - shift, log_rescaling = tf.split(res, 2, 3) + shift, log_rescaling = tf.split(axis=3, num_or_size_splits=2, value=res) scale = variable_on_cpu( "scale", [], tf.constant_initializer(1.)) @@ -570,9 +570,9 @@ def conv_ch_add_coupling(input_, dim, name, scope.reuse_variables() if change_bottom: - input_, canvas = tf.split(input_, 2, 3) + input_, canvas = tf.split(axis=3, num_or_size_splits=2, value=input_) else: - canvas, input_ = tf.split(input_, 2, 3) + canvas, input_ = tf.split(axis=3, num_or_size_splits=2, value=input_) shape = input_.get_shape().as_list() channels = shape[3] res = input_ @@ -736,8 +736,8 @@ def rec_masked_conv_coupling(input_, hps, scale_idx, n_scale, log_diff_1 = log_diff[:, :, :, :channels] log_diff_2 = log_diff[:, :, :, channels:] else: - res_1, res_2 = tf.split(res, 2, 3) - log_diff_1, log_diff_2 = tf.split(log_diff, 2, 3) + res_1, res_2 = tf.split(axis=3, num_or_size_splits=2, value=res) + log_diff_1, log_diff_2 = tf.split(axis=3, num_or_size_splits=2, value=log_diff) res_1, inc_log_diff = rec_masked_conv_coupling( input_=res_1, hps=hps, scale_idx=scale_idx + 1, n_scale=n_scale, use_batch_norm=use_batch_norm, weight_norm=weight_norm, @@ -798,8 +798,8 @@ def rec_masked_deconv_coupling(input_, hps, 
scale_idx, n_scale, log_diff_1 = log_diff[:, :, :, :channels] log_diff_2 = log_diff[:, :, :, channels:] else: - res_1, res_2 = tf.split(res, 2, 3) - log_diff_1, log_diff_2 = tf.split(log_diff, 2, 3) + res_1, res_2 = tf.split(axis=3, num_or_size_splits=2, value=res) + log_diff_1, log_diff_2 = tf.split(axis=3, num_or_size_splits=2, value=log_diff) res_1, log_diff_1 = rec_masked_deconv_coupling( input_=res_1, hps=hps, scale_idx=scale_idx + 1, n_scale=n_scale, @@ -1305,7 +1305,7 @@ class RealNVP(object): z_lost = z_complete for scale_idx in xrange(hps.n_scale - 1): z_lost = squeeze_2x2_ordered(z_lost) - z_lost, _ = tf.split(z_lost, 2, 3) + z_lost, _ = tf.split(axis=3, num_or_size_splits=2, value=z_lost) z_compressed = z_lost z_noisy = z_lost for _ in xrange(scale_idx + 1): diff --git a/real_nvp/real_nvp_utils.py b/real_nvp/real_nvp_utils.py index 203ca35ec4ab6f28a0b45185ad952cefaec76541..004ef62ca6d4970a788120a7d6485278cd6efd67 100644 --- a/real_nvp/real_nvp_utils.py +++ b/real_nvp/real_nvp_utils.py @@ -99,8 +99,8 @@ def conv_layer(input_, filter_size[1] - input_.get_shape().as_list()[2], input_.get_shape().as_list()[3] ]) - res = tf.concat(1, [pad_1, res]) - res = tf.concat(2, [pad_2, res]) + res = tf.concat(axis=1, values=[pad_1, res]) + res = tf.concat(axis=2, values=[pad_2, res]) res = tf.nn.conv2d( input=res, filter=weights, @@ -139,8 +139,8 @@ def depool_2x2(input_, stride=2): channels = shape[3] res = tf.reshape(input_, [batch_size, height, 1, width, 1, channels]) res = tf.concat( - 2, [res, tf.zeros([batch_size, height, stride - 1, width, 1, channels])]) - res = tf.concat(4, [ + axis=2, values=[res, tf.zeros([batch_size, height, stride - 1, width, 1, channels])]) + res = tf.concat(axis=4, values=[ res, tf.zeros([batch_size, height, stride, width, stride - 1, channels]) ]) res = tf.reshape(res, [batch_size, stride * height, stride * width, channels]) @@ -158,11 +158,11 @@ def batch_random_flip(input_): height = shape[1] width = shape[2] channels = shape[3] - res 
= tf.split(0, batch_size, input_) + res = tf.split(axis=0, num_or_size_splits=batch_size, value=input_) res = [elem[0, :, :, :] for elem in res] res = [tf.image.random_flip_left_right(elem) for elem in res] res = [tf.reshape(elem, [1, height, width, channels]) for elem in res] - res = tf.concat(0, res) + res = tf.concat(axis=0, values=res) return res @@ -175,7 +175,7 @@ def as_one_hot(input_, n_indices): n_elem = numpy.prod(shape) indices = tf.range(n_elem) indices = tf.cast(indices, tf.int64) - indices_input = tf.concat(0, [indices, tf.reshape(input_, [-1])]) + indices_input = tf.concat(axis=0, values=[indices, tf.reshape(input_, [-1])]) indices_input = tf.reshape(indices_input, [2, -1]) indices_input = tf.transpose(indices_input) res = tf.sparse_to_dense( diff --git a/resnet/README.md b/resnet/README.md index 4ea8028438803da8fa4fb61d0eed5545f8c05b10..7591b39cb10933f12fb199d37637b4b9b4e33b28 100644 --- a/resnet/README.md +++ b/resnet/README.md @@ -23,7 +23,7 @@ https://arxiv.org/pdf/1605.07146v1.pdf Settings: * Random split 50k training set into 45k/5k train/eval split. -* Pad to 36x36 and random crop. Horizontal flip. Per-image whitenting. +* Pad to 36x36 and random crop. Horizontal flip. Per-image whitening. * Momentum optimizer 0.9. * Learning rate schedule: 0.1 (40k), 0.01 (60k), 0.001 (>60k). * L2 weight decay: 0.002. @@ -31,13 +31,9 @@ https://arxiv.org/pdf/1605.07146v1.pdf Results: - ![Precisions](g3doc/cifar_resnet.gif) - - -![Precisions Legends](g3doc/cifar_resnet_legends.gif) - +![Precisions Legends](g3doc/cifar_resnet_legends.gif) CIFAR-10 Model|Best Precision|Steps --------------|--------------|------ @@ -69,40 +65,40 @@ curl -o cifar-100-binary.tar.gz https://www.cs.toronto.edu/~kriz/cifar-100-binar How to run: ```shell -# cd to the your workspace. -# It contains an empty WORKSPACE file, resnet codes and cifar10 dataset. -# Note: User can split 5k from train set for eval set. 
-ls -R - .: - cifar10 resnet WORKSPACE +# cd to the models repository and run with bash. Expected command output shown. +# The directory should contain an empty WORKSPACE file, the resnet code, and the cifar10 dataset. +# Note: The user can split 5k from train set for eval set. +$ ls -R +.: +cifar10 resnet WORKSPACE - ./cifar10: - data_batch_1.bin data_batch_2.bin data_batch_3.bin data_batch_4.bin - data_batch_5.bin test_batch.bin +./cifar10: +data_batch_1.bin data_batch_2.bin data_batch_3.bin data_batch_4.bin +data_batch_5.bin test_batch.bin - ./resnet: - BUILD cifar_input.py g3doc README.md resnet_main.py resnet_model.py +./resnet: +BUILD cifar_input.py g3doc README.md resnet_main.py resnet_model.py # Build everything for GPU. -bazel build -c opt --config=cuda resnet/... +$ bazel build -c opt --config=cuda resnet/... # Train the model. -bazel-bin/resnet/resnet_main --train_data_path=cifar10/data_batch* \ - --log_root=/tmp/resnet_model \ - --train_dir=/tmp/resnet_model/train \ - --dataset='cifar10' \ - --num_gpus=1 +$ bazel-bin/resnet/resnet_main --train_data_path=cifar10/data_batch* \ + --log_root=/tmp/resnet_model \ + --train_dir=/tmp/resnet_model/train \ + --dataset='cifar10' \ + --num_gpus=1 # While the model is training, you can also check on its progress using tensorboard: -tensorboard --logdir=/tmp/resnet_model +$ tensorboard --logdir=/tmp/resnet_model # Evaluate the model. # Avoid running on the same GPU as the training job at the same time, # otherwise, you might run out of memory. 
-bazel-bin/resnet/resnet_main --eval_data_path=cifar10/test_batch.bin \ - --log_root=/tmp/resnet_model \ - --eval_dir=/tmp/resnet_model/test \ - --mode=eval \ - --dataset='cifar10' \ - --num_gpus=0 +$ bazel-bin/resnet/resnet_main --eval_data_path=cifar10/test_batch.bin \ + --log_root=/tmp/resnet_model \ + --eval_dir=/tmp/resnet_model/test \ + --mode=eval \ + --dataset='cifar10' \ + --num_gpus=0 ``` diff --git a/resnet/resnet_model.py b/resnet/resnet_model.py index f21db8acb863cce65a9e090f3e7a1c5da2dcec3d..a8b7f10ca1297b7a36e33781c97390a3f9f71298 100644 --- a/resnet/resnet_model.py +++ b/resnet/resnet_model.py @@ -85,7 +85,7 @@ class ResNet(object): # comparably good performance. # https://arxiv.org/pdf/1605.07146v1.pdf # filters = [16, 160, 320, 640] - # Update hps.num_residual_units to 9 + # Update hps.num_residual_units to 4 with tf.variable_scope('unit_1_0'): x = res_func(x, filters[0], filters[1], self._stride_arr(strides[0]), @@ -128,7 +128,7 @@ class ResNet(object): def _build_train_op(self): """Build training specific ops for the graph.""" self.lrn_rate = tf.constant(self.hps.lrn_rate, tf.float32) - tf.summary.scalar('learning rate', self.lrn_rate) + tf.summary.scalar('learning_rate', self.lrn_rate) trainable_variables = tf.trainable_variables() grads = tf.gradients(self.cost, trainable_variables) diff --git a/skip_thoughts/.gitignore b/skip_thoughts/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..91cb861a9c87147ac86eda5434e4af270ea7b1dc --- /dev/null +++ b/skip_thoughts/.gitignore @@ -0,0 +1,8 @@ +/bazel-bin +/bazel-ci_build-cache +/bazel-genfiles +/bazel-out +/bazel-skip_thoughts +/bazel-testlogs +/bazel-tf +*.pyc diff --git a/skip_thoughts/README.md b/skip_thoughts/README.md new file mode 100644 index 0000000000000000000000000000000000000000..cdcffe7c51bb12ca29265580ff8eae54d02c2b7d --- /dev/null +++ b/skip_thoughts/README.md @@ -0,0 +1,475 @@ +# Skip-Thought Vectors + +This is a TensorFlow implementation of the model 
described in: + +Jamie Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, +Antonio Torralba, Raquel Urtasun, Sanja Fidler. +[Skip-Thought Vectors](https://papers.nips.cc/paper/5950-skip-thought-vectors.pdf). +*In NIPS, 2015.* + + +## Contact +***Code author:*** Chris Shallue + +***Pull requests and issues:*** @cshallue + +## Contents +* [Model Overview](#model-overview) +* [Getting Started](#getting-started) + * [Install Required Packages](#install-required-packages) + * [Download Pretrained Models (Optional)](#download-pretrained-models-optional) +* [Training a Model](#training-a-model) + * [Prepare the Training Data](#prepare-the-training-data) + * [Run the Training Script](#run-the-training-script) + * [Track Training Progress](#track-training-progress) +* [Expanding the Vocabulary](#expanding-the-vocabulary) + * [Overview](#overview) + * [Preparation](#preparation) + * [Run the Vocabulary Expansion Script](#run-the-vocabulary-expansion-script) +* [Evaluating a Model](#evaluating-a-model) + * [Overview](#overview-1) + * [Preparation](#preparation-1) + * [Run the Evaluation Tasks](#run-the-evaluation-tasks) +* [Encoding Sentences](#encoding-sentences) + +## Model Overview + +The *Skip-Thoughts* model is a sentence encoder. It learns to encode input +sentences into a fixed-dimensional vector representation that is useful for many +tasks, for example to detect paraphrases or to classify whether a product review +is positive or negative. See the +[Skip-Thought Vectors](https://papers.nips.cc/paper/5950-skip-thought-vectors.pdf) +paper for details of the model architecture and more example applications. + +A trained *Skip-Thoughts* model will encode similar sentences near each other +in the embedding vector space. The following examples show the nearest neighbor by +cosine similarity of some sentences from the +[movie review dataset](https://www.cs.cornell.edu/people/pabo/movie-review-data/).
+ + +| Input sentence | Nearest Neighbor | +|----------------|------------------| +| Simplistic, silly and tedious. | Trite, banal, cliched, mostly inoffensive. | +| Not so much farcical as sour. | Not only unfunny, but downright repellent. | +| A sensitive and astute first feature by Anne-Sophie Birot. | Absorbing character study by André Turpin . | +| An enthralling, entertaining feature. | A slick, engrossing melodrama. | + +## Getting Started + +### Install Required Packages +First ensure that you have installed the following required packages: + +* **Bazel** ([instructions](http://bazel.build/docs/install.html)) +* **TensorFlow** ([instructions](https://www.tensorflow.org/install/)) +* **NumPy** ([instructions](http://www.scipy.org/install.html)) +* **scikit-learn** ([instructions](http://scikit-learn.org/stable/install.html)) +* **Natural Language Toolkit (NLTK)** + * First install NLTK ([instructions](http://www.nltk.org/install.html)) + * Then install the NLTK data ([instructions](http://www.nltk.org/data.html)) +* **gensim** ([instructions](https://radimrehurek.com/gensim/install.html)) + * Only required if you will be expanding your vocabulary with the [word2vec](https://code.google.com/archive/p/word2vec/) model. + + +### Download Pretrained Models (Optional) + +You can download model checkpoints pretrained on the +[BookCorpus](http://yknzhu.wixsite.com/mbweb) dataset in the following +configurations: + +* Unidirectional RNN encoder ("uni-skip" in the paper) +* Bidirectional RNN encoder ("bi-skip" in the paper) + +```shell +# Directory to download the pretrained models to. +PRETRAINED_MODELS_DIR="${HOME}/skip_thoughts/pretrained/" + +mkdir -p ${PRETRAINED_MODELS_DIR} +cd ${PRETRAINED_MODELS_DIR} + +# Download and extract the unidirectional model. 
+wget "http://download.tensorflow.org/models/skip_thoughts_uni_2017_02_02.tar.gz" +tar -xvf skip_thoughts_uni_2017_02_02.tar.gz +rm skip_thoughts_uni_2017_02_02.tar.gz + +# Download and extract the bidirectional model. +wget "http://download.tensorflow.org/models/skip_thoughts_bi_2017_02_16.tar.gz" +tar -xvf skip_thoughts_bi_2017_02_16.tar.gz +rm skip_thoughts_bi_2017_02_16.tar.gz +``` + +You can now skip to the sections [Evaluating a Model](#evaluating-a-model) and +[Encoding Sentences](#encoding-sentences). + + +## Training a Model + +### Prepare the Training Data + +To train a model you will need to provide training data in TFRecord format. The +TFRecord format consists of a set of sharded files containing serialized +`tf.Example` protocol buffers. Each `tf.Example` proto contains three +sentences: + + * `encode`: The sentence to encode. + * `decode_pre`: The sentence preceding `encode` in the original text. + * `decode_post`: The sentence following `encode` in the original text. + +Each sentence is a list of words. During preprocessing, a dictionary is created +that assigns each word in the vocabulary to an integer-valued id. Each sentence +is encoded as a list of integer word ids in the `tf.Example` protos. + +We have provided a script to preprocess any set of text files into this format. +You may wish to use the [BookCorpus](http://yknzhu.wixsite.com/mbweb) dataset. +Note that the preprocessing script may take **12 hours** or more to complete +on this large dataset. + +```shell +# Comma-separated list of globs matching the input files. The format of +# the input files is assumed to be a list of newline-separated sentences, where +# each sentence is already tokenized. +INPUT_FILES="${HOME}/skip_thoughts/bookcorpus/*.txt" + +# Location to save the preprocessed training and validation data. +DATA_DIR="${HOME}/skip_thoughts/data" + +# Build the preprocessing script.
+cd tensorflow-models/skip_thoughts +bazel build -c opt //skip_thoughts/data:preprocess_dataset + +# Run the preprocessing script. +bazel-bin/skip_thoughts/data/preprocess_dataset \ + --input_files=${INPUT_FILES} \ + --output_dir=${DATA_DIR} +``` + +When the script finishes you will find 100 training files and 1 validation file +in `DATA_DIR`. The files will match the patterns `train-?????-of-00100` and +`validation-00000-of-00001` respectively. + +The script will also produce a file named `vocab.txt`. The format of this file +is a list of newline-separated words where the word id is the corresponding +0-based line index. Words are sorted by descending order of frequency in the input +data. Only the top 20,000 words are assigned unique ids; all other words are +assigned the "unknown id" of 1 in the processed data. + +### Run the Training Script + +Execute the following commands to start the training script. By default it will +run for 500k steps (around 9 days on a GeForce GTX 1080 GPU). + +```shell +# Directory containing the preprocessed data. +DATA_DIR="${HOME}/skip_thoughts/data" + +# Directory to save the model. +MODEL_DIR="${HOME}/skip_thoughts/model" + +# Build the model. +cd tensorflow-models/skip_thoughts +bazel build -c opt //skip_thoughts/... + +# Run the training script. +bazel-bin/skip_thoughts/train \ + --input_file_pattern="${DATA_DIR}/train-?????-of-00100" \ + --train_dir="${MODEL_DIR}/train" +``` + +### Track Training Progress + +Optionally, you can run the `track_perplexity` script in a separate process. +This will log per-word perplexity on the validation set which allows training +progress to be monitored on +[TensorBoard](https://www.tensorflow.org/get_started/summaries_and_tensorboard). + +Note that you may run out of memory if you run this script on the same GPU +as the training script. You can set the environment variable +`CUDA_VISIBLE_DEVICES=""` to force the script to run on CPU.
If it runs too +slowly on CPU, you can decrease the value of `--num_eval_examples`. + +```shell +DATA_DIR="${HOME}/skip_thoughts/data" +MODEL_DIR="${HOME}/skip_thoughts/model" + +# Ignore GPU devices (only necessary if your GPU is currently memory +# constrained, for example, by running the training script). +export CUDA_VISIBLE_DEVICES="" + +# Run the evaluation script. This will run in a loop, periodically loading the +# latest model checkpoint file and computing evaluation metrics. +bazel-bin/skip_thoughts/track_perplexity \ + --input_file_pattern="${DATA_DIR}/validation-?????-of-00001" \ + --checkpoint_dir="${MODEL_DIR}/train" \ + --eval_dir="${MODEL_DIR}/val" \ + --num_eval_examples=50000 +``` + +If you started the `track_perplexity` script, run a +[TensorBoard](https://www.tensorflow.org/get_started/summaries_and_tensorboard) +server in a separate process for real-time monitoring of training summaries and +validation perplexity. + +```shell +MODEL_DIR="${HOME}/skip_thoughts/model" + +# Run a TensorBoard server. +tensorboard --logdir="${MODEL_DIR}" +``` + +## Expanding the Vocabulary + +### Overview + +The vocabulary generated by the preprocessing script contains only 20,000 words +which is insufficient for many tasks. For example, a sentence from Wikipedia +might contain nouns that do not appear in this vocabulary. + +A solution to this problem described in the +[Skip-Thought Vectors](https://papers.nips.cc/paper/5950-skip-thought-vectors.pdf) +paper is to learn a mapping that transfers word representations from one model to +another. This idea is based on the "Translation Matrix" method from the paper +[Exploiting Similarities Among Languages for Machine Translation](https://arxiv.org/abs/1309.4168). + + +Specifically, we will load the word embeddings from a trained *Skip-Thoughts* +model and from a trained [word2vec model](https://arxiv.org/pdf/1301.3781.pdf) +(which has a much larger vocabulary). 
We will train a linear regression model +without regularization to learn a linear mapping from the word2vec embedding +space to the *Skip-Thoughts* embedding space. We will then apply the linear +model to all words in the word2vec vocabulary, yielding vectors in the +*Skip-Thoughts* word embedding space for the union of the two vocabularies. + +The linear regression task is to learn a parameter matrix *W* to minimize +*|| X - Y \* W ||<sup>2</sup>*, where *X* is a matrix of *Skip-Thoughts* +embeddings of shape `[num_words, dim1]`, *Y* is a matrix of word2vec embeddings +of shape `[num_words, dim2]`, and *W* is a matrix of shape `[dim2, dim1]`. + +### Preparation + +First you will need to download and unpack a pretrained +[word2vec model](https://arxiv.org/pdf/1301.3781.pdf) from +[this website](https://code.google.com/archive/p/word2vec/) +([direct download link](https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing)). +This model was trained on the Google News dataset (about 100 billion words). + + +Also ensure that you have already [installed gensim](https://radimrehurek.com/gensim/install.html). + +### Run the Vocabulary Expansion Script + +```shell +# Path to checkpoint file or a directory containing checkpoint files (the script +# will select the most recent). +CHECKPOINT_PATH="${HOME}/skip_thoughts/model/train" + +# Vocabulary file generated by the preprocessing script. +SKIP_THOUGHTS_VOCAB="${HOME}/skip_thoughts/data/vocab.txt" + +# Path to downloaded word2vec model. +WORD2VEC_MODEL="${HOME}/skip_thoughts/googlenews/GoogleNews-vectors-negative300.bin" + +# Output directory. +EXP_VOCAB_DIR="${HOME}/skip_thoughts/exp_vocab" + +# Build the vocabulary expansion script. +cd tensorflow-models/skip_thoughts +bazel build -c opt //skip_thoughts:vocabulary_expansion + +# Run the vocabulary expansion script.
+bazel-bin/skip_thoughts/vocabulary_expansion \ + --skip_thoughts_model=${CHECKPOINT_PATH} \ + --skip_thoughts_vocab=${SKIP_THOUGHTS_VOCAB} \ + --word2vec_model=${WORD2VEC_MODEL} \ + --output_dir=${EXP_VOCAB_DIR} +``` + +## Evaluating a Model + +### Overview + +The model can be evaluated using the benchmark tasks described in the +[Skip-Thought Vectors](https://papers.nips.cc/paper/5950-skip-thought-vectors.pdf) +paper. The following tasks are supported (refer to the paper for full details): + + * **SICK** semantic relatedness task. + * **MSRP** (Microsoft Research Paraphrase Corpus) paraphrase detection task. + * Binary classification tasks: + * **MR** movie review sentiment task. + * **CR** customer product review task. + * **SUBJ** subjectivity/objectivity task. + * **MPQA** opinion polarity task. + * **TREC** question-type classification task. + +### Preparation + +You will need to clone or download the +[skip-thoughts GitHub repository](https://github.com/ryankiros/skip-thoughts) by +[ryankiros](https://github.com/ryankiros) (the first author of the Skip-Thoughts +paper): + +```shell +# Folder to clone the repository to. +ST_KIROS_DIR="${HOME}/skip_thoughts/skipthoughts_kiros" + +# Clone the repository. +git clone git@github.com:ryankiros/skip-thoughts.git "${ST_KIROS_DIR}/skipthoughts" + +# Make the package importable. +export PYTHONPATH="${ST_KIROS_DIR}/:${PYTHONPATH}" +``` + +You will also need to download the data needed for each evaluation task. See the +instructions [here](https://github.com/ryankiros/skip-thoughts). + +For example, the CR (customer review) dataset is found [here](http://nlp.stanford.edu/~sidaw/home/projects:nbsvm). For this task we want the +files `custrev.pos` and `custrev.neg`. + +### Run the Evaluation Tasks + +In the following example we will evaluate a unidirectional model ("uni-skip" in +the paper) on the CR task. 
To use a bidirectional model ("bi-skip" in the +paper), simply pass the flags `--bi_vocab_file`, `--bi_embeddings_file` and +`--bi_checkpoint_path` instead. To use the "combine-skip" model described in the +paper you will need to pass both the unidirectional and bidirectional flags. + +```shell +# Path to checkpoint file or a directory containing checkpoint files (the script +# will select the most recent). +CHECKPOINT_PATH="${HOME}/skip_thoughts/model/train" + +# Vocabulary file generated by the vocabulary expansion script. +VOCAB_FILE="${HOME}/skip_thoughts/exp_vocab/vocab.txt" + +# Embeddings file generated by the vocabulary expansion script. +EMBEDDINGS_FILE="${HOME}/skip_thoughts/exp_vocab/embeddings.npy" + +# Directory containing files custrev.pos and custrev.neg. +EVAL_DATA_DIR="${HOME}/skip_thoughts/eval_data" + +# Build the evaluation script. +cd tensorflow-models/skip_thoughts +bazel build -c opt //skip_thoughts:evaluate + +# Run the evaluation script. +bazel-bin/skip_thoughts/evaluate \ + --eval_task=CR \ + --data_dir=${EVAL_DATA_DIR} \ + --uni_vocab_file=${VOCAB_FILE} \ + --uni_embeddings_file=${EMBEDDINGS_FILE} \ + --uni_checkpoint_path=${CHECKPOINT_PATH} +``` + +Output: + +```python +[0.82539682539682535, 0.84084880636604775, 0.83023872679045096, + 0.86206896551724133, 0.83554376657824936, 0.85676392572944293, + 0.84084880636604775, 0.83023872679045096, 0.85145888594164454, + 0.82758620689655171] +``` + +The output is a list of accuracies of 10 cross-validation classification models. +To get a single number, simply take the average: + +```python +ipython # Launch iPython. 
+ +In [0]: +import numpy as np +np.mean([0.82539682539682535, 0.84084880636604775, 0.83023872679045096, + 0.86206896551724133, 0.83554376657824936, 0.85676392572944293, + 0.84084880636604775, 0.83023872679045096, 0.85145888594164454, + 0.82758620689655171]) + +Out [0]: 0.84009936423729525 +``` + +## Encoding Sentences + +In this example we will encode data from the +[movie review dataset](https://www.cs.cornell.edu/people/pabo/movie-review-data/) +(specifically the [sentence polarity dataset v1.0](https://www.cs.cornell.edu/people/pabo/movie-review-data/rt-polaritydata.tar.gz)). + +```python +ipython # Launch iPython. + +In [0]: + +# Imports. +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function +import numpy as np +import os.path +import scipy.spatial.distance as sd +from skip_thoughts import configuration +from skip_thoughts import encoder_manager + +In [1]: +# Set paths to the model. +VOCAB_FILE = "/path/to/vocab.txt" +EMBEDDING_MATRIX_FILE = "/path/to/embeddings.npy" +CHECKPOINT_PATH = "/path/to/model.ckpt-9999" +# The following directory should contain files rt-polarity.neg and +# rt-polarity.pos. +MR_DATA_DIR = "/dir/containing/mr/data" + +In [2]: +# Set up the encoder. Here we are using a single unidirectional model. +# To use a bidirectional model as well, call load_model() again with +# configuration.model_config(bidirectional_encoder=True) and paths to the +# bidirectional model's files. The encoder will use the concatenation of +# all loaded models. +encoder = encoder_manager.EncoderManager() +encoder.load_model(configuration.model_config(), + vocabulary_file=VOCAB_FILE, + embedding_matrix_file=EMBEDDING_MATRIX_FILE, + checkpoint_path=CHECKPOINT_PATH) + +In [3]: +# Load the movie review dataset. 
+data = [] +with open(os.path.join(MR_DATA_DIR, 'rt-polarity.neg'), 'rb') as f: + data.extend([line.decode('latin-1').strip() for line in f]) +with open(os.path.join(MR_DATA_DIR, 'rt-polarity.pos'), 'rb') as f: + data.extend([line.decode('latin-1').strip() for line in f]) + +In [4]: +# Generate Skip-Thought Vectors for each sentence in the dataset. +encodings = encoder.encode(data) + +In [5]: +# Define a helper function to generate nearest neighbors. +def get_nn(ind, num=10): + encoding = encodings[ind] + scores = sd.cdist([encoding], encodings, "cosine")[0] + sorted_ids = np.argsort(scores) + print("Sentence:") + print("", data[ind]) + print("\nNearest neighbors:") + for i in range(1, num + 1): + print(" %d. %s (%.3f)" % + (i, data[sorted_ids[i]], scores[sorted_ids[i]])) + +In [6]: +# Compute nearest neighbors of the first sentence in the dataset. +get_nn(0) +``` + +Output: + +``` +Sentence: + simplistic , silly and tedious . + +Nearest neighbors: + 1. trite , banal , cliched , mostly inoffensive . (0.247) + 2. banal and predictable . (0.253) + 3. witless , pointless , tasteless and idiotic . (0.272) + 4. loud , silly , stupid and pointless . (0.295) + 5. grating and tedious . (0.299) + 6. idiotic and ugly . (0.330) + 7. black-and-white and unrealistic . (0.335) + 8. hopelessly inane , humorless and under-inspired . (0.335) + 9. shallow , noisy and pretentious . (0.340) + 10. . . . unlikable , uninteresting , unfunny , and completely , utterly inept . 
(0.346) +``` diff --git a/skip_thoughts/WORKSPACE b/skip_thoughts/WORKSPACE new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/skip_thoughts/skip_thoughts/BUILD b/skip_thoughts/skip_thoughts/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..40ecd50102f4adeff42cdcb738f9bc7decec01e3 --- /dev/null +++ b/skip_thoughts/skip_thoughts/BUILD @@ -0,0 +1,94 @@ +package(default_visibility = [":internal"]) + +licenses(["notice"]) # Apache 2.0 + +exports_files(["LICENSE"]) + +package_group( + name = "internal", + packages = [ + "//skip_thoughts/...", + ], +) + +py_library( + name = "configuration", + srcs = ["configuration.py"], + srcs_version = "PY2AND3", +) + +py_library( + name = "skip_thoughts_model", + srcs = ["skip_thoughts_model.py"], + srcs_version = "PY2AND3", + deps = [ + "//skip_thoughts/ops:gru_cell", + "//skip_thoughts/ops:input_ops", + ], +) + +py_test( + name = "skip_thoughts_model_test", + size = "large", + srcs = ["skip_thoughts_model_test.py"], + deps = [ + ":configuration", + ":skip_thoughts_model", + ], +) + +py_binary( + name = "train", + srcs = ["train.py"], + srcs_version = "PY2AND3", + deps = [ + ":configuration", + ":skip_thoughts_model", + ], +) + +py_binary( + name = "track_perplexity", + srcs = ["track_perplexity.py"], + srcs_version = "PY2AND3", + deps = [ + ":configuration", + ":skip_thoughts_model", + ], +) + +py_binary( + name = "vocabulary_expansion", + srcs = ["vocabulary_expansion.py"], + srcs_version = "PY2AND3", +) + +py_library( + name = "skip_thoughts_encoder", + srcs = ["skip_thoughts_encoder.py"], + srcs_version = "PY2AND3", + deps = [ + ":skip_thoughts_model", + "//skip_thoughts/data:special_words", + ], +) + +py_library( + name = "encoder_manager", + srcs = ["encoder_manager.py"], + srcs_version = "PY2AND3", + deps = [ + ":skip_thoughts_encoder", + ], +) + +py_binary( + name = "evaluate", + srcs = ["evaluate.py"], + srcs_version = 
"PY2AND3", + deps = [ + ":encoder_manager", + "//skip_thoughts:configuration", + ], +) + diff --git a/skip_thoughts/skip_thoughts/__init__.py b/skip_thoughts/skip_thoughts/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/skip_thoughts/skip_thoughts/configuration.py b/skip_thoughts/skip_thoughts/configuration.py new file mode 100644 index 0000000000000000000000000000000000000000..bc04d57983584a7026df890d472ff326891e1136 --- /dev/null +++ b/skip_thoughts/skip_thoughts/configuration.py @@ -0,0 +1,110 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Default configuration for model architecture and training.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + + +class _HParams(object): + """Wrapper for configuration parameters.""" + pass + + +def model_config(input_file_pattern=None, + input_queue_capacity=640000, + num_input_reader_threads=1, + shuffle_input_data=True, + uniform_init_scale=0.1, + vocab_size=20000, + batch_size=128, + word_embedding_dim=620, + bidirectional_encoder=False, + encoder_dim=2400): + """Creates a model configuration object. + + Args: + input_file_pattern: File pattern of sharded TFRecord files containing + tf.Example protobufs. 
+ input_queue_capacity: Number of examples to keep in the input queue. + num_input_reader_threads: Number of threads for prefetching input + tf.Examples. + shuffle_input_data: Whether to shuffle the input data. + uniform_init_scale: Scale of random uniform initializer. + vocab_size: Number of unique words in the vocab. + batch_size: Batch size (training and evaluation only). + word_embedding_dim: Word embedding dimension. + bidirectional_encoder: Whether to use a bidirectional or unidirectional + encoder RNN. + encoder_dim: Number of output dimensions of the sentence encoder. + + Returns: + An object containing model configuration parameters. + """ + config = _HParams() + config.input_file_pattern = input_file_pattern + config.input_queue_capacity = input_queue_capacity + config.num_input_reader_threads = num_input_reader_threads + config.shuffle_input_data = shuffle_input_data + config.uniform_init_scale = uniform_init_scale + config.vocab_size = vocab_size + config.batch_size = batch_size + config.word_embedding_dim = word_embedding_dim + config.bidirectional_encoder = bidirectional_encoder + config.encoder_dim = encoder_dim + return config + + +def training_config(learning_rate=0.0008, + learning_rate_decay_factor=0.5, + learning_rate_decay_steps=400000, + number_of_steps=500000, + clip_gradient_norm=5.0, + save_model_secs=600, + save_summaries_secs=600): + """Creates a training configuration object. + + Args: + learning_rate: Initial learning rate. + learning_rate_decay_factor: If > 0, the learning rate decay factor. + learning_rate_decay_steps: The number of steps before the learning rate + decays by learning_rate_decay_factor. + number_of_steps: The total number of training steps to run. Passing None + will cause the training script to run indefinitely. + clip_gradient_norm: If not None, then clip gradients to this value. + save_model_secs: How often (in seconds) to save model checkpoints. + save_summaries_secs: How often (in seconds) to save model summaries. 
+ + Returns: + An object containing training configuration parameters. + + Raises: + ValueError: If learning_rate_decay_factor is set and + learning_rate_decay_steps is unset. + """ + if learning_rate_decay_factor and not learning_rate_decay_steps: + raise ValueError( + "learning_rate_decay_factor requires learning_rate_decay_steps.") + + config = _HParams() + config.learning_rate = learning_rate + config.learning_rate_decay_factor = learning_rate_decay_factor + config.learning_rate_decay_steps = learning_rate_decay_steps + config.number_of_steps = number_of_steps + config.clip_gradient_norm = clip_gradient_norm + config.save_model_secs = save_model_secs + config.save_summaries_secs = save_summaries_secs + return config diff --git a/skip_thoughts/skip_thoughts/data/BUILD b/skip_thoughts/skip_thoughts/data/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..afc209e181fb2f7bc5183bf52e471278daf82ed2 --- /dev/null +++ b/skip_thoughts/skip_thoughts/data/BUILD @@ -0,0 +1,23 @@ +package(default_visibility = ["//skip_thoughts:internal"]) + +licenses(["notice"]) # Apache 2.0 + +exports_files(["LICENSE"]) + +py_library( + name = "special_words", + srcs = ["special_words.py"], + srcs_version = "PY2AND3", + deps = [], +) + +py_binary( + name = "preprocess_dataset", + srcs = [ + "preprocess_dataset.py", + ], + srcs_version = "PY2AND3", + deps = [ + ":special_words", + ], +) diff --git a/skip_thoughts/skip_thoughts/data/__init__.py b/skip_thoughts/skip_thoughts/data/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/skip_thoughts/skip_thoughts/data/preprocess_dataset.py b/skip_thoughts/skip_thoughts/data/preprocess_dataset.py new file mode 100644 index 0000000000000000000000000000000000000000..dfca7e8468971bcefe709cf0338697397790c5fd --- /dev/null +++ b/skip_thoughts/skip_thoughts/data/preprocess_dataset.py @@ -0,0 +1,301 @@ +# Copyright 2017 The TensorFlow Authors. 
All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Converts a set of text files to TFRecord format with Example protos. + +Each Example proto in the output contains the following fields: + + decode_pre: list of int64 ids corresponding to the "previous" sentence. + encode: list of int64 ids corresponding to the "current" sentence. + decode_post: list of int64 ids corresponding to the "post" sentence. + +In addition, the following files are generated: + + vocab.txt: List of newline-separated words, where the word id is the + corresponding 0-based index in the file. + word_counts.txt: List of "WORD COUNT" pairs, where COUNT is the number of + occurrences of WORD in the input files. + +The vocabulary of word ids is constructed from the top --num_words by word +count. All other words get the unknown-word (UNK) id. +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import collections +import os + + +import numpy as np +import tensorflow as tf + +from skip_thoughts.data import special_words + +FLAGS = tf.flags.FLAGS + +tf.flags.DEFINE_string("input_files", None, + "Comma-separated list of globs matching the input " + "files. The format of the input files is assumed to be " + "a list of newline-separated sentences, where each " + "sentence is already tokenized.") + +tf.flags.DEFINE_string("vocab_file", "", + "(Optional) existing vocab file. 
Otherwise, a new vocab " + "file is created and written to the output directory. " + "The file format is a list of newline-separated words, " + "where the word id is the corresponding 0-based index " + "in the file.") + +tf.flags.DEFINE_string("output_dir", None, "Output directory.") + +tf.flags.DEFINE_integer("train_output_shards", 100, + "Number of output shards for the training set.") + +tf.flags.DEFINE_integer("validation_output_shards", 1, + "Number of output shards for the validation set.") + +tf.flags.DEFINE_integer("num_validation_sentences", 50000, + "Number of sentences to hold out for the validation set.") + +tf.flags.DEFINE_integer("num_words", 20000, + "Number of words to include in the output.") + +tf.flags.DEFINE_integer("max_sentences", 0, + "If > 0, the maximum number of sentences to output.") + +tf.flags.DEFINE_integer("max_sentence_length", 30, + "If > 0, exclude sentences whose encode, decode_pre OR " + "decode_post sentence exceeds this length.") + +tf.flags.DEFINE_boolean("add_eos", True, + "Whether to add end-of-sentence ids to the output.") + +tf.logging.set_verbosity(tf.logging.INFO) + + +def _build_vocabulary(input_files): + """Loads or builds the model vocabulary. + + Args: + input_files: List of pre-tokenized input .txt files. + + Returns: + vocab: A dictionary of word to id. 
+ """ + if FLAGS.vocab_file: + tf.logging.info("Loading existing vocab file.") + vocab = collections.OrderedDict() + with tf.gfile.GFile(FLAGS.vocab_file, mode="r") as f: + for i, line in enumerate(f): + word = line.decode("utf-8").strip() + assert word not in vocab, "Attempting to add word twice: %s" % word + vocab[word] = i + tf.logging.info("Read vocab of size %d from %s", + len(vocab), FLAGS.vocab_file) + return vocab + + tf.logging.info("Creating vocabulary.") + num = 0 + wordcount = collections.Counter() + for input_file in input_files: + tf.logging.info("Processing file: %s", input_file) + for sentence in tf.gfile.FastGFile(input_file): + wordcount.update(sentence.split()) + + num += 1 + if num % 1000000 == 0: + tf.logging.info("Processed %d sentences", num) + + tf.logging.info("Processed %d sentences total", num) + + # Materialize as lists so that indexing and np.argsort also work on Python 3. + words = list(wordcount.keys()) + freqs = list(wordcount.values()) + sorted_indices = np.argsort(freqs)[::-1] + + vocab = collections.OrderedDict() + vocab[special_words.EOS] = special_words.EOS_ID + vocab[special_words.UNK] = special_words.UNK_ID + for w_id, w_index in enumerate(sorted_indices[0:FLAGS.num_words - 2]): + vocab[words[w_index]] = w_id + 2 # 0: EOS, 1: UNK. 
+ + tf.logging.info("Created vocab with %d words", len(vocab)) + + vocab_file = os.path.join(FLAGS.output_dir, "vocab.txt") + with tf.gfile.FastGFile(vocab_file, "w") as f: + f.write("\n".join(vocab.keys())) + tf.logging.info("Wrote vocab file to %s", vocab_file) + + word_counts_file = os.path.join(FLAGS.output_dir, "word_counts.txt") + with tf.gfile.FastGFile(word_counts_file, "w") as f: + for i in sorted_indices: + f.write("%s %d\n" % (words[i], freqs[i])) + tf.logging.info("Wrote word counts file to %s", word_counts_file) + + return vocab + + +def _int64_feature(value): + """Helper for creating an Int64 Feature.""" + return tf.train.Feature(int64_list=tf.train.Int64List( + value=[int(v) for v in value])) + + +def _sentence_to_ids(sentence, vocab): + """Helper for converting a sentence (list of words) to a list of ids.""" + ids = [vocab.get(w, special_words.UNK_ID) for w in sentence] + if FLAGS.add_eos: + ids.append(special_words.EOS_ID) + return ids + + +def _create_serialized_example(predecessor, current, successor, vocab): + """Helper for creating a serialized Example proto.""" + example = tf.train.Example(features=tf.train.Features(feature={ + "decode_pre": _int64_feature(_sentence_to_ids(predecessor, vocab)), + "encode": _int64_feature(_sentence_to_ids(current, vocab)), + "decode_post": _int64_feature(_sentence_to_ids(successor, vocab)), + })) + + return example.SerializeToString() + + +def _process_input_file(filename, vocab, stats): + """Processes the sentences in an input file. + + Args: + filename: Path to a pre-tokenized input .txt file. + vocab: A dictionary of word to id. + stats: A Counter object for statistics. + + Returns: + processed: A list of serialized Example protos + """ + tf.logging.info("Processing input file: %s", filename) + processed = [] + + predecessor = None # Predecessor sentence (list of words). + current = None # Current sentence (list of words). + successor = None # Successor sentence (list of words). 
+ + for successor_str in tf.gfile.FastGFile(filename): + stats.update(["sentences_seen"]) + successor = successor_str.split() + + # The first 2 sentences per file will be skipped. + if predecessor and current and successor: + stats.update(["sentences_considered"]) + + # Note that we are going to insert an end-of-sentence id later, so we only + # allow sentences with strictly fewer than max_sentence_length words. + if FLAGS.max_sentence_length and ( + len(predecessor) >= FLAGS.max_sentence_length or len(current) >= + FLAGS.max_sentence_length or len(successor) >= + FLAGS.max_sentence_length): + stats.update(["sentences_too_long"]) + else: + serialized = _create_serialized_example(predecessor, current, successor, + vocab) + processed.append(serialized) + stats.update(["sentences_output"]) + + predecessor = current + current = successor + + sentences_seen = stats["sentences_seen"] + sentences_output = stats["sentences_output"] + if sentences_seen and sentences_seen % 100000 == 0: + tf.logging.info("Processed %d sentences (%d output)", sentences_seen, + sentences_output) + if FLAGS.max_sentences and sentences_output >= FLAGS.max_sentences: + break + + tf.logging.info("Completed processing file %s", filename) + return processed + + +def _write_shard(filename, dataset, indices): + """Writes a TFRecord shard.""" + with tf.python_io.TFRecordWriter(filename) as writer: + for j in indices: + writer.write(dataset[j]) + + +def _write_dataset(name, dataset, indices, num_shards): + """Writes a sharded TFRecord dataset. + + Args: + name: Name of the dataset (e.g. "train"). + dataset: List of serialized Example protos. + indices: List of indices of 'dataset' to be written. + num_shards: The number of output shards. 
+ """ + tf.logging.info("Writing dataset %s", name) + borders = np.int32(np.linspace(0, len(indices), num_shards + 1)) + for i in range(num_shards): + filename = os.path.join(FLAGS.output_dir, "%s-%.5d-of-%.5d" % (name, i, + num_shards)) + shard_indices = indices[borders[i]:borders[i + 1]] + _write_shard(filename, dataset, shard_indices) + tf.logging.info("Wrote dataset indices [%d, %d) to output shard %s", + borders[i], borders[i + 1], filename) + tf.logging.info("Finished writing %d sentences in dataset %s.", + len(indices), name) + + +def main(unused_argv): + if not FLAGS.input_files: + raise ValueError("--input_files is required.") + if not FLAGS.output_dir: + raise ValueError("--output_dir is required.") + + if not tf.gfile.IsDirectory(FLAGS.output_dir): + tf.gfile.MakeDirs(FLAGS.output_dir) + + input_files = [] + for pattern in FLAGS.input_files.split(","): + # Glob each comma-separated pattern on its own, not the whole flag value. + match = tf.gfile.Glob(pattern) + if not match: + raise ValueError("Found no files matching %s" % pattern) + input_files.extend(match) + tf.logging.info("Found %d input files.", len(input_files)) + + vocab = _build_vocabulary(input_files) + + tf.logging.info("Generating dataset.") + stats = collections.Counter() + dataset = [] + for filename in input_files: + dataset.extend(_process_input_file(filename, vocab, stats)) + if FLAGS.max_sentences and stats["sentences_output"] >= FLAGS.max_sentences: + break + + tf.logging.info("Generated dataset with %d sentences.", len(dataset)) + for k, v in stats.items(): + tf.logging.info("%s: %d", k, v) + + tf.logging.info("Shuffling dataset.") + np.random.seed(123) + shuffled_indices = np.random.permutation(len(dataset)) + val_indices = shuffled_indices[:FLAGS.num_validation_sentences] + train_indices = shuffled_indices[FLAGS.num_validation_sentences:] + + _write_dataset("train", dataset, train_indices, FLAGS.train_output_shards) + _write_dataset("validation", dataset, val_indices, + FLAGS.validation_output_shards) + + +if __name__ == "__main__": + 
tf.app.run() diff --git a/tutorials/rnn/seq2seq.py b/skip_thoughts/skip_thoughts/data/special_words.py similarity index 66% rename from tutorials/rnn/seq2seq.py rename to skip_thoughts/skip_thoughts/data/special_words.py index ff487f9475cb109809ec12d48fe9c69060bf1ee0..fb76b7a94d1655f49f6906aa42fb2913ba8eceb9 100644 --- a/tutorials/rnn/seq2seq.py +++ b/skip_thoughts/skip_thoughts/data/special_words.py @@ -1,4 +1,4 @@ -# Copyright 2015 The TensorFlow Authors. All Rights Reserved. +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,11 +12,16 @@ # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== -"""Import seq2seq python ops for backward compatibility.""" +"""Special word constants. -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function +NOTE: The ids of the EOS and UNK constants should not be modified. It is assumed +that these always occupy the first two ids. +""" -raise ImportError( - "This module is deprecated. Use tf.contrib.legacy_seq2seq instead.") +# End of sentence. +EOS = "" +EOS_ID = 0 + +# Unknown. +UNK = "" +UNK_ID = 1 diff --git a/skip_thoughts/skip_thoughts/encoder_manager.py b/skip_thoughts/skip_thoughts/encoder_manager.py new file mode 100644 index 0000000000000000000000000000000000000000..6d74fb5204f75dec5e7acfb1c0425b577c2ca603 --- /dev/null +++ b/skip_thoughts/skip_thoughts/encoder_manager.py @@ -0,0 +1,134 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Manager class for loading and encoding with multiple skip-thoughts models. + +If multiple models are loaded at once then the encode() function returns the +concatenation of the outputs of each model. + +Example usage: + manager = EncoderManager() + manager.load_model(model_config_1, vocabulary_file_1, embedding_matrix_file_1, + checkpoint_path_1) + manager.load_model(model_config_2, vocabulary_file_2, embedding_matrix_file_2, + checkpoint_path_2) + encodings = manager.encode(data) +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import collections + + +import numpy as np +import tensorflow as tf + +from skip_thoughts import skip_thoughts_encoder + + +class EncoderManager(object): + """Manager class for loading and encoding with skip-thoughts models.""" + + def __init__(self): + self.encoders = [] + self.sessions = [] + + def load_model(self, model_config, vocabulary_file, embedding_matrix_file, + checkpoint_path): + """Loads a skip-thoughts model. + + Args: + model_config: Object containing parameters for building the model. + vocabulary_file: Path to vocabulary file containing a list of newline- + separated words where the word id is the corresponding 0-based index in + the file. + embedding_matrix_file: Path to a serialized numpy array of shape + [vocab_size, embedding_dim]. + checkpoint_path: SkipThoughtsModel checkpoint file or a directory + containing a checkpoint file. 
+ """ + tf.logging.info("Reading vocabulary from %s", vocabulary_file) + with tf.gfile.GFile(vocabulary_file, mode="r") as f: + lines = list(f.readlines()) + reverse_vocab = [line.decode("utf-8").strip() for line in lines] + tf.logging.info("Loaded vocabulary with %d words.", len(reverse_vocab)) + + tf.logging.info("Loading embedding matrix from %s", embedding_matrix_file) + # Note: tf.gfile.GFile doesn't work here because np.load() calls f.seek() + # with 3 arguments. The file is opened in binary mode, as np.load() expects. + with open(embedding_matrix_file, "rb") as f: + embedding_matrix = np.load(f) + tf.logging.info("Loaded embedding matrix with shape %s", + embedding_matrix.shape) + + word_embeddings = collections.OrderedDict( + zip(reverse_vocab, embedding_matrix)) + + g = tf.Graph() + with g.as_default(): + encoder = skip_thoughts_encoder.SkipThoughtsEncoder(word_embeddings) + restore_model = encoder.build_graph_from_config(model_config, + checkpoint_path) + + sess = tf.Session(graph=g) + restore_model(sess) + + self.encoders.append(encoder) + self.sessions.append(sess) + + def encode(self, + data, + use_norm=True, + verbose=False, + batch_size=128, + use_eos=False): + """Encodes a sequence of sentences as skip-thought vectors. + + Args: + data: A list of input strings. + use_norm: If True, normalize output skip-thought vectors to unit L2 norm. + verbose: Whether to log every batch. + batch_size: Batch size for the RNN encoders. + use_eos: If True, append the end-of-sentence word to each input sentence. + + Returns: + thought_vectors: A numpy array with one row per element of 'data'; the + outputs of all loaded models are concatenated along axis 1. + + Raises: + ValueError: If called before calling load_model. 
+ """ + if not self.encoders: + raise ValueError( + "Must call load_model at least once before calling encode.") + + encoded = [] + for encoder, sess in zip(self.encoders, self.sessions): + encoded.append( + np.array( + encoder.encode( + sess, + data, + use_norm=use_norm, + verbose=verbose, + batch_size=batch_size, + use_eos=use_eos))) + + return np.concatenate(encoded, axis=1) + + def close(self): + """Closes the active TensorFlow Sessions.""" + for sess in self.sessions: + sess.close() diff --git a/skip_thoughts/skip_thoughts/evaluate.py b/skip_thoughts/skip_thoughts/evaluate.py new file mode 100644 index 0000000000000000000000000000000000000000..e840d9da9f5c2e7e223669388ef0f43ed4f63398 --- /dev/null +++ b/skip_thoughts/skip_thoughts/evaluate.py @@ -0,0 +1,117 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Script to evaluate a skip-thoughts model. + +This script can evaluate a model with a unidirectional encoder ("uni-skip" in +the paper); or a model with a bidirectional encoder ("bi-skip"); or the +combination of a model with a unidirectional encoder and a model with a +bidirectional encoder ("combine-skip"). + +The uni-skip model (if it exists) is specified by the flags +--uni_vocab_file, --uni_embeddings_file, --uni_checkpoint_path. 
+ +The bi-skip model (if it exists) is specified by the flags +--bi_vocab_file, --bi_embeddings_file, --bi_checkpoint_path. + +The evaluation tasks have different running times. SICK may take 5-10 minutes. +MSRP, TREC and CR may take 20-60 minutes. SUBJ, MPQA and MR may take 2+ hours. +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + + +from skipthoughts import eval_classification +from skipthoughts import eval_msrp +from skipthoughts import eval_sick +from skipthoughts import eval_trec +import tensorflow as tf + +from skip_thoughts import configuration +from skip_thoughts import encoder_manager + +FLAGS = tf.flags.FLAGS + +tf.flags.DEFINE_string("eval_task", "CR", + "Name of the evaluation task to run. Available tasks: " + "MR, CR, SUBJ, MPQA, SICK, MSRP, TREC.") + +tf.flags.DEFINE_string("data_dir", None, "Directory containing training data.") + +tf.flags.DEFINE_string("uni_vocab_file", None, + "Path to vocabulary file containing a list of newline-" + "separated words where the word id is the " + "corresponding 0-based index in the file.") +tf.flags.DEFINE_string("bi_vocab_file", None, + "Path to vocabulary file containing a list of newline-" + "separated words where the word id is the " + "corresponding 0-based index in the file.") + +tf.flags.DEFINE_string("uni_embeddings_file", None, + "Path to serialized numpy array of shape " + "[vocab_size, embedding_dim].") +tf.flags.DEFINE_string("bi_embeddings_file", None, + "Path to serialized numpy array of shape " + "[vocab_size, embedding_dim].") + +tf.flags.DEFINE_string("uni_checkpoint_path", None, + "Checkpoint file or directory containing a checkpoint " + "file.") +tf.flags.DEFINE_string("bi_checkpoint_path", None, + "Checkpoint file or directory containing a checkpoint " + "file.") + +tf.logging.set_verbosity(tf.logging.INFO) + + +def main(unused_argv): + if not FLAGS.data_dir: + raise ValueError("--data_dir is required.") + + encoder =
encoder_manager.EncoderManager() + + # Maybe load unidirectional encoder. + if FLAGS.uni_checkpoint_path: + print("Loading unidirectional model...") + uni_config = configuration.model_config() + encoder.load_model(uni_config, FLAGS.uni_vocab_file, + FLAGS.uni_embeddings_file, FLAGS.uni_checkpoint_path) + + # Maybe load bidirectional encoder. + if FLAGS.bi_checkpoint_path: + print("Loading bidirectional model...") + bi_config = configuration.model_config(bidirectional_encoder=True) + encoder.load_model(bi_config, FLAGS.bi_vocab_file, FLAGS.bi_embeddings_file, + FLAGS.bi_checkpoint_path) + + if FLAGS.eval_task in ["MR", "CR", "SUBJ", "MPQA"]: + eval_classification.eval_nested_kfold( + encoder, FLAGS.eval_task, FLAGS.data_dir, use_nb=False) + elif FLAGS.eval_task == "SICK": + eval_sick.evaluate(encoder, evaltest=True, loc=FLAGS.data_dir) + elif FLAGS.eval_task == "MSRP": + eval_msrp.evaluate( + encoder, evalcv=True, evaltest=True, use_feats=True, loc=FLAGS.data_dir) + elif FLAGS.eval_task == "TREC": + eval_trec.evaluate(encoder, evalcv=True, evaltest=True, loc=FLAGS.data_dir) + else: + raise ValueError("Unrecognized eval_task: %s" % FLAGS.eval_task) + + encoder.close() + + +if __name__ == "__main__": + tf.app.run() diff --git a/skip_thoughts/skip_thoughts/ops/BUILD b/skip_thoughts/skip_thoughts/ops/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..4586e5a1c82e335dfacd6376b6d6d9ba4123c37b --- /dev/null +++ b/skip_thoughts/skip_thoughts/ops/BUILD @@ -0,0 +1,17 @@ +package(default_visibility = ["//skip_thoughts:internal"]) + +licenses(["notice"]) # Apache 2.0 + +exports_files(["LICENSE"]) + +py_library( + name = "input_ops", + srcs = ["input_ops.py"], + srcs_version = "PY2AND3", +) + +py_library( + name = "gru_cell", + srcs = ["gru_cell.py"], + srcs_version = "PY2AND3", +) diff --git a/skip_thoughts/skip_thoughts/ops/__init__.py b/skip_thoughts/skip_thoughts/ops/__init__.py new file mode 100644 index 
0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/skip_thoughts/skip_thoughts/ops/gru_cell.py b/skip_thoughts/skip_thoughts/ops/gru_cell.py new file mode 100644 index 0000000000000000000000000000000000000000..c4bee46d3a9f5faf1ec060a3b21f66b4fe51d0c9 --- /dev/null +++ b/skip_thoughts/skip_thoughts/ops/gru_cell.py @@ -0,0 +1,134 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""GRU cell implementation for the skip-thought vectors model.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + + +import tensorflow as tf + +_layer_norm = tf.contrib.layers.layer_norm + + +class LayerNormGRUCell(tf.contrib.rnn.RNNCell): + """GRU cell with layer normalization. + + The layer normalization implementation is based on: + + https://arxiv.org/abs/1607.06450. + + "Layer Normalization" + Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton + """ + + def __init__(self, + num_units, + w_initializer, + u_initializer, + b_initializer, + activation=tf.nn.tanh): + """Initializes the cell. + + Args: + num_units: Number of cell units. + w_initializer: Initializer for the "W" (input) parameter matrices. + u_initializer: Initializer for the "U" (recurrent) parameter matrices. + b_initializer: Initializer for the "b" (bias) parameter vectors. 
+ activation: Cell activation function. + """ + self._num_units = num_units + self._w_initializer = w_initializer + self._u_initializer = u_initializer + self._b_initializer = b_initializer + self._activation = activation + + @property + def state_size(self): + return self._num_units + + @property + def output_size(self): + return self._num_units + + def _w_h_initializer(self): + """Returns an initializer for the "W_h" parameter matrix. + + See equation (23) in the paper. The "W_h" parameter matrix is the + concatenation of two parameter submatrices. The matrix returned is + [U_z, U_r]. + + Returns: + An initializer function that produces a Tensor with shape + [num_units, 2 * num_units] as described above. + """ + + def _initializer(shape, dtype=tf.float32, partition_info=None): + num_units = self._num_units + assert shape == [num_units, 2 * num_units] + u_z = self._u_initializer([num_units, num_units], dtype, partition_info) + u_r = self._u_initializer([num_units, num_units], dtype, partition_info) + return tf.concat([u_z, u_r], 1) + + return _initializer + + def _w_x_initializer(self, input_dim): + """Returns an initializer for the "W_x" parameter matrix. + + See equation (23) in the paper. The "W_x" parameter matrix is the + concatenation of two parameter submatrices. The matrix returned is + [W_z, W_r]. + + Args: + input_dim: The dimension of the cell inputs. + + Returns: + An initializer function that produces a Tensor with shape + [input_dim, 2 * num_units] as described above.
+ """ + + def _initializer(shape, dtype=tf.float32, partition_info=None): + num_units = self._num_units + assert shape == [input_dim, 2 * num_units] + w_z = self._w_initializer([input_dim, num_units], dtype, partition_info) + w_r = self._w_initializer([input_dim, num_units], dtype, partition_info) + return tf.concat([w_z, w_r], 1) + + return _initializer + + def __call__(self, inputs, state, scope=None): + """GRU cell with layer normalization.""" + input_dim = inputs.get_shape().as_list()[1] + num_units = self._num_units + + with tf.variable_scope(scope or "gru_cell"): + with tf.variable_scope("gates"): + w_h = tf.get_variable( + "w_h", [num_units, 2 * num_units], + initializer=self._w_h_initializer()) + w_x = tf.get_variable( + "w_x", [input_dim, 2 * num_units], + initializer=self._w_x_initializer(input_dim)) + z_and_r = (_layer_norm(tf.matmul(state, w_h), scope="layer_norm/w_h") + + _layer_norm(tf.matmul(inputs, w_x), scope="layer_norm/w_x")) + z, r = tf.split(tf.sigmoid(z_and_r), 2, 1) + with tf.variable_scope("candidate"): + w = tf.get_variable( + "w", [input_dim, num_units], initializer=self._w_initializer) + u = tf.get_variable( + "u", [num_units, num_units], initializer=self._u_initializer) + h_hat = (r * _layer_norm(tf.matmul(state, u), scope="layer_norm/u") + + _layer_norm(tf.matmul(inputs, w), scope="layer_norm/w")) + new_h = (1 - z) * state + z * self._activation(h_hat) + return new_h, new_h diff --git a/skip_thoughts/skip_thoughts/ops/input_ops.py b/skip_thoughts/skip_thoughts/ops/input_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..51b03fc5da335b78977d5c1b9234160f1c240e53 --- /dev/null +++ b/skip_thoughts/skip_thoughts/ops/input_ops.py @@ -0,0 +1,118 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Input ops.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import collections + + +import tensorflow as tf + +# A SentenceBatch is a pair of Tensors: +# ids: Batch of input sentences represented as sequences of word ids: an int64 +# Tensor with shape [batch_size, padded_length]. +# mask: Boolean mask distinguishing real words (1) from padded words (0): an +# int32 Tensor with shape [batch_size, padded_length]. +SentenceBatch = collections.namedtuple("SentenceBatch", ("ids", "mask")) + + +def parse_example_batch(serialized): + """Parses a batch of tf.Example protos. + + Args: + serialized: A 1-D string Tensor; a batch of serialized tf.Example protos. + Returns: + encode: A SentenceBatch of encode sentences. + decode_pre: A SentenceBatch of "previous" sentences to decode. + decode_post: A SentenceBatch of "post" sentences to decode. + """ + features = tf.parse_example( + serialized, + features={ + "encode": tf.VarLenFeature(dtype=tf.int64), + "decode_pre": tf.VarLenFeature(dtype=tf.int64), + "decode_post": tf.VarLenFeature(dtype=tf.int64), + }) + + def _sparse_to_batch(sparse): + ids = tf.sparse_tensor_to_dense(sparse) # Padding with zeroes. 
+ mask = tf.sparse_to_dense(sparse.indices, sparse.dense_shape, + tf.ones_like(sparse.values, dtype=tf.int32)) + return SentenceBatch(ids=ids, mask=mask) + + output_names = ("encode", "decode_pre", "decode_post") + return tuple(_sparse_to_batch(features[x]) for x in output_names) + + +def prefetch_input_data(reader, + file_pattern, + shuffle, + capacity, + num_reader_threads=1): + """Prefetches string values from disk into an input queue. + + Args: + reader: Instance of tf.ReaderBase. + file_pattern: Comma-separated list of file patterns (e.g. + "/tmp/train_data-?????-of-00100", where '?' acts as a wildcard that + matches any character). + shuffle: Boolean; whether to randomly shuffle the input data. + capacity: Queue capacity (number of records). + num_reader_threads: Number of reader threads feeding into the queue. + + Returns: + A Queue containing prefetched string values. + """ + data_files = [] + for pattern in file_pattern.split(","): + data_files.extend(tf.gfile.Glob(pattern)) + if not data_files: + tf.logging.fatal("Found no input files matching %s", file_pattern) + else: + tf.logging.info("Prefetching values from %d files matching %s", + len(data_files), file_pattern) + + filename_queue = tf.train.string_input_producer( + data_files, shuffle=shuffle, capacity=16, name="filename_queue") + + if shuffle: + min_after_dequeue = int(0.6 * capacity) + values_queue = tf.RandomShuffleQueue( + capacity=capacity, + min_after_dequeue=min_after_dequeue, + dtypes=[tf.string], + shapes=[[]], + name="random_input_queue") + else: + values_queue = tf.FIFOQueue( + capacity=capacity, + dtypes=[tf.string], + shapes=[[]], + name="fifo_input_queue") + + enqueue_ops = [] + for _ in range(num_reader_threads): + _, value = reader.read(filename_queue) + enqueue_ops.append(values_queue.enqueue([value])) + tf.train.queue_runner.add_queue_runner( + tf.train.queue_runner.QueueRunner(values_queue, enqueue_ops)) + tf.summary.scalar("queue/%s/fraction_of_%d_full" % (values_queue.name, + 
capacity), + tf.cast(values_queue.size(), tf.float32) * (1.0 / capacity)) + + return values_queue diff --git a/skip_thoughts/skip_thoughts/skip_thoughts_encoder.py b/skip_thoughts/skip_thoughts/skip_thoughts_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..79c47c58813feb72f1b9bdb5c2f7bd7956f015c8 --- /dev/null +++ b/skip_thoughts/skip_thoughts/skip_thoughts_encoder.py @@ -0,0 +1,258 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Class for encoding text using a trained SkipThoughtsModel. + +Example usage: + g = tf.Graph() + with g.as_default(): + encoder = SkipThoughtsEncoder(embeddings) + restore_fn = encoder.build_graph_from_config(model_config, checkpoint_path) + + with tf.Session(graph=g) as sess: + restore_fn(sess) + skip_thought_vectors = encoder.encode(sess, data) +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import os.path + + +import nltk +import nltk.tokenize +import numpy as np +import tensorflow as tf + +from skip_thoughts import skip_thoughts_model +from skip_thoughts.data import special_words + + +def _pad(seq, target_len): + """Pads a sequence of word embeddings up to the target length. + + Args: + seq: Sequence of word embeddings. + target_len: Desired padded sequence length. 
+ + Returns: + embeddings: Input sequence padded with zero embeddings up to the target + length. + mask: A 0/1 vector with zeros corresponding to padded embeddings. + + Raises: + ValueError: If len(seq) is not in the interval (0, target_len]. + """ + seq_len = len(seq) + if seq_len <= 0 or seq_len > target_len: + raise ValueError("Expected 0 < len(seq) <= %d, got %d" % (target_len, + seq_len)) + + emb_dim = seq[0].shape[0] + padded_seq = np.zeros(shape=(target_len, emb_dim), dtype=seq[0].dtype) + mask = np.zeros(shape=(target_len,), dtype=np.int8) + for i in range(seq_len): + padded_seq[i] = seq[i] + mask[i] = 1 + return padded_seq, mask + + +def _batch_and_pad(sequences): + """Batches and pads sequences of word embeddings into a 3D array. + + Args: + sequences: A list of batch_size sequences of word embeddings. + + Returns: + embeddings: A numpy array with shape [batch_size, padded_length, emb_dim]. + mask: A numpy 0/1 array with shape [batch_size, padded_length] with zeros + corresponding to padded elements. + """ + batch_embeddings = [] + batch_mask = [] + batch_len = max([len(seq) for seq in sequences]) + for seq in sequences: + embeddings, mask = _pad(seq, batch_len) + batch_embeddings.append(embeddings) + batch_mask.append(mask) + return np.array(batch_embeddings), np.array(batch_mask) + + +class SkipThoughtsEncoder(object): + """Skip-thoughts sentence encoder.""" + + def __init__(self, embeddings): + """Initializes the encoder. + + Args: + embeddings: Dictionary of word to embedding vector (1D numpy array). + """ + self._sentence_detector = nltk.data.load("tokenizers/punkt/english.pickle") + self._embeddings = embeddings + + def _create_restore_fn(self, checkpoint_path, saver): + """Creates a function that restores a model from checkpoint. + + Args: + checkpoint_path: Checkpoint file or a directory containing a checkpoint + file. + saver: Saver for restoring variables from the checkpoint file.
+ + Returns: + restore_fn: A function such that restore_fn(sess) loads model variables + from the checkpoint file. + + Raises: + ValueError: If checkpoint_path does not refer to a checkpoint file or a + directory containing a checkpoint file. + """ + if tf.gfile.IsDirectory(checkpoint_path): + latest_checkpoint = tf.train.latest_checkpoint(checkpoint_path) + if not latest_checkpoint: + raise ValueError("No checkpoint file found in: %s" % checkpoint_path) + checkpoint_path = latest_checkpoint + + def _restore_fn(sess): + tf.logging.info("Loading model from checkpoint: %s", checkpoint_path) + saver.restore(sess, checkpoint_path) + tf.logging.info("Successfully loaded checkpoint: %s", + os.path.basename(checkpoint_path)) + + return _restore_fn + + def build_graph_from_config(self, model_config, checkpoint_path): + """Builds the inference graph from a configuration object. + + Args: + model_config: Object containing configuration for building the model. + checkpoint_path: Checkpoint file or a directory containing a checkpoint + file. + + Returns: + restore_fn: A function such that restore_fn(sess) loads model variables + from the checkpoint file. + """ + tf.logging.info("Building model.") + model = skip_thoughts_model.SkipThoughtsModel(model_config, mode="encode") + model.build() + saver = tf.train.Saver() + + return self._create_restore_fn(checkpoint_path, saver) + + def build_graph_from_proto(self, graph_def_file, saver_def_file, + checkpoint_path): + """Builds the inference graph from serialized GraphDef and SaverDef protos. + + Args: + graph_def_file: File containing a serialized GraphDef proto. + saver_def_file: File containing a serialized SaverDef proto. + checkpoint_path: Checkpoint file or a directory containing a checkpoint + file. + + Returns: + restore_fn: A function such that restore_fn(sess) loads model variables + from the checkpoint file. + """ + # Load the Graph. 
+ tf.logging.info("Loading GraphDef from file: %s", graph_def_file) + graph_def = tf.GraphDef() + with tf.gfile.FastGFile(graph_def_file, "rb") as f: + graph_def.ParseFromString(f.read()) + tf.import_graph_def(graph_def, name="") + + # Load the Saver. + tf.logging.info("Loading SaverDef from file: %s", saver_def_file) + saver_def = tf.train.SaverDef() + with tf.gfile.FastGFile(saver_def_file, "rb") as f: + saver_def.ParseFromString(f.read()) + saver = tf.train.Saver(saver_def=saver_def) + + return self._create_restore_fn(checkpoint_path, saver) + + def _tokenize(self, item): + """Tokenizes an input string into a list of words.""" + tokenized = [] + for s in self._sentence_detector.tokenize(item): + tokenized.extend(nltk.tokenize.word_tokenize(s)) + + return tokenized + + def _word_to_embedding(self, w): + """Returns the embedding of a word.""" + return self._embeddings.get(w, self._embeddings[special_words.UNK]) + + def _preprocess(self, data, use_eos): + """Preprocesses text for the encoder. + + Args: + data: A list of input strings. + use_eos: Whether to append the end-of-sentence word to each sentence. + + Returns: + embeddings: A list of word embedding sequences corresponding to the input + strings. + """ + preprocessed_data = [] + for item in data: + tokenized = self._tokenize(item) + if use_eos: + tokenized.append(special_words.EOS) + preprocessed_data.append([self._word_to_embedding(w) for w in tokenized]) + return preprocessed_data + + def encode(self, + sess, + data, + use_norm=True, + verbose=True, + batch_size=128, + use_eos=False): + """Encodes a sequence of sentences as skip-thought vectors. + + Args: + sess: TensorFlow Session. + data: A list of input strings. + use_norm: Whether to normalize skip-thought vectors to unit L2 norm. + verbose: Whether to log every batch. + batch_size: Batch size for the encoder. + use_eos: Whether to append the end-of-sentence word to each input + sentence. 
+ + Returns: + thought_vectors: A list of numpy arrays corresponding to the skip-thought + encodings of sentences in 'data'. + """ + data = self._preprocess(data, use_eos) + thought_vectors = [] + + batch_indices = np.arange(0, len(data), batch_size) + for batch, start_index in enumerate(batch_indices): + if verbose: + tf.logging.info("Batch %d / %d.", batch, len(batch_indices)) + + embeddings, mask = _batch_and_pad( + data[start_index:start_index + batch_size]) + feed_dict = { + "encode_emb:0": embeddings, + "encode_mask:0": mask, + } + thought_vectors.extend( + sess.run("encoder/thought_vectors:0", feed_dict=feed_dict)) + + if use_norm: + thought_vectors = [v / np.linalg.norm(v) for v in thought_vectors] + + return thought_vectors diff --git a/skip_thoughts/skip_thoughts/skip_thoughts_model.py b/skip_thoughts/skip_thoughts/skip_thoughts_model.py new file mode 100644 index 0000000000000000000000000000000000000000..9a9a43a4fed0dbbb03affd26ffa1c635c386aa55 --- /dev/null +++ b/skip_thoughts/skip_thoughts/skip_thoughts_model.py @@ -0,0 +1,369 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Skip-Thoughts model for learning sentence vectors. + +The model is based on the paper: + + "Skip-Thought Vectors" + Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, + Antonio Torralba, Raquel Urtasun, Sanja Fidler. 
+ https://papers.nips.cc/paper/5950-skip-thought-vectors.pdf + +Layer normalization is applied based on the paper: + + "Layer Normalization" + Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton + https://arxiv.org/abs/1607.06450 +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + + +import tensorflow as tf + +from skip_thoughts.ops import gru_cell +from skip_thoughts.ops import input_ops + + +def random_orthonormal_initializer(shape, dtype=tf.float32, + partition_info=None): # pylint: disable=unused-argument + """Variable initializer that produces a random orthonormal matrix.""" + if len(shape) != 2 or shape[0] != shape[1]: + raise ValueError("Expecting square shape, got %s" % shape) + _, u, _ = tf.svd(tf.random_normal(shape, dtype=dtype), full_matrices=True) + return u + + +class SkipThoughtsModel(object): + """Skip-thoughts model.""" + + def __init__(self, config, mode="train", input_reader=None): + """Basic setup. The actual TensorFlow graph is constructed in build(). + + Args: + config: Object containing configuration parameters. + mode: "train", "eval" or "encode". + input_reader: Subclass of tf.ReaderBase for reading the input serialized + tf.Example protocol buffers. Defaults to TFRecordReader. + + Raises: + ValueError: If mode is invalid. + """ + if mode not in ["train", "eval", "encode"]: + raise ValueError("Unrecognized mode: %s" % mode) + + self.config = config + self.mode = mode + self.reader = input_reader if input_reader else tf.TFRecordReader() + + # Initializer used for non-recurrent weights. + self.uniform_initializer = tf.random_uniform_initializer( + minval=-self.config.uniform_init_scale, + maxval=self.config.uniform_init_scale) + + # Input sentences represented as sequences of word ids. "encode" is the + # source sentence, "decode_pre" is the previous sentence and "decode_post" + # is the next sentence. + # Each is an int64 Tensor with shape [batch_size, padded_length]. 
+ self.encode_ids = None + self.decode_pre_ids = None + self.decode_post_ids = None + + # Boolean masks distinguishing real words (1) from padded words (0). + # Each is an int32 Tensor with shape [batch_size, padded_length]. + self.encode_mask = None + self.decode_pre_mask = None + self.decode_post_mask = None + + # Input sentences represented as sequences of word embeddings. + # Each is a float32 Tensor with shape [batch_size, padded_length, emb_dim]. + self.encode_emb = None + self.decode_pre_emb = None + self.decode_post_emb = None + + # The output from the sentence encoder. + # A float32 Tensor with shape [batch_size, num_gru_units]. + self.thought_vectors = None + + # The cross entropy losses and corresponding weights of the decoders. Used + # for evaluation. + self.target_cross_entropy_losses = [] + self.target_cross_entropy_loss_weights = [] + + # The total loss to optimize. + self.total_loss = None + + def build_inputs(self): + """Builds the ops for reading input data. + + Outputs: + self.encode_ids + self.decode_pre_ids + self.decode_post_ids + self.encode_mask + self.decode_pre_mask + self.decode_post_mask + """ + if self.mode == "encode": + # Word embeddings are fed from an external vocabulary which has possibly + # been expanded (see vocabulary_expansion.py). + encode_ids = None + decode_pre_ids = None + decode_post_ids = None + encode_mask = tf.placeholder(tf.int8, (None, None), name="encode_mask") + decode_pre_mask = None + decode_post_mask = None + else: + # Prefetch serialized tf.Example protos. + input_queue = input_ops.prefetch_input_data( + self.reader, + self.config.input_file_pattern, + shuffle=self.config.shuffle_input_data, + capacity=self.config.input_queue_capacity, + num_reader_threads=self.config.num_input_reader_threads) + + # Deserialize a batch. 
+ serialized = input_queue.dequeue_many(self.config.batch_size) + encode, decode_pre, decode_post = input_ops.parse_example_batch( + serialized) + + encode_ids = encode.ids + decode_pre_ids = decode_pre.ids + decode_post_ids = decode_post.ids + + encode_mask = encode.mask + decode_pre_mask = decode_pre.mask + decode_post_mask = decode_post.mask + + self.encode_ids = encode_ids + self.decode_pre_ids = decode_pre_ids + self.decode_post_ids = decode_post_ids + + self.encode_mask = encode_mask + self.decode_pre_mask = decode_pre_mask + self.decode_post_mask = decode_post_mask + + def build_word_embeddings(self): + """Builds the word embeddings. + + Inputs: + self.encode_ids + self.decode_pre_ids + self.decode_post_ids + + Outputs: + self.encode_emb + self.decode_pre_emb + self.decode_post_emb + """ + if self.mode == "encode": + # Word embeddings are fed from an external vocabulary which has possibly + # been expanded (see vocabulary_expansion.py). + encode_emb = tf.placeholder(tf.float32, ( + None, None, self.config.word_embedding_dim), "encode_emb") + # No sequences to decode. + decode_pre_emb = None + decode_post_emb = None + else: + word_emb = tf.get_variable( + name="word_embedding", + shape=[self.config.vocab_size, self.config.word_embedding_dim], + initializer=self.uniform_initializer) + + encode_emb = tf.nn.embedding_lookup(word_emb, self.encode_ids) + decode_pre_emb = tf.nn.embedding_lookup(word_emb, self.decode_pre_ids) + decode_post_emb = tf.nn.embedding_lookup(word_emb, self.decode_post_ids) + + self.encode_emb = encode_emb + self.decode_pre_emb = decode_pre_emb + self.decode_post_emb = decode_post_emb + + def _initialize_gru_cell(self, num_units): + """Initializes a GRU cell. + + The Variables of the GRU cell are initialized in a way that exactly matches + the skip-thoughts paper: recurrent weights are initialized from random + orthonormal matrices and non-recurrent weights are initialized from random + uniform matrices. 
+ + Args: + num_units: Number of output units. + + Returns: + cell: An instance of RNNCell with variable initializers that match the + skip-thoughts paper. + """ + return gru_cell.LayerNormGRUCell( + num_units, + w_initializer=self.uniform_initializer, + u_initializer=random_orthonormal_initializer, + b_initializer=tf.constant_initializer(0.0)) + + def build_encoder(self): + """Builds the sentence encoder. + + Inputs: + self.encode_emb + self.encode_mask + + Outputs: + self.thought_vectors + + Raises: + ValueError: if config.bidirectional_encoder is True and config.encoder_dim + is odd. + """ + with tf.variable_scope("encoder") as scope: + length = tf.to_int32(tf.reduce_sum(self.encode_mask, 1), name="length") + + if self.config.bidirectional_encoder: + if self.config.encoder_dim % 2: + raise ValueError( + "encoder_dim must be even when using a bidirectional encoder.") + num_units = self.config.encoder_dim // 2 + cell_fw = self._initialize_gru_cell(num_units) # Forward encoder + cell_bw = self._initialize_gru_cell(num_units) # Backward encoder + _, states = tf.nn.bidirectional_dynamic_rnn( + cell_fw=cell_fw, + cell_bw=cell_bw, + inputs=self.encode_emb, + sequence_length=length, + dtype=tf.float32, + scope=scope) + thought_vectors = tf.concat(states, 1, name="thought_vectors") + else: + cell = self._initialize_gru_cell(self.config.encoder_dim) + _, state = tf.nn.dynamic_rnn( + cell=cell, + inputs=self.encode_emb, + sequence_length=length, + dtype=tf.float32, + scope=scope) + # Use an identity operation to name the Tensor in the Graph. + thought_vectors = tf.identity(state, name="thought_vectors") + + self.thought_vectors = thought_vectors + + def _build_decoder(self, name, embeddings, targets, mask, initial_state, + reuse_logits): + """Builds a sentence decoder. + + Args: + name: Decoder name. + embeddings: Batch of sentences to decode; a float32 Tensor with shape + [batch_size, padded_length, emb_dim]. 
+ targets: Batch of target word ids; an int64 Tensor with shape + [batch_size, padded_length]. + mask: A 0/1 Tensor with shape [batch_size, padded_length]. + initial_state: Initial state of the GRU. A float32 Tensor with shape + [batch_size, num_gru_units]. + reuse_logits: Whether to reuse the logits weights. + """ + # Decoder RNN. + cell = self._initialize_gru_cell(self.config.encoder_dim) + with tf.variable_scope(name) as scope: + # Add a padding word at the start of each sentence (to correspond to the + # prediction of the first word) and remove the last word. + decoder_input = tf.pad( + embeddings[:, :-1, :], [[0, 0], [1, 0], [0, 0]], name="input") + length = tf.reduce_sum(mask, 1, name="length") + decoder_output, _ = tf.nn.dynamic_rnn( + cell=cell, + inputs=decoder_input, + sequence_length=length, + initial_state=initial_state, + scope=scope) + + # Stack batch vertically. + decoder_output = tf.reshape(decoder_output, [-1, self.config.encoder_dim]) + targets = tf.reshape(targets, [-1]) + weights = tf.to_float(tf.reshape(mask, [-1])) + + # Logits. + with tf.variable_scope("logits", reuse=reuse_logits) as scope: + logits = tf.contrib.layers.fully_connected( + inputs=decoder_output, + num_outputs=self.config.vocab_size, + activation_fn=None, + weights_initializer=self.uniform_initializer, + scope=scope) + + losses = tf.nn.sparse_softmax_cross_entropy_with_logits( + labels=targets, logits=logits) + batch_loss = tf.reduce_sum(losses * weights) + tf.losses.add_loss(batch_loss) + + tf.summary.scalar("losses/" + name, batch_loss) + + self.target_cross_entropy_losses.append(losses) + self.target_cross_entropy_loss_weights.append(weights) + + def build_decoders(self): + """Builds the sentence decoders.
+ + Inputs: + self.decode_pre_emb + self.decode_post_emb + self.decode_pre_ids + self.decode_post_ids + self.decode_pre_mask + self.decode_post_mask + self.thought_vectors + + Outputs: + self.target_cross_entropy_losses + self.target_cross_entropy_loss_weights + """ + if self.mode != "encode": + # Pre-sentence decoder. + self._build_decoder("decoder_pre", self.decode_pre_emb, + self.decode_pre_ids, self.decode_pre_mask, + self.thought_vectors, False) + + # Post-sentence decoder. Logits weights are reused. + self._build_decoder("decoder_post", self.decode_post_emb, + self.decode_post_ids, self.decode_post_mask, + self.thought_vectors, True) + + def build_loss(self): + """Builds the loss Tensor. + + Outputs: + self.total_loss + """ + if self.mode != "encode": + total_loss = tf.losses.get_total_loss() + tf.summary.scalar("losses/total", total_loss) + + self.total_loss = total_loss + + def build_global_step(self): + """Builds the global step Tensor. + + Outputs: + self.global_step + """ + self.global_step = tf.contrib.framework.create_global_step() + + def build(self): + """Creates all ops for training, evaluation or encoding.""" + self.build_inputs() + self.build_word_embeddings() + self.build_encoder() + self.build_decoders() + self.build_loss() + self.build_global_step() diff --git a/skip_thoughts/skip_thoughts/skip_thoughts_model_test.py b/skip_thoughts/skip_thoughts/skip_thoughts_model_test.py new file mode 100644 index 0000000000000000000000000000000000000000..7bd64326d9d9cdcaae11d74ac8831adac915dfe2 --- /dev/null +++ b/skip_thoughts/skip_thoughts/skip_thoughts_model_test.py @@ -0,0 +1,191 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Tests for tensorflow_models.skip_thoughts.skip_thoughts_model.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + + +import numpy as np +import tensorflow as tf + +from skip_thoughts import configuration +from skip_thoughts import skip_thoughts_model + + +class SkipThoughtsModel(skip_thoughts_model.SkipThoughtsModel): + """Subclass of SkipThoughtsModel without the disk I/O.""" + + def build_inputs(self): + if self.mode == "encode": + # Encode mode doesn't read from disk, so defer to parent. + return super(SkipThoughtsModel, self).build_inputs() + else: + # Replace disk I/O with random Tensors. 
+ self.encode_ids = tf.random_uniform( + [self.config.batch_size, 15], + minval=0, + maxval=self.config.vocab_size, + dtype=tf.int64) + self.decode_pre_ids = tf.random_uniform( + [self.config.batch_size, 15], + minval=0, + maxval=self.config.vocab_size, + dtype=tf.int64) + self.decode_post_ids = tf.random_uniform( + [self.config.batch_size, 15], + minval=0, + maxval=self.config.vocab_size, + dtype=tf.int64) + self.encode_mask = tf.ones_like(self.encode_ids) + self.decode_pre_mask = tf.ones_like(self.decode_pre_ids) + self.decode_post_mask = tf.ones_like(self.decode_post_ids) + + +class SkipThoughtsModelTest(tf.test.TestCase): + + def setUp(self): + super(SkipThoughtsModelTest, self).setUp() + self._model_config = configuration.model_config() + + def _countModelParameters(self): + """Counts the number of parameters in the model at top level scope.""" + counter = {} + for v in tf.global_variables(): + name = v.op.name.split("/")[0] + num_params = v.get_shape().num_elements() + if not num_params: + self.fail("Could not infer num_elements from Variable %s" % v.op.name) + counter[name] = counter.get(name, 0) + num_params + return counter + + def _checkModelParameters(self): + """Verifies the number of parameters in the model.""" + param_counts = self._countModelParameters() + expected_param_counts = { + # vocab_size * embedding_size + "word_embedding": 12400000, + # GRU Cells + "encoder": 21772800, + "decoder_pre": 21772800, + "decoder_post": 21772800, + # (encoder_dim + 1) * vocab_size + "logits": 48020000, + "global_step": 1, + } + self.assertDictEqual(expected_param_counts, param_counts) + + def _checkOutputs(self, expected_shapes, feed_dict=None): + """Verifies that the model produces expected outputs. + + Args: + expected_shapes: A dict mapping Tensor or Tensor name to expected output + shape. + feed_dict: Values of Tensors to feed into Session.run(). 
+ """ + fetches = expected_shapes.keys() + + with self.test_session() as sess: + sess.run(tf.global_variables_initializer()) + outputs = sess.run(fetches, feed_dict) + + for index, output in enumerate(outputs): + tensor = fetches[index] + expected = expected_shapes[tensor] + actual = output.shape + if expected != actual: + self.fail("Tensor %s has shape %s (expected %s)." % (tensor, actual, + expected)) + + def testBuildForTraining(self): + model = SkipThoughtsModel(self._model_config, mode="train") + model.build() + + self._checkModelParameters() + + expected_shapes = { + # [batch_size, length] + model.encode_ids: (128, 15), + model.decode_pre_ids: (128, 15), + model.decode_post_ids: (128, 15), + model.encode_mask: (128, 15), + model.decode_pre_mask: (128, 15), + model.decode_post_mask: (128, 15), + # [batch_size, length, word_embedding_dim] + model.encode_emb: (128, 15, 620), + model.decode_pre_emb: (128, 15, 620), + model.decode_post_emb: (128, 15, 620), + # [batch_size, encoder_dim] + model.thought_vectors: (128, 2400), + # [batch_size * length] + model.target_cross_entropy_losses[0]: (1920,), + model.target_cross_entropy_losses[1]: (1920,), + # [batch_size * length] + model.target_cross_entropy_loss_weights[0]: (1920,), + model.target_cross_entropy_loss_weights[1]: (1920,), + # Scalar + model.total_loss: (), + } + self._checkOutputs(expected_shapes) + + def testBuildForEval(self): + model = SkipThoughtsModel(self._model_config, mode="eval") + model.build() + + self._checkModelParameters() + + expected_shapes = { + # [batch_size, length] + model.encode_ids: (128, 15), + model.decode_pre_ids: (128, 15), + model.decode_post_ids: (128, 15), + model.encode_mask: (128, 15), + model.decode_pre_mask: (128, 15), + model.decode_post_mask: (128, 15), + # [batch_size, length, word_embedding_dim] + model.encode_emb: (128, 15, 620), + model.decode_pre_emb: (128, 15, 620), + model.decode_post_emb: (128, 15, 620), + # [batch_size, encoder_dim] + model.thought_vectors: (128, 
2400), + # [batch_size * length] + model.target_cross_entropy_losses[0]: (1920,), + model.target_cross_entropy_losses[1]: (1920,), + # [batch_size * length] + model.target_cross_entropy_loss_weights[0]: (1920,), + model.target_cross_entropy_loss_weights[1]: (1920,), + # Scalar + model.total_loss: (), + } + self._checkOutputs(expected_shapes) + + def testBuildForEncode(self): + model = SkipThoughtsModel(self._model_config, mode="encode") + model.build() + + # Test feeding a batch of word embeddings to get skip thought vectors. + encode_emb = np.random.rand(64, 15, 620) + encode_mask = np.ones((64, 15), dtype=np.int64) + feed_dict = {model.encode_emb: encode_emb, model.encode_mask: encode_mask} + expected_shapes = { + # [batch_size, encoder_dim] + model.thought_vectors: (64, 2400), + } + self._checkOutputs(expected_shapes, feed_dict) + + +if __name__ == "__main__": + tf.test.main() diff --git a/skip_thoughts/skip_thoughts/track_perplexity.py b/skip_thoughts/skip_thoughts/track_perplexity.py new file mode 100644 index 0000000000000000000000000000000000000000..05d0e3324fc731790079168257388da95ce24ad3 --- /dev/null +++ b/skip_thoughts/skip_thoughts/track_perplexity.py @@ -0,0 +1,199 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Tracks training progress via per-word perplexity. 
+ +This script should be run concurrently with training so that summaries show up +in TensorBoard. +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import math +import os.path +import time + + +import numpy as np +import tensorflow as tf + +from skip_thoughts import configuration +from skip_thoughts import skip_thoughts_model + +FLAGS = tf.flags.FLAGS + +tf.flags.DEFINE_string("input_file_pattern", None, + "File pattern of sharded TFRecord input files.") +tf.flags.DEFINE_string("checkpoint_dir", None, + "Directory containing model checkpoints.") +tf.flags.DEFINE_string("eval_dir", None, "Directory to write event logs to.") + +tf.flags.DEFINE_integer("eval_interval_secs", 600, + "Interval between evaluation runs.") +tf.flags.DEFINE_integer("num_eval_examples", 50000, + "Number of examples for evaluation.") + +tf.flags.DEFINE_integer("min_global_step", 100, + "Minimum global step to run evaluation.") + +tf.logging.set_verbosity(tf.logging.INFO) + + +def evaluate_model(sess, losses, weights, num_batches, global_step, + summary_writer, summary_op): + """Computes perplexity-per-word over the evaluation dataset. + + Summaries and perplexity-per-word are written out to the eval directory. + + Args: + sess: Session object. + losses: A Tensor of any shape; the target cross entropy losses for the + current batch. + weights: A Tensor of weights corresponding to losses. + num_batches: Integer; the number of evaluation batches. + global_step: Integer; global step of the model checkpoint. + summary_writer: Instance of SummaryWriter. + summary_op: Op for generating model summaries. + """ + # Log model summaries on a single batch. 
+ summary_str = sess.run(summary_op) + summary_writer.add_summary(summary_str, global_step) + + start_time = time.time() + sum_losses = 0.0 + sum_weights = 0.0 + for i in xrange(num_batches): + batch_losses, batch_weights = sess.run([losses, weights]) + sum_losses += np.sum(batch_losses * batch_weights) + sum_weights += np.sum(batch_weights) + if not i % 100: + tf.logging.info("Computed losses for %d of %d batches.", i + 1, + num_batches) + eval_time = time.time() - start_time + + perplexity = math.exp(sum_losses / sum_weights) + tf.logging.info("Perplexity = %f (%.2f sec)", perplexity, eval_time) + + # Log perplexity to the SummaryWriter. + summary = tf.Summary() + value = summary.value.add() + value.simple_value = perplexity + value.tag = "perplexity" + summary_writer.add_summary(summary, global_step) + + # Write the Events file to the eval directory. + summary_writer.flush() + tf.logging.info("Finished processing evaluation at global step %d.", + global_step) + + +def run_once(model, losses, weights, saver, summary_writer, summary_op): + """Evaluates the latest model checkpoint. + + Args: + model: Instance of SkipThoughtsModel; the model to evaluate. + losses: Tensor; the target cross entropy losses for the current batch. + weights: A Tensor of weights corresponding to losses. + saver: Instance of tf.train.Saver for restoring model Variables. + summary_writer: Instance of FileWriter. + summary_op: Op for generating model summaries. + """ + model_path = tf.train.latest_checkpoint(FLAGS.checkpoint_dir) + if not model_path: + tf.logging.info("Skipping evaluation. No checkpoint found in: %s", + FLAGS.checkpoint_dir) + return + + with tf.Session() as sess: + # Load model from checkpoint. 
+ tf.logging.info("Loading model from checkpoint: %s", model_path) + saver.restore(sess, model_path) + global_step = tf.train.global_step(sess, model.global_step.name) + tf.logging.info("Successfully loaded %s at global step = %d.", + os.path.basename(model_path), global_step) + if global_step < FLAGS.min_global_step: + tf.logging.info("Skipping evaluation. Global step = %d < %d", global_step, + FLAGS.min_global_step) + return + + # Start the queue runners. + coord = tf.train.Coordinator() + threads = tf.train.start_queue_runners(coord=coord) + + num_eval_batches = int( + math.ceil(FLAGS.num_eval_examples / model.config.batch_size)) + + # Run evaluation on the latest checkpoint. + try: + evaluate_model(sess, losses, weights, num_eval_batches, global_step, + summary_writer, summary_op) + except tf.InvalidArgumentError: + tf.logging.error( + "Evaluation raised InvalidArgumentError (e.g. due to Nans).") + finally: + coord.request_stop() + coord.join(threads, stop_grace_period_secs=10) + + +def main(unused_argv): + if not FLAGS.input_file_pattern: + raise ValueError("--input_file_pattern is required.") + if not FLAGS.checkpoint_dir: + raise ValueError("--checkpoint_dir is required.") + if not FLAGS.eval_dir: + raise ValueError("--eval_dir is required.") + + # Create the evaluation directory if it doesn't exist. + eval_dir = FLAGS.eval_dir + if not tf.gfile.IsDirectory(eval_dir): + tf.logging.info("Creating eval directory: %s", eval_dir) + tf.gfile.MakeDirs(eval_dir) + + g = tf.Graph() + with g.as_default(): + # Build the model for evaluation. 
+ model_config = configuration.model_config( + input_file_pattern=FLAGS.input_file_pattern, + input_queue_capacity=FLAGS.num_eval_examples, + shuffle_input_data=False) + model = skip_thoughts_model.SkipThoughtsModel(model_config, mode="eval") + model.build() + + losses = tf.concat(model.target_cross_entropy_losses, 0) + weights = tf.concat(model.target_cross_entropy_loss_weights, 0) + + # Create the Saver to restore model Variables. + saver = tf.train.Saver() + + # Create the summary operation and the summary writer. + summary_op = tf.summary.merge_all() + summary_writer = tf.summary.FileWriter(eval_dir) + + g.finalize() + + # Run a new evaluation run every eval_interval_secs. + while True: + start = time.time() + tf.logging.info("Starting evaluation at " + time.strftime( + "%Y-%m-%d-%H:%M:%S", time.localtime())) + run_once(model, losses, weights, saver, summary_writer, summary_op) + time_to_next_eval = start + FLAGS.eval_interval_secs - time.time() + if time_to_next_eval > 0: + time.sleep(time_to_next_eval) + + +if __name__ == "__main__": + tf.app.run() diff --git a/skip_thoughts/skip_thoughts/train.py b/skip_thoughts/skip_thoughts/train.py new file mode 100644 index 0000000000000000000000000000000000000000..445f31c5a8fe9d1c6084ccacb2109449839f1bd5 --- /dev/null +++ b/skip_thoughts/skip_thoughts/train.py @@ -0,0 +1,99 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +"""Train the skip-thoughts model.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + + +import tensorflow as tf + +from skip_thoughts import configuration +from skip_thoughts import skip_thoughts_model + +FLAGS = tf.flags.FLAGS + +tf.flags.DEFINE_string("input_file_pattern", None, + "File pattern of sharded TFRecord files containing " + "tf.Example protos.") +tf.flags.DEFINE_string("train_dir", None, + "Directory for saving and loading checkpoints.") + +tf.logging.set_verbosity(tf.logging.INFO) + + +def _setup_learning_rate(config, global_step): + """Sets up the learning rate with optional exponential decay. + + Args: + config: Object containing learning rate configuration parameters. + global_step: Tensor; the global step. + + Returns: + learning_rate: Tensor; the learning rate with exponential decay. + """ + if config.learning_rate_decay_factor > 0: + learning_rate = tf.train.exponential_decay( + learning_rate=float(config.learning_rate), + global_step=global_step, + decay_steps=config.learning_rate_decay_steps, + decay_rate=config.learning_rate_decay_factor, + staircase=False) + else: + learning_rate = tf.constant(config.learning_rate) + return learning_rate + + +def main(unused_argv): + if not FLAGS.input_file_pattern: + raise ValueError("--input_file_pattern is required.") + if not FLAGS.train_dir: + raise ValueError("--train_dir is required.") + + model_config = configuration.model_config( + input_file_pattern=FLAGS.input_file_pattern) + training_config = configuration.training_config() + + tf.logging.info("Building training graph.") + g = tf.Graph() + with g.as_default(): + model = skip_thoughts_model.SkipThoughtsModel(model_config, mode="train") + model.build() + + learning_rate = _setup_learning_rate(training_config, model.global_step) + optimizer = tf.train.AdamOptimizer(learning_rate) + + train_tensor = 
tf.contrib.slim.learning.create_train_op( + total_loss=model.total_loss, + optimizer=optimizer, + global_step=model.global_step, + clip_gradient_norm=training_config.clip_gradient_norm) + + saver = tf.train.Saver() + + tf.contrib.slim.learning.train( + train_op=train_tensor, + logdir=FLAGS.train_dir, + graph=g, + global_step=model.global_step, + number_of_steps=training_config.number_of_steps, + save_summaries_secs=training_config.save_summaries_secs, + saver=saver, + save_interval_secs=training_config.save_model_secs) + + +if __name__ == "__main__": + tf.app.run() diff --git a/skip_thoughts/skip_thoughts/vocabulary_expansion.py b/skip_thoughts/skip_thoughts/vocabulary_expansion.py new file mode 100644 index 0000000000000000000000000000000000000000..43c7977fdb0366a9549f79d881d248e1547f077e --- /dev/null +++ b/skip_thoughts/skip_thoughts/vocabulary_expansion.py @@ -0,0 +1,203 @@ +# Copyright 2017 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Compute an expanded vocabulary of embeddings using a word2vec model. + +This script loads the word embeddings from a trained skip-thoughts model and +from a trained word2vec model (typically with a larger vocabulary). It trains a +linear regression model without regularization to learn a linear mapping from +the word2vec embedding space to the skip-thoughts embedding space. 
The model is +then applied to all words in the word2vec vocabulary, yielding vectors in the +skip-thoughts word embedding space for the union of the two vocabularies. + +The linear regression task is to learn a parameter matrix W to minimize + || X - Y * W ||^2, +where X is a matrix of skip-thoughts embeddings of shape [num_words, dim1], +Y is a matrix of word2vec embeddings of shape [num_words, dim2], and W is a +matrix of shape [dim2, dim1]. + +This is based on the "Translation Matrix" method from the paper: + + "Exploiting Similarities among Languages for Machine Translation" + Tomas Mikolov, Quoc V. Le, Ilya Sutskever + https://arxiv.org/abs/1309.4168 +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import collections +import os.path + + +import gensim.models +import numpy as np +import sklearn.linear_model +import tensorflow as tf + +FLAGS = tf.flags.FLAGS + +tf.flags.DEFINE_string("skip_thoughts_model", None, + "Checkpoint file or directory containing a checkpoint " + "file.") + +tf.flags.DEFINE_string("skip_thoughts_vocab", None, + "Path to vocabulary file containing a list of newline-" + "separated words where the word id is the " + "corresponding 0-based index in the file.") + +tf.flags.DEFINE_string("word2vec_model", None, + "File containing a word2vec model in binary format.") + +tf.flags.DEFINE_string("output_dir", None, "Output directory.") + +tf.logging.set_verbosity(tf.logging.INFO) + + +def _load_skip_thoughts_embeddings(checkpoint_path): + """Loads the embedding matrix from a skip-thoughts model checkpoint. + + Args: + checkpoint_path: Model checkpoint file or directory containing a checkpoint + file. + + Returns: + word_embedding: A numpy array of shape [vocab_size, embedding_dim]. + + Raises: + ValueError: If no checkpoint file matches checkpoint_path. 
+ """ + if tf.gfile.IsDirectory(checkpoint_path): + checkpoint_file = tf.train.latest_checkpoint(checkpoint_path) + if not checkpoint_file: + raise ValueError("No checkpoint file found in %s" % checkpoint_path) + else: + checkpoint_file = checkpoint_path + + tf.logging.info("Loading skip-thoughts embedding matrix from %s", + checkpoint_file) + reader = tf.train.NewCheckpointReader(checkpoint_file) + word_embedding = reader.get_tensor("word_embedding") + tf.logging.info("Loaded skip-thoughts embedding matrix of shape %s", + word_embedding.shape) + + return word_embedding + + +def _load_vocabulary(filename): + """Loads a vocabulary file. + + Args: + filename: Path to text file containing newline-separated words. + + Returns: + vocab: A dictionary mapping word to word id. + """ + tf.logging.info("Reading vocabulary from %s", filename) + vocab = collections.OrderedDict() + with tf.gfile.GFile(filename, mode="r") as f: + for i, line in enumerate(f): + word = line.decode("utf-8").strip() + assert word not in vocab, "Attempting to add word twice: %s" % word + vocab[word] = i + tf.logging.info("Read vocabulary of size %d", len(vocab)) + return vocab + + +def _expand_vocabulary(skip_thoughts_emb, skip_thoughts_vocab, word2vec): + """Runs vocabulary expansion on a skip-thoughts model using a word2vec model. + + Args: + skip_thoughts_emb: A numpy array of shape [skip_thoughts_vocab_size, + skip_thoughts_embedding_dim]. + skip_thoughts_vocab: A dictionary of word to id. + word2vec: An instance of gensim.models.Word2Vec. + + Returns: + combined_emb: A dictionary mapping words to embedding vectors. + """ + # Find words shared between the two vocabularies. + tf.logging.info("Finding shared words") + shared_words = [w for w in word2vec.vocab if w in skip_thoughts_vocab] + + # Select embedding vectors for shared words. 
+ tf.logging.info("Selecting embeddings for %d shared words", len(shared_words)) + shared_st_emb = skip_thoughts_emb[[ + skip_thoughts_vocab[w] for w in shared_words + ]] + shared_w2v_emb = word2vec[shared_words] + + # Train a linear regression model on the shared embedding vectors. + tf.logging.info("Training linear regression model") + model = sklearn.linear_model.LinearRegression() + model.fit(shared_w2v_emb, shared_st_emb) + + # Create the expanded vocabulary. + tf.logging.info("Creating embeddings for expanded vocabulary") + combined_emb = collections.OrderedDict() + for w in word2vec.vocab: + # Ignore words with underscores (spaces). + if "_" not in w: + w_emb = model.predict(word2vec[w].reshape(1, -1)) + combined_emb[w] = w_emb.reshape(-1) + + for w in skip_thoughts_vocab: + combined_emb[w] = skip_thoughts_emb[skip_thoughts_vocab[w]] + + tf.logging.info("Created expanded vocabulary of %d words", len(combined_emb)) + + return combined_emb + + +def main(unused_argv): + if not FLAGS.skip_thoughts_model: + raise ValueError("--skip_thoughts_model is required.") + if not FLAGS.skip_thoughts_vocab: + raise ValueError("--skip_thoughts_vocab is required.") + if not FLAGS.word2vec_model: + raise ValueError("--word2vec_model is required.") + if not FLAGS.output_dir: + raise ValueError("--output_dir is required.") + + if not tf.gfile.IsDirectory(FLAGS.output_dir): + tf.gfile.MakeDirs(FLAGS.output_dir) + + # Load the skip-thoughts embeddings and vocabulary. + skip_thoughts_emb = _load_skip_thoughts_embeddings(FLAGS.skip_thoughts_model) + skip_thoughts_vocab = _load_vocabulary(FLAGS.skip_thoughts_vocab) + + # Load the Word2Vec model. + word2vec = gensim.models.Word2Vec.load_word2vec_format( + FLAGS.word2vec_model, binary=True) + + # Run vocabulary expansion. + embedding_map = _expand_vocabulary(skip_thoughts_emb, skip_thoughts_vocab, + word2vec) + + # Save the output. 
+ vocab = embedding_map.keys() + vocab_file = os.path.join(FLAGS.output_dir, "vocab.txt") + with tf.gfile.GFile(vocab_file, "w") as f: + f.write("\n".join(vocab)) + tf.logging.info("Wrote vocabulary file to %s", vocab_file) + + embeddings = np.array(embedding_map.values()) + embeddings_file = os.path.join(FLAGS.output_dir, "embeddings.npy") + np.save(embeddings_file, embeddings) + tf.logging.info("Wrote embeddings file to %s", embeddings_file) + + +if __name__ == "__main__": + tf.app.run() diff --git a/slim/BUILD b/slim/BUILD index e0f39d2ab3d19c38d72018dfdad69561c3b91d47..77a1ae50353905968c351a5ffa91d891354b645d 100644 --- a/slim/BUILD +++ b/slim/BUILD @@ -1,7 +1,10 @@ # Description: # Contains files for loading, training and evaluating TF-Slim-based models. -package(default_visibility = [":internal"]) +package(default_visibility = [ + ":internal", + "//domain_adaptation:__subpackages__", +]) licenses(["notice"]) # Apache 2.0 diff --git a/slim/README.md b/slim/README.md index 0fbd754b24c373b3aafd738554b083ac2cbd2640..85275e8d17ee0dc6098d5d4fdeed0135fdc639a8 100644 --- a/slim/README.md +++ b/slim/README.md @@ -13,7 +13,7 @@ converting them to TensorFlow's native TFRecord format and reading them in using TF-Slim's data reading and queueing utilities. You can easily train any model on any of these datasets, as we demonstrate below. We've also included a -[jupyter notebook](https://github.com/tensorflow/models/blob/master/slim/slim_walkthough.ipynb), +[jupyter notebook](https://github.com/tensorflow/models/blob/master/slim/slim_walkthrough.ipynb), which provides working examples of how to use TF-Slim for image classification. ## Contacts @@ -41,23 +41,9 @@ prerequisite packages. ## Installing latest version of TF-slim -As of 8/28/16, the latest [stable release of TF](https://www.tensorflow.org/versions/r0.10/get_started/os_setup.html#pip-installation) -is r0.10, which contains most of TF-Slim but not some later additions. 
To obtain the -latest version, you must install the most recent nightly build of -TensorFlow. You can find the latest nightly binaries at -[TensorFlow Installation](https://github.com/tensorflow/tensorflow#installation) -in the section that reads "People who are a little more adventurous can -also try our nightly binaries". Copy the link address that corresponds to -the appropriate machine architecture and python version, and pip install -it. For example: - -```shell -export TF_BINARY_URL=https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.10.0rc0-cp27-none-linux_x86_64.whl -sudo pip install --upgrade $TF_BINARY_URL -``` - -To test this has worked, execute the following command; it should run -without raising any errors. +TF-Slim is available as `tf.contrib.slim` via TensorFlow 1.0. To test that your +installation is working, execute the following command; it should run without +raising any errors. ``` python -c "import tensorflow.contrib.slim as slim; eval = slim.evaluation.evaluate_once" @@ -140,7 +126,7 @@ You can use the same script to create the mnist and cifar10 datasets. However, for ImageNet, you have to follow the instructions [here](https://github.com/tensorflow/models/blob/master/inception/README.md#getting-started). Note that you first have to sign up for an account at image-net.org. -Also, the download can take several hours, and uses about 500MB. +Also, the download can take several hours, and could use up to 500GB. ## Creating a TF-Slim Dataset Descriptor. @@ -192,12 +178,12 @@ image classification dataset. In the table below, we list each model, the corresponding TensorFlow model file, the link to the model checkpoint, and the top 1 and top 5 accuracy (on the imagenet test set). 
-Note that the VGG and ResNet parameters have been converted from their original +Note that the VGG and ResNet V1 parameters have been converted from their original caffe formats ([here](https://github.com/BVLC/caffe/wiki/Model-Zoo#models-used-by-the-vgg-team-in-ilsvrc-2014) and [here](https://github.com/KaimingHe/deep-residual-networks)), -whereas the Inception parameters have been trained internally at +whereas the Inception and ResNet V2 parameters have been trained internally at Google. Also be aware that these accuracies were computed by evaluating using a single image crop. Some academic papers report higher accuracy by using multiple crops at multiple scales. @@ -209,12 +195,19 @@ Model | TF-Slim File | Checkpoint | Top-1 Accuracy| Top-5 Accuracy | [Inception V3](http://arxiv.org/abs/1512.00567)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_v3.py)|[inception_v3_2016_08_28.tar.gz](http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz)|78.0|93.9| [Inception V4](http://arxiv.org/abs/1602.07261)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_v4.py)|[inception_v4_2016_09_09.tar.gz](http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz)|80.2|95.2| [Inception-ResNet-v2](http://arxiv.org/abs/1602.07261)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_resnet_v2.py)|[inception_resnet_v2.tar.gz](http://download.tensorflow.org/models/inception_resnet_v2_2016_08_30.tar.gz)|80.4|95.3| -[ResNet 50](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_50.tar.gz](http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz)|75.2|92.2| -[ResNet 101](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_101.tar.gz](http://download.tensorflow.org/models/resnet_v1_101_2016_08_28.tar.gz)|76.4|92.9| -[ResNet 
152](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_152.tar.gz](http://download.tensorflow.org/models/resnet_v1_152_2016_08_28.tar.gz)|76.8|93.2| +[ResNet V1 50](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_50.tar.gz](http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz)|75.2|92.2| +[ResNet V1 101](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_101.tar.gz](http://download.tensorflow.org/models/resnet_v1_101_2016_08_28.tar.gz)|76.4|92.9| +[ResNet V1 152](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_152.tar.gz](http://download.tensorflow.org/models/resnet_v1_152_2016_08_28.tar.gz)|76.8|93.2| +[ResNet V2 50](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_50.tar.gz](http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz)|75.6|92.8| +[ResNet V2 101](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_101.tar.gz](http://download.tensorflow.org/models/resnet_v2_101_2017_04_14.tar.gz)|77.0|93.7| +[ResNet V2 152](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_152.tar.gz](http://download.tensorflow.org/models/resnet_v2_152_2017_04_14.tar.gz)|77.8|94.1| [VGG 16](http://arxiv.org/abs/1409.1556.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/vgg.py)|[vgg_16.tar.gz](http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz)|71.5|89.8| [VGG 
19](http://arxiv.org/abs/1409.1556.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/vgg.py)|[vgg_19.tar.gz](http://download.tensorflow.org/models/vgg_19_2016_08_28.tar.gz)|71.1|89.8| +^ ResNet V2 models use Inception pre-processing and input image size of 299 (use +`--preprocessing_name inception --eval_image_size 299` when using +`eval_image_classifier.py`). Performance numbers for ResNet V2 models are +reported on the ImageNet validation set. Here is an example of how to download the Inception V3 checkpoint: @@ -303,8 +296,8 @@ $ python train_image_classifier.py \ --dataset_split_name=train \ --model_name=inception_v3 \ --checkpoint_path=${CHECKPOINT_PATH} \ - --checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits/Logits \ - --trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits/Logits + --checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \ + --trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits ``` @@ -358,10 +351,10 @@ following error: ```bash InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1001] rhs shape= [1000] ``` -This is due to the fact that the VGG and ResNet final layers have only 1000 +This is due to the fact that the VGG and ResNet V1 final layers have only 1000 outputs rather than 1001. -To fix this issue, you can set the `--labels_offsets=1` flag. This results in +To fix this issue, you can set the `--labels_offset=1` flag. This results in the ImageNet labels being shifted down by one: diff --git a/slim/WORKSPACE b/slim/WORKSPACE new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/slim/datasets/cifar10.py b/slim/datasets/cifar10.py index 233426914bd40a64114565c5ab5e5fe0ae41327c..72d58f8d11786f67b80331165fbe504426a8b72a 100644 --- a/slim/datasets/cifar10.py +++ b/slim/datasets/cifar10.py @@ -15,7 +15,7 @@ """Provides data for the Cifar10 dataset. 
The dataset scripts used to create the dataset can be found at: -tensorflow/models/slim/data/create_cifar10_dataset.py +tensorflow/models/slim/datasets/download_and_convert_cifar10.py """ from __future__ import absolute_import diff --git a/slim/datasets/download_and_convert_mnist.py b/slim/datasets/download_and_convert_mnist.py index 7ecceeb0297c65518eebf988864fe36f44297f83..d6ae8743de23718ab33e18cf6347c52198f29504 100644 --- a/slim/datasets/download_and_convert_mnist.py +++ b/slim/datasets/download_and_convert_mnist.py @@ -125,7 +125,7 @@ def _add_to_tfrecord(data_filename, labels_filename, num_images, png_string = sess.run(encoded_png, feed_dict={image: images[j]}) example = dataset_utils.image_to_tfexample( - png_string, 'png', _IMAGE_SIZE, _IMAGE_SIZE, labels[j]) + png_string, 'png'.encode(), _IMAGE_SIZE, _IMAGE_SIZE, labels[j]) tfrecord_writer.write(example.SerializeToString()) @@ -165,7 +165,7 @@ def _download_dataset(dataset_dir): _progress) print() with tf.gfile.GFile(filepath) as f: - size = f.Size() + size = f.size() print('Successfully downloaded', filename, size, 'bytes.') diff --git a/slim/datasets/mnist.py b/slim/datasets/mnist.py index d9c91a2a5662aff6aacaffd5cc681edfc88762b1..525061c58fee980db628fdefa58e39434e997498 100644 --- a/slim/datasets/mnist.py +++ b/slim/datasets/mnist.py @@ -15,7 +15,7 @@ """Provides data for the MNIST dataset. 
The dataset scripts used to create the dataset can be found at: -tensorflow/models/slim/data/create_mnist_dataset.py +tensorflow/models/slim/datasets/download_and_convert_mnist.py """ from __future__ import absolute_import diff --git a/slim/deployment/model_deploy.py b/slim/deployment/model_deploy.py index 8e56e24c7a4b83e3639f6b8d279a842591c6e216..67b6f9a386b6e8a796865663d656fba1ff359f60 100644 --- a/slim/deployment/model_deploy.py +++ b/slim/deployment/model_deploy.py @@ -232,11 +232,9 @@ def _gather_clone_loss(clone, num_clones, regularization_losses): sum_loss = tf.add_n(all_losses) # Add the summaries out of the clone device block. if clone_loss is not None: - tf.scalar_summary(clone.scope + '/clone_loss', clone_loss, - name='clone_loss') + tf.summary.scalar(clone.scope + '/clone_loss', clone_loss) if regularization_loss is not None: - tf.scalar_summary('regularization_loss', regularization_loss, - name='regularization_loss') + tf.summary.scalar('regularization_loss', regularization_loss) return sum_loss @@ -306,7 +304,7 @@ def optimize_clones(clones, optimizer, regularization_losses = None # Compute the total_loss summing all the clones_losses. total_loss = tf.add_n(clones_losses, name='total_loss') - # Sum the gradients accross clones. + # Sum the gradients across clones. grads_and_vars = _sum_clones_gradients(grads_and_vars) return total_loss, grads_and_vars @@ -380,8 +378,8 @@ def deploy(config, update_ops.append(grad_updates) update_op = tf.group(*update_ops) - train_op = control_flow_ops.with_dependencies([update_op], total_loss, - name='train_op') + with tf.control_dependencies([update_op]): + train_op = tf.identity(total_loss, name='train_op') else: clones_losses = [] regularization_losses = tf.get_collection( @@ -404,12 +402,11 @@ def deploy(config, if total_loss is not None: # Add total_loss to summary. 
- summaries.add(tf.scalar_summary('total_loss', total_loss, - name='total_loss')) + summaries.add(tf.summary.scalar('total_loss', total_loss)) if summaries: # Merge all summaries together. - summary_op = tf.merge_summary(list(summaries), name='summary_op') + summary_op = tf.summary.merge(list(summaries), name='summary_op') else: summary_op = None @@ -467,9 +464,9 @@ def _add_gradients_summaries(grads_and_vars): grad_values = grad.values else: grad_values = grad - summaries.append(tf.histogram_summary(var.op.name + ':gradient', + summaries.append(tf.summary.histogram(var.op.name + ':gradient', grad_values)) - summaries.append(tf.histogram_summary(var.op.name + ':gradient_norm', + summaries.append(tf.summary.histogram(var.op.name + ':gradient_norm', tf.global_norm([grad_values]))) else: tf.logging.info('Var %s has no gradient', var.op.name) @@ -666,7 +663,7 @@ class DeploymentConfig(object): if op.device: return op.device node_def = op if isinstance(op, tf.NodeDef) else op.node_def - if node_def.op == 'Variable': + if node_def.op.startswith('Variable'): t = self._task self._task = (self._task + 1) % self._tasks d = '%s/task:%d' % (self._device, t) diff --git a/slim/eval_image_classifier.py b/slim/eval_image_classifier.py index e5b923e1d0c1bc6b8ab84408304aad8de5357b36..82d10d91cfbefb8179c847123ee6db24b0e54e43 100644 --- a/slim/eval_image_classifier.py +++ b/slim/eval_image_classifier.py @@ -153,14 +153,14 @@ def main(_): # Define the metrics: names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({ 'Accuracy': slim.metrics.streaming_accuracy(predictions, labels), - 'Recall@5': slim.metrics.streaming_recall_at_k( + 'Recall_5': slim.metrics.streaming_recall_at_k( logits, labels, 5), }) # Print the summaries to screen. 
- for name, value in names_to_values.iteritems(): + for name, value in names_to_values.items(): summary_name = 'eval/%s' % name - op = tf.scalar_summary(summary_name, value, collections=[]) + op = tf.summary.scalar(summary_name, value, collections=[]) op = tf.Print(op, [value], summary_name) tf.add_to_collection(tf.GraphKeys.SUMMARIES, op) @@ -183,7 +183,7 @@ def main(_): checkpoint_path=checkpoint_path, logdir=FLAGS.eval_dir, num_evals=num_batches, - eval_op=names_to_updates.values(), + eval_op=list(names_to_updates.values()), variables_to_restore=variables_to_restore) diff --git a/slim/nets/alexnet.py b/slim/nets/alexnet.py index a6b93de054c31d4ea057e705507b608a49d7bd2f..4e7e563cd127f1f5d0274d636993c8812d01d57e 100644 --- a/slim/nets/alexnet.py +++ b/slim/nets/alexnet.py @@ -113,7 +113,7 @@ def alexnet_v2(inputs, net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, - biases_initializer=tf.zeros_initializer, + biases_initializer=tf.zeros_initializer(), scope='fc8') # Convert end_points_collection into a end_point dict. 
diff --git a/slim/nets/cifarnet.py b/slim/nets/cifarnet.py index 371a9cbf28d9f07f71df2fa8a00e8d6b0dd5e0a4..44ca0fed2d8c1129327e73d1154ca9ade1c59790 100644 --- a/slim/nets/cifarnet.py +++ b/slim/nets/cifarnet.py @@ -77,7 +77,7 @@ def cifarnet(images, num_classes=10, is_training=False, net = slim.fully_connected(net, 192, scope='fc4') end_points['fc4'] = net logits = slim.fully_connected(net, num_classes, - biases_initializer=tf.zeros_initializer, + biases_initializer=tf.zeros_initializer(), weights_initializer=trunc_normal(1/192.0), weights_regularizer=None, activation_fn=None, diff --git a/slim/nets/inception_resnet_v2.py b/slim/nets/inception_resnet_v2.py index 6ed3002116e7b81fc62d37fdf88edbaae0cd3f6b..b5a54c5b6186c8e9357e478d2d0faf22e6cf979b 100644 --- a/slim/nets/inception_resnet_v2.py +++ b/slim/nets/inception_resnet_v2.py @@ -171,7 +171,7 @@ def inception_resnet_v2(inputs, num_classes=1001, is_training=True, end_points['Mixed_5b'] = net net = slim.repeat(net, 10, block35, scale=0.17) - # 17 x 17 x 1024 + # 17 x 17 x 1088 with tf.variable_scope('Mixed_6a'): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 384, 3, stride=2, padding='VALID', @@ -191,7 +191,7 @@ def inception_resnet_v2(inputs, num_classes=1001, is_training=True, end_points['Mixed_6a'] = net net = slim.repeat(net, 20, block17, scale=0.10) - # Auxillary tower + # Auxiliary tower with tf.variable_scope('AuxLogits'): aux = slim.avg_pool2d(net, 5, stride=3, padding='VALID', scope='Conv2d_1a_3x3') diff --git a/slim/nets/inception_resnet_v2_test.py b/slim/nets/inception_resnet_v2_test.py index 864eb2501ee2f5bc98ce5e941ded7a9929dbefa9..b1560fb0102f8aeed01a4baa3e89f57386c08efe 100644 --- a/slim/nets/inception_resnet_v2_test.py +++ b/slim/nets/inception_resnet_v2_test.py @@ -65,9 +65,9 @@ class InceptionTest(tf.test.TestCase): inception.inception_resnet_v2(inputs, num_classes) with tf.variable_scope('on_gpu'), tf.device('/gpu:0'): inception.inception_resnet_v2(inputs, num_classes) - for v 
in tf.get_collection(tf.GraphKeys.VARIABLES, scope='on_cpu'): + for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='on_cpu'): self.assertDeviceEqual(v.device, '/cpu:0') - for v in tf.get_collection(tf.GraphKeys.VARIABLES, scope='on_gpu'): + for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='on_gpu'): self.assertDeviceEqual(v.device, '/gpu:0') def testHalfSizeImages(self): diff --git a/slim/nets/inception_v1.py b/slim/nets/inception_v1.py index 8f644796e7a5b2e0f75c2c95b7fcc28ba248643e..4207c2a7f725e215d26019c1483948f6214f32a5 100644 --- a/slim/nets/inception_v1.py +++ b/slim/nets/inception_v1.py @@ -93,7 +93,7 @@ def inception_v1_base(inputs, with tf.variable_scope('Branch_3'): branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3') branch_3 = slim.conv2d(branch_3, 32, [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if final_endpoint == end_point: return net, end_points @@ -110,7 +110,7 @@ def inception_v1_base(inputs, with tf.variable_scope('Branch_3'): branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3') branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if final_endpoint == end_point: return net, end_points @@ -132,7 +132,7 @@ def inception_v1_base(inputs, with tf.variable_scope('Branch_3'): branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3') branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if final_endpoint == end_point: return net, end_points @@ -149,7 +149,7 @@ def inception_v1_base(inputs, with 
tf.variable_scope('Branch_3'): branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3') branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if final_endpoint == end_point: return net, end_points @@ -166,7 +166,7 @@ def inception_v1_base(inputs, with tf.variable_scope('Branch_3'): branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3') branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if final_endpoint == end_point: return net, end_points @@ -183,7 +183,7 @@ def inception_v1_base(inputs, with tf.variable_scope('Branch_3'): branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3') branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if final_endpoint == end_point: return net, end_points @@ -200,7 +200,7 @@ def inception_v1_base(inputs, with tf.variable_scope('Branch_3'): branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3') branch_3 = slim.conv2d(branch_3, 128, [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if final_endpoint == end_point: return net, end_points @@ -222,7 +222,7 @@ def inception_v1_base(inputs, with tf.variable_scope('Branch_3'): branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3') branch_3 = slim.conv2d(branch_3, 128, [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = 
tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if final_endpoint == end_point: return net, end_points @@ -239,7 +239,7 @@ def inception_v1_base(inputs, with tf.variable_scope('Branch_3'): branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3') branch_3 = slim.conv2d(branch_3, 128, [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if final_endpoint == end_point: return net, end_points raise ValueError('Unknown final endpoint %s' % final_endpoint) @@ -270,7 +270,7 @@ def inception_v1(inputs, is_training: whether is training or not. dropout_keep_prob: the percentage of activation values that are retained. prediction_fn: a function to get predictions out of logits. - spatial_squeeze: if True, logits is of shape is [B, C], if false logits is + spatial_squeeze: if True, logits is of shape [B, C], if false logits is of shape [B, 1, 1, C], where B is batch_size and C is number of classes. reuse: whether or not the network and its variables should be reused. To be able to reuse 'scope' must be given. 
@@ -289,7 +289,7 @@ def inception_v1(inputs, is_training=is_training): net, end_points = inception_v1_base(inputs, scope=scope) with tf.variable_scope('Logits'): - net = slim.avg_pool2d(net, [7, 7], stride=1, scope='MaxPool_0a_7x7') + net = slim.avg_pool2d(net, [7, 7], stride=1, scope='AvgPool_0a_7x7') net = slim.dropout(net, dropout_keep_prob, scope='Dropout_0b') logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, diff --git a/slim/nets/inception_v2.py b/slim/nets/inception_v2.py index 6c9f10098badc2034997712adba9ee9b3cd0b4e5..2651f71f9ab60484d183338616c2cedd0a1fd5a5 100644 --- a/slim/nets/inception_v2.py +++ b/slim/nets/inception_v2.py @@ -145,7 +145,7 @@ def inception_v2_base(inputs, branch_3, depth(32), [1, 1], weights_initializer=trunc_normal(0.1), scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points # 28 x 28 x 256 @@ -175,7 +175,7 @@ def inception_v2_base(inputs, branch_3, depth(64), [1, 1], weights_initializer=trunc_normal(0.1), scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points # 28 x 28 x 320 @@ -200,7 +200,7 @@ def inception_v2_base(inputs, with tf.variable_scope('Branch_2'): branch_2 = slim.max_pool2d( net, [3, 3], stride=2, scope='MaxPool_1a_3x3') - net = tf.concat(3, [branch_0, branch_1, branch_2]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points # 14 x 14 x 576 @@ -230,7 +230,7 @@ def inception_v2_base(inputs, branch_3, depth(128), [1, 1], weights_initializer=trunc_normal(0.1), scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) 
+ net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points # 14 x 14 x 576 @@ -260,7 +260,7 @@ def inception_v2_base(inputs, branch_3, depth(128), [1, 1], weights_initializer=trunc_normal(0.1), scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points # 14 x 14 x 576 @@ -290,7 +290,7 @@ def inception_v2_base(inputs, branch_3, depth(96), [1, 1], weights_initializer=trunc_normal(0.1), scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points @@ -321,7 +321,7 @@ def inception_v2_base(inputs, branch_3, depth(96), [1, 1], weights_initializer=trunc_normal(0.1), scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points # 14 x 14 x 576 @@ -346,7 +346,7 @@ def inception_v2_base(inputs, with tf.variable_scope('Branch_2'): branch_2 = slim.max_pool2d(net, [3, 3], stride=2, scope='MaxPool_1a_3x3') - net = tf.concat(3, [branch_0, branch_1, branch_2]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points # 7 x 7 x 1024 @@ -376,7 +376,7 @@ def inception_v2_base(inputs, branch_3, depth(128), [1, 1], weights_initializer=trunc_normal(0.1), scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == 
final_endpoint: return net, end_points @@ -407,7 +407,7 @@ def inception_v2_base(inputs, branch_3, depth(128), [1, 1], weights_initializer=trunc_normal(0.1), scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points raise ValueError('Unknown final endpoint %s' % final_endpoint) @@ -443,7 +443,7 @@ def inception_v2(inputs, usage will be to set this value in (0, 1) to reduce the number of parameters or computation cost of the model. prediction_fn: a function to get predictions out of logits. - spatial_squeeze: if True, logits is of shape is [B, C], if false logits is + spatial_squeeze: if True, logits is of shape [B, C], if false logits is of shape [B, 1, 1, C], where B is batch_size and C is number of classes. reuse: whether or not the network and its variables should be reused. To be able to reuse 'scope' must be given. 
diff --git a/slim/nets/inception_v3.py b/slim/nets/inception_v3.py index 5c5f96519e19558e3baddbe21f34b8f252540083..d64bcfd4748ea103145f7187ef366f46946ae2f8 100644 --- a/slim/nets/inception_v3.py +++ b/slim/nets/inception_v3.py @@ -158,7 +158,7 @@ def inception_v3_base(inputs, branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3') branch_3 = slim.conv2d(branch_3, depth(32), [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points @@ -182,7 +182,7 @@ def inception_v3_base(inputs, branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3') branch_3 = slim.conv2d(branch_3, depth(64), [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points @@ -205,7 +205,7 @@ def inception_v3_base(inputs, branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3') branch_3 = slim.conv2d(branch_3, depth(64), [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points @@ -224,7 +224,7 @@ def inception_v3_base(inputs, with tf.variable_scope('Branch_2'): branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3') - net = tf.concat(3, [branch_0, branch_1, branch_2]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points @@ -253,7 +253,7 @@ def inception_v3_base(inputs, branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3') branch_3 = slim.conv2d(branch_3, depth(192), [1, 
1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points @@ -282,7 +282,7 @@ def inception_v3_base(inputs, branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3') branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points # mixed_6: 17 x 17 x 768. @@ -310,7 +310,7 @@ def inception_v3_base(inputs, branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3') branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points @@ -339,7 +339,7 @@ def inception_v3_base(inputs, branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3') branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points @@ -361,7 +361,7 @@ def inception_v3_base(inputs, with tf.variable_scope('Branch_2'): branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3') - net = tf.concat(3, [branch_0, branch_1, branch_2]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points # mixed_9: 8 x 8 x 2048. 
@@ -371,21 +371,21 @@ def inception_v3_base(inputs, branch_0 = slim.conv2d(net, depth(320), [1, 1], scope='Conv2d_0a_1x1') with tf.variable_scope('Branch_1'): branch_1 = slim.conv2d(net, depth(384), [1, 1], scope='Conv2d_0a_1x1') - branch_1 = tf.concat(3, [ + branch_1 = tf.concat(axis=3, values=[ slim.conv2d(branch_1, depth(384), [1, 3], scope='Conv2d_0b_1x3'), slim.conv2d(branch_1, depth(384), [3, 1], scope='Conv2d_0b_3x1')]) with tf.variable_scope('Branch_2'): branch_2 = slim.conv2d(net, depth(448), [1, 1], scope='Conv2d_0a_1x1') branch_2 = slim.conv2d( branch_2, depth(384), [3, 3], scope='Conv2d_0b_3x3') - branch_2 = tf.concat(3, [ + branch_2 = tf.concat(axis=3, values=[ slim.conv2d(branch_2, depth(384), [1, 3], scope='Conv2d_0c_1x3'), slim.conv2d(branch_2, depth(384), [3, 1], scope='Conv2d_0d_3x1')]) with tf.variable_scope('Branch_3'): branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3') branch_3 = slim.conv2d( branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points @@ -396,21 +396,21 @@ def inception_v3_base(inputs, branch_0 = slim.conv2d(net, depth(320), [1, 1], scope='Conv2d_0a_1x1') with tf.variable_scope('Branch_1'): branch_1 = slim.conv2d(net, depth(384), [1, 1], scope='Conv2d_0a_1x1') - branch_1 = tf.concat(3, [ + branch_1 = tf.concat(axis=3, values=[ slim.conv2d(branch_1, depth(384), [1, 3], scope='Conv2d_0b_1x3'), slim.conv2d(branch_1, depth(384), [3, 1], scope='Conv2d_0c_3x1')]) with tf.variable_scope('Branch_2'): branch_2 = slim.conv2d(net, depth(448), [1, 1], scope='Conv2d_0a_1x1') branch_2 = slim.conv2d( branch_2, depth(384), [3, 3], scope='Conv2d_0b_3x3') - branch_2 = tf.concat(3, [ + branch_2 = tf.concat(axis=3, values=[ slim.conv2d(branch_2, depth(384), [1, 3], scope='Conv2d_0c_1x3'), slim.conv2d(branch_2, depth(384), 
[3, 1], scope='Conv2d_0d_3x1')]) with tf.variable_scope('Branch_3'): branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3') branch_3 = slim.conv2d( branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1') - net = tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) end_points[end_point] = net if end_point == final_endpoint: return net, end_points raise ValueError('Unknown final endpoint %s' % final_endpoint) @@ -453,7 +453,7 @@ def inception_v3(inputs, usage will be to set this value in (0, 1) to reduce the number of parameters or computation cost of the model. prediction_fn: a function to get predictions out of logits. - spatial_squeeze: if True, logits is of shape is [B, C], if false logits is + spatial_squeeze: if True, logits is of shape [B, C], if false logits is of shape [B, 1, 1, C], where B is batch_size and C is number of classes. reuse: whether or not the network and its variables should be reused. To be able to reuse 'scope' must be given. 
diff --git a/slim/nets/inception_v4.py b/slim/nets/inception_v4.py index 0c581f7c41e5666ac06a7fd6ed5826d4defb9605..b4f07ea70edf69ecac94fad26fb949295a41eac0 100644 --- a/slim/nets/inception_v4.py +++ b/slim/nets/inception_v4.py @@ -49,7 +49,7 @@ def block_inception_a(inputs, scope=None, reuse=None): with tf.variable_scope('Branch_3'): branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3') branch_3 = slim.conv2d(branch_3, 96, [1, 1], scope='Conv2d_0b_1x1') - return tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) def block_reduction_a(inputs, scope=None, reuse=None): @@ -69,7 +69,7 @@ def block_reduction_a(inputs, scope=None, reuse=None): with tf.variable_scope('Branch_2'): branch_2 = slim.max_pool2d(inputs, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3') - return tf.concat(3, [branch_0, branch_1, branch_2]) + return tf.concat(axis=3, values=[branch_0, branch_1, branch_2]) def block_inception_b(inputs, scope=None, reuse=None): @@ -93,7 +93,7 @@ def block_inception_b(inputs, scope=None, reuse=None): with tf.variable_scope('Branch_3'): branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3') branch_3 = slim.conv2d(branch_3, 128, [1, 1], scope='Conv2d_0b_1x1') - return tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) def block_reduction_b(inputs, scope=None, reuse=None): @@ -115,7 +115,7 @@ def block_reduction_b(inputs, scope=None, reuse=None): with tf.variable_scope('Branch_2'): branch_2 = slim.max_pool2d(inputs, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3') - return tf.concat(3, [branch_0, branch_1, branch_2]) + return tf.concat(axis=3, values=[branch_0, branch_1, branch_2]) def block_inception_c(inputs, scope=None, reuse=None): @@ -128,20 +128,20 @@ def block_inception_c(inputs, scope=None, reuse=None): branch_0 = slim.conv2d(inputs, 256, [1, 1], 
scope='Conv2d_0a_1x1') with tf.variable_scope('Branch_1'): branch_1 = slim.conv2d(inputs, 384, [1, 1], scope='Conv2d_0a_1x1') - branch_1 = tf.concat(3, [ + branch_1 = tf.concat(axis=3, values=[ slim.conv2d(branch_1, 256, [1, 3], scope='Conv2d_0b_1x3'), slim.conv2d(branch_1, 256, [3, 1], scope='Conv2d_0c_3x1')]) with tf.variable_scope('Branch_2'): branch_2 = slim.conv2d(inputs, 384, [1, 1], scope='Conv2d_0a_1x1') branch_2 = slim.conv2d(branch_2, 448, [3, 1], scope='Conv2d_0b_3x1') branch_2 = slim.conv2d(branch_2, 512, [1, 3], scope='Conv2d_0c_1x3') - branch_2 = tf.concat(3, [ + branch_2 = tf.concat(axis=3, values=[ slim.conv2d(branch_2, 256, [1, 3], scope='Conv2d_0d_1x3'), slim.conv2d(branch_2, 256, [3, 1], scope='Conv2d_0e_3x1')]) with tf.variable_scope('Branch_3'): branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3') branch_3 = slim.conv2d(branch_3, 256, [1, 1], scope='Conv2d_0b_1x1') - return tf.concat(3, [branch_0, branch_1, branch_2, branch_3]) + return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3]) def inception_v4_base(inputs, final_endpoint='Mixed_7d', scope=None): @@ -192,7 +192,7 @@ def inception_v4_base(inputs, final_endpoint='Mixed_7d', scope=None): with tf.variable_scope('Branch_1'): branch_1 = slim.conv2d(net, 96, [3, 3], stride=2, padding='VALID', scope='Conv2d_0a_3x3') - net = tf.concat(3, [branch_0, branch_1]) + net = tf.concat(axis=3, values=[branch_0, branch_1]) if add_and_check_final('Mixed_3a', net): return net, end_points # 73 x 73 x 160 @@ -207,7 +207,7 @@ def inception_v4_base(inputs, final_endpoint='Mixed_7d', scope=None): branch_1 = slim.conv2d(branch_1, 64, [7, 1], scope='Conv2d_0c_7x1') branch_1 = slim.conv2d(branch_1, 96, [3, 3], padding='VALID', scope='Conv2d_1a_3x3') - net = tf.concat(3, [branch_0, branch_1]) + net = tf.concat(axis=3, values=[branch_0, branch_1]) if add_and_check_final('Mixed_4a', net): return net, end_points # 71 x 71 x 192 @@ -218,12 +218,12 @@ def inception_v4_base(inputs, 
final_endpoint='Mixed_7d', scope=None): with tf.variable_scope('Branch_1'): branch_1 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3') - net = tf.concat(3, [branch_0, branch_1]) + net = tf.concat(axis=3, values=[branch_0, branch_1]) if add_and_check_final('Mixed_5a', net): return net, end_points # 35 x 35 x 384 # 4 x Inception-A blocks - for idx in xrange(4): + for idx in range(4): block_scope = 'Mixed_5' + chr(ord('b') + idx) net = block_inception_a(net, block_scope) if add_and_check_final(block_scope, net): return net, end_points @@ -235,7 +235,7 @@ def inception_v4_base(inputs, final_endpoint='Mixed_7d', scope=None): # 17 x 17 x 1024 # 7 x Inception-B blocks - for idx in xrange(7): + for idx in range(7): block_scope = 'Mixed_6' + chr(ord('b') + idx) net = block_inception_b(net, block_scope) if add_and_check_final(block_scope, net): return net, end_points @@ -247,7 +247,7 @@ def inception_v4_base(inputs, final_endpoint='Mixed_7d', scope=None): # 8 x 8 x 1536 # 3 x Inception-C blocks - for idx in xrange(3): + for idx in range(3): block_scope = 'Mixed_7' + chr(ord('b') + idx) net = block_inception_c(net, block_scope) if add_and_check_final(block_scope, net): return net, end_points @@ -269,7 +269,7 @@ def inception_v4(inputs, num_classes=1001, is_training=True, reuse: whether or not the network and its variables should be reused. To be able to reuse 'scope' must be given. scope: Optional variable_scope. - create_aux_logits: Whether to include the auxilliary logits. + create_aux_logits: Whether to include the auxiliary logits. Returns: logits: the logits outputs of the model. 
diff --git a/slim/nets/inception_v4_test.py b/slim/nets/inception_v4_test.py index 5ffe656e41976993d2bfdfdb5a51d0f290962708..11cffb63169a2c2580a37a3e8948e8d42f28cf5f 100644 --- a/slim/nets/inception_v4_test.py +++ b/slim/nets/inception_v4_test.py @@ -146,9 +146,9 @@ class InceptionTest(tf.test.TestCase): inception.inception_v4(inputs, num_classes) with tf.variable_scope('on_gpu'), tf.device('/gpu:0'): inception.inception_v4(inputs, num_classes) - for v in tf.get_collection(tf.GraphKeys.VARIABLES, scope='on_cpu'): + for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='on_cpu'): self.assertDeviceEqual(v.device, '/cpu:0') - for v in tf.get_collection(tf.GraphKeys.VARIABLES, scope='on_gpu'): + for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='on_gpu'): self.assertDeviceEqual(v.device, '/gpu:0') def testHalfSizeImages(self): diff --git a/slim/nets/nets_factory.py b/slim/nets/nets_factory.py index b4f71abd1a24354c2a76314d536e3a925916d933..bd8d7127a10e82be65920febae63e7371f3fae11 100644 --- a/slim/nets/nets_factory.py +++ b/slim/nets/nets_factory.py @@ -97,10 +97,10 @@ def get_network_fn(name, num_classes, weight_decay=0.0, is_training=False): """ if name not in networks_map: raise ValueError('Name of network unknown %s' % name) - arg_scope = arg_scopes_map[name](weight_decay=weight_decay) func = networks_map[name] @functools.wraps(func) def network_fn(images): + arg_scope = arg_scopes_map[name](weight_decay=weight_decay) with slim.arg_scope(arg_scope): return func(images, num_classes, is_training=is_training) if hasattr(func, 'default_image_size'): diff --git a/slim/nets/nets_factory_test.py b/slim/nets/nets_factory_test.py index 6ac723b6d98833f8eb1ebe02c4552e0cf1d758a1..b4ab1f822c9e85ab41b25e57589479e95377de18 100644 --- a/slim/nets/nets_factory_test.py +++ b/slim/nets/nets_factory_test.py @@ -19,11 +19,12 @@ from __future__ import absolute_import from __future__ import division from __future__ import print_function - import tensorflow as tf 
from nets import nets_factory +slim = tf.contrib.slim + class NetworksTest(tf.test.TestCase): @@ -42,5 +43,19 @@ class NetworksTest(tf.test.TestCase): self.assertEqual(logits.get_shape().as_list()[0], batch_size) self.assertEqual(logits.get_shape().as_list()[-1], num_classes) + def testGetNetworkFnArgScope(self): + batch_size = 5 + num_classes = 10 + net = 'cifarnet' + with self.test_session(use_gpu=True): + net_fn = nets_factory.get_network_fn(net, num_classes) + image_size = getattr(net_fn, 'default_image_size', 224) + with slim.arg_scope([slim.model_variable, slim.variable], + device='/CPU:0'): + inputs = tf.random_uniform((batch_size, image_size, image_size, 3)) + net_fn(inputs) + weights = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, 'CifarNet/conv1')[0] + self.assertDeviceEqual('/CPU:0', weights.device) + if __name__ == '__main__': tf.test.main() diff --git a/slim/nets/overfeat.py b/slim/nets/overfeat.py index 0c8f45ce0278adcfa483dd0865d83891d52b8f7f..64a542523a1df7079ab3cca1a0137a60d592ec70 100644 --- a/slim/nets/overfeat.py +++ b/slim/nets/overfeat.py @@ -41,7 +41,7 @@ def overfeat_arg_scope(weight_decay=0.0005): with slim.arg_scope([slim.conv2d, slim.fully_connected], activation_fn=tf.nn.relu, weights_regularizer=slim.l2_regularizer(weight_decay), - biases_initializer=tf.zeros_initializer): + biases_initializer=tf.zeros_initializer()): with slim.arg_scope([slim.conv2d], padding='SAME'): with slim.arg_scope([slim.max_pool2d], padding='VALID') as arg_sc: return arg_sc @@ -107,7 +107,7 @@ def overfeat(inputs, net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, - biases_initializer=tf.zeros_initializer, + biases_initializer=tf.zeros_initializer(), scope='fc8') # Convert end_points_collection into a end_point dict. 
end_points = slim.utils.convert_collection_to_dict(end_points_collection) diff --git a/slim/nets/resnet_v1.py b/slim/nets/resnet_v1.py index 03d49eda0d264304a04242747e2a0afba7f32bb9..3cb3121e9774e6db11938b62b86c774dbed14bb1 100644 --- a/slim/nets/resnet_v1.py +++ b/slim/nets/resnet_v1.py @@ -119,6 +119,7 @@ def resnet_v1(inputs, global_pool=True, output_stride=None, include_root_block=True, + spatial_squeeze=True, reuse=None, scope=None): """Generator for v1 ResNet models. @@ -158,6 +159,8 @@ def resnet_v1(inputs, ratio of input to output spatial resolution. include_root_block: If True, include the initial convolution followed by max-pooling, if False excludes it. + spatial_squeeze: if True, logits is of shape [B, C], if false logits is + of shape [B, 1, 1, C], where B is batch_size and C is number of classes. reuse: whether or not the network and its variables should be reused. To be able to reuse 'scope' must be given. scope: Optional variable_scope. @@ -197,11 +200,15 @@ def resnet_v1(inputs, if num_classes is not None: net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, scope='logits') + if spatial_squeeze: + logits = tf.squeeze(net, [1, 2], name='SpatialSqueeze') + else: + logits = net # Convert end_points_collection into a dictionary of end_points. end_points = slim.utils.convert_collection_to_dict(end_points_collection) if num_classes is not None: - end_points['predictions'] = slim.softmax(net, scope='predictions') - return net, end_points + end_points['predictions'] = slim.softmax(logits, scope='predictions') + return logits, end_points resnet_v1.default_image_size = 224 @@ -210,6 +217,7 @@ def resnet_v1_50(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=True, reuse=None, scope='resnet_v1_50'): """ResNet-50 model of [1]. 
See resnet_v1() for arg and return description.""" @@ -225,7 +233,9 @@ def resnet_v1_50(inputs, ] return resnet_v1(inputs, blocks, num_classes, is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) +resnet_v1_50.default_image_size = resnet_v1.default_image_size def resnet_v1_101(inputs, @@ -233,6 +243,7 @@ def resnet_v1_101(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=True, reuse=None, scope='resnet_v1_101'): """ResNet-101 model of [1]. See resnet_v1() for arg and return description.""" @@ -248,7 +259,9 @@ def resnet_v1_101(inputs, ] return resnet_v1(inputs, blocks, num_classes, is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) +resnet_v1_101.default_image_size = resnet_v1.default_image_size def resnet_v1_152(inputs, @@ -256,6 +269,7 @@ def resnet_v1_152(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=True, reuse=None, scope='resnet_v1_152'): """ResNet-152 model of [1]. See resnet_v1() for arg and return description.""" @@ -270,7 +284,9 @@ def resnet_v1_152(inputs, 'block4', bottleneck, [(2048, 512, 1)] * 3)] return resnet_v1(inputs, blocks, num_classes, is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) +resnet_v1_152.default_image_size = resnet_v1.default_image_size def resnet_v1_200(inputs, @@ -278,6 +294,7 @@ def resnet_v1_200(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=True, reuse=None, scope='resnet_v1_200'): """ResNet-200 model of [2]. 
See resnet_v1() for arg and return description.""" @@ -292,4 +309,6 @@ def resnet_v1_200(inputs, 'block4', bottleneck, [(2048, 512, 1)] * 3)] return resnet_v1(inputs, blocks, num_classes, is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) +resnet_v1_200.default_image_size = resnet_v1.default_image_size diff --git a/slim/nets/resnet_v2.py b/slim/nets/resnet_v2.py index 9476db246029ebdbf3bd426e714c7937ed952c60..87a1df67daa6db2f82055daad6d54e814ca65d87 100644 --- a/slim/nets/resnet_v2.py +++ b/slim/nets/resnet_v2.py @@ -25,8 +25,6 @@ introduced by: The key difference of the full preactivation 'v2' variant compared to the 'v1' variant in [1] is the use of batch normalization before every weight layer. -Another difference is that 'v2' ResNets do not include an activation function in -the main pathway. Also see [2; Fig. 4e]. Typical use: @@ -117,6 +115,7 @@ def resnet_v2(inputs, global_pool=True, output_stride=None, include_root_block=True, + spatial_squeeze=True, reuse=None, scope=None): """Generator for v2 (preactivation) ResNet models. @@ -157,6 +156,8 @@ def resnet_v2(inputs, include_root_block: If True, include the initial convolution followed by max-pooling, if False excludes it. If excluded, `inputs` should be the results of an activation-less convolution. + spatial_squeeze: if True, logits is of shape [B, C], if false logits is + of shape [B, 1, 1, C], where B is batch_size and C is number of classes. reuse: whether or not the network and its variables should be reused. To be able to reuse 'scope' must be given. scope: Optional variable_scope. 
@@ -206,11 +207,15 @@ def resnet_v2(inputs, if num_classes is not None: net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, scope='logits') + if spatial_squeeze: + logits = tf.squeeze(net, [1, 2], name='SpatialSqueeze') + else: + logits = net # Convert end_points_collection into a dictionary of end_points. end_points = slim.utils.convert_collection_to_dict(end_points_collection) if num_classes is not None: - end_points['predictions'] = slim.softmax(net, scope='predictions') - return net, end_points + end_points['predictions'] = slim.softmax(logits, scope='predictions') + return logits, end_points resnet_v2.default_image_size = 224 @@ -219,6 +224,7 @@ def resnet_v2_50(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=True, reuse=None, scope='resnet_v2_50'): """ResNet-50 model of [1]. See resnet_v2() for arg and return description.""" @@ -233,7 +239,9 @@ def resnet_v2_50(inputs, 'block4', bottleneck, [(2048, 512, 1)] * 3)] return resnet_v2(inputs, blocks, num_classes, is_training=is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) +resnet_v2_50.default_image_size = resnet_v2.default_image_size def resnet_v2_101(inputs, @@ -241,6 +249,7 @@ def resnet_v2_101(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=True, reuse=None, scope='resnet_v2_101'): """ResNet-101 model of [1]. 
See resnet_v2() for arg and return description.""" @@ -255,7 +264,9 @@ def resnet_v2_101(inputs, 'block4', bottleneck, [(2048, 512, 1)] * 3)] return resnet_v2(inputs, blocks, num_classes, is_training=is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) +resnet_v2_101.default_image_size = resnet_v2.default_image_size def resnet_v2_152(inputs, @@ -263,6 +274,7 @@ def resnet_v2_152(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=True, reuse=None, scope='resnet_v2_152'): """ResNet-152 model of [1]. See resnet_v2() for arg and return description.""" @@ -277,7 +289,9 @@ def resnet_v2_152(inputs, 'block4', bottleneck, [(2048, 512, 1)] * 3)] return resnet_v2(inputs, blocks, num_classes, is_training=is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) +resnet_v2_152.default_image_size = resnet_v2.default_image_size def resnet_v2_200(inputs, @@ -285,6 +299,7 @@ def resnet_v2_200(inputs, is_training=True, global_pool=True, output_stride=None, + spatial_squeeze=True, reuse=None, scope='resnet_v2_200'): """ResNet-200 model of [2]. 
See resnet_v2() for arg and return description.""" @@ -299,4 +314,6 @@ def resnet_v2_200(inputs, 'block4', bottleneck, [(2048, 512, 1)] * 3)] return resnet_v2(inputs, blocks, num_classes, is_training=is_training, global_pool=global_pool, output_stride=output_stride, - include_root_block=True, reuse=reuse, scope=scope) + include_root_block=True, spatial_squeeze=spatial_squeeze, + reuse=reuse, scope=scope) +resnet_v2_200.default_image_size = resnet_v2.default_image_size diff --git a/slim/nets/vgg.py b/slim/nets/vgg.py index c9a66e1bd3a251a011e28d109a3b89c61ae6a321..79680702c5efb0383036376619395df8bf340a30 100644 --- a/slim/nets/vgg.py +++ b/slim/nets/vgg.py @@ -58,7 +58,7 @@ def vgg_arg_scope(weight_decay=0.0005): with slim.arg_scope([slim.conv2d, slim.fully_connected], activation_fn=tf.nn.relu, weights_regularizer=slim.l2_regularizer(weight_decay), - biases_initializer=tf.zeros_initializer): + biases_initializer=tf.zeros_initializer()): with slim.arg_scope([slim.conv2d], padding='SAME') as arg_sc: return arg_sc @@ -68,7 +68,8 @@ def vgg_a(inputs, is_training=True, dropout_keep_prob=0.5, spatial_squeeze=True, - scope='vgg_a'): + scope='vgg_a', + fc_conv_padding='VALID'): """Oxford Net VGG 11-Layers version A Example. Note: All the fully_connected layers have been transformed to conv2d layers. @@ -83,6 +84,11 @@ def vgg_a(inputs, spatial_squeeze: whether or not should squeeze the spatial dimensions of the outputs. Useful to remove unnecessary dimensions for classification. scope: Optional scope for the variables. + fc_conv_padding: the type of padding to use for the fully connected layer + that is implemented as a convolutional layer. Use 'SAME' padding if you + are applying the network in a fully convolutional manner and want to + get a prediction map downsampled by a factor of 32 as an output. Otherwise, + the output prediction map will be (input / 32) - 6 in case of 'VALID' padding. Returns: the last op containing the log predictions and end_points dict. 
@@ -103,7 +109,7 @@ def vgg_a(inputs, net = slim.repeat(net, 2, slim.conv2d, 512, [3, 3], scope='conv5') net = slim.max_pool2d(net, [2, 2], scope='pool5') # Use conv2d instead of fully_connected layers. - net = slim.conv2d(net, 4096, [7, 7], padding='VALID', scope='fc6') + net = slim.conv2d(net, 4096, [7, 7], padding=fc_conv_padding, scope='fc6') net = slim.dropout(net, dropout_keep_prob, is_training=is_training, scope='dropout6') net = slim.conv2d(net, 4096, [1, 1], scope='fc7') @@ -127,7 +133,8 @@ def vgg_16(inputs, is_training=True, dropout_keep_prob=0.5, spatial_squeeze=True, - scope='vgg_16'): + scope='vgg_16', + fc_conv_padding='VALID'): """Oxford Net VGG 16-Layers version D Example. Note: All the fully_connected layers have been transformed to conv2d layers. @@ -142,6 +149,11 @@ def vgg_16(inputs, spatial_squeeze: whether or not should squeeze the spatial dimensions of the outputs. Useful to remove unnecessary dimensions for classification. scope: Optional scope for the variables. + fc_conv_padding: the type of padding to use for the fully connected layer + that is implemented as a convolutional layer. Use 'SAME' padding if you + are applying the network in a fully convolutional manner and want to + get a prediction map downsampled by a factor of 32 as an output. Otherwise, + the output prediction map will be (input / 32) - 6 in case of 'VALID' padding. Returns: the last op containing the log predictions and end_points dict. @@ -162,7 +174,7 @@ def vgg_16(inputs, net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5') net = slim.max_pool2d(net, [2, 2], scope='pool5') # Use conv2d instead of fully_connected layers. 
- net = slim.conv2d(net, 4096, [7, 7], padding='VALID', scope='fc6') + net = slim.conv2d(net, 4096, [7, 7], padding=fc_conv_padding, scope='fc6') net = slim.dropout(net, dropout_keep_prob, is_training=is_training, scope='dropout6') net = slim.conv2d(net, 4096, [1, 1], scope='fc7') @@ -186,7 +198,8 @@ def vgg_19(inputs, is_training=True, dropout_keep_prob=0.5, spatial_squeeze=True, - scope='vgg_19'): + scope='vgg_19', + fc_conv_padding='VALID'): """Oxford Net VGG 19-Layers version E Example. Note: All the fully_connected layers have been transformed to conv2d layers. @@ -201,6 +214,11 @@ def vgg_19(inputs, spatial_squeeze: whether or not should squeeze the spatial dimensions of the outputs. Useful to remove unnecessary dimensions for classification. scope: Optional scope for the variables. + fc_conv_padding: the type of padding to use for the fully connected layer + that is implemented as a convolutional layer. Use 'SAME' padding if you + are applying the network in a fully convolutional manner and want to + get a prediction map downsampled by a factor of 32 as an output. Otherwise, + the output prediction map will be (input / 32) - 6 in case of 'VALID' padding. Returns: the last op containing the log predictions and end_points dict. @@ -221,7 +239,7 @@ def vgg_19(inputs, net = slim.repeat(net, 4, slim.conv2d, 512, [3, 3], scope='conv5') net = slim.max_pool2d(net, [2, 2], scope='pool5') # Use conv2d instead of fully_connected layers. 
- net = slim.conv2d(net, 4096, [7, 7], padding='VALID', scope='fc6') + net = slim.conv2d(net, 4096, [7, 7], padding=fc_conv_padding, scope='fc6') net = slim.dropout(net, dropout_keep_prob, is_training=is_training, scope='dropout6') net = slim.conv2d(net, 4096, [1, 1], scope='fc7') diff --git a/slim/preprocessing/cifarnet_preprocessing.py b/slim/preprocessing/cifarnet_preprocessing.py index 1057d462889d2005ce73c65051817ed164af2571..195a5c7d0035fc82332d474d74e57f51a75938d7 100644 --- a/slim/preprocessing/cifarnet_preprocessing.py +++ b/slim/preprocessing/cifarnet_preprocessing.py @@ -45,7 +45,7 @@ def preprocess_for_train(image, Returns: A preprocessed image. """ - tf.image_summary('image', tf.expand_dims(image, 0)) + tf.summary.image('image', tf.expand_dims(image, 0)) # Transform the image to floats. image = tf.to_float(image) @@ -58,7 +58,7 @@ def preprocess_for_train(image, # Randomly flip the image horizontally. distorted_image = tf.image.random_flip_left_right(distorted_image) - tf.image_summary('distorted_image', tf.expand_dims(distorted_image, 0)) + tf.summary.image('distorted_image', tf.expand_dims(distorted_image, 0)) # Because these operations are not commutative, consider randomizing # the order their operation. @@ -67,7 +67,7 @@ def preprocess_for_train(image, distorted_image = tf.image.random_contrast(distorted_image, lower=0.2, upper=1.8) # Subtract off the mean and divide by the variance of the pixels. - return tf.image.per_image_whitening(distorted_image) + return tf.image.per_image_standardization(distorted_image) def preprocess_for_eval(image, output_height, output_width): @@ -81,7 +81,7 @@ def preprocess_for_eval(image, output_height, output_width): Returns: A preprocessed image. """ - tf.image_summary('image', tf.expand_dims(image, 0)) + tf.summary.image('image', tf.expand_dims(image, 0)) # Transform the image to floats. 
image = tf.to_float(image) @@ -89,10 +89,10 @@ def preprocess_for_eval(image, output_height, output_width): resized_image = tf.image.resize_image_with_crop_or_pad(image, output_width, output_height) - tf.image_summary('resized_image', tf.expand_dims(resized_image, 0)) + tf.summary.image('resized_image', tf.expand_dims(resized_image, 0)) # Subtract off the mean and divide by the variance of the pixels. - return tf.image.per_image_whitening(resized_image) + return tf.image.per_image_standardization(resized_image) def preprocess_image(image, output_height, output_width, is_training=False): diff --git a/slim/preprocessing/inception_preprocessing.py b/slim/preprocessing/inception_preprocessing.py index 133264b65429909a659a5c3a28bfdb9b6edadee8..b907aab1f4e610844f843ff590904859722ba237 100644 --- a/slim/preprocessing/inception_preprocessing.py +++ b/slim/preprocessing/inception_preprocessing.py @@ -192,7 +192,7 @@ def preprocess_for_train(image, height, width, bbox, # the coordinates are ordered [ymin, xmin, ymax, xmax]. 
image_with_box = tf.image.draw_bounding_boxes(tf.expand_dims(image, 0), bbox) - tf.image_summary('image_with_bounding_boxes', image_with_box) + tf.summary.image('image_with_bounding_boxes', image_with_box) distorted_image, distorted_bbox = distorted_bounding_box_crop(image, bbox) # Restore the shape since the dynamic slice based upon the bbox_size loses @@ -200,7 +200,7 @@ def preprocess_for_train(image, height, width, bbox, distorted_image.set_shape([None, None, 3]) image_with_distorted_box = tf.image.draw_bounding_boxes( tf.expand_dims(image, 0), distorted_bbox) - tf.image_summary('images_with_distorted_bounding_box', + tf.summary.image('images_with_distorted_bounding_box', image_with_distorted_box) # This resizing operation may distort the images because the aspect @@ -215,7 +215,7 @@ def preprocess_for_train(image, height, width, bbox, lambda x, method: tf.image.resize_images(x, [height, width], method=method), num_cases=num_resize_cases) - tf.image_summary('cropped_resized_image', + tf.summary.image('cropped_resized_image', tf.expand_dims(distorted_image, 0)) # Randomly flip the image horizontally. @@ -227,10 +227,10 @@ def preprocess_for_train(image, height, width, bbox, lambda x, ordering: distort_color(x, ordering, fast_mode), num_cases=4) - tf.image_summary('final_distorted_image', + tf.summary.image('final_distorted_image', tf.expand_dims(distorted_image, 0)) - distorted_image = tf.sub(distorted_image, 0.5) - distorted_image = tf.mul(distorted_image, 2.0) + distorted_image = tf.subtract(distorted_image, 0.5) + distorted_image = tf.multiply(distorted_image, 2.0) return distorted_image @@ -241,7 +241,7 @@ def preprocess_for_eval(image, height, width, If height and width are specified it would output an image with that size by applying resize_bilinear. - If central_fraction is specified it would cropt the central fraction of the + If central_fraction is specified it would crop the central fraction of the input image. 
Args: @@ -270,8 +270,8 @@ def preprocess_for_eval(image, height, width, image = tf.image.resize_bilinear(image, [height, width], align_corners=False) image = tf.squeeze(image, [0]) - image = tf.sub(image, 0.5) - image = tf.mul(image, 2.0) + image = tf.subtract(image, 0.5) + image = tf.multiply(image, 2.0) return image diff --git a/slim/preprocessing/lenet_preprocessing.py b/slim/preprocessing/lenet_preprocessing.py index 22c352a291cbe3a0520c608d7f3f254d4c3e21d8..ac5e71af889866312a0896f35023d85b2c260b25 100644 --- a/slim/preprocessing/lenet_preprocessing.py +++ b/slim/preprocessing/lenet_preprocessing.py @@ -39,6 +39,6 @@ def preprocess_image(image, output_height, output_width, is_training): image = tf.to_float(image) image = tf.image.resize_image_with_crop_or_pad( image, output_width, output_height) - image = tf.sub(image, 128.0) + image = tf.subtract(image, 128.0) image = tf.div(image, 128.0) return image diff --git a/slim/preprocessing/preprocessing_factory.py b/slim/preprocessing/preprocessing_factory.py index 89d3d96282b72ded2259b6370c5bf4fcb425a630..35f8645ef92f35fc74e5798fb0a4bf5a09b28730 100644 --- a/slim/preprocessing/preprocessing_factory.py +++ b/slim/preprocessing/preprocessing_factory.py @@ -56,6 +56,9 @@ def get_preprocessing(name, is_training=False): 'resnet_v1_50': vgg_preprocessing, 'resnet_v1_101': vgg_preprocessing, 'resnet_v1_152': vgg_preprocessing, + 'resnet_v2_50': vgg_preprocessing, + 'resnet_v2_101': vgg_preprocessing, + 'resnet_v2_152': vgg_preprocessing, 'vgg': vgg_preprocessing, 'vgg_a': vgg_preprocessing, 'vgg_16': vgg_preprocessing, diff --git a/slim/preprocessing/vgg_preprocessing.py b/slim/preprocessing/vgg_preprocessing.py index 672c7408e16896b4e53d9a3a1f17e9175332addf..c2c92f0a70a1c7b5f15f9232d59373f243eabe62 100644 --- a/slim/preprocessing/vgg_preprocessing.py +++ b/slim/preprocessing/vgg_preprocessing.py @@ -34,8 +34,6 @@ from __future__ import print_function import tensorflow as tf -from tensorflow.python.ops import 
control_flow_ops - slim = tf.contrib.slim _R_MEAN = 123.68 @@ -71,9 +69,8 @@ def _crop(image, offset_height, offset_width, crop_height, crop_width): rank_assertion = tf.Assert( tf.equal(tf.rank(image), 3), ['Rank of image must be equal to 3.']) - cropped_shape = control_flow_ops.with_dependencies( - [rank_assertion], - tf.pack([crop_height, crop_width, original_shape[2]])) + with tf.control_dependencies([rank_assertion]): + cropped_shape = tf.stack([crop_height, crop_width, original_shape[2]]) size_assertion = tf.Assert( tf.logical_and( @@ -81,13 +78,12 @@ def _crop(image, offset_height, offset_width, crop_height, crop_width): tf.greater_equal(original_shape[1], crop_width)), ['Crop size greater than the image size.']) - offsets = tf.to_int32(tf.pack([offset_height, offset_width, 0])) + offsets = tf.to_int32(tf.stack([offset_height, offset_width, 0])) # Use tf.slice instead of crop_to_bounding box as it accepts tensors to # define the crop size. - image = control_flow_ops.with_dependencies( - [size_assertion], - tf.slice(image, offsets, cropped_shape)) + with tf.control_dependencies([size_assertion]): + image = tf.slice(image, offsets, cropped_shape) return tf.reshape(image, cropped_shape) @@ -126,9 +122,8 @@ def _random_crop(image_list, crop_height, crop_width): image_list[i].name, 3, image_rank]) rank_assertions.append(rank_assert) - image_shape = control_flow_ops.with_dependencies( - [rank_assertions[0]], - tf.shape(image_list[0])) + with tf.control_dependencies([rank_assertions[0]]): + image_shape = tf.shape(image_list[0]) image_height = image_shape[0] image_width = image_shape[1] crop_size_assert = tf.Assert( @@ -142,8 +137,8 @@ def _random_crop(image_list, crop_height, crop_width): for i in range(1, len(image_list)): image = image_list[i] asserts.append(rank_assertions[i]) - shape = control_flow_ops.with_dependencies([rank_assertions[i]], - tf.shape(image)) + with tf.control_dependencies([rank_assertions[i]]): + shape = tf.shape(image) height = shape[0] width 
= shape[1] @@ -162,10 +157,10 @@ def _random_crop(image_list, crop_height, crop_width): # Use tf.random_uniform and not numpy.random.rand as doing the former would # generate random numbers at graph eval time, unlike the latter which # generates random numbers at graph definition time. - max_offset_height = control_flow_ops.with_dependencies( - asserts, tf.reshape(image_height - crop_height + 1, [])) - max_offset_width = control_flow_ops.with_dependencies( - asserts, tf.reshape(image_width - crop_width + 1, [])) + with tf.control_dependencies(asserts): + max_offset_height = tf.reshape(image_height - crop_height + 1, []) + with tf.control_dependencies(asserts): + max_offset_width = tf.reshape(image_width - crop_width + 1, []) offset_height = tf.random_uniform( [], maxval=max_offset_height, dtype=tf.int32) offset_width = tf.random_uniform( @@ -227,10 +222,10 @@ def _mean_image_subtraction(image, means): if len(means) != num_channels: raise ValueError('len(means) must match the number of channels') - channels = tf.split(2, num_channels, image) + channels = tf.split(axis=2, num_or_size_splits=num_channels, value=image) for i in range(num_channels): channels[i] -= means[i] - return tf.concat(2, channels) + return tf.concat(axis=2, values=channels) def _smallest_size_at_least(height, width, smallest_side): diff --git a/slim/scripts/finetune_inception_v1_on_flowers.sh b/slim/scripts/finetune_inception_v1_on_flowers.sh index 480b46c0991aec159e52b2df4972cf10f7f03cce..d152e367a7a4cc44bbd381beb9b50e1972468942 100644 --- a/slim/scripts/finetune_inception_v1_on_flowers.sh +++ b/slim/scripts/finetune_inception_v1_on_flowers.sh @@ -8,6 +8,7 @@ # Usage: # cd slim # ./slim/scripts/finetune_inception_v1_on_flowers.sh +set -e # Where the pre-trained InceptionV1 checkpoint is saved to. 
 PRETRAINED_CHECKPOINT_DIR=/tmp/checkpoints
diff --git a/slim/scripts/finetune_inception_v3_on_flowers.sh b/slim/scripts/finetune_inception_v3_on_flowers.sh
index dfcc87ac8734ee5a5007c2e517ccfb1f7e0a50c5..627e42c063c4c0569508e9152bf9fa37c47c17ac 100644
--- a/slim/scripts/finetune_inception_v3_on_flowers.sh
+++ b/slim/scripts/finetune_inception_v3_on_flowers.sh
@@ -8,6 +8,7 @@
 # Usage:
 # cd slim
 # ./slim/scripts/finetune_inceptionv3_on_flowers.sh
+set -e
 
 # Where the pre-trained InceptionV3 checkpoint is saved to.
 PRETRAINED_CHECKPOINT_DIR=/tmp/checkpoints
diff --git a/slim/scripts/finetune_resnet_v1_50_on_flowers.sh b/slim/scripts/finetune_resnet_v1_50_on_flowers.sh
new file mode 100644
index 0000000000000000000000000000000000000000..8134dfc3d5bbb516784bb3eec8180e5dbc2fde52
--- /dev/null
+++ b/slim/scripts/finetune_resnet_v1_50_on_flowers.sh
@@ -0,0 +1,90 @@
+#!/bin/bash
+#
+# This script performs the following operations:
+# 1. Downloads the Flowers dataset
+# 2. Fine-tunes a ResNetV1-50 model on the Flowers training set.
+# 3. Evaluates the model on the Flowers validation set.
+#
+# Usage:
+# cd slim
+# ./slim/scripts/finetune_resnet_v1_50_on_flowers.sh
+set -e
+
+# Where the pre-trained ResNetV1-50 checkpoint is saved to.
+PRETRAINED_CHECKPOINT_DIR=/tmp/checkpoints
+
+# Where the training (fine-tuned) checkpoint and logs will be saved to.
+TRAIN_DIR=/tmp/flowers-models/resnet_v1_50
+
+# Where the dataset is saved to.
+DATASET_DIR=/tmp/flowers
+
+# Download the pre-trained checkpoint.
+if [ ! -d "$PRETRAINED_CHECKPOINT_DIR" ]; then
+  mkdir ${PRETRAINED_CHECKPOINT_DIR}
+fi
+if [ ! -f ${PRETRAINED_CHECKPOINT_DIR}/resnet_v1_50.ckpt ]; then
+  wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
+  tar -xvf resnet_v1_50_2016_08_28.tar.gz
+  mv resnet_v1_50.ckpt ${PRETRAINED_CHECKPOINT_DIR}/resnet_v1_50.ckpt
+  rm resnet_v1_50_2016_08_28.tar.gz
+fi
+
+# Download the dataset
+python download_and_convert_data.py \
+  --dataset_name=flowers \
+  --dataset_dir=${DATASET_DIR}
+
+# Fine-tune only the new layers for 3000 steps.
+python train_image_classifier.py \
+  --train_dir=${TRAIN_DIR} \
+  --dataset_name=flowers \
+  --dataset_split_name=train \
+  --dataset_dir=${DATASET_DIR} \
+  --model_name=resnet_v1_50 \
+  --checkpoint_path=${PRETRAINED_CHECKPOINT_DIR}/resnet_v1_50.ckpt \
+  --checkpoint_exclude_scopes=resnet_v1_50/logits \
+  --trainable_scopes=resnet_v1_50/logits \
+  --max_number_of_steps=3000 \
+  --batch_size=32 \
+  --learning_rate=0.01 \
+  --save_interval_secs=60 \
+  --save_summaries_secs=60 \
+  --log_every_n_steps=100 \
+  --optimizer=rmsprop \
+  --weight_decay=0.00004
+
+# Run evaluation.
+python eval_image_classifier.py \
+  --checkpoint_path=${TRAIN_DIR} \
+  --eval_dir=${TRAIN_DIR} \
+  --dataset_name=flowers \
+  --dataset_split_name=validation \
+  --dataset_dir=${DATASET_DIR} \
+  --model_name=resnet_v1_50
+
+# Fine-tune all the new layers for 1000 steps.
+python train_image_classifier.py \
+  --train_dir=${TRAIN_DIR}/all \
+  --dataset_name=flowers \
+  --dataset_split_name=train \
+  --dataset_dir=${DATASET_DIR} \
+  --checkpoint_path=${TRAIN_DIR} \
+  --model_name=resnet_v1_50 \
+  --max_number_of_steps=1000 \
+  --batch_size=32 \
+  --learning_rate=0.001 \
+  --save_interval_secs=60 \
+  --save_summaries_secs=60 \
+  --log_every_n_steps=100 \
+  --optimizer=rmsprop \
+  --weight_decay=0.00004
+
+# Run evaluation.
+python eval_image_classifier.py \
+  --checkpoint_path=${TRAIN_DIR}/all \
+  --eval_dir=${TRAIN_DIR}/all \
+  --dataset_name=flowers \
+  --dataset_split_name=validation \
+  --dataset_dir=${DATASET_DIR} \
+  --model_name=resnet_v1_50
diff --git a/slim/scripts/train_cifarnet_on_cifar10.sh b/slim/scripts/train_cifarnet_on_cifar10.sh
index daefb22e13b576b6731b8c6637135db8ef3acc8d..bee535a7719ef91672d1b9a6220f569d42c103de 100644
--- a/slim/scripts/train_cifarnet_on_cifar10.sh
+++ b/slim/scripts/train_cifarnet_on_cifar10.sh
@@ -8,6 +8,7 @@
 # Usage:
 # cd slim
 # ./scripts/train_cifar_net_on_mnist.sh
+set -e
 
 # Where the checkpoint and logs will be saved to.
 TRAIN_DIR=/tmp/cifarnet-model
diff --git a/slim/scripts/train_lenet_on_mnist.sh b/slim/scripts/train_lenet_on_mnist.sh
index 8dbeff2a00a6f76f8528bfa2e0da2cb19327616c..e5371eba52773cdd5c0b5fa4318fd26a395daa6b 100644
--- a/slim/scripts/train_lenet_on_mnist.sh
+++ b/slim/scripts/train_lenet_on_mnist.sh
@@ -8,6 +8,7 @@
 # Usage:
 # cd slim
 # ./slim/scripts/train_lenet_on_mnist.sh
+set -e
 
 # Where the checkpoint and logs will be saved to.
TRAIN_DIR=/tmp/lenet-model diff --git a/slim/slim_walkthough.ipynb b/slim/slim_walkthrough.ipynb similarity index 90% rename from slim/slim_walkthough.ipynb rename to slim/slim_walkthrough.ipynb index ad91d4a383243eb169fd2aa450271a39495edfda..da868ef19eb55ae55e4f2c99be0d33e7081746ca 100644 --- a/slim/slim_walkthough.ipynb +++ b/slim/slim_walkthrough.ipynb @@ -157,7 +157,7 @@ "\n", " # Print name and shape of each tensor.\n", " print \"Layers\"\n", - " for k, v in end_points.iteritems():\n", + " for k, v in end_points.items():\n", " print 'name = {}, shape = {}'.format(v.name, v.get_shape())\n", "\n", " # Print name and shape of parameter nodes (values not yet initialized)\n", @@ -232,7 +232,7 @@ }, "outputs": [], "source": [ - "# The following snippet trains the regression model using a sum_of_squares loss.\n", + "# The following snippet trains the regression model using a mean_squared_error loss.\n", "ckpt_dir = '/tmp/regression_model/'\n", "\n", "with tf.Graph().as_default():\n", @@ -244,7 +244,7 @@ " predictions, nodes = regression_model(inputs, is_training=True)\n", "\n", " # Add the loss function to the graph.\n", - " loss = slim.losses.sum_of_squares(predictions, targets)\n", + " loss = tf.losses.mean_squared_error(labels=targets, predictions=predictions)\n", " \n", " # The total loss is the uers's loss plus any regularization losses.\n", " total_loss = slim.losses.get_total_loss()\n", @@ -289,12 +289,12 @@ " predictions, end_points = regression_model(inputs, is_training=True)\n", "\n", " # Add multiple loss nodes.\n", - " sum_of_squares_loss = slim.losses.sum_of_squares(predictions, targets)\n", + " mean_squared_error_loss = tf.losses.mean_squared_error(labels=targets, predictions=predictions)\n", " absolute_difference_loss = slim.losses.absolute_difference(predictions, targets)\n", "\n", " # The following two ways to compute the total loss are equivalent\n", " regularization_loss = tf.add_n(slim.losses.get_regularization_losses())\n", - " total_loss1 = 
sum_of_squares_loss + absolute_difference_loss + regularization_loss\n", + " total_loss1 = mean_squared_error_loss + absolute_difference_loss + regularization_loss\n", "\n", " # Regularization Loss is included in the total loss by default.\n", " # This is good for training, but not for testing.\n", @@ -391,7 +391,7 @@ " final_op=names_to_value_nodes.values())\n", "\n", " names_to_values = dict(zip(names_to_value_nodes.keys(), metric_values))\n", - " for key, value in names_to_values.iteritems():\n", + " for key, value in names_to_values.items():\n", " print('%s: %f' % (key, value))" ] }, @@ -676,7 +676,7 @@ " total_loss = slim.losses.get_total_loss()\n", "\n", " # Create some summaries to visualize the training process:\n", - " tf.scalar_summary('losses/Total Loss', total_loss)\n", + " tf.summary.scalar('losses/Total Loss', total_loss)\n", " \n", " # Specify the optimizer and create the train op:\n", " optimizer = tf.train.AdamOptimizer(learning_rate=0.01)\n", @@ -753,7 +753,9 @@ "However, this means they must be trained on big datasets. Since this process is slow, we provide various pre-trained models - see the list [here](https://github.com/tensorflow/models/tree/master/slim#pre-trained-models).\n", "\n", "\n", - "You can either use these models as-is, or you can perform \"surgery\" on them, to modify them for some other task. For example, it is common to \"chop off\" the final pre-softmax layer, and replace it with a new set of weights corresponding to some new set of labels. You can then quickly fine tune the new model on a small new dataset. We illustrate this below, using inception-v1 as the base model. While models like Inception V3 are more powerful, Inception V1 is used for speed purposes.\n" + "You can either use these models as-is, or you can perform \"surgery\" on them, to modify them for some other task. 
For example, it is common to \"chop off\" the final pre-softmax layer, and replace it with a new set of weights corresponding to some new set of labels. You can then quickly fine tune the new model on a small new dataset. We illustrate this below, using inception-v1 as the base model. While models like Inception V3 are more powerful, Inception V1 is used for speed purposes.\n", + "\n", + "Take into account that VGG and ResNet final layers have only 1000 outputs rather than 1001. The ImageNet dataset provied has an empty background class which can be used to fine-tune the model to other tasks. VGG and ResNet models provided here don't use that class. We provide two examples of using pretrained models: Inception V1 and VGG-19 models to highlight this difference.\n" ] }, { @@ -789,7 +791,7 @@ "metadata": {}, "source": [ "\n", - "### Apply Pre-trained model to Images.\n", + "### Apply Pre-trained Inception V1 model to Images.\n", "\n", "We have to convert each image to the size expected by the model checkpoint.\n", "There is no easy way to determine this size from the checkpoint itself.\n", @@ -815,7 +817,6 @@ "\n", "slim = tf.contrib.slim\n", "\n", - "batch_size = 3\n", "image_size = inception.inception_v1.default_image_size\n", "\n", "with tf.Graph().as_default():\n", @@ -848,7 +849,102 @@ " names = imagenet.create_readable_names_for_imagenet_labels()\n", " for i in range(5):\n", " index = sorted_inds[i]\n", - " print('Probability %0.2f%% => [%s]' % (probabilities[index], names[index]))" + " print('Probability %0.2f%% => [%s]' % (probabilities[index] * 100, names[index]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download the VGG-16 checkpoint" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "from datasets import dataset_utils\n", + "import tensorflow as tf\n", + "\n", + "url = 
\"http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz\"\n", + "checkpoints_dir = '/tmp/checkpoints'\n", + "\n", + "if not tf.gfile.Exists(checkpoints_dir):\n", + " tf.gfile.MakeDirs(checkpoints_dir)\n", + "\n", + "dataset_utils.download_and_uncompress_tarball(url, checkpoints_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "### Apply Pre-trained VGG-16 model to Images.\n", + "\n", + "We have to convert each image to the size expected by the model checkpoint.\n", + "There is no easy way to determine this size from the checkpoint itself.\n", + "So we use a preprocessor to enforce this. Pay attention to the difference caused by 1000 classes instead of 1001." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import os\n", + "import tensorflow as tf\n", + "import urllib2\n", + "\n", + "from datasets import imagenet\n", + "from nets import vgg\n", + "from preprocessing import vgg_preprocessing\n", + "\n", + "slim = tf.contrib.slim\n", + "\n", + "image_size = vgg.vgg_16.default_image_size\n", + "\n", + "with tf.Graph().as_default():\n", + " url = 'https://upload.wikimedia.org/wikipedia/commons/d/d9/First_Student_IC_school_bus_202076.jpg'\n", + " image_string = urllib2.urlopen(url).read()\n", + " image = tf.image.decode_jpeg(image_string, channels=3)\n", + " processed_image = vgg_preprocessing.preprocess_image(image, image_size, image_size, is_training=False)\n", + " processed_images = tf.expand_dims(processed_image, 0)\n", + " \n", + " # Create the model, use the default arg scope to configure the batch norm parameters.\n", + " with slim.arg_scope(vgg.vgg_arg_scope()):\n", + " # 1000 classes instead of 1001.\n", + " logits, _ = vgg.vgg_16(processed_images, num_classes=1000, is_training=False)\n", + " probabilities = tf.nn.softmax(logits)\n", + " \n", + " init_fn = slim.assign_from_checkpoint_fn(\n", + " 
os.path.join(checkpoints_dir, 'vgg_16.ckpt'),\n", + " slim.get_model_variables('vgg_16'))\n", + " \n", + " with tf.Session() as sess:\n", + " init_fn(sess)\n", + " np_image, probabilities = sess.run([image, probabilities])\n", + " probabilities = probabilities[0, 0:]\n", + " sorted_inds = [i[0] for i in sorted(enumerate(-probabilities), key=lambda x:x[1])]\n", + " \n", + " plt.figure()\n", + " plt.imshow(np_image.astype(np.uint8))\n", + " plt.axis('off')\n", + " plt.show()\n", + " \n", + " names = imagenet.create_readable_names_for_imagenet_labels()\n", + " for i in range(5):\n", + " index = sorted_inds[i]\n", + " # Shift the index of a class name by one. \n", + " print('Probability %0.2f%% => [%s]' % (probabilities[index] * 100, names[index+1]))" ] }, { @@ -919,7 +1015,7 @@ " total_loss = slim.losses.get_total_loss()\n", "\n", " # Create some summaries to visualize the training process:\n", - " tf.scalar_summary('losses/Total Loss', total_loss)\n", + " tf.summary.scalar('losses/Total Loss', total_loss)\n", " \n", " # Specify the optimizer and create the train op:\n", " optimizer = tf.train.AdamOptimizer(learning_rate=0.01)\n", @@ -1015,7 +1111,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", - "version": "2.7.6" + "version": "2.7.11" } }, "nbformat": 4, diff --git a/slim/train_image_classifier.py b/slim/train_image_classifier.py old mode 100644 new mode 100755 index 0e95c60206cf29000ce23b7419285f0072c730d8..57049a1a23f44bcacd6331066620ee69a23368b0 --- a/slim/train_image_classifier.py +++ b/slim/train_image_classifier.py @@ -20,7 +20,6 @@ from __future__ import print_function import tensorflow as tf -from tensorflow.python.ops import control_flow_ops from datasets import dataset_factory from deployment import model_deploy from nets import nets_factory @@ -118,8 +117,6 @@ tf.app.flags.DEFINE_float( 'momentum', 0.9, 'The momentum for the MomentumOptimizer and RMSPropOptimizer.') -tf.app.flags.DEFINE_float('rmsprop_momentum', 0.9, 
'Momentum.') - tf.app.flags.DEFINE_float('rmsprop_decay', 0.9, 'Decay term for RMSProp.') ####################### @@ -304,7 +301,7 @@ def _configure_optimizer(learning_rate): optimizer = tf.train.RMSPropOptimizer( learning_rate, decay=FLAGS.rmsprop_decay, - momentum=FLAGS.rmsprop_momentum, + momentum=FLAGS.momentum, epsilon=FLAGS.opt_epsilon) elif FLAGS.optimizer == 'sgd': optimizer = tf.train.GradientDescentOptimizer(learning_rate) @@ -312,15 +309,6 @@ def _configure_optimizer(learning_rate): raise ValueError('Optimizer [%s] was not recognized', FLAGS.optimizer) return optimizer - -def _add_variables_summaries(learning_rate): - summaries = [] - for variable in slim.get_model_variables(): - summaries.append(tf.histogram_summary(variable.op.name, variable)) - summaries.append(tf.scalar_summary('training/Learning Rate', learning_rate)) - return summaries - - def _get_init_fn(): """Returns a function run by the chief worker to warm-start the training. @@ -394,9 +382,9 @@ def main(_): tf.logging.set_verbosity(tf.logging.INFO) with tf.Graph().as_default(): - ###################### - # Config model_deploy# - ###################### + ####################### + # Config model_deploy # + ####################### deploy_config = model_deploy.DeploymentConfig( num_clones=FLAGS.num_clones, clone_on_cpu=FLAGS.clone_on_cpu, @@ -414,9 +402,9 @@ def main(_): dataset = dataset_factory.get_dataset( FLAGS.dataset_name, FLAGS.dataset_split_name, FLAGS.dataset_dir) - #################### + ###################### # Select the network # - #################### + ###################### network_fn = nets_factory.get_network_fn( FLAGS.model_name, num_classes=(dataset.num_classes - FLAGS.labels_offset), @@ -469,11 +457,12 @@ def main(_): # Specify the loss function # ############################# if 'AuxLogits' in end_points: - slim.losses.softmax_cross_entropy( - end_points['AuxLogits'], labels, - label_smoothing=FLAGS.label_smoothing, weight=0.4, scope='aux_loss') - 
slim.losses.softmax_cross_entropy( - logits, labels, label_smoothing=FLAGS.label_smoothing, weight=1.0) + tf.losses.softmax_cross_entropy( + logits=end_points['AuxLogits'], onehot_labels=labels, + label_smoothing=FLAGS.label_smoothing, weights=0.4, scope='aux_loss') + tf.losses.softmax_cross_entropy( + logits=logits, onehot_labels=labels, + label_smoothing=FLAGS.label_smoothing, weights=1.0) return end_points # Gather initial summaries. @@ -489,17 +478,17 @@ def main(_): end_points = clones[0].outputs for end_point in end_points: x = end_points[end_point] - summaries.add(tf.histogram_summary('activations/' + end_point, x)) - summaries.add(tf.scalar_summary('sparsity/' + end_point, + summaries.add(tf.summary.histogram('activations/' + end_point, x)) + summaries.add(tf.summary.scalar('sparsity/' + end_point, tf.nn.zero_fraction(x))) # Add summaries for losses. for loss in tf.get_collection(tf.GraphKeys.LOSSES, first_clone_scope): - summaries.add(tf.scalar_summary('losses/%s' % loss.op.name, loss)) + summaries.add(tf.summary.scalar('losses/%s' % loss.op.name, loss)) # Add summaries for variables. for variable in slim.get_model_variables(): - summaries.add(tf.histogram_summary(variable.op.name, variable)) + summaries.add(tf.summary.histogram(variable.op.name, variable)) ################################# # Configure the moving averages # @@ -517,8 +506,7 @@ def main(_): with tf.device(deploy_config.optimizer_device()): learning_rate = _configure_learning_rate(dataset.num_samples, global_step) optimizer = _configure_optimizer(learning_rate) - summaries.add(tf.scalar_summary('learning_rate', learning_rate, - name='learning_rate')) + summaries.add(tf.summary.scalar('learning_rate', learning_rate)) if FLAGS.sync_replicas: # If sync_replicas is enabled, the averaging will be done in the chief @@ -543,8 +531,7 @@ def main(_): optimizer, var_list=variables_to_train) # Add total_loss to summary. 
- summaries.add(tf.scalar_summary('total_loss', total_loss, - name='total_loss')) + summaries.add(tf.summary.scalar('total_loss', total_loss)) # Create gradient updates. grad_updates = optimizer.apply_gradients(clones_gradients, @@ -552,8 +539,8 @@ def main(_): update_ops.append(grad_updates) update_op = tf.group(*update_ops) - train_tensor = control_flow_ops.with_dependencies([update_op], total_loss, - name='train_op') + with tf.control_dependencies([update_op]): + train_tensor = tf.identity(total_loss, name='train_op') # Add the summaries from the first clone. These contain the summaries # created by model_fn and either optimize_clones() or _gather_clone_loss(). @@ -561,7 +548,7 @@ def main(_): first_clone_scope)) # Merge all summaries together. - summary_op = tf.merge_summary(list(summaries), name='summary_op') + summary_op = tf.summary.merge(list(summaries), name='summary_op') ########################### diff --git a/street/README.md b/street/README.md index c816496991d89f68689301da2696d56c0e820824..1750a8843cfdb78e6e2739b6ff25c42c7db43b0c 100644 --- a/street/README.md +++ b/street/README.md @@ -32,11 +32,9 @@ identify the name of a street (in France) from an image containing upto four different views of the street name sign. The model merges information from the different views and normalizes the text to the correct format. For example: -
![Example image](g3doc/avdessapins.png) Avenue des Sapins -
## Installing and setting up the STREET model @@ -56,6 +54,10 @@ TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())') g++ -std=c++11 -shared rnn_ops.cc -o rnn_ops.so -fPIC -I $TF_INC -O3 -mavx ``` +(Note: if running on Mac, add `-undefined dynamic_lookup` to your `g++` command. +If you are running a newer version of gcc, you may also need to add +`-D_GLIBCXX_USE_CXX11_ABI=0`.) + Run the unittests: ``` @@ -77,6 +79,7 @@ Note that these datasets are very large. The approximate sizes are: * Validation: 64 files of 40MB each. * Test: 64 files of 50MB each. * Testdata: some smaller data files of a few MB for testing. +* Total: ~158 Gb. Here is a list of the download paths: @@ -97,9 +100,14 @@ https://download.tensorflow.org/data/fsns-20160927/validation/validation-00000-o https://download.tensorflow.org/data/fsns-20160927/validation/validation-00063-of-00064 ``` -The above files need to be downloaded individually, as they are large and -downloads are more likely to succeed with the individual files than with a -single archive containing them all. +All URLs are stored in the text file `python/fsns_urls.txt`, to download them in +parallel: + +``` +aria2c -c -j 20 -i fsns_urls.txt +``` +If you ctrl+c and re-execute the command it will continue the aborted download. + ## Confidence Tests @@ -254,4 +262,3 @@ defines a Tensor Flow graph that can be used to process images of variable sizes to output a 1-dimensional sequence, like a transcription/OCR problem, or a 0-dimensional label, as for image identification problems. For more information see [vgslspecs](g3doc/vgslspecs.md) - diff --git a/street/python/fsns_urls.py b/street/python/fsns_urls.py new file mode 100644 index 0000000000000000000000000000000000000000..bea547b9d57315e81ed69d290370f851b17784e0 --- /dev/null +++ b/street/python/fsns_urls.py @@ -0,0 +1,49 @@ +# Copyright 2016 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Creates a text file with URLs to download FSNS dataset using aria2c. + +The FSNS dataset has 640 files and takes 158Gb of the disk space. So it is +highly recommended to use some kind of a download manager to download it. + +Aria2c is a powerful download manager which can download multiple files in +parallel, re-try if encounter an error and continue previously unfinished +downloads. 
+""" + +import os + +_FSNS_BASE_URL = 'http://download.tensorflow.org/data/fsns-20160927/' +_SHARDS = {'test': 64, 'train': 512, 'validation':64} +_OUTPUT_FILE = "fsns_urls.txt" +_OUTPUT_DIR = "data/fsns" + +def fsns_paths(): + paths = ['charset_size=134.txt'] + for name, shards in _SHARDS.items(): + for i in range(shards): + paths.append('%s/%s-%05d-of-%05d' % (name, name, i, shards)) + return paths + + +if __name__ == "__main__": + with open(_OUTPUT_FILE, "w") as f: + for path in fsns_paths(): + url = _FSNS_BASE_URL + path + dst_path = os.path.join(_OUTPUT_DIR, path) + f.write("%s\n out=%s\n" % (url, dst_path)) + print("To download FSNS dataset execute:") + print("aria2c -c -j 20 -i %s" % _OUTPUT_FILE) + print("The downloaded FSNS dataset will be stored under %s" % _OUTPUT_DIR) diff --git a/street/python/fsns_urls.txt b/street/python/fsns_urls.txt new file mode 100644 index 0000000000000000000000000000000000000000..959ffbd5d432105a2964ef2a4be07d046c7ab026 --- /dev/null +++ b/street/python/fsns_urls.txt @@ -0,0 +1,1282 @@ +http://download.tensorflow.org/data/fsns-20160927/charset_size=134.txt + out=data/fsns/charset_size=134.txt +http://download.tensorflow.org/data/fsns-20160927/test/test-00000-of-00064 + out=data/fsns/test/test-00000-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00001-of-00064 + out=data/fsns/test/test-00001-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00002-of-00064 + out=data/fsns/test/test-00002-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00003-of-00064 + out=data/fsns/test/test-00003-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00004-of-00064 + out=data/fsns/test/test-00004-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00005-of-00064 + out=data/fsns/test/test-00005-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00006-of-00064 + out=data/fsns/test/test-00006-of-00064 
+http://download.tensorflow.org/data/fsns-20160927/test/test-00007-of-00064 + out=data/fsns/test/test-00007-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00008-of-00064 + out=data/fsns/test/test-00008-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00009-of-00064 + out=data/fsns/test/test-00009-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00010-of-00064 + out=data/fsns/test/test-00010-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00011-of-00064 + out=data/fsns/test/test-00011-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00012-of-00064 + out=data/fsns/test/test-00012-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00013-of-00064 + out=data/fsns/test/test-00013-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00014-of-00064 + out=data/fsns/test/test-00014-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00015-of-00064 + out=data/fsns/test/test-00015-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00016-of-00064 + out=data/fsns/test/test-00016-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00017-of-00064 + out=data/fsns/test/test-00017-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00018-of-00064 + out=data/fsns/test/test-00018-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00019-of-00064 + out=data/fsns/test/test-00019-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00020-of-00064 + out=data/fsns/test/test-00020-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00021-of-00064 + out=data/fsns/test/test-00021-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00022-of-00064 + out=data/fsns/test/test-00022-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00023-of-00064 + out=data/fsns/test/test-00023-of-00064 
+http://download.tensorflow.org/data/fsns-20160927/test/test-00024-of-00064 + out=data/fsns/test/test-00024-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00025-of-00064 + out=data/fsns/test/test-00025-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00026-of-00064 + out=data/fsns/test/test-00026-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00027-of-00064 + out=data/fsns/test/test-00027-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00028-of-00064 + out=data/fsns/test/test-00028-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00029-of-00064 + out=data/fsns/test/test-00029-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00030-of-00064 + out=data/fsns/test/test-00030-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00031-of-00064 + out=data/fsns/test/test-00031-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00032-of-00064 + out=data/fsns/test/test-00032-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00033-of-00064 + out=data/fsns/test/test-00033-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00034-of-00064 + out=data/fsns/test/test-00034-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00035-of-00064 + out=data/fsns/test/test-00035-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00036-of-00064 + out=data/fsns/test/test-00036-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00037-of-00064 + out=data/fsns/test/test-00037-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00038-of-00064 + out=data/fsns/test/test-00038-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00039-of-00064 + out=data/fsns/test/test-00039-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00040-of-00064 + out=data/fsns/test/test-00040-of-00064 
+http://download.tensorflow.org/data/fsns-20160927/test/test-00041-of-00064 + out=data/fsns/test/test-00041-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00042-of-00064 + out=data/fsns/test/test-00042-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00043-of-00064 + out=data/fsns/test/test-00043-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00044-of-00064 + out=data/fsns/test/test-00044-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00045-of-00064 + out=data/fsns/test/test-00045-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00046-of-00064 + out=data/fsns/test/test-00046-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00047-of-00064 + out=data/fsns/test/test-00047-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00048-of-00064 + out=data/fsns/test/test-00048-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00049-of-00064 + out=data/fsns/test/test-00049-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00050-of-00064 + out=data/fsns/test/test-00050-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00051-of-00064 + out=data/fsns/test/test-00051-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00052-of-00064 + out=data/fsns/test/test-00052-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00053-of-00064 + out=data/fsns/test/test-00053-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00054-of-00064 + out=data/fsns/test/test-00054-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00055-of-00064 + out=data/fsns/test/test-00055-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00056-of-00064 + out=data/fsns/test/test-00056-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00057-of-00064 + out=data/fsns/test/test-00057-of-00064 
+http://download.tensorflow.org/data/fsns-20160927/test/test-00058-of-00064 + out=data/fsns/test/test-00058-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00059-of-00064 + out=data/fsns/test/test-00059-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00060-of-00064 + out=data/fsns/test/test-00060-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00061-of-00064 + out=data/fsns/test/test-00061-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00062-of-00064 + out=data/fsns/test/test-00062-of-00064 +http://download.tensorflow.org/data/fsns-20160927/test/test-00063-of-00064 + out=data/fsns/test/test-00063-of-00064 +http://download.tensorflow.org/data/fsns-20160927/train/train-00000-of-00512 + out=data/fsns/train/train-00000-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00001-of-00512 + out=data/fsns/train/train-00001-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00002-of-00512 + out=data/fsns/train/train-00002-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00003-of-00512 + out=data/fsns/train/train-00003-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00004-of-00512 + out=data/fsns/train/train-00004-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00005-of-00512 + out=data/fsns/train/train-00005-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00006-of-00512 + out=data/fsns/train/train-00006-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00007-of-00512 + out=data/fsns/train/train-00007-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00008-of-00512 + out=data/fsns/train/train-00008-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00009-of-00512 + out=data/fsns/train/train-00009-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00010-of-00512 + 
out=data/fsns/train/train-00010-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00011-of-00512 + out=data/fsns/train/train-00011-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00012-of-00512 + out=data/fsns/train/train-00012-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00013-of-00512 + out=data/fsns/train/train-00013-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00014-of-00512 + out=data/fsns/train/train-00014-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00015-of-00512 + out=data/fsns/train/train-00015-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00016-of-00512 + out=data/fsns/train/train-00016-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00017-of-00512 + out=data/fsns/train/train-00017-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00018-of-00512 + out=data/fsns/train/train-00018-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00019-of-00512 + out=data/fsns/train/train-00019-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00020-of-00512 + out=data/fsns/train/train-00020-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00021-of-00512 + out=data/fsns/train/train-00021-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00022-of-00512 + out=data/fsns/train/train-00022-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00023-of-00512 + out=data/fsns/train/train-00023-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00024-of-00512 + out=data/fsns/train/train-00024-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00025-of-00512 + out=data/fsns/train/train-00025-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00026-of-00512 + out=data/fsns/train/train-00026-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00027-of-00512 + out=data/fsns/train/train-00027-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00028-of-00512 + out=data/fsns/train/train-00028-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00029-of-00512 + out=data/fsns/train/train-00029-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00030-of-00512 + out=data/fsns/train/train-00030-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00031-of-00512 + out=data/fsns/train/train-00031-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00032-of-00512 + out=data/fsns/train/train-00032-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00033-of-00512 + out=data/fsns/train/train-00033-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00034-of-00512 + out=data/fsns/train/train-00034-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00035-of-00512 + out=data/fsns/train/train-00035-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00036-of-00512 + out=data/fsns/train/train-00036-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00037-of-00512 + out=data/fsns/train/train-00037-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00038-of-00512 + out=data/fsns/train/train-00038-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00039-of-00512 + out=data/fsns/train/train-00039-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00040-of-00512 + out=data/fsns/train/train-00040-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00041-of-00512 + out=data/fsns/train/train-00041-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00042-of-00512 + out=data/fsns/train/train-00042-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00043-of-00512 + out=data/fsns/train/train-00043-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00044-of-00512 + out=data/fsns/train/train-00044-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00045-of-00512 + out=data/fsns/train/train-00045-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00046-of-00512 + out=data/fsns/train/train-00046-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00047-of-00512 + out=data/fsns/train/train-00047-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00048-of-00512 + out=data/fsns/train/train-00048-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00049-of-00512 + out=data/fsns/train/train-00049-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00050-of-00512 + out=data/fsns/train/train-00050-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00051-of-00512 + out=data/fsns/train/train-00051-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00052-of-00512 + out=data/fsns/train/train-00052-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00053-of-00512 + out=data/fsns/train/train-00053-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00054-of-00512 + out=data/fsns/train/train-00054-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00055-of-00512 + out=data/fsns/train/train-00055-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00056-of-00512 + out=data/fsns/train/train-00056-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00057-of-00512 + out=data/fsns/train/train-00057-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00058-of-00512 + out=data/fsns/train/train-00058-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00059-of-00512 + out=data/fsns/train/train-00059-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00060-of-00512 + out=data/fsns/train/train-00060-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00061-of-00512 + out=data/fsns/train/train-00061-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00062-of-00512 + out=data/fsns/train/train-00062-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00063-of-00512 + out=data/fsns/train/train-00063-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00064-of-00512 + out=data/fsns/train/train-00064-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00065-of-00512 + out=data/fsns/train/train-00065-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00066-of-00512 + out=data/fsns/train/train-00066-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00067-of-00512 + out=data/fsns/train/train-00067-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00068-of-00512 + out=data/fsns/train/train-00068-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00069-of-00512 + out=data/fsns/train/train-00069-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00070-of-00512 + out=data/fsns/train/train-00070-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00071-of-00512 + out=data/fsns/train/train-00071-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00072-of-00512 + out=data/fsns/train/train-00072-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00073-of-00512 + out=data/fsns/train/train-00073-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00074-of-00512 + out=data/fsns/train/train-00074-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00075-of-00512 + out=data/fsns/train/train-00075-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00076-of-00512 + out=data/fsns/train/train-00076-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00077-of-00512 + out=data/fsns/train/train-00077-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00078-of-00512 + out=data/fsns/train/train-00078-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00079-of-00512 + out=data/fsns/train/train-00079-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00080-of-00512 + out=data/fsns/train/train-00080-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00081-of-00512 + out=data/fsns/train/train-00081-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00082-of-00512 + out=data/fsns/train/train-00082-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00083-of-00512 + out=data/fsns/train/train-00083-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00084-of-00512 + out=data/fsns/train/train-00084-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00085-of-00512 + out=data/fsns/train/train-00085-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00086-of-00512 + out=data/fsns/train/train-00086-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00087-of-00512 + out=data/fsns/train/train-00087-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00088-of-00512 + out=data/fsns/train/train-00088-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00089-of-00512 + out=data/fsns/train/train-00089-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00090-of-00512 + out=data/fsns/train/train-00090-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00091-of-00512 + out=data/fsns/train/train-00091-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00092-of-00512 + out=data/fsns/train/train-00092-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00093-of-00512 + out=data/fsns/train/train-00093-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00094-of-00512 + out=data/fsns/train/train-00094-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00095-of-00512 + out=data/fsns/train/train-00095-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00096-of-00512 + out=data/fsns/train/train-00096-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00097-of-00512 + out=data/fsns/train/train-00097-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00098-of-00512 + out=data/fsns/train/train-00098-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00099-of-00512 + out=data/fsns/train/train-00099-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00100-of-00512 + out=data/fsns/train/train-00100-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00101-of-00512 + out=data/fsns/train/train-00101-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00102-of-00512 + out=data/fsns/train/train-00102-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00103-of-00512 + out=data/fsns/train/train-00103-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00104-of-00512 + out=data/fsns/train/train-00104-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00105-of-00512 + out=data/fsns/train/train-00105-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00106-of-00512 + out=data/fsns/train/train-00106-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00107-of-00512 + out=data/fsns/train/train-00107-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00108-of-00512 + out=data/fsns/train/train-00108-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00109-of-00512 + out=data/fsns/train/train-00109-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00110-of-00512 + out=data/fsns/train/train-00110-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00111-of-00512 + out=data/fsns/train/train-00111-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00112-of-00512 + out=data/fsns/train/train-00112-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00113-of-00512 + out=data/fsns/train/train-00113-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00114-of-00512 + out=data/fsns/train/train-00114-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00115-of-00512 + out=data/fsns/train/train-00115-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00116-of-00512 + out=data/fsns/train/train-00116-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00117-of-00512 + out=data/fsns/train/train-00117-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00118-of-00512 + out=data/fsns/train/train-00118-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00119-of-00512 + out=data/fsns/train/train-00119-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00120-of-00512 + out=data/fsns/train/train-00120-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00121-of-00512 + out=data/fsns/train/train-00121-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00122-of-00512 + out=data/fsns/train/train-00122-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00123-of-00512 + out=data/fsns/train/train-00123-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00124-of-00512 + out=data/fsns/train/train-00124-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00125-of-00512 + out=data/fsns/train/train-00125-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00126-of-00512 + out=data/fsns/train/train-00126-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00127-of-00512 + out=data/fsns/train/train-00127-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00128-of-00512 + out=data/fsns/train/train-00128-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00129-of-00512 + out=data/fsns/train/train-00129-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00130-of-00512 + out=data/fsns/train/train-00130-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00131-of-00512 + out=data/fsns/train/train-00131-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00132-of-00512 + out=data/fsns/train/train-00132-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00133-of-00512 + out=data/fsns/train/train-00133-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00134-of-00512 + out=data/fsns/train/train-00134-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00135-of-00512 + out=data/fsns/train/train-00135-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00136-of-00512 + out=data/fsns/train/train-00136-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00137-of-00512 + out=data/fsns/train/train-00137-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00138-of-00512 + out=data/fsns/train/train-00138-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00139-of-00512 + out=data/fsns/train/train-00139-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00140-of-00512 + out=data/fsns/train/train-00140-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00141-of-00512 + out=data/fsns/train/train-00141-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00142-of-00512 + out=data/fsns/train/train-00142-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00143-of-00512 + out=data/fsns/train/train-00143-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00144-of-00512 + out=data/fsns/train/train-00144-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00145-of-00512 + out=data/fsns/train/train-00145-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00146-of-00512 + out=data/fsns/train/train-00146-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00147-of-00512 + out=data/fsns/train/train-00147-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00148-of-00512 + out=data/fsns/train/train-00148-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00149-of-00512 + out=data/fsns/train/train-00149-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00150-of-00512 + out=data/fsns/train/train-00150-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00151-of-00512 + out=data/fsns/train/train-00151-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00152-of-00512 + out=data/fsns/train/train-00152-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00153-of-00512 + out=data/fsns/train/train-00153-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00154-of-00512 + out=data/fsns/train/train-00154-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00155-of-00512 + out=data/fsns/train/train-00155-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00156-of-00512 + out=data/fsns/train/train-00156-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00157-of-00512 + out=data/fsns/train/train-00157-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00158-of-00512 + out=data/fsns/train/train-00158-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00159-of-00512 + out=data/fsns/train/train-00159-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00160-of-00512 + out=data/fsns/train/train-00160-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00161-of-00512 + out=data/fsns/train/train-00161-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00162-of-00512 + out=data/fsns/train/train-00162-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00163-of-00512 + out=data/fsns/train/train-00163-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00164-of-00512 + out=data/fsns/train/train-00164-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00165-of-00512 + out=data/fsns/train/train-00165-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00166-of-00512 + out=data/fsns/train/train-00166-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00167-of-00512 + out=data/fsns/train/train-00167-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00168-of-00512 + out=data/fsns/train/train-00168-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00169-of-00512 + out=data/fsns/train/train-00169-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00170-of-00512 + out=data/fsns/train/train-00170-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00171-of-00512 + out=data/fsns/train/train-00171-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00172-of-00512 + out=data/fsns/train/train-00172-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00173-of-00512 + out=data/fsns/train/train-00173-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00174-of-00512 + out=data/fsns/train/train-00174-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00175-of-00512 + out=data/fsns/train/train-00175-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00176-of-00512 + out=data/fsns/train/train-00176-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00177-of-00512 + out=data/fsns/train/train-00177-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00178-of-00512 + out=data/fsns/train/train-00178-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00179-of-00512 + out=data/fsns/train/train-00179-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00180-of-00512 + out=data/fsns/train/train-00180-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00181-of-00512 + out=data/fsns/train/train-00181-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00182-of-00512 + out=data/fsns/train/train-00182-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00183-of-00512 + out=data/fsns/train/train-00183-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00184-of-00512 + out=data/fsns/train/train-00184-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00185-of-00512 + out=data/fsns/train/train-00185-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00186-of-00512 + out=data/fsns/train/train-00186-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00187-of-00512 + out=data/fsns/train/train-00187-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00188-of-00512 + out=data/fsns/train/train-00188-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00189-of-00512 + out=data/fsns/train/train-00189-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00190-of-00512 + out=data/fsns/train/train-00190-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00191-of-00512 + out=data/fsns/train/train-00191-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00192-of-00512 + out=data/fsns/train/train-00192-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00193-of-00512 + out=data/fsns/train/train-00193-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00194-of-00512 + out=data/fsns/train/train-00194-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00195-of-00512 + out=data/fsns/train/train-00195-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00196-of-00512 + out=data/fsns/train/train-00196-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00197-of-00512 + out=data/fsns/train/train-00197-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00198-of-00512 + out=data/fsns/train/train-00198-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00199-of-00512 + out=data/fsns/train/train-00199-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00200-of-00512 + out=data/fsns/train/train-00200-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00201-of-00512 + out=data/fsns/train/train-00201-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00202-of-00512 + out=data/fsns/train/train-00202-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00203-of-00512 + out=data/fsns/train/train-00203-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00204-of-00512 + out=data/fsns/train/train-00204-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00205-of-00512 + out=data/fsns/train/train-00205-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00206-of-00512 + out=data/fsns/train/train-00206-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00207-of-00512 + out=data/fsns/train/train-00207-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00208-of-00512 + out=data/fsns/train/train-00208-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00209-of-00512 + out=data/fsns/train/train-00209-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00210-of-00512 + out=data/fsns/train/train-00210-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00211-of-00512 + out=data/fsns/train/train-00211-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00212-of-00512 + out=data/fsns/train/train-00212-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00213-of-00512 + out=data/fsns/train/train-00213-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00214-of-00512 + out=data/fsns/train/train-00214-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00215-of-00512 + out=data/fsns/train/train-00215-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00216-of-00512 + out=data/fsns/train/train-00216-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00217-of-00512 + out=data/fsns/train/train-00217-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00218-of-00512 + out=data/fsns/train/train-00218-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00219-of-00512 + out=data/fsns/train/train-00219-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00220-of-00512 + out=data/fsns/train/train-00220-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00221-of-00512 + out=data/fsns/train/train-00221-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00222-of-00512 + out=data/fsns/train/train-00222-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00223-of-00512 + out=data/fsns/train/train-00223-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00224-of-00512 + out=data/fsns/train/train-00224-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00225-of-00512 + out=data/fsns/train/train-00225-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00226-of-00512 + out=data/fsns/train/train-00226-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00227-of-00512 + out=data/fsns/train/train-00227-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00228-of-00512 + out=data/fsns/train/train-00228-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00229-of-00512 + out=data/fsns/train/train-00229-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00230-of-00512 + out=data/fsns/train/train-00230-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00231-of-00512 + out=data/fsns/train/train-00231-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00232-of-00512 + out=data/fsns/train/train-00232-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00233-of-00512 + out=data/fsns/train/train-00233-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00234-of-00512 + out=data/fsns/train/train-00234-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00235-of-00512 + out=data/fsns/train/train-00235-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00236-of-00512 + out=data/fsns/train/train-00236-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00237-of-00512 + out=data/fsns/train/train-00237-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00238-of-00512 + out=data/fsns/train/train-00238-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00239-of-00512 + out=data/fsns/train/train-00239-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00240-of-00512 + out=data/fsns/train/train-00240-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00241-of-00512 + out=data/fsns/train/train-00241-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00242-of-00512 + out=data/fsns/train/train-00242-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00243-of-00512 + out=data/fsns/train/train-00243-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00244-of-00512 + out=data/fsns/train/train-00244-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00245-of-00512 + out=data/fsns/train/train-00245-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00246-of-00512 + out=data/fsns/train/train-00246-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00247-of-00512 + out=data/fsns/train/train-00247-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00248-of-00512 + out=data/fsns/train/train-00248-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00249-of-00512 + out=data/fsns/train/train-00249-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00250-of-00512 + out=data/fsns/train/train-00250-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00251-of-00512 + out=data/fsns/train/train-00251-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00252-of-00512 + out=data/fsns/train/train-00252-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00253-of-00512 + out=data/fsns/train/train-00253-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00254-of-00512 + out=data/fsns/train/train-00254-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00255-of-00512 + out=data/fsns/train/train-00255-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00256-of-00512 + out=data/fsns/train/train-00256-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00257-of-00512 + out=data/fsns/train/train-00257-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00258-of-00512 + out=data/fsns/train/train-00258-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00259-of-00512 + out=data/fsns/train/train-00259-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00260-of-00512 + out=data/fsns/train/train-00260-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00261-of-00512 + out=data/fsns/train/train-00261-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00262-of-00512 + out=data/fsns/train/train-00262-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00263-of-00512 + out=data/fsns/train/train-00263-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00264-of-00512 + out=data/fsns/train/train-00264-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00265-of-00512 + out=data/fsns/train/train-00265-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00266-of-00512 + out=data/fsns/train/train-00266-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00267-of-00512 + out=data/fsns/train/train-00267-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00268-of-00512 + out=data/fsns/train/train-00268-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00269-of-00512 + out=data/fsns/train/train-00269-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00270-of-00512 + out=data/fsns/train/train-00270-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00271-of-00512 + out=data/fsns/train/train-00271-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00272-of-00512 + out=data/fsns/train/train-00272-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00273-of-00512 + out=data/fsns/train/train-00273-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00274-of-00512 + out=data/fsns/train/train-00274-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00275-of-00512 + out=data/fsns/train/train-00275-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00276-of-00512 + out=data/fsns/train/train-00276-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00277-of-00512 + out=data/fsns/train/train-00277-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00278-of-00512 + out=data/fsns/train/train-00278-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00279-of-00512 + out=data/fsns/train/train-00279-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00280-of-00512 + out=data/fsns/train/train-00280-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00281-of-00512 + out=data/fsns/train/train-00281-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00282-of-00512 + out=data/fsns/train/train-00282-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00283-of-00512 + out=data/fsns/train/train-00283-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00284-of-00512 + out=data/fsns/train/train-00284-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00285-of-00512 + out=data/fsns/train/train-00285-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00286-of-00512 + out=data/fsns/train/train-00286-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00287-of-00512 + out=data/fsns/train/train-00287-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00288-of-00512 + out=data/fsns/train/train-00288-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00289-of-00512 + out=data/fsns/train/train-00289-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00290-of-00512 + out=data/fsns/train/train-00290-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00291-of-00512 + out=data/fsns/train/train-00291-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00292-of-00512 + out=data/fsns/train/train-00292-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00293-of-00512 + out=data/fsns/train/train-00293-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00294-of-00512 + out=data/fsns/train/train-00294-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00295-of-00512 + out=data/fsns/train/train-00295-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00296-of-00512 + out=data/fsns/train/train-00296-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00297-of-00512 + out=data/fsns/train/train-00297-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00298-of-00512 + out=data/fsns/train/train-00298-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00299-of-00512 + out=data/fsns/train/train-00299-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00300-of-00512 + out=data/fsns/train/train-00300-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00301-of-00512 + out=data/fsns/train/train-00301-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00302-of-00512 + out=data/fsns/train/train-00302-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00303-of-00512 + out=data/fsns/train/train-00303-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00304-of-00512 + out=data/fsns/train/train-00304-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00305-of-00512 + out=data/fsns/train/train-00305-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00306-of-00512 + out=data/fsns/train/train-00306-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00307-of-00512 + out=data/fsns/train/train-00307-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00308-of-00512 + out=data/fsns/train/train-00308-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00309-of-00512 + out=data/fsns/train/train-00309-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00310-of-00512 + out=data/fsns/train/train-00310-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00311-of-00512 + out=data/fsns/train/train-00311-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00312-of-00512 + out=data/fsns/train/train-00312-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00313-of-00512 + out=data/fsns/train/train-00313-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00314-of-00512 + out=data/fsns/train/train-00314-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00315-of-00512 + out=data/fsns/train/train-00315-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00316-of-00512 + out=data/fsns/train/train-00316-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00317-of-00512 + out=data/fsns/train/train-00317-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00318-of-00512 + out=data/fsns/train/train-00318-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00319-of-00512 + out=data/fsns/train/train-00319-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00320-of-00512 + out=data/fsns/train/train-00320-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00321-of-00512 + out=data/fsns/train/train-00321-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00322-of-00512 + out=data/fsns/train/train-00322-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00323-of-00512 + out=data/fsns/train/train-00323-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00324-of-00512 + out=data/fsns/train/train-00324-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00325-of-00512 + out=data/fsns/train/train-00325-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00326-of-00512 + out=data/fsns/train/train-00326-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00327-of-00512 + out=data/fsns/train/train-00327-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00328-of-00512 + out=data/fsns/train/train-00328-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00329-of-00512 + out=data/fsns/train/train-00329-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00330-of-00512 + out=data/fsns/train/train-00330-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00331-of-00512 + out=data/fsns/train/train-00331-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00332-of-00512 + out=data/fsns/train/train-00332-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00333-of-00512 + out=data/fsns/train/train-00333-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00334-of-00512 + out=data/fsns/train/train-00334-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00335-of-00512 + out=data/fsns/train/train-00335-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00336-of-00512 + out=data/fsns/train/train-00336-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00337-of-00512 + out=data/fsns/train/train-00337-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00338-of-00512 + out=data/fsns/train/train-00338-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00339-of-00512 + out=data/fsns/train/train-00339-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00340-of-00512 + out=data/fsns/train/train-00340-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00341-of-00512 + out=data/fsns/train/train-00341-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00342-of-00512 + out=data/fsns/train/train-00342-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00343-of-00512 + out=data/fsns/train/train-00343-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00344-of-00512 + out=data/fsns/train/train-00344-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00345-of-00512 + out=data/fsns/train/train-00345-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00346-of-00512 + out=data/fsns/train/train-00346-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00347-of-00512 + out=data/fsns/train/train-00347-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00348-of-00512 + out=data/fsns/train/train-00348-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00349-of-00512 + out=data/fsns/train/train-00349-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00350-of-00512 + out=data/fsns/train/train-00350-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00351-of-00512 + out=data/fsns/train/train-00351-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00352-of-00512 + out=data/fsns/train/train-00352-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00353-of-00512 + out=data/fsns/train/train-00353-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00354-of-00512 + out=data/fsns/train/train-00354-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00355-of-00512 + out=data/fsns/train/train-00355-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00356-of-00512 + out=data/fsns/train/train-00356-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00357-of-00512 + out=data/fsns/train/train-00357-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00358-of-00512 + out=data/fsns/train/train-00358-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00359-of-00512 + out=data/fsns/train/train-00359-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00360-of-00512 + out=data/fsns/train/train-00360-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00361-of-00512 + out=data/fsns/train/train-00361-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00362-of-00512 + out=data/fsns/train/train-00362-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00363-of-00512 + out=data/fsns/train/train-00363-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00364-of-00512 + out=data/fsns/train/train-00364-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00365-of-00512 + out=data/fsns/train/train-00365-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00366-of-00512 + out=data/fsns/train/train-00366-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00367-of-00512 + out=data/fsns/train/train-00367-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00368-of-00512 + out=data/fsns/train/train-00368-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00369-of-00512 + out=data/fsns/train/train-00369-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00370-of-00512 + out=data/fsns/train/train-00370-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00371-of-00512 + out=data/fsns/train/train-00371-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00372-of-00512 + out=data/fsns/train/train-00372-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00373-of-00512 + out=data/fsns/train/train-00373-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00374-of-00512 + out=data/fsns/train/train-00374-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00375-of-00512 + out=data/fsns/train/train-00375-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00376-of-00512 + out=data/fsns/train/train-00376-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00377-of-00512 + out=data/fsns/train/train-00377-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00378-of-00512 + out=data/fsns/train/train-00378-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00379-of-00512 + out=data/fsns/train/train-00379-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00380-of-00512 + out=data/fsns/train/train-00380-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00381-of-00512 + out=data/fsns/train/train-00381-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00382-of-00512 + out=data/fsns/train/train-00382-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00383-of-00512 + out=data/fsns/train/train-00383-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00384-of-00512 + out=data/fsns/train/train-00384-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00385-of-00512 + out=data/fsns/train/train-00385-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00386-of-00512 + out=data/fsns/train/train-00386-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00387-of-00512 + out=data/fsns/train/train-00387-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00388-of-00512 + out=data/fsns/train/train-00388-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00389-of-00512 + out=data/fsns/train/train-00389-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00390-of-00512 + out=data/fsns/train/train-00390-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00391-of-00512 + out=data/fsns/train/train-00391-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00392-of-00512 + out=data/fsns/train/train-00392-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00393-of-00512 + out=data/fsns/train/train-00393-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00394-of-00512 + out=data/fsns/train/train-00394-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00395-of-00512 + out=data/fsns/train/train-00395-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00396-of-00512 + out=data/fsns/train/train-00396-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00397-of-00512 + out=data/fsns/train/train-00397-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00398-of-00512 + out=data/fsns/train/train-00398-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00399-of-00512 + out=data/fsns/train/train-00399-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00400-of-00512 + out=data/fsns/train/train-00400-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00401-of-00512 + out=data/fsns/train/train-00401-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00402-of-00512 + out=data/fsns/train/train-00402-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00403-of-00512 + out=data/fsns/train/train-00403-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00404-of-00512 + out=data/fsns/train/train-00404-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00405-of-00512 + out=data/fsns/train/train-00405-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00406-of-00512 + out=data/fsns/train/train-00406-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00407-of-00512 + out=data/fsns/train/train-00407-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00408-of-00512 + out=data/fsns/train/train-00408-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00409-of-00512 + out=data/fsns/train/train-00409-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00410-of-00512 + out=data/fsns/train/train-00410-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00411-of-00512 + out=data/fsns/train/train-00411-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00412-of-00512 + out=data/fsns/train/train-00412-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00413-of-00512 + out=data/fsns/train/train-00413-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00414-of-00512 + out=data/fsns/train/train-00414-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00415-of-00512 + out=data/fsns/train/train-00415-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00416-of-00512 + out=data/fsns/train/train-00416-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00417-of-00512 + out=data/fsns/train/train-00417-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00418-of-00512 + out=data/fsns/train/train-00418-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00419-of-00512 + out=data/fsns/train/train-00419-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00420-of-00512 + out=data/fsns/train/train-00420-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00421-of-00512 + out=data/fsns/train/train-00421-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00422-of-00512 + out=data/fsns/train/train-00422-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00423-of-00512 + out=data/fsns/train/train-00423-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00424-of-00512 + out=data/fsns/train/train-00424-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00425-of-00512 + out=data/fsns/train/train-00425-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00426-of-00512 + out=data/fsns/train/train-00426-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00427-of-00512 + out=data/fsns/train/train-00427-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00428-of-00512 + out=data/fsns/train/train-00428-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00429-of-00512 + out=data/fsns/train/train-00429-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00430-of-00512 + out=data/fsns/train/train-00430-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00431-of-00512 + out=data/fsns/train/train-00431-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00432-of-00512 + out=data/fsns/train/train-00432-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00433-of-00512 + out=data/fsns/train/train-00433-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00434-of-00512 + out=data/fsns/train/train-00434-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00435-of-00512 + out=data/fsns/train/train-00435-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00436-of-00512 + out=data/fsns/train/train-00436-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00437-of-00512 + out=data/fsns/train/train-00437-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00438-of-00512 + out=data/fsns/train/train-00438-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00439-of-00512 + out=data/fsns/train/train-00439-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00440-of-00512 + out=data/fsns/train/train-00440-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00441-of-00512 + out=data/fsns/train/train-00441-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00442-of-00512 + out=data/fsns/train/train-00442-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00443-of-00512 + out=data/fsns/train/train-00443-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00444-of-00512 + out=data/fsns/train/train-00444-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00445-of-00512 + out=data/fsns/train/train-00445-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00446-of-00512 + out=data/fsns/train/train-00446-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00447-of-00512 + out=data/fsns/train/train-00447-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00448-of-00512 + out=data/fsns/train/train-00448-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00449-of-00512 + out=data/fsns/train/train-00449-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00450-of-00512 + out=data/fsns/train/train-00450-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00451-of-00512 + out=data/fsns/train/train-00451-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00452-of-00512 + out=data/fsns/train/train-00452-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00453-of-00512 + out=data/fsns/train/train-00453-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00454-of-00512 + out=data/fsns/train/train-00454-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00455-of-00512 + out=data/fsns/train/train-00455-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00456-of-00512 + out=data/fsns/train/train-00456-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00457-of-00512 + out=data/fsns/train/train-00457-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00458-of-00512 + out=data/fsns/train/train-00458-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00459-of-00512 + out=data/fsns/train/train-00459-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00460-of-00512 + out=data/fsns/train/train-00460-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00461-of-00512 + out=data/fsns/train/train-00461-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00462-of-00512 + out=data/fsns/train/train-00462-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00463-of-00512 + out=data/fsns/train/train-00463-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00464-of-00512 + out=data/fsns/train/train-00464-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00465-of-00512 + out=data/fsns/train/train-00465-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00466-of-00512 + out=data/fsns/train/train-00466-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00467-of-00512 + out=data/fsns/train/train-00467-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00468-of-00512 + out=data/fsns/train/train-00468-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00469-of-00512 + out=data/fsns/train/train-00469-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00470-of-00512 + out=data/fsns/train/train-00470-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00471-of-00512 + out=data/fsns/train/train-00471-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00472-of-00512 + out=data/fsns/train/train-00472-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00473-of-00512 + out=data/fsns/train/train-00473-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00474-of-00512 + out=data/fsns/train/train-00474-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00475-of-00512 + out=data/fsns/train/train-00475-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00476-of-00512 + out=data/fsns/train/train-00476-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00477-of-00512 + out=data/fsns/train/train-00477-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00478-of-00512 + out=data/fsns/train/train-00478-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00479-of-00512 + out=data/fsns/train/train-00479-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00480-of-00512 + out=data/fsns/train/train-00480-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00481-of-00512 + out=data/fsns/train/train-00481-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00482-of-00512 + out=data/fsns/train/train-00482-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00483-of-00512 + out=data/fsns/train/train-00483-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00484-of-00512 + out=data/fsns/train/train-00484-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00485-of-00512 + out=data/fsns/train/train-00485-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00486-of-00512 + out=data/fsns/train/train-00486-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00487-of-00512 + out=data/fsns/train/train-00487-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00488-of-00512 + out=data/fsns/train/train-00488-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00489-of-00512 + out=data/fsns/train/train-00489-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00490-of-00512 + out=data/fsns/train/train-00490-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00491-of-00512 + out=data/fsns/train/train-00491-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00492-of-00512 + out=data/fsns/train/train-00492-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00493-of-00512 + out=data/fsns/train/train-00493-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00494-of-00512 + out=data/fsns/train/train-00494-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00495-of-00512 + out=data/fsns/train/train-00495-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00496-of-00512 + out=data/fsns/train/train-00496-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00497-of-00512 + out=data/fsns/train/train-00497-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00498-of-00512 + out=data/fsns/train/train-00498-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00499-of-00512 + out=data/fsns/train/train-00499-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00500-of-00512 + out=data/fsns/train/train-00500-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00501-of-00512 + out=data/fsns/train/train-00501-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00502-of-00512 + out=data/fsns/train/train-00502-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00503-of-00512 + out=data/fsns/train/train-00503-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00504-of-00512 + out=data/fsns/train/train-00504-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00505-of-00512 + out=data/fsns/train/train-00505-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00506-of-00512 + out=data/fsns/train/train-00506-of-00512 
+http://download.tensorflow.org/data/fsns-20160927/train/train-00507-of-00512 + out=data/fsns/train/train-00507-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00508-of-00512 + out=data/fsns/train/train-00508-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00509-of-00512 + out=data/fsns/train/train-00509-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00510-of-00512 + out=data/fsns/train/train-00510-of-00512 +http://download.tensorflow.org/data/fsns-20160927/train/train-00511-of-00512 + out=data/fsns/train/train-00511-of-00512 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00000-of-00064 + out=data/fsns/validation/validation-00000-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00001-of-00064 + out=data/fsns/validation/validation-00001-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00002-of-00064 + out=data/fsns/validation/validation-00002-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00003-of-00064 + out=data/fsns/validation/validation-00003-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00004-of-00064 + out=data/fsns/validation/validation-00004-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00005-of-00064 + out=data/fsns/validation/validation-00005-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00006-of-00064 + out=data/fsns/validation/validation-00006-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00007-of-00064 + out=data/fsns/validation/validation-00007-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00008-of-00064 + out=data/fsns/validation/validation-00008-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00009-of-00064 + 
out=data/fsns/validation/validation-00009-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00010-of-00064 + out=data/fsns/validation/validation-00010-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00011-of-00064 + out=data/fsns/validation/validation-00011-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00012-of-00064 + out=data/fsns/validation/validation-00012-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00013-of-00064 + out=data/fsns/validation/validation-00013-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00014-of-00064 + out=data/fsns/validation/validation-00014-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00015-of-00064 + out=data/fsns/validation/validation-00015-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00016-of-00064 + out=data/fsns/validation/validation-00016-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00017-of-00064 + out=data/fsns/validation/validation-00017-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00018-of-00064 + out=data/fsns/validation/validation-00018-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00019-of-00064 + out=data/fsns/validation/validation-00019-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00020-of-00064 + out=data/fsns/validation/validation-00020-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00021-of-00064 + out=data/fsns/validation/validation-00021-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00022-of-00064 + out=data/fsns/validation/validation-00022-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00023-of-00064 + 
out=data/fsns/validation/validation-00023-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00024-of-00064 + out=data/fsns/validation/validation-00024-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00025-of-00064 + out=data/fsns/validation/validation-00025-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00026-of-00064 + out=data/fsns/validation/validation-00026-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00027-of-00064 + out=data/fsns/validation/validation-00027-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00028-of-00064 + out=data/fsns/validation/validation-00028-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00029-of-00064 + out=data/fsns/validation/validation-00029-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00030-of-00064 + out=data/fsns/validation/validation-00030-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00031-of-00064 + out=data/fsns/validation/validation-00031-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00032-of-00064 + out=data/fsns/validation/validation-00032-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00033-of-00064 + out=data/fsns/validation/validation-00033-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00034-of-00064 + out=data/fsns/validation/validation-00034-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00035-of-00064 + out=data/fsns/validation/validation-00035-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00036-of-00064 + out=data/fsns/validation/validation-00036-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00037-of-00064 + 
out=data/fsns/validation/validation-00037-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00038-of-00064 + out=data/fsns/validation/validation-00038-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00039-of-00064 + out=data/fsns/validation/validation-00039-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00040-of-00064 + out=data/fsns/validation/validation-00040-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00041-of-00064 + out=data/fsns/validation/validation-00041-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00042-of-00064 + out=data/fsns/validation/validation-00042-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00043-of-00064 + out=data/fsns/validation/validation-00043-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00044-of-00064 + out=data/fsns/validation/validation-00044-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00045-of-00064 + out=data/fsns/validation/validation-00045-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00046-of-00064 + out=data/fsns/validation/validation-00046-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00047-of-00064 + out=data/fsns/validation/validation-00047-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00048-of-00064 + out=data/fsns/validation/validation-00048-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00049-of-00064 + out=data/fsns/validation/validation-00049-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00050-of-00064 + out=data/fsns/validation/validation-00050-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00051-of-00064 + 
out=data/fsns/validation/validation-00051-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00052-of-00064 + out=data/fsns/validation/validation-00052-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00053-of-00064 + out=data/fsns/validation/validation-00053-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00054-of-00064 + out=data/fsns/validation/validation-00054-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00055-of-00064 + out=data/fsns/validation/validation-00055-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00056-of-00064 + out=data/fsns/validation/validation-00056-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00057-of-00064 + out=data/fsns/validation/validation-00057-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00058-of-00064 + out=data/fsns/validation/validation-00058-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00059-of-00064 + out=data/fsns/validation/validation-00059-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00060-of-00064 + out=data/fsns/validation/validation-00060-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00061-of-00064 + out=data/fsns/validation/validation-00061-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00062-of-00064 + out=data/fsns/validation/validation-00062-of-00064 +http://download.tensorflow.org/data/fsns-20160927/validation/validation-00063-of-00064 + out=data/fsns/validation/validation-00063-of-00064 diff --git a/street/python/nn_ops.py b/street/python/nn_ops.py index 8f41e1f914781164f3a516ff3fe99f2634e3c55b..20c3b502853bbec80f30e9d2aa915477fa674c62 100644 --- a/street/python/nn_ops.py +++ b/street/python/nn_ops.py @@ -92,7 +92,7 @@ def rnn_helper(inp, 
elif direction == "backward": out = backward else: - out = tf.concat(2, [forward, backward]) + out = tf.concat(axis=2, values=[forward, backward]) return out @@ -183,7 +183,7 @@ def lstm_layer(inp, with tf.variable_scope(name): if backward: if length is None: - inp = tf.reverse(inp, [False, True, False]) + inp = tf.reverse(inp, [1]) else: inp = tf.reverse_sequence(inp, length, 1, 0) @@ -217,14 +217,14 @@ def lstm_layer(inp, batch_size = shapes.tensor_dim(inp, dim=0) num_frames = shapes.tensor_dim(inp, dim=1) - prev = tf.reshape(inp, tf.pack([batch_size * num_frames, num_prev])) + prev = tf.reshape(inp, tf.stack([batch_size * num_frames, num_prev])) if use_native_weights: with tf.variable_scope("LSTMCell"): b = tf.get_variable( "B", shape=[4 * num_nodes], - initializer=tf.zeros_initializer, + initializer=tf.zeros_initializer(), dtype=tf.float32) biases = tf.identity(b, name="biases") else: @@ -236,17 +236,17 @@ def lstm_layer(inp, biases, name="biases_reg")) prev = tf.nn.xw_plus_b(prev, w_i_m, biases) - prev = tf.reshape(prev, tf.pack([batch_size, num_frames, 4, num_nodes])) + prev = tf.reshape(prev, tf.stack([batch_size, num_frames, 4, num_nodes])) if state is None: - state = tf.fill(tf.pack([batch_size, num_nodes]), 0.0) + state = tf.fill(tf.stack([batch_size, num_nodes]), 0.0) if memory is None: - memory = tf.fill(tf.pack([batch_size, num_nodes]), 0.0) + memory = tf.fill(tf.stack([batch_size, num_nodes]), 0.0) out, _, mem = rnn.variable_lstm(prev, state, memory, w_m_m, clip=clip) if backward: if length is None: - out = tf.reverse(out, [False, True, False]) + out = tf.reverse(out, [1]) else: out = tf.reverse_sequence(out, length, 1, 0) diff --git a/street/python/vgsl_input.py b/street/python/vgsl_input.py index 17372e29ae01d145f88bdae174c45e06cfb20852..e4495c680aa7c757d87e6cfe2fefc1e62bc7ae6f 100644 --- a/street/python/vgsl_input.py +++ b/street/python/vgsl_input.py @@ -79,7 +79,7 @@ def ImageInput(input_pattern, num_threads, shape, using_ctc, reader=None): # Give 
the images a nice name as well. images = tf.identity(images, name='Images') - tf.image_summary('Images', images) + tf.summary.image('Images', images) return images, heights, widths, labels, sparse_labels, truths @@ -145,6 +145,6 @@ def _ImageProcessing(image_buffer, shape): image = tf.image.decode_png(image_buffer, channels=shape.depth) image.set_shape([shape.height, shape.width, shape.depth]) image = tf.cast(image, tf.float32) - image = tf.sub(image, 128.0) - image = tf.mul(image, 1 / 100.0) + image = tf.subtract(image, 128.0) + image = tf.multiply(image, 1 / 100.0) return image diff --git a/street/python/vgsl_model.py b/street/python/vgsl_model.py index 52a1d57d93020890ee493fde9845237b85b02f59..13d8bbe416565c274356c7087b7f442489860bc9 100644 --- a/street/python/vgsl_model.py +++ b/street/python/vgsl_model.py @@ -147,7 +147,7 @@ def Eval(train_dir, sequence_error=None) with tf.Graph().as_default(): model = InitNetwork(eval_data, model_str, 'eval', reader=reader) - sw = tf.train.SummaryWriter(eval_dir) + sw = tf.summary.FileWriter(eval_dir) while True: sess = tf.Session('') @@ -369,7 +369,7 @@ class VGSLImageModel(object): if self.mode == 'train': # Setup loss for training. self.loss = self._AddLossFunction(logits, height_in, out_dims, out_func) - tf.scalar_summary('loss', self.loss, name='loss') + tf.summary.scalar('loss', self.loss) elif out_dims == 0: # Be sure the labels match the output, even in eval mode. 
self.labels = tf.slice(self.labels, [0, 0], [-1, 1]) @@ -484,7 +484,7 @@ class VGSLImageModel(object): opt = tf.train.AdamOptimizer(learning_rate=learn_rate_dec) else: raise ValueError('Invalid optimizer type: ' + optimizer_type) - tf.scalar_summary('learn_rate', learn_rate_dec, name='lr_summ') + tf.summary.scalar('learn_rate', learn_rate_dec) self.train_op = opt.minimize( self.loss, global_step=self.global_step, name='train') diff --git a/street/python/vgslspecs.py b/street/python/vgslspecs.py index 1e08552f7ef5faf46562bd11b44c93ba8be3ecfb..2c96d77b2d6d7f11d0d0040c298d07de2e42f481 100644 --- a/street/python/vgslspecs.py +++ b/street/python/vgslspecs.py @@ -149,7 +149,7 @@ class VGSLSpecs(object): else: lengths = tf.ones_like(lengths) if factor != 1: - lengths = tf.mul(lengths, tf.cast(factor, tf.float32)) + lengths = tf.multiply(lengths, tf.cast(factor, tf.float32)) return tf.cast(lengths, tf.int32) def BuildFromString(self, prev_layer, index): @@ -235,7 +235,7 @@ class VGSLSpecs(object): final_factors = self.reduction_factors if index == len(self.model_str): raise ValueError('Missing ) at end of parallel!' + self.model_str) - return tf.concat(num_dims - 1, layers), index + 1 + return tf.concat(axis=num_dims - 1, values=layers), index + 1 def AddConvLayer(self, prev_layer, index): """Add a single standard convolutional layer. 
@@ -342,7 +342,7 @@ class VGSLSpecs(object): factor1 = tf.cast(self.reduction_factors[i], tf.float32) factor2 = tf.cast(prev_shape[i], tf.float32) divisor = tf.cast(result_shape[i], tf.float32) - self.reduction_factors[i] = tf.div(tf.mul(factor1, factor2), divisor) + self.reduction_factors[i] = tf.div(tf.multiply(factor1, factor2), divisor) return layer, m.end() def AddFCLayer(self, prev_layer, index): @@ -401,7 +401,7 @@ class VGSLSpecs(object): name + '_forward') back = self._LSTMLayer(prev_layer, 'backward', dim, True, depth, name + '_reverse') - return tf.concat(3, [fwd, back], name=name + '_concat'), m.end() + return tf.concat(axis=3, values=[fwd, back], name=name + '_concat'), m.end() if direction == 'f': direction = 'forward' elif direction == 'r': diff --git a/swivel/README.md b/swivel/README.md index fff8cc6f431f210e62cbfe6d1ef6a22524e71a25..ed77c747abcffae9ad462d6105d96540162e51d4 100644 --- a/swivel/README.md +++ b/swivel/README.md @@ -24,7 +24,7 @@ Note that the resulting co-occurrence matrix is very sparse (i.e., contains many zeros) since most words won't have been observed in the context of other words. In the case of very rare words, it seems reasonable to assume that you just haven't sampled enough data to spot their co-occurrence yet. On the other hand, -if we've failed to observed to common words co-occuring, it seems likely that +if we've failed to observed two common words co-occuring, it seems likely that they are *anti-correlated*. Swivel attempts to capture this intuition by using both the observed and the @@ -42,6 +42,9 @@ This release includes the following programs. * `swivel.py` is a TensorFlow program that generates embeddings from the co-occurrence statistics. It uses the files created by `prep.py` as input, and generates two text files as output: the row and column embeddings. 
+* `distributed.sh` is a Bash script that is meant to act as a template for + launching "distributed" Swivel training; i.e., multiple processes that work in + parallel and communicate via a parameter server. * `text2bin.py` combines the row and column vectors generated by Swivel into a flat binary file that can be quickly loaded into memory to perform vector arithmetic. This can also be used to convert embeddings from @@ -174,11 +177,5 @@ mixed case and evaluate them using lower case, things won't work well. # Contact If you have any questions about Swivel, feel free to post to -[swivel-embeddings@googlegroups.com](https://groups.google.com/forum/#!forum/swivel-embeddings) -or contact us directly: - -* Noam Shazeer (`noam@google.com`) -* Ryan Doherty (`portalfire@google.com`) -* Colin Evans (`colinhevans@google.com`) -* Chris Waterson (`waterson@google.com`) +[swivel-embeddings@googlegroups.com](https://groups.google.com/forum/#!forum/swivel-embeddings). diff --git a/swivel/distributed.sh b/swivel/distributed.sh new file mode 100644 index 0000000000000000000000000000000000000000..6aa59f751a8bbd3761a419f5f3242a9d1d5ce5e3 --- /dev/null +++ b/swivel/distributed.sh @@ -0,0 +1,54 @@ +#!/bin/bash +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# This script launches a multi-process version of Swivel on a single machine. +set -e + +# A comma-separated list of parameter server processes. 
+PS_HOSTS="localhost:4000" + +# A comma-separated list of worker processes. +WORKER_HOSTS="localhost:5000,localhost:5001,localhost:5002,localhost:5003" + +# Where the Swivel training data is located. All processes must be able to read +# from this directory, so it ought to be a network filesystem if you're running +# on multiple servers. +INPUT_BASE_PATH="${HOME}/tmp/swivel/in" + +# Where the output and working directory is located. +OUTPUT_BASE_PATH="${HOME}/tmp/swivel/out" + +# Location of evaluation data, if you want to observe evaluation while training. +EVAL_BASE_PATH="${HOME}/tmp/swivel/eval" + +ARGS="--ps_hosts ${PS_HOSTS} +--worker_hosts ${WORKER_HOSTS} +--input_base_path ${INPUT_BASE_PATH} +--output_base_path ${OUTPUT_BASE_PATH} +--eval_base_path ${EVAL_BASE_PATH}" + +# This configuration is for a two-GPU machine. It starts four worker +# processes, two for each GPU. +python swivel.py --job_name ps --task_index 0 ${ARGS} >& /tmp/ps.0 & +python swivel.py --job_name worker --task_index 0 --gpu_device 0 ${ARGS} >& /tmp/worker.0 & +python swivel.py --job_name worker --task_index 1 --gpu_device 1 ${ARGS} >& /tmp/worker.1 & +python swivel.py --job_name worker --task_index 2 --gpu_device 0 ${ARGS} >& /tmp/worker.2 & +python swivel.py --job_name worker --task_index 3 --gpu_device 1 ${ARGS} >& /tmp/worker.3 & + +# Perhaps there is a more clever way to clean up the parameter server once all +# the workers are done. +wait %2 %3 %4 %5 +kill %1 + diff --git a/swivel/fastprep.cc b/swivel/fastprep.cc index e0918b473c59d8f10dfc4c26f69ec74a05d0cea2..a4bd7feef470ab29c9c8eb89051fc763aaeb16ff 100644 --- a/swivel/fastprep.cc +++ b/swivel/fastprep.cc @@ -25,7 +25,6 @@ #include #include -#include #include #include #include @@ -36,7 +35,9 @@ #include #include #include +#include #include +#include #include #include #include @@ -79,6 +80,10 @@ Options: --window_size Specifies the window size for computing co-occurrence stats; default 10. 
+ + --num_threads + The number of workers to calculate the co-occurrence matrix; + default 4. )"; struct cooc_t { @@ -246,15 +251,14 @@ class CoocBuffer { std::vector fds_; // Ensures that only one buffer file is getting written at a time. - pthread_mutex_t writer_mutex_; + std::mutex writer_mutex_; }; CoocBuffer::CoocBuffer(const std::string &output_dirname, const int num_shards, const int shard_size) : output_dirname_(output_dirname), num_shards_(num_shards), - shard_size_(shard_size), - writer_mutex_(PTHREAD_MUTEX_INITIALIZER) { + shard_size_(shard_size) { for (int row = 0; row < num_shards_; ++row) { for (int col = 0; col < num_shards_; ++col) { char filename[256]; @@ -290,14 +294,11 @@ void CoocBuffer::AccumulateCoocs(const cooc_counts_t &coocs) { bufs[bot_shard_idx].push_back(cooc_t{col_off, row_off, cnt}); } - // XXX TODO: lock for (int i = 0; i < static_cast(fds_.size()); ++i) { - int rv = pthread_mutex_lock(&writer_mutex_); - assert(rv == 0); + std::lock_guard rv(writer_mutex_); const int nbytes = bufs[i].size() * sizeof(cooc_t); int nwritten = write(fds_[i], bufs[i].data(), nbytes); assert(nwritten == nbytes); - pthread_mutex_unlock(&writer_mutex_); } } @@ -370,12 +371,25 @@ void CoocBuffer::WriteShards() { munmap(coocs, nbytes); close(fds_[shard]); + if (sparse_local_row->value_size() * 8 >= (64 << 20)) { + std::cout << "Warning: you are likely to catch protobuf parsing errors " + "in TF 1.0 and older because the shard is too fat (>= 64MiB); see " + << std::endl << + "kDefaultTotalBytesLimit in src/google/protobuf/io/coded_stream.h " + " changed in protobuf/commit/5a76e633ea9b5adb215e93fdc11e1c0c08b3fc74" + << std::endl << + "https://github.com/tensorflow/tensorflow/issues/7311" + << std::endl << + "Consider increasing the number of shards."; + } + // Write the protocol buffer as a binary blob to disk. 
- char filename[256]; - snprintf(filename, sizeof(filename), "shard-%03d-%03d.pb", row_shard, + const int filename_max_size = 4096; + std::unique_ptr filename(new char[filename_max_size]); + snprintf(filename.get(), filename_max_size, "shard-%03d-%03d.pb", row_shard, col_shard); - const std::string path = output_dirname_ + "/" + filename; + const std::string path = output_dirname_ + "/" + filename.get(); int fd = open(path.c_str(), O_WRONLY | O_TRUNC | O_CREAT, 0666); assert(fd != -1); @@ -448,7 +462,7 @@ void CoocCounter::Count() { fin_.seekg(start_); int nlines = 0; - for (off_t filepos = start_; filepos < end_; filepos = fin_.tellg()) { + for (off_t filepos = start_; filepos < end_ && !fin_.eof(); filepos = fin_.tellg()) { // Buffer a single sentence. std::vector sentence; bool eos; @@ -586,10 +600,11 @@ int main(int argc, char *argv[]) { struct stat sb; if (lstat(output_dirname.c_str(), &sb) != 0 || !S_ISDIR(sb.st_mode)) { - std::cerr << "output directory '" << output_dirname - << "' does not exist of is not a directory." << std::endl; - - return 1; + if (mkdir(output_dirname.c_str(), 0755) != 0) { + std::cerr << "output directory '" << output_dirname + << "' does not exist or is not a directory." << std::endl; + return 1; + } } if (lstat(input_filename.c_str(), &sb) != 0 || !S_ISREG(sb.st_mode)) { @@ -630,15 +645,12 @@ int main(int argc, char *argv[]) { token_to_id_map[vocab[i]] = i; // Compute the co-occurrences - std::vector threads; + std::vector threads; + threads.reserve(num_threads); std::vector counters; const off_t nbytes_per_thread = input_size / num_threads; - - pthread_attr_t attr; - if (pthread_attr_init(&attr) != 0) { - std::cerr << "unable to initalize pthreads" << std::endl; - return 1; - } + std::cout << "Running " << num_threads << " threads, each on " + << nbytes_per_thread << " bytes" << std::endl; for (int i = 0; i < num_threads; ++i) { // We could make this smarter and look around for newlines. 
But @@ -652,16 +664,16 @@ int main(int argc, char *argv[]) { counters.push_back(counter); - pthread_t thread; - pthread_create(&thread, &attr, CoocCounter::Run, counter); - - threads.push_back(thread); + threads.emplace_back(CoocCounter::Run, counter); } // Wait for threads to finish and collect marginals. std::vector marginals(vocab.size()); for (int i = 0; i < num_threads; ++i) { - pthread_join(threads[i], 0); + if (i > 0) { + std::cout << "joining thread #" << (i + 1) << std::endl; + } + threads[i].join(); const std::vector& counter_marginals = counters[i]->Marginals(); for (int j = 0; j < static_cast(vocab.size()); ++j) diff --git a/swivel/fastprep.mk b/swivel/fastprep.mk index bbc4a1736a70c21f855cdf3db40de61f4a0fd6c4..b1798d0b68a7ac53f47c3d8fa4f72babd1af89f2 100644 --- a/swivel/fastprep.mk +++ b/swivel/fastprep.mk @@ -16,50 +16,16 @@ # limitations under the License. - # This makefile builds "fastprep", a faster version of prep.py that can be used -# to build training data for Swivel. Building "fastprep" is a bit more -# involved: you'll need to pull and build the Tensorflow source, and then build -# and install compatible protobuf software. We've tested this with Tensorflow -# version 0.7. -# -# = Step 1. Pull and Build Tensorflow. = -# -# These instructions are somewhat abridged; for pre-requisites and the most -# up-to-date instructions, refer to: -# -# -# -# To build the Tensorflow components required for "fastpret", you'll need to -# install Bazel, Numpy, Swig, and Python development headers as described in at -# the above URL. Run the "configure" script as appropriate for your -# environment and then build the "build_pip_package" target: -# -# bazel build -c opt [--config=cuda] //tensorflow/tools/pip_package:build_pip_package -# -# This will generate the Tensorflow headers and libraries necessary for -# "fastprep". -# +# to build training data for Swivel. # -# = Step 2. Build and Install Compatible Protobuf Libraries = +# = Step 1. 
Install protobuf v3 = # -# "fastprep" also needs compatible protocol buffer libraries, which you can -# build from the protobuf implementation included with the Tensorflow -# distribution: +# Ubuntu 16.10+: sudo apt install libprotobuf-dev +# Ubuntu 16.04: https://launchpad.net/~maarten-fonville/+archive/ubuntu/ppa + replace xenial with yakkety in /etc/apt/sources.list.d/maarten-fonville-ubuntu-ppa-xenial.list +# macOS: brew install protobuf # -# cd ${TENSORFLOW_SRCDIR}/google/protobuf -# ./autogen.sh -# ./configure --prefix=${HOME} # ...or whatever -# make -# make install # ...or maybe "sudo make install" -# -# This will install the headers and libraries appropriately. -# -# -# = Step 3. Build "fastprep". = -# -# Finally modify this file (if necessary) to update PB_DIR and TF_DIR to refer -# to appropriate locations, and: +# = Step 2. Build "fastprep". = # # make -f fastprep.mk # @@ -68,20 +34,27 @@ # matrices and other files necessary to train a Swivel matrix. -# The root directory where the Google Protobuf software is installed. -# Alternative locations might be "/usr" or "/usr/local". -PB_DIR=$(HOME) +CXXFLAGS=-std=c++11 -march=native -g -O2 -flto -Wall -I. +LDLIBS=-lprotobuf -pthread -lm + +FETCHER=curl -L -o +TF_URL=https://github.com/tensorflow/tensorflow/raw/master +PROTOC=protoc + + +%.proto: tensorflow/core/example + $(FETCHER) $@ $(TF_URL)/$@ -# Assuming you've got the Tensorflow source unpacked and built in ${HOME}/src: -TF_DIR=$(HOME)/src/tensorflow +%.pb.cc: %.proto + $(PROTOC) --cpp_out=. 
$< -PB_INCLUDE=$(PB_DIR)/include -TF_INCLUDE=$(TF_DIR)/bazel-genfiles -CXXFLAGS=-std=c++11 -m64 -mavx -g -Ofast -Wall -I$(TF_INCLUDE) -I$(PB_INCLUDE) +fastprep: fastprep.cc tensorflow/core/example/feature.pb.cc tensorflow/core/example/example.pb.cc -PB_LIBDIR=$(PB_DIR)/lib -TF_LIBDIR=$(TF_DIR)/bazel-bin/tensorflow/core -LDFLAGS=-L$(TF_LIBDIR) -L$(PB_LIBDIR) -LDLIBS=-lprotos_all_cc -lprotobuf -lpthread -lm +tensorflow/core/example: + @mkdir -p tensorflow/core/example -fastprep: fastprep.cc +clean: + @rm -f fastprep + +mrproper: clean + @rm -rf tensorflow diff --git a/swivel/glove_to_shards.py b/swivel/glove_to_shards.py old mode 100755 new mode 100644 diff --git a/swivel/nearest.py b/swivel/nearest.py old mode 100755 new mode 100644 diff --git a/swivel/prep.py b/swivel/prep.py old mode 100755 new mode 100644 diff --git a/swivel/swivel.py b/swivel/swivel.py old mode 100755 new mode 100644 index 456327e6b2101cfbac656eaeb7386c94878d7c4e..c69660c09c18f54da654ca8a7341559f8b9bcc22 --- a/swivel/swivel.py +++ b/swivel/swivel.py @@ -1,5 +1,3 @@ -#!/usr/bin/env python -# # Copyright 2016 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); @@ -49,304 +47,442 @@ number of epochs. When complete, it will output the trained vectors to a tab-separated file that contains one line per embedding. Row and column embeddings are stored in separate files. +Swivel can be run "stand-alone" or "distributed". The latter involves running +at least one parameter server process, along with one or more worker processes. 
""" -import argparse +from __future__ import division +from __future__ import print_function + import glob -import math +import itertools import os -import sys -import time -import threading +import random import numpy as np +import scipy.stats import tensorflow as tf flags = tf.app.flags -flags.DEFINE_string('input_base_path', '/tmp/swivel_data', - 'Directory containing input shards, vocabularies, ' - 'and marginals.') -flags.DEFINE_string('output_base_path', '/tmp/swivel_data', - 'Path where to write the trained embeddings.') -flags.DEFINE_integer('embedding_size', 300, 'Size of the embeddings') -flags.DEFINE_boolean('trainable_bias', False, 'Biases are trainable') -flags.DEFINE_integer('submatrix_rows', 4096, 'Rows in each training submatrix. ' - 'This must match the training data.') -flags.DEFINE_integer('submatrix_cols', 4096, 'Rows in each training submatrix. ' - 'This must match the training data.') -flags.DEFINE_float('loss_multiplier', 1.0 / 4096, - 'constant multiplier on loss.') -flags.DEFINE_float('confidence_exponent', 0.5, - 'Exponent for l2 confidence function') -flags.DEFINE_float('confidence_scale', 0.25, 'Scale for l2 confidence function') -flags.DEFINE_float('confidence_base', 0.1, 'Base for l2 confidence function') -flags.DEFINE_float('learning_rate', 1.0, 'Initial learning rate') -flags.DEFINE_integer('num_concurrent_steps', 2, - 'Number of threads to train with') -flags.DEFINE_float('num_epochs', 40, 'Number epochs to train for') -flags.DEFINE_float('per_process_gpu_memory_fraction', 0.25, - 'Fraction of GPU memory to use') +flags.DEFINE_string( + 'input_base_path', '/tmp/swivel_data', + 'Directory containing input shards, vocabularies, and marginals.') +flags.DEFINE_string( + 'output_base_path', '/tmp/swivel_data', + 'Path where to write the trained embeddings.') +flags.DEFINE_string('eval_base_path', '', 'Path to evaluation data') + +# Control for training. 
+flags.DEFINE_float('num_epochs', 40, 'Number epochs to train') +flags.DEFINE_string('hparams', '', 'Model hyper-parameters') + +# Model hyper-parameters. (Move these to tf.HParams once that gets integrated +# into TF from tf.contrib.) +flags.DEFINE_integer( + 'dim', 300, 'Embedding dimensionality') +flags.DEFINE_string( + 'optimizer', 'rmsprop', 'SGD optimizer; either "adagrad" or "rmsprop"') +flags.DEFINE_float( + 'learning_rate', 0.1, 'Optimizer learning rate') +flags.DEFINE_float( + 'momentum', 0.1, 'Optimizer momentum; used with RMSProp') +flags.DEFINE_float( + 'confidence_base', 0.0, 'Base for count weighting') +flags.DEFINE_float( + 'confidence_scale', 1.0, 'Scale for count weighting') +flags.DEFINE_float( + 'confidence_exponent', 0.5, 'Exponent for count weighting') +flags.DEFINE_integer( + 'submatrix_rows', 4096, 'Number of rows in each submatrix') +flags.DEFINE_integer( + 'submatrix_cols', 4096, 'Number of cols in each submatrix') + +# For distributed training. +flags.DEFINE_string( + 'ps_hosts', '', + 'Comma-separated list of parameter server host:port; if empty, run local') +flags.DEFINE_string( + 'worker_hosts', '', 'Comma-separated list of worker host:port') +flags.DEFINE_string( + 'job_name', '', 'The job this process will run, either "ps" or "worker"') +flags.DEFINE_integer( + 'task_index', 0, 'The task index for this process') +flags.DEFINE_integer( + 'gpu_device', 0, 'The GPU device to use.') FLAGS = flags.FLAGS -def embeddings_with_init(vocab_size, embedding_dim, name): - """Creates and initializes the embedding tensors.""" - return tf.get_variable(name=name, - shape=[vocab_size, embedding_dim], - initializer=tf.random_normal_initializer( - stddev=math.sqrt(1.0 / embedding_dim))) - - -def count_matrix_input(filenames, submatrix_rows, submatrix_cols): - """Reads submatrix shards from disk.""" - filename_queue = tf.train.string_input_producer(filenames) - reader = tf.WholeFileReader() - _, serialized_example = reader.read(filename_queue) - features 
= tf.parse_single_example( - serialized_example, - features={ - 'global_row': tf.FixedLenFeature([submatrix_rows], dtype=tf.int64), - 'global_col': tf.FixedLenFeature([submatrix_cols], dtype=tf.int64), - 'sparse_local_row': tf.VarLenFeature(dtype=tf.int64), - 'sparse_local_col': tf.VarLenFeature(dtype=tf.int64), - 'sparse_value': tf.VarLenFeature(dtype=tf.float32) - }) - - global_row = features['global_row'] - global_col = features['global_col'] - - sparse_local_row = features['sparse_local_row'].values - sparse_local_col = features['sparse_local_col'].values - sparse_count = features['sparse_value'].values - - sparse_indices = tf.concat([tf.expand_dims(sparse_local_row, 1), - tf.expand_dims(sparse_local_col, 1)], 1) - count = tf.sparse_to_dense(sparse_indices, [submatrix_rows, submatrix_cols], - sparse_count) - - queued_global_row, queued_global_col, queued_count = tf.train.batch( - [global_row, global_col, count], - batch_size=1, - num_threads=4, - capacity=32) - - queued_global_row = tf.reshape(queued_global_row, [submatrix_rows]) - queued_global_col = tf.reshape(queued_global_col, [submatrix_cols]) - queued_count = tf.reshape(queued_count, [submatrix_rows, submatrix_cols]) - - return queued_global_row, queued_global_col, queued_count - - -def read_marginals_file(filename): - """Reads text file with one number per line to an array.""" - with open(filename) as lines: - return [float(line) for line in lines] - - -def write_embedding_tensor_to_disk(vocab_path, output_path, sess, embedding): - """Writes tensor to output_path as tsv""" - # Fetch the embedding values from the model - embeddings = sess.run(embedding) - - with open(output_path, 'w') as out_f: - with open(vocab_path) as vocab_f: - for index, word in enumerate(vocab_f): - word = word.strip() - embedding = embeddings[index] - out_f.write(word + '\t' + '\t'.join([str(x) for x in embedding]) + '\n') - - -def write_embeddings_to_disk(config, model, sess): - """Writes row and column embeddings disk""" - # Row 
Embedding - row_vocab_path = config.input_base_path + '/row_vocab.txt' - row_embedding_output_path = config.output_base_path + '/row_embedding.tsv' - print 'Writing row embeddings to:', row_embedding_output_path - sys.stdout.flush() - write_embedding_tensor_to_disk(row_vocab_path, row_embedding_output_path, - sess, model.row_embedding) - - # Column Embedding - col_vocab_path = config.input_base_path + '/col_vocab.txt' - col_embedding_output_path = config.output_base_path + '/col_embedding.tsv' - print 'Writing column embeddings to:', col_embedding_output_path - sys.stdout.flush() - write_embedding_tensor_to_disk(col_vocab_path, col_embedding_output_path, - sess, model.col_embedding) - - -class SwivelModel(object): - """Small class to gather needed pieces from a Graph being built.""" - - def __init__(self, config): - """Construct graph for dmc.""" - self._config = config - - # Create paths to input data files - print 'Reading model from:', config.input_base_path - sys.stdout.flush() - count_matrix_files = glob.glob(config.input_base_path + '/shard-*.pb') - row_sums_path = config.input_base_path + '/row_sums.txt' - col_sums_path = config.input_base_path + '/col_sums.txt' - - # Read marginals - row_sums = read_marginals_file(row_sums_path) - col_sums = read_marginals_file(col_sums_path) - - self.n_rows = len(row_sums) - self.n_cols = len(col_sums) - print 'Matrix dim: (%d,%d) SubMatrix dim: (%d,%d) ' % ( - self.n_rows, self.n_cols, config.submatrix_rows, config.submatrix_cols) - sys.stdout.flush() - self.n_submatrices = (self.n_rows * self.n_cols / - (config.submatrix_rows * config.submatrix_cols)) - print 'n_submatrices: %d' % (self.n_submatrices) - sys.stdout.flush() - - # ===== CREATE VARIABLES ====== - - with tf.device('/cpu:0'): - # embeddings - self.row_embedding = embeddings_with_init( - embedding_dim=config.embedding_size, - vocab_size=self.n_rows, - name='row_embedding') - self.col_embedding = embeddings_with_init( - embedding_dim=config.embedding_size, - 
vocab_size=self.n_cols, - name='col_embedding') - tf.summary.histogram('row_emb', self.row_embedding) - tf.summary.histogram('col_emb', self.col_embedding) - - matrix_log_sum = math.log(np.sum(row_sums) + 1) - row_bias_init = [math.log(x + 1) for x in row_sums] - col_bias_init = [math.log(x + 1) for x in col_sums] - self.row_bias = tf.Variable(row_bias_init, - trainable=config.trainable_bias) - self.col_bias = tf.Variable(col_bias_init, - trainable=config.trainable_bias) - tf.summary.histogram('row_bias', self.row_bias) - tf.summary.histogram('col_bias', self.col_bias) - - # ===== CREATE GRAPH ===== - - # Get input - with tf.device('/cpu:0'): - global_row, global_col, count = count_matrix_input( - count_matrix_files, config.submatrix_rows, config.submatrix_cols) - - # Fetch embeddings. - selected_row_embedding = tf.nn.embedding_lookup(self.row_embedding, - global_row) - selected_col_embedding = tf.nn.embedding_lookup(self.col_embedding, - global_col) - - # Fetch biases. - selected_row_bias = tf.nn.embedding_lookup([self.row_bias], global_row) - selected_col_bias = tf.nn.embedding_lookup([self.col_bias], global_col) - - # Multiply the row and column embeddings to generate predictions. - predictions = tf.matmul( - selected_row_embedding, selected_col_embedding, transpose_b=True) +class Model(object): + """A Swivel model.""" + + def __init__(self, input_base_path, hparams): + """Creates a new Swivel model.""" + # Read vocab + self.row_ix_to_word, self.row_word_to_ix = self._read_vocab( + os.path.join(input_base_path, 'row_vocab.txt')) + self.col_ix_to_word, self.col_word_to_ix = self._read_vocab( + os.path.join(input_base_path, 'col_vocab.txt')) + + # Read marginals. + row_sums = self._read_marginals_file( + os.path.join(input_base_path, 'row_sums.txt')) + col_sums = self._read_marginals_file( + os.path.join(input_base_path, 'col_sums.txt')) + + # Construct input tensors. 
+ count_matrix_files = glob.glob( + os.path.join(input_base_path, 'shard-*.pb')) + + global_rows, global_cols, counts = self._count_matrix_input( + count_matrix_files, hparams.submatrix_rows, hparams.submatrix_cols) + + # Create embedding variables. + sigma = 1.0 / np.sqrt(hparams.dim) + self.row_embedding = tf.get_variable( + 'row_embedding', + shape=[len(row_sums), hparams.dim], + initializer=tf.random_normal_initializer(0, sigma), + dtype=tf.float32) + self.col_embedding = tf.get_variable( + 'col_embedding', + shape=[len(col_sums), hparams.dim], + initializer=tf.random_normal_initializer(0, sigma), + dtype=tf.float32) + + matrix_log_sum = np.log(np.sum(row_sums) + 1) + row_bias = tf.constant( + [np.log(x + 1) for x in row_sums], dtype=tf.float32) + col_bias = tf.constant( + [np.log(x + 1) for x in col_sums], dtype=tf.float32) + + # Fetch embeddings. + selected_rows = tf.nn.embedding_lookup(self.row_embedding, global_rows) + selected_cols = tf.nn.embedding_lookup(self.col_embedding, global_cols) + + selected_row_bias = tf.gather(row_bias, global_rows) + selected_col_bias = tf.gather(col_bias, global_cols) + + predictions = tf.matmul(selected_rows, selected_cols, transpose_b=True) # These binary masks separate zero from non-zero values. - count_is_nonzero = tf.to_float(tf.cast(count, tf.bool)) - count_is_zero = 1 - tf.to_float(tf.cast(count, tf.bool)) + count_is_nonzero = tf.to_float(tf.cast(counts, tf.bool)) + count_is_zero = 1 - count_is_nonzero - objectives = count_is_nonzero * tf.log(count + 1e-30) - objectives -= tf.reshape(selected_row_bias, [config.submatrix_rows, 1]) + objectives = count_is_nonzero * tf.log(counts + 1e-30) + objectives -= tf.reshape(selected_row_bias, [-1, 1]) objectives -= selected_col_bias objectives += matrix_log_sum err = predictions - objectives - # The confidence function scales the L2 loss based on the raw co-occurrence - # count. 
- l2_confidence = (config.confidence_base + config.confidence_scale * tf.pow( - count, config.confidence_exponent)) - - l2_loss = config.loss_multiplier * tf.reduce_sum( - 0.5 * l2_confidence * err * err * count_is_nonzero) + # The confidence function scales the L2 loss based on the raw + # co-occurrence count. + l2_confidence = (hparams.confidence_base + + hparams.confidence_scale * tf.pow( + counts, hparams.confidence_exponent)) - sigmoid_loss = config.loss_multiplier * tf.reduce_sum( - tf.nn.softplus(err) * count_is_zero) + loss_multiplier = 1 / np.sqrt( + hparams.submatrix_rows * hparams.submatrix_cols) - self.loss = l2_loss + sigmoid_loss + l2_loss = loss_multiplier * tf.reduce_sum( + 0.5 * l2_confidence * tf.square(err)) - tf.summary.scalar("l2_loss", l2_loss) - tf.summary.scalar("sigmoid_loss", sigmoid_loss) - tf.summary.scalar("loss", self.loss) + sigmoid_loss = loss_multiplier * tf.reduce_sum( + tf.nn.softplus(err) * count_is_zero) - # Add optimizer. - self.global_step = tf.Variable(0, name='global_step') - opt = tf.train.AdagradOptimizer(config.learning_rate) - self.train_op = opt.minimize(self.loss, global_step=self.global_step) - self.saver = tf.train.Saver(sharded=True) + self.loss_op = l2_loss + sigmoid_loss + + if hparams.optimizer == 'adagrad': + opt = tf.train.AdagradOptimizer(hparams.learning_rate) + elif hparams.optimizer == 'rmsprop': + opt = tf.train.RMSPropOptimizer(hparams.learning_rate, hparams.momentum) + else: + raise ValueError('unknown optimizer "%s"' % hparams.optimizer) + + self.global_step = tf.get_variable( + 'global_step', initializer=0, trainable=False) + + self.train_op = opt.minimize(self.loss_op, global_step=self.global_step) + + # One epoch trains each submatrix once. 
+ self.steps_per_epoch = ( + (len(row_sums) / hparams.submatrix_rows) * + (len(col_sums) / hparams.submatrix_cols)) + + def _read_vocab(self, filename): + """Reads the vocabulary file.""" + with open(filename) as lines: + ix_to_word = [line.strip() for line in lines] + word_to_ix = {word: ix for ix, word in enumerate(ix_to_word)} + return ix_to_word, word_to_ix + + def _read_marginals_file(self, filename): + """Reads text file with one number per line to an array.""" + with open(filename) as lines: + return [float(line.strip()) for line in lines] + + def _count_matrix_input(self, filenames, submatrix_rows, submatrix_cols): + """Creates ops that read submatrix shards from disk.""" + random.shuffle(filenames) + filename_queue = tf.train.string_input_producer(filenames) + reader = tf.WholeFileReader() + _, serialized_example = reader.read(filename_queue) + features = tf.parse_single_example( + serialized_example, + features={ + 'global_row': tf.FixedLenFeature([submatrix_rows], dtype=tf.int64), + 'global_col': tf.FixedLenFeature([submatrix_cols], dtype=tf.int64), + 'sparse_local_row': tf.VarLenFeature(dtype=tf.int64), + 'sparse_local_col': tf.VarLenFeature(dtype=tf.int64), + 'sparse_value': tf.VarLenFeature(dtype=tf.float32) + }) + + global_row = features['global_row'] + global_col = features['global_col'] + + sparse_local_row = features['sparse_local_row'].values + sparse_local_col = features['sparse_local_col'].values + sparse_count = features['sparse_value'].values + + sparse_indices = tf.concat( + axis=1, values=[tf.expand_dims(sparse_local_row, 1), + tf.expand_dims(sparse_local_col, 1)]) + + count = tf.sparse_to_dense(sparse_indices, [submatrix_rows, submatrix_cols], + sparse_count) + + return global_row, global_col, count + + def wordsim_eval_op(self, filename): + """Returns an op that runs an eval on a word similarity dataset. + + The eval dataset is assumed to be tab-separated, one scored word pair per + line. 
The resulting value is Spearman's rho of the human judgements with + the cosine similarity of the word embeddings. + + Args: + filename: the filename containing the word similarity data. + + Returns: + An operator that will compute Spearman's rho of the current row + embeddings. + """ + with open(filename, 'r') as fh: + tuples = (line.strip().split('\t') for line in fh.read().splitlines()) + word1s, word2s, sims = zip(*tuples) + actuals = map(float, sims) + + v1s_t = tf.nn.embedding_lookup( + self.row_embedding, + [self.row_word_to_ix.get(w, 0) for w in word1s]) + + v2s_t = tf.nn.embedding_lookup( + self.row_embedding, + [self.row_word_to_ix.get(w, 0) for w in word2s]) + + # Compute the predicted word similarity as the cosine similarity between the + # embedding vectors. + preds_t = tf.reduce_sum( + tf.nn.l2_normalize(v1s_t, dim=1) * tf.nn.l2_normalize(v2s_t, dim=1), + axis=1) + + def _op(preds): + rho, _ = scipy.stats.spearmanr(preds, actuals) + return rho + + return tf.py_func(_op, [preds_t], tf.float64) + + def analogy_eval_op(self, filename, max_vocab_size=20000): + """Returns an op that runs an eval on an analogy dataset. + + The eval dataset is assumed to be tab-separated, with four tokens per + line. The first three tokens are query terms, the last is the expected + answer. For each line (e.g., "man king woman queen"), the vectors + corresponding to the query terms are added ("king - man + woman") to produce + a query vector. If the expected answer's vector is the nearest neighbor to + the query vector (not counting any of the query vectors themselves), then + the line is scored as correct. The reported accuracy is the number of + correct rows divided by the total number of rows. Missing terms are + replaced with an arbitrary vector and will almost certainly result in + incorrect answers. + + Note that the results are approximate: for efficiency's sake, only the first + `max_vocab_size` terms are included in the nearest neighbor search. 
+ + Args: + filename: the filename containing the analogy data. + max_vocab_size: the maximum number of tokens to include in the nearest + neighbor search. By default, 20000. + + Returns: + The accuracy on the analogy task. + """ + analogy_ixs = [] + with open(filename, 'r') as lines: + for line in lines: + parts = line.strip().split('\t') + if len(parts) == 4: + analogy_ixs.append([self.row_word_to_ix.get(w, 0) for w in parts]) + + # man:king :: woman:queen => king - man + woman == queen + ix1s, ix2s, ix3s, _ = zip(*analogy_ixs) + v1s_t, v2s_t, v3s_t = ( + tf.nn.l2_normalize( + tf.nn.embedding_lookup(self.row_embedding, ixs), + dim=1) + for ixs in (ix1s, ix2s, ix3s)) + + preds_t = v2s_t - v1s_t + v3s_t + + # Compute the nearest neighbors as the cosine similarity. We only consider + # up to max_vocab_size to avoid a matmul that swamps the machine. + sims_t = tf.matmul( + preds_t, + tf.nn.l2_normalize(self.row_embedding[:max_vocab_size], dim=1), + transpose_b=True) + + # Take the four nearest neighbors, since the eval explicitly discards the + # query terms. 
+ _, preds_ixs_t = tf.nn.top_k(sims_t, 4) + + def _op(preds_ixs): + correct, total = 0, 0 + for pred_ixs, actual_ixs in itertools.izip(preds_ixs, analogy_ixs): + pred_ixs = [ix for ix in pred_ixs if ix not in actual_ixs[:3]] + correct += pred_ixs[0] == actual_ixs[3] + total += 1 + + return correct / total + + return tf.py_func(_op, [preds_ixs_t], tf.float64) + + def _write_tensor(self, vocab_path, output_path, session, embedding): + """Writes tensor to output_path as tsv.""" + embeddings = session.run(embedding) + + with open(output_path, 'w') as out_f: + with open(vocab_path) as vocab_f: + for index, word in enumerate(vocab_f): + word = word.strip() + embedding = embeddings[index] + print('\t'.join([word.strip()] + [str(x) for x in embedding]), + file=out_f) + + def write_embeddings(self, config, session): + """Writes row and column embeddings to disk.""" + self._write_tensor( + os.path.join(config.input_base_path, 'row_vocab.txt'), + os.path.join(config.output_base_path, 'row_embedding.tsv'), + session, self.row_embedding) + + self._write_tensor( + os.path.join(config.input_base_path, 'col_vocab.txt'), + os.path.join(config.output_base_path, 'col_embedding.tsv'), + session, self.col_embedding) def main(_): - # Create the output path. If this fails, it really ought to fail - # now. :) - if not os.path.isdir(FLAGS.output_base_path): - os.makedirs(FLAGS.output_base_path) - - # Create and run model + tf.logging.set_verbosity(tf.logging.INFO) + + # If we have ps_hosts, then we'll assume that this is going to be a + # distributed training run. Configure the cluster appropriately. Otherwise, + # we just do everything in-process. + if FLAGS.ps_hosts: + cluster = tf.train.ClusterSpec({ + 'ps': FLAGS.ps_hosts.split(','), + 'worker': FLAGS.worker_hosts.split(','), + }) + + if FLAGS.job_name == 'ps': + # Ignore the GPU if we're the parameter server. This lets the PS run on + # the same machine as a worker.
+ config = tf.ConfigProto(device_count={'GPU': 0}) + elif FLAGS.job_name == 'worker': + config = tf.ConfigProto(gpu_options=tf.GPUOptions( + visible_device_list='%d' % FLAGS.gpu_device, + allow_growth=True)) + else: + raise ValueError('unknown job name "%s"' % FLAGS.job_name) + + server = tf.train.Server( + cluster, + job_name=FLAGS.job_name, + task_index=FLAGS.task_index, + config=config) + + if FLAGS.job_name == 'ps': + return server.join() + + device_setter = tf.train.replica_device_setter( + worker_device='/job:worker/task:%d' % FLAGS.task_index, + cluster=cluster) + + else: + server = None + device_setter = tf.train.replica_device_setter(0) + + # Build the graph. with tf.Graph().as_default(): - model = SwivelModel(FLAGS) - - # Create a session for running Ops on the Graph. - gpu_options = tf.GPUOptions( - per_process_gpu_memory_fraction=FLAGS.per_process_gpu_memory_fraction) - sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) - - # Run the Op to initialize the variables. 
- sess.run(tf.global_variables_initializer()) - - # Start feeding input - coord = tf.train.Coordinator() - threads = tf.train.start_queue_runners(sess=sess, coord=coord) - - # Calculate how many steps each thread should run - n_total_steps = int(FLAGS.num_epochs * model.n_rows * model.n_cols) / ( - FLAGS.submatrix_rows * FLAGS.submatrix_cols) - n_steps_per_thread = n_total_steps / FLAGS.num_concurrent_steps - n_submatrices_to_train = model.n_submatrices * FLAGS.num_epochs - t0 = [time.time()] - - def TrainingFn(): - for _ in range(n_steps_per_thread): - _, global_step = sess.run([model.train_op, model.global_step]) - n_steps_between_status_updates = 100 - if (global_step % n_steps_between_status_updates) == 0: - elapsed = float(time.time() - t0[0]) - print '%d/%d submatrices trained (%.1f%%), %.1f submatrices/sec' % ( - global_step, n_submatrices_to_train, - 100.0 * global_step / n_submatrices_to_train, - n_steps_between_status_updates / elapsed) - sys.stdout.flush() - t0[0] = time.time() - - # Start training threads - train_threads = [] - for _ in range(FLAGS.num_concurrent_steps): - t = threading.Thread(target=TrainingFn) - train_threads.append(t) - t.start() - - # Wait for threads to finish. - for t in train_threads: - t.join() - - coord.request_stop() - coord.join(threads) - - # Write out vectors - write_embeddings_to_disk(FLAGS, model, sess) - - #Shutdown - sess.close() + with tf.device(device_setter): + model = Model(FLAGS.input_base_path, FLAGS) + + # If an eval path is present, then create eval operators and set up scalar + # summaries to report on the results. Run the evals on the CPU since + # the analogy eval requires a fairly enormous tensor to be allocated to + # do the nearest neighbor search. 
+ if FLAGS.eval_base_path: + wordsim_filenames = glob.glob( + os.path.join(FLAGS.eval_base_path, '*.ws.tab')) + + for filename in wordsim_filenames: + name = os.path.basename(filename).split('.')[0] + with tf.device(tf.DeviceSpec(device_type='CPU')): + op = model.wordsim_eval_op(filename) + tf.summary.scalar(name, op) + + analogy_filenames = glob.glob( + os.path.join(FLAGS.eval_base_path, '*.an.tab')) + + for filename in analogy_filenames: + name = os.path.basename(filename).split('.')[0] + with tf.device(tf.DeviceSpec(device_type='CPU')): + op = model.analogy_eval_op(filename) + tf.summary.scalar(name, op) + + tf.summary.scalar('loss', model.loss_op) + + # Train on, soldier. + supervisor = tf.train.Supervisor( + logdir=FLAGS.output_base_path, + is_chief=(FLAGS.task_index == 0), + save_summaries_secs=60, + recovery_wait_secs=5) + + max_step = FLAGS.num_epochs * model.steps_per_epoch + master = server.target if server else '' + with supervisor.managed_session(master) as session: + local_step = 0 + global_step = session.run(model.global_step) + while not supervisor.should_stop() and global_step < max_step: + global_step, loss, _ = session.run([ + model.global_step, model.loss_op, model.train_op]) + + if not np.isfinite(loss): + raise ValueError('non-finite cost at step %d' % global_step) + + local_step += 1 + if local_step % 10 == 0: + tf.logging.info( + 'local_step=%d global_step=%d loss=%.1f, %.1f%% complete', + local_step, global_step, loss, 100.0 * global_step / max_step) + + if FLAGS.task_index == 0: + supervisor.saver.save( + session, supervisor.save_path, global_step=global_step) + + model.write_embeddings(FLAGS, session) if __name__ == '__main__': diff --git a/swivel/text2bin.py b/swivel/text2bin.py old mode 100755 new mode 100644 diff --git a/swivel/wordsim.py b/swivel/wordsim.py old mode 100755 new mode 100644 diff --git a/syntaxnet/.dockerignore b/syntaxnet/.dockerignore new file mode 100644 index 
0000000000000000000000000000000000000000..14b990cb244dd42360aeafe2eca52228bdfcaf4c --- /dev/null +++ b/syntaxnet/.dockerignore @@ -0,0 +1,4 @@ +.git +bazel/ +Dockerfile* +tensorflow/.git diff --git a/syntaxnet/Dockerfile b/syntaxnet/Dockerfile index 9b4b9689439f47cd366f6790d698577de47d7e69..332cdfc3e42df6fae138f0106a3383b9ed26a4f5 100644 --- a/syntaxnet/Dockerfile +++ b/syntaxnet/Dockerfile @@ -1,33 +1,92 @@ -FROM java:8 +# Java baseimage, for Bazel. +FROM openjdk:8 ENV SYNTAXNETDIR=/opt/tensorflow PATH=$PATH:/root/bin +# Install system packages. This doesn't include everything the TensorFlow +# dockerfile specifies, so if anything goes awry, maybe install more packages +# from there. Also, running apt-get clean before further commands will make the +# Docker images smaller. RUN mkdir -p $SYNTAXNETDIR \ && cd $SYNTAXNETDIR \ && apt-get update \ - && apt-get install git zlib1g-dev file swig python2.7 python-dev python-pip python-mock -y \ - && pip install --upgrade pip \ - && pip install -U protobuf==3.0.0b2 \ - && pip install asciitree \ - && pip install numpy \ - && wget https://github.com/bazelbuild/bazel/releases/download/0.4.3/bazel-0.4.3-installer-linux-x86_64.sh \ + && apt-get install -y \ + file \ + git \ + graphviz \ + libcurl3-dev \ + libfreetype6-dev \ + libgraphviz-dev \ + liblapack-dev \ + libopenblas-dev \ + libpng12-dev \ + libxft-dev \ + python-dev \ + python-mock \ + python-pip \ + python2.7 \ + swig \ + vim \ + zlib1g-dev \ + && apt-get clean \ + && (rm -f /var/cache/apt/archives/*.deb \ + /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true) + +# Install common Python dependencies. Similar to above, remove caches +# afterwards to help keep Docker images smaller. 
+RUN pip install --ignore-installed pip \ + && python -m pip install numpy \ + && rm -rf /root/.cache/pip /tmp/pip* +RUN python -m pip install \ + asciitree \ + ipykernel \ + jupyter \ + matplotlib \ + pandas \ + protobuf \ + scipy \ + sklearn \ + && python -m ipykernel.kernelspec \ + && python -m pip install pygraphviz \ + --install-option="--include-path=/usr/include/graphviz" \ + --install-option="--library-path=/usr/lib/graphviz/" \ + && python -m jupyter_core.command nbextension enable \ + --py --sys-prefix widgetsnbextension \ + && rm -rf /root/.cache/pip /tmp/pip* + +# Installs the latest version of Bazel. +RUN wget --quiet https://github.com/bazelbuild/bazel/releases/download/0.4.3/bazel-0.4.3-installer-linux-x86_64.sh \ && chmod +x bazel-0.4.3-installer-linux-x86_64.sh \ - && ./bazel-0.4.3-installer-linux-x86_64.sh --user \ - && git clone --recursive https://github.com/tensorflow/models.git \ - && cd $SYNTAXNETDIR/models/syntaxnet/tensorflow \ - && echo -e "\n\n\n\n\n\n\n\n\n" | ./configure \ - && apt-get autoremove -y \ - && apt-get clean + && ./bazel-0.4.3-installer-linux-x86_64.sh \ + && rm ./bazel-0.4.3-installer-linux-x86_64.sh + +COPY WORKSPACE $SYNTAXNETDIR/syntaxnet/WORKSPACE +COPY tools/bazel.rc $SYNTAXNETDIR/syntaxnet/tools/bazel.rc +COPY tensorflow $SYNTAXNETDIR/syntaxnet/tensorflow + +# Compile common TensorFlow targets, which don't depend on DRAGNN / SyntaxNet +# source. This makes it more convenient to re-compile DRAGNN / SyntaxNet for +# development (though not as convenient as the docker-devel scripts). +RUN cd $SYNTAXNETDIR/syntaxnet/tensorflow \ + && tensorflow/tools/ci_build/builds/configured CPU \ + && cd $SYNTAXNETDIR/syntaxnet \ + && bazel build -c opt @org_tensorflow//tensorflow:tensorflow_py -RUN cd $SYNTAXNETDIR/models/syntaxnet \ - && bazel test --genrule_strategy=standalone syntaxnet/... util/utf8/... +# Build the codez. 
+WORKDIR $SYNTAXNETDIR/syntaxnet +COPY dragnn $SYNTAXNETDIR/syntaxnet/dragnn +COPY syntaxnet $SYNTAXNETDIR/syntaxnet/syntaxnet +COPY third_party $SYNTAXNETDIR/syntaxnet/third_party +COPY util/utf8 $SYNTAXNETDIR/syntaxnet/util/utf8 +RUN bazel build -c opt //dragnn/python:all //dragnn/tools:all -WORKDIR $SYNTAXNETDIR/models/syntaxnet +# This makes the IP exposed actually "*"; we'll do host restrictions by passing +# a hostname to the `docker run` command. +COPY tensorflow/tensorflow/tools/docker/jupyter_notebook_config.py /root/.jupyter/ +EXPOSE 8888 -CMD [ "sh", "-c", "echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh" ] +# This does not need to be compiled, only copied. +COPY examples $SYNTAXNETDIR/syntaxnet/examples +# Todo: Move this earlier in the file (don't want to invalidate caches for now). -# COMMANDS to build and run -# =============================== -# mkdir build && cp Dockerfile build/ && cd build -# docker build -t syntaxnet . -# docker run syntaxnet +CMD /bin/bash -c "bazel-bin/dragnn/tools/oss_notebook_launcher notebook --debug --notebook-dir=/opt/tensorflow/syntaxnet/examples --allow-root" diff --git a/syntaxnet/README.md b/syntaxnet/README.md index bce5ea75a1307243977eae6bf4e9d26c71a53925..779ba2d8dac3cfba1f27a57dbdecad260d97956c 100644 --- a/syntaxnet/README.md +++ b/syntaxnet/README.md @@ -1,90 +1,75 @@ # SyntaxNet: Neural Models of Syntax. -*A TensorFlow implementation of the models described in [Andor et al. (2016)] -(http://arxiv.org/abs/1603.06042).* +*A TensorFlow toolkit for deep learning powered natural language understanding +(NLU).* -**Update**: Parsey models are now [available](universal.md) for 40 languages -trained on Universal Dependencies datasets, with support for text segmentation -and morphological analysis. +**CoNLL**: See [here](g3doc/conll2017/README.md) for instructions for using the +SyntaxNet/DRAGNN baseline for the CoNLL2017 Shared Task. 
At Google, we spend a lot of time thinking about how computer systems can read and understand human language in order to process it in intelligent ways. We are excited to share the fruits of our research with the broader community by -releasing SyntaxNet, an open-source neural network framework for [TensorFlow] -(http://www.tensorflow.org) that provides a foundation for Natural Language -Understanding (NLU) systems. Our release includes all the code needed to train -new SyntaxNet models on your own data, as well as *Parsey McParseface*, an -English parser that we have trained for you, and that you can use to analyze -English text. - -So, how accurate is Parsey McParseface? For this release, we tried to balance a -model that runs fast enough to be useful on a single machine (e.g. ~600 -words/second on a modern desktop) and that is also the most accurate parser -available. Here's how Parsey McParseface compares to the academic literature on -several different English domains: (all numbers are % correct head assignments -in the tree, or unlabelled attachment score) +releasing SyntaxNet, an open-source neural network framework for +[TensorFlow](http://www.tensorflow.org) that provides a foundation for Natural +Language Understanding (NLU) systems. Our release includes all the code needed +to train new SyntaxNet models on your own data, as well as a suite of models +that we have trained for you, and that you can use to analyze text in over 40 +languages. + +This repository is largely divided into two sub-packages: + +1. **DRAGNN: + [code](https://github.com/tensorflow/models/tree/master/syntaxnet/dragnn), + [documentation](g3doc/DRAGNN.md), + [paper](https://arxiv.org/pdf/1703.04474.pdf)** implements Dynamic Recurrent + Acyclic Graphical Neural Networks (DRAGNN), a framework for building + multi-task, fully dynamically constructed computation graphs. Practically, we + use DRAGNN to extend our prior work from [Andor et al. 
+ (2016)](http://arxiv.org/abs/1603.06042) with end-to-end, deep recurrent + models and to provide a much easier to use interface to SyntaxNet. *DRAGNN + is designed first and foremost as a Python library, and therefore much + easier to use than the original SyntaxNet implementation.* + +1. **SyntaxNet: + [code](https://github.com/tensorflow/models/tree/master/syntaxnet/syntaxnet), + [documentation](g3doc/syntaxnet-tutorial.md)** is a transition-based + framework for natural language processing, with core functionality for + feature extraction, representing annotated data, and evaluation. As of the + DRAGNN release, it is recommended to train and deploy SyntaxNet models using + the DRAGNN framework. + +## How to use this library + +There are three ways to use SyntaxNet: + +* See [here](g3doc/conll2017/README.md) for instructions for using the + SyntaxNet/DRAGNN baseline for the CoNLL2017 Shared Task, and running the + ParseySaurus models. +* You can use DRAGNN to train your NLP models for other tasks and datasets. See + "Getting started with DRAGNN" below. +* You can continue to use the Parsey McParseface family of pre-trained + SyntaxNet models. See "Pre-trained NLP models" below. -Model | News | Web | Questions ---------------------------------------------------------------------------------------------------------------- | :---: | :---: | :-------: -[Martins et al. (2013)](http://www.cs.cmu.edu/~ark/TurboParser/) | 93.10 | 88.23 | 94.21 -[Zhang and McDonald (2014)](http://research.google.com/pubs/archive/38148.pdf) | 93.32 | 88.65 | 93.37 -[Weiss et al. (2015)](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43800.pdf) | 93.91 | 89.29 | 94.17 -[Andor et al.
(2016)](http://arxiv.org/abs/1603.06042)* | 94.44 | 90.17 | 95.40 -Parsey McParseface | 94.15 | 89.08 | 94.77 - -We see that Parsey McParseface is state-of-the-art; more importantly, with -SyntaxNet you can train larger networks with more hidden units and bigger beam -sizes if you want to push the accuracy even further: [Andor et al. (2016)] -(http://arxiv.org/abs/1603.06042)* is simply a SyntaxNet model with a -larger beam and network. For futher information on the datasets, see that paper -under the section "Treebank Union". - -Parsey McParseface is also state-of-the-art for part-of-speech (POS) tagging -(numbers below are per-token accuracy): +## Installation -Model | News | Web | Questions --------------------------------------------------------------------------- | :---: | :---: | :-------: -[Ling et al. (2015)](http://www.cs.cmu.edu/~lingwang/papers/emnlp2015.pdf) | 97.44 | 94.03 | 96.18 -[Andor et al. (2016)](http://arxiv.org/abs/1603.06042)* | 97.77 | 94.80 | 96.86 -Parsey McParseface | 97.52 | 94.24 | 96.45 +### Docker installation -The first part of this tutorial describes how to install the necessary tools and -use the already trained models provided in this release. In the second part of -the tutorial we provide more background about the models, as well as -instructions for training models on other datasets. 
- -## Contents -* [Installation](#installation) -* [Getting Started](#getting-started) - * [Parsing from Standard Input](#parsing-from-standard-input) - * [Annotating a Corpus](#annotating-a-corpus) - * [Configuring the Python Scripts](#configuring-the-python-scripts) - * [Next Steps](#next-steps) -* [Detailed Tutorial: Building an NLP Pipeline with SyntaxNet](#detailed-tutorial-building-an-nlp-pipeline-with-syntaxnet) - * [Obtaining Data](#obtaining-data) - * [Part-of-Speech Tagging](#part-of-speech-tagging) - * [Training the SyntaxNet POS Tagger](#training-the-syntaxnet-pos-tagger) - * [Preprocessing with the Tagger](#preprocessing-with-the-tagger) - * [Dependency Parsing: Transition-Based Parsing](#dependency-parsing-transition-based-parsing) - * [Training a Parser Step 1: Local Pretraining](#training-a-parser-step-1-local-pretraining) - * [Training a Parser Step 2: Global Training](#training-a-parser-step-2-global-training) -* [Contact](#contact) -* [Credits](#credits) +The simplest way to get started with DRAGNN is by loading our Docker container. +[Here](g3doc/CLOUD.md) is a tutorial for running the DRAGNN container on +[GCP](https://cloud.google.com) (just as applicable to your own computer). -## Installation +### Manual installation -Running and training SyntaxNet models requires building this package from +Running and training SyntaxNet/DRAGNN models requires building this package from source. You'll need to install: * python 2.7: - * python 3 support is not available yet + * Python 3 support is not available yet * bazel: - * **version 0.4.3** - * follow the instructions [here](http://bazel.build/docs/install.html) - * Alternately, Download bazel (0.4.3) <.deb> from - [https://github.com/bazelbuild/bazel/releases] - (https://github.com/bazelbuild/bazel/releases) for your system - configuration. 
+ * Follow the instructions [here](http://bazel.build/docs/install.html) + * Alternately, Download bazel <.deb> from + [https://github.com/bazelbuild/bazel/releases](https://github.com/bazelbuild/bazel/releases) + for your system configuration. * Install it using the command: sudo dpkg -i <.deb file> * Check for the bazel version by typing: bazel version * swig: @@ -92,13 +77,18 @@ source. You'll need to install: * `brew install swig` on OSX * protocol buffers, with a version supported by TensorFlow: * check your protobuf version with `pip freeze | grep protobuf` - * upgrade to a supported version with `pip install -U protobuf==3.0.0b2` + * upgrade to a supported version with `pip install -U protobuf==3.3.0` * mock, the testing package: * `pip install mock` * asciitree, to draw parse trees on the console for the demo: * `pip install asciitree` * numpy, package for scientific computing: * `pip install numpy` +* pygraphviz to visualize traces and parse trees: + * `apt-get install -y graphviz libgraphviz-dev` + * `pip install pygraphviz + --install-option="--include-path=/usr/include/graphviz" + --install-option="--library-path=/usr/lib/graphviz/"` Once you completed the above steps, you can build and test SyntaxNet with the following commands: @@ -108,31 +98,93 @@ following commands: cd models/syntaxnet/tensorflow ./configure cd .. - bazel test syntaxnet/... util/utf8/... + bazel test ... # On Mac, run the following: bazel test --linkopt=-headerpad_max_install_names \ - syntaxnet/... util/utf8/... + dragnn/... syntaxnet/... util/utf8/... ``` - Bazel should complete reporting all tests passed. -You can also compile SyntaxNet in a [Docker](https://www.docker.com/what-docker) -container using this [Dockerfile](Dockerfile). 
+Now you can install the SyntaxNet and DRAGNN Python modules with the following commands: +```shell + mkdir /tmp/syntaxnet_pkg + bazel-bin/dragnn/tools/build_pip_package --output-dir=/tmp/syntaxnet_pkg + # The filename of the .whl depends on your platform. + sudo pip install /tmp/syntaxnet_pkg/syntaxnet-x.xx-none-any.whl +``` To build SyntaxNet with GPU support please refer to the instructions in [issues/248](https://github.com/tensorflow/models/issues/248). + + **Note:** If you are running Docker on OSX, make sure that you have enough memory allocated for your Docker VM. ## Getting Started +We have a few guides on this README, as well as more extensive +[documentation](g3doc/). + +### Learning the DRAGNN framework + +![DRAGNN](g3doc/unrolled-dragnn.png) + +An easy and visual way to get started with DRAGNN is to run our Jupyter +notebooks for [interactive +debugging](examples/dragnn/interactive_text_analyzer.ipynb) and [training a new +model](examples/dragnn/trainer_tutorial.ipynb). Our tutorial +[here](g3doc/CLOUD.md) explains how to start it up from the Docker container. +Once you have DRAGNN installed and running, try out the +[ParseySaurus](g3doc/conll2017) models. + +### Using the Pre-trained NLP models + +We are happy to release *Parsey McParseface*, an English parser that we have +trained for you, and that you can use to analyze English text, along with +[trained models for 40 languages](g3doc/universal.md) and support for text +segmentation and morphological analysis. + Once you have successfully built SyntaxNet, you can start parsing text right away with Parsey McParseface, located under `syntaxnet/models`. The easiest thing is to use or modify the included script `syntaxnet/demo.sh`, which shows a basic setup to parse English taking plain text as input. -### Parsing from Standard Input +You can also skip right away to the [detailed SyntaxNet +tutorial](g3doc/syntaxnet-tutorial.md). + +How accurate is Parsey McParseface? 
For the initial release, we tried to balance +a model that runs fast enough to be useful on a single machine (e.g. ~600 +words/second on a modern desktop) and that is also the most accurate parser +available. Here's how Parsey McParseface compares to the academic literature on +several different English domains: (all numbers are % correct head assignments +in the tree, or unlabelled attachment score) + +Model | News | Web | Questions +--------------------------------------------------------------------------------------------------------------- | :---: | :---: | :-------: +[Martins et al. (2013)](http://www.cs.cmu.edu/~ark/TurboParser/) | 93.10 | 88.23 | 94.21 +[Zhang and McDonald (2014)](http://research.google.com/pubs/archive/38148.pdf) | 93.32 | 88.65 | 93.37 +[Weiss et al. (2015)](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43800.pdf) | 93.91 | 89.29 | 94.17 +[Andor et al. (2016)](http://arxiv.org/abs/1603.06042)* | 94.44 | 90.17 | 95.40 +Parsey McParseface | 94.15 | 89.08 | 94.77 + +We see that Parsey McParseface is state-of-the-art; more importantly, with +SyntaxNet you can train larger networks with more hidden units and bigger beam +sizes if you want to push the accuracy even further: [Andor et al. +(2016)](http://arxiv.org/abs/1603.06042)* is simply a SyntaxNet model with a +larger beam and network. For further information on the datasets, see that paper +under the section "Treebank Union". + +Parsey McParseface is also state-of-the-art for part-of-speech (POS) tagging +(numbers below are per-token accuracy): + +Model | News | Web | Questions +-------------------------------------------------------------------------- | :---: | :---: | :-------: +[Ling et al. (2015)](http://www.cs.cmu.edu/~lingwang/papers/emnlp2015.pdf) | 97.44 | 94.03 | 96.18 +[Andor et al.
(2016)](http://arxiv.org/abs/1603.06042)* | 97.77 | 94.80 | 96.86 +Parsey McParseface | 97.52 | 94.24 | 96.45 + +#### Parsing from Standard Input Simply pass one sentence per line of text into the script at `syntaxnet/demo.sh`. The script will break the text into words, run the POS @@ -160,7 +212,7 @@ visualized in our tutorial graphs. In this example, we see that the verb If you want to feed in tokenized, CONLL-formatted text, you can run `demo.sh --conll`. -### Annotating a Corpus +#### Annotating a Corpus To change the pipeline to read and write to specific files (as opposed to piping through stdin and stdout), we have to modify the `demo.sh` to point to the files @@ -200,7 +252,7 @@ input { Then we can use `--input=wsj-data --output=wsj-data-tagged` on the command line to specify reading and writing to these files. -### Configuring the Python Scripts +#### Configuring the Python Scripts As mentioned above, the python scripts are configured in two ways: @@ -234,386 +286,13 @@ There are many ways to extend this framework, e.g. adding new features, changing the model structure, training on other languages, etc. We suggest reading the detailed tutorial below to get a handle on the rest of the framework. -## Detailed Tutorial: Building an NLP Pipeline with SyntaxNet - -In this tutorial, we'll go over how to train new models, and explain in a bit -more technical detail the NLP side of the models. Our goal here is to explain -the NLP pipeline produced by this package. - -### Obtaining Data - -The included English parser, Parsey McParseface, was trained on the the standard -corpora of the [Penn Treebank](https://catalog.ldc.upenn.edu/LDC99T42) and -[OntoNotes](https://catalog.ldc.upenn.edu/LDC2013T19), as well as the [English -Web Treebank](https://catalog.ldc.upenn.edu/LDC2012T13), but these are -unfortunately not freely available. 
- -However, the [Universal Dependencies](http://universaldependencies.org/) project -provides freely available treebank data in a number of languages. SyntaxNet can -be trained and evaluated on any of these corpora. - -### Part-of-Speech Tagging - -Consider the following sentence, which exhibits several ambiguities that affect -its interpretation: - -> I saw the man with glasses. - -This sentence is composed of words: strings of characters that are segmented -into groups (e.g. "I", "saw", etc.) Each word in the sentence has a *grammatical -function* that can be useful for understanding the meaning of language. For -example, "saw" in this example is a past tense of the verb "to see". But any -given word might have different meanings in different contexts: "saw" could just -as well be a noun (e.g., a saw used for cutting) or a present tense verb (using -a saw to cut something). - -A logical first step in understanding language is figuring out these roles for -each word in the sentence. This process is called *Part-of-Speech (POS) -Tagging*. The roles are called POS tags. Although a given word might have -multiple possible tags depending on the context, given any one interpretation of -a sentence each word will generally only have one tag. - -One interesting challenge of POS tagging is that the problem of defining a -vocabulary of POS tags for a given language is quite involved. While the concept -of nouns and verbs is pretty common, it has been traditionally difficult to -agree on a standard set of roles across all languages. The [Universal -Dependencies](http://www.universaldependencies.org) project aims to solve this -problem. - -### Training the SyntaxNet POS Tagger - -In general, determining the correct POS tag requires understanding the entire -sentence and the context in which it is uttered. In practice, we can do very -well just by considering a small window of words around the word of interest. 
-For example, words that follow the word ‘the’ tend to be adjectives or nouns, -rather than verbs. - -To predict POS tags, we use a simple setup. We process the sentences -left-to-right. For any given word, we extract features of that word and a window -around it, and use these as inputs to a feed-forward neural network classifier, -which predicts a probability distribution over POS tags. Because we make -decisions in left-to-right order, we also use prior decisions as features in -subsequent ones (e.g. "the previous predicted tag was a noun."). - -All the models in this package use a flexible markup language to define -features. For example, the features in the POS tagger are found in the -`brain_pos_features` parameter in the `TaskSpec`, and look like this (modulo -spacing): - -``` -stack(3).word stack(2).word stack(1).word stack.word input.word input(1).word input(2).word input(3).word; -input.digit input.hyphen; -stack.suffix(length=2) input.suffix(length=2) input(1).suffix(length=2); -stack.prefix(length=2) input.prefix(length=2) input(1).prefix(length=2) -``` - -Note that `stack` here means "words we have already tagged." Thus, this feature -spec uses three types of features: words, suffixes, and prefixes. The features -are grouped into blocks that share an embedding matrix, concatenated together, -and fed into a chain of hidden layers. This structure is based upon the model -proposed by [Chen and Manning (2014)] -(http://cs.stanford.edu/people/danqi/papers/emnlp2014.pdf). - -We show this layout in the schematic below: the state of the system (a stack and -a buffer, visualized below for both the POS and the dependency parsing task) is -used to extract sparse features, which are fed into the network in groups. 
We -show only a small subset of the features to simplify the presentation in the -schematic: - -![Schematic](ff_nn_schematic.png "Feed-forward Network Structure") - -In the configuration above, each block gets its own embedding matrix and the -blocks in the configuration above are delineated with a semi-colon. The -dimensions of each block are controlled in the `brain_pos_embedding_dims` -parameter. **Important note:** unlike many simple NLP models, this is *not* a -bag of words model. Remember that although certain features share embedding -matrices, the above features will be concatenated, so the interpretation of -`input.word` will be quite different from `input(1).word`. This also means that -adding features increases the dimension of the `concat` layer of the model as -well as the number of parameters for the first hidden layer. - -To train the model, first edit `syntaxnet/context.pbtxt` so that the inputs -`training-corpus`, `tuning-corpus`, and `dev-corpus` point to the location of -your training data. You can then train a part-of-speech tagger with: - -```shell -bazel-bin/syntaxnet/parser_trainer \ - --task_context=syntaxnet/context.pbtxt \ - --arg_prefix=brain_pos \ # read from POS configuration - --compute_lexicon \ # required for first stage of pipeline - --graph_builder=greedy \ # no beam search - --training_corpus=training-corpus \ # names of training/tuning set - --tuning_corpus=tuning-corpus \ - --output_path=models \ # where to save new resources - --batch_size=32 \ # Hyper-parameters - --decay_steps=3600 \ - --hidden_layer_sizes=128 \ - --learning_rate=0.08 \ - --momentum=0.9 \ - --seed=0 \ - --params=128-0.08-3600-0.9-0 # name for these parameters -``` - -This will read in the data, construct a lexicon, build a tensorflow graph for -the model with the specific hyperparameters, and train the model. Every so often -the model will be evaluated on the tuning set, and only the checkpoint with the -highest accuracy on this set will be saved. 
**Note that you should never use a -corpus you intend to test your model on as your tuning set, as you will inflate -your test set results.** - -For best results, you should repeat this command with at least 3 different -seeds, and possibly with a few different values for `--learning_rate` and -`--decay_steps`. Good values for `--learning_rate` are usually close to 0.1, and -you usually want `--decay_steps` to correspond to about one tenth of your -corpus. The `--params` flag is only a human readable identifier for the model -being trained, used to construct the full output path, so that you don't need to -worry about clobbering old models by accident. - -The `--arg_prefix` flag controls which parameters should be read from the task -context file `context.pbtxt`. In this case `arg_prefix` is set to `brain_pos`, -so the paramters being used in this training run are -`brain_pos_transition_system`, `brain_pos_embedding_dims`, `brain_pos_features` -and, `brain_pos_embedding_names`. To train the dependency parser later -`arg_prefix` will be set to `brain_parser`. - -### Preprocessing with the Tagger - -Now that we have a trained POS tagging model, we want to use the output of this -model as features in the parser. Thus the next step is to run the trained model -over our training, tuning, and dev (evaluation) sets. We can use the -parser_eval.py` script for this. 
- -For example, the model `128-0.08-3600-0.9-0` trained above can be run over the -training, tuning, and dev sets with the following command: - -```shell -PARAMS=128-0.08-3600-0.9-0 -for SET in training tuning dev; do - bazel-bin/syntaxnet/parser_eval \ - --task_context=models/brain_pos/greedy/$PARAMS/context \ - --hidden_layer_sizes=128 \ - --input=$SET-corpus \ - --output=tagged-$SET-corpus \ - --arg_prefix=brain_pos \ - --graph_builder=greedy \ - --model_path=models/brain_pos/greedy/$PARAMS/model -done -``` - -**Important note:** This command only works because we have created entries for -you in `context.pbtxt` that correspond to `tagged-training-corpus`, -`tagged-dev-corpus`, and `tagged-tuning-corpus`. From these default settings, -the above will write tagged versions of the training, tuning, and dev set to the -directory `models/brain_pos/greedy/$PARAMS/`. This location is chosen because -the `input` entries do not have `file_pattern` set: instead, they have `creator: -brain_pos/greedy`, which means that `parser_trainer.py` will construct *new* -files when called with `--arg_prefix=brain_pos --graph_builder=greedy` using the -`--model_path` flag to determine the location. - -For convenience, `parser_eval.py` also logs POS tagging accuracy after the -output tagged datasets have been written. - -### Dependency Parsing: Transition-Based Parsing - -Now that we have a prediction for the grammatical role of the words, we want to -understand how the words in the sentence relate to each other. This parser is -built around the *head-modifier* construction: for each word, we choose a -*syntactic head* that it modifies according to some grammatical role. - -An example for the above sentence is as follows: - -![Figure](sawman.png) - -Below each word in the sentence we see both a fine-grained part-of-speech -(*PRP*, *VBD*, *DT*, *NN* etc.), and a coarse-grained part-of-speech (*PRON*, -*VERB*, *DET*, *NOUN*, etc.). 
Coarse-grained POS tags encode basic grammatical -categories, while the fine-grained POS tags make further distinctions: for -example *NN* is a singular noun (as opposed, for example, to *NNS*, which is a -plural noun), and *VBD* is a past-tense verb. For more discussion see [Petrov et -al. (2012)](http://www.lrec-conf.org/proceedings/lrec2012/pdf/274_Paper.pdf). - -Crucially, we also see directed arcs signifying grammatical relationships -between different words in the sentence. For example *I* is the subject of -*saw*, as signified by the directed arc labeled *nsubj* between these words; -*man* is the direct object (dobj) of *saw*; the preposition *with* modifies -*man* with a prep relation, signifiying modification by a prepositional phrase; -and so on. In addition the verb *saw* is identified as the *root* of the entire -sentence. - -Whenever we have a directed arc between two words, we refer to the word at the -start of the arc as the *head*, and the word at the end of the arc as the -*modifier*. For example we have one arc where the head is *saw* and the modifier -is *I*, another where the head is *saw* and the modifier is *man*, and so on. - -The grammatical relationships encoded in dependency structures are directly -related to the underlying meaning of the sentence in question. They allow us to -easily recover the answers to various questions, for example *whom did I see?*, -*who saw the man with glasses?*, and so on. - -SyntaxNet is a **transition-based** dependency parser [Nivre (2007)] -(http://www.mitpressjournals.org/doi/pdfplus/10.1162/coli.07-056-R1-07-027) that -constructs a parse incrementally. Like the tagger, it processes words -left-to-right. The words all start as unprocessed input, called the *buffer*. As -words are encountered they are put onto a *stack*. At each step, the parser can -do one of three things: - -1. **SHIFT:** Push another word onto the top of the stack, i.e. shifting one - token from the buffer to the stack. -1. 
**LEFT_ARC:** Pop the top two words from the stack. Attach the second to the - first, creating an arc pointing to the **left**. Push the **first** word - back on the stack. -1. **RIGHT_ARC:** Pop the top two words from the stack. Attach the second to - the first, creating an arc point to the **right**. Push the **second** word - back on the stack. - -At each step, we call the combination of the stack and the buffer the -*configuration* of the parser. For the left and right actions, we also assign a -dependency relation label to that arc. This process is visualized in the -following animation for a short sentence: - -![Animation](looping-parser.gif "Parsing in Action") - -Note that this parser is following a sequence of actions, called a -**derivation**, to produce a "gold" tree labeled by a linguist. We can use this -sequence of decisions to learn a classifier that takes a configuration and -predicts the next action to take. - -### Training a Parser Step 1: Local Pretraining - -As described in our [paper](http://arxiv.org/abs/1603.06042), the first -step in training the model is to *pre-train* using *local* decisions. In this -phase, we use the gold dependency to guide the parser, and train a softmax layer -to predict the correct action given these gold dependencies. This can be -performed very efficiently, since the parser's decisions are all independent in -this setting. 
- -Once the tagged datasets are available, a locally normalized dependency parsing -model can be trained with the following command: - -```shell -bazel-bin/syntaxnet/parser_trainer \ - --arg_prefix=brain_parser \ - --batch_size=32 \ - --projectivize_training_set \ - --decay_steps=4400 \ - --graph_builder=greedy \ - --hidden_layer_sizes=200,200 \ - --learning_rate=0.08 \ - --momentum=0.85 \ - --output_path=models \ - --task_context=models/brain_pos/greedy/$PARAMS/context \ - --seed=4 \ - --training_corpus=tagged-training-corpus \ - --tuning_corpus=tagged-tuning-corpus \ - --params=200x200-0.08-4400-0.85-4 -``` - -Note that we point the trainer to the context corresponding to the POS tagger -that we picked previously. This allows the parser to reuse the lexicons and the -tagged datasets that were created in the previous steps. Processing data can be -done similarly to how tagging was done above. For example if in this case we -picked parameters `200x200-0.08-4400-0.85-4`, the training, tuning and dev sets -can be parsed with the following command: - -```shell -PARAMS=200x200-0.08-4400-0.85-4 -for SET in training tuning dev; do - bazel-bin/syntaxnet/parser_eval \ - --task_context=models/brain_parser/greedy/$PARAMS/context \ - --hidden_layer_sizes=200,200 \ - --input=tagged-$SET-corpus \ - --output=parsed-$SET-corpus \ - --arg_prefix=brain_parser \ - --graph_builder=greedy \ - --model_path=models/brain_parser/greedy/$PARAMS/model -done -``` - -### Training a Parser Step 2: Global Training - -As we describe in the paper, there are several problems with the locally -normalized models we just trained. The most important is the *label-bias* -problem: the model doesn't learn what a good parse looks like, only what action -to take given a history of gold decisions. This is because the scores are -normalized *locally* using a softmax for each decision. 
- -In the paper, we show how we can achieve much better results using a *globally* -normalized model: in this model, the softmax scores are summed in log space, and -the scores are not normalized until we reach a final decision. When the parser -stops, the scores of each hypothesis are normalized against a small set of -possible parses (in the case of this model, a beam size of 8). When training, we -force the parser to stop during parsing when the gold derivation falls off the -beam (a strategy known as early-updates). - -We give a simplified view of how this training works for a [garden path -sentence](https://en.wikipedia.org/wiki/Garden_path_sentence), where it is -important to maintain multiple hypotheses. A single mistake early on in parsing -leads to a completely incorrect parse; after training, the model learns to -prefer the second (correct) parse. - -![Beam search training](beam_search_training.png) - -Parsey McParseface correctly parses this sentence. Even though the correct parse -is initially ranked 4th out of multiple hypotheses, when the end of the garden -path is reached, Parsey McParseface can recover due to the beam; using a larger -beam will get a more accurate model, but it will be slower (we used beam 32 for -the models in the paper). 
- -Once you have the pre-trained locally normalized model, a globally normalized -parsing model can now be trained with the following command: - -```shell -bazel-bin/syntaxnet/parser_trainer \ - --arg_prefix=brain_parser \ - --batch_size=8 \ - --decay_steps=100 \ - --graph_builder=structured \ - --hidden_layer_sizes=200,200 \ - --learning_rate=0.02 \ - --momentum=0.9 \ - --output_path=models \ - --task_context=models/brain_parser/greedy/$PARAMS/context \ - --seed=0 \ - --training_corpus=projectivized-training-corpus \ - --tuning_corpus=tagged-tuning-corpus \ - --params=200x200-0.02-100-0.9-0 \ - --pretrained_params=models/brain_parser/greedy/$PARAMS/model \ - --pretrained_params_names=\ -embedding_matrix_0,embedding_matrix_1,embedding_matrix_2,\ -bias_0,weights_0,bias_1,weights_1 -``` - -Training a beam model with the structured builder will take a lot longer than -the greedy training runs above, perhaps 3 or 4 times longer. Note once again -that multiple restarts of training will yield the most reliable results. -Evaluation can again be done with `parser_eval.py`. In this case we use -parameters `200x200-0.02-100-0.9-0` to evaluate on the training, tuning and dev -sets with the following command: - -```shell -PARAMS=200x200-0.02-100-0.9-0 -for SET in training tuning dev; do - bazel-bin/syntaxnet/parser_eval \ - --task_context=models/brain_parser/structured/$PARAMS/context \ - --hidden_layer_sizes=200,200 \ - --input=tagged-$SET-corpus \ - --output=beam-parsed-$SET-corpus \ - --arg_prefix=brain_parser \ - --graph_builder=structured \ - --model_path=models/brain_parser/structured/$PARAMS/model -done -``` - -Hooray! You now have your very own cousin of Parsey McParseface, ready to go out -and parse text in the wild. 
- ## Contact To ask questions or report issues please post on Stack Overflow with the tag -[syntaxnet](http://stackoverflow.com/questions/tagged/syntaxnet) -or open an issue on the tensorflow/models -[issues tracker](https://github.com/tensorflow/models/issues). -Please assign SyntaxNet issues to @calberti or @andorardo. +[syntaxnet](http://stackoverflow.com/questions/tagged/syntaxnet) or open an +issue on the tensorflow/models [issues +tracker](https://github.com/tensorflow/models/issues). Please assign SyntaxNet +issues to @calberti or @andorardo. ## Credits @@ -623,6 +302,7 @@ Original authors of the code in this package include (in alphabetical order): * Aliaksei Severyn * Andy Golding * Bernd Bohnet +* Chayut Thanapirom * Chris Alberti * Daniel Andor * David Weiss @@ -632,7 +312,9 @@ Original authors of the code in this package include (in alphabetical order): * Ji Ma * Keith Hall * Kuzman Ganchev +* Lingpeng Kong * Livio Baldini Soares +* Mark Omernick * Michael Collins * Michael Ringgaard * Ryan McDonald @@ -640,3 +322,4 @@ Original authors of the code in this package include (in alphabetical order): * Stefan Istrate * Terry Koo * Tim Credo +* Zora Tung diff --git a/syntaxnet/WORKSPACE b/syntaxnet/WORKSPACE index 8b5b22442e78e4c21116f6c84c07a4aeb6f1f871..f9b2ffd6238d48851b686500066aa354bcbb4c9f 100644 --- a/syntaxnet/WORKSPACE +++ b/syntaxnet/WORKSPACE @@ -3,10 +3,23 @@ local_repository( path = "tensorflow", ) +# We need to pull in @io_bazel_rules_closure for TensorFlow. Bazel design +# documentation states that this verbosity is intentional, to prevent +# TensorFlow/SyntaxNet from depending on different versions of +# @io_bazel_rules_closure. 
+http_archive( + name = "io_bazel_rules_closure", + sha256 = "60fc6977908f999b23ca65698c2bb70213403824a84f7904310b6000d78be9ce", + strip_prefix = "rules_closure-5ca1dab6df9ad02050f7ba4e816407f88690cf7d", + urls = [ + "http://bazel-mirror.storage.googleapis.com/github.com/bazelbuild/rules_closure/archive/5ca1dab6df9ad02050f7ba4e816407f88690cf7d.tar.gz", # 2017-02-03 + "https://github.com/bazelbuild/rules_closure/archive/5ca1dab6df9ad02050f7ba4e816407f88690cf7d.tar.gz", + ], +) + load("@org_tensorflow//tensorflow:workspace.bzl", "tf_workspace") tf_workspace(path_prefix="", tf_repo_name="org_tensorflow") # Test that Bazel is up-to-date. load("@org_tensorflow//tensorflow:workspace.bzl", "check_version") -check_version("0.4.3") - +check_version("0.4.2") diff --git a/syntaxnet/docker-devel/Dockerfile.min b/syntaxnet/docker-devel/Dockerfile.min new file mode 100644 index 0000000000000000000000000000000000000000..876f69d9b6831287585a6e78a8ee211b2b09c8fb --- /dev/null +++ b/syntaxnet/docker-devel/Dockerfile.min @@ -0,0 +1,66 @@ +# You need to build wheels before building this image. Please consult +# docker-devel/README.txt. + +# This is the base of the openjdk image. +# +# It might be more efficient to use a minimal distribution, like Alpine. But +# the upside of this being popular is that people might already have it. +FROM buildpack-deps:jessie-curl + +ENV SYNTAXNETDIR=/opt/tensorflow PATH=$PATH:/root/bin + +RUN apt-get update \ + && apt-get install -y \ + file \ + git \ + graphviz \ + libcurl3 \ + libfreetype6 \ + libgraphviz-dev \ + liblapack3 \ + libopenblas-base \ + libpng12-0 \ + libxft2 \ + python-dev \ + python-mock \ + python-pip \ + python2.7 \ + zlib1g-dev \ + && apt-get clean \ + && (rm -f /var/cache/apt/archives/*.deb \ + /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true) + +# Install common Python dependencies. Similar to above, remove caches +# afterwards to help keep Docker images smaller. 
+RUN pip install --ignore-installed pip \ + && python -m pip install numpy \ + && rm -rf /root/.cache/pip /tmp/pip* +RUN python -m pip install \ + asciitree \ + ipykernel \ + jupyter \ + matplotlib \ + pandas \ + protobuf \ + scipy \ + sklearn \ + && python -m ipykernel.kernelspec \ + && python -m pip install pygraphviz \ + --install-option="--include-path=/usr/include/graphviz" \ + --install-option="--library-path=/usr/lib/graphviz/" \ + && rm -rf /root/.cache/pip /tmp/pip* + +COPY syntaxnet_with_tensorflow-0.2-cp27-none-linux_x86_64.whl $SYNTAXNETDIR/ +RUN python -m pip install \ + $SYNTAXNETDIR/syntaxnet_with_tensorflow-0.2-cp27-none-linux_x86_64.whl \ + && rm -rf /root/.cache/pip /tmp/pip* + +# This makes the IP exposed actually "*"; we'll do host restrictions by passing +# a hostname to the `docker run` command. +COPY tensorflow/tensorflow/tools/docker/jupyter_notebook_config.py /root/.jupyter/ +EXPOSE 8888 + +# This does not need to be compiled, only copied. +COPY examples $SYNTAXNETDIR/syntaxnet/examples +# For some reason, this works if we run it in a bash shell :/ :/ :/ +CMD /bin/bash -c "python -m jupyter_core.command notebook --debug --notebook-dir=/opt/tensorflow/syntaxnet/examples" diff --git a/syntaxnet/docker-devel/README.txt b/syntaxnet/docker-devel/README.txt new file mode 100644 index 0000000000000000000000000000000000000000..e190d5991d2db6d552f1a5fd29dbdf968de62169 --- /dev/null +++ b/syntaxnet/docker-devel/README.txt @@ -0,0 +1,64 @@ +Docker is used for packaging the SyntaxNet. There are three primary things we +build with Docker, + +1. A development image, which contains all source built with Bazel. +2. Python/pip wheels, built by running a command in the development container. +3. A minified image, which only has the compiled version of TensorFlow and + SyntaxNet, by installing the wheel built by the above step. 
+ + +Important info (please read) +------------------------------ + +One thing to be wary of is that YOU CAN LOSE DATA IF YOU DEVELOP IN A DOCKER +CONTAINER. Please be very careful to mount data you care about to Docker +volumes, or use a volume mount so that it's mapped to your host filesystem. + +Another note, especially relevant to training models, is that Docker sends the +whole source tree to the Docker daemon every time you try to build an image. +This can take some time if you have large temporary model files lying around. +You can exclude your model files by editing .dockerignore, or just don't store +them in the base directory. + + +Step 1: Building the development image +------------------------------ + +Simply run `docker build -t dragnn-oss .` in the base directory. Make sure you +have all the source checked out correctly, including git submodules. + + +Step 2: Building wheels +------------------------------ + +Please run, + + bash ./docker-devel/build_wheels.sh + +This actually builds the image from Step 1 as well. + + +Step 3: Building the minified image +------------------------------ + +First, ensure you have the file + + syntaxnet_with_tensorflow-0.2-cp27-none-linux_x86_64.whl + +in your working directory, from step 2. Then run, + + docker build -t dragnn-oss:latest-minimal -f docker-devel/Dockerfile.min . + +If the filename changes (e.g. you are on a different architecture), just update +Dockerfile.min. + + +Developing in Docker +------------------------------ + +We recommend developing in Docker by using the `./docker-devel/build_devel.sh` +script; it will set up a few volume mounts and port mappings automatically. +You may want to add more port mappings on your own.
If you want to drop into a +shell instead of launching the notebook, simply run, + + ./docker-devel/build_devel.sh /bin/bash diff --git a/syntaxnet/docker-devel/build_devel.sh b/syntaxnet/docker-devel/build_devel.sh new file mode 100755 index 0000000000000000000000000000000000000000..57a745ba10419d0cc22207d157cbf35dd977aa4e --- /dev/null +++ b/syntaxnet/docker-devel/build_devel.sh @@ -0,0 +1,28 @@ +#!/bin/bash +# +# This file puts you in a Docker sub-shell where you can build SyntaxNet +# targets. It is intended for development, as the Dockerfile (build file) does +# not actually build any of SyntaxNet, but instead mounts it in a volume. + +script_path="$(readlink -f "$0")" +root_path="$(dirname "$(dirname "${script_path}")")" +set -e + +if [[ -z "$(docker images -q dragnn-oss)" ]]; then + docker build -t dragnn-oss . +else + echo "NOTE: dragnn-oss image already exists, not re-building." >&2 + echo "Please run \`docker build -t dragnn-oss .\` if you need." >&2 +fi + +echo -e "\n\nRun bazel commands like \`bazel test syntaxnet/...\`" + +# NOTE: Unfortunately, we need to mount /tensorflow over /syntaxnet/tensorflow +# (which happens via devel_entrypoint.sh). This requires privileged mode. +syntaxnet_base="/opt/tensorflow/syntaxnet" +docker run --rm -ti \ + -v "${root_path}"/syntaxnet:"${syntaxnet_base}"/syntaxnet \ + -v "${root_path}"/dragnn:"${syntaxnet_base}"/dragnn \ + -v "${root_path}"/examples:"${syntaxnet_base}"/examples \ + -p 127.0.0.1:8888:8888 \ + dragnn-oss "$@" diff --git a/syntaxnet/docker-devel/build_wheels.sh b/syntaxnet/docker-devel/build_wheels.sh new file mode 100644 index 0000000000000000000000000000000000000000..a79063abed4d0df8c6efbcffb31eed88073c5fdb --- /dev/null +++ b/syntaxnet/docker-devel/build_wheels.sh @@ -0,0 +1,35 @@ +#!/bin/bash +# +# Convenience script to build wheel files in Docker, and copy them out of the +# container. +# +# Usage: docker-devel/build_wheels.sh (takes no arguments; run it from the base +# directory). 
+set -e +docker build -t dragnn-oss . + +# Start building the wheels. +script="bazel run //dragnn/tools:build_pip_package \ + -- --output-dir=/opt/tensorflow/syntaxnet; \ + bazel run //dragnn/tools:build_pip_package \ + -- --output-dir=/opt/tensorflow/syntaxnet --include-tensorflow" +container_id="$(docker run -d dragnn-oss /bin/bash -c "${script}")" + +echo "Waiting for container ${container_id} to finish building the wheel ..." +if [[ "$(docker wait "${container_id}")" != 0 ]]; then + echo "Container failed! Please run \`docker logs ${container_id}\` to see errors." >&2 + exit 1 +fi + +# The build_pip_package.py script prints lines like "Wrote x.whl". The wheel +# names are prefixed by architecture and such, so don't guess them. +wheels=( + $(docker logs "${container_id}" 2>/dev/null | grep Wrote | awk '{print $2;}')) +for wheel in "${wheels[@]}"; do + output=./"$(basename "${wheel}")" + docker cp "${container_id}:${wheel}" "${output}" + echo "Wrote ${output} ($(du -h "${output}" | awk '{print $1;}'))" +done + +echo "Removing ${container_id} ..."
+docker rm "${container_id}" >/dev/null diff --git a/syntaxnet/dragnn/BUILD b/syntaxnet/dragnn/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..3ac01eacdceb557c72f939a45c01127251b20f06 --- /dev/null +++ b/syntaxnet/dragnn/BUILD @@ -0,0 +1,5 @@ +package_group( + name = "dragnn_visibility", + packages = [ + ], +) diff --git a/syntaxnet/dragnn/components/stateless/BUILD b/syntaxnet/dragnn/components/stateless/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..dd8caf2379c72c20b21403bdf430d6185462a8f1 --- /dev/null +++ b/syntaxnet/dragnn/components/stateless/BUILD @@ -0,0 +1,34 @@ +package( + default_visibility = ["//visibility:public"], + features = ["-layering_check"], +) + +cc_library( + name = "stateless_component", + srcs = ["stateless_component.cc"], + deps = [ + "//dragnn/core:component_registry", + "//dragnn/core/interfaces:component", + "//dragnn/core/interfaces:transition_state", + "//dragnn/io:sentence_input_batch", + "//dragnn/protos:data_proto", + "//syntaxnet:base", + ], + alwayslink = 1, +) + +cc_test( + name = "stateless_component_test", + srcs = ["stateless_component_test.cc"], + deps = [ + ":stateless_component", + "//dragnn/core:component_registry", + "//dragnn/core:input_batch_cache", + "//dragnn/core/test:generic", + "//dragnn/core/test:mock_transition_state", + "//dragnn/io:sentence_input_batch", + "//syntaxnet:base", + "//syntaxnet:sentence_proto", + "//syntaxnet:test_main", + ], +) diff --git a/syntaxnet/dragnn/components/stateless/stateless_component.cc b/syntaxnet/dragnn/components/stateless/stateless_component.cc new file mode 100644 index 0000000000000000000000000000000000000000..47a7b70b28e7b72eb1d1db8c3afba6be3340f1da --- /dev/null +++ b/syntaxnet/dragnn/components/stateless/stateless_component.cc @@ -0,0 +1,131 @@ +// Copyright 2017 Google Inc. All Rights Reserved. 
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "dragnn/core/component_registry.h" +#include "dragnn/core/interfaces/component.h" +#include "dragnn/core/interfaces/transition_state.h" +#include "dragnn/io/sentence_input_batch.h" +#include "dragnn/protos/data.pb.h" +#include "syntaxnet/base.h" + +namespace syntaxnet { +namespace dragnn { +namespace { + +// A component that does not create its own transition states; instead, it +// simply forwards the states of the previous component. Does not support all +// methods. Intended for "compute-only" bulk components that only use linked +// features, which use only a small subset of DRAGNN functionality. +class StatelessComponent : public Component { + public: + void InitializeComponent(const ComponentSpec &spec) override { + name_ = spec.name(); + } + + // Stores the |parent_states| for forwarding to downstream components. + void InitializeData( + const std::vector> &parent_states, + int max_beam_size, InputBatchCache *input_data) override { + // Must use SentenceInputBatch to match SyntaxNetComponent. + batch_size_ = input_data->GetAs()->data()->size(); + beam_size_ = max_beam_size; + parent_states_ = parent_states; + + // The beam should be wide enough for the previous component. + for (const auto &beam : parent_states) { + CHECK_LE(beam.size(), beam_size_); + } + } + + // Forwards the states of the previous component. 
+ std::vector> GetBeam() override { + return parent_states_; + } + + // Forwards the |current_index| to the previous component. + int GetSourceBeamIndex(int current_index, int batch) const override { + return current_index; + } + + string Name() const override { return name_; } + int BeamSize() const override { return beam_size_; } + int BatchSize() const override { return batch_size_; } + int StepsTaken(int batch_index) const override { return 0; } + bool IsReady() const override { return true; } + bool IsTerminal() const override { return true; } + void FinalizeData() override {} + void ResetComponent() override {} + void InitializeTracing() override {} + void DisableTracing() override {} + std::vector> GetTraceProtos() const override { + return {}; + } + + // Unsupported methods. + int GetBeamIndexAtStep(int step, int current_index, + int batch) const override { + LOG(FATAL) << "[" << name_ << "] Method not supported"; + return 0; + } + std::function GetStepLookupFunction( + const string &method) override { + LOG(FATAL) << "[" << name_ << "] Method not supported"; + return nullptr; + } + void AdvanceFromPrediction(const float transition_matrix[], + int matrix_length) override { + LOG(FATAL) << "[" << name_ << "] Method not supported"; + } + void AdvanceFromOracle() override { + LOG(FATAL) << "[" << name_ << "] Method not supported"; + } + std::vector> GetOracleLabels() const override { + LOG(FATAL) << "[" << name_ << "] Method not supported"; + return {}; + } + int GetFixedFeatures(std::function allocate_indices, + std::function allocate_ids, + std::function allocate_weights, + int channel_id) const override { + LOG(FATAL) << "[" << name_ << "] Method not supported"; + return 0; + } + int BulkGetFixedFeatures(const BulkFeatureExtractor &extractor) override { + LOG(FATAL) << "[" << name_ << "] Method not supported"; + return 0; + } + std::vector GetRawLinkFeatures(int channel_id) const override { + LOG(FATAL) << "[" << name_ << "] Method not supported"; + return 
{}; + } + void AddTranslatedLinkFeaturesToTrace( + const std::vector &features, int channel_id) override { + LOG(FATAL) << "[" << name_ << "] Method not supported"; + } + + private: + string name_; // component name + int batch_size_ = 1; // number of sentences in current batch + int beam_size_ = 1; // maximum beam size + + // Parent states passed to InitializeData(), and passed along in GetBeam(). + std::vector> parent_states_; +}; + +REGISTER_DRAGNN_COMPONENT(StatelessComponent); + +} // namespace +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/components/stateless/stateless_component_test.cc b/syntaxnet/dragnn/components/stateless/stateless_component_test.cc new file mode 100644 index 0000000000000000000000000000000000000000..4a7d43312db74166c348261e63dc32a2967fe6ba --- /dev/null +++ b/syntaxnet/dragnn/components/stateless/stateless_component_test.cc @@ -0,0 +1,171 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#include "dragnn/core/component_registry.h" +#include "dragnn/core/input_batch_cache.h" +#include "dragnn/core/test/generic.h" +#include "dragnn/core/test/mock_transition_state.h" +#include "dragnn/io/sentence_input_batch.h" +#include "syntaxnet/base.h" +#include "syntaxnet/sentence.pb.h" +#include "tensorflow/core/lib/core/errors.h" +#include "tensorflow/core/lib/core/status.h" +#include "tensorflow/core/lib/io/path.h" +#include "tensorflow/core/platform/env.h" +#include "tensorflow/core/platform/protobuf.h" +#include "tensorflow/core/platform/test.h" + +namespace syntaxnet { +namespace dragnn { +namespace { + +const char kSentence0[] = R"( +token { + word: "Sentence" start: 0 end: 7 tag: "NN" category: "NOUN" label: "ROOT" + break_level: NO_BREAK +} +token { + word: "0" start: 9 end: 9 head: 0 tag: "CD" category: "NUM" label: "num" + break_level: SPACE_BREAK +} +token { + word: "." start: 10 end: 10 head: 0 tag: "." category: "." label: "punct" + break_level: NO_BREAK +} +)"; + +const char kSentence1[] = R"( +token { + word: "Sentence" start: 0 end: 7 tag: "NN" category: "NOUN" label: "ROOT" + break_level: NO_BREAK +} +token { + word: "1" start: 9 end: 9 head: 0 tag: "CD" category: "NUM" label: "num" + break_level: SPACE_BREAK +} +token { + word: "." start: 10 end: 10 head: 0 tag: "." category: "." label: "punct" + break_level: NO_BREAK +} +)"; + +const char kLongSentence[] = R"( +token { + word: "Sentence" start: 0 end: 7 tag: "NN" category: "NOUN" label: "ROOT" + break_level: NO_BREAK +} +token { + word: "1" start: 9 end: 9 head: 0 tag: "CD" category: "NUM" label: "num" + break_level: SPACE_BREAK +} +token { + word: "2" start: 10 end: 10 head: 0 tag: "CD" category: "NUM" label: "num" + break_level: SPACE_BREAK +} +token { + word: "3" start: 11 end: 11 head: 0 tag: "CD" category: "NUM" label: "num" + break_level: SPACE_BREAK +} +token { + word: "." 
start: 12 end: 12 head: 0 tag: "." category: "." label: "punct" + break_level: NO_BREAK +} +)"; + +const char kMasterSpec[] = R"( +component { + name: "test" + transition_system { + registered_name: "shift-only" + } + linked_feature { + name: "prev" + fml: "input.focus" + embedding_dim: 32 + size: 1 + source_component: "prev" + source_translator: "identity" + source_layer: "last_layer" + } + backend { + registered_name: "StatelessComponent" + } +} +)"; + +} // namespace + +using testing::Return; + +class StatelessComponentTest : public ::testing::Test { + public: + std::unique_ptr CreateParser( + int beam_size, + const std::vector> &states, + const std::vector &data) { + MasterSpec master_spec; + CHECK(TextFormat::ParseFromString(kMasterSpec, &master_spec)); + data_.reset(new InputBatchCache(data)); + + // Create a parser component with the specified beam size. + std::unique_ptr parser_component( + Component::Create("StatelessComponent")); + parser_component->InitializeComponent(master_spec.component(0)); + parser_component->InitializeData(states, beam_size, data_.get()); + return parser_component; + } + + std::unique_ptr data_; +}; + +TEST_F(StatelessComponentTest, ForwardsTransitionStates) { + MockTransitionState mock_state_1, mock_state_2, mock_state_3; + const std::vector> parent_states = { + {}, {&mock_state_1}, {&mock_state_2, &mock_state_3}}; + + std::vector data; + for (const string &textproto : {kSentence0, kSentence1, kLongSentence}) { + Sentence sentence; + CHECK(TextFormat::ParseFromString(textproto, &sentence)); + data.emplace_back(); + CHECK(sentence.SerializeToString(&data.back())); + } + CHECK_EQ(parent_states.size(), data.size()); + + const int kBeamSize = 2; + auto test_parser = CreateParser(kBeamSize, parent_states, data); + + EXPECT_TRUE(test_parser->IsReady()); + EXPECT_TRUE(test_parser->IsTerminal()); + EXPECT_EQ(kBeamSize, test_parser->BeamSize()); + EXPECT_EQ(data.size(), test_parser->BatchSize()); + 
EXPECT_TRUE(test_parser->GetTraceProtos().empty()); + + for (int batch_index = 0; batch_index < parent_states.size(); ++batch_index) { + EXPECT_EQ(0, test_parser->StepsTaken(batch_index)); + const auto &beam = parent_states[batch_index]; + for (int beam_index = 0; beam_index < beam.size(); ++beam_index) { + // Expect an identity mapping. + EXPECT_EQ(beam_index, + test_parser->GetSourceBeamIndex(beam_index, batch_index)); + } + } + + const auto forwarded_states = test_parser->GetBeam(); + EXPECT_EQ(parent_states, forwarded_states); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/components/syntaxnet/BUILD b/syntaxnet/dragnn/components/syntaxnet/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..6c746a4a58017cd1caabf56c7888de8ffe9b560c --- /dev/null +++ b/syntaxnet/dragnn/components/syntaxnet/BUILD @@ -0,0 +1,113 @@ +package( + default_visibility = ["//visibility:public"], + features = ["-layering_check"], +) + +cc_library( + name = "syntaxnet_component", + srcs = ["syntaxnet_component.cc"], + hdrs = ["syntaxnet_component.h"], + deps = [ + ":syntaxnet_link_feature_extractor", + ":syntaxnet_transition_state", + "//dragnn/components/util:bulk_feature_extractor", + "//dragnn/core:beam", + "//dragnn/core:component_registry", + "//dragnn/core:input_batch_cache", + "//dragnn/core/interfaces:component", + "//dragnn/core/interfaces:transition_state", + "//dragnn/io:sentence_input_batch", + "//dragnn/io:syntaxnet_sentence", + "//dragnn/protos:data_proto", + "//dragnn/protos:spec_proto", + "//dragnn/protos:trace_proto", + "//syntaxnet:base", + "//syntaxnet:parser_transitions", + "//syntaxnet:registry", + "//syntaxnet:sparse_proto", + "//syntaxnet:task_context", + "//syntaxnet:task_spec_proto", + "//syntaxnet:utils", + ], + alwayslink = 1, +) + +cc_library( + name = "syntaxnet_link_feature_extractor", + srcs = ["syntaxnet_link_feature_extractor.cc"], + hdrs = ["syntaxnet_link_feature_extractor.h"], + deps = [ + 
"//dragnn/protos:spec_proto", + "//syntaxnet:base", + "//syntaxnet:embedding_feature_extractor", + "//syntaxnet:parser_transitions", + "//syntaxnet:task_context", + ], +) + +cc_library( + name = "syntaxnet_transition_state", + srcs = ["syntaxnet_transition_state.cc"], + hdrs = ["syntaxnet_transition_state.h"], + deps = [ + "//dragnn/core/interfaces:cloneable_transition_state", + "//dragnn/core/interfaces:transition_state", + "//dragnn/io:syntaxnet_sentence", + "//dragnn/protos:trace_proto", + "//syntaxnet:base", + "//syntaxnet:parser_transitions", + ], +) + +# Test data. +filegroup( + name = "testdata", + data = glob(["testdata/**"]), +) + +# Tests. +cc_test( + name = "syntaxnet_component_test", + srcs = ["syntaxnet_component_test.cc"], + data = [":testdata"], + deps = [ + ":syntaxnet_component", + "//dragnn/core:input_batch_cache", + "//dragnn/core/test:generic", + "//dragnn/core/test:mock_transition_state", + "//dragnn/io:sentence_input_batch", + "//syntaxnet:base", + "//syntaxnet:sentence_proto", + "//syntaxnet:test_main", + ], +) + +cc_test( + name = "syntaxnet_link_feature_extractor_test", + srcs = ["syntaxnet_link_feature_extractor_test.cc"], + deps = [ + ":syntaxnet_link_feature_extractor", + "//dragnn/core/test:generic", + "//dragnn/protos:spec_proto", + "//syntaxnet:task_context", + "//syntaxnet:test_main", + ], +) + +cc_test( + name = "syntaxnet_transition_state_test", + srcs = ["syntaxnet_transition_state_test.cc"], + data = [":testdata"], + deps = [ + ":syntaxnet_component", + ":syntaxnet_transition_state", + "//dragnn/core:input_batch_cache", + "//dragnn/core/test:generic", + "//dragnn/core/test:mock_transition_state", + "//dragnn/io:sentence_input_batch", + "//dragnn/protos:spec_proto", + "//syntaxnet:base", + "//syntaxnet:sentence_proto", + "//syntaxnet:test_main", + ], +) diff --git a/syntaxnet/dragnn/components/syntaxnet/syntaxnet_component.cc b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_component.cc new file mode 100644 index 
0000000000000000000000000000000000000000..f5df9e6bd91d94968724375905b1649f6ffb57a2 --- /dev/null +++ b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_component.cc @@ -0,0 +1,794 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "dragnn/components/syntaxnet/syntaxnet_component.h" + +#include + +#include "dragnn/components/util/bulk_feature_extractor.h" +#include "dragnn/core/component_registry.h" +#include "dragnn/core/input_batch_cache.h" +#include "dragnn/core/interfaces/component.h" +#include "dragnn/core/interfaces/transition_state.h" +#include "dragnn/io/sentence_input_batch.h" +#include "dragnn/io/syntaxnet_sentence.h" +#include "syntaxnet/parser_state.h" +#include "syntaxnet/sparse.pb.h" +#include "syntaxnet/task_spec.pb.h" +#include "syntaxnet/utils.h" +#include "tensorflow/core/platform/logging.h" + +namespace syntaxnet { +namespace dragnn { + +using tensorflow::strings::StrCat; + +namespace { + +// Returns a new step in a trace based on a ComponentSpec. 
+ComponentStepTrace GetNewStepTrace(const ComponentSpec &spec, + const TransitionState &state) { + ComponentStepTrace step; + for (auto &linked_spec : spec.linked_feature()) { + auto &channel_trace = *step.add_linked_feature_trace(); + channel_trace.set_name(linked_spec.name()); + channel_trace.set_source_component(linked_spec.source_component()); + channel_trace.set_source_translator(linked_spec.source_translator()); + channel_trace.set_source_layer(linked_spec.source_layer()); + } + for (auto &fixed_spec : spec.fixed_feature()) { + step.add_fixed_feature_trace()->set_name(fixed_spec.name()); + } + step.set_html_representation(state.HTMLRepresentation()); + return step; +} + +// Returns the last step in the trace. +ComponentStepTrace *GetLastStepInTrace(ComponentTrace *trace) { + CHECK_GT(trace->step_trace_size(), 0) << "Trace has no steps added yet"; + return trace->mutable_step_trace(trace->step_trace_size() - 1); +} + +} // anonymous namespace + +SyntaxNetComponent::SyntaxNetComponent() + : feature_extractor_("brain_parser"), + rewrite_root_labels_(false), + max_beam_size_(1), + input_data_(nullptr) {} + +void SyntaxNetComponent::InitializeComponent(const ComponentSpec &spec) { + // Save off the passed spec for future reference. + spec_ = spec; + + // Create and populate a TaskContext for the underlying parser. + TaskContext context; + + // Add the specified resources. + for (const Resource &resource : spec_.resource()) { + auto *input = context.GetInput(resource.name()); + for (const Part &part : resource.part()) { + auto *input_part = input->add_part(); + input_part->set_file_pattern(part.file_pattern()); + input_part->set_file_format(part.file_format()); + input_part->set_record_format(part.record_format()); + } + } + + // Add the specified task args to the transition system. + for (const auto &param : spec_.transition_system().parameters()) { + context.SetParameter(param.first, param.second); + } + + // Set the arguments for the feature extractor. 
+ std::vector names; + std::vector dims; + std::vector fml; + std::vector predicate_maps; + + for (const FixedFeatureChannel &channel : spec.fixed_feature()) { + names.push_back(channel.name()); + fml.push_back(channel.fml()); + predicate_maps.push_back(channel.predicate_map()); + dims.push_back(StrCat(channel.embedding_dim())); + } + + context.SetParameter("neurosis_feature_syntax_version", "2"); + context.SetParameter("brain_parser_embedding_dims", utils::Join(dims, ";")); + context.SetParameter("brain_parser_predicate_maps", + utils::Join(predicate_maps, ";")); + context.SetParameter("brain_parser_features", utils::Join(fml, ";")); + context.SetParameter("brain_parser_embedding_names", utils::Join(names, ";")); + + names.clear(); + dims.clear(); + fml.clear(); + predicate_maps.clear(); + + std::vector source_components; + std::vector source_layers; + std::vector source_translators; + + for (const LinkedFeatureChannel &channel : spec.linked_feature()) { + names.push_back(channel.name()); + fml.push_back(channel.fml()); + dims.push_back(StrCat(channel.embedding_dim())); + source_components.push_back(channel.source_component()); + source_layers.push_back(channel.source_layer()); + source_translators.push_back(channel.source_translator()); + predicate_maps.push_back("none"); + } + + context.SetParameter("link_embedding_dims", utils::Join(dims, ";")); + context.SetParameter("link_predicate_maps", utils::Join(predicate_maps, ";")); + context.SetParameter("link_features", utils::Join(fml, ";")); + context.SetParameter("link_embedding_names", utils::Join(names, ";")); + context.SetParameter("link_source_layers", utils::Join(source_layers, ";")); + context.SetParameter("link_source_translators", + utils::Join(source_translators, ";")); + context.SetParameter("link_source_components", + utils::Join(source_components, ";")); + + context.SetParameter("parser_transition_system", + spec.transition_system().registered_name()); + + // Set up the fixed feature extractor. 
+ feature_extractor_.Setup(&context); + feature_extractor_.Init(&context); + feature_extractor_.RequestWorkspaces(&workspace_registry_); + + // Set up the underlying transition system. + transition_system_.reset(ParserTransitionSystem::Create( + context.Get("parser_transition_system", "arc-standard"))); + transition_system_->Setup(&context); + transition_system_->Init(&context); + + // Create label map. + string path = TaskContext::InputFile(*context.GetInput("label-map")); + label_map_ = + SharedStoreUtils::GetWithDefaultName(path, 0, 0); + + // Set up link feature extractors. + if (spec.linked_feature_size() > 0) { + link_feature_extractor_.Setup(&context); + link_feature_extractor_.Init(&context); + link_feature_extractor_.RequestWorkspaces(&workspace_registry_); + } + + // Get the legacy flag for simulating old parser processor behavior. If the + // flag is not set, default to 'false'. + rewrite_root_labels_ = context.Get("rewrite_root_labels", false); +} + +std::unique_ptr> SyntaxNetComponent::CreateBeam( + int max_size) { + std::unique_ptr> beam( + new Beam(max_size)); + auto permission_function = [this](SyntaxNetTransitionState *state, + int action) { + VLOG(3) << "permission_function action:" << action + << " is_allowed:" << this->IsAllowed(state, action); + return this->IsAllowed(state, action); + }; + auto finality_function = [this](SyntaxNetTransitionState *state) { + VLOG(2) << "finality_function is_final:" << this->IsFinal(state); + return this->IsFinal(state); + }; + auto oracle_function = [this](SyntaxNetTransitionState *state) { + VLOG(2) << "oracle_function action:" << this->GetOracleLabel(state); + return this->GetOracleLabel(state); + }; + auto beam_ptr = beam.get(); + auto advance_function = [this, beam_ptr](SyntaxNetTransitionState *state, + int action) { + VLOG(2) << "advance_function beam ptr:" << beam_ptr << " action:" << action; + this->Advance(state, action, beam_ptr); + }; + beam->SetFunctions(permission_function, finality_function, 
advance_function, + oracle_function); + + return beam; +} + +void SyntaxNetComponent::InitializeData( + const std::vector> &parent_states, + int max_beam_size, InputBatchCache *input_data) { + // Save off the input data object. + input_data_ = input_data; + + // If beam size has changed, change all beam sizes for existing beams. + if (max_beam_size_ != max_beam_size) { + CHECK_GT(max_beam_size, 0) + << "Requested max beam size must be greater than 0."; + VLOG(2) << "Adjusting max beam size from " << max_beam_size_ << " to " + << max_beam_size; + max_beam_size_ = max_beam_size; + for (auto &beam : batch_) { + beam->SetMaxSize(max_beam_size_); + } + } + + SentenceInputBatch *sentences = input_data->GetAs(); + + // Expect that the sentence data is the same size as the input states batch. + if (!parent_states.empty()) { + CHECK_EQ(parent_states.size(), sentences->data()->size()); + } + + // Adjust the beam vector so that it is the correct size for this batch. + if (batch_.size() < sentences->data()->size()) { + VLOG(1) << "Batch size is increased to " << sentences->data()->size() + << " from " << batch_.size(); + for (int i = batch_.size(); i < sentences->data()->size(); ++i) { + batch_.push_back(CreateBeam(max_beam_size)); + } + } else if (batch_.size() > sentences->data()->size()) { + VLOG(1) << "Batch size is decreased to " << sentences->data()->size() + << " from " << batch_.size(); + batch_.erase(batch_.begin() + sentences->data()->size(), batch_.end()); + + } else { + VLOG(1) << "Batch size is constant at " << sentences->data()->size(); + } + CHECK_EQ(batch_.size(), sentences->data()->size()); + + // Fill the beams with the relevant data for that batch. + for (int batch_index = 0; batch_index < sentences->data()->size(); + ++batch_index) { + // Create a vector of states for this component's beam. + std::vector> initial_states; + if (parent_states.empty()) { + // If no states have been passed in, create a single state to seed the + // beam. 
+ initial_states.push_back( + CreateState(&(sentences->data()->at(batch_index)))); + } else { + // If states have been passed in, seed the beam with them up to the max + // beam size. + int num_states = + std::min(batch_.at(batch_index)->max_size(), + static_cast(parent_states.at(batch_index).size())); + VLOG(2) << "Creating a beam using " << num_states << " initial states"; + for (int i = 0; i < num_states; ++i) { + std::unique_ptr state( + CreateState(&(sentences->data()->at(batch_index)))); + state->Init(*parent_states.at(batch_index).at(i)); + initial_states.push_back(std::move(state)); + } + } + batch_.at(batch_index)->Init(std::move(initial_states)); + } +} + +bool SyntaxNetComponent::IsReady() const { return input_data_ != nullptr; } + +string SyntaxNetComponent::Name() const { + return "SyntaxNet-backed beam parser"; +} + +int SyntaxNetComponent::BatchSize() const { return batch_.size(); } + +int SyntaxNetComponent::BeamSize() const { return max_beam_size_; } + +int SyntaxNetComponent::StepsTaken(int batch_index) const { + return batch_.at(batch_index)->num_steps(); +} + +int SyntaxNetComponent::GetBeamIndexAtStep(int step, int current_index, + int batch) const { + return batch_.at(batch)->FindPreviousIndex(current_index, step); +} + +int SyntaxNetComponent::GetSourceBeamIndex(int current_index, int batch) const { + return batch_.at(batch)->FindPreviousIndex(current_index, 0); +} + +std::function SyntaxNetComponent::GetStepLookupFunction( + const string &method) { + if (method == "shift-reduce-step") { + // TODO(googleuser): Describe this function. + return [this](int batch_index, int beam_index, int value) { + SyntaxNetTransitionState *state = + batch_.at(batch_index)->beam_state(beam_index); + return state->step_for_token(value); + }; + } else if (method == "reduce-step") { + // TODO(googleuser): Describe this function. 
+ return [this](int batch_index, int beam_index, int value) { + SyntaxNetTransitionState *state = + batch_.at(batch_index)->beam_state(beam_index); + return state->parent_step_for_token(value); + }; + } else if (method == "parent-shift-reduce-step") { + // TODO(googleuser): Describe this function. + return [this](int batch_index, int beam_index, int value) { + SyntaxNetTransitionState *state = + batch_.at(batch_index)->beam_state(beam_index); + return state->step_for_token(state->parent_step_for_token(value)); + }; + } else if (method == "reverse-token") { + // TODO(googleuser): Describe this function. + return [this](int batch_index, int beam_index, int value) { + SyntaxNetTransitionState *state = + batch_.at(batch_index)->beam_state(beam_index); + int result = state->sentence()->sentence()->token_size() - value - 1; + if (result >= 0 && result < state->sentence()->sentence()->token_size()) { + return result; + } else { + return -1; + } + }; + } else { + LOG(FATAL) << "Unable to find step lookup function " << method; + } +} + +void SyntaxNetComponent::AdvanceFromPrediction(const float transition_matrix[], + int transition_matrix_length) { + VLOG(2) << "Advancing from prediction."; + int matrix_index = 0; + int num_labels = transition_system_->NumActions(label_map_->Size()); + for (int i = 0; i < batch_.size(); ++i) { + int max_beam_size = batch_.at(i)->max_size(); + int matrix_size = num_labels * max_beam_size; + CHECK_LE(matrix_index + matrix_size, transition_matrix_length); + if (!batch_.at(i)->IsTerminal()) { + batch_.at(i)->AdvanceFromPrediction(&transition_matrix[matrix_index], + matrix_size, num_labels); + } + matrix_index += num_labels * max_beam_size; + } +} + +void SyntaxNetComponent::AdvanceFromOracle() { + VLOG(2) << "Advancing from oracle."; + for (auto &beam : batch_) { + beam->AdvanceFromOracle(); + } +} + +bool SyntaxNetComponent::IsTerminal() const { + VLOG(2) << "Checking terminal status."; + for (const auto &beam : batch_) { + if 
(!beam->IsTerminal()) { + return false; + } + } + return true; +} + +std::vector> +SyntaxNetComponent::GetBeam() { + std::vector> state_beam; + for (auto &beam : batch_) { + // Because this component only finalizes the data of the highest ranked + // component in each beam, the next component should only be initialized + // from the highest ranked component in that beam. + state_beam.push_back({beam->beam().at(0)}); + } + return state_beam; +} + +int SyntaxNetComponent::GetFixedFeatures( + std::function allocate_indices, + std::function allocate_ids, + std::function allocate_weights, int channel_id) const { + std::vector features; + + const int channel_size = spec_.fixed_feature(channel_id).size(); + + // For every beam in the batch... + for (const auto &beam : batch_) { + // For every element in the beam... + for (int beam_idx = 0; beam_idx < beam->size(); ++beam_idx) { + // Get the SparseFeatures from the feature extractor. + auto state = beam->beam_state(beam_idx); + const std::vector> sparse_features = + feature_extractor_.ExtractSparseFeatures( + *(state->sentence()->workspace()), *(state->parser_state())); + + // Hold the SparseFeatures for later processing. 
+ for (const SparseFeatures &f : sparse_features[channel_id]) { + features.emplace_back(f); + if (do_tracing_) { + FixedFeatures fixed_features; + for (const string &name : f.description()) { + fixed_features.add_value_name(name); + } + fixed_features.set_feature_name(""); + auto *trace = GetLastStepInTrace(state->mutable_trace()); + auto *fixed_trace = trace->mutable_fixed_feature_trace(channel_id); + *fixed_trace->add_value_trace() = fixed_features; + } + } + } + const int pad_amount = max_beam_size_ - beam->size(); + features.resize(features.size() + pad_amount * channel_size); + } + + int feature_count = 0; + for (const auto &feature : features) { + feature_count += feature.id_size(); + } + + VLOG(2) << "Feature count is " << feature_count; + int32 *indices_tensor = allocate_indices(feature_count); + int64 *ids_tensor = allocate_ids(feature_count); + float *weights_tensor = allocate_weights(feature_count); + + int array_index = 0; + for (int feature_index = 0; feature_index < features.size(); + ++feature_index) { + VLOG(2) << "Extracting for feature_index " << feature_index; + const auto feature = features[feature_index]; + for (int sub_idx = 0; sub_idx < feature.id_size(); ++sub_idx) { + indices_tensor[array_index] = feature_index; + ids_tensor[array_index] = feature.id(sub_idx); + if (sub_idx < feature.weight_size()) { + weights_tensor[array_index] = feature.weight(sub_idx); + } else { + weights_tensor[array_index] = 1.0; + } + VLOG(2) << "Feature index: " << indices_tensor[array_index] + << " id: " << ids_tensor[array_index] + << " weight: " << weights_tensor[array_index]; + + ++array_index; + } + } + return feature_count; +} + +int SyntaxNetComponent::BulkGetFixedFeatures( + const BulkFeatureExtractor &extractor) { + // Allocate a vector of SparseFeatures per channel. 
+ const int num_channels = spec_.fixed_feature_size(); + std::vector channel_size(num_channels); + for (int i = 0; i < num_channels; ++i) { + channel_size[i] = spec_.fixed_feature(i).size(); + } + std::vector> features(num_channels); + std::vector> feature_indices(num_channels); + std::vector> step_indices(num_channels); + std::vector> element_indices(num_channels); + std::vector feature_counts(num_channels); + int step_count = 0; + + while (!IsTerminal()) { + int current_element = 0; + + // For every beam in the batch... + for (const auto &beam : batch_) { + // For every element in the beam... + for (int beam_idx = 0; beam_idx < beam->size(); ++beam_idx) { + // Get the SparseFeatures from the parser. + auto state = beam->beam_state(beam_idx); + const std::vector> sparse_features = + feature_extractor_.ExtractSparseFeatures( + *(state->sentence()->workspace()), *(state->parser_state())); + + for (int channel_id = 0; channel_id < num_channels; ++channel_id) { + int feature_count = 0; + for (const SparseFeatures &f : sparse_features[channel_id]) { + // Trace, if requested. + if (do_tracing_) { + FixedFeatures fixed_features; + for (const string &name : f.description()) { + fixed_features.add_value_name(name); + } + fixed_features.set_feature_name(""); + auto *trace = GetLastStepInTrace(state->mutable_trace()); + auto *fixed_trace = + trace->mutable_fixed_feature_trace(channel_id); + *fixed_trace->add_value_trace() = fixed_features; + } + + // Hold the SparseFeatures for later processing. + features[channel_id].emplace_back(f); + element_indices[channel_id].emplace_back(current_element); + step_indices[channel_id].emplace_back(step_count); + feature_indices[channel_id].emplace_back(feature_count); + feature_counts[channel_id] += f.id_size(); + ++feature_count; + } + } + ++current_element; + } + + // Advance the current element to skip unused beam slots. + // Pad the beam out to max_beam_size. 
+ int pad_amount = max_beam_size_ - beam->size(); + current_element += pad_amount; + } + AdvanceFromOracle(); + ++step_count; + } + + const int total_steps = step_count; + const int num_elements = batch_.size() * max_beam_size_; + + // This would be a good place to add threading. + for (int channel_id = 0; channel_id < num_channels; ++channel_id) { + int feature_count = feature_counts[channel_id]; + LOG(INFO) << "Feature count is " << feature_count << " for channel " + << channel_id; + int32 *indices_tensor = + extractor.AllocateIndexMemory(channel_id, feature_count); + int64 *ids_tensor = extractor.AllocateIdMemory(channel_id, feature_count); + float *weights_tensor = + extractor.AllocateWeightMemory(channel_id, feature_count); + int array_index = 0; + for (int feat_idx = 0; feat_idx < features[channel_id].size(); ++feat_idx) { + const auto &feature = features[channel_id][feat_idx]; + int element_index = element_indices[channel_id][feat_idx]; + int step_index = step_indices[channel_id][feat_idx]; + int feature_index = feature_indices[channel_id][feat_idx]; + for (int sub_idx = 0; sub_idx < feature.id_size(); ++sub_idx) { + indices_tensor[array_index] = + extractor.GetIndex(total_steps, num_elements, feature_index, + element_index, step_index); + ids_tensor[array_index] = feature.id(sub_idx); + if (sub_idx < feature.weight_size()) { + weights_tensor[array_index] = feature.weight(sub_idx); + } else { + weights_tensor[array_index] = 1.0; + } + ++array_index; + } + } + } + return step_count; +} + +std::vector SyntaxNetComponent::GetRawLinkFeatures( + int channel_id) const { + std::vector features; + const int channel_size = spec_.linked_feature(channel_id).size(); + std::unique_ptr> feature_names; + if (do_tracing_) { + feature_names.reset(new std::vector); + *feature_names = utils::Split(spec_.linked_feature(channel_id).fml(), ' '); + } + + // For every beam in the batch... 
+ for (int batch_idx = 0; batch_idx < batch_.size(); ++batch_idx) { + // For every element in the beam... + const auto &beam = batch_[batch_idx]; + for (int beam_idx = 0; beam_idx < beam->size(); ++beam_idx) { + // Get the raw link features from the linked feature extractor. + auto state = beam->beam_state(beam_idx); + std::vector raw_features( + link_feature_extractor_.NumEmbeddings()); + link_feature_extractor_.ExtractFeatures(*(state->sentence()->workspace()), + *(state->parser_state()), + &raw_features); + + // Add the raw feature values to the LinkFeatures proto. + CHECK_LT(channel_id, raw_features.size()); + for (int i = 0; i < raw_features[channel_id].size(); ++i) { + features.emplace_back(); + features.back().set_feature_value(raw_features[channel_id].value(i)); + features.back().set_batch_idx(batch_idx); + features.back().set_beam_idx(beam_idx); + if (do_tracing_) { + features.back().set_feature_name(feature_names->at(i)); + } + } + } + + // Pad the beam out to max_beam_size. + int pad_amount = max_beam_size_ - beam->size(); + features.resize(features.size() + pad_amount * channel_size); + } + + return features; +} + +std::vector> SyntaxNetComponent::GetOracleLabels() const { + std::vector> oracle_labels; + for (const auto &beam : batch_) { + oracle_labels.emplace_back(); + for (int beam_idx = 0; beam_idx < beam->size(); ++beam_idx) { + // Get the oracle label for this beam state. + auto state = beam->beam_state(beam_idx); + oracle_labels.back().push_back(GetOracleLabel(state)); + } + } + return oracle_labels; +} + +void SyntaxNetComponent::FinalizeData() { + // This chooses the top-scoring member of the beam to annotate the underlying + // document. 
+ VLOG(2) << "Finalizing data."; + for (auto &beam : batch_) { + if (beam->size() != 0) { + auto top_state = beam->beam_state(0); + VLOG(3) << "Finalizing for sentence: " + << top_state->sentence()->sentence()->ShortDebugString(); + top_state->parser_state()->AddParseToDocument( + top_state->sentence()->sentence(), rewrite_root_labels_); + VLOG(3) << "Sentence is now: " + << top_state->sentence()->sentence()->ShortDebugString(); + } else { + LOG(WARNING) << "Attempting to finalize an empty beam for component " + << spec_.name(); + } + } +} + +void SyntaxNetComponent::ResetComponent() { + for (auto &beam : batch_) { + beam->Reset(); + } + input_data_ = nullptr; + max_beam_size_ = 0; +} + +std::unique_ptr SyntaxNetComponent::CreateState( + SyntaxNetSentence *sentence) { + VLOG(3) << "Creating state for sentence " + << sentence->sentence()->DebugString(); + std::unique_ptr parser_state(new ParserState( + sentence->sentence(), transition_system_->NewTransitionState(false), + label_map_)); + sentence->workspace()->Reset(workspace_registry_); + feature_extractor_.Preprocess(sentence->workspace(), parser_state.get()); + link_feature_extractor_.Preprocess(sentence->workspace(), parser_state.get()); + std::unique_ptr transition_state( + new SyntaxNetTransitionState(std::move(parser_state), sentence)); + return transition_state; +} + +bool SyntaxNetComponent::IsAllowed(SyntaxNetTransitionState *state, + int action) const { + return transition_system_->IsAllowedAction(action, *(state->parser_state())); +} + +bool SyntaxNetComponent::IsFinal(SyntaxNetTransitionState *state) const { + return transition_system_->IsFinalState(*(state->parser_state())); +} + +int SyntaxNetComponent::GetOracleLabel(SyntaxNetTransitionState *state) const { + if (IsFinal(state)) { + // It is not permitted to request an oracle label from a sentence that is + // in a final state. 
+ return -1; + } else { + return transition_system_->GetNextGoldAction(*(state->parser_state())); + } +} + +void SyntaxNetComponent::Advance(SyntaxNetTransitionState *state, int action, + Beam *beam) { + auto parser_state = state->parser_state(); + auto sentence_size = state->sentence()->sentence()->token_size(); + const int num_steps = beam->num_steps(); + + if (transition_system_->SupportsActionMetaData()) { + const int parent_idx = + transition_system_->ParentIndex(*parser_state, action); + constexpr int kShiftAction = -1; + if (parent_idx == kShiftAction) { + if (parser_state->Next() < sentence_size && parser_state->Next() >= 0) { + // Only record the step while input remains. If all the input has + // already been consumed, this is not a shift action, so we skip it. + state->set_step_for_token(parser_state->Next(), num_steps); + } + } else if (parent_idx >= 0) { + VLOG(2) << spec_.name() << ": Updating pointer: " << parent_idx << " -> " + << num_steps; + state->set_step_for_token(parent_idx, num_steps); + const int child_idx = + transition_system_->ChildIndex(*parser_state, action); + assert(child_idx >= 0 && child_idx < sentence_size); + state->set_parent_for_token(child_idx, parent_idx); + + VLOG(2) << spec_.name() << ": Updating parent for child: " << parent_idx + << " -> " << child_idx; + state->set_parent_step_for_token(child_idx, num_steps); + } else { + VLOG(2) << spec_.name() << ": Invalid parent index: " << parent_idx; + } + } + if (do_tracing_) { + auto *trace = state->mutable_trace(); + auto *last_step = GetLastStepInTrace(trace); + + // Add action to the prior step. + last_step->set_caption( + transition_system_->ActionAsString(action, *parser_state)); + last_step->set_step_finished(true); + } + + transition_system_->PerformAction(action, parser_state); + + if (do_tracing_) { + // Add info for the next step. 
+ *state->mutable_trace()->add_step_trace() = GetNewStepTrace(spec_, *state); + } +} + +void SyntaxNetComponent::InitializeTracing() { + do_tracing_ = true; + CHECK(IsReady()) << "Cannot initialize trace before InitializeData()."; + + // Initialize each element of the beam with a new trace. + for (auto &beam : batch_) { + for (int beam_idx = 0; beam_idx < beam->size(); ++beam_idx) { + SyntaxNetTransitionState *state = beam->beam_state(beam_idx); + std::unique_ptr trace(new ComponentTrace()); + trace->set_name(spec_.name()); + *trace->add_step_trace() = GetNewStepTrace(spec_, *state); + state->set_trace(std::move(trace)); + } + } + + feature_extractor_.set_add_strings(true); +} + +void SyntaxNetComponent::DisableTracing() { + do_tracing_ = false; + feature_extractor_.set_add_strings(false); +} + +void SyntaxNetComponent::AddTranslatedLinkFeaturesToTrace( + const std::vector &features, int channel_id) { + CHECK(do_tracing_) << "Tracing is not enabled."; + int linear_idx = 0; + const int channel_size = spec_.linked_feature(channel_id).size(); + + // For every beam in the batch... + for (const auto &beam : batch_) { + // For every element in the beam... + for (int beam_idx = 0; beam_idx < max_beam_size_; ++beam_idx) { + for (int feature_idx = 0; feature_idx < channel_size; ++feature_idx) { + if (beam_idx < beam->size()) { + auto state = beam->beam_state(beam_idx); + auto *trace = GetLastStepInTrace(state->mutable_trace()); + auto *link_trace = trace->mutable_linked_feature_trace(channel_id); + if (features[linear_idx].feature_value() >= 0 && + features[linear_idx].step_idx() >= 0) { + *link_trace->add_value_trace() = features[linear_idx]; + } + } + ++linear_idx; + } + } + } +} + +std::vector> SyntaxNetComponent::GetTraceProtos() + const { + std::vector> traces; + + // For every beam in the batch... + for (const auto &beam : batch_) { + std::vector beam_trace; + + // For every element in the beam... 
+ for (int beam_idx = 0; beam_idx < beam->size(); ++beam_idx) { + auto state = beam->beam_state(beam_idx); + beam_trace.push_back(*state->mutable_trace()); + } + traces.push_back(beam_trace); + } + return traces; +}; + +REGISTER_DRAGNN_COMPONENT(SyntaxNetComponent); + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/components/syntaxnet/syntaxnet_component.h b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_component.h new file mode 100644 index 0000000000000000000000000000000000000000..303fcf7739bc85b3cf9a8ad4a8c53589e33c6014 --- /dev/null +++ b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_component.h @@ -0,0 +1,198 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_COMPONENTS_SYNTAXNET_SYNTAXNET_COMPONENT_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_COMPONENTS_SYNTAXNET_SYNTAXNET_COMPONENT_H_ + +#include + +#include "dragnn/components/syntaxnet/syntaxnet_link_feature_extractor.h" +#include "dragnn/components/syntaxnet/syntaxnet_transition_state.h" +#include "dragnn/components/util/bulk_feature_extractor.h" +#include "dragnn/core/beam.h" +#include "dragnn/core/input_batch_cache.h" +#include "dragnn/core/interfaces/component.h" +#include "dragnn/core/interfaces/transition_state.h" +#include "dragnn/protos/data.pb.h" +#include "dragnn/protos/spec.pb.h" +#include "dragnn/protos/trace.pb.h" +#include "syntaxnet/base.h" +#include "syntaxnet/parser_transitions.h" +#include "syntaxnet/registry.h" +#include "syntaxnet/task_context.h" + +namespace syntaxnet { +namespace dragnn { + +class SyntaxNetComponent : public Component { + public: + // Create a SyntaxNet-backed DRAGNN component. + SyntaxNetComponent(); + + // Initializes this component from the spec. + void InitializeComponent(const ComponentSpec &spec) override; + + // Provides the previous beam to the component. + void InitializeData( + const std::vector> &states, + int max_beam_size, InputBatchCache *input_data) override; + + // Returns true if the component has had InitializeData called on it since + // the last time it was reset. + bool IsReady() const override; + + // Returns the string name of this component. + string Name() const override; + + // Returns the number of steps taken by the given batch in this component. + int StepsTaken(int batch_index) const override; + + // Returns the current batch size of the component's underlying data. + int BatchSize() const override; + + // Returns the maximum beam size of this component. 
+ int BeamSize() const override; + + // Return the beam index of the item which is currently at index + // 'index', when the beam was at step 'step', for batch element 'batch'. + int GetBeamIndexAtStep(int step, int current_index, int batch) const override; + + // Return the source index of the item which is currently at index 'index' + // for batch element 'batch'. This index is into the final beam of the + // Component that this Component was initialized from. + int GetSourceBeamIndex(int current_index, int batch) const override; + + // Request a translation function based on the given method string. + // The translation function will be called with arguments (batch, beam, value) + // and should return the step index corresponding to the given value, for the + // data in the given beam and batch. + std::function GetStepLookupFunction( + const string &method) override; + + // Advances this component from the given transition matrix. + void AdvanceFromPrediction(const float transition_matrix[], + int transition_matrix_length) override; + + // Advances this component from the state oracles. + void AdvanceFromOracle() override; + + // Returns true if all states within this component are terminal. + bool IsTerminal() const override; + + // Returns the current batch of beams for this component. + std::vector> GetBeam() override; + + // Extracts and populates the vector of FixedFeatures for the specified + // channel. + int GetFixedFeatures(std::function allocate_indices, + std::function allocate_ids, + std::function allocate_weights, + int channel_id) const override; + + // Extracts and populates all FixedFeatures for all channels, advancing this + // component via the oracle until it is terminal. + int BulkGetFixedFeatures(const BulkFeatureExtractor &extractor) override; + + // Extracts and returns the vector of LinkFeatures for the specified + // channel. Note: these are NOT translated. 
+ std::vector GetRawLinkFeatures(int channel_id) const override; + + // Returns a vector of oracle labels for each element in the beam and + // batch. + std::vector> GetOracleLabels() const override; + + // Annotate the underlying data object with the results of this Component's + // calculation. + void FinalizeData() override; + + // Reset this component. + void ResetComponent() override; + + // Initializes the component for tracing execution. This will typically have + // the side effect of slowing down all subsequent Component calculations + // and storing a trace in memory that can be returned by GetTraceProtos(). + void InitializeTracing() override; + + // Disables tracing, freeing any additional memory and avoiding triggering + // additional computation in the future. + void DisableTracing() override; + + std::vector> GetTraceProtos() const override; + + void AddTranslatedLinkFeaturesToTrace( + const std::vector &features, int channel_id) override; + + private: + friend class SyntaxNetComponentTest; + friend class SyntaxNetTransitionStateTest; + + // Permission function for this component. + bool IsAllowed(SyntaxNetTransitionState *state, int action) const; + + // Returns true if this state is final + bool IsFinal(SyntaxNetTransitionState *state) const; + + // Oracle function for this component. + int GetOracleLabel(SyntaxNetTransitionState *state) const; + + // State advance function for this component. + void Advance(SyntaxNetTransitionState *state, int action, + Beam *beam); + + // Creates a new state for the given nlp_saft::SentenceExample. + std::unique_ptr CreateState( + SyntaxNetSentence *example); + + // Creates a newly initialized Beam. + std::unique_ptr> CreateBeam(int max_size); + + // Transition system. + std::unique_ptr transition_system_; + + // Label map for transition system. + const TermFrequencyMap *label_map_; + + // Extractor for fixed features + ParserEmbeddingFeatureExtractor feature_extractor_; + + // Extractor for linked features. 
+ SyntaxNetLinkFeatureExtractor link_feature_extractor_; + + // Internal workspace registry for use in feature extraction. + WorkspaceRegistry workspace_registry_; + + // Switch for simulating legacy parser behaviour. + bool rewrite_root_labels_; + + // The ComponentSpec used to initialize this component. + ComponentSpec spec_; + + // State search beams + std::vector>> batch_; + + // Current max beam size. + int max_beam_size_; + + // Underlying input data. + InputBatchCache *input_data_; + + // Whether or not to trace for each batch and beam element. + bool do_tracing_ = false; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_COMPONENTS_SYNTAXNET_SYNTAXNET_COMPONENT_H_ diff --git a/syntaxnet/dragnn/components/syntaxnet/syntaxnet_component_test.cc b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_component_test.cc new file mode 100644 index 0000000000000000000000000000000000000000..52441ad31572c9ebd2ac119e933eef1f3d74208d --- /dev/null +++ b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_component_test.cc @@ -0,0 +1,1273 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#include "dragnn/components/syntaxnet/syntaxnet_component.h" + +#include "dragnn/core/input_batch_cache.h" +#include "dragnn/core/test/generic.h" +#include "dragnn/core/test/mock_transition_state.h" +#include "dragnn/io/sentence_input_batch.h" +#include "syntaxnet/sentence.pb.h" +#include "tensorflow/core/lib/core/errors.h" +#include "tensorflow/core/lib/core/status.h" +#include "tensorflow/core/lib/io/path.h" +#include "tensorflow/core/platform/env.h" +#include "tensorflow/core/platform/protobuf.h" +#include "tensorflow/core/platform/test.h" + +// This test suite is intended to validate the contracts that the DRAGNN +// system expects from all transition state subclasses. Developers creating +// new TransitionStates should copy this test and modify it as necessary, +// using it to ensure their state conforms to DRAGNN expectations. + +namespace syntaxnet { +namespace dragnn { + +namespace { + +const char kSentence0[] = R"( +token { + word: "Sentence" start: 0 end: 7 tag: "NN" category: "NOUN" label: "ROOT" + break_level: NO_BREAK +} +token { + word: "0" start: 9 end: 9 head: 0 tag: "CD" category: "NUM" label: "num" + break_level: SPACE_BREAK +} +token { + word: "." start: 10 end: 10 head: 0 tag: "." category: "." label: "punct" + break_level: NO_BREAK +} +)"; + +const char kSentence1[] = R"( +token { + word: "Sentence" start: 0 end: 7 tag: "NN" category: "NOUN" label: "ROOT" + break_level: NO_BREAK +} +token { + word: "1" start: 9 end: 9 head: 0 tag: "CD" category: "NUM" label: "num" + break_level: SPACE_BREAK +} +token { + word: "." start: 10 end: 10 head: 0 tag: "." category: "." 
label: "punct" + break_level: NO_BREAK +} +)"; + +const char kLongSentence[] = R"( +token { + word: "Sentence" start: 0 end: 7 tag: "NN" category: "NOUN" label: "ROOT" + break_level: NO_BREAK +} +token { + word: "1" start: 9 end: 9 head: 0 tag: "CD" category: "NUM" label: "num" + break_level: SPACE_BREAK +} +token { + word: "2" start: 10 end: 10 head: 0 tag: "CD" category: "NUM" label: "num" + break_level: SPACE_BREAK +} +token { + word: "3" start: 11 end: 11 head: 0 tag: "CD" category: "NUM" label: "num" + break_level: SPACE_BREAK +} +token { + word: "." start: 12 end: 12 head: 0 tag: "." category: "." label: "punct" + break_level: NO_BREAK +} +)"; + +} // namespace + +using testing::Return; + +class SyntaxNetComponentTest : public ::testing::Test { + public: + std::unique_ptr CreateParser( + const std::vector> &states, + const std::vector &data) { + constexpr int kBeamSize = 2; + return CreateParserWithBeamSize(kBeamSize, states, data); + } + std::unique_ptr CreateParserWithBeamSize( + int beam_size, + const std::vector> &states, + const std::vector &data) { + // Get the master spec proto from the test data directory. + MasterSpec master_spec; + string file_name = tensorflow::io::JoinPath( + test::GetTestDataPrefix(), "dragnn/components/syntaxnet/testdata", + "master_spec.textproto"); + TF_CHECK_OK(tensorflow::ReadTextProto(tensorflow::Env::Default(), file_name, + &master_spec)); + + // Get all the resource protos from the test data directory. + for (Resource &resource : + *(master_spec.mutable_component(0)->mutable_resource())) { + resource.mutable_part(0)->set_file_pattern(tensorflow::io::JoinPath( + test::GetTestDataPrefix(), "dragnn/components/syntaxnet/testdata", + resource.part(0).file_pattern())); + } + + data_.reset(new InputBatchCache(data)); + + // Create a parser component with the specified beam size. 
+ std::unique_ptr parser_component( + new SyntaxNetComponent()); + parser_component->InitializeComponent(*(master_spec.mutable_component(0))); + parser_component->InitializeData(states, beam_size, data_.get()); + return parser_component; + } + + const std::vector *> GetBeams( + SyntaxNetComponent *component) const { + std::vector *> return_vector; + for (const auto &beam : component->batch_) { + return_vector.push_back(beam.get()); + } + return return_vector; + } + + std::unique_ptr data_; +}; + +TEST_F(SyntaxNetComponentTest, AdvancesFromOracleAndTerminates) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + auto test_parser = CreateParser({}, {sentence_0_str}); + constexpr int kNumTokensInSentence = 3; + + // The master spec will initialize a parser, so expect 2*N transitions. + constexpr int kExpectedNumTransitions = kNumTokensInSentence * 2; + for (int i = 0; i < kExpectedNumTransitions; ++i) { + EXPECT_FALSE(test_parser->IsTerminal()); + test_parser->AdvanceFromOracle(); + } + + // At this point, the test parser should be terminal. + EXPECT_TRUE(test_parser->IsTerminal()); + + // Check that the component is reporting 2N steps taken. + EXPECT_EQ(test_parser->StepsTaken(0), kExpectedNumTransitions); + + // Make sure the parser doesn't segfault. + test_parser->FinalizeData(); +} + +TEST_F(SyntaxNetComponentTest, AdvancesFromPredictionAndTerminates) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + auto test_parser = CreateParser({}, {sentence_0_str}); + constexpr int kNumTokensInSentence = 3; + + // The master spec will initialize a parser, so expect 2*N transitions. 
+ constexpr int kExpectedNumTransitions = kNumTokensInSentence * 2; + + // There are 93 possible transitions for any given state. Create a transition + // array with a score of 10.0 for each transition. + constexpr int kBeamSize = 2; + constexpr int kNumPossibleTransitions = 93; + constexpr float kTransitionValue = 10.0; + float transition_matrix[kNumPossibleTransitions * kBeamSize]; + for (int i = 0; i < kNumPossibleTransitions * kBeamSize; ++i) { + transition_matrix[i] = kTransitionValue; + } + + // Transition the expected number of times. + for (int i = 0; i < kExpectedNumTransitions; ++i) { + EXPECT_FALSE(test_parser->IsTerminal()); + test_parser->AdvanceFromPrediction(transition_matrix, + kNumPossibleTransitions * kBeamSize); + } + + // At this point, the test parser should be terminal. + EXPECT_TRUE(test_parser->IsTerminal()); + + // Check that the component is reporting 2N steps taken. + EXPECT_EQ(test_parser->StepsTaken(0), kExpectedNumTransitions); + + // Prepare to validate the batched beams. + auto beam = test_parser->GetBeam(); + + // All beams should only have one element. + for (const auto &per_beam : beam) { + EXPECT_EQ(per_beam.size(), 1); + } + + // The final states should have kExpectedNumTransitions * kTransitionValue. + EXPECT_EQ(beam.at(0).at(0)->GetScore(), + kTransitionValue * kExpectedNumTransitions); + + // Make sure the parser doesn't segfault. + test_parser->FinalizeData(); + + // TODO(googleuser): What should the finalized data look like? 
+} + +TEST_F(SyntaxNetComponentTest, RetainsPassedTransitionStateData) { + // Create and initialize the states. + MockTransitionState mock_state_one; + constexpr int kParentBeamIndexOne = 1138; + constexpr float kParentScoreOne = 7.2; + EXPECT_CALL(mock_state_one, GetBeamIndex()) + .WillRepeatedly(Return(kParentBeamIndexOne)); + EXPECT_CALL(mock_state_one, GetScore()) + .WillRepeatedly(Return(kParentScoreOne)); + + MockTransitionState mock_state_two; + constexpr int kParentBeamIndexTwo = 1123; + constexpr float kParentScoreTwo = 42.03; + EXPECT_CALL(mock_state_two, GetBeamIndex()) + .WillRepeatedly(Return(kParentBeamIndexTwo)); + EXPECT_CALL(mock_state_two, GetScore()) + .WillRepeatedly(Return(kParentScoreTwo)); + + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + auto test_parser = + CreateParser({{&mock_state_one, &mock_state_two}}, {sentence_0_str}); + constexpr int kNumTokensInSentence = 3; + + // The master spec will initialize a parser, so expect 2*N transitions. + constexpr int kExpectedNumTransitions = kNumTokensInSentence * 2; + + // There are 93 possible transitions for any given state. Create a transition + // array with a score of 10.0 for each transition. + constexpr int kBeamSize = 2; + constexpr int kNumPossibleTransitions = 93; + constexpr float kTransitionValue = 10.0; + float transition_matrix[kNumPossibleTransitions * kBeamSize]; + for (int i = 0; i < kNumPossibleTransitions * kBeamSize; ++i) { + transition_matrix[i] = kTransitionValue; + } + + // Transition the expected number of times. + for (int i = 0; i < kExpectedNumTransitions; ++i) { + EXPECT_FALSE(test_parser->IsTerminal()); + test_parser->AdvanceFromPrediction(transition_matrix, + kNumPossibleTransitions * kBeamSize); + } + + // At this point, the test parser should be terminal. 
+ EXPECT_TRUE(test_parser->IsTerminal()); + + // Check that the component is reporting 2N steps taken. + EXPECT_EQ(test_parser->StepsTaken(0), kExpectedNumTransitions); + + // The final states should have kExpectedNumTransitions * kTransitionValue, + // plus the higher parent state score (from state two). + auto beam = test_parser->GetBeam(); + EXPECT_EQ(beam.at(0).at(0)->GetScore(), + kTransitionValue * kExpectedNumTransitions + kParentScoreTwo); + + // Make sure that the parent state is reported correctly. + EXPECT_EQ(test_parser->GetSourceBeamIndex(0, 0), kParentBeamIndexTwo); + + // Make sure the parser doesn't segfault. + test_parser->FinalizeData(); + + // TODO(googleuser): What should the finalized data look like? +} + +TEST_F(SyntaxNetComponentTest, AdvancesFromPredictionForMultiSentenceBatches) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + Sentence sentence_1; + TextFormat::ParseFromString(kSentence1, &sentence_1); + string sentence_1_str; + sentence_1.SerializeToString(&sentence_1_str); + + auto test_parser = CreateParser({}, {sentence_0_str, sentence_1_str}); + constexpr int kNumTokensInSentence = 3; + + // The master spec will initialize a parser, so expect 2*N transitions. + constexpr int kExpectedNumTransitions = kNumTokensInSentence * 2; + + // There are 93 possible transitions for any given state. Create a transition + // array with a score of 10.0 for each transition. + constexpr int kBatchSize = 2; + constexpr int kBeamSize = 2; + constexpr int kNumPossibleTransitions = 93; + constexpr float kTransitionValue = 10.0; + float transition_matrix[kNumPossibleTransitions * kBeamSize * kBatchSize]; + for (int i = 0; i < kNumPossibleTransitions * kBeamSize * kBatchSize; ++i) { + transition_matrix[i] = kTransitionValue; + } + + // Transition the expected number of times. 
+ for (int i = 0; i < kExpectedNumTransitions; ++i) { + EXPECT_FALSE(test_parser->IsTerminal()); + test_parser->AdvanceFromPrediction( + transition_matrix, kNumPossibleTransitions * kBeamSize * kBatchSize); + } + + // At this point, the test parser should be terminal. + EXPECT_TRUE(test_parser->IsTerminal()); + + // Check that the component is reporting 2N steps taken. + EXPECT_EQ(test_parser->StepsTaken(0), kExpectedNumTransitions); + EXPECT_EQ(test_parser->StepsTaken(1), kExpectedNumTransitions); + + // The final states should have kExpectedNumTransitions * kTransitionValue. + auto beam = test_parser->GetBeam(); + EXPECT_EQ(beam.at(0).at(0)->GetScore(), + kTransitionValue * kExpectedNumTransitions); + EXPECT_EQ(beam.at(1).at(0)->GetScore(), + kTransitionValue * kExpectedNumTransitions); + + // Make sure the parser doesn't segfault. + test_parser->FinalizeData(); + + // TODO(googleuser): What should the finalized data look like? +} + +TEST_F(SyntaxNetComponentTest, + AdvancesFromPredictionForVaryingLengthSentences) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + Sentence long_sentence; + TextFormat::ParseFromString(kLongSentence, &long_sentence); + string long_sentence_str; + long_sentence.SerializeToString(&long_sentence_str); + + auto test_parser = CreateParser({}, {sentence_0_str, long_sentence_str}); + constexpr int kNumTokensInSentence = 3; + constexpr int kNumTokensInLongSentence = 5; + + // There are 93 possible transitions for any given state. Create a transition + // array with a score of 10.0 for each transition. 
+ constexpr int kBatchSize = 2; + constexpr int kBeamSize = 2; + constexpr int kNumPossibleTransitions = 93; + constexpr float kTransitionValue = 10.0; + float transition_matrix[kNumPossibleTransitions * kBeamSize * kBatchSize]; + for (int i = 0; i < kNumPossibleTransitions * kBeamSize * kBatchSize; ++i) { + transition_matrix[i] = kTransitionValue; + } + + // Transition the expected number of times. + constexpr int kExpectedNumTransitions = kNumTokensInLongSentence * 2; + for (int i = 0; i < kExpectedNumTransitions; ++i) { + EXPECT_FALSE(test_parser->IsTerminal()); + test_parser->AdvanceFromPrediction( + transition_matrix, kNumPossibleTransitions * kBeamSize * kBatchSize); + } + + // At this point, the test parser should be terminal. + EXPECT_TRUE(test_parser->IsTerminal()); + + // Check that the component is reporting 2N steps taken. + EXPECT_EQ(test_parser->StepsTaken(0), kNumTokensInSentence * 2); + EXPECT_EQ(test_parser->StepsTaken(1), kNumTokensInLongSentence * 2); + + // The final states should have kExpectedNumTransitions * kTransitionValue. + auto beam = test_parser->GetBeam(); + + // The first sentence is shorter, so it should have a lower final score. + EXPECT_EQ(beam.at(0).at(0)->GetScore(), + kTransitionValue * kNumTokensInSentence * 2); + EXPECT_EQ(beam.at(1).at(0)->GetScore(), + kTransitionValue * kNumTokensInLongSentence * 2); + + // Make sure the parser doesn't segfault. + test_parser->FinalizeData(); + + // TODO(googleuser): What should the finalized data look like? +} + +TEST_F(SyntaxNetComponentTest, ResetAllowsReductionInBatchSize) { + // Create an empty input batch and beam vector to initialize the parser. 
+ Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + Sentence long_sentence; + TextFormat::ParseFromString(kLongSentence, &long_sentence); + string long_sentence_str; + long_sentence.SerializeToString(&long_sentence_str); + + // Get the master spec proto from the test data directory. + MasterSpec master_spec; + string file_name = tensorflow::io::JoinPath( + test::GetTestDataPrefix(), "dragnn/components/syntaxnet/testdata", + "master_spec.textproto"); + TF_CHECK_OK(tensorflow::ReadTextProto(tensorflow::Env::Default(), file_name, + &master_spec)); + + // Get all the resource protos from the test data directory. + for (Resource &resource : + *(master_spec.mutable_component(0)->mutable_resource())) { + resource.mutable_part(0)->set_file_pattern(tensorflow::io::JoinPath( + test::GetTestDataPrefix(), "dragnn/components/syntaxnet/testdata", + resource.part(0).file_pattern())); + } + + // Create an input batch cache with a large batch size. + constexpr int kBeamSize = 2; + std::unique_ptr large_batch_data(new InputBatchCache( + {sentence_0_str, sentence_0_str, sentence_0_str, sentence_0_str})); + std::unique_ptr parser_component( + new SyntaxNetComponent()); + parser_component->InitializeComponent(*(master_spec.mutable_component(0))); + parser_component->InitializeData({}, kBeamSize, large_batch_data.get()); + + // Reset the component and pass in a new input batch that is smaller. + parser_component->ResetComponent(); + std::unique_ptr small_batch_data(new InputBatchCache( + {long_sentence_str, long_sentence_str, long_sentence_str})); + parser_component->InitializeData({}, kBeamSize, small_batch_data.get()); + + // There are 93 possible transitions for any given state. Create a transition + // array with a score of 10.0 for each transition. 
+ constexpr int kBatchSize = 3; + constexpr int kNumPossibleTransitions = 93; + constexpr float kTransitionValue = 10.0; + float transition_matrix[kNumPossibleTransitions * kBeamSize * kBatchSize]; + for (int i = 0; i < kNumPossibleTransitions * kBeamSize * kBatchSize; ++i) { + transition_matrix[i] = kTransitionValue; + } + + // Transition the expected number of times. + constexpr int kNumTokensInSentence = 5; + constexpr int kExpectedNumTransitions = kNumTokensInSentence * 2; + for (int i = 0; i < kExpectedNumTransitions; ++i) { + EXPECT_FALSE(parser_component->IsTerminal()); + parser_component->AdvanceFromPrediction( + transition_matrix, kNumPossibleTransitions * kBeamSize * kBatchSize); + } + + // At this point, the test parser should be terminal. + EXPECT_TRUE(parser_component->IsTerminal()); + + // Check that the component is reporting 2N steps taken. + EXPECT_EQ(parser_component->StepsTaken(0), kExpectedNumTransitions); + EXPECT_EQ(parser_component->StepsTaken(1), kExpectedNumTransitions); + EXPECT_EQ(parser_component->StepsTaken(2), kExpectedNumTransitions); + + // The final states should have kExpectedNumTransitions * kTransitionValue. + auto beam = parser_component->GetBeam(); + + // The beam should be of batch size 3. + EXPECT_EQ(beam.size(), 3); + EXPECT_EQ(beam.at(0).at(0)->GetScore(), + kTransitionValue * kExpectedNumTransitions); + EXPECT_EQ(beam.at(1).at(0)->GetScore(), + kTransitionValue * kExpectedNumTransitions); + EXPECT_EQ(beam.at(2).at(0)->GetScore(), + kTransitionValue * kExpectedNumTransitions); + + // Make sure the parser doesn't segfault. + parser_component->FinalizeData(); +} + +TEST_F(SyntaxNetComponentTest, ResetAllowsIncreaseInBatchSize) { + // Create an empty input batch and beam vector to initialize the parser. 
+ Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + Sentence long_sentence; + TextFormat::ParseFromString(kLongSentence, &long_sentence); + string long_sentence_str; + long_sentence.SerializeToString(&long_sentence_str); + + // Get the master spec proto from the test data directory. + MasterSpec master_spec; + string file_name = tensorflow::io::JoinPath( + test::GetTestDataPrefix(), "dragnn/components/syntaxnet/testdata", + "master_spec.textproto"); + TF_CHECK_OK(tensorflow::ReadTextProto(tensorflow::Env::Default(), file_name, + &master_spec)); + + // Get all the resource protos from the test data directory. + for (Resource &resource : + *(master_spec.mutable_component(0)->mutable_resource())) { + resource.mutable_part(0)->set_file_pattern(tensorflow::io::JoinPath( + test::GetTestDataPrefix(), "dragnn/components/syntaxnet/testdata", + resource.part(0).file_pattern())); + } + + // Create an input batch cache with a small batch size. + constexpr int kBeamSize = 2; + std::unique_ptr small_batch_data( + new InputBatchCache(sentence_0_str)); + std::unique_ptr parser_component( + new SyntaxNetComponent()); + parser_component->InitializeComponent(*(master_spec.mutable_component(0))); + parser_component->InitializeData({}, kBeamSize, small_batch_data.get()); + + // Reset the component and pass in a new input batch that is larger. + parser_component->ResetComponent(); + std::unique_ptr large_batch_data(new InputBatchCache( + {long_sentence_str, long_sentence_str, long_sentence_str})); + parser_component->InitializeData({}, kBeamSize, large_batch_data.get()); + + // There are 93 possible transitions for any given state. Create a transition + // array with a score of 10.0 for each transition. 
+ constexpr int kBatchSize = 3; + constexpr int kNumPossibleTransitions = 93; + constexpr float kTransitionValue = 10.0; + float transition_matrix[kNumPossibleTransitions * kBeamSize * kBatchSize]; + for (int i = 0; i < kNumPossibleTransitions * kBeamSize * kBatchSize; ++i) { + transition_matrix[i] = kTransitionValue; + } + + // Transition the expected number of times. + constexpr int kNumTokensInSentence = 5; + constexpr int kExpectedNumTransitions = kNumTokensInSentence * 2; + for (int i = 0; i < kExpectedNumTransitions; ++i) { + EXPECT_FALSE(parser_component->IsTerminal()); + parser_component->AdvanceFromPrediction( + transition_matrix, kNumPossibleTransitions * kBeamSize * kBatchSize); + } + + // At this point, the test parser should be terminal. + EXPECT_TRUE(parser_component->IsTerminal()); + + // Check that the component is reporting 2N steps taken. + EXPECT_EQ(parser_component->StepsTaken(0), kExpectedNumTransitions); + EXPECT_EQ(parser_component->StepsTaken(1), kExpectedNumTransitions); + EXPECT_EQ(parser_component->StepsTaken(2), kExpectedNumTransitions); + + // The final states should have kExpectedNumTransitions * kTransitionValue. + auto beam = parser_component->GetBeam(); + + // The beam should be of batch size 3. + EXPECT_EQ(beam.size(), 3); + EXPECT_EQ(beam.at(0).at(0)->GetScore(), + kTransitionValue * kExpectedNumTransitions); + EXPECT_EQ(beam.at(1).at(0)->GetScore(), + kTransitionValue * kExpectedNumTransitions); + EXPECT_EQ(beam.at(2).at(0)->GetScore(), + kTransitionValue * kExpectedNumTransitions); + + // Make sure the parser doesn't segfault. + parser_component->FinalizeData(); +} + +TEST_F(SyntaxNetComponentTest, ResetCausesBeamToReset) { + // Create an empty input batch and beam vector to initialize the parser. 
+ Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + Sentence long_sentence; + TextFormat::ParseFromString(kLongSentence, &long_sentence); + string long_sentence_str; + long_sentence.SerializeToString(&long_sentence_str); + + auto test_parser = CreateParser({}, {sentence_0_str}); + constexpr int kNumTokensInSentence = 3; + + // The master spec will initialize a parser, so expect 2*N transitions. + constexpr int kExpectedNumTransitions = kNumTokensInSentence * 2; + + // There are 93 possible transitions for any given state. Create a transition + // array with a score of 10.0 for each transition. + constexpr int kBeamSize = 2; + constexpr int kNumPossibleTransitions = 93; + constexpr float kTransitionValue = 10.0; + float transition_matrix[kNumPossibleTransitions * kBeamSize]; + for (int i = 0; i < kNumPossibleTransitions * kBeamSize; ++i) { + transition_matrix[i] = kTransitionValue; + } + + // Transition the expected number of times. + for (int i = 0; i < kExpectedNumTransitions; ++i) { + EXPECT_FALSE(test_parser->IsTerminal()); + test_parser->AdvanceFromPrediction(transition_matrix, + kNumPossibleTransitions * kBeamSize); + } + + // At this point, the test parser should be terminal. + EXPECT_TRUE(test_parser->IsTerminal()); + + // Check that the component is reporting 2N steps taken. + EXPECT_EQ(test_parser->StepsTaken(0), kExpectedNumTransitions); + + // The final states should have kExpectedNumTransitions * kTransitionValue. + auto beam = test_parser->GetBeam(); + EXPECT_EQ(beam.at(0).at(0)->GetScore(), + kTransitionValue * kExpectedNumTransitions); + + // Reset the test parser and give it new data. + test_parser->ResetComponent(); + std::unique_ptr new_data( + new InputBatchCache(long_sentence_str)); + test_parser->InitializeData({}, kBeamSize, new_data.get()); + + // Check that the component is not terminal. 
+ EXPECT_FALSE(test_parser->IsTerminal()); + + // Check that the component is reporting 0 steps taken. + EXPECT_EQ(test_parser->StepsTaken(0), 0); + + // The states should have 0 as their score. + auto new_beam = test_parser->GetBeam(); + EXPECT_EQ(new_beam.at(0).at(0)->GetScore(), 0); +} + +TEST_F(SyntaxNetComponentTest, AdjustingMaxBeamSizeAdjustsSizeForAllBeams) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + Sentence long_sentence; + TextFormat::ParseFromString(kLongSentence, &long_sentence); + string long_sentence_str; + long_sentence.SerializeToString(&long_sentence_str); + + // Get the master spec proto from the test data directory. + MasterSpec master_spec; + string file_name = tensorflow::io::JoinPath( + test::GetTestDataPrefix(), "dragnn/components/syntaxnet/testdata", + "master_spec.textproto"); + TF_CHECK_OK(tensorflow::ReadTextProto(tensorflow::Env::Default(), file_name, + &master_spec)); + + // Get all the resource protos from the test data directory. + for (Resource &resource : + *(master_spec.mutable_component(0)->mutable_resource())) { + resource.mutable_part(0)->set_file_pattern(tensorflow::io::JoinPath( + test::GetTestDataPrefix(), "dragnn/components/syntaxnet/testdata", + resource.part(0).file_pattern())); + } + + // Create an input batch cache with a small batch size. + constexpr int kBeamSize = 2; + std::unique_ptr<InputBatchCache> small_batch_data( + new InputBatchCache(sentence_0_str)); + std::unique_ptr<SyntaxNetComponent> parser_component( + new SyntaxNetComponent()); + parser_component->InitializeComponent(*(master_spec.mutable_component(0))); + parser_component->InitializeData({}, kBeamSize, small_batch_data.get()); + + // Make sure all the beams in the batch have max size 2. 
+ for (const auto &beam : GetBeams(parser_component.get())) { + EXPECT_EQ(beam->max_size(), kBeamSize); + } + + // Reset the component and pass in a new input batch that is larger, with + // a higher beam size. + constexpr int kNewBeamSize = 5; + parser_component->ResetComponent(); + std::unique_ptr<InputBatchCache> large_batch_data(new InputBatchCache( + {long_sentence_str, long_sentence_str, long_sentence_str})); + parser_component->InitializeData({}, kNewBeamSize, large_batch_data.get()); + + // Make sure all the beams in the batch now have max size 5. + for (const auto &beam : GetBeams(parser_component.get())) { + EXPECT_EQ(beam->max_size(), kNewBeamSize); + } +} + +TEST_F(SyntaxNetComponentTest, SettingBeamSizeZeroFails) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + Sentence long_sentence; + TextFormat::ParseFromString(kLongSentence, &long_sentence); + string long_sentence_str; + long_sentence.SerializeToString(&long_sentence_str); + + // Get the master spec proto from the test data directory. + MasterSpec master_spec; + string file_name = tensorflow::io::JoinPath( + test::GetTestDataPrefix(), "dragnn/components/syntaxnet/testdata", + "master_spec.textproto"); + TF_CHECK_OK(tensorflow::ReadTextProto(tensorflow::Env::Default(), file_name, + &master_spec)); + + // Get all the resource protos from the test data directory. + for (Resource &resource : + *(master_spec.mutable_component(0)->mutable_resource())) { + resource.mutable_part(0)->set_file_pattern(tensorflow::io::JoinPath( + test::GetTestDataPrefix(), "dragnn/components/syntaxnet/testdata", + resource.part(0).file_pattern())); + } + + // Create an input batch cache with a small batch size. 
+ constexpr int kBeamSize = 0; + std::unique_ptr<InputBatchCache> small_batch_data( + new InputBatchCache(sentence_0_str)); + std::unique_ptr<SyntaxNetComponent> parser_component( + new SyntaxNetComponent()); + parser_component->InitializeComponent(*(master_spec.mutable_component(0))); + EXPECT_DEATH( + parser_component->InitializeData({}, kBeamSize, small_batch_data.get()), + "must be greater than 0"); +} + +TEST_F(SyntaxNetComponentTest, ExportsFixedFeaturesWithPadding) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + Sentence sentence_1; + TextFormat::ParseFromString(kSentence1, &sentence_1); + string sentence_1_str; + sentence_1.SerializeToString(&sentence_1_str); + + constexpr int kBeamSize = 3; + + auto test_parser = + CreateParserWithBeamSize(kBeamSize, {}, {sentence_0_str, sentence_1_str}); + + // Get and check the raw link features. + vector<int32> indices; + auto indices_fn = [&indices](int size) { + indices.resize(size); + return indices.data(); + }; + vector<int64> ids; + auto ids_fn = [&ids](int size) { + ids.resize(size); + return ids.data(); + }; + vector<float> weights; + auto weights_fn = [&weights](int size) { + weights.resize(size); + return weights.data(); + }; + constexpr int kChannelId = 0; + const int num_features = + test_parser->GetFixedFeatures(indices_fn, ids_fn, weights_fn, kChannelId); + + // The raw features for each beam object should be [single, single]. + // There is also padding expected in this beam - there is only one + // element in each beam (so two elements total; batch is two). Thus, we expect + // 0,1 and 6,7 to be filled with one element each. 
+ constexpr int kExpectedOutputSize = 4; + const vector<int32> expected_indices({0, 1, 6, 7}); + const vector<int64> expected_ids({0, 12, 0, 12}); + const vector<float> expected_weights({1.0, 1.0, 1.0, 1.0}); + + EXPECT_EQ(expected_indices.size(), kExpectedOutputSize); + EXPECT_EQ(expected_ids.size(), kExpectedOutputSize); + EXPECT_EQ(expected_weights.size(), kExpectedOutputSize); + EXPECT_EQ(num_features, kExpectedOutputSize); + + EXPECT_EQ(expected_indices, indices); + EXPECT_EQ(expected_ids, ids); + EXPECT_EQ(expected_weights, weights); +} + +TEST_F(SyntaxNetComponentTest, ExportsFixedFeatures) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + Sentence sentence_1; + TextFormat::ParseFromString(kSentence1, &sentence_1); + string sentence_1_str; + sentence_1.SerializeToString(&sentence_1_str); + + constexpr int kBeamSize = 3; + + auto test_parser = + CreateParserWithBeamSize(kBeamSize, {}, {sentence_0_str, sentence_1_str}); + + // There are 93 possible transitions for any given state. Create a transition + // array with a score of 10.0 for each transition. + constexpr int kBatchSize = 2; + constexpr int kNumPossibleTransitions = 93; + constexpr float kTransitionValue = 10.0; + float transition_matrix[kNumPossibleTransitions * kBeamSize * kBatchSize]; + for (int i = 0; i < kNumPossibleTransitions * kBeamSize * kBatchSize; ++i) { + transition_matrix[i] = kTransitionValue; + } + + // Advance twice, so that the underlying parser fills the beam. + test_parser->AdvanceFromPrediction( + transition_matrix, kNumPossibleTransitions * kBeamSize * kBatchSize); + test_parser->AdvanceFromPrediction( + transition_matrix, kNumPossibleTransitions * kBeamSize * kBatchSize); + + // Get and check the raw link features. 
+ vector<int32> indices; + auto indices_fn = [&indices](int size) { + indices.resize(size); + return indices.data(); + }; + vector<int64> ids; + auto ids_fn = [&ids](int size) { + ids.resize(size); + return ids.data(); + }; + vector<float> weights; + auto weights_fn = [&weights](int size) { + weights.resize(size); + return weights.data(); + }; + constexpr int kChannelId = 0; + const int num_features = + test_parser->GetFixedFeatures(indices_fn, ids_fn, weights_fn, kChannelId); + + constexpr int kExpectedOutputSize = 12; + const vector<int32> expected_indices({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}); + const vector<int64> expected_ids({7, 50, 12, 7, 12, 7, 7, 50, 12, 7, 12, 7}); + const vector<float> expected_weights( + {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0}); + + EXPECT_EQ(expected_indices.size(), kExpectedOutputSize); + EXPECT_EQ(expected_ids.size(), kExpectedOutputSize); + EXPECT_EQ(expected_weights.size(), kExpectedOutputSize); + EXPECT_EQ(num_features, kExpectedOutputSize); + + EXPECT_EQ(expected_indices, indices); + EXPECT_EQ(expected_ids, ids); + EXPECT_EQ(expected_weights, weights); +} + +TEST_F(SyntaxNetComponentTest, AdvancesAccordingToHighestWeightedInputOption) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + Sentence sentence_1; + TextFormat::ParseFromString(kSentence1, &sentence_1); + string sentence_1_str; + sentence_1.SerializeToString(&sentence_1_str); + + constexpr int kBeamSize = 3; + + auto test_parser = + CreateParserWithBeamSize(kBeamSize, {}, {sentence_0_str, sentence_1_str}); + + // There are 93 possible transitions for any given state. Create a transition + // array with a score of 10.0 for each transition. 
+ constexpr int kBatchSize = 2; + constexpr int kNumPossibleTransitions = 93; + constexpr float kTransitionValue = 10.0; + float transition_matrix[kNumPossibleTransitions * kBeamSize * kBatchSize]; + for (int i = 0; i < kNumPossibleTransitions * kBeamSize * kBatchSize; ++i) { + transition_matrix[i] = kTransitionValue; + } + + // Replace the first several options with varying scores to test sorting. + constexpr int kBatchOffset = kNumPossibleTransitions * kBeamSize; + transition_matrix[0] = 3 * kTransitionValue; + transition_matrix[1] = 3 * kTransitionValue; + transition_matrix[2] = 4 * kTransitionValue; + transition_matrix[3] = 4 * kTransitionValue; + transition_matrix[4] = 2 * kTransitionValue; + transition_matrix[5] = 2 * kTransitionValue; + transition_matrix[kBatchOffset + 0] = 3 * kTransitionValue; + transition_matrix[kBatchOffset + 1] = 3 * kTransitionValue; + transition_matrix[kBatchOffset + 2] = 4 * kTransitionValue; + transition_matrix[kBatchOffset + 3] = 4 * kTransitionValue; + transition_matrix[kBatchOffset + 4] = 2 * kTransitionValue; + transition_matrix[kBatchOffset + 5] = 2 * kTransitionValue; + + // Advance twice, so that the underlying parser fills the beam. + test_parser->AdvanceFromPrediction( + transition_matrix, kNumPossibleTransitions * kBeamSize * kBatchSize); + test_parser->AdvanceFromPrediction( + transition_matrix, kNumPossibleTransitions * kBeamSize * kBatchSize); + + // Get and check the raw link features. + vector<int32> indices; + auto indices_fn = [&indices](int size) { + indices.resize(size); + return indices.data(); + }; + vector<int64> ids; + auto ids_fn = [&ids](int size) { + ids.resize(size); + return ids.data(); + }; + vector<float> weights; + auto weights_fn = [&weights](int size) { + weights.resize(size); + return weights.data(); + }; + constexpr int kChannelId = 0; + const int num_features = + test_parser->GetFixedFeatures(indices_fn, ids_fn, weights_fn, kChannelId); + + // In this case, all even features and all odd features are identical. 
+ constexpr int kExpectedOutputSize = 12; + const vector<int32> expected_indices({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}); + const vector<int64> expected_ids({12, 7, 7, 50, 12, 7, 12, 7, 7, 50, 12, 7}); + const vector<float> expected_weights( + {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0}); + + EXPECT_EQ(expected_indices.size(), kExpectedOutputSize); + EXPECT_EQ(expected_ids.size(), kExpectedOutputSize); + EXPECT_EQ(expected_weights.size(), kExpectedOutputSize); + EXPECT_EQ(num_features, kExpectedOutputSize); + + EXPECT_EQ(expected_indices, indices); + EXPECT_EQ(expected_ids, ids); + EXPECT_EQ(expected_weights, weights); +} + +TEST_F(SyntaxNetComponentTest, ExportsBulkFixedFeatures) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + Sentence sentence_1; + TextFormat::ParseFromString(kSentence1, &sentence_1); + string sentence_1_str; + sentence_1.SerializeToString(&sentence_1_str); + + constexpr int kBeamSize = 3; + auto test_parser = + CreateParserWithBeamSize(kBeamSize, {}, {sentence_0_str, sentence_1_str}); + + // Get and check the raw link features. + vector<vector<int32>> indices; + auto indices_fn = [&indices](int channel, int size) { + indices.resize(channel + 1); + indices[channel].resize(size); + return indices[channel].data(); + }; + vector<vector<int64>> ids; + auto ids_fn = [&ids](int channel, int size) { + ids.resize(channel + 1); + ids[channel].resize(size); + return ids[channel].data(); + }; + vector<vector<float>> weights; + auto weights_fn = [&weights](int channel, int size) { + weights.resize(channel + 1); + weights[channel].resize(size); + return weights[channel].data(); + }; + + BulkFeatureExtractor extractor(indices_fn, ids_fn, weights_fn); + const int num_steps = test_parser->BulkGetFixedFeatures(extractor); + + // There should be 6 steps (2N, where N is the longest number of tokens). 
+ EXPECT_EQ(num_steps, 6); + + // These are empirically derived. + const vector<int32> expected_ch0_indices({0, 36, 18, 54, 1, 37, 19, 55, + 2, 38, 20, 56, 3, 39, 21, 57, + 4, 40, 22, 58, 5, 41, 23, 59}); + const vector<int64> expected_ch0_ids({0, 12, 0, 12, 12, 7, 12, 7, + 7, 50, 7, 50, 7, 50, 7, 50, + 50, 50, 50, 50, 50, 50, 50, 50}); + const vector<float> expected_ch0_weights( + {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}); + const vector<int32> expected_ch1_indices( + {0, 36, 72, 18, 54, 90, 1, 37, 73, 19, 55, 91, 2, 38, 74, 20, 56, 92, + 3, 39, 75, 21, 57, 93, 4, 40, 76, 22, 58, 94, 5, 41, 77, 23, 59, 95}); + const vector<int64> expected_ch1_ids( + {51, 0, 12, 51, 0, 12, 0, 12, 7, 0, 12, 7, 12, 7, 50, 12, 7, 50, + 12, 7, 50, 12, 7, 50, 7, 50, 50, 7, 50, 50, 7, 50, 50, 7, 50, 50}); + const vector<float> expected_ch1_weights( + {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}); + + EXPECT_EQ(indices[0], expected_ch0_indices); + EXPECT_EQ(ids[0], expected_ch0_ids); + EXPECT_EQ(weights[0], expected_ch0_weights); + EXPECT_EQ(indices[1], expected_ch1_indices); + EXPECT_EQ(ids[1], expected_ch1_ids); + EXPECT_EQ(weights[1], expected_ch1_weights); +} + +TEST_F(SyntaxNetComponentTest, ExportsRawLinkFeaturesWithPadding) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + Sentence sentence_1; + TextFormat::ParseFromString(kSentence1, &sentence_1); + string sentence_1_str; + sentence_1.SerializeToString(&sentence_1_str); + + constexpr int kBeamSize = 3; + constexpr int kBatchSize = 2; + auto test_parser = + CreateParserWithBeamSize(kBeamSize, {}, {sentence_0_str, sentence_1_str}); + + // Get and check the raw link features. 
+ constexpr int kNumLinkFeatures = 2; + auto link_features = test_parser->GetRawLinkFeatures(0); + EXPECT_EQ(link_features.size(), kBeamSize * kBatchSize * kNumLinkFeatures); + + EXPECT_EQ(link_features.at(0).feature_value(), -1); + EXPECT_EQ(link_features.at(0).batch_idx(), 0); + EXPECT_EQ(link_features.at(0).beam_idx(), 0); + + EXPECT_EQ(link_features.at(1).feature_value(), -2); + EXPECT_EQ(link_features.at(1).batch_idx(), 0); + EXPECT_EQ(link_features.at(1).beam_idx(), 0); + + // These are padding, so we do not expect them to have a feature value. + EXPECT_FALSE(link_features.at(2).has_feature_value()); + EXPECT_FALSE(link_features.at(2).has_batch_idx()); + EXPECT_FALSE(link_features.at(2).has_beam_idx()); + EXPECT_FALSE(link_features.at(3).has_feature_value()); + EXPECT_FALSE(link_features.at(3).has_batch_idx()); + EXPECT_FALSE(link_features.at(3).has_beam_idx()); + EXPECT_FALSE(link_features.at(4).has_feature_value()); + EXPECT_FALSE(link_features.at(4).has_batch_idx()); + EXPECT_FALSE(link_features.at(4).has_beam_idx()); + EXPECT_FALSE(link_features.at(5).has_feature_value()); + EXPECT_FALSE(link_features.at(5).has_batch_idx()); + EXPECT_FALSE(link_features.at(5).has_beam_idx()); + + EXPECT_EQ(link_features.at(6).feature_value(), -1); + EXPECT_EQ(link_features.at(6).batch_idx(), 1); + EXPECT_EQ(link_features.at(6).beam_idx(), 0); + + EXPECT_EQ(link_features.at(7).feature_value(), -2); + EXPECT_EQ(link_features.at(7).batch_idx(), 1); + EXPECT_EQ(link_features.at(7).beam_idx(), 0); + + // These are padding, so we do not expect them to have a feature value. 
+ EXPECT_FALSE(link_features.at(8).has_feature_value()); + EXPECT_FALSE(link_features.at(8).has_batch_idx()); + EXPECT_FALSE(link_features.at(8).has_beam_idx()); + EXPECT_FALSE(link_features.at(9).has_feature_value()); + EXPECT_FALSE(link_features.at(9).has_batch_idx()); + EXPECT_FALSE(link_features.at(9).has_beam_idx()); + EXPECT_FALSE(link_features.at(10).has_feature_value()); + EXPECT_FALSE(link_features.at(10).has_batch_idx()); + EXPECT_FALSE(link_features.at(10).has_beam_idx()); + EXPECT_FALSE(link_features.at(11).has_feature_value()); + EXPECT_FALSE(link_features.at(11).has_batch_idx()); + EXPECT_FALSE(link_features.at(11).has_beam_idx()); +} + +TEST_F(SyntaxNetComponentTest, ExportsRawLinkFeatures) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + Sentence sentence_1; + TextFormat::ParseFromString(kSentence1, &sentence_1); + string sentence_1_str; + sentence_1.SerializeToString(&sentence_1_str); + + constexpr int kBeamSize = 3; + auto test_parser = + CreateParserWithBeamSize(kBeamSize, {}, {sentence_0_str, sentence_1_str}); + + // There are 93 possible transitions for any given state. Create a transition + // array with a score of 10.0 for each transition. + constexpr int kBatchSize = 2; + constexpr int kNumPossibleTransitions = 93; + constexpr float kTransitionValue = 10.0; + float transition_matrix[kNumPossibleTransitions * kBeamSize * kBatchSize]; + for (int i = 0; i < kNumPossibleTransitions * kBeamSize * kBatchSize; ++i) { + transition_matrix[i] = kTransitionValue; + } + + // Advance twice, so that the underlying parser fills the beam. 
+ test_parser->AdvanceFromPrediction( + transition_matrix, kNumPossibleTransitions * kBeamSize * kBatchSize); + test_parser->AdvanceFromPrediction( + transition_matrix, kNumPossibleTransitions * kBeamSize * kBatchSize); + + // Get and check the raw link features. + constexpr int kNumLinkFeatures = 2; + auto link_features = test_parser->GetRawLinkFeatures(0); + EXPECT_EQ(link_features.size(), kBeamSize * kBatchSize * kNumLinkFeatures); + + // These should index into batch 0. + EXPECT_EQ(link_features.at(0).feature_value(), 1); + EXPECT_EQ(link_features.at(0).batch_idx(), 0); + EXPECT_EQ(link_features.at(0).beam_idx(), 0); + + EXPECT_EQ(link_features.at(1).feature_value(), 0); + EXPECT_EQ(link_features.at(1).batch_idx(), 0); + EXPECT_EQ(link_features.at(1).beam_idx(), 0); + + EXPECT_EQ(link_features.at(2).feature_value(), -1); + EXPECT_EQ(link_features.at(2).batch_idx(), 0); + EXPECT_EQ(link_features.at(2).beam_idx(), 1); + + EXPECT_EQ(link_features.at(3).feature_value(), -2); + EXPECT_EQ(link_features.at(3).batch_idx(), 0); + EXPECT_EQ(link_features.at(3).beam_idx(), 1); + + EXPECT_EQ(link_features.at(4).feature_value(), -1); + EXPECT_EQ(link_features.at(4).batch_idx(), 0); + EXPECT_EQ(link_features.at(4).beam_idx(), 2); + + EXPECT_EQ(link_features.at(5).feature_value(), -2); + EXPECT_EQ(link_features.at(5).batch_idx(), 0); + EXPECT_EQ(link_features.at(5).beam_idx(), 2); + + // These should index into batch 1. 
+ EXPECT_EQ(link_features.at(6).feature_value(), 1); + EXPECT_EQ(link_features.at(6).batch_idx(), 1); + EXPECT_EQ(link_features.at(6).beam_idx(), 0); + + EXPECT_EQ(link_features.at(7).feature_value(), 0); + EXPECT_EQ(link_features.at(7).batch_idx(), 1); + EXPECT_EQ(link_features.at(7).beam_idx(), 0); + + EXPECT_EQ(link_features.at(8).feature_value(), -1); + EXPECT_EQ(link_features.at(8).batch_idx(), 1); + EXPECT_EQ(link_features.at(8).beam_idx(), 1); + + EXPECT_EQ(link_features.at(9).feature_value(), -2); + EXPECT_EQ(link_features.at(9).batch_idx(), 1); + EXPECT_EQ(link_features.at(9).beam_idx(), 1); + + EXPECT_EQ(link_features.at(10).feature_value(), -1); + EXPECT_EQ(link_features.at(10).batch_idx(), 1); + EXPECT_EQ(link_features.at(10).beam_idx(), 2); + + EXPECT_EQ(link_features.at(11).feature_value(), -2); + EXPECT_EQ(link_features.at(11).batch_idx(), 1); + EXPECT_EQ(link_features.at(11).beam_idx(), 2); +} + +TEST_F(SyntaxNetComponentTest, AdvancesFromOracleWithTracing) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + constexpr int kBeamSize = 1; + auto test_parser = CreateParserWithBeamSize(kBeamSize, {}, {sentence_0_str}); + test_parser->InitializeTracing(); + + constexpr int kNumTokensInSentence = 3; + + // The master spec will initialize a parser, so expect 2*N transitions. 
+ constexpr int kExpectedNumTransitions = kNumTokensInSentence * 2; + constexpr int kFixedFeatureChannels = 1; + for (int i = 0; i < kExpectedNumTransitions; ++i) { + EXPECT_FALSE(test_parser->IsTerminal()); + vector<int32> indices; + auto indices_fn = [&indices](int size) { + indices.resize(size); + return indices.data(); + }; + vector<int64> ids; + auto ids_fn = [&ids](int size) { + ids.resize(size); + return ids.data(); + }; + vector<float> weights; + auto weights_fn = [&weights](int size) { + weights.resize(size); + return weights.data(); + }; + for (int j = 0; j < kFixedFeatureChannels; ++j) { + test_parser->GetFixedFeatures(indices_fn, ids_fn, weights_fn, j); + } + auto features = test_parser->GetRawLinkFeatures(0); + + // Make some fake translations to test visualization. + for (int j = 0; j < features.size(); ++j) { + features[j].set_step_idx(j < i ? j : -1); + } + test_parser->AddTranslatedLinkFeaturesToTrace(features, 0); + test_parser->AdvanceFromOracle(); + } + + // At this point, the test parser should be terminal. + EXPECT_TRUE(test_parser->IsTerminal()); + + // TODO(googleuser): Add EXPECT_EQ here instead of printing. + std::vector<std::vector<ComponentTrace>> traces = + test_parser->GetTraceProtos(); + for (auto &batch_trace : traces) { + for (auto &trace : batch_trace) { + LOG(INFO) << "trace:" << std::endl << trace.DebugString(); + } + } +} + +TEST_F(SyntaxNetComponentTest, NoTracingDropsFeatureNames) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + constexpr int kBeamSize = 1; + const auto test_parser = + CreateParserWithBeamSize(kBeamSize, {}, {sentence_0_str}); + const auto link_features = test_parser->GetRawLinkFeatures(0); + + // The fml associated with the channel is "stack.focus stack(1).focus". + // Both features should lack the feature_name field. 
+ EXPECT_EQ(link_features.size(), 2); + EXPECT_FALSE(link_features.at(0).has_feature_name()); + EXPECT_FALSE(link_features.at(1).has_feature_name()); +} + +TEST_F(SyntaxNetComponentTest, TracingOutputsFeatureNames) { + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + + constexpr int kBeamSize = 1; + auto test_parser = CreateParserWithBeamSize(kBeamSize, {}, {sentence_0_str}); + test_parser->InitializeTracing(); + const auto link_features = test_parser->GetRawLinkFeatures(0); + + // The fml associated with the channel is "stack.focus stack(1).focus". + EXPECT_EQ(link_features.size(), 2); + EXPECT_EQ(link_features.at(0).feature_name(), "stack.focus"); + EXPECT_EQ(link_features.at(1).feature_name(), "stack(1).focus"); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/components/syntaxnet/syntaxnet_link_feature_extractor.cc b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_link_feature_extractor.cc new file mode 100644 index 0000000000000000000000000000000000000000..8e394a09d1b312aa3b2f51053b9660d8e8686d97 --- /dev/null +++ b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_link_feature_extractor.cc @@ -0,0 +1,64 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#include "dragnn/components/syntaxnet/syntaxnet_link_feature_extractor.h" + +#include "tensorflow/core/platform/logging.h" + +namespace syntaxnet { +namespace dragnn { + +void SyntaxNetLinkFeatureExtractor::Setup(TaskContext *context) { + ParserEmbeddingFeatureExtractor::Setup(context); + + if (NumEmbeddings() > 0) { + channel_sources_ = utils::Split( + context->Get( + tensorflow::strings::StrCat(ArgPrefix(), "_", "source_components"), + ""), + ';'); + channel_layers_ = utils::Split( + context->Get( + tensorflow::strings::StrCat(ArgPrefix(), "_", "source_layers"), ""), + ';'); + channel_translators_ = utils::Split( + context->Get( + tensorflow::strings::StrCat(ArgPrefix(), "_", "source_translators"), + ""), + ';'); + } + + CHECK_EQ(channel_sources_.size(), NumEmbeddings()); + CHECK_EQ(channel_layers_.size(), NumEmbeddings()); + CHECK_EQ(channel_translators_.size(), NumEmbeddings()); +} + +void SyntaxNetLinkFeatureExtractor::AddLinkedFeatureChannelProtos( + ComponentSpec *spec) const { + for (int embedding_idx = 0; embedding_idx < NumEmbeddings(); + ++embedding_idx) { + LinkedFeatureChannel *channel = spec->add_linked_feature(); + channel->set_name(embedding_name(embedding_idx)); + channel->set_fml(embedding_fml()[embedding_idx]); + channel->set_embedding_dim(EmbeddingDims(embedding_idx)); + channel->set_size(FeatureSize(embedding_idx)); + channel->set_source_layer(channel_layers_[embedding_idx]); + channel->set_source_component(channel_sources_[embedding_idx]); + channel->set_source_translator(channel_translators_[embedding_idx]); + } +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/components/syntaxnet/syntaxnet_link_feature_extractor.h b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_link_feature_extractor.h new file mode 100644 index 0000000000000000000000000000000000000000..4d94cfb5931f115ade015d5292d07847b6560720 --- /dev/null +++ 
b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_link_feature_extractor.h @@ -0,0 +1,70 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_COMPONENTS_SYNTAXNET_SYNTAXNET_LINK_FEATURE_EXTRACTOR_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_COMPONENTS_SYNTAXNET_SYNTAXNET_LINK_FEATURE_EXTRACTOR_H_ + +#include <string> +#include <vector> + +#include "dragnn/protos/spec.pb.h" +#include "syntaxnet/embedding_feature_extractor.h" +#include "syntaxnet/parser_state.h" +#include "syntaxnet/parser_transitions.h" +#include "syntaxnet/task_context.h" + +namespace syntaxnet { +namespace dragnn { + +// Provides feature extraction for linked features in the +// WrapperParserComponent. This reuses the EmbeddingFeatureExtractor +// architecture to get another set of feature extractors. Note that we should +// ignore predicate maps here, and we don't care about the vocabulary size +// because all the feature values will be used for translation, but this means +// we can configure the extractor from the GCL using the standard +// neurosis-lib.wf syntax. +// +// Because it uses a different prefix, it can be executed in the same wf.stage +// as the regular fixed extractor. 
+class SyntaxNetLinkFeatureExtractor : public ParserEmbeddingFeatureExtractor { + public: + SyntaxNetLinkFeatureExtractor() : ParserEmbeddingFeatureExtractor("link") {} + ~SyntaxNetLinkFeatureExtractor() override {} + + const string ArgPrefix() const override { return "link"; } + + // Parses the TaskContext to get additional information like target layers, + // etc. + void Setup(TaskContext *context) override; + + // Called during InitComponentProtoTask to add the specification from the + // wrapped feature extractor as LinkedFeatureChannel protos. + void AddLinkedFeatureChannelProtos(ComponentSpec *spec) const; + + private: + // Source component names for each channel. + std::vector<string> channel_sources_; + + // Source layer names for each channel. + std::vector<string> channel_layers_; + + // Source translator name for each channel. + std::vector<string> channel_translators_; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_COMPONENTS_SYNTAXNET_SYNTAXNET_LINK_FEATURE_EXTRACTOR_H_ diff --git a/syntaxnet/dragnn/components/syntaxnet/syntaxnet_link_feature_extractor_test.cc b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_link_feature_extractor_test.cc new file mode 100644 index 0000000000000000000000000000000000000000..7d5f2188410376b15c6c3bbefda6dfe5d9125ae5 --- /dev/null +++ b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_link_feature_extractor_test.cc @@ -0,0 +1,78 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "dragnn/components/syntaxnet/syntaxnet_link_feature_extractor.h" + +#include <string> + +#include "dragnn/core/test/generic.h" +#include "dragnn/protos/spec.pb.h" +#include "syntaxnet/task_context.h" +#include "tensorflow/core/platform/test.h" + +using syntaxnet::test::EqualsProto; + +namespace syntaxnet { +namespace dragnn { + +class ExportSpecTest : public ::testing::Test { + public: +}; + +TEST_F(ExportSpecTest, WritesChannelSpec) { + TaskContext context; + + context.SetParameter("neurosis_feature_syntax_version", "2"); + context.SetParameter("link_features", "input.focus;stack.focus"); + context.SetParameter("link_embedding_names", "tagger;parser"); + context.SetParameter("link_predicate_maps", "none;none"); + context.SetParameter("link_embedding_dims", "16;16"); + context.SetParameter("link_source_components", "tagger;parser"); + context.SetParameter("link_source_layers", "hidden0;lstm"); + context.SetParameter("link_source_translators", "token;last_action"); + + SyntaxNetLinkFeatureExtractor link_features; + link_features.Setup(&context); + link_features.Init(&context); + + ComponentSpec spec; + link_features.AddLinkedFeatureChannelProtos(&spec); + const string expected_spec_str = R"( + linked_feature { + name: "tagger" + fml: "input.focus" + embedding_dim: 16 + size: 1 + source_component: "tagger" + source_translator: "token" + source_layer: "hidden0" + } + linked_feature { + name: "parser" + fml: "stack.focus" + embedding_dim: 16 + size: 1 + source_component: "parser" + source_translator: "last_action" + source_layer: "lstm" + } + )"; + ComponentSpec expected_spec; + TextFormat::ParseFromString(expected_spec_str, &expected_spec); + EXPECT_THAT(spec, EqualsProto(expected_spec)); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git 
a/syntaxnet/dragnn/components/syntaxnet/syntaxnet_transition_state.cc b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_transition_state.cc new file mode 100644 index 0000000000000000000000000000000000000000..a15e883b2578c3880061945c429661e074d409ae --- /dev/null +++ b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_transition_state.cc @@ -0,0 +1,100 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "dragnn/components/syntaxnet/syntaxnet_transition_state.h" + +#include "tensorflow/core/lib/strings/strcat.h" +#include "tensorflow/core/platform/logging.h" + +namespace syntaxnet { +namespace dragnn { + +SyntaxNetTransitionState::SyntaxNetTransitionState( + std::unique_ptr parser_state, SyntaxNetSentence *sentence) + : parser_state_(std::move(parser_state)), sentence_(sentence) { + score_ = 0; + current_beam_index_ = -1; + parent_beam_index_ = 0; + step_for_token_.resize(sentence->sentence()->token_size(), -1); + parent_for_token_.resize(sentence->sentence()->token_size(), -1); + parent_step_for_token_.resize(sentence->sentence()->token_size(), -1); +} + +void SyntaxNetTransitionState::Init(const TransitionState &parent) { + score_ = parent.GetScore(); + parent_beam_index_ = parent.GetBeamIndex(); +} + +std::unique_ptr SyntaxNetTransitionState::Clone() + const { + // Create a new state from a clone of the underlying 
parser state. + std::unique_ptr cloned_state(parser_state_->Clone()); + std::unique_ptr new_state( + new SyntaxNetTransitionState(std::move(cloned_state), sentence_)); + + // Copy relevant data members and set non-copied ones to flag values. + new_state->score_ = score_; + new_state->current_beam_index_ = current_beam_index_; + new_state->parent_beam_index_ = parent_beam_index_; + new_state->step_for_token_ = step_for_token_; + new_state->parent_step_for_token_ = parent_step_for_token_; + new_state->parent_for_token_ = parent_for_token_; + + // Copy trace if it exists. + if (trace_) { + new_state->trace_.reset(new ComponentTrace(*trace_)); + } + + return new_state; +} + +const int SyntaxNetTransitionState::ParentBeamIndex() const { + return parent_beam_index_; +} + +const int SyntaxNetTransitionState::GetBeamIndex() const { + return current_beam_index_; +} + +void SyntaxNetTransitionState::SetBeamIndex(const int index) { + current_beam_index_ = index; +} + +const float SyntaxNetTransitionState::GetScore() const { return score_; } + +void SyntaxNetTransitionState::SetScore(const float score) { score_ = score; } + +string SyntaxNetTransitionState::HTMLRepresentation() const { + // Crude HTML string showing the stack and the word on the input. 
+ string html = "Stack: "; + for (int i = parser_state_->StackSize() - 1; i >= 0; --i) { + const int word_idx = parser_state_->Stack(i); + if (word_idx >= 0) { + tensorflow::strings::StrAppend( + &html, parser_state_->GetToken(word_idx).word(), " "); + } + } + tensorflow::strings::StrAppend(&html, "| Input: "); + const int word_idx = parser_state_->Input(0); + if (word_idx >= 0) { + tensorflow::strings::StrAppend( + &html, parser_state_->GetToken(word_idx).word(), " "); + } + + return html; +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/components/syntaxnet/syntaxnet_transition_state.h b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_transition_state.h new file mode 100644 index 0000000000000000000000000000000000000000..193b33f532c11d5a1ad1efb81c124b790c2952f0 --- /dev/null +++ b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_transition_state.h @@ -0,0 +1,159 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_COMPONENTS_SYNTAXNET_SYNTAXNET_TRANSITION_STATE_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_COMPONENTS_SYNTAXNET_SYNTAXNET_TRANSITION_STATE_H_ + +#include + +#include "dragnn/core/interfaces/cloneable_transition_state.h" +#include "dragnn/core/interfaces/transition_state.h" +#include "dragnn/io/syntaxnet_sentence.h" +#include "dragnn/protos/trace.pb.h" +#include "syntaxnet/base.h" +#include "syntaxnet/parser_state.h" + +namespace syntaxnet { +namespace dragnn { + +class SyntaxNetTransitionState + : public CloneableTransitionState { + public: + // Create a SyntaxNetTransitionState to wrap this nlp_saft::ParserState. + SyntaxNetTransitionState(std::unique_ptr parser_state, + SyntaxNetSentence *sentence); + + // Initialize this TransitionState from a previous TransitionState. The + // ParentBeamIndex is the location of that previous TransitionState in the + // provided beam. + void Init(const TransitionState &parent) override; + + // Produces a new state with the same backing data as this state. + std::unique_ptr Clone() const override; + + // Return the beam index of the state passed into the initializer of this + // TransitionState. + const int ParentBeamIndex() const override; + + // Get the current beam index for this state. + const int GetBeamIndex() const override; + + // Set the current beam index for this state. + void SetBeamIndex(const int index) override; + + // Get the score associated with this transition state. + const float GetScore() const override; + + // Set the score associated with this transition state. + void SetScore(const float score) override; + + // Depicts this state as an HTML-language string. + string HTMLRepresentation() const override; + + // **** END INHERITED INTERFACE **** + + // TODO(googleuser): Make these comments actually mean something. + // Data accessor. 
+ int step_for_token(int token) { + if (token < 0 || token >= step_for_token_.size()) { + return -1; + } else { + return step_for_token_.at(token); + } + } + + // Data setter. + void set_step_for_token(int token, int step) { + step_for_token_.insert(step_for_token_.begin() + token, step); + } + + // Data accessor. + int parent_step_for_token(int token) { + if (token < 0 || token >= step_for_token_.size()) { + return -1; + } else { + return parent_step_for_token_.at(token); + } + } + + // Data setter. + void set_parent_step_for_token(int token, int parent_step) { + parent_step_for_token_.insert(parent_step_for_token_.begin() + token, + parent_step); + } + + // Data accessor. + int parent_for_token(int token) { + if (token < 0 || token >= step_for_token_.size()) { + return -1; + } else { + return parent_for_token_.at(token); + } + } + + // Data setter. + void set_parent_for_token(int token, int parent) { + parent_for_token_.insert(parent_for_token_.begin() + token, parent); + } + + // Accessor for the underlying nlp_saft::ParserState. + ParserState *parser_state() { return parser_state_.get(); } + + // Accessor for the underlying sentence object. + SyntaxNetSentence *sentence() { return sentence_; } + + ComponentTrace *mutable_trace() { + CHECK(trace_) << "Trace is not initialized"; + return trace_.get(); + } + void set_trace(std::unique_ptr trace) { + trace_ = std::move(trace); + } + + private: + // Underlying ParserState object that is being wrapped. + std::unique_ptr parser_state_; + + // Sentence object that is being examined with this state. + SyntaxNetSentence *sentence_; + + // The current score of this state. + float score_; + + // The current beam index of this state. + int current_beam_index_; + + // The parent beam index for this state. + int parent_beam_index_; + + // Maintains a list of which steps in the history correspond to + // representations for each of the tokens on the stack. 
+ std::vector step_for_token_; + + // Maintains a list of which steps in the history correspond to the actions + // that assigned a parent for tokens when reduced. + std::vector parent_step_for_token_; + + // Maintain the parent index of a token in the system. + std::vector parent_for_token_; + + // Trace of the history to produce this state. + std::unique_ptr trace_; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_COMPONENTS_SYNTAXNET_SYNTAXNET_TRANSITION_STATE_H_ diff --git a/syntaxnet/dragnn/components/syntaxnet/syntaxnet_transition_state_test.cc b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_transition_state_test.cc new file mode 100644 index 0000000000000000000000000000000000000000..ab4ab303f1e54ad2dc06f3602a58df468db95b23 --- /dev/null +++ b/syntaxnet/dragnn/components/syntaxnet/syntaxnet_transition_state_test.cc @@ -0,0 +1,291 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#include "dragnn/components/syntaxnet/syntaxnet_transition_state.h" + +#include "dragnn/components/syntaxnet/syntaxnet_component.h" +#include "dragnn/core/input_batch_cache.h" +#include "dragnn/core/test/generic.h" +#include "dragnn/core/test/mock_transition_state.h" +#include "dragnn/io/sentence_input_batch.h" +#include "dragnn/protos/spec.pb.h" +#include "syntaxnet/sentence.pb.h" +#include "tensorflow/core/lib/core/errors.h" +#include "tensorflow/core/lib/core/status.h" +#include "tensorflow/core/lib/io/path.h" +#include "tensorflow/core/platform/env.h" +#include "tensorflow/core/platform/protobuf.h" +#include "tensorflow/core/platform/test.h" + +// This test suite is intended to validate the contracts that the DRAGNN +// system expects from all transition state subclasses. Developers creating +// new TransitionStates should copy this test and modify it as necessary, +// using it to ensure their state conforms to DRAGNN expectations. + +namespace syntaxnet { +namespace dragnn { + +namespace { + +const char kSentence0[] = R"( +token { + word: "Sentence" start: 0 end: 7 tag: "NN" category: "NOUN" label: "ROOT" + break_level: NO_BREAK +} +token { + word: "0" start: 9 end: 9 head: 0 tag: "CD" category: "NUM" label: "num" + break_level: SPACE_BREAK +} +token { + word: "." start: 10 end: 10 head: 0 tag: "." category: "." label: "punct" + break_level: NO_BREAK +} +)"; + +} // namespace + +using testing::Return; + +class SyntaxNetTransitionStateTest : public ::testing::Test { + public: + std::unique_ptr CreateState() { + // Get the master spec proto from the test data directory. 
+ MasterSpec master_spec; + string file_name = tensorflow::io::JoinPath( + test::GetTestDataPrefix(), "dragnn/components/syntaxnet/testdata", + "master_spec.textproto"); + TF_CHECK_OK(tensorflow::ReadTextProto(tensorflow::Env::Default(), file_name, + &master_spec)); + + // Get all the resource protos from the test data directory. + for (Resource &resource : + *(master_spec.mutable_component(0)->mutable_resource())) { + resource.mutable_part(0)->set_file_pattern(tensorflow::io::JoinPath( + test::GetTestDataPrefix(), "dragnn/components/syntaxnet/testdata", + resource.part(0).file_pattern())); + } + + // Create an empty input batch and beam vector to initialize the parser. + Sentence sentence_0; + TextFormat::ParseFromString(kSentence0, &sentence_0); + string sentence_0_str; + sentence_0.SerializeToString(&sentence_0_str); + data_.reset(new InputBatchCache(sentence_0_str)); + SentenceInputBatch *sentences = data_->GetAs(); + + // Create a parser component that will generate a parser state for this + // test. + SyntaxNetComponent component; + component.InitializeComponent(*(master_spec.mutable_component(0))); + std::vector> states; + constexpr int kBeamSize = 1; + component.InitializeData(states, kBeamSize, data_.get()); + + // Get a transition state from the component. + std::unique_ptr test_state = + component.CreateState(&(sentences->data()->at(0))); + return test_state; + } + + std::unique_ptr data_; +}; + +// Validates the consistency of the beam index setter and getter. +TEST_F(SyntaxNetTransitionStateTest, CanSetAndGetBeamIndex) { + // Create and initialize a test state. 
+ MockTransitionState mock_state; + auto test_state = CreateState(); + test_state->Init(mock_state); + + constexpr int kOldBeamIndex = 12; + test_state->SetBeamIndex(kOldBeamIndex); + EXPECT_EQ(test_state->GetBeamIndex(), kOldBeamIndex); + + constexpr int kNewBeamIndex = 7; + test_state->SetBeamIndex(kNewBeamIndex); + EXPECT_EQ(test_state->GetBeamIndex(), kNewBeamIndex); +} + +// Validates the consistency of the score setter and getter. +TEST_F(SyntaxNetTransitionStateTest, CanSetAndGetScore) { + // Create and initialize a test state. + MockTransitionState mock_state; + auto test_state = CreateState(); + test_state->Init(mock_state); + + constexpr float kOldScore = 12.1; + test_state->SetScore(kOldScore); + EXPECT_EQ(test_state->GetScore(), kOldScore); + + constexpr float kNewScore = 7.2; + test_state->SetScore(kNewScore); + EXPECT_EQ(test_state->GetScore(), kNewScore); +} + +// This test ensures that the initializing state's current index is saved +// as the parent beam index of the state being initialized. +TEST_F(SyntaxNetTransitionStateTest, ReportsParentBeamIndex) { + // Create a mock transition state that will report a specific current index. + // This index should become the parent state index for the test state. + MockTransitionState mock_state; + constexpr int kParentBeamIndex = 1138; + EXPECT_CALL(mock_state, GetBeamIndex()) + .WillRepeatedly(Return(kParentBeamIndex)); + + auto test_state = CreateState(); + test_state->Init(mock_state); + EXPECT_EQ(test_state->ParentBeamIndex(), kParentBeamIndex); +} + +// This test ensures that the initializing state's current score is saved +// as the current score of the state being initialized. +TEST_F(SyntaxNetTransitionStateTest, InitializationCopiesParentScore) { + // Create a mock transition state that will report a specific score. + // This score should become the current score of the test state. 
+ MockTransitionState mock_state; + constexpr float kParentScore = 24.12; + EXPECT_CALL(mock_state, GetScore()).WillRepeatedly(Return(kParentScore)); + + auto test_state = CreateState(); + test_state->Init(mock_state); + EXPECT_EQ(test_state->GetScore(), kParentScore); +} + +// This test ensures that calling Clone maintains the state data (parent beam +// index, beam index, score, etc.) of the state that was cloned. +TEST_F(SyntaxNetTransitionStateTest, CloningMaintainsState) { + // Create and initialize the state. + MockTransitionState mock_state; + constexpr int kParentBeamIndex = 1138; + EXPECT_CALL(mock_state, GetBeamIndex()) + .WillRepeatedly(Return(kParentBeamIndex)); + auto test_state = CreateState(); + test_state->Init(mock_state); + + // Validate the internal state of the test state. + constexpr float kOldScore = 20.0; + test_state->SetScore(kOldScore); + EXPECT_EQ(test_state->GetScore(), kOldScore); + constexpr int kOldBeamIndex = 12; + test_state->SetBeamIndex(kOldBeamIndex); + EXPECT_EQ(test_state->GetBeamIndex(), kOldBeamIndex); + + auto clone = test_state->Clone(); + + // The clone should have identical state to the old state. + EXPECT_EQ(clone->ParentBeamIndex(), kParentBeamIndex); + EXPECT_EQ(clone->GetScore(), kOldScore); + EXPECT_EQ(clone->GetBeamIndex(), kOldBeamIndex); +} + +// Validates the consistency of the step_for_token setter and getter. +TEST_F(SyntaxNetTransitionStateTest, CanSetAndGetStepForToken) { + // Create and initialize a test state. + MockTransitionState mock_state; + auto test_state = CreateState(); + test_state->Init(mock_state); + + constexpr int kStepForTokenZero = 12; + constexpr int kStepForTokenTwo = 34; + test_state->set_step_for_token(0, kStepForTokenZero); + test_state->set_step_for_token(2, kStepForTokenTwo); + + // Expect that the set tokens return values and the unset steps return the + // default. 
+ constexpr int kDefaultValue = -1; + EXPECT_EQ(kStepForTokenZero, test_state->step_for_token(0)); + EXPECT_EQ(kDefaultValue, test_state->step_for_token(1)); + EXPECT_EQ(kStepForTokenTwo, test_state->step_for_token(2)); + + // Expect that out of bound accesses will return the default. (There are only + // 3 tokens in the backing sentence, so token 3 and greater are out of bound.) + EXPECT_EQ(kDefaultValue, test_state->step_for_token(-1)); + EXPECT_EQ(kDefaultValue, test_state->step_for_token(3)); +} + +// Validates the consistency of the parent_step_for_token setter and getter. +TEST_F(SyntaxNetTransitionStateTest, CanSetAndGetParentStepForToken) { + // Create and initialize a test state. + MockTransitionState mock_state; + auto test_state = CreateState(); + test_state->Init(mock_state); + + constexpr int kStepForTokenZero = 12; + constexpr int kStepForTokenTwo = 34; + test_state->set_parent_step_for_token(0, kStepForTokenZero); + test_state->set_parent_step_for_token(2, kStepForTokenTwo); + + // Expect that the set tokens return values and the unset steps return the + // default. + constexpr int kDefaultValue = -1; + EXPECT_EQ(kStepForTokenZero, test_state->parent_step_for_token(0)); + EXPECT_EQ(kDefaultValue, test_state->parent_step_for_token(1)); + EXPECT_EQ(kStepForTokenTwo, test_state->parent_step_for_token(2)); + + // Expect that out of bound accesses will return the default. (There are only + // 3 tokens in the backing sentence, so token 3 and greater are out of bound.) + EXPECT_EQ(kDefaultValue, test_state->parent_step_for_token(-1)); + EXPECT_EQ(kDefaultValue, test_state->parent_step_for_token(3)); +} + +// Validates the consistency of the parent_for_token setter and getter. +TEST_F(SyntaxNetTransitionStateTest, CanSetAndGetParentForToken) { + // Create and initialize a test state. 
+ MockTransitionState mock_state; + auto test_state = CreateState(); + test_state->Init(mock_state); + + constexpr int kParentForTokenZero = 12; + constexpr int kParentForTokenTwo = 34; + test_state->set_parent_for_token(0, kParentForTokenZero); + test_state->set_parent_for_token(2, kParentForTokenTwo); + + // Expect that the set tokens return values and the unset steps return the + // default. + constexpr int kDefaultValue = -1; + EXPECT_EQ(kParentForTokenZero, test_state->parent_for_token(0)); + EXPECT_EQ(kDefaultValue, test_state->parent_for_token(1)); + EXPECT_EQ(kParentForTokenTwo, test_state->parent_for_token(2)); + + // Expect that out of bound accesses will return the default. (There are only + // 3 tokens in the backing sentence, so token 3 and greater are out of bound.) + EXPECT_EQ(kDefaultValue, test_state->parent_for_token(-1)); + EXPECT_EQ(kDefaultValue, test_state->parent_for_token(3)); +} + +// Validates the consistency of trace proto setter/getter. +TEST_F(SyntaxNetTransitionStateTest, CanSetAndGetTrace) { + // Create and initialize a test state. + MockTransitionState mock_state; + auto test_state = CreateState(); + test_state->Init(mock_state); + + const string kTestComponentName = "test"; + std::unique_ptr trace; + trace.reset(new ComponentTrace()); + trace->set_name(kTestComponentName); + test_state->set_trace(std::move(trace)); + + EXPECT_EQ(trace.get(), nullptr); + EXPECT_EQ(test_state->mutable_trace()->name(), kTestComponentName); + + // Should be preserved when cloning. 
+ auto cloned_state = test_state->Clone(); + EXPECT_EQ(cloned_state->mutable_trace()->name(), kTestComponentName); + EXPECT_EQ(test_state->mutable_trace()->name(), kTestComponentName); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/components/syntaxnet/testdata/master_spec.textproto b/syntaxnet/dragnn/components/syntaxnet/testdata/master_spec.textproto new file mode 100644 index 0000000000000000000000000000000000000000..ffd87c44d96ac93782c5c40e8688993c96309040 --- /dev/null +++ b/syntaxnet/dragnn/components/syntaxnet/testdata/master_spec.textproto @@ -0,0 +1,48 @@ +component { + name: "parser" + transition_system { + registered_name: "arc-standard" + } + resource { + name: 'label-map' + part { + file_pattern: 'syntaxnet-tagger.label-map' + file_format: 'text' + } + } + resource { + name: 'tag-map' + part { + file_pattern: 'syntaxnet-tagger.tag-map' + file_format: 'text' + } + } + fixed_feature { + name: "tags" + fml: "input.tag input(1).tag" + embedding_dim: 32 + vocabulary_size: 46 + size: 2 + predicate_map: "hashed" + } + fixed_feature { + name: "tags" + fml: "input(-1).tag input.tag input(1).tag" + embedding_dim: 32 + vocabulary_size: 46 + size: 3 + predicate_map: "hashed" + } + linked_feature { + name: "recurrent_stack" + fml: "stack.focus stack(1).focus" + embedding_dim: 32 + size: 2 + source_component: "parser" + source_translator: "identity" + source_layer: "hidden_0" + } + backend { + registered_name: "SyntaxNetComponent" + } +} diff --git a/syntaxnet/dragnn/components/syntaxnet/testdata/syntaxnet-tagger.label-map b/syntaxnet/dragnn/components/syntaxnet/testdata/syntaxnet-tagger.label-map new file mode 100644 index 0000000000000000000000000000000000000000..8fdd1fc86d9f33e2e639d794bb2b719a0767bc75 --- /dev/null +++ b/syntaxnet/dragnn/components/syntaxnet/testdata/syntaxnet-tagger.label-map @@ -0,0 +1,47 @@ +46 +punct 243160 +prep 194627 +pobj 186958 +det 170592 +nsubj 144821 +nn 144800 +amod 117242 +ROOT 90592 +dobj 88551 
+aux 76523 +advmod 72893 +conj 59384 +cc 57532 +num 36350 +poss 35117 +dep 34986 +ccomp 29470 +cop 25991 +mark 25141 +xcomp 25111 +rcmod 16234 +auxpass 15740 +advcl 14996 +possessive 14866 +nsubjpass 14133 +pcomp 12488 +appos 11112 +partmod 11106 +neg 11090 +number 10658 +prt 7123 +quantmod 6653 +tmod 5418 +infmod 5134 +npadvmod 3213 +parataxis 3012 +mwe 2793 +expl 2712 +iobj 1642 +acomp 1632 +discourse 1381 +csubj 1225 +predet 1160 +preconj 749 +goeswith 146 +csubjpass 41 diff --git a/syntaxnet/dragnn/components/syntaxnet/testdata/syntaxnet-tagger.master-spec b/syntaxnet/dragnn/components/syntaxnet/testdata/syntaxnet-tagger.master-spec new file mode 100644 index 0000000000000000000000000000000000000000..03305f17d1e3995869773e4e3bec9b60368f195a --- /dev/null +++ b/syntaxnet/dragnn/components/syntaxnet/testdata/syntaxnet-tagger.master-spec @@ -0,0 +1,65 @@ +component { + name: "tagger" + num_actions : 49 + transition_system { + registered_name: "tagger" + parameters { + key: "join_category_to_pos" + value: "true" + } + } + resource { + name: "tag-map" + part { + file_pattern: "TESTDATA/syntaxnet-tagger.tag-map" + file_format: "text" + } + } + resource { + name: "word-map" + part { + file_pattern: "TESTDATA/syntaxnet-tagger.word-map" + file_format: "text" + } + } + resource { + name: "label-map" + part { + file_pattern: "TESTDATA/syntaxnet-tagger.label-map" + file_format: "text" + } + } + fixed_feature { + name: "words" + fml: "input(-1).word input(-2).word input(-3).word input.word input(1).word input(2).word input(3).word" + embedding_dim: 64 + vocabulary_size: 39397 + size: 7 + } + fixed_feature { + name: "words" + fml: "input(-3).word input.word input(1).word input(2).word input(3).word" + embedding_dim: 64 + vocabulary_size: 39397 + size: 5 + } + linked_feature { + name: "rnn" + fml: "stack.focus" + embedding_dim: 32 + size: 1 + source_component: "tagger" + source_translator: "shift-reduce-step" + source_layer: "layer_0" + } + backend { + registered_name: 
"SyntaxNetComponent" + } + network_unit { + registered_name: 'feed-forward' + parameters { + key: 'hidden_layer_sizes' + value: '64' + } + } +} diff --git a/syntaxnet/dragnn/components/syntaxnet/testdata/syntaxnet-tagger.tag-map b/syntaxnet/dragnn/components/syntaxnet/testdata/syntaxnet-tagger.tag-map new file mode 100644 index 0000000000000000000000000000000000000000..2cad1a73b010ace29854dc80296c79728e9b3c52 --- /dev/null +++ b/syntaxnet/dragnn/components/syntaxnet/testdata/syntaxnet-tagger.tag-map @@ -0,0 +1,50 @@ +49 +NN 285194 +IN 228165 +DT 179147 +NNP 175147 +JJ 125667 +NNS 115732 +, 97481 +. 85938 +RB 78513 +VB 63952 +CC 57554 +VBD 56635 +CD 55674 +PRP 55244 +VBZ 48126 +VBN 44458 +VBG 34524 +VBP 33669 +TO 28772 +MD 22364 +PRP$ 20706 +HYPH 18526 +POS 14905 +`` 12193 +'' 12154 +WDT 10267 +: 8713 +$ 7993 +WP 7336 +RP 7335 +WRB 6634 +JJR 6295 +NNPS 5917 +-RRB- 3904 +-LRB- 3840 +JJS 3596 +RBR 3186 +EX 2733 +UH 1521 +RBS 1467 +PDT 1271 +FW 928 +NFP 844 +SYM 652 +ADD 476 +LS 392 +WP$ 332 +GW 184 +AFX 42 diff --git a/syntaxnet/dragnn/components/syntaxnet/testdata/syntaxnet-tagger.word-map b/syntaxnet/dragnn/components/syntaxnet/testdata/syntaxnet-tagger.word-map new file mode 100644 index 0000000000000000000000000000000000000000..86cc301ae201004b586d87dd28c71d7df6e9788a --- /dev/null +++ b/syntaxnet/dragnn/components/syntaxnet/testdata/syntaxnet-tagger.word-map @@ -0,0 +1,4 @@ +3 +Sentence 4 +. 
3 +0 2 diff --git a/syntaxnet/dragnn/components/util/BUILD b/syntaxnet/dragnn/components/util/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..7600cd19ef491d1b11a7d5a49db27418d1d0f5cf --- /dev/null +++ b/syntaxnet/dragnn/components/util/BUILD @@ -0,0 +1,12 @@ +package( + default_visibility = ["//visibility:public"], + features = ["-layering_check"], +) + +cc_library( + name = "bulk_feature_extractor", + hdrs = ["bulk_feature_extractor.h"], + deps = [ + "//syntaxnet:base", + ], +) diff --git a/syntaxnet/dragnn/components/util/bulk_feature_extractor.h b/syntaxnet/dragnn/components/util/bulk_feature_extractor.h new file mode 100644 index 0000000000000000000000000000000000000000..9fbd08a2eb6226678b5967abf05d8c0e10232542 --- /dev/null +++ b/syntaxnet/dragnn/components/util/bulk_feature_extractor.h @@ -0,0 +1,110 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_COMPONENTS_UTIL_BULK_FEATURE_EXTRACTOR_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_COMPONENTS_UTIL_BULK_FEATURE_EXTRACTOR_H_ + +#include +#include +#include "tensorflow/core/platform/types.h" + +namespace syntaxnet { +namespace dragnn { + +// Provides a wrapper for allocator functions and padding data for the Bulk +// ExtractFixedFeatures operation. 
+class BulkFeatureExtractor { + public: + // Create a BulkFeatureExtractor with the given allocator functions and + // padding. The allocator functions should take a channel and an element + // count and return a contiguous block of memory that is associated with that + // channel (the caller can decide what that means). If use_padding is true, + // the provided pad_to_step and pad_to_element will be used to calculate + // the ID size. + BulkFeatureExtractor( + std::function + allocate_indices_by_channel, + std::function + allocate_ids_by_channel, + std::function + allocate_weights_by_channel, + bool use_padding, int pad_to_step, int pad_to_element) + : use_padding_(use_padding), + pad_to_step_(pad_to_step), + pad_to_element_(pad_to_element), + allocate_indices_by_channel_(std::move(allocate_indices_by_channel)), + allocate_ids_by_channel_(std::move(allocate_ids_by_channel)), + allocate_weights_by_channel_(std::move(allocate_weights_by_channel)) {} + + // Create a BulkFeatureExtractor with allocator functions as above, but with + // use_padding set to False. Useful when you know your caller will never + // need to pad. + BulkFeatureExtractor( + std::function + allocate_indices_by_channel, + std::function + allocate_ids_by_channel, + std::function + allocate_weights_by_channel) + : use_padding_(false), + pad_to_step_(-1), + pad_to_element_(-1), + allocate_indices_by_channel_(std::move(allocate_indices_by_channel)), + allocate_ids_by_channel_(std::move(allocate_ids_by_channel)), + allocate_weights_by_channel_(std::move(allocate_weights_by_channel)) {} + + // Invoke the index memory allocator. + tensorflow::int32 *AllocateIndexMemory(int channel, int num_elements) const { + return allocate_indices_by_channel_(channel, num_elements); + } + + // Invoke the ID memory allocator. + tensorflow::int64 *AllocateIdMemory(int channel, int num_elements) const { + return allocate_ids_by_channel_(channel, num_elements); + } + + // Invoke the weight memory allocator. 
+ float *AllocateWeightMemory(int channel, int num_elements) const { + return allocate_weights_by_channel_(channel, num_elements); + } + + // Given the total number of steps and total number of elements for a given + // feature, calculate the index (not ID) of that feature. Based on how the + // BulkFeatureExtractor was constructed, it may use the given number of steps + // and number of elements, or it may use the passed padded number. + int GetIndex(int total_steps, int num_elements, int feature_idx, + int element_idx, int step_idx) const { + const int steps = (use_padding_) ? pad_to_step_ : total_steps; + const int elements = (use_padding_) ? pad_to_element_ : num_elements; + const int feature_offset = elements * steps; + const int element_offset = steps; + return (feature_idx * feature_offset) + (element_idx * element_offset) + + step_idx; + } + + private: + const bool use_padding_; + const int pad_to_step_; + const int pad_to_element_; + const std::function + allocate_indices_by_channel_; + const std::function allocate_ids_by_channel_; + const std::function allocate_weights_by_channel_; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_COMPONENTS_UTIL_BULK_FEATURE_EXTRACTOR_H_ diff --git a/syntaxnet/dragnn/conll2017/BUILD b/syntaxnet/dragnn/conll2017/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..ead13b1696fa140ffb97a46041971639ab59196a --- /dev/null +++ b/syntaxnet/dragnn/conll2017/BUILD @@ -0,0 +1,9 @@ +py_binary( + name = "make_parser_spec", + srcs = ["make_parser_spec.py"], + deps = [ + "//dragnn/protos:spec_py_pb2", + "//dragnn/python:spec_builder", + "@org_tensorflow//tensorflow:tensorflow_py", + ], +) diff --git a/syntaxnet/dragnn/conll2017/conll_parser_trainer.sh b/syntaxnet/dragnn/conll2017/conll_parser_trainer.sh new file mode 100755 index 0000000000000000000000000000000000000000..d08d035f811ca28df47c7390effb919f28e8b332 --- /dev/null +++ 
b/syntaxnet/dragnn/conll2017/conll_parser_trainer.sh @@ -0,0 +1,40 @@ +#!/bin/sh +# Copyright 2016 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +# A script to train the CONLL2017 baseline. +set -e + +language=English +output_dir=./trained-"$language" + +training_corpus=$1 +dev_corpus=$2 + +bazel build -c opt //dragnn/tools:trainer //dragnn/conll2017:make_parser_spec + +mkdir -p $output_dir +bazel-bin/dragnn/conll2017/make_parser_spec \ + --spec_file="$output_dir/parser_spec.textproto" + +bazel-bin/dragnn/tools/trainer \ + --logtostderr \ + --compute_lexicon \ + --dragnn_spec="$output_dir/parser_spec.textproto" \ + --resource_path="$output_dir/resources" \ + --training_corpus_path="$training_corpus" \ + --tune_corpus_path="$dev_corpus" \ + --tensorboard_dir="$output_dir/tensorboard" \ + --checkpoint_filename="$output_dir/checkpoint.model" diff --git a/syntaxnet/dragnn/conll2017/make_parser_spec.py b/syntaxnet/dragnn/conll2017/make_parser_spec.py new file mode 100644 index 0000000000000000000000000000000000000000..3dc69d1e39fafa180327cd149bc98e0fc7ef5e69 --- /dev/null +++ b/syntaxnet/dragnn/conll2017/make_parser_spec.py @@ -0,0 +1,105 @@ +# Copyright 2016 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Construct the spec for the CONLL2017 Parser baseline.""" + +import tensorflow as tf + +from tensorflow.python.platform import gfile + +from dragnn.protos import spec_pb2 +from dragnn.python import spec_builder + +flags = tf.app.flags +FLAGS = flags.FLAGS + +flags.DEFINE_string('spec_file', 'parser_spec.textproto', + 'Filename to save the spec to.') + + +def main(unused_argv): + # Left-to-right, character-based LSTM. + char2word = spec_builder.ComponentSpecBuilder('char_lstm') + char2word.set_network_unit( + name='wrapped_units.LayerNormBasicLSTMNetwork', + hidden_layer_sizes='256') + char2word.set_transition_system(name='char-shift-only', left_to_right='true') + char2word.add_fixed_feature(name='chars', fml='char-input.text-char', + embedding_dim=16) + + # Lookahead LSTM reads right-to-left to represent the rightmost context of the + # words. It gets word embeddings from the char model. + lookahead = spec_builder.ComponentSpecBuilder('lookahead') + lookahead.set_network_unit( + name='wrapped_units.LayerNormBasicLSTMNetwork', + hidden_layer_sizes='256') + lookahead.set_transition_system(name='shift-only', left_to_right='false') + lookahead.add_link(source=char2word, fml='input.last-char-focus', + embedding_dim=64) + + # Construct the tagger. This is a simple left-to-right LSTM sequence tagger. 
+ tagger = spec_builder.ComponentSpecBuilder('tagger') + tagger.set_network_unit( + name='wrapped_units.LayerNormBasicLSTMNetwork', + hidden_layer_sizes='256') + tagger.set_transition_system(name='tagger') + tagger.add_token_link(source=lookahead, fml='input.focus', embedding_dim=64) + + # Construct the parser. + parser = spec_builder.ComponentSpecBuilder('parser') + parser.set_network_unit(name='FeedForwardNetwork', hidden_layer_sizes='256', + layer_norm_hidden='true') + parser.set_transition_system(name='arc-standard') + parser.add_token_link(source=lookahead, fml='input.focus', embedding_dim=64) + parser.add_token_link( + source=tagger, fml='input.focus stack.focus stack(1).focus', + embedding_dim=64) + + # Add discrete features of the predicted parse tree so far, like in Parsey + # McParseface. + parser.add_fixed_feature(name='labels', embedding_dim=16, + fml=' '.join([ + 'stack.child(1).label', + 'stack.child(1).sibling(-1).label', + 'stack.child(-1).label', + 'stack.child(-1).sibling(1).label', + 'stack(1).child(1).label', + 'stack(1).child(1).sibling(-1).label', + 'stack(1).child(-1).label', + 'stack(1).child(-1).sibling(1).label', + 'stack.child(2).label', + 'stack.child(-2).label', + 'stack(1).child(2).label', + 'stack(1).child(-2).label'])) + + # Recurrent connection for the arc-standard parser. For both tokens on the + # stack, we connect to the last time step to either SHIFT or REDUCE that + # token. This allows the parser to build up compositional representations of + # phrases. 
+ parser.add_link( + source=parser, # recurrent connection + name='rnn-stack', # unique identifier + fml='stack.focus stack(1).focus', # look for both stack tokens + source_translator='shift-reduce-step', # maps token indices -> step + embedding_dim=64) # project down to 64 dims + + master_spec = spec_pb2.MasterSpec() + master_spec.component.extend( + [char2word.spec, lookahead.spec, tagger.spec, parser.spec]) + + with gfile.FastGFile(FLAGS.spec_file, 'w') as f: + f.write(str(master_spec).encode('utf-8')) + +if __name__ == '__main__': + tf.app.run() diff --git a/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/category-map b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/category-map new file mode 100644 index 0000000000000000000000000000000000000000..4a23ecdbaf16beec0851882f533ab666dd882e02 --- /dev/null +++ b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/category-map @@ -0,0 +1,16 @@ +15 +NOUN 25758 +VERB 14242 +PUNCT 12945 +PART 9977 +PROPN 8280 +NUM 5082 +ADV 4323 +ADP 4165 +ADJ 2318 +AUX 2024 +PRON 1343 +CCONJ 1329 +DET 994 +X 948 +SYM 25 diff --git a/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/char-map b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/char-map new file mode 100644 index 0000000000000000000000000000000000000000..d86506634dcc85078405f3373bbe34e4a6d31391 --- /dev/null +++ b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/char-map @@ -0,0 +1,3518 @@ +3517 +9 8220 +, 5926 +的 4320 +. 
3945 +年 1387 +在 1372 +一 1331 +是 1254 +為 1153 +國 1092 +中 1087 +人 1010 +、 954 +有 890 +於 849 +大 774 +和 741 +了 672 +他 604 +以 563 +時 563 +不 554 +日 550 +個 538 +學 529 +上 516 +地 513 +後 501 +成 498 +會 495 +月 487 +出 443 +( 431 +) 431 +部 424 +生 410 +公 408 +到 407 +與 405 +行 404 +這 401 +發 389 +之 388 +作 378 +方 378 +家 375 +用 371 +其 361 +主 346 +e 344 +斯 341 +來 337 +由 335 +也 331 +多 331 +而 326 +西 324 +」 321 +分 321 +「 320 +物 319 +被 319 +位 318 +名 318 +a 316 +區 316 +同 311 +對 310 +法 305 +第 303 +下 300 +最 300 +並 298 +軍 298 +及 297 +可 291 +本 286 +此 286 +民 285 +長 284 +自 282 +要 280 +子 271 +爾 270 +開 269 +現 268 +文 267 +海 266 +過 265 +因 263 +i 260 +動 259 +市 259 +政 251 +新 251 +高 251 +n 250 +當 250 +美 250 +o 248 +代 248 +戰 247 +特 247 +前 246 +立 245 +r 242 +世 242 +能 241 +小 240 +道 240 +間 240 +事 239 +建 239 +理 236 +亞 234 +任 234 +體 234 +所 231 +教 230 +等 230 +面 230 +工 228 +得 228 +種 228 +加 227 +里 225 +東 224 +業 224 +經 224 +進 223 +但 222 +內 222 +利 221 +平 220 +度 219 +電 217 +機 216 +南 215 +使 214 +《 213 +》 213 +次 213 +該 210 +期 209 +三 207 +兩 206 +重 206 +北 204 +外 204 +者 203 +路 203 +車 203 +入 200 +化 198 +通 197 +們 196 +員 196 +天 196 +德 196 +場 194 +定 192 +球 191 +表 187 +克 186 +l 183 +t 183 +· 183 +總 183 +都 183 +合 180 +常 180 +口 177 +稱 177 +隊 177 +全 176 +州 176 +性 176 +至 176 +將 175 +二 173 +相 172 +力 168 +山 167 +馬 166 +或 165 +城 164 +華 164 +數 163 +星 163 +羅 162 +院 162 +科 161 +比 160 +目 160 +水 159 +共 158 +式 158 +原 156 +s 155 +些 155 +始 155 +如 154 +明 154 +格 154 +聯 154 +台 153 +士 153 +達 153 +王 152 +米 151 +著 150 +設 150 +就 149 +起 148 +然 147 +統 146 +關 146 +約 145 +英 145 +樂 144 +治 143 +說 142 +則 141 +線 141 +賽 141 +從 140 +交 139 +巴 139 +女 138 +已 138 +拉 138 +產 138 +- 137 +十 137 +布 137 +站 137 +色 136 +手 135 +省 135 +安 134 +心 133 +選 133 +元 132 +正 132 +司 131 +港 131 +議 130 +金 130 +受 129 +接 129 +無 129 +府 128 +影 128 +情 127 +運 127 +量 126 +意 125 +蘭 125 +保 123 +直 123 +她 122 +形 122 +基 121 +實 121 +尼 120 +界 119 +認 119 +務 118 +資 118 +類 118 +指 117 +提 117 +鎮 117 +帝 116 +書 116 +組 116 +義 116 +身 116 +制 115 +管 113 +型 112 +帶 112 +朝 112 +維 112 +號 112 +計 112 +傳 111 +各 
111 +展 111 +很 111 +曾 111 +結 111 +u 110 +四 110 +林 110 +品 109 +沒 108 +流 107 +系 107 +鐵 107 +門 107 +古 106 +屬 106 +校 105 +視 105 +五 104 +更 104 +h 103 +廣 103 +河 102 +包 101 +改 101 +社 101 +近 101 +音 101 +領 101 +列 100 +島 100 +信 99 +演 99 +向 98 +它 98 +據 98 +語 98 +獲 97 +香 97 +C 96 +神 96 +紀 96 +還 96 +首 96 +太 95 +級 95 +縣 95 +造 95 +字 94 +空 94 +製 94 +擊 93 +權 93 +親 93 +A 92 +導 92 +決 92 +節 92 +局 91 +條 91 +每 91 +活 91 +程 91 +術 91 +變 91 +村 90 +命 89 +族 89 +集 89 +: 88 +普 88 +積 88 +點 88 +反 87 +取 87 +江 87 +研 87 +團 86 +往 86 +爭 86 +參 85 +洲 85 +遊 85 +阿 85 +c 84 +去 84 +持 84 +清 84 +論 84 +S 83 +只 83 +單 83 +師 83 +故 83 +波 83 +萬 83 +究 82 +納 82 +; 81 +d 81 +兒 81 +史 81 +園 81 +圖 81 +少 81 +應 81 +灣 81 +非 81 +別 80 +回 80 +己 80 +果 80 +死 80 +示 80 +先 79 +收 79 +航 79 +京 78 +專 78 +強 78 +想 78 +李 78 +版 78 +給 78 +解 78 +m 77 +器 77 +母 77 +處 77 +角 77 +頭 77 +黨 77 +求 76 +片 76 +龍 76 +卡 75 +支 75 +畫 75 +白 75 +知 75 +光 74 +服 74 +武 74 +歷 74 +皇 74 +蘇 74 +言 74 +記 74 +許 74 +超 74 +再 73 +放 73 +風 73 +份 72 +伊 72 +初 72 +夫 72 +離 72 +威 71 +引 71 +張 71 +技 71 +氣 71 +父 71 +病 71 +石 71 +終 71 +較 71 +供 70 +創 70 +好 70 +括 70 +湖 70 +亦 69 +即 69 +標 69 +樣 69 +漢 69 +育 69 +食 69 +功 68 +助 68 +又 68 +奧 68 +層 68 +根 68 +舉 68 +戲 67 +劇 66 +商 66 +官 66 +易 66 +班 66 +眾 66 +見 66 +轉 66 +隨 66 +題 66 +另 65 +攻 65 +職 65 +調 65 +B 64 +件 64 +域 64 +居 64 +希 64 +M 63 +具 63 +存 63 +愛 63 +我 63 +歐 63 +興 63 +速 63 +那 63 +八 62 +容 62 +觀 62 +問 61 +委 61 +播 61 +段 61 +老 61 +查 60 +構 60 +派 60 +裡 60 +陸 60 +際 60 +% 59 +p 59 +y 59 +印 59 +報 59 +環 59 +辦 59 +且 58 +便 58 +客 58 +密 58 +街 58 +陽 58 +P 57 +g 57 +堂 57 +料 57 +洛 57 +火 57 +男 57 +薩 57 +足 57 +博 56 +境 56 +態 56 +打 56 +曼 56 +營 56 +連 56 +醫 56 +低 55 +劃 55 +均 55 +感 55 +築 55 +仍 54 +佛 54 +像 54 +協 54 +哥 54 +座 54 +擔 54 +未 54 +歌 54 +牙 54 +真 54 +精 54 +聲 54 +致 54 +花 54 +裝 54 +責 54 +館 54 +律 53 +思 53 +推 53 +案 53 +深 53 +熱 53 +獨 53 +般 53 +需 53 +令 52 +修 52 +季 52 +把 52 +整 52 +核 52 +歲 52 +編 52 +蒙 52 +評 52 +話 52 +負 52 +轄 52 +黑 52 +佔 51 +千 51 +失 51 +木 51 +酒 51 +倫 50 +土 50 +室 50 +幾 50 +採 50 +早 50 +模 50 +源 50 +百 50 +織 50 +藝 50 +質 50 +錄 50 +項 50 +吉 49 +哈 49 +增 49 +完 
49 +投 49 +曲 49 +樓 49 +步 49 +溫 49 +狀 49 +留 49 +置 49 +聖 49 +船 49 +規 49 +I 48 +T 48 +升 48 +塔 48 +宣 48 +沙 48 +網 48 +護 48 +讓 48 +邊 48 +除 48 +飛 48 +伯 47 +價 47 +率 47 +福 47 +素 47 +試 47 +遠 47 +響 47 +D 46 +九 46 +半 46 +卻 46 +圍 46 +宗 46 +席 46 +底 46 +濟 46 +考 46 +越 46 +農 46 +依 45 +寫 45 +射 45 +庫 45 +復 45 +承 45 +望 45 +殺 45 +確 45 +繼 45 +魯 45 +候 44 +備 44 +兵 44 +友 44 +姆 44 +才 44 +控 44 +止 44 +盟 44 +看 44 +督 44 +續 44 +習 44 +落 44 +藏 44 +鄉 44 +頓 44 +G 43 +N 43 +R 43 +例 43 +係 43 +典 43 +恩 43 +房 43 +瓦 43 +紅 43 +衛 43 +象 43 +住 42 +做 42 +六 42 +寺 42 +敗 42 +洋 42 +銀 42 +勢 41 +抗 41 +瑞 41 +群 41 +識 41 +雖 41 +麗 41 +今 40 +俄 40 +周 40 +塞 40 +滿 40 +甚 40 +田 40 +紐 40 +絕 40 +貝 40 +億 39 +告 39 +息 39 +毛 39 +測 39 +澳 39 +獎 39 +移 39 +費 39 +限 39 +驗 39 +魚 39 +E 38 +k 38 +久 38 +唱 38 +堡 38 +守 38 +必 38 +排 38 +旗 38 +索 38 +降 38 +預 38 +丹 37 +效 37 +施 37 +極 37 +況 37 +泰 37 +細 37 +艦 37 +雙 37 +雷 37 +青 37 +七 36 +察 36 +屆 36 +庭 36 +康 36 +慶 36 +換 36 +擁 36 +氏 36 +準 36 +童 36 +邦 36 +F 35 +H 35 +v 35 +何 35 +冠 35 +勒 35 +壓 35 +念 35 +短 35 +碼 35 +突 35 +述 35 +逐 35 +附 35 +顯 35 +黎 35 +傷 34 +富 34 +批 34 +景 34 +讀 34 +值 33 +停 33 +僅 33 +破 33 +胡 33 +舊 33 +走 33 +遺 33 +鏡 33 +難 33 +韓 33 +鬥 33 +L 32 +b 32 +勞 32 +善 32 +奇 32 +拔 32 +植 32 +橋 32 +烈 32 +異 32 +策 32 +算 32 +耳 32 +艾 32 +萊 32 +貓 32 +退 32 +雲 32 +革 32 +頻 32 +養 32 +餘 32 +互 31 +嚴 31 +埃 31 +央 31 +屋 31 +春 31 +架 31 +津 31 +瑪 31 +略 31 +範 31 +腦 31 +莫 31 +葉 31 +警 31 +距 31 +遭 31 +配 31 +鮮 31 +麼 31 +黃 31 +丁 30 +乘 30 +川 30 +店 30 +役 30 +托 30 +拿 30 +斷 30 +監 30 +票 30 +靈 30 +駐 30 +麥 30 +齒 30 +寶 29 +差 29 +快 29 +映 29 +油 29 +盛 29 +藥 29 +血 29 +請 29 +證 29 +迪 29 +防 29 +須 29 +似 28 +判 28 +唐 28 +婚 28 +找 28 +消 28 +爆 28 +益 28 +禮 28 +肉 28 +菲 28 +輯 28 +亡 27 +併 27 +優 27 +冰 27 +切 27 +售 27 +唯 27 +啟 27 +執 27 +幫 27 +廷 27 +弗 27 +徵 27 +材 27 +森 27 +永 27 +登 27 +臨 27 +覺 27 +送 27 +J 26 +予 26 +介 26 +免 26 +劉 26 +宮 26 +封 26 +左 26 +延 26 +朗 26 +漸 26 +照 26 +牛 26 +畢 26 +簡 26 +署 26 +股 26 +良 26 +詞 26 +財 26 +貨 26 +趙 26 +輸 26 +隆 26 +階 26 +險 26 +雄 26 +雜 26 +企 25 +刻 25 +副 25 +右 25 +堅 25 +夠 25 +妻 25 +徒 25 +懷 25 +戶 25 +束 25 +榮 25 +歸 25 +滅 25 +烏 25 +猶 
25 +療 25 +秘 25 +租 25 +聚 25 +聞 25 +靠 25 +頂 25 +K 24 +O 24 +仁 24 +侵 24 +吃 24 +吸 24 +夏 24 +尾 24 +廳 24 +授 24 +救 24 +柏 24 +毒 24 +炸 24 +獻 24 +玩 24 +珠 24 +甲 24 +章 24 +端 24 +藍 24 +詩 24 +貴 24 +輕 24 +順 24 +願 24 +f 23 +側 23 +冷 23 +址 23 +害 23 +尋 23 +帕 23 +廟 23 +弟 23 +摩 23 +擴 23 +槍 23 +漫 23 +甘 23 +皮 23 +秀 23 +胞 23 +臘 23 +芬 23 +草 23 +訊 23 +誌 23 +諾 23 +遇 23 +避 23 +鄭 23 +w 22 +假 22 +充 22 +固 22 +坡 22 +寬 22 +岸 22 +彈 22 +微 22 +恐 22 +旅 22 +松 22 +梅 22 +泛 22 +激 22 +疾 22 +眼 22 +祖 22 +篇 22 +蓋 22 +輛 22 +迷 22 +透 22 +陳 22 +鳥 22 +' 21 +凱 21 +劍 21 +勝 21 +叫 21 +君 21 +暴 21 +樹 21 +減 21 +湯 21 +潛 21 +爵 21 +缺 21 +臺 21 +蒂 21 +虎 21 +賓 21 +適 21 +雪 21 +W 20 +乎 20 +什 20 +召 20 +含 20 +喬 20 +奪 20 +妃 20 +孫 20 +序 20 +廈 20 +揮 20 +援 20 +攝 20 +暗 20 +杜 20 +染 20 +檢 20 +毀 20 +池 20 +注 20 +牌 20 +牧 20 +練 20 +罪 20 +若 20 +譯 20 +豐 20 +買 20 +跟 20 +軌 20 +載 20 +追 20 +逝 20 +鄰 20 +銅 20 +陣 20 +隻 20 +/ 19 +V 19 +坦 19 +奏 19 +孩 19 +廠 19 +恆 19 +拒 19 +敵 19 +昌 19 +晚 19 +某 19 +梁 19 +污 19 +涉 19 +熊 19 +秦 19 +綫 19 +艇 19 +補 19 +訂 19 +討 19 +譽 19 +赫 19 +辛 19 +遷 19 +魔 19 +默 19 +你 18 +刺 18 +危 18 +否 18 +呼 18 +哲 18 +喜 18 +嘉 18 +奴 18 +妹 18 +審 18 +尚 18 +尤 18 +幣 18 +待 18 +志 18 +擇 18 +末 18 +桑 18 +汽 18 +澤 18 +競 18 +簽 18 +籍 18 +董 18 +藤 18 +襲 18 +談 18 +辭 18 +逃 18 +遣 18 +郡 18 +酸 18 +釋 18 +野 18 +銷 18 +陵 18 +顆 18 +額 18 +亂 17 +俗 17 +剛 17 +塑 17 +墨 17 +壯 17 +奉 17 +寧 17 +巨 17 +幼 17 +捕 17 +掌 17 +探 17 +損 17 +殖 17 +氧 17 +液 17 +症 17 +秋 17 +粒 17 +絲 17 +耶 17 +茲 17 +莉 17 +莎 17 +藉 17 +衝 17 +訴 17 +誤 17 +購 17 +錦 17 +隸 17 +仙 16 +儀 16 +后 16 +噸 16 +圈 16 +夢 16 +姐 16 +宇 16 +宋 16 +彩 16 +徑 16 +扎 16 +抵 16 +拍 16 +晉 16 +板 16 +柯 16 +棲 16 +楚 16 +款 16 +歡 16 +殿 16 +沿 16 +浦 16 +游 16 +牠 16 +穆 16 +穿 16 +符 16 +綱 16 +羊 16 +苦 16 +茶 16 +虛 16 +衣 16 +複 16 +課 16 +輔 16 +靜 16 +U 15 +亮 15 +休 15 +佈 15 +佐 15 +傑 15 +兼 15 +墓 15 +套 15 +幹 15 +廉 15 +戀 15 +戴 15 +拜 15 +描 15 +散 15 +敦 15 +替 15 +楊 15 +泉 15 +獄 15 +私 15 +純 15 +緊 15 +繪 15 +羽 15 +翻 15 +聽 15 +臣 15 +舍 15 +荷 15 +融 15 +訓 15 +訪 15 +貢 15 +賣 15 +輪 15 +遍 15 +郎 15 +郵 15 +x 14 +亨 14 +享 14 +仔 14 +仰 14 +俱 14 +健 14 +匈 14 +圓 14 +姓 14 +宿 14 +岩 14 +幻 14 +廢 
14 +怪 14 +慈 14 +招 14 +按 14 +擬 14 +暫 14 +桃 14 +橫 14 +檔 14 +殊 14 +沃 14 +潮 14 +燃 14 +牆 14 +獵 14 +珍 14 +疑 14 +禁 14 +糖 14 +背 14 +脈 14 +葡 14 +蓮 14 +諸 14 +謀 14 +講 14 +返 14 +錯 14 +鐘 14 +阻 14 +霍 14 +韋 14 +飲 14 +飾 14 +餐 14 +骨 14 +鳳 14 +麻 14 +乾 13 +付 13 +伸 13 +傅 13 +刊 13 +劑 13 +壁 13 +娛 13 +峰 13 +干 13 +弱 13 +惡 13 +揚 13 +撤 13 +操 13 +智 13 +朱 13 +氯 13 +泊 13 +洪 13 +混 13 +災 13 +煙 13 +玉 13 +盃 13 +礎 13 +竹 13 +紙 13 +緩 13 +繞 13 +觸 13 +診 13 +誕 13 +趣 13 +跑 13 +迫 13 +郭 13 +閣 13 +陷 13 +障 13 +雨 13 +騎 13 +齊 13 +丘 12 +佳 12 +偏 12 +儒 12 +儘 12 +匯 12 +午 12 +占 12 +坐 12 +培 12 +夜 12 +媒 12 +宏 12 +征 12 +怖 12 +急 12 +患 12 +惠 12 +拆 12 +敏 12 +杭 12 +梯 12 +棄 12 +浙 12 +溪 12 +濱 12 +犯 12 +琴 12 +申 12 +穩 12 +籃 12 +紹 12 +綜 12 +緣 12 +繁 12 +罕 12 +肯 12 +脅 12 +舞 12 +萄 12 +謝 12 +譜 12 +谷 12 +貫 12 +賀 12 +賈 12 +賞 12 +赤 12 +軟 12 +邨 12 +隱 12 +雅 12 +雕 12 +零 12 +顧 12 +鼠 12 +? 11 +乙 11 +促 11 +冬 11 +努 11 +勳 11 +呈 11 +味 11 +困 11 +填 11 +壞 11 +孔 11 +尺 11 +忠 11 +掉 11 +晶 11 +曹 11 +泥 11 +濕 11 +獸 11 +玄 11 +珊 11 +礙 11 +綠 11 +翼 11 +耕 11 +葛 11 +蔡 11 +蛇 11 +覽 11 +託 11 +迅 11 +途 11 +週 11 +鋒 11 +鋼 11 +龐 11 +z 10 +乃 10 +伍 10 +伴 10 +冊 10 +勵 10 +吳 10 +塊 10 +婦 10 +宅 10 +尊 10 +幕 10 +彼 10 +徐 10 +恢 10 +悲 10 +慕 10 +慢 10 +搜 10 +旁 10 +曉 10 +朋 10 +枚 10 +棒 10 +樞 10 +殘 10 +毫 10 +洗 10 +洞 10 +溝 10 +滑 10 +潘 10 +濃 10 +炎 10 +燒 10 +瓷 10 +礁 10 +紛 10 +績 10 +翠 10 +翰 10 +腳 10 +膠 10 +莊 10 +袖 10 +裁 10 +賴 10 +輻 10 +迎 10 +遜 10 +醒 10 +銘 10 +錫 10 +鍵 10 +隔 10 +隧 10 +震 10 +顏 10 +髮 10 +鹿 10 +井 9 +佩 9 +借 9 +傾 9 +准 9 +刑 9 +叛 9 +句 9 +呂 9 +孕 9 +孟 9 +宜 9 +寵 9 +尖 9 +岳 9 +嶼 9 +巧 9 +幅 9 +幸 9 +徹 9 +戈 9 +拓 9 +捷 9 +掛 9 +擎 9 +敘 9 +昭 9 +枝 9 +棍 9 +概 9 +欣 9 +沖 9 +灰 9 +熟 9 +狼 9 +甸 9 +疫 9 +盜 9 +盤 9 +碳 9 +礦 9 +筆 9 +籌 9 +紋 9 +緬 9 +縱 9 +繫 9 +肅 9 +脫 9 +苗 9 +茨 9 +葬 9 +裔 9 +覆 9 +讚 9 +豬 9 +貿 9 +贊 9 +贏 9 +轟 9 +迴 9 +霸 9 +驅 9 +驚 9 +鯨 9 +鰭 9 +鴻 9 +Y 8 +丈 8 +乏 8 +伏 8 +伐 8 +估 8 +倒 8 +兄 8 +兆 8 +刀 8 +勁 8 +勇 8 +嘗 8 +埔 8 +塘 8 +壽 8 +娃 8 +娜 8 +媽 8 +孤 8 +屯 8 +峽 8 +床 8 +彭 8 +御 8 +慮 8 +憲 8 +懸 8 +截 8 +振 8 +捐 8 +搶 8 +斂 8 +昆 8 +杯 8 +析 8 +柔 8 +柱 8 +柳 8 +柴 8 +棋 8 +榜 8 +氫 8 +沉 8 +浪 8 +渡 8 +滾 8 +潭 8 +瀋 8 +灘 8 +犬 8 +狂 
8 +琉 8 +町 8 +皆 8 +盡 8 +盧 8 +睡 8 +碑 8 +糧 8 +翌 8 +聘 8 +肖 8 +芝 8 +荒 8 +蓉 8 +裂 8 +註 8 +詢 8 +賢 8 +趾 8 +跡 8 +跨 8 +跳 8 +迦 8 +遼 8 +邀 8 +鄧 8 +針 8 +錢 8 +閉 8 +露 8 +頜 8 +駛 8 +~ 7 +俘 7 +倍 7 +偉 7 +偶 7 +匹 7 +卑 7 +呎 7 +喀 7 +喇 7 +壇 7 +壘 7 +奈 7 +契 7 +婆 7 +崖 7 +弓 7 +彌 7 +徽 7 +忙 7 +忽 7 +怡 7 +惱 7 +憶 7 +拖 7 +握 7 +搬 7 +撒 7 +撞 7 +擾 7 +斐 7 +旋 7 +既 7 +晨 7 +暖 7 +杉 7 +欖 7 +泳 7 +浮 7 +滬 7 +潔 7 +炮 7 +爪 7 +爬 7 +狐 7 +瑟 7 +癌 7 +硬 7 +碎 7 +磨 7 +祭 7 +稻 7 +籤 7 +粹 7 +緒 7 +胎 7 +舒 7 +艱 7 +蒸 7 +蔣 7 +薦 7 +藩 7 +蛋 7 +衡 7 +誠 7 +諷 7 +趨 7 +躲 7 +違 7 +邏 7 +邱 7 +采 7 +銜 7 +陰 7 +陶 7 +馮 7 +鬆 7 +鬼 7 +魏 7 +鼎 7 +鼓 7 +° 6 +乳 6 +仿 6 +伽 6 +侯 6 +允 6 +冕 6 +凡 6 +劫 6 +勃 6 +募 6 +卿 6 +厚 6 +咖 6 +咸 6 +啡 6 +喚 6 +噴 6 +圳 6 +坊 6 +埋 6 +奮 6 +奶 6 +妖 6 +妥 6 +妮 6 +娘 6 +嫁 6 +嬌 6 +嬴 6 +孜 6 +宙 6 +尉 6 +屍 6 +岡 6 +崇 6 +崎 6 +崗 6 +嶺 6 +巡 6 +帽 6 +弄 6 +怒 6 +惜 6 +慣 6 +扶 6 +折 6 +抽 6 +挖 6 +挪 6 +捉 6 +措 6 +搖 6 +擋 6 +擠 6 +敬 6 +斥 6 +旦 6 +栽 6 +棉 6 +汁 6 +汗 6 +汞 6 +泡 6 +涯 6 +淡 6 +淮 6 +漂 6 +煮 6 +爽 6 +猛 6 +猴 6 +玻 6 +琳 6 +璃 6 +甜 6 +痛 6 +硫 6 +祝 6 +禦 6 +稿 6 +窯 6 +箭 6 +粵 6 +絡 6 +縮 6 +翔 6 +翡 6 +蓬 6 +蕭 6 +薪 6 +蝕 6 +衷 6 +袁 6 +袋 6 +裏 6 +諮 6 +豪 6 +貞 6 +貪 6 +赴 6 +軸 6 +輿 6 +遮 6 +遲 6 +邵 6 +郊 6 +醇 6 +鍋 6 +鎖 6 +鑒 6 +閘 6 +陀 6 +雌 6 +雞 6 +霖 6 +頒 6 +頗 6 +飼 6 +鮑 6 +鯉 6 +鳴 6 +X 5 +『 5 +于 5 +亥 5 +俊 5 +倉 5 +偷 5 +偽 5 +僧 5 +僱 5 +償 5 +儲 5 +凌 5 +券 5 +削 5 +剩 5 +割 5 +勾 5 +卜 5 +叔 5 +吏 5 +吾 5 +喪 5 +喻 5 +嘛 5 +堆 5 +墅 5 +墜 5 +墟 5 +夕 5 +姑 5 +姻 5 +嫌 5 +宰 5 +屠 5 +崔 5 +幽 5 +廂 5 +廊 5 +廖 5 +廚 5 +彙 5 +循 5 +忍 5 +怎 5 +愈 5 +慘 5 +憑 5 +憤 5 +戒 5 +抒 5 +抓 5 +抱 5 +挺 5 +掠 5 +搞 5 +搭 5 +摘 5 +斑 5 +旺 5 +栓 5 +梨 5 +棕 5 +欲 5 +欽 5 +殼 5 +沈 5 +沼 5 +涅 5 +涌 5 +淄 5 +淘 5 +湘 5 +溶 5 +滕 5 +漁 5 +燥 5 +狗 5 +狩 5 +獅 5 +玲 5 +珀 5 +瑚 5 +瓊 5 +畜 5 +疆 5 +碧 5 +磁 5 +祥 5 +祿 5 +窄 5 +笑 5 +纖 5 +胸 5 +腹 5 +舌 5 +艘 5 +芭 5 +苑 5 +苯 5 +范 5 +菜 5 +蟲 5 +裕 5 +詹 5 +謙 5 +豫 5 +贈 5 +輝 5 +辯 5 +逼 5 +遂 5 +郗 5 +酵 5 +醉 5 +釀 5 +鋪 5 +鎊 5 +鑑 5 +闢 5 +陝 5 +鹼 5 +鹽 5 +麟 5 +黛 5 +齡 5 +$ 4 +j 4 +』 4 +串 4 +亭 4 +伺 4 +侍 4 +倖 4 +倡 4 +僚 4 +兇 4 +函 4 +劣 4 +勤 4 +厘 4 +吞 4 +吧 4 +呢 4 +喙 4 +喝 4 +嘲 4 +坎 4 +夥 4 +姊 4 +姬 4 +娶 4 +宛 4 +宴 4 +寄 4 +尹 4 +屈 4 +崙 4 +崩 4 +巫 4 +帥 4 +庸 4 +弘 4 +弦 4 +彰 
4 +恨 4 +悅 4 +悉 4 +慧 4 +扮 4 +拾 4 +挑 4 +插 4 +撥 4 +撰 4 +擺 4 +敖 4 +旨 4 +昏 4 +暨 4 +朔 4 +杰 4 +橄 4 +橡 4 +檸 4 +櫃 4 +歇 4 +歧 4 +殉 4 +浩 4 +浴 4 +浸 4 +淨 4 +渝 4 +滋 4 +滙 4 +澱 4 +焦 4 +煉 4 +煩 4 +熏 4 +熙 4 +燕 4 +爲 4 +牟 4 +狄 4 +狙 4 +狸 4 +狹 4 +璋 4 +畝 4 +瘋 4 +盾 4 +眉 4 +睦 4 +睿 4 +瞭 4 +矛 4 +矩 4 +矮 4 +砂 4 +碟 4 +磯 4 +秒 4 +窗 4 +窟 4 +竟 4 +笨 4 +答 4 +箱 4 +篡 4 +簧 4 +粉 4 +糕 4 +緻 4 +罰 4 +罷 4 +罹 4 +翁 4 +耀 4 +耐 4 +聊 4 +肥 4 +胺 4 +腔 4 +腸 4 +膚 4 +膜 4 +膝 4 +膽 4 +臉 4 +艙 4 +茂 4 +茵 4 +莽 4 +菌 4 +蓄 4 +蘋 4 +蛛 4 +蜂 4 +蜘 4 +襄 4 +誰 4 +誼 4 +謠 4 +豹 4 +貧 4 +賜 4 +賠 4 +賦 4 +踢 4 +辟 4 +辨 4 +辰 4 +辱 4 +迹 4 +逢 4 +逾 4 +遞 4 +邑 4 +邪 4 +鄂 4 +酷 4 +鈞 4 +錶 4 +鍊 4 +鍾 4 +鎳 4 +鑼 4 +閒 4 +閥 4 +闆 4 +闊 4 +隋 4 +隕 4 +靖 4 +鞏 4 +頸 4 +颱 4 +飯 4 +駕 4 +騙 4 +鬧 4 +鱗 4 +麓 4 +! 3 +Q 3 +Z 3 +q 3 +─ 3 +・ 3 +丟 3 +仇 3 +侈 3 +俠 3 +倆 3 +倪 3 +偵 3 +催 3 +傭 3 +債 3 +僻 3 +儂 3 +冒 3 +冥 3 +凈 3 +凍 3 +刷 3 +卧 3 +厥 3 +厭 3 +吐 3 +吹 3 +咬 3 +哪 3 +哺 3 +喉 3 +嚇 3 +囑 3 +囚 3 +坂 3 +坪 3 +堪 3 +墊 3 +墾 3 +壤 3 +夷 3 +夸 3 +夾 3 +奔 3 +奢 3 +妨 3 +姚 3 +姿 3 +嬰 3 +孝 3 +寒 3 +尻 3 +屏 3 +履 3 +岐 3 +嵌 3 +巷 3 +帳 3 +庄 3 +庾 3 +廿 3 +彥 3 +彬 3 +徙 3 +忒 3 +忘 3 +怕 3 +恥 3 +悟 3 +悠 3 +愉 3 +愚 3 +慌 3 +慎 3 +慾 3 +憂 3 +懲 3 +懶 3 +扭 3 +抄 3 +拯 3 +拳 3 +拷 3 +拼 3 +挽 3 +掘 3 +揭 3 +撐 3 +撫 3 +撲 3 +擒 3 +擦 3 +敕 3 +斜 3 +旬 3 +昂 3 +昇 3 +昔 3 +昧 3 +暑 3 +曆 3 +朴 3 +杏 3 +柬 3 +栃 3 +株 3 +桓 3 +楠 3 +楷 3 +槓 3 +樊 3 +樸 3 +檬 3 +欺 3 +殯 3 +毅 3 +毓 3 +毗 3 +氨 3 +氮 3 +汝 3 +汪 3 +汰 3 +沫 3 +洩 3 +涮 3 +淋 3 +淪 3 +淹 3 +淺 3 +添 3 +溥 3 +滉 3 +滯 3 +漏 3 +漳 3 +潰 3 +澄 3 +瀏 3 +烯 3 +焚 3 +煤 3 +煥 3 +燭 3 +爐 3 +爸 3 +爺 3 +琪 3 +瑙 3 +瑜 3 +瓜 3 +瓶 3 +痕 3 +盆 3 +盪 3 +眠 3 +睛 3 +矚 3 +砍 3 +磡 3 +磷 3 +祐 3 +禍 3 +禽 3 +稀 3 +稅 3 +穌 3 +窮 3 +粗 3 +紡 3 +紮 3 +累 3 +絞 3 +綺 3 +縫 3 +繩 3 +纜 3 +羌 3 +聰 3 +肆 3 +肝 3 +肢 3 +肺 3 +胃 3 +胖 3 +胚 3 +脂 3 +腓 3 +腺 3 +腿 3 +臟 3 +臭 3 +臼 3 +舅 3 +芳 3 +芽 3 +茅 3 +莆 3 +菊 3 +葵 3 +蔭 3 +蕉 3 +蕩 3 +薄 3 +薇 3 +蘆 3 +虔 3 +虜 3 +蜀 3 +螺 3 +蟒 3 +蠟 3 +蠶 3 +蠻 3 +衍 3 +衙 3 +衰 3 +褂 3 +褐 3 +褲 3 +詳 3 +誘 3 +諧 3 +謂 3 +謎 3 +謹 3 +貂 3 +貌 3 +販 3 +貼 3 +賃 3 +趁 3 +趕 3 +跌 3 +蹈 3 +蹟 3 +辜 3 +逆 3 +逮 3 +邁 3 +酗 3 +釉 3 +釜 3 +銠 3 +銳 3 +鐸 3 +鑽 3 +閃 3 +閑 3 +閱 3 +阜 3 +阪 3 +隙 3 +雍 3 +霜 3 +鞋 3 +韻 3 +頌 3 +餃 
3 +餅 3 +餓 3 +饑 3 +饒 3 +驟 3 +魅 3 +鴿 3 +鵝 3 +麵 3 += 2 +` 2 +​ 2 +‧ 2 +〈 2 +〉 2 +丐 2 +丞 2 +仲 2 +俸 2 +倚 2 +倭 2 +傍 2 +傲 2 +僕 2 +兔 2 +兢 2 +冤 2 +凝 2 +凰 2 +凸 2 +划 2 +剝 2 +剿 2 +勸 2 +匕 2 +匪 2 +匾 2 +卉 2 +卓 2 +卦 2 +卷 2 +厄 2 +叢 2 +吊 2 +吋 2 +吵 2 +吻 2 +呆 2 +咀 2 +咎 2 +咐 2 +哇 2 +唸 2 +喊 2 +嗜 2 +嗣 2 +嘆 2 +嘴 2 +噁 2 +噪 2 +嚨 2 +圭 2 +坤 2 +垮 2 +堯 2 +塍 2 +塗 2 +塵 2 +墮 2 +壩 2 +壺 2 +奎 2 +奕 2 +妒 2 +妙 2 +妝 2 +姜 2 +姥 2 +婁 2 +婭 2 +婷 2 +媚 2 +嫉 2 +嫻 2 +孵 2 +寓 2 +寢 2 +寨 2 +尿 2 +峙 2 +嶽 2 +帆 2 +帛 2 +并 2 +弈 2 +弊 2 +弧 2 +彗 2 +忌 2 +恰 2 +悼 2 +惑 2 +惟 2 +憎 2 +憐 2 +憩 2 +懊 2 +懼 2 +戍 2 +戚 2 +扣 2 +抑 2 +披 2 +拋 2 +拱 2 +掩 2 +摸 2 +撈 2 +撿 2 +擅 2 +敲 2 +斗 2 +旭 2 +暢 2 +曝 2 +曬 2 +札 2 +柄 2 +栗 2 +栩 2 +桂 2 +框 2 +桌 2 +桐 2 +械 2 +梳 2 +棟 2 +棣 2 +椎 2 +楓 2 +榆 2 +榔 2 +槽 2 +樁 2 +樟 2 +橙 2 +欄 2 +氦 2 +洵 2 +浚 2 +涇 2 +淑 2 +淳 2 +渭 2 +湛 2 +溯 2 +滄 2 +滸 2 +漠 2 +漲 2 +漿 2 +潑 2 +潟 2 +濤 2 +濫 2 +濾 2 +灌 2 +熔 2 +燈 2 +牲 2 +牽 2 +犀 2 +猜 2 +琅 2 +琦 2 +瑣 2 +瑩 2 +璘 2 +甥 2 +甦 2 +畔 2 +番 2 +疊 2 +疏 2 +疲 2 +疽 2 +瘤 2 +皓 2 +盈 2 +盔 2 +眈 2 +眷 2 +砲 2 +硅 2 +硝 2 +碘 2 +碩 2 +磅 2 +磺 2 +祀 2 +祂 2 +祕 2 +祺 2 +禪 2 +稽 2 +穀 2 +穫 2 +穴 2 +窘 2 +竣 2 +笛 2 +筍 2 +筐 2 +箏 2 +簿 2 +籠 2 +籲 2 +糞 2 +糾 2 +紓 2 +綁 2 +綢 2 +綴 2 +綽 2 +縉 2 +縛 2 +繚 2 +繳 2 +羚 2 +羞 2 +翦 2 +肌 2 +肚 2 +腎 2 +腐 2 +臥 2 +舟 2 +艷 2 +芮 2 +苔 2 +苣 2 +茄 2 +茸 2 +荃 2 +菁 2 +菩 2 +菸 2 +萌 2 +葚 2 +蒐 2 +蒜 2 +蒲 2 +蓓 2 +蔬 2 +蔽 2 +蕪 2 +蕾 2 +藻 2 +虐 2 +虢 2 +虹 2 +蜜 2 +蜥 2 +蝸 2 +蟹 2 +裙 2 +裴 2 +裸 2 +詮 2 +誇 2 +誓 2 +謊 2 +豆 2 +豎 2 +賄 2 +賤 2 +賭 2 +賺 2 +贖 2 +踏 2 +蹤 2 +躁 2 +躍 2 +輟 2 +轅 2 +轎 2 +辣 2 +迢 2 +遙 2 +遴 2 +遵 2 +郷 2 +酪 2 +醜 2 +醺 2 +銹 2 +鋁 2 +鋅 2 +錛 2 +錠 2 +錳 2 +鏈 2 +鑄 2 +閏 2 +閩 2 +閻 2 +阱 2 +陪 2 +霧 2 +霾 2 +靡 2 +韌 2 +顎 2 +顛 2 +飢 2 +飽 2 +餾 2 +駁 2 +騰 2 +髓 2 +髖 2 +髻 2 +鬚 2 +魁 2 +鯛 2 +鰂 2 +鱸 2 +鵰 2 +鶴 2 +鷹 2 +麴 2 +黔 2 +黜 2 +齋 2 +齧 2 ++ 1 +á 1 +é 1 +ð 1 +ö 1 +þ 1 +ō 1 +̄ 1 +θ 1 +〔 1 +〕 1 +丕 1 +丙 1 +丸 1 +乞 1 +仕 1 +仗 1 +伎 1 +伙 1 +伶 1 +佑 1 +佗 1 +佬 1 +侄 1 +侏 1 +侖 1 +侮 1 +侶 1 +俏 1 +俚 1 +俯 1 +俾 1 +倩 1 +倬 1 +傀 1 +傢 1 +傻 1 +僑 1 +僖 1 +僵 1 +儡 1 +儷 1 +兌 1 +冀 1 +冉 1 +冢 1 +凄 1 +凊 1 +凳 1 +凶 1 +凹 1 +刃 1 +刪 1 +刮 1 +剋 1 +剌 1 +剪 1 +剷 1 +効 1 +劾 1 +勉 1 +勘 1 +勲 1 +勺 1 +勻 
1 +匙 1 +匡 1 +卵 1 +卸 1 +叡 1 +吁 1 +吟 1 +吩 1 +呀 1 +呔 1 +咒 1 +咧 1 +咪 1 +哀 1 +哨 1 +哭 1 +唇 1 +唾 1 +啄 1 +啊 1 +啤 1 +喃 1 +喘 1 +嗅 1 +嗎 1 +嘈 1 +嘎 1 +嘔 1 +嘩 1 +嘯 1 +噶 1 +嚮 1 +囊 1 +囒 1 +囪 1 +坍 1 +坑 1 +坮 1 +垣 1 +埜 1 +埠 1 +埤 1 +堈 1 +堊 1 +堤 1 +堵 1 +塚 1 +塢 1 +墳 1 +壑 1 +壹 1 +奬 1 +奸 1 +妄 1 +妊 1 +妳 1 +姦 1 +姪 1 +娠 1 +婢 1 +婪 1 +嫘 1 +嫣 1 +孚 1 +孛 1 +孺 1 +宦 1 +寅 1 +寇 1 +寡 1 +寮 1 +寰 1 +寸 1 +尬 1 +尷 1 +岑 1 +岔 1 +岷 1 +峨 1 +峪 1 +峯 1 +崞 1 +嵩 1 +巔 1 +巢 1 +巳 1 +巾 1 +幀 1 +幌 1 +幟 1 +幢 1 +幪 1 +庇 1 +庚 1 +庵 1 +廓 1 +廝 1 +廬 1 +弭 1 +彎 1 +彝 1 +彪 1 +彷 1 +彿 1 +怨 1 +恣 1 +恤 1 +恭 1 +悍 1 +悔 1 +悖 1 +您 1 +悶 1 +惇 1 +愁 1 +愙 1 +愧 1 +愨 1 +慚 1 +慨 1 +慰 1 +慷 1 +懂 1 +懿 1 +戌 1 +戟 1 +扁 1 +扈 1 +扔 1 +扼 1 +抨 1 +抬 1 +押 1 +拌 1 +拏 1 +拙 1 +挫 1 +挹 1 +挾 1 +捍 1 +捨 1 +捲 1 +掙 1 +搏 1 +摑 1 +摒 1 +摔 1 +摧 1 +摯 1 +摹 1 +撓 1 +撘 1 +撮 1 +擂 1 +擢 1 +擱 1 +攀 1 +攔 1 +攜 1 +攣 1 +攤 1 +攪 1 +攸 1 +敉 1 +敞 1 +敢 1 +斃 1 +斌 1 +斤 1 +斧 1 +斬 1 +旱 1 +旻 1 +昨 1 +晃 1 +晒 1 +晤 1 +晴 1 +暇 1 +暮 1 +暱 1 +暹 1 +曄 1 +曖 1 +曧 1 +曷 1 +朵 1 +朽 1 +杖 1 +杞 1 +枸 1 +柚 1 +柝 1 +柢 1 +柩 1 +査 1 +柿 1 +桿 1 +梓 1 +梧 1 +梭 1 +梵 1 +棵 1 +椅 1 +椰 1 +楂 1 +楞 1 +榕 1 +榨 1 +榫 1 +榴 1 +槌 1 +槳 1 +樑 1 +橈 1 +橢 1 +檎 1 +檐 1 +檜 1 +檨 1 +檯 1 +檳 1 +櫟 1 +櫾 1 +歆 1 +歉 1 +歩 1 +殃 1 +殆 1 +殲 1 +毎 1 +毯 1 +氈 1 +氘 1 +氚 1 +汀 1 +汲 1 +汶 1 +沌 1 +沢 1 +沽 1 +沾 1 +泌 1 +泗 1 +泠 1 +洒 1 +浜 1 +涪 1 +涵 1 +淀 1 +淚 1 +淫 1 +淵 1 +渚 1 +渠 1 +渣 1 +渤 1 +渦 1 +渴 1 +渾 1 +湄 1 +湧 1 +湮 1 +溢 1 +滲 1 +滴 1 +漆 1 +漬 1 +漱 1 +漶 1 +漾 1 +潢 1 +澀 1 +濁 1 +濠 1 +瀆 1 +瀑 1 +瀕 1 +瀘 1 +瀝 1 +瀟 1 +灶 1 +灼 1 +炘 1 +炙 1 +炭 1 +烤 1 +烴 1 +烷 1 +烹 1 +焊 1 +焗 1 +焜 1 +煌 1 +煜 1 +煦 1 +煽 1 +熄 1 +熾 1 +燁 1 +燄 1 +燦 1 +燾 1 +爛 1 +牘 1 +牡 1 +犁 1 +犧 1 +狡 1 +猝 1 +猾 1 +猿 1 +玕 1 +玖 1 +玫 1 +玷 1 +琬 1 +琺 1 +瑋 1 +瑛 1 +瑰 1 +瑾 1 +璣 1 +瓘 1 +瓣 1 +甄 1 +甌 1 +甫 1 +甬 1 +畏 1 +畹 1 +疇 1 +疙 1 +疹 1 +疼 1 +痙 1 +痢 1 +痰 1 +痹 1 +瘦 1 +瘧 1 +瘩 1 +癖 1 +癤 1 +癥 1 +癮 1 +皈 1 +皋 1 +皖 1 +皰 1 +盒 1 +眯 1 +睞 1 +睹 1 +睾 1 +瞪 1 +瞬 1 +瞰 1 +矗 1 +矢 1 +砒 1 +砝 1 +碁 1 +碰 1 +磐 1 +磚 1 +祁 1 +祈 1 +祠 1 +禕 1 +禛 1 +禱 1 +秉 1 +秩 1 +稍 1 +稗 1 +稚 1 +稼 1 +穗 1 +穹 1 +窩 1 +竄 1 +竊 1 +竭 1 +竿 1 +笳 1 +筒 1 +箔 1 +箬 1 +箴 1 +篤 1 +粦 1 +粽 1 +糟 1 +紂 1 +紈 1 +紊 1 +紗 1 +紜 1 +紳 
1 +紺 1 +絨 1 +絶 1 +綉 1 +綏 1 +綿 1 +緋 1 +緝 1 +締 1 +緯 1 +緹 1 +縈 1 +繡 1 +繭 1 +繹 1 +繽 1 +纂 1 +纏 1 +缽 1 +罵 1 +羨 1 +羯 1 +羱 1 +羲 1 +翟 1 +耗 1 +耽 1 +聆 1 +聳 1 +聶 1 +聾 1 +肇 1 +肘 1 +肩 1 +脆 1 +脊 1 +脱 1 +脹 1 +脾 1 +腥 1 +腫 1 +膀 1 +膨 1 +膳 1 +臂 1 +臍 1 +臧 1 +臿 1 +舀 1 +舖 1 +舜 1 +芘 1 +芻 1 +苛 1 +苟 1 +苷 1 +茜 1 +荊 1 +荔 1 +莖 1 +菅 1 +菱 1 +萎 1 +葆 1 +葫 1 +葯 1 +葺 1 +蒼 1 +蔑 1 +蔥 1 +蕙 1 +蕨 1 +薔 1 +薛 1 +薺 1 +蘊 1 +虞 1 +虱 1 +蚊 1 +蚌 1 +蚩 1 +蚺 1 +蛙 1 +蜆 1 +蜒 1 +蜚 1 +蜴 1 +蜿 1 +蝴 1 +蝶 1 +螞 1 +螢 1 +蟬 1 +蟻 1 +蟾 1 +蠣 1 +衢 1 +衫 1 +袍 1 +袥 1 +袱 1 +裋 1 +裹 1 +褪 1 +襟 1 +襪 1 +覓 1 +覲 1 +訃 1 +訄 1 +訇 1 +訐 1 +訖 1 +訝 1 +訥 1 +詐 1 +詔 1 +詛 1 +詝 1 +詼 1 +誥 1 +誦 1 +誹 1 +諂 1 +諒 1 +諜 1 +諱 1 +諶 1 +諺 1 +謁 1 +謇 1 +謔 1 +謗 1 +謚 1 +譚 1 +譴 1 +豈 1 +豢 1 +貶 1 +貽 1 +賚 1 +賡 1 +賬 1 +赦 1 +趟 1 +跋 1 +踐 1 +踞 1 +躬 1 +軀 1 +軋 1 +軒 1 +輾 1 +轍 1 +辮 1 +逍 1 +逛 1 +逵 1 +逸 1 +遹 1 +邗 1 +邳 1 +邸 1 +郝 1 +郪 1 +郫 1 +鄢 1 +酃 1 +酆 1 +酉 1 +酊 1 +酋 1 +酩 1 +酮 1 +醋 1 +醬 1 +醴 1 +釗 1 +釘 1 +釧 1 +釩 1 +鈇 1 +鈦 1 +鈺 1 +鈾 1 +鉑 1 +鉛 1 +鉤 1 +銎 1 +銓 1 +銨 1 +鋸 1 +錘 1 +鍔 1 +鍛 1 +鍝 1 +鎂 1 +鎰 1 +鏞 1 +鏢 1 +鐳 1 +鑫 1 +鑰 1 +鑿 1 +闍 1 +闖 1 +闡 1 +阡 1 +陂 1 +陌 1 +陛 1 +陞 1 +陡 1 +隍 1 +隠 1 +雇 1 +雉 1 +雎 1 +雯 1 +霆 1 +霞 1 +靴 1 +靶 1 +靼 1 +鞍 1 +鞘 1 +鞦 1 +韃 1 +韆 1 +韶 1 +頁 1 +頃 1 +頑 1 +頡 1 +頰 1 +頹 1 +顥 1 +飈 1 +餉 1 +餡 1 +餮 1 +饃 1 +饕 1 +馳 1 +馴 1 +駅 1 +駙 1 +駿 1 +騁 1 +騏 1 +騷 1 +驢 1 +驤 1 +驥 1 +骯 1 +骷 1 +骸 1 +髏 1 +髒 1 +鬢 1 +鬱 1 +魂 1 +鯽 1 +鰓 1 +鰺 1 +鱂 1 +鱲 1 +鱷 1 +鴛 1 +鴦 1 +鴨 1 +鵑 1 +鵬 1 +鷗 1 +麒 1 +麩 1 +鼐 1 +鼩 1 +鼬 1 +鼻 1 +齲 1 +龜 1 diff --git a/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/char-ngram-map b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/char-ngram-map new file mode 100644 index 0000000000000000000000000000000000000000..a2af3de4e9653e07980ae366cd26960fbdab3c9e --- /dev/null +++ b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/char-ngram-map @@ -0,0 +1,16126 @@ +16125 +99 4963 +中國 218 +.9 156 +9. 
156 +美國 131 +開始 118 +可以 115 +公里 110 +人口 108 +使用 102 +日本 95 +平方 93 +沒有 93 +第一 92 +他們 91 +公司 88 +由於 88 +其中 87 +地區 87 +國家 86 +政府 86 +主要 83 +世界 81 +大學 81 +不同 80 +香港 79 +自己 77 +因為 76 +研究 76 +面積 75 +,9 74 +9, 74 +進行 72 +包括 69 +當時 66 +這些 66 +部分 66 +中華 65 +工作 65 +認為 65 +也是 64 +以及 64 +發現 64 +方公 62 +er 61 +an 60 +同時 60 +學院 60 +9% 59 +成立 59 +第二 59 +代表 58 +發展 58 +發生 58 +之後 57 +社會 57 +一些 56 +人民 56 +其他 56 +世紀 54 +建築 53 +為了 53 +獲得 52 +目前 52 +英國 52 +重要 52 +文化 51 +中心 50 +但是 50 +第9 50 +許多 50 +之間 49 +可能 49 +歷史 49 +遊戲 49 +9萬 48 +ar 48 +帝國 48 +期間 48 +音樂 48 +一般 47 +年代 47 +根據 47 +行星 47 +電影 47 +in 46 +政治 46 +組織 46 +鐵路 46 +-- 45 +城市 45 +故事 45 +學校 44 +所有 44 +科學 44 +on 43 +作品 43 +最後 43 +通過 43 +關係 43 +已經 42 +建立 42 +時間 42 +電視 42 +共和 41 +後來 41 +管理 41 +表示 41 +通常 41 +出現 40 +影響 40 +成功 40 +戰爭 40 +提供 40 +系統 40 +動物 39 +地方 39 +就是 39 +德國 39 +設計 39 +負責 39 +國際 38 +技術 38 +方面 38 +最終 38 +父親 38 +車站 38 +上海 37 +人物 37 +台灣 37 +參加 36 +擔任 36 +時期 36 +服務 36 +正式 36 +生活 36 +要求 36 +運動 36 +一直 35 +單位 35 +大利 35 +委員 35 +民國 35 +法國 35 +理論 35 +第三 35 +人類 34 +歐洲 34 +決定 34 +現在 34 +羅馬 34 +航空 34 +行政 34 +足球 34 +雖然 34 +利亞 33 +印度 33 +問題 33 +小說 33 +教育 33 +製作 33 +西班 33 +en 32 +不是 32 +保護 32 +全國 32 +形成 32 +很多 32 +得到 32 +活動 32 +班牙 32 +節目 32 +主義 31 +尼亞 31 +市鎮 31 +方式 31 +時代 31 +最高 31 +需要 31 +al 30 +ri 30 +ro 30 +中央 30 +另外 30 +控制 30 +擁有 30 +產生 30 +經濟 30 +進入 30 +he 29 +ll 29 +or 29 +公園 29 +具有 29 +大陸 29 +接受 29 +球隊 29 +當地 29 +並且 28 +北京 28 +受到 28 +如果 28 +學生 28 +工程 28 +時候 28 +計劃 28 +超過 28 +電腦 28 +9億 27 +ia 27 +存在 27 +對於 27 +情況 27 +戰鬥 27 +方法 27 +機場 27 +比賽 27 +甚至 27 +總統 27 +義大 27 +都是 27 +非常 27 +le 26 +st 26 +人員 26 +原因 26 +國民 26 +支持 26 +然而 26 +獨立 26 +生物 26 +聯合 26 +ne 25 +re 25 +兒子 25 +出版 25 +巴士 25 +我們 25 +海拔 25 +經過 25 +議會 25 +il 24 +li 24 +te 24 +一樣 24 +交通 24 +例如 24 +分布 24 +加入 24 +同年 24 +大量 24 +於是 24 +最大 24 +生產 24 +皇帝 24 +系列 24 +高度 24 +事件 23 +內容 23 +命名 23 +宣布 23 +導致 23 +必須 23 +成員 23 +清朝 23 +演出 23 +直接 23 +紐約 23 +行為 23 +距離 23 +軍事 23 +部隊 23 +銀行 23 +集團 23 +-9 22 +不少 22 +不過 22 +傳統 22 +反對 22 +增加 22 +它們 22 +思想 22 +有關 22 +此外 22 +母親 22 +組成 22 +結構 22 +羅斯 22 
+聯盟 22 +聯賽 22 +能力 22 +語言 22 +附近 22 +ch 21 +es 21 +nt 21 +ra 21 +一起 21 +作用 21 +出生 21 +只有 21 +唯一 21 +地位 21 +廣泛 21 +植物 21 +海軍 21 +無法 21 +現代 21 +環境 21 +紀念 21 +結束 21 +舉行 21 +角色 21 +議員 21 +選舉 21 +at 20 +el 20 +ha 20 +ic 20 +it 20 +ti 20 +不能 20 +主席 20 +仍然 20 +冠軍 20 +出任 20 +分子 20 +原子 20 +參與 20 +地下 20 +城鎮 20 +天津 20 +工業 20 +希臘 20 +引起 20 +採用 20 +攻擊 20 +整個 20 +文學 20 +文物 20 +朝鮮 20 +東北 20 +機構 20 +比較 20 +猶太 20 +管轄 20 +範圍 20 +細胞 20 +經常 20 +自治 20 +自由 20 +逐漸 20 +重新 20 +類型 20 +am 19 +ma 19 +不久 19 +以上 19 +佔領 19 +分別 19 +台北 19 +多數 19 +天文 19 +巴黎 19 +所以 19 +方米 19 +最早 19 +會議 19 +有些 19 +民族 19 +結果 19 +繼續 19 +能夠 19 +造成 19 +達到 19 +部份 19 +風格 19 +et 18 +la 18 +ve 18 +不會 18 +任何 18 +企業 18 +先後 18 +列車 18 +功能 18 +取得 18 +合併 18 +外交 18 +廣州 18 +戰役 18 +明朝 18 +每年 18 +治療 18 +法院 18 +漫畫 18 +爾斯 18 +畢業 18 +疾病 18 +相當 18 +艦隊 18 +身體 18 +軍隊 18 +離開 18 +領導 18 +體育 18 +Ma 17 +ea 17 +na 17 +ng 17 +ou 17 +rt 17 +ta 17 +再次 17 +名字 17 +大戰 17 +宗教 17 +家族 17 +希望 17 +廣場 17 +採取 17 +提出 17 +教堂 17 +新聞 17 +最初 17 +格蘭 17 +物理 17 +特別 17 +發行 17 +總部 17 +自然 17 +蘇聯 17 +行動 17 +製造 17 +西北 17 +資料 17 +選擇 17 +領域 17 +飛機 17 +St 16 +as 16 +is 16 +nd 16 +ol 16 +oo 16 +se 16 +to 16 +九龍 16 +共同 16 +利用 16 +制度 16 +前往 16 +創作 16 +勢力 16 +區域 16 +協助 16 +各種 16 +大樓 16 +家庭 16 +實驗 16 +居民 16 +山東 16 +心理 16 +或者 16 +拒絕 16 +東南 16 +武器 16 +民主 16 +法律 16 +爆發 16 +狀態 16 +而且 16 +藝術 16 +表現 16 +西亞 16 +記者 16 +設有 16 +設立 16 +資源 16 +軌道 16 +過程 16 +道路 16 +還是 16 +革命 16 +首次 16 +高速 16 +Co 15 +io 15 +os 15 +th 15 +下轄 15 +中共 15 +主角 15 +作戰 15 +則是 15 +化石 15 +十分 15 +南京 15 +南部 15 +回到 15 +國內 15 +國王 15 +地球 15 +基督 15 +大廈 15 +大約 15 +太陽 15 +女兒 15 +女性 15 +如此 15 +學習 15 +完全 15 +實際 15 +常見 15 +幾乎 15 +應用 15 +承認 15 +投資 15 +指出 15 +指揮 15 +斯特 15 +普查 15 +未來 15 +此後 15 +火星 15 +版本 15 +牠們 15 +發表 15 +直到 15 +碼頭 15 +科技 15 +立法 15 +統治 15 +職業 15 +著名 15 +蒙古 15 +西部 15 +調查 15 +路線 15 +車輛 15 +農業 15 +這樣 15 +鐵道 15 +.. 
14 +Ca 14 +me 14 +om 14 +un 14 +ur 14 +us 14 +一定 14 +二十 14 +交易 14 +人們 14 +以來 14 +位置 14 +使得 14 +俄羅 14 +俱樂 14 +傳播 14 +兒童 14 +公主 14 +北部 14 +十一 14 +博物 14 +合作 14 +基本 14 +境內 14 +太平 14 +失去 14 +完成 14 +容易 14 +密度 14 +專業 14 +市場 14 +幫助 14 +建造 14 +擊敗 14 +曾經 14 +有限 14 +棲息 14 +樂部 14 +波蘭 14 +澳門 14 +營運 14 +特色 14 +相同 14 +看到 14 +簡稱 14 +統計 14 +網路 14 +聯邦 14 +董事 14 +規模 14 +解決 14 +貝爾 14 +起來 14 +路易 14 +這裡 14 +進攻 14 +開發 14 +限制 14 +顯示 14 +Be 13 +ce 13 +ec 13 +hi 13 +ir 13 +rs 13 +伊斯 13 +倫敦 13 +克斯 13 +全部 13 +公路 13 +公開 13 +其後 13 +初期 13 +加上 13 +博士 13 +司令 13 +同意 13 +因而 13 +圖書 13 +土地 13 +埃及 13 +基礎 13 +墨西 13 +天主 13 +妻子 13 +娛樂 13 +建設 13 +形式 13 +從事 13 +改變 13 +教會 13 +數學 13 +數據 13 +數量 13 +早期 13 +更多 13 +東京 13 +樂團 13 +模式 13 +死亡 13 +每個 13 +水平 13 +流域 13 +準備 13 +物種 13 +物質 13 +王國 13 +玩家 13 +男性 13 +當選 13 +目標 13 +相關 13 +知識 13 +第四 13 +紀錄 13 +統一 13 +街道 13 +西哥 13 +設定 13 +身份 13 +辦公 13 +速度 13 +運輸 13 +項目 13 +食物 13 +ai 12 +ee 12 +ey 12 +ni 12 +nn 12 +tr 12 +一帶 12 +上帝 12 +中學 12 +中部 12 +之前 12 +人數 12 +什麼 12 +以下 12 +保留 12 +個人 12 +價值 12 +元素 12 +內部 12 +公元 12 +加坡 12 +半島 12 +原本 12 +反應 12 +反映 12 +可是 12 +商業 12 +嚴重 12 +基地 12 +大型 12 +女子 12 +將軍 12 +尤其 12 +居住 12 +帶來 12 +平均 12 +建議 12 +很大 12 +律師 12 +恆星 12 +恐怖 12 +改革 12 +政策 12 +新加 12 +月台 12 +有時 12 +東部 12 +標準 12 +機關 12 +歌手 12 +決賽 12 +汽車 12 +減少 12 +潛艇 12 +熱帶 12 +瑞典 12 +生命 12 +產品 12 +產業 12 +相對 12 +眾多 12 +知道 12 +精神 12 +經營 12 +英格 12 +葡萄 12 +該國 12 +變成 12 +賽事 12 +透過 12 +遭到 12 +遺址 12 +避免 12 +醫院 12 +重建 12 +重慶 12 +阿爾 12 +電子 12 +9多 11 +au 11 +co 11 +di 11 +ip 11 +lo 11 +og 11 +vi 11 +主張 11 +主持 11 +主教 11 +之中 11 +亨利 11 +人士 11 +以前 11 +以色 11 +作者 11 +保持 11 +信仰 11 +先生 11 +全球 11 +出身 11 +創立 11 +創辦 11 +力量 11 +印第 11 +去世 11 +取代 11 +召開 11 +喬治 11 +地利 11 +大會 11 +奧地 11 +威爾 11 +威脅 11 +安全 11 +專輯 11 +強烈 11 +拉克 11 +接近 11 +推出 11 +描述 11 +播放 11 +文字 11 +斯蘭 11 +普遍 11 +柏林 11 +業務 11 +殖民 11 +江蘇 11 +涉及 11 +現時 11 +留下 11 +目的 11 +相信 11 +社區 11 +福建 11 +第十 11 +第安 11 +給予 11 +網站 11 +線路 11 +繼承 11 +色列 11 +試圖 11 +資訊 11 +部門 11 +阿拉 11 +馬來 11 +9- 10 +9千 10 +II 10 +ct 10 +de 10 +id 10 +no 10 +ns 10 +pe 10 +pi 10 +rn 10 +si 10 +並非 10 
+事業 10 +交流 10 +以後 10 +來往 10 +儘管 10 +克里 10 +勞動 10 +包含 10 +化學 10 +協會 10 +君主 10 +和平 10 +唱片 10 +國旗 10 +國會 10 +報告 10 +威廉 10 +學位 10 +復興 10 +感到 10 +手術 10 +投入 10 +推動 10 +播出 10 +改名 10 +文明 10 +文藝 10 +明顯 10 +有效 10 +杭州 10 +東方 10 +條件 10 +模型 10 +比亞 10 +河流 10 +法庭 10 +派遣 10 +演員 10 +演唱 10 +火車 10 +爭議 10 +特定 10 +特徵 10 +特殊 10 +獨特 10 +生長 10 +當中 10 +發動 10 +發射 10 +盛頓 10 +確定 10 +神話 10 +移民 10 +空間 10 +終於 10 +結婚 10 +維持 10 +總理 10 +芬蘭 10 +花園 10 +華盛 10 +衝突 10 +西藏 10 +規定 10 +訓練 10 +記載 10 +記錄 10 +該市 10 +警察 10 +變化 10 +責任 10 +起源 10 +逝世 10 +運行 10 +里斯 10 +錦標 10 +關於 10 +陸軍 10 +雜誌 10 +類似 10 +飛行 10 +首都 10 +'' 9 +A9 9 +Gr 9 +Th 9 +be 9 +iv 9 +od 9 +sa 9 +ss 9 +ty 9 +一切 9 +一致 9 +上陣 9 +下降 9 +不斷 9 +不滿 9 +中山 9 +丹麥 9 +之外 9 +事務 9 +互相 9 +介紹 9 +來到 9 +健康 9 +內閣 9 +全長 9 +公布 9 +其實 9 +再度 9 +出來 9 +出售 9 +分支 9 +到達 9 +加利 9 +動畫 9 +十八 9 +千米 9 +南方 9 +危險 9 +古代 9 +古典 9 +各類 9 +吉尼 9 +國務 9 +團體 9 +地產 9 +地點 9 +執行 9 +士兵 9 +奪得 9 +媒體 9 +字母 9 +孩子 9 +學者 9 +安那 9 +對手 9 +就讀 9 +工人 9 +左右 9 +帶領 9 +引擎 9 +強大 9 +律賓 9 +後期 9 +快速 9 +恢復 9 +意外 9 +戰略 9 +打擊 9 +批評 9 +拍攝 9 +接觸 9 +攻入 9 +放棄 9 +政權 9 +教學 9 +斯基 9 +易斯 9 +星期 9 +普通 9 +朋友 9 +未能 9 +本人 9 +本身 9 +核心 9 +森林 9 +標誌 9 +機會 9 +機車 9 +權利 9 +此時 9 +民間 9 +沿海 9 +浙江 9 +湖泊 9 +滿洲 9 +爆炸 9 +父母 9 +爾德 9 +特大 9 +狀況 9 +瑞士 9 +當局 9 +發布 9 +百萬 9 +皇后 9 +皇家 9 +相互 9 +相似 9 +破壞 9 +穩定 9 +空中 9 +第五 9 +米亞 9 +粒子 9 +約翰 9 +絕對 9 +經歷 9 +經理 9 +綜合 9 +總督 9 +老師 9 +而是 9 +聯繫 9 +職務 9 +自行 9 +菲律 9 +處理 9 +觀眾 9 +解放 9 +貢獻 9 +資格 9 +進士 9 +運作 9 +那麼 9 +酒店 9 +金屬 9 +階段 9 +隧道 9 +隨後 9 +集中 9 +電話 9 +青年 9 +頻道 9 +顏色 9 +高等 9 +Go 8 +Me 8 +ge 8 +ld 8 +ru 8 +ul 8 +上升 8 +下來 8 +中環 8 +主題 8 +也納 8 +二世 8 +亞洲 8 +人工 8 +以外 8 +佔地 8 +依據 8 +俄國 8 +保守 8 +信息 8 +價格 8 +光棍 8 +內地 8 +內戰 8 +公分 8 +分鐘 8 +利益 8 +劇情 8 +加哥 8 +加拿 8 +十二 8 +即使 8 +原來 8 +古老 8 +同樣 8 +命令 8 +喜歡 8 +因素 8 +圖案 8 +地鐵 8 +報道 8 +增長 8 +大多 8 +大小 8 +大道 8 +家人 8 +專門 8 +小型 8 +小時 8 +局長 8 +山脈 8 +山西 8 +工藝 8 +工資 8 +巨大 8 +巴斯 8 +幻想 8 +廣播 8 +廣東 8 +往往 8 +從此 8 +意義 8 +意見 8 +或是 8 +房屋 8 +拿大 8 +提升 8 +提高 8 +攝影 8 +效果 8 +教授 8 +文章 8 +方千 8 +方案 8 +旅遊 8 +明確 8 +書記 8 +書院 8 +材料 8 +武漢 8 +比如 8 +氯化 8 +污染 8 +注意 8 +測試 8 +湯姆 8 +澳大 8 +澳洲 8 +瀋陽 8 +燃料 8 +爵士 8 +現存 
8 +男子 8 +病逝 8 +發明 8 +白色 8 +的話 8 +監督 8 +真正 8 +知名 8 +秘書 8 +程度 8 +穆斯 8 +立方 8 +符號 8 +等等 8 +維也 8 +維爾 8 +編碼 8 +編輯 8 +羽毛 8 +翻譯 8 +考慮 8 +聚集 8 +股份 8 +臨時 8 +良好 8 +芝加 8 +表達 8 +複雜 8 +襲擊 8 +西南 8 +西方 8 +解釋 8 +討論 8 +賽季 8 +贏得 8 +軟體 8 +過去 8 +部長 8 +里亞 8 +重大 8 +銀河 8 +長度 8 +隨即 8 +雄性 8 +餐廳 8 +首府 8 +高中 8 +麥克 8 +.c 7 +9餘 7 +Ba 7 +Ch 7 +Da 7 +H9 7 +Ha 7 +Ja 7 +La 7 +Mi 7 +Pa 7 +Ro 7 +Sa 7 +ap 7 +gi 7 +ie 7 +mi 7 +nc 7 +oh 7 +pa 7 +rk 7 +sc 7 +tu 7 +um 7 +下午 7 +不可 7 +主人 7 +之下 7 +事實 7 +事情 7 +二戰 7 +交換 7 +任命 7 +伊麗 7 +伯特 7 +住宅 7 +佛教 7 +保險 7 +傳說 7 +入侵 7 +公共 7 +公務 7 +公爵 7 +共產 7 +典型 7 +分析 7 +前身 7 +創造 7 +匈奴 7 +北角 7 +十三 7 +十字 7 +原著 7 +各地 7 +名稱 7 +名義 7 +吸引 7 +哈爾 7 +員工 7 +哲學 7 +唐朝 7 +在此 7 +城堡 7 +城門 7 +基金 7 +場所 7 +大使 7 +天星 7 +天然 7 +失敗 7 +奴隸 7 +姆斯 7 +學術 7 +安排 7 +實現 7 +實行 7 +專科 7 +尋找 7 +尋求 7 +小組 7 +島嶼 7 +差異 7 +巴西 7 +市區 7 +市民 7 +常常 7 +平原 7 +年級 7 +年輕 7 +建國 7 +弗吉 7 +弗朗 7 +強調 7 +形象 7 +很少 7 +德拉 7 +想像 7 +意識 7 +愛爾 7 +找到 7 +拉伯 7 +持有 7 +指導 7 +探測 7 +支援 7 +收斂 7 +教師 7 +斯坦 7 +斯塔 7 +方英 7 +旗下 7 +最多 7 +本地 7 +某些 7 +校園 7 +核糖 7 +格林 7 +條約 7 +榮譽 7 +樂隊 7 +檢查 7 +母音 7 +氣候 7 +水庫 7 +法蘭 7 +海岸 7 +海洋 7 +混合 7 +清真 7 +港島 7 +湖南 7 +激烈 7 +無綫 7 +然後 7 +熊貓 7 +爾特 7 +爾蘭 7 +特有 7 +現有 7 +現象 7 +球員 7 +球季 7 +理工 7 +瑪麗 7 +甘肅 7 +生態 7 +申請 7 +真實 7 +石油 7 +秘密 7 +移動 7 +空軍 7 +突破 7 +策略 7 +簽訂 7 +結合 7 +維新 7 +美麗 7 +翌年 7 +臺灣 7 +興建 7 +興趣 7 +舉辦 7 +航班 7 +航線 7 +莎白 7 +著作 7 +蒙特 7 +蘭克 7 +衛生 7 +表演 7 +表面 7 +規劃 7 +覺得 7 +觀測 7 +觀點 7 +計算 7 +訪問 7 +設施 7 +評論 7 +調整 7 +講述 7 +議院 7 +貴族 7 +貿易 7 +較小 7 +較為 7 +轟炸 7 +迅速 7 +近年 7 +連接 7 +道德 7 +達成 7 +適合 7 +選出 7 +邏輯 7 +醫學 7 +重點 7 +錄製 7 +鏡頭 7 +長期 7 +長達 7 +降低 7 +需求 7 +面對 7 +韓國 7 +領先 7 +領袖 7 +題材 7 +風暴 7 +食用 7 +駐守 7 +體現 7 +體系 7 +高級 7 +高達 7 +魔法 7 +麗莎 7 +Al 6 +Ar 6 +Br 6 +Cl 6 +JR 6 +Ka 6 +Le 6 +Li 6 +M9 6 +Mo 6 +N9 6 +Ph 6 +Wi 6 +ad 6 +ag 6 +ay 6 +ca 6 +do 6 +ed 6 +ef 6 +ew 6 +hn 6 +ho 6 +hr 6 +ig 6 +mo 6 +ob 6 +of 6 +op 6 +pp 6 +rr 6 +so 6 +ts 6 +ud 6 +一世 6 +丈夫 6 +上市 6 +上映 6 +事物 6 +亞歷 6 +亦是 6 +享受 6 +代理 6 +任務 6 +但丁 6 +作出 6 +來源 6 +來越 6 +依然 6 +依靠 6 +促進 6 +信號 6 +個體 6 +做法 6 +優勢 6 +元朗 6 +克拉 6 +克蘭 6 +入口 6 +全家 6 +公民 6 +公眾 6 +出土 6 +判決 6 +劃分 6 +加工 6 +助理 6 
+努力 6 +動力 6 +十五 6 +協議 6 +卡斯 6 +卡爾 6 +原始 6 +反射 6 +取消 6 +口號 6 +司法 6 +否認 6 +含有 6 +吸收 6 +呼吸 6 +咖啡 6 +商品 6 +商店 6 +嘗試 6 +四川 6 +困難 6 +國歌 6 +基因 6 +壓力 6 +外國 6 +多樣 6 +大大 6 +大獎 6 +大眾 6 +太空 6 +夫人 6 +夫斯 6 +奧爾 6 +奧運 6 +她們 6 +好友 6 +如同 6 +始建 6 +季節 6 +官方 6 +定居 6 +定義 6 +客運 6 +宣佈 6 +家中 6 +密碼 6 +對應 6 +對抗 6 +對象 6 +導演 6 +展覽 6 +山大 6 +島上 6 +師範 6 +平等 6 +平面 6 +廣告 6 +延伸 6 +強度 6 +形容 6 +形態 6 +形狀 6 +影片 6 +彼此 6 +德爾 6 +情感 6 +意味 6 +懷孕 6 +成熟 6 +成績 6 +成長 6 +手法 6 +打算 6 +批准 6 +投票 6 +拉斯 6 +授予 6 +提名 6 +搖滾 6 +搜索 6 +操作 6 +擴展 6 +改編 6 +效力 6 +敘利 6 +教導 6 +斯林 6 +斯科 6 +新城 6 +方向 6 +方形 6 +日報 6 +日耳 6 +時任 6 +時常 6 +普魯 6 +更名 6 +最近 6 +朝廷 6 +東西 6 +查爾 6 +查理 6 +柯林 6 +校區 6 +校長 6 +歌曲 6 +歷山 6 +死後 6 +民眾 6 +氧化 6 +河道 6 +流行 6 +海盜 6 +消費 6 +深入 6 +深圳 6 +滅亡 6 +無論 6 +無關 6 +爾曼 6 +版權 6 +牙齒 6 +王朝 6 +玻璃 6 +生存 6 +男友 6 +畫家 6 +病毒 6 +發出 6 +發起 6 +發達 6 +確認 6 +神奇 6 +神秘 6 +神經 6 +禁止 6 +私人 6 +秦國 6 +立刻 6 +立場 6 +童年 6 +第七 6 +籃球 6 +米蘭 6 +經典 6 +經驗 6 +緬甸 6 +繪畫 6 +缺乏 6 +習俗 6 +翡翠 6 +耳曼 6 +能量 6 +色彩 6 +荷蘭 6 +藉由 6 +蘇格 6 +虛擬 6 +蛋白 6 +血統 6 +行走 6 +表明 6 +製成 6 +西斯 6 +西蘭 6 +覆蓋 6 +規則 6 +設置 6 +試驗 6 +詩人 6 +詩歌 6 +該片 6 +說服 6 +說法 6 +諮詢 6 +證明 6 +豐富 6 +超人 6 +越來 6 +跑道 6 +車展 6 +輿論 6 +近代 6 +返回 6 +退役 6 +通往 6 +通訊 6 +進步 6 +過來 6 +選區 6 +遺傳 6 +邀請 6 +邊緣 6 +酒精 6 +醫生 6 +醫療 6 +金融 6 +銷售 6 +開展 6 +開放 6 +阻止 6 +陷入 6 +隊員 6 +階級 6 +隨機 6 +雕刻 6 +雲南 6 +電池 6 +非洲 6 +顧問 6 +首先 6 +馬克 6 +馬爾 6 +馬達 6 +騎兵 6 +魯士 6 +9A 5 +9百 5 +Bo 5 +Di 5 +ET 5 +El 5 +Fi 5 +In 5 +Jo 5 +Ne 5 +Ni 5 +Re 5 +Ri 5 +Se 5 +Sh 5 +ab 5 +ac 5 +av 5 +bi 5 +ck 5 +ev 5 +fo 5 +gl 5 +im 5 +ke 5 +ki 5 +lu 5 +nk 5 +oc 5 +ph 5 +rc 5 +rg 5 +rl 5 +rm 5 +su 5 +t. 
5 +ue 5 +三十 5 +上述 5 +不僅 5 +不好 5 +中立 5 +中西 5 +中間 5 +丹尼 5 +主流 5 +事故 5 +亞馬 5 +人均 5 +今天 5 +今日 5 +介入 5 +以北 5 +任期 5 +佔據 5 +佛羅 5 +作家 5 +來西 5 +依舊 5 +侵略 5 +保加 5 +保存 5 +信任 5 +信奉 5 +信託 5 +修正 5 +倫比 5 +停止 5 +傑出 5 +傳承 5 +傷害 5 +像是 5 +儀式 5 +免費 5 +公交 5 +公會 5 +其它 5 +其餘 5 +冷卻 5 +出口 5 +分配 5 +分類 5 +列入 5 +別墅 5 +刺激 5 +創建 5 +加熱 5 +加盟 5 +勒密 5 +勒斯 5 +動作 5 +勞工 5 +化合 5 +北海 5 +十六 5 +十多 5 +千億 5 +升級 5 +南北 5 +南極 5 +占庭 5 +參謀 5 +參議 5 +受傷 5 +叫做 5 +司馬 5 +各個 5 +合法 5 +合理 5 +吉爾 5 +同盟 5 +名單 5 +名詞 5 +呈現 5 +周圍 5 +品牌 5 +哈定 5 +啟超 5 +喇嘛 5 +四十 5 +固定 5 +固體 5 +圖像 5 +土耳 5 +在內 5 +地圖 5 +城區 5 +執政 5 +培養 5 +堅持 5 +場地 5 +塞爾 5 +壁畫 5 +外科 5 +多芬 5 +大氣 5 +大西 5 +奧斯 5 +如下 5 +如今 5 +始終 5 +學名 5 +學會 5 +學科 5 +宇宙 5 +安裝 5 +官吏 5 +客戶 5 +客體 5 +宮廷 5 +家長 5 +容納 5 +宿舍 5 +察覺 5 +寫作 5 +專利 5 +專家 5 +對外 5 +對此 5 +少數 5 +尼克 5 +尼士 5 +尼黑 5 +展出 5 +展開 5 +工具 5 +巴克 5 +巴哈 5 +巴爾 5 +市政 5 +希米 5 +席位 5 +年度 5 +底部 5 +廈門 5 +廣西 5 +建成 5 +引發 5 +弟弟 5 +得知 5 +微博 5 +德堡 5 +德里 5 +意志 5 +意思 5 +愛情 5 +感情 5 +感覺 5 +慈善 5 +態度 5 +慕尼 5 +慶祝 5 +成年 5 +成本 5 +成都 5 +戰國 5 +戰後 5 +房間 5 +手中 5 +手段 5 +托勒 5 +托爾 5 +技能 5 +抗議 5 +抵抗 5 +抵達 5 +拉格 5 +拜占 5 +持續 5 +指定 5 +指示 5 +掌握 5 +排名 5 +接管 5 +推進 5 +措施 5 +提到 5 +撤銷 5 +收入 5 +收藏 5 +政務 5 +故宮 5 +教皇 5 +教習 5 +敵人 5 +文忠 5 +文獻 5 +斯堡 5 +斯托 5 +新型 5 +新華 5 +新鮮 5 +方便 5 +方言 5 +施工 5 +旅行 5 +日期 5 +早年 5 +明治 5 +是否 5 +更加 5 +書中 5 +有的 5 +本片 5 +東海 5 +林頓 5 +架構 5 +某種 5 +格拉 5 +格羅 5 +格里 5 +棉花 5 +棒球 5 +構成 5 +樓梯 5 +機制 5 +機器 5 +次年 5 +欣賞 5 +歡迎 5 +正常 5 +正確 5 +武裝 5 +殺害 5 +每天 5 +比利 5 +民兵 5 +氣體 5 +水果 5 +水系 5 +江西 5 +決策 5 +河北 5 +河南 5 +波希 5 +波音 5 +泥塑 5 +泰安 5 +泳兒 5 +洛桑 5 +海峽 5 +海底 5 +海德 5 +消息 5 +游擊 5 +湖北 5 +溫度 5 +溫泉 5 +滅絕 5 +演化 5 +演奏 5 +漢朝 5 +澤東 5 +濃度 5 +物品 5 +物業 5 +物體 5 +特勒 5 +特蘭 5 +狩獵 5 +王子 5 +珊瑚 5 +現場 5 +現實 5 +瑪利 5 +生下 5 +生涯 5 +用作 5 +發送 5 +百餘 5 +直徑 5 +直至 5 +真理 5 +真相 5 +祖先 5 +神聖 5 +移居 5 +程序 5 +種植 5 +種類 5 +稱作 5 +空氣 5 +突變 5 +競賽 5 +符合 5 +第六 5 +簡單 5 +紅軍 5 +紐西 5 +級別 5 +細節 5 +組合 5 +結局 5 +維亞 5 +維吉 5 +維多 5 +編號 5 +練習 5 +總署 5 +羅伯 5 +美洲 5 +群島 5 +群眾 5 +翼龍 5 +耕地 5 +耳其 5 +聯絡 5 +聲明 5 +肯定 5 +興奮 5 +興起 5 +船上 5 +英里 5 +華視 5 +萊姆 5 +落後 5 +薩摩 5 +薩斯 5 +薩爾 5 +藝人 5 +藥物 5 +蘭卡 5 +虎丘 5 +虛構 5 +融合 5 +血壓 5 +行業 5 +裝甲 5 +裝置 5 
+裡面 5 +西遊 5 +西門 5 +解散 5 +設備 5 +診斷 5 +該地 5 +該屬 5 +詹姆 5 +認可 5 +認知 5 +認識 5 +誕生 5 +象徵 5 +貝多 5 +財產 5 +貨車 5 +貨運 5 +質量 5 +赤道 5 +超級 5 +越南 5 +趙國 5 +身亡 5 +軍團 5 +輻鰭 5 +轉移 5 +轉變 5 +辭去 5 +辭職 5 +退出 5 +通車 5 +通道 5 +連任 5 +連續 5 +進而 5 +進軍 5 +適當 5 +遭遇 5 +那裡 5 +郵政 5 +鄉鎮 5 +鄰近 5 +醒亞 5 +醫師 5 +鎮壓 5 +長大 5 +長官 5 +長沙 5 +開設 5 +防禦 5 +陝西 5 +院長 5 +階層 5 +障礙 5 +隸屬 5 +電梯 5 +電車 5 +青海 5 +預測 5 +預算 5 +預防 5 +領土 5 +頻率 5 +食品 5 +飲料 5 +首相 5 +馬尼 5 +馬遜 5 +馬里 5 +騎士 5 +體積 5 +體色 5 +高低 5 +黑人 5 +龐大 5 +9: 4 +9B 4 +:9 4 +AA 4 +AC 4 +Ad 4 +BA 4 +BC 4 +CR 4 +Ce 4 +DS 4 +De 4 +Do 4 +E9 4 +Fo 4 +Fr 4 +Ga 4 +Ho 4 +Lo 4 +NA 4 +NB 4 +Na 4 +OR 4 +Pi 4 +SP 4 +So 4 +Sp 4 +Su 4 +To 4 +Wo 4 +Yo 4 +ak 4 +ba 4 +cr 4 +dd 4 +dl 4 +dr 4 +ei 4 +em 4 +gh 4 +gu 4 +lt 4 +ly 4 +mp 4 +nb 4 +nu 4 +ok 4 +ot 4 +ow 4 +oy 4 +ps 4 +rd 4 +sh 4 +tt 4 +ut 4 +wa 4 +一半 4 +一旦 4 +一百 4 +三世 4 +三氯 4 +上下 4 +上演 4 +上訴 4 +下令 4 +下頜 4 +不及 4 +不得 4 +不應 4 +不等 4 +不足 4 +世凱 4 +中全 4 +中東 4 +中止 4 +中視 4 +丹羽 4 +主演 4 +之上 4 +乘坐 4 +乘客 4 +乾燥 4 +乾隆 4 +了解 4 +事變 4 +五世 4 +五十 4 +五百 4 +亞冠 4 +亞當 4 +亞軍 4 +交給 4 +交配 4 +交響 4 +亦為 4 +享年 4 +人事 4 +代言 4 +以為 4 +以西 4 +任職 4 +企圖 4 +伊拉 4 +伺服 4 +供應 4 +依法 4 +侵蝕 4 +保羅 4 +保障 4 +信徒 4 +修復 4 +倫斯 4 +倫理 4 +做出 4 +停留 4 +億9 4 +優惠 4 +優秀 4 +兄弟 4 +充電 4 +先進 4 +克森 4 +克特 4 +克薩 4 +入選 4 +內斯 4 +全會 4 +全面 4 +公安 4 +公式 4 +共振 4 +其間 4 +具體 4 +冬天 4 +出入 4 +出場 4 +出戰 4 +出發 4 +出租 4 +出色 4 +分佈 4 +分成 4 +分期 4 +列表 4 +利普 4 +利特 4 +則天 4 +則為 4 +前期 4 +前線 4 +前進 4 +劇集 4 +加州 4 +加強 4 +勝利 4 +包裝 4 +匈牙 4 +區劃 4 +十七 4 +協定 4 +協調 4 +南側 4 +南延 4 +南海 4 +卡拉 4 +卡洛 4 +卡羅 4 +危機 4 +原有 4 +原理 4 +參考 4 +參賽 4 +古物 4 +只要 4 +各國 4 +各式 4 +各界 4 +合成 4 +合眾 4 +合金 4 +吉他 4 +同事 4 +同治 4 +名將 4 +吾爾 4 +周年 4 +命運 4 +哈伊 4 +哈里 4 +哥倫 4 +商人 4 +啟用 4 +單車 4 +嘲諷 4 +回來 4 +國防 4 +圓形 4 +地底 4 +地形 4 +地面 4 +埃米 4 +基辛 4 +堅決 4 +塔夫 4 +塞維 4 +士頓 4 +外星 4 +多利 4 +大同 4 +大帝 4 +大臣 4 +天皇 4 +夫婦 4 +失望 4 +奧林 4 +妹妹 4 +姐姐 4 +姓氏 4 +委任 4 +威斯 4 +婚姻 4 +婦女 4 +媽媽 4 +學堂 4 +學府 4 +安德 4 +官員 4 +定律 4 +宣稱 4 +實業 4 +實體 4 +寶貝 4 +小學 4 +少女 4 +尼斯 4 +尼西 4 +尼迪 4 +尼龍 4 +局部 4 +展示 4 +屯門 4 +山區 4 +山頂 4 +島式 4 +島津 4 +巡迴 4 +巴塞 4 +巴格 4 +巴納 4 +布拉 4 +布里 4 +布魯 4 +帶給 4 +常春 4 +常用 4 +幅度 4 +年齡 4 
+幽默 4 +度假 4 +廚房 4 +廢除 4 +弗里 4 +影像 4 +影業 4 +很快 4 +很難 4 +後者 4 +得分 4 +得名 4 +得寵 4 +循環 4 +徵召 4 +德克 4 +志願 4 +怎麼 4 +性別 4 +性質 4 +恐龍 4 +患者 4 +情報 4 +情形 4 +情節 4 +情緒 4 +應該 4 +戀愛 4 +成人 4 +成分 4 +戰敗 4 +戰死 4 +扮演 4 +批判 4 +技巧 4 +抒情 4 +拉多 4 +拉夫 4 +拓展 4 +招募 4 +指數 4 +按照 4 +挪威 4 +排列 4 +排水 4 +排行 4 +接收 4 +接替 4 +推薦 4 +推行 4 +揚州 4 +擔心 4 +擴大 4 +擴建 4 +擴張 4 +收到 4 +收錄 4 +改造 4 +攻克 4 +教區 4 +教宗 4 +教練 4 +整理 4 +數十 4 +數千 4 +數字 4 +文泰 4 +斯加 4 +斯大 4 +斯洛 4 +斯維 4 +斯里 4 +新疆 4 +新竹 4 +旅客 4 +日常 4 +昆明 4 +明基 4 +星球 4 +星等 4 +春秋 4 +時段 4 +晉國 4 +晚間 4 +暗示 4 +暴力 4 +更換 4 +曼德 4 +最低 4 +最好 4 +最長 4 +有用 4 +服裝 4 +望遠 4 +木材 4 +本作 4 +本土 4 +本科 4 +本線 4 +本魚 4 +杉磯 4 +杜蘭 4 +東區 4 +林匹 4 +校舍 4 +案件 4 +楚國 4 +樂器 4 +標本 4 +樞紐 4 +模仿 4 +橄欖 4 +檢測 4 +正月 4 +此前 4 +此次 4 +步兵 4 +武術 4 +歷任 4 +死神 4 +殺死 4 +每秒 4 +比例 4 +毫克 4 +毫米 4 +水深 4 +永江 4 +污泥 4 +沉澱 4 +沙灘 4 +河川 4 +油價 4 +治亞 4 +法案 4 +法規 4 +波斯 4 +注入 4 +洛斯 4 +洛杉 4 +洛陽 4 +流感 4 +流經 4 +海域 4 +海外 4 +海戰 4 +海水 4 +海灣 4 +海面 4 +液態 4 +液體 4 +測量 4 +港鐵 4 +滿貫 4 +潮濕 4 +濟南 4 +灣仔 4 +火災 4 +炸藥 4 +烏克 4 +無意 4 +無線 4 +無錫 4 +照片 4 +營業 4 +爾尼 4 +爾濱 4 +牙利 4 +牛奶 4 +牧場 4 +特遣 4 +特點 4 +犯罪 4 +狀元 4 +狙擊 4 +獎勵 4 +王后 4 +珍珠 4 +珠爾 4 +現今 4 +現金 4 +球會 4 +理事 4 +理想 4 +琉球 4 +瓦爾 4 +瓷器 4 +甘珠 4 +生化 4 +生意 4 +產地 4 +產量 4 +當天 4 +當年 4 +當日 4 +當然 4 +疫苗 4 +癌症 4 +發育 4 +發言 4 +發酵 4 +皮膚 4 +監獄 4 +直升 4 +直隸 4 +相比 4 +相近 4 +省份 4 +省委 4 +省級 4 +督察 4 +矩陣 4 +短暫 4 +短篇 4 +研發 4 +社團 4 +祖父 4 +神廟 4 +神達 4 +票房 4 +福斯 4 +科爾 4 +租界 4 +種族 4 +稱號 4 +空調 4 +突然 4 +立即 4 +競爭 4 +等級 4 +節日 4 +簽約 4 +米爾 4 +米特 4 +精確 4 +紋理 4 +納德 4 +納粹 4 +索馬 4 +終止 4 +終結 4 +維吾 4 +維埃 4 +網球 4 +緊密 4 +總量 4 +總長 4 +繼任 4 +罕見 4 +罪名 4 +羅倫 4 +羅爾 4 +義務 4 +習慣 4 +老闆 4 +考察 4 +考試 4 +聖母 4 +聲稱 4 +聲譽 4 +背景 4 +胡佛 4 +自動 4 +船隻 4 +艱難 4 +苦艾 4 +英尺 4 +草本 4 +莊園 4 +莫斯 4 +華航 4 +華龍 4 +萊茵 4 +落成 4 +著重 4 +蒙扎 4 +蓉蓉 4 +薩克 4 +蘇家 4 +蘋果 4 +蘭戈 4 +蜘蛛 4 +血栓 4 +行省 4 +術語 4 +衛星 4 +西側 4 +西曼 4 +西洋 4 +西納 4 +親王 4 +評價 4 +評定 4 +詞語 4 +該劇 4 +該區 4 +課程 4 +談話 4 +請求 4 +論文 4 +識字 4 +警署 4 +議長 4 +讀者 4 +財富 4 +財政 4 +貨幣 4 +貨物 4 +費爾 4 +資本 4 +資深 4 +資金 4 +購買 4 +贊助 4 +起義 4 +身上 4 +身分 4 +躲避 4 +車序 4 +軍人 4 +軍力 4 +軍官 4 +軍閥 4 +較多 4 +較少 4 +較高 4 +輸入 4 +輻射 4 +轄下 4 +轉換 4 +辛格 4 +辦事 4 +辦法 4 +辦理 4 
+農民 4 +迪絲 4 +逃往 4 +這麼 4 +週期 4 +進口 4 +進球 4 +進程 4 +遊行 4 +過度 4 +過枝 4 +達爾 4 +遷移 4 +遼寧 4 +邊境 4 +部落 4 +郵票 4 +里奧 4 +里蘭 4 +重視 4 +野生 4 +量子 4 +金字 4 +針對 4 +銅鑼 4 +鋼琴 4 +錯誤 4 +鏡片 4 +鏡面 4 +長子 4 +長江 4 +門診 4 +開幕 4 +開闢 4 +關心 4 +防守 4 +阿格 4 +院校 4 +陽光 4 +隊伍 4 +隔離 4 +雕塑 4 +雨水 4 +電力 4 +電台 4 +電磁 4 +電訊 4 +靜態 4 +靜脈 4 +非法 4 +靠近 4 +順位 4 +預期 4 +願意 4 +風險 4 +颱風 4 +飼養 4 +餘下 4 +首領 4 +馬丁 4 +馬利 4 +體內 4 +體長 4 +高架 4 +高溫 4 +鬥爭 4 +鰭魚 4 +鳥類 4 +黃埔 4 +黑色 4 +黨籍 4 +鼓勵 4 +龍鳥 4 +/9 3 +9/ 3 +9N 3 +AS 3 +Ae 3 +An 3 +Av 3 +CM 3 +DC 3 +DP 3 +FC 3 +GD 3 +Ge 3 +HI 3 +He 3 +Je 3 +Ki 3 +Mu 3 +NE 3 +No 3 +PA 3 +PS 3 +Pe 3 +Pl 3 +Po 3 +Pr 3 +Pu 3 +Qu 3 +RH 3 +SM 3 +ST 3 +TV 3 +Te 3 +Un 3 +Vi 3 +We 3 +ah 3 +az 3 +cu 3 +da 3 +ds 3 +e- 3 +eo 3 +ga 3 +go 3 +hl 3 +hu 3 +iP 3 +ka 3 +km 3 +lp 3 +ls 3 +ov 3 +ox 3 +po 3 +py 3 +sb 3 +ua 3 +ub 3 +uc 3 +uk 3 +up 3 +va 3 +we 3 +wi 3 +xx 3 +yc 3 +°C 3 +一中 3 +一八 3 +一同 3 +一時 3 +一郎 3 +三江 3 +三百 3 +上游 3 +上環 3 +上萬 3 +上表 3 +上課 3 +上面 3 +下列 3 +下台 3 +下場 3 +下級 3 +下車 3 +不但 3 +不出 3 +不想 3 +不敵 3 +不明 3 +不遠 3 +不韋 3 +世宗 3 +丟失 3 +中古 3 +中子 3 +中將 3 +中期 3 +中轉 3 +中風 3 +丹佛 3 +主任 3 +主動 3 +主唱 3 +主機 3 +主編 3 +主辦 3 +主體 3 +乃爾 3 +之一 3 +之內 3 +之時 3 +乙烯 3 +九一 3 +也好 3 +二百 3 +二胺 3 +互聯 3 +五角 3 +亞利 3 +亞目 3 +亞視 3 +交往 3 +亥俄 3 +京都 3 +亮度 3 +人性 3 +人次 3 +人生 3 +人身 3 +他人 3 +付出 3 +仙女 3 +以往 3 +以致 3 +任內 3 +任教 3 +份子 3 +企鵝 3 +伊恩 3 +伊賀 3 +伍德 3 +休息 3 +估計 3 +伸出 3 +伽利 3 +低槓 3 +住戶 3 +住院 3 +佔有 3 +佛像 3 +佛學 3 +佛瑞 3 +作霖 3 +來亞 3 +來自 3 +例子 3 +供奉 3 +供給 3 +依賴 3 +俄亥 3 +俘虜 3 +保安 3 +保育 3 +保證 3 +信德 3 +修建 3 +修道 3 +個別 3 +個性 3 +倖存 3 +候選 3 +借用 3 +倫薩 3 +值得 3 +偉大 3 +停車 3 +備受 3 +傳奇 3 +傳教 3 +傳染 3 +傾向 3 +優異 3 +允許 3 +元洪 3 +光源 3 +光緒 3 +克羅 3 +兒女 3 +兒法 3 +內哥 3 +內瓦 3 +內陸 3 +全市 3 +全縣 3 +全體 3 +八百 3 +公國 3 +公尺 3 +公轉 3 +六十 3 +共計 3 +兵力 3 +兼任 3 +冊封 3 +凱撒 3 +出使 3 +出獄 3 +函數 3 +分之 3 +分散 3 +分行 3 +分裂 3 +分解 3 +切斷 3 +刊物 3 +列為 3 +利堅 3 +利斯 3 +利桑 3 +利略 3 +利福 3 +制定 3 +前後 3 +前鋒 3 +前面 3 +剛好 3 +創意 3 +劇團 3 +劇本 3 +劇目 3 +劇院 3 +劍橋 3 +力學 3 +加爾 3 +勒格 3 +勞倫 3 +化工 3 +北冕 3 +北洋 3 +區別 3 +十四 3 +十萬 3 +十餘 3 +千上 3 +千萬 3 +升任 3 +升格 3 +卑斯 3 +南昌 3 +博恩 3 +印象 3 +即將 3 +卻是 3 +厘米 3 +原則 3 +原告 3 +原料 3 +友誼 3 
+受損 3 +叛亂 3 +口徑 3 +古城 3 +可惜 3 +台中 3 +史上 3 +史密 3 +各州 3 +各省 3 +吉布 3 +同名 3 +同性 3 +同情 3 +名利 3 +名譽 3 +告訴 3 +周邊 3 +呼聲 3 +和也 3 +和約 3 +品種 3 +哥哥 3 +哥特 3 +哥羅 3 +哺乳 3 +喜愛 3 +喬伊 3 +單一 3 +嘉賓 3 +器官 3 +噴泉 3 +嚴格 3 +四世 3 +回應 3 +回歸 3 +國泰 3 +國籍 3 +國軍 3 +圍繞 3 +園區 3 +土壤 3 +在任 3 +在場 3 +地中 3 +地勢 3 +地獄 3 +地理 3 +地阿 3 +報導 3 +場合 3 +塑造 3 +塞普 3 +塞隆 3 +填充 3 +填海 3 +填補 3 +境地 3 +墜毀 3 +士官 3 +壯觀 3 +夏天 3 +夕法 3 +外來 3 +外界 3 +外部 3 +多倫 3 +多瓦 3 +多達 3 +大夫 3 +大家 3 +大師 3 +大拿 3 +大橋 3 +大權 3 +大致 3 +大街 3 +大賽 3 +大選 3 +天國 3 +天王 3 +天空 3 +太小 3 +太郎 3 +失業 3 +奇異 3 +奈米 3 +契約 3 +奪取 3 +女巫 3 +女王 3 +女神 3 +好評 3 +如何 3 +妨礙 3 +委派 3 +委託 3 +威力 3 +威尼 3 +威格 3 +娃娃 3 +嫁給 3 +嫌疑 3 +嬌嬌 3 +子女 3 +孟席 3 +孫子 3 +安徽 3 +安東 3 +安納 3 +安置 3 +宋朝 3 +完備 3 +完善 3 +完工 3 +完美 3 +官僚 3 +宣傳 3 +宣告 3 +室內 3 +宰相 3 +家寶 3 +家裡 3 +家鄉 3 +密斯 3 +富有 3 +富江 3 +寒冷 3 +實在 3 +實施 3 +寶石 3 +封閉 3 +射入 3 +射擊 3 +專用 3 +對方 3 +對比 3 +導航 3 +小吃 3 +小堂 3 +小孩 3 +小平 3 +就算 3 +尼山 3 +尼爾 3 +尾貓 3 +局面 3 +屋大 3 +屋邨 3 +州長 3 +已婚 3 +已知 3 +布庫 3 +布斯 3 +布爾 3 +布羅 3 +布袋 3 +布賴 3 +希爾 3 +師傅 3 +席斯 3 +帶到 3 +帶走 3 +帶頭 3 +帽子 3 +平台 3 +平民 3 +平衡 3 +年間 3 +幸福 3 +幹線 3 +幾何 3 +序列 3 +底層 3 +度過 3 +庫斯 3 +康乃 3 +延續 3 +延長 3 +建業 3 +弓毛 3 +引入 3 +引力 3 +引用 3 +弟子 3 +弱小 3 +強制 3 +強壯 3 +彈簧 3 +彰化 3 +影視 3 +往來 3 +往後 3 +征服 3 +待遇 3 +很好 3 +很高 3 +後人 3 +後方 3 +後衛 3 +後面 3 +徒步 3 +復工 3 +復辟 3 +徵收 3 +德勒 3 +德川 3 +德意 3 +德烈 3 +德綱 3 +徹底 3 +心情 3 +必然 3 +必要 3 +忽略 3 +思潮 3 +怡和 3 +急速 3 +性格 3 +怪物 3 +怪獸 3 +恩來 3 +悠久 3 +情書 3 +想到 3 +想法 3 +愛上 3 +愛國 3 +愛達 3 +感應 3 +慢慢 3 +憑藉 3 +憤怒 3 +懷疑 3 +懸崖 3 +成份 3 +成千 3 +成果 3 +戒毒 3 +截止 3 +戰俘 3 +戰時 3 +戰艦 3 +戰術 3 +戲劇 3 +房地 3 +房子 3 +手下 3 +扎維 3 +托克 3 +扭曲 3 +扶手 3 +承受 3 +承擔 3 +投手 3 +抗戰 3 +抵擋 3 +拆除 3 +拉丁 3 +拯救 3 +指令 3 +挑戰 3 +捐助 3 +捕捉 3 +捷克 3 +排出 3 +探討 3 +接任 3 +接唱 3 +提議 3 +換乘 3 +損失 3 +損害 3 +搬到 3 +摩爾 3 +撞擊 3 +播映 3 +撰寫 3 +擔當 3 +據說 3 +擴充 3 +支付 3 +支撐 3 +支流 3 +收回 3 +收拾 3 +收購 3 +改制 3 +改善 3 +改稱 3 +改進 3 +攻打 3 +放射 3 +故障 3 +敘述 3 +教科 3 +教養 3 +文人 3 +文件 3 +料理 3 +斯克 3 +斯德 3 +斯拉 3 +斯曼 3 +斯氏 3 +新增 3 +新建 3 +新教 3 +新村 3 +新羅 3 +方呎 3 +旁遮 3 +族群 3 +日內 3 +日後 3 +日間 3 +明星 3 +明珠 3 +明納 3 +昏迷 3 +星光 3 +星際 3 +星雲 3 +映射 3 +春日 3 +春藤 3 +昭和 3 +時機 3 +時空 3 +晚年 3 +晨興 3 +普勒 3 +普斯 3 
+普選 3 +景德 3 +景點 3 +晶體 3 +暫時 3 +暴動 3 +更為 3 +書店 3 +曼聯 3 +替代 3 +替換 3 +最佳 3 +最為 3 +會堂 3 +月氏 3 +月球 3 +有利 3 +有機 3 +有權 3 +有趣 3 +服役 3 +服用 3 +朗索 3 +朝日 3 +期望 3 +木板 3 +本來 3 +村民 3 +東側 3 +東尼 3 +東港 3 +東面 3 +松鼠 3 +板塊 3 +柏立 3 +某個 3 +柯爾 3 +栃木 3 +校名 3 +核電 3 +根廷 3 +格式 3 +格納 3 +栽培 3 +栽種 3 +桃園 3 +桃浦 3 +梅妃 3 +梅莉 3 +條例 3 +極度 3 +極端 3 +概念 3 +概率 3 +榮聲 3 +槍手 3 +樂曲 3 +樂章 3 +模擬 3 +機率 3 +檢察 3 +檸檬 3 +權力 3 +權勢 3 +權益 3 +次子 3 +次日 3 +次郎 3 +歌劇 3 +正義 3 +正選 3 +步槍 3 +步道 3 +死傷 3 +死去 3 +毀滅 3 +比起 3 +民進 3 +氣壓 3 +氣泡 3 +氧氣 3 +水上 3 +水域 3 +水塔 3 +水族 3 +水溝 3 +水稻 3 +永遠 3 +求救 3 +江南 3 +江孜 3 +污水 3 +決議 3 +沒收 3 +沙烏 3 +油脂 3 +沼澤 3 +沿著 3 +法人 3 +法學 3 +法官 3 +法尼 3 +波動 3 +波塞 3 +波士 3 +波特 3 +波長 3 +泰國 3 +洋房 3 +洋行 3 +洗浴 3 +洛克 3 +洛夫 3 +洛維 3 +活佛 3 +活力 3 +流動 3 +流量 3 +海上 3 +海珊 3 +消滅 3 +淋巴 3 +淘汰 3 +淡水 3 +清代 3 +港口 3 +湖水 3 +湯瑪 3 +準則 3 +溥儀 3 +溫帶 3 +溶解 3 +滑冰 3 +漂亮 3 +漢城 3 +漳州 3 +潛入 3 +澤西 3 +火箭 3 +災難 3 +為期 3 +烏地 3 +無數 3 +煙草 3 +照相 3 +煩惱 3 +熱庫 3 +熱能 3 +爬行 3 +爾卑 3 +爾士 3 +爾多 3 +爾夫 3 +爾扎 3 +爾維 3 +爾遜 3 +爾馬 3 +牆壁 3 +牛津 3 +物資 3 +特內 3 +特化 3 +特斯 3 +特里 3 +狹窄 3 +獎學 3 +獎項 3 +獲利 3 +獲取 3 +獵食 3 +獻給 3 +率領 3 +王室 3 +珠海 3 +班納 3 +現任 3 +球場 3 +理解 3 +瑞草 3 +瑪斯 3 +瑪莉 3 +生前 3 +生成 3 +生殖 3 +產下 3 +用品 3 +用戶 3 +用法 3 +用途 3 +男女 3 +男孩 3 +畫作 3 +異常 3 +當今 3 +當作 3 +當初 3 +疑問 3 +病故 3 +瘋狂 3 +登上 3 +登基 3 +登堡 3 +登場 3 +登陸 3 +發揮 3 +發源 3 +白人 3 +白金 3 +百科 3 +皇室 3 +盟友 3 +盟旗 3 +監管 3 +目錄 3 +直系 3 +直線 3 +直選 3 +相反 3 +相機 3 +相遇 3 +真人 3 +真武 3 +眼睛 3 +睡蓮 3 +睦斯 3 +瞭解 3 +矚目 3 +知情 3 +短尾 3 +短短 3 +破曉 3 +破產 3 +硬體 3 +碎片 3 +碳化 3 +確保 3 +確立 3 +社群 3 +神父 3 +票價 3 +福尼 3 +福島 3 +禮儀 3 +禮拜 3 +科系 3 +科隆 3 +租借 3 +租賃 3 +種種 3 +積極 3 +窯瓷 3 +立基 3 +立陶 3 +童話 3 +競選 3 +竹子 3 +第八 3 +管道 3 +節慶 3 +節省 3 +米利 3 +精通 3 +精選 3 +糧食 3 +約定 3 +約瑟 3 +紅磡 3 +紅色 3 +納努 3 +納斯 3 +納蒂 3 +紛爭 3 +素貞 3 +紡織 3 +細小 3 +終點 3 +組建 3 +組裝 3 +結成 3 +維京 3 +維基 3 +維耶 3 +編劇 3 +總共 3 +總數 3 +總結 3 +總體 3 +繪製 3 +繼位 3 +纖維 3 +缺席 3 +缺點 3 +置富 3 +羅萊 3 +羊肉 3 +美利 3 +義勇 3 +老鼠 3 +考古 3 +考驗 3 +而非 3 +耶穌 3 +聖地 3 +聖誕 3 +聖靈 3 +聘請 3 +聚會 3 +聯手 3 +聯軍 3 +聲勢 3 +職位 3 +股價 3 +股票 3 +胡安 3 +自傳 3 +自我 3 +自稱 3 +自身 3 +自願 3 +至少 3 +致力 3 +致命 3 +臺南 3 +臼齒 3 +舞蹈 3 +航海 3 +航程 3 +船員 3 +艾女 3 +艾滋 3 +芭比 3 +英九 3 +茶葉 3 
+草食 3 +莉迪 3 +莫爾 3 +華格 3 +華麗 3 +菲利 3 +菲爾 3 +菲特 3 +萊特 3 +萊納 3 +著稱 3 +蒂亞 3 +蒂克 3 +蒙大 3 +蒸汽 3 +蓬勃 3 +薩魯 3 +藍色 3 +蘇州 3 +蘇爾 3 +蘇維 3 +蘇里 3 +虎鯨 3 +衍生 3 +衙門 3 +衛視 3 +衝擊 3 +衣服 3 +裁判 3 +補給 3 +裝飾 3 +複合 3 +複製 3 +西安 3 +西湖 3 +西關 3 +西面 3 +見到 3 +規格 3 +視頻 3 +親自 3 +計畫 3 +記憶 3 +評估 3 +評審 3 +該寺 3 +該書 3 +該校 3 +該站 3 +該鎮 3 +誠實 3 +誤認 3 +說明 3 +課室 3 +諷刺 3 +諸多 3 +謀殺 3 +證實 3 +識別 3 +護照 3 +讀書 3 +變形 3 +變數 3 +變體 3 +讚賞 3 +貝克 3 +貴妃 3 +貴州 3 +買家 3 +費德 3 +費雪 3 +資助 3 +賓夕 3 +賦予 3 +走廊 3 +起訴 3 +足協 3 +路徑 3 +身邊 3 +車體 3 +較大 3 +較長 3 +輔助 3 +輔導 3 +輔政 3 +輸出 3 +轄區 3 +轉乘 3 +轉到 3 +轉投 3 +轉讓 3 +近期 3 +迪斯 3 +迫使 3 +追逐 3 +退休 3 +逃離 3 +逐步 3 +通信 3 +通用 3 +通行 3 +速食 3 +連環 3 +連線 3 +逮捕 3 +進出 3 +遇見 3 +遊樂 3 +運營 3 +過渡 3 +道光 3 +達也 3 +達姆 3 +達荷 3 +違法 3 +遠航 3 +適應 3 +遮普 3 +遷徙 3 +選手 3 +選秀 3 +遺體 3 +邊界 3 +那裏 3 +邦聯 3 +部件 3 +都市 3 +鄉議 3 +配樂 3 +配置 3 +酗酒 3 +釀酒 3 +釋放 3 +里米 3 +重傷 3 +重整 3 +金山 3 +金庫 3 +金庸 3 +金鐘 3 +錄音 3 +鐵人 3 +長安 3 +長州 3 +長相 3 +長遠 3 +開播 3 +關節 3 +阿兒 3 +阿根 3 +附屬 3 +降落 3 +陣營 3 +除外 3 +陶宛 3 +陸地 3 +陸續 3 +隱藏 3 +隱語 3 +雅典 3 +雌雄 3 +雙立 3 +雜技 3 +難度 3 +雪梨 3 +雪莉 3 +零售 3 +雷克 3 +雷爾 3 +雷睦 3 +霍普 3 +靈頓 3 +青島 3 +鞏固 3 +音頻 3 +頂層 3 +順利 3 +預先 3 +頭銜 3 +頻譜 3 +題寫 3 +飲食 3 +飾演 3 +首任 3 +首演 3 +馬德 3 +驅動 3 +驅逐 3 +體操 3 +高原 3 +高層 3 +高山 3 +高麗 3 +魯曼 3 +鳳山 3 +鹿兒 3 +麥爾 3 +麥田 3 +黃金 3 +黑子 3 +黑斑 3 +黑洞 3 +黛比 3 +黨員 3 +龍馬 3 +$9 2 +%- 2 +(x 2 +.s 2 +/h 2 +9C 2 +9D 2 +9° 2 +=9 2 +AB 2 +AE 2 +AI 2 +Ab 2 +Ai 2 +Au 2 +BB 2 +BE 2 +BT 2 +C- 2 +CA 2 +CE 2 +CH 2 +CI 2 +CP 2 +DJ 2 +DN 2 +EC 2 +En 2 +Ep 2 +Eu 2 +Ev 2 +F- 2 +Fa 2 +GB 2 +GC 2 +Gi 2 +Hi 2 +Hu 2 +IG 2 +IS 2 +IV 2 +Is 2 +Ju 2 +Ke 2 +Ku 2 +MG 2 +MO 2 +My 2 +ND 2 +OC 2 +Om 2 +Or 2 +PL 2 +PV 2 +RB 2 +RI 2 +RO 2 +Ru 2 +SA 2 +SB 2 +Sc 2 +TF 2 +TO 2 +Ti 2 +Tr 2 +Tu 2 +Tw 2 +U9 2 +UA 2 +US 2 +VB 2 +VC 2 +VV 2 +Va 2 +Wa 2 +Wy 2 +XI 2 +ae 2 +af 2 +ao 2 +bl 2 +bo 2 +br 2 +bu 2 +by 2 +ci 2 +cl 2 +dg 2 +dw 2 +eg 2 +ek 2 +ep 2 +eu 2 +fa 2 +ff 2 +gy 2 +hm 2 +hy 2 +ib 2 +if 2 +ii 2 +iu 2 +kh 2 +ks 2 +lk 2 +lm 2 +lv 2 +m/ 2 +nm 2 +nr 2 +o- 2 +oi 2 +pu 2 +rb 2 +rh 2 +rv 2 +rw 2 +ry 2 +sm 2 +sp 2 +sy 2 +tl 2 +tz 2 +ug 2 +ui 2 +wh 2 +x) 2 +ya 2 
+ye 2 +ys 2 +yt 2 +zi 2 +一共 2 +一千 2 +一向 2 +一度 2 +一手 2 +一提 2 +一貫 2 +一面 2 +丁尼 2 +七十 2 +七喜 2 +三中 2 +三井 2 +三棟 2 +三藏 2 +上任 2 +上佳 2 +上加 2 +上午 2 +上吊 2 +上將 2 +上層 2 +上方 2 +上校 2 +上街 2 +下去 2 +下層 2 +下屬 2 +下旬 2 +下水 2 +下海 2 +下游 2 +下野 2 +不一 2 +不停 2 +不再 2 +不列 2 +不受 2 +不夠 2 +不如 2 +不宜 2 +不已 2 +不幸 2 +不法 2 +不清 2 +不用 2 +不管 2 +不聊 2 +不良 2 +不論 2 +不變 2 +不錯 2 +不需 2 +不願 2 +丐幫 2 +世俗 2 +世博 2 +世卿 2 +世家 2 +世民 2 +世襲 2 +世錦 2 +世音 2 +丘陵 2 +中區 2 +中午 2 +中南 2 +中天 2 +中巴 2 +中正 2 +中途 2 +中道 2 +中遠 2 +主上 2 +主力 2 +主因 2 +主場 2 +主權 2 +主管 2 +主線 2 +乘船 2 +乘車 2 +乙級 2 +九州 2 +九巴 2 +也有 2 +也許 2 +乳酪 2 +事後 2 +二甘 2 +二郎 2 +互動 2 +五四 2 +五峰 2 +亞伯 2 +亞得 2 +亞德 2 +亞特 2 +亞韋 2 +交互 2 +交到 2 +交匯 2 +交好 2 +交情 2 +交戰 2 +交手 2 +交趾 2 +人力 2 +人心 2 +人才 2 +人文 2 +人格 2 +人熙 2 +人群 2 +人間 2 +人魚 2 +仁慈 2 +仁記 2 +今年 2 +介乎 2 +介石 2 +仍舊 2 +付款 2 +仙劍 2 +以南 2 +以東 2 +以至 2 +任城 2 +任天 2 +任意 2 +份額 2 +伊利 2 +伊比 2 +伊瓦 2 +伏威 2 +休閒 2 +伯公 2 +伯恩 2 +伯爵 2 +伯納 2 +伯靈 2 +伴隨 2 +似乎 2 +低地 2 +低廉 2 +低溫 2 +住房 2 +佐土 2 +佐夫 2 +佐藤 2 +佛山 2 +佛朗 2 +佛殿 2 +作好 2 +作業 2 +作物 2 +佩劍 2 +併入 2 +使命 2 +使者 2 +使館 2 +來訪 2 +例外 2 +供暖 2 +供熱 2 +供職 2 +依託 2 +侵入 2 +侵犯 2 +便宜 2 +促使 2 +促成 2 +俄明 2 +俗成 2 +俗稱 2 +保有 2 +保機 2 +保級 2 +保衛 2 +信心 2 +信義 2 +信長 2 +信雄 2 +修士 2 +修理 2 +修羅 2 +修習 2 +修訂 2 +修鍊 2 +個案 2 +倒台 2 +倒掛 2 +候鳥 2 +倡導 2 +倫多 2 +假如 2 +假期 2 +假髮 2 +偏差 2 +停戰 2 +停滯 2 +偶然 2 +偶爾 2 +偽造 2 +傑作 2 +催化 2 +傳入 2 +傳到 2 +傳動 2 +傳媒 2 +傳導 2 +傳授 2 +傳聞 2 +傳言 2 +傳送 2 +傳達 2 +債務 2 +傾聽 2 +僅僅 2 +僱員 2 +儀錶 2 +儒家 2 +優先 2 +儲備 2 +元代 2 +元件 2 +元帥 2 +元年 2 +元洲 2 +元璋 2 +元甲 2 +元首 2 +充斥 2 +充當 2 +兆帕 2 +先知 2 +先行 2 +先驅 2 +光線 2 +光譜 2 +光軸 2 +克基 2 +克塞 2 +克尼 2 +克思 2 +克托 2 +克林 2 +克果 2 +克洛 2 +克爾 2 +克用 2 +克西 2 +克隆 2 +克魯 2 +免職 2 +入伍 2 +入圍 2 +入學 2 +入獄 2 +入讀 2 +入門 2 +內務 2 +內外 2 +內心 2 +內流 2 +內爾 2 +全新 2 +全日 2 +全校 2 +全權 2 +全能 2 +全身 2 +兩千 2 +八一 2 +公學 2 +公寓 2 +公署 2 +公認 2 +六七 2 +六千 2 +兵營 2 +其父 2 +其頓 2 +具備 2 +典禮 2 +再造 2 +冠龍 2 +冬季 2 +冰兄 2 +冰峰 2 +冰川 2 +冰雪 2 +凡爾 2 +凱恩 2 +凱特 2 +凱瑞 2 +出家 2 +出席 2 +出演 2 +出產 2 +出賽 2 +出道 2 +分享 2 +分化 2 +分區 2 +分手 2 +分擔 2 +分歧 2 +分隊 2 +切爾 2 +刊載 2 +列佐 2 +列傳 2 +列出 2 +列斯 2 +列顛 2 +初學 2 +初年 2 +初稿 2 +初級 2 +初賽 2 +判斷 2 +判處 2 +別列 2 +利比 2 +利爾 2 +利物 2 +到來 2 +到底 2 
+制止 2 +制裁 2 +制訂 2 +刺客 2 +刺死 2 +刻有 2 +則布 2 +削弱 2 +前任 2 +前來 2 +前妻 2 +前途 2 +剝奪 2 +剩下 2 +副本 2 +創始 2 +創新 2 +創業 2 +劃入 2 +劃給 2 +劇烈 2 +劍術 2 +劍齒 2 +力克 2 +功率 2 +加之 2 +加勒 2 +加堆 2 +加斯 2 +加重 2 +劣勢 2 +助戰 2 +勇為 2 +勒拿 2 +勒比 2 +勒沃 2 +勒謝 2 +動一 2 +動機 2 +動漫 2 +動脈 2 +動車 2 +勝出 2 +勳章 2 +勳銜 2 +勾引 2 +包圍 2 +包廂 2 +包衣 2 +匕首 2 +化氫 2 +化纖 2 +化身 2 +化金 2 +化銠 2 +化鋁 2 +北宋 2 +北平 2 +北方 2 +北端 2 +北約 2 +北道 2 +北齊 2 +匯率 2 +匹克 2 +區分 2 +千9 2 +升學 2 +半山 2 +半球 2 +協商 2 +協奏 2 +協約 2 +南下 2 +南安 2 +南山 2 +南斯 2 +南遣 2 +南邊 2 +南陽 2 +南非 2 +南面 2 +南韓 2 +博弈 2 +博彩 2 +占卜 2 +卡夫 2 +卡普 2 +卡梅 2 +卡片 2 +卡特 2 +卡薩 2 +卡達 2 +印加 2 +印尼 2 +印製 2 +即位 2 +即時 2 +即興 2 +卿雲 2 +厄運 2 +原名 2 +原址 2 +原聲 2 +去除 2 +參觀 2 +參選 2 +又是 2 +又稱 2 +及格 2 +友好 2 +反叛 2 +反抗 2 +反擊 2 +叔叔 2 +取決 2 +受審 2 +受益 2 +受體 2 +口中 2 +口述 2 +古屋 2 +古巴 2 +古德 2 +古拉 2 +古斯 2 +古柯 2 +古蹟 2 +召喚 2 +可汗 2 +史學 2 +史提 2 +史蒂 2 +右岸 2 +司機 2 +司長 2 +司鼓 2 +吃肉 2 +吃飯 2 +各樣 2 +各級 2 +各自 2 +各部 2 +合同 2 +合川 2 +合稱 2 +合葬 2 +吉奧 2 +吉林 2 +吉里 2 +同人 2 +同居 2 +同工 2 +同體 2 +名人 2 +名古 2 +名縉 2 +名鎮 2 +向量 2 +君王 2 +吞併 2 +否則 2 +否定 2 +否是 2 +告別 2 +告知 2 +告終 2 +周歲 2 +呼叫 2 +呼籲 2 +和華 2 +和解 2 +和談 2 +咬金 2 +品行 2 +哈林 2 +哈根 2 +哈歐 2 +哥斯 2 +哥本 2 +哪裡 2 +售賣 2 +唯有 2 +唯美 2 +啟動 2 +啟睿 2 +啟航 2 +啟蒙 2 +喀則 2 +善化 2 +善意 2 +喉嚨 2 +喜劇 2 +喪生 2 +喬艾 2 +單元 2 +單曲 2 +嘉慶 2 +嘉木 2 +嘉玲 2 +器物 2 +噪音 2 +噴氣 2 +嚴密 2 +囚禁 2 +四分 2 +回國 2 +回想 2 +回憶 2 +回收 2 +回生 2 +因斯 2 +固醇 2 +國代 2 +國外 2 +國寶 2 +國徽 2 +國璋 2 +國語 2 +國鋒 2 +圍攻 2 +園藝 2 +圓頂 2 +圖樣 2 +圖爾 2 +圖畫 2 +團結 2 +團聚 2 +團長 2 +土原 2 +在位 2 +在來 2 +地外 2 +地帶 2 +坐診 2 +型態 2 +埃蒙 2 +城中 2 +城子 2 +域名 2 +域治 2 +執導 2 +執掌 2 +執教 2 +執法 2 +基勒 2 +基底 2 +基拉 2 +堂區 2 +堅固 2 +堅強 2 +報紙 2 +場場 2 +塔尼 2 +塔爾 2 +塞克 2 +塞塔 2 +塞摩 2 +塞琉 2 +塞羅 2 +境外 2 +墓地 2 +墓室 2 +增多 2 +增建 2 +增強 2 +增設 2 +墮胎 2 +壓倒 2 +壓強 2 +壓迫 2 +士蘭 2 +壯大 2 +壯年 2 +夏伊 2 +夏季 2 +夏茸 2 +外傳 2 +外圍 2 +外在 2 +外援 2 +外甥 2 +外觀 2 +外資 2 +多半 2 +多少 2 +夜晚 2 +夥伴 2 +大亂 2 +大事 2 +大佛 2 +大公 2 +大力 2 +大勝 2 +大半 2 +大堂 2 +大妃 2 +大將 2 +大屋 2 +大廳 2 +大批 2 +大敗 2 +大林 2 +大槍 2 +大火 2 +大碟 2 +大笨 2 +大維 2 +大衛 2 +大連 2 +大阪 2 +大黎 2 +天地 2 +天堂 2 +天子 2 +天師 2 +天敵 2 +天日 2 +天氣 2 +天衣 2 +天雷 2 +太古 2 +太多 2 +太大 2 +太子 2 +太守 2 +太極 2 +太洛 2 +太祖 2 +夫卡 2 +夸脫 2 +奉天 2 +契合 2 
+奢侈 2 +奧克 2 +奧多 2 +奧布 2 +奧朗 2 +奧特 2 +奪冠 2 +女士 2 +女孩 2 +女皇 2 +妖精 2 +妖魔 2 +妥善 2 +姊妹 2 +始皇 2 +姐妹 2 +姐弟 2 +姑家 2 +姓名 2 +姿態 2 +娘舅 2 +婆婆 2 +嫉妒 2 +子夜 2 +子珍 2 +孔子 2 +字元 2 +字型 2 +字體 2 +存有 2 +存活 2 +孟能 2 +季前 2 +季軍 2 +孤僻 2 +孤獨 2 +孵化 2 +學制 2 +學問 2 +學士 2 +學年 2 +學期 2 +學童 2 +學系 2 +學費 2 +宇一 2 +守衛 2 +安修 2 +安得 2 +安息 2 +安打 2 +安菲 2 +安邑 2 +完整 2 +宏觀 2 +宗室 2 +官職 2 +定下 2 +定俗 2 +定名 2 +定型 2 +定期 2 +客串 2 +客人 2 +客室 2 +客機 2 +客車 2 +宣戰 2 +害怕 2 +家久 2 +家堡 2 +家境 2 +家屬 2 +家產 2 +家衛 2 +家貓 2 +寄宿 2 +密切 2 +密蘇 2 +富人 2 +富特 2 +實力 2 +實務 2 +實用 2 +實習 2 +審判 2 +審查 2 +寫道 2 +寬廣 2 +寬頻 2 +寬鬆 2 +寺廟 2 +寺院 2 +封神 2 +封面 2 +射殺 2 +專區 2 +專員 2 +專有 2 +專題 2 +尊嚴 2 +尊重 2 +尋常 2 +對峙 2 +對待 2 +對陣 2 +導體 2 +小兒 2 +小姐 2 +小心 2 +小桃 2 +小梅 2 +小鎮 2 +小閻 2 +小青 2 +就任 2 +就業 2 +尼古 2 +尼奧 2 +尼納 2 +尼羅 2 +尼采 2 +尾部 2 +局限 2 +居委 2 +居里 2 +屋苑 2 +展館 2 +履仁 2 +屬名 2 +山丘 2 +山坡 2 +山海 2 +岩石 2 +岳母 2 +岳父 2 +崇拜 2 +崔西 2 +嶺南 2 +嶽麓 2 +工兵 2 +工商 2 +工農 2 +工黨 2 +左上 2 +左側 2 +巧眉 2 +巧言 2 +差距 2 +差點 2 +巴列 2 +巴勒 2 +巴拉 2 +巴比 2 +巴洛 2 +巴特 2 +市值 2 +市內 2 +市商 2 +市郊 2 +市長 2 +布卡 2 +布希 2 +布朗 2 +布萊 2 +布蘭 2 +布雷 2 +希羅 2 +帕克 2 +帕斯 2 +帕洛 2 +帕納 2 +帛琉 2 +帶去 2 +帶有 2 +常務 2 +常年 2 +常德 2 +常規 2 +幫忙 2 +干擾 2 +干涉 2 +干預 2 +平安 2 +平息 2 +平成 2 +平時 2 +平頂 2 +年初 2 +年科 2 +年紀 2 +年譜 2 +幼體 2 +幾十 2 +床墊 2 +序數 2 +店鋪 2 +度母 2 +座堂 2 +庫伊 2 +庫夫 2 +庫容 2 +庫爾 2 +康復 2 +康辛 2 +廉租 2 +廠房 2 +廢墟 2 +廢止 2 +廣大 2 +廣安 2 +廣義 2 +延任 2 +延遲 2 +建御 2 +建有 2 +建銘 2 +式各 2 +引種 2 +引退 2 +引進 2 +強風 2 +彈奏 2 +彈性 2 +彌迦 2 +彙集 2 +彩色 2 +影展 2 +往返 2 +征戰 2 +很近 2 +後端 2 +後裔 2 +徒刑 2 +得拉 2 +得票 2 +得道 2 +得里 2 +從小 2 +從而 2 +從軍 2 +御苑 2 +微山 2 +微粒 2 +德州 2 +德特 2 +德瑞 2 +德羅 2 +德華 2 +德輔 2 +德魯 2 +徹斯 2 +心中 2 +必烈 2 +志剛 2 +快樂 2 +忽必 2 +思念 2 +思明 2 +思科 2 +性交 2 +恆鳳 2 +恐慌 2 +恥辱 2 +恩寵 2 +恩賜 2 +悅強 2 +悲痛 2 +悲觀 2 +情意 2 +惠山 2 +愉景 2 +意料 2 +意願 2 +愛因 2 +愛娜 2 +愛德 2 +愛惜 2 +感動 2 +感受 2 +感染 2 +慈幼 2 +慈鯛 2 +慘敗 2 +慣例 2 +慶尚 2 +慶豐 2 +慾望 2 +憎恨 2 +應對 2 +懊惱 2 +懷俄 2 +懷舊 2 +懸浮 2 +懸索 2 +成仙 2 +成傑 2 +成因 2 +成型 2 +成就 2 +成群 2 +成貓 2 +戰亂 2 +戰列 2 +戰場 2 +戰士 2 +戰線 2 +戴麟 2 +手冊 2 +手動 2 +手可 2 +手機 2 +手裡 2 +才能 2 +打工 2 +打敗 2 +打破 2 +打開 2 +托斯 2 +扶植 2 +找出 2 +找回 2 +找尋 2 +承諾 2 +抄襲 2 +抓住 2 +投影 2 +投降 2 +抗擊 2 +抽取 2 +拆穿 2 +拆解 2 +拉姆 2 
+拉底 2 +拉德 2 +拉斐 2 +拉松 2 +拉爾 2 +拉瓦 2 +拉西 2 +拉邦 2 +拔出 2 +拖延 2 +招商 2 +招股 2 +拷貝 2 +拼音 2 +拿到 2 +拿破 2 +拿走 2 +指引 2 +指控 2 +指涉 2 +按鍵 2 +挖角 2 +挽救 2 +捐贈 2 +捕獲 2 +捕食 2 +捷運 2 +排放 2 +排氣 2 +排演 2 +掛架 2 +掠過 2 +掠食 2 +採訪 2 +接待 2 +接掌 2 +接種 2 +接駁 2 +控球 2 +推廣 2 +推翻 2 +推選 2 +描寫 2 +提倡 2 +提及 2 +提夫 2 +提示 2 +插圖 2 +揚聲 2 +換入 2 +換股 2 +損傷 2 +損毀 2 +搞笑 2 +搭檔 2 +搶險 2 +摩斯 2 +摩根 2 +撤出 2 +撤軍 2 +播客 2 +擅長 2 +擊退 2 +擒抱 2 +擔負 2 +據守 2 +擺脫 2 +擾動 2 +支出 2 +支柱 2 +收復 2 +收發 2 +收穫 2 +收視 2 +收集 2 +改回 2 +改寫 2 +改建 2 +改版 2 +改良 2 +改裝 2 +攻佔 2 +攻陷 2 +放映 2 +放置 2 +政協 2 +政變 2 +政黨 2 +故意 2 +故此 2 +故鄉 2 +效忠 2 +效率 2 +敏感 2 +敗給 2 +教友 2 +教員 2 +教徒 2 +教派 2 +整修 2 +整套 2 +整體 2 +敵對 2 +數位 2 +數理 2 +數目 2 +文元 2 +文官 2 +文帝 2 +文康 2 +文英 2 +文華 2 +文革 2 +文體 2 +斐濟 2 +斐爾 2 +斥資 2 +斯丁 2 +斯卡 2 +斯圖 2 +斯多 2 +斯庫 2 +斯康 2 +斯提 2 +斯泰 2 +斯爾 2 +斯理 2 +斯蒂 2 +新世 2 +新宿 2 +新岩 2 +新曲 2 +新澤 2 +新田 2 +新罕 2 +新興 2 +斷裂 2 +方位 2 +方尺 2 +方針 2 +施行 2 +旁邊 2 +旅鴿 2 +旋律 2 +日喀 2 +日益 2 +日航 2 +日行 2 +早上 2 +旺山 2 +旺盛 2 +昆士 2 +明哥 2 +明帝 2 +明頓 2 +易名 2 +昔日 2 +星形 2 +星蟒 2 +春天 2 +時尚 2 +時速 2 +晉升 2 +晚上 2 +晚會 2 +普及 2 +普拉 2 +普朗 2 +普爾 2 +普陀 2 +普頓 2 +景帝 2 +景觀 2 +景象 2 +智慧 2 +暑假 2 +暗殺 2 +暢銷 2 +暫停 2 +暫緩 2 +暴露 2 +曝氣 2 +更好 2 +更改 2 +更深 2 +更高 2 +書信 2 +書寫 2 +書房 2 +書法 2 +曼尼 2 +最久 2 +最小 2 +最少 2 +最新 2 +最遊 2 +會員 2 +會場 2 +會社 2 +會談 2 +會長 2 +月刊 2 +有助 2 +有意 2 +有毒 2 +有罪 2 +服從 2 +朔日 2 +朗克 2 +朗則 2 +朗明 2 +朝代 2 +木星 2 +木犀 2 +木管 2 +末年 2 +末期 2 +本區 2 +本哈 2 +本屆 2 +本班 2 +本站 2 +本質 2 +本願 2 +本龍 2 +村落 2 +村頭 2 +束縛 2 +東亞 2 +東吳 2 +東山 2 +東征 2 +東晉 2 +東正 2 +東視 2 +松潘 2 +林庄 2 +林維 2 +果實 2 +果汁 2 +架設 2 +柏油 2 +染色 2 +柔佛 2 +柔弱 2 +查德 2 +柯鹼 2 +柳江 2 +柴油 2 +柴灣 2 +校內 2 +校隊 2 +核能 2 +根本 2 +格勞 2 +格斯 2 +格曼 2 +格格 2 +格檔 2 +格達 2 +格魯 2 +桃太 2 +桌面 2 +桑葚 2 +桑那 2 +梅塔 2 +梅隆 2 +棕熊 2 +棟屋 2 +植被 2 +楊樹 2 +業者 2 +極大 2 +極性 2 +極高 2 +榮獲 2 +樂農 2 +標語 2 +標題 2 +樞機 2 +樟湖 2 +模具 2 +樣本 2 +樹木 2 +機動 2 +機員 2 +機槍 2 +橡樹 2 +橡膠 2 +橫山 2 +橫濱 2 +橫跨 2 +檢索 2 +檢討 2 +權威 2 +權貴 2 +次數 2 +次級 2 +次要 2 +次長 2 +欺騙 2 +歇爾 2 +歌仔 2 +歌唱 2 +歌聲 2 +歌迷 2 +歐拉 2 +歐斯 2 +正是 2 +正直 2 +正統 2 +正面 2 +此人 2 +此案 2 +此物 2 +此種 2 +此線 2 +此舉 2 +此類 2 +步態 2 +武大 2 +武昌 2 +武松 2 +歧視 2 +歸類 2 +死靈 2 +殘存 2 +殘忍 2 +殘酷 2 +殯葬 2 +殺傷 2 +殺掉 2 +每位 2 
+每周 2 +每層 2 +每日 2 +每次 2 +毒性 2 +毒殺 2 +毒藥 2 +比婭 2 +比施 2 +比西 2 +毗鄰 2 +毛利 2 +氏星 2 +民不 2 +民調 2 +民都 2 +氣田 2 +氨酸 2 +氫彈 2 +氯金 2 +水孔 2 +水手 2 +水準 2 +水溫 2 +水滸 2 +水牛 2 +水質 2 +水道 2 +水餃 2 +永嘉 2 +永寧 2 +永樂 2 +汗位 2 +汝霖 2 +江北 2 +江戶 2 +池尻 2 +汪達 2 +決心 2 +決戰 2 +沃夫 2 +沃思 2 +沖繩 2 +沙咀 2 +沙柏 2 +沙河 2 +沙龍 2 +河水 2 +泉州 2 +泊桑 2 +法拉 2 +法醫 2 +泡沫 2 +注射 2 +注重 2 +泰坦 2 +泰姬 2 +泰然 2 +洗手 2 +洛水 2 +洛辛 2 +洛馬 2 +活性 2 +派出 2 +派別 2 +派駐 2 +流傳 2 +流失 2 +流求 2 +流派 2 +流通 2 +浙東 2 +浩劫 2 +浮冰 2 +海亞 2 +海南 2 +海涌 2 +海豹 2 +海邊 2 +海關 2 +海默 2 +消化 2 +消失 2 +淄博 2 +淮河 2 +深厚 2 +深得 2 +深愛 2 +深遠 2 +淹沒 2 +添加 2 +清晨 2 +清楚 2 +清華 2 +清鍾 2 +減輕 2 +游牧 2 +湖州 2 +湯興 2 +溝通 2 +溪流 2 +溫和 2 +溫州 2 +溫布 2 +溫暖 2 +溫特 2 +滄州 2 +滅口 2 +滙豐 2 +滬東 2 +滿足 2 +漁業 2 +漂流 2 +演說 2 +漢佛 2 +漢口 2 +漸漸 2 +潭西 2 +潮州 2 +澤普 2 +澳底 2 +激光 2 +激戰 2 +激起 2 +濃縮 2 +濕原 2 +濕度 2 +濟寧 2 +濱松 2 +濱湖 2 +瀏覽 2 +灌木 2 +火藥 2 +灰狼 2 +灰色 2 +災害 2 +炮台 2 +為數 2 +烏孜 2 +烏孫 2 +烏扎 2 +烏斯 2 +無力 2 +無效 2 +無界 2 +無緣 2 +無辜 2 +無黨 2 +焦耳 2 +煙熏 2 +照料 2 +照顧 2 +煮制 2 +熊隻 2 +熱比 2 +熱衷 2 +燃燒 2 +燒毀 2 +燒餅 2 +燕山 2 +爪獸 2 +爭取 2 +爭執 2 +爭辯 2 +爭霸 2 +父子 2 +爾伯 2 +爾克 2 +爾加 2 +爾卡 2 +爾及 2 +爾幹 2 +爾庫 2 +爾後 2 +爾文 2 +爾滕 2 +爾登 2 +爾良 2 +爾茨 2 +爾薩 2 +爾西 2 +爾賽 2 +爾那 2 +牆體 2 +片段 2 +牙買 2 +牛仔 2 +牛肉 2 +牧師 2 +牧羊 2 +牧養 2 +物價 2 +物浦 2 +特丹 2 +特使 2 +特性 2 +特拉 2 +特權 2 +特烈 2 +特爾 2 +特種 2 +特維 2 +特羅 2 +特裡 2 +犀欖 2 +犬隻 2 +犬齒 2 +狐狸 2 +狸藻 2 +猛烈 2 +猛虎 2 +猶他 2 +猶豫 2 +獎章 2 +獎金 2 +獨居 2 +獨自 2 +獵奇 2 +獵殺 2 +玄機 2 +率軍 2 +玉帶 2 +玉門 2 +王位 2 +王妃 2 +玩具 2 +珀斯 2 +珀西 2 +珍品 2 +現址 2 +現狀 2 +球迷 2 +理念 2 +琳達 2 +琴行 2 +瑞克 2 +瑟夫 2 +瑪納 2 +環島 2 +環形 2 +環球 2 +環礁 2 +瓊璘 2 +瓜分 2 +瓦伊 2 +瓦拉 2 +瓦斯 2 +甘醇 2 +甚少 2 +甚麼 2 +甜甜 2 +生日 2 +生母 2 +生病 2 +產值 2 +產區 2 +產物 2 +用地 2 +用電 2 +由來 2 +由衷 2 +甲板 2 +甲醇 2 +申花 2 +男爵 2 +町村 2 +留存 2 +留學 2 +留意 2 +留香 2 +畫上 2 +畫報 2 +異性 2 +當事 2 +當代 2 +當前 2 +當場 2 +當成 2 +疫情 2 +病人 2 +病理 2 +痕迹 2 +登記 2 +登輝 2 +發售 2 +發回 2 +發掘 2 +發覺 2 +發音 2 +白紙 2 +白馬 2 +百多 2 +百度 2 +皇子 2 +皇宮 2 +皮埃 2 +皮特 2 +盆子 2 +益世 2 +盟校 2 +監察 2 +監製 2 +監視 2 +直人 2 +直布 2 +直轄 2 +直通 2 +相傳 2 +相戀 2 +相等 2 +相識 2 +相連 2 +省立 2 +看似 2 +看法 2 +真宗 2 +真情 2 +真的 2 +眼鏡 2 +眾人 2 +睡衣 2 +矛盾 2 +知節 2 +短面 2 +矮人 2 +石化 2 +石原 2 +石家 2 +砍柴 2 +研製 2 +研討 2 +破崙 2 +硫化 2 
+硫磺 2 +碎石 2 +碧翠 2 +碩士 2 +確實 2 +磨損 2 +礦業 2 +示威 2 +社交 2 +祕教 2 +祖外 2 +神代 2 +祺瑞 2 +福來 2 +福利 2 +福部 2 +福音 2 +禮節 2 +禽龍 2 +秀全 2 +秀吉 2 +秋天 2 +科幻 2 +科文 2 +科特 2 +科羅 2 +科赫 2 +科雷 2 +秘魯 2 +租客 2 +移除 2 +稀有 2 +程式 2 +種姓 2 +稱呼 2 +稱臣 2 +稱讚 2 +稻盛 2 +穆爾 2 +穆罕 2 +積分 2 +空缺 2 +穿耳 2 +突出 2 +突厥 2 +突擊 2 +立下 2 +立憲 2 +立熙 2 +站台 2 +竟然 2 +竣工 2 +童星 2 +競技 2 +競馬 2 +笑話 2 +第九 2 +筆下 2 +等到 2 +等待 2 +策劃 2 +管弦 2 +管治 2 +節奏 2 +簡易 2 +簽署 2 +籃壇 2 +籃子 2 +籌建 2 +米拉 2 +米格 2 +米莉 2 +精度 2 +精武 2 +精液 2 +精緻 2 +精美 2 +精采 2 +糖份 2 +約克 2 +約會 2 +紅木 2 +紅樓 2 +紅麴 2 +納姆 2 +納爾 2 +納辛 2 +紐卡 2 +紓緩 2 +純淨 2 +純粹 2 +紙幣 2 +紛紛 2 +素質 2 +索引 2 +索瓦 2 +索菲 2 +細緻 2 +組長 2 +結晶 2 +結識 2 +絕望 2 +統稱 2 +絲綢 2 +經紀 2 +經費 2 +維修 2 +維克 2 +維利 2 +維奇 2 +維奧 2 +維尼 2 +維年 2 +維迪 2 +維鈞 2 +維魯 2 +網上 2 +網友 2 +網民 2 +緊鄰 2 +線粒 2 +線西 2 +編入 2 +編寫 2 +編製 2 +緩存 2 +緩慢 2 +緬因 2 +縣城 2 +縣治 2 +縣長 2 +縱橫 2 +縱貫 2 +總值 2 +總會 2 +總監 2 +總管 2 +總額 2 +繁忙 2 +繁榮 2 +繁殖 2 +繞城 2 +繪圖 2 +續篇 2 +續約 2 +罕布 2 +罕默 2 +罪案 2 +罪行 2 +署名 2 +署長 2 +罷黜 2 +罹患 2 +羅丹 2 +羅伊 2 +羅克 2 +羅塞 2 +羅夫 2 +羅希 2 +羅拉 2 +羅漢 2 +羅素 2 +羅貝 2 +羅那 2 +羅陀 2 +羊曲 2 +羊毛 2 +美女 2 +義和 2 +習性 2 +翠絲 2 +翻新 2 +翻越 2 +老年 2 +老式 2 +老舍 2 +考場 2 +考證 2 +耕作 2 +耕種 2 +耳他 2 +耳道 2 +耶和 2 +耶律 2 +耶魯 2 +聊生 2 +聖三 2 +聖赫 2 +聘任 2 +聚合 2 +聚居 2 +聯名 2 +聰明 2 +聲名 2 +聲望 2 +聲道 2 +職員 2 +肉糕 2 +肉食 2 +肖像 2 +肖金 2 +肝臟 2 +股權 2 +肯尼 2 +育才 2 +育種 2 +肺炎 2 +胎兒 2 +胖子 2 +能源 2 +能級 2 +腓特 2 +腳趾 2 +腹面 2 +腺葉 2 +膝蓋 2 +膠質 2 +臘汁 2 +臣民 2 +臨床 2 +臨淄 2 +臨近 2 +臨邑 2 +自主 2 +自助 2 +自家 2 +自殺 2 +自衛 2 +自轉 2 +臭氧 2 +至於 2 +至關 2 +致死 2 +臺中 2 +臺北 2 +興化 2 +舉人 2 +舉動 2 +舊址 2 +舒服 2 +舒適 2 +舞台 2 +船尾 2 +船廠 2 +船艦 2 +船長 2 +艦艇 2 +艱苦 2 +色度 2 +色素 2 +艾塞 2 +艾爾 2 +花卉 2 +花崗 2 +花樣 2 +花費 2 +苣苔 2 +若干 2 +若是 2 +苦惱 2 +苯乙 2 +英俊 2 +英超 2 +英雄 2 +茅斯 2 +茨威 2 +荷花 2 +莆田 2 +莉拉 2 +莎拉 2 +莎莉 2 +莫名 2 +莫泊 2 +莫雷 2 +莫高 2 +菁英 2 +菩薩 2 +華夏 2 +華隆 2 +菲德 2 +萄牙 2 +萊爾 2 +萬9 2 +萬宜 2 +萬春 2 +萬萬 2 +落入 2 +落差 2 +葉子 2 +葉木 2 +葉海 2 +葉片 2 +著想 2 +著迷 2 +葛馮 2 +葵盛 2 +蒂斯 2 +蒂羅 2 +蒂芬 2 +蒐集 2 +蒙哥 2 +蒙山 2 +蒙蔽 2 +蒸餾 2 +蓄電 2 +蓮屬 2 +蔬菜 2 +蔭權 2 +薩哈 2 +薩拉 2 +薩維 2 +薩達 2 +薪資 2 +藉口 2 +藉著 2 +藍調 2 +藍鯨 2 +藏在 2 +藝員 2 +藤葉 2 +蘇丹 2 +蘇黎 2 +蘭德 2 +蘭特 2 +蘭西 2 +蘭豬 2 +虎豹 2 +虐待 2 +虔誠 2 +處境 2 +蛇夫 2 +蛇類 2 +螺旋 2 
+蠟燭 2 +蠻族 2 +血清 2 +血緣 2 +行李 2 +行程 2 +行車 2 +行駛 2 +術士 2 +街區 2 +衛冕 2 +衛戍 2 +表皮 2 +袋中 2 +裁定 2 +補充 2 +補助 2 +裝病 2 +裡忒 2 +製冷 2 +製片 2 +西元 2 +西區 2 +西奧 2 +西沙 2 +西甲 2 +西站 2 +西西 2 +西迪 2 +西鄰 2 +西鐵 2 +西雅 2 +要素 2 +要職 2 +見天 2 +見義 2 +見證 2 +規範 2 +視覺 2 +親密 2 +親屬 2 +親情 2 +親戚 2 +親緣 2 +親近 2 +觀世 2 +觀塘 2 +觀賞 2 +角宿 2 +角逐 2 +解鎖 2 +解體 2 +言論 2 +訂婚 2 +訂購 2 +討伐 2 +記號 2 +許可 2 +許諾 2 +訴說 2 +註冊 2 +評議 2 +評選 2 +詞彙 2 +詩篇 2 +詮釋 2 +話語 2 +該廟 2 +該車 2 +該館 2 +誕辰 2 +誘發 2 +語堂 2 +誤導 2 +說唱 2 +課題 2 +調動 2 +調料 2 +調景 2 +論壇 2 +諸侯 2 +諸塞 2 +諸葛 2 +諾夫 2 +諾斯 2 +諾貝 2 +謙虛 2 +謝尼 2 +謠言 2 +證件 2 +證券 2 +證據 2 +譜寫 2 +警報 2 +警官 2 +警方 2 +警長 2 +譯名 2 +譯法 2 +護士 2 +護法 2 +變得 2 +變換 2 +變更 2 +變異 2 +象棋 2 +象牙 2 +豪華 2 +貓頭 2 +財務 2 +財困 2 +財團 2 +財物 2 +貨櫃 2 +販子 2 +貪污 2 +貴人 2 +買下 2 +買來 2 +買加 2 +賀氏 2 +資方 2 +資產 2 +賈斯 2 +賠償 2 +賢妃 2 +質子 2 +質疑 2 +質素 2 +賴恩 2 +購入 2 +購物 2 +賽馬 2 +赤川 2 +赫勒 2 +赫爾 2 +走出 2 +走路 2 +起飛 2 +趁機 2 +超越 2 +越低 2 +越獄 2 +越遠 2 +越高 2 +趕出 2 +趨同 2 +路上 2 +路口 2 +身長 2 +躲過 2 +車中 2 +車廂 2 +車資 2 +車隊 2 +軍區 2 +軍校 2 +軍法 2 +載重 2 +輔音 2 +輕傷 2 +輕型 2 +輕視 2 +輟學 2 +轄境 2 +轄有 2 +轉介 2 +轉車 2 +轎車 2 +轟動 2 +辛亥 2 +辛堡 2 +辛普 2 +辣妹 2 +辭退 2 +辯護 2 +農地 2 +農場 2 +農曆 2 +農田 2 +農藥 2 +近藤 2 +近衛 2 +迦納 2 +迪克 2 +迪爾 2 +迫害 2 +迴避 2 +迷信 2 +迷幻 2 +追溯 2 +退化 2 +送入 2 +逃出 2 +逃避 2 +透明 2 +逐鹿 2 +通報 2 +通婚 2 +通知 2 +通稱 2 +通航 2 +速寫 2 +速率 2 +造出 2 +造船 2 +連同 2 +連帶 2 +連鎖 2 +週年 2 +進修 2 +進化 2 +進駐 2 +遇到 2 +遊仙 2 +遊客 2 +遊玩 2 +運河 2 +運用 2 +運轉 2 +過世 2 +過勞 2 +過年 2 +過於 2 +過海 2 +過關 2 +道場 2 +道夫 2 +道理 2 +道生 2 +達尼 2 +違反 2 +遞歸 2 +遠東 2 +遭受 2 +遴選 2 +遵守 2 +遷往 2 +選中 2 +選拔 2 +選民 2 +選為 2 +遺囑 2 +遺產 2 +遺跡 2 +遼東 2 +還珠 2 +那些 2 +那樣 2 +邦初 2 +郊外 2 +部下 2 +部族 2 +郵件 2 +都柏 2 +都洛 2 +都統 2 +鄭國 2 +鄭氏 2 +鄰國 2 +配合 2 +配對 2 +酒吧 2 +酒泉 2 +酒醉 2 +醜聞 2 +醫藥 2 +釉下 2 +釋迦 2 +里奇 2 +里巴 2 +里昂 2 +里發 2 +里程 2 +里高 2 +重修 2 +重型 2 +重華 2 +重言 2 +重返 2 +重重 2 +重量 2 +野獸 2 +量表 2 +金星 2 +金漢 2 +金牌 2 +金蓮 2 +金酸 2 +金雞 2 +金馬 2 +鋼鐵 2 +錫金 2 +鍵盤 2 +鐘錶 2 +鐵伊 2 +鐵達 2 +鑄造 2 +長久 2 +長城 2 +長女 2 +長春 2 +長老 2 +長者 2 +長興 2 +長蘆 2 +長軸 2 +長音 2 +門前 2 +門口 2 +門子 2 +門戶 2 +門齒 2 +開創 2 +開口 2 +開心 2 +開拍 2 +開採 2 +開會 2 +開火 2 +開羅 2 +開賽 2 +開通 2 +開門 2 +開除 2 +閏年 2 +間接 2 +間隙 2 +閘門 2 +閱讀 2 +關注 2 +關聯 2 +關說 2 
+關鍵 2 +關門 2 +關閉 2 +防範 2 +防衛 2 +阻擋 2 +阻礙 2 +阿保 2 +阿姆 2 +阿森 2 +阿特 2 +阿美 2 +附帶 2 +降級 2 +降解 2 +除籍 2 +陰霾 2 +陵墓 2 +陵寢 2 +陶瓷 2 +陷阱 2 +陽澄 2 +隆頭 2 +隊友 2 +隊長 2 +隋代 2 +隕石 2 +隨之 2 +雅圖 2 +集成 2 +集資 2 +集雨 2 +集體 2 +雍正 2 +雕像 2 +離任 2 +離婚 2 +離心 2 +雨林 2 +雪貂 2 +雲想 2 +零星 2 +雷拉 2 +雷馬 2 +電動 2 +電壓 2 +電流 2 +電纜 2 +電能 2 +電路 2 +電鐵 2 +震動 2 +震盪 2 +震驚 2 +霍爾 2 +霸主 2 +霸王 2 +靈活 2 +靈素 2 +青聯 2 +青藏 2 +青銅 2 +靜電 2 +面熊 2 +面臨 2 +面試 2 +面部 2 +韋斯 2 +音系 2 +音變 2 +韻律 2 +順序 2 +預備 2 +預定 2 +預言 2 +預計 2 +頒布 2 +頒發 2 +領地 2 +頭等 2 +頭部 2 +頭魚 2 +頭鷹 2 +願望 2 +顯得 2 +顯聖 2 +顯著 2 +風俗 2 +風景 2 +風氣 2 +風濕 2 +風雲 2 +風靡 2 +食夢 2 +食材 2 +飢荒 2 +飲品 2 +飲用 2 +餘額 2 +館藏 2 +饑荒 2 +饒舌 2 +首位 2 +首播 2 +首爾 2 +首腦 2 +首部 2 +香蕉 2 +馬其 2 +馬拉 2 +馬歇 2 +馬耳 2 +馬薩 2 +駐紮 2 +駐足 2 +駕駛 2 +骨頭 2 +骨髓 2 +體制 2 +體力 2 +體型 2 +體校 2 +體重 2 +體驗 2 +高傲 2 +高利 2 +高壓 2 +高平 2 +高校 2 +高止 2 +高爾 2 +高能 2 +高興 2 +高郵 2 +鬆散 2 +魅力 2 +魚雷 2 +魚頭 2 +魯斯 2 +魯明 2 +魯殊 2 +魯茲 2 +鮮明 2 +鯉形 2 +鯉科 2 +鰂魚 2 +鱸形 2 +鳥取 2 +鳥綱 2 +鳳凰 2 +鳳翔 2 +麗亞 2 +麗珠 2 +麗茲 2 +麗華 2 +麟趾 2 +麻河 2 +麻省 2 +黃帝 2 +黃色 2 +黎世 2 +黎加 2 +黑幫 2 +黑貓 2 +黑龍 2 +默多 2 +默德 2 +點擊 2 +點數 2 +點球 2 +黨派 2 +鼎盛 2 +鼠猴 2 +鼠疫 2 +齊克 2 +齒擦 2 +齒虎 2 +齒軌 2 +齒龍 2 +齧齒 2 +龐家 2 +'s 1 +-A 1 +-B 1 +-L 1 +-P 1 +-S 1 +-U 1 +-r 1 +.q 1 +.x 1 +9F 1 +9L 1 +9M 1 +9O 1 +9X 1 +9c 1 +9n 1 +9成 1 +AF 1 +AM 1 +AN 1 +AR 1 +Aa 1 +Ac 1 +Ag 1 +B- 1 +B9 1 +BH 1 +BK 1 +BS 1 +Bl 1 +Bu 1 +CB 1 +CD 1 +CN 1 +Ci 1 +Cs 1 +Cu 1 +Cá 1 +D- 1 +D9 1 +DD 1 +DF 1 +DM 1 +DO 1 +Dr 1 +Du 1 +EG 1 +EK 1 +EP 1 +ER 1 +ES 1 +EX 1 +Ed 1 +Em 1 +Es 1 +Ex 1 +F9 1 +FA 1 +FD 1 +FH 1 +FI 1 +FL 1 +FS 1 +FU 1 +Fe 1 +Fl 1 +Fu 1 +G9 1 +GF 1 +GT 1 +GY 1 +Gh 1 +Gu 1 +HC 1 +HE 1 +HK 1 +HO 1 +HP 1 +HS 1 +I- 1 +IA 1 +IB 1 +IF 1 +IN 1 +IP 1 +IT 1 +IU 1 +Il 1 +Ir 1 +It 1 +JP 1 +KI 1 +KK 1 +KR 1 +Kn 1 +Ko 1 +LA 1 +LC 1 +LD 1 +LR 1 +LS 1 +LY 1 +MD 1 +MF 1 +ML 1 +MM 1 +MS 1 +NC 1 +NG 1 +NH 1 +NI 1 +NM 1 +NZ 1 +O. 
1 +O9 1 +OK 1 +ON 1 +OS 1 +OV 1 +Od 1 +On 1 +Op 1 +Os 1 +Ot 1 +P9 1 +PF 1 +PH 1 +PT 1 +PU 1 +Ps 1 +R9 1 +RC 1 +RE 1 +RY 1 +Ra 1 +Rh 1 +S- 1 +S9 1 +SE 1 +SH 1 +SI 1 +SS 1 +Si 1 +Sn 1 +Sr 1 +Sy 1 +Sō 1 +T9 1 +TA 1 +TI 1 +TN 1 +Ta 1 +Ts 1 +Ty 1 +UD 1 +UM 1 +UP 1 +Uh 1 +Ut 1 +VA 1 +VF 1 +VS 1 +Vo 1 +WH 1 +WT 1 +Wh 1 +XE 1 +YP 1 +Ye 1 +ZZ 1 +Ze 1 +`` 1 +a. 1 +aw 1 +ax 1 +bb 1 +bd 1 +bs 1 +cm 1 +cn 1 +cq 1 +dm 1 +dn 1 +dt 1 +du 1 +dv 1 +dy 1 +dé 1 +eH 1 +eS 1 +ej 1 +ex 1 +f( 1 +fe 1 +fi 1 +fk 1 +fl 1 +g( 1 +gb 1 +gd 1 +gf 1 +gj 1 +gm 1 +gn 1 +gr 1 +hC 1 +iB 1 +iT 1 +ih 1 +ik 1 +iw 1 +ja 1 +ji 1 +ju 1 +k. 1 +kl 1 +kn 1 +ko 1 +kt 1 +kö 1 +l9 1 +lR 1 +lc 1 +lf 1 +lg 1 +lw 1 +m. 1 +mg 1 +mn 1 +mr 1 +ms 1 +n= 1 +nf 1 +nh 1 +nl 1 +np 1 +nv 1 +nw 1 +ny 1 +oM 1 +oe 1 +oz 1 +pb 1 +pk 1 +pl 1 +pm 1 +pt 1 +q- 1 +q. 1 +qq 1 +r- 1 +rj 1 +rp 1 +rz 1 +sM 1 +sl 1 +sn 1 +t' 1 +tS 1 +tc 1 +tm 1 +tp 1 +u. 1 +uT 1 +uv 1 +ux 1 +uz 1 +vo 1 +vr 1 +w= 1 +wn 1 +wo 1 +x. 1 +xa 1 +xi 1 +xp 1 +yf 1 +yg 1 +yh 1 +yl 1 +ym 1 +yo 1 +za 1 +zo 1 +zp 1 +zu 1 +zz 1 +ál 1 +ém 1 +öy 1 +ōy 1 +​​ 1 +​物 1 +、再 1 +一, 1 +一一 1 +一九 1 +一併 1 +一億 1 +一分 1 +一到 1 +一勞 1 +一反 1 +一句 1 +一字 1 +一式 1 +一成 1 +一戰 1 +一指 1 +一改 1 +一概 1 +一模 1 +一民 1 +一氧 1 +一炮 1 +一無 1 +一爭 1 +一發 1 +一益 1 +一而 1 +一舉 1 +一落 1 +一見 1 +一談 1 +一路 1 +一身 1 +一邊 1 +一點 1 +丁字 1 +丁斯 1 +丁漢 1 +丁目 1 +丁蛋 1 +七七 1 +七里 1 +三、 1 +三一 1 +三亞 1 +三元 1 +三千 1 +三原 1 +三崎 1 +三星 1 +三浦 1 +三王 1 +三索 1 +三船 1 +三菱 1 +三萬 1 +三藩 1 +三軍 1 +三郎 1 +三門 1 +上傳 1 +上去 1 +上古 1 +上司 1 +上埔 1 +上報 1 +上塘 1 +上奏 1 +上學 1 +上尉 1 +上手 1 +上新 1 +上朝 1 +上林 1 +上沖 1 +上班 1 +上端 1 +上網 1 +上線 1 +上色 1 +上蓋 1 +上訪 1 +上調 1 +上路 1 +上身 1 +上車 1 +上選 1 +上部 1 +上限 1 +上集 1 +上雲 1 +上顎 1 +上高 1 +下剋 1 +下圖 1 +下徹 1 +下樓 1 +下河 1 +下潛 1 +下獄 1 +下稱 1 +下蝕 1 +下行 1 +下設 1 +下課 1 +下跌 1 +下遊 1 +下部 1 +下關 1 +下院 1 +下集 1 +下雷 1 +下面 1 +下顎 1 +下風 1 +不丹 1 +不乏 1 +不了 1 +不以 1 +不克 1 +不入 1 +不凡 1 +不利 1 +不到 1 +不力 1 +不動 1 +不去 1 +不吃 1 +不合 1 +不和 1 +不問 1 +不均 1 +不多 1 +不大 1 +不定 1 +不容 1 +不實 1 +不惜 1 +不愛 1 +不懷 1 +不扣 1 +不折 1 +不捨 1 +不收 1 +不敬 1 +不料 1 +不易 1 +不景 1 +不服 1 +不朽 1 +不歸 1 +不渝 1 +不準 1 +不理 1 
+不畏 1 +不符 1 +不紊 1 +不純 1 +不絕 1 +不經 1 +不群 1 +不自 1 +不行 1 +不衰 1 +不要 1 +不見 1 +不解 1 +不計 1 +不該 1 +不詳 1 +不豐 1 +不賣 1 +不輸 1 +不辭 1 +不道 1 +不達 1 +不適 1 +不銹 1 +不限 1 +不露 1 +不顧 1 +且是 1 +世上 1 +世人 1 +世代 1 +世充 1 +世則 1 +世子 1 +世昌 1 +世田 1 +世矚 1 +世祿 1 +世綱 1 +世貿 1 +世道 1 +世銘 1 +丙組 1 +丞益 1 +丞相 1 +並無 1 +並稱 1 +並系 1 +並芘 1 +中仙 1 +中信 1 +中原 1 +中堅 1 +中場 1 +中外 1 +中底 1 +中彈 1 +中性 1 +中投 1 +中斷 1 +中旬 1 +中校 1 +中樞 1 +中檔 1 +中殿 1 +中毒 1 +中波 1 +中田 1 +中級 1 +中綴 1 +中線 1 +中耳 1 +中聯 1 +中興 1 +中落 1 +中葉 1 +中藥 1 +中觀 1 +中超 1 +中農 1 +中鐵 1 +串聯 1 +丸都 1 +丹噶 1 +丹姆 1 +丹路 1 +主修 1 +主創 1 +主導 1 +主帶 1 +主幹 1 +主意 1 +主控 1 +主治 1 +主炮 1 +主犯 1 +主筆 1 +主船 1 +主食 1 +乃伊 1 +乃威 1 +乃狄 1 +久經 1 +久藏 1 +之介 1 +之好 1 +之所 1 +之樂 1 +之泰 1 +之申 1 +之道 1 +之銓 1 +之鋒 1 +乘勢 1 +乘搭 1 +乘撘 1 +乘裝 1 +乙二 1 +乙未 1 +乙組 1 +乙苯 1 +九五 1 +九十 1 +九江 1 +九鐵 1 +乞多 1 +也夫 1 +乳房 1 +乾季 1 +乾德 1 +乾淨 1 +乾西 1 +亂倫 1 +亂刀 1 +事先 1 +事態 1 +事發 1 +事與 1 +事跡 1 +事蹟 1 +二中 1 +二二 1 +二八 1 +二宮 1 +二戶 1 +二氮 1 +二烷 1 +二道 1 +于敏 1 +互作 1 +互利 1 +互助 1 +互惠 1 +互通 1 +互選 1 +五一 1 +五中 1 +五八 1 +五分 1 +五常 1 +五弟 1 +五彩 1 +五成 1 +五指 1 +五氧 1 +五萬 1 +井住 1 +井字 1 +井村 1 +井田 1 +些微 1 +亞丁 1 +亞亞 1 +亞他 1 +亞吉 1 +亞哥 1 +亞基 1 +亞堡 1 +亞士 1 +亞奧 1 +亞尼 1 +亞布 1 +亞彬 1 +亞托 1 +亞文 1 +亞斯 1 +亞普 1 +亞東 1 +亞流 1 +亞烏 1 +亞瑟 1 +亞絲 1 +亞莫 1 +亞西 1 +亞豬 1 +亞路 1 +亞辛 1 +亞迪 1 +亞運 1 +亞邦 1 +亞麻 1 +亡故 1 +交付 1 +交代 1 +交出 1 +交口 1 +交回 1 +交州 1 +交替 1 +交棒 1 +交涉 1 +交界 1 +交行 1 +交角 1 +交談 1 +交道 1 +交錯 1 +亦即 1 +亨得 1 +京劇 1 +京王 1 +京釜 1 +亭湖 1 +亮相 1 +人世 1 +人仕 1 +人字 1 +人客 1 +人意 1 +人手 1 +人打 1 +人日 1 +人權 1 +人殉 1 +人氣 1 +人祭 1 +人種 1 +人稱 1 +人行 1 +人道 1 +人選 1 +人麻 1 +什倫 1 +什圖 1 +什沃 1 +什維 1 +什艾 1 +仁傑 1 +仁和 1 +仁壽 1 +仁守 1 +仁宗 1 +仁慕 1 +仁煥 1 +仁牙 1 +仁玕 1 +仁社 1 +仁穆 1 +仁粹 1 +仁青 1 +仇人 1 +今川 1 +介壽 1 +介質 1 +仍是 1 +仍有 1 +仍算 1 +仔林 1 +仔沙 1 +他倆 1 +他定 1 +他家 1 +他能 1 +他那 1 +仙人 1 +仙奴 1 +仙鶴 1 +代之 1 +代亞 1 +代價 1 +代名 1 +代幣 1 +代數 1 +代牧 1 +代相 1 +代碼 1 +令狐 1 +令華 1 +以千 1 +以爲 1 +仰光 1 +仰望 1 +仲雄 1 +任免 1 +任選 1 +伊什 1 +伊克 1 +伊喀 1 +伊塔 1 +伊娃 1 +伊尹 1 +伊德 1 +伊摩 1 +伊朗 1 +伊杜 1 +伊爾 1 +伊犁 1 +伊瑪 1 +伊甸 1 +伊薩 1 +伊里 1 +伊阿 1 +伊頓 1 +伍士 1 +伎倆 1 +伏塔 1 +伏契 1 +伏爾 1 +伏瓦 1 +伐克 1 +休假 1 +休克 1 +休士 1 +休憩 1 +休斯 1 +休閑 1 +休養 1 +伙食 1 +伯來 1 +伯克 1 +伯塔 1 +伯多 1 +伯拉 1 
+伯明 1 +伯格 1 +伯溫 1 +伯爾 1 +伯茲 1 +伯莎 1 +伯虎 1 +伯謙 1 +伯達 1 +伯里 1 +伴侶 1 +伴奏 1 +伴有 1 +伴生 1 +伸一 1 +伸冤 1 +伸延 1 +伸港 1 +伽馬 1 +但斯 1 +佈局 1 +佈置 1 +佈道 1 +位在 1 +位居 1 +位階 1 +位面 1 +低下 1 +低估 1 +低價 1 +低層 1 +低平 1 +低座 1 +低檔 1 +低潮 1 +低等 1 +低調 1 +低額 1 +住友 1 +住所 1 +住進 1 +佐佐 1 +佐勞 1 +佐和 1 +佐拉 1 +佐木 1 +佐民 1 +佔用 1 +何利 1 +何力 1 +何方 1 +佛事 1 +佛典 1 +佛森 1 +佛經 1 +佛萊 1 +佛蒙 1 +佛雷 1 +佛頭 1 +佛龍 1 +作對 1 +作怪 1 +作曲 1 +作次 1 +作法 1 +作為 1 +作畫 1 +作自 1 +作雲 1 +作風 1 +你變 1 +佩佐 1 +佩儂 1 +佩克 1 +佩戴 1 +佩斯 1 +佩琪 1 +佩蘭 1 +佳作 1 +佳佳 1 +佳節 1 +併發 1 +使喚 1 +使團 1 +使節 1 +侄子 1 +來杜 1 +來看 1 +來納 1 +來臨 1 +來襲 1 +來館 1 +侈談 1 +侍奉 1 +侍女 1 +侍從 1 +侏羅 1 +供水 1 +供電 1 +供養 1 +依拉 1 +依次 1 +依照 1 +依瑪 1 +依附 1 +侮辱 1 +侵佔 1 +侵害 1 +便利 1 +便捷 1 +便是 1 +便服 1 +便當 1 +便秘 1 +俊業 1 +俘獲 1 +俚頭 1 +保住 1 +保全 1 +保大 1 +保定 1 +保密 1 +保明 1 +保溫 1 +保送 1 +保養 1 +信中 1 +信念 1 +信教 1 +信玄 1 +信神 1 +信竹 1 +信裡 1 +修好 1 +修學 1 +修憲 1 +修斯 1 +修煉 1 +修葺 1 +修鞋 1 +修養 1 +俯瞰 1 +俸祿 1 +俾路 1 +倉促 1 +倉庫 1 +個位 1 +個個 1 +個展 1 +倒下 1 +倒入 1 +倖免 1 +候旨 1 +候補 1 +倚天 1 +倚靠 1 +倩文 1 +倫之 1 +倫努 1 +倫巴 1 +倫布 1 +倫拜 1 +倫春 1 +倫納 1 +倫西 1 +倫貝 1 +倬標 1 +倭國 1 +倭寇 1 +假使 1 +假借 1 +假名 1 +假帳 1 +假設 1 +假說 1 +假象 1 +假釋 1 +假面 1 +偉強 1 +偏低 1 +偏僻 1 +偏向 1 +偏小 1 +偏東 1 +偏重 1 +偏離 1 +做到 1 +停刊 1 +停業 1 +停機 1 +停泊 1 +停職 1 +停辦 1 +停靠 1 +停飛 1 +健壯 1 +健將 1 +健身 1 +側目 1 +側邊 1 +偵察 1 +偵測 1 +偵緝 1 +偶像 1 +偶發 1 +偷取 1 +偷羊 1 +偷襲 1 +偷走 1 +偽季 1 +偽裝 1 +傀儡 1 +傅萊 1 +傍晚 1 +傑克 1 +傑志 1 +傑斐 1 +備忘 1 +備戰 1 +備案 1 +備用 1 +備註 1 +傢具 1 +催芽 1 +傭人 1 +傲不 1 +傳來 1 +傳給 1 +傳記 1 +傳遍 1 +債券 1 +傷及 1 +傷心 1 +傷患 1 +傷悲 1 +傷病 1 +傷透 1 +傾中 1 +傾心 1 +傾談 1 +僅屬 1 +僅用 1 +像差 1 +僕人 1 +僧人 1 +僧孺 1 +僧尼 1 +僧格 1 +僧祐 1 +僱主 1 +僱傭 1 +僵局 1 +價位 1 +價錢 1 +儀器 1 +儒士 1 +儘快 1 +儘量 1 +償付 1 +優值 1 +優良 1 +優裕 1 +優質 1 +儲量 1 +允良 1 +元子 1 +元朝 1 +元氣 1 +元澄 1 +元老 1 +元起 1 +兄長 1 +充任 1 +充分 1 +充氣 1 +充滿 1 +充軍 1 +兆基 1 +兆楠 1 +兆陽 1 +兇多 1 +兇悍 1 +兇猛 1 +先前 1 +先帝 1 +先師 1 +先賢 1 +先鋒 1 +先驗 1 +光啟 1 +光大 1 +光學 1 +光宇 1 +光州 1 +光度 1 +光復 1 +光景 1 +光束 1 +光泰 1 +光滑 1 +光照 1 +光環 1 +光范 1 +光華 1 +光顧 1 +克伍 1 +克佛 1 +克利 1 +克力 1 +克勤 1 +克南 1 +克孜 1 +克安 1 +克家 1 +克巴 1 +克希 1 +克敏 1 +克欽 1 +克沙 1 +克漢 1 +克瑟 1 +克禮 1 +克穆 1 +克萊 1 +克蘇 1 +克裡 1 +克貝 1 +克農 1 +克連 1 +克默 1 +兌換 1 +免生 1 +免疫 1 
+免遭 1 +兒島 1 +兒道 1 +兔毛 1 +兢兢 1 +兢業 1 +入世 1 +入地 1 +入塞 1 +入境 1 +入手 1 +入聲 1 +入股 1 +入閘 1 +入院 1 +入駐 1 +內亞 1 +內化 1 +內卡 1 +內在 1 +內埔 1 +內壁 1 +內拉 1 +內政 1 +內特 1 +內瑞 1 +內置 1 +內羅 1 +內胎 1 +內臟 1 +內蒂 1 +內載 1 +內遷 1 +內阿 1 +內韋 1 +全劇 1 +全十 1 +全名 1 +全境 1 +全壘 1 +全套 1 +全島 1 +全州 1 +全得 1 +全德 1 +全效 1 +全敗 1 +全數 1 +全書 1 +全盛 1 +全盤 1 +全省 1 +全福 1 +全程 1 +全稱 1 +全線 1 +全興 1 +全邨 1 +全鎮 1 +全隊 1 +全額 1 +全黑 1 +兩億 1 +八世 1 +八億 1 +八十 1 +八卦 1 +八大 1 +八思 1 +八成 1 +八杉 1 +八面 1 +公仔 1 +公佈 1 +公克 1 +公告 1 +公堂 1 +公墓 1 +公屋 1 +公斤 1 +公款 1 +公正 1 +公狼 1 +公約 1 +公衛 1 +公袥 1 +公視 1 +公超 1 +公關 1 +公頃 1 +公館 1 +六合 1 +六四 1 +六安 1 +六甲 1 +共享 1 +共尾 1 +共生 1 +共苦 1 +共處 1 +共識 1 +共鳴 1 +兵房 1 +兵鋒 1 +其妻 1 +其子 1 +其數 1 +其次 1 +其母 1 +其道 1 +典籍 1 +兼修 1 +兼優 1 +兼具 1 +兼容 1 +兼屬 1 +兼并 1 +冀望 1 +再、 1 +再三 1 +再保 1 +再用 1 +再而 1 +再臨 1 +再補 1 +再見 1 +冒險 1 +冠上 1 +冠峰 1 +冠狀 1 +冠玉 1 +冤案 1 +冥冥 1 +冥想 1 +冬初 1 +冬眠 1 +冬青 1 +冰冰 1 +冰塔 1 +冰晶 1 +冰柱 1 +冰河 1 +冰湖 1 +冰瀑 1 +冰球 1 +冰風 1 +冷凍 1 +冷暖 1 +冷次 1 +冷氣 1 +冷眼 1 +冷遇 1 +冷靜 1 +凄美 1 +准考 1 +凈白 1 +凌日 1 +凌晨 1 +凌辱 1 +凌駕 1 +凍傷 1 +凝結 1 +凡娜 1 +凱勒 1 +凱文 1 +凱爾 1 +凱維 1 +凱美 1 +凱茜 1 +凱蒂 1 +凱馬 1 +凸起 1 +凹版 1 +出世 1 +出人 1 +出到 1 +出動 1 +出去 1 +出名 1 +出品 1 +出國 1 +出城 1 +出奇 1 +出嫁 1 +出局 1 +出師 1 +出廠 1 +出征 1 +出戶 1 +出所 1 +出手 1 +出擊 1 +出校 1 +出榜 1 +出牆 1 +出血 1 +出訪 1 +出路 1 +出逃 1 +出門 1 +出頭 1 +刀鞘 1 +分工 1 +分店 1 +分批 1 +分攤 1 +分數 1 +分文 1 +分明 1 +分枝 1 +分校 1 +分泌 1 +分流 1 +分為 1 +分發 1 +分科 1 +分立 1 +分站 1 +分管 1 +分組 1 +分缺 1 +分貝 1 +分辨 1 +分部 1 +分鏡 1 +分隔 1 +分離 1 +分題 1 +分點 1 +切下 1 +切分 1 +切割 1 +切合 1 +切哇 1 +切實 1 +切成 1 +切望 1 +切片 1 +切華 1 +刑事 1 +刑部 1 +划算 1 +划艇 1 +列塔 1 +列姆 1 +列梅 1 +列維 1 +初中 1 +初始 1 +初時 1 +初次 1 +初步 1 +初見 1 +判令 1 +判定 1 +判寺 1 +判詞 1 +別人 1 +別克 1 +別名 1 +別院 1 +利他 1 +利刃 1 +利南 1 +利卡 1 +利奇 1 +利好 1 +利妮 1 +利帕 1 +利文 1 +利欽 1 +利歐 1 +利沙 1 +利潘 1 +利烏 1 +利牛 1 +利班 1 +利維 1 +利茅 1 +利茲 1 +利華 1 +利菲 1 +利雙 1 +刪剪 1 +刮目 1 +到任 1 +到期 1 +到發 1 +制動 1 +制式 1 +制瓷 1 +制約 1 +制酸 1 +刷到 1 +券頂 1 +刺殺 1 +刺特 1 +刻劃 1 +刻寫 1 +刻板 1 +刻滿 1 +刻畫 1 +則士 1 +則里 1 +削減 1 +剋上 1 +剌旭 1 +前傾 1 +前去 1 +前因 1 +前奏 1 +前委 1 +前嫌 1 +前季 1 +前提 1 +前景 1 +前稱 1 +前端 1 +前綴 1 +前者 1 +前肢 1 +前齒 1 +剛剛 1 +剛性 1 +剛直 1 +剛鐸 1 +剩餘 1 +副長 1 +割據 1 +割破 1 +割讓 1 +割開 1 +創保 1 
+創傷 1 +創刊 1 +創煥 1 +創生 1 +剷除 1 +剿滅 1 +劃出 1 +劃歸 1 +劃界 1 +劇中 1 +劇作 1 +劇場 1 +劇組 1 +劉楊 1 +劍俠 1 +劍法 1 +劍麻 1 +劑量 1 +力佛 1 +力圖 1 +力崗 1 +力特 1 +力霸 1 +力馬 1 +功勞 1 +功德 1 +功樂 1 +功績 1 +加下 1 +加侖 1 +加保 1 +加值 1 +加冕 1 +加劇 1 +加勁 1 +加多 1 +加尼 1 +加恩 1 +加拉 1 +加粗 1 +加薪 1 +加藤 1 +加賀 1 +加迪 1 +加速 1 +加達 1 +加電 1 +加霜 1 +助手 1 +助燃 1 +助聽 1 +助長 1 +努兒 1 +努斯 1 +劫匪 1 +劫持 1 +効忠 1 +勁光 1 +勁報 1 +勁敵 1 +勁歌 1 +勃勃 1 +勃起 1 +勇俊 1 +勇士 1 +勇武 1 +勒溫 1 +動人 1 +動向 1 +動土 1 +動用 1 +動能 1 +動蕩 1 +動詞 1 +動量 1 +勘探 1 +務工 1 +勝任 1 +勝昭 1 +勝素 1 +勝者 1 +勝訴 1 +勝賴 1 +勞埃 1 +勞庇 1 +勞斯 1 +勞永 1 +勞爾 1 +勞累 1 +勞賓 1 +募款 1 +募集 1 +勢傾 1 +勢能 1 +勤先 1 +勤快 1 +勳位 1 +勳爵 1 +勵珍 1 +勾形 1 +勾畫 1 +勾結 1 +包袱 1 +包裹 1 +包覆 1 +包頭 1 +化二 1 +化名 1 +化妝 1 +化成 1 +化整 1 +化氦 1 +化用 1 +化碳 1 +化肥 1 +化鉛 1 +化鐵 1 +北伐 1 +北側 1 +北冰 1 +北卡 1 +北景 1 +北歐 1 +北段 1 +北甘 1 +北美 1 +北車 1 +北返 1 +北達 1 +北邊 1 +匯入 1 +匯合 1 +匯報 1 +匯聯 1 +匯集 1 +匹亞 1 +匹斯 1 +匹茲 1 +匾額 1 +區塊 1 +區段 1 +區間 1 +十九 1 +十億 1 +十全 1 +十數 1 +十美 1 +千丈 1 +千五 1 +千兆 1 +千克 1 +千四 1 +千島 1 +千方 1 +千春 1 +千瓦 1 +千計 1 +千里 1 +千陽 1 +千餘 1 +千鶴 1 +升值 1 +升到 1 +升天 1 +升越 1 +升降 1 +升高 1 +午膳 1 +半導 1 +半牧 1 +半農 1 +卑爾 1 +卑詩 1 +卓著 1 +協合 1 +協理 1 +南亞 1 +南人 1 +南伽 1 +南加 1 +南卡 1 +南哲 1 +南多 1 +南大 1 +南寧 1 +南市 1 +南征 1 +南端 1 +南線 1 +南美 1 +南臨 1 +南航 1 +南船 1 +南路 1 +南通 1 +南遷 1 +南鄰 1 +南門 1 +南開 1 +南院 1 +南雄 1 +南麓 1 +博倫 1 +博凱 1 +博多 1 +博夫 1 +博學 1 +博尼 1 +博斯 1 +博格 1 +博洛 1 +博滕 1 +博義 1 +博覽 1 +卜拉 1 +卜楞 1 +占星 1 +卡亞 1 +卡內 1 +卡利 1 +卡力 1 +卡加 1 +卡姆 1 +卡巴 1 +卡希 1 +卡帕 1 +卡波 1 +卡納 1 +卡臣 1 +卡車 1 +卡默 1 +卧底 1 +卧病 1 +卧薪 1 +印信 1 +印刷 1 +印地 1 +印表 1 +危在 1 +危害 1 +危殆 1 +即場 1 +即有 1 +卵內 1 +原先 1 +原型 1 +原姓 1 +原屬 1 +原平 1 +原意 1 +原指 1 +原文 1 +原核 1 +原畫 1 +原籍 1 +原罪 1 +原諒 1 +厭世 1 +厭惡 1 +去搶 1 +去留 1 +去看 1 +參戰 1 +參政 1 +參演 1 +參看 1 +參禮 1 +參贊 1 +參閱 1 +又廷 1 +又或 1 +及利 1 +及後 1 +及時 1 +及爾 1 +友情 1 +友邦 1 +反共 1 +反其 1 +反動 1 +反右 1 +反向 1 +反恐 1 +反省 1 +反綁 1 +反證 1 +反響 1 +反黨 1 +叔父 1 +取下 1 +取出 1 +取名 1 +取回 1 +取悅 1 +取液 1 +取物 1 +取用 1 +取而 1 +受命 1 +受孕 1 +受害 1 +受挫 1 +受洗 1 +受精 1 +受罰 1 +受襲 1 +受賄 1 +受阻 1 +受雇 1 +叛徒 1 +叛變 1 +叛軍 1 +叢刊 1 +叢書 1 +口供 1 +口信 1 +口否 1 +口吻 1 +口感 1 +口服 1 +口秀 1 +口音 1 +口魚 1 +古丁 1 +古喙 1 +古堡 1 +古寺 1 +古廟 1 +古惑 1 +古武 1 +古爾 1 +古特 1 +古迹 1 
+古都 1 +古魯 1 +句子 1 +句點 1 +另加 1 +另娶 1 +另立 1 +另築 1 +另類 1 +只好 1 +只是 1 +只會 1 +只知 1 +只能 1 +叫作 1 +叫拜 1 +叫聲 1 +召集 1 +可及 1 +可可 1 +可塑 1 +可夫 1 +可巴 1 +可愛 1 +可憐 1 +可樂 1 +可欣 1 +可歸 1 +可熱 1 +可西 1 +可靠 1 +可風 1 +台南 1 +台標 1 +台視 1 +台詞 1 +台長 1 +史前 1 +史坦 1 +史塔 1 +史官 1 +史帝 1 +史特 1 +史稱 1 +史記 1 +史跡 1 +史館 1 +右任 1 +右手 1 +右方 1 +右神 1 +右臂 1 +司可 1 +司鐸 1 +吁宋 1 +吃上 1 +吃到 1 +吃掉 1 +吃法 1 +吃起 1 +各布 1 +各方 1 +各業 1 +各球 1 +各異 1 +各科 1 +各職 1 +各處 1 +各行 1 +各隊 1 +各項 1 +合共 1 +合力 1 +合台 1 +合和 1 +合唱 1 +合夥 1 +合奏 1 +合流 1 +合約 1 +合計 1 +合資 1 +合辦 1 +合適 1 +合陽 1 +合體 1 +吉克 1 +吉利 1 +吉士 1 +吉姆 1 +吉少 1 +吉拉 1 +吉祥 1 +吉米 1 +吉西 1 +吉阿 1 +吉隆 1 +同仁 1 +同伴 1 +同僚 1 +同台 1 +同型 1 +同志 1 +同日 1 +同校 1 +同步 1 +同母 1 +同父 1 +同甘 1 +同行 1 +同郷 1 +同食 1 +同飲 1 +名作 1 +名分 1 +名城 1 +名帥 1 +名師 1 +名方 1 +名村 1 +名氣 1 +名流 1 +名聲 1 +名臣 1 +名茶 1 +名號 1 +名門 1 +名額 1 +后妃 1 +吐嘈 1 +向前 1 +向滋 1 +君如 1 +君權 1 +君長 1 +君龍 1 +吞下 1 +吞聲 1 +吟唱 1 +否決 1 +吩咐 1 +含糖 1 +含量 1 +吳王 1 +吵醒 1 +吸塵 1 +吸毒 1 +吸菸 1 +吸附 1 +吸食 1 +吹來 1 +吹氣 1 +吹滅 1 +吻部 1 +呂宋 1 +呂智 1 +呈交 1 +告戒 1 +告白 1 +周代 1 +周刊 1 +周敏 1 +周日 1 +周朝 1 +周期 1 +周迅 1 +周遭 1 +味道 1 +呼倫 1 +呼和 1 +命題 1 +和夫 1 +和好 1 +和子 1 +和宜 1 +和康 1 +和必 1 +和暖 1 +和會 1 +和林 1 +和樹 1 +和浩 1 +和睦 1 +和美 1 +和衷 1 +和親 1 +和記 1 +和諧 1 +和議 1 +咧嘴 1 +咬弦 1 +咸平 1 +咸康 1 +咸淳 1 +咸美 1 +咸鏡 1 +咸陽 1 +哀悼 1 +品嘗 1 +品學 1 +品德 1 +品源 1 +哈丹 1 +哈依 1 +哈剌 1 +哈吉 1 +哈布 1 +哈希 1 +哈德 1 +哈恩 1 +哈拉 1 +哈斯 1 +哈珊 1 +哈索 1 +哈羅 1 +哈莫 1 +哈萊 1 +哈薩 1 +哈達 1 +哈頓 1 +哈默 1 +員佐 1 +員外 1 +員遼 1 +哥什 1 +哥利 1 +哥德 1 +哥拉 1 +哥爾 1 +哥華 1 +哥馬 1 +哨所 1 +哲也 1 +哲元 1 +哲孟 1 +哲生 1 +哲蚌 1 +唇槍 1 +唐代 1 +售予 1 +售出 1 +售票 1 +唯獨 1 +唱戲 1 +唱法 1 +唸珠 1 +唾液 1 +啄木 1 +商事 1 +商務 1 +商圈 1 +商城 1 +商埠 1 +商場 1 +商幫 1 +商朝 1 +商湯 1 +商用 1 +商羯 1 +商船 1 +商量 1 +問吧 1 +問話 1 +啟傑 1 +啟明 1 +啟發 1 +啟示 1 +啟程 1 +啟聯 1 +啟鑰 1 +啤酒 1 +喀什 1 +喀拉 1 +喀比 1 +喀里 1 +善事 1 +善作 1 +善如 1 +善待 1 +善後 1 +善惡 1 +善撲 1 +善良 1 +喇薩 1 +喊出 1 +喘息 1 +喙啄 1 +喙端 1 +喙龍 1 +喚回 1 +喚起 1 +喜好 1 +喝醉 1 +喝采 1 +喪失 1 +喬姆 1 +喬木 1 +喬科 1 +單獨 1 +單調 1 +單質 1 +單項 1 +嗅到 1 +嗜酸 1 +嗜鹼 1 +嗣位 1 +嗣業 1 +嘉慕 1 +嘉樂 1 +嘉許 1 +嘉道 1 +嘉陵 1 +嘉靖 1 +嘔吐 1 +嘗膽 1 +嘩然 1 +嘯林 1 +噁心 1 +噁爆 1 +器具 1 +器械 1 +器蓋 1 +器身 1 +噴射 1 +噶爾 1 +噸位 1 +嚇人 1 +嚮導 1 +嚴令 1 +嚴加 1 +嚴島 1 +嚴懲 1 +嚴斥 1 
+嚴氏 1 +嚴肅 1 +嚴謹 1 +囊胚 1 +囑咐 1 +囚犯 1 +四周 1 +四平 1 +四方 1 +四牌 1 +四百 1 +四萬 1 +四郎 1 +回信 1 +回合 1 +回填 1 +回家 1 +回寺 1 +回彈 1 +回復 1 +回教 1 +回程 1 +回答 1 +因弗 1 +因後 1 +因茨 1 +因達 1 +困住 1 +困擾 1 +固態 1 +固有 1 +國中 1 +國主 1 +國光 1 +國公 1 +國共 1 +國史 1 +國名 1 +國君 1 +國土 1 +國奧 1 +國妃 1 +國安 1 +國府 1 +國庫 1 +國情 1 +國慶 1 +國成 1 +國松 1 +國父 1 +國牧 1 +國產 1 +國界 1 +國短 1 +國立 1 +國策 1 +國諱 1 +國雄 1 +圍坐 1 +圍棋 1 +圍牆 1 +圍魏 1 +園丁 1 +園主 1 +園內 1 +園明 1 +園林 1 +園蔥 1 +圓圓 1 +圓弧 1 +圓柱 1 +圓滑 1 +圓環 1 +圖取 1 +圖布 1 +圖形 1 +圖片 1 +圖示 1 +圖稿 1 +團圓 1 +團隊 1 +土匪 1 +土司 1 +土石 1 +土虱 1 +在上 1 +在崗 1 +在旦 1 +在校 1 +在身 1 +地亞 1 +地名 1 +地域 1 +地基 1 +地夫 1 +地安 1 +地平 1 +地庫 1 +地政 1 +地板 1 +地標 1 +地盤 1 +地級 1 +地表 1 +地貌 1 +地質 1 +地道 1 +地遠 1 +地震 1 +坂本 1 +均勻 1 +均衡 1 +坎特 1 +坎貝 1 +坎農 1 +坐在 1 +坐監 1 +坐骨 1 +坡子 1 +坤玲 1 +坦克 1 +坦利 1 +坦干 1 +坦然 1 +坦白 1 +坦福 1 +坦貝 1 +坦頓 1 +型式 1 +垮台 1 +埃內 1 +埃弗 1 +埃德 1 +埃拉 1 +埃爾 1 +埃索 1 +埃胡 1 +埃雷 1 +埋名 1 +埋怨 1 +埋葬 1 +埋藏 1 +城主 1 +城光 1 +城內 1 +城南 1 +城嘉 1 +城址 1 +城巴 1 +城池 1 +城牆 1 +城西 1 +城隍 1 +埔寨 1 +埜堂 1 +執委 1 +執業 1 +執飛 1 +培元 1 +培烏 1 +培育 1 +培茲 1 +基層 1 +基希 1 +基平 1 +基徹 1 +基數 1 +基斯 1 +基石 1 +基苯 1 +基酸 1 +基里 1 +基頻 1 +基龍 1 +堂堂 1 +堂正 1 +堅城 1 +堅定 1 +堅尼 1 +堅拒 1 +堅蜥 1 +堆填 1 +堆積 1 +堪憐 1 +堪稱 1 +堪薩 1 +報仇 1 +報刊 1 +報名 1 +報復 1 +報讀 1 +場內 1 +場均 1 +場景 1 +塑像 1 +塑料 1 +塑有 1 +塑膠 1 +塔克 1 +塔利 1 +塔卜 1 +塔台 1 +塔吉 1 +塔塔 1 +塔德 1 +塔拉 1 +塔林 1 +塔樓 1 +塔法 1 +塔納 1 +塔茨 1 +塔莉 1 +塔蒂 1 +塔西 1 +塔龍 1 +塗魚 1 +塗黑 1 +塞冬 1 +塞古 1 +塞德 1 +塞法 1 +塞諸 1 +塞音 1 +塞馬 1 +墓葬 1 +墓頂 1 +墜入 1 +墜落 1 +增殖 1 +增生 1 +增祥 1 +增進 1 +增額 1 +墟內 1 +墨客 1 +墨色 1 +墾田 1 +壓縮 1 +壞球 1 +壩上 1 +壩下 1 +士域 1 +士尼 1 +士打 1 +士滿 1 +士珍 1 +士禛 1 +士評 1 +士達 1 +壯漢 1 +壯烈 1 +壺中 1 +壽命 1 +壽宴 1 +壽星 1 +夏威 1 +夏愨 1 +夏秋 1 +夏至 1 +夏荷 1 +夏默 1 +外借 1 +外公 1 +外力 1 +外加 1 +外務 1 +外匯 1 +外地 1 +外壁 1 +外套 1 +外婆 1 +外層 1 +外形 1 +外殼 1 +外省 1 +外管 1 +外表 1 +外褂 1 +外訪 1 +外語 1 +外銷 1 +多元 1 +多克 1 +多加 1 +多吉 1 +多夫 1 +多尼 1 +多弗 1 +多毗 1 +多汁 1 +多爾 1 +多特 1 +多祿 1 +多納 1 +多莉 1 +多萬 1 +多謝 1 +多雨 1 +夜夜 1 +夜戰 1 +夠大 1 +夢中 1 +夢境 1 +夢幻 1 +夢想 1 +夢雲 1 +夢鴿 1 +夥兒 1 +大不 1 +大丹 1 +大乘 1 +大二 1 +大儒 1 +大區 1 +大友 1 +大受 1 +大吉 1 +大名 1 +大君 1 +大和 1 +大喊 1 +大國 1 +大圍 1 +大城 1 +大堆 1 +大堤 1 +大增 1 +大士 1 +大失 1 +大島 1 +大嶼 1 +大幅 1 +大怒 1 
+大悟 1 +大敵 1 +大新 1 +大校 1 +大概 1 +大正 1 +大殿 1 +大汗 1 +大河 1 +大洋 1 +大湖 1 +大溪 1 +大漠 1 +大獲 1 +大理 1 +大發 1 +大白 1 +大窘 1 +大紅 1 +大經 1 +大綱 1 +大腦 1 +大腸 1 +大膽 1 +大興 1 +大舉 1 +大艇 1 +大華 1 +大蒜 1 +大薇 1 +大跌 1 +大路 1 +大辦 1 +大通 1 +大進 1 +大郎 1 +大部 1 +大都 1 +大醉 1 +大釗 1 +大銘 1 +大門 1 +大雄 1 +大韓 1 +大馬 1 +大驚 1 +大體 1 +大鬧 1 +大黨 1 +大鼠 1 +天份 1 +天佐 1 +天使 1 +天倫 1 +天元 1 +天安 1 +天寶 1 +天差 1 +天性 1 +天悅 1 +天慶 1 +天才 1 +天母 1 +天河 1 +天涯 1 +天球 1 +天祐 1 +天窗 1 +天紀 1 +天翔 1 +天翼 1 +天賜 1 +天賦 1 +天馬 1 +太傅 1 +太元 1 +太冷 1 +太初 1 +太后 1 +太宗 1 +太宰 1 +太尉 1 +太常 1 +太湖 1 +太炎 1 +太監 1 +太行 1 +太近 1 +太遠 1 +夫仇 1 +夫喬 1 +夫堡 1 +夫妻 1 +夫尼 1 +夫森 1 +夫納 1 +夫茨 1 +夫魯 1 +央行 1 +失利 1 +失地 1 +失所 1 +失效 1 +失職 1 +失能 1 +失落 1 +失誤 1 +失蹤 1 +夷昧 1 +夸特 1 +夾狀 1 +奇俠 1 +奇幻 1 +奇怪 1 +奇斯 1 +奇曼 1 +奇缺 1 +奇耶 1 +奇里 1 +奇非 1 +奇頓 1 +奈克 1 +奈德 1 +奈爾 1 +奈葉 1 +奉命 1 +奉安 1 +奉律 1 +奉新 1 +奉系 1 +奎德 1 +奎茲 1 +奏鳴 1 +契克 1 +契特 1 +奕詝 1 +套出 1 +套用 1 +奢華 1 +奧伊 1 +奧內 1 +奧利 1 +奧古 1 +奧姆 1 +奧得 1 +奧托 1 +奧格 1 +奧洛 1 +奧瓦 1 +奧的 1 +奧米 1 +奧羅 1 +奧羽 1 +奧蒂 1 +奪去 1 +奬懲 1 +女人 1 +女傭 1 +女僕 1 +女優 1 +女友 1 +女嬰 1 +女木 1 +女水 1 +女版 1 +女生 1 +女眷 1 +女短 1 +奴役 1 +奶爸 1 +她倆 1 +好上 1 +好奇 1 +好意 1 +好手 1 +好氧 1 +好色 1 +如指 1 +如數 1 +如流 1 +如生 1 +妄圖 1 +妊娠 1 +妖怪 1 +妥也 1 +妮科 1 +妮綺 1 +妹夫 1 +妻妹 1 +妻姐 1 +妻室 1 +姆古 1 +姆士 1 +姆希 1 +姆庫 1 +姆德 1 +姆瓦 1 +姆萊 1 +姆齊 1 +姊姊 1 +始發 1 +始祖 1 +始稱 1 +始興 1 +姑娘 1 +姑母 1 +姓埋 1 +委內 1 +委身 1 +姚里 1 +姥姥 1 +姦情 1 +姪女 1 +姬瑪 1 +姿色 1 +威光 1 +威嚇 1 +威塞 1 +威夷 1 +威權 1 +威治 1 +威特 1 +威瑟 1 +威舍 1 +威靈 1 +娘家 1 +娜塔 1 +娜茲 1 +婆羅 1 +婚事 1 +婚宴 1 +婚禮 1 +婢女 1 +婷婷 1 +媒介 1 +媚娘 1 +嫁與 1 +嫘縈 1 +嫣然 1 +嬰孩 1 +子孫 1 +子文 1 +子球 1 +子程 1 +孕育 1 +孕酮 1 +字一 1 +字喃 1 +字幕 1 +字模 1 +字號 1 +存世 1 +存取 1 +存放 1 +孛許 1 +孜別 1 +孝感 1 +孝次 1 +孟加 1 +孟德 1 +孟雄 1 +季後 1 +季惟 1 +季米 1 +季采 1 +季風 1 +季龍 1 +孤島 1 +孤芳 1 +孤身 1 +孩提 1 +學兼 1 +學到 1 +學前 1 +學家 1 +學業 1 +學民 1 +學津 1 +學社 1 +學聯 1 +學苑 1 +宇航 1 +守備 1 +守孝 1 +守文 1 +守法 1 +守臣 1 +守謙 1 +守齋 1 +安二 1 +安妮 1 +安娜 1 +安安 1 +安岳 1 +安徒 1 +安托 1 +安撫 1 +安放 1 +安普 1 +安會 1 +安樂 1 +安正 1 +安民 1 +安汶 1 +安然 1 +安營 1 +安理 1 +安聯 1 +安葬 1 +安蘭 1 +安諾 1 +安達 1 +宋國 1 +完好 1 +完畢 1 +宏偉 1 +宏坤 1 +宏德 1 +宏聲 1 +宏道 1 +宏量 1 +宗偉 1 +宗哈 1 +宗憲 1 +宗谷 1 +宗龍 1 +官兵 1 +官司 1 +官府 1 +官服 1 +官腔 1 +官話 1 +官邸 1 
+官野 1 +官長 1 +宙域 1 +定位 1 +定價 1 +定向 1 +定康 1 +定影 1 +定性 1 +定案 1 +定理 1 +定量 1 +宛城 1 +宜合 1 +宜興 1 +宜諾 1 +客場 1 +客家 1 +客觀 1 +客貨 1 +客輪 1 +客量 1 +宣判 1 +宣化 1 +宣帝 1 +宣誓 1 +室外 1 +室溫 1 +宦官 1 +宮人 1 +宮崎 1 +宰李 1 +宴席 1 +宴會 1 +家光 1 +家勁 1 +家務 1 +家口 1 +家可 1 +家外 1 +家奴 1 +家干 1 +家用 1 +家立 1 +家道 1 +家驤 1 +容器 1 +容忍 1 +容許 1 +容量 1 +宿敵 1 +宿根 1 +寄存 1 +寄送 1 +寅成 1 +密山 1 +密文 1 +密歇 1 +密西 1 +密集 1 +富卡 1 +富商 1 +富恩 1 +富翁 1 +富蘭 1 +富裕 1 +富豪 1 +富貴 1 +富邦 1 +察合 1 +察哈 1 +察沃 1 +寡尿 1 +實則 1 +實屬 1 +實情 1 +實戰 1 +實收 1 +實權 1 +實況 1 +實踐 1 +寧波 1 +審批 1 +審理 1 +審計 1 +審評 1 +審議 1 +寫下 1 +寫信 1 +寫入 1 +寫出 1 +寫字 1 +寫成 1 +寫進 1 +寬容 1 +寬度 1 +寬敞 1 +寬條 1 +寬順 1 +寮國 1 +寵物 1 +寵臣 1 +寶光 1 +寶劍 1 +寶如 1 +寶應 1 +寶樓 1 +寶殿 1 +寶玉 1 +寶田 1 +寶血 1 +寶雞 1 +寶雲 1 +寶麗 1 +寺事 1 +寺前 1 +封土 1 +封為 1 +封爵 1 +封穴 1 +封號 1 +封裝 1 +封路 1 +射失 1 +射程 1 +射箭 1 +射線 1 +射鵰 1 +將來 1 +將領 1 +專任 1 +專制 1 +專吃 1 +專指 1 +專政 1 +專機 1 +專橫 1 +專欄 1 +專權 1 +專款 1 +專注 1 +專線 1 +專註 1 +專賣 1 +專長 1 +專項 1 +尊崇 1 +尊敬 1 +尊稱 1 +尋三 1 +尋回 1 +尋親 1 +對上 1 +對付 1 +對撞 1 +對準 1 +對照 1 +對生 1 +對白 1 +對稱 1 +對立 1 +對簿 1 +對話 1 +對面 1 +對飛 1 +導入 1 +導出 1 +導向 1 +導彈 1 +導播 1 +導正 1 +小人 1 +小兔 1 +小刀 1 +小南 1 +小國 1 +小小 1 +小島 1 +小巷 1 +小息 1 +小數 1 +小書 1 +小欖 1 +小水 1 +小河 1 +小津 1 +小浪 1 +小澤 1 +小片 1 +小生 1 +小田 1 +小知 1 +小石 1 +小童 1 +小舖 1 +小虎 1 +小街 1 +小輪 1 +小野 1 +小隊 1 +小順 1 +小顏 1 +小風 1 +小體 1 +少兒 1 +少將 1 +少年 1 +少懷 1 +少林 1 +少見 1 +少許 1 +少量 1 +尖端 1 +尖酸 1 +尖頂 1 +尚州 1 +尚德 1 +尚方 1 +尚書 1 +尤利 1 +尤勒 1 +尤指 1 +尤里 1 +就此 1 +就熟 1 +就職 1 +尷尬 1 +尹氏 1 +尼地 1 +尼夫 1 +尼師 1 +尼庫 1 +尼律 1 +尼拉 1 +尼歐 1 +尼比 1 +尼茲 1 +尼萊 1 +尼蘇 1 +尼諾 1 +尼赫 1 +尼郡 1 +尾巴 1 +尾柄 1 +尾隨 1 +尾雉 1 +尾鰭 1 +尾龍 1 +局勢 1 +局間 1 +居家 1 +居所 1 +居留 1 +居禮 1 +屆滿 1 +屈一 1 +屋宇 1 +屋頂 1 +屍體 1 +屏山 1 +屏東 1 +屏風 1 +展品 1 +展望 1 +展貿 1 +屠村 1 +屠龍 1 +層壓 1 +層次 1 +層疊 1 +層級 1 +層面 1 +履行 1 +屬國 1 +屬於 1 +屬靈 1 +屯南 1 +山下 1 +山內 1 +山口 1 +山地 1 +山姆 1 +山峰 1 +山崖 1 +山手 1 +山月 1 +山村 1 +山楂 1 +山猿 1 +山田 1 +山胞 1 +山葉 1 +山陵 1 +山麓 1 +山龍 1 +岐女 1 +岐阜 1 +岐陽 1 +岔江 1 +岡恩 1 +岡本 1 +岩屋 1 +岩心 1 +岩手 1 +岩漿 1 +岳泰 1 +岷江 1 +岸川 1 +岸賈 1 +岸邊 1 +峯崎 1 +峰倉 1 +峰景 1 +島內 1 +島國 1 +島戴 1 +島蚺 1 +峽灣 1 +峽谷 1 +崇善 1 +崇尚 1 +崇敬 1 +崎頭 1 +崔奇 1 +崔陂 1 +崗斜 1 +崙頂 1 +崞縣 1 +崩坍 1 +崩潰 1 +嵩祝 1 +巔峰 1 +川南 1 
+川村 1 +川邊 1 +州界 1 +州舞 1 +巡査 1 +工事 1 +工務 1 +工序 1 +工廠 1 +工會 1 +工法 1 +工潮 1 +左岸 1 +左拉 1 +左派 1 +左膀 1 +左轉 1 +巨作 1 +巨像 1 +巨冊 1 +巨型 1 +巨石 1 +巨賈 1 +巨野 1 +巫師 1 +差分 1 +差別 1 +差勁 1 +差地 1 +差會 1 +差無 1 +差諾 1 +己二 1 +己巳 1 +己酉 1 +已故 1 +已晚 1 +已死 1 +巴亞 1 +巴卑 1 +巴喬 1 +巴城 1 +巴孛 1 +巴巴 1 +巴底 1 +巴庫 1 +巴德 1 +巴思 1 +巴恩 1 +巴羅 1 +巴英 1 +巴莫 1 +巴蒂 1 +巴薩 1 +巴諾 1 +巴賽 1 +巴赫 1 +巴雷 1 +巴頓 1 +市售 1 +市縣 1 +市轄 1 +市面 1 +布丹 1 +布伯 1 +布倫 1 +布列 1 +布哈 1 +布地 1 +布夏 1 +布宜 1 +布尼 1 +布巴 1 +布政 1 +布料 1 +布林 1 +布氏 1 +布置 1 +布隆 1 +布頓 1 +帆布 1 +帆船 1 +希伯 1 +希克 1 +希姆 1 +希涅 1 +希特 1 +希皮 1 +希鵬 1 +帕內 1 +帕器 1 +帕搏 1 +帕沙 1 +帕爾 1 +帕特 1 +帕米 1 +帕維 1 +帕薩 1 +帕西 1 +帕迪 1 +帕那 1 +帕金 1 +帝王 1 +帝芬 1 +帝都 1 +師今 1 +師團 1 +師徒 1 +師從 1 +師父 1 +師生 1 +師益 1 +席勒 1 +帳目 1 +帶上 1 +帶出 1 +帶子 1 +帶少 1 +帶水 1 +帶英 1 +常住 1 +常勝 1 +常客 1 +常態 1 +常盛 1 +常識 1 +常量 1 +常青 1 +常駐 1 +幅員 1 +幕府 1 +幕後 1 +幣原 1 +幪面 1 +幫主 1 +干伊 1 +干王 1 +干達 1 +平反 1 +平和 1 +平地 1 +平坦 1 +平帝 1 +平常 1 +平手 1 +平日 1 +平林 1 +平沼 1 +平滑 1 +平臺 1 +平行 1 +平陵 1 +平陽 1 +平頓 1 +年中 1 +年份 1 +年幼 1 +年息 1 +年益 1 +年第 1 +年老 1 +年號 1 +年資 1 +年青 1 +并行 1 +幸一 1 +幸好 1 +幸運 1 +幹事 1 +幹掉 1 +幹流 1 +幹道 1 +幼子 1 +幼年 1 +幼弟 1 +幼發 1 +幼稚 1 +幼貓 1 +幼魚 1 +幼鯨 1 +幼鳥 1 +幽閣 1 +幾內 1 +幾千 1 +幾多 1 +幾百 1 +庇烏 1 +床鋪 1 +底冊 1 +底格 1 +底比 1 +底片 1 +底特 1 +底稿 1 +底質 1 +店家 1 +庚戌 1 +府中 1 +府二 1 +府城 1 +府尹 1 +府第 1 +度宗 1 +度尼 1 +度蘭 1 +座位 1 +座右 1 +座座 1 +座椅 1 +座苣 1 +座西 1 +座談 1 +庫哈 1 +庫柏 1 +庫欣 1 +庫瑙 1 +庫賽 1 +庫赫 1 +庫迪 1 +庫頁 1 +庭園 1 +庭薺 1 +庭長 1 +康史 1 +康奈 1 +康子 1 +康安 1 +康寧 1 +康德 1 +康樂 1 +康濟 1 +康福 1 +康科 1 +康羅 1 +廉潔 1 +廚師 1 +廝守 1 +廟倉 1 +廟方 1 +廟橋 1 +廟鎮 1 +廢待 1 +廢棄 1 +廢熱 1 +廢舊 1 +廣受 1 +廣權 1 +廣澳 1 +廣稱 1 +廣金 1 +廬山 1 +廳局 1 +廳長 1 +延安 1 +延年 1 +延音 1 +廷和 1 +廷尉 1 +建好 1 +建威 1 +建市 1 +建構 1 +建武 1 +建置 1 +建華 1 +建超 1 +廿五 1 +廿六 1 +弄到 1 +弄清 1 +弄眼 1 +弊案 1 +式一 1 +式塔 1 +式微 1 +弓尾 1 +弓弦 1 +弓箭 1 +引來 1 +引咎 1 +引導 1 +引江 1 +引渡 1 +引申 1 +引資 1 +弗內 1 +弗拉 1 +弗格 1 +弗洛 1 +弗特 1 +弗萊 1 +弗蘭 1 +弘前 1 +弘宣 1 +弭兵 1 +張家 1 +張氏 1 +強勁 1 +強化 1 +強拍 1 +強暴 1 +強權 1 +強求 1 +強盜 1 +強迫 1 +強韌 1 +強項 1 +彈劾 1 +彈塗 1 +彈撥 1 +彈盡 1 +彌撒 1 +彌斯 1 +彌格 1 +彌補 1 +彌賽 1 +彎曲 1 +彗差 1 +彗星 1 +彙編 1 +形像 1 +形同 1 +形翼 1 +形體 1 +彥根 1 +彥直 1 +彩畫 1 +彩繪 1 +彩繽 1 +彩雲 1 +彩鳳 1 +彪馬 1 +彭劉 1 
+彭博 1 +彭古 1 +彭定 1 +彭拿 1 +彰信 1 +影帝 1 +影機 1 +影線 1 +影評 1 +影迷 1 +影集 1 +影音 1 +彷彿 1 +彼得 1 +彼特 1 +彼落 1 +彼魯 1 +往上 1 +往世 1 +往日 1 +征西 1 +待到 1 +待舉 1 +很小 1 +很強 1 +很忙 1 +很懶 1 +很是 1 +很深 1 +很遠 1 +很重 1 +很長 1 +律定 1 +律斯 1 +律狄 1 +後世 1 +後代 1 +後勤 1 +後南 1 +後周 1 +後宮 1 +後庄 1 +後悔 1 +後援 1 +後春 1 +後果 1 +後梁 1 +後段 1 +後母 1 +後稱 1 +後續 1 +後置 1 +後藤 1 +後送 1 +後防 1 +後齒 1 +徒具 1 +徒手 1 +徒生 1 +得克 1 +得利 1 +得哥 1 +得堡 1 +得心 1 +得悉 1 +得患 1 +得獎 1 +得益 1 +得維 1 +從來 1 +從句 1 +從周 1 +從善 1 +從政 1 +御史 1 +御名 1 +御墨 1 +御宅 1 +御窯 1 +御雷 1 +復健 1 +復合 1 +復寫 1 +復生 1 +復甦 1 +循道 1 +微型 1 +微妙 1 +微小 1 +微波 1 +微觀 1 +微量 1 +徵兆 1 +徵招 1 +徵祥 1 +德勝 1 +德哥 1 +德奧 1 +德妃 1 +德姆 1 +德威 1 +德宏 1 +德富 1 +德干 1 +德愛 1 +德懷 1 +德文 1 +德曼 1 +德林 1 +德比 1 +德江 1 +德瓦 1 +德甲 1 +德米 1 +德納 1 +德西 1 +德諾 1 +德靈 1 +德馬 1 +德高 1 +徽章 1 +心勃 1 +心境 1 +心宿 1 +心意 1 +心應 1 +心智 1 +心疾 1 +心目 1 +心肌 1 +必和 1 +必拓 1 +必走 1 +必需 1 +忍心 1 +忍氣 1 +忒彌 1 +志摩 1 +志明 1 +志道 1 +忘記 1 +忠於 1 +忠誠 1 +快上 1 +快捷 1 +快綫 1 +忽視 1 +思侯 1 +思巴 1 +思從 1 +思德 1 +思成 1 +思維 1 +思缽 1 +思考 1 +急劇 1 +急忙 1 +急救 1 +急於 1 +急流 1 +急症 1 +急行 1 +性向 1 +性命 1 +性情 1 +性腺 1 +怪圈 1 +怪聲 1 +恆大 1 +恆德 1 +恆河 1 +恐嚇 1 +恐懼 1 +恢豐 1 +恣意 1 +恩利 1 +恩南 1 +恩卡 1 +恩哈 1 +恩慈 1 +恩斯 1 +恩特 1 +恩秀 1 +恩贈 1 +恭子 1 +息率 1 +恰尼 1 +悉心 1 +悉達 1 +悟到 1 +悟空 1 +患失 1 +患得 1 +患病 1 +悲傷 1 +悲劇 1 +悲嘆 1 +悲慘 1 +悲鴻 1 +悼念 1 +情不 1 +情人 1 +情勢 1 +情愁 1 +情愛 1 +情景 1 +情結 1 +情誼 1 +情資 1 +情陷 1 +情願 1 +惇曧 1 +惠亞 1 +惠梨 1 +惠特 1 +惡人 1 +惡化 1 +惡夢 1 +惡性 1 +惡搞 1 +惡臭 1 +惡靈 1 +惡魔 1 +想必 1 +想起 1 +愈加 1 +愈大 1 +愈高 1 +愉快 1 +意圖 1 +意念 1 +意甲 1 +意魔 1 +愙威 1 +愚園 1 +愚昧 1 +愛好 1 +愛思 1 +愛恨 1 +愛意 1 +愛慕 1 +愛明 1 +愛樂 1 +愛河 1 +愛莎 1 +愛迪 1 +愛默 1 +感冒 1 +感謝 1 +慈湖 1 +慈濟 1 +慌亂 1 +慎太 1 +慕容 1 +慕肯 1 +慘叫 1 +慘重 1 +慚愧 1 +慢行 1 +慢駛 1 +慧嫻 1 +慰安 1 +慶典 1 +慶曆 1 +慶貽 1 +慶黎 1 +慷慨 1 +憂憤 1 +憲政 1 +憲民 1 +憲法 1 +憶蓮 1 +應付 1 +應允 1 +應屆 1 +應戰 1 +應手 1 +應昌 1 +應當 1 +應許 1 +應邀 1 +懲罰 1 +懶爪 1 +懶甸 1 +懷仁 1 +懷克 1 +懷好 1 +懷念 1 +懷慶 1 +懷抱 1 +懷水 1 +懷聖 1 +懸掛 1 +懼高 1 +戀人 1 +戀屍 1 +戀童 1 +戈德 1 +戈爾 1 +戈登 1 +戈矛 1 +戈維 1 +戈蘭 1 +成事 1 +成仁 1 +成化 1 +成半 1 +成名 1 +成品 1 +成套 1 +成對 1 +成形 1 +成梁 1 +成行 1 +成語 1 +我國 1 +戟鯨 1 +截然 1 +截至 1 +截頜 1 +戰事 1 +戰力 1 +戰勝 1 +戰地 1 +戰平 1 +戰情 1 +戰船 1 +戲子 1 +戲曲 1 +戲法 1 +戲碼 1 +戲謔 1 +戲院 1 
+戴上 1 +戴克 1 +戴勝 1 +戴斯 1 +戴爾 1 +戴維 1 +戴蒙 1 +戴頓 1 +戶田 1 +戶籍 1 +房東 1 +所不 1 +所料 1 +所望 1 +所為 1 +所長 1 +手上 1 +手工 1 +手感 1 +手抄 1 +手指 1 +手提 1 +手旁 1 +手槍 1 +手稿 1 +手筆 1 +手腳 1 +手邊 1 +手風 1 +手龍 1 +才子 1 +才是 1 +才智 1 +扎什 1 +扎爾 1 +扎特 1 +扎阿 1 +打亂 1 +打人 1 +打包 1 +打坐 1 +打撈 1 +打死 1 +打水 1 +打牌 1 +打碎 1 +打菲 1 +打造 1 +打響 1 +扔出 1 +托倫 1 +托加 1 +托弗 1 +托格 1 +托洛 1 +托瓦 1 +托盤 1 +托米 1 +托茂 1 +扣上 1 +扶林 1 +批次 1 +扼止 1 +找來 1 +找續 1 +承天 1 +承德 1 +承接 1 +承斌 1 +承租 1 +技師 1 +技戰 1 +技法 1 +抑制 1 +抑鬱 1 +抒解 1 +抓到 1 +投交 1 +投奔 1 +投標 1 +投球 1 +投身 1 +投靠 1 +抗大 1 +抗拒 1 +抗衡 1 +抗體 1 +折不 1 +折射 1 +折斷 1 +折衷 1 +抨擊 1 +披覆 1 +披頭 1 +抬昇 1 +抱持 1 +抵受 1 +抵禦 1 +押韻 1 +抽檢 1 +抽煙 1 +抽象 1 +抽走 1 +拆分 1 +拆卸 1 +拆掉 1 +拆遷 1 +拉O 1 +拉亞 1 +拉什 1 +拉倫 1 +拉利 1 +拉博 1 +拉卜 1 +拉只 1 +拉圭 1 +拉塞 1 +拉奏 1 +拉尼 1 +拉差 1 +拉布 1 +拉帕 1 +拉彼 1 +拉扎 1 +拉拉 1 +拉日 1 +拉林 1 +拉柯 1 +拉桑 1 +拉森 1 +拉欣 1 +拉法 1 +拉漢 1 +拉特 1 +拉珀 1 +拉瑙 1 +拉瑪 1 +拉籌 1 +拉維 1 +拉罕 1 +拉美 1 +拉華 1 +拉薩 1 +拉諾 1 +拉貝 1 +拉赫 1 +拉越 1 +拉那 1 +拉麥 1 +拉齊 1 +拉龍 1 +拋棄 1 +拋物 1 +拍照 1 +拍賣 1 +拒不 1 +拓務 1 +拓建 1 +拓撲 1 +拔刀 1 +拖進 1 +拖錯 1 +拖鞋 1 +拙劣 1 +招潮 1 +招生 1 +招聘 1 +招降 1 +拜仁 1 +拜拜 1 +括弧 1 +拱廊 1 +拱橋 1 +拳一 1 +拳擊 1 +拳賽 1 +拷問 1 +拼寫 1 +拾糞 1 +拿來 1 +拿島 1 +拿路 1 +拿錯 1 +持久 1 +持球 1 +指使 1 +指掌 1 +指標 1 +指派 1 +指稱 1 +指責 1 +挑選 1 +挖子 1 +挖掘 1 +挪動 1 +挪用 1 +振動 1 +振幅 1 +振林 1 +挹江 1 +挺身 1 +挽回 1 +挾持 1 +捉弄 1 +捉拿 1 +捉襟 1 +捍衛 1 +捐款 1 +捐獻 1 +捕撈 1 +捕殺 1 +捕獵 1 +捕魚 1 +捕鼠 1 +捲入 1 +捷徑 1 +捷沃 1 +授勳 1 +授意 1 +授權 1 +授與 1 +掉頭 1 +掌控 1 +掌摑 1 +掌權 1 +掌鏡 1 +排場 1 +排外 1 +排序 1 +掙扎 1 +掛果 1 +掛牌 1 +掛鉤 1 +掠奪 1 +採信 1 +採摘 1 +採樣 1 +採納 1 +採購 1 +採集 1 +採食 1 +探明 1 +探望 1 +探求 1 +探究 1 +探險 1 +接到 1 +接力 1 +接班 1 +接納 1 +接聽 1 +接見 1 +接辦 1 +接送 1 +接連 1 +控告 1 +控訴 1 +推介 1 +推免 1 +推前 1 +推力 1 +推導 1 +推斷 1 +推測 1 +推演 1 +推特 1 +推理 1 +推舉 1 +推論 1 +推遲 1 +掩蓋 1 +描摹 1 +描繪 1 +提亞 1 +提前 1 +提問 1 +提子 1 +提康 1 +提拔 1 +提攜 1 +提昇 1 +提煉 1 +提督 1 +提籃 1 +提米 1 +提醒 1 +插手 1 +插曲 1 +揚光 1 +揚言 1 +換成 1 +換算 1 +握帶 1 +握持 1 +揭曉 1 +揭發 1 +揭開 1 +揮舞 1 +援助 1 +援外 1 +援引 1 +援手 1 +援救 1 +搜尋 1 +搜狐 1 +搜羅 1 +搜集 1 +搞垮 1 +搞錯 1 +搬動 1 +搬往 1 +搬移 1 +搬遷 1 +搭乘 1 +搭配 1 +搶先 1 +搶劫 1 +搶奪 1 +搶救 1 +摒棄 1 +摘下 1 +摘星 1 +摘錄 1 +摧毀 1 +摩亞 1 +摩加 1 +摩天 1 +摩崖 1 +摩托 1 +摩擦 1 +摩琴 1 +摩登 1 +摩納 1 
+摩西 1 +摯友 1 +摸摸 1 +撒冷 1 +撒拉 1 +撒營 1 +撞入 1 +撞死 1 +撤回 1 +撤職 1 +撤退 1 +撤除 1 +撥出 1 +撥號 1 +撫養 1 +播種 1 +撮合 1 +撰述 1 +撲克 1 +撿起 1 +擁堵 1 +擁戴 1 +擁擠 1 +擁而 1 +擁護 1 +擂台 1 +擊中 1 +擊劍 1 +擊斃 1 +擊毀 1 +擊潰 1 +擊破 1 +擋住 1 +操控 1 +操縱 1 +擒拿 1 +擔憂 1 +擔竿 1 +擔綱 1 +據傳 1 +據此 1 +據稱 1 +據點 1 +擠塞 1 +擠壓 1 +擠奶 1 +擠眉 1 +擠迫 1 +擢升 1 +擬桿 1 +擬獅 1 +擬訂 1 +擬議 1 +擴散 1 +擴編 1 +擺弄 1 +擺渡 1 +擾亂 1 +攀爬 1 +攔截 1 +攝像 1 +攝取 1 +攪拌 1 +支取 1 +支廳 1 +支派 1 +支那 1 +支隊 1 +收場 1 +收容 1 +收市 1 +收支 1 +收生 1 +收益 1 +收租 1 +收緊 1 +收聽 1 +收買 1 +收費 1 +收養 1 +攸之 1 +改作 1 +改委 1 +改屬 1 +改投 1 +改採 1 +改換 1 +改派 1 +改發 1 +改穿 1 +改組 1 +改選 1 +改隸 1 +攻下 1 +攻勢 1 +攻堅 1 +攻方 1 +攻殺 1 +攻訐 1 +攻讀 1 +放任 1 +放入 1 +放出 1 +放到 1 +放大 1 +放影 1 +放榜 1 +放牧 1 +放緩 1 +放送 1 +放逐 1 +放開 1 +放鬆 1 +政團 1 +政委 1 +政局 1 +政廳 1 +政敵 1 +政樞 1 +政法 1 +政爭 1 +政界 1 +故郷 1 +效尤 1 +效能 1 +敏能 1 +敏銳 1 +敏錠 1 +救人 1 +救出 1 +救助 1 +救國 1 +救援 1 +救星 1 +救災 1 +救生 1 +救贖 1 +救趙 1 +敕令 1 +敕書 1 +敗局 1 +敗死 1 +敗瓦 1 +敗退 1 +教務 1 +教士 1 +教室 1 +教席 1 +教材 1 +教案 1 +教籍 1 +教總 1 +教義 1 +教職 1 +散射 1 +敦煌 1 +敬仰 1 +敬堯 1 +敬請 1 +敲擊 1 +敲訂 1 +整塊 1 +整所 1 +整架 1 +整為 1 +整片 1 +整篇 1 +整軍 1 +整顆 1 +整齊 1 +敵兵 1 +敵方 1 +數以 1 +數值 1 +數澤 1 +數百 1 +數碼 1 +數萬 1 +數論 1 +文哲 1 +文姬 1 +文岳 1 +文巨 1 +文德 1 +文摘 1 +文政 1 +文書 1 +文本 1 +文楷 1 +文武 1 +文法 1 +文清 1 +文職 1 +文賢 1 +文集 1 +文飾 1 +斐遜 1 +斑塊 1 +斑點 1 +斗貴 1 +斜坡 1 +斥教 1 +斬落 1 +斯佩 1 +斯凱 1 +斯哥 1 +斯塘 1 +斯妥 1 +斯威 1 +斯安 1 +斯尼 1 +斯巴 1 +斯廷 1 +斯楚 1 +斯汀 1 +斯狸 1 +斯珀 1 +斯班 1 +斯瓦 1 +斯聯 1 +斯艾 1 +斯菲 1 +斯雷 1 +斯頓 1 +新任 1 +新修 1 +新元 1 +新址 1 +新埔 1 +新太 1 +新奧 1 +新字 1 +新寧 1 +新屋 1 +新巴 1 +新思 1 +新昌 1 +新明 1 +新春 1 +新月 1 +新核 1 +新榮 1 +新民 1 +新浪 1 +新版 1 +新生 1 +新秀 1 +新篇 1 +新編 1 +新義 1 +新舊 1 +新製 1 +新開 1 +新飛 1 +新馬 1 +新高 1 +新鴻 1 +新黨 1 +斷後 1 +斷盡 1 +斷言 1 +方丈 1 +方八 1 +方尖 1 +方正 1 +方田 1 +方百 1 +方石 1 +方程 1 +方蓋 1 +方蟹 1 +於維 1 +施哈 1 +施奈 1 +施文 1 +施瓦 1 +施用 1 +施韋 1 +旁觀 1 +旅居 1 +旅程 1 +旋渦 1 +旋轉 1 +族雄 1 +族頭 1 +旗艦 1 +旗面 1 +既得 1 +既是 1 +既然 1 +日井 1 +日出 1 +日向 1 +日夜 1 +日子 1 +日日 1 +日照 1 +日用 1 +日色 1 +日落 1 +日誌 1 +日賜 1 +旦增 1 +旦夕 1 +早有 1 +早餐 1 +旭烈 1 +旱災 1 +旻寧 1 +昂納 1 +昆丁 1 +昆蟲 1 +昌吉 1 +昌都 1 +明中 1 +明亞 1 +明亮 1 +明代 1 +明內 1 +明園 1 +明宗 1 +明尼 1 +明憲 1 +明昌 1 +明智 1 +明正 1 +明潭 1 +明白 1 +明碁 1 +明翰 1 +明視 1 +易卜 1 +易守 1 
+易幟 1 +易水 1 +易燃 1 +易經 1 +易莎 1 +昔蘭 1 +星團 1 +星塵 1 +星展 1 +星崎 1 +星系 1 +映像 1 +春丕 1 +春季 1 +春會 1 +春田 1 +春筍 1 +春節 1 +春緋 1 +春耕 1 +昨日 1 +昭侯 1 +昭儀 1 +昭宗 1 +昭禮 1 +昭通 1 +是年 1 +是方 1 +是次 1 +時事 1 +時份 1 +時值 1 +時光 1 +時刻 1 +時報 1 +時弊 1 +時稱 1 +時舉 1 +時針 1 +晃動 1 +晉之 1 +晉北 1 +晉哲 1 +晉江 1 +晉級 1 +晒乾 1 +晨間 1 +普世 1 +普什 1 +普伊 1 +普利 1 +普提 1 +普曼 1 +普森 1 +普第 1 +普肯 1 +普芮 1 +普薩 1 +普里 1 +普金 1 +景泰 1 +景灣 1 +晴神 1 +晶瑩 1 +晶閘 1 +智伯 1 +智利 1 +智趣 1 +暑期 1 +暖氣 1 +暗中 1 +暗喻 1 +暗影 1 +暗房 1 +暗指 1 +暗礁 1 +暗紅 1 +暗號 1 +暫別 1 +暫無 1 +暮光 1 +暱稱 1 +暴亂 1 +暴斂 1 +暴死 1 +暴風 1 +暹羅 1 +曄之 1 +曉夫 1 +曉彬 1 +曉得 1 +曉聲 1 +曉舟 1 +曖昧 1 +曬相 1 +曬衣 1 +曲口 1 +曲同 1 +曲張 1 +曲率 1 +曲目 1 +曲線 1 +曲藝 1 +曲阜 1 +曲頜 1 +更低 1 +更佳 1 +更大 1 +更審 1 +更小 1 +更強 1 +更快 1 +更新 1 +更是 1 +更硬 1 +更衣 1 +更輕 1 +更長 1 +曷懶 1 +書本 1 +書裡 1 +書迷 1 +書面 1 +書香 1 +曹家 1 +曹甸 1 +曹記 1 +曼什 1 +曼切 1 +曼哈 1 +曼城 1 +曼寧 1 +曼徹 1 +曼成 1 +曼斯 1 +曼海 1 +曼涅 1 +曼玉 1 +曼科 1 +曼達 1 +曾任 1 +曾孫 1 +曾愛 1 +曾祖 1 +替人 1 +最內 1 +最前 1 +最受 1 +最外 1 +最強 1 +最旺 1 +最最 1 +最末 1 +最東 1 +最純 1 +最遠 1 +會上 1 +會址 1 +會師 1 +會戰 1 +會所 1 +會晤 1 +會章 1 +會見 1 +會計 1 +月色 1 +月薪 1 +有份 1 +有別 1 +有力 1 +有名 1 +有愛 1 +有方 1 +有期 1 +有染 1 +有條 1 +有異 1 +有病 1 +有稱 1 +有花 1 +有點 1 +服刑 1 +朗丹 1 +朗恰 1 +朗杜 1 +朗西 1 +朗豪 1 +朗頓 1 +望族 1 +朝下 1 +朝元 1 +朝政 1 +朝散 1 +朝東 1 +朝聖 1 +朝覲 1 +朝貢 1 +朝陽 1 +期刊 1 +木中 1 +木乃 1 +木刻 1 +木卡 1 +木城 1 +木尼 1 +木屋 1 +木工 1 +木戶 1 +木揚 1 +木斯 1 +木村 1 +木樣 1 +木櫾 1 +木蘭 1 +木造 1 +木鳥 1 +木齊 1 +未入 1 +未敢 1 +未有 1 +未深 1 +未滿 1 +末端 1 +本劇 1 +本名 1 +本城 1 +本始 1 +本季 1 +本島 1 +本市 1 +本德 1 +本書 1 +本營 1 +本目 1 +本省 1 +本社 1 +本縣 1 +本能 1 +本著 1 +本郡 1 +本部 1 +本鄉 1 +本集 1 +本領 1 +札幌 1 +札特 1 +朱里 1 +朴次 1 +朵眼 1 +杉並 1 +李察 1 +杏出 1 +杏子 1 +材官 1 +材質 1 +村旁 1 +杖責 1 +杜乃 1 +杜伊 1 +杜克 1 +杜利 1 +杜成 1 +杜浦 1 +杜甫 1 +杜隆 1 +杜鵑 1 +杯賽 1 +杰仔 1 +東主 1 +東加 1 +東勝 1 +東坡 1 +東姑 1 +東宮 1 +東岸 1 +東巡 1 +東急 1 +東支 1 +東昇 1 +東映 1 +東桑 1 +東條 1 +東武 1 +東涌 1 +東渡 1 +東直 1 +東站 1 +東興 1 +東華 1 +東距 1 +東道 1 +東邊 1 +東郊 1 +東鄉 1 +東鐵 1 +東隧 1 +東風 1 +松下 1 +松坂 1 +松山 1 +松島 1 +松州 1 +松翔 1 +松花 1 +板式 1 +林克 1 +林地 1 +林堡 1 +林場 1 +林威 1 +林布 1 +林斯 1 +林業 1 +林檎 1 +林翼 1 +林胡 1 +林豬 1 +果然 1 +果真 1 +果酒 1 +枝葉 1 +架次 1 +枸杞 1 +柏力 1 +柏加 1 +柏村 1 +柏松 1 +柏臣 1 +染手 1 +染病 1 +柔道 1 +柚木 1 +柝聲 1 
+柢固 1 +查找 1 +查普 1 +查氏 1 +查特 1 +柬埔 1 +柯伊 1 +柯克 1 +柱銘 1 +柳川 1 +柳州 1 +柳德 1 +柳葉 1 +柴電 1 +柿本 1 +栗橋 1 +校呔 1 +校簿 1 +校門 1 +栩如 1 +栩栩 1 +株式 1 +核孔 1 +核實 1 +核工 1 +核彈 1 +核發 1 +核研 1 +核算 1 +根培 1 +根深 1 +根生 1 +根究 1 +根莖 1 +根部 1 +格丁 1 +格仔 1 +格但 1 +格來 1 +格司 1 +格奧 1 +格子 1 +格尼 1 +格拿 1 +格森 1 +格瑪 1 +格莫 1 +格陵 1 +格雷 1 +桂陵 1 +桃子 1 +框架 1 +框線 1 +案例 1 +案達 1 +桐生 1 +桑克 1 +桑德 1 +桑托 1 +桑納 1 +桓子 1 +桓玄 1 +桿菌 1 +梁伐 1 +梁贊 1 +梁龍 1 +梅園 1 +梅帕 1 +梅捷 1 +梅里 1 +梓里 1 +條不 1 +條款 1 +條紋 1 +梧州 1 +梨花 1 +梨香 1 +梭羅 1 +梯隊 1 +梳頜 1 +梵安 1 +棉條 1 +棋局 1 +棋盤 1 +棋聖 1 +棋院 1 +棋類 1 +棒錘 1 +棕色 1 +棕褐 1 +森德 1 +森斯 1 +森納 1 +森費 1 +棲地 1 +棲身 1 +植株 1 +椎名 1 +椰林 1 +楓樹 1 +楚克 1 +楚瑜 1 +楚紅 1 +楠桂 1 +楠溪 1 +業主 1 +業業 1 +業餘 1 +極北 1 +極區 1 +極少 1 +極為 1 +極矮 1 +極長 1 +極闊 1 +極限 1 +楷書 1 +楷模 1 +概要 1 +榆林 1 +榔頭 1 +榕樹 1 +榜羅 1 +榨出 1 +榫眼 1 +榮廷 1 +榮洲 1 +榮茂 1 +榴彈 1 +構思 1 +構造 1 +槍尖 1 +槍尾 1 +槍殺 1 +槍舌 1 +槍術 1 +樂園 1 +樂安 1 +樂官 1 +樂山 1 +樂師 1 +樂手 1 +樂敏 1 +樂樂 1 +樂活 1 +樂環 1 +樂美 1 +樂翠 1 +樂觀 1 +樂趣 1 +樓夢 1 +樓宇 1 +樓層 1 +樓底 1 +樓煩 1 +樓盤 1 +樓面 1 +樓高 1 +標售 1 +標志 1 +標明 1 +標有 1 +標示 1 +標籤 1 +標記 1 +標註 1 +標高 1 +樞密 1 +模一 1 +模里 1 +樣品 1 +樣式 1 +樣貌 1 +樸實 1 +樸歸 1 +樹上 1 +樹幹 1 +樹枝 1 +樹龍 1 +橈腳 1 +橋上 1 +橋樑 1 +橋面 1 +機上 1 +機位 1 +機型 1 +機密 1 +機師 1 +機床 1 +機敏 1 +機械 1 +機理 1 +機種 1 +機能 1 +機製 1 +機遇 1 +橫帶 1 +橫徵 1 +橫渡 1 +橫線 1 +檔案 1 +檔次 1 +檜山 1 +檢驗 1 +檨仔 1 +檳榔 1 +檸七 1 +櫃檯 1 +櫟社 1 +欄目 1 +權氏 1 +權限 1 +次克 1 +次席 1 +次月 1 +次生 1 +次程 1 +次茅 1 +欣快 1 +欲絕 1 +款式 1 +歇根 1 +歌人 1 +歌壇 1 +歌星 1 +歌舞 1 +歌詞 1 +歌頌 1 +歐律 1 +歐盟 1 +歐羅 1 +歐青 1 +歐麥 1 +歡慶 1 +歡樂 1 +正值 1 +正傳 1 +正夫 1 +正子 1 +正宇 1 +正巧 1 +正平 1 +正正 1 +正比 1 +正派 1 +正版 1 +正當 1 +正經 1 +正負 1 +正配 1 +正陽 1 +此事 1 +此地 1 +此夢 1 +此書 1 +此樓 1 +此橋 1 +此片 1 +此處 1 +此語 1 +此起 1 +此路 1 +此陵 1 +此項 1 +此魚 1 +步伐 1 +步蟾 1 +步行 1 +步驟 1 +武克 1 +武力 1 +武威 1 +武帝 1 +武廟 1 +武廠 1 +武德 1 +武打 1 +武王 1 +武略 1 +武皇 1 +武者 1 +武藏 1 +歲月 1 +歷代 1 +歷來 1 +歷屬 1 +歷程 1 +歸來 1 +歸入 1 +歸到 1 +歸功 1 +歸咎 1 +歸案 1 +歸真 1 +歸還 1 +歸附 1 +死不 1 +死刑 1 +死回 1 +死因 1 +死地 1 +死戰 1 +死期 1 +死板 1 +死狀 1 +死而 1 +死黨 1 +殉教 1 +殉爆 1 +殉職 1 +殊榮 1 +殘疾 1 +殘破 1 +殘遺 1 +殘部 1 +殲滅 1 +殺人 1 +殺手 1 +殺機 1 +殼層 1 +殼體 1 +殿堂 1 +毀壞 1 +毀容 1 +毅仁 1 +毅然 1 +母拿 1 +母會 1 +母校 1 +母狼 1 +母猴 1 +母艦 1 +母語 1 
+母貓 1 +毎年 1 +每元 1 +每座 1 +每戶 1 +每所 1 +每枚 1 +每每 1 +每股 1 +每邊 1 +每集 1 +每鼎 1 +毒​ 1 +毒品 1 +毒死 1 +毒癮 1 +毒舌 1 +毓林 1 +毓楓 1 +毓芳 1 +比哈 1 +比喻 1 +比妥 1 +比得 1 +比恩 1 +比斯 1 +比方 1 +比格 1 +比武 1 +比薩 1 +比袍 1 +比褂 1 +比魯 1 +毗闍 1 +毛色 1 +毛豬 1 +毛髮 1 +毫安 1 +毫無 1 +毯子 1 +氈幕 1 +氏亞 1 +氏奇 1 +民事 1 +民俗 1 +民力 1 +民居 1 +民工 1 +民心 1 +民意 1 +民房 1 +民柬 1 +民權 1 +民法 1 +民盟 1 +民答 1 +民航 1 +民英 1 +民謠 1 +民豐 1 +民選 1 +民鐸 1 +民防 1 +氣吞 1 +氣息 1 +氣態 1 +氣憤 1 +氣旋 1 +氣槍 1 +氣死 1 +氣溫 1 +氣燄 1 +氣胸 1 +氣象 1 +氧釩 1 +氨基 1 +氫化 1 +氫氣 1 +氫鍵 1 +氮素 1 +氯乙 1 +氯氧 1 +氯雷 1 +水世 1 +水份 1 +水圈 1 +水壓 1 +水床 1 +水扁 1 +水攻 1 +水晶 1 +水氯 1 +水汽 1 +水流 1 +水火 1 +水球 1 +水產 1 +水療 1 +水翼 1 +水能 1 +水警 1 +水面 1 +水鳥 1 +水鴨 1 +永久 1 +永元 1 +永升 1 +永吉 1 +永和 1 +永壽 1 +永平 1 +永成 1 +永昌 1 +永權 1 +永續 1 +永輝 1 +永逸 1 +永靖 1 +汁液 1 +求偶 1 +求出 1 +求助 1 +求問 1 +求婚 1 +求情 1 +求援 1 +求籤 1 +求醫 1 +汝寧 1 +汞柱 1 +江協 1 +江口 1 +江浙 1 +江海 1 +江源 1 +江漢 1 +江灣 1 +江谷 1 +江都 1 +江閣 1 +江魚 1 +池塘 1 +池田 1 +污損 1 +污點 1 +汲及 1 +決意 1 +決擇 1 +決然 1 +決裂 1 +汽油 1 +汽船 1 +沃奎 1 +沃季 1 +沃州 1 +沃斯 1 +沃爾 1 +沃羅 1 +沈氏 1 +沉水 1 +沉迷 1 +沉重 1 +沉降 1 +沒能 1 +沒落 1 +沒錯 1 +沖之 1 +沖片 1 +沖走 1 +沙丘 1 +沙伯 1 +沙依 1 +沙尼 1 +沙崙 1 +沙巴 1 +沙普 1 +沙梁 1 +沙池 1 +沙洛 1 +沙漠 1 +沙瓦 1 +沙田 1 +沙畹 1 +沙蠶 1 +沙迦 1 +沙邦 1 +沙鄢 1 +沙里 1 +沢駅 1 +河兒 1 +河卡 1 +河圖 1 +河岸 1 +河心 1 +河段 1 +河漫 1 +河西 1 +油煙 1 +油片 1 +油田 1 +油菜 1 +油量 1 +油電 1 +治中 1 +治勲 1 +治勳 1 +治喪 1 +治國 1 +治學 1 +治水 1 +治理 1 +治軍 1 +沽渚 1 +沾解 1 +沿線 1 +沿襲 1 +沿途 1 +泊苷 1 +法令 1 +法凱 1 +法師 1 +法政 1 +法斯 1 +法格 1 +法比 1 +法海 1 +法特 1 +法登 1 +法羅 1 +法老 1 +法西 1 +法輪 1 +法迪 1 +泛濫 1 +波利 1 +波包 1 +波卡 1 +波及 1 +波因 1 +波圖 1 +波城 1 +波形 1 +波恩 1 +波折 1 +波普 1 +波森 1 +波爾 1 +波瓦 1 +波的 1 +波羅 1 +波西 1 +波里 1 +波錠 1 +波門 1 +波黑 1 +泥土 1 +泥潭 1 +注資 1 +泰共 1 +泰勒 1 +泰北 1 +泰始 1 +泰州 1 +泰曾 1 +泰琳 1 +泰米 1 +泰興 1 +泳屋 1 +泳灘 1 +洋介 1 +洋坪 1 +洗劫 1 +洗衣 1 +洛伊 1 +洛伐 1 +洛佩 1 +洛加 1 +洛城 1 +洛塞 1 +洛尼 1 +洛布 1 +洛恩 1 +洛書 1 +洛曼 1 +洛洛 1 +洛特 1 +洛珊 1 +洛琳 1 +洛茲 1 +洛蒙 1 +洛雷 1 +洛頓 1 +洞子 1 +洞穴 1 +洞窟 1 +津貼 1 +洩慾 1 +洩漏 1 +洪堡 1 +洪家 1 +洪橋 1 +洵美 1 +活出 1 +活化 1 +活埋 1 +活水 1 +活潑 1 +活現 1 +活用 1 +活躍 1 +活靈 1 +派對 1 +派往 1 +流下 1 +流亡 1 +流入 1 +流出 1 +流嶼 1 +流放 1 +流星 1 +流標 1 +流民 1 +流水 1 +流浪 1 +流產 1 +流程 1 +流言 1 +流逝 1 +流露 1 +浚稽 1 +浦市 1 +浦那 1 +浦鎮 1 
+浩文 1 +浩特 1 +浪底 1 +浪漫 1 +浪潮 1 +浪費 1 +浪跡 1 +浮動 1 +浴場 1 +海事 1 +海光 1 +海因 1 +海地 1 +海姆 1 +海峰 1 +海布 1 +海平 1 +海廷 1 +海怡 1 +海昌 1 +海景 1 +海淀 1 +海港 1 +海濱 1 +海灘 1 +海爾 1 +海神 1 +海秀 1 +海老 1 +海航 1 +海藍 1 +海螺 1 +海豐 1 +海陸 1 +海風 1 +海鷗 1 +浸染 1 +浸泡 1 +涅夫 1 +涅托 1 +涅日 1 +涅爾 1 +涅米 1 +涇波 1 +涇陽 1 +消極 1 +消耗 1 +消退 1 +消除 1 +涉世 1 +涉嫌 1 +涉足 1 +涪江 1 +涮煮 1 +液化 1 +液壓 1 +涵蓋 1 +淄川 1 +淑妃 1 +淑怡 1 +淘寶 1 +淘金 1 +淡定 1 +淡色 1 +淨土 1 +淪落 1 +淪陷 1 +淫蕩 1 +淮南 1 +淮許 1 +深受 1 +深埋 1 +深層 1 +深度 1 +深感 1 +深有 1 +深柢 1 +深海 1 +深港 1 +深溪 1 +深紅 1 +深綠 1 +深色 1 +深處 1 +深造 1 +淵源 1 +混亂 1 +混凝 1 +混沌 1 +混為 1 +混燃 1 +淹浸 1 +淺水 1 +淺綠 1 +添丁 1 +清償 1 +清凈 1 +清單 1 +清帝 1 +清拆 1 +清教 1 +清文 1 +清明 1 +清潔 1 +清理 1 +清道 1 +清遠 1 +清還 1 +清鄉 1 +減低 1 +減刑 1 +減小 1 +減退 1 +渠子 1 +渣打 1 +渤海 1 +測繪 1 +渭州 1 +港交 1 +港區 1 +港府 1 +渴求 1 +游標 1 +游說 1 +湄洲 1 +湖上 1 +湖人 1 +湖名 1 +湖畔 1 +湘南 1 +湘西 1 +湘陰 1 +湛恩 1 +湧現 1 +湮滅 1 +湯料 1 +源於 1 +源田 1 +準基 1 +準將 1 +準確 1 +溝壑 1 +溝齒 1 +溢漏 1 +溪峪 1 +溪水 1 +溪美 1 +溪鱂 1 +溫克 1 +溫哥 1 +溫坡 1 +溫徹 1 +溫斯 1 +溫柔 1 +溶劑 1 +溶氣 1 +滑板 1 +滑稽 1 +滑鼠 1 +滕氏 1 +滕費 1 +滙業 1 +滬江 1 +滯洪 1 +滲出 1 +滴下 1 +滾動 1 +滾石 1 +滿意 1 +滿清 1 +滿載 1 +漁村 1 +漁梁 1 +漁船 1 +漂浮 1 +漆器 1 +演成 1 +演戲 1 +演技 1 +演繹 1 +演義 1 +演講 1 +漢中 1 +漢姆 1 +漢娜 1 +漢字 1 +漢桓 1 +漫漶 1 +漫長 1 +漱芳 1 +漲幅 1 +漸變 1 +漸趨 1 +潔瑩 1 +潘丘 1 +潘恩 1 +潘迪 1 +潛伏 1 +潛力 1 +潛望 1 +潛水 1 +潛游 1 +潟湖 1 +潢川 1 +潭尾 1 +潭村 1 +潭東 1 +潭陽 1 +潮蟹 1 +潰散 1 +澀谷 1 +澤尻 1 +澤爾 1 +激勵 1 +激發 1 +激素 1 +激進 1 +濃厚 1 +濃煙 1 +濕地 1 +濟世 1 +濟亞 1 +濟科 1 +濟邦 1 +濟鼐 1 +濫用 1 +濱海 1 +濾掉 1 +瀏陽 1 +瀕危 1 +瀘溪 1 +瀝泗 1 +瀟洒 1 +火上 1 +火不 1 +火候 1 +火喉 1 +火山 1 +火心 1 +火掌 1 +火炮 1 +火爆 1 +火鍋 1 +灰棕 1 +灰雲 1 +灰黑 1 +災禍 1 +炎熱 1 +炙手 1 +炭疽 1 +炸彈 1 +炸死 1 +炸毀 1 +炸糕 1 +為一 1 +為二 1 +為力 1 +為時 1 +為然 1 +為零 1 +烈格 1 +烏代 1 +烏來 1 +烏宗 1 +烏干 1 +烏德 1 +烏拉 1 +烏普 1 +烏腳 1 +烏魯 1 +烷基 1 +烹煮 1 +焊接 1 +焗豆 1 +焚屍 1 +焚燒 1 +焜耀 1 +無二 1 +無俚 1 +無分 1 +無危 1 +無厭 1 +無子 1 +無家 1 +無幾 1 +無心 1 +無忌 1 +無所 1 +無暇 1 +無有 1 +無機 1 +無氧 1 +無水 1 +無派 1 +無產 1 +無疑 1 +無盡 1 +無罪 1 +無聊 1 +無能 1 +無與 1 +無色 1 +無處 1 +無視 1 +無誤 1 +無過 1 +無量 1 +無限 1 +無雙 1 +無頭 1 +無點 1 +無齒 1 +焦尼 1 +焦點 1 +然不 1 +煉油 1 +煉金 1 +煙囪 1 +煙槍 1 +煙霧 1 +煜全 1 +煤建 1 +煤氣 1 +照射 1 +煮食 1 +煽動 1 +熄匙 1 +熊族 1 +熊本 1 +熊隊 1 +熏烤 1 +熏陶 1 +熔化 1 
+熔岩 1 +熟知 1 +熟釜 1 +熱值 1 +熱刺 1 +熱力 1 +熱夫 1 +熱心 1 +熱愛 1 +熱羅 1 +熱身 1 +熱量 1 +熱電 1 +熱鬧 1 +熾熱 1 +燃氣 1 +燈謎 1 +燒灼 1 +燒荒 1 +燕窩 1 +營口 1 +營團 1 +營地 1 +營寨 1 +營帳 1 +營火 1 +營盤 1 +營造 1 +營長 1 +營養 1 +燦爛 1 +燭光 1 +爪部 1 +爪龍 1 +爬到 1 +爬山 1 +爬梯 1 +爭冠 1 +爭占 1 +爭吵 1 +爭奪 1 +爭寵 1 +爭得 1 +爭界 1 +爭相 1 +爭端 1 +爭競 1 +爭論 1 +爭鬥 1 +父風 1 +爸爸 1 +爺爺 1 +爽文 1 +爾他 1 +爾南 1 +爾吉 1 +爾地 1 +爾基 1 +爾塔 1 +爾布 1 +爾帕 1 +爾恩 1 +爾恰 1 +爾拉 1 +爾摩 1 +爾斐 1 +爾普 1 +爾格 1 +爾歇 1 +爾比 1 +爾汗 1 +爾法 1 +爾溫 1 +爾炘 1 +爾瑪 1 +爾瓦 1 +爾發 1 +爾皮 1 +爾納 1 +爾紐 1 +爾蒙 1 +爾蘇 1 +爾虎 1 +爾諾 1 +爾貝 1 +爾辛 1 +爾達 1 +爾金 1 +爾頓 1 +爾高 1 +爾默 1 +爾齊 1 +牆上 1 +牆身 1 +牆面 1 +片劑 1 +片尾 1 +片斷 1 +片頭 1 +版主 1 +版畫 1 +牌照 1 +牙喙 1 +牙因 1 +牙籤 1 +牙線 1 +牙薩 1 +牙醫 1 +牛斯 1 +牛池 1 +牛潭 1 +牛石 1 +牛花 1 +牛首 1 +牛鼻 1 +牟利 1 +牟合 1 +牟尼 1 +牡蠣 1 +牧區 1 +牧民 1 +牧谷 1 +物件 1 +物產 1 +物象 1 +物鏡 1 +物阜 1 +牲畜 1 +特伊 1 +特伯 1 +特佛 1 +特備 1 +特優 1 +特凱 1 +特利 1 +特務 1 +特勞 1 +特區 1 +特夸 1 +特奇 1 +特威 1 +特尼 1 +特工 1 +特律 1 +特德 1 +特快 1 +特意 1 +特攝 1 +特普 1 +特曼 1 +特森 1 +特洛 1 +特派 1 +特瓦 1 +特產 1 +特異 1 +特福 1 +特米 1 +特菲 1 +特萊 1 +特重 1 +特隆 1 +特雷 1 +特魯 1 +牽引 1 +牽牛 1 +犧牲 1 +犬科 1 +犬種 1 +犬髖 1 +犯人 1 +狂亂 1 +狄克 1 +狄刻 1 +狄拉 1 +狐庸 1 +狡猾 1 +狹小 1 +狼人 1 +狼堡 1 +狼影 1 +狼群 1 +猜忌 1 +猜想 1 +猝死 1 +猴年 1 +猴群 1 +猶大 1 +獅子 1 +獎牌 1 +獎盃 1 +獨一 1 +獨具 1 +獨唱 1 +獨孤 1 +獨家 1 +獨有 1 +獨眠 1 +獨行 1 +獨資 1 +獲准 1 +獲判 1 +獲勳 1 +獲召 1 +獲悉 1 +獲授 1 +獲獎 1 +獲益 1 +獲薦 1 +獲選 1 +獲頒 1 +獵物 1 +獸人 1 +獸族 1 +獻上 1 +獻堂 1 +獻策 1 +獻議 1 +玄天 1 +玄宗 1 +玄武 1 +玄策 1 +玄貓 1 +玉柴 1 +玉純 1 +玉魔 1 +玉鳳 1 +玉麟 1 +王儲 1 +王冠 1 +王墓 1 +王宮 1 +王座 1 +王爾 1 +王蓮 1 +玩伴 1 +玩弄 1 +玩法 1 +玩笑 1 +玫瑰 1 +玲玲 1 +玷染 1 +珍寶 1 +珠璣 1 +珠鋼 1 +班克 1 +班卓 1 +班子 1 +班布 1 +班機 1 +班次 1 +班禪 1 +班級 1 +班讓 1 +現役 1 +現身 1 +球壇 1 +球差 1 +球星 1 +球根 1 +球狀 1 +球道 1 +球面 1 +理性 1 +理曼 1 +理由 1 +琉古 1 +琉西 1 +琴弓 1 +琺琅 1 +瑙恩 1 +瑜伽 1 +瑜陀 1 +瑞坦 1 +瑞安 1 +瑞拉 1 +瑞普 1 +瑞欽 1 +瑞爾 1 +瑞阿 1 +瑞霖 1 +瑟洛 1 +瑟芬 1 +瑣法 1 +瑪君 1 +瑪哈 1 +瑪喀 1 +瑪莎 1 +瑪諾 1 +環保 1 +環帶 1 +環狀 1 +環節 1 +環繞 1 +瓊斯 1 +瓊珊 1 +瓊茲 1 +瓜里 1 +瓦內 1 +瓦卡 1 +瓦史 1 +瓦坦 1 +瓦多 1 +瓦尼 1 +瓦德 1 +瓦本 1 +瓦桑 1 +瓦涅 1 +瓦瓦 1 +瓦納 1 +瓦蒂 1 +瓦薩 1 +瓦解 1 +瓦里 1 +甄別 1 +甘共 1 +甘斯 1 +甘油 1 +甘草 1 +甘馬 1 +甚厚 1 +甚嚴 1 +甚多 1 +甚小 1 +甚深 1 +甚篤 1 +甜兒 1 +甜度 1 +生主 1 +生出 1 +生動 1 +生天 1 +生子 1 
+生平 1 +生性 1 +生效 1 +生機 1 +生殺 1 +生氣 1 +生火 1 +生肖 1 +生財 1 +生還 1 +產出 1 +產經 1 +甥女 1 +甦醒 1 +用人 1 +用來 1 +用光 1 +用兵 1 +用字 1 +用完 1 +用手 1 +用有 1 +用水 1 +用藥 1 +用計 1 +用詞 1 +田園 1 +田地 1 +田心 1 +田急 1 +田納 1 +田谷 1 +田野 1 +田頭 1 +甲山 1 +甲殼 1 +申辦 1 +男人 1 +男士 1 +男嬰 1 +男方 1 +男童 1 +界定 1 +界限 1 +留傳 1 +留哥 1 +留待 1 +留空 1 +留聲 1 +留良 1 +畜牧 1 +畜養 1 +畢打 1 +畢氏 1 +畢蘭 1 +畢馬 1 +略帶 1 +略有 1 +略為 1 +畫下 1 +畫中 1 +畫分 1 +畫會 1 +畫畫 1 +畫面 1 +異事 1 +異姓 1 +異度 1 +異形 1 +異曲 1 +異母 1 +異端 1 +當上 1 +當下 1 +當值 1 +當勞 1 +當官 1 +當屆 1 +當政 1 +當斯 1 +當晚 1 +當期 1 +當歸 1 +當面 1 +疆域 1 +疏浚 1 +疏遠 1 +疑點 1 +疙瘩 1 +疲勞 1 +疲弱 1 +疼痛 1 +疾首 1 +病原 1 +病患 1 +病情 1 +病歷 1 +病死 1 +病重 1 +症候 1 +症狀 1 +痕跡 1 +痙攣 1 +痛心 1 +痛欲 1 +痢疾 1 +瘧疾 1 +癥狀 1 +登丹 1 +登尼 1 +發佈 1 +發作 1 +發兵 1 +發呆 1 +發奮 1 +發拉 1 +發揚 1 +發改 1 +發放 1 +發洩 1 +發炎 1 +發燒 1 +發牌 1 +發球 1 +發病 1 +發聲 1 +發財 1 +發車 1 +發配 1 +白丁 1 +白井 1 +白公 1 +白利 1 +白化 1 +白堊 1 +白天 1 +白宮 1 +白砂 1 +白蓮 1 +白蛇 1 +白質 1 +白軍 1 +白銅 1 +白陵 1 +白雲 1 +白面 1 +白頸 1 +白鹿 1 +白麗 1 +百事 1 +百五 1 +百代 1 +百億 1 +百兆 1 +百六 1 +百帕 1 +百幾 1 +百廢 1 +百濟 1 +百無 1 +百老 1 +百花 1 +百計 1 +百貨 1 +百鳴 1 +的士 1 +的尼 1 +的斯 1 +的確 1 +的黎 1 +皇位 1 +皇冠 1 +皇城 1 +皇太 1 +皇妃 1 +皇廷 1 +皇權 1 +皇發 1 +皈依 1 +皓若 1 +皮亞 1 +皮克 1 +皮內 1 +皮奇 1 +皮奧 1 +皮杜 1 +皮耶 1 +皮雅 1 +皰疹 1 +盆地 1 +盈盈 1 +益友 1 +益城 1 +益壽 1 +益新 1 +益處 1 +盔甲 1 +盛事 1 +盛大 1 +盛妝 1 +盛揮 1 +盛產 1 +盛行 1 +盜用 1 +盟軍 1 +盡到 1 +盡喪 1 +盡情 1 +盡糧 1 +盡頭 1 +監工 1 +監控 1 +監測 1 +監禁 1 +監聽 1 +盤踞 1 +盧加 1 +盧普 1 +盧溝 1 +盧瓦 1 +盧甘 1 +盧福 1 +目相 1 +目睹 1 +目鏡 1 +直勉 1 +直屬 1 +直覺 1 +直言 1 +直說 1 +直間 1 +相位 1 +相大 1 +相容 1 +相差 1 +相悖 1 +相應 1 +相挺 1 +相異 1 +相看 1 +相稱 1 +相約 1 +相繼 1 +相聲 1 +相若 1 +相處 1 +相見 1 +相較 1 +相通 1 +相速 1 +相鄰 1 +相間 1 +盾座 1 +盾系 1 +省務 1 +省思 1 +省油 1 +眈眈 1 +眉山 1 +眉弄 1 +看中 1 +看出 1 +看台 1 +看得 1 +看看 1 +看管 1 +看見 1 +看透 1 +看重 1 +真光 1 +真北 1 +真名 1 +真好 1 +真希 1 +真木 1 +真核 1 +眯眼 1 +眷村 1 +眼下 1 +眼淚 1 +眼狀 1 +眼球 1 +眼皮 1 +眼神 1 +眾經 1 +眾說 1 +睡眠 1 +睡覺 1 +督撫 1 +睾丁 1 +睿智 1 +瞪羚 1 +瞬時 1 +瞭如 1 +矗立 1 +矢口 1 +知府 1 +知曉 1 +知留 1 +知足 1 +短少 1 +短期 1 +短毛 1 +短草 1 +短裙 1 +短詩 1 +短語 1 +短音 1 +短髮 1 +矮星 1 +石像 1 +石器 1 +石塊 1 +石材 1 +石湖 1 +石灰 1 +石牆 1 +石牌 1 +石華 1 +石頭 1 +砂拉 1 +砂漿 1 +砂紙 1 +砍伐 1 +砒霜 1 +研磨 1 +砝碼 1 +破損 1 +破滅 1 +破舊 1 +破落 1 +硝庫 1 +硝酸 1 
+硫酸 1 +硬幣 1 +碑亭 1 +碑刻 1 +碧嘉 1 +碧波 1 +碧琴 1 +碰撞 1 +碳紙 1 +碳酸 1 +確知 1 +確診 1 +磁性 1 +磐田 1 +磚室 1 +磨坊 1 +磨折 1 +磨槽 1 +磷化 1 +磷素 1 +磷酸 1 +礦場 1 +礦物 1 +礦石 1 +礦藏 1 +示人 1 +示愛 1 +社皮 1 +社論 1 +社長 1 +祁鏞 1 +祈願 1 +祐希 1 +祖上 1 +祖圭 1 +祖宗 1 +祖籍 1 +祖魯 1 +神仙 1 +神偷 1 +神器 1 +神明 1 +神殿 1 +神社 1 +神策 1 +神籤 1 +神魔 1 +祥子 1 +票據 1 +票數 1 +祭司 1 +祭壇 1 +祭師 1 +祭物 1 +祭祀 1 +祭酒 1 +祿勸 1 +祿山 1 +禁煙 1 +禁用 1 +禁藥 1 +禁賽 1 +福克 1 +福安 1 +福康 1 +福德 1 +福慧 1 +福池 1 +福清 1 +福瓦 1 +禪師 1 +禮堂 1 +禮濤 1 +禮炮 1 +禮物 1 +禱文 1 +禽流 1 +秀實 1 +秀康 1 +秀怡 1 +秀珠 1 +私下 1 +私交 1 +私奔 1 +私宅 1 +私家 1 +私立 1 +私財 1 +秉國 1 +秋人 1 +秋季 1 +秋山 1 +秋爽 1 +秋興 1 +秋香 1 +科他 1 +科伊 1 +科多 1 +科屬 1 +科德 1 +科恩 1 +科教 1 +科朗 1 +科目 1 +科維 1 +秘指 1 +秘果 1 +租予 1 +租務 1 +租地 1 +租戶 1 +租用 1 +秦城 1 +秦州 1 +秦晉 1 +秦朝 1 +秦石 1 +秩序 1 +移交 1 +移往 1 +移植 1 +移至 1 +移送 1 +稀釋 1 +稅項 1 +稍為 1 +稗官 1 +種內 1 +種名 1 +種子 1 +種屬 1 +稱海 1 +稱病 1 +稱銜 1 +稻子 1 +稻草 1 +稼祥 1 +穀物 1 +穆宗 1 +穆拉 1 +穆薩 1 +積山 1 +積良 1 +穩固 1 +穩妥 1 +究底 1 +究竟 1 +穹哇 1 +空出 1 +空前 1 +空名 1 +空客 1 +空戰 1 +空隙 1 +空難 1 +穿幫 1 +穿戴 1 +穿甲 1 +穿行 1 +穿過 1 +突尼 1 +突感 1 +突現 1 +窄袖 1 +窗口 1 +窗外 1 +窘境 1 +窟檐 1 +窮苦 1 +窮追 1 +窯洞 1 +竄紅 1 +竊聽 1 +立交 1 +立國 1 +立村 1 +立營 1 +立花 1 +立蒙 1 +立面 1 +立體 1 +站內 1 +站名 1 +站坪 1 +站廳 1 +站點 1 +章回 1 +章斐 1 +童女 1 +童男 1 +端川 1 +競相 1 +竹器 1 +竹治 1 +竹溪 1 +竹片 1 +符桐 1 +第廿 1 +第比 1 +第谷 1 +笳冬 1 +等位 1 +等客 1 +等號 1 +筐仔 1 +筒狀 1 +答應 1 +答那 1 +策軍 1 +算出 1 +算術 1 +管制 1 +管子 1 +箬松 1 +箱型 1 +箴言 1 +節度 1 +節節 1 +範疇 1 +篇累 1 +篡位 1 +篡國 1 +篡地 1 +簡化 1 +簡約 1 +簡訊 1 +簽名 1 +簽定 1 +簽認 1 +簽證 1 +簽賬 1 +簿公 1 +籃筐 1 +籌伯 1 +籌備 1 +籌措 1 +籌款 1 +籌資 1 +籌辦 1 +籍貫 1 +籠式 1 +籠草 1 +米內 1 +米加 1 +米南 1 +米古 1 +米哈 1 +米思 1 +米沙 1 +米洛 1 +米烏 1 +米琳 1 +米線 1 +米酒 1 +米高 1 +粉碎 1 +粉紅 1 +粉絲 1 +粒體 1 +粗壯 1 +粗鱗 1 +粵明 1 +粽子 1 +精力 1 +精子 1 +精密 1 +精心 1 +精湛 1 +精算 1 +精索 1 +精蓄 1 +精裝 1 +糖尿 1 +糖蒜 1 +糟糕 1 +糧儲 1 +糧絕 1 +糧餉 1 +系數 1 +糾正 1 +糾紛 1 +紀元 1 +約尼 1 +約拉 1 +約熱 1 +約長 1 +紅旗 1 +紅日 1 +紅杏 1 +紅樹 1 +紅玉 1 +紅磨 1 +紅茶 1 +紅襪 1 +紅遍 1 +紅酒 1 +紅點 1 +紋路 1 +紋飾 1 +納克 1 +納入 1 +納加 1 +納哥 1 +納塔 1 +納多 1 +納夫 1 +納巴 1 +納波 1 +納澤 1 +納特 1 +納瓦 1 +納蘇 1 +納西 1 +納雷 1 +紐國 1 +紐斯 1 +紐澤 1 +紐芬 1 +紐華 1 +紐黑 1 +純一 1 +純凈 1 +純樸 1 +純陽 1 +紙上 1 +紙條 1 +紙盒 1 +級數 1 +紛紜 1 +素包 1 +素食 1 +素餡 1 +索不 1 
+索倫 1 +索尼 1 +索居 1 +索比 1 +索洛 1 +索溪 1 +索爾 1 +索維 1 +索西 1 +索賠 1 +索頜 1 +紮實 1 +累牘 1 +累計 1 +細岡 1 +細窄 1 +細菌 1 +細部 1 +細長 1 +紳士 1 +紹儀 1 +紹榮 1 +紺三 1 +終審 1 +終身 1 +組件 1 +組像 1 +組別 1 +組口 1 +組態 1 +組隊 1 +結交 1 +結冰 1 +結尾 1 +結雅 1 +絕壁 1 +絕大 1 +絕後 1 +絕版 1 +絕罰 1 +絞刑 1 +絞死 1 +絞痛 1 +給定 1 +給職 1 +給藥 1 +給體 1 +統帥 1 +統籌 1 +絲山 1 +絲帶 1 +綏遠 1 +經國 1 +經意 1 +經文 1 +經昌 1 +經期 1 +經由 1 +經界 1 +綜理 1 +綜錄 1 +綠化 1 +綠帶 1 +綠滙 1 +綠燈 1 +綠社 1 +綠黨 1 +維健 1 +維勒 1 +維匯 1 +維塔 1 +維希 1 +維德 1 +維拿 1 +維斯 1 +維景 1 +維生 1 +維祀 1 +維羅 1 +維茲 1 +維西 1 +維記 1 +維護 1 +綱領 1 +網址 1 +網易 1 +網線 1 +網購 1 +綺塍 1 +綺色 1 +綽號 1 +綿羊 1 +緊張 1 +緊緊 1 +緊要 1 +緊貼 1 +緊逼 1 +緊閉 1 +線上 1 +線前 1 +線度 1 +線條 1 +線索 1 +線道 1 +締造 1 +編上 1 +編導 1 +編程 1 +編篡 1 +編繪 1 +編纂 1 +編者 1 +編腔 1 +編隊 1 +緩衝 1 +緩解 1 +緩鬢 1 +緩龍 1 +緯來 1 +練兵 1 +縣市 1 +縣裡 1 +縫製 1 +縮寫 1 +縮小 1 +縱使 1 +縱觀 1 +縱隊 1 +總區 1 +總和 1 +總局 1 +總站 1 +總行 1 +總裁 1 +總計 1 +總辦 1 +績效 1 +繁多 1 +繁瑣 1 +繁盛 1 +繁雜 1 +繁體 1 +織胺 1 +繞境 1 +繞開 1 +繩架 1 +繳付 1 +繳納 1 +繼業 1 +繼科 1 +繽紛 1 +續航 1 +續部 1 +纏足 1 +纜車 1 +缺口 1 +缺失 1 +缺少 1 +缺氧 1 +缺血 1 +罕有 1 +罪惡 1 +置有 1 +置物 1 +罰則 1 +署理 1 +罵聲 1 +罷免 1 +罷工 1 +罹癌 1 +罹難 1 +羅乞 1 +羅什 1 +羅來 1 +羅先 1 +羅加 1 +羅培 1 +羅姆 1 +羅巴 1 +羅德 1 +羅恩 1 +羅拔 1 +羅提 1 +羅曼 1 +羅柔 1 +羅森 1 +羅洛 1 +羅涅 1 +羅納 1 +羅索 1 +羅維 1 +羅費 1 +羅迪 1 +羅里 1 +羅隆 1 +羊圈 1 +羊犬 1 +美味 1 +美孚 1 +美寶 1 +美幸 1 +美擬 1 +美林 1 +美爾 1 +美特 1 +美琴 1 +美知 1 +美稱 1 +美索 1 +美聯 1 +美聲 1 +美薇 1 +美術 1 +美西 1 +美觀 1 +美譽 1 +美里 1 +美頓 1 +美食 1 +羚羊 1 +羞恥 1 +群峰 1 +群族 1 +群索 1 +群組 1 +群落 1 +群速 1 +群雄 1 +群體 1 +羨慕 1 +義久 1 +義安 1 +義工 1 +義弘 1 +義春 1 +義民 1 +義父 1 +義項 1 +羯羅 1 +羱羊 1 +羽田 1 +羽絨 1 +翌日 1 +習經 1 +翔麟 1 +翠鳥 1 +翰內 1 +翰麥 1 +翻覆 1 +翼手 1 +耀樞 1 +耀武 1 +耀邦 1 +老人 1 +老匯 1 +老名 1 +老大 1 +老套 1 +老婦 1 +老將 1 +老少 1 +老弱 1 +老橋 1 +老漢 1 +考上 1 +考夫 1 +考尼 1 +考柯 1 +考牙 1 +考生 1 +考究 1 +考績 1 +考進 1 +考選 1 +而三 1 +而代 1 +而再 1 +而出 1 +而已 1 +而復 1 +而至 1 +耐受 1 +耐庵 1 +耐玩 1 +耐航 1 +耳光 1 +耳勺 1 +耳孔 1 +耳忒 1 +耳朵 1 +耳珠 1 +耳環 1 +耳癤 1 +耳蝸 1 +耳門 1 +耳骨 1 +耶特 1 +耶索 1 +耶路 1 +耽擱 1 +聆聽 1 +聊賴 1 +聖人 1 +聖保 1 +聖克 1 +聖名 1 +聖彌 1 +聖彼 1 +聖徒 1 +聖拉 1 +聖歌 1 +聖水 1 +聖求 1 +聖潔 1 +聖祖 1 +聖神 1 +聖經 1 +聖訓 1 +聖路 1 +聖體 1 +聘問 1 +聘用 1 +聚氯 1 +聚禮 1 +聚苯 1 +聚變 1 +聚體 1 +聞名 1 +聞言 1 +聯姻 1 +聯播 1 +聯江 1 +聯浦 1 
+聯產 1 +聯美 1 +聯酋 1 +聰敏 1 +聲恆 1 +聲援 1 +聲波 1 +聲谷 1 +聲門 1 +聲音 1 +聶丞 1 +職棒 1 +聽到 1 +聽命 1 +聽從 1 +聽眾 1 +聽聞 1 +聾人 1 +肅宗 1 +肆意 1 +肉夾 1 +肉湯 1 +肉瘤 1 +肉緊 1 +肌肉 1 +肖嚴 1 +肚臍 1 +肚餓 1 +股市 1 +股本 1 +肥牛 1 +肥田 1 +肥胖 1 +肯亞 1 +肯特 1 +育有 1 +育樂 1 +育空 1 +肺病 1 +胃石 1 +背上 1 +背依 1 +背包 1 +背叛 1 +背後 1 +背靠 1 +背面 1 +背鰭 1 +胚胎 1 +胞弟 1 +胡德 1 +胡特 1 +胡禮 1 +胡蜂 1 +胡馬 1 +胸痛 1 +胸管 1 +胸部 1 +胸鰭 1 +能人 1 +能否 1 +能幹 1 +能為 1 +脊椎 1 +脫疽 1 +脫落 1 +脫隊 1 +脫離 1 +脱口 1 +脾氣 1 +腐敗 1 +腐蝕 1 +腓力 1 +腔蛇 1 +腫瘤 1 +腳掌 1 +腳本 1 +腳點 1 +腸胃 1 +腸道 1 +腸骨 1 +腿部 1 +膝傷 1 +膝頭 1 +膠州 1 +膠東 1 +膠澳 1 +膠體 1 +膨脹 1 +膽酸 1 +臉頰 1 +臉龐 1 +臥兒 1 +臥龍 1 +臨榆 1 +臨終 1 +臨高 1 +自作 1 +自保 1 +自信 1 +自卑 1 +自受 1 +自在 1 +自學 1 +自帶 1 +自強 1 +自從 1 +自成 1 +自用 1 +自發 1 +自禁 1 +自製 1 +自訂 1 +自負 1 +自賞 1 +自辦 1 +至上 1 +至善 1 +至是 1 +至柔 1 +至正 1 +至死 1 +致使 1 +致函 1 +致恐 1 +致病 1 +致瘋 1 +致癌 1 +臺大 1 +舀出 1 +舅父 1 +與倫 1 +與姆 1 +與願 1 +興國 1 +興學 1 +興業 1 +興海 1 +興祖 1 +舉世 1 +舉例 1 +舉國 1 +舉止 1 +舉薦 1 +舉起 1 +舊友 1 +舊屋 1 +舊時 1 +舊稱 1 +舊部 1 +舊金 1 +舌劍 1 +舌頭 1 +舍爾 1 +舍訥 1 +舒查 1 +舒爾 1 +舜初 1 +舞劇 1 +舞陽 1 +航天 1 +航站 1 +般若 1 +船塢 1 +船山 1 +船業 1 +船體 1 +艦身 1 +良師 1 +良心 1 +良性 1 +良新 1 +良田 1 +良知 1 +艱巨 1 +色佳 1 +色布 1 +色帶 1 +色情 1 +色目 1 +色調 1 +色龍 1 +艷姬 1 +艷麗 1 +艾伍 1 +艾倫 1 +艾利 1 +艾因 1 +艾夏 1 +艾崔 1 +艾巴 1 +艾度 1 +艾琳 1 +艾瑞 1 +艾瑪 1 +艾登 1 +艾美 1 +艾蓮 1 +艾薩 1 +艾迴 1 +艾雲 1 +艾麗 1 +芬妮 1 +芬華 1 +芬迪 1 +芭蕉 1 +芭黎 1 +花上 1 +花俏 1 +花坮 1 +花城 1 +花店 1 +花旗 1 +花月 1 +花果 1 +花枝 1 +花瓶 1 +花甲 1 +花蜜 1 +花鞋 1 +花齊 1 +芳自 1 +苗栗 1 +苗穗 1 +苟且 1 +若愚 1 +若羌 1 +若英 1 +苦力 1 +苦悶 1 +苦情 1 +苦苣 1 +苦讀 1 +苯並 1 +英一 1 +英乙 1 +英倫 1 +英傑 1 +英勇 1 +英吋 1 +英寸 1 +英年 1 +英廷 1 +英男 1 +英額 1 +英麗 1 +英龍 1 +茂名 1 +范恩 1 +茄南 1 +茄芮 1 +茅家 1 +茨卡 1 +茨海 1 +茨科 1 +茨門 1 +茲堡 1 +茲海 1 +茲羅 1 +茲與 1 +茲薇 1 +茲貝 1 +茵蘭 1 +茶樓 1 +茶湯 1 +茶館 1 +茸切 1 +茸穹 1 +荃灣 1 +荃麟 1 +草原 1 +草地 1 +草坪 1 +草席 1 +草稿 1 +荊州 1 +荒地 1 +荒蕪 1 +荒誕 1 +荔灣 1 +荷林 1 +荷爾 1 +荷銀 1 +莉亞 1 +莉安 1 +莊嚴 1 +莊王 1 +莎尼 1 +莎樂 1 +莫吉 1 +莫埃 1 +莫尼 1 +莫扎 1 +莫札 1 +莫桑 1 +莫瑙 1 +莫瓦 1 +莫納 1 +莫臥 1 +莫過 1 +莫里 1 +莫鱷 1 +莽山 1 +菊花 1 +華倫 1 +華克 1 +華少 1 +華新 1 +華族 1 +華林 1 +華爾 1 +華界 1 +華石 1 +華秀 1 +華納 1 +華絲 1 +華西 1 +華頓 1 +菲亞 1 +菲力 1 +菲國 1 +菲萊 1 +菲詩 1 +菸害 1 +萊利 1 +萊博 1 +萊因 1 +萊夫 1 +萊希 1 +萊德 1 +萊斯 1 +萊明 1 +萊曼 1 
+萊蕪 1 +萊采 1 +萊默 1 +萌芽 1 +萎縮 1 +萬一 1 +萬三 1 +萬丹 1 +萬億 1 +萬多 1 +萬貴 1 +落下 1 +落千 1 +落實 1 +落敗 1 +落葉 1 +葆玖 1 +葉利 1 +葉士 1 +葉序 1 +葉綠 1 +葉魚 1 +著手 1 +著有 1 +著譯 1 +葛力 1 +葛朱 1 +葛浩 1 +葛羅 1 +葛蕾 1 +葛量 1 +葡超 1 +葫蘆 1 +葬禮 1 +葵青 1 +蒂利 1 +蒂娜 1 +蒂洛 1 +蒂爾 1 +蒂迦 1 +蒙丹 1 +蒙卡 1 +蒙塔 1 +蒙巴 1 +蒙得 1 +蒙羞 1 +蒙面 1 +蒙馬 1 +蒲飛 1 +蒸氣 1 +蒸發 1 +蒼白 1 +蓄水 1 +蓄銳 1 +蓋兒 1 +蓋因 1 +蓋多 1 +蓋曼 1 +蓋朗 1 +蓋爾 1 +蓋頂 1 +蓓天 1 +蓬塔 1 +蓬拉 1 +蓬皮 1 +蓮娜 1 +蓮安 1 +蓮花 1 +蔑稱 1 +蔡斯 1 +蔣公 1 +蔥蝸 1 +蕙嫻 1 +蕨類 1 +蕩漾 1 +蕾妮 1 +蕾絲 1 +薄弱 1 +薄扶 1 +薛慶 1 +薩凡 1 +薩卡 1 +薩平 1 +薩德 1 +薩瑞 1 +薩諸 1 +薩諾 1 +薩迪 1 +薩馬 1 +薪俸 1 +薪嘗 1 +藉助 1 +藉此 1 +藍儂 1 +藍寶 1 +藍尼 1 +藍本 1 +藍欽 1 +藍潟 1 +藍灰 1 +藍田 1 +藍白 1 +藍背 1 +藍邊 1 +藍領 1 +藍黨 1 +藏之 1 +藏寶 1 +藏有 1 +藝名 1 +藝能 1 +藝謀 1 +藝電 1 +藤原 1 +藤木 1 +藤本 1 +藤村 1 +藤枝 1 +藤藝 1 +藥品 1 +藥師 1 +藥材 1 +藥水 1 +藥石 1 +藩主 1 +藩士 1 +藩市 1 +藩西 1 +蘇利 1 +蘇北 1 +蘇尋 1 +蘇斯 1 +蘇木 1 +蘇美 1 +蘇萊 1 +蘇達 1 +蘇醒 1 +蘇魯 1 +蘊藏 1 +蘭利 1 +蘭堡 1 +蘭多 1 +蘭大 1 +蘭封 1 +蘭尼 1 +蘭弗 1 +蘭登 1 +虎式 1 +虎棒 1 +虎翼 1 +虎視 1 +虔信 1 +處之 1 +處女 1 +處決 1 +處置 1 +處長 1 +虛弱 1 +虛榮 1 +虛無 1 +號吾 1 +號子 1 +號稱 1 +號誌 1 +虢國 1 +虹橋 1 +蚊類 1 +蚩尤 1 +蛇油 1 +蛇種 1 +蛇魔 1 +蜂擁 1 +蜂蜜 1 +蜆殼 1 +蜚聲 1 +蜥蜴 1 +蜿蜒 1 +蝴蝶 1 +蝸牛 1 +融入 1 +融化 1 +融和 1 +融雪 1 +螞蟻 1 +螢幕 1 +蟬聯 1 +蟲洞 1 +蠟浸 1 +蠶院 1 +蠻子 1 +血型 1 +血液 1 +血竭 1 +血管 1 +血腥 1 +行人 1 +行使 1 +行列 1 +行各 1 +行將 1 +行用 1 +行禮 1 +行長 1 +行騙 1 +街上 1 +街名 1 +街小 1 +街市 1 +街路 1 +街頭 1 +衛理 1 +衝動 1 +衝鋒 1 +衡量 1 +衢山 1 +衣冠 1 +衣物 1 +衣索 1 +表型 1 +表妹 1 +表姐 1 +表徵 1 +表情 1 +表態 1 +表揚 1 +表格 1 +表決 1 +表白 1 +表述 1 +衰敗 1 +衰落 1 +袖手 1 +袖箭 1 +被告 1 +被子 1 +裁決 1 +裁減 1 +裂縫 1 +裂變 1 +裋褐 1 +裕智 1 +裕軍 1 +裙子 1 +補償 1 +補天 1 +補教 1 +補時 1 +補褂 1 +裝修 1 +裝備 1 +裝嵌 1 +裝有 1 +裝瓶 1 +裝葯 1 +裝設 1 +裝載 1 +裡斯 1 +裴林 1 +裸子 1 +裸照 1 +製備 1 +製得 1 +複數 1 +褐色 1 +褪色 1 +褲子 1 +褲袋 1 +襄助 1 +襄王 1 +襄陽 1 +襟見 1 +襲封 1 +西京 1 +西佗 1 +西利 1 +西卡 1 +西向 1 +西周 1 +西哈 1 +西坑 1 +西域 1 +西夏 1 +西奇 1 +西宮 1 +西尼 1 +西岸 1 +西島 1 +西廠 1 +西式 1 +西弗 1 +西拉 1 +西晉 1 +西段 1 +西比 1 +西河 1 +西漢 1 +西爾 1 +西甌 1 +西米 1 +西絲 1 +西線 1 +西美 1 +西翼 1 +西蒙 1 +西薩 1 +西距 1 +西鄉 1 +西醫 1 +西里 1 +要是 1 +要脅 1 +要衝 1 +要道 1 +見人 1 +見稱 1 +見聞 1 +見肘 1 +見解 1 +見識 1 +見鍾 1 +見長 1 +規例 1 +覓食 1 +視乎 1 +視作 1 +視圖 1 +視眈 1 +視角 1 +親人 1 +親信 1 +親政 1 +親朋 1 +親筆 1 
+親臨 1 +親身 1 +覺察 1 +觀光 1 +觀察 1 +觀念 1 +觀戰 1 +觀望 1 +觀看 1 +觀者 1 +角膜 1 +解僱 1 +解夢 1 +解析 1 +解答 1 +解職 1 +解脫 1 +解說 1 +觸怒 1 +觸手 1 +觸覺 1 +觸診 1 +言官 1 +言語 1 +言辭 1 +言閒 1 +訂位 1 +訃告 1 +訄書 1 +訇開 1 +計其 1 +計委 1 +計謀 1 +討逆 1 +託泊 1 +記念 1 +記述 1 +記集 1 +訥費 1 +設站 1 +許昌 1 +許願 1 +訴求 1 +訴諸 1 +註明 1 +註銷 1 +詐死 1 +詔書 1 +評出 1 +評判 1 +評鑑 1 +詛咒 1 +詞幹 1 +詞義 1 +詢問 1 +試劑 1 +試播 1 +試種 1 +試製 1 +試音 1 +試飛 1 +詩文 1 +該事 1 +該人 1 +該墓 1 +該島 1 +該年 1 +該批 1 +該族 1 +該會 1 +該條 1 +該段 1 +該科 1 +該系 1 +該處 1 +該路 1 +該黨 1 +詳情 1 +詳細 1 +詼諧 1 +誇德 1 +誇祖 1 +誌家 1 +認一 1 +認同 1 +認定 1 +認罪 1 +認證 1 +認輔 1 +誓言 1 +誕下 1 +誕不 1 +誘因 1 +語文 1 +語法 1 +語流 1 +語訓 1 +語調 1 +語速 1 +語音 1 +誠意 1 +誤信 1 +誤差 1 +誤會 1 +誤槍 1 +誤譯 1 +誥命 1 +說出 1 +說客 1 +說成 1 +說紛 1 +說話 1 +說謊 1 +說道 1 +課本 1 +誹謗 1 +調值 1 +調停 1 +調入 1 +調和 1 +調控 1 +調水 1 +調沙 1 +調研 1 +調節 1 +調職 1 +調解 1 +諂媚 1 +談判 1 +談妥 1 +談論 1 +請來 1 +請辭 1 +請願 1 +論事 1 +諜海 1 +諧波 1 +諸如 1 +諸暨 1 +諸河 1 +諺言 1 +諾丁 1 +諾域 1 +諾娃 1 +諾曼 1 +諾爾 1 +諾瓦 1 +謀取 1 +謀士 1 +謀求 1 +謀職 1 +謁者 1 +謊言 1 +謙卑 1 +講完 1 +講究 1 +講談 1 +講道 1 +謝世 1 +謝列 1 +謝爾 1 +謝瓦 1 +謝蓋 1 +謹慎 1 +譜代 1 +警務 1 +警句 1 +警告 1 +警員 1 +警戒 1 +警衛 1 +警覺 1 +警鐘 1 +譯作 1 +譯員 1 +譯場 1 +譯本 1 +議席 1 +譴責 1 +護佑 1 +護城 1 +護墊 1 +護送 1 +讀取 1 +讀法 1 +變動 1 +變差 1 +變色 1 +變調 1 +變身 1 +變遷 1 +變革 1 +讓步 1 +讓開 1 +讚喻 1 +讚揚 1 +讚美 1 +讚譽 1 +谷山 1 +谷氨 1 +豆瓣 1 +豎立 1 +豎起 1 +豐久 1 +豐厚 1 +豐城 1 +豐臣 1 +豐隆 1 +象數 1 +象晉 1 +豢養 1 +豪宅 1 +豪門 1 +豫南 1 +豬圈 1 +豬油 1 +豬籠 1 +豬肉 1 +貓咪 1 +貓囒 1 +貓科 1 +貝加 1 +貝南 1 +貝斯 1 +貝格 1 +貝碧 1 +貝納 1 +貝都 1 +貝類 1 +貞昌 1 +貞潔 1 +貞觀 1 +負擔 1 +負粒 1 +負芻 1 +負荷 1 +負面 1 +負額 1 +財之 1 +財經 1 +財落 1 +貢品 1 +貢哥 1 +貢嘎 1 +貢巴 1 +貧乏 1 +貧窮 1 +貧鈾 1 +貨品 1 +貨機 1 +販賣 1 +貪圖 1 +貪婪 1 +貪心 1 +貪瀆 1 +貫徹 1 +貫穿 1 +貫通 1 +責怪 1 +責難 1 +貴子 1 +貴築 1 +貴賓 1 +貴陽 1 +貴霜 1 +貶意 1 +買入 1 +買賣 1 +費曼 1 +費用 1 +費盡 1 +費羅 1 +費雷 1 +貼身 1 +賀特 1 +賀立 1 +賄選 1 +資政 1 +資陽 1 +賈亞 1 +賈克 1 +賈多 1 +賈氏 1 +賓客 1 +賓尼 1 +賓州 1 +賓登 1 +賞識 1 +賠禮 1 +賡臣 1 +賢思 1 +賣出 1 +賣到 1 +賣地 1 +賣家 1 +賣掉 1 +賣空 1 +賤女 1 +賤民 1 +質詢 1 +賭徒 1 +賭檔 1 +賴宣 1 +賴滕 1 +賺取 1 +賺錢 1 +購得 1 +購置 1 +賽亞 1 +賽場 1 +賽拉 1 +賽普 1 +賽爾 1 +賽車 1 +賽道 1 +贈送 1 +贊博 1 +贊成 1 +贊比 1 +贊諾 1 +贏家 1 +贖回 1 +赤坂 1 +赤壁 1 +赤樹 1 +赤狐 1 +赤鱲 1 +赫伯 1 +赫塔 1 +赫姆 1 +赫斯 1 +赫曼 1 +赫比 1 
+赫盧 1 +赫莫 1 +赫雷 1 +赫魯 1 +走上 1 +走到 1 +走勢 1 +走漏 1 +走私 1 +起事 1 +起伏 1 +起初 1 +起名 1 +起因 1 +起始 1 +起建 1 +起彼 1 +起止 1 +起死 1 +起碼 1 +起端 1 +起舞 1 +起落 1 +起訖 1 +起降 1 +起點 1 +超出 1 +超導 1 +超強 1 +超我 1 +超時 1 +超武 1 +超然 1 +超重 1 +超齡 1 +越亮 1 +越共 1 +越前 1 +越好 1 +越弱 1 +越戰 1 +越早 1 +越暗 1 +越牆 1 +越發 1 +越近 1 +越過 1 +趕往 1 +趙氏 1 +趣事 1 +趨勢 1 +趨於 1 +足不 1 +足夠 1 +足見 1 +足跡 1 +趾爪 1 +趾骨 1 +跋扈 1 +跑壘 1 +跑步 1 +跑車 1 +跑馬 1 +跟操 1 +跟班 1 +跟蹤 1 +跟進 1 +跟隨 1 +跨國 1 +跨度 1 +跨步 1 +跨足 1 +跨過 1 +路士 1 +路撒 1 +路支 1 +路政 1 +路殊 1 +路濟 1 +路綫 1 +路網 1 +路透 1 +路過 1 +路障 1 +路面 1 +跳動 1 +跳槽 1 +跳過 1 +跳遠 1 +跳高 1 +踏上 1 +踏入 1 +踢進 1 +躁動 1 +躍升 1 +身受 1 +身型 1 +身大 1 +身旁 1 +身為 1 +身無 1 +身而 1 +身著 1 +身軀 1 +身高 1 +躬耕 1 +躲到 1 +車上 1 +車仁 1 +車型 1 +車士 1 +車外 1 +車尾 1 +車市 1 +車廠 1 +車手 1 +車票 1 +車程 1 +車窗 1 +車系 1 +車號 1 +車費 1 +車路 1 +車迷 1 +車頭 1 +軋箏 1 +軌跡 1 +軍中 1 +軍備 1 +軍功 1 +軍務 1 +軍委 1 +軍師 1 +軍援 1 +軍方 1 +軍服 1 +軍營 1 +軍艦 1 +軍裝 1 +軍階 1 +軍需 1 +軒轅 1 +軟化 1 +軟硬 1 +軟骨 1 +軸心 1 +較低 1 +較佳 1 +較厚 1 +較快 1 +較深 1 +載人 1 +載淳 1 +輔佐 1 +輕微 1 +輕易 1 +輕軌 1 +輕鐵 1 +輕髻 1 +輕鬆 1 +輝彥 1 +輪周 1 +輪廓 1 +輪流 1 +輪船 1 +輪迴 1 +輯錄 1 +輸掉 1 +輸精 1 +輸血 1 +輸送 1 +輻轍 1 +輾轉 1 +轉交 1 +轉任 1 +轉動 1 +轉化 1 +轉向 1 +轉型 1 +轉差 1 +轉往 1 +轉念 1 +轉播 1 +轉會 1 +轉正 1 +轉角 1 +轉賣 1 +轉赴 1 +辛勞 1 +辛哈 1 +辛基 1 +辛奈 1 +辛納 1 +辛辛 1 +辛那 1 +辟邪 1 +辦學 1 +辦有 1 +辨別 1 +辨明 1 +辨識 1 +辭典 1 +辭官 1 +辭歲 1 +辭辛 1 +辯證 1 +辰國 1 +辰男 1 +農事 1 +農墾 1 +農書 1 +農林 1 +農舍 1 +迅即 1 +迅猛 1 +迎神 1 +迎賓 1 +迎送 1 +迎面 1 +近似 1 +近侍 1 +近平 1 +近日 1 +近東 1 +近海 1 +近現 1 +近親 1 +近鄰 1 +返樸 1 +迢迢 1 +迦南 1 +迦牟 1 +迦罕 1 +迪士 1 +迪尼 1 +迪恩 1 +迪文 1 +迪歐 1 +迪比 1 +迪沙 1 +迪特 1 +迪生 1 +迪米 1 +迪納 1 +迫切 1 +迴流 1 +迷你 1 +迷唐 1 +迷路 1 +追兇 1 +追回 1 +追封 1 +追尋 1 +追尾 1 +追思 1 +追憶 1 +追查 1 +追根 1 +追殺 1 +追求 1 +追究 1 +追討 1 +追述 1 +退位 1 +退回 1 +退夷 1 +退居 1 +退敵 1 +退隱 1 +送來 1 +送到 1 +送回 1 +送殯 1 +送給 1 +送院 1 +逃亡 1 +逃奔 1 +逃至 1 +逃跑 1 +逆戟 1 +逍遙 1 +透徹 1 +透支 1 +透水 1 +透視 1 +透鏡 1 +逐客 1 +途中 1 +途人 1 +途經 1 +這兒 1 +這時 1 +通俗 1 +通商 1 +通天 1 +通宏 1 +通州 1 +通渭 1 +通貨 1 +通通 1 +通運 1 +通靈 1 +通風 1 +逛街 1 +速往 1 +速銷 1 +造價 1 +造反 1 +造就 1 +造幣 1 +造福 1 +造血 1 +造訪 1 +造謠 1 +逢吉 1 +連串 1 +連克 1 +連坐 1 +連年 1 +連座 1 +連德 1 +連成 1 +連拍 1 +連筆 1 +連篇 1 +連結 1 +連絡 1 +連通 1 +連進 1 +連餓 1 +週末 1 +週邊 1 +進位 1 
+進來 1 +進動 1 +進犯 1 +逼使 1 +逼停 1 +逼到 1 +逾期 1 +遂起 1 +遇上 1 +遇刺 1 +遇有 1 +遇陛 1 +遇難 1 +遊憩 1 +遊擊 1 +遊歷 1 +遊艇 1 +遊覽 1 +遊說 1 +遊離 1 +運回 1 +運往 1 +運煤 1 +運算 1 +運糧 1 +運補 1 +運載 1 +遍布 1 +過冷 1 +過剩 1 +過多 1 +過往 1 +過敏 1 +過橋 1 +過濾 1 +過甚 1 +過繼 1 +過苛 1 +過路 1 +過頭 1 +道世 1 +道中 1 +道具 1 +道刺 1 +道墟 1 +道士 1 +道學 1 +道宇 1 +道安 1 +道格 1 +道歉 1 +道綽 1 +道羅 1 +道義 1 +道靜 1 +達上 1 +達人 1 +達信 1 +達倉 1 +達克 1 +達加 1 +達古 1 +達多 1 +達恩 1 +達拉 1 +達拏 1 +達拖 1 +達智 1 +達母 1 +達濠 1 +達科 1 +達章 1 +達米 1 +達羅 1 +達華 1 +達賴 1 +達農 1 +違背 1 +遙陽 1 +遜位 1 +遞交 1 +遞增 1 +遠呂 1 +遠嫁 1 +遠揚 1 +遠日 1 +遠洋 1 +遠處 1 +遠遠 1 +遠離 1 +遣返 1 +適之 1 +適用 1 +遭殃 1 +遮天 1 +遮蔭 1 +遮陰 1 +遲遲 1 +遷出 1 +遷居 1 +遷校 1 +選上 1 +選修 1 +選定 1 +選用 1 +選美 1 +選訓 1 +選調 1 +選進 1 +選題 1 +遺物 1 +遺留 1 +遺腹 1 +遺迹 1 +遺骸 1 +遼西 1 +遼闊 1 +避禍 1 +避開 1 +邁克 1 +邁向 1 +邁阿 1 +還擊 1 +還有 1 +邊區 1 +邗江 1 +那修 1 +那峨 1 +那提 1 +那時 1 +那普 1 +那曲 1 +那瑞 1 +那瓦 1 +那罕 1 +那順 1 +邦國 1 +邦德 1 +邦蒂 1 +邦達 1 +邪惡 1 +邪神 1 +邪馬 1 +邱家 1 +邳縣 1 +邵伯 1 +邵氏 1 +郊狼 1 +郡區 1 +郡縣 1 +郡艾 1 +部位 1 +部字 1 +部將 1 +部首 1 +郪江 1 +郫縣 1 +郭家 1 +郵報 1 +郵輪 1 +都因 1 +都城 1 +都察 1 +都尉 1 +都斯 1 +都會 1 +都有 1 +都督 1 +都靈 1 +鄂倫 1 +鄂溫 1 +鄂霍 1 +鄉內 1 +鄉團 1 +鄉村 1 +鄉長 1 +鄰域 1 +鄰居 1 +鄰里 1 +酃縣 1 +酊大 1 +配上 1 +配件 1 +配備 1 +配器 1 +配有 1 +配角 1 +酒家 1 +酒杯 1 +酒樓 1 +酒鬼 1 +酩酊 1 +酵母 1 +酷似 1 +酷刑 1 +酸根 1 +酸甘 1 +酸銨 1 +酸鎂 1 +醉醺 1 +醋酸 1 +醫書 1 +醫科 1 +醫術 1 +醬貨 1 +醴陵 1 +醺醺 1 +釀成 1 +釀造 1 +采巴 1 +釉色 1 +釋出 1 +里先 1 +里內 1 +里利 1 +里南 1 +里卡 1 +里士 1 +里多 1 +里夫 1 +里姆 1 +里希 1 +里德 1 +里拉 1 +里施 1 +里森 1 +里波 1 +里港 1 +里納 1 +里維 1 +里茨 1 +里西 1 +里賽 1 +里迢 1 +里達 1 +里馬 1 +重創 1 +重力 1 +重回 1 +重復 1 +重心 1 +重情 1 +重播 1 +重核 1 +重物 1 +重獲 1 +重現 1 +重生 1 +重用 1 +重疊 1 +重禮 1 +重組 1 +重義 1 +重考 1 +重製 1 +重複 1 +重見 1 +重讀 1 +重鎮 1 +重開 1 +重陽 1 +重音 1 +重鳳 1 +野史 1 +野外 1 +野心 1 +野戰 1 +野木 1 +野球 1 +野菜 1 +量壽 1 +量度 1 +量洪 1 +金剛 1 +金寶 1 +金帶 1 +金幣 1 +金平 1 +金德 1 +金斯 1 +金森 1 +金氏 1 +金泉 1 +金浦 1 +金湖 1 +金牛 1 +金獎 1 +金箔 1 +金羅 1 +金美 1 +金華 1 +金質 1 +金邊 1 +金銀 1 +金錢 1 +金門 1 +金靴 1 +金頂 1 +金魚 1 +金鵰 1 +釜山 1 +針劑 1 +釧路 1 +鈺源 1 +鉑金 1 +銀杏 1 +銀熊 1 +銀牌 1 +銀白 1 +銀紅 1 +銀色 1 +銅仁 1 +銅像 1 +銅削 1 +銅斧 1 +銅柄 1 +銅臿 1 +銅製 1 +銅銎 1 +銅錛 1 +銅錢 1 +銘皖 1 +銘銘 1 +銜稱 1 +銳利 1 +銷毀 1 +銷量 1 +鋪成 1 +鋪有 1 +鋸齒 1 +鋼板 1 +錄影 1 +錄得 1 
+錄放 1 +錘樹 1 +錢上 1 +錦俊 1 +錦承 1 +錦江 1 +錦田 1 +錫伯 1 +錫勇 1 +錫昌 1 +錯視 1 +錯覺 1 +錳礦 1 +鍊金 1 +鍋中 1 +鍋內 1 +鍋爐 1 +鍛鍊 1 +鍾情 1 +鎖妖 1 +鎖閉 1 +鎮守 1 +鎮岳 1 +鎮朔 1 +鎮賚 1 +鎮里 1 +鎮靜 1 +鎳銀 1 +鏡波 1 +鏡湖 1 +鐵削 1 +鐵匾 1 +鐵木 1 +鐵棍 1 +鐵民 1 +鐵爐 1 +鐵管 1 +鐵釘 1 +鐵銹 1 +鐵錛 1 +鑑別 1 +鑑定 1 +鑑泉 1 +鑑證 1 +鑒定 1 +鑫新 1 +鑽入 1 +鑽出 1 +鑽探 1 +鑿出 1 +長凳 1 +長史 1 +長婁 1 +長孫 1 +長尾 1 +長岡 1 +長崎 1 +長廊 1 +長廷 1 +長方 1 +長榮 1 +長毛 1 +長治 1 +長溝 1 +長滿 1 +長瑪 1 +長盛 1 +長笛 1 +長篇 1 +長編 1 +長跑 1 +長頸 1 +長髮 1 +門修 1 +門坎 1 +門廳 1 +門式 1 +閃米 1 +閃長 1 +閃電 1 +閉日 1 +開價 1 +開光 1 +開啟 1 +開場 1 +開墾 1 +開學 1 +開工 1 +開往 1 +開戰 1 +開拓 1 +開挖 1 +開支 1 +開教 1 +開業 1 +開槍 1 +開球 1 +開瑞 1 +開票 1 +開車 1 +開辦 1 +開錄 1 +閑聊 1 +閑談 1 +閒言 1 +閒語 1 +間斷 1 +間碟 1 +間距 1 +閘口 1 +閘機 1 +閩侯 1 +閩南 1 +闖進 1 +關中 1 +關斷 1 +關緊 1 +關連 1 +關重 1 +闡述 1 +阡陌 1 +阪神 1 +防凍 1 +防止 1 +防盜 1 +防護 1 +阻塞 1 +阻撓 1 +阻隔 1 +阿一 1 +阿仙 1 +阿信 1 +阿修 1 +阿內 1 +阿勒 1 +阿勝 1 +阿勞 1 +阿基 1 +阿堯 1 +阿奇 1 +阿宋 1 +阿密 1 +阿寧 1 +阿尼 1 +阿布 1 +阿斗 1 +阿普 1 +阿曼 1 +阿東 1 +阿比 1 +阿波 1 +阿猴 1 +阿瑜 1 +阿穆 1 +阿納 1 +阿羅 1 +阿耳 1 +阿聯 1 +阿育 1 +阿茲 1 +阿諾 1 +阿賈 1 +阿赫 1 +阿連 1 +阿道 1 +阿達 1 +阿里 1 +阿隆 1 +陀斯 1 +陀耶 1 +附上 1 +附加 1 +附蟲 1 +附表 1 +附身 1 +降將 1 +降格 1 +降水 1 +降班 1 +降臨 1 +降魔 1 +限定 1 +限時 1 +陡壁 1 +院士 1 +院子 1 +院落 1 +除冰 1 +除夕 1 +除此 1 +除非 1 +陪葬 1 +陪都 1 +陰天 1 +陰暗 1 +陰陽 1 +陳國 1 +陳屍 1 +陳相 1 +陳述 1 +陵園 1 +陵蘭 1 +陶恩 1 +陷落 1 +陸仔 1 +陸域 1 +陸行 1 +陽安 1 +陽明 1 +隆亨 1 +隆坡 1 +隆坦 1 +隆基 1 +隆拿 1 +隆納 1 +隆索 1 +隆赫 1 +隊列 1 +隊名 1 +隔日 1 +隔開 1 +隕星 1 +隕鐵 1 +際春 1 +隠居 1 +隨丁 1 +隨便 1 +隨同 1 +隨往 1 +隨時 1 +隨軍 1 +隨隊 1 +險些 1 +險要 1 +隱含 1 +隱姓 1 +隱居 1 +隱性 1 +隱私 1 +隻身 1 +雄師 1 +雄獅 1 +雅克 1 +雅加 1 +雅可 1 +雅各 1 +雅君 1 +雅福 1 +集寧 1 +集結 1 +集聚 1 +雌性 1 +雌獸 1 +雌鯨 1 +雙十 1 +雙子 1 +雙收 1 +雙江 1 +雜姓 1 +雜糧 1 +雜處 1 +雜食 1 +雞腿 1 +雞頭 1 +離別 1 +離域 1 +離場 1 +離子 1 +離島 1 +離群 1 +離職 1 +難吃 1 +難得 1 +難攻 1 +難過 1 +雨季 1 +雨後 1 +雪上 1 +雪佛 1 +雪兒 1 +雪崩 1 +雪弟 1 +雪梅 1 +雲中 1 +雲亭 1 +雲岩 1 +雲松 1 +雲里 1 +零件 1 +零部 1 +零食 1 +雷他 1 +雷切 1 +雷利 1 +雷姆 1 +雷定 1 +雷托 1 +雷斯 1 +雷昂 1 +雷曼 1 +雷格 1 +雷特 1 +雷王 1 +雷羅 1 +雷蒂 1 +雷西 1 +雷雨 1 +電信 1 +電器 1 +電極 1 +電氣 1 +電瓶 1 +電線 1 +電通 1 +電邀 1 +需時 1 +霆鋒 1 +震寰 1 +震波 1 +震災 1 +霍亂 1 +霍伊 1 +霍夫 1 +霍姆 1 +霍巴 1 +霍斯 1 +霍次 1 +露出 1 +露比 1 +露臉 1 +露西 1 +霸佔 1 +霸權 1 
+靈前 1 +靈力 1 +靈性 1 +靈感 1 +靈柩 1 +靈異 1 +靈籤 1 +靈長 1 +靈魂 1 +青梅 1 +青森 1 +青睞 1 +青訓 1 +青金 1 +靖雯 1 +靜安 1 +靜岡 1 +靜華 1 +非鯽 1 +靠右 1 +靠左 1 +面具 1 +面向 1 +面貌 1 +革除 1 +鞦韆 1 +韃靼 1 +韋塔 1 +韋契 1 +韋德 1 +韋拉 1 +韋拿 1 +韋比 1 +韋科 1 +韓氏 1 +韓浜 1 +音律 1 +音色 1 +音量 1 +音高 1 +韶之 1 +響號 1 +頂上 1 +頂尖 1 +頂峰 1 +頂端 1 +頂級 1 +項鏈 1 +順宗 1 +順岸 1 +順德 1 +順應 1 +順懷 1 +順治 1 +順滑 1 +順陽 1 +頌平 1 +頌揚 1 +預估 1 +預告 1 +預知 1 +預示 1 +預約 1 +頑石 1 +頒給 1 +頗多 1 +頗大 1 +頗有 1 +頗盛 1 +頗豐 1 +領事 1 +領取 1 +領奏 1 +領航 1 +領軍 1 +領隊 1 +頜形 1 +頜翼 1 +頜腔 1 +頜鯉 1 +頭上 1 +頭前 1 +頭型 1 +頭士 1 +頭尾 1 +頭槌 1 +頭版 1 +頭盔 1 +頭紗 1 +頭門 1 +頭髮 1 +頸部 1 +頸長 1 +頸鹿 1 +頹垣 1 +頻寬 1 +頻散 1 +頻繁 1 +頻頻 1 +題獻 1 +題記 1 +額外 1 +額度 1 +願違 1 +類別 1 +類固 1 +顯光 1 +顯徑 1 +顯現 1 +顯靈 1 +風化 1 +風尚 1 +風波 1 +風行 1 +風間 1 +風雨 1 +風雪 1 +飛往 1 +飛抵 1 +飛毛 1 +飛沫 1 +飛碟 1 +飛鏢 1 +飛靶 1 +飛鳥 1 +飛龍 1 +食人 1 +食肆 1 +食肉 1 +食蟲 1 +食鹽 1 +飲茶 1 +飼料 1 +飼草 1 +飽和 1 +飽經 1 +飾曲 1 +飾物 1 +餃子 1 +養份 1 +養大 1 +養女 1 +養母 1 +養父 1 +養精 1 +養育 1 +養菊 1 +養蠶 1 +餐車 1 +餘熱 1 +餘眾 1 +餘萬 1 +館前 1 +館名 1 +館址 1 +饑餓 1 +饒平 1 +饕餮 1 +首仗 1 +首個 1 +首名 1 +首場 1 +首屈 1 +首席 1 +首戰 1 +首批 1 +首日 1 +首映 1 +首條 1 +首艦 1 +首讀 1 +香世 1 +香亭 1 +香儂 1 +香吉 1 +香味 1 +香坊 1 +香塍 1 +香水 1 +香洲 1 +香火 1 +香織 1 +馬上 1 +馬修 1 +馬內 1 +馬六 1 +馬匹 1 +馬台 1 +馬喇 1 +馬圈 1 +馬塔 1 +馬奇 1 +馬威 1 +馬托 1 +馬提 1 +馬特 1 +馬球 1 +馬粦 1 +馬約 1 +馬莎 1 +馬賽 1 +馬赫 1 +馬路 1 +馬雅 1 +馬雍 1 +馬鞍 1 +馬黑 1 +馳名 1 +馴化 1 +駐任 1 +駐地 1 +駐防 1 +駕崩 1 +駙馬 1 +駛入 1 +駛過 1 +駿業 1 +騁遠 1 +騎馬 1 +騏一 1 +騙徒 1 +騰出 1 +騰訊 1 +騷擾 1 +驗屍 1 +驗票 1 +驗證 1 +驗電 1 +驚人 1 +驚動 1 +驚喜 1 +驚嘆 1 +驚訝 1 +驚醒 1 +驟減 1 +驟逝 1 +驢肉 1 +骨幹 1 +骯髒 1 +骷髏 1 +體側 1 +體外 1 +體委 1 +體工 1 +體教 1 +體會 1 +體溫 1 +髖骨 1 +高下 1 +高出 1 +高升 1 +高在 1 +高地 1 +高大 1 +高峰 1 +高座 1 +高手 1 +高效 1 +高新 1 +高杉 1 +高梅 1 +高檔 1 +高清 1 +高漲 1 +高潮 1 +高熱 1 +高燥 1 +高琦 1 +高盧 1 +高聳 1 +高處 1 +高買 1 +高質 1 +高超 1 +高雄 1 +高高 1 +髮生 1 +髮辮 1 +鬆髻 1 +鬚鯨 1 +鬥雞 1 +鬧出 1 +鬼影 1 +鬼怪 1 +鬼道 1 +魁智 1 +魅惑 1 +魏國 1 +魏救 1 +魏斯 1 +魏氏 1 +魏澤 1 +魔力 1 +魔界 1 +魔石 1 +魔鬼 1 +魚尾 1 +魚腹 1 +魚苗 1 +魚類 1 +魯伯 1 +魯克 1 +魯國 1 +魯敉 1 +魯曉 1 +魯木 1 +魯特 1 +魯瓊 1 +魯登 1 +魯良 1 +魯茨 1 +魯西 1 +魯道 1 +鮑亞 1 +鮑克 1 +鮑爾 1 +鮑維 1 +鮑里 1 +鮑魚 1 +鮮有 1 +鮮用 1 +鮮虞 1 +鯉齒 1 +鰓蓋 1 +鰭條 1 +鰺沢 1 +鱗甲 1 +鱗蟒 1 +鱗骨 1 +鳥獸 1 +鳥種 1 +鳳彬 1 +鳳花 1 
+鳴叫 1 +鳴放 1 +鳴道 1 +鴛鴦 1 +鴻南 1 +鴻基 1 +鴻章 1 +鴻績 1 +鴻華 1 +鴻超 1 +鴻逵 1 +鴻銘 1 +鹽城 1 +鹽州 1 +鹽酸 1 +鹿鼎 1 +麗卡 1 +麗晶 1 +麗泰 1 +麗特 1 +麗珍 1 +麗金 1 +麗閣 1 +麗雨 1 +麗魚 1 +麥加 1 +麥卡 1 +麥拉 1 +麥格 1 +麥當 1 +麥芽 1 +麥迪 1 +麩氨 1 +麵團 1 +麵皮 1 +麻呂 1 +麻城 1 +麻塞 1 +麻將 1 +麻布 1 +麻木 1 +麻痹 1 +黃岡 1 +黃巾 1 +黃昏 1 +黃沙 1 +黃河 1 +黃蜂 1 +黎家 1 +黎明 1 +黎波 1 +黎筍 1 +黎絲 1 +黑奴 1 +黑帶 1 +黑手 1 +黑文 1 +黑暗 1 +黑木 1 +黑板 1 +黑死 1 +黑海 1 +黑衫 1 +黑錢 1 +黑鐵 1 +黑雲 1 +黑髮 1 +黑麻 1 +默古 1 +默史 1 +默比 1 +默生 1 +默默 1 +黛安 1 +黛絲 1 +點陣 1 +點頭 1 +點點 1 +黨團 1 +黨委 1 +黨校 1 +黨歌 1 +黨衛 1 +黨部 1 +黨魁 1 +鼎灶 1 +鼎芬 1 +鼎金 1 +鼓手 1 +鼬鼠 1 +鼻栓 1 +齊國 1 +齊放 1 +齊蒂 1 +齊蓋 1 +齒狀 1 +齒輪 1 +齒鼩 1 +齲齒 1 +龍台 1 +龍女 1 +龍文 1 +龍眼 1 +龍耳 1 +龍華 1 +龍頭 1 +龐特 1 +龐貝 1 +龜茲 1 diff --git a/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/label-map b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/label-map new file mode 100644 index 0000000000000000000000000000000000000000..d46b2d1bcbc014e63b3447e64d68cd8becbdd739 --- /dev/null +++ b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/label-map @@ -0,0 +1,43 @@ +42 +punct 12965 +nmod 11147 +nsubj 7134 +obj 6016 +nummod 4732 +case:suff 4179 +acl 4163 +root 3797 +mark 3445 +det 3434 +advmod 2962 +case 2739 +case:dec 2517 +conj 2421 +obl 2000 +dep 1998 +mark:relcl 1833 +clf 1722 +ccomp 1655 +amod 1525 +xcomp 1382 +acl:relcl 1356 +cop 1349 +cc 1334 +nmod:tmod 1199 +appos 1089 +case:aspect 718 +aux 675 +case:pref 569 +aux:pass 324 +csubj 280 +flat:foreign 250 +nsubj:pass 211 +discourse 151 +aux:caus 149 +advcl 125 +mark:advb 79 +iobj 61 +dislocated 45 +mark:comp 17 +csubj:pass 5 +vocative 1 diff --git a/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/lcword-map b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/lcword-map new file mode 100644 index 0000000000000000000000000000000000000000..4315174293f17157ddb0dfacedef874877c8bd28 --- /dev/null +++ b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/lcword-map @@ -0,0 +1,16263 @@ +16262 +, 5851 +的 4289 +. 
3759 +在 1273 +年 1165 +9999 1070 +、 952 +是 918 +為 887 +一 863 +於 680 +99 647 +和 639 +9 617 +了 614 +人 467 +個 466 +月 456 +有 453 +他 439 +( 429 +) 429 +與 380 +中 376 +日 356 +」 321 +「 320 +被 315 +這 300 +會 258 +並 255 +以 253 +而 245 +也 244 +上 228 +中國 218 +由 215 +《 213 +》 213 +之 211 +兩 203 +後 202 +及 191 +時 188 +位 186 +· 183 +999 178 +等 175 +到 172 +但 162 +對 158 +大 157 +此 157 +不 156 +其 155 +所 150 +種 143 +或 140 +將 139 +次 132 +美國 131 +成 130 +者 127 +至 125 +該 123 +區 118 +開始 118 +部 117 +三 116 +家 116 +可以 115 +她 115 +都 114 +來 113 +因 113 +國 109 +人口 108 +軍 107 +市 104 +使用 102 +省 102 +從 101 +名 98 +著 97 +則 95 +多 94 +用 94 +日本 93 +沒有 93 +地 92 +曾 92 +第一 92 +他們 91 +州 90 +公司 88 +就 88 +性 88 +由於 88 +其中 87 +地區 87 +新 87 +稱 87 +國家 86 +政府 86 +: 84 +已 84 +主要 83 +小 82 +; 81 +世界 81 +可 81 +大學 81 +下 80 +不同 79 +自 79 +香港 79 +縣 77 +自己 77 +前 76 +因為 76 +研究 76 +總 76 +最 75 +面積 75 +李 74 +還 73 +向 72 +王 72 +進行 72 +它 71 +包括 69 +站 69 +四 68 +號 67 +當時 66 +這些 66 +部分 66 +工作 65 +米 65 +認為 65 +也是 64 +以及 64 +學 64 +村 64 +發現 64 +說 64 +作 63 +又 62 +屬 62 +平方公里 62 +中華 61 +同時 60 +學院 60 +條 60 +成立 59 +第二 59 +二 58 +五 58 +亦 58 +代表 58 +發展 58 +發生 58 +美 58 +能 58 +之後 57 +使 57 +社會 57 +要 57 +一些 56 +人民 56 +內 56 +其他 56 +約 56 +世紀 54 +元 54 +場 54 +過 54 +建築 53 +為了 53 +線 53 +只 52 +張 52 +把 52 +獲得 52 +目前 52 +台 51 +文化 51 +英國 51 +重要 51 +中心 50 +但是 50 +局 50 +更 50 +許多 50 +之間 49 +可能 49 +如 49 +歷史 49 +遊戲 49 +公里 48 +共 48 +帝國 48 +期間 48 +歲 48 +處 48 +音樂 48 +黨 48 +一般 47 +年代 47 +根據 47 +行星 47 +隊 47 +電影 47 +政治 46 +鐵路 46 +城市 45 +故事 45 +組織 45 +便 44 +學校 44 +所有 44 +科學 44 +英 44 +- 43 +任 43 +作品 43 +指 43 +最後 43 +機 43 +語 43 +通過 43 +間 43 +關係 43 +已經 42 +建立 42 +時間 42 +當 42 +電視 42 +共和 41 +後來 41 +比 41 +管理 41 +表示 41 +讓 41 +通常 41 +高 41 +出現 40 +影響 40 +成功 40 +戰爭 40 +提供 40 +系統 40 +動物 39 +地方 39 +就是 39 +座 39 +設計 39 +負責 39 +鎮 39 +長 39 +館 39 +卻 38 +國際 38 +德國 38 +技術 38 +方面 38 +最終 38 +父親 38 +車站 38 +上海 37 +人物 37 +出 37 +分 37 +台灣 37 +各 37 +層 37 +山 37 +方 37 +河 37 +即 36 +參加 36 +擔任 36 +時期 36 +服務 36 +正式 36 +生活 36 +給 36 +要求 36 +路 36 +運動 36 +9,999 35 +一直 35 +再 35 +單位 35 +委員 35 +很 35 +書 35 +段 35 +民國 
35 +法國 35 +理論 35 +人類 34 +均 34 +女 34 +才 34 +教 34 +文 34 +歐洲 34 +決定 34 +漢 34 +現在 34 +第三 34 +航空 34 +行政 34 +足球 34 +雖然 34 +八 33 +問題 33 +小說 33 +我 33 +教育 33 +製作 33 +不是 32 +保護 32 +全國 32 +北 32 +印度 32 +員 32 +形成 32 +很多 32 +得到 32 +活動 32 +節目 32 +西班牙 32 +主義 31 +寺 31 +屆 31 +島 31 +市鎮 31 +方式 31 +時代 31 +最高 31 +生 31 +街 31 +起 31 +需要 31 +99% 30 +中央 30 +另 30 +另外 30 +器 30 +天 30 +得 30 +控制 30 +擁有 30 +每 30 +產生 30 +經濟 30 +羅馬 30 +進入 30 +隨 30 +仍 29 +公園 29 +具有 29 +去 29 +大陸 29 +式 29 +接受 29 +東 29 +球隊 29 +當地 29 +院 29 +雙 29 +9.99 28 +並且 28 +北京 28 +受到 28 +同 28 +如果 28 +學生 28 +工程 28 +時候 28 +港 28 +物 28 +級 28 +計劃 28 +超過 28 +道 28 +電腦 28 +存在 27 +室 27 +對於 27 +情況 27 +戰鬥 27 +方法 27 +林 27 +機場 27 +比賽 27 +總統 27 +義大利 27 +都是 27 +非 27 +非常 27 +點 27 +人員 26 +做 26 +原因 26 +國民 26 +支持 26 +數 26 +法 26 +派 26 +然而 26 +獨立 26 +甚至 26 +生物 26 +聯合 26 +項 26 +主 25 +兒子 25 +出版 25 +劉 25 +南 25 +巴士 25 +幾 25 +我們 25 +權 25 +海拔 25 +第99 25 +經過 25 +議會 25 +賽 25 +99.99 24 +交通 24 +例如 24 +分布 24 +加入 24 +化 24 +同年 24 +城 24 +大量 24 +於是 24 +族 24 +最大 24 +未 24 +海 24 +湖 24 +生產 24 +皇帝 24 +科 24 +第9 24 +系列 24 +高度 24 +9.9 23 +事件 23 +們 23 +內容 23 +命名 23 +型 23 +宣布 23 +導致 23 +帶 23 +必須 23 +成員 23 +本 23 +正 23 +清朝 23 +演出 23 +無 23 +直接 23 +行為 23 +裡 23 +西 23 +距離 23 +軍事 23 +部隊 23 +鄉 23 +銀行 23 +集團 23 +99,999 22 +一樣 22 +不少 22 +不過 22 +傳統 22 +僅 22 +副 22 +反對 22 +單 22 +增加 22 +它們 22 +思想 22 +有關 22 +業 22 +此外 22 +母親 22 +水 22 +灣 22 +版 22 +紐約 22 +組成 22 +結構 22 +聯盟 22 +聯賽 22 +能力 22 +華 22 +設 22 +語言 22 +附近 22 +除 22 +一起 21 +作用 21 +出生 21 +制 21 +力 21 +受 21 +古 21 +只有 21 +唯一 21 +地位 21 +府 21 +廣泛 21 +植物 21 +海軍 21 +無法 21 +獲 21 +率 21 +球 21 +環境 21 +紀念 21 +結束 21 +舉行 21 +角色 21 +議員 21 +選舉 21 +里 21 +量 21 +韓 21 +體 21 +主席 20 +仍然 20 +六 20 +冠軍 20 +出任 20 +分子 20 +原子 20 +參與 20 +地下 20 +城鎮 20 +天津 20 +工業 20 +希臘 20 +度 20 +引起 20 +採用 20 +攻擊 20 +整個 20 +文學 20 +文物 20 +朝鮮 20 +東北 20 +核 20 +機構 20 +比較 20 +清 20 +猶太 20 +現代 20 +管轄 20 +範圍 20 +細胞 20 +經常 20 +胡 20 +自治 20 +自由 20 +角 20 +逐漸 20 +重新 20 +類型 20 +不久 19 +不能 19 +代 19 +以上 19 +佔領 19 +全 19 +分別 19 +原 19 +台北 19 +唐 19 +多數 19 +天文 19 +字 19 +巴黎 19 +最早 19 +會議 19 +有些 19 +民族 19 +洋 19 
+結果 19 +繼續 19 +能夠 19 +趙 19 +造成 19 +達 19 +達到 19 +部份 19 +鄭 19 +風格 19 +不會 18 +亞 18 +令 18 +任何 18 +企業 18 +先後 18 +列車 18 +功能 18 +半 18 +取得 18 +合併 18 +外交 18 +子 18 +廣州 18 +戰役 18 +所以 18 +明朝 18 +期 18 +每年 18 +毛 18 +治療 18 +法院 18 +畢業 18 +疾病 18 +相當 18 +節 18 +艦隊 18 +身體 18 +軍隊 18 +進 18 +陳 18 +離開 18 +領導 18 +體育 18 +99.9 17 +七 17 +你 17 +再次 17 +十 17 +名字 17 +大戰 17 +宗教 17 +家族 17 +希望 17 +廣場 17 +想 17 +戰 17 +採取 17 +提出 17 +改 17 +教堂 17 +新聞 17 +星 17 +曲 17 +最初 17 +歐 17 +漫畫 17 +片 17 +物理 17 +特別 17 +發行 17 +經 17 +總部 17 +自然 17 +蘇聯 17 +行動 17 +製造 17 +西北 17 +資料 17 +選擇 17 +那 17 +金 17 +領域 17 +顆 17 +類 17 +飛機 17 +九龍 16 +低 16 +像 16 +共同 16 +利用 16 +制度 16 +前往 16 +創作 16 +勢力 16 +區域 16 +協助 16 +各種 16 +大樓 16 +家庭 16 +實驗 16 +居民 16 +山東 16 +心理 16 +或者 16 +拒絕 16 +步 16 +武器 16 +民主 16 +法律 16 +爆發 16 +狀態 16 +而且 16 +藝術 16 +表現 16 +記者 16 +設有 16 +設立 16 +資源 16 +軌道 16 +過程 16 +道路 16 +還是 16 +革命 16 +首次 16 +高速 16 +下轄 15 +中共 15 +主角 15 +作戰 15 +初 15 +則是 15 +化石 15 +十分 15 +南京 15 +南部 15 +商 15 +噸 15 +回到 15 +國內 15 +國王 15 +地球 15 +基督 15 +大廈 15 +大約 15 +太陽 15 +女兒 15 +女性 15 +如此 15 +學習 15 +完全 15 +實際 15 +常 15 +常見 15 +幾乎 15 +應用 15 +承認 15 +投資 15 +指出 15 +指揮 15 +普查 15 +未來 15 +東南 15 +橋 15 +此後 15 +火星 15 +版本 15 +牠們 15 +發表 15 +白 15 +直到 15 +碼頭 15 +科技 15 +立法 15 +組 15 +統治 15 +老 15 +職業 15 +著名 15 +蒙古 15 +西部 15 +調查 15 +跟 15 +路線 15 +車輛 15 +農業 15 +這樣 15 +酒 15 +鐵道 15 +集 15 +/ 14 +99999 14 +999萬 14 +一定 14 +交易 14 +人們 14 +今 14 +以來 14 +位置 14 +使得 14 +俄羅斯 14 +俱樂部 14 +傳播 14 +兒童 14 +公主 14 +劇 14 +北部 14 +博物 14 +合作 14 +基本 14 +境內 14 +外 14 +太平 14 +失去 14 +完成 14 +容易 14 +密度 14 +專業 14 +市場 14 +幫助 14 +建造 14 +抗 14 +擊敗 14 +旗 14 +曾經 14 +有限 14 +架 14 +案 14 +棲息 14 +波蘭 14 +澳門 14 +營運 14 +特色 14 +獎 14 +男 14 +相同 14 +看到 14 +簡稱 14 +系 14 +統計 14 +網路 14 +聯邦 14 +色 14 +董事 14 +規模 14 +視 14 +解決 14 +言 14 +起來 14 +車 14 +這裡 14 +進攻 14 +開發 14 +限制 14 +顯示 14 +黃 14 +99萬 13 +九 13 +倫敦 13 +全部 13 +公路 13 +公開 13 +其後 13 +初期 13 +加上 13 +博士 13 +司令 13 +同意 13 +因而 13 +圖書 13 +土地 13 +埃及 13 +基礎 13 +堂 13 +墨西哥 13 +天主 13 +妻子 13 +娛樂 13 +建 13 +建設 13 +形 13 +形式 13 +從事 13 +手 13 +打 13 +改變 13 +故 13 +教會 13 +數學 13 +數據 13 +數量 13 +早期 13 +更多 13 
+東京 13 +梁 13 +樂團 13 +樓 13 +模式 13 +死 13 +死亡 13 +每個 13 +水平 13 +流域 13 +準備 13 +物種 13 +物質 13 +王國 13 +玩家 13 +男性 13 +當選 13 +病 13 +目 13 +目標 13 +相關 13 +知識 13 +社 13 +第四 13 +紀錄 13 +統一 13 +舊 13 +街道 13 +設定 13 +身份 13 +較 13 +辦公 13 +速度 13 +運輸 13 +郡 13 +項目 13 +食物 13 +馬 13 +---- 12 +一帶 12 +上帝 12 +且 12 +中學 12 +中部 12 +之前 12 +京 12 +人數 12 +什麼 12 +以下 12 +份 12 +保留 12 +個人 12 +價值 12 +元素 12 +內部 12 +公元 12 +具 12 +半島 12 +原本 12 +反應 12 +反映 12 +可是 12 +商業 12 +嚴重 12 +基地 12 +大型 12 +女子 12 +孫 12 +將軍 12 +尤其 12 +居住 12 +師 12 +帶來 12 +平均 12 +建議 12 +很大 12 +律師 12 +恆星 12 +恐怖 12 +應 12 +據 12 +改革 12 +政策 12 +新加坡 12 +月台 12 +有時 12 +東部 12 +楊 12 +標準 12 +機關 12 +歌手 12 +決賽 12 +汽車 12 +減少 12 +潛艇 12 +熱帶 12 +瑞典 12 +生命 12 +產品 12 +產業 12 +盃 12 +相對 12 +眾 12 +眾多 12 +知道 12 +神 12 +精神 12 +經營 12 +船 12 +該國 12 +變成 12 +賽事 12 +近 12 +透過 12 +遭到 12 +遺址 12 +避免 12 +郭 12 +醫院 12 +重建 12 +重慶 12 +門 12 +電子 12 +? 11 +主張 11 +主持 11 +主教 11 +之中 11 +亨利 11 +人士 11 +以前 11 +以色列 11 +件 11 +伊斯蘭 11 +佔 11 +作者 11 +保持 11 +信仰 11 +先生 11 +全球 11 +出身 11 +創立 11 +創辦 11 +力量 11 +去世 11 +反 11 +取代 11 +召開 11 +周 11 +園 11 +團 11 +大會 11 +奧地利 11 +威脅 11 +季 11 +安全 11 +專輯 11 +帝 11 +平方米 11 +強烈 11 +接近 11 +推出 11 +描述 11 +播放 11 +文字 11 +普遍 11 +末 11 +朱 11 +業務 11 +殖民 11 +江 11 +江蘇 11 +涉及 11 +現時 11 +界 11 +留下 11 +目的 11 +相信 11 +看 11 +社區 11 +福建 11 +管 11 +給予 11 +網站 11 +線路 11 +繼承 11 +英格蘭 11 +見 11 +試圖 11 +資訊 11 +超 11 +邊 11 +部門 11 +隻 11 +面 11 +首 11 +99.9% 10 +99.99% 10 +並非 10 +事 10 +事業 10 +交流 10 +以後 10 +來往 10 +供 10 +俄 10 +儘管 10 +勞動 10 +包含 10 +化學 10 +協會 10 +君主 10 +和平 10 +唱片 10 +圈 10 +國旗 10 +國會 10 +報 10 +報告 10 +威廉 10 +學位 10 +寬 10 +廠 10 +徐 10 +復興 10 +感到 10 +手術 10 +投入 10 +接 10 +推動 10 +播出 10 +支 10 +改名 10 +文明 10 +文藝 10 +明顯 10 +有效 10 +杭州 10 +東方 10 +條件 10 +模型 10 +殺 10 +河流 10 +法庭 10 +波 10 +洲 10 +派遣 10 +演員 10 +演唱 10 +火車 10 +爭議 10 +特定 10 +特徵 10 +特殊 10 +獨特 10 +生長 10 +當中 10 +症 10 +發動 10 +發射 10 +確定 10 +神話 10 +移民 10 +空間 10 +立 10 +篇 10 +終於 10 +結婚 10 +綫 10 +維持 10 +總理 10 +群 10 +若 10 +華盛頓 10 +葡萄 10 +蔡 10 +藏 10 +蘇 10 +衝突 10 +西藏 10 +規定 10 +訓練 10 +記 10 +記載 10 +記錄 10 +話 10 +該市 10 +警察 10 +變化 10 +責任 10 +起源 10 +逝世 10 +運行 10 +醫 
10 +錦標 10 +關於 10 +陸軍 10 +雜誌 10 +需 10 +類似 10 +飛行 10 +首都 10 +駐 10 +'' 9 +一切 9 +一致 9 +上陣 9 +下降 9 +不斷 9 +不滿 9 +中山 9 +丹麥 9 +之外 9 +事務 9 +互相 9 +介紹 9 +來到 9 +健康 9 +光 9 +內閣 9 +全長 9 +公布 9 +其實 9 +再度 9 +出來 9 +出售 9 +分支 9 +到達 9 +動畫 9 +南方 9 +危險 9 +古代 9 +古典 9 +叫 9 +吃 9 +各類 9 +品 9 +國務 9 +團體 9 +地點 9 +執行 9 +塔 9 +士兵 9 +奪得 9 +好 9 +媒體 9 +字母 9 +孩子 9 +學者 9 +寫 9 +對手 9 +就讀 9 +工人 9 +帶領 9 +廟 9 +引擎 9 +強 9 +強大 9 +後期 9 +快速 9 +恢復 9 +意外 9 +戰略 9 +打擊 9 +批評 9 +拍攝 9 +接觸 9 +攻入 9 +放棄 9 +政權 9 +教學 9 +星期 9 +普通 9 +朋友 9 +未能 9 +本人 9 +本身 9 +枚 9 +柏林 9 +核心 9 +森林 9 +標誌 9 +機會 9 +機車 9 +權利 9 +此時 9 +殿 9 +民間 9 +沿海 9 +浙江 9 +湖泊 9 +滿洲 9 +爆炸 9 +特大 9 +狀況 9 +現 9 +瑞士 9 +當局 9 +發布 9 +皇后 9 +皇家 9 +相互 9 +相似 9 +石 9 +破壞 9 +穩定 9 +空中 9 +第五 9 +絕對 9 +經歷 9 +經理 9 +綜合 9 +總督 9 +老師 9 +而是 9 +聯繫 9 +職務 9 +肉 9 +自行 9 +芬蘭 9 +花園 9 +菲律賓 9 +處理 9 +觀眾 9 +解放 9 +評 9 +貢獻 9 +資格 9 +進士 9 +運作 9 +遭 9 +那麼 9 +酒店 9 +金屬 9 +階段 9 +隧道 9 +隨後 9 +集中 9 +電話 9 +青年 9 +頻道 9 +顏色 9 +高等 9 +-- 8 +the 8 +上升 8 +下來 8 +中環 8 +主題 8 +亞洲 8 +人工 8 +以外 8 +佔地 8 +何 8 +依據 8 +俄國 8 +保守 8 +信息 8 +傅 8 +價格 8 +儒 8 +光棍 8 +內地 8 +內戰 8 +公分 8 +分鐘 8 +利益 8 +劇情 8 +劑 8 +加 8 +加拿大 8 +十一 8 +即使 8 +原來 8 +口 8 +古老 8 +同樣 8 +命令 8 +喜歡 8 +因素 8 +圖 8 +圖案 8 +地鐵 8 +報道 8 +增長 8 +大多 8 +大小 8 +大道 8 +始 8 +官 8 +家人 8 +專門 8 +小型 8 +小時 8 +尚 8 +局長 8 +山脈 8 +山西 8 +工藝 8 +工資 8 +左右 8 +巨大 8 +平方千米 8 +幻想 8 +廣播 8 +廣東 8 +廳 8 +往往 8 +從此 8 +德 8 +意義 8 +意見 8 +或是 8 +房屋 8 +批 8 +按 8 +提升 8 +提高 8 +攝影 8 +政 8 +效果 8 +教授 8 +文章 8 +方案 8 +旅遊 8 +早 8 +明確 8 +書記 8 +書院 8 +曹 8 +材料 8 +武 8 +武漢 8 +比如 8 +污染 8 +注意 8 +測試 8 +澳大利亞 8 +澳洲 8 +瀋陽 8 +燃料 8 +爵士 8 +父母 8 +現存 8 +男子 8 +病逝 8 +發明 8 +白色 8 +的話 8 +皆 8 +監督 8 +真正 8 +知 8 +知名 8 +秘書 8 +秦 8 +程度 8 +立方米 8 +符號 8 +等等 8 +粒子 8 +紅 8 +維也納 8 +編碼 8 +編輯 8 +署 8 +羽毛 8 +翻譯 8 +考慮 8 +聚集 8 +股份 8 +臨時 8 +良好 8 +芝加哥 8 +葉 8 +表達 8 +複雜 8 +襲擊 8 +西南 8 +解釋 8 +討論 8 +許 8 +詞 8 +變 8 +貓 8 +賽季 8 +贏得 8 +軟體 8 +轉 8 +通 8 +過去 8 +邨 8 +部長 8 +鄧 8 +重 8 +重大 8 +銀河 8 +鏡 8 +長度 8 +隨即 8 +雄性 8 +靠 8 +餐廳 8 +首府 8 +高中 8 +a 7 +a999 7 +~ 7 +下午 7 +不可 7 +主人 7 +之下 7 +事實 7 +事情 7 +二世 7 +二戰 7 +交換 7 +任命 7 +伊麗莎白 7 +住宅 7 +佛教 7 +保險 7 +倍 7 +傳說 7 +入侵 7 +公共 7 +公務 7 +公爵 7 +共產 7 +典型 7 +分析 7 +列 7 
+前身 7 +創造 7 +匈奴 7 +北角 7 +十字 7 +卡 7 +原著 7 +右 7 +各地 7 +名稱 7 +名義 7 +吳 7 +吸引 7 +命 7 +員工 7 +哲學 7 +唐朝 7 +喬治 7 +回 7 +在此 7 +城堡 7 +城門 7 +基金 7 +場所 7 +大使 7 +天星 7 +天然 7 +失敗 7 +套 7 +奴隸 7 +學術 7 +安排 7 +宋 7 +實現 7 +實行 7 +專科 7 +尋找 7 +尋求 7 +小組 7 +島嶼 7 +左 7 +差異 7 +市區 7 +市民 7 +常常 7 +幣 7 +平原 7 +年級 7 +年輕 7 +店 7 +建國 7 +弗吉尼亞 7 +強調 7 +形象 7 +很少 7 +想像 7 +意識 7 +愛爾蘭 7 +戲 7 +找到 7 +持有 7 +指導 7 +探測 7 +支援 7 +收斂 7 +放 7 +教師 7 +施 7 +旗下 7 +明 7 +最多 7 +本地 7 +某些 7 +校園 7 +核糖 7 +條約 7 +榮譽 7 +樂隊 7 +檢查 7 +款 7 +母音 7 +氏 7 +氣候 7 +水庫 7 +沒 7 +海岸 7 +海洋 7 +混合 7 +清真 7 +港島 7 +湖南 7 +湯姆 7 +滿 7 +激烈 7 +無綫 7 +然後 7 +熊貓 7 +熱 7 +特有 7 +班 7 +現有 7 +現象 7 +球員 7 +球季 7 +理工 7 +甘肅 7 +生態 7 +申請 7 +真實 7 +石油 7 +礁 7 +秘密 7 +移動 7 +空軍 7 +突破 7 +策略 7 +簽訂 7 +約翰 7 +結合 7 +維新 7 +綱 7 +網 7 +翌年 7 +臺灣 7 +興建 7 +興趣 7 +舉辦 7 +航班 7 +航線 7 +艦 7 +茶 7 +著作 7 +衛生 7 +表演 7 +表面 7 +裔 7 +西方 7 +規劃 7 +覺得 7 +觀測 7 +觀點 7 +計算 7 +訪問 7 +設施 7 +評論 7 +調整 7 +講述 7 +議院 7 +讀 7 +貴族 7 +貿易 7 +較小 7 +較為 7 +輛 7 +轟炸 7 +迅速 7 +近年 7 +連接 7 +道德 7 +達成 7 +適合 7 +選出 7 +邏輯 7 +醫學 7 +重點 7 +錄製 7 +鏡頭 7 +長期 7 +長達 7 +關 7 +降低 7 +雖 7 +需求 7 +面對 7 +韓國 7 +領先 7 +領袖 7 +題材 7 +風暴 7 +食用 7 +駐守 7 +體現 7 +體系 7 +高級 7 +高達 7 +魔法 7 +魚 7 +999999 6 +999億 6 +999多 6 +jr 6 +丈夫 6 +上市 6 +上映 6 +乘 6 +事物 6 +二十 6 +亦是 6 +享受 6 +亮 6 +代理 6 +任務 6 +但丁 6 +住 6 +作出 6 +來源 6 +依然 6 +依靠 6 +促進 6 +信號 6 +個體 6 +做法 6 +側 6 +傳 6 +優勢 6 +元朗 6 +全家 6 +公民 6 +公眾 6 +兼 6 +出土 6 +判決 6 +剛 6 +劃分 6 +加工 6 +助理 6 +努力 6 +動力 6 +十八 6 +協議 6 +卡爾 6 +原始 6 +反射 6 +取消 6 +口號 6 +司 6 +司法 6 +含有 6 +吸收 6 +呂 6 +呼吸 6 +咖啡 6 +商品 6 +商店 6 +嘗試 6 +四川 6 +困難 6 +國歌 6 +地產 6 +基 6 +基因 6 +壓力 6 +外國 6 +多樣 6 +大大 6 +大獎 6 +大眾 6 +太空 6 +夫人 6 +奧運 6 +她們 6 +好友 6 +如同 6 +始建 6 +嬴 6 +季節 6 +官方 6 +定居 6 +定義 6 +客運 6 +宣佈 6 +宮 6 +家中 6 +密碼 6 +封 6 +對應 6 +對抗 6 +對象 6 +導演 6 +展覽 6 +島上 6 +師範 6 +席 6 +平等 6 +平面 6 +底 6 +廣告 6 +延伸 6 +強度 6 +形容 6 +形態 6 +形狀 6 +影片 6 +彼此 6 +徒 6 +情感 6 +意 6 +意味 6 +愛 6 +感 6 +懷孕 6 +戀 6 +成熟 6 +成績 6 +成長 6 +手法 6 +打算 6 +批准 6 +投票 6 +授予 6 +提名 6 +搖滾 6 +搜索 6 +操作 6 +擴展 6 +改編 6 +效力 6 +敘利亞 6 +教導 6 +新城 6 +方向 6 +方形 6 +日報 6 +日耳曼 6 +時任 6 +時常 6 +普魯士 6 +更名 6 +最近 6 +朝廷 6 +杯 6 +校區 6 +校長 6 +楚 6 +樹 6 +歌曲 6 +止 6 +死後 6 +民眾 6 +池 6 
+河道 6 +流行 6 +海盜 6 +消費 6 +深入 6 +深圳 6 +滅亡 6 +火 6 +無論 6 +版權 6 +牙齒 6 +王朝 6 +玻璃 6 +生存 6 +男友 6 +町 6 +畫 6 +畫家 6 +病毒 6 +發出 6 +發起 6 +發達 6 +短 6 +碑 6 +確認 6 +神奇 6 +神經 6 +禁止 6 +私人 6 +秦國 6 +穆斯林 6 +立刻 6 +立場 6 +童年 6 +端 6 +第七 6 +籃球 6 +米蘭 6 +經典 6 +經驗 6 +緬甸 6 +繪畫 6 +缺乏 6 +羅 6 +美麗 6 +習俗 6 +翡翠 6 +職 6 +能量 6 +色彩 6 +蔣 6 +蕭 6 +藉由 6 +虛擬 6 +血統 6 +行 6 +行走 6 +表明 6 +袁 6 +製成 6 +覆蓋 6 +規則 6 +設置 6 +試驗 6 +詩 6 +詩人 6 +詩歌 6 +該片 6 +說服 6 +說法 6 +論 6 +諮詢 6 +證明 6 +豐富 6 +走 6 +超人 6 +越來越 6 +跑道 6 +路易斯 6 +車展 6 +輿論 6 +近代 6 +返回 6 +退役 6 +通往 6 +通訊 6 +造 6 +進步 6 +過來 6 +選區 6 +遺傳 6 +邀請 6 +邊緣 6 +邱 6 +酒精 6 +醫生 6 +醫療 6 +金融 6 +銷售 6 +開展 6 +開放 6 +阻止 6 +陷入 6 +隊員 6 +階級 6 +隨機 6 +雕刻 6 +離 6 +雲南 6 +電池 6 +非洲 6 +須 6 +顧問 6 +首先 6 +騎兵 6 +黎 6 +9,999,999 5 +99.9萬 5 +999,999 5 +99億 5 +9千億 5 +『 5 +上述 5 +不僅 5 +不好 5 +中立 5 +中間 5 +主流 5 +事故 5 +亞歷山大 5 +亞馬遜 5 +人均 5 +今天 5 +今日 5 +介入 5 +以北 5 +任期 5 +佔據 5 +作家 5 +依舊 5 +侵略 5 +保存 5 +信 5 +信任 5 +信奉 5 +信託 5 +修正 5 +停止 5 +傑出 5 +傳承 5 +傷害 5 +像是 5 +儀式 5 +先 5 +免費 5 +入 5 +公交 5 +公會 5 +兵 5 +其它 5 +其餘 5 +冷卻 5 +分配 5 +分類 5 +列入 5 +別墅 5 +刺激 5 +創建 5 +加熱 5 +加盟 5 +動作 5 +勞工 5 +化合 5 +北海 5 +十二 5 +千 5 +升級 5 +南北 5 +南極 5 +印第安那 5 +參謀 5 +參議 5 +受傷 5 +叫做 5 +史 5 +司馬 5 +各個 5 +合 5 +合法 5 +合理 5 +同盟 5 +名單 5 +否認 5 +呈 5 +呈現 5 +周圍 5 +品牌 5 +哈定 5 +啟超 5 +善 5 +喇嘛 5 +固定 5 +固體 5 +圍 5 +圖像 5 +土耳其 5 +在內 5 +地圖 5 +城區 5 +執政 5 +培養 5 +堅持 5 +堡 5 +場地 5 +壁畫 5 +壘 5 +外科 5 +大氣 5 +大西 5 +如下 5 +如今 5 +妻 5 +始終 5 +孔 5 +學名 5 +學會 5 +學科 5 +宇宙 5 +安裝 5 +官吏 5 +客戶 5 +客體 5 +宮廷 5 +家長 5 +容納 5 +宿舍 5 +察覺 5 +寫作 5 +專利 5 +專家 5 +對外 5 +對此 5 +少數 5 +展出 5 +展開 5 +岸 5 +工 5 +工具 5 +巴西 5 +市政 5 +席位 5 +年度 5 +底部 5 +廈門 5 +廖 5 +廣 5 +廣西 5 +建成 5 +引發 5 +弟弟 5 +得知 5 +微博 5 +心 5 +意思 5 +愛情 5 +感情 5 +感覺 5 +慈善 5 +態度 5 +慶祝 5 +成年 5 +成本 5 +成都 5 +戰國 5 +戰後 5 +房間 5 +手中 5 +手段 5 +托勒密 5 +找 5 +技能 5 +抗議 5 +抵抗 5 +抵達 5 +拜占庭 5 +持續 5 +指定 5 +指示 5 +掌握 5 +排名 5 +接管 5 +推進 5 +措施 5 +提到 5 +換 5 +撤銷 5 +收入 5 +收藏 5 +政務 5 +故宮 5 +教皇 5 +教習 5 +敵人 5 +文忠 5 +文獻 5 +斯 5 +新型 5 +新華 5 +新鮮 5 +方便 5 +方言 5 +施工 5 +旅行 5 +日期 5 +早年 5 +明治 5 +更加 5 +書中 5 +有的 5 +朝 5 +本片 5 +杜 5 +東海 5 +東西 5 +架構 5 +某種 5 +查爾斯 5 +查理 5 +柯林頓 5 +棉花 5 +棒球 5 +極 5 +榜 5 +構成 5 +樓梯 5 +機制 5 
+機器 5 +次年 5 +欣賞 5 +歡迎 5 +正常 5 +正確 5 +武裝 5 +歸 5 +殺害 5 +每天 5 +民 5 +民兵 5 +氣體 5 +水果 5 +水系 5 +汞 5 +江西 5 +決策 5 +河北 5 +河南 5 +波音 5 +泥塑 5 +泰安 5 +泳兒 5 +洛桑 5 +洪 5 +海峽 5 +海底 5 +消息 5 +游擊 5 +湖北 5 +溫 5 +溫度 5 +溫泉 5 +滅絕 5 +演化 5 +演奏 5 +漢朝 5 +潘 5 +澤東 5 +澳 5 +濃度 5 +炎 5 +無關 5 +牌 5 +物品 5 +物業 5 +物體 5 +狗 5 +狩獵 5 +王子 5 +珊瑚 5 +現場 5 +現實 5 +甘 5 +生下 5 +生涯 5 +用作 5 +發送 5 +百 5 +直徑 5 +直至 5 +相 5 +真理 5 +眼 5 +督 5 +祖先 5 +神秘 5 +神聖 5 +秋 5 +移居 5 +程 5 +程序 5 +種植 5 +種類 5 +稱作 5 +空氣 5 +穿 5 +突變 5 +競賽 5 +符合 5 +第六 5 +簡單 5 +粵 5 +紅軍 5 +紐西蘭 5 +級別 5 +素 5 +細節 5 +組合 5 +結局 5 +編號 5 +練習 5 +總署 5 +繞 5 +美洲 5 +群島 5 +群眾 5 +耕地 5 +聯絡 5 +聲明 5 +肯定 5 +臺 5 +興奮 5 +興起 5 +般 5 +船上 5 +艘 5 +花 5 +華視 5 +落後 5 +藝人 5 +藥物 5 +蘇格蘭 5 +虎丘 5 +虛構 5 +融合 5 +血壓 5 +行業 5 +裝甲 5 +裝置 5 +裡面 5 +西遊 5 +觀 5 +解散 5 +設備 5 +診斷 5 +該地 5 +該屬 5 +認可 5 +認知 5 +認識 5 +誕生 5 +請 5 +象徵 5 +貝多芬 5 +財產 5 +貨車 5 +質量 5 +赤道 5 +赴 5 +超級 5 +越南 5 +趙國 5 +路易 5 +身亡 5 +軍團 5 +輪 5 +轉移 5 +轉變 5 +辭去 5 +辭職 5 +退出 5 +通車 5 +通道 5 +連 5 +連任 5 +連續 5 +進而 5 +進軍 5 +遠 5 +適當 5 +遭遇 5 +那裡 5 +邦 5 +郗 5 +郵政 5 +鄉鎮 5 +鄰近 5 +醒亞 5 +醫師 5 +鎊 5 +鎮壓 5 +鐵 5 +鑒 5 +長大 5 +長官 5 +長沙 5 +開設 5 +防禦 5 +陝西 5 +院長 5 +陸 5 +階層 5 +障礙 5 +隸屬 5 +難 5 +電梯 5 +電車 5 +青海 5 +預測 5 +預算 5 +預防 5 +領土 5 +頻率 5 +食品 5 +飲料 5 +飾 5 +首相 5 +馬來西亞 5 +馬達 5 +馮 5 +騎士 5 +體積 5 +體色 5 +黑人 5 +龐大 5 +9-9 4 +9.99億 4 +9.9億 4 +9.9萬 4 +b 4 +casey 4 +county 4 +google 4 +john 4 +m9 4 +nba 4 +of 4 +to 4 +you 4 +』 4 +一世 4 +一半 4 +一旦 4 +三世 4 +上演 4 +上訴 4 +下令 4 +下頜 4 +不及 4 +不得 4 +不應 4 +不等 4 +不足 4 +世凱 4 +中東 4 +中止 4 +中華龍鳥 4 +中視 4 +丹羽 4 +主演 4 +乃 4 +久 4 +之上 4 +乘坐 4 +乘客 4 +乾燥 4 +乾隆 4 +了解 4 +予 4 +事變 4 +于 4 +五世 4 +亞軍 4 +交給 4 +交配 4 +交響 4 +亦為 4 +享年 4 +人事 4 +代言 4 +以西 4 +任職 4 +企圖 4 +伊拉克 4 +伺服 4 +供應 4 +依法 4 +侵蝕 4 +保加利亞 4 +保障 4 +信徒 4 +修復 4 +倫理 4 +做出 4 +停留 4 +價 4 +優惠 4 +優秀 4 +兄弟 4 +充電 4 +先進 4 +克拉克 4 +入口 4 +入選 4 +全面 4 +公 4 +公安 4 +公式 4 +共振 4 +其間 4 +具體 4 +冬天 4 +出場 4 +出戰 4 +出發 4 +出租 4 +出色 4 +刀 4 +分佈 4 +分成 4 +分期 4 +列表 4 +則天 4 +則為 4 +前期 4 +前線 4 +前進 4 +劇集 4 +劍 4 +加州 4 +加強 4 +勝利 4 +包 4 +包裝 4 +匈牙利 4 +區劃 4 +十五 4 +協定 4 +協調 4 +南側 4 +南延 4 +印第安 4 +危機 4 +原有 4 +原理 4 +參考 4 +參賽 4 +古物 4 +句 4 +只要 4 +各國 4 +各界 4 +合成 4 +合眾 4 
+合金 4 +吉 4 +吉他 4 +同事 4 +同治 4 +名將 4 +名詞 4 +呎 4 +呢 4 +周年 4 +命運 4 +哈爾濱 4 +哥倫比亞 4 +商人 4 +啟用 4 +喬治亞 4 +單車 4 +嘲諷 4 +回來 4 +國防 4 +圓形 4 +地底 4 +地形 4 +地面 4 +坊 4 +基辛格 4 +堅決 4 +墓 4 +夏 4 +外星 4 +夜 4 +夢 4 +大同 4 +大帝 4 +大臣 4 +天皇 4 +夫婦 4 +失望 4 +妹妹 4 +姐姐 4 +姓氏 4 +委任 4 +婚姻 4 +婦女 4 +媽媽 4 +學堂 4 +官員 4 +定律 4 +宣稱 4 +實業 4 +實體 4 +寶貝 4 +小學 4 +少女 4 +尼龍 4 +局部 4 +展示 4 +屯門 4 +山區 4 +山頂 4 +岩 4 +島式 4 +島津 4 +嶺 4 +巡迴 4 +帶給 4 +常用 4 +幅度 4 +幫 4 +平方英里 4 +年齡 4 +幽默 4 +度假 4 +庫 4 +廚房 4 +廢除 4 +廷 4 +影像 4 +影業 4 +往 4 +很快 4 +很難 4 +後者 4 +得分 4 +得名 4 +得寵 4 +循環 4 +微 4 +徵召 4 +志願 4 +快 4 +怎麼 4 +性別 4 +性質 4 +恐龍 4 +患者 4 +情報 4 +情形 4 +情節 4 +情緒 4 +慕尼黑 4 +應該 4 +戀愛 4 +成人 4 +成分 4 +戰敗 4 +戰死 4 +扮演 4 +批判 4 +技巧 4 +抒情 4 +拓展 4 +招募 4 +指數 4 +按照 4 +挪威 4 +排列 4 +排水 4 +排行 4 +接收 4 +接替 4 +推薦 4 +推行 4 +揚州 4 +擔心 4 +擴大 4 +擴建 4 +擴張 4 +收到 4 +收錄 4 +改造 4 +攻克 4 +敖 4 +教區 4 +教宗 4 +教練 4 +整理 4 +數千 4 +數字 4 +文泰 4 +新疆 4 +新竹 4 +旅客 4 +既 4 +日常 4 +昆明 4 +明基 4 +星球 4 +星等 4 +春秋 4 +時段 4 +晉國 4 +晚間 4 +暗示 4 +暴力 4 +更換 4 +曼德拉 4 +最低 4 +最好 4 +最長 4 +有用 4 +服裝 4 +望遠 4 +木材 4 +本作 4 +本土 4 +本科 4 +本線 4 +本魚 4 +東區 4 +某 4 +校舍 4 +格 4 +案件 4 +楚國 4 +樂 4 +樂器 4 +標本 4 +樞紐 4 +模仿 4 +橄欖 4 +檔 4 +檢測 4 +欲 4 +歌 4 +正月 4 +此前 4 +此次 4 +步兵 4 +武術 4 +歷任 4 +死神 4 +殺死 4 +毀 4 +母 4 +每秒 4 +比例 4 +毫克 4 +毫米 4 +水深 4 +永江 4 +污泥 4 +沈 4 +沉澱 4 +沙灘 4 +河川 4 +油價 4 +治 4 +法案 4 +法蘭克 4 +法規 4 +波希米亞 4 +波斯 4 +注入 4 +洛杉磯 4 +洛陽 4 +流經 4 +浦 4 +海域 4 +海外 4 +海德堡 4 +海戰 4 +海水 4 +海灣 4 +海面 4 +液態 4 +液體 4 +深 4 +測量 4 +港鐵 4 +湯 4 +源 4 +滬 4 +滿貫 4 +潮濕 4 +濟南 4 +灣仔 4 +火災 4 +炸藥 4 +烏克蘭 4 +無意 4 +無線 4 +無錫 4 +照片 4 +營 4 +營業 4 +父 4 +爽 4 +牛奶 4 +牧場 4 +特遣 4 +特點 4 +犯罪 4 +狀元 4 +狂 4 +狙擊 4 +獎勵 4 +王后 4 +珍珠 4 +現今 4 +現金 4 +球會 4 +理事 4 +理想 4 +琉球 4 +瑪麗 4 +瓷器 4 +甘珠爾 4 +生化 4 +生意 4 +產地 4 +產量 4 +留 4 +畝 4 +當天 4 +當年 4 +當日 4 +當然 4 +疫苗 4 +癌症 4 +發育 4 +發言 4 +發酵 4 +皮膚 4 +監獄 4 +直 4 +直升 4 +直隸 4 +相比 4 +相近 4 +省份 4 +省委 4 +省級 4 +真相 4 +督察 4 +矩陣 4 +短暫 4 +短篇 4 +研發 4 +社團 4 +神廟 4 +神達 4 +票房 4 +租界 4 +種族 4 +稱號 4 +空調 4 +突然 4 +立即 4 +童 4 +競爭 4 +等級 4 +節日 4 +簽約 4 +粒 4 +精確 4 +紋理 4 +納粹 4 +純 4 +終止 4 +終結 4 +維吾爾 4 +網球 4 +緊密 4 +總量 4 +總長 4 +繼 4 +繼任 4 +罕見 4 +罪名 4 +置 4 +羅伯特 4 +羅馬尼亞 4 +義務 4 +習慣 4 +老闆 4 +考察 4 
+考試 4 +聖 4 +聖母 4 +聲稱 4 +聲譽 4 +背景 4 +胡佛 4 +自動 4 +船隻 4 +艙 4 +艱難 4 +苦艾 4 +草本 4 +荷蘭 4 +莊 4 +莊園 4 +莫斯科 4 +華航 4 +落成 4 +著重 4 +董 4 +蒙扎 4 +蓉蓉 4 +薩摩 4 +蘇家 4 +蘋果 4 +蛋白 4 +蜘蛛 4 +血栓 4 +行省 4 +術語 4 +衛星 4 +製 4 +西側 4 +西曼 4 +親 4 +親王 4 +評價 4 +評定 4 +詞語 4 +試 4 +該劇 4 +該區 4 +詹姆斯 4 +誰 4 +課程 4 +談話 4 +請求 4 +論文 4 +識字 4 +警署 4 +議長 4 +讀者 4 +負 4 +財富 4 +財政 4 +貨幣 4 +貨物 4 +貨運 4 +費 4 +賀 4 +資本 4 +資深 4 +資金 4 +賈 4 +質 4 +購買 4 +贊助 4 +起義 4 +足 4 +身上 4 +身分 4 +躲避 4 +車序 4 +軍人 4 +軍力 4 +軍官 4 +軍閥 4 +較多 4 +較少 4 +較高 4 +輸入 4 +輻射 4 +輻鰭魚 4 +轄下 4 +轉換 4 +辦事 4 +辦法 4 +辦理 4 +農民 4 +逃往 4 +這麼 4 +週期 4 +進口 4 +進球 4 +進程 4 +遂 4 +遊行 4 +過度 4 +過枝 4 +遷移 4 +遼寧 4 +邊境 4 +邵 4 +部落 4 +郵票 4 +重視 4 +野生 4 +量子 4 +金字 4 +針對 4 +銅鑼 4 +鋼琴 4 +錯誤 4 +鏡片 4 +鏡面 4 +長子 4 +長江 4 +門診 4 +開 4 +開幕 4 +開闢 4 +關心 4 +防守 4 +阿拉伯 4 +院校 4 +陽光 4 +隊伍 4 +階 4 +隔離 4 +雕塑 4 +雨水 4 +電 4 +電力 4 +電台 4 +電磁 4 +電訊 4 +靜態 4 +靜脈 4 +非法 4 +靠近 4 +音 4 +順 4 +順位 4 +預期 4 +頭 4 +題 4 +願意 4 +風險 4 +颱風 4 +飛 4 +飼養 4 +餐 4 +餘下 4 +首領 4 +體內 4 +體長 4 +高架 4 +高溫 4 +鬥爭 4 +鳥類 4 +黃埔 4 +黑 4 +黑色 4 +黨籍 4 +鼓勵 4 +! 3 +9.9% 3 +9.999 3 +9999萬 3 +99多 3 +99餘 3 +center 3 +close 3 +game 3 +gdp 3 +h9n9 3 +iii 3 +james 3 +mappy 3 +new 3 +psp 3 +°c 3 +─ 3 +・ 3 +一同 3 +丁 3 +三十 3 +三江 3 +上游 3 +上環 3 +上表 3 +上課 3 +上面 3 +下列 3 +下台 3 +下場 3 +下級 3 +不但 3 +不想 3 +不敵 3 +不明 3 +不遠 3 +不韋 3 +世 3 +世宗 3 +丟失 3 +中古 3 +中子 3 +中將 3 +中期 3 +中西 3 +中轉 3 +中風 3 +丹佛 3 +丹尼士 3 +主任 3 +主動 3 +主唱 3 +主機 3 +主編 3 +主辦 3 +主體 3 +之內 3 +之時 3 +也好 3 +互聯 3 +五角 3 +些 3 +亞當 3 +亞目 3 +亞視 3 +交往 3 +京都 3 +亮度 3 +人性 3 +人次 3 +人生 3 +人身 3 +他人 3 +付出 3 +以往 3 +以為 3 +以致 3 +任內 3 +任教 3 +份子 3 +企鵝 3 +伊恩 3 +伊賀 3 +休息 3 +估計 3 +伸出 3 +似 3 +伽利略 3 +住戶 3 +住院 3 +佔有 3 +佛 3 +佛像 3 +佛學 3 +佛羅倫薩 3 +作霖 3 +併 3 +來自 3 +例子 3 +供奉 3 +供給 3 +依 3 +依賴 3 +俄亥俄 3 +俘虜 3 +保安 3 +保育 3 +保證 3 +信德 3 +修建 3 +修道 3 +個別 3 +個性 3 +倖存 3 +候選 3 +借用 3 +倪 3 +值 3 +值得 3 +偉大 3 +偏 3 +停 3 +停車 3 +備受 3 +傳奇 3 +傳教 3 +傳染 3 +傾向 3 +優異 3 +允許 3 +元洪 3 +光源 3 +光緒 3 +克里米亞 3 +兒女 3 +內陸 3 +全市 3 +全縣 3 +全體 3 +公國 3 +公尺 3 +公轉 3 +六十 3 +共計 3 +兵力 3 +兼任 3 +冊封 3 +冷 3 +凱撒 3 +出使 3 +出口 3 +出獄 3 +函數 3 +分散 3 +分行 3 +分裂 3 +分解 3 +切斷 3 +刊物 3 +列為 3 +利 3 +制定 3 +前後 3 +前鋒 3 +前面 3 +剛好 3 
+創意 3 +劇團 3 +劇本 3 +劇目 3 +劇院 3 +劍橋 3 +力學 3 +加利福尼亞 3 +勞倫斯 3 +匈 3 +化工 3 +北冕 3 +北洋 3 +區別 3 +十七 3 +十多 3 +升 3 +升任 3 +升格 3 +南昌 3 +南海 3 +占 3 +印象 3 +即將 3 +卻是 3 +厘米 3 +原則 3 +原告 3 +原料 3 +友誼 3 +取 3 +受損 3 +叛亂 3 +口徑 3 +古城 3 +可惜 3 +台中 3 +史上 3 +各州 3 +各省 3 +同名 3 +同性 3 +同情 3 +名譽 3 +告訴 3 +周邊 3 +呼聲 3 +和也 3 +和約 3 +品種 3 +哥哥 3 +哲 3 +哺乳 3 +唱 3 +喜愛 3 +單一 3 +嘉賓 3 +器官 3 +噴泉 3 +嚴格 3 +四世 3 +回應 3 +回歸 3 +國泰 3 +國籍 3 +國軍 3 +圍繞 3 +園區 3 +土 3 +土壤 3 +在任 3 +在場 3 +地中 3 +地勢 3 +地獄 3 +地理 3 +坡 3 +報導 3 +場合 3 +塊 3 +塑造 3 +塘 3 +塞爾維亞 3 +填充 3 +填海 3 +填補 3 +境地 3 +墜毀 3 +士官 3 +壯 3 +壯觀 3 +夏天 3 +外來 3 +外界 3 +外部 3 +多達 3 +大夫 3 +大家 3 +大師 3 +大橋 3 +大權 3 +大致 3 +大賽 3 +大選 3 +天國 3 +天王 3 +天空 3 +太 3 +太小 3 +失業 3 +奇異 3 +奈米 3 +契約 3 +奧 3 +奪取 3 +女巫 3 +女王 3 +女神 3 +好評 3 +如何 3 +妃 3 +妨礙 3 +委派 3 +委託 3 +威力 3 +威尼斯 3 +威爾士 3 +威爾斯 3 +娃娃 3 +娶 3 +嫁給 3 +嫌疑 3 +嬌嬌 3 +子女 3 +孟席斯 3 +孫子 3 +學府 3 +宇 3 +安 3 +安德烈 3 +安徽 3 +安置 3 +宋朝 3 +完備 3 +完善 3 +完工 3 +完美 3 +宏 3 +宗 3 +官僚 3 +定 3 +宣傳 3 +宣告 3 +室內 3 +宰相 3 +家寶 3 +家裡 3 +家鄉 3 +富有 3 +富江 3 +寒冷 3 +實在 3 +實施 3 +封閉 3 +射入 3 +射擊 3 +專用 3 +尊 3 +對方 3 +對比 3 +導航 3 +小吃 3 +小堂 3 +小孩 3 +小平 3 +少 3 +尖 3 +就算 3 +尼山 3 +局面 3 +屈 3 +屋邨 3 +屠 3 +屯 3 +州長 3 +已婚 3 +已知 3 +巴哈伊 3 +巴斯 3 +布庫 3 +布袋 3 +師傅 3 +帶到 3 +帶走 3 +帶頭 3 +帽 3 +帽子 3 +平台 3 +平方呎 3 +平方英尺 3 +平民 3 +平衡 3 +年間 3 +幸福 3 +幹線 3 +幼 3 +幾何 3 +序列 3 +度過 3 +康 3 +庾 3 +延續 3 +延長 3 +建業 3 +弓毛 3 +引入 3 +引力 3 +引用 3 +弟子 3 +弱小 3 +強制 3 +強壯 3 +彈 3 +彈簧 3 +彭 3 +彰化 3 +影視 3 +往來 3 +往後 3 +征服 3 +待 3 +待遇 3 +很好 3 +很高 3 +後人 3 +後方 3 +後衛 3 +後面 3 +徑 3 +徒步 3 +復 3 +復工 3 +復辟 3 +徵收 3 +德克薩斯 3 +德川 3 +德意志 3 +德綱 3 +徹底 3 +心情 3 +必然 3 +必要 3 +忽略 3 +思潮 3 +怡和 3 +急速 3 +性格 3 +怪物 3 +怪獸 3 +恩來 3 +悠久 3 +情書 3 +想到 3 +想法 3 +愛上 3 +愛國 3 +愛達荷 3 +感應 3 +慢慢 3 +憑藉 3 +憤怒 3 +懷疑 3 +懸崖 3 +成份 3 +成千上萬 3 +成果 3 +戒毒 3 +截止 3 +戰俘 3 +戰時 3 +戰艦 3 +戲劇 3 +戶 3 +房地產 3 +房子 3 +手下 3 +扎維耶 3 +扭曲 3 +扶手 3 +承受 3 +承擔 3 +投手 3 +抗戰 3 +抵擋 3 +拆除 3 +拉丁 3 +拯救 3 +持 3 +指令 3 +挑戰 3 +挺 3 +捐助 3 +捕捉 3 +捷克 3 +授 3 +排出 3 +探討 3 +接任 3 +接唱 3 +控 3 +提議 3 +換乘 3 +損失 3 +損害 3 +搬到 3 +撞擊 3 +播映 3 +撰寫 3 +擔當 3 +據說 3 +擴充 3 +支付 3 +支撐 3 +支流 3 +收回 3 +收拾 3 +收購 3 +改制 3 +改善 3 +改稱 3 +改進 3 +攻打 3 +放射 3 +故障 3 +救 3 +敘述 3 
+教養 3 +文人 3 +文件 3 +料理 3 +斯里蘭卡 3 +新增 3 +新建 3 +新教 3 +新村 3 +新羅 3 +旁遮普 3 +族群 3 +日內瓦 3 +日後 3 +日間 3 +旨 3 +明星 3 +明珠 3 +明納努 3 +昏迷 3 +易 3 +星光 3 +星際 3 +星雲 3 +映射 3 +昭和 3 +是否 3 +時機 3 +時空 3 +晚 3 +晚年 3 +晨興 3 +普選 3 +景德 3 +景點 3 +晶體 3 +暗 3 +暨 3 +暫時 3 +暴動 3 +更為 3 +書店 3 +曼聯 3 +替 3 +替代 3 +替換 3 +最佳 3 +最為 3 +會堂 3 +月氏 3 +月球 3 +有利 3 +有機 3 +有權 3 +有趣 3 +服役 3 +服用 3 +朝日 3 +期望 3 +木板 3 +本來 3 +村民 3 +杜蘭戈 3 +杰 3 +東側 3 +東港 3 +東面 3 +板塊 3 +柏立基 3 +某個 3 +栃木 3 +校名 3 +核電 3 +栽培 3 +栽種 3 +桃 3 +桃園 3 +桃浦 3 +梅 3 +梅妃 3 +梅莉迪絲 3 +條例 3 +極度 3 +極端 3 +概念 3 +概率 3 +榮 3 +榮聲 3 +槍 3 +槍手 3 +樂曲 3 +樂章 3 +樊 3 +模擬 3 +機率 3 +檢察 3 +檸檬 3 +權力 3 +權勢 3 +權益 3 +次子 3 +次日 3 +歌劇 3 +正義 3 +正選 3 +步槍 3 +步道 3 +死傷 3 +死去 3 +毀滅 3 +比起 3 +民進 3 +氣壓 3 +氣泡 3 +氧化 3 +氧氣 3 +水上 3 +水域 3 +水塔 3 +水族 3 +水溝 3 +水稻 3 +永遠 3 +求救 3 +江南 3 +江孜 3 +污水 3 +決議 3 +沒收 3 +沖 3 +沙烏地阿拉伯 3 +油脂 3 +沼澤 3 +沿著 3 +法人 3 +法學 3 +法官 3 +波動 3 +波士頓 3 +波長 3 +泰國 3 +洋房 3 +洋行 3 +洗浴 3 +活佛 3 +活力 3 +流動 3 +流感 3 +流量 3 +海上 3 +海珊 3 +消滅 3 +淋巴 3 +淘汰 3 +淡水 3 +清代 3 +渝 3 +港口 3 +湖水 3 +湯瑪斯 3 +準則 3 +溥儀 3 +溫帶 3 +溶解 3 +滉 3 +滑冰 3 +漂亮 3 +漢城 3 +漳州 3 +潛入 3 +火箭 3 +災難 3 +為期 3 +無數 3 +煙草 3 +照相 3 +煩惱 3 +熱庫 3 +熱能 3 +爬行 3 +爭 3 +爲 3 +牆壁 3 +牛 3 +牛津 3 +物資 3 +特化 3 +狹窄 3 +獎學 3 +獎項 3 +獲利 3 +獲取 3 +獵食 3 +獻給 3 +率領 3 +王室 3 +珠海 3 +班納蒂克 3 +現任 3 +球場 3 +理解 3 +瑞草 3 +環 3 +生前 3 +生成 3 +生殖 3 +產下 3 +用品 3 +用戶 3 +用法 3 +用途 3 +田 3 +男女 3 +男孩 3 +畢 3 +畫作 3 +異常 3 +當今 3 +當作 3 +當初 3 +疑問 3 +病故 3 +瘋狂 3 +登上 3 +登基 3 +登場 3 +登陸 3 +發揮 3 +發源 3 +白人 3 +百科 3 +皇 3 +皇室 3 +盟友 3 +盟旗 3 +監管 3 +目錄 3 +直系 3 +直線 3 +直選 3 +相反 3 +相機 3 +相遇 3 +真人 3 +真武 3 +眼睛 3 +睡蓮 3 +瞭解 3 +知情 3 +短尾貓 3 +短短 3 +破曉 3 +破產 3 +碎片 3 +碳 3 +碳化 3 +確保 3 +確立 3 +社群 3 +祖父 3 +神父 3 +票價 3 +福 3 +福島 3 +福斯 3 +禮 3 +禮儀 3 +禮拜 3 +秀 3 +科系 3 +科隆 3 +租借 3 +租賃 3 +種種 3 +積極 3 +窯瓷 3 +立陶宛 3 +章 3 +童話 3 +競選 3 +竹子 3 +第八 3 +第十 3 +第十三 3 +筆 3 +算 3 +管道 3 +箱 3 +節慶 3 +節省 3 +籍 3 +精通 3 +精選 3 +糧食 3 +紅磡 3 +紅色 3 +紛爭 3 +素貞 3 +紡織 3 +索馬利亞 3 +細小 3 +終點 3 +組建 3 +組裝 3 +結成 3 +維京 3 +維吉爾 3 +維基 3 +維多利亞 3 +編劇 3 +總共 3 +總數 3 +總結 3 +總體 3 +繪製 3 +繼位 3 +纖維 3 +缺席 3 +缺點 3 +置富 3 +羊肉 3 +美利堅 3 +翁 3 +老鼠 3 +考 3 +考古 3 +考驗 3 +而非 3 +耶穌 3 +聖誕 3 +聖靈 3 +聘請 3 +聚會 3 +聯手 3 
+聯軍 3 +聲勢 3 +職位 3 +股價 3 +股票 3 +胡安 3 +膜 3 +自傳 3 +自我 3 +自稱 3 +自身 3 +自願 3 +至少 3 +致力 3 +致命 3 +臺南 3 +臼齒 3 +舞蹈 3 +航海 3 +航程 3 +船員 3 +艾女 3 +艾滋 3 +芭比 3 +英九 3 +范 3 +茶葉 3 +草食 3 +莽 3 +華格納 3 +菲利普斯 3 +萊姆 3 +萊茵 3 +著稱 3 +蒙大拿 3 +蒙特內哥羅 3 +蒸汽 3 +蓬勃 3 +薩魯曼 3 +藍 3 +藍色 3 +藤 3 +藥 3 +藩 3 +蘇州 3 +虎鯨 3 +蜀 3 +衍生 3 +衙門 3 +衛 3 +衛視 3 +衝擊 3 +衣服 3 +表 3 +裁判 3 +裏 3 +補給 3 +裝 3 +裝飾 3 +複合 3 +複製 3 +西安 3 +西洋 3 +西湖 3 +西納 3 +西門 3 +西關 3 +西面 3 +見到 3 +規格 3 +視頻 3 +親自 3 +計畫 3 +記憶 3 +評估 3 +評審 3 +該寺 3 +該書 3 +該校 3 +該站 3 +該鎮 3 +誠實 3 +誤認 3 +說明 3 +課室 3 +諷刺 3 +諸多 3 +謀殺 3 +謂 3 +謝 3 +證實 3 +識別 3 +護照 3 +譽 3 +讀書 3 +變形 3 +變數 3 +變體 3 +讚賞 3 +象 3 +貝爾 3 +貴妃 3 +貴州 3 +買 3 +買家 3 +費德勒 3 +費雪 3 +資助 3 +賓夕法尼亞 3 +賢 3 +賦予 3 +走廊 3 +起訴 3 +越 3 +足協 3 +距 3 +路徑 3 +跳 3 +踢 3 +身邊 3 +車體 3 +較大 3 +較長 3 +輔助 3 +輔導 3 +輔政 3 +輸出 3 +轄 3 +轄區 3 +轉乘 3 +轉到 3 +轉投 3 +轉讓 3 +農 3 +近期 3 +迫 3 +迫使 3 +追逐 3 +退休 3 +逃離 3 +逐步 3 +通信 3 +通用 3 +通行 3 +速食 3 +逢 3 +連環 3 +連線 3 +逮捕 3 +週 3 +逾 3 +遇見 3 +遊樂 3 +運營 3 +過渡 3 +道光 3 +達也 3 +違法 3 +遠航 3 +適應 3 +遷徙 3 +選 3 +選手 3 +選秀 3 +遺體 3 +邊界 3 +那裏 3 +邦聯 3 +都市 3 +鄉議 3 +配 3 +配樂 3 +配置 3 +酗酒 3 +酸 3 +釀酒 3 +釋放 3 +重傷 3 +重整 3 +金庫 3 +金庸 3 +金鐘 3 +銅 3 +鋼 3 +錄 3 +錄音 3 +鍵 3 +鎳 3 +鐵人 3 +長安 3 +長州 3 +長相 3 +長遠 3 +開播 3 +關節 3 +阿兒法 3 +阿拉斯加 3 +阿根廷 3 +阿爾卑斯 3 +附屬 3 +降 3 +降落 3 +陣營 3 +除外 3 +陵 3 +陸地 3 +陸續 3 +際 3 +隱藏 3 +隱語 3 +雅典 3 +雌雄 3 +雙立 3 +雜技 3 +難度 3 +雪梨 3 +雪莉 3 +雲 3 +零 3 +零售 3 +雷睦斯 3 +靈 3 +青島 3 +鞏固 3 +音頻 3 +頂層 3 +順利 3 +預先 3 +頭銜 3 +頻譜 3 +題寫 3 +額 3 +風 3 +食 3 +飲食 3 +飾演 3 +首任 3 +首演 3 +馬來 3 +馬來亞 3 +馬德里 3 +驅動 3 +驅逐 3 +體操 3 +高低槓 3 +高原 3 +高層 3 +高山 3 +高麗 3 +鳳山 3 +麥田 3 +黃金 3 +黑子 3 +黑斑 3 +黑洞 3 +黛比 3 +黨員 3 +龍 3 +龍馬 3 +$ 2 +' 2 +... 2 +...... 
2 +9.9999999 2 +99%-99% 2 +99.9億 2 +999.9 2 +999.99999 2 +999.9999999999999 2 +999.99億 2 +9999.9 2 +9999/99 2 +9999多 2 +9999餘 2 +99:99 2 +99a 2 +99° 2 +9:99 2 +9d 2 +9億9千9百萬 2 +9百萬 2 +9萬 2 +aac 2 +abc 2 +ai 2 +aldridge 2 +and 2 +arts 2 +bbc 2 +before 2 +boy 2 +c 2 +dc-99 2 +de 2 +dj 2 +dna 2 +e 2 +e9 2 +e99 2 +europipe 2 +eve 2 +f-99a 2 +fc 2 +finn 2 +gcmg 2 +gravion 2 +hall 2 +ii 2 +ipod 2 +jason 2 +jean 2 +k 2 +karin 2 +km/h 2 +l 2 +la 2 +lee 2 +live 2 +m9999 2 +n999 2 +nasa 2 +nds 2 +net 2 +nicea 2 +orochi 2 +p 2 +phillips 2 +pvc 2 +rivers 2 +robert 2 +s 2 +silver 2 +station 2 +strait 2 +tvb 2 +u99 2 +ua 2 +v 2 +winston 2 +wyclef 2 +x 2 +xii 2 +‧ 2 +〈 2 +〉 2 +一中 2 +一共 2 +一千 2 +一向 2 +一度 2 +一手 2 +一提 2 +一貫 2 +一面 2 +七喜 2 +三棟屋 2 +三氯化金 2 +三藏 2 +上下車 2 +上任 2 +上佳 2 +上午 2 +上吊 2 +上將 2 +上層 2 +上方 2 +上校 2 +上街 2 +下去 2 +下層 2 +下屬 2 +下旬 2 +下水 2 +下海 2 +下游 2 +下野 2 +不一 2 +不停 2 +不再 2 +不列顛 2 +不受 2 +不夠 2 +不如 2 +不宜 2 +不已 2 +不幸 2 +不法 2 +不清 2 +不用 2 +不管 2 +不良 2 +不論 2 +不變 2 +不錯 2 +不需 2 +不願 2 +丐幫 2 +世俗 2 +世博 2 +世卿 2 +世襲 2 +世錦 2 +丘陵 2 +中區 2 +中午 2 +中天 2 +中巴 2 +中正 2 +中途 2 +中道 2 +中遠 2 +主上 2 +主力 2 +主因 2 +主場 2 +主權 2 +主管 2 +主線 2 +乘船 2 +乘車 2 +乙級 2 +九一八 2 +九州 2 +九巴 2 +也有 2 +也許 2 +乳酪 2 +事後 2 +二十六 2 +二甘醇 2 +互動 2 +五四 2 +五峰 2 +五百 2 +井 2 +亞冠 2 +亞利桑那 2 +亞得里亞 2 +交 2 +交互 2 +交到 2 +交匯 2 +交好 2 +交情 2 +交戰 2 +交手 2 +交趾 2 +人力 2 +人心 2 +人才 2 +人文 2 +人格 2 +人熙 2 +人群 2 +人間 2 +人魚 2 +仁慈 2 +仁記 2 +今年 2 +介乎 2 +介石 2 +仍舊 2 +付款 2 +仙 2 +仙劍 2 +仙女 2 +以南 2 +以東 2 +以至 2 +任城 2 +任天堂 2 +任意 2 +份額 2 +仿 2 +伊比利亞 2 +伏威 2 +休閒 2 +伯公 2 +伯爵 2 +伯靈頓 2 +伴隨 2 +似乎 2 +低地 2 +低廉 2 +低溫 2 +住房 2 +佐土原 2 +佐藤 2 +佛山 2 +佛朗明哥 2 +佛殿 2 +作好 2 +作業 2 +作物 2 +佩劍 2 +併入 2 +使命 2 +使者 2 +使館 2 +來訪 2 +例 2 +例外 2 +供暖 2 +供熱 2 +供職 2 +侵 2 +侵入 2 +侵犯 2 +便宜 2 +促使 2 +促成 2 +俗稱 2 +保有 2 +保級 2 +保羅 2 +保衛 2 +信心 2 +信義 2 +信長 2 +信雄 2 +修士 2 +修理 2 +修習 2 +修訂 2 +修鍊 2 +個案 2 +倒台 2 +倒掛 2 +候鳥 2 +倡導 2 +倫 2 +假如 2 +假期 2 +假髮 2 +偏差 2 +停戰 2 +停滯 2 +偶然 2 +偶爾 2 +偽造 2 +傑作 2 +備 2 +催化 2 +傳入 2 +傳到 2 +傳動 2 +傳媒 2 +傳導 2 +傳授 2 +傳聞 2 +傳言 2 +傳送 2 +傳達 2 +債務 2 +傷 2 +傾聽 2 +僅僅 2 +僱員 2 +儀錶 2 +儒家 2 +優先 2 +儲備 2 +元代 2 +元件 2 +元帥 2 
+元年 2 +元洲 2 +元璋 2 +元甲 2 +元首 2 +充斥 2 +充當 2 +兆帕 2 +先知 2 +先行 2 +先驅 2 +光線 2 +光譜 2 +光軸 2 +克基拉 2 +克用 2 +克隆 2 +免職 2 +入伍 2 +入圍 2 +入學 2 +入獄 2 +入讀 2 +入門 2 +內務 2 +內外 2 +內心 2 +內流 2 +全新 2 +全日 2 +全校 2 +全權 2 +全能 2 +全身 2 +八一 2 +八百餘 2 +公學 2 +公寓 2 +公署 2 +公認 2 +兵營 2 +其父 2 +具備 2 +典禮 2 +再造 2 +冬季 2 +冰兄 2 +冰峰 2 +冰川 2 +冰雪 2 +凡 2 +凱特 2 +凱瑞 2 +出入 2 +出入口 2 +出家 2 +出席 2 +出演 2 +出產 2 +出賽 2 +出道 2 +分享 2 +分化 2 +分區 2 +分手 2 +分擔 2 +分歧 2 +分隊 2 +刊載 2 +列傳 2 +列出 2 +初學 2 +初年 2 +初稿 2 +初級 2 +初賽 2 +判斷 2 +判處 2 +別 2 +別列佐夫斯基 2 +利比亞 2 +利物浦 2 +利特維年科 2 +到來 2 +到底 2 +制止 2 +制裁 2 +制訂 2 +刺 2 +刺客 2 +刺死 2 +刻 2 +刻有 2 +削弱 2 +前任 2 +前來 2 +前妻 2 +前途 2 +剝奪 2 +剩下 2 +副本 2 +創 2 +創始 2 +創新 2 +創業 2 +劃入 2 +劃給 2 +劇烈 2 +劍術 2 +劍齒虎 2 +功 2 +功率 2 +加之 2 +加勒比 2 +加堆 2 +加重 2 +劣勢 2 +助戰 2 +勒格里 2 +勒沃 2 +動機 2 +動脈 2 +動車 2 +勝出 2 +勳 2 +勳章 2 +勳銜 2 +勾引 2 +包圍 2 +包廂 2 +包衣 2 +匕首 2 +化纖 2 +化身 2 +北宋 2 +北平 2 +北方 2 +北端 2 +北約 2 +北道 2 +北齊 2 +匯率 2 +區分 2 +十一世 2 +十三 2 +十六 2 +升學 2 +半山 2 +半球 2 +協商 2 +協奏 2 +協約 2 +南下 2 +南山 2 +南斯拉夫 2 +南遣 2 +南邊 2 +南陽 2 +南非 2 +南面 2 +南韓 2 +博弈 2 +博彩 2 +博恩 2 +占卜 2 +卡梅隆 2 +卡片 2 +印 2 +印加 2 +印尼 2 +印製 2 +即位 2 +即時 2 +即興 2 +卷 2 +卿 2 +卿雲 2 +厄運 2 +原名 2 +原址 2 +原聲 2 +去除 2 +參觀 2 +參選 2 +又是 2 +又稱 2 +及格 2 +友好 2 +反叛 2 +反抗 2 +反擊 2 +叔叔 2 +取決 2 +受審 2 +受益 2 +受體 2 +口中 2 +口述 2 +古巴 2 +古柯鹼 2 +古蹟 2 +召喚 2 +可汗 2 +史學 2 +史密斯 2 +史提夫 2 +史蒂芬 2 +右岸 2 +司機 2 +司長 2 +司鼓 2 +吃肉 2 +吃飯 2 +各式 2 +各式各樣 2 +各級 2 +各自 2 +各部 2 +合同 2 +合川 2 +合稱 2 +合葬 2 +吉布斯 2 +吉林 2 +吉里巴斯 2 +同人 2 +同居 2 +同體 2 +名人 2 +名利 2 +名古屋 2 +名縉 2 +名鎮 2 +向量 2 +君 2 +君王 2 +吞併 2 +否則 2 +否定 2 +告別 2 +告知 2 +告終 2 +周歲 2 +味 2 +呼叫 2 +呼籲 2 +和解 2 +和談 2 +咬金 2 +品行 2 +哈里發 2 +哥斯大黎加 2 +哥本哈根 2 +哥特 2 +哪裡 2 +售賣 2 +唯有 2 +唯美 2 +問 2 +啟動 2 +啟睿 2 +啟航 2 +啟蒙 2 +善化 2 +善意 2 +喉嚨 2 +喜劇 2 +喝 2 +喪生 2 +喬伊斯 2 +喬艾爾 2 +單元 2 +單曲 2 +喻 2 +嘉慶 2 +嘉玲 2 +器物 2 +噪音 2 +噴氣 2 +嚴密 2 +囚禁 2 +四分之一 2 +四十 2 +回國 2 +回想 2 +回憶 2 +回收 2 +國代 2 +國外 2 +國寶 2 +國徽 2 +國璋 2 +國語 2 +國鋒 2 +圍攻 2 +園藝 2 +圓頂 2 +圖樣 2 +圖畫 2 +團結 2 +團聚 2 +團長 2 +在位 2 +在來 2 +地外 2 +地帶 2 +坐診 2 +型態 2 +埃 2 +埃米莉 2 +城中 2 +城子 2 +域名 2 +執導 2 +執掌 2 +執教 2 +執法 2 +基底 2 +堂區 2 +堅 2 +堅固 2 +堅強 2 +報紙 2 +場場 2 +塞普勒斯 2 +境外 2 +墓地 2 +墓室 2 +增多 2 
+增建 2 +增強 2 +增設 2 +墮胎 2 +壓倒 2 +壓強 2 +壓迫 2 +士 2 +壯大 2 +壯年 2 +夏伊 2 +夏季 2 +外傳 2 +外圍 2 +外在 2 +外援 2 +外觀 2 +外資 2 +多倫多 2 +多半 2 +多少 2 +夜晚 2 +夠 2 +夥伴 2 +大亂 2 +大佛 2 +大公 2 +大力 2 +大勝 2 +大半 2 +大堂 2 +大妃 2 +大將 2 +大屋 2 +大廳 2 +大批 2 +大敗 2 +大槍 2 +大火 2 +大碟 2 +大笨 2 +大街 2 +大衛 2 +大連 2 +大阪 2 +天地 2 +天子 2 +天師 2 +天敵 2 +天氣 2 +天衣 2 +天雷 2 +太古 2 +太多 2 +太大 2 +太子 2 +太守 2 +太祖 2 +夸脫 2 +奉天 2 +契合 2 +奢侈 2 +奧布賴恩 2 +奧斯曼 2 +奧朗則布 2 +奧林匹克 2 +奪冠 2 +女士 2 +女孩 2 +女皇 2 +妖精 2 +妖魔 2 +妥善 2 +姊妹 2 +始皇 2 +姐妹 2 +姐弟 2 +姑家 2 +姓 2 +姓名 2 +姚 2 +姜 2 +姿態 2 +威斯康辛 2 +威爾遜 2 +娘舅 2 +婆婆 2 +嫉妒 2 +子夜 2 +子珍 2 +孔子 2 +字元 2 +字型 2 +字體 2 +存有 2 +存活 2 +孟能 2 +季前 2 +季軍 2 +孤僻 2 +孤獨 2 +孵化 2 +學制 2 +學問 2 +學士 2 +學年 2 +學期 2 +學童 2 +學系 2 +學費 2 +宇一郎 2 +守衛 2 +安修 2 +安息 2 +安打 2 +安東尼 2 +安菲特裡忒 2 +安邑 2 +完 2 +完整 2 +宏觀 2 +宗室 2 +官職 2 +定下 2 +定名 2 +定型 2 +定期 2 +客 2 +客串 2 +客人 2 +客室 2 +客機 2 +客車 2 +宣戰 2 +害怕 2 +家久 2 +家境 2 +家屬 2 +家產 2 +家衛 2 +家貓 2 +寄宿 2 +密切 2 +密蘇里 2 +富 2 +富人 2 +富特 2 +實力 2 +實務 2 +實用 2 +實習 2 +審 2 +審判 2 +審查 2 +寫道 2 +寬廣 2 +寬頻 2 +寬鬆 2 +寶石 2 +寺廟 2 +寺院 2 +封神 2 +封面 2 +射殺 2 +專區 2 +專員 2 +專有 2 +專題 2 +尉 2 +尊嚴 2 +尊重 2 +尋常 2 +對峙 2 +對待 2 +對陣 2 +小兒 2 +小姐 2 +小心 2 +小桃 2 +小梅 2 +小鎮 2 +小閻 2 +小青 2 +就任 2 +就業 2 +尺 2 +尼克森 2 +尼羅 2 +尼西亞 2 +尼采 2 +尾部 2 +局限 2 +居 2 +居委 2 +居里 2 +屋大維 2 +屋苑 2 +展 2 +展館 2 +履仁 2 +屬名 2 +山丘 2 +山坡 2 +山海 2 +岩石 2 +岳母 2 +岳父 2 +崇拜 2 +崔西 2 +崖 2 +嵌 2 +嶺南 2 +嶽麓 2 +川 2 +工兵 2 +工商 2 +工農 2 +工黨 2 +左上 2 +左側 2 +巧眉 2 +巧言 2 +差距 2 +差點 2 +巴克特里亞 2 +巴勒斯坦 2 +巴哈歐拉 2 +巴格曼 2 +巴格達 2 +巴洛克 2 +巴爾幹 2 +巴納德 2 +巷 2 +市值 2 +市內 2 +市商 2 +市郊 2 +市長 2 +布卡 2 +布拉格 2 +布朗 2 +布爾薩 2 +布魯明頓 2 +希羅 2 +帕洛馬 2 +帛琉 2 +帶去 2 +帶有 2 +常務 2 +常年 2 +常德 2 +常春藤葉 2 +常規 2 +幫忙 2 +干擾 2 +干涉 2 +干預 2 +平 2 +平安 2 +平息 2 +平成 2 +平方尺 2 +平時 2 +平頂 2 +年初 2 +年紀 2 +年譜 2 +幼體 2 +床墊 2 +序數 2 +底層 2 +店鋪 2 +度母 2 +座堂 2 +庫夫 2 +庫容 2 +庭 2 +康乃爾 2 +康復 2 +廉租 2 +廠房 2 +廢 2 +廢墟 2 +廢止 2 +廣安 2 +廣義 2 +延任 2 +延遲 2 +建有 2 +建銘 2 +引種 2 +引退 2 +引進 2 +弗朗索瓦 2 +強風 2 +彈奏 2 +彈性 2 +彌迦 2 +彙集 2 +彩色 2 +影展 2 +往返 2 +征 2 +征戰 2 +很近 2 +後端 2 +後裔 2 +徒刑 2 +得票 2 +得道 2 +從小 2 +從而 2 +從軍 2 +御苑 2 +微山 2 +德州 2 +德瑞克 2 +德輔 2 +心中 2 +必 2 +志剛 2 +快樂 2 +忽必烈 2 +思念 2 +思明 2 +思科 2 +性交 2 +恆鳳 2 +恐慌 2 
+恥辱 2 +恩 2 +恩寵 2 +恩賜 2 +悅強 2 +悲觀 2 +情意 2 +惠 2 +惠山 2 +愈 2 +愉景 2 +意志 2 +意願 2 +愛因斯坦 2 +愛德華 2 +愛惜 2 +感動 2 +感受 2 +感染 2 +慈幼 2 +慈鯛 2 +態 2 +慘敗 2 +慣例 2 +慶尚 2 +慶豐 2 +慾望 2 +憎恨 2 +憑 2 +應對 2 +懊惱 2 +懷俄明 2 +懷舊 2 +懸浮 2 +懸索 2 +成仙 2 +成傑 2 +成因 2 +成型 2 +成就 2 +成群 2 +成貓 2 +戰亂 2 +戰列 2 +戰場 2 +戰士 2 +戰線 2 +戰術 2 +戴 2 +戴麟趾 2 +房 2 +手冊 2 +手動 2 +手機 2 +手裡 2 +才能 2 +打工 2 +打敗 2 +打破 2 +打開 2 +托爾斯 2 +托爾斯泰 2 +扶植 2 +找出 2 +找回 2 +找尋 2 +承諾 2 +抄襲 2 +抓 2 +抓住 2 +投影 2 +投降 2 +抗擊 2 +抽取 2 +拆穿 2 +拆解 2 +拉斐爾 2 +拉格 2 +拔出 2 +拖 2 +拖延 2 +招商 2 +招股 2 +拷貝 2 +拼音 2 +拿 2 +拿到 2 +拿破崙 2 +拿走 2 +指引 2 +指控 2 +指涉 2 +按鍵 2 +挖角 2 +挽救 2 +捐贈 2 +捕 2 +捕獲 2 +捕食 2 +捷運 2 +掉 2 +排 2 +排放 2 +排氣 2 +排演 2 +掛架 2 +掠過 2 +掠食 2 +採訪 2 +接待 2 +接掌 2 +接種 2 +接駁 2 +控球 2 +推廣 2 +推翻 2 +推選 2 +描寫 2 +提倡 2 +提及 2 +提示 2 +插圖 2 +揚聲 2 +換入 2 +換股 2 +損傷 2 +損毀 2 +搞笑 2 +搭檔 2 +搶險 2 +摩根 2 +摩爾 2 +撤出 2 +撤軍 2 +播 2 +播客 2 +擅長 2 +擊 2 +擊退 2 +擒抱 2 +擔負 2 +據守 2 +擺脫 2 +擾動 2 +支出 2 +支柱 2 +收 2 +收復 2 +收發 2 +收穫 2 +收視 2 +收集 2 +改回 2 +改寫 2 +改建 2 +改版 2 +改良 2 +改裝 2 +攻佔 2 +攻陷 2 +放映 2 +放置 2 +政協 2 +政變 2 +政黨 2 +故意 2 +故此 2 +故鄉 2 +效忠 2 +效率 2 +敏 2 +敏感 2 +敗給 2 +教友 2 +教員 2 +教徒 2 +教派 2 +教科文 2 +整修 2 +整套 2 +整體 2 +敵對 2 +數位 2 +數十 2 +數理 2 +數目 2 +文元 2 +文官 2 +文帝 2 +文康 2 +文英 2 +文華 2 +文革 2 +斐濟 2 +斥資 2 +斯圖爾特 2 +斯大林 2 +斯氏星蟒 2 +斯洛維尼亞 2 +斯特勒謝尼 2 +斯理 2 +新宿 2 +新岩 2 +新曲 2 +新澤西 2 +新田 2 +新興 2 +斷裂 2 +方位 2 +方針 2 +施行 2 +旁 2 +旁邊 2 +旅鴿 2 +旋律 2 +日喀則 2 +日本龍 2 +日益 2 +日航 2 +日行 2 +早上 2 +旺山 2 +旺盛 2 +昆士蘭 2 +昌 2 +明帝 2 +易名 2 +昔日 2 +星形 2 +春天 2 +春日 2 +是否是 2 +時尚 2 +時速 2 +晉升 2 +晚上 2 +晚會 2 +普 2 +普及 2 +普陀 2 +景 2 +景帝 2 +景觀 2 +景象 2 +智慧 2 +暑假 2 +暗殺 2 +暢銷 2 +暫停 2 +暫緩 2 +暴露 2 +曝氣 2 +更好 2 +更改 2 +更深 2 +更高 2 +書信 2 +書寫 2 +書房 2 +書法 2 +最久 2 +最小 2 +最少 2 +最新 2 +最遊 2 +會員 2 +會場 2 +會社 2 +會談 2 +會長 2 +月刊 2 +有助 2 +有意 2 +有毒 2 +有罪 2 +服 2 +服從 2 +朔日 2 +朝代 2 +木星 2 +木管 2 +末年 2 +末期 2 +本區 2 +本屆 2 +本班 2 +本站 2 +本質 2 +本願 2 +朴 2 +村落 2 +村頭 2 +束縛 2 +杭 2 +東亞 2 +東吳 2 +東山 2 +東征 2 +東晉 2 +東正 2 +東視 2 +松潘 2 +松鼠猴 2 +林庄 2 +果實 2 +果汁 2 +架設 2 +柏油 2 +染色 2 +柔佛 2 +柔弱 2 +查 2 +查德 2 +柯 2 +柱 2 +柳 2 +柳江 2 +柴油 2 +柴灣 2 +校內 2 +校隊 2 +核能 2 +根本 2 +格式 2 +格林 2 +格林維爾 2 +格格 2 +格檔 2 +格里高利 2 +桃太洛斯 2 +桌面 2 +桑葚 2 
+棕熊 2 +棣 2 +植被 2 +楊樹 2 +業者 2 +極大 2 +極性 2 +極高 2 +榮獲 2 +樁 2 +樂農 2 +標語 2 +標題 2 +樞機 2 +樟湖 2 +模具 2 +樣本 2 +樹木 2 +橙 2 +機動 2 +機員 2 +機槍 2 +橡膠 2 +橫山 2 +橫濱 2 +橫跨 2 +檢索 2 +檢討 2 +權威 2 +權貴 2 +次數 2 +次級 2 +次要 2 +次郎 2 +次長 2 +欺騙 2 +歌仔 2 +歌唱 2 +歌聲 2 +歌迷 2 +正是 2 +正直 2 +正統 2 +正面 2 +此人 2 +此案 2 +此物 2 +此種 2 +此線 2 +此舉 2 +此類 2 +步態 2 +武大 2 +武昌 2 +武松 2 +歧視 2 +歸類 2 +死靈 2 +殘存 2 +殘忍 2 +殘酷 2 +殯葬 2 +殺傷 2 +殺掉 2 +每位 2 +每周 2 +每層 2 +每日 2 +每次 2 +毒性 2 +毒殺 2 +毒藥 2 +毗鄰 2 +毛利 2 +民不聊生 2 +民調 2 +民都洛水牛 2 +氣 2 +氣田 2 +氧 2 +氫彈 2 +氯金酸 2 +水孔 2 +水手 2 +水準 2 +水溫 2 +水滸 2 +水質 2 +水道 2 +水餃 2 +永嘉 2 +永寧 2 +汗位 2 +汝霖 2 +江北 2 +江戶 2 +池尻 2 +決心 2 +決戰 2 +沃夫 2 +沖繩 2 +沙 2 +沙咀 2 +沙柏 2 +沙河 2 +沙龍 2 +河水 2 +泉州 2 +法蘭西 2 +法醫 2 +泡沫 2 +波塞摩斯 2 +注射 2 +注重 2 +泰坦 2 +洗 2 +洗手 2 +洛辛堡 2 +洞 2 +活 2 +活性 2 +派出 2 +派別 2 +派駐 2 +流傳 2 +流失 2 +流求 2 +流派 2 +流通 2 +浙東 2 +浩劫 2 +浮冰 2 +海南 2 +海涌 2 +海豹 2 +海邊 2 +海關 2 +消化 2 +消失 2 +涌 2 +涮 2 +淄博 2 +淮 2 +淮河 2 +深厚 2 +深得 2 +深愛 2 +深遠 2 +淹沒 2 +添加 2 +清晨 2 +清楚 2 +清華 2 +清鍾 2 +減輕 2 +游牧 2 +湖州 2 +湘 2 +湯興 2 +溝通 2 +溪流 2 +溫和 2 +溫州 2 +溫暖 2 +滄州 2 +滅口 2 +滙豐 2 +滬東 2 +滿足 2 +漁業 2 +漂流 2 +演說 2 +漢佛瑞 2 +漢口 2 +漸漸 2 +潔 2 +潭西 2 +潮州 2 +澤普 2 +澳底 2 +激光 2 +激戰 2 +激起 2 +濃縮 2 +濕原 2 +濕度 2 +濟寧 2 +濱松 2 +濱湖 2 +瀏覽 2 +灌木 2 +灘 2 +火藥 2 +灰狼 2 +灰色 2 +災害 2 +炮台 2 +為數 2 +烏孫 2 +無力 2 +無效 2 +無界 2 +無緣 2 +無辜 2 +無黨 2 +焦耳 2 +然 2 +煙熏 2 +照料 2 +照顧 2 +煮制 2 +熊 2 +熊隻 2 +熱比婭 2 +熱衷 2 +燃燒 2 +燒毀 2 +燒餅 2 +燕山 2 +爪 2 +爪獸 2 +爭取 2 +爭執 2 +爭辯 2 +爭霸 2 +父子 2 +爾後 2 +牆體 2 +片段 2 +牙買加 2 +牛仔 2 +牛肉 2 +牧師 2 +牧養 2 +物價 2 +特使 2 +特性 2 +特權 2 +特種 2 +犬隻 2 +犬齒 2 +犯 2 +狀 2 +狐狸 2 +猛烈 2 +猛虎 2 +猶他 2 +猶豫 2 +獅 2 +獎章 2 +獎金 2 +獨居 2 +獨自 2 +獵奇 2 +獵殺 2 +玄 2 +玄機 2 +率軍 2 +玉帶 2 +玉門 2 +王位 2 +王妃 2 +玩 2 +玩具 2 +珀西 2 +珍 2 +珍品 2 +現址 2 +現狀 2 +球迷 2 +理念 2 +琪 2 +琴 2 +琴行 2 +瑪利亞 2 +瑪納斯 2 +瑪莉 2 +瑪麗亞 2 +環島 2 +環形 2 +環球 2 +環礁 2 +瓊璘 2 +瓜分 2 +瓦爾那 2 +甚 2 +甚少 2 +甚麼 2 +甜甜 2 +生日 2 +生母 2 +生病 2 +產值 2 +產區 2 +產物 2 +用地 2 +用電 2 +由來 2 +由衷 2 +甲 2 +甲板 2 +甲醇 2 +申花 2 +男爵 2 +町村 2 +留存 2 +留學 2 +留意 2 +留香 2 +畜 2 +番 2 +畫上 2 +畫報 2 +異性 2 +當事 2 +當代 2 +當前 2 +當場 2 +當成 2 +疫情 2 +病人 2 +病理 2 +痕迹 2 +登記 2 +登輝 2 +發售 2 +發回 2 +發掘 2 +發覺 2 +發音 2 +白紙 2 +白金漢 2 +白馬 2 +百度 2 +皇子 2 
+皇宮 2 +皮 2 +皮埃蒙特 2 +盆子 2 +益世 2 +盟校 2 +監察 2 +監製 2 +監視 2 +直人 2 +直布羅陀 2 +直轄 2 +直通 2 +相戀 2 +相等 2 +相識 2 +相連 2 +省立 2 +看似 2 +看法 2 +真宗 2 +真情 2 +真的 2 +眼鏡 2 +眾人 2 +睡衣 2 +矚目 2 +矛盾 2 +知節 2 +短面熊 2 +矮人 2 +石化 2 +石原 2 +石家 2 +砍柴 2 +研製 2 +研討 2 +砲 2 +硅 2 +硫磺 2 +硬 2 +硬體 2 +碎石 2 +碘 2 +碧翠絲 2 +碩士 2 +確實 2 +磅 2 +磨損 2 +礦 2 +礦業 2 +示 2 +示威 2 +社交 2 +祂 2 +祕教 2 +神代 2 +票 2 +祺瑞 2 +福來 2 +福利 2 +福部 2 +福音 2 +禮節 2 +禽龍 2 +秀全 2 +秀吉 2 +秋天 2 +科幻 2 +科爾多瓦 2 +科羅拉多 2 +科赫 2 +科雷馬 2 +秘魯 2 +租客 2 +移除 2 +稀有 2 +稅 2 +程式 2 +種姓 2 +稱呼 2 +稱臣 2 +稱讚 2 +稻盛 2 +穆罕默德 2 +積分 2 +空缺 2 +穿耳 2 +突出 2 +突厥 2 +突擊 2 +窟 2 +立下 2 +立憲 2 +立熙 2 +站台 2 +竟然 2 +竣工 2 +童星 2 +競技 2 +競馬 2 +笑話 2 +笨 2 +第九 2 +第十一 2 +筆下 2 +等到 2 +等待 2 +策 2 +策劃 2 +管弦 2 +管治 2 +節奏 2 +簡 2 +簡易 2 +簽署 2 +籃壇 2 +籃子 2 +籌建 2 +籤 2 +米利特 2 +米格 2 +米爾扎 2 +精度 2 +精武 2 +精液 2 +精緻 2 +精美 2 +精采 2 +糖 2 +糖份 2 +紀 2 +約克 2 +約定俗成 2 +約會 2 +約瑟夫 2 +紅木 2 +紅麴 2 +紋 2 +紐卡斯爾 2 +紓緩 2 +純淨 2 +純粹 2 +紙 2 +紙幣 2 +紛紛 2 +素質 2 +索引 2 +細緻 2 +終 2 +組長 2 +結 2 +結晶 2 +結識 2 +絕望 2 +統稱 2 +絲綢 2 +經紀 2 +經費 2 +綠 2 +維 2 +維修 2 +維吉尼亞 2 +維鈞 2 +網上 2 +網友 2 +網民 2 +緊鄰 2 +線粒 2 +線西 2 +編入 2 +編寫 2 +編製 2 +緩存 2 +緩慢 2 +緬因 2 +縣城 2 +縣治 2 +縣長 2 +縱橫 2 +縱貫 2 +總值 2 +總會 2 +總監 2 +總管 2 +總額 2 +繁忙 2 +繁榮 2 +繁殖 2 +繚 2 +繞城 2 +繪圖 2 +續篇 2 +續約 2 +罪 2 +罪案 2 +罪行 2 +署名 2 +署長 2 +罷黜 2 +罹患 2 +羅丹 2 +羅塞塔 2 +羅斯 2 +羅斯基勒 2 +羅斯提 2 +羅漢 2 +羅素 2 +羅貝爾 2 +羊 2 +羊曲 2 +羊毛 2 +羌 2 +美女 2 +義 2 +義和 2 +習 2 +習性 2 +翦 2 +翻新 2 +翻越 2 +翼 2 +老年 2 +老式 2 +老舍 2 +考場 2 +考證 2 +耕作 2 +耕種 2 +耳道 2 +耶和華 2 +耶律 2 +耶魯 2 +聖三 2 +聖地 2 +聘任 2 +聚 2 +聚合 2 +聚居 2 +聯 2 +聯名 2 +聰明 2 +聲名 2 +聲望 2 +聲道 2 +聽 2 +肉糕 2 +肉食 2 +肖 2 +肖像 2 +肖金 2 +肝臟 2 +股 2 +股權 2 +肢 2 +肯尼迪 2 +育才 2 +育種 2 +肺炎 2 +胎兒 2 +胖子 2 +能源 2 +能級 2 +腎 2 +腓特烈 2 +腳 2 +腳趾 2 +腹面 2 +腺葉木犀欖 2 +膝蓋 2 +膠質 2 +臘汁 2 +臣民 2 +臨床 2 +臨淄 2 +臨近 2 +臨邑 2 +自主 2 +自助 2 +自家 2 +自殺 2 +自衛 2 +自轉 2 +臭氧 2 +至於 2 +致 2 +致死 2 +臺中 2 +臺北 2 +興化 2 +舉 2 +舉人 2 +舉動 2 +舊址 2 +舒服 2 +舒適 2 +舞台 2 +船尾 2 +船廠 2 +船艦 2 +船長 2 +艇 2 +艦艇 2 +艱苦 2 +色度 2 +色素 2 +花卉 2 +花崗 2 +花樣 2 +花費 2 +苗 2 +若干 2 +若是 2 +苦惱 2 +英俊 2 +英超 2 +英雄 2 +茨威格 2 +荷花 2 +荷蘭豬 2 +莆田 2 +莉拉 2 +莎拉 2 +莎莉 2 +莫名 2 +莫泊桑 2 +莫爾庫斯 2 +莫雷爾 2 +莫高 2 +菁英 2 +菌 2 +菩薩 2 +華夏 2 +華隆 2 
+華麗 2 +菲 2 +萊特 2 +萬宜 2 +萬春 2 +萬萬 2 +落入 2 +落差 2 +葉子 2 +葉海亞 2 +葉片 2 +著想 2 +著迷 2 +葛馮 2 +葡萄牙 2 +葵盛 2 +蒂羅爾 2 +蒐集 2 +蒙 2 +蒙山 2 +蒙蔽 2 +蒸餾 2 +蓄電 2 +蓮屬 2 +蔬菜 2 +蔭權 2 +薩 2 +薩達姆 2 +薪資 2 +藉 2 +藉口 2 +藉著 2 +藍調 2 +藍鯨 2 +藏在 2 +藝員 2 +蘇丹 2 +蘇爾曼 2 +蘇維埃 2 +蘇黎世 2 +蘭 2 +虎豹 2 +虐待 2 +虔誠 2 +處境 2 +蛇 2 +蛇夫 2 +蛇類 2 +螺旋 2 +蠟燭 2 +蠻族 2 +血清 2 +血緣 2 +行李 2 +行程 2 +行車 2 +行駛 2 +術士 2 +街區 2 +衛冕 2 +衛戍 2 +表皮 2 +袋中 2 +裁定 2 +補充 2 +補助 2 +裝病 2 +製冷 2 +製片 2 +西元 2 +西區 2 +西沙 2 +西甲 2 +西站 2 +西鄰 2 +西鐵 2 +西門子 2 +西雅圖 2 +要素 2 +要職 2 +見義勇為 2 +見證 2 +規範 2 +視覺 2 +親密 2 +親屬 2 +親情 2 +親戚 2 +親緣 2 +親近 2 +觀世音 2 +觀塘 2 +觀賞 2 +角宿 2 +角逐 2 +解 2 +解鎖 2 +解體 2 +言論 2 +訂婚 2 +訂購 2 +計 2 +討伐 2 +記號 2 +許可 2 +訴說 2 +註冊 2 +評議 2 +評選 2 +詞彙 2 +詩篇 2 +詮釋 2 +話語 2 +該廟 2 +該車 2 +該館 2 +誕辰 2 +誘發 2 +語堂 2 +誤導 2 +說唱 2 +課 2 +課題 2 +調動 2 +調料 2 +調景 2 +論壇 2 +諸侯 2 +諸葛 2 +諾貝爾 2 +謎 2 +謙虛 2 +講 2 +謠言 2 +證件 2 +證券 2 +證據 2 +譜 2 +譜寫 2 +警報 2 +警官 2 +警方 2 +警長 2 +譯 2 +譯名 2 +譯法 2 +護士 2 +護法 2 +變得 2 +變換 2 +變更 2 +變異 2 +谷 2 +象棋 2 +豪華 2 +豫 2 +貓頭鷹 2 +貝爾尼納 2 +財務 2 +財困 2 +財團 2 +財物 2 +貨櫃 2 +販子 2 +貪污 2 +貴人 2 +買下 2 +買來 2 +賀氏 2 +資方 2 +資產 2 +賠償 2 +賢妃 2 +質子 2 +質疑 2 +質素 2 +購入 2 +購物 2 +賽馬 2 +赤川 2 +走出 2 +走路 2 +起飛 2 +趁機 2 +超越 2 +越低 2 +越獄 2 +越遠 2 +越高 2 +趕出 2 +趨 2 +趨同 2 +路上 2 +路口 2 +身 2 +身長 2 +躲過 2 +車中 2 +車廂 2 +車資 2 +車隊 2 +軍區 2 +軍校 2 +軍法 2 +載重 2 +輔音 2 +輕 2 +輕傷 2 +輕型 2 +輕視 2 +輟學 2 +轄境 2 +轄有 2 +轉介 2 +轉車 2 +轎車 2 +轟動一時 2 +辛亥 2 +辣妹 2 +辦 2 +辭退 2 +辯護 2 +農地 2 +農場 2 +農曆 2 +農田 2 +農藥 2 +近藤 2 +近衛 2 +迫害 2 +迴避 2 +迷 2 +迷信 2 +迷幻 2 +追溯 2 +退化 2 +送 2 +送入 2 +逃 2 +逃出 2 +逃避 2 +透明 2 +逐鹿 2 +通報 2 +通婚 2 +通知 2 +通稱 2 +通航 2 +速寫 2 +速率 2 +造出 2 +造船 2 +連同 2 +連帶 2 +連鎖 2 +週年 2 +進修 2 +進出口 2 +進化 2 +進駐 2 +遇到 2 +遊仙 2 +遊客 2 +遊玩 2 +運河 2 +運用 2 +運轉 2 +過世 2 +過勞 2 +過年 2 +過於 2 +過海 2 +過關 2 +道場 2 +道生 2 +達爾 2 +違反 2 +遞歸 2 +遠東 2 +遭受 2 +遴選 2 +遵守 2 +遷 2 +遷往 2 +選中 2 +選拔 2 +選民 2 +選為 2 +遺囑 2 +遺產 2 +遺跡 2 +遼東 2 +還珠 2 +那些 2 +那樣 2 +邦初 2 +郊外 2 +部下 2 +部件 2 +部族 2 +郵件 2 +都柏林 2 +都統 2 +鄭國 2 +鄭氏 2 +鄰國 2 +配合 2 +配對 2 +酒吧 2 +酒泉 2 +酒醉 2 +醜聞 2 +醫藥 2 +釉下 2 +里昂 2 +里程 2 +重修 2 +重型 2 +重華 2 +重言 2 +重返 2 +重重 2 +重量 2 +野獸 2 +量表 2 +金山 2 +金星 2 +金牌 2 +金蓮 2 +金雞 2 +金馬 2 +鈞 2 +銀 2 +鋅 2 
+鋼鐵 2 +錢 2 +錫金 2 +鍋 2 +鍵盤 2 +鐘錶 2 +鐵伊 2 +鐵達尼 2 +鑄造 2 +長久 2 +長城 2 +長女 2 +長春 2 +長老 2 +長者 2 +長興 2 +長蘆 2 +長軸 2 +長音 2 +門前 2 +門口 2 +門戶 2 +門齒 2 +開創 2 +開口 2 +開心 2 +開拍 2 +開採 2 +開會 2 +開火 2 +開羅 2 +開賽 2 +開通 2 +開門 2 +開除 2 +閏年 2 +間接 2 +間隙 2 +閘門 2 +閱讀 2 +闊 2 +關注 2 +關聯 2 +關說 2 +關鍵 2 +關門 2 +關閉 2 +防範 2 +防衛 2 +阻擋 2 +阻礙 2 +阿 2 +阿保機 2 +阿姆斯特丹 2 +阿格 2 +阿美 2 +附帶 2 +降級 2 +降解 2 +除籍 2 +陰霾 2 +陵墓 2 +陵寢 2 +陶瓷 2 +陷阱 2 +陽澄 2 +隆頭魚 2 +隊友 2 +隊長 2 +隋 2 +隋代 2 +隔 2 +隕石 2 +隨之 2 +集成 2 +集資 2 +集雨 2 +集體 2 +雍正 2 +雕像 2 +離任 2 +離婚 2 +離心 2 +雪貂 2 +雲想 2 +零星 2 +電動 2 +電壓 2 +電流 2 +電纜 2 +電能 2 +電路 2 +電鐵 2 +震動 2 +震盪 2 +震驚 2 +霍 2 +霍普 2 +霸主 2 +霸王 2 +靈素 2 +青聯 2 +青藏 2 +青銅 2 +靜電 2 +面臨 2 +面試 2 +面部 2 +音系 2 +音變 2 +韻律 2 +頂 2 +順序 2 +預備 2 +預定 2 +預言 2 +預計 2 +頒布 2 +頒發 2 +頓 2 +領地 2 +頭等 2 +頭部 2 +願望 2 +顧 2 +顯得 2 +顯聖 2 +顯著 2 +風俗 2 +風景 2 +風氣 2 +風濕 2 +風雲 2 +風靡 2 +食夢 2 +食材 2 +飢荒 2 +飯 2 +飲品 2 +飲用 2 +餘額 2 +館藏 2 +饑荒 2 +饒舌 2 +首位 2 +首播 2 +首爾 2 +首腦 2 +首部 2 +香蕉 2 +馬丁 2 +馬克思 2 +馬克斯 2 +馬其頓 2 +馬拉松 2 +馬歇爾 2 +馬耳他 2 +馬里奧 2 +駐紮 2 +駐足 2 +駕駛 2 +骨 2 +骨頭 2 +骨髓 2 +體制 2 +體力 2 +體型 2 +體校 2 +體重 2 +體驗 2 +高低 2 +高壓 2 +高平 2 +高校 2 +高止 2 +高能 2 +高興 2 +高郵 2 +鬆散 2 +鬼 2 +魅力 2 +魏 2 +魚雷 2 +魚頭 2 +魯殊 2 +鮮明 2 +鯉形 2 +鯉科 2 +鰂魚 2 +鱸形 2 +鳥取 2 +鳥綱 2 +鳳凰 2 +鳳翔 2 +鹼 2 +鹿 2 +鹿兒 2 +麗珠 2 +麗茲 2 +麥克塞 2 +麥爾斯 2 +麻河 2 +麻省 2 +黃帝 2 +黃色 2 +黑幫 2 +黑貓 2 +黑龍 2 +黔 2 +點擊 2 +點數 2 +點球 2 +黨派 2 +鼎盛 2 +鼠疫 2 +齊 2 +齊克果 2 +齒擦 2 +齒軌 2 +齧齒 2 +龐家堡 2 +$9,999 1 +$99,999 1 ++ 1 +-99 1 +-999 1 +9--9 1 +9-9.9 1 +9.99% 1 +9.999% 1 +9.9999萬 1 +9.99萬 1 +9/9 1 +99-99 1 +999-999lr 1 +999.999 1 +9999-9999 1 +999cm 1 +999m 1 +999x 1 +999萬9千餘 1 +999餘 1 +999餘萬 1 +99b 1 +99萬9千 1 +9c 1 +9f 1 +9nd 1 +9億9999萬 1 +9億9千萬 1 +9成 1 +9百多萬 1 +9萬億 1 +9萬多 1 +`` 1 +a9 1 +a999-999 1 +aankhen 1 +abante 1 +abdurrahman 1 +ac 1 +académie 1 +activision 1 +adilabad 1 +adisumarmo 1 +admiral 1 +advance 1 +aeg 1 +aek 1 +aero 1 +aeromobile 1 +aethra 1 +afd 1 +ages 1 +airlines 1 +airport 1 +aleksej 1 +alliance 1 +alpha 1 +alyssum 1 +amorc 1 +android 1 +anne 1 +antarctic 1 +architecture 1 +argonauts 1 +arwadi 1 +arzacq-arraziguet 1 +asteroid 1 +auld 1 
+auteuil 1 +avenue 1 +aviation 1 +aviv 1 +b9 1 +bad 1 +baldwin 1 +ballklub 1 +bank 1 +bar 1 +baronet 1 +barros 1 +barsbold 1 +beatles 1 +beaune 1 +beaune-sud 1 +beckham 1 +beinasco 1 +belgaum 1 +bellagio 1 +berg 1 +berne-belp 1 +besar 1 +bhcs 1 +blake 1 +books 1 +boot 1 +bransoni 1 +brett 1 +brian 1 +briann 1 +bronfenbrenner 1 +brough 1 +bruce 1 +bt 1 +bud 1 +caen 1 +calling 1 +campaign 1 +campostoma 1 +can 1 +canal 1 +cannon 1 +capital 1 +caret 1 +caroline 1 +castle 1 +cathedral 1 +cbe 1 +cec 1 +cerro 1 +cet 1 +ceyhan 1 +chapman 1 +chase 1 +chau 1 +chell 1 +christopher 1 +chrome 1 +churchill 1 +ci-9999 1 +cit999b 1 +city 1 +claritin 1 +clark 1 +cms 1 +cnzz 1 +cohen 1 +colchis 1 +color 1 +comic 1 +company 1 +connecticut 1 +conroy 1 +copper 1 +cornell 1 +cost 1 +costa 1 +council 1 +cp 1 +cpu 1 +crh999b 1 +crh999b-999 1 +crh999c 1 +crypton 1 +cushing 1 +cálida 1 +daisuke 1 +dakota 1 +damrosch 1 +daria 1 +dark 1 +dart 1 +dawn 1 +ddc 1 +dennis 1 +derby 1 +desanctis 1 +devasthanam 1 +dfh9 1 +dialogue 1 +digi 1 +digibook 1 +direct 1 +director 1 +divisione 1 +dmfc 1 +dog 1 +doodle 1 +dorian 1 +dossing 1 +double 1 +dragon 1 +ds 1 +dsm 1 +durst 1 +earth 1 +eden 1 +el 1 +electronic 1 +elisabeth 1 +ellie 1 +elliot 1 +eminescu 1 +end 1 +entertainment 1 +entity 1 +epa 1 +epithema 1 +epstein 1 +estate 1 +et 1 +exe 1 +expedition 1 +f(x) 1 +falls 1 +family 1 +fernando 1 +films 1 +firefox 1 +firozpur 1 +fleet 1 +fly 1 +fook 1 +forever 1 +fortran 1 +fox 1 +frank 1 +franpipe 1 +fred 1 +frito-lay 1 +fsb 1 +fudosi 1 +fund 1 +g 1 +g(x) 1 +g99a 1 +galliano 1 +gb 1 +gbest 1 +gear 1 +geophysical 1 +german 1 +ghost 1 +gibbs 1 +giuliano 1 +golden 1 +good 1 +goodnow 1 +government 1 +grant 1 +greater 1 +greenbelt 1 +greenville 1 +groening 1 +ground 1 +group 1 +gto 1 +guariglia 1 +halifax 1 +hangar 1 +harry 1 +harvey 1 +hau 1 +haven 1 +hear 1 +heh 1 +herrera 1 +herschel 1 +higher 1 +hillman 1 +hiv 1 +holy 1 +hondt 1 +hopkins 1 +housing 1 +hp 1 +humphrey 1 +hunt 1 +i 1 +ib 1 +igbt 1 +igy 1 
+illumination 1 +in 1 +india 1 +ingeri 1 +innocence 1 +international 1 +ipark 1 +iphone 1 +iron 1 +isartor 1 +ischl 1 +it's 1 +itunes 1 +iupac 1 +iv 1 +jay 1 +jazz 1 +jeff 1 +johnson 1 +jpl 1 +justice 1 +justin 1 +juvisy 1 +kansas 1 +karaköy 1 +karlstor 1 +kate 1 +kekal 1 +kenway 1 +kilpatrick 1 +kingfisher 1 +kink.com 1 +kinross 1 +kkr 1 +km 1 +knudstrup 1 +koffka 1 +kurnool 1 +kurt 1 +langdon 1 +langford 1 +language 1 +last 1 +laurifolia 1 +lcd 1 +ld99 1 +leaf 1 +lees 1 +lennart 1 +lethal 1 +liability 1 +liaoxipterus 1 +lilim 1 +linux 1 +liu 1 +lomidine 1 +loratadin 1 +lotz 1 +low 1 +lowell 1 +maddie 1 +magic 1 +magma 1 +mallride 1 +mamaia 1 +man 1 +managing 1 +manea 1 +maolan 1 +maria 1 +mario 1 +market 1 +marshlands 1 +martin 1 +mayflower 1 +md-99 1 +mechernich 1 +medical 1 +menachem 1 +merina 1 +methala 1 +metress 1 +meyers 1 +michaelerkirche 1 +micro 1 +micro-usm 1 +middle 1 +mihai 1 +mintz 1 +mitchell 1 +mm 1 +modern 1 +mogens 1 +money 1 +monsters 1 +montana 1 +morus 1 +multitier 1 +mundell 1 +museum 1 +my 1 +myers 1 +n99 1 +n=9 1 +name 1 +nanocells 1 +natasha 1 +nazionale 1 +ncaa 1 +neluset 1 +neverwhere 1 +nhk 1 +niarchos 1 +nibiru 1 +nickel 1 +nirmal 1 +nist 1 +no 1 +norman 1 +north 1 +novogrudok 1 +o. 
1 +odd 1 +omega 1 +omniworld 1 +one 1 +online 1 +opus 1 +ori 1 +orjan 1 +orkney 1 +ornatum 1 +ospatulus 1 +otto 1 +ova 1 +p9o9 1 +paleorhinus 1 +pangjiabu 1 +papa 1 +park 1 +pasmo 1 +pau 1 +paul 1 +pbest 1 +peronismo 1 +perouse 1 +persson 1 +perth 1 +pfa 1 +phil 1 +philippa 1 +piano 1 +pinerolo 1 +pisapia 1 +pittsburghia 1 +place 1 +planes 1 +planetshanghai 1 +playgirl 1 +police 1 +pre-rendering 1 +presbyterian 1 +primary 1 +psychology 1 +pukaki 1 +pulau 1 +purma 1 +quartet 1 +quentin 1 +quest 1 +r9 1 +railway 1 +rbk 1 +rbs 1 +record 1 +recordon 1 +reserve 1 +return 1 +review 1 +rhcl9 1 +rhythm 1 +riaa 1 +rinchen 1 +river 1 +roble 1 +rocha 1 +rock 1 +rolf 1 +rosenborg 1 +rossabi 1 +ruger 1 +russell 1 +s9 1 +safari 1 +salomon 1 +sam 1 +sandwithii 1 +sara 1 +sarianidi 1 +savannah 1 +sbe 1 +school 1 +schuchat 1 +scream 1 +scree 1 +sea 1 +sec 1 +secobarbital 1 +seemann 1 +sendlinger 1 +sensme 1 +shame 1 +sharon 1 +sheegog 1 +sheinkin 1 +shelters 1 +simon 1 +snipes 1 +social 1 +sofi 1 +soobedars 1 +soviet 1 +space 1 +spector 1 +spirit 1 +spittel 1 +sportsnet 1 +srisailamgudem 1 +ss 1 +st 1 +standard 1 +stanton 1 +star 1 +statpipe 1 +stavros 1 +steinbeck 1 +stephen 1 +steven 1 +stif 1 +stonewall 1 +street 1 +streymoy 1 +study 1 +stutsman 1 +suica 1 +sunset 1 +supply 1 +suzuki 1 +syahrin 1 +sōya 1 +t 1 +t.999.com 1 +t.qq.com 1 +t.sina.com.cn 1 +t.sohu.com 1 +t.xxxx.com 1 +tau 1 +technology 1 +tel 1 +texas 1 +tf 1 +tf99 1 +theodor 1 +thomas 1 +thrissur 1 +timati 1 +time 1 +tnm 1 +tor 1 +touch 1 +trail 1 +train 1 +tru 1 +truncatulus 1 +tsang 1 +tvs-9 1 +tweddle 1 +twisty 1 +tyler 1 +uhler-phillips 1 +umls 1 +un 1 +union 1 +university 1 +usphs 1 +utricularia 1 +valla 1 +varginha 1 +victoria 1 +view 1 +viktor 1 +villa 1 +volantis 1 +vvvf 1 +w=9 1 +walker 1 +walter 1 +wesley 1 +west 1 +westmeath 1 +wheeler 1 +white 1 +who 1 +wii 1 +william 1 +wing 1 +wireless 1 +woman 1 +wood 1 +woodside 1 +world 1 +wta 1 +year 1 +youtube 1 +zeepipe 1 +zone 1 +° 1 +ð 1 +þ 1 +̄ 1 +θ 1 +〔 1 +〕 1 
+一, 1 +一中全會 1 +一九五八 1 +一併 1 +一億 1 +一八 1 +一分為二 1 +一到 1 +一勞永逸 1 +一反其道 1 +一字一句 1 +一式一樣 1 +一成 1 +一戰 1 +一改 1 +一時 1 +一概 1 +一模一樣 1 +一氧化碳 1 +一炮 1 +一爭 1 +一發 1 +一百 1 +一百幾十 1 +一百萬 1 +一百餘 1 +一益 1 +一而再、再而三 1 +一舉 1 +一落千丈 1 +一見鍾情 1 +一路 1 +一身 1 +一邊 1 +一點 1 +丁字 1 +丁目 1 +七七 1 +七十 1 +七里 1 +三、 1 +三一 1 +三中 1 +三中全會 1 +三井 1 +三井住友 1 +三亞 1 +三元 1 +三十四 1 +三原 1 +三崎 1 +三星 1 +三氯化銠 1 +三氯氧釩 1 +三浦 1 +三王 1 +三百 1 +三百六七十 1 +三百多 1 +三索頜腔蛇 1 +三船 1 +三菱 1 +三萬 1 +三藩市 1 +三軍 1 +三門 1 +上下 1 +上下行 1 +上傳 1 +上去 1 +上古 1 +上司 1 +上埔 1 +上報 1 +上塘 1 +上奏 1 +上學 1 +上尉 1 +上手 1 +上新世 1 +上朝 1 +上林 1 +上沖 1 +上班 1 +上端 1 +上網 1 +上線 1 +上色 1 +上蓋 1 +上訪 1 +上調 1 +上路 1 +上身 1 +上車 1 +上選 1 +上部 1 +上限 1 +上集 1 +上雲 1 +上顎 1 +下剋上高潮 1 +下圖 1 +下徹 1 +下樓 1 +下河 1 +下潛 1 +下獄 1 +下稱 1 +下蝕 1 +下設 1 +下課 1 +下跌 1 +下車 1 +下遊 1 +下部 1 +下關 1 +下院 1 +下集 1 +下雷 1 +下面 1 +下顎 1 +下風 1 +不丹 1 +不乏 1 +不以為然 1 +不克 1 +不入 1 +不凡 1 +不出 1 +不出所料 1 +不利 1 +不到 1 +不力 1 +不動 1 +不去 1 +不吃 1 +不合 1 +不和 1 +不問 1 +不均 1 +不多 1 +不大 1 +不定 1 +不實 1 +不惜 1 +不愛 1 +不懷好意 1 +不折不扣 1 +不捨 1 +不收 1 +不敬 1 +不料 1 +不易 1 +不景 1 +不服 1 +不朽 1 +不歸 1 +不準 1 +不理 1 +不畏 1 +不符 1 +不純 1 +不絕 1 +不行 1 +不衰 1 +不要 1 +不見天日 1 +不解 1 +不計其數 1 +不該 1 +不詳 1 +不豐 1 +不賣 1 +不輸 1 +不辭辛勞 1 +不道 1 +不適 1 +不銹 1 +不限 1 +不露 1 +不顧 1 +且是 1 +世上 1 +世人 1 +世代相傳 1 +世充 1 +世則 1 +世子 1 +世家 1 +世昌 1 +世民 1 +世田谷 1 +世祿 1 +世綱 1 +世貿 1 +世道 1 +世銘 1 +丘 1 +丙組 1 +丞相 1 +並無 1 +並稱 1 +並系 1 +中信 1 +中南 1 +中南海 1 +中原 1 +中堅 1 +中場 1 +中底層 1 +中彈 1 +中性 1 +中投 1 +中斷 1 +中旬 1 +中校 1 +中樞 1 +中檔 1 +中殿 1 +中毒 1 +中波希米亞 1 +中田 1 +中級 1 +中綴 1 +中線 1 +中耳 1 +中聯 1 +中興 1 +中葉 1 +中藥 1 +中西方 1 +中西醫 1 +中觀 1 +中超 1 +中農 1 +中鐵 1 +串聯 1 +丸都 1 +丹 1 +丹噶爾 1 +丹尼士達智 1 +丹路殊 1 +主修 1 +主創 1 +主導 1 +主帶 1 +主幹 1 +主意 1 +主控 1 +主治 1 +主炮 1 +主犯 1 +主筆 1 +主船 1 +主食 1 +乃威 1 +久經 1 +久藏 1 +之所以 1 +之申 1 +之銓 1 +之鋒 1 +乘勢 1 +乘搭 1 +乘撘 1 +乘裝 1 +乙 1 +乙二胺 1 +乙未 1 +乙組 1 +乙苯 1 +九一一 1 +九十 1 +九江 1 +九鐵 1 +乳房 1 +乾季 1 +乾德 1 +乾淨 1 +乾西 1 +亂 1 +亂倫 1 +亂刀 1 +事先 1 +事態 1 +事發 1 +事與願違 1 +事跡 1 +事蹟 1 +二中全會 1 +二二八 1 +二十一 1 +二十二 1 +二十五 1 +二十八 1 +二十多 1 +二十萬 1 +二宮 1 +二戶 1 +二百 1 +二百五十餘 1 +二百餘 1 +二胺 1 +二郎 1 +于敏 1 +互作 1 +互利 1 +互助 1 +互惠 1 +互通 1 +互選 1 +五一 1 +五中全會 1 +五分之一 1 +五十 1 +五十一 1 
+五十六 1 +五常 1 +五弟 1 +五彩繽紛 1 +五成半 1 +五指 1 +五氧化二氮 1 +五百萬 1 +五萬三千 1 +井字 1 +井村 1 +井田 1 +些微 1 +亞丁 1 +亞他那修 1 +亞伯塔 1 +亞伯拉罕 1 +亞冠龍 1 +亞利桑納 1 +亞基 1 +亞奧 1 +亞彬 1 +亞德里亞堡 1 +亞文 1 +亞普芮 1 +亞東 1 +亞歷山大丹尼士 1 +亞流 1 +亞烏扎 1 +亞特蘭大 1 +亞瑟 1 +亞當斯 1 +亞西爾 1 +亞運 1 +亞邦 1 +亞麻 1 +亡故 1 +交付 1 +交代 1 +交出 1 +交口 1 +交回 1 +交州 1 +交替 1 +交棒 1 +交涉 1 +交界 1 +交行 1 +交角 1 +交談 1 +交道 1 +交錯 1 +亦即 1 +亨 1 +亨得利 1 +享 1 +京劇 1 +京王 1 +京釜 1 +亭湖 1 +亮相 1 +人世 1 +人仕 1 +人字 1 +人客 1 +人手 1 +人日 1 +人權 1 +人殉 1 +人氣 1 +人祭 1 +人種 1 +人稱 1 +人行 1 +人道 1 +人選 1 +人麻呂 1 +仁傑 1 +仁和 1 +仁壽 1 +仁守 1 +仁宗 1 +仁煥 1 +仁牙因 1 +仁玕 1 +仁穆 1 +仁粹 1 +仁青 1 +仇人 1 +今川 1 +介壽 1 +介質 1 +仍是 1 +仍有 1 +仍算 1 +他倆 1 +他家 1 +仙人打坐 1 +仙女木 1 +仙鶴 1 +代亞布羅 1 +代價 1 +代名詞 1 +代幣 1 +代數 1 +代牧 1 +代碼 1 +令狐 1 +令華 1 +以爲 1 +仰光 1 +仰望 1 +仲 1 +仲雄 1 +任免 1 +任選 1 +伊 1 +伊克巴爾 1 +伊利 1 +伊利沙伯 1 +伊塔蒂亞亞 1 +伊娃 1 +伊尹 1 +伊摩琴 1 +伊朗 1 +伊犁 1 +伊甸 1 +伊薩爾 1 +伊里亞德 1 +伊阿宋 1 +伊頓 1 +伍德 1 +伍德羅 1 +伎倆 1 +伏塔 1 +伏契克 1 +伏爾加 1 +伏瓦蒂爾 1 +伐 1 +休假 1 +休克 1 +休士頓 1 +休憩 1 +休斯 1 +休閑 1 +休養 1 +伙食 1 +伯克爾 1 +伯多祿 1 +伯恩 1 +伯恩哈德 1 +伯明翰 1 +伯格 1 +伯溫 1 +伯爾尼 1 +伯納姆 1 +伯納雷 1 +伯茲貝格 1 +伯莎 1 +伯虎 1 +伯謙 1 +伯達 1 +伴侶 1 +伴奏 1 +伴有 1 +伴生 1 +伶 1 +伸一 1 +伸冤 1 +伸延 1 +伸港 1 +伽馬 1 +佈局 1 +佈置 1 +佈道 1 +位在 1 +位居 1 +位階 1 +位面 1 +低下 1 +低估 1 +低價 1 +低層 1 +低平 1 +低座 1 +低檔 1 +低潮 1 +低等 1 +低調 1 +低額 1 +住所 1 +住進 1 +佐佐木 1 +佐勞爾 1 +佐和子 1 +佐民 1 +佔用 1 +何利菲德 1 +何力特 1 +何方 1 +佛事 1 +佛典 1 +佛瑞爾斯 1 +佛經 1 +佛羅倫斯 1 +佛羅里達 1 +佛萊明 1 +佛蒙特 1 +佛頭 1 +作對 1 +作怪 1 +作曲 1 +作次郎 1 +作法 1 +作為 1 +作畫 1 +作雲 1 +作風 1 +佩佐拉諾 1 +佩儂 1 +佩戴 1 +佩琪 1 +佩蘭多 1 +佬 1 +佳作 1 +佳佳 1 +佳節 1 +併發 1 +使喚 1 +使團 1 +使節 1 +侄子 1 +來看 1 +來臨 1 +來襲 1 +來館 1 +侈談 1 +侍奉 1 +侍女 1 +侍從 1 +侏羅 1 +供水 1 +供電 1 +供養 1 +依次 1 +依照 1 +依瑪 1 +依託 1 +依託泊苷 1 +依附 1 +侮辱 1 +侯 1 +侵佔 1 +侵害 1 +便利 1 +便捷 1 +便是 1 +便服 1 +便當 1 +便秘 1 +俊業 1 +俗 1 +俘獲 1 +保 1 +保住 1 +保全 1 +保加爾 1 +保大 1 +保定 1 +保密 1 +保明 1 +保溫 1 +保羅費雷拉 1 +保送 1 +保養 1 +俠 1 +信中 1 +信念 1 +信教 1 +信玄 1 +信神 1 +信竹 1 +信裡 1 +修好 1 +修學 1 +修憲 1 +修煉 1 +修羅 1 +修葺 1 +修鞋 1 +修養 1 +俯瞰 1 +俸祿 1 +俾路支 1 +倉促 1 +倉庫 1 +個位 1 +個個 1 +個展 1 +倒下 1 +倒入 1 +倖免 1 +候旨 1 +候補 1 +倚天 1 +倚靠 1 +借 1 +倩文 1 +倫巴底 1 +倫拜 1 +倫納特 1 +倬標 1 +倭國 1 +倭寇 1 +假使 1 +假借 1 +假名 1 +假帳 1 +假設 1 
+假說 1 +假象 1 +假釋 1 +假面 1 +偉 1 +偉強 1 +偏低 1 +偏僻 1 +偏向 1 +偏小 1 +偏東 1 +偏重 1 +偏離 1 +做到 1 +停刊 1 +停業 1 +停機 1 +停泊 1 +停職 1 +停辦 1 +停靠 1 +停飛 1 +健壯 1 +健將 1 +健身 1 +側目 1 +側邊 1 +偵察 1 +偵測 1 +偵緝 1 +偶像 1 +偶發 1 +偷取 1 +偷羊 1 +偷襲 1 +偷走 1 +偽 1 +偽季米特里 1 +偽裝 1 +傀儡 1 +傅萊 1 +傍 1 +傍晚 1 +傑克托爾 1 +傑志 1 +傑斐遜 1 +備忘 1 +備戰 1 +備案 1 +備用 1 +備註 1 +傢具 1 +催芽 1 +傭人 1 +傳來 1 +傳給 1 +傳記 1 +傳遍 1 +債券 1 +傷及 1 +傷心 1 +傷患 1 +傷悲 1 +傷病 1 +傷透 1 +傻 1 +傾心 1 +傾談 1 +僅屬 1 +僅用 1 +像差 1 +僑 1 +僕人 1 +僖 1 +僧人 1 +僧孺 1 +僧尼 1 +僧格 1 +僧祐 1 +僱主 1 +僱傭 1 +僵局 1 +價位 1 +價錢 1 +儀器 1 +億 1 +儒士 1 +儘快 1 +儘量 1 +償付 1 +優 1 +優值 1 +優良 1 +優裕 1 +優質 1 +儲量 1 +儷 1 +允 1 +允良 1 +元子 1 +元朝 1 +元氣 1 +元澄 1 +元老 1 +元起 1 +兄 1 +兄長 1 +充任 1 +充分 1 +充氣 1 +充滿 1 +充軍 1 +兆基 1 +兆楠 1 +兆陽 1 +兇多吉少 1 +兇悍 1 +兇猛 1 +先前 1 +先帝 1 +先師 1 +先賢 1 +先鋒 1 +先驗 1 +光啟 1 +光學 1 +光宇 1 +光州 1 +光度 1 +光復 1 +光景 1 +光束 1 +光泰 1 +光滑 1 +光照 1 +光環 1 +光范 1 +光華 1 +光顧 1 +克利普頓 1 +克力佛 1 +克勤 1 +克家 1 +克拉瑪 1 +克拉西奇 1 +克敏能 1 +克欽 1 +克洛頓 1 +克特勒 1 +克羅維茲 1 +克羅迪歐 1 +克蘇魯 1 +克裡斯 1 +克農 1 +克里姆希爾特 1 +克里斯多夫 1 +克里斯多弗 1 +克里斯托弗 1 +克里波門 1 +克魯 1 +兌換 1 +免 1 +免疫 1 +免遭 1 +兔毛 1 +兢兢業業 1 +入世 1 +入地 1 +入塞 1 +入境 1 +入手 1 +入聲 1 +入股 1 +入閘 1 +入院 1 +入駐 1 +內化 1 +內卡薩 1 +內在 1 +內埔 1 +內壁 1 +內政 1 +內置 1 +內胎 1 +內臟 1 +內載 1 +內遷 1 +全劇 1 +全名 1 +全境 1 +全壘 1 +全套 1 +全島 1 +全州 1 +全得 1 +全德 1 +全效 1 +全敗 1 +全數 1 +全書 1 +全盛 1 +全盤 1 +全省 1 +全福 1 +全程 1 +全稱 1 +全線 1 +全興 1 +全邨 1 +全鎮 1 +全隊 1 +全額 1 +全黑 1 +兩億 1 +兩千五百萬 1 +兩千萬 1 +八世 1 +八十九 1 +八卦 1 +八思巴 1 +八成 1 +八杉 1 +八百 1 +公仔 1 +公佈 1 +公克 1 +公告 1 +公墓 1 +公屋 1 +公斤 1 +公款 1 +公正 1 +公狼 1 +公約 1 +公衛 1 +公袥 1 +公視 1 +公超 1 +公關 1 +公頃 1 +公館 1 +六七 1 +六千 1 +六千四百萬 1 +六合 1 +六四 1 +六安 1 +共享 1 +共尾 1 +共生 1 +共處 1 +共識 1 +共鳴 1 +兵房 1 +兵鋒 1 +其妻 1 +其子 1 +其次 1 +其母 1 +典籍 1 +兼修 1 +兼具 1 +兼容 1 +兼屬 1 +兼并 1 +冀望 1 +冉 1 +冊 1 +再三 1 +再保 1 +再用 1 +再臨 1 +再補 1 +再見 1 +冒 1 +冒險 1 +冠 1 +冠上 1 +冠峰 1 +冠狀 1 +冠玉 1 +冢 1 +冤案 1 +冥冥 1 +冥想 1 +冬初 1 +冬眠 1 +冬青 1 +冰 1 +冰冰 1 +冰塔 1 +冰晶 1 +冰柱 1 +冰河 1 +冰湖 1 +冰瀑 1 +冰球 1 +冰風 1 +冷凍 1 +冷暖氣 1 +冷次 1 +冷氣 1 +冷眼 1 +冷遇 1 +冷靜 1 +凄美 1 +准 1 +准考 1 +凈白 1 +凊 1 +凌 1 +凌日 1 +凌晨 1 +凌辱 1 +凌駕 1 +凍傷 1 +凝結 1 +凡爾登 1 +凡爾賽 1 +凱恩 1 +凱文 1 +凱爾特 1 +凱維埃爾 1 +凱美特 1 +凱茜 1 +凶 1 +凸 1 +凸起 
1 +凹版 1 +出世 1 +出人意料 1 +出到 1 +出動 1 +出去 1 +出名 1 +出品 1 +出國 1 +出城 1 +出奇 1 +出嫁 1 +出局 1 +出師 1 +出廠 1 +出征 1 +出手 1 +出擊 1 +出校 1 +出榜 1 +出血 1 +出訪 1 +出路 1 +出逃 1 +出門 1 +出頭 1 +刀鞘 1 +分工 1 +分店 1 +分批 1 +分攤 1 +分數 1 +分明 1 +分枝 1 +分校 1 +分泌 1 +分流 1 +分發 1 +分科 1 +分立 1 +分站 1 +分管 1 +分組 1 +分缺 1 +分貝 1 +分辨 1 +分部 1 +分鏡 1 +分隔 1 +分離 1 +分題 1 +分點 1 +切下 1 +切分 1 +切割 1 +切合 1 +切實 1 +切成 1 +切望 1 +切爾尼赫 1 +切片 1 +刑事 1 +刑部 1 +划算 1 +划艇 1 +列斯聯 1 +列維爾 1 +初中 1 +初始 1 +初時 1 +初次 1 +初步 1 +初見 1 +判 1 +判令 1 +判定 1 +判寺事 1 +判詞 1 +別人 1 +別名 1 +別院 1 +利他能 1 +利刃 1 +利好 1 +利潘迪特蘭堡 1 +利維奧 1 +刪剪 1 +刮目相看 1 +到任 1 +到期 1 +到發 1 +制動 1 +制式 1 +制瓷 1 +制約 1 +制酸 1 +刷 1 +刷到 1 +券 1 +券頂 1 +刺殺 1 +刻劃 1 +刻寫 1 +刻板 1 +刻滿 1 +刻畫 1 +則士 1 +則里拉 1 +削減 1 +前傾 1 +前去 1 +前因後果 1 +前奏 1 +前委 1 +前嫌 1 +前季 1 +前提 1 +前景 1 +前稱 1 +前端 1 +前綴 1 +前者 1 +前肢 1 +前齒 1 +剛剛 1 +剛性 1 +剛直 1 +剛鐸 1 +剩 1 +剩餘 1 +副長 1 +割據 1 +割破 1 +割讓 1 +割開 1 +創保 1 +創傷 1 +創刊 1 +創煥 1 +創生 1 +剷除 1 +剿 1 +剿滅 1 +劃出 1 +劃歸 1 +劃界 1 +劇中 1 +劇作 1 +劇場 1 +劇組 1 +劍俠 1 +劍法 1 +劍麻 1 +劑量 1 +力克 1 +力圖 1 +力霸 1 +功勞 1 +功德 1 +功樂 1 +功績 1 +加侖 1 +加值 1 +加冕 1 +加利奇 1 +加劇 1 +加勁 1 +加恩卡納 1 +加爾文 1 +加粗 1 +加藤 1 +加賀 1 +加速 1 +加電 1 +劣 1 +助 1 +助手 1 +助燃 1 +助聽 1 +助長 1 +努兒道刺特 1 +劫匪 1 +劫持 1 +効忠 1 +勁光 1 +勁報 1 +勁敵 1 +勁歌 1 +勃起 1 +勇俊 1 +勇士 1 +勇武 1 +勒溫 1 +動人 1 +動向 1 +動土 1 +動漫 1 +動漫畫 1 +動用 1 +動能 1 +動蕩 1 +動詞 1 +動量 1 +勘探 1 +務工 1 +勝 1 +勝任 1 +勝昭 1 +勝素 1 +勝者 1 +勝訴 1 +勝賴 1 +勞埃德 1 +勞累 1 +募款 1 +募集 1 +勢傾中外 1 +勢能 1 +勤先 1 +勤快 1 +勳位 1 +勳爵 1 +勵珍 1 +勸 1 +勾形 1 +勾畫 1 +勾結 1 +包袱 1 +包裹 1 +包覆 1 +包頭 1 +化名 1 +化妝 1 +化成 1 +化整為零 1 +化用 1 +化肥 1 +北伐 1 +北側 1 +北冰 1 +北卡羅萊納 1 +北景 1 +北歐 1 +北段 1 +北甘馬粦 1 +北美擬獅 1 +北車 1 +北返 1 +北達科他 1 +北邊 1 +匡 1 +匯入 1 +匯合 1 +匯報 1 +匯聯 1 +匯集 1 +匹 1 +匹茲堡 1 +匾額 1 +區塊 1 +區段 1 +區間 1 +十二世 1 +十二烷基苯 1 +十全十美 1 +十八億 1 +十八大 1 +十四 1 +十數 1 +十萬 1 +十餘 1 +千兆 1 +千克 1 +千島 1 +千方百計 1 +千春 1 +千瓦 1 +千米 1 +千萬 1 +千里迢迢 1 +千陽 1 +千鶴 1 +升值 1 +升到 1 +升天 1 +升越 1 +升降 1 +升高 1 +午膳 1 +半導體 1 +半牧 1 +半農 1 +卑詩 1 +卓著 1 +協合 1 +協理 1 +南人 1 +南卡羅萊納 1 +南哲 1 +南大 1 +南安 1 +南安普頓 1 +南寧 1 +南市 1 +南征 1 +南端 1 +南線 1 +南美 1 +南臨 1 +南航 1 +南船 1 +南路 1 +南通 1 +南遷 1 +南鄰 1 +南門 1 +南開 1 +南院 1 +南雄 1 +南麓 1 +博 1 +博凱蒂 1 +博多 1 +博學 1 +博斯維爾 
1 +博格 1 +博洛尼亞 1 +博滕 1 +博義 1 +博覽 1 +占星 1 +卡亞尼 1 +卡內拉 1 +卡利帕斯 1 +卡力崗 1 +卡夫 1 +卡夫卡 1 +卡巴雷羅 1 +卡希 1 +卡帕克 1 +卡拉ok 1 +卡拉柯伊 1 +卡拉維拿 1 +卡斯楚 1 +卡斯特羅 1 +卡普里維 1 +卡波特 1 +卡洛克 1 +卡洛斯 1 +卡洛曼 1 +卡洛琳 1 +卡羅來納 1 +卡羅萊納 1 +卡臣 1 +卡薩諾瓦 1 +卡車 1 +卡達 1 +卦 1 +卧底 1 +卧病 1 +卧薪嘗膽 1 +印信 1 +印刷 1 +印地安那 1 +印度尼西亞 1 +印第安納 1 +印第安納波利斯 1 +印表 1 +危在旦夕 1 +危害 1 +危殆 1 +即場 1 +即有 1 +卵內 1 +厘 1 +原先 1 +原型 1 +原姓 1 +原屬 1 +原平 1 +原意 1 +原指 1 +原文 1 +原核 1 +原畫 1 +原籍 1 +原罪 1 +原諒 1 +厥 1 +厭世 1 +厭惡 1 +去搶 1 +去留 1 +去看 1 +參戰 1 +參政 1 +參演 1 +參看 1 +參禮 1 +參贊 1 +參閱 1 +又廷 1 +又或 1 +及後 1 +及時 1 +友 1 +友情 1 +友邦 1 +反共 1 +反動 1 +反右 1 +反向 1 +反恐 1 +反省 1 +反綁 1 +反證 1 +反響 1 +反黨 1 +叔父 1 +取下 1 +取出 1 +取名 1 +取回 1 +取悅 1 +取液 1 +取物 1 +取用 1 +取而代之 1 +受命 1 +受孕 1 +受害 1 +受挫 1 +受洗 1 +受精 1 +受罰 1 +受襲 1 +受賄 1 +受阻 1 +受雇 1 +叛徒 1 +叛變 1 +叛軍 1 +叡 1 +叢刊 1 +叢書 1 +口供 1 +口信 1 +口吻 1 +口感 1 +口服 1 +口音 1 +古喙龍 1 +古堡 1 +古寺 1 +古廟 1 +古德諾 1 +古惑 1 +古斯塔夫 1 +古爾德 1 +古迹 1 +古都斯 1 +句子 1 +句點 1 +另加 1 +另娶 1 +另立 1 +另築 1 +另類 1 +只好 1 +只是 1 +只會 1 +只知 1 +只能 1 +叫作 1 +叫拜 1 +叫聲 1 +召 1 +召集 1 +可可 1 +可塑 1 +可愛 1 +可憐 1 +可樂 1 +可欣 1 +可西卡 1 +可靠 1 +可風 1 +台南 1 +台標 1 +台視 1 +台詞 1 +台長 1 +史前 1 +史坦貝克 1 +史官 1 +史帝芬 1 +史特勞斯 1 +史稱 1 +史記 1 +史跡 1 +史館 1 +右任 1 +右手 1 +右方 1 +右臂 1 +司可巴比妥 1 +司鐸 1 +吁宋 1 +吃上 1 +吃到 1 +吃掉 1 +吃法 1 +吃起 1 +各方 1 +各球 1 +各異 1 +各科 1 +各職 1 +各處 1 +各行各業 1 +各隊 1 +各項 1 +合共 1 +合力 1 +合和 1 +合唱 1 +合夥 1 +合奏 1 +合流 1 +合約 1 +合計 1 +合資 1 +合辦 1 +合適 1 +合陽 1 +合體 1 +吉利 1 +吉奧瓦尼 1 +吉姆 1 +吉布地 1 +吉拉德 1 +吉爾伯特 1 +吉祥 1 +吉米 1 +吉西 1 +吉隆坡 1 +吋 1 +同仁社 1 +同伴 1 +同僚 1 +同台 1 +同型 1 +同工 1 +同志 1 +同日 1 +同校 1 +同步 1 +同母 1 +同父 1 +同甘共苦 1 +同行 1 +同郷 1 +同食 1 +同飲 1 +名作 1 +名分 1 +名利雙收 1 +名城 1 +名帥 1 +名師 1 +名村 1 +名氣 1 +名流 1 +名聲 1 +名臣 1 +名茶 1 +名號 1 +名門 1 +名額 1 +后 1 +后妃 1 +吐 1 +吐嘈 1 +向前 1 +向滋 1 +君如 1 +君權 1 +君長 1 +吞下 1 +吟唱 1 +否 1 +否決 1 +吧 1 +吩咐 1 +含 1 +含糖 1 +含量 1 +吳王 1 +吵醒 1 +吸塵 1 +吸毒 1 +吸菸 1 +吸附 1 +吸食 1 +吹來 1 +吹氣 1 +吹滅 1 +吻部 1 +呀 1 +呂宋 1 +呆 1 +呈交 1 +告戒 1 +告白 1 +周代 1 +周刊 1 +周敏 1 +周日 1 +周朝 1 +周期 1 +周迅 1 +周遭 1 +味道 1 +呼 1 +呼倫貝爾 1 +呼和浩特 1 +命題 1 +和夫 1 +和好 1 +和宜合 1 +和康 1 +和暖 1 +和會 1 +和林 1 +和樹 1 +和睦 1 +和美 1 +和衷 1 +和親 1 +和記 1 +和諧 1 +和議 1 +咧嘴 1 +咬弦 1 +咸平 1 +咸康 1 
+咸淳 1 +咸美頓 1 +咸鏡 1 +咸陽 1 +哀悼 1 +品嘗 1 +品學兼優 1 +品德 1 +品源 1 +哈 1 +哈丹姆 1 +哈依拉爾 1 +哈剌旭烈 1 +哈吉 1 +哈布斯堡 1 +哈希姆 1 +哈恩 1 +哈拉帕那瓦 1 +哈索爾 1 +哈羅 1 +哈萊姆 1 +哈薩克 1 +哈達 1 +哈里斯堡 1 +哈里森 1 +哈默史密斯 1 +員佐 1 +員外 1 +哥利茲 1 +哥德堡 1 +哨所 1 +哪 1 +哭 1 +哲也 1 +哲元 1 +哲孟雄 1 +哲生 1 +哲蚌 1 +唇槍舌劍 1 +唐代 1 +售予 1 +售出 1 +售票 1 +唯 1 +唯獨 1 +唱戲 1 +唱法 1 +唸 1 +唸珠 1 +唾液 1 +商事 1 +商務 1 +商圈 1 +商城 1 +商埠 1 +商場 1 +商幫 1 +商朝 1 +商湯 1 +商用 1 +商羯羅 1 +商船 1 +商量 1 +啊 1 +問吧 1 +問話 1 +啟 1 +啟傑 1 +啟明 1 +啟發 1 +啟示 1 +啟程 1 +啟聯 1 +啟鑰 1 +啤酒 1 +喀什 1 +喀拉拉邦 1 +喀里多尼亞 1 +善事 1 +善作 1 +善待 1 +善後 1 +善惡 1 +善撲 1 +善良 1 +喇薩 1 +喊出 1 +喘息 1 +喙 1 +喙端 1 +喚 1 +喚回 1 +喚起 1 +喜 1 +喜好 1 +喝醉 1 +喝采 1 +喪失 1 +喬姆斯基 1 +喬木 1 +喬科維奇 1 +單獨 1 +單調 1 +單質 1 +單項 1 +嗅到 1 +嗎 1 +嗜酸 1 +嗜鹼 1 +嗣位 1 +嗣業 1 +嘉木揚 1 +嘉木樣 1 +嘉樂 1 +嘉許 1 +嘉道理 1 +嘉陵 1 +嘉靖 1 +嘔吐 1 +嘩然 1 +嘯林 1 +嘴 1 +噁心 1 +噁爆 1 +器具 1 +器械 1 +器蓋 1 +器身 1 +噴射 1 +噸位 1 +嚇人 1 +嚮導 1 +嚴 1 +嚴令 1 +嚴加 1 +嚴島 1 +嚴懲 1 +嚴斥 1 +嚴氏 1 +嚴肅 1 +嚴謹 1 +囊胚 1 +囑咐 1 +囚犯 1 +四十三 1 +四十多 1 +四十餘 1 +四周 1 +四平 1 +四方八面 1 +四牌 1 +四萬 1 +四郎 1 +回信 1 +回合 1 +回填 1 +回家 1 +回寺 1 +回彈 1 +回復 1 +回教 1 +回生 1 +回程 1 +回答 1 +因弗內斯 1 +因達農 1 +困 1 +困住 1 +困擾 1 +固 1 +固態 1 +固有 1 +固醇 1 +國中 1 +國主 1 +國光 1 +國公 1 +國共 1 +國史 1 +國名 1 +國君 1 +國土 1 +國奧 1 +國妃 1 +國安會 1 +國府 1 +國庫 1 +國情 1 +國慶 1 +國成 1 +國松 1 +國父 1 +國產 1 +國界 1 +國立 1 +國策 1 +國諱 1 +國雄 1 +圍坐 1 +圍棋 1 +圍牆 1 +圍魏救趙 1 +園丁 1 +園主 1 +園內 1 +園明園 1 +園林 1 +圓 1 +圓圓 1 +圓弧 1 +圓柱 1 +圓滑 1 +圓環 1 +圖取 1 +圖布丹 1 +圖形 1 +圖片 1 +圖示 1 +圖稿 1 +團圓 1 +團隊 1 +土匪 1 +土司 1 +土石 1 +土虱 1 +在崗 1 +在校 1 +在身 1 +地名 1 +地域 1 +地基 1 +地平 1 +地庫 1 +地政 1 +地板 1 +地標 1 +地盤 1 +地級 1 +地表 1 +地貌 1 +地質 1 +地道 1 +地震 1 +坂本 1 +均勻 1 +均衡 1 +坎特伯里 1 +坎貝爾 1 +坎農 1 +坐在 1 +坐監 1 +坐骨 1 +坡子 1 +坤玲 1 +坦 1 +坦克 1 +坦干伊喀 1 +坦然 1 +坦白 1 +型式 1 +垮台 1 +埃內韋塔克 1 +埃弗里 1 +埃米內斯庫 1 +埃米琳 1 +埃胡德 1 +埃雷拉 1 +埋怨 1 +埋葬 1 +埋藏 1 +城主 1 +城光 1 +城內 1 +城南 1 +城址 1 +城巴 1 +城池 1 +城牆 1 +城西 1 +城隍 1 +埜堂 1 +埤 1 +執委 1 +執業 1 +執飛 1 +培元 1 +培育 1 +基層 1 +基希涅夫 1 +基平 1 +基徹 1 +基數 1 +基石 1 +基頻 1 +堂堂正正 1 +堅城 1 +堅定 1 +堅尼地 1 +堅拒 1 +堅蜥 1 +堆填 1 +堆積 1 +堈 1 +堪憐 1 +堪稱 1 +堪薩斯 1 +報仇 1 +報刊 1 +報名 1 +報復 1 +報讀 1 +場內 1 +場均 1 +場景 1 +塑像 1 +塑料 1 +塑有 1 +塑膠 1 +塔利班 1 +塔台 1 +塔吉克 1 +塔塔爾 1 +塔夫茨 1 
+塔林 1 +塔樓 1 +塔西佗 1 +塗黑 1 +塚 1 +塞古拉 1 +塞德爾恰尼 1 +塞普提米烏斯 1 +塞法迪 1 +塞爾達 1 +塞琉古 1 +塞琉西 1 +塞維利亞 1 +塞維魯 1 +塞維魯敉 1 +塞隆 1 +塞音 1 +塞馬 1 +墓葬 1 +墓頂 1 +墜入 1 +墜落 1 +增殖 1 +增生 1 +增祥 1 +增進 1 +增額 1 +墟 1 +墟內 1 +墨 1 +墨客 1 +墨色 1 +墳 1 +墾田 1 +壓 1 +壓縮 1 +壞球 1 +壩上 1 +壩下 1 +士珍 1 +士禛 1 +士評 1 +壯漢 1 +壯烈 1 +壹 1 +壺 1 +壺中仙 1 +壽命 1 +壽宴 1 +壽星 1 +夏威夷 1 +夏愨 1 +夏秋季 1 +夏至 1 +夏茸切哇 1 +夏茸穹哇 1 +夏荷林 1 +夏默 1 +外借 1 +外力 1 +外加 1 +外務 1 +外匯 1 +外地 1 +外壁 1 +外套 1 +外層 1 +外形 1 +外殼 1 +外甥 1 +外甥女 1 +外省 1 +外管 1 +外表 1 +外褂 1 +外訪 1 +外語 1 +外銷 1 +多倫 1 +多元 1 +多汁 1 +多納德 1 +多謝 1 +多雨 1 +夜夜 1 +夜戰 1 +夠大 1 +夢中 1 +夢境 1 +夢幻 1 +夢想 1 +夢雲 1 +夢鴿 1 +夥兒 1 +大不了 1 +大乘 1 +大事 1 +大二 1 +大儒 1 +大區 1 +大友 1 +大受 1 +大吉 1 +大名 1 +大君 1 +大和 1 +大喊 1 +大國 1 +大圍 1 +大城 1 +大堆 1 +大堤 1 +大增 1 +大士 1 +大失所望 1 +大島 1 +大嶼 1 +大幅 1 +大怒 1 +大悟 1 +大敵 1 +大新 1 +大校 1 +大概 1 +大正 1 +大殿 1 +大汗 1 +大河 1 +大洋 1 +大湖 1 +大溪 1 +大漠 1 +大獲 1 +大理 1 +大發 1 +大窘 1 +大紅 1 +大經 1 +大綱 1 +大腦 1 +大腸 1 +大膽 1 +大舉 1 +大艇 1 +大華 1 +大蒜 1 +大街小巷 1 +大跌 1 +大路 1 +大辦 1 +大通 1 +大進 1 +大郎 1 +大部 1 +大都 1 +大釗 1 +大銘 1 +大門 1 +大雄 1 +大韓 1 +大馬 1 +大驚 1 +大體 1 +大鬧 1 +大黨 1 +大鼠 1 +天份 1 +天佐 1 +天使 1 +天倫之樂 1 +天元 1 +天安 1 +天寶樓 1 +天差地遠 1 +天性 1 +天悅 1 +天慶 1 +天才 1 +天母 1 +天河 1 +天涯 1 +天球 1 +天祐 1 +天窗 1 +天紀 1 +天翔 1 +天賜 1 +天賦 1 +天馬 1 +太傅 1 +太元 1 +太冷 1 +太初 1 +太后 1 +太宗 1 +太宰 1 +太尉 1 +太常 1 +太極 1 +太湖 1 +太炎 1 +太監 1 +太行 1 +太近 1 +太遠 1 +太郎 1 +夫仇 1 +夫妻 1 +央行 1 +失利 1 +失地 1 +失效 1 +失職 1 +失能 1 +失落 1 +失誤 1 +失蹤 1 +夷昧 1 +夾 1 +夾狀 1 +奇俠 1 +奇幻 1 +奇怪 1 +奇缺 1 +奈葉 1 +奉 1 +奉命 1 +奉安 1 +奉律 1 +奉新 1 +奉系 1 +奎德林堡 1 +奏 1 +奏鳴 1 +奕 1 +奕詝 1 +套出 1 +套用 1 +奢華 1 +奧伊 1 +奧克尼 1 +奧克蘭 1 +奧古斯丁 1 +奧姆 1 +奧得 1 +奧托 1 +奧斯卡 1 +奧斯威爾 1 +奧斯汀 1 +奧林匹亞絲 1 +奧林匹斯 1 +奧格斯堡 1 +奧爾滕 1 +奧爾登堡 1 +奧爾良 1 +奧特 1 +奧特伊 1 +奧的斯 1 +奧米加 1 +奧羽 1 +奧蒂洛 1 +奪去 1 +奬懲 1 +女人 1 +女傭 1 +女僕 1 +女優 1 +女友 1 +女嬰 1 +女水 1 +女版 1 +女生 1 +女眷 1 +奴役 1 +奶爸 1 +奸 1 +她倆 1 +好上 1 +好奇 1 +好手 1 +好氧 1 +好色 1 +如數 1 +妄圖 1 +妊娠 1 +妖怪 1 +妙 1 +妮科爾 1 +妮綺 1 +妳 1 +妹 1 +妹夫 1 +妻妹 1 +妻姐 1 +妻室 1 +姊姊 1 +始發 1 +始祖 1 +始稱 1 +始興 1 +姑娘 1 +姑母 1 +委內瑞拉 1 +委身 1 +姚里 1 +姥姥 1 +姦情 1 +姪女 1 +姿色 1 +威 1 +威光 1 +威嚇 1 +威塞克斯 1 +威斯特米思從 1 +威格莫爾 1 +威權 1 +威爾伯 1 +威爾歇 1 +威特 1 +威舍 1 +威靈頓 1 +娘 1 +娘家 1 
+娜塔莉 1 +婁 1 +婆 1 +婆羅 1 +婚 1 +婚事 1 +婚宴 1 +婚禮 1 +婢女 1 +婦 1 +婷婷 1 +媒介 1 +媚娘 1 +嫁與 1 +嫘縈 1 +嫣然 1 +嬰孩 1 +子孫 1 +子文 1 +子球 1 +子程 1 +孕育 1 +孕酮 1 +字喃 1 +字幕 1 +字模 1 +字號 1 +存世 1 +存取 1 +存放 1 +孝感 1 +孝次 1 +孟 1 +孟加拉 1 +孟德爾 1 +季後 1 +季惟 1 +季風 1 +季龍 1 +孤島 1 +孤芳自賞 1 +孤身 1 +孩提 1 +學到 1 +學前 1 +學家 1 +學府二道 1 +學業 1 +學民 1 +學津 1 +學社 1 +學聯 1 +學苑 1 +宇航 1 +守備 1 +守孝 1 +守文 1 +守法 1 +守臣 1 +守謙 1 +守齋 1 +安二郎 1 +安妮 1 +安安 1 +安岳 1 +安徒生 1 +安得拉 1 +安得拉邦 1 +安德魯 1 +安托瓦內特 1 +安撫 1 +安放 1 +安東 1 +安樂 1 +安正 1 +安民 1 +安汶 1 +安然 1 +安營 1 +安理 1 +安納 1 +安聯 1 +安葬 1 +安蘭 1 +安達信 1 +安那瑞安 1 +安那罕 1 +宋國 1 +完好 1 +完畢 1 +宏偉 1 +宏坤 1 +宏聲 1 +宏道 1 +宏量 1 +宗偉 1 +宗憲 1 +宗谷 1 +宗龍 1 +官兵 1 +官司 1 +官府 1 +官服 1 +官腔 1 +官話 1 +官邸 1 +官長 1 +宙域 1 +定位 1 +定價 1 +定向 1 +定影 1 +定性 1 +定案 1 +定理 1 +定量 1 +宛城 1 +宜興 1 +客場 1 +客家 1 +客觀 1 +客貨運 1 +客輪 1 +客量 1 +宣 1 +宣判 1 +宣化 1 +宣帝 1 +宣誓 1 +室外 1 +室溫 1 +宦官 1 +宮人 1 +宮崎 1 +宰李 1 +宴席 1 +宴會 1 +家光 1 +家勁 1 +家務 1 +家外 1 +家奴 1 +家干 1 +家用 1 +家立 1 +家道中落 1 +家驤 1 +容 1 +容器 1 +容忍 1 +容許 1 +容量 1 +宿敵 1 +宿根 1 +寄存 1 +寄送 1 +寅成 1 +密 1 +密山 1 +密文 1 +密歇根 1 +密西西比 1 +密集 1 +富商 1 +富恩特德奧羅 1 +富翁 1 +富蘭克林 1 +富裕 1 +富豪 1 +富貴 1 +富邦 1 +察合台 1 +察哈爾 1 +察沃 1 +寡尿 1 +實 1 +實則 1 +實屬 1 +實情 1 +實戰 1 +實收 1 +實權 1 +實況 1 +實踐 1 +寧波 1 +審批 1 +審理 1 +審計 1 +審評 1 +審議 1 +寫下 1 +寫信 1 +寫入 1 +寫出 1 +寫字 1 +寫成 1 +寫進 1 +寬容 1 +寬度 1 +寬敞 1 +寬條 1 +寬順 1 +寮國 1 +寵物 1 +寵臣 1 +寶光 1 +寶劍 1 +寶如 1 +寶應 1 +寶殿 1 +寶玉 1 +寶田 1 +寶血 1 +寶雞 1 +寶雲 1 +寶麗金 1 +寺前 1 +封土 1 +封為 1 +封爵 1 +封穴 1 +封號 1 +封裝 1 +封路 1 +射失 1 +射程 1 +射箭 1 +射線 1 +射鵰 1 +將來 1 +將領 1 +專 1 +專任 1 +專制 1 +專吃 1 +專指 1 +專政 1 +專機 1 +專橫 1 +專欄 1 +專權 1 +專款 1 +專注 1 +專線 1 +專註 1 +專賣 1 +專長 1 +專項 1 +尊崇 1 +尊敬 1 +尊稱 1 +尋回 1 +尋親 1 +對上 1 +對付 1 +對撞 1 +對準 1 +對照 1 +對生 1 +對白 1 +對稱 1 +對立 1 +對簿公堂 1 +對話 1 +對面 1 +對飛 1 +導 1 +導入 1 +導出 1 +導向 1 +導彈 1 +導播 1 +導正 1 +導體 1 +小人 1 +小兔 1 +小刀 1 +小南 1 +小國 1 +小小 1 +小島 1 +小息 1 +小數 1 +小書 1 +小欖 1 +小水鴨 1 +小河兒 1 +小津 1 +小浪底 1 +小澤 1 +小片 1 +小生 1 +小田急 1 +小知 1 +小石 1 +小童 1 +小舖 1 +小虎 1 +小街 1 +小輪 1 +小野 1 +小隊 1 +小順 1 +小顏 1 +小風 1 +小體 1 +少兒 1 +少將 1 +少年 1 +少懷 1 +少林 1 +少見 1 +少許 1 +少量 1 +尖端 1 +尖酸 1 +尖頂 1 +尚州 1 +尚德 1 +尚方 1 +尚書 1 +尤利烏斯 1 +尤勒 1 +尤指 1 +尤里卡 1 +就此 1 +就熟 1 +就職 
1 +尷尬 1 +尹 1 +尹氏 1 +尼克貝 1 +尼古丁 1 +尼古拉 1 +尼奧爾德 1 +尼師今 1 +尼庫瑙 1 +尼歐斯 1 +尼比魯 1 +尼爾 1 +尼爾斯 1 +尼爾馬爾 1 +尾 1 +尾巴 1 +尾柄 1 +尾隨 1 +尾鰭 1 +尾龍 1 +局勢 1 +局間 1 +居家 1 +居所 1 +居留 1 +居禮 1 +屆滿 1 +屋 1 +屋大薇 1 +屋宇 1 +屋頂 1 +屍 1 +屍體 1 +屏山 1 +屏東 1 +屏風 1 +展品 1 +展望 1 +展貿 1 +屠村 1 +屠龍 1 +層壓 1 +層次 1 +層疊 1 +層級 1 +層面 1 +履行 1 +屬國 1 +屬於 1 +屬靈 1 +屯南 1 +山下 1 +山內 1 +山口 1 +山地 1 +山姆 1 +山峰 1 +山崖 1 +山手 1 +山月 1 +山村 1 +山楂 1 +山猿 1 +山田 1 +山胞 1 +山葉 1 +山陵 1 +山麓 1 +山龍眼 1 +岐女短 1 +岐阜 1 +岐陽 1 +岑 1 +岔江 1 +岡恩 1 +岡本 1 +岩屋 1 +岩心 1 +岩手 1 +岩漿 1 +岳 1 +岳泰 1 +岷江 1 +岸川 1 +岸賈 1 +岸邊 1 +峯崎 1 +峰倉 1 +峰景 1 +島內 1 +島國 1 +島蚺 1 +峽 1 +峽灣 1 +峽谷 1 +崇善 1 +崇尚 1 +崇敬 1 +崎頭 1 +崔 1 +崔陂 1 +崗 1 +崗斜 1 +崙頂 1 +崞縣 1 +崩坍 1 +崩潰 1 +嵩祝 1 +巔峰 1 +川南 1 +川村 1 +川邊 1 +州界 1 +州舞 1 +巡査 1 +巢 1 +工事 1 +工務 1 +工序 1 +工廠 1 +工會 1 +工法 1 +工潮 1 +左右神策軍 1 +左岸 1 +左拉 1 +左派 1 +左膀 1 +左轉 1 +巨作 1 +巨像 1 +巨冊 1 +巨型 1 +巨石 1 +巨賈 1 +巨野 1 +巫師 1 +差 1 +差分 1 +差別 1 +差勁 1 +差會 1 +己二胺 1 +己巳 1 +己酉 1 +已故 1 +已晚 1 +已死 1 +巴 1 +巴亞莫 1 +巴克 1 +巴克禮 1 +巴列姆 1 +巴列斯特爾 1 +巴卑爾 1 +巴喬 1 +巴城 1 +巴塞 1 +巴塞羅那 1 +巴塞隆拿 1 +巴塞隆納 1 +巴孛許諾 1 +巴巴克 1 +巴庫 1 +巴思缽 1 +巴恩斯 1 +巴拉克 1 +巴拉尼 1 +巴斯克 1 +巴斯德 1 +巴斯蒂亞 1 +巴比 1 +巴爾虎 1 +巴爾齊蒂斯 1 +巴納夫 1 +巴納巴 1 +巴羅爾 1 +巴英額 1 +巴莫鱷 1 +巴蒂斯塔 1 +巴西利卡 1 +巴西班讓 1 +巴諾 1 +巴賽 1 +巴赫 1 +巴頓 1 +市售 1 +市縣 1 +市轄 1 +市面 1 +布 1 +布伯 1 +布倫努斯 1 +布列塔尼 1 +布哈林 1 +布宜諾斯艾利斯 1 +布拉亞斯 1 +布拉德 1 +布政 1 +布料 1 +布林 1 +布氏奇非鯽 1 +布爾 1 +布置 1 +布萊姆 1 +布蘭特福德 1 +布蘭登堡 1 +布賴滕費爾德 1 +布里奇曼 1 +布里斯托 1 +布里斯班 1 +布雷克 1 +布雷西亞 1 +布魯克林 1 +布魯斯 1 +帆布 1 +帆船 1 +希伯來 1 +希克森 1 +希爾曼 1 +希特勒 1 +希皮奧內 1 +希鵬 1 +帕克 1 +帕內爾 1 +帕搏 1 +帕爾曼 1 +帕特羅克洛斯 1 +帕米爾 1 +帕納辛奈克斯 1 +帕納辛納克斯 1 +帕維亞 1 +帕薩迪納 1 +帕西奧利 1 +帕迪恩 1 +帕金森 1 +帝王 1 +帝都 1 +師團 1 +師徒 1 +師從 1 +師父 1 +師生 1 +席勒 1 +帳目 1 +帶上 1 +帶出 1 +帶子 1 +帶少 1 +帶水 1 +常住 1 +常勝 1 +常客 1 +常態 1 +常春 1 +常春藤 1 +常盛 1 +常識 1 +常量 1 +常青 1 +常駐 1 +幀 1 +幅 1 +幅員遼闊 1 +幕 1 +幕府 1 +幕後 1 +幢 1 +幣原 1 +幪面 1 +幫主 1 +干王 1 +平反 1 +平和 1 +平地 1 +平坦 1 +平帝 1 +平常 1 +平手 1 +平日 1 +平林 1 +平沼 1 +平滑 1 +平臺 1 +平行 1 +平陵 1 +平陽 1 +年中 1 +年份 1 +年幼 1 +年息 1 +年第 1 +年老 1 +年號 1 +年資 1 +年青 1 +并行 1 +幸一 1 +幸好 1 +幸運 1 +幹 1 +幹事 1 +幹掉 1 +幹流 1 +幹道 1 +幼子 1 +幼年 1 +幼弟 1 +幼發拉底 1 +幼稚 1 +幼貓 1 +幼魚 1 +幼鯨 1 +幼鳥 1 
+幽閣 1 +幾內亞 1 +幾十 1 +幾千 1 +幾多 1 +幾百 1 +床 1 +床鋪 1 +底冊 1 +底格里斯 1 +底比斯 1 +底片 1 +底特律 1 +底稿 1 +底質 1 +店家 1 +庚戌 1 +府中 1 +府城 1 +府尹 1 +府第 1 +度宗 1 +座位 1 +座右 1 +座座 1 +座椅 1 +座西 1 +座談 1 +庫伊瓦 1 +庫伊瓦涅米 1 +庫哈斯 1 +庫柏力克 1 +庫欣 1 +庫爾特 1 +庫賽 1 +庫赫莫 1 +庫迪尼奧 1 +庫頁 1 +庭園 1 +庭薺 1 +庭長 1 +康乃狄克 1 +康史 1 +康奈爾 1 +康子 1 +康寧 1 +康樂 1 +康濟鼐 1 +康福 1 +康科德 1 +康羅伊 1 +廂 1 +廉潔 1 +廚師 1 +廝守 1 +廟倉 1 +廟方 1 +廟橋 1 +廟鎮 1 +廢棄 1 +廢熱 1 +廢舊 1 +廣受 1 +廣大 1 +廣大興 1 +廣權 1 +廣澳 1 +廣稱 1 +廣金 1 +廬山 1 +廳局 1 +廳長 1 +延安 1 +延年益壽 1 +延音 1 +廷和 1 +廷尉 1 +建好 1 +建威 1 +建市 1 +建御名方 1 +建御雷 1 +建構 1 +建武 1 +建置 1 +建華 1 +建超 1 +廿五 1 +廿六 1 +弄到 1 +弄清 1 +弊案 1 +式微 1 +弓尾 1 +弓弦 1 +弓箭 1 +引來 1 +引咎 1 +引導 1 +引江 1 +引渡 1 +引申 1 +引資 1 +弗拉格斯塔夫 1 +弗朗丹 1 +弗朗恰 1 +弗朗索 1 +弗朗西絲 1 +弗格森 1 +弗洛伊德 1 +弗特 1 +弗蘭克 1 +弗里德里希 1 +弗里施 1 +弗里茨 1 +弘 1 +弘前 1 +弘宣 1 +弭兵 1 +弱 1 +張家口 1 +張氏 1 +強勁 1 +強化 1 +強拍 1 +強暴 1 +強權 1 +強求 1 +強盜 1 +強迫 1 +強韌 1 +強項 1 +彈劾 1 +彈塗魚 1 +彈撥 1 +彈盡糧絕 1 +彌撒 1 +彌補 1 +彌賽亞 1 +彎曲 1 +彗差 1 +彗星 1 +彙編 1 +彝 1 +形像 1 +形同 1 +形體 1 +彥根 1 +彥直 1 +彩 1 +彩畫 1 +彩繪 1 +彩雲 1 +彩鳳 1 +彪馬 1 +彭劉楊 1 +彭博倫 1 +彭古魯 1 +彭定康 1 +彭拿路 1 +彰信 1 +影帝 1 +影線 1 +影評 1 +影迷 1 +影集 1 +影音 1 +彷彿 1 +役 1 +彼特 1 +往上 1 +往世 1 +往日 1 +征西 1 +待到 1 +很小 1 +很強 1 +很忙 1 +很懶 1 +很是 1 +很深 1 +很遠 1 +很重 1 +很長 1 +律定 1 +後世 1 +後代 1 +後勤 1 +後南 1 +後周 1 +後宮 1 +後庄 1 +後悔 1 +後援 1 +後梁 1 +後段 1 +後母 1 +後稱 1 +後續 1 +後置 1 +後藤 1 +後送 1 +後防 1 +後齒 1 +徒具 1 +徒手 1 +得克薩斯 1 +得心應手 1 +得悉 1 +得獎 1 +得益 1 +從來 1 +從句 1 +從周 1 +從善如流 1 +從政 1 +御史 1 +御墨 1 +御宅 1 +御窯 1 +復健 1 +復合 1 +復寫 1 +復甦 1 +循道 1 +微型 1 +微妙 1 +微小 1 +微波 1 +微粒 1 +微粒體 1 +微觀 1 +微量 1 +徵兆 1 +徵招 1 +徵祥 1 +德勝 1 +德國牧羊犬 1 +德妃 1 +德宏德特 1 +德富卡 1 +德干 1 +德愛 1 +德懷 1 +德拉瓦 1 +德文 1 +德比 1 +德江 1 +德爾 1 +德爾加多 1 +德爾斐 1 +德甲 1 +德高 1 +德魯茲 1 +徽 1 +徽章 1 +心境 1 +心宿 1 +心意 1 +心智 1 +心目 1 +心肌 1 +必和必拓 1 +必走 1 +必需 1 +忍心 1 +忍氣吞聲 1 +志 1 +志摩 1 +志明 1 +志道 1 +忘 1 +忘記 1 +忙 1 +忠 1 +忠於 1 +忠誠 1 +快上 1 +快捷 1 +快綫 1 +忽 1 +忽視 1 +怎 1 +怒 1 +怕 1 +思侯 1 +思成 1 +思維 1 +思考 1 +怡 1 +急劇 1 +急忙 1 +急救 1 +急於 1 +急流 1 +急症 1 +急行 1 +性向 1 +性命 1 +性情 1 +性腺 1 +怪 1 +怪圈 1 +怪聲 1 +恆 1 +恆大 1 +恆德 1 +恆河 1 +恐嚇 1 +恐懼 1 +恢豐 1 +恣意 1 +恤 1 +恨 1 +恩南伽 1 +恩慈 1 +恩秀 1 +恩贈 1 +恭子 1 +息率 1 +悉心 1 +悉達多 1 +悟到 1 
+悟空 1 +患 1 +患得患失 1 +患病 1 +您 1 +悲傷 1 +悲劇 1 +悲嘆 1 +悲慘 1 +悲痛 1 +悲痛欲絕 1 +悲鴻 1 +悼念 1 +情 1 +情不自禁 1 +情人 1 +情勢 1 +情愁 1 +情愛 1 +情景 1 +情結 1 +情誼 1 +情資 1 +情陷 1 +情願 1 +惇曧 1 +惟 1 +惠亞 1 +惠梨香 1 +惠特蘭 1 +惡 1 +惡人 1 +惡化 1 +惡夢 1 +惡性 1 +惡搞 1 +惡臭 1 +惡靈 1 +惡魔 1 +想必 1 +想起 1 +愈加 1 +愈大 1 +愈高 1 +愉快 1 +意圖 1 +意念 1 +意料 1 +意甲 1 +意魔 1 +愙威 1 +愚園 1 +愚昧 1 +愛好 1 +愛娜 1 +愛娜茲薇 1 +愛思德 1 +愛恨 1 +愛意 1 +愛慕 1 +愛明內斯庫 1 +愛樂 1 +愛河 1 +愛莎尼亞 1 +愛迪生 1 +愛默生 1 +感冒 1 +感謝 1 +慈湖 1 +慈濟 1 +慌亂 1 +慎 1 +慎太郎 1 +慕容 1 +慕肯 1 +慘叫 1 +慘重 1 +慚愧 1 +慢行 1 +慢駛 1 +慧嫻 1 +慰安 1 +慶 1 +慶典 1 +慶曆 1 +慶貽 1 +慶黎 1 +慷慨 1 +憂 1 +憂憤 1 +憲政 1 +憲民 1 +憲法 1 +憶蓮 1 +懂 1 +應付 1 +應允 1 +應屆 1 +應戰 1 +應昌 1 +應當 1 +應許 1 +應邀 1 +懲罰 1 +懶爪龍 1 +懷 1 +懷仁 1 +懷克里夫 1 +懷念 1 +懷慶 1 +懷抱 1 +懷水 1 +懷聖 1 +懸掛 1 +懼高 1 +懿 1 +戀人 1 +戀屍 1 +戀童 1 +戈德曼 1 +戈爾 1 +戈登 1 +戈矛 1 +戈蘭 1 +成事 1 +成仁 1 +成化 1 +成名 1 +成品 1 +成套 1 +成對 1 +成形 1 +成梁 1 +成行 1 +成語 1 +我國 1 +截 1 +截然不同 1 +截至 1 +截頜鯉 1 +戰事 1 +戰力 1 +戰勝 1 +戰地 1 +戰平 1 +戰情 1 +戰船 1 +戲子 1 +戲曲 1 +戲法 1 +戲碼 1 +戲謔 1 +戲院 1 +戴上 1 +戴克里先 1 +戴斯德 1 +戴爾馬 1 +戴維斯 1 +戴蒙 1 +戴頓 1 +戶田 1 +戶籍 1 +房東 1 +所為 1 +所長 1 +手上 1 +手工 1 +手感 1 +手抄 1 +手指 1 +手提 1 +手槍 1 +手稿 1 +手筆 1 +手腳 1 +手邊 1 +手風 1 +才子 1 +才是 1 +才智 1 +扎什倫布 1 +打亂 1 +打人 1 +打包 1 +打撈 1 +打死 1 +打水 1 +打牌 1 +打碎 1 +打造 1 +打響 1 +扔出 1 +托倫 1 +托加下 1 +托洛洛 1 +托盤 1 +托米 1 +托茂 1 +扣上 1 +批次 1 +扼止 1 +找來 1 +找續 1 +承天 1 +承德 1 +承接 1 +承斌 1 +承租 1 +技師 1 +技戰術 1 +技法 1 +抑制 1 +抑鬱 1 +抒解 1 +抓到 1 +投交 1 +投奔 1 +投標 1 +投球 1 +投身 1 +投靠 1 +抗大 1 +抗拒 1 +抗衡 1 +抗體 1 +折射 1 +折斷 1 +折衷 1 +抨擊 1 +披覆 1 +披頭士 1 +抬昇 1 +抱 1 +抱持 1 +抵受 1 +抵禦 1 +押韻 1 +抽檢 1 +抽煙 1 +抽象 1 +抽走 1 +拆分 1 +拆卸 1 +拆掉 1 +拆遷 1 +拉 1 +拉什沃思 1 +拉卜楞 1 +拉塞爾 1 +拉多加 1 +拉奏 1 +拉姆齊 1 +拉差諾 1 +拉布 1 +拉彼魯茲 1 +拉日色布 1 +拉林 1 +拉森 1 +拉爾夫 1 +拉特蘭 1 +拉珀斯維爾 1 +拉瑙 1 +拉籌伯 1 +拉美西斯 1 +拉薩 1 +拉西拉 1 +拉赫曼尼諾夫 1 +拋棄 1 +拋物 1 +拍 1 +拍照 1 +拍賣 1 +拒不 1 +拓務 1 +拓建 1 +拓撲 1 +拔刀 1 +拖進 1 +拖鞋 1 +拙劣 1 +招 1 +招潮蟹 1 +招生 1 +招聘 1 +招降 1 +拜仁慕尼黑 1 +拜拜 1 +括弧 1 +拱廊 1 +拱橋 1 +拳一 1 +拳擊 1 +拳賽 1 +拷問 1 +拼寫 1 +拾糞 1 +拿來 1 +持久 1 +持球 1 +指使 1 +指標 1 +指派 1 +指稱 1 +指責 1 +挑選 1 +挖 1 +挖子 1 +挖掘 1 +挪動 1 +挪用 1 +振 1 +振動 1 +振幅 1 +振林 1 +挹江 1 +挺身而出 1 +挽回 1 +挾持 1 +捉弄 1 +捉拿 1 +捉襟見肘 1 +捍衛 1 +捐 1 +捐款 1 +捐獻 1 
+捕撈 1 +捕殺 1 +捕獵 1 +捕魚 1 +捕鼠 1 +捲入 1 +捷徑 1 +授勳 1 +授意 1 +授權 1 +授與 1 +掉頭 1 +掌 1 +掌控 1 +掌摑 1 +掌權 1 +掌鏡 1 +排場 1 +排外 1 +排序 1 +掙扎 1 +掛 1 +掛果 1 +掛牌 1 +掛鉤 1 +掠奪 1 +採 1 +採信 1 +採摘 1 +採樣 1 +採納 1 +採購 1 +採集 1 +採食 1 +探明 1 +探望 1 +探求 1 +探究 1 +探險 1 +接到 1 +接力 1 +接班 1 +接納 1 +接聽 1 +接見 1 +接辦 1 +接送 1 +接連 1 +控告 1 +控訴 1 +推介 1 +推免生 1 +推前 1 +推力 1 +推導 1 +推斷 1 +推測 1 +推演 1 +推特 1 +推理 1 +推舉 1 +推論 1 +推遲 1 +掩 1 +掩蓋 1 +描摹 1 +描繪 1 +提前 1 +提問 1 +提子 1 +提康德羅加 1 +提拔 1 +提攜 1 +提昇 1 +提煉 1 +提督 1 +提籃 1 +提醒 1 +插手 1 +插曲 1 +揚言 1 +換成 1 +換算 1 +握帶 1 +握持 1 +揭曉 1 +揭發 1 +揭開 1 +揮舞 1 +援 1 +援助 1 +援外 1 +援引 1 +援手 1 +援救 1 +搜尋 1 +搜狐 1 +搜羅 1 +搜集 1 +搞垮 1 +搞錯 1 +搬動 1 +搬往 1 +搬移 1 +搬遷 1 +搭乘 1 +搭配 1 +搶 1 +搶先 1 +搶劫 1 +搶奪 1 +搶救 1 +摒棄 1 +摔 1 +摘下 1 +摘星 1 +摘錄 1 +摧毀 1 +摩加迪沙 1 +摩天 1 +摩崖 1 +摩托 1 +摩擦 1 +摩爾多瓦 1 +摩登 1 +摩納哥 1 +摩西 1 +摯友 1 +摸摸 1 +撒拉 1 +撒營盤 1 +撞入 1 +撞死 1 +撤回 1 +撤職 1 +撤退 1 +撤除 1 +撥 1 +撥出 1 +撥號 1 +撫養 1 +播種 1 +撮合 1 +撰述 1 +撲克 1 +撿 1 +撿起 1 +擁 1 +擁堵 1 +擁戴 1 +擁擠 1 +擁護 1 +擂台 1 +擊中 1 +擊劍 1 +擊斃 1 +擊毀 1 +擊潰 1 +擊破 1 +擋住 1 +操 1 +操控 1 +操縱 1 +擒拿 1 +擔憂 1 +擔竿 1 +擔綱 1 +據傳 1 +據此 1 +據稱 1 +據點 1 +擠塞 1 +擠壓 1 +擠奶 1 +擠眉弄眼 1 +擠迫 1 +擢升 1 +擬 1 +擬桿菌 1 +擬訂 1 +擬議 1 +擴散 1 +擴編 1 +擺弄 1 +擺渡 1 +擾亂 1 +攀爬 1 +攔截 1 +攝像 1 +攝取 1 +攪拌 1 +支取 1 +支廳 1 +支派 1 +支那 1 +支隊 1 +收場 1 +收容 1 +收市 1 +收支 1 +收生 1 +收益 1 +收租 1 +收緊 1 +收聽 1 +收買 1 +收費 1 +收養 1 +攸之 1 +改作 1 +改屬 1 +改投 1 +改採 1 +改換 1 +改派 1 +改發 1 +改穿 1 +改組 1 +改選 1 +改隸 1 +攻下 1 +攻勢 1 +攻堅 1 +攻方 1 +攻殺 1 +攻訐 1 +攻讀 1 +放任 1 +放入 1 +放出 1 +放到 1 +放大 1 +放榜 1 +放牧 1 +放緩 1 +放送 1 +放逐 1 +放開 1 +放鬆 1 +政團 1 +政委 1 +政局 1 +政廳 1 +政敵 1 +政樞 1 +政法 1 +政爭 1 +政界 1 +故郷 1 +效尤 1 +效能 1 +敏銳 1 +救人 1 +救出 1 +救助 1 +救國 1 +救援 1 +救星 1 +救災 1 +救生 1 +救贖 1 +敕 1 +敕令 1 +敕書 1 +敗 1 +敗局 1 +敗死 1 +敗瓦 1 +敗退 1 +教務 1 +教士 1 +教室 1 +教席 1 +教材 1 +教案 1 +教科 1 +教籍 1 +教總 1 +教義 1 +教職員 1 +散射 1 +敦 1 +敦煌 1 +敬仰 1 +敬堯 1 +敬請 1 +敲擊 1 +敲訂 1 +整 1 +整塊 1 +整所 1 +整架 1 +整片 1 +整篇 1 +整軍 1 +整顆 1 +整齊 1 +敵兵 1 +敵方 1 +數以千計 1 +數值 1 +數十億 1 +數十萬 1 +數澤 1 +數百 1 +數碼 1 +數萬 1 +數論 1 +文哲 1 +文姬 1 +文岳 1 +文巨 1 +文德 1 +文摘 1 +文政 1 +文書 1 +文本 1 +文楷 1 +文武 1 +文法 1 +文清 1 +文職 1 +文賢 1 +文集 1 +文飾曲口魚 1 +文體 1 +文體教 1 +斑塊 1 +斑點 1 +斗貴子 1 +料 1 +斜 1 
+斜坡 1 +斥教 1 +斬落 1 +斯佩克特 1 +斯凱勒 1 +斯哥特 1 +斯坦利 1 +斯坦福 1 +斯坦頓 1 +斯基龍 1 +斯塔茨門 1 +斯尼夫魯 1 +斯德哥爾摩 1 +斯托克 1 +斯氏亞冠龍 1 +斯洛伐克 1 +斯洛特 1 +斯特奇斯 1 +斯特萊默 1 +斯瓦爾恩 1 +斯科特 1 +斯維亞托斯拉夫 1 +斯里賽拉姆古德姆德瓦斯塔納姆 1 +新任 1 +新修 1 +新址 1 +新埔 1 +新太郎 1 +新奧爾良 1 +新字 1 +新寧 1 +新屋 1 +新巴 1 +新思 1 +新昌 1 +新明 1 +新春 1 +新月 1 +新核 1 +新榮 1 +新民 1 +新浪 1 +新版 1 +新生 1 +新秀 1 +新篇 1 +新編 1 +新罕布夏 1 +新罕布希爾 1 +新義 1 +新舊 1 +新製 1 +新開 1 +新飛 1 +新馬 1 +新高 1 +新鴻基 1 +新黨 1 +斷後 1 +斷盡 1 +斷言 1 +方丈 1 +方尖 1 +方正 1 +方田 1 +方石 1 +方程 1 +方蓋 1 +方蟹 1 +於維西 1 +施奈德 1 +施文 1 +施瓦本 1 +施用 1 +施韋比施哈爾 1 +旅 1 +旅居 1 +旅程 1 +旋渦 1 +旋轉 1 +族雄 1 +族頭 1 +旗艦 1 +旗面 1 +既得 1 +既是 1 +既然 1 +日出 1 +日向 1 +日夜 1 +日子 1 +日日 1 +日照 1 +日用 1 +日落 1 +日誌 1 +日賜 1 +旦增 1 +早有 1 +早餐 1 +旭 1 +旱災 1 +旻寧 1 +昆丁 1 +昆蟲 1 +昌吉 1 +昌都 1 +明中 1 +明亞 1 +明亮 1 +明代 1 +明宗 1 +明尼蘇達 1 +明憲 1 +明昌 1 +明智 1 +明正 1 +明潭 1 +明白 1 +明碁 1 +明視 1 +易卜拉欣 1 +易守 1 +易幟 1 +易斯 1 +易水 1 +易燃 1 +易經 1 +昔蘭尼 1 +星團 1 +星塵 1 +星展 1 +星崎 1 +星系 1 +映像 1 +春 1 +春丕 1 +春季 1 +春日井 1 +春會 1 +春田 1 +春節 1 +春緋 1 +春耕 1 +昨日 1 +昭侯 1 +昭儀 1 +昭宗 1 +昭禮 1 +昭通 1 +是年 1 +是方 1 +是次 1 +時事 1 +時份 1 +時值 1 +時光 1 +時刻 1 +時報 1 +時弊 1 +時稱 1 +時舉 1 +時針 1 +晃動 1 +晉 1 +晉北 1 +晉哲 1 +晉江 1 +晉級 1 +晒乾 1 +晨間 1 +普世 1 +普什圖 1 +普伊瑪諾娃 1 +普利茅斯 1 +普朗克 1 +普爾塔龍 1 +景泰 1 +晴神 1 +晶 1 +晶瑩 1 +晶閘 1 +智伯 1 +智利 1 +智趣 1 +暑期 1 +暖 1 +暗中 1 +暗喻 1 +暗影 1 +暗房 1 +暗指 1 +暗礁 1 +暗紅 1 +暗號 1 +暫 1 +暫別 1 +暫無 1 +暮光 1 +暱稱 1 +暴亂 1 +暴斂 1 +暴死 1 +暴風雪 1 +暹羅 1 +曄之 1 +曉彬 1 +曉得 1 +曉聲 1 +曉舟 1 +曖昧 1 +曬相 1 +曬衣 1 +曲張 1 +曲率 1 +曲目 1 +曲線 1 +曲藝 1 +曲阜 1 +曲頜形翼龍 1 +更低 1 +更佳 1 +更大 1 +更審 1 +更小 1 +更強 1 +更快 1 +更新世 1 +更是 1 +更硬 1 +更衣 1 +更輕 1 +更長 1 +曷懶甸 1 +書本 1 +書裡 1 +書迷 1 +書面 1 +書香世家 1 +曹家 1 +曹甸 1 +曹記 1 +曼切華 1 +曼哈頓 1 +曼城 1 +曼寧 1 +曼徹斯特 1 +曼成 1 +曼斯菲爾德 1 +曼海姆 1 +曼涅托 1 +曼玉 1 +曼科 1 +曾任 1 +曾孫 1 +曾愛 1 +曾祖父母 1 +替人 1 +最內 1 +最前 1 +最受 1 +最外 1 +最強 1 +最旺 1 +最最 1 +最末 1 +最東 1 +最純 1 +最遠 1 +會上 1 +會址 1 +會師 1 +會戰 1 +會所 1 +會晤 1 +會章 1 +會見 1 +會計 1 +月色 1 +月薪 1 +有份 1 +有別 1 +有力 1 +有名 1 +有愛 1 +有方 1 +有期 1 +有染 1 +有條不紊 1 +有異 1 +有病 1 +有稱 1 +有花 1 +有點 1 +服刑 1 +朔 1 +朗豪 1 +朗頓 1 +望族 1 +朝下 1 +朝元 1 +朝政 1 +朝散 1 +朝東 1 +朝聖 1 +朝覲 1 +朝貢 1 +朝陽 1 +期刊 1 +木中 1 +木乃伊 1 +木刻 1 +木卡姆 1 +木城 1 +木尼 1 +木屋 1 +木工 1 +木戶 1 
+木斯塘 1 +木村 1 +木櫾 1 +木蘭 1 +木造 1 +未入 1 +未敢 1 +未有 1 +未深 1 +未滿 1 +末端 1 +本劇 1 +本名 1 +本城 1 +本始 1 +本季 1 +本島 1 +本市 1 +本德 1 +本書 1 +本營 1 +本目 1 +本省 1 +本社 1 +本縣 1 +本能 1 +本著 1 +本郡 1 +本部 1 +本鄉 1 +本集 1 +本領 1 +札幌 1 +朱里 1 +朴次茅斯 1 +杉並 1 +李察 1 +杏子 1 +材 1 +材官 1 +材質 1 +村旁 1 +杖責 1 +杜乃爾 1 +杜伊 1 +杜利華 1 +杜成 1 +杜浦 1 +杜甫 1 +杜蘭戈維多利亞 1 +杜隆坦 1 +束 1 +杯賽 1 +杰仔 1 +東主 1 +東加 1 +東勝 1 +東南亞 1 +東坡 1 +東姑 1 +東宮 1 +東尼 1 +東岸 1 +東巡 1 +東急 1 +東支 1 +東昇 1 +東映 1 +東桑 1 +東條 1 +東武 1 +東涌 1 +東渡 1 +東直 1 +東站 1 +東興 1 +東華 1 +東西向 1 +東距 1 +東道 1 +東邊 1 +東郊 1 +東鄉 1 +東鐵 1 +東隧 1 +東風 1 +松下 1 +松坂 1 +松山 1 +松島 1 +松州 1 +松翔 1 +松花 1 +松鼠 1 +板 1 +板式 1 +林克 1 +林地 1 +林場 1 +林業 1 +林檎 1 +林翼 1 +林胡 1 +果然 1 +果真 1 +果酒 1 +枝葉 1 +架次 1 +枸杞 1 +柏 1 +柏加 1 +柏村 1 +柏松 1 +柏臣 1 +染手 1 +染病 1 +柔道 1 +柚木 1 +柝聲 1 +查找 1 +查普曼 1 +查氏 1 +查爾頓 1 +查理曼 1 +柬 1 +柬埔寨 1 +柯克伍德 1 +柯林斯 1 +柯爾 1 +柯爾克孜 1 +柯爾貝爾 1 +柱銘 1 +柳川 1 +柳州 1 +柳德米拉 1 +柳葉魚 1 +柴電 1 +柿本 1 +栗橋 1 +校呔 1 +校簿 1 +校門 1 +栩栩如生 1 +株 1 +株式 1 +核孔 1 +核實 1 +核工 1 +核彈 1 +核發 1 +核研 1 +核算 1 +根 1 +根培烏孜 1 +根深柢固 1 +根生 1 +根莖 1 +根部 1 +格丁尼亞 1 +格仔 1 +格但斯克 1 +格來 1 +格勞庇烏 1 +格勞賓登 1 +格奧爾格 1 +格子 1 +格式塔 1 +格拉博夫斯基 1 +格拉漢姆 1 +格林威治 1 +格林布希 1 +格羅先 1 +格羅夫納 1 +格羅希 1 +格蘭特 1 +格陵蘭 1 +格魯 1 +格魯瓊茲與姆瓦瓦 1 +桂陵 1 +桃子 1 +框架 1 +框線 1 +案例 1 +案達羅 1 +桐生 1 +桑德威斯狸藻 1 +桑托斯 1 +桓子 1 +桓玄 1 +梁贊諾夫 1 +梁龍 1 +梅園 1 +梅塔 1 +梅塔拉 1 +梅帕器 1 +梅里納 1 +梓里 1 +條款 1 +條紋 1 +梧州 1 +梨花 1 +梭羅 1 +梯隊 1 +梳 1 +梳頜翼龍 1 +梵安 1 +棉條 1 +棋局 1 +棋盤 1 +棋聖 1 +棋院 1 +棋類 1 +棒 1 +棒錘樹 1 +棕色 1 +棕褐 1 +森德靈 1 +棲地 1 +棲身 1 +棵 1 +植株 1 +椎名 1 +椰林 1 +楓樹 1 +楚克 1 +楚瑜 1 +楚紅 1 +楠桂 1 +楠溪 1 +業主 1 +業餘 1 +極北 1 +極區 1 +極少 1 +極為 1 +極矮 1 +極長 1 +極闊 1 +極限 1 +楷書 1 +楷模 1 +概要 1 +榆林 1 +榔頭 1 +榕樹 1 +榜羅 1 +榨出 1 +榫眼 1 +榮廷 1 +榮洲 1 +榮茂 1 +榴彈 1 +構思 1 +構造 1 +槍尖 1 +槍尾 1 +槍殺 1 +槍術 1 +槳 1 +樂園 1 +樂安 1 +樂官 1 +樂山 1 +樂師 1 +樂手 1 +樂敏錠 1 +樂樂 1 +樂活 1 +樂翠 1 +樂觀 1 +樂趣 1 +樓宇 1 +樓層 1 +樓底 1 +樓煩 1 +樓盤 1 +樓面 1 +樓高 1 +標 1 +標售 1 +標志 1 +標明 1 +標有 1 +標示 1 +標籤 1 +標記 1 +標註 1 +標高 1 +樞密 1 +模里西斯 1 +樣 1 +樣品 1 +樣式 1 +樣貌 1 +樸實 1 +樹上 1 +樹幹 1 +樹枝 1 +橈腳 1 +橋上 1 +橋樑 1 +橋面 1 +機上 1 +機位 1 +機型 1 +機密 1 +機師 1 +機床 1 +機敏 1 +機械 1 +機理 1 +機種 1 +機能 1 +機製 1 +機遇 1 +橡樹 1 +橡樹龍 1 +橢 1 +橫 1 +橫帶 1 +橫徵 1 +橫渡 
1 +橫線 1 +檔案 1 +檔次 1 +檜山 1 +檢驗 1 +檨仔林 1 +檳榔 1 +檸七 1 +櫃 1 +櫃檯 1 +櫟社 1 +欄目 1 +權氏 1 +權限 1 +次席 1 +次月 1 +次生 1 +次程 1 +欣快 1 +欺 1 +欽 1 +款式 1 +歆 1 +歌人 1 +歌壇 1 +歌星 1 +歌舞 1 +歌詞 1 +歌頌 1 +歐律狄刻 1 +歐斯巴特 1 +歐盟 1 +歐羅巴 1 +歐青 1 +歐麥爾 1 +歡 1 +歡慶 1 +歡樂 1 +正值 1 +正傳 1 +正夫 1 +正子 1 +正宇 1 +正巧 1 +正平 1 +正比 1 +正派 1 +正版 1 +正當 1 +正經 1 +正負粒子 1 +正配 1 +正陽 1 +此事 1 +此地 1 +此夢 1 +此書 1 +此樓 1 +此橋 1 +此片 1 +此處 1 +此語 1 +此起彼落 1 +此路 1 +此陵 1 +此項 1 +此魚 1 +步伐 1 +步蟾 1 +步行 1 +步驟 1 +武克希 1 +武力 1 +武威 1 +武帝 1 +武廟 1 +武廠 1 +武德 1 +武打 1 +武王 1 +武略 1 +武皇 1 +武者 1 +武藏 1 +歩 1 +歲月 1 +歷代 1 +歷來 1 +歷屬 1 +歷程 1 +歸來 1 +歸入 1 +歸到 1 +歸功 1 +歸咎 1 +歸案 1 +歸還 1 +歸附 1 +死刑 1 +死因 1 +死地 1 +死戰 1 +死期 1 +死板 1 +死狀 1 +死而復生 1 +死黨 1 +殉教 1 +殉爆 1 +殉職 1 +殊榮 1 +殘疾 1 +殘破 1 +殘遺 1 +殘部 1 +殲滅 1 +殺人 1 +殺手 1 +殺機 1 +殼層 1 +殼體 1 +殿堂 1 +毀壞 1 +毀容 1 +毅 1 +毅仁 1 +毅然 1 +母會 1 +母校 1 +母狼 1 +母猴 1 +母艦 1 +母語 1 +母貓 1 +毎年 1 +每元 1 +每座 1 +每戶 1 +每所 1 +每枚 1 +每每 1 +每股 1 +每邊 1 +每集 1 +每鼎 1 +毒​​物 1 +毒品 1 +毒死 1 +毒癮 1 +毒舌 1 +毓林 1 +毓楓 1 +毓芳 1 +比亞迪 1 +比亞韋斯托克 1 +比利 1 +比利牛斯 1 +比哈爾 1 +比喻 1 +比得哥什 1 +比方 1 +比武 1 +比薩 1 +比袍 1 +比褂 1 +毛色 1 +毛髮 1 +毫安 1 +毫無 1 +毯子 1 +氈幕 1 +民事 1 +民俗 1 +民力 1 +民居 1 +民工 1 +民心 1 +民意 1 +民房 1 +民柬 1 +民權 1 +民法 1 +民盟 1 +民答那峨 1 +民航 1 +民英 1 +民謠 1 +民豐 1 +民選 1 +民鐸 1 +民防 1 +氘 1 +氚 1 +氣息 1 +氣態 1 +氣憤 1 +氣旋 1 +氣槍 1 +氣死 1 +氣溫 1 +氣燄 1 +氣胸 1 +氣象 1 +氦 1 +氧化鐵 1 +氨基酸 1 +氫 1 +氫化氦 1 +氫氣 1 +氫鍵 1 +氮 1 +氮素 1 +氯化 1 +氯化氫 1 +氯化銠 1 +氯化鋁 1 +氯雷他定 1 +水世 1 +水份 1 +水圈 1 +水壓 1 +水床 1 +水扁 1 +水攻 1 +水晶 1 +水汽 1 +水流 1 +水火不容 1 +水球 1 +水產 1 +水療 1 +水翼 1 +水能 1 +水警 1 +水面 1 +水鳥 1 +永久 1 +永元 1 +永升 1 +永吉 1 +永和 1 +永壽 1 +永平 1 +永成 1 +永昌 1 +永樂 1 +永樂環 1 +永權 1 +永續 1 +永輝 1 +永靖 1 +汁液 1 +求 1 +求偶 1 +求出 1 +求助 1 +求問 1 +求婚 1 +求情 1 +求援 1 +求籤 1 +求醫 1 +汝寧 1 +汞柱 1 +江協 1 +江口 1 +江浙 1 +江海 1 +江源 1 +江漢 1 +江灣 1 +江谷 1 +江都 1 +江閣 1 +江魚 1 +池塘 1 +池田 1 +污損 1 +污點 1 +汪 1 +汪達 1 +汪達爾 1 +汲及 1 +決意 1 +決擇 1 +決然 1 +決裂 1 +汽油 1 +汽船 1 +沃奎茲 1 +沃季采 1 +沃州 1 +沃思 1 +沃斯托克 1 +沃爾 1 +沃羅涅日 1 +沈氏 1 +沉水 1 +沉迷 1 +沉重 1 +沉降 1 +沒能 1 +沒落 1 +沒錯 1 +沖之 1 +沖片 1 +沖走 1 +沙丘 1 +沙依 1 +沙崙 1 +沙巴 1 +沙普爾 1 +沙梁伐 1 +沙池 1 +沙洛蒙 1 +沙漠 1 +沙瓦納 1 +沙田 1 +沙畹 1 +沙蠶 1 +沙迦罕 1 +沙邦 1 +沙里亞 1 +河卡 1 +河圖 1 +河岸 1 +河心 
1 +河段 1 +河漫 1 +河西 1 +油煙 1 +油田 1 +油菜 1 +油量 1 +油電 1 +治中 1 +治勲 1 +治勳 1 +治喪 1 +治國 1 +治學 1 +治水 1 +治理 1 +治軍 1 +沼 1 +沽渚 1 +沾解 1 +沿 1 +沿線 1 +沿襲 1 +沿途 1 +泉 1 +法令 1 +法師 1 +法拉利 1 +法拉龍 1 +法政 1 +法斯塔夫 1 +法格拿 1 +法比恩 1 +法海 1 +法登 1 +法羅 1 +法老 1 +法蘭克尼亞 1 +法西斯 1 +法輪 1 +泛濫 1 +泠 1 +波包 1 +波卡特洛 1 +波及 1 +波因 1 +波圖 1 +波城 1 +波塞冬 1 +波形 1 +波恩 1 +波折 1 +波普 1 +波森 1 +波爾 1 +波特威瑟 1 +波特蘭 1 +波瓦坦 1 +波的尼亞 1 +波西斯 1 +波錠 1 +波黑 1 +泥土 1 +泥潭 1 +注資 1 +泰 1 +泰共 1 +泰勒 1 +泰北 1 +泰始 1 +泰姬 1 +泰姬瑪哈 1 +泰州 1 +泰曾 1 +泰然 1 +泰琳達 1 +泰米爾納德 1 +泰興 1 +泳屋 1 +泳灘 1 +洋介 1 +洗劫 1 +洗衣 1 +洛佩斯 1 +洛加尼斯 1 +洛城 1 +洛夫喬伊 1 +洛夫森 1 +洛布尼亞 1 +洛恩 1 +洛書 1 +洛珊 1 +洛維爾 1 +洛茲 1 +洛雷托 1 +洞子 1 +洞穴 1 +洞窟 1 +津 1 +津貼 1 +洩慾 1 +洩漏 1 +洪堡 1 +洪家 1 +洪橋 1 +洵 1 +洵美 1 +活出 1 +活化 1 +活埋 1 +活水 1 +活潑 1 +活用 1 +活躍 1 +活靈活現 1 +派對 1 +派往 1 +流 1 +流下 1 +流亡 1 +流入 1 +流出 1 +流嶼 1 +流放 1 +流星 1 +流標 1 +流民 1 +流水 1 +流浪 1 +流產 1 +流程 1 +流言 1 +流逝 1 +流露 1 +浚稽 1 +浦市 1 +浦那 1 +浦鎮 1 +浪 1 +浪漫 1 +浪潮 1 +浪費 1 +浪跡 1 +浮 1 +浮動 1 +浴場 1 +海事 1 +海光 1 +海因茨 1 +海地 1 +海峰 1 +海布隆 1 +海平 1 +海廷 1 +海德克 1 +海怡 1 +海昌 1 +海景 1 +海淀 1 +海港 1 +海濱 1 +海灘 1 +海爾賽 1 +海神 1 +海秀 1 +海老名 1 +海航 1 +海藍 1 +海螺 1 +海豐 1 +海陸 1 +海風 1 +海鷗 1 +浸染 1 +浸泡 1 +涅爾皮奇耶 1 +涇波 1 +涇陽 1 +消極 1 +消耗 1 +消退 1 +消除 1 +涉世 1 +涉嫌 1 +涉足 1 +涪江 1 +涮煮 1 +液 1 +液化 1 +液壓 1 +涵蓋 1 +淄川 1 +淑妃 1 +淑怡 1 +淘寶 1 +淘金 1 +淡 1 +淡定 1 +淡色 1 +淨土 1 +淪 1 +淪落 1 +淪陷 1 +淫蕩 1 +淮南 1 +淮許 1 +深受 1 +深埋 1 +深層 1 +深度 1 +深感 1 +深有 1 +深海 1 +深港 1 +深溪 1 +深紅 1 +深綠 1 +深色 1 +深處 1 +深造 1 +淵源 1 +混 1 +混亂 1 +混凝 1 +混沌 1 +混為一談 1 +混燃 1 +淹浸 1 +淺 1 +淺水 1 +淺綠 1 +添丁 1 +清償 1 +清凈 1 +清單 1 +清帝 1 +清拆 1 +清教 1 +清文 1 +清明 1 +清潔 1 +清理 1 +清道 1 +清遠 1 +清還 1 +清鄉 1 +減低 1 +減刑 1 +減小 1 +減退 1 +渠子 1 +渡 1 +渣打 1 +渤海 1 +測繪 1 +渭州 1 +港交 1 +港區 1 +港府 1 +渴求 1 +游 1 +游標 1 +游說 1 +渾 1 +湄洲 1 +湖上 1 +湖人 1 +湖名 1 +湖畔 1 +湘南 1 +湘西 1 +湘陰 1 +湛恩 1 +湧現 1 +湮滅 1 +湯姆萊利 1 +湯料 1 +源於 1 +源田 1 +準 1 +準基 1 +準將 1 +準確 1 +溝 1 +溝壑 1 +溝齒鼩 1 +溢漏 1 +溪 1 +溪水 1 +溪美 1 +溪鱂 1 +溫哥華 1 +溫坡 1 +溫布萊 1 +溫布頓 1 +溫徹斯特 1 +溫斯頓 1 +溫柔 1 +溫特夸特斯 1 +溫特斯 1 +溶劑 1 +溶氣 1 +滅 1 +滑板 1 +滑稽 1 +滑鼠 1 +滕氏 1 +滙業 1 +滬江 1 +滯洪 1 +滲出 1 +滴下 1 +滾動 1 +滾石 1 +滿意 1 +滿清 1 +滿載 1 +漁村 1 +漁梁 1 +漁船 1 +漂浮 1 +漆器 1 +演 1 +演成 1 +演戲 1 +演技 1 +演繹 
1 +演義 1 +演講 1 +漢中 1 +漢娜 1 +漢字 1 +漢桓 1 +漫漶 1 +漫長 1 +漬 1 +漱芳 1 +漲幅 1 +漸變 1 +漸趨 1 +潑 1 +潔瑩 1 +潘丘 1 +潘恩 1 +潛伏 1 +潛力 1 +潛望 1 +潛水 1 +潛游 1 +潟湖 1 +潢川 1 +潭村 1 +潭東 1 +潭陽 1 +潰散 1 +澀谷 1 +澤尻 1 +激勵 1 +激發 1 +激素 1 +激進 1 +濁 1 +濃 1 +濃厚 1 +濃煙 1 +濕地 1 +濟 1 +濟世 1 +濟科 1 +濟邦 1 +濤 1 +濫用 1 +濱海 1 +濾掉 1 +瀏陽 1 +瀕危 1 +瀘溪 1 +瀝泗 1 +瀟洒 1 +火上加薪 1 +火候 1 +火喉 1 +火山 1 +火心 1 +火掌 1 +火炮 1 +火爆 1 +火鍋 1 +灰棕 1 +灰雲 1 +灰黑 1 +災禍 1 +炎熱 1 +炙手可熱 1 +炭疽 1 +炮 1 +炸彈 1 +炸死 1 +炸毀 1 +炸糕 1 +為時 1 +烈格司 1 +烏代 1 +烏來杜鵑 1 +烏孜別克 1 +烏宗哈珊 1 +烏干達 1 +烏德特 1 +烏扎 1 +烏拉圭 1 +烏普薩拉 1 +烏腳 1 +烏魯木齊 1 +烴 1 +烹煮 1 +焊接 1 +焗豆 1 +焚 1 +焚屍 1 +焚燒 1 +焜耀 1 +無俚頭 1 +無危 1 +無厭 1 +無子 1 +無家可歸 1 +無心 1 +無忌 1 +無所不能 1 +無暇 1 +無有 1 +無機 1 +無氧 1 +無水氯化鋁 1 +無派 1 +無產 1 +無疑 1 +無盡 1 +無罪 1 +無能為力 1 +無與倫比 1 +無色 1 +無處 1 +無視 1 +無誤 1 +無過 1 +無量壽 1 +無關緊要 1 +無限 1 +無雙 1 +無頭 1 +無點 1 +無齒龍 1 +焦尼 1 +焦點 1 +煉油 1 +煉金 1 +煙 1 +煙囪 1 +煙槍 1 +煙霧 1 +煜全 1 +煤建 1 +煤氣 1 +煥 1 +煦 1 +照射 1 +煮 1 +煮食 1 +煽動 1 +熄匙 1 +熊族 1 +熊本 1 +熊隊 1 +熏烤 1 +熏陶 1 +熔化 1 +熔岩 1 +熟知 1 +熟釜 1 +熱值 1 +熱刺 1 +熱力 1 +熱心 1 +熱愛 1 +熱羅姆 1 +熱身 1 +熱量 1 +熱電 1 +熱鬧 1 +熾熱 1 +燁 1 +燃氣 1 +燈謎 1 +燒灼 1 +燒荒 1 +燕 1 +燕窩 1 +營口 1 +營團 1 +營地 1 +營寨 1 +營帳 1 +營火 1 +營造 1 +營長 1 +營養 1 +燦爛 1 +燭光 1 +燾 1 +爐 1 +爪部 1 +爬到 1 +爬山 1 +爬梯 1 +爭冠 1 +爭占 1 +爭吵 1 +爭奪 1 +爭寵 1 +爭得 1 +爭界 1 +爭相 1 +爭端 1 +爭競 1 +爭論 1 +爭鬥 1 +父風 1 +爸爸 1 +爺 1 +爺爺 1 +爽文 1 +爾炘 1 +牆 1 +牆上 1 +牆身 1 +牆面 1 +片劑 1 +片尾 1 +片斷 1 +片頭 1 +版主 1 +版畫 1 +牌照 1 +牙籤 1 +牙線 1 +牙薩克 1 +牙醫 1 +牛池 1 +牛潭尾 1 +牛石 1 +牛首 1 +牛鼻栓 1 +牟 1 +牟利 1 +牟合 1 +牠 1 +牡蠣 1 +牧 1 +牧區 1 +牧民 1 +牧羊 1 +牧谷 1 +物件 1 +物產 1 +物象 1 +物鏡 1 +物阜 1 +牲畜 1 +特備 1 +特優 1 +特務 1 +特區 1 +特工 1 +特快 1 +特意 1 +特拉華 1 +特攝 1 +特派 1 +特爾瑪 1 +特瓦史塔 1 +特產 1 +特異 1 +特菲爾 1 +特重 1 +特隆赫姆 1 +特雷格羅恩 1 +牽引 1 +牽牛花 1 +犧牲 1 +犬科 1 +犬種 1 +犬髖 1 +犯人 1 +狂亂 1 +狄 1 +狄拉克 1 +狐 1 +狐庸 1 +狡猾 1 +狸藻 1 +狹小 1 +狼人 1 +狼堡 1 +狼影 1 +狼群 1 +猜忌 1 +猜想 1 +猝死 1 +猴年 1 +猴群 1 +猶大 1 +獅子 1 +獎牌 1 +獎盃 1 +獨一無二 1 +獨具 1 +獨唱 1 +獨孤 1 +獨家 1 +獨有 1 +獨眠 1 +獨行 1 +獨資 1 +獲准 1 +獲判 1 +獲勳 1 +獲召 1 +獲悉 1 +獲授 1 +獲獎 1 +獲益 1 +獲薦 1 +獲選 1 +獲頒 1 +獵物 1 +獸人 1 +獸族 1 +獻 1 +獻上 1 +獻堂 1 +獻策 1 +獻議 1 +玄天 1 +玄宗 1 +玄武 1 +玄策 1 +玄貓 1 +玉柴 1 +玉純 1 +玉魔 1 +玉鳳花 1 +玉麟 1 +王儲 1 +王冠 1 
+王墓 1 +王宮 1 +王座 1 +王爾德 1 +王蓮 1 +玩伴 1 +玩弄 1 +玩法 1 +玩笑 1 +玫瑰 1 +玲玲 1 +玷染 1 +珀斯 1 +珍寶 1 +珠 1 +珠璣 1 +珠鋼 1 +班克斯 1 +班卓 1 +班子 1 +班布里奇 1 +班機 1 +班次 1 +班禪 1 +班級 1 +現役 1 +現身 1 +球壇 1 +球差 1 +球星 1 +球根 1 +球狀 1 +球道 1 +球面 1 +琅 1 +理性 1 +理由 1 +琦 1 +琬 1 +琳 1 +琳達 1 +琴弓 1 +琺琅 1 +瑋 1 +瑛 1 +瑜伽 1 +瑞普肯 1 +瑞欽 1 +瑞霖 1 +瑟洛 1 +瑣法 1 +瑪 1 +瑪利 1 +瑪利亞路易莎 1 +瑪利歐 1 +瑪君龍 1 +瑪莉安 1 +瑪莎 1 +瑪麗特 1 +瑾 1 +環保 1 +環帶 1 +環狀 1 +環節 1 +環繞 1 +瓊斯 1 +瓊珊 1 +瓘 1 +瓜里利亞 1 +瓦伊什維爾卡斯 1 +瓦伊杜 1 +瓦卡加 1 +瓦德 1 +瓦拉 1 +瓦薩 1 +瓦解 1 +瓦里奧 1 +甄別 1 +甘草 1 +甚厚 1 +甚嚴 1 +甚多 1 +甚小 1 +甚深 1 +甚篤 1 +甚至是 1 +甜兒 1 +甜度 1 +生主 1 +生出 1 +生動 1 +生天 1 +生子 1 +生平 1 +生性 1 +生效 1 +生機 1 +生殺 1 +生氣 1 +生火 1 +生肖 1 +生財之道 1 +生還 1 +產 1 +產出 1 +產經 1 +甦醒 1 +用人 1 +用來 1 +用光 1 +用兵 1 +用字 1 +用完 1 +用手 1 +用有 1 +用水 1 +用藥 1 +用計 1 +用詞 1 +甬 1 +田園 1 +田地 1 +田心 1 +田納西 1 +田野 1 +田頭 1 +甲山 1 +甲殼 1 +申辦 1 +男人 1 +男士 1 +男嬰 1 +男方 1 +男童 1 +界定 1 +界限 1 +畔 1 +留傳 1 +留哥 1 +留待 1 +留空 1 +留聲 1 +留良 1 +畜牧 1 +畜養 1 +畢打 1 +畢氏 1 +畢蘭德拉 1 +畢馬威 1 +略帶 1 +略有 1 +略為 1 +畫下 1 +畫中 1 +畫分 1 +畫會 1 +畫畫 1 +畫面 1 +異事 1 +異姓 1 +異度 1 +異形 1 +異曲同工 1 +異母 1 +異端 1 +當上 1 +當下 1 +當值 1 +當官 1 +當屆 1 +當政 1 +當晚 1 +當期 1 +當歸 1 +當面 1 +疆域 1 +疏浚 1 +疏遠 1 +疑 1 +疑點 1 +疙瘩 1 +疲勞 1 +疲弱 1 +疼痛 1 +病原 1 +病患 1 +病情 1 +病歷 1 +病死 1 +病重 1 +症候 1 +症狀 1 +痕跡 1 +痙攣 1 +痛心疾首 1 +痢疾 1 +痰 1 +瘦 1 +瘧疾 1 +癌 1 +癖 1 +癥狀 1 +登 1 +登丹 1 +發 1 +發佈 1 +發作 1 +發兵 1 +發呆 1 +發奮 1 +發揚光大 1 +發改委 1 +發放 1 +發洩 1 +發炎 1 +發燒 1 +發牌 1 +發球 1 +發病 1 +發聲 1 +發財 1 +發車 1 +發配 1 +白丁 1 +白井 1 +白公 1 +白利南 1 +白化 1 +白堊 1 +白天 1 +白宮 1 +白砂 1 +白蓮 1 +白蛇 1 +白軍 1 +白金 1 +白銅 1 +白陵 1 +白雲 1 +白面 1 +白頸長尾雉 1 +白鹿 1 +白麗 1 +百事 1 +百代 1 +百億 1 +百兆 1 +百帕斯卡 1 +百廢待舉 1 +百濟 1 +百無聊賴 1 +百老匯 1 +百花齊放 1 +百萬 1 +百貨 1 +百餘 1 +百鳴 1 +的士 1 +的確 1 +的黎波里 1 +皇位 1 +皇冠 1 +皇城 1 +皇太極 1 +皇妃 1 +皇廷 1 +皇權 1 +皇發 1 +皈依 1 +皋 1 +皓 1 +皓若 1 +皮亞韋 1 +皮克爾 1 +皮內羅洛 1 +皮特 1 +皮特凱恩 1 +皮耶特普拉桑克穆斯特魯 1 +皮雅福斯 1 +皰疹 1 +盆地 1 +盈盈 1 +益 1 +益城 1 +益新 1 +益處 1 +盔甲 1 +盛事 1 +盛大 1 +盛妝 1 +盛揮 1 +盛產 1 +盛行 1 +盜用 1 +盟 1 +盟軍 1 +盡到 1 +盡喪 1 +盡情 1 +盡頭 1 +監工 1 +監控 1 +監測 1 +監禁 1 +監聽 1 +盤踞 1 +盧 1 +盧加 1 +盧溝 1 +盧瓦斯 1 +盧甘斯克 1 +盧福瓦 1 +盪 1 +目睹 1 +目鏡 1 +直勉 1 +直屬 1 +直覺 1 +直言 1 +直說 1 +直間 1 +相位 1 +相傳 1 +相容 1 +相差無幾 1 
+相悖 1 +相應 1 +相挺 1 +相異 1 +相稱 1 +相約 1 +相繼 1 +相聲 1 +相若 1 +相處 1 +相見 1 +相較 1 +相通 1 +相速 1 +相鄰 1 +相間 1 +盾座苣苔 1 +盾系 1 +省務 1 +省思 1 +省油 1 +眉山 1 +看中 1 +看出 1 +看台 1 +看得 1 +看看 1 +看管 1 +看見 1 +看透 1 +看重 1 +真 1 +真光 1 +真北 1 +真名 1 +真好 1 +真希 1 +真木 1 +真核 1 +真相大白 1 +眯眼 1 +眷村 1 +眼下 1 +眼淚 1 +眼狀 1 +眼球 1 +眼皮 1 +眼神 1 +眾經 1 +眾說紛紜 1 +睡 1 +睡眠 1 +睡覺 1 +督撫 1 +睾丁蛋白 1 +睿 1 +睿智 1 +瞪羚 1 +瞬時 1 +瞭如指掌 1 +矗立 1 +矛 1 +矢口否認 1 +知府 1 +知曉 1 +知足 1 +短少 1 +短期 1 +短草 1 +短裙 1 +短詩 1 +短語 1 +短音 1 +短髮 1 +矮星 1 +石像 1 +石器 1 +石塊 1 +石材 1 +石湖 1 +石灰 1 +石牆 1 +石牌 1 +石頭門坎 1 +砂拉越 1 +砂漿 1 +砂紙 1 +砍伐 1 +砒霜 1 +研磨 1 +砝碼 1 +破損 1 +破滅 1 +破舊 1 +破落 1 +硝庫爾 1 +硝酸甘油片 1 +硫 1 +硫化氫 1 +硫化鉛 1 +硫酸銨 1 +硬幣 1 +碑亭 1 +碑刻 1 +碧波 1 +碧琴 1 +碰撞 1 +碳紙 1 +碳酸鎂 1 +確知 1 +確診 1 +碼 1 +磁性 1 +磐田 1 +磚室 1 +磨坊 1 +磨折 1 +磨槽 1 +磷化 1 +磷素 1 +磷酸 1 +礙 1 +礦場 1 +礦物 1 +礦石 1 +礦藏 1 +示人 1 +示愛 1 +社皮 1 +社論 1 +社長 1 +祁鏞 1 +祈願 1 +祐希 1 +祖 1 +祖上 1 +祖圭 1 +祖外公 1 +祖外婆 1 +祖宗 1 +祖籍 1 +神仙 1 +神偷 1 +神器 1 +神明 1 +神殿 1 +神社 1 +神秘果 1 +神籤 1 +神魔 1 +祠 1 +祥子 1 +票據 1 +票數 1 +祭司 1 +祭壇 1 +祭師 1 +祭物 1 +祭祀 1 +祭酒 1 +祿勸 1 +祿山 1 +禁煙 1 +禁用 1 +禁藥 1 +禁賽 1 +禍 1 +福克沙尼 1 +福安 1 +福康安 1 +福慧 1 +福池 1 +福清 1 +禕 1 +禪師 1 +禮堂 1 +禮濤 1 +禮炮 1 +禮物 1 +禱文 1 +禽流感 1 +秀實 1 +秀康 1 +秀怡 1 +秀珠 1 +私下 1 +私交 1 +私奔 1 +私宅 1 +私家 1 +私立 1 +私財 1 +秉國 1 +秋人 1 +秋山 1 +秋爽 1 +秋興 1 +秋香 1 +科多爾 1 +科屬 1 +科恩 1 +科教 1 +科朗 1 +科爾基斯 1 +科特 1 +科目 1 +秘指 1 +租予 1 +租務 1 +租地 1 +租戶 1 +租用 1 +秦城 1 +秦州 1 +秦晉之好 1 +秦朝 1 +秦石 1 +秩序 1 +移交 1 +移往 1 +移植 1 +移至 1 +移送 1 +稀釋 1 +稅項 1 +稍為 1 +稗官野史 1 +種內 1 +種名 1 +種子 1 +種屬 1 +稱海 1 +稱病 1 +稱銜 1 +稻子 1 +稻草 1 +稼祥 1 +穀 1 +穀物 1 +穆宗 1 +穆拉 1 +穆斯塔法凱馬爾帕沙 1 +穆爾西亞 1 +穆薩 1 +積山 1 +積良 1 +穩 1 +穩固 1 +穩妥 1 +究竟 1 +空出 1 +空前 1 +空名 1 +空客 1 +空戰 1 +空隙 1 +空難 1 +穿幫 1 +穿戴 1 +穿甲 1 +穿行 1 +穿過 1 +突尼西亞 1 +突感 1 +突現 1 +窄袖 1 +窗口 1 +窗外 1 +窘境 1 +窟檐 1 +窮苦 1 +窮追 1 +窯 1 +窯洞 1 +竄紅 1 +竊聽 1 +立交 1 +立國 1 +立村 1 +立營 1 +立花 1 +立蒙 1 +立面 1 +立體 1 +站內 1 +站名 1 +站坪 1 +站廳 1 +站點 1 +竟 1 +章回 1 +章斐 1 +童女 1 +童男 1 +端川 1 +競相 1 +竹 1 +竹器 1 +竹治 1 +竹溪 1 +竹片 1 +笛 1 +符 1 +符桐 1 +第 1 +第999 1 +第三十三 1 +第十七 1 +第十五 1 +第十四 1 +第廿 1 +第比利斯 1 +第谷 1 +笳冬 1 +等位 1 +等客 1 +等號 1 +筐仔沙 1 +筒狀 1 +答應 1 +箏 1 +算出 1 +算術 1 +管制 1 +管子 1 +箬松 1 +箱型 1 
+箴言 1 +節度 1 +節節 1 +範疇 1 +篡位 1 +篡國 1 +篡地 1 +簡化 1 +簡約 1 +簡訊 1 +簧 1 +簽名 1 +簽定 1 +簽認 1 +簽證 1 +簽賬 1 +籃筐 1 +籌備 1 +籌措 1 +籌款 1 +籌資 1 +籌辦 1 +籍貫 1 +籠式 1 +米南加保 1 +米古 1 +米哈伊 1 +米拉麥克斯 1 +米沙鄢 1 +米洛塞維奇 1 +米特斯 1 +米線 1 +米酒 1 +米高梅 1 +粉 1 +粉碎 1 +粉紅 1 +粉絲 1 +粗壯 1 +粗鱗蟒 1 +粵明 1 +粽子 1 +精 1 +精力 1 +精子 1 +精密 1 +精心 1 +精湛 1 +精算 1 +精索 1 +精裝 1 +糖尿 1 +糖蒜 1 +糞 1 +糟糕 1 +糧儲 1 +糧餉 1 +系數 1 +糾正 1 +糾紛 1 +紀元 1 +紂 1 +約定 1 +約熱夫 1 +約瑟芬 1 +約翰內斯堡 1 +約翰麥克連 1 +約長 1 +紅旗 1 +紅日 1 +紅杏出牆 1 +紅樓 1 +紅樓夢 1 +紅樹 1 +紅玉 1 +紅磨 1 +紅茶 1 +紅襪 1 +紅遍 1 +紅酒 1 +紅點 1 +紈 1 +紋路 1 +紋飾 1 +納入 1 +納塔爾 1 +納爾西斯 1 +納爾遜 1 +納瓦拉 1 +納蘇爾 1 +紐國 1 +紐澤西 1 +紐約尼克斯 1 +紐芬蘭 1 +紐華克 1 +紐黑文 1 +純一 1 +純凈 1 +純樸 1 +純陽 1 +紙上 1 +紙條 1 +紙盒 1 +級數 1 +素包 1 +素食 1 +素餡 1 +索倫 1 +索尼 1 +索溪峪 1 +索維克 1 +索菲 1 +索菲亞 1 +索西納 1 +索賠 1 +索馬里 1 +紮實 1 +累計 1 +細 1 +細岡 1 +細窄 1 +細菌 1 +細部 1 +細長 1 +紳士 1 +紹 1 +紹儀 1 +紹榮 1 +紺三郎 1 +終審 1 +終身大事 1 +組件 1 +組像 1 +組別 1 +組口 1 +組態 1 +組織胺 1 +組隊 1 +結交 1 +結冰 1 +結尾 1 +結雅 1 +絕壁 1 +絕大 1 +絕後 1 +絕版 1 +絕罰 1 +絞刑 1 +絞死 1 +絞痛 1 +給定 1 +給職 1 +給藥 1 +給體 1 +統 1 +統帥 1 +統籌 1 +絲山 1 +絲帶 1 +絶 1 +綁 1 +綉 1 +綏遠 1 +經國 1 +經意 1 +經文 1 +經昌 1 +經期 1 +經由 1 +經界 1 +綜 1 +綜理 1 +綜錄 1 +綠化 1 +綠帶 1 +綠滙 1 +綠燈 1 +綠社 1 +綠黨 1 +維健 1 +維克托 1 +維利爾斯 1 +維埃拉 1 +維多莉亞 1 +維希 1 +維德 1 +維景灣 1 +維爾紐斯 1 +維生 1 +維祀 1 +維羅納 1 +維記 1 +維護 1 +維迪斯 1 +維迪爾 1 +綱領 1 +網址 1 +網易 1 +網線 1 +網購 1 +綺塍 1 +綺色佳 1 +綽號 1 +綿羊 1 +緊張 1 +緊緊 1 +緊貼 1 +緊逼 1 +緊閉 1 +線上 1 +線前 1 +線度 1 +線條 1 +線索 1 +線道 1 +締造 1 +編上 1 +編導 1 +編程 1 +編篡 1 +編繪 1 +編纂 1 +編者 1 +編腔 1 +編隊 1 +緩衝 1 +緩解 1 +緩鬢 1 +緩龍 1 +緬 1 +緯來 1 +練兵 1 +緹 1 +縣市 1 +縣裡 1 +縫 1 +縫製 1 +縮寫 1 +縮小 1 +縱 1 +縱使 1 +縱觀 1 +縱隊 1 +總區 1 +總和 1 +總局 1 +總站 1 +總行 1 +總裁 1 +總計 1 +總辦 1 +績效 1 +繁多 1 +繁瑣 1 +繁盛 1 +繁雜 1 +繁體 1 +繞境 1 +繞開 1 +繡 1 +繩架 1 +繭 1 +繳付 1 +繳納 1 +繼業 1 +繼科 1 +續航 1 +續部 1 +纏足 1 +纜車 1 +缺口 1 +缺失 1 +缺少 1 +缺氧 1 +缺血 1 +罕有 1 +罪惡 1 +置有 1 +置物 1 +罰則 1 +署理 1 +罵聲 1 +罷免 1 +罷工 1 +罹癌 1 +罹難 1 +羅乞多毗闍 1 +羅什艾因 1 +羅伊 1 +羅克斯堡 1 +羅培茲 1 +羅夫 1 +羅希 1 +羅德西亞 1 +羅拔 1 +羅曼什 1 +羅柔 1 +羅森費爾德 1 +羅爾夫 1 +羅隆基 1 +羊圈 1 +美味 1 +美孚 1 +美寶 1 +美幸 1 +美林豬籠草 1 +美琴 1 +美知留 1 +美稱 1 +美索不達米亞 1 +美聯 1 +美聲 1 +美薇 1 +美術 1 +美觀 1 +美譽 1 +美里 1 +美食 1 +美麗華 1 +羚羊 1 +羞恥 1 +群峰 1 
+群族 1 +群組 1 +群落 1 +群速 1 +群雄 1 +群體 1 +羨慕 1 +義久 1 +義勇 1 +義安 1 +義工 1 +義弘 1 +義春 1 +義民 1 +義父 1 +義項 1 +羱羊 1 +羲 1 +羽田 1 +羽絨 1 +翌日 1 +習經 1 +翔 1 +翔麟 1 +翟 1 +翠鳥 1 +翻覆 1 +翼手龍 1 +翼龍 1 +耀樞 1 +耀武 1 +耀邦 1 +老人 1 +老大 1 +老套 1 +老婦 1 +老將 1 +老少 1 +老弱 1 +老橋 1 +老漢 1 +考上 1 +考夫卡 1 +考尼律斯 1 +考柯 1 +考牙 1 +考生 1 +考究 1 +考績 1 +考進 1 +考選 1 +而已 1 +耐受 1 +耐庵 1 +耐玩 1 +耐航 1 +耳光 1 +耳勺 1 +耳孔 1 +耳朵眼 1 +耳珠 1 +耳環 1 +耳癤 1 +耳蝸 1 +耳門 1 +耳骨 1 +耶索洛 1 +耶路撒冷 1 +耽擱 1 +聆聽 1 +聖人 1 +聖保羅 1 +聖克萊爾 1 +聖名 1 +聖地亞哥 1 +聖彌格 1 +聖彼得堡 1 +聖徒 1 +聖拉扎爾 1 +聖歌 1 +聖水 1 +聖求 1 +聖潔 1 +聖祖 1 +聖神 1 +聖經 1 +聖訓 1 +聖赫勒拿 1 +聖赫勒拿島戴勝 1 +聖路易斯 1 +聖體 1 +聘問 1 +聘用 1 +聚氯乙烯 1 +聚禮 1 +聚苯乙烯 1 +聚變 1 +聚體 1 +聞名 1 +聞言 1 +聯姻 1 +聯播 1 +聯江 1 +聯浦 1 +聯產 1 +聯美 1 +聰敏 1 +聲恆 1 +聲援 1 +聲波 1 +聲谷 1 +聲門 1 +聲音 1 +聶丞益 1 +職員 1 +職棒 1 +聽到 1 +聽命 1 +聽從 1 +聽眾 1 +聽聞 1 +聾人 1 +肅宗 1 +肆 1 +肆意 1 +肇 1 +肉夾 1 +肉湯 1 +肉瘤 1 +肉緊 1 +肌肉 1 +肖嚴 1 +肚臍 1 +肚餓 1 +肝 1 +股市 1 +股本 1 +肥牛 1 +肥田 1 +肥胖 1 +肩 1 +肯 1 +肯亞 1 +肯特 1 +育有 1 +育樂 1 +育空 1 +肺病 1 +胃 1 +胃石 1 +背上 1 +背依 1 +背包 1 +背叛 1 +背後 1 +背靠 1 +背面 1 +背鰭 1 +胎 1 +胚 1 +胚胎 1 +胞 1 +胞弟 1 +胡特勒 1 +胡禮 1 +胡蜂 1 +胡馬雍 1 +胸痛 1 +胸管 1 +胸部 1 +胸鰭 1 +能人 1 +能否 1 +能幹 1 +脆 1 +脊椎 1 +脫疽 1 +脫落 1 +脫隊 1 +脫離 1 +脱口秀 1 +脾氣 1 +腐敗 1 +腐蝕 1 +腓力 1 +腔 1 +腫瘤 1 +腳掌 1 +腳本 1 +腳點 1 +腸胃 1 +腸道 1 +腸骨 1 +腹 1 +腿 1 +腿部 1 +膝傷 1 +膝頭 1 +膠 1 +膠州 1 +膠東 1 +膠澳 1 +膠體 1 +膨脹 1 +膽 1 +膽酸 1 +臉 1 +臉頰 1 +臉龐 1 +臘 1 +臥龍 1 +臧 1 +臨 1 +臨榆 1 +臨終 1 +臨高 1 +自作自受 1 +自保 1 +自信 1 +自卑 1 +自在 1 +自學 1 +自帶 1 +自強 1 +自從 1 +自成 1 +自用 1 +自發 1 +自製 1 +自訂 1 +自負 1 +自辦 1 +至上 1 +至善 1 +至柔 1 +至正 1 +至死不渝 1 +至關 1 +至關重要 1 +致使 1 +致函 1 +致恐 1 +致病 1 +致瘋 1 +致癌 1 +臺大 1 +舀出 1 +舅父 1 +興 1 +興國 1 +興學 1 +興業 1 +興海 1 +興祖 1 +舉世矚目 1 +舉例 1 +舉國 1 +舉止 1 +舉薦 1 +舉起 1 +舊友 1 +舊屋 1 +舊時 1 +舊稱 1 +舊部 1 +舊金山 1 +舌頭 1 +舍爾 1 +舍訥費爾德 1 +舒 1 +舒查特 1 +舒爾特 1 +舜初 1 +舞 1 +舞劇 1 +舞陽 1 +舟 1 +航天 1 +航站 1 +般若 1 +船塢 1 +船山 1 +船業 1 +船體 1 +艦身 1 +良 1 +良師益友 1 +良心 1 +良性 1 +良田 1 +良知 1 +艱巨 1 +色帶 1 +色情 1 +色目 1 +色調 1 +艷姬 1 +艷麗 1 +艾伍士 1 +艾倫 1 +艾塞羅 1 +艾夏 1 +艾崔奇 1 +艾巴德 1 +艾度蘭 1 +艾琳 1 +艾瑞 1 +艾瑪 1 +艾登堡 1 +艾美 1 +艾蓮娜 1 +艾薩克 1 +艾迴 1 +艾雲 1 +艾麗卡 1 +芬妮 1 +芬華絲 1 +芬迪絲 1 +芭蕉 1 +芭黎絲 1 +花上 1 +花俏 1 +花園蔥蝸牛 1 +花坮 1 +花城 1 +花店 1 +花旗 
1 +花月 1 +花果 1 +花枝 1 +花瓶 1 +花甲 1 +花蜜 1 +花鞋 1 +苗栗 1 +苗穗 1 +苟且 1 +若愚 1 +若羌 1 +若英 1 +苦 1 +苦力 1 +苦悶 1 +苦情 1 +苦苣苔 1 +苦讀 1 +苯並芘 1 +苯乙烯 1 +英一 1 +英乙 1 +英倫 1 +英傑 1 +英勇 1 +英吋 1 +英國短毛豬 1 +英寸 1 +英尺 1 +英年 1 +英廷 1 +英格瑪 1 +英男 1 +英里 1 +英龍華 1 +茂 1 +茂名 1 +范恩 1 +茄南 1 +茄芮 1 +茅家 1 +茲羅提 1 +茶樓 1 +茶湯 1 +茶館 1 +荃灣 1 +荃麟 1 +草原 1 +草地 1 +草坪 1 +草席 1 +草稿 1 +荊州 1 +荒地 1 +荒蕪 1 +荒誕不經 1 +荔灣 1 +荷爾蒙 1 +荷銀 1 +莆 1 +莊嚴 1 +莊王 1 +莎樂美 1 +莫 1 +莫吉爾諾 1 +莫埃索 1 +莫扎特 1 +莫札特 1 +莫桑 1 +莫瑙恩 1 +莫瓦桑 1 +莫納加斯 1 +莫臥兒 1 +莫過 1 +莫里亞 1 +莽山 1 +菅 1 +菊 1 +菊花 1 +菜 1 +華倫西亞 1 +華少 1 +華新 1 +華族 1 +華林 1 +華爾 1 +華界 1 +華石 1 +華秀 1 +華納 1 +華西 1 +華頓 1 +菲力 1 +菲國 1 +菲德爾 1 +菲爾 1 +菲萊 1 +菲詩 1 +菸害 1 +萊因 1 +萊夫斯 1 +萊希 1 +萊斯特 1 +萊爾 1 +萊特曼 1 +萊茵蘭 1 +萊蕪 1 +萊采巴 1 +萌 1 +萌芽 1 +萎縮 1 +萬一 1 +萬丹 1 +萬貴 1 +落 1 +落下 1 +落實 1 +落敗 1 +落葉 1 +葆玖 1 +葉利欽 1 +葉士域治 1 +葉序 1 +葉綠 1 +著手 1 +著有 1 +著譯 1 +葛 1 +葛力馬 1 +葛朱 1 +葛浩文 1 +葛羅斯 1 +葛蕾絲 1 +葛量洪 1 +葡 1 +葡超 1 +葫蘆 1 +葬禮 1 +葵青 1 +蒂利妮 1 +蒂娜 1 +蒂迦納 1 +蒙丹 1 +蒙卡達 1 +蒙哥 1 +蒙哥馬利 1 +蒙塔尼萊博恩 1 +蒙巴薩 1 +蒙得維 1 +蒙特利爾 1 +蒙羞 1 +蒙面 1 +蒙馬特 1 +蒲 1 +蒲飛 1 +蒸氣 1 +蒸發 1 +蒼白 1 +蓄水 1 +蓋兒 1 +蓋因 1 +蓋多 1 +蓋曼 1 +蓋朗杜克西亞 1 +蓋頂 1 +蓓 1 +蓓天翼龍 1 +蓬塔德馬塔 1 +蓬拉貝 1 +蓬皮杜 1 +蓮 1 +蓮安 1 +蓮花 1 +蔑稱 1 +蔡斯 1 +蔣公 1 +蕙嫻 1 +蕨類 1 +蕩漾 1 +蕾妮 1 +薄 1 +薄弱 1 +薄扶林 1 +薔 1 +薛慶 1 +薦 1 +薩克森 1 +薩凡娜 1 +薩卡拉瓦 1 +薩哈 1 +薩哈林 1 +薩平頓 1 +薩德 1 +薩拉只 1 +薩摩亞 1 +薩爾曼 1 +薩爾瓦多 1 +薩爾茨卡默古特 1 +薩爾馬提亞 1 +薩瑞阿尼迪 1 +薩維塔 1 +薩維奧洛夫 1 +薩馬 1 +薪俸 1 +藉助 1 +藉此 1 +藍儂 1 +藍寶石華麗雨林 1 +藍尼 1 +藍本 1 +藍欽 1 +藍潟 1 +藍灰 1 +藍田 1 +藍白 1 +藍背 1 +藍邊 1 +藍領 1 +藍黨 1 +藏之介 1 +藏寶 1 +藏有 1 +藝 1 +藝名 1 +藝能 1 +藝謀 1 +藝電 1 +藤原 1 +藤木 1 +藤本 1 +藤村 1 +藤枝 1 +藤藝 1 +藥品 1 +藥師 1 +藥材 1 +藥水 1 +藥石 1 +藩主 1 +藩士 1 +藩西 1 +蘇利文 1 +蘇北 1 +蘇尋三 1 +蘇木 1 +蘇格拉底 1 +蘇維匯 1 +蘇美爾 1 +蘇萊曼尼亞 1 +蘇醒 1 +蘇里南 1 +蘊藏 1 +蘭利 1 +蘭卡斯特 1 +蘭封 1 +蘭弗朗克 1 +蘭德 1 +虎式 1 +虎棒 1 +虎翼 1 +虎視眈眈 1 +虔信 1 +處之泰然 1 +處女 1 +處決 1 +處置 1 +處長 1 +虛弱 1 +虛榮 1 +虛無 1 +號吾 1 +號子 1 +號稱 1 +號誌 1 +虢 1 +虢國 1 +虹 1 +虹橋 1 +蚊類 1 +蚩尤 1 +蛇油 1 +蛇種 1 +蛇魔 1 +蛋 1 +蛋白質 1 +蛙 1 +蜂擁而至 1 +蜂蜜 1 +蜆殼 1 +蜚聲 1 +蜥蜴 1 +蜿蜒 1 +蝴蝶 1 +融入 1 +融化 1 +融和 1 +融雪 1 +螞蟻 1 +螢幕 1 +蟬聯 1 +蟲 1 +蟲洞 1 +蠟浸 1 +蠶院 1 +蠻子 1 +血型 1 +血液 1 +血竭 1 +血管 1 +血腥 1 +行人 1 +行使 1 +行列 1 +行將 1 +行用 1 +行禮 1 
+行長 1 +行騙 1 +術 1 +街上 1 +街名 1 +街市 1 +街路 1 +街頭 1 +衛理 1 +衝動 1 +衝鋒 1 +衡 1 +衡量 1 +衢山 1 +衣 1 +衣冠 1 +衣物 1 +衣索比亞 1 +表型 1 +表妹 1 +表姐 1 +表徵 1 +表情 1 +表態 1 +表揚 1 +表格 1 +表決 1 +表白 1 +表述 1 +衰敗 1 +衰落 1 +袖手旁觀 1 +袖箭 1 +被告 1 +被子 1 +裁決 1 +裁減 1 +裂縫 1 +裂變 1 +裋褐 1 +裕 1 +裕智 1 +裕軍 1 +裙子 1 +補償 1 +補天 1 +補教 1 +補時 1 +補褂 1 +裝修 1 +裝備 1 +裝嵌 1 +裝有 1 +裝瓶 1 +裝葯 1 +裝設 1 +裝載 1 +裴 1 +裴林 1 +裸子 1 +裸照 1 +製備 1 +製得 1 +複數 1 +褐色 1 +褪色 1 +褲 1 +褲子 1 +褲袋 1 +襄 1 +襄助 1 +襄王 1 +襄陽 1 +襲 1 +襲封 1 +西亞特 1 +西京 1 +西周 1 +西哈莫尼 1 +西坑 1 +西域 1 +西夏 1 +西奧多 1 +西宮 1 +西岸 1 +西島 1 +西廠 1 +西式 1 +西弗萊德 1 +西斯廷 1 +西晉 1 +西段 1 +西河 1 +西洋坪 1 +西漢 1 +西甌 1 +西線 1 +西美 1 +西蒙 1 +西薩 1 +西蘭卡普 1 +西西里 1 +西距 1 +西迪 1 +西鄉 1 +要是 1 +要脅 1 +要衝 1 +要道 1 +見人 1 +見稱 1 +見聞 1 +見解 1 +見識 1 +見長 1 +規例 1 +覓食 1 +視乎 1 +視作 1 +視圖 1 +視角 1 +親人 1 +親信 1 +親政 1 +親朋 1 +親筆 1 +親臨 1 +親身 1 +覺察 1 +覽 1 +觀光 1 +觀察 1 +觀念 1 +觀戰 1 +觀望 1 +觀看 1 +觀者 1 +角膜 1 +解僱 1 +解夢 1 +解析 1 +解答 1 +解職 1 +解脫 1 +解說 1 +觸怒 1 +觸手可及 1 +觸覺 1 +觸診 1 +言官 1 +言語 1 +言辭 1 +訂位 1 +訃告 1 +訄書 1 +訇開 1 +計委 1 +計謀 1 +討逆 1 +訓 1 +託 1 +記念 1 +記述 1 +記集 1 +設站 1 +許昌 1 +許諾 1 +許願 1 +訴 1 +訴求 1 +訴諸 1 +註 1 +註明 1 +註銷 1 +詐死 1 +詔書 1 +評出 1 +評判 1 +評鑑 1 +詛咒 1 +詞幹 1 +詞義 1 +詢問 1 +試劑 1 +試播 1 +試種 1 +試製 1 +試音 1 +試飛 1 +詩文 1 +該事 1 +該人 1 +該墓 1 +該島 1 +該年 1 +該批 1 +該族 1 +該會 1 +該條 1 +該段 1 +該科 1 +該系 1 +該處 1 +該路 1 +該黨 1 +詳情 1 +詳細 1 +詹姆士 1 +詼諧 1 +誇德拉多 1 +誇祖魯 1 +誌 1 +誌家 1 +認一民 1 +認同 1 +認定 1 +認罪 1 +認證 1 +認輔 1 +誓言 1 +誕 1 +誕下 1 +誘因 1 +語文 1 +語法 1 +語流 1 +語訓 1 +語調 1 +語速 1 +語音 1 +誠意 1 +誤 1 +誤信 1 +誤差 1 +誤會 1 +誤槍 1 +誤譯 1 +誥命 1 +誦 1 +說出 1 +說客 1 +說成 1 +說話 1 +說謊 1 +說道 1 +課本 1 +誹謗 1 +調值 1 +調停 1 +調入 1 +調和 1 +調控 1 +調水 1 +調沙 1 +調研 1 +調節 1 +調職 1 +調解 1 +諂媚 1 +談判 1 +談妥 1 +談論 1 +請來 1 +請辭 1 +請願 1 +論事 1 +諜海 1 +諧波 1 +諶 1 +諸 1 +諸如 1 +諸暨 1 +諸河 1 +諺言 1 +諾丁漢 1 +諾域治 1 +諾斯 1 +諾曼 1 +諾爾曼 1 +謀取 1 +謀士 1 +謀求 1 +謀職 1 +謁者 1 +謇 1 +謊言 1 +謙卑 1 +謚 1 +講完 1 +講究 1 +講談 1 +講道 1 +謝世 1 +謝列梅捷沃 1 +謝爾比 1 +謝瓦爾德納澤 1 +謝蓋爾 1 +謹 1 +謹慎 1 +證 1 +譚 1 +譜代 1 +警務 1 +警句 1 +警告 1 +警員 1 +警戒 1 +警衛 1 +警覺 1 +警鐘 1 +譯作 1 +譯員 1 +譯場 1 +譯本 1 +議席 1 +譴責 1 +護佑 1 +護城 1 +護墊 1 +護送 1 +讀取 1 +讀法 1 +變動 1 +變差 1 +變調 1 +變身 1 +變遷 1 +變革 1 +讓步 1 +讓開 1 +讚喻 1 +讚揚 
1 +讚美 1 +讚譽 1 +谷山 1 +谷氨酸 1 +豆瓣 1 +豈 1 +豎立 1 +豎起 1 +豐久 1 +豐厚 1 +豐城 1 +豐臣 1 +豐隆 1 +象數 1 +象晉 1 +象牙 1 +象牙喙啄木鳥 1 +豢養 1 +豪宅 1 +豪門 1 +豫南 1 +豬 1 +豬圈 1 +豬油 1 +豬肉 1 +貂 1 +貓咪 1 +貓囒 1 +貓科 1 +貝克 1 +貝克漢 1 +貝加爾 1 +貝南 1 +貝斯 1 +貝爾普 1 +貝爾蘇斯 1 +貝碧嘉 1 +貝納斯科 1 +貝都因 1 +貝類 1 +貞昌 1 +貞潔 1 +貞觀 1 +負擔 1 +負芻 1 +負荷 1 +負面 1 +負額 1 +財經 1 +財落 1 +貢 1 +貢品 1 +貢哥拉 1 +貢嘎 1 +貢巴 1 +貧 1 +貧乏 1 +貧窮 1 +貧鈾 1 +貨 1 +貨品 1 +貨機 1 +販賣 1 +貪圖 1 +貪婪 1 +貪心 1 +貪瀆 1 +貫徹 1 +貫穿 1 +貫通 1 +責怪 1 +責難 1 +貴築 1 +貴賓 1 +貴陽 1 +貴霜 1 +貶意 1 +買入 1 +買賣 1 +費曼 1 +費爾南多 1 +費用 1 +費盡 1 +費羅 1 +貼身 1 +賀特 1 +賀立 1 +賄選 1 +資 1 +資政 1 +資陽 1 +賈亞辛哈 1 +賈多特 1 +賈斯丁 1 +賈斯珀 1 +賈氏 1 +賓客 1 +賓尼迪斯 1 +賓州 1 +賞識 1 +賠禮 1 +賡臣 1 +賢思 1 +賣 1 +賣出 1 +賣到 1 +賣地 1 +賣家 1 +賣掉 1 +賣空 1 +賤女 1 +賤民 1 +質詢 1 +賭徒 1 +賭檔 1 +賴宣 1 +賺取 1 +賺錢 1 +購得 1 +購置 1 +賽場 1 +賽普勒斯 1 +賽爾金德 1 +賽車 1 +賽道 1 +贈 1 +贈送 1 +贊博尼 1 +贊成 1 +贊比西亞 1 +贏家 1 +贖回 1 +赤坂 1 +赤壁 1 +赤樹 1 +赤狐 1 +赤鱲 1 +赦 1 +赫伯特 1 +赫塔卜 1 +赫斯 1 +赫比格 1 +赫爾克 1 +赫爾辛基 1 +赫雷爾斯 1 +赫魯曉夫 1 +走上 1 +走到 1 +走勢 1 +走漏 1 +走私 1 +起事 1 +起伏 1 +起初 1 +起名 1 +起因 1 +起始 1 +起建 1 +起止 1 +起死回生 1 +起碼 1 +起端 1 +起舞 1 +起落 1 +起訖 1 +起降 1 +起點 1 +趁 1 +超出 1 +超導 1 +超強 1 +超我 1 +超時 1 +超武 1 +超然 1 +超重 1 +超齡 1 +越亮 1 +越共 1 +越前 1 +越好 1 +越弱 1 +越戰 1 +越早 1 +越暗 1 +越牆 1 +越發 1 +越近 1 +越過 1 +趕往 1 +趙氏 1 +趟 1 +趣事 1 +趨勢 1 +趨於 1 +足不出戶 1 +足夠 1 +足見 1 +足跡 1 +趾爪 1 +趾骨 1 +跋扈 1 +跌 1 +跑 1 +跑壘 1 +跑步 1 +跑車 1 +跑馬 1 +跟操 1 +跟班 1 +跟蹤 1 +跟進 1 +跟隨 1 +跨 1 +跨國 1 +跨度 1 +跨步 1 +跨足 1 +跨過 1 +路政 1 +路易斯安那 1 +路濟亞 1 +路綫 1 +路網 1 +路透 1 +路過 1 +路障 1 +路面 1 +跳動 1 +跳槽 1 +跳過 1 +跳遠 1 +跳高 1 +踏上 1 +踏入 1 +踢進 1 +躁 1 +躁動 1 +躍升 1 +身受 1 +身型 1 +身旁 1 +身為 1 +身無分文 1 +身著 1 +身軀 1 +身高 1 +躬耕 1 +躲到 1 +車上 1 +車仁 1 +車型 1 +車士打菲特 1 +車外 1 +車尾 1 +車市 1 +車廠 1 +車手 1 +車票 1 +車程 1 +車窗 1 +車系 1 +車號 1 +車費 1 +車路士 1 +車迷 1 +車頭 1 +軋箏 1 +軌跡 1 +軍中 1 +軍備 1 +軍功 1 +軍務 1 +軍委 1 +軍師 1 +軍援 1 +軍方 1 +軍服 1 +軍營 1 +軍艦 1 +軍裝 1 +軍階 1 +軍需 1 +軒轅 1 +軟 1 +軟化 1 +軟硬體 1 +軟骨 1 +軸 1 +軸心 1 +較低 1 +較佳 1 +較厚 1 +較快 1 +較深 1 +載人 1 +載淳 1 +輔 1 +輔佐 1 +輕微 1 +輕易 1 +輕軌 1 +輕鐵 1 +輕髻 1 +輕鬆 1 +輝 1 +輝彥 1 +輪周 1 +輪廓 1 +輪流 1 +輪船 1 +輪迴 1 +輯 1 +輯錄 1 +輸 1 +輸掉 1 +輸精 1 +輸血 1 +輸送 1 +輻轍 1 +輻鰭 1 +輾轉 1 +轅 1 +轉交 1 +轉任 1 +轉動 1 +轉化 1 
+轉向 1 +轉型 1 +轉差 1 +轉往 1 +轉念 1 +轉播 1 +轉會 1 +轉正 1 +轉角 1 +轉賣 1 +轉赴 1 +辛普朗 1 +辛普森 1 +辛辛那提 1 +辜 1 +辟邪 1 +辦學 1 +辦有 1 +辨別 1 +辨明 1 +辨識 1 +辭典 1 +辭官 1 +辭歲 1 +辯證 1 +辰國 1 +辰男 1 +農事 1 +農墾 1 +農書 1 +農林 1 +農舍 1 +迅 1 +迅即 1 +迅猛 1 +迎 1 +迎神 1 +迎賓 1 +迎送 1 +迎面 1 +近似 1 +近侍 1 +近平 1 +近日 1 +近東 1 +近海 1 +近現代 1 +近親 1 +近鄰 1 +返 1 +返樸歸真 1 +迦南 1 +迦納 1 +迪克 1 +迪克蘭 1 +迪士尼 1 +迪斯雷利 1 +迪比亞吉奧 1 +迪爾汗 1 +迪米特 1 +迫切 1 +述 1 +迴流 1 +迷你變色龍 1 +迷唐 1 +迷路 1 +追兇 1 +追回 1 +追封 1 +追尋 1 +追尾 1 +追思 1 +追憶 1 +追查 1 +追根究底 1 +追殺 1 +追求 1 +追究 1 +追討 1 +追述 1 +退位 1 +退回 1 +退夷 1 +退居 1 +退敵 1 +退隱 1 +送來 1 +送到 1 +送回 1 +送殯 1 +送給 1 +送院 1 +逃亡 1 +逃奔 1 +逃至 1 +逃跑 1 +逆 1 +逆戟鯨 1 +逍遙 1 +透徹 1 +透支 1 +透水 1 +透視 1 +透鏡 1 +逐客 1 +途中 1 +途人 1 +途經 1 +這兒 1 +這時 1 +通俗 1 +通商 1 +通天 1 +通宏 1 +通州 1 +通渭 1 +通貨 1 +通通 1 +通運 1 +通靈 1 +通風 1 +逛街 1 +速往 1 +速銷 1 +造價 1 +造反 1 +造就 1 +造幣 1 +造福 1 +造血 1 +造訪 1 +造謠 1 +逢吉 1 +連串 1 +連克 1 +連坐 1 +連年 1 +連座 1 +連成 1 +連拍 1 +連筆 1 +連篇累牘 1 +連結 1 +連絡 1 +連通 1 +連進 1 +連餓 1 +週末 1 +週邊 1 +進位 1 +進來 1 +進出 1 +進動 1 +進犯 1 +逼 1 +逼使 1 +逼停 1 +逼到 1 +逾期 1 +遂起 1 +遇上 1 +遇刺 1 +遇有 1 +遇陛 1 +遇難 1 +遊憩 1 +遊擊 1 +遊歷 1 +遊艇 1 +遊覽 1 +遊說 1 +遊離 1 +運 1 +運回 1 +運往 1 +運煤 1 +運算 1 +運糧 1 +運補 1 +運載 1 +遍 1 +遍布 1 +過冷 1 +過剩 1 +過多 1 +過往 1 +過敏 1 +過橋 1 +過濾 1 +過甚 1 +過繼 1 +過苛 1 +過路 1 +過頭 1 +道世民 1 +道具 1 +道墟 1 +道士 1 +道學 1 +道宇 1 +道安 1 +道格拉斯 1 +道歉 1 +道理 1 +道綽 1 +道羅 1 +道義 1 +道靜 1 +達上 1 +達人 1 +達克斯 1 +達古武 1 +達恩利 1 +達拉斯 1 +達拏 1 +達拖錯 1 +達母拿錯 1 +達濠 1 +達爾文 1 +達章 1 +達華 1 +達賴 1 +違背 1 +遙陽 1 +遜位 1 +遞交 1 +遞增 1 +遠呂智 1 +遠嫁 1 +遠揚 1 +遠日 1 +遠洋 1 +遠處 1 +遠遠 1 +遠離 1 +遣 1 +遣返 1 +適之 1 +適用 1 +遭殃 1 +遮天 1 +遮蔭 1 +遮陰 1 +遲 1 +遲遲 1 +遷出 1 +遷居 1 +遷校 1 +選上 1 +選修 1 +選定 1 +選用 1 +選美 1 +選訓 1 +選調 1 +選進 1 +選題 1 +遹 1 +遺物 1 +遺留 1 +遺腹 1 +遺迹 1 +遺骸 1 +遼西翼龍 1 +避 1 +避禍 1 +避開 1 +邁克 1 +邁向 1 +邁阿密 1 +還擊 1 +還有 1 +邊區 1 +邗江 1 +那時 1 +那普拉夫尼克 1 +那曲 1 +邦國 1 +邦德 1 +邦蒂 1 +邦達倉 1 +邪惡 1 +邪神 1 +邪馬台 1 +邱家 1 +邳縣 1 +邵伯 1 +邵氏 1 +郊狼 1 +郎 1 +郝 1 +郡區 1 +郡縣 1 +郡艾塞克斯 1 +部位 1 +部字 1 +部將 1 +部首 1 +郪江 1 +郫縣 1 +郭家 1 +郵報 1 +郵輪 1 +都城嘉慕 1 +都察 1 +都尉 1 +都會 1 +都有 1 +都督 1 +都靈 1 +鄂 1 +鄂倫春 1 +鄂溫克 1 +鄂霍次克 1 +鄉內 1 +鄉團 1 +鄉村 1 +鄉長 1 +鄰 1 +鄰域 1 +鄰居 1 +鄰里 1 +酃縣 1 +酆 1 +配上 1 +配件 1 +配備 1 
+配器 1 +配有 1 +配角 1 +酒家 1 +酒杯 1 +酒樓 1 +酒鬼 1 +酩酊大醉 1 +酵母 1 +酷似 1 +酷刑 1 +醉醺醺 1 +醋酸根 1 +醫書 1 +醫科 1 +醫術 1 +醬貨 1 +醴陵 1 +釀成 1 +釀造 1 +釉色 1 +釋出 1 +釋迦 1 +釋迦牟尼 1 +里士滿 1 +里奧多 1 +里港 1 +里馬 1 +重創 1 +重力 1 +重回 1 +重復 1 +重心 1 +重情 1 +重播 1 +重核 1 +重物 1 +重獲 1 +重現 1 +重生 1 +重用 1 +重疊 1 +重禮 1 +重組 1 +重義 1 +重考 1 +重製 1 +重複 1 +重見天日 1 +重讀 1 +重鎮 1 +重開 1 +重陽 1 +重音 1 +重鳳 1 +野外 1 +野心勃勃 1 +野戰 1 +野木 1 +野球 1 +野菜 1 +量度 1 +金剛 1 +金寶 1 +金帶英麗魚 1 +金幣 1 +金平 1 +金氏 1 +金泉 1 +金浦 1 +金湖 1 +金牛 1 +金獎 1 +金箔 1 +金羅斯 1 +金美 1 +金華 1 +金質 1 +金邊 1 +金銀 1 +金錢 1 +金門 1 +金靴 1 +金頂 1 +金魚 1 +金鵰 1 +釜山 1 +針劑 1 +釧路 1 +鈇 1 +鈦 1 +鈺源 1 +鉑金 1 +銀杏 1 +銀熊 1 +銀牌 1 +銀白 1 +銀紅 1 +銀色 1 +銅仁 1 +銅像 1 +銅削 1 +銅斧 1 +銅柄 1 +銅臿 1 +銅製 1 +銅銎 1 +銅錛 1 +銅錢 1 +銘 1 +銘皖 1 +銘銘 1 +銜稱 1 +銠 1 +銳利 1 +銷毀 1 +銷量 1 +鋒 1 +鋪成 1 +鋪有 1 +鋸齒龍 1 +鋼板 1 +錄影 1 +錄得 1 +錄放影機 1 +錢上 1 +錦 1 +錦俊 1 +錦承 1 +錦江 1 +錦田 1 +錫 1 +錫伯 1 +錫勇 1 +錫昌 1 +錯 1 +錯視 1 +錯覺 1 +錳 1 +錳礦 1 +鍊金 1 +鍋中 1 +鍋內 1 +鍋爐 1 +鍔 1 +鍛鍊 1 +鍝 1 +鍾 1 +鎖妖 1 +鎖閉 1 +鎮守 1 +鎮岳 1 +鎮朔 1 +鎮賚 1 +鎮里 1 +鎮靜 1 +鎰 1 +鎳銀 1 +鏈 1 +鏡波 1 +鏡湖 1 +鐳 1 +鐵削 1 +鐵匾 1 +鐵棍 1 +鐵民 1 +鐵爐 1 +鐵管 1 +鐵釘 1 +鐵銹 1 +鐵錛 1 +鑑別 1 +鑑定 1 +鑑泉 1 +鑑證 1 +鑒定 1 +鑫新 1 +鑽入 1 +鑽出 1 +鑽探 1 +鑿出 1 +長凳 1 +長史 1 +長婁 1 +長孫 1 +長岡 1 +長崎 1 +長廊 1 +長廷 1 +長方 1 +長榮 1 +長毛 1 +長治 1 +長溝 1 +長滿 1 +長瑪喀比 1 +長盛 1 +長笛 1 +長篇 1 +長編 1 +長跑 1 +長頸鹿 1 +長髮 1 +門修斯 1 +門廳 1 +門式 1 +閃米特 1 +閃長 1 +閃電 1 +閉日 1 +開價 1 +開光 1 +開啟 1 +開場 1 +開墾 1 +開學 1 +開工 1 +開往 1 +開戰 1 +開拓 1 +開挖 1 +開支 1 +開教 1 +開業 1 +開槍 1 +開球 1 +開瑞坦 1 +開票 1 +開車 1 +開辦 1 +開錄 1 +閑聊 1 +閑談 1 +閒言閒語 1 +間斷 1 +間碟 1 +間距 1 +閘口 1 +閘機 1 +閣 1 +閩侯 1 +閩南 1 +闖進 1 +關中 1 +關斷 1 +關連 1 +闡述 1 +闢 1 +阡陌 1 +阪神 1 +防凍 1 +防止 1 +防盜 1 +防護 1 +阻塞 1 +阻撓 1 +阻隔 1 +阿一 1 +阿仙奴 1 +阿信 1 +阿修羅 1 +阿內爾卡 1 +阿勒格尼郡 1 +阿勝 1 +阿勞 1 +阿基里斯 1 +阿堯 1 +阿奇里斯 1 +阿寧 1 +阿布 1 +阿拉法特 1 +阿斗 1 +阿普第 1 +阿曼達 1 +阿東 1 +阿格拉 1 +阿格雷斯蒂 1 +阿森斯 1 +阿森納 1 +阿比西尼亞豬 1 +阿波羅 1 +阿爾及利亞 1 +阿爾及爾 1 +阿爾布巴 1 +阿爾扎阿爾拉齊蓋 1 +阿爾法 1 +阿爾發 1 +阿爾茨海默 1 +阿爾高 1 +阿特 1 +阿特拉斯 1 +阿猴 1 +阿瑜陀耶 1 +阿穆爾 1 +阿羅那順 1 +阿耳忒彌斯 1 +阿聯酋 1 +阿育 1 +阿茲海默 1 +阿諾 1 +阿賈克斯 1 +阿赫 1 +阿連德 1 +阿道夫 1 +阿達姆庫斯 1 +阿里 1 +阿隆索 1 +陀斯妥也夫斯基 1 +附上 1 +附加 1 +附蟲 1 +附表 1 +附身 1 +降將 1 +降格 1 +降水 1 +降班 1 +降臨 1 +降魔 1 +限 1 
+限定 1 +限時 1 +陞 1 +陡壁 1 +院士 1 +院子 1 +院落 1 +陣 1 +除冰 1 +除夕 1 +除此 1 +除非 1 +陪葬 1 +陪都 1 +陰天 1 +陰暗 1 +陰陽 1 +陳國 1 +陳屍 1 +陳相 1 +陳述 1 +陵園 1 +陶恩 1 +陷落 1 +陸仔 1 +陸域 1 +陸行 1 +陽 1 +陽安 1 +陽明 1 +隆亨 1 +隊列 1 +隊名 1 +隔日 1 +隔開 1 +隕星 1 +隕鐵 1 +際春 1 +隠居 1 +隨丁 1 +隨便 1 +隨同 1 +隨往 1 +隨時 1 +隨軍 1 +隨隊 1 +險些 1 +險要 1 +隱含 1 +隱姓埋名 1 +隱居 1 +隱性 1 +隱私 1 +隻身 1 +雄 1 +雄師 1 +雄獅 1 +雅克 1 +雅加達 1 +雅各布 1 +雅君 1 +集寧 1 +集結 1 +集聚 1 +雌性 1 +雌獸 1 +雌鯨 1 +雎 1 +雙十 1 +雙子 1 +雙江 1 +雜姓 1 +雜糧 1 +雜處 1 +雜食 1 +雞腿 1 +雞頭 1 +離別 1 +離域 1 +離場 1 +離子 1 +離島 1 +離群索居 1 +離職 1 +難吃 1 +難得 1 +難攻 1 +難過 1 +雨季 1 +雨後春筍 1 +雨林 1 +雪上加霜 1 +雪佛龍 1 +雪兒 1 +雪崩 1 +雪弟 1 +雪梅 1 +雲中 1 +雲亭 1 +雲岩 1 +雲松 1 +雲里 1 +零件 1 +零部件 1 +零食 1 +雷 1 +雷克南 1 +雷克斯 1 +雷切爾 1 +雷姆 1 +雷定 1 +雷昂納多 1 +雷曼 1 +雷王 1 +雷蒂亞 1 +雷雨 1 +電信 1 +電器 1 +電極 1 +電氣 1 +電瓶 1 +電線 1 +電通 1 +電邀 1 +需時 1 +霆鋒 1 +震寰 1 +震波 1 +震災 1 +霍亂 1 +霍伊爾 1 +霍夫堡 1 +霍姆 1 +霍巴特 1 +霍斯 1 +霍普金斯 1 +霍爾滕 1 +霍爾特 1 +霞 1 +霧 1 +露出 1 +露比 1 +露臉 1 +露西 1 +霸佔 1 +霸權 1 +靈前 1 +靈力 1 +靈性 1 +靈感 1 +靈柩 1 +靈活 1 +靈異 1 +靈籤 1 +靈長 1 +靈魂 1 +青 1 +青梅 1 +青森 1 +青睞 1 +青訓 1 +青金 1 +靖 1 +靖雯 1 +靜安 1 +靜岡 1 +靜華 1 +靠右 1 +靠左 1 +面具 1 +面向 1 +面貌 1 +革除 1 +鞏 1 +鞦韆 1 +韃靼 1 +韋 1 +韋契特 1 +韋德 1 +韋拉克魯斯 1 +韋拿 1 +韋斯特 1 +韋科 1 +韌 1 +韓氏 1 +韓浜 1 +音律 1 +音色 1 +音量 1 +音高 1 +韶之 1 +響號 1 +頂上 1 +頂尖 1 +頂峰 1 +頂端 1 +頂級 1 +項鏈 1 +順宗 1 +順岸 1 +順德 1 +順應 1 +順懷 1 +順治 1 +順滑 1 +順陽 1 +頌平 1 +頌揚 1 +預 1 +預估 1 +預告 1 +預知 1 +預示 1 +預約 1 +頑石 1 +頒給 1 +頗 1 +頗多 1 +頗大 1 +頗有 1 +頗盛 1 +頗豐 1 +領事 1 +領取 1 +領奏 1 +領航 1 +領軍 1 +領隊 1 +頡 1 +頭上 1 +頭前 1 +頭型 1 +頭尾 1 +頭槌 1 +頭版 1 +頭盔 1 +頭紗 1 +頭髮 1 +頸 1 +頸部 1 +頹垣 1 +頻 1 +頻寬 1 +頻散 1 +頻繁 1 +頻頻 1 +題獻 1 +題記 1 +額外 1 +額度 1 +類別 1 +類固醇 1 +顥 1 +顯 1 +顯光 1 +顯徑 1 +顯現 1 +顯靈 1 +風化 1 +風尚 1 +風波 1 +風行 1 +風間 1 +風雨 1 +飈 1 +飛往 1 +飛抵 1 +飛毛 1 +飛沫 1 +飛碟 1 +飛鏢 1 +飛靶 1 +飛鳥 1 +飛龍 1 +食人 1 +食肆 1 +食肉 1 +食蟲 1 +食鹽 1 +飲茶 1 +飼料 1 +飼草 1 +飽和 1 +飽經 1 +飾物 1 +餃子 1 +餅 1 +養份 1 +養大 1 +養女 1 +養母 1 +養父 1 +養精蓄銳 1 +養育 1 +養菊 1 +養蠶 1 +餐車 1 +餘 1 +餘熱 1 +餘眾 1 +館前 1 +館名 1 +館址 1 +饃 1 +饑餓 1 +饒平 1 +饕餮 1 +首仗 1 +首個 1 +首名 1 +首場 1 +首屈一指 1 +首席 1 +首戰 1 +首批 1 +首日 1 +首映 1 +首條 1 +首艦 1 +首讀 1 +香 1 +香亭 1 +香儂 1 +香吉士 1 +香味 1 +香坊 1 +香塍 1 +香水 1 +香洲 1 +香火 1 +香織 1 
+馬丁尼茲 1 +馬丁斯維勒 1 +馬上 1 +馬修 1 +馬克安諾 1 +馬克西米利 1 +馬內阿 1 +馬六甲 1 +馬匹 1 +馬喇 1 +馬圈 1 +馬奇頓 1 +馬尼拉 1 +馬托格羅索 1 +馬爾他 1 +馬爾吉阿納 1 +馬爾地夫 1 +馬爾默 1 +馬球 1 +馬約拉那 1 +馬莎 1 +馬薩 1 +馬薩諸塞 1 +馬賽 1 +馬赫盧普 1 +馬路 1 +馬達加斯加 1 +馬里內蒂 1 +馬里蘭 1 +馬雅可夫斯基 1 +馬鞍 1 +馬黑麻 1 +馳名 1 +馴化 1 +駐任 1 +駐地 1 +駐防 1 +駕崩 1 +駙馬 1 +駛 1 +駛入 1 +駛過 1 +駿業 1 +騁遠 1 +騎 1 +騎馬 1 +騏一郎 1 +騙徒 1 +騰出 1 +騰訊 1 +騷擾 1 +驅 1 +驗屍 1 +驗票 1 +驗證 1 +驗電 1 +驚人 1 +驚動 1 +驚喜 1 +驚嘆 1 +驚訝 1 +驚醒 1 +驟減 1 +驟逝 1 +驢肉 1 +驥 1 +骨幹 1 +骯髒 1 +骷髏 1 +體側 1 +體外 1 +體委 1 +體工 1 +體會 1 +體溫 1 +髖骨 1 +高下 1 +高傲 1 +高傲不群 1 +高出 1 +高升 1 +高地 1 +高大 1 +高峰 1 +高座 1 +高手 1 +高效 1 +高新 1 +高杉 1 +高檔 1 +高清 1 +高漲 1 +高熱 1 +高燥 1 +高爾夫 1 +高爾德 1 +高琦 1 +高盧 1 +高聳 1 +高處 1 +高買 1 +高質 1 +高超 1 +高雄 1 +高高在上 1 +髮 1 +髮生 1 +髮辮 1 +鬆髻 1 +鬚 1 +鬚鯨 1 +鬥雞 1 +鬧 1 +鬧出 1 +鬼影 1 +鬼怪 1 +鬼道 1 +魁智 1 +魅惑 1 +魏國 1 +魏斯曼 1 +魏氏 1 +魏澤爾 1 +魔力 1 +魔界 1 +魔石 1 +魔鬼 1 +魚尾 1 +魚腹 1 +魚苗 1 +魚類 1 +魯 1 +魯伯 1 +魯國 1 +魯特 1 +魯登尼亞 1 +魯良新元 1 +魯茨科伊 1 +魯西迪 1 +魯道夫 1 +鮑亞士 1 +鮑克瑟 1 +鮑爾溫 1 +鮑維 1 +鮑里斯 1 +鮑魚 1 +鮮 1 +鮮有 1 +鮮用 1 +鮮虞 1 +鯉齒 1 +鰓蓋 1 +鰭條 1 +鰺沢駅 1 +鱗 1 +鱗甲 1 +鱗骨 1 +鳥 1 +鳥獸 1 +鳥種 1 +鳳 1 +鳳彬 1 +鳴叫 1 +鳴放 1 +鳴道 1 +鴛鴦 1 +鴻南 1 +鴻章 1 +鴻績 1 +鴻華 1 +鴻超 1 +鴻逵 1 +鴻銘 1 +鹽 1 +鹽城 1 +鹽州 1 +鹽酸 1 +鹿兒島 1 +鹿鼎 1 +麒 1 +麗晶 1 +麗泰 1 +麗珍 1 +麗華 1 +麗閣 1 +麥克 1 +麥克佛森 1 +麥克羅伯特森 1 +麥克默多 1 +麥加利 1 +麥卡特尼 1 +麥拉倫 1 +麥格林 1 +麥當勞 1 +麥芽 1 +麥迪文 1 +麩氨酸 1 +麵 1 +麵團 1 +麵皮 1 +麻城 1 +麻塞諸塞 1 +麻將 1 +麻布 1 +麻木 1 +麻痹 1 +黃岡 1 +黃巾 1 +黃昏 1 +黃沙 1 +黃河 1 +黃蜂 1 +黎家 1 +黎明 1 +黎筍 1 +黑奴 1 +黑帶 1 +黑手 1 +黑暗 1 +黑木 1 +黑板 1 +黑死 1 +黑海 1 +黑衫 1 +黑錢 1 +黑鐵木 1 +黑雲 1 +黑髮 1 +默多克 1 +默比施 1 +默默 1 +黛安娜 1 +黛絲 1 +點陣 1 +點點頭 1 +黨團 1 +黨委 1 +黨校 1 +黨歌 1 +黨衛 1 +黨部 1 +黨魁 1 +鼎灶 1 +鼎芬 1 +鼎金 1 +鼓手 1 +鼬鼠 1 +齊國 1 +齋 1 +齒狀 1 +齒輪 1 +齲齒 1 +龍台 1 +龍女 1 +龍文 1 +龍耳 1 +龍頭 1 +龐 1 +龐特佛雷特 1 +龐貝 1 +龜茲 1 diff --git a/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/prefix-table b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/prefix-table new file mode 100644 index 0000000000000000000000000000000000000000..1e49511d31a9733c33068faa4dffceb916bcb4c8 Binary files /dev/null and b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/prefix-table 
differ diff --git a/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/suffix-table b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/suffix-table new file mode 100644 index 0000000000000000000000000000000000000000..1e6ca998763792cb4a84595351b4513a71d3ea82 Binary files /dev/null and b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/suffix-table differ diff --git a/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/tag-map b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/tag-map new file mode 100644 index 0000000000000000000000000000000000000000..7e4f4c0ba54c71e47a061bbb5de6be4d7943e6c9 --- /dev/null +++ b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/tag-map @@ -0,0 +1,43 @@ +42 +NN 21794 +VV 13177 +NNP 8280 +, 5824 +CD 5082 +DEC 4350 +RB 4323 +SFN 4229 +IN 4165 +NNB 3963 +. 3807 +JJ 2318 +VC 1935 +CC 1329 +PRP 996 +DT 994 +EC 942 +FW 778 +AS 718 +MD 681 +( 641 +) 641 +PFA 555 +BB 472 +'' 331 +`` 329 +PRD 324 +/ 202 +: 165 +UH 150 +DEV 96 +HYPH 76 +WP 23 +SFV 18 +XX 17 +ADD 8 +SFA 7 +... 4 +PFN 4 +LS 3 +" 1 +VERB 1 diff --git a/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/tag-to-category b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/tag-to-category new file mode 100644 index 0000000000000000000000000000000000000000..13174eaa43d58b1614e797076f8c09af33ee54b8 --- /dev/null +++ b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/tag-to-category @@ -0,0 +1,42 @@ +" PUNCT +'' PUNCT +( PUNCT +) PUNCT +, PUNCT +. PUNCT +... 
PUNCT +/ PUNCT +: PUNCT +ADD NOUN +AS PART +BB VERB +CC CCONJ +CD NUM +DEC PART +DEV PART +DT DET +EC PUNCT +FW X +HYPH PUNCT +IN ADP +JJ ADJ +LS X +MD AUX +NN NOUN +NNB NOUN +NNP PROPN +PFA PART +PFN PART +PRD PRON +PRP PRON +RB ADV +SFA PART +SFN PART +SFV PART +UH X +VC VERB +VERB VERB +VV AUX +WP PRON +XX X +`` PUNCT diff --git a/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/word-map b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/word-map new file mode 100644 index 0000000000000000000000000000000000000000..3b10bdefd7229ebd0c1ceb6bb8e283bc11b1cf80 --- /dev/null +++ b/syntaxnet/dragnn/conll2017/sample/zh-segmenter-resource/word-map @@ -0,0 +1,16269 @@ +16268 +, 5851 +的 4289 +. 3759 +在 1273 +年 1165 +9999 1070 +、 952 +是 918 +為 887 +一 863 +於 680 +99 647 +和 639 +9 617 +了 614 +人 467 +個 466 +月 456 +有 453 +他 439 +( 429 +) 429 +與 380 +中 376 +日 356 +」 321 +「 320 +被 315 +這 300 +會 258 +並 255 +以 253 +而 245 +也 244 +上 228 +中國 218 +由 215 +《 213 +》 213 +之 211 +兩 203 +後 202 +及 191 +時 188 +位 186 +· 183 +999 178 +等 175 +到 172 +但 162 +對 158 +大 157 +此 157 +不 156 +其 155 +所 150 +種 143 +或 140 +將 139 +次 132 +美國 131 +成 130 +者 127 +至 125 +該 123 +區 118 +開始 118 +部 117 +三 116 +家 116 +可以 115 +她 115 +都 114 +來 113 +因 113 +國 109 +人口 108 +軍 107 +市 104 +使用 102 +省 102 +從 101 +名 98 +著 97 +則 95 +多 94 +用 94 +日本 93 +沒有 93 +地 92 +曾 92 +第一 92 +他們 91 +州 90 +公司 88 +就 88 +性 88 +由於 88 +其中 87 +地區 87 +新 87 +稱 87 +國家 86 +政府 86 +: 84 +已 84 +主要 83 +小 82 +; 81 +世界 81 +可 81 +大學 81 +下 80 +不同 79 +自 79 +香港 79 +縣 77 +自己 77 +前 76 +因為 76 +研究 76 +總 76 +最 75 +面積 75 +李 74 +還 73 +向 72 +王 72 +進行 72 +它 71 +包括 69 +站 69 +四 68 +號 67 +當時 66 +這些 66 +部分 66 +工作 65 +米 65 +認為 65 +也是 64 +以及 64 +學 64 +村 64 +發現 64 +說 64 +作 63 +又 62 +屬 62 +平方公里 62 +中華 61 +同時 60 +學院 60 +條 60 +成立 59 +第二 59 +二 58 +五 58 +亦 58 +代表 58 +發展 58 +發生 58 +美 58 +能 58 +之後 57 +使 57 +社會 57 +要 57 +一些 56 +人民 56 +內 56 +其他 56 +約 56 +世紀 54 +元 54 +場 54 +過 54 +建築 53 +為了 53 +線 53 +只 52 +張 52 +把 52 +獲得 52 +目前 52 +台 51 +文化 51 +英國 51 +重要 51 +中心 50 +但是 50 +局 50 +更 
50 +許多 50 +之間 49 +可能 49 +如 49 +歷史 49 +遊戲 49 +公里 48 +共 48 +帝國 48 +期間 48 +歲 48 +處 48 +音樂 48 +黨 48 +一般 47 +年代 47 +根據 47 +行星 47 +隊 47 +電影 47 +政治 46 +鐵路 46 +城市 45 +故事 45 +組織 45 +便 44 +學校 44 +所有 44 +科學 44 +英 44 +- 43 +任 43 +作品 43 +指 43 +最後 43 +機 43 +語 43 +通過 43 +間 43 +關係 43 +已經 42 +建立 42 +時間 42 +當 42 +電視 42 +共和 41 +後來 41 +比 41 +管理 41 +表示 41 +讓 41 +通常 41 +高 41 +出現 40 +影響 40 +成功 40 +戰爭 40 +提供 40 +系統 40 +動物 39 +地方 39 +就是 39 +座 39 +設計 39 +負責 39 +鎮 39 +長 39 +館 39 +卻 38 +國際 38 +德國 38 +技術 38 +方面 38 +最終 38 +父親 38 +車站 38 +上海 37 +人物 37 +出 37 +分 37 +台灣 37 +各 37 +層 37 +山 37 +方 37 +河 37 +即 36 +參加 36 +擔任 36 +時期 36 +服務 36 +正式 36 +生活 36 +給 36 +要求 36 +路 36 +運動 36 +9,999 35 +一直 35 +再 35 +單位 35 +委員 35 +很 35 +書 35 +段 35 +民國 35 +法國 35 +理論 35 +人類 34 +均 34 +女 34 +才 34 +教 34 +文 34 +歐洲 34 +決定 34 +漢 34 +現在 34 +第三 34 +航空 34 +行政 34 +足球 34 +雖然 34 +八 33 +問題 33 +小說 33 +我 33 +教育 33 +製作 33 +不是 32 +保護 32 +全國 32 +北 32 +印度 32 +員 32 +形成 32 +很多 32 +得到 32 +活動 32 +節目 32 +西班牙 32 +主義 31 +寺 31 +屆 31 +島 31 +市鎮 31 +方式 31 +時代 31 +最高 31 +生 31 +街 31 +起 31 +需要 31 +99% 30 +中央 30 +另 30 +另外 30 +器 30 +天 30 +得 30 +控制 30 +擁有 30 +每 30 +產生 30 +經濟 30 +羅馬 30 +進入 30 +隨 30 +仍 29 +公園 29 +具有 29 +去 29 +大陸 29 +式 29 +接受 29 +東 29 +球隊 29 +當地 29 +院 29 +雙 29 +9.99 28 +並且 28 +北京 28 +受到 28 +同 28 +如果 28 +學生 28 +工程 28 +時候 28 +港 28 +物 28 +級 28 +計劃 28 +超過 28 +道 28 +電腦 28 +存在 27 +室 27 +對於 27 +情況 27 +戰鬥 27 +方法 27 +林 27 +機場 27 +比賽 27 +總統 27 +義大利 27 +都是 27 +非 27 +非常 27 +點 27 +人員 26 +做 26 +原因 26 +國民 26 +支持 26 +數 26 +法 26 +派 26 +然而 26 +獨立 26 +甚至 26 +生物 26 +聯合 26 +項 26 +主 25 +兒子 25 +出版 25 +劉 25 +南 25 +巴士 25 +幾 25 +我們 25 +權 25 +海拔 25 +第99 25 +經過 25 +議會 25 +賽 25 +99.99 24 +交通 24 +例如 24 +分布 24 +加入 24 +化 24 +同年 24 +城 24 +大量 24 +於是 24 +族 24 +最大 24 +未 24 +海 24 +湖 24 +生產 24 +皇帝 24 +科 24 +第9 24 +系列 24 +高度 24 +9.9 23 +事件 23 +們 23 +內容 23 +命名 23 +型 23 +宣布 23 +導致 23 +帶 23 +必須 23 +成員 23 +本 23 +正 23 +清朝 23 +演出 23 +無 23 +直接 23 +行為 23 +裡 23 +西 23 +距離 23 +軍事 23 +部隊 23 +鄉 23 +銀行 23 +集團 23 +99,999 22 +一樣 22 +不少 22 +不過 22 +傳統 22 +僅 22 +副 22 +反對 22 +單 22 +增加 22 +它們 22 +思想 
22 +有關 22 +業 22 +此外 22 +母親 22 +水 22 +灣 22 +版 22 +紐約 22 +組成 22 +結構 22 +聯盟 22 +聯賽 22 +能力 22 +華 22 +設 22 +語言 22 +附近 22 +除 22 +一起 21 +作用 21 +出生 21 +制 21 +力 21 +受 21 +古 21 +只有 21 +唯一 21 +地位 21 +府 21 +廣泛 21 +植物 21 +海軍 21 +無法 21 +獲 21 +率 21 +球 21 +環境 21 +紀念 21 +結束 21 +舉行 21 +角色 21 +議員 21 +選舉 21 +里 21 +量 21 +韓 21 +體 21 +主席 20 +仍然 20 +六 20 +冠軍 20 +出任 20 +分子 20 +原子 20 +參與 20 +地下 20 +城鎮 20 +天津 20 +工業 20 +希臘 20 +度 20 +引起 20 +採用 20 +攻擊 20 +整個 20 +文學 20 +文物 20 +朝鮮 20 +東北 20 +核 20 +機構 20 +比較 20 +清 20 +猶太 20 +現代 20 +管轄 20 +範圍 20 +細胞 20 +經常 20 +胡 20 +自治 20 +自由 20 +角 20 +逐漸 20 +重新 20 +類型 20 +不久 19 +不能 19 +代 19 +以上 19 +佔領 19 +全 19 +分別 19 +原 19 +台北 19 +唐 19 +多數 19 +天文 19 +字 19 +巴黎 19 +最早 19 +會議 19 +有些 19 +民族 19 +洋 19 +結果 19 +繼續 19 +能夠 19 +趙 19 +造成 19 +達 19 +達到 19 +部份 19 +鄭 19 +風格 19 +不會 18 +亞 18 +令 18 +任何 18 +企業 18 +先後 18 +列車 18 +功能 18 +半 18 +取得 18 +合併 18 +外交 18 +子 18 +廣州 18 +戰役 18 +所以 18 +明朝 18 +期 18 +每年 18 +毛 18 +治療 18 +法院 18 +畢業 18 +疾病 18 +相當 18 +節 18 +艦隊 18 +身體 18 +軍隊 18 +進 18 +陳 18 +離開 18 +領導 18 +體育 18 +99.9 17 +七 17 +你 17 +再次 17 +十 17 +名字 17 +大戰 17 +宗教 17 +家族 17 +希望 17 +廣場 17 +想 17 +戰 17 +採取 17 +提出 17 +改 17 +教堂 17 +新聞 17 +星 17 +曲 17 +最初 17 +歐 17 +漫畫 17 +片 17 +物理 17 +特別 17 +發行 17 +經 17 +總部 17 +自然 17 +蘇聯 17 +行動 17 +製造 17 +西北 17 +資料 17 +選擇 17 +那 17 +金 17 +領域 17 +顆 17 +類 17 +飛機 17 +九龍 16 +低 16 +像 16 +共同 16 +利用 16 +制度 16 +前往 16 +創作 16 +勢力 16 +區域 16 +協助 16 +各種 16 +大樓 16 +家庭 16 +實驗 16 +居民 16 +山東 16 +心理 16 +或者 16 +拒絕 16 +步 16 +武器 16 +民主 16 +法律 16 +爆發 16 +狀態 16 +而且 16 +藝術 16 +表現 16 +記者 16 +設有 16 +設立 16 +資源 16 +軌道 16 +過程 16 +道路 16 +還是 16 +革命 16 +首次 16 +高速 16 +下轄 15 +中共 15 +主角 15 +作戰 15 +初 15 +則是 15 +化石 15 +十分 15 +南京 15 +南部 15 +商 15 +噸 15 +回到 15 +國內 15 +國王 15 +地球 15 +基督 15 +大廈 15 +大約 15 +太陽 15 +女兒 15 +女性 15 +如此 15 +學習 15 +完全 15 +實際 15 +常 15 +常見 15 +幾乎 15 +應用 15 +承認 15 +投資 15 +指出 15 +指揮 15 +普查 15 +未來 15 +東南 15 +橋 15 +此後 15 +火星 15 +版本 15 +牠們 15 +發表 15 +白 15 +直到 15 +碼頭 15 +科技 15 +立法 15 +組 15 +統治 15 +老 15 +職業 15 +著名 15 +蒙古 15 +西部 15 +調查 15 +跟 15 +路線 15 +車輛 15 +農業 15 +這樣 15 +酒 15 +鐵道 15 +集 15 
+/ 14 +99999 14 +999萬 14 +一定 14 +交易 14 +人們 14 +今 14 +以來 14 +位置 14 +使得 14 +俄羅斯 14 +俱樂部 14 +傳播 14 +兒童 14 +公主 14 +劇 14 +北部 14 +博物 14 +合作 14 +基本 14 +境內 14 +外 14 +太平 14 +失去 14 +完成 14 +容易 14 +密度 14 +專業 14 +市場 14 +幫助 14 +建造 14 +抗 14 +擊敗 14 +旗 14 +曾經 14 +有限 14 +架 14 +案 14 +棲息 14 +波蘭 14 +澳門 14 +營運 14 +特色 14 +獎 14 +男 14 +相同 14 +看到 14 +簡稱 14 +系 14 +統計 14 +網路 14 +聯邦 14 +色 14 +董事 14 +規模 14 +視 14 +解決 14 +言 14 +起來 14 +車 14 +這裡 14 +進攻 14 +開發 14 +限制 14 +顯示 14 +黃 14 +99萬 13 +九 13 +倫敦 13 +全部 13 +公路 13 +公開 13 +其後 13 +初期 13 +加上 13 +博士 13 +司令 13 +同意 13 +因而 13 +圖書 13 +土地 13 +埃及 13 +基礎 13 +堂 13 +墨西哥 13 +天主 13 +妻子 13 +娛樂 13 +建 13 +建設 13 +形 13 +形式 13 +從事 13 +手 13 +打 13 +改變 13 +故 13 +教會 13 +數學 13 +數據 13 +數量 13 +早期 13 +更多 13 +東京 13 +梁 13 +樂團 13 +樓 13 +模式 13 +死 13 +死亡 13 +每個 13 +水平 13 +流域 13 +準備 13 +物種 13 +物質 13 +王國 13 +玩家 13 +男性 13 +當選 13 +病 13 +目 13 +目標 13 +相關 13 +知識 13 +社 13 +第四 13 +紀錄 13 +統一 13 +舊 13 +街道 13 +設定 13 +身份 13 +較 13 +辦公 13 +速度 13 +運輸 13 +郡 13 +項目 13 +食物 13 +馬 13 +---- 12 +一帶 12 +上帝 12 +且 12 +中學 12 +中部 12 +之前 12 +京 12 +人數 12 +什麼 12 +以下 12 +份 12 +保留 12 +個人 12 +價值 12 +元素 12 +內部 12 +公元 12 +具 12 +半島 12 +原本 12 +反應 12 +反映 12 +可是 12 +商業 12 +嚴重 12 +基地 12 +大型 12 +女子 12 +孫 12 +將軍 12 +尤其 12 +居住 12 +師 12 +帶來 12 +平均 12 +建議 12 +很大 12 +律師 12 +恆星 12 +恐怖 12 +應 12 +據 12 +改革 12 +政策 12 +新加坡 12 +月台 12 +有時 12 +東部 12 +楊 12 +標準 12 +機關 12 +歌手 12 +決賽 12 +汽車 12 +減少 12 +潛艇 12 +熱帶 12 +瑞典 12 +生命 12 +產品 12 +產業 12 +盃 12 +相對 12 +眾 12 +眾多 12 +知道 12 +神 12 +精神 12 +經營 12 +船 12 +該國 12 +變成 12 +賽事 12 +近 12 +透過 12 +遭到 12 +遺址 12 +避免 12 +郭 12 +醫院 12 +重建 12 +重慶 12 +門 12 +電子 12 +? 
11 +主張 11 +主持 11 +主教 11 +之中 11 +亨利 11 +人士 11 +以前 11 +以色列 11 +件 11 +伊斯蘭 11 +佔 11 +作者 11 +保持 11 +信仰 11 +先生 11 +全球 11 +出身 11 +創立 11 +創辦 11 +力量 11 +去世 11 +反 11 +取代 11 +召開 11 +周 11 +園 11 +團 11 +大會 11 +奧地利 11 +威脅 11 +季 11 +安全 11 +專輯 11 +帝 11 +平方米 11 +強烈 11 +接近 11 +推出 11 +描述 11 +播放 11 +文字 11 +普遍 11 +末 11 +朱 11 +業務 11 +殖民 11 +江 11 +江蘇 11 +涉及 11 +現時 11 +界 11 +留下 11 +目的 11 +相信 11 +看 11 +社區 11 +福建 11 +管 11 +給予 11 +網站 11 +線路 11 +繼承 11 +英格蘭 11 +見 11 +試圖 11 +資訊 11 +超 11 +邊 11 +部門 11 +隻 11 +面 11 +首 11 +99.9% 10 +99.99% 10 +並非 10 +事 10 +事業 10 +交流 10 +以後 10 +來往 10 +供 10 +俄 10 +儘管 10 +勞動 10 +包含 10 +化學 10 +協會 10 +君主 10 +和平 10 +唱片 10 +圈 10 +國旗 10 +國會 10 +報 10 +報告 10 +威廉 10 +學位 10 +寬 10 +廠 10 +徐 10 +復興 10 +感到 10 +手術 10 +投入 10 +接 10 +推動 10 +播出 10 +支 10 +改名 10 +文明 10 +文藝 10 +明顯 10 +有效 10 +杭州 10 +東方 10 +條件 10 +模型 10 +殺 10 +河流 10 +法庭 10 +波 10 +洲 10 +派遣 10 +演員 10 +演唱 10 +火車 10 +爭議 10 +特定 10 +特徵 10 +特殊 10 +獨特 10 +生長 10 +當中 10 +症 10 +發動 10 +發射 10 +確定 10 +神話 10 +移民 10 +空間 10 +立 10 +篇 10 +終於 10 +結婚 10 +綫 10 +維持 10 +總理 10 +群 10 +若 10 +華盛頓 10 +葡萄 10 +蔡 10 +藏 10 +蘇 10 +衝突 10 +西藏 10 +規定 10 +訓練 10 +記 10 +記載 10 +記錄 10 +話 10 +該市 10 +警察 10 +變化 10 +責任 10 +起源 10 +逝世 10 +運行 10 +醫 10 +錦標 10 +關於 10 +陸軍 10 +雜誌 10 +需 10 +類似 10 +飛行 10 +首都 10 +駐 10 +'' 9 +一切 9 +一致 9 +上陣 9 +下降 9 +不斷 9 +不滿 9 +中山 9 +丹麥 9 +之外 9 +事務 9 +互相 9 +介紹 9 +來到 9 +健康 9 +光 9 +內閣 9 +全長 9 +公布 9 +其實 9 +再度 9 +出來 9 +出售 9 +分支 9 +到達 9 +動畫 9 +南方 9 +危險 9 +古代 9 +古典 9 +叫 9 +吃 9 +各類 9 +品 9 +國務 9 +團體 9 +地點 9 +執行 9 +塔 9 +士兵 9 +奪得 9 +好 9 +媒體 9 +字母 9 +孩子 9 +學者 9 +寫 9 +對手 9 +就讀 9 +工人 9 +帶領 9 +廟 9 +引擎 9 +強 9 +強大 9 +後期 9 +快速 9 +恢復 9 +意外 9 +戰略 9 +打擊 9 +批評 9 +拍攝 9 +接觸 9 +攻入 9 +放棄 9 +政權 9 +教學 9 +星期 9 +普通 9 +朋友 9 +未能 9 +本人 9 +本身 9 +枚 9 +柏林 9 +核心 9 +森林 9 +標誌 9 +機會 9 +機車 9 +權利 9 +此時 9 +殿 9 +民間 9 +沿海 9 +浙江 9 +湖泊 9 +滿洲 9 +爆炸 9 +特大 9 +狀況 9 +現 9 +瑞士 9 +當局 9 +發布 9 +皇后 9 +皇家 9 +相互 9 +相似 9 +石 9 +破壞 9 +穩定 9 +空中 9 +第五 9 +絕對 9 +經歷 9 +經理 9 +綜合 9 +總督 9 +老師 9 +而是 9 +聯繫 9 +職務 9 +肉 9 +自行 9 +芬蘭 9 +花園 9 +菲律賓 9 +處理 9 +觀眾 9 +解放 9 +評 9 +貢獻 9 +資格 9 +進士 9 +運作 9 +遭 9 +那麼 9 +酒店 9 +金屬 9 +階段 9 
+隧道 9 +隨後 9 +集中 9 +電話 9 +青年 9 +頻道 9 +顏色 9 +高等 9 +-- 8 +上升 8 +下來 8 +中環 8 +主題 8 +亞洲 8 +人工 8 +以外 8 +佔地 8 +何 8 +依據 8 +俄國 8 +保守 8 +信息 8 +傅 8 +價格 8 +儒 8 +光棍 8 +內地 8 +內戰 8 +公分 8 +分鐘 8 +利益 8 +劇情 8 +劑 8 +加 8 +加拿大 8 +十一 8 +即使 8 +原來 8 +口 8 +古老 8 +同樣 8 +命令 8 +喜歡 8 +因素 8 +圖 8 +圖案 8 +地鐵 8 +報道 8 +增長 8 +大多 8 +大小 8 +大道 8 +始 8 +官 8 +家人 8 +專門 8 +小型 8 +小時 8 +尚 8 +局長 8 +山脈 8 +山西 8 +工藝 8 +工資 8 +左右 8 +巨大 8 +平方千米 8 +幻想 8 +廣播 8 +廣東 8 +廳 8 +往往 8 +從此 8 +德 8 +意義 8 +意見 8 +或是 8 +房屋 8 +批 8 +按 8 +提升 8 +提高 8 +攝影 8 +政 8 +效果 8 +教授 8 +文章 8 +方案 8 +旅遊 8 +早 8 +明確 8 +書記 8 +書院 8 +曹 8 +材料 8 +武 8 +武漢 8 +比如 8 +污染 8 +注意 8 +測試 8 +澳大利亞 8 +澳洲 8 +瀋陽 8 +燃料 8 +爵士 8 +父母 8 +現存 8 +男子 8 +病逝 8 +發明 8 +白色 8 +的話 8 +皆 8 +監督 8 +真正 8 +知 8 +知名 8 +秘書 8 +秦 8 +程度 8 +立方米 8 +符號 8 +等等 8 +粒子 8 +紅 8 +維也納 8 +編碼 8 +編輯 8 +署 8 +羽毛 8 +翻譯 8 +考慮 8 +聚集 8 +股份 8 +臨時 8 +良好 8 +芝加哥 8 +葉 8 +表達 8 +複雜 8 +襲擊 8 +西南 8 +解釋 8 +討論 8 +許 8 +詞 8 +變 8 +貓 8 +賽季 8 +贏得 8 +軟體 8 +轉 8 +通 8 +過去 8 +邨 8 +部長 8 +鄧 8 +重 8 +重大 8 +銀河 8 +鏡 8 +長度 8 +隨即 8 +雄性 8 +靠 8 +餐廳 8 +首府 8 +高中 8 +A 7 +A999 7 +~ 7 +下午 7 +不可 7 +主人 7 +之下 7 +事實 7 +事情 7 +二世 7 +二戰 7 +交換 7 +任命 7 +伊麗莎白 7 +住宅 7 +佛教 7 +保險 7 +倍 7 +傳說 7 +入侵 7 +公共 7 +公務 7 +公爵 7 +共產 7 +典型 7 +分析 7 +列 7 +前身 7 +創造 7 +匈奴 7 +北角 7 +十字 7 +卡 7 +原著 7 +右 7 +各地 7 +名稱 7 +名義 7 +吳 7 +吸引 7 +命 7 +員工 7 +哲學 7 +唐朝 7 +喬治 7 +回 7 +在此 7 +城堡 7 +城門 7 +基金 7 +場所 7 +大使 7 +天星 7 +天然 7 +失敗 7 +套 7 +奴隸 7 +學術 7 +安排 7 +宋 7 +實現 7 +實行 7 +專科 7 +尋找 7 +尋求 7 +小組 7 +島嶼 7 +左 7 +差異 7 +市區 7 +市民 7 +常常 7 +幣 7 +平原 7 +年級 7 +年輕 7 +店 7 +建國 7 +弗吉尼亞 7 +強調 7 +形象 7 +很少 7 +想像 7 +意識 7 +愛爾蘭 7 +戲 7 +找到 7 +持有 7 +指導 7 +探測 7 +支援 7 +收斂 7 +放 7 +教師 7 +施 7 +旗下 7 +明 7 +最多 7 +本地 7 +某些 7 +校園 7 +核糖 7 +條約 7 +榮譽 7 +樂隊 7 +檢查 7 +款 7 +母音 7 +氏 7 +氣候 7 +水庫 7 +沒 7 +海岸 7 +海洋 7 +混合 7 +清真 7 +港島 7 +湖南 7 +湯姆 7 +滿 7 +激烈 7 +無綫 7 +然後 7 +熊貓 7 +熱 7 +特有 7 +班 7 +現有 7 +現象 7 +球員 7 +球季 7 +理工 7 +甘肅 7 +生態 7 +申請 7 +真實 7 +石油 7 +礁 7 +秘密 7 +移動 7 +空軍 7 +突破 7 +策略 7 +簽訂 7 +約翰 7 +結合 7 +維新 7 +綱 7 +網 7 +翌年 7 +臺灣 7 +興建 7 +興趣 7 +舉辦 7 +航班 7 +航線 7 +艦 7 +茶 7 +著作 7 +衛生 7 +表演 7 +表面 7 +裔 7 +西方 7 +規劃 7 +覺得 7 +觀測 7 +觀點 7 +計算 7 +訪問 7 +設施 7 +評論 7 +調整 
7 +講述 7 +議院 7 +讀 7 +貴族 7 +貿易 7 +較小 7 +較為 7 +輛 7 +轟炸 7 +迅速 7 +近年 7 +連接 7 +道德 7 +達成 7 +適合 7 +選出 7 +邏輯 7 +醫學 7 +重點 7 +錄製 7 +鏡頭 7 +長期 7 +長達 7 +關 7 +降低 7 +雖 7 +需求 7 +面對 7 +韓國 7 +領先 7 +領袖 7 +題材 7 +風暴 7 +食用 7 +駐守 7 +體現 7 +體系 7 +高級 7 +高達 7 +魔法 7 +魚 7 +999999 6 +999億 6 +999多 6 +JR 6 +The 6 +丈夫 6 +上市 6 +上映 6 +乘 6 +事物 6 +二十 6 +亦是 6 +享受 6 +亮 6 +代理 6 +任務 6 +但丁 6 +住 6 +作出 6 +來源 6 +依然 6 +依靠 6 +促進 6 +信號 6 +個體 6 +做法 6 +側 6 +傳 6 +優勢 6 +元朗 6 +全家 6 +公民 6 +公眾 6 +兼 6 +出土 6 +判決 6 +剛 6 +劃分 6 +加工 6 +助理 6 +努力 6 +動力 6 +十八 6 +協議 6 +卡爾 6 +原始 6 +反射 6 +取消 6 +口號 6 +司 6 +司法 6 +含有 6 +吸收 6 +呂 6 +呼吸 6 +咖啡 6 +商品 6 +商店 6 +嘗試 6 +四川 6 +困難 6 +國歌 6 +地產 6 +基 6 +基因 6 +壓力 6 +外國 6 +多樣 6 +大大 6 +大獎 6 +大眾 6 +太空 6 +夫人 6 +奧運 6 +她們 6 +好友 6 +如同 6 +始建 6 +嬴 6 +季節 6 +官方 6 +定居 6 +定義 6 +客運 6 +宣佈 6 +宮 6 +家中 6 +密碼 6 +封 6 +對應 6 +對抗 6 +對象 6 +導演 6 +展覽 6 +島上 6 +師範 6 +席 6 +平等 6 +平面 6 +底 6 +廣告 6 +延伸 6 +強度 6 +形容 6 +形態 6 +形狀 6 +影片 6 +彼此 6 +徒 6 +情感 6 +意 6 +意味 6 +愛 6 +感 6 +懷孕 6 +戀 6 +成熟 6 +成績 6 +成長 6 +手法 6 +打算 6 +批准 6 +投票 6 +授予 6 +提名 6 +搖滾 6 +搜索 6 +操作 6 +擴展 6 +改編 6 +效力 6 +敘利亞 6 +教導 6 +新城 6 +方向 6 +方形 6 +日報 6 +日耳曼 6 +時任 6 +時常 6 +普魯士 6 +更名 6 +最近 6 +朝廷 6 +杯 6 +校區 6 +校長 6 +楚 6 +樹 6 +歌曲 6 +止 6 +死後 6 +民眾 6 +池 6 +河道 6 +流行 6 +海盜 6 +消費 6 +深入 6 +深圳 6 +滅亡 6 +火 6 +無論 6 +版權 6 +牙齒 6 +王朝 6 +玻璃 6 +生存 6 +男友 6 +町 6 +畫 6 +畫家 6 +病毒 6 +發出 6 +發起 6 +發達 6 +短 6 +碑 6 +確認 6 +神奇 6 +神經 6 +禁止 6 +私人 6 +秦國 6 +穆斯林 6 +立刻 6 +立場 6 +童年 6 +端 6 +第七 6 +籃球 6 +米蘭 6 +經典 6 +經驗 6 +緬甸 6 +繪畫 6 +缺乏 6 +羅 6 +美麗 6 +習俗 6 +翡翠 6 +職 6 +能量 6 +色彩 6 +蔣 6 +蕭 6 +藉由 6 +虛擬 6 +血統 6 +行 6 +行走 6 +表明 6 +袁 6 +製成 6 +覆蓋 6 +規則 6 +設置 6 +試驗 6 +詩 6 +詩人 6 +詩歌 6 +該片 6 +說服 6 +說法 6 +論 6 +諮詢 6 +證明 6 +豐富 6 +走 6 +超人 6 +越來越 6 +跑道 6 +路易斯 6 +車展 6 +輿論 6 +近代 6 +返回 6 +退役 6 +通往 6 +通訊 6 +造 6 +進步 6 +過來 6 +選區 6 +遺傳 6 +邀請 6 +邊緣 6 +邱 6 +酒精 6 +醫生 6 +醫療 6 +金融 6 +銷售 6 +開展 6 +開放 6 +阻止 6 +陷入 6 +隊員 6 +階級 6 +隨機 6 +雕刻 6 +離 6 +雲南 6 +電池 6 +非洲 6 +須 6 +顧問 6 +首先 6 +騎兵 6 +黎 6 +9,999,999 5 +99.9萬 5 +999,999 5 +99億 5 +9千億 5 +『 5 +上述 5 +不僅 5 +不好 5 +中立 5 +中間 5 +主流 5 +事故 5 +亞歷山大 5 +亞馬遜 5 +人均 5 +今天 5 +今日 5 +介入 5 +以北 5 +任期 5 +佔據 5 +作家 5 +依舊 5 +侵略 
5 +保存 5 +信 5 +信任 5 +信奉 5 +信託 5 +修正 5 +停止 5 +傑出 5 +傳承 5 +傷害 5 +像是 5 +儀式 5 +先 5 +免費 5 +入 5 +公交 5 +公會 5 +兵 5 +其它 5 +其餘 5 +冷卻 5 +分配 5 +分類 5 +列入 5 +別墅 5 +刺激 5 +創建 5 +加熱 5 +加盟 5 +動作 5 +勞工 5 +化合 5 +北海 5 +十二 5 +千 5 +升級 5 +南北 5 +南極 5 +印第安那 5 +參謀 5 +參議 5 +受傷 5 +叫做 5 +史 5 +司馬 5 +各個 5 +合 5 +合法 5 +合理 5 +同盟 5 +名單 5 +否認 5 +呈 5 +呈現 5 +周圍 5 +品牌 5 +哈定 5 +啟超 5 +善 5 +喇嘛 5 +固定 5 +固體 5 +圍 5 +圖像 5 +土耳其 5 +在內 5 +地圖 5 +城區 5 +執政 5 +培養 5 +堅持 5 +堡 5 +場地 5 +壁畫 5 +壘 5 +外科 5 +大氣 5 +大西 5 +如下 5 +如今 5 +妻 5 +始終 5 +孔 5 +學名 5 +學會 5 +學科 5 +宇宙 5 +安裝 5 +官吏 5 +客戶 5 +客體 5 +宮廷 5 +家長 5 +容納 5 +宿舍 5 +察覺 5 +寫作 5 +專利 5 +專家 5 +對外 5 +對此 5 +少數 5 +展出 5 +展開 5 +岸 5 +工 5 +工具 5 +巴西 5 +市政 5 +席位 5 +年度 5 +底部 5 +廈門 5 +廖 5 +廣 5 +廣西 5 +建成 5 +引發 5 +弟弟 5 +得知 5 +微博 5 +心 5 +意思 5 +愛情 5 +感情 5 +感覺 5 +慈善 5 +態度 5 +慶祝 5 +成年 5 +成本 5 +成都 5 +戰國 5 +戰後 5 +房間 5 +手中 5 +手段 5 +托勒密 5 +找 5 +技能 5 +抗議 5 +抵抗 5 +抵達 5 +拜占庭 5 +持續 5 +指定 5 +指示 5 +掌握 5 +排名 5 +接管 5 +推進 5 +措施 5 +提到 5 +換 5 +撤銷 5 +收入 5 +收藏 5 +政務 5 +故宮 5 +教皇 5 +教習 5 +敵人 5 +文忠 5 +文獻 5 +斯 5 +新型 5 +新華 5 +新鮮 5 +方便 5 +方言 5 +施工 5 +旅行 5 +日期 5 +早年 5 +明治 5 +更加 5 +書中 5 +有的 5 +朝 5 +本片 5 +杜 5 +東海 5 +東西 5 +架構 5 +某種 5 +查爾斯 5 +查理 5 +柯林頓 5 +棉花 5 +棒球 5 +極 5 +榜 5 +構成 5 +樓梯 5 +機制 5 +機器 5 +次年 5 +欣賞 5 +歡迎 5 +正常 5 +正確 5 +武裝 5 +歸 5 +殺害 5 +每天 5 +民 5 +民兵 5 +氣體 5 +水果 5 +水系 5 +汞 5 +江西 5 +決策 5 +河北 5 +河南 5 +波音 5 +泥塑 5 +泰安 5 +泳兒 5 +洛桑 5 +洪 5 +海峽 5 +海底 5 +消息 5 +游擊 5 +湖北 5 +溫 5 +溫度 5 +溫泉 5 +滅絕 5 +演化 5 +演奏 5 +漢朝 5 +潘 5 +澤東 5 +澳 5 +濃度 5 +炎 5 +無關 5 +牌 5 +物品 5 +物業 5 +物體 5 +狗 5 +狩獵 5 +王子 5 +珊瑚 5 +現場 5 +現實 5 +甘 5 +生下 5 +生涯 5 +用作 5 +發送 5 +百 5 +直徑 5 +直至 5 +相 5 +真理 5 +眼 5 +督 5 +祖先 5 +神秘 5 +神聖 5 +秋 5 +移居 5 +程 5 +程序 5 +種植 5 +種類 5 +稱作 5 +空氣 5 +穿 5 +突變 5 +競賽 5 +符合 5 +第六 5 +簡單 5 +粵 5 +紅軍 5 +紐西蘭 5 +級別 5 +素 5 +細節 5 +組合 5 +結局 5 +編號 5 +練習 5 +總署 5 +繞 5 +美洲 5 +群島 5 +群眾 5 +耕地 5 +聯絡 5 +聲明 5 +肯定 5 +臺 5 +興奮 5 +興起 5 +般 5 +船上 5 +艘 5 +花 5 +華視 5 +落後 5 +藝人 5 +藥物 5 +蘇格蘭 5 +虎丘 5 +虛構 5 +融合 5 +血壓 5 +行業 5 +裝甲 5 +裝置 5 +裡面 5 +西遊 5 +觀 5 +解散 5 +設備 5 +診斷 5 +該地 5 +該屬 5 +認可 5 +認知 5 +認識 5 +誕生 5 +請 5 +象徵 5 +貝多芬 5 +財產 5 +貨車 5 +質量 5 +赤道 5 +赴 5 +超級 5 +越南 5 +趙國 5 +路易 5 
+身亡 5 +軍團 5 +輪 5 +轉移 5 +轉變 5 +辭去 5 +辭職 5 +退出 5 +通車 5 +通道 5 +連 5 +連任 5 +連續 5 +進而 5 +進軍 5 +遠 5 +適當 5 +遭遇 5 +那裡 5 +邦 5 +郗 5 +郵政 5 +鄉鎮 5 +鄰近 5 +醒亞 5 +醫師 5 +鎊 5 +鎮壓 5 +鐵 5 +鑒 5 +長大 5 +長官 5 +長沙 5 +開設 5 +防禦 5 +陝西 5 +院長 5 +陸 5 +階層 5 +障礙 5 +隸屬 5 +難 5 +電梯 5 +電車 5 +青海 5 +預測 5 +預算 5 +預防 5 +領土 5 +頻率 5 +食品 5 +飲料 5 +飾 5 +首相 5 +馬來西亞 5 +馬達 5 +馮 5 +騎士 5 +體積 5 +體色 5 +黑人 5 +龐大 5 +9-9 4 +9.99億 4 +9.9億 4 +9.9萬 4 +B 4 +Casey 4 +County 4 +Google 4 +John 4 +M9 4 +NBA 4 +of 4 +』 4 +一世 4 +一半 4 +一旦 4 +三世 4 +上演 4 +上訴 4 +下令 4 +下頜 4 +不及 4 +不得 4 +不應 4 +不等 4 +不足 4 +世凱 4 +中東 4 +中止 4 +中華龍鳥 4 +中視 4 +丹羽 4 +主演 4 +乃 4 +久 4 +之上 4 +乘坐 4 +乘客 4 +乾燥 4 +乾隆 4 +了解 4 +予 4 +事變 4 +于 4 +五世 4 +亞軍 4 +交給 4 +交配 4 +交響 4 +亦為 4 +享年 4 +人事 4 +代言 4 +以西 4 +任職 4 +企圖 4 +伊拉克 4 +伺服 4 +供應 4 +依法 4 +侵蝕 4 +保加利亞 4 +保障 4 +信徒 4 +修復 4 +倫理 4 +做出 4 +停留 4 +價 4 +優惠 4 +優秀 4 +兄弟 4 +充電 4 +先進 4 +克拉克 4 +入口 4 +入選 4 +全面 4 +公 4 +公安 4 +公式 4 +共振 4 +其間 4 +具體 4 +冬天 4 +出場 4 +出戰 4 +出發 4 +出租 4 +出色 4 +刀 4 +分佈 4 +分成 4 +分期 4 +列表 4 +則天 4 +則為 4 +前期 4 +前線 4 +前進 4 +劇集 4 +劍 4 +加州 4 +加強 4 +勝利 4 +包 4 +包裝 4 +匈牙利 4 +區劃 4 +十五 4 +協定 4 +協調 4 +南側 4 +南延 4 +印第安 4 +危機 4 +原有 4 +原理 4 +參考 4 +參賽 4 +古物 4 +句 4 +只要 4 +各國 4 +各界 4 +合成 4 +合眾 4 +合金 4 +吉 4 +吉他 4 +同事 4 +同治 4 +名將 4 +名詞 4 +呎 4 +呢 4 +周年 4 +命運 4 +哈爾濱 4 +哥倫比亞 4 +商人 4 +啟用 4 +喬治亞 4 +單車 4 +嘲諷 4 +回來 4 +國防 4 +圓形 4 +地底 4 +地形 4 +地面 4 +坊 4 +基辛格 4 +堅決 4 +墓 4 +夏 4 +外星 4 +夜 4 +夢 4 +大同 4 +大帝 4 +大臣 4 +天皇 4 +夫婦 4 +失望 4 +妹妹 4 +姐姐 4 +姓氏 4 +委任 4 +婚姻 4 +婦女 4 +媽媽 4 +學堂 4 +官員 4 +定律 4 +宣稱 4 +實業 4 +實體 4 +寶貝 4 +小學 4 +少女 4 +尼龍 4 +局部 4 +展示 4 +屯門 4 +山區 4 +山頂 4 +岩 4 +島式 4 +島津 4 +嶺 4 +巡迴 4 +帶給 4 +常用 4 +幅度 4 +幫 4 +平方英里 4 +年齡 4 +幽默 4 +度假 4 +庫 4 +廚房 4 +廢除 4 +廷 4 +影像 4 +影業 4 +往 4 +很快 4 +很難 4 +後者 4 +得分 4 +得名 4 +得寵 4 +循環 4 +微 4 +徵召 4 +志願 4 +快 4 +怎麼 4 +性別 4 +性質 4 +恐龍 4 +患者 4 +情報 4 +情形 4 +情節 4 +情緒 4 +慕尼黑 4 +應該 4 +戀愛 4 +成人 4 +成分 4 +戰敗 4 +戰死 4 +扮演 4 +批判 4 +技巧 4 +抒情 4 +拓展 4 +招募 4 +指數 4 +按照 4 +挪威 4 +排列 4 +排水 4 +排行 4 +接收 4 +接替 4 +推薦 4 +推行 4 +揚州 4 +擔心 4 +擴大 4 +擴建 4 +擴張 4 +收到 4 +收錄 4 +改造 4 +攻克 4 +敖 4 +教區 4 +教宗 4 +教練 4 +整理 4 +數千 4 +數字 4 +文泰 4 +新疆 4 +新竹 4 +旅客 4 +既 4 +日常 
4 +昆明 4 +明基 4 +星球 4 +星等 4 +春秋 4 +時段 4 +晉國 4 +晚間 4 +暗示 4 +暴力 4 +更換 4 +曼德拉 4 +最低 4 +最好 4 +最長 4 +有用 4 +服裝 4 +望遠 4 +木材 4 +本作 4 +本土 4 +本科 4 +本線 4 +本魚 4 +東區 4 +某 4 +校舍 4 +格 4 +案件 4 +楚國 4 +樂 4 +樂器 4 +標本 4 +樞紐 4 +模仿 4 +橄欖 4 +檔 4 +檢測 4 +欲 4 +歌 4 +正月 4 +此前 4 +此次 4 +步兵 4 +武術 4 +歷任 4 +死神 4 +殺死 4 +毀 4 +母 4 +每秒 4 +比例 4 +毫克 4 +毫米 4 +水深 4 +永江 4 +污泥 4 +沈 4 +沉澱 4 +沙灘 4 +河川 4 +油價 4 +治 4 +法案 4 +法蘭克 4 +法規 4 +波希米亞 4 +波斯 4 +注入 4 +洛杉磯 4 +洛陽 4 +流經 4 +浦 4 +海域 4 +海外 4 +海德堡 4 +海戰 4 +海水 4 +海灣 4 +海面 4 +液態 4 +液體 4 +深 4 +測量 4 +港鐵 4 +湯 4 +源 4 +滬 4 +滿貫 4 +潮濕 4 +濟南 4 +灣仔 4 +火災 4 +炸藥 4 +烏克蘭 4 +無意 4 +無線 4 +無錫 4 +照片 4 +營 4 +營業 4 +父 4 +爽 4 +牛奶 4 +牧場 4 +特遣 4 +特點 4 +犯罪 4 +狀元 4 +狂 4 +狙擊 4 +獎勵 4 +王后 4 +珍珠 4 +現今 4 +現金 4 +球會 4 +理事 4 +理想 4 +琉球 4 +瑪麗 4 +瓷器 4 +甘珠爾 4 +生化 4 +生意 4 +產地 4 +產量 4 +留 4 +畝 4 +當天 4 +當年 4 +當日 4 +當然 4 +疫苗 4 +癌症 4 +發育 4 +發言 4 +發酵 4 +皮膚 4 +監獄 4 +直 4 +直升 4 +直隸 4 +相比 4 +相近 4 +省份 4 +省委 4 +省級 4 +真相 4 +督察 4 +矩陣 4 +短暫 4 +短篇 4 +研發 4 +社團 4 +神廟 4 +神達 4 +票房 4 +租界 4 +種族 4 +稱號 4 +空調 4 +突然 4 +立即 4 +童 4 +競爭 4 +等級 4 +節日 4 +簽約 4 +粒 4 +精確 4 +紋理 4 +納粹 4 +純 4 +終止 4 +終結 4 +維吾爾 4 +網球 4 +緊密 4 +總量 4 +總長 4 +繼 4 +繼任 4 +罕見 4 +罪名 4 +置 4 +羅伯特 4 +羅馬尼亞 4 +義務 4 +習慣 4 +老闆 4 +考察 4 +考試 4 +聖 4 +聖母 4 +聲稱 4 +聲譽 4 +背景 4 +胡佛 4 +自動 4 +船隻 4 +艙 4 +艱難 4 +苦艾 4 +草本 4 +荷蘭 4 +莊 4 +莊園 4 +莫斯科 4 +華航 4 +落成 4 +著重 4 +董 4 +蒙扎 4 +蓉蓉 4 +薩摩 4 +蘇家 4 +蘋果 4 +蛋白 4 +蜘蛛 4 +血栓 4 +行省 4 +術語 4 +衛星 4 +製 4 +西側 4 +西曼 4 +親 4 +親王 4 +評價 4 +評定 4 +詞語 4 +試 4 +該劇 4 +該區 4 +詹姆斯 4 +誰 4 +課程 4 +談話 4 +請求 4 +論文 4 +識字 4 +警署 4 +議長 4 +讀者 4 +負 4 +財富 4 +財政 4 +貨幣 4 +貨物 4 +貨運 4 +費 4 +賀 4 +資本 4 +資深 4 +資金 4 +賈 4 +質 4 +購買 4 +贊助 4 +起義 4 +足 4 +身上 4 +身分 4 +躲避 4 +車序 4 +軍人 4 +軍力 4 +軍官 4 +軍閥 4 +較多 4 +較少 4 +較高 4 +輸入 4 +輻射 4 +輻鰭魚 4 +轄下 4 +轉換 4 +辦事 4 +辦法 4 +辦理 4 +農民 4 +逃往 4 +這麼 4 +週期 4 +進口 4 +進球 4 +進程 4 +遂 4 +遊行 4 +過度 4 +過枝 4 +遷移 4 +遼寧 4 +邊境 4 +邵 4 +部落 4 +郵票 4 +重視 4 +野生 4 +量子 4 +金字 4 +針對 4 +銅鑼 4 +鋼琴 4 +錯誤 4 +鏡片 4 +鏡面 4 +長子 4 +長江 4 +門診 4 +開 4 +開幕 4 +開闢 4 +關心 4 +防守 4 +阿拉伯 4 +院校 4 +陽光 4 +隊伍 4 +階 4 +隔離 4 +雕塑 4 +雨水 4 +電 4 +電力 4 +電台 4 +電磁 4 +電訊 4 +靜態 4 +靜脈 4 +非法 4 +靠近 4 +音 4 +順 4 +順位 4 +預期 4 +頭 4 +題 
4 +願意 4 +風險 4 +颱風 4 +飛 4 +飼養 4 +餐 4 +餘下 4 +首領 4 +體內 4 +體長 4 +高架 4 +高溫 4 +鬥爭 4 +鳥類 4 +黃埔 4 +黑 4 +黑色 4 +黨籍 4 +鼓勵 4 +! 3 +9.9% 3 +9.999 3 +9999萬 3 +99多 3 +99餘 3 +Center 3 +Close 3 +GDP 3 +Game 3 +H9N9 3 +III 3 +James 3 +Mappy 3 +New 3 +PSP 3 +To 3 +You 3 +°C 3 +─ 3 +・ 3 +一同 3 +丁 3 +三十 3 +三江 3 +上游 3 +上環 3 +上表 3 +上課 3 +上面 3 +下列 3 +下台 3 +下場 3 +下級 3 +不但 3 +不想 3 +不敵 3 +不明 3 +不遠 3 +不韋 3 +世 3 +世宗 3 +丟失 3 +中古 3 +中子 3 +中將 3 +中期 3 +中西 3 +中轉 3 +中風 3 +丹佛 3 +丹尼士 3 +主任 3 +主動 3 +主唱 3 +主機 3 +主編 3 +主辦 3 +主體 3 +之內 3 +之時 3 +也好 3 +互聯 3 +五角 3 +些 3 +亞當 3 +亞目 3 +亞視 3 +交往 3 +京都 3 +亮度 3 +人性 3 +人次 3 +人生 3 +人身 3 +他人 3 +付出 3 +以往 3 +以為 3 +以致 3 +任內 3 +任教 3 +份子 3 +企鵝 3 +伊恩 3 +伊賀 3 +休息 3 +估計 3 +伸出 3 +似 3 +伽利略 3 +住戶 3 +住院 3 +佔有 3 +佛 3 +佛像 3 +佛學 3 +佛羅倫薩 3 +作霖 3 +併 3 +來自 3 +例子 3 +供奉 3 +供給 3 +依 3 +依賴 3 +俄亥俄 3 +俘虜 3 +保安 3 +保育 3 +保證 3 +信德 3 +修建 3 +修道 3 +個別 3 +個性 3 +倖存 3 +候選 3 +借用 3 +倪 3 +值 3 +值得 3 +偉大 3 +偏 3 +停 3 +停車 3 +備受 3 +傳奇 3 +傳教 3 +傳染 3 +傾向 3 +優異 3 +允許 3 +元洪 3 +光源 3 +光緒 3 +克里米亞 3 +兒女 3 +內陸 3 +全市 3 +全縣 3 +全體 3 +公國 3 +公尺 3 +公轉 3 +六十 3 +共計 3 +兵力 3 +兼任 3 +冊封 3 +冷 3 +凱撒 3 +出使 3 +出口 3 +出獄 3 +函數 3 +分散 3 +分行 3 +分裂 3 +分解 3 +切斷 3 +刊物 3 +列為 3 +利 3 +制定 3 +前後 3 +前鋒 3 +前面 3 +剛好 3 +創意 3 +劇團 3 +劇本 3 +劇目 3 +劇院 3 +劍橋 3 +力學 3 +加利福尼亞 3 +勞倫斯 3 +匈 3 +化工 3 +北冕 3 +北洋 3 +區別 3 +十七 3 +十多 3 +升 3 +升任 3 +升格 3 +南昌 3 +南海 3 +占 3 +印象 3 +即將 3 +卻是 3 +厘米 3 +原則 3 +原告 3 +原料 3 +友誼 3 +取 3 +受損 3 +叛亂 3 +口徑 3 +古城 3 +可惜 3 +台中 3 +史上 3 +各州 3 +各省 3 +同名 3 +同性 3 +同情 3 +名譽 3 +告訴 3 +周邊 3 +呼聲 3 +和也 3 +和約 3 +品種 3 +哥哥 3 +哲 3 +哺乳 3 +唱 3 +喜愛 3 +單一 3 +嘉賓 3 +器官 3 +噴泉 3 +嚴格 3 +四世 3 +回應 3 +回歸 3 +國泰 3 +國籍 3 +國軍 3 +圍繞 3 +園區 3 +土 3 +土壤 3 +在任 3 +在場 3 +地中 3 +地勢 3 +地獄 3 +地理 3 +坡 3 +報導 3 +場合 3 +塊 3 +塑造 3 +塘 3 +塞爾維亞 3 +填充 3 +填海 3 +填補 3 +境地 3 +墜毀 3 +士官 3 +壯 3 +壯觀 3 +夏天 3 +外來 3 +外界 3 +外部 3 +多達 3 +大夫 3 +大家 3 +大師 3 +大橋 3 +大權 3 +大致 3 +大賽 3 +大選 3 +天國 3 +天王 3 +天空 3 +太 3 +太小 3 +失業 3 +奇異 3 +奈米 3 +契約 3 +奧 3 +奪取 3 +女巫 3 +女王 3 +女神 3 +好評 3 +如何 3 +妃 3 +妨礙 3 +委派 3 +委託 3 +威力 3 +威尼斯 3 +威爾士 3 +威爾斯 3 +娃娃 3 +娶 3 +嫁給 3 +嫌疑 3 +嬌嬌 3 +子女 3 +孟席斯 3 +孫子 3 +學府 3 +宇 3 +安 3 +安德烈 3 +安徽 3 +安置 3 +宋朝 
3 +完備 3 +完善 3 +完工 3 +完美 3 +宏 3 +宗 3 +官僚 3 +定 3 +宣傳 3 +宣告 3 +室內 3 +宰相 3 +家寶 3 +家裡 3 +家鄉 3 +富有 3 +富江 3 +寒冷 3 +實在 3 +實施 3 +封閉 3 +射入 3 +射擊 3 +專用 3 +尊 3 +對方 3 +對比 3 +導航 3 +小吃 3 +小堂 3 +小孩 3 +小平 3 +少 3 +尖 3 +就算 3 +尼山 3 +局面 3 +屈 3 +屋邨 3 +屠 3 +屯 3 +州長 3 +已婚 3 +已知 3 +巴哈伊 3 +巴斯 3 +布庫 3 +布袋 3 +師傅 3 +帶到 3 +帶走 3 +帶頭 3 +帽 3 +帽子 3 +平台 3 +平方呎 3 +平方英尺 3 +平民 3 +平衡 3 +年間 3 +幸福 3 +幹線 3 +幼 3 +幾何 3 +序列 3 +度過 3 +康 3 +庾 3 +延續 3 +延長 3 +建業 3 +弓毛 3 +引入 3 +引力 3 +引用 3 +弟子 3 +弱小 3 +強制 3 +強壯 3 +彈 3 +彈簧 3 +彭 3 +彰化 3 +影視 3 +往來 3 +往後 3 +征服 3 +待 3 +待遇 3 +很好 3 +很高 3 +後人 3 +後方 3 +後衛 3 +後面 3 +徑 3 +徒步 3 +復 3 +復工 3 +復辟 3 +徵收 3 +德克薩斯 3 +德川 3 +德意志 3 +德綱 3 +徹底 3 +心情 3 +必然 3 +必要 3 +忽略 3 +思潮 3 +怡和 3 +急速 3 +性格 3 +怪物 3 +怪獸 3 +恩來 3 +悠久 3 +情書 3 +想到 3 +想法 3 +愛上 3 +愛國 3 +愛達荷 3 +感應 3 +慢慢 3 +憑藉 3 +憤怒 3 +懷疑 3 +懸崖 3 +成份 3 +成千上萬 3 +成果 3 +戒毒 3 +截止 3 +戰俘 3 +戰時 3 +戰艦 3 +戲劇 3 +戶 3 +房地產 3 +房子 3 +手下 3 +扎維耶 3 +扭曲 3 +扶手 3 +承受 3 +承擔 3 +投手 3 +抗戰 3 +抵擋 3 +拆除 3 +拉丁 3 +拯救 3 +持 3 +指令 3 +挑戰 3 +挺 3 +捐助 3 +捕捉 3 +捷克 3 +授 3 +排出 3 +探討 3 +接任 3 +接唱 3 +控 3 +提議 3 +換乘 3 +損失 3 +損害 3 +搬到 3 +撞擊 3 +播映 3 +撰寫 3 +擔當 3 +據說 3 +擴充 3 +支付 3 +支撐 3 +支流 3 +收回 3 +收拾 3 +收購 3 +改制 3 +改善 3 +改稱 3 +改進 3 +攻打 3 +放射 3 +故障 3 +救 3 +敘述 3 +教養 3 +文人 3 +文件 3 +料理 3 +斯里蘭卡 3 +新增 3 +新建 3 +新教 3 +新村 3 +新羅 3 +旁遮普 3 +族群 3 +日內瓦 3 +日後 3 +日間 3 +旨 3 +明星 3 +明珠 3 +明納努 3 +昏迷 3 +易 3 +星光 3 +星際 3 +星雲 3 +映射 3 +昭和 3 +是否 3 +時機 3 +時空 3 +晚 3 +晚年 3 +晨興 3 +普選 3 +景德 3 +景點 3 +晶體 3 +暗 3 +暨 3 +暫時 3 +暴動 3 +更為 3 +書店 3 +曼聯 3 +替 3 +替代 3 +替換 3 +最佳 3 +最為 3 +會堂 3 +月氏 3 +月球 3 +有利 3 +有機 3 +有權 3 +有趣 3 +服役 3 +服用 3 +朝日 3 +期望 3 +木板 3 +本來 3 +村民 3 +杜蘭戈 3 +杰 3 +東側 3 +東港 3 +東面 3 +板塊 3 +柏立基 3 +某個 3 +栃木 3 +校名 3 +核電 3 +栽培 3 +栽種 3 +桃 3 +桃園 3 +桃浦 3 +梅 3 +梅妃 3 +梅莉迪絲 3 +條例 3 +極度 3 +極端 3 +概念 3 +概率 3 +榮 3 +榮聲 3 +槍 3 +槍手 3 +樂曲 3 +樂章 3 +樊 3 +模擬 3 +機率 3 +檢察 3 +檸檬 3 +權力 3 +權勢 3 +權益 3 +次子 3 +次日 3 +歌劇 3 +正義 3 +正選 3 +步槍 3 +步道 3 +死傷 3 +死去 3 +毀滅 3 +比起 3 +民進 3 +氣壓 3 +氣泡 3 +氧化 3 +氧氣 3 +水上 3 +水域 3 +水塔 3 +水族 3 +水溝 3 +水稻 3 +永遠 3 +求救 3 +江南 3 +江孜 3 +污水 3 +決議 3 +沒收 3 +沖 3 +沙烏地阿拉伯 3 +油脂 3 +沼澤 3 +沿著 3 +法人 3 +法學 3 +法官 3 +波動 3 +波士頓 3 +波長 3 +泰國 3 +洋房 
3 +洋行 3 +洗浴 3 +活佛 3 +活力 3 +流動 3 +流感 3 +流量 3 +海上 3 +海珊 3 +消滅 3 +淋巴 3 +淘汰 3 +淡水 3 +清代 3 +渝 3 +港口 3 +湖水 3 +湯瑪斯 3 +準則 3 +溥儀 3 +溫帶 3 +溶解 3 +滉 3 +滑冰 3 +漂亮 3 +漢城 3 +漳州 3 +潛入 3 +火箭 3 +災難 3 +為期 3 +無數 3 +煙草 3 +照相 3 +煩惱 3 +熱庫 3 +熱能 3 +爬行 3 +爭 3 +爲 3 +牆壁 3 +牛 3 +牛津 3 +物資 3 +特化 3 +狹窄 3 +獎學 3 +獎項 3 +獲利 3 +獲取 3 +獵食 3 +獻給 3 +率領 3 +王室 3 +珠海 3 +班納蒂克 3 +現任 3 +球場 3 +理解 3 +瑞草 3 +環 3 +生前 3 +生成 3 +生殖 3 +產下 3 +用品 3 +用戶 3 +用法 3 +用途 3 +田 3 +男女 3 +男孩 3 +畢 3 +畫作 3 +異常 3 +當今 3 +當作 3 +當初 3 +疑問 3 +病故 3 +瘋狂 3 +登上 3 +登基 3 +登場 3 +登陸 3 +發揮 3 +發源 3 +白人 3 +百科 3 +皇 3 +皇室 3 +盟友 3 +盟旗 3 +監管 3 +目錄 3 +直系 3 +直線 3 +直選 3 +相反 3 +相機 3 +相遇 3 +真人 3 +真武 3 +眼睛 3 +睡蓮 3 +瞭解 3 +知情 3 +短尾貓 3 +短短 3 +破曉 3 +破產 3 +碎片 3 +碳 3 +碳化 3 +確保 3 +確立 3 +社群 3 +祖父 3 +神父 3 +票價 3 +福 3 +福島 3 +福斯 3 +禮 3 +禮儀 3 +禮拜 3 +秀 3 +科系 3 +科隆 3 +租借 3 +租賃 3 +種種 3 +積極 3 +窯瓷 3 +立陶宛 3 +章 3 +童話 3 +競選 3 +竹子 3 +第八 3 +第十 3 +第十三 3 +筆 3 +算 3 +管道 3 +箱 3 +節慶 3 +節省 3 +籍 3 +精通 3 +精選 3 +糧食 3 +紅磡 3 +紅色 3 +紛爭 3 +素貞 3 +紡織 3 +索馬利亞 3 +細小 3 +終點 3 +組建 3 +組裝 3 +結成 3 +維京 3 +維吉爾 3 +維基 3 +維多利亞 3 +編劇 3 +總共 3 +總數 3 +總結 3 +總體 3 +繪製 3 +繼位 3 +纖維 3 +缺席 3 +缺點 3 +置富 3 +羊肉 3 +美利堅 3 +翁 3 +老鼠 3 +考 3 +考古 3 +考驗 3 +而非 3 +耶穌 3 +聖誕 3 +聖靈 3 +聘請 3 +聚會 3 +聯手 3 +聯軍 3 +聲勢 3 +職位 3 +股價 3 +股票 3 +胡安 3 +膜 3 +自傳 3 +自我 3 +自稱 3 +自身 3 +自願 3 +至少 3 +致力 3 +致命 3 +臺南 3 +臼齒 3 +舞蹈 3 +航海 3 +航程 3 +船員 3 +艾女 3 +艾滋 3 +芭比 3 +英九 3 +范 3 +茶葉 3 +草食 3 +莽 3 +華格納 3 +菲利普斯 3 +萊姆 3 +萊茵 3 +著稱 3 +蒙大拿 3 +蒙特內哥羅 3 +蒸汽 3 +蓬勃 3 +薩魯曼 3 +藍 3 +藍色 3 +藤 3 +藥 3 +藩 3 +蘇州 3 +虎鯨 3 +蜀 3 +衍生 3 +衙門 3 +衛 3 +衛視 3 +衝擊 3 +衣服 3 +表 3 +裁判 3 +裏 3 +補給 3 +裝 3 +裝飾 3 +複合 3 +複製 3 +西安 3 +西洋 3 +西湖 3 +西納 3 +西門 3 +西關 3 +西面 3 +見到 3 +規格 3 +視頻 3 +親自 3 +計畫 3 +記憶 3 +評估 3 +評審 3 +該寺 3 +該書 3 +該校 3 +該站 3 +該鎮 3 +誠實 3 +誤認 3 +說明 3 +課室 3 +諷刺 3 +諸多 3 +謀殺 3 +謂 3 +謝 3 +證實 3 +識別 3 +護照 3 +譽 3 +讀書 3 +變形 3 +變數 3 +變體 3 +讚賞 3 +象 3 +貝爾 3 +貴妃 3 +貴州 3 +買 3 +買家 3 +費德勒 3 +費雪 3 +資助 3 +賓夕法尼亞 3 +賢 3 +賦予 3 +走廊 3 +起訴 3 +越 3 +足協 3 +距 3 +路徑 3 +跳 3 +踢 3 +身邊 3 +車體 3 +較大 3 +較長 3 +輔助 3 +輔導 3 +輔政 3 +輸出 3 +轄 3 +轄區 3 +轉乘 3 +轉到 3 +轉投 3 +轉讓 3 +農 3 +近期 3 +迫 3 +迫使 3 +追逐 3 +退休 3 +逃離 3 +逐步 3 +通信 3 +通用 3 +通行 3 
+速食 3 +逢 3 +連環 3 +連線 3 +逮捕 3 +週 3 +逾 3 +遇見 3 +遊樂 3 +運營 3 +過渡 3 +道光 3 +達也 3 +違法 3 +遠航 3 +適應 3 +遷徙 3 +選 3 +選手 3 +選秀 3 +遺體 3 +邊界 3 +那裏 3 +邦聯 3 +都市 3 +鄉議 3 +配 3 +配樂 3 +配置 3 +酗酒 3 +酸 3 +釀酒 3 +釋放 3 +重傷 3 +重整 3 +金庫 3 +金庸 3 +金鐘 3 +銅 3 +鋼 3 +錄 3 +錄音 3 +鍵 3 +鎳 3 +鐵人 3 +長安 3 +長州 3 +長相 3 +長遠 3 +開播 3 +關節 3 +阿兒法 3 +阿拉斯加 3 +阿根廷 3 +阿爾卑斯 3 +附屬 3 +降 3 +降落 3 +陣營 3 +除外 3 +陵 3 +陸地 3 +陸續 3 +際 3 +隱藏 3 +隱語 3 +雅典 3 +雌雄 3 +雙立 3 +雜技 3 +難度 3 +雪梨 3 +雪莉 3 +雲 3 +零 3 +零售 3 +雷睦斯 3 +靈 3 +青島 3 +鞏固 3 +音頻 3 +頂層 3 +順利 3 +預先 3 +頭銜 3 +頻譜 3 +題寫 3 +額 3 +風 3 +食 3 +飲食 3 +飾演 3 +首任 3 +首演 3 +馬來 3 +馬來亞 3 +馬德里 3 +驅動 3 +驅逐 3 +體操 3 +高低槓 3 +高原 3 +高層 3 +高山 3 +高麗 3 +鳳山 3 +麥田 3 +黃金 3 +黑子 3 +黑斑 3 +黑洞 3 +黛比 3 +黨員 3 +龍 3 +龍馬 3 +$ 2 +' 2 +... 2 +...... 2 +9.9999999 2 +99%-99% 2 +99.9億 2 +999.9 2 +999.99999 2 +999.9999999999999 2 +999.99億 2 +9999.9 2 +9999/99 2 +9999多 2 +9999餘 2 +99:99 2 +99A 2 +99° 2 +9:99 2 +9D 2 +9億9千9百萬 2 +9百萬 2 +9萬 2 +AAC 2 +ABC 2 +AI 2 +Aldridge 2 +Arts 2 +BBC 2 +Before 2 +Boy 2 +C 2 +DC-99 2 +DJ 2 +DNA 2 +E9 2 +E99 2 +Europipe 2 +Eve 2 +F-99A 2 +FC 2 +Finn 2 +GCMG 2 +Gravion 2 +Hall 2 +II 2 +Jason 2 +Jean 2 +K 2 +Karin 2 +L 2 +La 2 +Lee 2 +Live 2 +M9999 2 +N999 2 +NASA 2 +NDS 2 +NET 2 +Nicea 2 +OROCHI 2 +PVC 2 +Phillips 2 +Rivers 2 +Robert 2 +S 2 +Station 2 +Strait 2 +TVB 2 +U99 2 +UA 2 +V 2 +Winston 2 +Wyclef 2 +XII 2 +and 2 +de 2 +iPod 2 +km/h 2 +silver 2 +the 2 +‧ 2 +〈 2 +〉 2 +一中 2 +一共 2 +一千 2 +一向 2 +一度 2 +一手 2 +一提 2 +一貫 2 +一面 2 +七喜 2 +三棟屋 2 +三氯化金 2 +三藏 2 +上下車 2 +上任 2 +上佳 2 +上午 2 +上吊 2 +上將 2 +上層 2 +上方 2 +上校 2 +上街 2 +下去 2 +下層 2 +下屬 2 +下旬 2 +下水 2 +下海 2 +下游 2 +下野 2 +不一 2 +不停 2 +不再 2 +不列顛 2 +不受 2 +不夠 2 +不如 2 +不宜 2 +不已 2 +不幸 2 +不法 2 +不清 2 +不用 2 +不管 2 +不良 2 +不論 2 +不變 2 +不錯 2 +不需 2 +不願 2 +丐幫 2 +世俗 2 +世博 2 +世卿 2 +世襲 2 +世錦 2 +丘陵 2 +中區 2 +中午 2 +中天 2 +中巴 2 +中正 2 +中途 2 +中道 2 +中遠 2 +主上 2 +主力 2 +主因 2 +主場 2 +主權 2 +主管 2 +主線 2 +乘船 2 +乘車 2 +乙級 2 +九一八 2 +九州 2 +九巴 2 +也有 2 +也許 2 +乳酪 2 +事後 2 +二十六 2 +二甘醇 2 +互動 2 +五四 2 +五峰 2 +五百 2 +井 2 +亞冠 2 +亞利桑那 2 +亞得里亞 2 +交 2 +交互 2 +交到 2 +交匯 2 +交好 2 +交情 2 +交戰 2 +交手 2 +交趾 2 +人力 2 +人心 2 
+人才 2 +人文 2 +人格 2 +人熙 2 +人群 2 +人間 2 +人魚 2 +仁慈 2 +仁記 2 +今年 2 +介乎 2 +介石 2 +仍舊 2 +付款 2 +仙 2 +仙劍 2 +仙女 2 +以南 2 +以東 2 +以至 2 +任城 2 +任天堂 2 +任意 2 +份額 2 +仿 2 +伊比利亞 2 +伏威 2 +休閒 2 +伯公 2 +伯爵 2 +伯靈頓 2 +伴隨 2 +似乎 2 +低地 2 +低廉 2 +低溫 2 +住房 2 +佐土原 2 +佐藤 2 +佛山 2 +佛朗明哥 2 +佛殿 2 +作好 2 +作業 2 +作物 2 +佩劍 2 +併入 2 +使命 2 +使者 2 +使館 2 +來訪 2 +例 2 +例外 2 +供暖 2 +供熱 2 +供職 2 +侵 2 +侵入 2 +侵犯 2 +便宜 2 +促使 2 +促成 2 +俗稱 2 +保有 2 +保級 2 +保羅 2 +保衛 2 +信心 2 +信義 2 +信長 2 +信雄 2 +修士 2 +修理 2 +修習 2 +修訂 2 +修鍊 2 +個案 2 +倒台 2 +倒掛 2 +候鳥 2 +倡導 2 +倫 2 +假如 2 +假期 2 +假髮 2 +偏差 2 +停戰 2 +停滯 2 +偶然 2 +偶爾 2 +偽造 2 +傑作 2 +備 2 +催化 2 +傳入 2 +傳到 2 +傳動 2 +傳媒 2 +傳導 2 +傳授 2 +傳聞 2 +傳言 2 +傳送 2 +傳達 2 +債務 2 +傷 2 +傾聽 2 +僅僅 2 +僱員 2 +儀錶 2 +儒家 2 +優先 2 +儲備 2 +元代 2 +元件 2 +元帥 2 +元年 2 +元洲 2 +元璋 2 +元甲 2 +元首 2 +充斥 2 +充當 2 +兆帕 2 +先知 2 +先行 2 +先驅 2 +光線 2 +光譜 2 +光軸 2 +克基拉 2 +克用 2 +克隆 2 +免職 2 +入伍 2 +入圍 2 +入學 2 +入獄 2 +入讀 2 +入門 2 +內務 2 +內外 2 +內心 2 +內流 2 +全新 2 +全日 2 +全校 2 +全權 2 +全能 2 +全身 2 +八一 2 +八百餘 2 +公學 2 +公寓 2 +公署 2 +公認 2 +兵營 2 +其父 2 +具備 2 +典禮 2 +再造 2 +冬季 2 +冰兄 2 +冰峰 2 +冰川 2 +冰雪 2 +凡 2 +凱特 2 +凱瑞 2 +出入 2 +出入口 2 +出家 2 +出席 2 +出演 2 +出產 2 +出賽 2 +出道 2 +分享 2 +分化 2 +分區 2 +分手 2 +分擔 2 +分歧 2 +分隊 2 +刊載 2 +列傳 2 +列出 2 +初學 2 +初年 2 +初稿 2 +初級 2 +初賽 2 +判斷 2 +判處 2 +別 2 +別列佐夫斯基 2 +利比亞 2 +利物浦 2 +利特維年科 2 +到來 2 +到底 2 +制止 2 +制裁 2 +制訂 2 +刺 2 +刺客 2 +刺死 2 +刻 2 +刻有 2 +削弱 2 +前任 2 +前來 2 +前妻 2 +前途 2 +剝奪 2 +剩下 2 +副本 2 +創 2 +創始 2 +創新 2 +創業 2 +劃入 2 +劃給 2 +劇烈 2 +劍術 2 +劍齒虎 2 +功 2 +功率 2 +加之 2 +加勒比 2 +加堆 2 +加重 2 +劣勢 2 +助戰 2 +勒格里 2 +勒沃 2 +動機 2 +動脈 2 +動車 2 +勝出 2 +勳 2 +勳章 2 +勳銜 2 +勾引 2 +包圍 2 +包廂 2 +包衣 2 +匕首 2 +化纖 2 +化身 2 +北宋 2 +北平 2 +北方 2 +北端 2 +北約 2 +北道 2 +北齊 2 +匯率 2 +區分 2 +十一世 2 +十三 2 +十六 2 +升學 2 +半山 2 +半球 2 +協商 2 +協奏 2 +協約 2 +南下 2 +南山 2 +南斯拉夫 2 +南遣 2 +南邊 2 +南陽 2 +南非 2 +南面 2 +南韓 2 +博弈 2 +博彩 2 +博恩 2 +占卜 2 +卡梅隆 2 +卡片 2 +印 2 +印加 2 +印尼 2 +印製 2 +即位 2 +即時 2 +即興 2 +卷 2 +卿 2 +卿雲 2 +厄運 2 +原名 2 +原址 2 +原聲 2 +去除 2 +參觀 2 +參選 2 +又是 2 +又稱 2 +及格 2 +友好 2 +反叛 2 +反抗 2 +反擊 2 +叔叔 2 +取決 2 +受審 2 +受益 2 +受體 2 +口中 2 +口述 2 +古巴 2 +古柯鹼 2 +古蹟 2 +召喚 2 +可汗 2 +史學 2 +史密斯 2 +史提夫 2 +史蒂芬 2 +右岸 2 +司機 2 +司長 2 +司鼓 2 +吃肉 2 +吃飯 2 +各式 2 +各式各樣 
2 +各級 2 +各自 2 +各部 2 +合同 2 +合川 2 +合稱 2 +合葬 2 +吉布斯 2 +吉林 2 +吉里巴斯 2 +同人 2 +同居 2 +同體 2 +名人 2 +名利 2 +名古屋 2 +名縉 2 +名鎮 2 +向量 2 +君 2 +君王 2 +吞併 2 +否則 2 +否定 2 +告別 2 +告知 2 +告終 2 +周歲 2 +味 2 +呼叫 2 +呼籲 2 +和解 2 +和談 2 +咬金 2 +品行 2 +哈里發 2 +哥斯大黎加 2 +哥本哈根 2 +哥特 2 +哪裡 2 +售賣 2 +唯有 2 +唯美 2 +問 2 +啟動 2 +啟睿 2 +啟航 2 +啟蒙 2 +善化 2 +善意 2 +喉嚨 2 +喜劇 2 +喝 2 +喪生 2 +喬伊斯 2 +喬艾爾 2 +單元 2 +單曲 2 +喻 2 +嘉慶 2 +嘉玲 2 +器物 2 +噪音 2 +噴氣 2 +嚴密 2 +囚禁 2 +四分之一 2 +四十 2 +回國 2 +回想 2 +回憶 2 +回收 2 +國代 2 +國外 2 +國寶 2 +國徽 2 +國璋 2 +國語 2 +國鋒 2 +圍攻 2 +園藝 2 +圓頂 2 +圖樣 2 +圖畫 2 +團結 2 +團聚 2 +團長 2 +在位 2 +在來 2 +地外 2 +地帶 2 +坐診 2 +型態 2 +埃 2 +埃米莉 2 +城中 2 +城子 2 +域名 2 +執導 2 +執掌 2 +執教 2 +執法 2 +基底 2 +堂區 2 +堅 2 +堅固 2 +堅強 2 +報紙 2 +場場 2 +塞普勒斯 2 +境外 2 +墓地 2 +墓室 2 +增多 2 +增建 2 +增強 2 +增設 2 +墮胎 2 +壓倒 2 +壓強 2 +壓迫 2 +士 2 +壯大 2 +壯年 2 +夏伊 2 +夏季 2 +外傳 2 +外圍 2 +外在 2 +外援 2 +外觀 2 +外資 2 +多倫多 2 +多半 2 +多少 2 +夜晚 2 +夠 2 +夥伴 2 +大亂 2 +大佛 2 +大公 2 +大力 2 +大勝 2 +大半 2 +大堂 2 +大妃 2 +大將 2 +大屋 2 +大廳 2 +大批 2 +大敗 2 +大槍 2 +大火 2 +大碟 2 +大笨 2 +大街 2 +大衛 2 +大連 2 +大阪 2 +天地 2 +天子 2 +天師 2 +天敵 2 +天氣 2 +天衣 2 +天雷 2 +太古 2 +太多 2 +太大 2 +太子 2 +太守 2 +太祖 2 +夸脫 2 +奉天 2 +契合 2 +奢侈 2 +奧布賴恩 2 +奧斯曼 2 +奧朗則布 2 +奧林匹克 2 +奪冠 2 +女士 2 +女孩 2 +女皇 2 +妖精 2 +妖魔 2 +妥善 2 +姊妹 2 +始皇 2 +姐妹 2 +姐弟 2 +姑家 2 +姓 2 +姓名 2 +姚 2 +姜 2 +姿態 2 +威斯康辛 2 +威爾遜 2 +娘舅 2 +婆婆 2 +嫉妒 2 +子夜 2 +子珍 2 +孔子 2 +字元 2 +字型 2 +字體 2 +存有 2 +存活 2 +孟能 2 +季前 2 +季軍 2 +孤僻 2 +孤獨 2 +孵化 2 +學制 2 +學問 2 +學士 2 +學年 2 +學期 2 +學童 2 +學系 2 +學費 2 +宇一郎 2 +守衛 2 +安修 2 +安息 2 +安打 2 +安東尼 2 +安菲特裡忒 2 +安邑 2 +完 2 +完整 2 +宏觀 2 +宗室 2 +官職 2 +定下 2 +定名 2 +定型 2 +定期 2 +客 2 +客串 2 +客人 2 +客室 2 +客機 2 +客車 2 +宣戰 2 +害怕 2 +家久 2 +家境 2 +家屬 2 +家產 2 +家衛 2 +家貓 2 +寄宿 2 +密切 2 +密蘇里 2 +富 2 +富人 2 +富特 2 +實力 2 +實務 2 +實用 2 +實習 2 +審 2 +審判 2 +審查 2 +寫道 2 +寬廣 2 +寬頻 2 +寬鬆 2 +寶石 2 +寺廟 2 +寺院 2 +封神 2 +封面 2 +射殺 2 +專區 2 +專員 2 +專有 2 +專題 2 +尉 2 +尊嚴 2 +尊重 2 +尋常 2 +對峙 2 +對待 2 +對陣 2 +小兒 2 +小姐 2 +小心 2 +小桃 2 +小梅 2 +小鎮 2 +小閻 2 +小青 2 +就任 2 +就業 2 +尺 2 +尼克森 2 +尼羅 2 +尼西亞 2 +尼采 2 +尾部 2 +局限 2 +居 2 +居委 2 +居里 2 +屋大維 2 +屋苑 2 +展 2 +展館 2 +履仁 2 +屬名 2 +山丘 2 +山坡 2 +山海 2 +岩石 2 +岳母 2 +岳父 2 +崇拜 2 +崔西 2 +崖 2 +嵌 2 +嶺南 2 +嶽麓 2 +川 2 +工兵 2 +工商 2 +工農 
2 +工黨 2 +左上 2 +左側 2 +巧眉 2 +巧言 2 +差距 2 +差點 2 +巴克特里亞 2 +巴勒斯坦 2 +巴哈歐拉 2 +巴格曼 2 +巴格達 2 +巴洛克 2 +巴爾幹 2 +巴納德 2 +巷 2 +市值 2 +市內 2 +市商 2 +市郊 2 +市長 2 +布卡 2 +布拉格 2 +布朗 2 +布爾薩 2 +布魯明頓 2 +希羅 2 +帕洛馬 2 +帛琉 2 +帶去 2 +帶有 2 +常務 2 +常年 2 +常德 2 +常春藤葉 2 +常規 2 +幫忙 2 +干擾 2 +干涉 2 +干預 2 +平 2 +平安 2 +平息 2 +平成 2 +平方尺 2 +平時 2 +平頂 2 +年初 2 +年紀 2 +年譜 2 +幼體 2 +床墊 2 +序數 2 +底層 2 +店鋪 2 +度母 2 +座堂 2 +庫夫 2 +庫容 2 +庭 2 +康乃爾 2 +康復 2 +廉租 2 +廠房 2 +廢 2 +廢墟 2 +廢止 2 +廣安 2 +廣義 2 +延任 2 +延遲 2 +建有 2 +建銘 2 +引種 2 +引退 2 +引進 2 +弗朗索瓦 2 +強風 2 +彈奏 2 +彈性 2 +彌迦 2 +彙集 2 +彩色 2 +影展 2 +往返 2 +征 2 +征戰 2 +很近 2 +後端 2 +後裔 2 +徒刑 2 +得票 2 +得道 2 +從小 2 +從而 2 +從軍 2 +御苑 2 +微山 2 +德州 2 +德瑞克 2 +德輔 2 +心中 2 +必 2 +志剛 2 +快樂 2 +忽必烈 2 +思念 2 +思明 2 +思科 2 +性交 2 +恆鳳 2 +恐慌 2 +恥辱 2 +恩 2 +恩寵 2 +恩賜 2 +悅強 2 +悲觀 2 +情意 2 +惠 2 +惠山 2 +愈 2 +愉景 2 +意志 2 +意願 2 +愛因斯坦 2 +愛德華 2 +愛惜 2 +感動 2 +感受 2 +感染 2 +慈幼 2 +慈鯛 2 +態 2 +慘敗 2 +慣例 2 +慶尚 2 +慶豐 2 +慾望 2 +憎恨 2 +憑 2 +應對 2 +懊惱 2 +懷俄明 2 +懷舊 2 +懸浮 2 +懸索 2 +成仙 2 +成傑 2 +成因 2 +成型 2 +成就 2 +成群 2 +成貓 2 +戰亂 2 +戰列 2 +戰場 2 +戰士 2 +戰線 2 +戰術 2 +戴 2 +戴麟趾 2 +房 2 +手冊 2 +手動 2 +手機 2 +手裡 2 +才能 2 +打工 2 +打敗 2 +打破 2 +打開 2 +托爾斯 2 +托爾斯泰 2 +扶植 2 +找出 2 +找回 2 +找尋 2 +承諾 2 +抄襲 2 +抓 2 +抓住 2 +投影 2 +投降 2 +抗擊 2 +抽取 2 +拆穿 2 +拆解 2 +拉斐爾 2 +拉格 2 +拔出 2 +拖 2 +拖延 2 +招商 2 +招股 2 +拷貝 2 +拼音 2 +拿 2 +拿到 2 +拿破崙 2 +拿走 2 +指引 2 +指控 2 +指涉 2 +按鍵 2 +挖角 2 +挽救 2 +捐贈 2 +捕 2 +捕獲 2 +捕食 2 +捷運 2 +掉 2 +排 2 +排放 2 +排氣 2 +排演 2 +掛架 2 +掠過 2 +掠食 2 +採訪 2 +接待 2 +接掌 2 +接種 2 +接駁 2 +控球 2 +推廣 2 +推翻 2 +推選 2 +描寫 2 +提倡 2 +提及 2 +提示 2 +插圖 2 +揚聲 2 +換入 2 +換股 2 +損傷 2 +損毀 2 +搞笑 2 +搭檔 2 +搶險 2 +摩根 2 +摩爾 2 +撤出 2 +撤軍 2 +播 2 +播客 2 +擅長 2 +擊 2 +擊退 2 +擒抱 2 +擔負 2 +據守 2 +擺脫 2 +擾動 2 +支出 2 +支柱 2 +收 2 +收復 2 +收發 2 +收穫 2 +收視 2 +收集 2 +改回 2 +改寫 2 +改建 2 +改版 2 +改良 2 +改裝 2 +攻佔 2 +攻陷 2 +放映 2 +放置 2 +政協 2 +政變 2 +政黨 2 +故意 2 +故此 2 +故鄉 2 +效忠 2 +效率 2 +敏 2 +敏感 2 +敗給 2 +教友 2 +教員 2 +教徒 2 +教派 2 +教科文 2 +整修 2 +整套 2 +整體 2 +敵對 2 +數位 2 +數十 2 +數理 2 +數目 2 +文元 2 +文官 2 +文帝 2 +文康 2 +文英 2 +文華 2 +文革 2 +斐濟 2 +斥資 2 +斯圖爾特 2 +斯大林 2 +斯氏星蟒 2 +斯洛維尼亞 2 +斯特勒謝尼 2 +斯理 2 +新宿 2 +新岩 2 +新曲 2 +新澤西 2 +新田 2 +新興 2 +斷裂 2 +方位 2 +方針 2 +施行 2 +旁 2 +旁邊 2 +旅鴿 2 +旋律 2 +日喀則 2 +日本龍 
2 +日益 2 +日航 2 +日行 2 +早上 2 +旺山 2 +旺盛 2 +昆士蘭 2 +昌 2 +明帝 2 +易名 2 +昔日 2 +星形 2 +春天 2 +春日 2 +是否是 2 +時尚 2 +時速 2 +晉升 2 +晚上 2 +晚會 2 +普 2 +普及 2 +普陀 2 +景 2 +景帝 2 +景觀 2 +景象 2 +智慧 2 +暑假 2 +暗殺 2 +暢銷 2 +暫停 2 +暫緩 2 +暴露 2 +曝氣 2 +更好 2 +更改 2 +更深 2 +更高 2 +書信 2 +書寫 2 +書房 2 +書法 2 +最久 2 +最小 2 +最少 2 +最新 2 +最遊 2 +會員 2 +會場 2 +會社 2 +會談 2 +會長 2 +月刊 2 +有助 2 +有意 2 +有毒 2 +有罪 2 +服 2 +服從 2 +朔日 2 +朝代 2 +木星 2 +木管 2 +末年 2 +末期 2 +本區 2 +本屆 2 +本班 2 +本站 2 +本質 2 +本願 2 +朴 2 +村落 2 +村頭 2 +束縛 2 +杭 2 +東亞 2 +東吳 2 +東山 2 +東征 2 +東晉 2 +東正 2 +東視 2 +松潘 2 +松鼠猴 2 +林庄 2 +果實 2 +果汁 2 +架設 2 +柏油 2 +染色 2 +柔佛 2 +柔弱 2 +查 2 +查德 2 +柯 2 +柱 2 +柳 2 +柳江 2 +柴油 2 +柴灣 2 +校內 2 +校隊 2 +核能 2 +根本 2 +格式 2 +格林 2 +格林維爾 2 +格格 2 +格檔 2 +格里高利 2 +桃太洛斯 2 +桌面 2 +桑葚 2 +棕熊 2 +棣 2 +植被 2 +楊樹 2 +業者 2 +極大 2 +極性 2 +極高 2 +榮獲 2 +樁 2 +樂農 2 +標語 2 +標題 2 +樞機 2 +樟湖 2 +模具 2 +樣本 2 +樹木 2 +橙 2 +機動 2 +機員 2 +機槍 2 +橡膠 2 +橫山 2 +橫濱 2 +橫跨 2 +檢索 2 +檢討 2 +權威 2 +權貴 2 +次數 2 +次級 2 +次要 2 +次郎 2 +次長 2 +欺騙 2 +歌仔 2 +歌唱 2 +歌聲 2 +歌迷 2 +正是 2 +正直 2 +正統 2 +正面 2 +此人 2 +此案 2 +此物 2 +此種 2 +此線 2 +此舉 2 +此類 2 +步態 2 +武大 2 +武昌 2 +武松 2 +歧視 2 +歸類 2 +死靈 2 +殘存 2 +殘忍 2 +殘酷 2 +殯葬 2 +殺傷 2 +殺掉 2 +每位 2 +每周 2 +每層 2 +每日 2 +每次 2 +毒性 2 +毒殺 2 +毒藥 2 +毗鄰 2 +毛利 2 +民不聊生 2 +民調 2 +民都洛水牛 2 +氣 2 +氣田 2 +氧 2 +氫彈 2 +氯金酸 2 +水孔 2 +水手 2 +水準 2 +水溫 2 +水滸 2 +水質 2 +水道 2 +水餃 2 +永嘉 2 +永寧 2 +汗位 2 +汝霖 2 +江北 2 +江戶 2 +池尻 2 +決心 2 +決戰 2 +沃夫 2 +沖繩 2 +沙 2 +沙咀 2 +沙柏 2 +沙河 2 +沙龍 2 +河水 2 +泉州 2 +法蘭西 2 +法醫 2 +泡沫 2 +波塞摩斯 2 +注射 2 +注重 2 +泰坦 2 +洗 2 +洗手 2 +洛辛堡 2 +洞 2 +活 2 +活性 2 +派出 2 +派別 2 +派駐 2 +流傳 2 +流失 2 +流求 2 +流派 2 +流通 2 +浙東 2 +浩劫 2 +浮冰 2 +海南 2 +海涌 2 +海豹 2 +海邊 2 +海關 2 +消化 2 +消失 2 +涌 2 +涮 2 +淄博 2 +淮 2 +淮河 2 +深厚 2 +深得 2 +深愛 2 +深遠 2 +淹沒 2 +添加 2 +清晨 2 +清楚 2 +清華 2 +清鍾 2 +減輕 2 +游牧 2 +湖州 2 +湘 2 +湯興 2 +溝通 2 +溪流 2 +溫和 2 +溫州 2 +溫暖 2 +滄州 2 +滅口 2 +滙豐 2 +滬東 2 +滿足 2 +漁業 2 +漂流 2 +演說 2 +漢佛瑞 2 +漢口 2 +漸漸 2 +潔 2 +潭西 2 +潮州 2 +澤普 2 +澳底 2 +激光 2 +激戰 2 +激起 2 +濃縮 2 +濕原 2 +濕度 2 +濟寧 2 +濱松 2 +濱湖 2 +瀏覽 2 +灌木 2 +灘 2 +火藥 2 +灰狼 2 +灰色 2 +災害 2 +炮台 2 +為數 2 +烏孫 2 +無力 2 +無效 2 +無界 2 +無緣 2 +無辜 2 +無黨 2 +焦耳 2 +然 2 +煙熏 2 +照料 2 +照顧 2 +煮制 2 +熊 2 +熊隻 2 +熱比婭 2 +熱衷 2 +燃燒 2 +燒毀 2 +燒餅 2 +燕山 2 
+爪 2 +爪獸 2 +爭取 2 +爭執 2 +爭辯 2 +爭霸 2 +父子 2 +爾後 2 +牆體 2 +片段 2 +牙買加 2 +牛仔 2 +牛肉 2 +牧師 2 +牧養 2 +物價 2 +特使 2 +特性 2 +特權 2 +特種 2 +犬隻 2 +犬齒 2 +犯 2 +狀 2 +狐狸 2 +猛烈 2 +猛虎 2 +猶他 2 +猶豫 2 +獅 2 +獎章 2 +獎金 2 +獨居 2 +獨自 2 +獵奇 2 +獵殺 2 +玄 2 +玄機 2 +率軍 2 +玉帶 2 +玉門 2 +王位 2 +王妃 2 +玩 2 +玩具 2 +珀西 2 +珍 2 +珍品 2 +現址 2 +現狀 2 +球迷 2 +理念 2 +琪 2 +琴 2 +琴行 2 +瑪利亞 2 +瑪納斯 2 +瑪莉 2 +瑪麗亞 2 +環島 2 +環形 2 +環球 2 +環礁 2 +瓊璘 2 +瓜分 2 +瓦爾那 2 +甚 2 +甚少 2 +甚麼 2 +甜甜 2 +生日 2 +生母 2 +生病 2 +產值 2 +產區 2 +產物 2 +用地 2 +用電 2 +由來 2 +由衷 2 +甲 2 +甲板 2 +甲醇 2 +申花 2 +男爵 2 +町村 2 +留存 2 +留學 2 +留意 2 +留香 2 +畜 2 +番 2 +畫上 2 +畫報 2 +異性 2 +當事 2 +當代 2 +當前 2 +當場 2 +當成 2 +疫情 2 +病人 2 +病理 2 +痕迹 2 +登記 2 +登輝 2 +發售 2 +發回 2 +發掘 2 +發覺 2 +發音 2 +白紙 2 +白金漢 2 +白馬 2 +百度 2 +皇子 2 +皇宮 2 +皮 2 +皮埃蒙特 2 +盆子 2 +益世 2 +盟校 2 +監察 2 +監製 2 +監視 2 +直人 2 +直布羅陀 2 +直轄 2 +直通 2 +相戀 2 +相等 2 +相識 2 +相連 2 +省立 2 +看似 2 +看法 2 +真宗 2 +真情 2 +真的 2 +眼鏡 2 +眾人 2 +睡衣 2 +矚目 2 +矛盾 2 +知節 2 +短面熊 2 +矮人 2 +石化 2 +石原 2 +石家 2 +砍柴 2 +研製 2 +研討 2 +砲 2 +硅 2 +硫磺 2 +硬 2 +硬體 2 +碎石 2 +碘 2 +碧翠絲 2 +碩士 2 +確實 2 +磅 2 +磨損 2 +礦 2 +礦業 2 +示 2 +示威 2 +社交 2 +祂 2 +祕教 2 +神代 2 +票 2 +祺瑞 2 +福來 2 +福利 2 +福部 2 +福音 2 +禮節 2 +禽龍 2 +秀全 2 +秀吉 2 +秋天 2 +科幻 2 +科爾多瓦 2 +科羅拉多 2 +科赫 2 +科雷馬 2 +秘魯 2 +租客 2 +移除 2 +稀有 2 +稅 2 +程式 2 +種姓 2 +稱呼 2 +稱臣 2 +稱讚 2 +稻盛 2 +穆罕默德 2 +積分 2 +空缺 2 +穿耳 2 +突出 2 +突厥 2 +突擊 2 +窟 2 +立下 2 +立憲 2 +立熙 2 +站台 2 +竟然 2 +竣工 2 +童星 2 +競技 2 +競馬 2 +笑話 2 +笨 2 +第九 2 +第十一 2 +筆下 2 +等到 2 +等待 2 +策 2 +策劃 2 +管弦 2 +管治 2 +節奏 2 +簡 2 +簡易 2 +簽署 2 +籃壇 2 +籃子 2 +籌建 2 +籤 2 +米利特 2 +米格 2 +米爾扎 2 +精度 2 +精武 2 +精液 2 +精緻 2 +精美 2 +精采 2 +糖 2 +糖份 2 +紀 2 +約克 2 +約定俗成 2 +約會 2 +約瑟夫 2 +紅木 2 +紅麴 2 +紋 2 +紐卡斯爾 2 +紓緩 2 +純淨 2 +純粹 2 +紙 2 +紙幣 2 +紛紛 2 +素質 2 +索引 2 +細緻 2 +終 2 +組長 2 +結 2 +結晶 2 +結識 2 +絕望 2 +統稱 2 +絲綢 2 +經紀 2 +經費 2 +綠 2 +維 2 +維修 2 +維吉尼亞 2 +維鈞 2 +網上 2 +網友 2 +網民 2 +緊鄰 2 +線粒 2 +線西 2 +編入 2 +編寫 2 +編製 2 +緩存 2 +緩慢 2 +緬因 2 +縣城 2 +縣治 2 +縣長 2 +縱橫 2 +縱貫 2 +總值 2 +總會 2 +總監 2 +總管 2 +總額 2 +繁忙 2 +繁榮 2 +繁殖 2 +繚 2 +繞城 2 +繪圖 2 +續篇 2 +續約 2 +罪 2 +罪案 2 +罪行 2 +署名 2 +署長 2 +罷黜 2 +罹患 2 +羅丹 2 +羅塞塔 2 +羅斯 2 +羅斯基勒 2 +羅斯提 2 +羅漢 2 +羅素 2 +羅貝爾 2 +羊 2 +羊曲 2 +羊毛 2 +羌 2 +美女 2 +義 2 +義和 2 +習 2 +習性 2 +翦 2 
+翻新 2 +翻越 2 +翼 2 +老年 2 +老式 2 +老舍 2 +考場 2 +考證 2 +耕作 2 +耕種 2 +耳道 2 +耶和華 2 +耶律 2 +耶魯 2 +聖三 2 +聖地 2 +聘任 2 +聚 2 +聚合 2 +聚居 2 +聯 2 +聯名 2 +聰明 2 +聲名 2 +聲望 2 +聲道 2 +聽 2 +肉糕 2 +肉食 2 +肖 2 +肖像 2 +肖金 2 +肝臟 2 +股 2 +股權 2 +肢 2 +肯尼迪 2 +育才 2 +育種 2 +肺炎 2 +胎兒 2 +胖子 2 +能源 2 +能級 2 +腎 2 +腓特烈 2 +腳 2 +腳趾 2 +腹面 2 +腺葉木犀欖 2 +膝蓋 2 +膠質 2 +臘汁 2 +臣民 2 +臨床 2 +臨淄 2 +臨近 2 +臨邑 2 +自主 2 +自助 2 +自家 2 +自殺 2 +自衛 2 +自轉 2 +臭氧 2 +至於 2 +致 2 +致死 2 +臺中 2 +臺北 2 +興化 2 +舉 2 +舉人 2 +舉動 2 +舊址 2 +舒服 2 +舒適 2 +舞台 2 +船尾 2 +船廠 2 +船艦 2 +船長 2 +艇 2 +艦艇 2 +艱苦 2 +色度 2 +色素 2 +花卉 2 +花崗 2 +花樣 2 +花費 2 +苗 2 +若干 2 +若是 2 +苦惱 2 +英俊 2 +英超 2 +英雄 2 +茨威格 2 +荷花 2 +荷蘭豬 2 +莆田 2 +莉拉 2 +莎拉 2 +莎莉 2 +莫名 2 +莫泊桑 2 +莫爾庫斯 2 +莫雷爾 2 +莫高 2 +菁英 2 +菌 2 +菩薩 2 +華夏 2 +華隆 2 +華麗 2 +菲 2 +萊特 2 +萬宜 2 +萬春 2 +萬萬 2 +落入 2 +落差 2 +葉子 2 +葉海亞 2 +葉片 2 +著想 2 +著迷 2 +葛馮 2 +葡萄牙 2 +葵盛 2 +蒂羅爾 2 +蒐集 2 +蒙 2 +蒙山 2 +蒙蔽 2 +蒸餾 2 +蓄電 2 +蓮屬 2 +蔬菜 2 +蔭權 2 +薩 2 +薩達姆 2 +薪資 2 +藉 2 +藉口 2 +藉著 2 +藍調 2 +藍鯨 2 +藏在 2 +藝員 2 +蘇丹 2 +蘇爾曼 2 +蘇維埃 2 +蘇黎世 2 +蘭 2 +虎豹 2 +虐待 2 +虔誠 2 +處境 2 +蛇 2 +蛇夫 2 +蛇類 2 +螺旋 2 +蠟燭 2 +蠻族 2 +血清 2 +血緣 2 +行李 2 +行程 2 +行車 2 +行駛 2 +術士 2 +街區 2 +衛冕 2 +衛戍 2 +表皮 2 +袋中 2 +裁定 2 +補充 2 +補助 2 +裝病 2 +製冷 2 +製片 2 +西元 2 +西區 2 +西沙 2 +西甲 2 +西站 2 +西鄰 2 +西鐵 2 +西門子 2 +西雅圖 2 +要素 2 +要職 2 +見義勇為 2 +見證 2 +規範 2 +視覺 2 +親密 2 +親屬 2 +親情 2 +親戚 2 +親緣 2 +親近 2 +觀世音 2 +觀塘 2 +觀賞 2 +角宿 2 +角逐 2 +解 2 +解鎖 2 +解體 2 +言論 2 +訂婚 2 +訂購 2 +計 2 +討伐 2 +記號 2 +許可 2 +訴說 2 +註冊 2 +評議 2 +評選 2 +詞彙 2 +詩篇 2 +詮釋 2 +話語 2 +該廟 2 +該車 2 +該館 2 +誕辰 2 +誘發 2 +語堂 2 +誤導 2 +說唱 2 +課 2 +課題 2 +調動 2 +調料 2 +調景 2 +論壇 2 +諸侯 2 +諸葛 2 +諾貝爾 2 +謎 2 +謙虛 2 +講 2 +謠言 2 +證件 2 +證券 2 +證據 2 +譜 2 +譜寫 2 +警報 2 +警官 2 +警方 2 +警長 2 +譯 2 +譯名 2 +譯法 2 +護士 2 +護法 2 +變得 2 +變換 2 +變更 2 +變異 2 +谷 2 +象棋 2 +豪華 2 +豫 2 +貓頭鷹 2 +貝爾尼納 2 +財務 2 +財困 2 +財團 2 +財物 2 +貨櫃 2 +販子 2 +貪污 2 +貴人 2 +買下 2 +買來 2 +賀氏 2 +資方 2 +資產 2 +賠償 2 +賢妃 2 +質子 2 +質疑 2 +質素 2 +購入 2 +購物 2 +賽馬 2 +赤川 2 +走出 2 +走路 2 +起飛 2 +趁機 2 +超越 2 +越低 2 +越獄 2 +越遠 2 +越高 2 +趕出 2 +趨 2 +趨同 2 +路上 2 +路口 2 +身 2 +身長 2 +躲過 2 +車中 2 +車廂 2 +車資 2 +車隊 2 +軍區 2 +軍校 2 +軍法 2 +載重 2 +輔音 2 +輕 2 +輕傷 2 +輕型 2 +輕視 2 +輟學 2 +轄境 2 +轄有 2 +轉介 2 +轉車 2 +轎車 2 +轟動一時 2 +辛亥 2 +辣妹 
2 +辦 2 +辭退 2 +辯護 2 +農地 2 +農場 2 +農曆 2 +農田 2 +農藥 2 +近藤 2 +近衛 2 +迫害 2 +迴避 2 +迷 2 +迷信 2 +迷幻 2 +追溯 2 +退化 2 +送 2 +送入 2 +逃 2 +逃出 2 +逃避 2 +透明 2 +逐鹿 2 +通報 2 +通婚 2 +通知 2 +通稱 2 +通航 2 +速寫 2 +速率 2 +造出 2 +造船 2 +連同 2 +連帶 2 +連鎖 2 +週年 2 +進修 2 +進出口 2 +進化 2 +進駐 2 +遇到 2 +遊仙 2 +遊客 2 +遊玩 2 +運河 2 +運用 2 +運轉 2 +過世 2 +過勞 2 +過年 2 +過於 2 +過海 2 +過關 2 +道場 2 +道生 2 +達爾 2 +違反 2 +遞歸 2 +遠東 2 +遭受 2 +遴選 2 +遵守 2 +遷 2 +遷往 2 +選中 2 +選拔 2 +選民 2 +選為 2 +遺囑 2 +遺產 2 +遺跡 2 +遼東 2 +還珠 2 +那些 2 +那樣 2 +邦初 2 +郊外 2 +部下 2 +部件 2 +部族 2 +郵件 2 +都柏林 2 +都統 2 +鄭國 2 +鄭氏 2 +鄰國 2 +配合 2 +配對 2 +酒吧 2 +酒泉 2 +酒醉 2 +醜聞 2 +醫藥 2 +釉下 2 +里昂 2 +里程 2 +重修 2 +重型 2 +重華 2 +重言 2 +重返 2 +重重 2 +重量 2 +野獸 2 +量表 2 +金山 2 +金星 2 +金牌 2 +金蓮 2 +金雞 2 +金馬 2 +鈞 2 +銀 2 +鋅 2 +鋼鐵 2 +錢 2 +錫金 2 +鍋 2 +鍵盤 2 +鐘錶 2 +鐵伊 2 +鐵達尼 2 +鑄造 2 +長久 2 +長城 2 +長女 2 +長春 2 +長老 2 +長者 2 +長興 2 +長蘆 2 +長軸 2 +長音 2 +門前 2 +門口 2 +門戶 2 +門齒 2 +開創 2 +開口 2 +開心 2 +開拍 2 +開採 2 +開會 2 +開火 2 +開羅 2 +開賽 2 +開通 2 +開門 2 +開除 2 +閏年 2 +間接 2 +間隙 2 +閘門 2 +閱讀 2 +闊 2 +關注 2 +關聯 2 +關說 2 +關鍵 2 +關門 2 +關閉 2 +防範 2 +防衛 2 +阻擋 2 +阻礙 2 +阿 2 +阿保機 2 +阿姆斯特丹 2 +阿格 2 +阿美 2 +附帶 2 +降級 2 +降解 2 +除籍 2 +陰霾 2 +陵墓 2 +陵寢 2 +陶瓷 2 +陷阱 2 +陽澄 2 +隆頭魚 2 +隊友 2 +隊長 2 +隋 2 +隋代 2 +隔 2 +隕石 2 +隨之 2 +集成 2 +集資 2 +集雨 2 +集體 2 +雍正 2 +雕像 2 +離任 2 +離婚 2 +離心 2 +雪貂 2 +雲想 2 +零星 2 +電動 2 +電壓 2 +電流 2 +電纜 2 +電能 2 +電路 2 +電鐵 2 +震動 2 +震盪 2 +震驚 2 +霍 2 +霍普 2 +霸主 2 +霸王 2 +靈素 2 +青聯 2 +青藏 2 +青銅 2 +靜電 2 +面臨 2 +面試 2 +面部 2 +音系 2 +音變 2 +韻律 2 +頂 2 +順序 2 +預備 2 +預定 2 +預言 2 +預計 2 +頒布 2 +頒發 2 +頓 2 +領地 2 +頭等 2 +頭部 2 +願望 2 +顧 2 +顯得 2 +顯聖 2 +顯著 2 +風俗 2 +風景 2 +風氣 2 +風濕 2 +風雲 2 +風靡 2 +食夢 2 +食材 2 +飢荒 2 +飯 2 +飲品 2 +飲用 2 +餘額 2 +館藏 2 +饑荒 2 +饒舌 2 +首位 2 +首播 2 +首爾 2 +首腦 2 +首部 2 +香蕉 2 +馬丁 2 +馬克思 2 +馬克斯 2 +馬其頓 2 +馬拉松 2 +馬歇爾 2 +馬耳他 2 +馬里奧 2 +駐紮 2 +駐足 2 +駕駛 2 +骨 2 +骨頭 2 +骨髓 2 +體制 2 +體力 2 +體型 2 +體校 2 +體重 2 +體驗 2 +高低 2 +高壓 2 +高平 2 +高校 2 +高止 2 +高能 2 +高興 2 +高郵 2 +鬆散 2 +鬼 2 +魅力 2 +魏 2 +魚雷 2 +魚頭 2 +魯殊 2 +鮮明 2 +鯉形 2 +鯉科 2 +鰂魚 2 +鱸形 2 +鳥取 2 +鳥綱 2 +鳳凰 2 +鳳翔 2 +鹼 2 +鹿 2 +鹿兒 2 +麗珠 2 +麗茲 2 +麥克塞 2 +麥爾斯 2 +麻河 2 +麻省 2 +黃帝 2 +黃色 2 +黑幫 2 +黑貓 2 +黑龍 2 +黔 2 +點擊 2 +點數 2 +點球 2 +黨派 2 +鼎盛 2 +鼠疫 2 +齊 2 +齊克果 2 +齒擦 2 +齒軌 2 
+齧齒 2 +龐家堡 2 +$9,999 1 +$99,999 1 ++ 1 +-99 1 +-999 1 +9--9 1 +9-9.9 1 +9.99% 1 +9.999% 1 +9.9999萬 1 +9.99萬 1 +9/9 1 +99-99 1 +999-999LR 1 +999.999 1 +9999-9999 1 +999M 1 +999X 1 +999cm 1 +999萬9千餘 1 +999餘 1 +999餘萬 1 +99B 1 +99萬9千 1 +9C 1 +9F 1 +9nd 1 +9億9999萬 1 +9億9千萬 1 +9成 1 +9百多萬 1 +9萬億 1 +9萬多 1 +A9 1 +A999-999 1 +AC 1 +AEG 1 +AEK 1 +AFD 1 +AMORC 1 +Aankhen 1 +Abante 1 +Abdurrahman 1 +Activision 1 +Adilabad 1 +Adisumarmo 1 +Admiral 1 +Advance 1 +Aero 1 +AeroMobile 1 +Aethra 1 +Ages 1 +Airlines 1 +Airport 1 +Aleksej 1 +Alliance 1 +Alpha 1 +Alyssum 1 +Android 1 +Anne 1 +Antarctic 1 +Argonauts 1 +Arwadi 1 +Arzacq-Arraziguet 1 +Auld 1 +Auteuil 1 +Avenue 1 +Aviation 1 +Aviv 1 +B9 1 +BHCs 1 +BT 1 +Bad 1 +Baldwin 1 +Ballklub 1 +Bank 1 +Baronet 1 +Barros 1 +Barsbold 1 +Beatles 1 +Beaune 1 +Beaune-Sud 1 +Beckham 1 +Beinasco 1 +Belgaum 1 +Bellagio 1 +Berg 1 +Berne-Belp 1 +Besar 1 +Blake 1 +Books 1 +Boot 1 +Brett 1 +Brian 1 +Briann 1 +Bronfenbrenner 1 +Brough 1 +Bruce 1 +Bud 1 +CARET 1 +CBE 1 +CEC 1 +CET 1 +CI-9999 1 +CIT999B 1 +CMS 1 +CNZZ 1 +CP 1 +CPU 1 +CRH999B 1 +CRH999B-999 1 +CRH999C 1 +CRYPTON 1 +Caen 1 +Calling 1 +Campaign 1 +Campostoma 1 +Canal 1 +Cannon 1 +Capital 1 +Caroline 1 +Castle 1 +Cathedral 1 +Cerro 1 +Chapman 1 +Chase 1 +Chau 1 +Chell 1 +Christopher 1 +Chrome 1 +Churchill 1 +City 1 +Claritin 1 +Clark 1 +Cohen 1 +Colchis 1 +Color 1 +Comic 1 +Company 1 +Connecticut 1 +Conroy 1 +Cornell 1 +Cost 1 +Costa 1 +Council 1 +Cushing 1 +Cálida 1 +DDC 1 +DFH9 1 +DMFC 1 +DS 1 +DSM 1 +Daisuke 1 +Dakota 1 +Damrosch 1 +Daria 1 +Dark 1 +Dart 1 +Dawn 1 +DeSanctis 1 +Dennis 1 +Derby 1 +Devasthanam 1 +Dialogue 1 +Digi 1 +DigiBook 1 +Direct 1 +Divisione 1 +Dog 1 +Doodle 1 +Dorian 1 +Dossing 1 +Dragon 1 +Durst 1 +E 1 +EPA 1 +ET 1 +EXE 1 +Eden 1 +El 1 +Electronic 1 +Elisabeth 1 +Ellie 1 +Elliot 1 +Eminescu 1 +End 1 +Entertainment 1 +Epithema 1 +Epstein 1 +Estate 1 +Expedition 1 +FLY 1 +FSB 1 +FUDOSI 1 +Falls 1 +Family 1 +Fernando 1 +Films 1 +Firefox 1 +Firozpur 1 +Fleet 1 +Fook 
1 +Forever 1 +Fortran 1 +Fox 1 +Frank 1 +Franpipe 1 +Fred 1 +Frito-Lay 1 +Fund 1 +G 1 +G99A 1 +GB 1 +GTO 1 +Galliano 1 +Gear 1 +Geophysical 1 +German 1 +Ghost 1 +Gibbs 1 +Giuliano 1 +Golden 1 +Good 1 +Goodnow 1 +Government 1 +Grant 1 +Greater 1 +Greenbelt 1 +Greenville 1 +Groening 1 +Ground 1 +Group 1 +Guariglia 1 +HIV 1 +HP 1 +Halifax 1 +Harry 1 +Harvey 1 +Hau 1 +Haven 1 +HeH 1 +Herrera 1 +Herschel 1 +Higher 1 +Hillman 1 +Holy 1 +Hondt 1 +Hopkins 1 +Housing 1 +Humphrey 1 +Hunt 1 +I 1 +IB 1 +IGBT 1 +IGY 1 +IPark 1 +IUPAC 1 +IV 1 +Illumination 1 +In 1 +India 1 +Ingeri 1 +Innocence 1 +International 1 +Iron 1 +Isartor 1 +Ischl 1 +It's 1 +JPL 1 +Jay 1 +Jazz 1 +Jeff 1 +Johnson 1 +Justin 1 +Juvisy 1 +KINGFISHER 1 +KKR 1 +Kansas 1 +Karaköy 1 +Karlstor 1 +Kate 1 +Kekal 1 +Kenway 1 +Kilpatrick 1 +Kink.com 1 +Kinross 1 +Knudstrup 1 +Koffka 1 +Kurnool 1 +Kurt 1 +LCD 1 +LD99 1 +Langdon 1 +Langford 1 +Language 1 +Last 1 +Leaf 1 +Lees 1 +Lennart 1 +Lethal 1 +Liaoxipterus 1 +Lilim 1 +Linux 1 +Liu 1 +Lomidine 1 +Lotz 1 +Low 1 +Lowell 1 +MD-99 1 +MM 1 +Maddie 1 +Magic 1 +Magma 1 +MallRide 1 +Mamaia 1 +Man 1 +Manea 1 +Maolan 1 +Maria 1 +Mario 1 +Market 1 +Marshlands 1 +Martin 1 +Mayflower 1 +Mechernich 1 +Medical 1 +Menachem 1 +Merina 1 +Methala 1 +Metress 1 +Meyers 1 +Michaelerkirche 1 +Micro 1 +Micro-USM 1 +Middle 1 +Mihai 1 +Mintz 1 +Mitchell 1 +Modern 1 +Mogens 1 +Money 1 +Monsters 1 +Montana 1 +Multitier 1 +Mundell 1 +Museum 1 +My 1 +Myers 1 +N99 1 +NCAA 1 +NHK 1 +NIST 1 +Name 1 +Nanocells 1 +Natasha 1 +Nazionale 1 +Neluset 1 +Neverwhere 1 +Niarchos 1 +Nibiru 1 +Nirmal 1 +Norman 1 +North 1 +Novogrudok 1 +O. 
1 +ORI 1 +OVA 1 +Odd 1 +Omega 1 +Omniworld 1 +Online 1 +Opus 1 +Orjan 1 +Orkney 1 +Ospatulus 1 +Otto 1 +P 1 +P9O9 1 +PASMO 1 +PFA 1 +PLANES 1 +Paleorhinus 1 +Pangjiabu 1 +Papa 1 +Park 1 +Pau 1 +Paul 1 +Perouse 1 +Persson 1 +Perth 1 +Phil 1 +Philippa 1 +Piano 1 +Pinerolo 1 +Pisapia 1 +Pittsburghia 1 +Place 1 +PlanetShanghai 1 +Playgirl 1 +Police 1 +Pre-rendering 1 +Presbyterian 1 +Primary 1 +Psychology 1 +Pukaki 1 +Pulau 1 +Purma 1 +Quartet 1 +Quentin 1 +Quest 1 +R9 1 +RBK 1 +RBS 1 +RIAA 1 +Railway 1 +Record 1 +Recordon 1 +Reserve 1 +Return 1 +Review 1 +RhCl9 1 +Rinchen 1 +River 1 +Roble 1 +Rocha 1 +Rolf 1 +Rosenborg 1 +Rossabi 1 +Ruger 1 +Russell 1 +S9 1 +SBE 1 +SEC 1 +SS 1 +ST 1 +STIF 1 +Safari 1 +Salomon 1 +Sam 1 +Sara 1 +Sarianidi 1 +Savannah 1 +School 1 +Schuchat 1 +Sea 1 +Secobarbital 1 +Seemann 1 +Sendlinger 1 +SensMe 1 +Shame 1 +Sharon 1 +Sheegog 1 +Sheinkin 1 +Simon 1 +Snipes 1 +Social 1 +Sofi 1 +Soobedars 1 +Soviet 1 +Spector 1 +Spirit 1 +Spittel 1 +Sportsnet 1 +Srisailamgudem 1 +Standard 1 +Stanton 1 +Star 1 +Statpipe 1 +Stavros 1 +Steinbeck 1 +Stephen 1 +Steven 1 +Stonewall 1 +Street 1 +Streymoy 1 +Stutsman 1 +Suica 1 +Sunset 1 +Suzuki 1 +Syahrin 1 +Sōya 1 +T 1 +TF 1 +TF99 1 +TNM 1 +TVS-9 1 +Tau 1 +Technology 1 +Tel 1 +Texas 1 +Theodor 1 +Thomas 1 +Thrissur 1 +Timati 1 +Time 1 +Tor 1 +Train 1 +Tru 1 +Tsang 1 +Tweddle 1 +Twisty 1 +Tyler 1 +UMLS 1 +USPHS 1 +Uhler-Phillips 1 +Un 1 +Union 1 +University 1 +Utricularia 1 +VVVF 1 +Valla 1 +Varginha 1 +Victoria 1 +Viktor 1 +Villa 1 +Volantis 1 +WHO 1 +WTA 1 +Walker 1 +Walter 1 +Wesley 1 +West 1 +Westmeath 1 +Wheeler 1 +Wii 1 +William 1 +Wing 1 +Wireless 1 +Woman 1 +Wood 1 +Woodside 1 +World 1 +X 1 +Year 1 +YouTube 1 +Zeepipe 1 +`` 1 +académie 1 +architecture 1 +asteroid 1 +bar 1 +bransoni 1 +can 1 +ceyhan 1 +copper 1 +director 1 +double 1 +e 1 +earth 1 +entity 1 +f(x) 1 +g(x) 1 +gbest 1 +hangar 1 +hear 1 +iPhone 1 +iTunes 1 +justice 1 +km 1 +laurifolia 1 +liability 1 +loratadin 1 +managing 1 +morus 1 +n=9 1 
+nickel 1 +no 1 +one 1 +ornatum 1 +p 1 +pbest 1 +peronismo 1 +rhythm 1 +rock 1 +sandwithii 1 +scream 1 +scree 1 +shelters 1 +space 1 +study 1 +supply 1 +t.999.com 1 +t.qq.com 1 +t.sina.com.cn 1 +t.sohu.com 1 +t.xxxx.com 1 +to 1 +touch 1 +trail 1 +truncatulus 1 +view 1 +w=9 1 +white 1 +x 1 +you 1 +zone 1 +° 1 +ð 1 +þ 1 +̄ 1 +θ 1 +〔 1 +〕 1 +一, 1 +一中全會 1 +一九五八 1 +一併 1 +一億 1 +一八 1 +一分為二 1 +一到 1 +一勞永逸 1 +一反其道 1 +一字一句 1 +一式一樣 1 +一成 1 +一戰 1 +一改 1 +一時 1 +一概 1 +一模一樣 1 +一氧化碳 1 +一炮 1 +一爭 1 +一發 1 +一百 1 +一百幾十 1 +一百萬 1 +一百餘 1 +一益 1 +一而再、再而三 1 +一舉 1 +一落千丈 1 +一見鍾情 1 +一路 1 +一身 1 +一邊 1 +一點 1 +丁字 1 +丁目 1 +七七 1 +七十 1 +七里 1 +三、 1 +三一 1 +三中 1 +三中全會 1 +三井 1 +三井住友 1 +三亞 1 +三元 1 +三十四 1 +三原 1 +三崎 1 +三星 1 +三氯化銠 1 +三氯氧釩 1 +三浦 1 +三王 1 +三百 1 +三百六七十 1 +三百多 1 +三索頜腔蛇 1 +三船 1 +三菱 1 +三萬 1 +三藩市 1 +三軍 1 +三門 1 +上下 1 +上下行 1 +上傳 1 +上去 1 +上古 1 +上司 1 +上埔 1 +上報 1 +上塘 1 +上奏 1 +上學 1 +上尉 1 +上手 1 +上新世 1 +上朝 1 +上林 1 +上沖 1 +上班 1 +上端 1 +上網 1 +上線 1 +上色 1 +上蓋 1 +上訪 1 +上調 1 +上路 1 +上身 1 +上車 1 +上選 1 +上部 1 +上限 1 +上集 1 +上雲 1 +上顎 1 +下剋上高潮 1 +下圖 1 +下徹 1 +下樓 1 +下河 1 +下潛 1 +下獄 1 +下稱 1 +下蝕 1 +下設 1 +下課 1 +下跌 1 +下車 1 +下遊 1 +下部 1 +下關 1 +下院 1 +下集 1 +下雷 1 +下面 1 +下顎 1 +下風 1 +不丹 1 +不乏 1 +不以為然 1 +不克 1 +不入 1 +不凡 1 +不出 1 +不出所料 1 +不利 1 +不到 1 +不力 1 +不動 1 +不去 1 +不吃 1 +不合 1 +不和 1 +不問 1 +不均 1 +不多 1 +不大 1 +不定 1 +不實 1 +不惜 1 +不愛 1 +不懷好意 1 +不折不扣 1 +不捨 1 +不收 1 +不敬 1 +不料 1 +不易 1 +不景 1 +不服 1 +不朽 1 +不歸 1 +不準 1 +不理 1 +不畏 1 +不符 1 +不純 1 +不絕 1 +不行 1 +不衰 1 +不要 1 +不見天日 1 +不解 1 +不計其數 1 +不該 1 +不詳 1 +不豐 1 +不賣 1 +不輸 1 +不辭辛勞 1 +不道 1 +不適 1 +不銹 1 +不限 1 +不露 1 +不顧 1 +且是 1 +世上 1 +世人 1 +世代相傳 1 +世充 1 +世則 1 +世子 1 +世家 1 +世昌 1 +世民 1 +世田谷 1 +世祿 1 +世綱 1 +世貿 1 +世道 1 +世銘 1 +丘 1 +丙組 1 +丞相 1 +並無 1 +並稱 1 +並系 1 +中信 1 +中南 1 +中南海 1 +中原 1 +中堅 1 +中場 1 +中底層 1 +中彈 1 +中性 1 +中投 1 +中斷 1 +中旬 1 +中校 1 +中樞 1 +中檔 1 +中殿 1 +中毒 1 +中波希米亞 1 +中田 1 +中級 1 +中綴 1 +中線 1 +中耳 1 +中聯 1 +中興 1 +中葉 1 +中藥 1 +中西方 1 +中西醫 1 +中觀 1 +中超 1 +中農 1 +中鐵 1 +串聯 1 +丸都 1 +丹 1 +丹噶爾 1 +丹尼士達智 1 +丹路殊 1 +主修 1 +主創 1 +主導 1 +主帶 1 +主幹 1 +主意 1 +主控 1 +主治 1 +主炮 1 +主犯 1 +主筆 1 +主船 1 +主食 1 +乃威 1 +久經 1 +久藏 1 +之所以 1 +之申 1 +之銓 1 +之鋒 1 +乘勢 1 
+乘搭 1 +乘撘 1 +乘裝 1 +乙 1 +乙二胺 1 +乙未 1 +乙組 1 +乙苯 1 +九一一 1 +九十 1 +九江 1 +九鐵 1 +乳房 1 +乾季 1 +乾德 1 +乾淨 1 +乾西 1 +亂 1 +亂倫 1 +亂刀 1 +事先 1 +事態 1 +事發 1 +事與願違 1 +事跡 1 +事蹟 1 +二中全會 1 +二二八 1 +二十一 1 +二十二 1 +二十五 1 +二十八 1 +二十多 1 +二十萬 1 +二宮 1 +二戶 1 +二百 1 +二百五十餘 1 +二百餘 1 +二胺 1 +二郎 1 +于敏 1 +互作 1 +互利 1 +互助 1 +互惠 1 +互通 1 +互選 1 +五一 1 +五中全會 1 +五分之一 1 +五十 1 +五十一 1 +五十六 1 +五常 1 +五弟 1 +五彩繽紛 1 +五成半 1 +五指 1 +五氧化二氮 1 +五百萬 1 +五萬三千 1 +井字 1 +井村 1 +井田 1 +些微 1 +亞丁 1 +亞他那修 1 +亞伯塔 1 +亞伯拉罕 1 +亞冠龍 1 +亞利桑納 1 +亞基 1 +亞奧 1 +亞彬 1 +亞德里亞堡 1 +亞文 1 +亞普芮 1 +亞東 1 +亞歷山大丹尼士 1 +亞流 1 +亞烏扎 1 +亞特蘭大 1 +亞瑟 1 +亞當斯 1 +亞西爾 1 +亞運 1 +亞邦 1 +亞麻 1 +亡故 1 +交付 1 +交代 1 +交出 1 +交口 1 +交回 1 +交州 1 +交替 1 +交棒 1 +交涉 1 +交界 1 +交行 1 +交角 1 +交談 1 +交道 1 +交錯 1 +亦即 1 +亨 1 +亨得利 1 +享 1 +京劇 1 +京王 1 +京釜 1 +亭湖 1 +亮相 1 +人世 1 +人仕 1 +人字 1 +人客 1 +人手 1 +人日 1 +人權 1 +人殉 1 +人氣 1 +人祭 1 +人種 1 +人稱 1 +人行 1 +人道 1 +人選 1 +人麻呂 1 +仁傑 1 +仁和 1 +仁壽 1 +仁守 1 +仁宗 1 +仁煥 1 +仁牙因 1 +仁玕 1 +仁穆 1 +仁粹 1 +仁青 1 +仇人 1 +今川 1 +介壽 1 +介質 1 +仍是 1 +仍有 1 +仍算 1 +他倆 1 +他家 1 +仙人打坐 1 +仙女木 1 +仙鶴 1 +代亞布羅 1 +代價 1 +代名詞 1 +代幣 1 +代數 1 +代牧 1 +代碼 1 +令狐 1 +令華 1 +以爲 1 +仰光 1 +仰望 1 +仲 1 +仲雄 1 +任免 1 +任選 1 +伊 1 +伊克巴爾 1 +伊利 1 +伊利沙伯 1 +伊塔蒂亞亞 1 +伊娃 1 +伊尹 1 +伊摩琴 1 +伊朗 1 +伊犁 1 +伊甸 1 +伊薩爾 1 +伊里亞德 1 +伊阿宋 1 +伊頓 1 +伍德 1 +伍德羅 1 +伎倆 1 +伏塔 1 +伏契克 1 +伏爾加 1 +伏瓦蒂爾 1 +伐 1 +休假 1 +休克 1 +休士頓 1 +休憩 1 +休斯 1 +休閑 1 +休養 1 +伙食 1 +伯克爾 1 +伯多祿 1 +伯恩 1 +伯恩哈德 1 +伯明翰 1 +伯格 1 +伯溫 1 +伯爾尼 1 +伯納姆 1 +伯納雷 1 +伯茲貝格 1 +伯莎 1 +伯虎 1 +伯謙 1 +伯達 1 +伴侶 1 +伴奏 1 +伴有 1 +伴生 1 +伶 1 +伸一 1 +伸冤 1 +伸延 1 +伸港 1 +伽馬 1 +佈局 1 +佈置 1 +佈道 1 +位在 1 +位居 1 +位階 1 +位面 1 +低下 1 +低估 1 +低價 1 +低層 1 +低平 1 +低座 1 +低檔 1 +低潮 1 +低等 1 +低調 1 +低額 1 +住所 1 +住進 1 +佐佐木 1 +佐勞爾 1 +佐和子 1 +佐民 1 +佔用 1 +何利菲德 1 +何力特 1 +何方 1 +佛事 1 +佛典 1 +佛瑞爾斯 1 +佛經 1 +佛羅倫斯 1 +佛羅里達 1 +佛萊明 1 +佛蒙特 1 +佛頭 1 +作對 1 +作怪 1 +作曲 1 +作次郎 1 +作法 1 +作為 1 +作畫 1 +作雲 1 +作風 1 +佩佐拉諾 1 +佩儂 1 +佩戴 1 +佩琪 1 +佩蘭多 1 +佬 1 +佳作 1 +佳佳 1 +佳節 1 +併發 1 +使喚 1 +使團 1 +使節 1 +侄子 1 +來看 1 +來臨 1 +來襲 1 +來館 1 +侈談 1 +侍奉 1 +侍女 1 +侍從 1 +侏羅 1 +供水 1 +供電 1 +供養 1 +依次 1 +依照 1 +依瑪 1 +依託 1 +依託泊苷 1 +依附 1 +侮辱 1 +侯 1 +侵佔 1 +侵害 1 +便利 1 +便捷 1 +便是 1 +便服 1 +便當 1 +便秘 1 +俊業 1 +俗 1 +俘獲 1 +保 
1 +保住 1 +保全 1 +保加爾 1 +保大 1 +保定 1 +保密 1 +保明 1 +保溫 1 +保羅費雷拉 1 +保送 1 +保養 1 +俠 1 +信中 1 +信念 1 +信教 1 +信玄 1 +信神 1 +信竹 1 +信裡 1 +修好 1 +修學 1 +修憲 1 +修煉 1 +修羅 1 +修葺 1 +修鞋 1 +修養 1 +俯瞰 1 +俸祿 1 +俾路支 1 +倉促 1 +倉庫 1 +個位 1 +個個 1 +個展 1 +倒下 1 +倒入 1 +倖免 1 +候旨 1 +候補 1 +倚天 1 +倚靠 1 +借 1 +倩文 1 +倫巴底 1 +倫拜 1 +倫納特 1 +倬標 1 +倭國 1 +倭寇 1 +假使 1 +假借 1 +假名 1 +假帳 1 +假設 1 +假說 1 +假象 1 +假釋 1 +假面 1 +偉 1 +偉強 1 +偏低 1 +偏僻 1 +偏向 1 +偏小 1 +偏東 1 +偏重 1 +偏離 1 +做到 1 +停刊 1 +停業 1 +停機 1 +停泊 1 +停職 1 +停辦 1 +停靠 1 +停飛 1 +健壯 1 +健將 1 +健身 1 +側目 1 +側邊 1 +偵察 1 +偵測 1 +偵緝 1 +偶像 1 +偶發 1 +偷取 1 +偷羊 1 +偷襲 1 +偷走 1 +偽 1 +偽季米特里 1 +偽裝 1 +傀儡 1 +傅萊 1 +傍 1 +傍晚 1 +傑克托爾 1 +傑志 1 +傑斐遜 1 +備忘 1 +備戰 1 +備案 1 +備用 1 +備註 1 +傢具 1 +催芽 1 +傭人 1 +傳來 1 +傳給 1 +傳記 1 +傳遍 1 +債券 1 +傷及 1 +傷心 1 +傷患 1 +傷悲 1 +傷病 1 +傷透 1 +傻 1 +傾心 1 +傾談 1 +僅屬 1 +僅用 1 +像差 1 +僑 1 +僕人 1 +僖 1 +僧人 1 +僧孺 1 +僧尼 1 +僧格 1 +僧祐 1 +僱主 1 +僱傭 1 +僵局 1 +價位 1 +價錢 1 +儀器 1 +億 1 +儒士 1 +儘快 1 +儘量 1 +償付 1 +優 1 +優值 1 +優良 1 +優裕 1 +優質 1 +儲量 1 +儷 1 +允 1 +允良 1 +元子 1 +元朝 1 +元氣 1 +元澄 1 +元老 1 +元起 1 +兄 1 +兄長 1 +充任 1 +充分 1 +充氣 1 +充滿 1 +充軍 1 +兆基 1 +兆楠 1 +兆陽 1 +兇多吉少 1 +兇悍 1 +兇猛 1 +先前 1 +先帝 1 +先師 1 +先賢 1 +先鋒 1 +先驗 1 +光啟 1 +光學 1 +光宇 1 +光州 1 +光度 1 +光復 1 +光景 1 +光束 1 +光泰 1 +光滑 1 +光照 1 +光環 1 +光范 1 +光華 1 +光顧 1 +克利普頓 1 +克力佛 1 +克勤 1 +克家 1 +克拉瑪 1 +克拉西奇 1 +克敏能 1 +克欽 1 +克洛頓 1 +克特勒 1 +克羅維茲 1 +克羅迪歐 1 +克蘇魯 1 +克裡斯 1 +克農 1 +克里姆希爾特 1 +克里斯多夫 1 +克里斯多弗 1 +克里斯托弗 1 +克里波門 1 +克魯 1 +兌換 1 +免 1 +免疫 1 +免遭 1 +兔毛 1 +兢兢業業 1 +入世 1 +入地 1 +入塞 1 +入境 1 +入手 1 +入聲 1 +入股 1 +入閘 1 +入院 1 +入駐 1 +內化 1 +內卡薩 1 +內在 1 +內埔 1 +內壁 1 +內政 1 +內置 1 +內胎 1 +內臟 1 +內載 1 +內遷 1 +全劇 1 +全名 1 +全境 1 +全壘 1 +全套 1 +全島 1 +全州 1 +全得 1 +全德 1 +全效 1 +全敗 1 +全數 1 +全書 1 +全盛 1 +全盤 1 +全省 1 +全福 1 +全程 1 +全稱 1 +全線 1 +全興 1 +全邨 1 +全鎮 1 +全隊 1 +全額 1 +全黑 1 +兩億 1 +兩千五百萬 1 +兩千萬 1 +八世 1 +八十九 1 +八卦 1 +八思巴 1 +八成 1 +八杉 1 +八百 1 +公仔 1 +公佈 1 +公克 1 +公告 1 +公墓 1 +公屋 1 +公斤 1 +公款 1 +公正 1 +公狼 1 +公約 1 +公衛 1 +公袥 1 +公視 1 +公超 1 +公關 1 +公頃 1 +公館 1 +六七 1 +六千 1 +六千四百萬 1 +六合 1 +六四 1 +六安 1 +共享 1 +共尾 1 +共生 1 +共處 1 +共識 1 +共鳴 1 +兵房 1 +兵鋒 1 +其妻 1 +其子 1 +其次 1 +其母 1 +典籍 1 +兼修 1 +兼具 1 +兼容 1 +兼屬 1 +兼并 1 +冀望 1 +冉 1 +冊 1 +再三 1 +再保 1 +再用 1 
+再臨 1 +再補 1 +再見 1 +冒 1 +冒險 1 +冠 1 +冠上 1 +冠峰 1 +冠狀 1 +冠玉 1 +冢 1 +冤案 1 +冥冥 1 +冥想 1 +冬初 1 +冬眠 1 +冬青 1 +冰 1 +冰冰 1 +冰塔 1 +冰晶 1 +冰柱 1 +冰河 1 +冰湖 1 +冰瀑 1 +冰球 1 +冰風 1 +冷凍 1 +冷暖氣 1 +冷次 1 +冷氣 1 +冷眼 1 +冷遇 1 +冷靜 1 +凄美 1 +准 1 +准考 1 +凈白 1 +凊 1 +凌 1 +凌日 1 +凌晨 1 +凌辱 1 +凌駕 1 +凍傷 1 +凝結 1 +凡爾登 1 +凡爾賽 1 +凱恩 1 +凱文 1 +凱爾特 1 +凱維埃爾 1 +凱美特 1 +凱茜 1 +凶 1 +凸 1 +凸起 1 +凹版 1 +出世 1 +出人意料 1 +出到 1 +出動 1 +出去 1 +出名 1 +出品 1 +出國 1 +出城 1 +出奇 1 +出嫁 1 +出局 1 +出師 1 +出廠 1 +出征 1 +出手 1 +出擊 1 +出校 1 +出榜 1 +出血 1 +出訪 1 +出路 1 +出逃 1 +出門 1 +出頭 1 +刀鞘 1 +分工 1 +分店 1 +分批 1 +分攤 1 +分數 1 +分明 1 +分枝 1 +分校 1 +分泌 1 +分流 1 +分發 1 +分科 1 +分立 1 +分站 1 +分管 1 +分組 1 +分缺 1 +分貝 1 +分辨 1 +分部 1 +分鏡 1 +分隔 1 +分離 1 +分題 1 +分點 1 +切下 1 +切分 1 +切割 1 +切合 1 +切實 1 +切成 1 +切望 1 +切爾尼赫 1 +切片 1 +刑事 1 +刑部 1 +划算 1 +划艇 1 +列斯聯 1 +列維爾 1 +初中 1 +初始 1 +初時 1 +初次 1 +初步 1 +初見 1 +判 1 +判令 1 +判定 1 +判寺事 1 +判詞 1 +別人 1 +別名 1 +別院 1 +利他能 1 +利刃 1 +利好 1 +利潘迪特蘭堡 1 +利維奧 1 +刪剪 1 +刮目相看 1 +到任 1 +到期 1 +到發 1 +制動 1 +制式 1 +制瓷 1 +制約 1 +制酸 1 +刷 1 +刷到 1 +券 1 +券頂 1 +刺殺 1 +刻劃 1 +刻寫 1 +刻板 1 +刻滿 1 +刻畫 1 +則士 1 +則里拉 1 +削減 1 +前傾 1 +前去 1 +前因後果 1 +前奏 1 +前委 1 +前嫌 1 +前季 1 +前提 1 +前景 1 +前稱 1 +前端 1 +前綴 1 +前者 1 +前肢 1 +前齒 1 +剛剛 1 +剛性 1 +剛直 1 +剛鐸 1 +剩 1 +剩餘 1 +副長 1 +割據 1 +割破 1 +割讓 1 +割開 1 +創保 1 +創傷 1 +創刊 1 +創煥 1 +創生 1 +剷除 1 +剿 1 +剿滅 1 +劃出 1 +劃歸 1 +劃界 1 +劇中 1 +劇作 1 +劇場 1 +劇組 1 +劍俠 1 +劍法 1 +劍麻 1 +劑量 1 +力克 1 +力圖 1 +力霸 1 +功勞 1 +功德 1 +功樂 1 +功績 1 +加侖 1 +加值 1 +加冕 1 +加利奇 1 +加劇 1 +加勁 1 +加恩卡納 1 +加爾文 1 +加粗 1 +加藤 1 +加賀 1 +加速 1 +加電 1 +劣 1 +助 1 +助手 1 +助燃 1 +助聽 1 +助長 1 +努兒道刺特 1 +劫匪 1 +劫持 1 +効忠 1 +勁光 1 +勁報 1 +勁敵 1 +勁歌 1 +勃起 1 +勇俊 1 +勇士 1 +勇武 1 +勒溫 1 +動人 1 +動向 1 +動土 1 +動漫 1 +動漫畫 1 +動用 1 +動能 1 +動蕩 1 +動詞 1 +動量 1 +勘探 1 +務工 1 +勝 1 +勝任 1 +勝昭 1 +勝素 1 +勝者 1 +勝訴 1 +勝賴 1 +勞埃德 1 +勞累 1 +募款 1 +募集 1 +勢傾中外 1 +勢能 1 +勤先 1 +勤快 1 +勳位 1 +勳爵 1 +勵珍 1 +勸 1 +勾形 1 +勾畫 1 +勾結 1 +包袱 1 +包裹 1 +包覆 1 +包頭 1 +化名 1 +化妝 1 +化成 1 +化整為零 1 +化用 1 +化肥 1 +北伐 1 +北側 1 +北冰 1 +北卡羅萊納 1 +北景 1 +北歐 1 +北段 1 +北甘馬粦 1 +北美擬獅 1 +北車 1 +北返 1 +北達科他 1 +北邊 1 +匡 1 +匯入 1 +匯合 1 +匯報 1 +匯聯 1 +匯集 1 +匹 1 +匹茲堡 1 +匾額 1 +區塊 1 +區段 1 +區間 1 +十二世 1 +十二烷基苯 1 +十全十美 1 +十八億 1 +十八大 1 +十四 1 +十數 1 +十萬 1 
+十餘 1 +千兆 1 +千克 1 +千島 1 +千方百計 1 +千春 1 +千瓦 1 +千米 1 +千萬 1 +千里迢迢 1 +千陽 1 +千鶴 1 +升值 1 +升到 1 +升天 1 +升越 1 +升降 1 +升高 1 +午膳 1 +半導體 1 +半牧 1 +半農 1 +卑詩 1 +卓著 1 +協合 1 +協理 1 +南人 1 +南卡羅萊納 1 +南哲 1 +南大 1 +南安 1 +南安普頓 1 +南寧 1 +南市 1 +南征 1 +南端 1 +南線 1 +南美 1 +南臨 1 +南航 1 +南船 1 +南路 1 +南通 1 +南遷 1 +南鄰 1 +南門 1 +南開 1 +南院 1 +南雄 1 +南麓 1 +博 1 +博凱蒂 1 +博多 1 +博學 1 +博斯維爾 1 +博格 1 +博洛尼亞 1 +博滕 1 +博義 1 +博覽 1 +占星 1 +卡亞尼 1 +卡內拉 1 +卡利帕斯 1 +卡力崗 1 +卡夫 1 +卡夫卡 1 +卡巴雷羅 1 +卡希 1 +卡帕克 1 +卡拉OK 1 +卡拉柯伊 1 +卡拉維拿 1 +卡斯楚 1 +卡斯特羅 1 +卡普里維 1 +卡波特 1 +卡洛克 1 +卡洛斯 1 +卡洛曼 1 +卡洛琳 1 +卡羅來納 1 +卡羅萊納 1 +卡臣 1 +卡薩諾瓦 1 +卡車 1 +卡達 1 +卦 1 +卧底 1 +卧病 1 +卧薪嘗膽 1 +印信 1 +印刷 1 +印地安那 1 +印度尼西亞 1 +印第安納 1 +印第安納波利斯 1 +印表 1 +危在旦夕 1 +危害 1 +危殆 1 +即場 1 +即有 1 +卵內 1 +厘 1 +原先 1 +原型 1 +原姓 1 +原屬 1 +原平 1 +原意 1 +原指 1 +原文 1 +原核 1 +原畫 1 +原籍 1 +原罪 1 +原諒 1 +厥 1 +厭世 1 +厭惡 1 +去搶 1 +去留 1 +去看 1 +參戰 1 +參政 1 +參演 1 +參看 1 +參禮 1 +參贊 1 +參閱 1 +又廷 1 +又或 1 +及後 1 +及時 1 +友 1 +友情 1 +友邦 1 +反共 1 +反動 1 +反右 1 +反向 1 +反恐 1 +反省 1 +反綁 1 +反證 1 +反響 1 +反黨 1 +叔父 1 +取下 1 +取出 1 +取名 1 +取回 1 +取悅 1 +取液 1 +取物 1 +取用 1 +取而代之 1 +受命 1 +受孕 1 +受害 1 +受挫 1 +受洗 1 +受精 1 +受罰 1 +受襲 1 +受賄 1 +受阻 1 +受雇 1 +叛徒 1 +叛變 1 +叛軍 1 +叡 1 +叢刊 1 +叢書 1 +口供 1 +口信 1 +口吻 1 +口感 1 +口服 1 +口音 1 +古喙龍 1 +古堡 1 +古寺 1 +古廟 1 +古德諾 1 +古惑 1 +古斯塔夫 1 +古爾德 1 +古迹 1 +古都斯 1 +句子 1 +句點 1 +另加 1 +另娶 1 +另立 1 +另築 1 +另類 1 +只好 1 +只是 1 +只會 1 +只知 1 +只能 1 +叫作 1 +叫拜 1 +叫聲 1 +召 1 +召集 1 +可可 1 +可塑 1 +可愛 1 +可憐 1 +可樂 1 +可欣 1 +可西卡 1 +可靠 1 +可風 1 +台南 1 +台標 1 +台視 1 +台詞 1 +台長 1 +史前 1 +史坦貝克 1 +史官 1 +史帝芬 1 +史特勞斯 1 +史稱 1 +史記 1 +史跡 1 +史館 1 +右任 1 +右手 1 +右方 1 +右臂 1 +司可巴比妥 1 +司鐸 1 +吁宋 1 +吃上 1 +吃到 1 +吃掉 1 +吃法 1 +吃起 1 +各方 1 +各球 1 +各異 1 +各科 1 +各職 1 +各處 1 +各行各業 1 +各隊 1 +各項 1 +合共 1 +合力 1 +合和 1 +合唱 1 +合夥 1 +合奏 1 +合流 1 +合約 1 +合計 1 +合資 1 +合辦 1 +合適 1 +合陽 1 +合體 1 +吉利 1 +吉奧瓦尼 1 +吉姆 1 +吉布地 1 +吉拉德 1 +吉爾伯特 1 +吉祥 1 +吉米 1 +吉西 1 +吉隆坡 1 +吋 1 +同仁社 1 +同伴 1 +同僚 1 +同台 1 +同型 1 +同工 1 +同志 1 +同日 1 +同校 1 +同步 1 +同母 1 +同父 1 +同甘共苦 1 +同行 1 +同郷 1 +同食 1 +同飲 1 +名作 1 +名分 1 +名利雙收 1 +名城 1 +名帥 1 +名師 1 +名村 1 +名氣 1 +名流 1 +名聲 1 +名臣 1 +名茶 1 +名號 1 +名門 1 +名額 1 +后 1 +后妃 1 +吐 1 +吐嘈 1 +向前 1 +向滋 1 +君如 1 +君權 1 +君長 1 +吞下 1 
+吟唱 1 +否 1 +否決 1 +吧 1 +吩咐 1 +含 1 +含糖 1 +含量 1 +吳王 1 +吵醒 1 +吸塵 1 +吸毒 1 +吸菸 1 +吸附 1 +吸食 1 +吹來 1 +吹氣 1 +吹滅 1 +吻部 1 +呀 1 +呂宋 1 +呆 1 +呈交 1 +告戒 1 +告白 1 +周代 1 +周刊 1 +周敏 1 +周日 1 +周朝 1 +周期 1 +周迅 1 +周遭 1 +味道 1 +呼 1 +呼倫貝爾 1 +呼和浩特 1 +命題 1 +和夫 1 +和好 1 +和宜合 1 +和康 1 +和暖 1 +和會 1 +和林 1 +和樹 1 +和睦 1 +和美 1 +和衷 1 +和親 1 +和記 1 +和諧 1 +和議 1 +咧嘴 1 +咬弦 1 +咸平 1 +咸康 1 +咸淳 1 +咸美頓 1 +咸鏡 1 +咸陽 1 +哀悼 1 +品嘗 1 +品學兼優 1 +品德 1 +品源 1 +哈 1 +哈丹姆 1 +哈依拉爾 1 +哈剌旭烈 1 +哈吉 1 +哈布斯堡 1 +哈希姆 1 +哈恩 1 +哈拉帕那瓦 1 +哈索爾 1 +哈羅 1 +哈萊姆 1 +哈薩克 1 +哈達 1 +哈里斯堡 1 +哈里森 1 +哈默史密斯 1 +員佐 1 +員外 1 +哥利茲 1 +哥德堡 1 +哨所 1 +哪 1 +哭 1 +哲也 1 +哲元 1 +哲孟雄 1 +哲生 1 +哲蚌 1 +唇槍舌劍 1 +唐代 1 +售予 1 +售出 1 +售票 1 +唯 1 +唯獨 1 +唱戲 1 +唱法 1 +唸 1 +唸珠 1 +唾液 1 +商事 1 +商務 1 +商圈 1 +商城 1 +商埠 1 +商場 1 +商幫 1 +商朝 1 +商湯 1 +商用 1 +商羯羅 1 +商船 1 +商量 1 +啊 1 +問吧 1 +問話 1 +啟 1 +啟傑 1 +啟明 1 +啟發 1 +啟示 1 +啟程 1 +啟聯 1 +啟鑰 1 +啤酒 1 +喀什 1 +喀拉拉邦 1 +喀里多尼亞 1 +善事 1 +善作 1 +善待 1 +善後 1 +善惡 1 +善撲 1 +善良 1 +喇薩 1 +喊出 1 +喘息 1 +喙 1 +喙端 1 +喚 1 +喚回 1 +喚起 1 +喜 1 +喜好 1 +喝醉 1 +喝采 1 +喪失 1 +喬姆斯基 1 +喬木 1 +喬科維奇 1 +單獨 1 +單調 1 +單質 1 +單項 1 +嗅到 1 +嗎 1 +嗜酸 1 +嗜鹼 1 +嗣位 1 +嗣業 1 +嘉木揚 1 +嘉木樣 1 +嘉樂 1 +嘉許 1 +嘉道理 1 +嘉陵 1 +嘉靖 1 +嘔吐 1 +嘩然 1 +嘯林 1 +嘴 1 +噁心 1 +噁爆 1 +器具 1 +器械 1 +器蓋 1 +器身 1 +噴射 1 +噸位 1 +嚇人 1 +嚮導 1 +嚴 1 +嚴令 1 +嚴加 1 +嚴島 1 +嚴懲 1 +嚴斥 1 +嚴氏 1 +嚴肅 1 +嚴謹 1 +囊胚 1 +囑咐 1 +囚犯 1 +四十三 1 +四十多 1 +四十餘 1 +四周 1 +四平 1 +四方八面 1 +四牌 1 +四萬 1 +四郎 1 +回信 1 +回合 1 +回填 1 +回家 1 +回寺 1 +回彈 1 +回復 1 +回教 1 +回生 1 +回程 1 +回答 1 +因弗內斯 1 +因達農 1 +困 1 +困住 1 +困擾 1 +固 1 +固態 1 +固有 1 +固醇 1 +國中 1 +國主 1 +國光 1 +國公 1 +國共 1 +國史 1 +國名 1 +國君 1 +國土 1 +國奧 1 +國妃 1 +國安會 1 +國府 1 +國庫 1 +國情 1 +國慶 1 +國成 1 +國松 1 +國父 1 +國產 1 +國界 1 +國立 1 +國策 1 +國諱 1 +國雄 1 +圍坐 1 +圍棋 1 +圍牆 1 +圍魏救趙 1 +園丁 1 +園主 1 +園內 1 +園明園 1 +園林 1 +圓 1 +圓圓 1 +圓弧 1 +圓柱 1 +圓滑 1 +圓環 1 +圖取 1 +圖布丹 1 +圖形 1 +圖片 1 +圖示 1 +圖稿 1 +團圓 1 +團隊 1 +土匪 1 +土司 1 +土石 1 +土虱 1 +在崗 1 +在校 1 +在身 1 +地名 1 +地域 1 +地基 1 +地平 1 +地庫 1 +地政 1 +地板 1 +地標 1 +地盤 1 +地級 1 +地表 1 +地貌 1 +地質 1 +地道 1 +地震 1 +坂本 1 +均勻 1 +均衡 1 +坎特伯里 1 +坎貝爾 1 +坎農 1 +坐在 1 +坐監 1 +坐骨 1 +坡子 1 +坤玲 1 +坦 1 +坦克 1 +坦干伊喀 1 +坦然 1 +坦白 1 +型式 1 +垮台 1 +埃內韋塔克 1 +埃弗里 1 +埃米內斯庫 1 +埃米琳 1 +埃胡德 1 +埃雷拉 1 +埋怨 
1 +埋葬 1 +埋藏 1 +城主 1 +城光 1 +城內 1 +城南 1 +城址 1 +城巴 1 +城池 1 +城牆 1 +城西 1 +城隍 1 +埜堂 1 +埤 1 +執委 1 +執業 1 +執飛 1 +培元 1 +培育 1 +基層 1 +基希涅夫 1 +基平 1 +基徹 1 +基數 1 +基石 1 +基頻 1 +堂堂正正 1 +堅城 1 +堅定 1 +堅尼地 1 +堅拒 1 +堅蜥 1 +堆填 1 +堆積 1 +堈 1 +堪憐 1 +堪稱 1 +堪薩斯 1 +報仇 1 +報刊 1 +報名 1 +報復 1 +報讀 1 +場內 1 +場均 1 +場景 1 +塑像 1 +塑料 1 +塑有 1 +塑膠 1 +塔利班 1 +塔台 1 +塔吉克 1 +塔塔爾 1 +塔夫茨 1 +塔林 1 +塔樓 1 +塔西佗 1 +塗黑 1 +塚 1 +塞古拉 1 +塞德爾恰尼 1 +塞普提米烏斯 1 +塞法迪 1 +塞爾達 1 +塞琉古 1 +塞琉西 1 +塞維利亞 1 +塞維魯 1 +塞維魯敉 1 +塞隆 1 +塞音 1 +塞馬 1 +墓葬 1 +墓頂 1 +墜入 1 +墜落 1 +增殖 1 +增生 1 +增祥 1 +增進 1 +增額 1 +墟 1 +墟內 1 +墨 1 +墨客 1 +墨色 1 +墳 1 +墾田 1 +壓 1 +壓縮 1 +壞球 1 +壩上 1 +壩下 1 +士珍 1 +士禛 1 +士評 1 +壯漢 1 +壯烈 1 +壹 1 +壺 1 +壺中仙 1 +壽命 1 +壽宴 1 +壽星 1 +夏威夷 1 +夏愨 1 +夏秋季 1 +夏至 1 +夏茸切哇 1 +夏茸穹哇 1 +夏荷林 1 +夏默 1 +外借 1 +外力 1 +外加 1 +外務 1 +外匯 1 +外地 1 +外壁 1 +外套 1 +外層 1 +外形 1 +外殼 1 +外甥 1 +外甥女 1 +外省 1 +外管 1 +外表 1 +外褂 1 +外訪 1 +外語 1 +外銷 1 +多倫 1 +多元 1 +多汁 1 +多納德 1 +多謝 1 +多雨 1 +夜夜 1 +夜戰 1 +夠大 1 +夢中 1 +夢境 1 +夢幻 1 +夢想 1 +夢雲 1 +夢鴿 1 +夥兒 1 +大不了 1 +大乘 1 +大事 1 +大二 1 +大儒 1 +大區 1 +大友 1 +大受 1 +大吉 1 +大名 1 +大君 1 +大和 1 +大喊 1 +大國 1 +大圍 1 +大城 1 +大堆 1 +大堤 1 +大增 1 +大士 1 +大失所望 1 +大島 1 +大嶼 1 +大幅 1 +大怒 1 +大悟 1 +大敵 1 +大新 1 +大校 1 +大概 1 +大正 1 +大殿 1 +大汗 1 +大河 1 +大洋 1 +大湖 1 +大溪 1 +大漠 1 +大獲 1 +大理 1 +大發 1 +大窘 1 +大紅 1 +大經 1 +大綱 1 +大腦 1 +大腸 1 +大膽 1 +大舉 1 +大艇 1 +大華 1 +大蒜 1 +大街小巷 1 +大跌 1 +大路 1 +大辦 1 +大通 1 +大進 1 +大郎 1 +大部 1 +大都 1 +大釗 1 +大銘 1 +大門 1 +大雄 1 +大韓 1 +大馬 1 +大驚 1 +大體 1 +大鬧 1 +大黨 1 +大鼠 1 +天份 1 +天佐 1 +天使 1 +天倫之樂 1 +天元 1 +天安 1 +天寶樓 1 +天差地遠 1 +天性 1 +天悅 1 +天慶 1 +天才 1 +天母 1 +天河 1 +天涯 1 +天球 1 +天祐 1 +天窗 1 +天紀 1 +天翔 1 +天賜 1 +天賦 1 +天馬 1 +太傅 1 +太元 1 +太冷 1 +太初 1 +太后 1 +太宗 1 +太宰 1 +太尉 1 +太常 1 +太極 1 +太湖 1 +太炎 1 +太監 1 +太行 1 +太近 1 +太遠 1 +太郎 1 +夫仇 1 +夫妻 1 +央行 1 +失利 1 +失地 1 +失效 1 +失職 1 +失能 1 +失落 1 +失誤 1 +失蹤 1 +夷昧 1 +夾 1 +夾狀 1 +奇俠 1 +奇幻 1 +奇怪 1 +奇缺 1 +奈葉 1 +奉 1 +奉命 1 +奉安 1 +奉律 1 +奉新 1 +奉系 1 +奎德林堡 1 +奏 1 +奏鳴 1 +奕 1 +奕詝 1 +套出 1 +套用 1 +奢華 1 +奧伊 1 +奧克尼 1 +奧克蘭 1 +奧古斯丁 1 +奧姆 1 +奧得 1 +奧托 1 +奧斯卡 1 +奧斯威爾 1 +奧斯汀 1 +奧林匹亞絲 1 +奧林匹斯 1 +奧格斯堡 1 +奧爾滕 1 +奧爾登堡 1 +奧爾良 1 +奧特 1 +奧特伊 1 +奧的斯 1 +奧米加 1 +奧羽 1 +奧蒂洛 1 +奪去 1 +奬懲 1 +女人 1 +女傭 1 +女僕 1 +女優 1 +女友 1 
+女嬰 1 +女水 1 +女版 1 +女生 1 +女眷 1 +奴役 1 +奶爸 1 +奸 1 +她倆 1 +好上 1 +好奇 1 +好手 1 +好氧 1 +好色 1 +如數 1 +妄圖 1 +妊娠 1 +妖怪 1 +妙 1 +妮科爾 1 +妮綺 1 +妳 1 +妹 1 +妹夫 1 +妻妹 1 +妻姐 1 +妻室 1 +姊姊 1 +始發 1 +始祖 1 +始稱 1 +始興 1 +姑娘 1 +姑母 1 +委內瑞拉 1 +委身 1 +姚里 1 +姥姥 1 +姦情 1 +姪女 1 +姿色 1 +威 1 +威光 1 +威嚇 1 +威塞克斯 1 +威斯特米思從 1 +威格莫爾 1 +威權 1 +威爾伯 1 +威爾歇 1 +威特 1 +威舍 1 +威靈頓 1 +娘 1 +娘家 1 +娜塔莉 1 +婁 1 +婆 1 +婆羅 1 +婚 1 +婚事 1 +婚宴 1 +婚禮 1 +婢女 1 +婦 1 +婷婷 1 +媒介 1 +媚娘 1 +嫁與 1 +嫘縈 1 +嫣然 1 +嬰孩 1 +子孫 1 +子文 1 +子球 1 +子程 1 +孕育 1 +孕酮 1 +字喃 1 +字幕 1 +字模 1 +字號 1 +存世 1 +存取 1 +存放 1 +孝感 1 +孝次 1 +孟 1 +孟加拉 1 +孟德爾 1 +季後 1 +季惟 1 +季風 1 +季龍 1 +孤島 1 +孤芳自賞 1 +孤身 1 +孩提 1 +學到 1 +學前 1 +學家 1 +學府二道 1 +學業 1 +學民 1 +學津 1 +學社 1 +學聯 1 +學苑 1 +宇航 1 +守備 1 +守孝 1 +守文 1 +守法 1 +守臣 1 +守謙 1 +守齋 1 +安二郎 1 +安妮 1 +安安 1 +安岳 1 +安徒生 1 +安得拉 1 +安得拉邦 1 +安德魯 1 +安托瓦內特 1 +安撫 1 +安放 1 +安東 1 +安樂 1 +安正 1 +安民 1 +安汶 1 +安然 1 +安營 1 +安理 1 +安納 1 +安聯 1 +安葬 1 +安蘭 1 +安達信 1 +安那瑞安 1 +安那罕 1 +宋國 1 +完好 1 +完畢 1 +宏偉 1 +宏坤 1 +宏聲 1 +宏道 1 +宏量 1 +宗偉 1 +宗憲 1 +宗谷 1 +宗龍 1 +官兵 1 +官司 1 +官府 1 +官服 1 +官腔 1 +官話 1 +官邸 1 +官長 1 +宙域 1 +定位 1 +定價 1 +定向 1 +定影 1 +定性 1 +定案 1 +定理 1 +定量 1 +宛城 1 +宜興 1 +客場 1 +客家 1 +客觀 1 +客貨運 1 +客輪 1 +客量 1 +宣 1 +宣判 1 +宣化 1 +宣帝 1 +宣誓 1 +室外 1 +室溫 1 +宦官 1 +宮人 1 +宮崎 1 +宰李 1 +宴席 1 +宴會 1 +家光 1 +家勁 1 +家務 1 +家外 1 +家奴 1 +家干 1 +家用 1 +家立 1 +家道中落 1 +家驤 1 +容 1 +容器 1 +容忍 1 +容許 1 +容量 1 +宿敵 1 +宿根 1 +寄存 1 +寄送 1 +寅成 1 +密 1 +密山 1 +密文 1 +密歇根 1 +密西西比 1 +密集 1 +富商 1 +富恩特德奧羅 1 +富翁 1 +富蘭克林 1 +富裕 1 +富豪 1 +富貴 1 +富邦 1 +察合台 1 +察哈爾 1 +察沃 1 +寡尿 1 +實 1 +實則 1 +實屬 1 +實情 1 +實戰 1 +實收 1 +實權 1 +實況 1 +實踐 1 +寧波 1 +審批 1 +審理 1 +審計 1 +審評 1 +審議 1 +寫下 1 +寫信 1 +寫入 1 +寫出 1 +寫字 1 +寫成 1 +寫進 1 +寬容 1 +寬度 1 +寬敞 1 +寬條 1 +寬順 1 +寮國 1 +寵物 1 +寵臣 1 +寶光 1 +寶劍 1 +寶如 1 +寶應 1 +寶殿 1 +寶玉 1 +寶田 1 +寶血 1 +寶雞 1 +寶雲 1 +寶麗金 1 +寺前 1 +封土 1 +封為 1 +封爵 1 +封穴 1 +封號 1 +封裝 1 +封路 1 +射失 1 +射程 1 +射箭 1 +射線 1 +射鵰 1 +將來 1 +將領 1 +專 1 +專任 1 +專制 1 +專吃 1 +專指 1 +專政 1 +專機 1 +專橫 1 +專欄 1 +專權 1 +專款 1 +專注 1 +專線 1 +專註 1 +專賣 1 +專長 1 +專項 1 +尊崇 1 +尊敬 1 +尊稱 1 +尋回 1 +尋親 1 +對上 1 +對付 1 +對撞 1 +對準 1 +對照 1 +對生 1 +對白 1 +對稱 1 +對立 1 +對簿公堂 1 +對話 1 +對面 1 +對飛 1 +導 1 +導入 1 +導出 1 +導向 1 +導彈 1 +導播 1 +導正 
1 +導體 1 +小人 1 +小兔 1 +小刀 1 +小南 1 +小國 1 +小小 1 +小島 1 +小息 1 +小數 1 +小書 1 +小欖 1 +小水鴨 1 +小河兒 1 +小津 1 +小浪底 1 +小澤 1 +小片 1 +小生 1 +小田急 1 +小知 1 +小石 1 +小童 1 +小舖 1 +小虎 1 +小街 1 +小輪 1 +小野 1 +小隊 1 +小順 1 +小顏 1 +小風 1 +小體 1 +少兒 1 +少將 1 +少年 1 +少懷 1 +少林 1 +少見 1 +少許 1 +少量 1 +尖端 1 +尖酸 1 +尖頂 1 +尚州 1 +尚德 1 +尚方 1 +尚書 1 +尤利烏斯 1 +尤勒 1 +尤指 1 +尤里卡 1 +就此 1 +就熟 1 +就職 1 +尷尬 1 +尹 1 +尹氏 1 +尼克貝 1 +尼古丁 1 +尼古拉 1 +尼奧爾德 1 +尼師今 1 +尼庫瑙 1 +尼歐斯 1 +尼比魯 1 +尼爾 1 +尼爾斯 1 +尼爾馬爾 1 +尾 1 +尾巴 1 +尾柄 1 +尾隨 1 +尾鰭 1 +尾龍 1 +局勢 1 +局間 1 +居家 1 +居所 1 +居留 1 +居禮 1 +屆滿 1 +屋 1 +屋大薇 1 +屋宇 1 +屋頂 1 +屍 1 +屍體 1 +屏山 1 +屏東 1 +屏風 1 +展品 1 +展望 1 +展貿 1 +屠村 1 +屠龍 1 +層壓 1 +層次 1 +層疊 1 +層級 1 +層面 1 +履行 1 +屬國 1 +屬於 1 +屬靈 1 +屯南 1 +山下 1 +山內 1 +山口 1 +山地 1 +山姆 1 +山峰 1 +山崖 1 +山手 1 +山月 1 +山村 1 +山楂 1 +山猿 1 +山田 1 +山胞 1 +山葉 1 +山陵 1 +山麓 1 +山龍眼 1 +岐女短 1 +岐阜 1 +岐陽 1 +岑 1 +岔江 1 +岡恩 1 +岡本 1 +岩屋 1 +岩心 1 +岩手 1 +岩漿 1 +岳 1 +岳泰 1 +岷江 1 +岸川 1 +岸賈 1 +岸邊 1 +峯崎 1 +峰倉 1 +峰景 1 +島內 1 +島國 1 +島蚺 1 +峽 1 +峽灣 1 +峽谷 1 +崇善 1 +崇尚 1 +崇敬 1 +崎頭 1 +崔 1 +崔陂 1 +崗 1 +崗斜 1 +崙頂 1 +崞縣 1 +崩坍 1 +崩潰 1 +嵩祝 1 +巔峰 1 +川南 1 +川村 1 +川邊 1 +州界 1 +州舞 1 +巡査 1 +巢 1 +工事 1 +工務 1 +工序 1 +工廠 1 +工會 1 +工法 1 +工潮 1 +左右神策軍 1 +左岸 1 +左拉 1 +左派 1 +左膀 1 +左轉 1 +巨作 1 +巨像 1 +巨冊 1 +巨型 1 +巨石 1 +巨賈 1 +巨野 1 +巫師 1 +差 1 +差分 1 +差別 1 +差勁 1 +差會 1 +己二胺 1 +己巳 1 +己酉 1 +已故 1 +已晚 1 +已死 1 +巴 1 +巴亞莫 1 +巴克 1 +巴克禮 1 +巴列姆 1 +巴列斯特爾 1 +巴卑爾 1 +巴喬 1 +巴城 1 +巴塞 1 +巴塞羅那 1 +巴塞隆拿 1 +巴塞隆納 1 +巴孛許諾 1 +巴巴克 1 +巴庫 1 +巴思缽 1 +巴恩斯 1 +巴拉克 1 +巴拉尼 1 +巴斯克 1 +巴斯德 1 +巴斯蒂亞 1 +巴比 1 +巴爾虎 1 +巴爾齊蒂斯 1 +巴納夫 1 +巴納巴 1 +巴羅爾 1 +巴英額 1 +巴莫鱷 1 +巴蒂斯塔 1 +巴西利卡 1 +巴西班讓 1 +巴諾 1 +巴賽 1 +巴赫 1 +巴頓 1 +市售 1 +市縣 1 +市轄 1 +市面 1 +布 1 +布伯 1 +布倫努斯 1 +布列塔尼 1 +布哈林 1 +布宜諾斯艾利斯 1 +布拉亞斯 1 +布拉德 1 +布政 1 +布料 1 +布林 1 +布氏奇非鯽 1 +布爾 1 +布置 1 +布萊姆 1 +布蘭特福德 1 +布蘭登堡 1 +布賴滕費爾德 1 +布里奇曼 1 +布里斯托 1 +布里斯班 1 +布雷克 1 +布雷西亞 1 +布魯克林 1 +布魯斯 1 +帆布 1 +帆船 1 +希伯來 1 +希克森 1 +希爾曼 1 +希特勒 1 +希皮奧內 1 +希鵬 1 +帕克 1 +帕內爾 1 +帕搏 1 +帕爾曼 1 +帕特羅克洛斯 1 +帕米爾 1 +帕納辛奈克斯 1 +帕納辛納克斯 1 +帕維亞 1 +帕薩迪納 1 +帕西奧利 1 +帕迪恩 1 +帕金森 1 +帝王 1 +帝都 1 +師團 1 +師徒 1 +師從 1 +師父 1 +師生 1 +席勒 1 +帳目 1 +帶上 1 +帶出 1 +帶子 1 +帶少 1 +帶水 1 +常住 1 +常勝 1 +常客 1 +常態 1 +常春 1 +常春藤 1 +常盛 1 +常識 1 
+常量 1 +常青 1 +常駐 1 +幀 1 +幅 1 +幅員遼闊 1 +幕 1 +幕府 1 +幕後 1 +幢 1 +幣原 1 +幪面 1 +幫主 1 +干王 1 +平反 1 +平和 1 +平地 1 +平坦 1 +平帝 1 +平常 1 +平手 1 +平日 1 +平林 1 +平沼 1 +平滑 1 +平臺 1 +平行 1 +平陵 1 +平陽 1 +年中 1 +年份 1 +年幼 1 +年息 1 +年第 1 +年老 1 +年號 1 +年資 1 +年青 1 +并行 1 +幸一 1 +幸好 1 +幸運 1 +幹 1 +幹事 1 +幹掉 1 +幹流 1 +幹道 1 +幼子 1 +幼年 1 +幼弟 1 +幼發拉底 1 +幼稚 1 +幼貓 1 +幼魚 1 +幼鯨 1 +幼鳥 1 +幽閣 1 +幾內亞 1 +幾十 1 +幾千 1 +幾多 1 +幾百 1 +床 1 +床鋪 1 +底冊 1 +底格里斯 1 +底比斯 1 +底片 1 +底特律 1 +底稿 1 +底質 1 +店家 1 +庚戌 1 +府中 1 +府城 1 +府尹 1 +府第 1 +度宗 1 +座位 1 +座右 1 +座座 1 +座椅 1 +座西 1 +座談 1 +庫伊瓦 1 +庫伊瓦涅米 1 +庫哈斯 1 +庫柏力克 1 +庫欣 1 +庫爾特 1 +庫賽 1 +庫赫莫 1 +庫迪尼奧 1 +庫頁 1 +庭園 1 +庭薺 1 +庭長 1 +康乃狄克 1 +康史 1 +康奈爾 1 +康子 1 +康寧 1 +康樂 1 +康濟鼐 1 +康福 1 +康科德 1 +康羅伊 1 +廂 1 +廉潔 1 +廚師 1 +廝守 1 +廟倉 1 +廟方 1 +廟橋 1 +廟鎮 1 +廢棄 1 +廢熱 1 +廢舊 1 +廣受 1 +廣大 1 +廣大興 1 +廣權 1 +廣澳 1 +廣稱 1 +廣金 1 +廬山 1 +廳局 1 +廳長 1 +延安 1 +延年益壽 1 +延音 1 +廷和 1 +廷尉 1 +建好 1 +建威 1 +建市 1 +建御名方 1 +建御雷 1 +建構 1 +建武 1 +建置 1 +建華 1 +建超 1 +廿五 1 +廿六 1 +弄到 1 +弄清 1 +弊案 1 +式微 1 +弓尾 1 +弓弦 1 +弓箭 1 +引來 1 +引咎 1 +引導 1 +引江 1 +引渡 1 +引申 1 +引資 1 +弗拉格斯塔夫 1 +弗朗丹 1 +弗朗恰 1 +弗朗索 1 +弗朗西絲 1 +弗格森 1 +弗洛伊德 1 +弗特 1 +弗蘭克 1 +弗里德里希 1 +弗里施 1 +弗里茨 1 +弘 1 +弘前 1 +弘宣 1 +弭兵 1 +弱 1 +張家口 1 +張氏 1 +強勁 1 +強化 1 +強拍 1 +強暴 1 +強權 1 +強求 1 +強盜 1 +強迫 1 +強韌 1 +強項 1 +彈劾 1 +彈塗魚 1 +彈撥 1 +彈盡糧絕 1 +彌撒 1 +彌補 1 +彌賽亞 1 +彎曲 1 +彗差 1 +彗星 1 +彙編 1 +彝 1 +形像 1 +形同 1 +形體 1 +彥根 1 +彥直 1 +彩 1 +彩畫 1 +彩繪 1 +彩雲 1 +彩鳳 1 +彪馬 1 +彭劉楊 1 +彭博倫 1 +彭古魯 1 +彭定康 1 +彭拿路 1 +彰信 1 +影帝 1 +影線 1 +影評 1 +影迷 1 +影集 1 +影音 1 +彷彿 1 +役 1 +彼特 1 +往上 1 +往世 1 +往日 1 +征西 1 +待到 1 +很小 1 +很強 1 +很忙 1 +很懶 1 +很是 1 +很深 1 +很遠 1 +很重 1 +很長 1 +律定 1 +後世 1 +後代 1 +後勤 1 +後南 1 +後周 1 +後宮 1 +後庄 1 +後悔 1 +後援 1 +後梁 1 +後段 1 +後母 1 +後稱 1 +後續 1 +後置 1 +後藤 1 +後送 1 +後防 1 +後齒 1 +徒具 1 +徒手 1 +得克薩斯 1 +得心應手 1 +得悉 1 +得獎 1 +得益 1 +從來 1 +從句 1 +從周 1 +從善如流 1 +從政 1 +御史 1 +御墨 1 +御宅 1 +御窯 1 +復健 1 +復合 1 +復寫 1 +復甦 1 +循道 1 +微型 1 +微妙 1 +微小 1 +微波 1 +微粒 1 +微粒體 1 +微觀 1 +微量 1 +徵兆 1 +徵招 1 +徵祥 1 +德勝 1 +德國牧羊犬 1 +德妃 1 +德宏德特 1 +德富卡 1 +德干 1 +德愛 1 +德懷 1 +德拉瓦 1 +德文 1 +德比 1 +德江 1 +德爾 1 +德爾加多 1 +德爾斐 1 +德甲 1 +德高 1 +德魯茲 1 +徽 1 +徽章 1 +心境 1 +心宿 1 +心意 1 +心智 1 +心目 1 +心肌 1 +必和必拓 1 +必走 1 +必需 1 +忍心 1 
+忍氣吞聲 1 +志 1 +志摩 1 +志明 1 +志道 1 +忘 1 +忘記 1 +忙 1 +忠 1 +忠於 1 +忠誠 1 +快上 1 +快捷 1 +快綫 1 +忽 1 +忽視 1 +怎 1 +怒 1 +怕 1 +思侯 1 +思成 1 +思維 1 +思考 1 +怡 1 +急劇 1 +急忙 1 +急救 1 +急於 1 +急流 1 +急症 1 +急行 1 +性向 1 +性命 1 +性情 1 +性腺 1 +怪 1 +怪圈 1 +怪聲 1 +恆 1 +恆大 1 +恆德 1 +恆河 1 +恐嚇 1 +恐懼 1 +恢豐 1 +恣意 1 +恤 1 +恨 1 +恩南伽 1 +恩慈 1 +恩秀 1 +恩贈 1 +恭子 1 +息率 1 +悉心 1 +悉達多 1 +悟到 1 +悟空 1 +患 1 +患得患失 1 +患病 1 +您 1 +悲傷 1 +悲劇 1 +悲嘆 1 +悲慘 1 +悲痛 1 +悲痛欲絕 1 +悲鴻 1 +悼念 1 +情 1 +情不自禁 1 +情人 1 +情勢 1 +情愁 1 +情愛 1 +情景 1 +情結 1 +情誼 1 +情資 1 +情陷 1 +情願 1 +惇曧 1 +惟 1 +惠亞 1 +惠梨香 1 +惠特蘭 1 +惡 1 +惡人 1 +惡化 1 +惡夢 1 +惡性 1 +惡搞 1 +惡臭 1 +惡靈 1 +惡魔 1 +想必 1 +想起 1 +愈加 1 +愈大 1 +愈高 1 +愉快 1 +意圖 1 +意念 1 +意料 1 +意甲 1 +意魔 1 +愙威 1 +愚園 1 +愚昧 1 +愛好 1 +愛娜 1 +愛娜茲薇 1 +愛思德 1 +愛恨 1 +愛意 1 +愛慕 1 +愛明內斯庫 1 +愛樂 1 +愛河 1 +愛莎尼亞 1 +愛迪生 1 +愛默生 1 +感冒 1 +感謝 1 +慈湖 1 +慈濟 1 +慌亂 1 +慎 1 +慎太郎 1 +慕容 1 +慕肯 1 +慘叫 1 +慘重 1 +慚愧 1 +慢行 1 +慢駛 1 +慧嫻 1 +慰安 1 +慶 1 +慶典 1 +慶曆 1 +慶貽 1 +慶黎 1 +慷慨 1 +憂 1 +憂憤 1 +憲政 1 +憲民 1 +憲法 1 +憶蓮 1 +懂 1 +應付 1 +應允 1 +應屆 1 +應戰 1 +應昌 1 +應當 1 +應許 1 +應邀 1 +懲罰 1 +懶爪龍 1 +懷 1 +懷仁 1 +懷克里夫 1 +懷念 1 +懷慶 1 +懷抱 1 +懷水 1 +懷聖 1 +懸掛 1 +懼高 1 +懿 1 +戀人 1 +戀屍 1 +戀童 1 +戈德曼 1 +戈爾 1 +戈登 1 +戈矛 1 +戈蘭 1 +成事 1 +成仁 1 +成化 1 +成名 1 +成品 1 +成套 1 +成對 1 +成形 1 +成梁 1 +成行 1 +成語 1 +我國 1 +截 1 +截然不同 1 +截至 1 +截頜鯉 1 +戰事 1 +戰力 1 +戰勝 1 +戰地 1 +戰平 1 +戰情 1 +戰船 1 +戲子 1 +戲曲 1 +戲法 1 +戲碼 1 +戲謔 1 +戲院 1 +戴上 1 +戴克里先 1 +戴斯德 1 +戴爾馬 1 +戴維斯 1 +戴蒙 1 +戴頓 1 +戶田 1 +戶籍 1 +房東 1 +所為 1 +所長 1 +手上 1 +手工 1 +手感 1 +手抄 1 +手指 1 +手提 1 +手槍 1 +手稿 1 +手筆 1 +手腳 1 +手邊 1 +手風 1 +才子 1 +才是 1 +才智 1 +扎什倫布 1 +打亂 1 +打人 1 +打包 1 +打撈 1 +打死 1 +打水 1 +打牌 1 +打碎 1 +打造 1 +打響 1 +扔出 1 +托倫 1 +托加下 1 +托洛洛 1 +托盤 1 +托米 1 +托茂 1 +扣上 1 +批次 1 +扼止 1 +找來 1 +找續 1 +承天 1 +承德 1 +承接 1 +承斌 1 +承租 1 +技師 1 +技戰術 1 +技法 1 +抑制 1 +抑鬱 1 +抒解 1 +抓到 1 +投交 1 +投奔 1 +投標 1 +投球 1 +投身 1 +投靠 1 +抗大 1 +抗拒 1 +抗衡 1 +抗體 1 +折射 1 +折斷 1 +折衷 1 +抨擊 1 +披覆 1 +披頭士 1 +抬昇 1 +抱 1 +抱持 1 +抵受 1 +抵禦 1 +押韻 1 +抽檢 1 +抽煙 1 +抽象 1 +抽走 1 +拆分 1 +拆卸 1 +拆掉 1 +拆遷 1 +拉 1 +拉什沃思 1 +拉卜楞 1 +拉塞爾 1 +拉多加 1 +拉奏 1 +拉姆齊 1 +拉差諾 1 +拉布 1 +拉彼魯茲 1 +拉日色布 1 +拉林 1 +拉森 1 +拉爾夫 1 +拉特蘭 1 +拉珀斯維爾 1 +拉瑙 1 +拉籌伯 1 +拉美西斯 1 +拉薩 1 +拉西拉 1 +拉赫曼尼諾夫 1 +拋棄 1 +拋物 1 +拍 1 
+拍照 1 +拍賣 1 +拒不 1 +拓務 1 +拓建 1 +拓撲 1 +拔刀 1 +拖進 1 +拖鞋 1 +拙劣 1 +招 1 +招潮蟹 1 +招生 1 +招聘 1 +招降 1 +拜仁慕尼黑 1 +拜拜 1 +括弧 1 +拱廊 1 +拱橋 1 +拳一 1 +拳擊 1 +拳賽 1 +拷問 1 +拼寫 1 +拾糞 1 +拿來 1 +持久 1 +持球 1 +指使 1 +指標 1 +指派 1 +指稱 1 +指責 1 +挑選 1 +挖 1 +挖子 1 +挖掘 1 +挪動 1 +挪用 1 +振 1 +振動 1 +振幅 1 +振林 1 +挹江 1 +挺身而出 1 +挽回 1 +挾持 1 +捉弄 1 +捉拿 1 +捉襟見肘 1 +捍衛 1 +捐 1 +捐款 1 +捐獻 1 +捕撈 1 +捕殺 1 +捕獵 1 +捕魚 1 +捕鼠 1 +捲入 1 +捷徑 1 +授勳 1 +授意 1 +授權 1 +授與 1 +掉頭 1 +掌 1 +掌控 1 +掌摑 1 +掌權 1 +掌鏡 1 +排場 1 +排外 1 +排序 1 +掙扎 1 +掛 1 +掛果 1 +掛牌 1 +掛鉤 1 +掠奪 1 +採 1 +採信 1 +採摘 1 +採樣 1 +採納 1 +採購 1 +採集 1 +採食 1 +探明 1 +探望 1 +探求 1 +探究 1 +探險 1 +接到 1 +接力 1 +接班 1 +接納 1 +接聽 1 +接見 1 +接辦 1 +接送 1 +接連 1 +控告 1 +控訴 1 +推介 1 +推免生 1 +推前 1 +推力 1 +推導 1 +推斷 1 +推測 1 +推演 1 +推特 1 +推理 1 +推舉 1 +推論 1 +推遲 1 +掩 1 +掩蓋 1 +描摹 1 +描繪 1 +提前 1 +提問 1 +提子 1 +提康德羅加 1 +提拔 1 +提攜 1 +提昇 1 +提煉 1 +提督 1 +提籃 1 +提醒 1 +插手 1 +插曲 1 +揚言 1 +換成 1 +換算 1 +握帶 1 +握持 1 +揭曉 1 +揭發 1 +揭開 1 +揮舞 1 +援 1 +援助 1 +援外 1 +援引 1 +援手 1 +援救 1 +搜尋 1 +搜狐 1 +搜羅 1 +搜集 1 +搞垮 1 +搞錯 1 +搬動 1 +搬往 1 +搬移 1 +搬遷 1 +搭乘 1 +搭配 1 +搶 1 +搶先 1 +搶劫 1 +搶奪 1 +搶救 1 +摒棄 1 +摔 1 +摘下 1 +摘星 1 +摘錄 1 +摧毀 1 +摩加迪沙 1 +摩天 1 +摩崖 1 +摩托 1 +摩擦 1 +摩爾多瓦 1 +摩登 1 +摩納哥 1 +摩西 1 +摯友 1 +摸摸 1 +撒拉 1 +撒營盤 1 +撞入 1 +撞死 1 +撤回 1 +撤職 1 +撤退 1 +撤除 1 +撥 1 +撥出 1 +撥號 1 +撫養 1 +播種 1 +撮合 1 +撰述 1 +撲克 1 +撿 1 +撿起 1 +擁 1 +擁堵 1 +擁戴 1 +擁擠 1 +擁護 1 +擂台 1 +擊中 1 +擊劍 1 +擊斃 1 +擊毀 1 +擊潰 1 +擊破 1 +擋住 1 +操 1 +操控 1 +操縱 1 +擒拿 1 +擔憂 1 +擔竿 1 +擔綱 1 +據傳 1 +據此 1 +據稱 1 +據點 1 +擠塞 1 +擠壓 1 +擠奶 1 +擠眉弄眼 1 +擠迫 1 +擢升 1 +擬 1 +擬桿菌 1 +擬訂 1 +擬議 1 +擴散 1 +擴編 1 +擺弄 1 +擺渡 1 +擾亂 1 +攀爬 1 +攔截 1 +攝像 1 +攝取 1 +攪拌 1 +支取 1 +支廳 1 +支派 1 +支那 1 +支隊 1 +收場 1 +收容 1 +收市 1 +收支 1 +收生 1 +收益 1 +收租 1 +收緊 1 +收聽 1 +收買 1 +收費 1 +收養 1 +攸之 1 +改作 1 +改屬 1 +改投 1 +改採 1 +改換 1 +改派 1 +改發 1 +改穿 1 +改組 1 +改選 1 +改隸 1 +攻下 1 +攻勢 1 +攻堅 1 +攻方 1 +攻殺 1 +攻訐 1 +攻讀 1 +放任 1 +放入 1 +放出 1 +放到 1 +放大 1 +放榜 1 +放牧 1 +放緩 1 +放送 1 +放逐 1 +放開 1 +放鬆 1 +政團 1 +政委 1 +政局 1 +政廳 1 +政敵 1 +政樞 1 +政法 1 +政爭 1 +政界 1 +故郷 1 +效尤 1 +效能 1 +敏銳 1 +救人 1 +救出 1 +救助 1 +救國 1 +救援 1 +救星 1 +救災 1 +救生 1 +救贖 1 +敕 1 +敕令 1 +敕書 1 +敗 1 +敗局 1 +敗死 1 +敗瓦 1 +敗退 1 +教務 1 +教士 1 +教室 1 +教席 1 +教材 1 +教案 1 +教科 1 +教籍 1 +教總 
1 +教義 1 +教職員 1 +散射 1 +敦 1 +敦煌 1 +敬仰 1 +敬堯 1 +敬請 1 +敲擊 1 +敲訂 1 +整 1 +整塊 1 +整所 1 +整架 1 +整片 1 +整篇 1 +整軍 1 +整顆 1 +整齊 1 +敵兵 1 +敵方 1 +數以千計 1 +數值 1 +數十億 1 +數十萬 1 +數澤 1 +數百 1 +數碼 1 +數萬 1 +數論 1 +文哲 1 +文姬 1 +文岳 1 +文巨 1 +文德 1 +文摘 1 +文政 1 +文書 1 +文本 1 +文楷 1 +文武 1 +文法 1 +文清 1 +文職 1 +文賢 1 +文集 1 +文飾曲口魚 1 +文體 1 +文體教 1 +斑塊 1 +斑點 1 +斗貴子 1 +料 1 +斜 1 +斜坡 1 +斥教 1 +斬落 1 +斯佩克特 1 +斯凱勒 1 +斯哥特 1 +斯坦利 1 +斯坦福 1 +斯坦頓 1 +斯基龍 1 +斯塔茨門 1 +斯尼夫魯 1 +斯德哥爾摩 1 +斯托克 1 +斯氏亞冠龍 1 +斯洛伐克 1 +斯洛特 1 +斯特奇斯 1 +斯特萊默 1 +斯瓦爾恩 1 +斯科特 1 +斯維亞托斯拉夫 1 +斯里賽拉姆古德姆德瓦斯塔納姆 1 +新任 1 +新修 1 +新址 1 +新埔 1 +新太郎 1 +新奧爾良 1 +新字 1 +新寧 1 +新屋 1 +新巴 1 +新思 1 +新昌 1 +新明 1 +新春 1 +新月 1 +新核 1 +新榮 1 +新民 1 +新浪 1 +新版 1 +新生 1 +新秀 1 +新篇 1 +新編 1 +新罕布夏 1 +新罕布希爾 1 +新義 1 +新舊 1 +新製 1 +新開 1 +新飛 1 +新馬 1 +新高 1 +新鴻基 1 +新黨 1 +斷後 1 +斷盡 1 +斷言 1 +方丈 1 +方尖 1 +方正 1 +方田 1 +方石 1 +方程 1 +方蓋 1 +方蟹 1 +於維西 1 +施奈德 1 +施文 1 +施瓦本 1 +施用 1 +施韋比施哈爾 1 +旅 1 +旅居 1 +旅程 1 +旋渦 1 +旋轉 1 +族雄 1 +族頭 1 +旗艦 1 +旗面 1 +既得 1 +既是 1 +既然 1 +日出 1 +日向 1 +日夜 1 +日子 1 +日日 1 +日照 1 +日用 1 +日落 1 +日誌 1 +日賜 1 +旦增 1 +早有 1 +早餐 1 +旭 1 +旱災 1 +旻寧 1 +昆丁 1 +昆蟲 1 +昌吉 1 +昌都 1 +明中 1 +明亞 1 +明亮 1 +明代 1 +明宗 1 +明尼蘇達 1 +明憲 1 +明昌 1 +明智 1 +明正 1 +明潭 1 +明白 1 +明碁 1 +明視 1 +易卜拉欣 1 +易守 1 +易幟 1 +易斯 1 +易水 1 +易燃 1 +易經 1 +昔蘭尼 1 +星團 1 +星塵 1 +星展 1 +星崎 1 +星系 1 +映像 1 +春 1 +春丕 1 +春季 1 +春日井 1 +春會 1 +春田 1 +春節 1 +春緋 1 +春耕 1 +昨日 1 +昭侯 1 +昭儀 1 +昭宗 1 +昭禮 1 +昭通 1 +是年 1 +是方 1 +是次 1 +時事 1 +時份 1 +時值 1 +時光 1 +時刻 1 +時報 1 +時弊 1 +時稱 1 +時舉 1 +時針 1 +晃動 1 +晉 1 +晉北 1 +晉哲 1 +晉江 1 +晉級 1 +晒乾 1 +晨間 1 +普世 1 +普什圖 1 +普伊瑪諾娃 1 +普利茅斯 1 +普朗克 1 +普爾塔龍 1 +景泰 1 +晴神 1 +晶 1 +晶瑩 1 +晶閘 1 +智伯 1 +智利 1 +智趣 1 +暑期 1 +暖 1 +暗中 1 +暗喻 1 +暗影 1 +暗房 1 +暗指 1 +暗礁 1 +暗紅 1 +暗號 1 +暫 1 +暫別 1 +暫無 1 +暮光 1 +暱稱 1 +暴亂 1 +暴斂 1 +暴死 1 +暴風雪 1 +暹羅 1 +曄之 1 +曉彬 1 +曉得 1 +曉聲 1 +曉舟 1 +曖昧 1 +曬相 1 +曬衣 1 +曲張 1 +曲率 1 +曲目 1 +曲線 1 +曲藝 1 +曲阜 1 +曲頜形翼龍 1 +更低 1 +更佳 1 +更大 1 +更審 1 +更小 1 +更強 1 +更快 1 +更新世 1 +更是 1 +更硬 1 +更衣 1 +更輕 1 +更長 1 +曷懶甸 1 +書本 1 +書裡 1 +書迷 1 +書面 1 +書香世家 1 +曹家 1 +曹甸 1 +曹記 1 +曼切華 1 +曼哈頓 1 +曼城 1 +曼寧 1 +曼徹斯特 1 +曼成 1 +曼斯菲爾德 1 +曼海姆 1 +曼涅托 1 +曼玉 1 +曼科 1 +曾任 1 +曾孫 1 +曾愛 1 +曾祖父母 1 +替人 1 +最內 1 +最前 1 +最受 1 +最外 1 +最強 1 
+最旺 1 +最最 1 +最末 1 +最東 1 +最純 1 +最遠 1 +會上 1 +會址 1 +會師 1 +會戰 1 +會所 1 +會晤 1 +會章 1 +會見 1 +會計 1 +月色 1 +月薪 1 +有份 1 +有別 1 +有力 1 +有名 1 +有愛 1 +有方 1 +有期 1 +有染 1 +有條不紊 1 +有異 1 +有病 1 +有稱 1 +有花 1 +有點 1 +服刑 1 +朔 1 +朗豪 1 +朗頓 1 +望族 1 +朝下 1 +朝元 1 +朝政 1 +朝散 1 +朝東 1 +朝聖 1 +朝覲 1 +朝貢 1 +朝陽 1 +期刊 1 +木中 1 +木乃伊 1 +木刻 1 +木卡姆 1 +木城 1 +木尼 1 +木屋 1 +木工 1 +木戶 1 +木斯塘 1 +木村 1 +木櫾 1 +木蘭 1 +木造 1 +未入 1 +未敢 1 +未有 1 +未深 1 +未滿 1 +末端 1 +本劇 1 +本名 1 +本城 1 +本始 1 +本季 1 +本島 1 +本市 1 +本德 1 +本書 1 +本營 1 +本目 1 +本省 1 +本社 1 +本縣 1 +本能 1 +本著 1 +本郡 1 +本部 1 +本鄉 1 +本集 1 +本領 1 +札幌 1 +朱里 1 +朴次茅斯 1 +杉並 1 +李察 1 +杏子 1 +材 1 +材官 1 +材質 1 +村旁 1 +杖責 1 +杜乃爾 1 +杜伊 1 +杜利華 1 +杜成 1 +杜浦 1 +杜甫 1 +杜蘭戈維多利亞 1 +杜隆坦 1 +束 1 +杯賽 1 +杰仔 1 +東主 1 +東加 1 +東勝 1 +東南亞 1 +東坡 1 +東姑 1 +東宮 1 +東尼 1 +東岸 1 +東巡 1 +東急 1 +東支 1 +東昇 1 +東映 1 +東桑 1 +東條 1 +東武 1 +東涌 1 +東渡 1 +東直 1 +東站 1 +東興 1 +東華 1 +東西向 1 +東距 1 +東道 1 +東邊 1 +東郊 1 +東鄉 1 +東鐵 1 +東隧 1 +東風 1 +松下 1 +松坂 1 +松山 1 +松島 1 +松州 1 +松翔 1 +松花 1 +松鼠 1 +板 1 +板式 1 +林克 1 +林地 1 +林場 1 +林業 1 +林檎 1 +林翼 1 +林胡 1 +果然 1 +果真 1 +果酒 1 +枝葉 1 +架次 1 +枸杞 1 +柏 1 +柏加 1 +柏村 1 +柏松 1 +柏臣 1 +染手 1 +染病 1 +柔道 1 +柚木 1 +柝聲 1 +查找 1 +查普曼 1 +查氏 1 +查爾頓 1 +查理曼 1 +柬 1 +柬埔寨 1 +柯克伍德 1 +柯林斯 1 +柯爾 1 +柯爾克孜 1 +柯爾貝爾 1 +柱銘 1 +柳川 1 +柳州 1 +柳德米拉 1 +柳葉魚 1 +柴電 1 +柿本 1 +栗橋 1 +校呔 1 +校簿 1 +校門 1 +栩栩如生 1 +株 1 +株式 1 +核孔 1 +核實 1 +核工 1 +核彈 1 +核發 1 +核研 1 +核算 1 +根 1 +根培烏孜 1 +根深柢固 1 +根生 1 +根莖 1 +根部 1 +格丁尼亞 1 +格仔 1 +格但斯克 1 +格來 1 +格勞庇烏 1 +格勞賓登 1 +格奧爾格 1 +格子 1 +格式塔 1 +格拉博夫斯基 1 +格拉漢姆 1 +格林威治 1 +格林布希 1 +格羅先 1 +格羅夫納 1 +格羅希 1 +格蘭特 1 +格陵蘭 1 +格魯 1 +格魯瓊茲與姆瓦瓦 1 +桂陵 1 +桃子 1 +框架 1 +框線 1 +案例 1 +案達羅 1 +桐生 1 +桑德威斯狸藻 1 +桑托斯 1 +桓子 1 +桓玄 1 +梁贊諾夫 1 +梁龍 1 +梅園 1 +梅塔 1 +梅塔拉 1 +梅帕器 1 +梅里納 1 +梓里 1 +條款 1 +條紋 1 +梧州 1 +梨花 1 +梭羅 1 +梯隊 1 +梳 1 +梳頜翼龍 1 +梵安 1 +棉條 1 +棋局 1 +棋盤 1 +棋聖 1 +棋院 1 +棋類 1 +棒 1 +棒錘樹 1 +棕色 1 +棕褐 1 +森德靈 1 +棲地 1 +棲身 1 +棵 1 +植株 1 +椎名 1 +椰林 1 +楓樹 1 +楚克 1 +楚瑜 1 +楚紅 1 +楠桂 1 +楠溪 1 +業主 1 +業餘 1 +極北 1 +極區 1 +極少 1 +極為 1 +極矮 1 +極長 1 +極闊 1 +極限 1 +楷書 1 +楷模 1 +概要 1 +榆林 1 +榔頭 1 +榕樹 1 +榜羅 1 +榨出 1 +榫眼 1 +榮廷 1 +榮洲 1 +榮茂 1 +榴彈 1 +構思 1 +構造 1 +槍尖 1 +槍尾 1 +槍殺 1 +槍術 1 +槳 1 +樂園 1 +樂安 1 +樂官 1 +樂山 1 +樂師 1 +樂手 1 +樂敏錠 1 
+樂樂 1 +樂活 1 +樂翠 1 +樂觀 1 +樂趣 1 +樓宇 1 +樓層 1 +樓底 1 +樓煩 1 +樓盤 1 +樓面 1 +樓高 1 +標 1 +標售 1 +標志 1 +標明 1 +標有 1 +標示 1 +標籤 1 +標記 1 +標註 1 +標高 1 +樞密 1 +模里西斯 1 +樣 1 +樣品 1 +樣式 1 +樣貌 1 +樸實 1 +樹上 1 +樹幹 1 +樹枝 1 +橈腳 1 +橋上 1 +橋樑 1 +橋面 1 +機上 1 +機位 1 +機型 1 +機密 1 +機師 1 +機床 1 +機敏 1 +機械 1 +機理 1 +機種 1 +機能 1 +機製 1 +機遇 1 +橡樹 1 +橡樹龍 1 +橢 1 +橫 1 +橫帶 1 +橫徵 1 +橫渡 1 +橫線 1 +檔案 1 +檔次 1 +檜山 1 +檢驗 1 +檨仔林 1 +檳榔 1 +檸七 1 +櫃 1 +櫃檯 1 +櫟社 1 +欄目 1 +權氏 1 +權限 1 +次席 1 +次月 1 +次生 1 +次程 1 +欣快 1 +欺 1 +欽 1 +款式 1 +歆 1 +歌人 1 +歌壇 1 +歌星 1 +歌舞 1 +歌詞 1 +歌頌 1 +歐律狄刻 1 +歐斯巴特 1 +歐盟 1 +歐羅巴 1 +歐青 1 +歐麥爾 1 +歡 1 +歡慶 1 +歡樂 1 +正值 1 +正傳 1 +正夫 1 +正子 1 +正宇 1 +正巧 1 +正平 1 +正比 1 +正派 1 +正版 1 +正當 1 +正經 1 +正負粒子 1 +正配 1 +正陽 1 +此事 1 +此地 1 +此夢 1 +此書 1 +此樓 1 +此橋 1 +此片 1 +此處 1 +此語 1 +此起彼落 1 +此路 1 +此陵 1 +此項 1 +此魚 1 +步伐 1 +步蟾 1 +步行 1 +步驟 1 +武克希 1 +武力 1 +武威 1 +武帝 1 +武廟 1 +武廠 1 +武德 1 +武打 1 +武王 1 +武略 1 +武皇 1 +武者 1 +武藏 1 +歩 1 +歲月 1 +歷代 1 +歷來 1 +歷屬 1 +歷程 1 +歸來 1 +歸入 1 +歸到 1 +歸功 1 +歸咎 1 +歸案 1 +歸還 1 +歸附 1 +死刑 1 +死因 1 +死地 1 +死戰 1 +死期 1 +死板 1 +死狀 1 +死而復生 1 +死黨 1 +殉教 1 +殉爆 1 +殉職 1 +殊榮 1 +殘疾 1 +殘破 1 +殘遺 1 +殘部 1 +殲滅 1 +殺人 1 +殺手 1 +殺機 1 +殼層 1 +殼體 1 +殿堂 1 +毀壞 1 +毀容 1 +毅 1 +毅仁 1 +毅然 1 +母會 1 +母校 1 +母狼 1 +母猴 1 +母艦 1 +母語 1 +母貓 1 +毎年 1 +每元 1 +每座 1 +每戶 1 +每所 1 +每枚 1 +每每 1 +每股 1 +每邊 1 +每集 1 +每鼎 1 +毒​​物 1 +毒品 1 +毒死 1 +毒癮 1 +毒舌 1 +毓林 1 +毓楓 1 +毓芳 1 +比亞迪 1 +比亞韋斯托克 1 +比利 1 +比利牛斯 1 +比哈爾 1 +比喻 1 +比得哥什 1 +比方 1 +比武 1 +比薩 1 +比袍 1 +比褂 1 +毛色 1 +毛髮 1 +毫安 1 +毫無 1 +毯子 1 +氈幕 1 +民事 1 +民俗 1 +民力 1 +民居 1 +民工 1 +民心 1 +民意 1 +民房 1 +民柬 1 +民權 1 +民法 1 +民盟 1 +民答那峨 1 +民航 1 +民英 1 +民謠 1 +民豐 1 +民選 1 +民鐸 1 +民防 1 +氘 1 +氚 1 +氣息 1 +氣態 1 +氣憤 1 +氣旋 1 +氣槍 1 +氣死 1 +氣溫 1 +氣燄 1 +氣胸 1 +氣象 1 +氦 1 +氧化鐵 1 +氨基酸 1 +氫 1 +氫化氦 1 +氫氣 1 +氫鍵 1 +氮 1 +氮素 1 +氯化 1 +氯化氫 1 +氯化銠 1 +氯化鋁 1 +氯雷他定 1 +水世 1 +水份 1 +水圈 1 +水壓 1 +水床 1 +水扁 1 +水攻 1 +水晶 1 +水汽 1 +水流 1 +水火不容 1 +水球 1 +水產 1 +水療 1 +水翼 1 +水能 1 +水警 1 +水面 1 +水鳥 1 +永久 1 +永元 1 +永升 1 +永吉 1 +永和 1 +永壽 1 +永平 1 +永成 1 +永昌 1 +永樂 1 +永樂環 1 +永權 1 +永續 1 +永輝 1 +永靖 1 +汁液 1 +求 1 +求偶 1 +求出 1 +求助 1 +求問 1 +求婚 1 +求情 1 +求援 1 +求籤 1 +求醫 1 +汝寧 1 +汞柱 1 +江協 1 +江口 1 +江浙 1 +江海 1 +江源 1 +江漢 1 +江灣 1 +江谷 1 +江都 1 
+江閣 1 +江魚 1 +池塘 1 +池田 1 +污損 1 +污點 1 +汪 1 +汪達 1 +汪達爾 1 +汲及 1 +決意 1 +決擇 1 +決然 1 +決裂 1 +汽油 1 +汽船 1 +沃奎茲 1 +沃季采 1 +沃州 1 +沃思 1 +沃斯托克 1 +沃爾 1 +沃羅涅日 1 +沈氏 1 +沉水 1 +沉迷 1 +沉重 1 +沉降 1 +沒能 1 +沒落 1 +沒錯 1 +沖之 1 +沖片 1 +沖走 1 +沙丘 1 +沙依 1 +沙崙 1 +沙巴 1 +沙普爾 1 +沙梁伐 1 +沙池 1 +沙洛蒙 1 +沙漠 1 +沙瓦納 1 +沙田 1 +沙畹 1 +沙蠶 1 +沙迦罕 1 +沙邦 1 +沙里亞 1 +河卡 1 +河圖 1 +河岸 1 +河心 1 +河段 1 +河漫 1 +河西 1 +油煙 1 +油田 1 +油菜 1 +油量 1 +油電 1 +治中 1 +治勲 1 +治勳 1 +治喪 1 +治國 1 +治學 1 +治水 1 +治理 1 +治軍 1 +沼 1 +沽渚 1 +沾解 1 +沿 1 +沿線 1 +沿襲 1 +沿途 1 +泉 1 +法令 1 +法師 1 +法拉利 1 +法拉龍 1 +法政 1 +法斯塔夫 1 +法格拿 1 +法比恩 1 +法海 1 +法登 1 +法羅 1 +法老 1 +法蘭克尼亞 1 +法西斯 1 +法輪 1 +泛濫 1 +泠 1 +波包 1 +波卡特洛 1 +波及 1 +波因 1 +波圖 1 +波城 1 +波塞冬 1 +波形 1 +波恩 1 +波折 1 +波普 1 +波森 1 +波爾 1 +波特威瑟 1 +波特蘭 1 +波瓦坦 1 +波的尼亞 1 +波西斯 1 +波錠 1 +波黑 1 +泥土 1 +泥潭 1 +注資 1 +泰 1 +泰共 1 +泰勒 1 +泰北 1 +泰始 1 +泰姬 1 +泰姬瑪哈 1 +泰州 1 +泰曾 1 +泰然 1 +泰琳達 1 +泰米爾納德 1 +泰興 1 +泳屋 1 +泳灘 1 +洋介 1 +洗劫 1 +洗衣 1 +洛佩斯 1 +洛加尼斯 1 +洛城 1 +洛夫喬伊 1 +洛夫森 1 +洛布尼亞 1 +洛恩 1 +洛書 1 +洛珊 1 +洛維爾 1 +洛茲 1 +洛雷托 1 +洞子 1 +洞穴 1 +洞窟 1 +津 1 +津貼 1 +洩慾 1 +洩漏 1 +洪堡 1 +洪家 1 +洪橋 1 +洵 1 +洵美 1 +活出 1 +活化 1 +活埋 1 +活水 1 +活潑 1 +活用 1 +活躍 1 +活靈活現 1 +派對 1 +派往 1 +流 1 +流下 1 +流亡 1 +流入 1 +流出 1 +流嶼 1 +流放 1 +流星 1 +流標 1 +流民 1 +流水 1 +流浪 1 +流產 1 +流程 1 +流言 1 +流逝 1 +流露 1 +浚稽 1 +浦市 1 +浦那 1 +浦鎮 1 +浪 1 +浪漫 1 +浪潮 1 +浪費 1 +浪跡 1 +浮 1 +浮動 1 +浴場 1 +海事 1 +海光 1 +海因茨 1 +海地 1 +海峰 1 +海布隆 1 +海平 1 +海廷 1 +海德克 1 +海怡 1 +海昌 1 +海景 1 +海淀 1 +海港 1 +海濱 1 +海灘 1 +海爾賽 1 +海神 1 +海秀 1 +海老名 1 +海航 1 +海藍 1 +海螺 1 +海豐 1 +海陸 1 +海風 1 +海鷗 1 +浸染 1 +浸泡 1 +涅爾皮奇耶 1 +涇波 1 +涇陽 1 +消極 1 +消耗 1 +消退 1 +消除 1 +涉世 1 +涉嫌 1 +涉足 1 +涪江 1 +涮煮 1 +液 1 +液化 1 +液壓 1 +涵蓋 1 +淄川 1 +淑妃 1 +淑怡 1 +淘寶 1 +淘金 1 +淡 1 +淡定 1 +淡色 1 +淨土 1 +淪 1 +淪落 1 +淪陷 1 +淫蕩 1 +淮南 1 +淮許 1 +深受 1 +深埋 1 +深層 1 +深度 1 +深感 1 +深有 1 +深海 1 +深港 1 +深溪 1 +深紅 1 +深綠 1 +深色 1 +深處 1 +深造 1 +淵源 1 +混 1 +混亂 1 +混凝 1 +混沌 1 +混為一談 1 +混燃 1 +淹浸 1 +淺 1 +淺水 1 +淺綠 1 +添丁 1 +清償 1 +清凈 1 +清單 1 +清帝 1 +清拆 1 +清教 1 +清文 1 +清明 1 +清潔 1 +清理 1 +清道 1 +清遠 1 +清還 1 +清鄉 1 +減低 1 +減刑 1 +減小 1 +減退 1 +渠子 1 +渡 1 +渣打 1 +渤海 1 +測繪 1 +渭州 1 +港交 1 +港區 1 +港府 1 +渴求 1 +游 1 +游標 1 +游說 1 +渾 1 +湄洲 1 +湖上 1 +湖人 1 +湖名 1 +湖畔 1 +湘南 1 +湘西 1 +湘陰 1 
+湛恩 1 +湧現 1 +湮滅 1 +湯姆萊利 1 +湯料 1 +源於 1 +源田 1 +準 1 +準基 1 +準將 1 +準確 1 +溝 1 +溝壑 1 +溝齒鼩 1 +溢漏 1 +溪 1 +溪水 1 +溪美 1 +溪鱂 1 +溫哥華 1 +溫坡 1 +溫布萊 1 +溫布頓 1 +溫徹斯特 1 +溫斯頓 1 +溫柔 1 +溫特夸特斯 1 +溫特斯 1 +溶劑 1 +溶氣 1 +滅 1 +滑板 1 +滑稽 1 +滑鼠 1 +滕氏 1 +滙業 1 +滬江 1 +滯洪 1 +滲出 1 +滴下 1 +滾動 1 +滾石 1 +滿意 1 +滿清 1 +滿載 1 +漁村 1 +漁梁 1 +漁船 1 +漂浮 1 +漆器 1 +演 1 +演成 1 +演戲 1 +演技 1 +演繹 1 +演義 1 +演講 1 +漢中 1 +漢娜 1 +漢字 1 +漢桓 1 +漫漶 1 +漫長 1 +漬 1 +漱芳 1 +漲幅 1 +漸變 1 +漸趨 1 +潑 1 +潔瑩 1 +潘丘 1 +潘恩 1 +潛伏 1 +潛力 1 +潛望 1 +潛水 1 +潛游 1 +潟湖 1 +潢川 1 +潭村 1 +潭東 1 +潭陽 1 +潰散 1 +澀谷 1 +澤尻 1 +激勵 1 +激發 1 +激素 1 +激進 1 +濁 1 +濃 1 +濃厚 1 +濃煙 1 +濕地 1 +濟 1 +濟世 1 +濟科 1 +濟邦 1 +濤 1 +濫用 1 +濱海 1 +濾掉 1 +瀏陽 1 +瀕危 1 +瀘溪 1 +瀝泗 1 +瀟洒 1 +火上加薪 1 +火候 1 +火喉 1 +火山 1 +火心 1 +火掌 1 +火炮 1 +火爆 1 +火鍋 1 +灰棕 1 +灰雲 1 +灰黑 1 +災禍 1 +炎熱 1 +炙手可熱 1 +炭疽 1 +炮 1 +炸彈 1 +炸死 1 +炸毀 1 +炸糕 1 +為時 1 +烈格司 1 +烏代 1 +烏來杜鵑 1 +烏孜別克 1 +烏宗哈珊 1 +烏干達 1 +烏德特 1 +烏扎 1 +烏拉圭 1 +烏普薩拉 1 +烏腳 1 +烏魯木齊 1 +烴 1 +烹煮 1 +焊接 1 +焗豆 1 +焚 1 +焚屍 1 +焚燒 1 +焜耀 1 +無俚頭 1 +無危 1 +無厭 1 +無子 1 +無家可歸 1 +無心 1 +無忌 1 +無所不能 1 +無暇 1 +無有 1 +無機 1 +無氧 1 +無水氯化鋁 1 +無派 1 +無產 1 +無疑 1 +無盡 1 +無罪 1 +無能為力 1 +無與倫比 1 +無色 1 +無處 1 +無視 1 +無誤 1 +無過 1 +無量壽 1 +無關緊要 1 +無限 1 +無雙 1 +無頭 1 +無點 1 +無齒龍 1 +焦尼 1 +焦點 1 +煉油 1 +煉金 1 +煙 1 +煙囪 1 +煙槍 1 +煙霧 1 +煜全 1 +煤建 1 +煤氣 1 +煥 1 +煦 1 +照射 1 +煮 1 +煮食 1 +煽動 1 +熄匙 1 +熊族 1 +熊本 1 +熊隊 1 +熏烤 1 +熏陶 1 +熔化 1 +熔岩 1 +熟知 1 +熟釜 1 +熱值 1 +熱刺 1 +熱力 1 +熱心 1 +熱愛 1 +熱羅姆 1 +熱身 1 +熱量 1 +熱電 1 +熱鬧 1 +熾熱 1 +燁 1 +燃氣 1 +燈謎 1 +燒灼 1 +燒荒 1 +燕 1 +燕窩 1 +營口 1 +營團 1 +營地 1 +營寨 1 +營帳 1 +營火 1 +營造 1 +營長 1 +營養 1 +燦爛 1 +燭光 1 +燾 1 +爐 1 +爪部 1 +爬到 1 +爬山 1 +爬梯 1 +爭冠 1 +爭占 1 +爭吵 1 +爭奪 1 +爭寵 1 +爭得 1 +爭界 1 +爭相 1 +爭端 1 +爭競 1 +爭論 1 +爭鬥 1 +父風 1 +爸爸 1 +爺 1 +爺爺 1 +爽文 1 +爾炘 1 +牆 1 +牆上 1 +牆身 1 +牆面 1 +片劑 1 +片尾 1 +片斷 1 +片頭 1 +版主 1 +版畫 1 +牌照 1 +牙籤 1 +牙線 1 +牙薩克 1 +牙醫 1 +牛池 1 +牛潭尾 1 +牛石 1 +牛首 1 +牛鼻栓 1 +牟 1 +牟利 1 +牟合 1 +牠 1 +牡蠣 1 +牧 1 +牧區 1 +牧民 1 +牧羊 1 +牧谷 1 +物件 1 +物產 1 +物象 1 +物鏡 1 +物阜 1 +牲畜 1 +特備 1 +特優 1 +特務 1 +特區 1 +特工 1 +特快 1 +特意 1 +特拉華 1 +特攝 1 +特派 1 +特爾瑪 1 +特瓦史塔 1 +特產 1 +特異 1 +特菲爾 1 +特重 1 +特隆赫姆 1 +特雷格羅恩 1 +牽引 1 +牽牛花 1 +犧牲 1 +犬科 1 +犬種 1 +犬髖 1 +犯人 1 +狂亂 1 +狄 1 +狄拉克 1 +狐 1 +狐庸 1 +狡猾 
1 +狸藻 1 +狹小 1 +狼人 1 +狼堡 1 +狼影 1 +狼群 1 +猜忌 1 +猜想 1 +猝死 1 +猴年 1 +猴群 1 +猶大 1 +獅子 1 +獎牌 1 +獎盃 1 +獨一無二 1 +獨具 1 +獨唱 1 +獨孤 1 +獨家 1 +獨有 1 +獨眠 1 +獨行 1 +獨資 1 +獲准 1 +獲判 1 +獲勳 1 +獲召 1 +獲悉 1 +獲授 1 +獲獎 1 +獲益 1 +獲薦 1 +獲選 1 +獲頒 1 +獵物 1 +獸人 1 +獸族 1 +獻 1 +獻上 1 +獻堂 1 +獻策 1 +獻議 1 +玄天 1 +玄宗 1 +玄武 1 +玄策 1 +玄貓 1 +玉柴 1 +玉純 1 +玉魔 1 +玉鳳花 1 +玉麟 1 +王儲 1 +王冠 1 +王墓 1 +王宮 1 +王座 1 +王爾德 1 +王蓮 1 +玩伴 1 +玩弄 1 +玩法 1 +玩笑 1 +玫瑰 1 +玲玲 1 +玷染 1 +珀斯 1 +珍寶 1 +珠 1 +珠璣 1 +珠鋼 1 +班克斯 1 +班卓 1 +班子 1 +班布里奇 1 +班機 1 +班次 1 +班禪 1 +班級 1 +現役 1 +現身 1 +球壇 1 +球差 1 +球星 1 +球根 1 +球狀 1 +球道 1 +球面 1 +琅 1 +理性 1 +理由 1 +琦 1 +琬 1 +琳 1 +琳達 1 +琴弓 1 +琺琅 1 +瑋 1 +瑛 1 +瑜伽 1 +瑞普肯 1 +瑞欽 1 +瑞霖 1 +瑟洛 1 +瑣法 1 +瑪 1 +瑪利 1 +瑪利亞路易莎 1 +瑪利歐 1 +瑪君龍 1 +瑪莉安 1 +瑪莎 1 +瑪麗特 1 +瑾 1 +環保 1 +環帶 1 +環狀 1 +環節 1 +環繞 1 +瓊斯 1 +瓊珊 1 +瓘 1 +瓜里利亞 1 +瓦伊什維爾卡斯 1 +瓦伊杜 1 +瓦卡加 1 +瓦德 1 +瓦拉 1 +瓦薩 1 +瓦解 1 +瓦里奧 1 +甄別 1 +甘草 1 +甚厚 1 +甚嚴 1 +甚多 1 +甚小 1 +甚深 1 +甚篤 1 +甚至是 1 +甜兒 1 +甜度 1 +生主 1 +生出 1 +生動 1 +生天 1 +生子 1 +生平 1 +生性 1 +生效 1 +生機 1 +生殺 1 +生氣 1 +生火 1 +生肖 1 +生財之道 1 +生還 1 +產 1 +產出 1 +產經 1 +甦醒 1 +用人 1 +用來 1 +用光 1 +用兵 1 +用字 1 +用完 1 +用手 1 +用有 1 +用水 1 +用藥 1 +用計 1 +用詞 1 +甬 1 +田園 1 +田地 1 +田心 1 +田納西 1 +田野 1 +田頭 1 +甲山 1 +甲殼 1 +申辦 1 +男人 1 +男士 1 +男嬰 1 +男方 1 +男童 1 +界定 1 +界限 1 +畔 1 +留傳 1 +留哥 1 +留待 1 +留空 1 +留聲 1 +留良 1 +畜牧 1 +畜養 1 +畢打 1 +畢氏 1 +畢蘭德拉 1 +畢馬威 1 +略帶 1 +略有 1 +略為 1 +畫下 1 +畫中 1 +畫分 1 +畫會 1 +畫畫 1 +畫面 1 +異事 1 +異姓 1 +異度 1 +異形 1 +異曲同工 1 +異母 1 +異端 1 +當上 1 +當下 1 +當值 1 +當官 1 +當屆 1 +當政 1 +當晚 1 +當期 1 +當歸 1 +當面 1 +疆域 1 +疏浚 1 +疏遠 1 +疑 1 +疑點 1 +疙瘩 1 +疲勞 1 +疲弱 1 +疼痛 1 +病原 1 +病患 1 +病情 1 +病歷 1 +病死 1 +病重 1 +症候 1 +症狀 1 +痕跡 1 +痙攣 1 +痛心疾首 1 +痢疾 1 +痰 1 +瘦 1 +瘧疾 1 +癌 1 +癖 1 +癥狀 1 +登 1 +登丹 1 +發 1 +發佈 1 +發作 1 +發兵 1 +發呆 1 +發奮 1 +發揚光大 1 +發改委 1 +發放 1 +發洩 1 +發炎 1 +發燒 1 +發牌 1 +發球 1 +發病 1 +發聲 1 +發財 1 +發車 1 +發配 1 +白丁 1 +白井 1 +白公 1 +白利南 1 +白化 1 +白堊 1 +白天 1 +白宮 1 +白砂 1 +白蓮 1 +白蛇 1 +白軍 1 +白金 1 +白銅 1 +白陵 1 +白雲 1 +白面 1 +白頸長尾雉 1 +白鹿 1 +白麗 1 +百事 1 +百代 1 +百億 1 +百兆 1 +百帕斯卡 1 +百廢待舉 1 +百濟 1 +百無聊賴 1 +百老匯 1 +百花齊放 1 +百萬 1 +百貨 1 +百餘 1 +百鳴 1 +的士 1 +的確 1 +的黎波里 1 +皇位 1 +皇冠 1 +皇城 1 +皇太極 1 +皇妃 1 +皇廷 1 +皇權 1 +皇發 1 +皈依 1 +皋 1 +皓 1 +皓若 1 
+皮亞韋 1 +皮克爾 1 +皮內羅洛 1 +皮特 1 +皮特凱恩 1 +皮耶特普拉桑克穆斯特魯 1 +皮雅福斯 1 +皰疹 1 +盆地 1 +盈盈 1 +益 1 +益城 1 +益新 1 +益處 1 +盔甲 1 +盛事 1 +盛大 1 +盛妝 1 +盛揮 1 +盛產 1 +盛行 1 +盜用 1 +盟 1 +盟軍 1 +盡到 1 +盡喪 1 +盡情 1 +盡頭 1 +監工 1 +監控 1 +監測 1 +監禁 1 +監聽 1 +盤踞 1 +盧 1 +盧加 1 +盧溝 1 +盧瓦斯 1 +盧甘斯克 1 +盧福瓦 1 +盪 1 +目睹 1 +目鏡 1 +直勉 1 +直屬 1 +直覺 1 +直言 1 +直說 1 +直間 1 +相位 1 +相傳 1 +相容 1 +相差無幾 1 +相悖 1 +相應 1 +相挺 1 +相異 1 +相稱 1 +相約 1 +相繼 1 +相聲 1 +相若 1 +相處 1 +相見 1 +相較 1 +相通 1 +相速 1 +相鄰 1 +相間 1 +盾座苣苔 1 +盾系 1 +省務 1 +省思 1 +省油 1 +眉山 1 +看中 1 +看出 1 +看台 1 +看得 1 +看看 1 +看管 1 +看見 1 +看透 1 +看重 1 +真 1 +真光 1 +真北 1 +真名 1 +真好 1 +真希 1 +真木 1 +真核 1 +真相大白 1 +眯眼 1 +眷村 1 +眼下 1 +眼淚 1 +眼狀 1 +眼球 1 +眼皮 1 +眼神 1 +眾經 1 +眾說紛紜 1 +睡 1 +睡眠 1 +睡覺 1 +督撫 1 +睾丁蛋白 1 +睿 1 +睿智 1 +瞪羚 1 +瞬時 1 +瞭如指掌 1 +矗立 1 +矛 1 +矢口否認 1 +知府 1 +知曉 1 +知足 1 +短少 1 +短期 1 +短草 1 +短裙 1 +短詩 1 +短語 1 +短音 1 +短髮 1 +矮星 1 +石像 1 +石器 1 +石塊 1 +石材 1 +石湖 1 +石灰 1 +石牆 1 +石牌 1 +石頭門坎 1 +砂拉越 1 +砂漿 1 +砂紙 1 +砍伐 1 +砒霜 1 +研磨 1 +砝碼 1 +破損 1 +破滅 1 +破舊 1 +破落 1 +硝庫爾 1 +硝酸甘油片 1 +硫 1 +硫化氫 1 +硫化鉛 1 +硫酸銨 1 +硬幣 1 +碑亭 1 +碑刻 1 +碧波 1 +碧琴 1 +碰撞 1 +碳紙 1 +碳酸鎂 1 +確知 1 +確診 1 +碼 1 +磁性 1 +磐田 1 +磚室 1 +磨坊 1 +磨折 1 +磨槽 1 +磷化 1 +磷素 1 +磷酸 1 +礙 1 +礦場 1 +礦物 1 +礦石 1 +礦藏 1 +示人 1 +示愛 1 +社皮 1 +社論 1 +社長 1 +祁鏞 1 +祈願 1 +祐希 1 +祖 1 +祖上 1 +祖圭 1 +祖外公 1 +祖外婆 1 +祖宗 1 +祖籍 1 +神仙 1 +神偷 1 +神器 1 +神明 1 +神殿 1 +神社 1 +神秘果 1 +神籤 1 +神魔 1 +祠 1 +祥子 1 +票據 1 +票數 1 +祭司 1 +祭壇 1 +祭師 1 +祭物 1 +祭祀 1 +祭酒 1 +祿勸 1 +祿山 1 +禁煙 1 +禁用 1 +禁藥 1 +禁賽 1 +禍 1 +福克沙尼 1 +福安 1 +福康安 1 +福慧 1 +福池 1 +福清 1 +禕 1 +禪師 1 +禮堂 1 +禮濤 1 +禮炮 1 +禮物 1 +禱文 1 +禽流感 1 +秀實 1 +秀康 1 +秀怡 1 +秀珠 1 +私下 1 +私交 1 +私奔 1 +私宅 1 +私家 1 +私立 1 +私財 1 +秉國 1 +秋人 1 +秋山 1 +秋爽 1 +秋興 1 +秋香 1 +科多爾 1 +科屬 1 +科恩 1 +科教 1 +科朗 1 +科爾基斯 1 +科特 1 +科目 1 +秘指 1 +租予 1 +租務 1 +租地 1 +租戶 1 +租用 1 +秦城 1 +秦州 1 +秦晉之好 1 +秦朝 1 +秦石 1 +秩序 1 +移交 1 +移往 1 +移植 1 +移至 1 +移送 1 +稀釋 1 +稅項 1 +稍為 1 +稗官野史 1 +種內 1 +種名 1 +種子 1 +種屬 1 +稱海 1 +稱病 1 +稱銜 1 +稻子 1 +稻草 1 +稼祥 1 +穀 1 +穀物 1 +穆宗 1 +穆拉 1 +穆斯塔法凱馬爾帕沙 1 +穆爾西亞 1 +穆薩 1 +積山 1 +積良 1 +穩 1 +穩固 1 +穩妥 1 +究竟 1 +空出 1 +空前 1 +空名 1 +空客 1 +空戰 1 +空隙 1 +空難 1 +穿幫 1 +穿戴 1 +穿甲 1 +穿行 1 +穿過 1 +突尼西亞 1 +突感 1 +突現 1 +窄袖 1 +窗口 1 +窗外 1 +窘境 1 +窟檐 1 +窮苦 1 
+窮追 1 +窯 1 +窯洞 1 +竄紅 1 +竊聽 1 +立交 1 +立國 1 +立村 1 +立營 1 +立花 1 +立蒙 1 +立面 1 +立體 1 +站內 1 +站名 1 +站坪 1 +站廳 1 +站點 1 +竟 1 +章回 1 +章斐 1 +童女 1 +童男 1 +端川 1 +競相 1 +竹 1 +竹器 1 +竹治 1 +竹溪 1 +竹片 1 +笛 1 +符 1 +符桐 1 +第 1 +第999 1 +第三十三 1 +第十七 1 +第十五 1 +第十四 1 +第廿 1 +第比利斯 1 +第谷 1 +笳冬 1 +等位 1 +等客 1 +等號 1 +筐仔沙 1 +筒狀 1 +答應 1 +箏 1 +算出 1 +算術 1 +管制 1 +管子 1 +箬松 1 +箱型 1 +箴言 1 +節度 1 +節節 1 +範疇 1 +篡位 1 +篡國 1 +篡地 1 +簡化 1 +簡約 1 +簡訊 1 +簧 1 +簽名 1 +簽定 1 +簽認 1 +簽證 1 +簽賬 1 +籃筐 1 +籌備 1 +籌措 1 +籌款 1 +籌資 1 +籌辦 1 +籍貫 1 +籠式 1 +米南加保 1 +米古 1 +米哈伊 1 +米拉麥克斯 1 +米沙鄢 1 +米洛塞維奇 1 +米特斯 1 +米線 1 +米酒 1 +米高梅 1 +粉 1 +粉碎 1 +粉紅 1 +粉絲 1 +粗壯 1 +粗鱗蟒 1 +粵明 1 +粽子 1 +精 1 +精力 1 +精子 1 +精密 1 +精心 1 +精湛 1 +精算 1 +精索 1 +精裝 1 +糖尿 1 +糖蒜 1 +糞 1 +糟糕 1 +糧儲 1 +糧餉 1 +系數 1 +糾正 1 +糾紛 1 +紀元 1 +紂 1 +約定 1 +約熱夫 1 +約瑟芬 1 +約翰內斯堡 1 +約翰麥克連 1 +約長 1 +紅旗 1 +紅日 1 +紅杏出牆 1 +紅樓 1 +紅樓夢 1 +紅樹 1 +紅玉 1 +紅磨 1 +紅茶 1 +紅襪 1 +紅遍 1 +紅酒 1 +紅點 1 +紈 1 +紋路 1 +紋飾 1 +納入 1 +納塔爾 1 +納爾西斯 1 +納爾遜 1 +納瓦拉 1 +納蘇爾 1 +紐國 1 +紐澤西 1 +紐約尼克斯 1 +紐芬蘭 1 +紐華克 1 +紐黑文 1 +純一 1 +純凈 1 +純樸 1 +純陽 1 +紙上 1 +紙條 1 +紙盒 1 +級數 1 +素包 1 +素食 1 +素餡 1 +索倫 1 +索尼 1 +索溪峪 1 +索維克 1 +索菲 1 +索菲亞 1 +索西納 1 +索賠 1 +索馬里 1 +紮實 1 +累計 1 +細 1 +細岡 1 +細窄 1 +細菌 1 +細部 1 +細長 1 +紳士 1 +紹 1 +紹儀 1 +紹榮 1 +紺三郎 1 +終審 1 +終身大事 1 +組件 1 +組像 1 +組別 1 +組口 1 +組態 1 +組織胺 1 +組隊 1 +結交 1 +結冰 1 +結尾 1 +結雅 1 +絕壁 1 +絕大 1 +絕後 1 +絕版 1 +絕罰 1 +絞刑 1 +絞死 1 +絞痛 1 +給定 1 +給職 1 +給藥 1 +給體 1 +統 1 +統帥 1 +統籌 1 +絲山 1 +絲帶 1 +絶 1 +綁 1 +綉 1 +綏遠 1 +經國 1 +經意 1 +經文 1 +經昌 1 +經期 1 +經由 1 +經界 1 +綜 1 +綜理 1 +綜錄 1 +綠化 1 +綠帶 1 +綠滙 1 +綠燈 1 +綠社 1 +綠黨 1 +維健 1 +維克托 1 +維利爾斯 1 +維埃拉 1 +維多莉亞 1 +維希 1 +維德 1 +維景灣 1 +維爾紐斯 1 +維生 1 +維祀 1 +維羅納 1 +維記 1 +維護 1 +維迪斯 1 +維迪爾 1 +綱領 1 +網址 1 +網易 1 +網線 1 +網購 1 +綺塍 1 +綺色佳 1 +綽號 1 +綿羊 1 +緊張 1 +緊緊 1 +緊貼 1 +緊逼 1 +緊閉 1 +線上 1 +線前 1 +線度 1 +線條 1 +線索 1 +線道 1 +締造 1 +編上 1 +編導 1 +編程 1 +編篡 1 +編繪 1 +編纂 1 +編者 1 +編腔 1 +編隊 1 +緩衝 1 +緩解 1 +緩鬢 1 +緩龍 1 +緬 1 +緯來 1 +練兵 1 +緹 1 +縣市 1 +縣裡 1 +縫 1 +縫製 1 +縮寫 1 +縮小 1 +縱 1 +縱使 1 +縱觀 1 +縱隊 1 +總區 1 +總和 1 +總局 1 +總站 1 +總行 1 +總裁 1 +總計 1 +總辦 1 +績效 1 +繁多 1 +繁瑣 1 +繁盛 1 +繁雜 1 +繁體 1 +繞境 1 +繞開 1 +繡 1 +繩架 1 +繭 1 +繳付 1 +繳納 1 +繼業 1 +繼科 1 +續航 1 +續部 1 +纏足 1 +纜車 1 
+缺口 1 +缺失 1 +缺少 1 +缺氧 1 +缺血 1 +罕有 1 +罪惡 1 +置有 1 +置物 1 +罰則 1 +署理 1 +罵聲 1 +罷免 1 +罷工 1 +罹癌 1 +罹難 1 +羅乞多毗闍 1 +羅什艾因 1 +羅伊 1 +羅克斯堡 1 +羅培茲 1 +羅夫 1 +羅希 1 +羅德西亞 1 +羅拔 1 +羅曼什 1 +羅柔 1 +羅森費爾德 1 +羅爾夫 1 +羅隆基 1 +羊圈 1 +美味 1 +美孚 1 +美寶 1 +美幸 1 +美林豬籠草 1 +美琴 1 +美知留 1 +美稱 1 +美索不達米亞 1 +美聯 1 +美聲 1 +美薇 1 +美術 1 +美觀 1 +美譽 1 +美里 1 +美食 1 +美麗華 1 +羚羊 1 +羞恥 1 +群峰 1 +群族 1 +群組 1 +群落 1 +群速 1 +群雄 1 +群體 1 +羨慕 1 +義久 1 +義勇 1 +義安 1 +義工 1 +義弘 1 +義春 1 +義民 1 +義父 1 +義項 1 +羱羊 1 +羲 1 +羽田 1 +羽絨 1 +翌日 1 +習經 1 +翔 1 +翔麟 1 +翟 1 +翠鳥 1 +翻覆 1 +翼手龍 1 +翼龍 1 +耀樞 1 +耀武 1 +耀邦 1 +老人 1 +老大 1 +老套 1 +老婦 1 +老將 1 +老少 1 +老弱 1 +老橋 1 +老漢 1 +考上 1 +考夫卡 1 +考尼律斯 1 +考柯 1 +考牙 1 +考生 1 +考究 1 +考績 1 +考進 1 +考選 1 +而已 1 +耐受 1 +耐庵 1 +耐玩 1 +耐航 1 +耳光 1 +耳勺 1 +耳孔 1 +耳朵眼 1 +耳珠 1 +耳環 1 +耳癤 1 +耳蝸 1 +耳門 1 +耳骨 1 +耶索洛 1 +耶路撒冷 1 +耽擱 1 +聆聽 1 +聖人 1 +聖保羅 1 +聖克萊爾 1 +聖名 1 +聖地亞哥 1 +聖彌格 1 +聖彼得堡 1 +聖徒 1 +聖拉扎爾 1 +聖歌 1 +聖水 1 +聖求 1 +聖潔 1 +聖祖 1 +聖神 1 +聖經 1 +聖訓 1 +聖赫勒拿 1 +聖赫勒拿島戴勝 1 +聖路易斯 1 +聖體 1 +聘問 1 +聘用 1 +聚氯乙烯 1 +聚禮 1 +聚苯乙烯 1 +聚變 1 +聚體 1 +聞名 1 +聞言 1 +聯姻 1 +聯播 1 +聯江 1 +聯浦 1 +聯產 1 +聯美 1 +聰敏 1 +聲恆 1 +聲援 1 +聲波 1 +聲谷 1 +聲門 1 +聲音 1 +聶丞益 1 +職員 1 +職棒 1 +聽到 1 +聽命 1 +聽從 1 +聽眾 1 +聽聞 1 +聾人 1 +肅宗 1 +肆 1 +肆意 1 +肇 1 +肉夾 1 +肉湯 1 +肉瘤 1 +肉緊 1 +肌肉 1 +肖嚴 1 +肚臍 1 +肚餓 1 +肝 1 +股市 1 +股本 1 +肥牛 1 +肥田 1 +肥胖 1 +肩 1 +肯 1 +肯亞 1 +肯特 1 +育有 1 +育樂 1 +育空 1 +肺病 1 +胃 1 +胃石 1 +背上 1 +背依 1 +背包 1 +背叛 1 +背後 1 +背靠 1 +背面 1 +背鰭 1 +胎 1 +胚 1 +胚胎 1 +胞 1 +胞弟 1 +胡特勒 1 +胡禮 1 +胡蜂 1 +胡馬雍 1 +胸痛 1 +胸管 1 +胸部 1 +胸鰭 1 +能人 1 +能否 1 +能幹 1 +脆 1 +脊椎 1 +脫疽 1 +脫落 1 +脫隊 1 +脫離 1 +脱口秀 1 +脾氣 1 +腐敗 1 +腐蝕 1 +腓力 1 +腔 1 +腫瘤 1 +腳掌 1 +腳本 1 +腳點 1 +腸胃 1 +腸道 1 +腸骨 1 +腹 1 +腿 1 +腿部 1 +膝傷 1 +膝頭 1 +膠 1 +膠州 1 +膠東 1 +膠澳 1 +膠體 1 +膨脹 1 +膽 1 +膽酸 1 +臉 1 +臉頰 1 +臉龐 1 +臘 1 +臥龍 1 +臧 1 +臨 1 +臨榆 1 +臨終 1 +臨高 1 +自作自受 1 +自保 1 +自信 1 +自卑 1 +自在 1 +自學 1 +自帶 1 +自強 1 +自從 1 +自成 1 +自用 1 +自發 1 +自製 1 +自訂 1 +自負 1 +自辦 1 +至上 1 +至善 1 +至柔 1 +至正 1 +至死不渝 1 +至關 1 +至關重要 1 +致使 1 +致函 1 +致恐 1 +致病 1 +致瘋 1 +致癌 1 +臺大 1 +舀出 1 +舅父 1 +興 1 +興國 1 +興學 1 +興業 1 +興海 1 +興祖 1 +舉世矚目 1 +舉例 1 +舉國 1 +舉止 1 +舉薦 1 +舉起 1 +舊友 1 +舊屋 1 +舊時 1 +舊稱 1 +舊部 1 +舊金山 1 +舌頭 1 +舍爾 1 +舍訥費爾德 1 +舒 1 +舒查特 1 +舒爾特 1 +舜初 1 
+舞 1 +舞劇 1 +舞陽 1 +舟 1 +航天 1 +航站 1 +般若 1 +船塢 1 +船山 1 +船業 1 +船體 1 +艦身 1 +良 1 +良師益友 1 +良心 1 +良性 1 +良田 1 +良知 1 +艱巨 1 +色帶 1 +色情 1 +色目 1 +色調 1 +艷姬 1 +艷麗 1 +艾伍士 1 +艾倫 1 +艾塞羅 1 +艾夏 1 +艾崔奇 1 +艾巴德 1 +艾度蘭 1 +艾琳 1 +艾瑞 1 +艾瑪 1 +艾登堡 1 +艾美 1 +艾蓮娜 1 +艾薩克 1 +艾迴 1 +艾雲 1 +艾麗卡 1 +芬妮 1 +芬華絲 1 +芬迪絲 1 +芭蕉 1 +芭黎絲 1 +花上 1 +花俏 1 +花園蔥蝸牛 1 +花坮 1 +花城 1 +花店 1 +花旗 1 +花月 1 +花果 1 +花枝 1 +花瓶 1 +花甲 1 +花蜜 1 +花鞋 1 +苗栗 1 +苗穗 1 +苟且 1 +若愚 1 +若羌 1 +若英 1 +苦 1 +苦力 1 +苦悶 1 +苦情 1 +苦苣苔 1 +苦讀 1 +苯並芘 1 +苯乙烯 1 +英一 1 +英乙 1 +英倫 1 +英傑 1 +英勇 1 +英吋 1 +英國短毛豬 1 +英寸 1 +英尺 1 +英年 1 +英廷 1 +英格瑪 1 +英男 1 +英里 1 +英龍華 1 +茂 1 +茂名 1 +范恩 1 +茄南 1 +茄芮 1 +茅家 1 +茲羅提 1 +茶樓 1 +茶湯 1 +茶館 1 +荃灣 1 +荃麟 1 +草原 1 +草地 1 +草坪 1 +草席 1 +草稿 1 +荊州 1 +荒地 1 +荒蕪 1 +荒誕不經 1 +荔灣 1 +荷爾蒙 1 +荷銀 1 +莆 1 +莊嚴 1 +莊王 1 +莎樂美 1 +莫 1 +莫吉爾諾 1 +莫埃索 1 +莫扎特 1 +莫札特 1 +莫桑 1 +莫瑙恩 1 +莫瓦桑 1 +莫納加斯 1 +莫臥兒 1 +莫過 1 +莫里亞 1 +莽山 1 +菅 1 +菊 1 +菊花 1 +菜 1 +華倫西亞 1 +華少 1 +華新 1 +華族 1 +華林 1 +華爾 1 +華界 1 +華石 1 +華秀 1 +華納 1 +華西 1 +華頓 1 +菲力 1 +菲國 1 +菲德爾 1 +菲爾 1 +菲萊 1 +菲詩 1 +菸害 1 +萊因 1 +萊夫斯 1 +萊希 1 +萊斯特 1 +萊爾 1 +萊特曼 1 +萊茵蘭 1 +萊蕪 1 +萊采巴 1 +萌 1 +萌芽 1 +萎縮 1 +萬一 1 +萬丹 1 +萬貴 1 +落 1 +落下 1 +落實 1 +落敗 1 +落葉 1 +葆玖 1 +葉利欽 1 +葉士域治 1 +葉序 1 +葉綠 1 +著手 1 +著有 1 +著譯 1 +葛 1 +葛力馬 1 +葛朱 1 +葛浩文 1 +葛羅斯 1 +葛蕾絲 1 +葛量洪 1 +葡 1 +葡超 1 +葫蘆 1 +葬禮 1 +葵青 1 +蒂利妮 1 +蒂娜 1 +蒂迦納 1 +蒙丹 1 +蒙卡達 1 +蒙哥 1 +蒙哥馬利 1 +蒙塔尼萊博恩 1 +蒙巴薩 1 +蒙得維 1 +蒙特利爾 1 +蒙羞 1 +蒙面 1 +蒙馬特 1 +蒲 1 +蒲飛 1 +蒸氣 1 +蒸發 1 +蒼白 1 +蓄水 1 +蓋兒 1 +蓋因 1 +蓋多 1 +蓋曼 1 +蓋朗杜克西亞 1 +蓋頂 1 +蓓 1 +蓓天翼龍 1 +蓬塔德馬塔 1 +蓬拉貝 1 +蓬皮杜 1 +蓮 1 +蓮安 1 +蓮花 1 +蔑稱 1 +蔡斯 1 +蔣公 1 +蕙嫻 1 +蕨類 1 +蕩漾 1 +蕾妮 1 +薄 1 +薄弱 1 +薄扶林 1 +薔 1 +薛慶 1 +薦 1 +薩克森 1 +薩凡娜 1 +薩卡拉瓦 1 +薩哈 1 +薩哈林 1 +薩平頓 1 +薩德 1 +薩拉只 1 +薩摩亞 1 +薩爾曼 1 +薩爾瓦多 1 +薩爾茨卡默古特 1 +薩爾馬提亞 1 +薩瑞阿尼迪 1 +薩維塔 1 +薩維奧洛夫 1 +薩馬 1 +薪俸 1 +藉助 1 +藉此 1 +藍儂 1 +藍寶石華麗雨林 1 +藍尼 1 +藍本 1 +藍欽 1 +藍潟 1 +藍灰 1 +藍田 1 +藍白 1 +藍背 1 +藍邊 1 +藍領 1 +藍黨 1 +藏之介 1 +藏寶 1 +藏有 1 +藝 1 +藝名 1 +藝能 1 +藝謀 1 +藝電 1 +藤原 1 +藤木 1 +藤本 1 +藤村 1 +藤枝 1 +藤藝 1 +藥品 1 +藥師 1 +藥材 1 +藥水 1 +藥石 1 +藩主 1 +藩士 1 +藩西 1 +蘇利文 1 +蘇北 1 +蘇尋三 1 +蘇木 1 +蘇格拉底 1 +蘇維匯 1 +蘇美爾 1 +蘇萊曼尼亞 1 +蘇醒 1 +蘇里南 1 +蘊藏 1 +蘭利 1 +蘭卡斯特 1 +蘭封 1 +蘭弗朗克 1 +蘭德 1 +虎式 1 +虎棒 1 +虎翼 1 
+虎視眈眈 1 +虔信 1 +處之泰然 1 +處女 1 +處決 1 +處置 1 +處長 1 +虛弱 1 +虛榮 1 +虛無 1 +號吾 1 +號子 1 +號稱 1 +號誌 1 +虢 1 +虢國 1 +虹 1 +虹橋 1 +蚊類 1 +蚩尤 1 +蛇油 1 +蛇種 1 +蛇魔 1 +蛋 1 +蛋白質 1 +蛙 1 +蜂擁而至 1 +蜂蜜 1 +蜆殼 1 +蜚聲 1 +蜥蜴 1 +蜿蜒 1 +蝴蝶 1 +融入 1 +融化 1 +融和 1 +融雪 1 +螞蟻 1 +螢幕 1 +蟬聯 1 +蟲 1 +蟲洞 1 +蠟浸 1 +蠶院 1 +蠻子 1 +血型 1 +血液 1 +血竭 1 +血管 1 +血腥 1 +行人 1 +行使 1 +行列 1 +行將 1 +行用 1 +行禮 1 +行長 1 +行騙 1 +術 1 +街上 1 +街名 1 +街市 1 +街路 1 +街頭 1 +衛理 1 +衝動 1 +衝鋒 1 +衡 1 +衡量 1 +衢山 1 +衣 1 +衣冠 1 +衣物 1 +衣索比亞 1 +表型 1 +表妹 1 +表姐 1 +表徵 1 +表情 1 +表態 1 +表揚 1 +表格 1 +表決 1 +表白 1 +表述 1 +衰敗 1 +衰落 1 +袖手旁觀 1 +袖箭 1 +被告 1 +被子 1 +裁決 1 +裁減 1 +裂縫 1 +裂變 1 +裋褐 1 +裕 1 +裕智 1 +裕軍 1 +裙子 1 +補償 1 +補天 1 +補教 1 +補時 1 +補褂 1 +裝修 1 +裝備 1 +裝嵌 1 +裝有 1 +裝瓶 1 +裝葯 1 +裝設 1 +裝載 1 +裴 1 +裴林 1 +裸子 1 +裸照 1 +製備 1 +製得 1 +複數 1 +褐色 1 +褪色 1 +褲 1 +褲子 1 +褲袋 1 +襄 1 +襄助 1 +襄王 1 +襄陽 1 +襲 1 +襲封 1 +西亞特 1 +西京 1 +西周 1 +西哈莫尼 1 +西坑 1 +西域 1 +西夏 1 +西奧多 1 +西宮 1 +西岸 1 +西島 1 +西廠 1 +西式 1 +西弗萊德 1 +西斯廷 1 +西晉 1 +西段 1 +西河 1 +西洋坪 1 +西漢 1 +西甌 1 +西線 1 +西美 1 +西蒙 1 +西薩 1 +西蘭卡普 1 +西西里 1 +西距 1 +西迪 1 +西鄉 1 +要是 1 +要脅 1 +要衝 1 +要道 1 +見人 1 +見稱 1 +見聞 1 +見解 1 +見識 1 +見長 1 +規例 1 +覓食 1 +視乎 1 +視作 1 +視圖 1 +視角 1 +親人 1 +親信 1 +親政 1 +親朋 1 +親筆 1 +親臨 1 +親身 1 +覺察 1 +覽 1 +觀光 1 +觀察 1 +觀念 1 +觀戰 1 +觀望 1 +觀看 1 +觀者 1 +角膜 1 +解僱 1 +解夢 1 +解析 1 +解答 1 +解職 1 +解脫 1 +解說 1 +觸怒 1 +觸手可及 1 +觸覺 1 +觸診 1 +言官 1 +言語 1 +言辭 1 +訂位 1 +訃告 1 +訄書 1 +訇開 1 +計委 1 +計謀 1 +討逆 1 +訓 1 +託 1 +記念 1 +記述 1 +記集 1 +設站 1 +許昌 1 +許諾 1 +許願 1 +訴 1 +訴求 1 +訴諸 1 +註 1 +註明 1 +註銷 1 +詐死 1 +詔書 1 +評出 1 +評判 1 +評鑑 1 +詛咒 1 +詞幹 1 +詞義 1 +詢問 1 +試劑 1 +試播 1 +試種 1 +試製 1 +試音 1 +試飛 1 +詩文 1 +該事 1 +該人 1 +該墓 1 +該島 1 +該年 1 +該批 1 +該族 1 +該會 1 +該條 1 +該段 1 +該科 1 +該系 1 +該處 1 +該路 1 +該黨 1 +詳情 1 +詳細 1 +詹姆士 1 +詼諧 1 +誇德拉多 1 +誇祖魯 1 +誌 1 +誌家 1 +認一民 1 +認同 1 +認定 1 +認罪 1 +認證 1 +認輔 1 +誓言 1 +誕 1 +誕下 1 +誘因 1 +語文 1 +語法 1 +語流 1 +語訓 1 +語調 1 +語速 1 +語音 1 +誠意 1 +誤 1 +誤信 1 +誤差 1 +誤會 1 +誤槍 1 +誤譯 1 +誥命 1 +誦 1 +說出 1 +說客 1 +說成 1 +說話 1 +說謊 1 +說道 1 +課本 1 +誹謗 1 +調值 1 +調停 1 +調入 1 +調和 1 +調控 1 +調水 1 +調沙 1 +調研 1 +調節 1 +調職 1 +調解 1 +諂媚 1 +談判 1 +談妥 1 +談論 1 +請來 1 +請辭 1 +請願 1 +論事 1 +諜海 1 +諧波 1 +諶 1 +諸 1 +諸如 1 +諸暨 1 +諸河 1 +諺言 1 +諾丁漢 1 +諾域治 1 
+諾斯 1 +諾曼 1 +諾爾曼 1 +謀取 1 +謀士 1 +謀求 1 +謀職 1 +謁者 1 +謇 1 +謊言 1 +謙卑 1 +謚 1 +講完 1 +講究 1 +講談 1 +講道 1 +謝世 1 +謝列梅捷沃 1 +謝爾比 1 +謝瓦爾德納澤 1 +謝蓋爾 1 +謹 1 +謹慎 1 +證 1 +譚 1 +譜代 1 +警務 1 +警句 1 +警告 1 +警員 1 +警戒 1 +警衛 1 +警覺 1 +警鐘 1 +譯作 1 +譯員 1 +譯場 1 +譯本 1 +議席 1 +譴責 1 +護佑 1 +護城 1 +護墊 1 +護送 1 +讀取 1 +讀法 1 +變動 1 +變差 1 +變調 1 +變身 1 +變遷 1 +變革 1 +讓步 1 +讓開 1 +讚喻 1 +讚揚 1 +讚美 1 +讚譽 1 +谷山 1 +谷氨酸 1 +豆瓣 1 +豈 1 +豎立 1 +豎起 1 +豐久 1 +豐厚 1 +豐城 1 +豐臣 1 +豐隆 1 +象數 1 +象晉 1 +象牙 1 +象牙喙啄木鳥 1 +豢養 1 +豪宅 1 +豪門 1 +豫南 1 +豬 1 +豬圈 1 +豬油 1 +豬肉 1 +貂 1 +貓咪 1 +貓囒 1 +貓科 1 +貝克 1 +貝克漢 1 +貝加爾 1 +貝南 1 +貝斯 1 +貝爾普 1 +貝爾蘇斯 1 +貝碧嘉 1 +貝納斯科 1 +貝都因 1 +貝類 1 +貞昌 1 +貞潔 1 +貞觀 1 +負擔 1 +負芻 1 +負荷 1 +負面 1 +負額 1 +財經 1 +財落 1 +貢 1 +貢品 1 +貢哥拉 1 +貢嘎 1 +貢巴 1 +貧 1 +貧乏 1 +貧窮 1 +貧鈾 1 +貨 1 +貨品 1 +貨機 1 +販賣 1 +貪圖 1 +貪婪 1 +貪心 1 +貪瀆 1 +貫徹 1 +貫穿 1 +貫通 1 +責怪 1 +責難 1 +貴築 1 +貴賓 1 +貴陽 1 +貴霜 1 +貶意 1 +買入 1 +買賣 1 +費曼 1 +費爾南多 1 +費用 1 +費盡 1 +費羅 1 +貼身 1 +賀特 1 +賀立 1 +賄選 1 +資 1 +資政 1 +資陽 1 +賈亞辛哈 1 +賈多特 1 +賈斯丁 1 +賈斯珀 1 +賈氏 1 +賓客 1 +賓尼迪斯 1 +賓州 1 +賞識 1 +賠禮 1 +賡臣 1 +賢思 1 +賣 1 +賣出 1 +賣到 1 +賣地 1 +賣家 1 +賣掉 1 +賣空 1 +賤女 1 +賤民 1 +質詢 1 +賭徒 1 +賭檔 1 +賴宣 1 +賺取 1 +賺錢 1 +購得 1 +購置 1 +賽場 1 +賽普勒斯 1 +賽爾金德 1 +賽車 1 +賽道 1 +贈 1 +贈送 1 +贊博尼 1 +贊成 1 +贊比西亞 1 +贏家 1 +贖回 1 +赤坂 1 +赤壁 1 +赤樹 1 +赤狐 1 +赤鱲 1 +赦 1 +赫伯特 1 +赫塔卜 1 +赫斯 1 +赫比格 1 +赫爾克 1 +赫爾辛基 1 +赫雷爾斯 1 +赫魯曉夫 1 +走上 1 +走到 1 +走勢 1 +走漏 1 +走私 1 +起事 1 +起伏 1 +起初 1 +起名 1 +起因 1 +起始 1 +起建 1 +起止 1 +起死回生 1 +起碼 1 +起端 1 +起舞 1 +起落 1 +起訖 1 +起降 1 +起點 1 +趁 1 +超出 1 +超導 1 +超強 1 +超我 1 +超時 1 +超武 1 +超然 1 +超重 1 +超齡 1 +越亮 1 +越共 1 +越前 1 +越好 1 +越弱 1 +越戰 1 +越早 1 +越暗 1 +越牆 1 +越發 1 +越近 1 +越過 1 +趕往 1 +趙氏 1 +趟 1 +趣事 1 +趨勢 1 +趨於 1 +足不出戶 1 +足夠 1 +足見 1 +足跡 1 +趾爪 1 +趾骨 1 +跋扈 1 +跌 1 +跑 1 +跑壘 1 +跑步 1 +跑車 1 +跑馬 1 +跟操 1 +跟班 1 +跟蹤 1 +跟進 1 +跟隨 1 +跨 1 +跨國 1 +跨度 1 +跨步 1 +跨足 1 +跨過 1 +路政 1 +路易斯安那 1 +路濟亞 1 +路綫 1 +路網 1 +路透 1 +路過 1 +路障 1 +路面 1 +跳動 1 +跳槽 1 +跳過 1 +跳遠 1 +跳高 1 +踏上 1 +踏入 1 +踢進 1 +躁 1 +躁動 1 +躍升 1 +身受 1 +身型 1 +身旁 1 +身為 1 +身無分文 1 +身著 1 +身軀 1 +身高 1 +躬耕 1 +躲到 1 +車上 1 +車仁 1 +車型 1 +車士打菲特 1 +車外 1 +車尾 1 +車市 1 +車廠 1 +車手 1 +車票 1 +車程 1 +車窗 1 +車系 1 +車號 1 +車費 1 +車路士 1 +車迷 1 +車頭 1 +軋箏 1 +軌跡 1 +軍中 1 
+軍備 1 +軍功 1 +軍務 1 +軍委 1 +軍師 1 +軍援 1 +軍方 1 +軍服 1 +軍營 1 +軍艦 1 +軍裝 1 +軍階 1 +軍需 1 +軒轅 1 +軟 1 +軟化 1 +軟硬體 1 +軟骨 1 +軸 1 +軸心 1 +較低 1 +較佳 1 +較厚 1 +較快 1 +較深 1 +載人 1 +載淳 1 +輔 1 +輔佐 1 +輕微 1 +輕易 1 +輕軌 1 +輕鐵 1 +輕髻 1 +輕鬆 1 +輝 1 +輝彥 1 +輪周 1 +輪廓 1 +輪流 1 +輪船 1 +輪迴 1 +輯 1 +輯錄 1 +輸 1 +輸掉 1 +輸精 1 +輸血 1 +輸送 1 +輻轍 1 +輻鰭 1 +輾轉 1 +轅 1 +轉交 1 +轉任 1 +轉動 1 +轉化 1 +轉向 1 +轉型 1 +轉差 1 +轉往 1 +轉念 1 +轉播 1 +轉會 1 +轉正 1 +轉角 1 +轉賣 1 +轉赴 1 +辛普朗 1 +辛普森 1 +辛辛那提 1 +辜 1 +辟邪 1 +辦學 1 +辦有 1 +辨別 1 +辨明 1 +辨識 1 +辭典 1 +辭官 1 +辭歲 1 +辯證 1 +辰國 1 +辰男 1 +農事 1 +農墾 1 +農書 1 +農林 1 +農舍 1 +迅 1 +迅即 1 +迅猛 1 +迎 1 +迎神 1 +迎賓 1 +迎送 1 +迎面 1 +近似 1 +近侍 1 +近平 1 +近日 1 +近東 1 +近海 1 +近現代 1 +近親 1 +近鄰 1 +返 1 +返樸歸真 1 +迦南 1 +迦納 1 +迪克 1 +迪克蘭 1 +迪士尼 1 +迪斯雷利 1 +迪比亞吉奧 1 +迪爾汗 1 +迪米特 1 +迫切 1 +述 1 +迴流 1 +迷你變色龍 1 +迷唐 1 +迷路 1 +追兇 1 +追回 1 +追封 1 +追尋 1 +追尾 1 +追思 1 +追憶 1 +追查 1 +追根究底 1 +追殺 1 +追求 1 +追究 1 +追討 1 +追述 1 +退位 1 +退回 1 +退夷 1 +退居 1 +退敵 1 +退隱 1 +送來 1 +送到 1 +送回 1 +送殯 1 +送給 1 +送院 1 +逃亡 1 +逃奔 1 +逃至 1 +逃跑 1 +逆 1 +逆戟鯨 1 +逍遙 1 +透徹 1 +透支 1 +透水 1 +透視 1 +透鏡 1 +逐客 1 +途中 1 +途人 1 +途經 1 +這兒 1 +這時 1 +通俗 1 +通商 1 +通天 1 +通宏 1 +通州 1 +通渭 1 +通貨 1 +通通 1 +通運 1 +通靈 1 +通風 1 +逛街 1 +速往 1 +速銷 1 +造價 1 +造反 1 +造就 1 +造幣 1 +造福 1 +造血 1 +造訪 1 +造謠 1 +逢吉 1 +連串 1 +連克 1 +連坐 1 +連年 1 +連座 1 +連成 1 +連拍 1 +連筆 1 +連篇累牘 1 +連結 1 +連絡 1 +連通 1 +連進 1 +連餓 1 +週末 1 +週邊 1 +進位 1 +進來 1 +進出 1 +進動 1 +進犯 1 +逼 1 +逼使 1 +逼停 1 +逼到 1 +逾期 1 +遂起 1 +遇上 1 +遇刺 1 +遇有 1 +遇陛 1 +遇難 1 +遊憩 1 +遊擊 1 +遊歷 1 +遊艇 1 +遊覽 1 +遊說 1 +遊離 1 +運 1 +運回 1 +運往 1 +運煤 1 +運算 1 +運糧 1 +運補 1 +運載 1 +遍 1 +遍布 1 +過冷 1 +過剩 1 +過多 1 +過往 1 +過敏 1 +過橋 1 +過濾 1 +過甚 1 +過繼 1 +過苛 1 +過路 1 +過頭 1 +道世民 1 +道具 1 +道墟 1 +道士 1 +道學 1 +道宇 1 +道安 1 +道格拉斯 1 +道歉 1 +道理 1 +道綽 1 +道羅 1 +道義 1 +道靜 1 +達上 1 +達人 1 +達克斯 1 +達古武 1 +達恩利 1 +達拉斯 1 +達拏 1 +達拖錯 1 +達母拿錯 1 +達濠 1 +達爾文 1 +達章 1 +達華 1 +達賴 1 +違背 1 +遙陽 1 +遜位 1 +遞交 1 +遞增 1 +遠呂智 1 +遠嫁 1 +遠揚 1 +遠日 1 +遠洋 1 +遠處 1 +遠遠 1 +遠離 1 +遣 1 +遣返 1 +適之 1 +適用 1 +遭殃 1 +遮天 1 +遮蔭 1 +遮陰 1 +遲 1 +遲遲 1 +遷出 1 +遷居 1 +遷校 1 +選上 1 +選修 1 +選定 1 +選用 1 +選美 1 +選訓 1 +選調 1 +選進 1 +選題 1 +遹 1 +遺物 1 +遺留 1 +遺腹 1 +遺迹 1 +遺骸 1 +遼西翼龍 1 +避 1 +避禍 1 +避開 1 +邁克 1 +邁向 1 +邁阿密 1 +還擊 1 +還有 1 +邊區 1 
+邗江 1 +那時 1 +那普拉夫尼克 1 +那曲 1 +邦國 1 +邦德 1 +邦蒂 1 +邦達倉 1 +邪惡 1 +邪神 1 +邪馬台 1 +邱家 1 +邳縣 1 +邵伯 1 +邵氏 1 +郊狼 1 +郎 1 +郝 1 +郡區 1 +郡縣 1 +郡艾塞克斯 1 +部位 1 +部字 1 +部將 1 +部首 1 +郪江 1 +郫縣 1 +郭家 1 +郵報 1 +郵輪 1 +都城嘉慕 1 +都察 1 +都尉 1 +都會 1 +都有 1 +都督 1 +都靈 1 +鄂 1 +鄂倫春 1 +鄂溫克 1 +鄂霍次克 1 +鄉內 1 +鄉團 1 +鄉村 1 +鄉長 1 +鄰 1 +鄰域 1 +鄰居 1 +鄰里 1 +酃縣 1 +酆 1 +配上 1 +配件 1 +配備 1 +配器 1 +配有 1 +配角 1 +酒家 1 +酒杯 1 +酒樓 1 +酒鬼 1 +酩酊大醉 1 +酵母 1 +酷似 1 +酷刑 1 +醉醺醺 1 +醋酸根 1 +醫書 1 +醫科 1 +醫術 1 +醬貨 1 +醴陵 1 +釀成 1 +釀造 1 +釉色 1 +釋出 1 +釋迦 1 +釋迦牟尼 1 +里士滿 1 +里奧多 1 +里港 1 +里馬 1 +重創 1 +重力 1 +重回 1 +重復 1 +重心 1 +重情 1 +重播 1 +重核 1 +重物 1 +重獲 1 +重現 1 +重生 1 +重用 1 +重疊 1 +重禮 1 +重組 1 +重義 1 +重考 1 +重製 1 +重複 1 +重見天日 1 +重讀 1 +重鎮 1 +重開 1 +重陽 1 +重音 1 +重鳳 1 +野外 1 +野心勃勃 1 +野戰 1 +野木 1 +野球 1 +野菜 1 +量度 1 +金剛 1 +金寶 1 +金帶英麗魚 1 +金幣 1 +金平 1 +金氏 1 +金泉 1 +金浦 1 +金湖 1 +金牛 1 +金獎 1 +金箔 1 +金羅斯 1 +金美 1 +金華 1 +金質 1 +金邊 1 +金銀 1 +金錢 1 +金門 1 +金靴 1 +金頂 1 +金魚 1 +金鵰 1 +釜山 1 +針劑 1 +釧路 1 +鈇 1 +鈦 1 +鈺源 1 +鉑金 1 +銀杏 1 +銀熊 1 +銀牌 1 +銀白 1 +銀紅 1 +銀色 1 +銅仁 1 +銅像 1 +銅削 1 +銅斧 1 +銅柄 1 +銅臿 1 +銅製 1 +銅銎 1 +銅錛 1 +銅錢 1 +銘 1 +銘皖 1 +銘銘 1 +銜稱 1 +銠 1 +銳利 1 +銷毀 1 +銷量 1 +鋒 1 +鋪成 1 +鋪有 1 +鋸齒龍 1 +鋼板 1 +錄影 1 +錄得 1 +錄放影機 1 +錢上 1 +錦 1 +錦俊 1 +錦承 1 +錦江 1 +錦田 1 +錫 1 +錫伯 1 +錫勇 1 +錫昌 1 +錯 1 +錯視 1 +錯覺 1 +錳 1 +錳礦 1 +鍊金 1 +鍋中 1 +鍋內 1 +鍋爐 1 +鍔 1 +鍛鍊 1 +鍝 1 +鍾 1 +鎖妖 1 +鎖閉 1 +鎮守 1 +鎮岳 1 +鎮朔 1 +鎮賚 1 +鎮里 1 +鎮靜 1 +鎰 1 +鎳銀 1 +鏈 1 +鏡波 1 +鏡湖 1 +鐳 1 +鐵削 1 +鐵匾 1 +鐵棍 1 +鐵民 1 +鐵爐 1 +鐵管 1 +鐵釘 1 +鐵銹 1 +鐵錛 1 +鑑別 1 +鑑定 1 +鑑泉 1 +鑑證 1 +鑒定 1 +鑫新 1 +鑽入 1 +鑽出 1 +鑽探 1 +鑿出 1 +長凳 1 +長史 1 +長婁 1 +長孫 1 +長岡 1 +長崎 1 +長廊 1 +長廷 1 +長方 1 +長榮 1 +長毛 1 +長治 1 +長溝 1 +長滿 1 +長瑪喀比 1 +長盛 1 +長笛 1 +長篇 1 +長編 1 +長跑 1 +長頸鹿 1 +長髮 1 +門修斯 1 +門廳 1 +門式 1 +閃米特 1 +閃長 1 +閃電 1 +閉日 1 +開價 1 +開光 1 +開啟 1 +開場 1 +開墾 1 +開學 1 +開工 1 +開往 1 +開戰 1 +開拓 1 +開挖 1 +開支 1 +開教 1 +開業 1 +開槍 1 +開球 1 +開瑞坦 1 +開票 1 +開車 1 +開辦 1 +開錄 1 +閑聊 1 +閑談 1 +閒言閒語 1 +間斷 1 +間碟 1 +間距 1 +閘口 1 +閘機 1 +閣 1 +閩侯 1 +閩南 1 +闖進 1 +關中 1 +關斷 1 +關連 1 +闡述 1 +闢 1 +阡陌 1 +阪神 1 +防凍 1 +防止 1 +防盜 1 +防護 1 +阻塞 1 +阻撓 1 +阻隔 1 +阿一 1 +阿仙奴 1 +阿信 1 +阿修羅 1 +阿內爾卡 1 +阿勒格尼郡 1 +阿勝 1 +阿勞 1 +阿基里斯 1 +阿堯 1 +阿奇里斯 1 +阿寧 1 +阿布 1 +阿拉法特 1 +阿斗 1 +阿普第 1 
+阿曼達 1 +阿東 1 +阿格拉 1 +阿格雷斯蒂 1 +阿森斯 1 +阿森納 1 +阿比西尼亞豬 1 +阿波羅 1 +阿爾及利亞 1 +阿爾及爾 1 +阿爾布巴 1 +阿爾扎阿爾拉齊蓋 1 +阿爾法 1 +阿爾發 1 +阿爾茨海默 1 +阿爾高 1 +阿特 1 +阿特拉斯 1 +阿猴 1 +阿瑜陀耶 1 +阿穆爾 1 +阿羅那順 1 +阿耳忒彌斯 1 +阿聯酋 1 +阿育 1 +阿茲海默 1 +阿諾 1 +阿賈克斯 1 +阿赫 1 +阿連德 1 +阿道夫 1 +阿達姆庫斯 1 +阿里 1 +阿隆索 1 +陀斯妥也夫斯基 1 +附上 1 +附加 1 +附蟲 1 +附表 1 +附身 1 +降將 1 +降格 1 +降水 1 +降班 1 +降臨 1 +降魔 1 +限 1 +限定 1 +限時 1 +陞 1 +陡壁 1 +院士 1 +院子 1 +院落 1 +陣 1 +除冰 1 +除夕 1 +除此 1 +除非 1 +陪葬 1 +陪都 1 +陰天 1 +陰暗 1 +陰陽 1 +陳國 1 +陳屍 1 +陳相 1 +陳述 1 +陵園 1 +陶恩 1 +陷落 1 +陸仔 1 +陸域 1 +陸行 1 +陽 1 +陽安 1 +陽明 1 +隆亨 1 +隊列 1 +隊名 1 +隔日 1 +隔開 1 +隕星 1 +隕鐵 1 +際春 1 +隠居 1 +隨丁 1 +隨便 1 +隨同 1 +隨往 1 +隨時 1 +隨軍 1 +隨隊 1 +險些 1 +險要 1 +隱含 1 +隱姓埋名 1 +隱居 1 +隱性 1 +隱私 1 +隻身 1 +雄 1 +雄師 1 +雄獅 1 +雅克 1 +雅加達 1 +雅各布 1 +雅君 1 +集寧 1 +集結 1 +集聚 1 +雌性 1 +雌獸 1 +雌鯨 1 +雎 1 +雙十 1 +雙子 1 +雙江 1 +雜姓 1 +雜糧 1 +雜處 1 +雜食 1 +雞腿 1 +雞頭 1 +離別 1 +離域 1 +離場 1 +離子 1 +離島 1 +離群索居 1 +離職 1 +難吃 1 +難得 1 +難攻 1 +難過 1 +雨季 1 +雨後春筍 1 +雨林 1 +雪上加霜 1 +雪佛龍 1 +雪兒 1 +雪崩 1 +雪弟 1 +雪梅 1 +雲中 1 +雲亭 1 +雲岩 1 +雲松 1 +雲里 1 +零件 1 +零部件 1 +零食 1 +雷 1 +雷克南 1 +雷克斯 1 +雷切爾 1 +雷姆 1 +雷定 1 +雷昂納多 1 +雷曼 1 +雷王 1 +雷蒂亞 1 +雷雨 1 +電信 1 +電器 1 +電極 1 +電氣 1 +電瓶 1 +電線 1 +電通 1 +電邀 1 +需時 1 +霆鋒 1 +震寰 1 +震波 1 +震災 1 +霍亂 1 +霍伊爾 1 +霍夫堡 1 +霍姆 1 +霍巴特 1 +霍斯 1 +霍普金斯 1 +霍爾滕 1 +霍爾特 1 +霞 1 +霧 1 +露出 1 +露比 1 +露臉 1 +露西 1 +霸佔 1 +霸權 1 +靈前 1 +靈力 1 +靈性 1 +靈感 1 +靈柩 1 +靈活 1 +靈異 1 +靈籤 1 +靈長 1 +靈魂 1 +青 1 +青梅 1 +青森 1 +青睞 1 +青訓 1 +青金 1 +靖 1 +靖雯 1 +靜安 1 +靜岡 1 +靜華 1 +靠右 1 +靠左 1 +面具 1 +面向 1 +面貌 1 +革除 1 +鞏 1 +鞦韆 1 +韃靼 1 +韋 1 +韋契特 1 +韋德 1 +韋拉克魯斯 1 +韋拿 1 +韋斯特 1 +韋科 1 +韌 1 +韓氏 1 +韓浜 1 +音律 1 +音色 1 +音量 1 +音高 1 +韶之 1 +響號 1 +頂上 1 +頂尖 1 +頂峰 1 +頂端 1 +頂級 1 +項鏈 1 +順宗 1 +順岸 1 +順德 1 +順應 1 +順懷 1 +順治 1 +順滑 1 +順陽 1 +頌平 1 +頌揚 1 +預 1 +預估 1 +預告 1 +預知 1 +預示 1 +預約 1 +頑石 1 +頒給 1 +頗 1 +頗多 1 +頗大 1 +頗有 1 +頗盛 1 +頗豐 1 +領事 1 +領取 1 +領奏 1 +領航 1 +領軍 1 +領隊 1 +頡 1 +頭上 1 +頭前 1 +頭型 1 +頭尾 1 +頭槌 1 +頭版 1 +頭盔 1 +頭紗 1 +頭髮 1 +頸 1 +頸部 1 +頹垣 1 +頻 1 +頻寬 1 +頻散 1 +頻繁 1 +頻頻 1 +題獻 1 +題記 1 +額外 1 +額度 1 +類別 1 +類固醇 1 +顥 1 +顯 1 +顯光 1 +顯徑 1 +顯現 1 +顯靈 1 +風化 1 +風尚 1 +風波 1 +風行 1 +風間 1 +風雨 1 +飈 1 +飛往 1 +飛抵 1 +飛毛 1 +飛沫 1 +飛碟 1 +飛鏢 1 +飛靶 1 +飛鳥 1 +飛龍 1 +食人 1 
+食肆 1 +食肉 1 +食蟲 1 +食鹽 1 +飲茶 1 +飼料 1 +飼草 1 +飽和 1 +飽經 1 +飾物 1 +餃子 1 +餅 1 +養份 1 +養大 1 +養女 1 +養母 1 +養父 1 +養精蓄銳 1 +養育 1 +養菊 1 +養蠶 1 +餐車 1 +餘 1 +餘熱 1 +餘眾 1 +館前 1 +館名 1 +館址 1 +饃 1 +饑餓 1 +饒平 1 +饕餮 1 +首仗 1 +首個 1 +首名 1 +首場 1 +首屈一指 1 +首席 1 +首戰 1 +首批 1 +首日 1 +首映 1 +首條 1 +首艦 1 +首讀 1 +香 1 +香亭 1 +香儂 1 +香吉士 1 +香味 1 +香坊 1 +香塍 1 +香水 1 +香洲 1 +香火 1 +香織 1 +馬丁尼茲 1 +馬丁斯維勒 1 +馬上 1 +馬修 1 +馬克安諾 1 +馬克西米利 1 +馬內阿 1 +馬六甲 1 +馬匹 1 +馬喇 1 +馬圈 1 +馬奇頓 1 +馬尼拉 1 +馬托格羅索 1 +馬爾他 1 +馬爾吉阿納 1 +馬爾地夫 1 +馬爾默 1 +馬球 1 +馬約拉那 1 +馬莎 1 +馬薩 1 +馬薩諸塞 1 +馬賽 1 +馬赫盧普 1 +馬路 1 +馬達加斯加 1 +馬里內蒂 1 +馬里蘭 1 +馬雅可夫斯基 1 +馬鞍 1 +馬黑麻 1 +馳名 1 +馴化 1 +駐任 1 +駐地 1 +駐防 1 +駕崩 1 +駙馬 1 +駛 1 +駛入 1 +駛過 1 +駿業 1 +騁遠 1 +騎 1 +騎馬 1 +騏一郎 1 +騙徒 1 +騰出 1 +騰訊 1 +騷擾 1 +驅 1 +驗屍 1 +驗票 1 +驗證 1 +驗電 1 +驚人 1 +驚動 1 +驚喜 1 +驚嘆 1 +驚訝 1 +驚醒 1 +驟減 1 +驟逝 1 +驢肉 1 +驥 1 +骨幹 1 +骯髒 1 +骷髏 1 +體側 1 +體外 1 +體委 1 +體工 1 +體會 1 +體溫 1 +髖骨 1 +高下 1 +高傲 1 +高傲不群 1 +高出 1 +高升 1 +高地 1 +高大 1 +高峰 1 +高座 1 +高手 1 +高效 1 +高新 1 +高杉 1 +高檔 1 +高清 1 +高漲 1 +高熱 1 +高燥 1 +高爾夫 1 +高爾德 1 +高琦 1 +高盧 1 +高聳 1 +高處 1 +高買 1 +高質 1 +高超 1 +高雄 1 +高高在上 1 +髮 1 +髮生 1 +髮辮 1 +鬆髻 1 +鬚 1 +鬚鯨 1 +鬥雞 1 +鬧 1 +鬧出 1 +鬼影 1 +鬼怪 1 +鬼道 1 +魁智 1 +魅惑 1 +魏國 1 +魏斯曼 1 +魏氏 1 +魏澤爾 1 +魔力 1 +魔界 1 +魔石 1 +魔鬼 1 +魚尾 1 +魚腹 1 +魚苗 1 +魚類 1 +魯 1 +魯伯 1 +魯國 1 +魯特 1 +魯登尼亞 1 +魯良新元 1 +魯茨科伊 1 +魯西迪 1 +魯道夫 1 +鮑亞士 1 +鮑克瑟 1 +鮑爾溫 1 +鮑維 1 +鮑里斯 1 +鮑魚 1 +鮮 1 +鮮有 1 +鮮用 1 +鮮虞 1 +鯉齒 1 +鰓蓋 1 +鰭條 1 +鰺沢駅 1 +鱗 1 +鱗甲 1 +鱗骨 1 +鳥 1 +鳥獸 1 +鳥種 1 +鳳 1 +鳳彬 1 +鳴叫 1 +鳴放 1 +鳴道 1 +鴛鴦 1 +鴻南 1 +鴻章 1 +鴻績 1 +鴻華 1 +鴻超 1 +鴻逵 1 +鴻銘 1 +鹽 1 +鹽城 1 +鹽州 1 +鹽酸 1 +鹿兒島 1 +鹿鼎 1 +麒 1 +麗晶 1 +麗泰 1 +麗珍 1 +麗華 1 +麗閣 1 +麥克 1 +麥克佛森 1 +麥克羅伯特森 1 +麥克默多 1 +麥加利 1 +麥卡特尼 1 +麥拉倫 1 +麥格林 1 +麥當勞 1 +麥芽 1 +麥迪文 1 +麩氨酸 1 +麵 1 +麵團 1 +麵皮 1 +麻城 1 +麻塞諸塞 1 +麻將 1 +麻布 1 +麻木 1 +麻痹 1 +黃岡 1 +黃巾 1 +黃昏 1 +黃沙 1 +黃河 1 +黃蜂 1 +黎家 1 +黎明 1 +黎筍 1 +黑奴 1 +黑帶 1 +黑手 1 +黑暗 1 +黑木 1 +黑板 1 +黑死 1 +黑海 1 +黑衫 1 +黑錢 1 +黑鐵木 1 +黑雲 1 +黑髮 1 +默多克 1 +默比施 1 +默默 1 +黛安娜 1 +黛絲 1 +點陣 1 +點點頭 1 +黨團 1 +黨委 1 +黨校 1 +黨歌 1 +黨衛 1 +黨部 1 +黨魁 1 +鼎灶 1 +鼎芬 1 +鼎金 1 +鼓手 1 +鼬鼠 1 +齊國 1 +齋 1 +齒狀 1 +齒輪 1 +齲齒 1 +龍台 1 +龍女 1 +龍文 1 +龍耳 1 +龍頭 1 +龐 1 +龐特佛雷特 1 +龐貝 1 +龜茲 1 diff --git 
a/syntaxnet/dragnn/conll2017/sample/zh-segmenter.checkpoint.data-00000-of-00001 b/syntaxnet/dragnn/conll2017/sample/zh-segmenter.checkpoint.data-00000-of-00001 new file mode 100644 index 0000000000000000000000000000000000000000..1f4b2bbf058ec535366b114289e904a625225537 Binary files /dev/null and b/syntaxnet/dragnn/conll2017/sample/zh-segmenter.checkpoint.data-00000-of-00001 differ diff --git a/syntaxnet/dragnn/conll2017/sample/zh-segmenter.checkpoint.index b/syntaxnet/dragnn/conll2017/sample/zh-segmenter.checkpoint.index new file mode 100644 index 0000000000000000000000000000000000000000..0223e61fd79068db6dfc2e3e70ec4c21272f1d0c Binary files /dev/null and b/syntaxnet/dragnn/conll2017/sample/zh-segmenter.checkpoint.index differ diff --git a/syntaxnet/dragnn/conll2017/sample/zh-segmenter.checkpoint.meta b/syntaxnet/dragnn/conll2017/sample/zh-segmenter.checkpoint.meta new file mode 100644 index 0000000000000000000000000000000000000000..e765f61837709bc7dd6b975e772da996f421ee1b Binary files /dev/null and b/syntaxnet/dragnn/conll2017/sample/zh-segmenter.checkpoint.meta differ diff --git a/syntaxnet/dragnn/conll2017/sample/zh-segmenter.master_spec b/syntaxnet/dragnn/conll2017/sample/zh-segmenter.master_spec new file mode 100644 index 0000000000000000000000000000000000000000..ceca40c309bfd058581dd7a8abb15a976ca6e108 --- /dev/null +++ b/syntaxnet/dragnn/conll2017/sample/zh-segmenter.master_spec @@ -0,0 +1,187 @@ +component { + name: "lookahead" + transition_system { + registered_name: "shift-only" + parameters { + key: "left_to_right" + value: "false" + } + } + resource { + name: "word-map" + part { + file_pattern: "word-map" + } + } + resource { + name: "tag-map" + part { + file_pattern: "tag-map" + } + } + resource { + name: "tag-to-category" + part { + file_pattern: "tag-to-category" + } + } + resource { + name: "lcword-map" + part { + file_pattern: "lcword-map" + } + } + resource { + name: "category-map" + part { + file_pattern: "category-map" + } + } + resource { + 
name: "char-map" + part { + file_pattern: "char-map" + } + } + resource { + name: "char-ngram-map" + part { + file_pattern: "char-ngram-map" + } + } + resource { + name: "label-map" + part { + file_pattern: "label-map" + } + } + resource { + name: "prefix-table" + part { + file_pattern: "prefix-table" + } + } + resource { + name: "suffix-table" + part { + file_pattern: "suffix-table" + } + } + fixed_feature { + name: "char" + fml: "input(-1).char input.char input(1).char" + embedding_dim: 32 + vocabulary_size: 3521 + size: 3 + } + fixed_feature { + name: "char-bigram" + fml: "input.char-bigram" + embedding_dim: 32 + vocabulary_size: 6579 + size: 1 + } + network_unit { + registered_name: "wrapped_units.LayerNormBasicLSTMNetwork" + parameters { + key: "hidden_layer_sizes" + value: "256" + } + } + backend { + registered_name: "SyntaxNetComponent" + } + num_actions: 1 + component_builder { + registered_name: "DynamicComponentBuilder" + } +} +component { + name: "segmenter" + transition_system { + registered_name: "binary-segment-transitions" + } + resource { + name: "word-map" + part { + file_pattern: "word-map" + } + } + resource { + name: "tag-map" + part { + file_pattern: "tag-map" + } + } + resource { + name: "tag-to-category" + part { + file_pattern: "tag-to-category" + } + } + resource { + name: "lcword-map" + part { + file_pattern: "lcword-map" + } + } + resource { + name: "category-map" + part { + file_pattern: "category-map" + } + } + resource { + name: "char-map" + part { + file_pattern: "char-map" + } + } + resource { + name: "char-ngram-map" + part { + file_pattern: "char-ngram-map" + } + } + resource { + name: "label-map" + part { + file_pattern: "label-map" + } + } + resource { + name: "prefix-table" + part { + file_pattern: "prefix-table" + } + } + resource { + name: "suffix-table" + part { + file_pattern: "suffix-table" + } + } + linked_feature { + name: "lookahead" + fml: "input.focus stack.focus" + embedding_dim: 32 + size: 2 + source_component: 
"lookahead" + source_translator: "reverse-token" + source_layer: "state_h_0" + } + network_unit { + registered_name: "wrapped_units.LayerNormBasicLSTMNetwork" + parameters { + key: "hidden_layer_sizes" + value: "128" + } + } + backend { + registered_name: "SyntaxNetComponent" + } + num_actions: 2 + component_builder { + registered_name: "DynamicComponentBuilder" + } +} diff --git a/syntaxnet/dragnn/core/BUILD b/syntaxnet/dragnn/core/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..24978ab755965ed06cbd5ffb391404a4693b00ae --- /dev/null +++ b/syntaxnet/dragnn/core/BUILD @@ -0,0 +1,331 @@ +package( + default_visibility = ["//visibility:public"], + features = ["-layering_check"], +) + +# Test data. +filegroup( + name = "testdata", + data = glob(["testdata/**"]), +) + +cc_library( + name = "beam", + hdrs = ["beam.h"], + deps = [ + "//dragnn/core/interfaces:cloneable_transition_state", + "//dragnn/core/interfaces:transition_state", + "//syntaxnet:base", + ], +) + +cc_library( + name = "component_registry", + srcs = ["component_registry.cc"], + hdrs = ["component_registry.h"], + deps = [ + "//dragnn/core/interfaces:component", + "//syntaxnet:registry", + ], +) + +cc_library( + name = "compute_session", + hdrs = ["compute_session.h"], + deps = [ + "//dragnn/components/util:bulk_feature_extractor", + "//dragnn/core:index_translator", + "//dragnn/core/interfaces:component", + "//dragnn/protos:spec_proto", + "//dragnn/protos:trace_proto", + ], +) + +cc_library( + name = "compute_session_impl", + srcs = ["compute_session_impl.cc"], + hdrs = ["compute_session_impl.h"], + deps = [ + ":compute_session", + ":index_translator", + ":input_batch_cache", + "//dragnn/components/util:bulk_feature_extractor", + "//dragnn/protos:data_proto", + "//dragnn/protos:spec_proto", + "//dragnn/protos:trace_proto", + "//syntaxnet:base", + "//syntaxnet:registry", + ], +) + +cc_library( + name = "compute_session_pool", + srcs = ["compute_session_pool.cc"], + hdrs = 
["compute_session_pool.h"], + deps = [ + ":component_registry", + ":compute_session", + ":compute_session_impl", + "//dragnn/protos:spec_proto", + "//syntaxnet:base", + ], +) + +cc_library( + name = "index_translator", + srcs = ["index_translator.cc"], + hdrs = ["index_translator.h"], + deps = [ + "//dragnn/core/interfaces:component", + "//dragnn/core/interfaces:transition_state", + "//syntaxnet:base", + ], +) + +cc_library( + name = "input_batch_cache", + hdrs = ["input_batch_cache.h"], + deps = [ + "//dragnn/core/interfaces:input_batch", + "//syntaxnet:base", + ], +) + +cc_library( + name = "resource_container", + hdrs = ["resource_container.h"], + deps = ["//syntaxnet:base"], +) + +# Tests + +cc_test( + name = "beam_test", + srcs = ["beam_test.cc"], + deps = [ + ":beam", + "//dragnn/core/interfaces:cloneable_transition_state", + "//dragnn/core/interfaces:transition_state", + "//dragnn/core/test:mock_transition_state", + "//syntaxnet:base", + "//syntaxnet:test_main", + ], +) + +cc_test( + name = "compute_session_impl_test", + srcs = ["compute_session_impl_test.cc"], + deps = [ + ":component_registry", + ":compute_session", + ":compute_session_impl", + ":compute_session_pool", + "//dragnn/components/util:bulk_feature_extractor", + "//dragnn/core/interfaces:component", + "//dragnn/core/test:generic", + "//dragnn/core/test:mock_component", + "//dragnn/core/test:mock_transition_state", + "//syntaxnet:base", + ], +) + +cc_test( + name = "compute_session_pool_test", + srcs = ["compute_session_pool_test.cc"], + deps = [ + ":compute_session", + ":compute_session_pool", + "//dragnn/core/test:generic", + "//dragnn/core/test:mock_component", + "//dragnn/core/test:mock_compute_session", + "//syntaxnet:base", + "//syntaxnet:test_main", + ], +) + +cc_test( + name = "index_translator_test", + srcs = ["index_translator_test.cc"], + deps = [ + ":index_translator", + "//dragnn/core/test:mock_component", + "//dragnn/core/test:mock_transition_state", + "//syntaxnet:base", + 
"//syntaxnet:test_main", + ], +) + +cc_test( + name = "input_batch_cache_test", + srcs = ["input_batch_cache_test.cc"], + deps = [ + ":input_batch_cache", + "//dragnn/core/interfaces:input_batch", + "//syntaxnet:base", + "//syntaxnet:test_main", + ], +) + +cc_test( + name = "resource_container_test", + srcs = ["resource_container_test.cc"], + deps = [ + ":resource_container", + "//syntaxnet:base", + "//syntaxnet:test_main", + ], +) + +# Tensorflow op kernel BUILD rules. + +load( + "//dragnn:tensorflow_ops.bzl", + "tf_gen_op_libs", + "tf_gen_op_wrapper_py", + "tf_kernel_library", +) + +tf_gen_op_libs( + op_lib_names = ["dragnn_ops"], +) + +tf_gen_op_wrapper_py( + name = "dragnn_ops", + deps = [":dragnn_ops_op_lib"], +) + +tf_gen_op_libs( + op_lib_names = ["dragnn_bulk_ops"], +) + +tf_gen_op_wrapper_py( + name = "dragnn_bulk_ops", + deps = [":dragnn_bulk_ops_op_lib"], +) + +cc_library( + name = "compute_session_op", + srcs = [ + "ops/compute_session_op.cc", + ], + hdrs = ["ops/compute_session_op.h"], + deps = [ + ":compute_session", + ":resource_container", + "//syntaxnet:base", + "@org_tensorflow//third_party/eigen3", + ], +) + +cc_library( + name = "dragnn_ops_cc", + srcs = [ + "ops/dragnn_op_kernels.cc", + "ops/dragnn_ops.cc", + ], + deps = [ + ":compute_session", + ":compute_session_op", + ":compute_session_pool", + ":resource_container", + "//dragnn/protos:data_proto", + "//dragnn/protos:spec_proto", + "//syntaxnet:base", + "@org_tensorflow//third_party/eigen3", + ], + alwayslink = 1, +) + +cc_library( + name = "dragnn_bulk_ops_cc", + srcs = [ + "ops/dragnn_bulk_op_kernels.cc", + "ops/dragnn_bulk_ops.cc", + ], + deps = [ + ":compute_session_op", + ":resource_container", + "//syntaxnet:base", + "@org_tensorflow//third_party/eigen3", + ], +) + +# Tensorflow kernel libraries, for use with unit tests. 
+ +tf_kernel_library( + name = "dragnn_op_kernels", + srcs = [ + "ops/dragnn_op_kernels.cc", + "ops/dragnn_ops.cc", + ], + hdrs = [ + ], + deps = [ + ":compute_session", + ":compute_session_op", + ":compute_session_pool", + ":resource_container", + "//dragnn/protos:data_proto", + "//dragnn/protos:spec_proto", + "//syntaxnet:base", + "@org_tensorflow//third_party/eigen3", + ], +) + +tf_kernel_library( + name = "dragnn_bulk_op_kernels", + srcs = [ + "ops/dragnn_bulk_op_kernels.cc", + "ops/dragnn_bulk_ops.cc", + ], + hdrs = [ + ], + deps = [ + ":compute_session", + ":compute_session_op", + ":compute_session_pool", + ":resource_container", + "//dragnn/components/util:bulk_feature_extractor", + "//dragnn/protos:spec_proto", + "//syntaxnet:base", + "@org_tensorflow//tensorflow/core:protos_all_cc", + "@org_tensorflow//third_party/eigen3", + ], +) + +# Tensorflow kernel tests. + +cc_test( + name = "dragnn_op_kernels_test", + srcs = ["ops/dragnn_op_kernels_test.cc"], + deps = [ + ":compute_session", + ":compute_session_pool", + ":dragnn_op_kernels", + ":resource_container", + "//dragnn/core/test:generic", + "//dragnn/core/test:mock_compute_session", + "//syntaxnet:base", + "//syntaxnet:test_main", + "@org_tensorflow//tensorflow/core:protos_all_cc", + "@org_tensorflow//tensorflow/core/kernels:ops_testutil", + "@org_tensorflow//tensorflow/core/kernels:ops_util", + "@org_tensorflow//tensorflow/core/kernels:quantized_ops", + ], +) + +cc_test( + name = "dragnn_bulk_op_kernels_test", + srcs = ["ops/dragnn_bulk_op_kernels_test.cc"], + deps = [ + ":compute_session_pool", + ":dragnn_bulk_op_kernels", + ":resource_container", + "//dragnn/components/util:bulk_feature_extractor", + "//dragnn/core/test:mock_compute_session", + "//syntaxnet:base", + "//syntaxnet:test_main", + "@org_tensorflow//tensorflow/core/kernels:ops_testutil", + "@org_tensorflow//tensorflow/core/kernels:quantized_ops", + ], +) diff --git a/syntaxnet/dragnn/core/beam.h b/syntaxnet/dragnn/core/beam.h new file mode 
100644 index 0000000000000000000000000000000000000000..534cfcd04109494270138556066d8d2dac4b5852 --- /dev/null +++ b/syntaxnet/dragnn/core/beam.h @@ -0,0 +1,363 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_BEAM_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_BEAM_H_ + +#include +#include +#include +#include + +#include "dragnn/core/interfaces/cloneable_transition_state.h" +#include "dragnn/core/interfaces/transition_state.h" +#include "tensorflow/core/platform/logging.h" + +namespace syntaxnet { +namespace dragnn { + +// The Beam class wraps the logic necessary to advance a set of transition +// states for an arbitrary Component. Because the Beam class is generic, it +// doesn't know how to act on the states it is provided - the instantiating +// Component is expected to provide it the four functions it needs to interact +// with that Component's TransitionState subclasses. + +template <class T> +class Beam { + public: + // Creates a new Beam which can grow up to max_size elements. 
+ explicit Beam(int max_size) : max_size_(max_size), num_steps_(0) { + VLOG(2) << "Creating beam with max size " << max_size_; + static_assert( + std::is_base_of<CloneableTransitionState<T>, T>::value, + "This class must be instantiated to use a CloneableTransitionState"); + } + + // Sets the Beam functions, as follows: + // bool is_allowed(TransitionState *, int): Return true if transition 'int' is + // allowed for transition state 'TransitionState *'. + // void perform_transition(TransitionState *, int): Performs transition 'int' + // on transition state 'TransitionState *'. + // int oracle_function(TransitionState *): Returns the oracle-specified action + // for transition state 'TransitionState *'. + void SetFunctions(std::function<bool(T *, int)> is_allowed, + std::function<bool(T *)> is_final, + std::function<void(T *, int)> perform_transition, + std::function<int(T *)> oracle_function) { + is_allowed_ = is_allowed; + is_final_ = is_final; + perform_transition_ = perform_transition; + oracle_function_ = oracle_function; + } + + // Resets the Beam and initializes it with the given set of states. The Beam + // takes ownership of these TransitionStates. + void Init(std::vector<std::unique_ptr<T>> initial_states) { + VLOG(2) << "Initializing beam. Beam max size is " << max_size_; + CHECK_LE(initial_states.size(), max_size_) + << "Attempted to initialize a beam with more states (" + << initial_states.size() << ") than the max size " << max_size_; + beam_ = std::move(initial_states); + std::vector<int> previous_beam_indices(max_size_, -1); + for (int i = 0; i < beam_.size(); ++i) { + previous_beam_indices.at(i) = beam_[i]->ParentBeamIndex(); + beam_[i]->SetBeamIndex(i); + } + beam_index_history_.emplace_back(previous_beam_indices); + } + + // Advances the Beam from the given transition matrix. + void AdvanceFromPrediction(const float transition_matrix[], int matrix_length, + int num_actions) { + // Ensure that the transition matrix is the correct size. All underlying + // states should have the same transition profile, so using the one at 0 + // should be safe. 
+ CHECK_EQ(matrix_length, max_size_ * num_actions) + << "Transition matrix size does not match max beam size * number of " + "state transitions!"; + + if (max_size_ == 1) { + // In the case where beam size is 1, we can advance by simply finding the + // highest score and advancing the beam state in place. + VLOG(2) << "Beam size is 1. Using fast beam path."; + int best_action = -1; + float best_score = -INFINITY; + auto &state = beam_[0]; + for (int action_idx = 0; action_idx < num_actions; ++action_idx) { + if (is_allowed_(state.get(), action_idx) && + transition_matrix[action_idx] > best_score) { + best_score = transition_matrix[action_idx]; + best_action = action_idx; + } + } + CHECK_GE(best_action, 0) << "Num actions: " << num_actions + << " score[0]: " << transition_matrix[0]; + perform_transition_(state.get(), best_action); + const float new_score = state->GetScore() + best_score; + state->SetScore(new_score); + state->SetBeamIndex(0); + } else { + // Create the vector of all possible transitions, along with their scores. + std::vector<Transition> candidates; + + // Iterate through all beams, examining all actions for each beam. + for (int beam_idx = 0; beam_idx < beam_.size(); ++beam_idx) { + const auto &state = beam_[beam_idx]; + for (int action_idx = 0; action_idx < num_actions; ++action_idx) { + // If the action is allowed, calculate the proposed new score and add + // the candidate action to the vector of all actions at this state. + if (is_allowed_(state.get(), action_idx)) { + Transition candidate; + + // The matrix is laid out by beam index, with a linear set of + // actions for that index - so beam N's actions start at [nr. of + // actions]*[N]. 
+ const int matrix_idx = action_idx + beam_idx * num_actions; + CHECK_LT(matrix_idx, matrix_length) + << "Matrix index out of bounds!"; + const double score_delta = transition_matrix[matrix_idx]; + CHECK(!std::isnan(score_delta)); + candidate.source_idx = beam_idx; + candidate.action = action_idx; + candidate.resulting_score = state->GetScore() + score_delta; + candidates.emplace_back(candidate); + } + } + } + + // Sort the vector of all possible transitions and scores. + const auto comparator = [](const Transition &a, const Transition &b) { + return a.resulting_score > b.resulting_score; + }; + std::stable_sort(candidates.begin(), candidates.end(), comparator); + + // Apply the top transitions, up to a maximum of 'max_size_'. + std::vector<std::unique_ptr<T>> new_beam; + std::vector<int> previous_beam_indices(max_size_, -1); + const int beam_size = + std::min(max_size_, static_cast<int>(candidates.size())); + VLOG(2) << "Previous beam size = " << beam_.size(); + VLOG(2) << "New beam size = " << beam_size; + VLOG(2) << "Maximum beam size = " << max_size_; + for (int i = 0; i < beam_size; ++i) { + // Get the source of the i'th transition. + const auto &transition = candidates[i]; + VLOG(2) << "Taking transition with score: " + << transition.resulting_score + << " and action: " << transition.action; + VLOG(2) << "transition.source_idx = " << transition.source_idx; + const auto &source = beam_[transition.source_idx]; + + // Put the new transition on the new state beam. + auto new_state = source->Clone(); + perform_transition_(new_state.get(), transition.action); + new_state->SetScore(transition.resulting_score); + new_state->SetBeamIndex(i); + previous_beam_indices.at(i) = transition.source_idx; + new_beam.emplace_back(std::move(new_state)); + } + + beam_ = std::move(new_beam); + beam_index_history_.emplace_back(previous_beam_indices); + } + + ++num_steps_; + } + + // Advances the Beam from the state oracles. 
+ void AdvanceFromOracle() { + std::vector<int> previous_beam_indices(max_size_, -1); + for (int i = 0; i < beam_.size(); ++i) { + previous_beam_indices.at(i) = i; + if (is_final_(beam_[i].get())) continue; + const auto oracle_label = oracle_function_(beam_[i].get()); + VLOG(2) << "AdvanceFromOracle beam_index:" << i + << " oracle_label:" << oracle_label; + perform_transition_(beam_[i].get(), oracle_label); + beam_[i]->SetScore(0.0); + beam_[i]->SetBeamIndex(i); + } + if (max_size_ > 1) { + beam_index_history_.emplace_back(previous_beam_indices); + } + num_steps_++; + } + + // Returns true if all states in the beam are final. + bool IsTerminal() { + for (auto &state : beam_) { + if (!is_final_(state.get())) { + return false; + } + } + return true; + } + + // Destroys the states held by this beam and resets its history. + void Reset() { + beam_.clear(); + beam_index_history_.clear(); + num_steps_ = 0; + } + + // Given an index into the current beam, determine the index of the item's + // parent at beam step "step", which should be less than the total number + // of steps taken by this beam. + int FindPreviousIndex(int current_index, int step) const { + VLOG(2) << "FindPreviousIndex requested for current_index:" << current_index + << " at step:" << step; + if (VLOG_IS_ON(2)) { + int step_index = 0; + for (const auto &step : beam_index_history_) { + string row = + "Step " + std::to_string(step_index) + " element source slot: "; + for (const auto &index : step) { + if (index == -1) { + row += " X"; + } else { + row += " " + std::to_string(index); + } + } + VLOG(2) << row; + ++step_index; + } + } + + // If the max size of the beam is greater than 1, make sure the number of + // steps is in sync with the history size. + if (max_size_ > 1) { + CHECK(num_steps_ == beam_index_history_.size() - 1); + } + + // Check if the step is too far into the past or future. + if (step < 0 || step > num_steps_) { + return -1; + } + + // Check that the index is within the beam. 
+ if (current_index < 0 || current_index >= max_size_) { + return -1; + } + + // If the max size of the beam is 1, always return 0. + if (max_size_ == 1) { + return 0; + } + + // Check that the start index isn't -1; -1 means that we don't have an + // actual transition state in that beam slot. + if (beam_index_history_.back().at(current_index) == -1) { + return -1; + } + + int beam_index = current_index; + for (int i = beam_index_history_.size() - 1; i >= step; --i) { + beam_index = beam_index_history_.at(i).at(beam_index); + } + CHECK_GE(beam_index, 0); + VLOG(2) << "Index is " << beam_index; + return beam_index; + } + + // Returns the current state of the beam. + std::vector<const T *> beam() const { + std::vector<const T *> state_ptrs; + for (const auto &beam_state : beam_) { + state_ptrs.emplace_back(beam_state.get()); + } + return state_ptrs; + } + + // Returns the state at the given beam index. + T *beam_state(int beam_index) { return beam_.at(beam_index).get(); } + + // Returns the raw history vectors for this beam. + const std::vector<std::vector<int>> &history() { + if (max_size_ == 1) { + // If max size is 1, we haven't been keeping track of the beam. Quickly + // create it. + beam_index_history_.clear(); + beam_index_history_.push_back({beam_[0]->ParentBeamIndex()}); + for (int i = 0; i < num_steps_; ++i) { + beam_index_history_.push_back({0}); + } + } + return beam_index_history_; + } + + // Sets the max size of the beam. + void SetMaxSize(int max_size) { + max_size_ = max_size; + Reset(); + } + + // Returns the number of steps taken so far. + const int num_steps() const { return num_steps_; } + + // Returns the max size of this beam. + const int max_size() const { return max_size_; } + + // Returns the current size of the beam. + const int size() const { return beam_.size(); } + + private: + // Associates an action taken on an index into current_state_ with a score. + struct Transition { + // The index of the source item. + int source_idx; + + // The index of the action being taken. 
+ int action; + + // The score of the full derivation. + double resulting_score; + }; + + // The maximum beam size. + int max_size_; + + // The current beam. + std::vector<std::unique_ptr<T>> beam_; + + // Function to check if a transition is allowed for a given state. + std::function<bool(T *, int)> is_allowed_; + + // Function to check if a state is final. + std::function<bool(T *)> is_final_; + + // Function to perform a transition on a given state. + std::function<void(T *, int)> perform_transition_; + + // Function to provide the oracle action for a given state. + std::function<int(T *)> oracle_function_; + + // The history of the states in this beam. The vector indexes across steps. + // For every step, there is a vector in the vector. This inner vector denotes + // the state of the beam at that step, and contains the beam index that + // was transitioned to create the transition state at that index (so, + // if at step 2 the transition state at beam index 4 was created by applying + // a transition to the state in beam index 3 during step 1, the query would + // be "beam_index_history_.at(2).at(4)" and the value would be 3). Empty beam + // states will return -1. + std::vector<std::vector<int>> beam_index_history_; + + // The number of steps taken so far. + int num_steps_; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_BEAM_H_ diff --git a/syntaxnet/dragnn/core/beam_test.cc b/syntaxnet/dragnn/core/beam_test.cc new file mode 100644 index 0000000000000000000000000000000000000000..7ff1f32991bf2345178f74db0485ff05f8725fa8 --- /dev/null +++ b/syntaxnet/dragnn/core/beam_test.cc @@ -0,0 +1,788 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "dragnn/core/beam.h" + +#include "dragnn/core/interfaces/cloneable_transition_state.h" +#include "dragnn/core/interfaces/transition_state.h" +#include "dragnn/core/test/mock_transition_state.h" +#include +#include "tensorflow/core/platform/test.h" + +namespace syntaxnet { +namespace dragnn { + +using testing::MockFunction; +using testing::Return; +using testing::Ne; +using testing::_; + +namespace { + +// ***************************************************************************** +// Test-internal class definitions. +// ***************************************************************************** + +// Create a very basic transition state to test the beam. All it does is keep +// track of its current beam index and score, as well as providing a field +// for the transition function to write in what transition occurred. +// Note that this class does not fulfill the entire TransitionState contract, +// since it is only used in this particular test. +class TestTransitionState + : public CloneableTransitionState { + public: + TestTransitionState() {} + + void Init(const TransitionState &parent) override {} + + std::unique_ptr Clone() const override { + std::unique_ptr ptr(new TestTransitionState()); + return ptr; + } + + const int ParentBeamIndex() const override { return parent_beam_index_; } + + // Get the current beam index for this state. + const int GetBeamIndex() const override { return beam_index_; } + + // Set the current beam index for this state. 
+ void SetBeamIndex(const int index) override { beam_index_ = index; } + + // Get the score associated with this transition state. + const float GetScore() const override { return score_; } + + // Set the score associated with this transition state. + void SetScore(const float score) override { score_ = score; } + + // Depicts this state as an HTML-language string. + string HTMLRepresentation() const override { return ""; } + + int parent_beam_index_; + + int beam_index_; + + float score_; + + int transition_action_; +}; + +// This transition function annotates a TestTransitionState with the action that +// was chosen for the transition. +auto transition_function = [](TestTransitionState *state, int action) { + TestTransitionState *cast_state = dynamic_cast<TestTransitionState *>(state); + cast_state->transition_action_ = action; +}; + +// Create oracle and permission functions that do nothing. +auto null_oracle = [](TestTransitionState *) { return 0; }; +auto null_permissions = [](TestTransitionState *, int) { return true; }; +auto null_finality = [](TestTransitionState *) { return false; }; + +// Create a unique_ptr with a test transition state in it and set its initial +// score. +std::unique_ptr<TestTransitionState> CreateState(float score) { + std::unique_ptr<TestTransitionState> state; + state.reset(new TestTransitionState()); + state->SetScore(score); + return state; +} + +// Convenience accessor for the action field in TestTransitionState. +int GetTransition(const TransitionState *state) { + return (dynamic_cast<const TestTransitionState *>(state))->transition_action_; +} + +// Convenience setter for the parent_beam_index_ field in TestTransitionState. +void SetParentBeamIndex(TransitionState *state, int index) { + (dynamic_cast<TestTransitionState *>(state))->parent_beam_index_ = index; +} + +} // namespace + +// ***************************************************************************** +// Tests begin here. 
+// ***************************************************************************** +TEST(BeamTest, AdvancesFromPredictionWithSingleBeam) { + // Create a matrix of transitions. + constexpr int kNumTransitions = 4; + constexpr int kMatrixSize = kNumTransitions; + constexpr float matrix[kMatrixSize] = {30.0, 20.0, 40.0, 10.0}; + constexpr int kBestTransition = 2; + constexpr float kOldScore = 3.0; + + // Create the beam and transition it. + std::vector> states; + states.push_back(CreateState(kOldScore)); + constexpr int kBeamSize = 1; + Beam beam(kBeamSize); + beam.SetFunctions(null_permissions, null_finality, transition_function, + null_oracle); + beam.Init(std::move(states)); + beam.AdvanceFromPrediction(matrix, kMatrixSize, kNumTransitions); + + // Validate the new beam. + EXPECT_EQ(beam.beam().size(), kBeamSize); + + // Make sure the state has performed the expected transition. + EXPECT_EQ(GetTransition(beam.beam().at(0)), kBestTransition); + + // Make sure the state has had its score updated properly. + EXPECT_EQ(beam.beam().at(0)->GetScore(), kOldScore + matrix[kBestTransition]); + + // Make sure that the beam index field is consistent with the actual beam idx. + EXPECT_EQ(beam.beam().at(0)->GetBeamIndex(), 0); + + // Make sure that the beam_state accessor actually accesses the beam. + EXPECT_EQ(beam.beam().at(0), beam.beam_state(0)); + + // Validate the beam history field. + auto history = beam.history(); + EXPECT_EQ(history.at(1).at(0), 0); +} + +TEST(BeamTest, AdvancingCreatesNewTransitions) { + // Create a matrix of transitions. 
+ constexpr int kMaxBeamSize = 8; + constexpr int kNumTransitions = 4; + constexpr int kMatrixSize = kNumTransitions * kMaxBeamSize; + constexpr float matrix[kMatrixSize] = { + 30.0, 20.0, 40.0, 10.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, + 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, + 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0}; + constexpr float kOldScore = 4.0; + + // Create the beam and transition it. + std::vector> states; + states.push_back(CreateState(kOldScore)); + + Beam beam(kMaxBeamSize); + beam.SetFunctions(null_permissions, null_finality, transition_function, + null_oracle); + beam.Init(std::move(states)); + beam.AdvanceFromPrediction(matrix, kMatrixSize, kNumTransitions); + + // Validate the new beam. + EXPECT_EQ(beam.beam().size(), 4); + + // Make sure the state has performed the expected transition. + EXPECT_EQ(GetTransition(beam.beam().at(0)), 2); + EXPECT_EQ(GetTransition(beam.beam().at(1)), 0); + EXPECT_EQ(GetTransition(beam.beam().at(2)), 1); + EXPECT_EQ(GetTransition(beam.beam().at(3)), 3); + + // Make sure the state has had its score updated properly. + EXPECT_EQ(beam.beam().at(0)->GetScore(), kOldScore + matrix[2]); + EXPECT_EQ(beam.beam().at(1)->GetScore(), kOldScore + matrix[0]); + EXPECT_EQ(beam.beam().at(2)->GetScore(), kOldScore + matrix[1]); + EXPECT_EQ(beam.beam().at(3)->GetScore(), kOldScore + matrix[3]); + + // Make sure that the beam index field is consistent with the actual beam idx. + for (int i = 0; i < beam.beam().size(); ++i) { + EXPECT_EQ(beam.beam().at(i)->GetBeamIndex(), i); + } + + // In this case, we expect the top 4 results to have come from state 0 and + // the remaining 4 slots to be empty (-1). 
+ auto history = beam.history(); + EXPECT_EQ(history.at(1).at(0), 0); + EXPECT_EQ(history.at(1).at(1), 0); + EXPECT_EQ(history.at(1).at(2), 0); + EXPECT_EQ(history.at(1).at(3), 0); + EXPECT_EQ(history.at(1).at(4), -1); + EXPECT_EQ(history.at(1).at(5), -1); + EXPECT_EQ(history.at(1).at(6), -1); + EXPECT_EQ(history.at(1).at(7), -1); +} + +TEST(BeamTest, MultipleElementBeamsAdvanceAllElements) { + // Create a matrix of transitions. + constexpr int kMaxBeamSize = 8; + constexpr int kNumTransitions = 4; + constexpr int kMatrixSize = kNumTransitions * kMaxBeamSize; + + constexpr float matrix[kMatrixSize] = { + 30.0, 20.0, 40.0, 10.0, // State 0 + 31.0, 21.0, 41.0, 11.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, + 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, + 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0}; + + constexpr float kOldScores[] = {5.0, 7.0}; + + // Create the beam and transition it. + std::vector> states; + states.push_back(CreateState(kOldScores[0])); + states.push_back(CreateState(kOldScores[1])); + + Beam beam(kMaxBeamSize); + beam.SetFunctions(null_permissions, null_finality, transition_function, + null_oracle); + beam.Init(std::move(states)); + beam.AdvanceFromPrediction(matrix, kMatrixSize, kNumTransitions); + + // Validate the new beam. + EXPECT_EQ(beam.beam().size(), 8); + + // Make sure the state has performed the expected transition. + // Note that the transition index is not the index into the matrix, but rather + // the index into the matrix 'row' for that state. + EXPECT_EQ(GetTransition(beam.beam().at(0)), 2); + EXPECT_EQ(GetTransition(beam.beam().at(1)), 2); + EXPECT_EQ(GetTransition(beam.beam().at(2)), 0); + EXPECT_EQ(GetTransition(beam.beam().at(3)), 0); + EXPECT_EQ(GetTransition(beam.beam().at(4)), 1); + EXPECT_EQ(GetTransition(beam.beam().at(5)), 1); + EXPECT_EQ(GetTransition(beam.beam().at(6)), 3); + EXPECT_EQ(GetTransition(beam.beam().at(7)), 3); + + // Make sure the state has had its score updated properly. 
+ EXPECT_EQ(beam.beam().at(0)->GetScore(), kOldScores[1] + matrix[6]); + EXPECT_EQ(beam.beam().at(1)->GetScore(), kOldScores[0] + matrix[2]); + EXPECT_EQ(beam.beam().at(2)->GetScore(), kOldScores[1] + matrix[4]); + EXPECT_EQ(beam.beam().at(3)->GetScore(), kOldScores[0] + matrix[0]); + EXPECT_EQ(beam.beam().at(4)->GetScore(), kOldScores[1] + matrix[5]); + EXPECT_EQ(beam.beam().at(5)->GetScore(), kOldScores[0] + matrix[1]); + EXPECT_EQ(beam.beam().at(6)->GetScore(), kOldScores[1] + matrix[7]); + EXPECT_EQ(beam.beam().at(7)->GetScore(), kOldScores[0] + matrix[3]); + + // Make sure that the beam index field is consistent with the actual beam idx. + for (int i = 0; i < beam.beam().size(); ++i) { + EXPECT_EQ(beam.beam().at(i)->GetBeamIndex(), i); + } + + // Validate the history at this step. + auto history = beam.history(); + EXPECT_EQ(history.at(1).at(0), 1); + EXPECT_EQ(history.at(1).at(1), 0); + EXPECT_EQ(history.at(1).at(2), 1); + EXPECT_EQ(history.at(1).at(3), 0); + EXPECT_EQ(history.at(1).at(4), 1); + EXPECT_EQ(history.at(1).at(5), 0); + EXPECT_EQ(history.at(1).at(6), 1); + EXPECT_EQ(history.at(1).at(7), 0); +} + +TEST(BeamTest, AdvancingDropsLowValuePredictions) { + // Create a matrix of transitions. + constexpr int kNumTransitions = 4; + constexpr int kMaxBeamSize = 8; + constexpr int kMatrixSize = kNumTransitions * kMaxBeamSize; + constexpr float matrix[kMatrixSize] = {30.0, 20.0, 40.0, 10.0, // State 0 + 31.0, 21.0, 41.0, 11.0, // State 1 + 32.0, 22.0, 42.0, 12.0, // State 2 + 33.0, 23.0, 43.0, 13.0, // State 3 + 34.0, 24.0, 44.0, 14.0, // State 4 + 35.0, 25.0, 45.0, 15.0, // State 5 + 36.0, 26.0, 46.0, 16.0, // State 6 + 37.0, 27.0, 47.0, 17.0}; // State 7 + constexpr float kOldScores[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8}; + + // Create the beam and transition it. 
+ std::vector> states; + states.push_back(CreateState(kOldScores[0])); + states.push_back(CreateState(kOldScores[1])); + states.push_back(CreateState(kOldScores[2])); + states.push_back(CreateState(kOldScores[3])); + states.push_back(CreateState(kOldScores[4])); + states.push_back(CreateState(kOldScores[5])); + states.push_back(CreateState(kOldScores[6])); + states.push_back(CreateState(kOldScores[7])); + Beam beam(kMaxBeamSize); + beam.SetFunctions(null_permissions, null_finality, transition_function, + null_oracle); + beam.Init(std::move(states)); + beam.AdvanceFromPrediction(matrix, kMatrixSize, kNumTransitions); + + // Validate the new beam. + EXPECT_EQ(beam.beam().size(), 8); + + // Make sure the state has performed the expected transition. + // In this case, every state will perform transition 2. + EXPECT_EQ(GetTransition(beam.beam().at(0)), 2); + EXPECT_EQ(GetTransition(beam.beam().at(1)), 2); + EXPECT_EQ(GetTransition(beam.beam().at(2)), 2); + EXPECT_EQ(GetTransition(beam.beam().at(3)), 2); + EXPECT_EQ(GetTransition(beam.beam().at(4)), 2); + EXPECT_EQ(GetTransition(beam.beam().at(5)), 2); + EXPECT_EQ(GetTransition(beam.beam().at(6)), 2); + EXPECT_EQ(GetTransition(beam.beam().at(7)), 2); + + // Make sure the state has had its score updated properly. (Note that row + // 0 had the smallest transition score, so it ends up on the bottom of the + // beam, and so forth.) For the matrix index, N*kNumTransitions gets into the + // correct state row and we add 2 since that was the transition index. 
+ EXPECT_EQ(beam.beam().at(0)->GetScore(), + kOldScores[7] + matrix[7 * kNumTransitions + 2]); + EXPECT_EQ(beam.beam().at(1)->GetScore(), + kOldScores[6] + matrix[6 * kNumTransitions + 2]); + EXPECT_EQ(beam.beam().at(2)->GetScore(), + kOldScores[5] + matrix[5 * kNumTransitions + 2]); + EXPECT_EQ(beam.beam().at(3)->GetScore(), + kOldScores[4] + matrix[4 * kNumTransitions + 2]); + EXPECT_EQ(beam.beam().at(4)->GetScore(), + kOldScores[3] + matrix[3 * kNumTransitions + 2]); + EXPECT_EQ(beam.beam().at(5)->GetScore(), + kOldScores[2] + matrix[2 * kNumTransitions + 2]); + EXPECT_EQ(beam.beam().at(6)->GetScore(), + kOldScores[1] + matrix[1 * kNumTransitions + 2]); + EXPECT_EQ(beam.beam().at(7)->GetScore(), + kOldScores[0] + matrix[0 * kNumTransitions + 2]); + + // Make sure that the beam index field is consistent with the actual beam idx. + for (int i = 0; i < beam.beam().size(); ++i) { + EXPECT_EQ(beam.beam().at(i)->GetBeamIndex(), i); + } + + auto history = beam.history(); + EXPECT_EQ(history.at(1).at(0), 7); + EXPECT_EQ(history.at(1).at(1), 6); + EXPECT_EQ(history.at(1).at(2), 5); + EXPECT_EQ(history.at(1).at(3), 4); + EXPECT_EQ(history.at(1).at(4), 3); + EXPECT_EQ(history.at(1).at(5), 2); + EXPECT_EQ(history.at(1).at(6), 1); + EXPECT_EQ(history.at(1).at(7), 0); +} + +TEST(BeamTest, AdvancesFromOracleWithSingleBeam) { + // Create an oracle function for this state. + constexpr int kOracleLabel = 3; + auto oracle_function = [](TransitionState *) { return kOracleLabel; }; + + // Create the beam and transition it. + std::vector> states; + states.push_back(CreateState(0.0)); + constexpr int kBeamSize = 1; + Beam beam(kBeamSize); + beam.SetFunctions(null_permissions, null_finality, transition_function, + oracle_function); + beam.Init(std::move(states)); + beam.AdvanceFromOracle(); + + // Validate the new beam. + EXPECT_EQ(beam.beam().size(), kBeamSize); + + // Make sure the state has performed the expected transition. 
+ EXPECT_EQ(GetTransition(beam.beam().at(0)), kOracleLabel); + + // Make sure the state has had its score held to 0. + EXPECT_EQ(beam.beam().at(0)->GetScore(), 0.0); + + // Make sure that the beam index field is consistent with the actual beam idx. + EXPECT_EQ(beam.beam().at(0)->GetBeamIndex(), 0); + + // Validate the beam history field. + auto history = beam.history(); + EXPECT_EQ(history.at(1).at(0), 0); +} + +TEST(BeamTest, AdvancesFromOracleWithMultipleStates) { + constexpr int kMaxBeamSize = 8; + + // Create a beam with 8 transition states. + std::vector> states; + for (int i = 0; i < kMaxBeamSize; ++i) { + // This is nonzero to test the oracle holding scores to 0. + states.push_back(CreateState(10.0)); + } + + std::vector expected_actions; + + // Create an oracle function for this state. Use mocks for finer control. + testing::MockFunction mock_oracle_function; + for (int i = 0; i < kMaxBeamSize; ++i) { + // We expect each state to be queried for its oracle label, + // and then to be transitioned in place with its oracle label. + int oracle_label = i % 3; // 3 is arbitrary. + EXPECT_CALL(mock_oracle_function, Call(states.at(i).get())) + .WillOnce(Return(oracle_label)); + expected_actions.push_back(oracle_label); + } + + Beam beam(kMaxBeamSize); + beam.SetFunctions(null_permissions, null_finality, transition_function, + mock_oracle_function.AsStdFunction()); + beam.Init(std::move(states)); + beam.AdvanceFromOracle(); + + // Make sure the state has performed the expected transition, has had its + // score held to 0, and is self consistent. 
+ for (int i = 0; i < beam.beam().size(); ++i) { + EXPECT_EQ(GetTransition(beam.beam().at(i)), expected_actions.at(i)); + EXPECT_EQ(beam.beam().at(i)->GetScore(), 0.0); + EXPECT_EQ(beam.beam().at(i)->GetBeamIndex(), i); + } + + auto history = beam.history(); + for (int i = 0; i < beam.beam().size(); ++i) { + EXPECT_EQ(history.at(1).at(i), i); + } +} + +TEST(BeamTest, ReportsNonFinality) { + constexpr int kMaxBeamSize = 8; + + // Create a beam with 8 transition states. + std::vector> states; + for (int i = 0; i < kMaxBeamSize; ++i) { + // This is nonzero to test the oracle holding scores to 0. + states.push_back(CreateState(10.0)); + } + + std::vector expected_actions; + + // Create a finality function for this state. Use mocks for finer control. + testing::MockFunction mock_finality_function; + + // Make precisely one call return false, which should cause IsFinal + // to report false. + constexpr int incomplete_state = 3; + EXPECT_CALL(mock_finality_function, Call(states.at(incomplete_state).get())) + .WillOnce(Return(false)); + EXPECT_CALL(mock_finality_function, + Call(Ne(states.at(incomplete_state).get()))) + .WillRepeatedly(Return(true)); + + Beam beam(kMaxBeamSize); + beam.SetFunctions(null_permissions, mock_finality_function.AsStdFunction(), + transition_function, null_oracle); + beam.Init(std::move(states)); + + EXPECT_FALSE(beam.IsTerminal()); +} + +TEST(BeamTest, ReportsFinality) { + constexpr int kMaxBeamSize = 8; + + // Create a beam with 8 transition states. + std::vector> states; + for (int i = 0; i < kMaxBeamSize; ++i) { + // This is nonzero to test the oracle holding scores to 0. + states.push_back(CreateState(10.0)); + } + + std::vector expected_actions; + + // Create a finality function for this state. Use mocks for finer control. + testing::MockFunction mock_finality_function; + + // All calls will return true, so IsFinal should return true. 
+ EXPECT_CALL(mock_finality_function, Call(_)).WillRepeatedly(Return(true)); + + Beam beam(kMaxBeamSize); + beam.SetFunctions(null_permissions, mock_finality_function.AsStdFunction(), + transition_function, null_oracle); + beam.Init(std::move(states)); + + EXPECT_TRUE(beam.IsTerminal()); +} + +TEST(BeamTest, IgnoresForbiddenTransitionActions) { + // Create a matrix of transitions. + constexpr int kMaxBeamSize = 4; + constexpr int kNumTransitions = 4; + constexpr int kMatrixSize = kNumTransitions * kMaxBeamSize; + constexpr float matrix[kMatrixSize] = { + 10.0, 1000.0, 40.0, 30.0, 00.0, 0000.0, 00.0, 00.0, + 00.0, 0000.0, 00.0, 00.0, 00.0, 0000.0, 00.0, 00.0}; + constexpr float kOldScore = 4.0; + + // Create the beam. + std::vector> states; + states.push_back(CreateState(kOldScore)); + + // Forbid the second transition (index 1). + testing::MockFunction + mock_permission_function; + EXPECT_CALL(mock_permission_function, Call(states.at(0).get(), 0)) + .WillOnce(Return(true)); + EXPECT_CALL(mock_permission_function, Call(states.at(0).get(), 1)) + .WillOnce(Return(false)); + EXPECT_CALL(mock_permission_function, Call(states.at(0).get(), 2)) + .WillOnce(Return(true)); + EXPECT_CALL(mock_permission_function, Call(states.at(0).get(), 3)) + .WillOnce(Return(true)); + + Beam beam(kMaxBeamSize); + beam.SetFunctions(mock_permission_function.AsStdFunction(), null_finality, + transition_function, null_oracle); + beam.Init(std::move(states)); + beam.AdvanceFromPrediction(matrix, kMatrixSize, kNumTransitions); + + // Validate the new beam. + EXPECT_EQ(beam.beam().size(), 3); + + // Make sure the state has performed the expected transition. + EXPECT_EQ(GetTransition(beam.beam().at(0)), 2); + EXPECT_EQ(GetTransition(beam.beam().at(1)), 3); + EXPECT_EQ(GetTransition(beam.beam().at(2)), 0); + + // Make sure the state has had its score updated properly. 
+ EXPECT_EQ(beam.beam().at(0)->GetScore(), kOldScore + matrix[2]); + EXPECT_EQ(beam.beam().at(1)->GetScore(), kOldScore + matrix[3]); + EXPECT_EQ(beam.beam().at(2)->GetScore(), kOldScore + matrix[0]); + + // Make sure that the beam index field is consistent with the actual beam idx. + for (int i = 0; i < beam.beam().size(); ++i) { + EXPECT_EQ(beam.beam().at(i)->GetBeamIndex(), i); + } + + // In this case, we expect the top 3 results to have come from state 0 and + // the remaining slot to be empty (-1). + auto history = beam.history(); + EXPECT_EQ(history.at(1).at(0), 0); + EXPECT_EQ(history.at(1).at(1), 0); + EXPECT_EQ(history.at(1).at(2), 0); + EXPECT_EQ(history.at(1).at(3), -1); +} + +TEST(BeamTest, BadlySizedMatrixDies) { + // Create a matrix of transitions. + constexpr int kNumTransitions = 4; + constexpr int kMatrixSize = 4; // We have a max beam size of 8; should be 32. + constexpr float matrix[kMatrixSize] = {30.0, 20.0, 40.0, 10.0}; + + // Create the beam and transition it. + std::vector> states; + states.push_back(CreateState(0.0)); + states.push_back(CreateState(0.0)); + constexpr int kMaxBeamSize = 8; + Beam beam(kMaxBeamSize); + beam.SetFunctions(null_permissions, null_finality, transition_function, + null_oracle); + beam.Init(std::move(states)); + + // This matrix should have 32 elements, not 4, so this should die. + EXPECT_DEATH(beam.AdvanceFromPrediction(matrix, kMatrixSize, kNumTransitions), + "Transition matrix size does not match max beam size \\* number " + "of state transitions"); +} + +TEST(BeamTest, BadlySizedBeamInitializationDies) { + // Create an initialization beam too large for the max beam size. + constexpr int kMaxBeamSize = 4; + std::vector> states; + for (int i = 0; i < kMaxBeamSize + 1; ++i) { + states.push_back(CreateState(0.0)); + } + + Beam beam(kMaxBeamSize); + beam.SetFunctions(null_permissions, null_finality, transition_function, + null_oracle); + + // Try to initialize the beam; this should die. 
+ EXPECT_DEATH(beam.Init(std::move(states)), + "Attempted to initialize a beam with more states"); +} + +TEST(BeamTest, ValidBeamIndicesAfterBeamInitialization) { + // Create a standard beam. + constexpr int kMaxBeamSize = 4; + std::vector> states; + for (int i = 0; i < kMaxBeamSize; ++i) { + states.push_back(CreateState(0.0)); + } + + Beam beam(kMaxBeamSize); + beam.SetFunctions(null_permissions, null_finality, transition_function, + null_oracle); + + beam.Init(std::move(states)); + + // Verify that all beam indices have been initialized. + for (int i = 0; i < kMaxBeamSize; ++i) { + EXPECT_EQ(i, beam.beam_state(i)->GetBeamIndex()); + } +} + +TEST(BeamTest, FindPreviousIndexTracesHistory) { + // Create a matrix of transitions. + constexpr int kNumTransitions = 4; + constexpr int kMaxBeamSize = 8; + constexpr int kMatrixSize = kNumTransitions * kMaxBeamSize; + constexpr float matrix[kMatrixSize] = { + 30.0, 20.0, 40.0, 10.0, // State 0 + 31.0, 21.0, 41.0, 11.0, // State 1 + 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, + 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0, 00.0}; + constexpr float kOldScores[] = {5.0, 7.0}; + constexpr int kParentBeamIndices[] = {1138, 42}; + + // Create the beam and transition it. + std::vector> states; + states.push_back(CreateState(kOldScores[0])); + states.push_back(CreateState(kOldScores[1])); + + // Set parent beam indices. + SetParentBeamIndex(states.at(0).get(), kParentBeamIndices[0]); + SetParentBeamIndex(states.at(1).get(), kParentBeamIndices[1]); + + Beam beam(kMaxBeamSize); + beam.SetFunctions(null_permissions, null_finality, transition_function, + null_oracle); + beam.Init(std::move(states)); + beam.AdvanceFromPrediction(matrix, kMatrixSize, kNumTransitions); + + // Validate the new beam. + EXPECT_EQ(beam.beam().size(), 8); + + // Make sure the state has performed the expected transition. 
+ // Note that the transition index is not the index into the matrix, but rather + // the index into the matrix 'row' for that state. + EXPECT_EQ(GetTransition(beam.beam().at(0)), 2); + EXPECT_EQ(GetTransition(beam.beam().at(1)), 2); + EXPECT_EQ(GetTransition(beam.beam().at(2)), 0); + EXPECT_EQ(GetTransition(beam.beam().at(3)), 0); + EXPECT_EQ(GetTransition(beam.beam().at(4)), 1); + EXPECT_EQ(GetTransition(beam.beam().at(5)), 1); + EXPECT_EQ(GetTransition(beam.beam().at(6)), 3); + EXPECT_EQ(GetTransition(beam.beam().at(7)), 3); + + // Make sure the state has had its score updated properly. + EXPECT_EQ(beam.beam().at(0)->GetScore(), kOldScores[1] + matrix[6]); + EXPECT_EQ(beam.beam().at(1)->GetScore(), kOldScores[0] + matrix[2]); + EXPECT_EQ(beam.beam().at(2)->GetScore(), kOldScores[1] + matrix[4]); + EXPECT_EQ(beam.beam().at(3)->GetScore(), kOldScores[0] + matrix[0]); + EXPECT_EQ(beam.beam().at(4)->GetScore(), kOldScores[1] + matrix[5]); + EXPECT_EQ(beam.beam().at(5)->GetScore(), kOldScores[0] + matrix[1]); + EXPECT_EQ(beam.beam().at(6)->GetScore(), kOldScores[1] + matrix[7]); + EXPECT_EQ(beam.beam().at(7)->GetScore(), kOldScores[0] + matrix[3]); + + // Make sure that the beam index field is consistent with the actual beam idx. + for (int i = 0; i < beam.beam().size(); ++i) { + EXPECT_EQ(beam.beam().at(i)->GetBeamIndex(), i); + } + + // Validate the history at this step. 
+ auto history = beam.history(); + EXPECT_EQ(history.at(1).at(0), 1); + EXPECT_EQ(history.at(1).at(1), 0); + EXPECT_EQ(history.at(1).at(2), 1); + EXPECT_EQ(history.at(1).at(3), 0); + EXPECT_EQ(history.at(1).at(4), 1); + EXPECT_EQ(history.at(1).at(5), 0); + EXPECT_EQ(history.at(1).at(6), 1); + EXPECT_EQ(history.at(1).at(7), 0); + + EXPECT_EQ(history.at(0).at(0), kParentBeamIndices[0]); + EXPECT_EQ(history.at(0).at(1), kParentBeamIndices[1]); + EXPECT_EQ(history.at(0).at(2), -1); + EXPECT_EQ(history.at(0).at(3), -1); + EXPECT_EQ(history.at(0).at(4), -1); + EXPECT_EQ(history.at(0).at(5), -1); + EXPECT_EQ(history.at(0).at(6), -1); + EXPECT_EQ(history.at(0).at(7), -1); + + // Make sure that FindPreviousIndex can read through the history from step 1 + // to step 0. + constexpr int kDesiredIndex = 0; + constexpr int kCurrentIndexOne = 4; + EXPECT_EQ(beam.FindPreviousIndex(kCurrentIndexOne, kDesiredIndex), + kParentBeamIndices[1]); + + constexpr int kCurrentIndexTwo = 7; + EXPECT_EQ(beam.FindPreviousIndex(kCurrentIndexTwo, kDesiredIndex), + kParentBeamIndices[0]); +} + +TEST(BeamTest, FindPreviousIndexReturnsInError) { + // Create the beam. This now has only one history state, 0. + std::vector> states; + states.push_back(CreateState(0.0)); + constexpr int kMaxBeamSize = 8; + Beam beam(kMaxBeamSize); + beam.SetFunctions(null_permissions, null_finality, transition_function, + null_oracle); + beam.Init(std::move(states)); + + // If the requested step is greater than the number of steps taken, expect -1. + EXPECT_EQ(beam.FindPreviousIndex(0, 1), -1); + + // If the requested step is less than 0, expect -1. + EXPECT_EQ(beam.FindPreviousIndex(0, -1), -1); + + // If the requested index does not have a state, expect -1. + EXPECT_EQ(beam.FindPreviousIndex(0, 1), -1); + + // If the requested index is less than 0, expect -1. + EXPECT_EQ(beam.FindPreviousIndex(0, -1), -1); + + // If the requested index is larger than the maximum beam size -1, expect -1. 
+ EXPECT_EQ(beam.FindPreviousIndex(0, kMaxBeamSize), -1); +} + +TEST(BeamTest, ResetClearsBeamState) { + // Create the beam + std::vector> states; + states.push_back(CreateState(1.0)); + constexpr int kMaxBeamSize = 8; + Beam beam(kMaxBeamSize); + beam.SetFunctions(null_permissions, null_finality, transition_function, + null_oracle); + beam.Init(std::move(states)); + + // Validate the new beam. + EXPECT_EQ(beam.beam().size(), 1); + + // Reset the beam. + beam.Reset(); + + // Validate the now-reset beam, which should be empty. + EXPECT_EQ(beam.beam().size(), 0); +} + +TEST(BeamTest, ResetClearsBeamHistory) { + // Create the beam + std::vector> states; + states.push_back(CreateState(1.0)); + constexpr int kMaxBeamSize = 8; + Beam beam(kMaxBeamSize); + beam.SetFunctions(null_permissions, null_finality, transition_function, + null_oracle); + beam.Init(std::move(states)); + + // Validate the new beam history. + EXPECT_EQ(beam.history().size(), 1); + + // Reset the beam. + beam.Reset(); + + // Validate the now-reset beam history, which should be empty. + EXPECT_EQ(beam.history().size(), 0); +} + +TEST(BeamTest, SettingMaxSizeResetsBeam) { + // Create the beam + std::vector> states; + states.push_back(CreateState(1.0)); + constexpr int kMaxBeamSize = 8; + Beam beam(kMaxBeamSize); + beam.SetFunctions(null_permissions, null_finality, transition_function, + null_oracle); + beam.Init(std::move(states)); + + // Validate the new beam history. + EXPECT_EQ(beam.history().size(), 1); + + // Reset the beam. + constexpr int kNewMaxBeamSize = 4; + beam.SetMaxSize(kNewMaxBeamSize); + EXPECT_EQ(beam.max_size(), kNewMaxBeamSize); + + // Validate the now-reset beam history, which should be empty. 
+ EXPECT_EQ(beam.history().size(), 0); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/component_registry.cc b/syntaxnet/dragnn/core/component_registry.cc new file mode 100644 index 0000000000000000000000000000000000000000..706ddd4e3df41e4380073ba1b3bce1becb1c99fd --- /dev/null +++ b/syntaxnet/dragnn/core/component_registry.cc @@ -0,0 +1,23 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "dragnn/core/component_registry.h" + +namespace syntaxnet { + +// Class registry for DRAGNN components. +REGISTER_SYNTAXNET_CLASS_REGISTRY("DRAGNN Component", dragnn::Component); + +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/component_registry.h b/syntaxnet/dragnn/core/component_registry.h new file mode 100644 index 0000000000000000000000000000000000000000..7244976253ae24b0412d1b77f291aef22a89c359 --- /dev/null +++ b/syntaxnet/dragnn/core/component_registry.h @@ -0,0 +1,29 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_COMPONENT_REGISTRY_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_COMPONENT_REGISTRY_H_ + +#include "dragnn/core/interfaces/component.h" +#include "syntaxnet/registry.h" + +// Macro to add a component to the registry. This macro associates a class with +// its class name as a string, so FooComponent would be associated with the +// string "FooComponent". +#define REGISTER_DRAGNN_COMPONENT(component) \ + REGISTER_SYNTAXNET_CLASS_COMPONENT(syntaxnet::dragnn::Component, #component, \ + component) + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_COMPONENT_REGISTRY_H_ diff --git a/syntaxnet/dragnn/core/compute_session.h b/syntaxnet/dragnn/core/compute_session.h new file mode 100644 index 0000000000000000000000000000000000000000..74cbc677272c8694dca43a810572b64b0c2d3315 --- /dev/null +++ b/syntaxnet/dragnn/core/compute_session.h @@ -0,0 +1,135 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_COMPUTE_SESSION_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_COMPUTE_SESSION_H_ + +#include + +#include "dragnn/components/util/bulk_feature_extractor.h" +#include "dragnn/core/index_translator.h" +#include "dragnn/core/interfaces/component.h" +#include "dragnn/protos/spec.pb.h" +#include "dragnn/protos/trace.pb.h" + +namespace syntaxnet { +namespace dragnn { + +// This defines the interface for a ComputeSession object. We only ever expect +// ComputeSessionImpl to implement the ComputeSession - this is only used +// to provide a mocking seam. + +class ComputeSession { + public: + virtual ~ComputeSession() {} + + // Initialize this ComputeSession to compute the graph defined in the given + // MasterSpec with the hyperparameters passed in the GridPoint. This should + // only be called once, when the ComputeSession is created. + virtual void Init(const MasterSpec &master_spec, + const GridPoint &hyperparams) = 0; + + // Initialize a component with data and a given maximum beam + // size. Note that attempting to initialize a component that depends on + // another component that has not yet finished will cause a CHECK failure. + virtual void InitializeComponentData(const string &component_name, + int max_beam_size) = 0; + + // Return the batch size for the given component. + virtual int BatchSize(const string &component_name) const = 0; + + // Return the beam size for the given component. + virtual int BeamSize(const string &component_name) const = 0; + + // Returns the spec used to create this ComputeSession. + virtual const ComponentSpec &Spec(const string &component_name) const = 0; + + // For a given component and linked feature channel, get the beam size of the + // component that is the source of the linked features. 
+ virtual int SourceComponentBeamSize(const string &component_name, + int channel_id) = 0; + + // Advance the given component using the component's oracle. + virtual void AdvanceFromOracle(const string &component_name) = 0; + + // Advance the given component using the given score matrix. + virtual void AdvanceFromPrediction(const string &component_name, + const float score_matrix[], + int score_matrix_length) = 0; + + // Get the input features for the given component and channel. This passes + // through to the relevant Component's GetFixedFeatures() call. + virtual int GetInputFeatures( + const string &component_name, + std::function allocate_indices, + std::function allocate_ids, + std::function allocate_weights, + int channel_id) const = 0; + + // Get the input features for the given component and channel, advancing via + // the oracle until the state is final. This passes through to the relevant + // Component's BulkGetFixedFeatures() call. + virtual int BulkGetInputFeatures(const string &component_name, + const BulkFeatureExtractor &extractor) = 0; + + // Get the input features for the given component and channel. This function + // can return empty LinkFeatures protos, which represent unused padding slots + // in the output weight tensor. + virtual std::vector GetTranslatedLinkFeatures( + const string &component_name, int channel_id) = 0; + + // Get the oracle labels for the given component. + virtual std::vector> EmitOracleLabels( + const string &component_name) = 0; + + // Returns true if the given component is terminal. + virtual bool IsTerminal(const string &component_name) = 0; + + // Force the given component to write out its predictions to the backing data. + virtual void FinalizeData(const string &component_name) = 0; + + // Return the finalized predictions from this compute session. + virtual std::vector GetSerializedPredictions() = 0; + + // Returns the trace protos. 
This will CHECK fail or be empty if the + // SetTracing() has not been called to initialize the underlying Component + // traces. + virtual std::vector GetTraceProtos() = 0; + + // Provides the ComputeSession with a batch of data to compute. + virtual void SetInputData(const std::vector &data) = 0; + + // Resets all components owned by this ComputeSession. + virtual void ResetSession() = 0; + + // Set the tracing for this ComputeSession. + virtual void SetTracing(bool tracing_on) = 0; + + // Returns a unique identifier for this ComputeSession. + virtual int Id() const = 0; + + // Returns a string describing the given component. + virtual string GetDescription(const string &component_name) const = 0; + + // Get all the translators for the given component. Should only be used to + // validate correct construction of translators in tests. + virtual const std::vector Translators( + const string &component_name) const = 0; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_COMPUTE_SESSION_H_ diff --git a/syntaxnet/dragnn/core/compute_session_impl.cc b/syntaxnet/dragnn/core/compute_session_impl.cc new file mode 100644 index 0000000000000000000000000000000000000000..e83db32d644ef04fd9b8a8ac38de9b10767fbd31 --- /dev/null +++ b/syntaxnet/dragnn/core/compute_session_impl.cc @@ -0,0 +1,399 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#include "dragnn/core/compute_session_impl.h" + +#include +#include + +#include "dragnn/protos/data.pb.h" +#include "dragnn/protos/spec.pb.h" +#include "dragnn/protos/trace.pb.h" +#include "syntaxnet/registry.h" +#include "tensorflow/core/platform/logging.h" + +namespace syntaxnet { +namespace dragnn { + +ComputeSessionImpl::ComputeSessionImpl( + int id, + std::function(const string &component_name, + const string &backend_type)> + component_builder) + : component_builder_(std::move(component_builder)), id_(id) {} + +void ComputeSessionImpl::Init(const MasterSpec &master_spec, + const GridPoint &hyperparams) { + spec_ = master_spec; + grid_point_ = hyperparams; + + VLOG(2) << "Creating components."; + bool is_input = true; + Component *predecessor; + for (const ComponentSpec &spec : master_spec.component()) { + // Construct the component using the specified backend. + VLOG(2) << "Creating component '" << spec.name() + << "' with backend: " << spec.backend().registered_name(); + auto component = + component_builder_(spec.name(), spec.backend().registered_name()); + + // Initializes the component. + component->InitializeComponent(spec); + + // Adds a predecessor to non-input components. + if (!is_input) { + predecessors_.insert( + std::pair(component.get(), predecessor)); + } + + // The current component will be the predecessor component next time around. + predecessor = component.get(); + + // All components after the first are non-input components. + is_input = false; + + // Move into components list. + components_.insert(std::pair>( + spec.name(), std::move(component))); + } + VLOG(2) << "Done creating components."; + + VLOG(2) << "Adding translators."; + for (const ComponentSpec &spec : master_spec.component()) { + // First, get the component object for this spec. 
+ VLOG(2) << "Examining component: " << spec.name(); + auto map_result = components_.find(spec.name()); + CHECK(map_result != components_.end()) << "Unable to find component."; + Component *start_component = map_result->second.get(); + + if (spec.linked_feature_size() > 0) { + VLOG(2) << "Adding " << spec.linked_feature_size() << " translators for " + << spec.name(); + + // Attach all the translators described in the spec. + std::vector translator_set; + for (const LinkedFeatureChannel &channel : spec.linked_feature()) { + // For every translator, save off a non-unique ptr in the component name + // to translator map, then push the unique ptr onto the management + // vector. + auto translator = CreateTranslator(channel, start_component); + translator_set.push_back(translator.get()); + owned_translators_.push_back(std::move(translator)); + } + + // Once all translators have been created, associate this group of + // translators with a component. + translators_.insert(std::pair>( + spec.name(), std::move(translator_set))); + } else { + VLOG(2) << "No translators found for " << spec.name(); + } + } + VLOG(2) << "Done adding translators."; + + VLOG(2) << "Initialization complete."; +} + +void ComputeSessionImpl::InitializeComponentData(const string &component_name, + int max_beam_size) { + CHECK(input_data_ != nullptr) << "Attempted to access a component without " + "providing input data for this session."; + Component *component = GetComponent(component_name); + + // Try and find the source component. If one exists, check that it is terminal + // and get its data; if not, pass in an empty vector for source data. + auto source_result = predecessors_.find(component); + if (source_result == predecessors_.end()) { + VLOG(1) << "Source result not found. Using empty initialization vector for " + << component_name; + component->InitializeData({}, max_beam_size, input_data_.get()); + } else { + VLOG(1) << "Source result found. 
Using prior initialization vector for " + << component_name; + auto source = source_result->second; + CHECK(source->IsTerminal()) << "Source is not terminal for component '" + << component_name << "'. Exiting."; + component->InitializeData(source->GetBeam(), max_beam_size, + input_data_.get()); + } + if (do_tracing_) { + component->InitializeTracing(); + } +} + +int ComputeSessionImpl::BatchSize(const string &component_name) const { + return GetReadiedComponent(component_name)->BatchSize(); +} + +int ComputeSessionImpl::BeamSize(const string &component_name) const { + return GetReadiedComponent(component_name)->BeamSize(); +} + +const ComponentSpec &ComputeSessionImpl::Spec( + const string &component_name) const { + for (const auto &component : spec_.component()) { + if (component.name() == component_name) { + return component; + } + } + LOG(FATAL) << "Missing component '" << component_name << "'. Exiting."; +} + +int ComputeSessionImpl::SourceComponentBeamSize(const string &component_name, + int channel_id) { + const auto &translators = GetTranslators(component_name); + return translators.at(channel_id)->path().back()->BeamSize(); +} + +void ComputeSessionImpl::AdvanceFromOracle(const string &component_name) { + GetReadiedComponent(component_name)->AdvanceFromOracle(); +} + +void ComputeSessionImpl::AdvanceFromPrediction(const string &component_name, + const float score_matrix[], + int score_matrix_length) { + GetReadiedComponent(component_name) + ->AdvanceFromPrediction(score_matrix, score_matrix_length); +} + +int ComputeSessionImpl::GetInputFeatures( + const string &component_name, std::function allocate_indices, + std::function allocate_ids, + std::function allocate_weights, int channel_id) const { + return GetReadiedComponent(component_name) + ->GetFixedFeatures(allocate_indices, allocate_ids, allocate_weights, + channel_id); +} + +int ComputeSessionImpl::BulkGetInputFeatures( + const string &component_name, const BulkFeatureExtractor &extractor) { + return 
GetReadiedComponent(component_name)->BulkGetFixedFeatures(extractor); +} + +std::vector ComputeSessionImpl::GetTranslatedLinkFeatures( + const string &component_name, int channel_id) { + auto *component = GetReadiedComponent(component_name); + auto features = component->GetRawLinkFeatures(channel_id); + + IndexTranslator *translator = GetTranslators(component_name).at(channel_id); + for (int i = 0; i < features.size(); ++i) { + LinkFeatures &feature = features[i]; + if (feature.has_feature_value()) { + VLOG(2) << "Raw feature[" << i << "]: " << feature.ShortDebugString(); + IndexTranslator::Index index = translator->Translate( + feature.batch_idx(), feature.beam_idx(), feature.feature_value()); + feature.set_step_idx(index.step_index); + feature.set_batch_idx(index.batch_index); + feature.set_beam_idx(index.beam_index); + } else { + VLOG(2) << "Raw feature[" << i << "]: PADDING (empty proto)"; + } + } + + // Add the translated link features to the component's trace. + if (do_tracing_) { + component->AddTranslatedLinkFeaturesToTrace(features, channel_id); + } + + return features; +} +std::vector> ComputeSessionImpl::EmitOracleLabels( + const string &component_name) { + return GetReadiedComponent(component_name)->GetOracleLabels(); +} + +bool ComputeSessionImpl::IsTerminal(const string &component_name) { + return GetReadiedComponent(component_name)->IsTerminal(); +} + +void ComputeSessionImpl::SetTracing(bool tracing_on) { + do_tracing_ = tracing_on; + for (auto &component_pair : components_) { + if (!tracing_on) { + component_pair.second->DisableTracing(); + } + } +} + +void ComputeSessionImpl::FinalizeData(const string &component_name) { + VLOG(2) << "Finalizing data for " << component_name; + GetReadiedComponent(component_name)->FinalizeData(); +} + +std::vector ComputeSessionImpl::GetSerializedPredictions() { + VLOG(2) << "Getting serialized predictions."; + return input_data_->SerializedData(); +} + +std::vector ComputeSessionImpl::GetTraceProtos() { + 
std::vector traces; + + // First compute all possible traces for each component. + std::map>> component_traces; + std::vector pipeline; + for (auto &component_spec : spec_.component()) { + pipeline.push_back(component_spec.name()); + component_traces.insert( + {component_spec.name(), + GetComponent(component_spec.name())->GetTraceProtos()}); + } + + // Only output for the actual number of states in each beam. + auto final_beam = GetComponent(pipeline.back())->GetBeam(); + for (int batch_idx = 0; batch_idx < final_beam.size(); ++batch_idx) { + for (int beam_idx = 0; beam_idx < final_beam[batch_idx].size(); + ++beam_idx) { + std::vector beam_path; + beam_path.push_back(beam_idx); + + // Trace components backwards, finding the source of each state in the + // prior component. + VLOG(2) << "Start trace: " << beam_idx; + for (int i = pipeline.size() - 1; i > 0; --i) { + const auto *component = GetComponent(pipeline[i]); + int source_beam_idx = + component->GetSourceBeamIndex(beam_path.back(), batch_idx); + beam_path.push_back(source_beam_idx); + + VLOG(2) << "Tracing path: " << pipeline[i] << " = " << source_beam_idx; + } + + // Trace the path from the *start* to the end. + std::reverse(beam_path.begin(), beam_path.end()); + MasterTrace master_trace; + for (int i = 0; i < pipeline.size(); ++i) { + *master_trace.add_component_trace() = + component_traces[pipeline[i]][batch_idx][beam_path[i]]; + } + traces.push_back(master_trace); + } + } + + return traces; +} + +void ComputeSessionImpl::SetInputData(const std::vector &data) { + input_data_.reset(new InputBatchCache(data)); +} + +void ComputeSessionImpl::ResetSession() { + // Reset all component states. + for (auto &component_pair : components_) { + component_pair.second->ResetComponent(); + } + + // Reset the input data pointer. 
+ input_data_.reset(); +} + +int ComputeSessionImpl::Id() const { return id_; } + +string ComputeSessionImpl::GetDescription(const string &component_name) const { + return GetComponent(component_name)->Name(); +} + +const std::vector ComputeSessionImpl::Translators( + const string &component_name) const { + auto translators = GetTranslators(component_name); + std::vector const_translators; + for (const auto &translator : translators) { + const_translators.push_back(translator); + } + return const_translators; +} + +Component *ComputeSessionImpl::GetReadiedComponent( + const string &component_name) const { + auto component = GetComponent(component_name); + CHECK(component->IsReady()) + << "Attempted to access component " << component_name + << " without first initializing it."; + return component; +} + +Component *ComputeSessionImpl::GetComponent( + const string &component_name) const { + auto result = components_.find(component_name); + if (result == components_.end()) { + LOG(ERROR) << "Could not find component \"" << component_name + << "\" in the component set. Current components are: "; + for (const auto &component_pair : components_) { + LOG(ERROR) << component_pair.first; + } + LOG(FATAL) << "Missing component. Exiting."; + } + + auto component = result->second.get(); + return component; +} + +const std::vector &ComputeSessionImpl::GetTranslators( + const string &component_name) const { + auto result = translators_.find(component_name); + if (result == translators_.end()) { + LOG(ERROR) << "Could not find component " << component_name + << " in the translator set. Current components are: "; + for (const auto &component_pair : translators_) { + LOG(ERROR) << component_pair.first; + } + LOG(FATAL) << "Missing component. 
Exiting."; + } + return result->second; +} + +std::unique_ptr ComputeSessionImpl::CreateTranslator( + const LinkedFeatureChannel &channel, Component *start_component) { + const int num_components = spec_.component_size(); + VLOG(2) << "Channel spec: " << channel.ShortDebugString(); + + // Find the linked feature's source component, if it exists. + auto source_map_result = components_.find(channel.source_component()); + CHECK(source_map_result != components_.end()) + << "Unable to find source component " << channel.source_component(); + const Component *end_component = source_map_result->second.get(); + + // Our goal here is to iterate up the source map from the + // start_component to the end_component. + Component *current_component = start_component; + std::vector path; + path.push_back(current_component); + while (current_component != end_component) { + // Try to find the next link upwards in the source chain. + auto source_result = predecessors_.find(current_component); + + // If this component doesn't have a source to find, that's an error. + CHECK(source_result != predecessors_.end()) + << "No link to source " << channel.source_component(); + + // If we jump more times than there are components in the graph, that + // is an error state. + CHECK_LT(path.size(), num_components) << "Too many jumps. Is there a " + "loop in the MasterSpec " + "component definition?"; + + // Add the source to the vector and repeat. + path.push_back(source_result->second); + current_component = source_result->second; + } + + // At this point, we have the source chain for the translator and can + // build it. 
+ std::unique_ptr translator( + new IndexTranslator(path, channel.source_translator())); + return translator; +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/compute_session_impl.h b/syntaxnet/dragnn/core/compute_session_impl.h new file mode 100644 index 0000000000000000000000000000000000000000..3e219d4e034956a90f37fb152d215ffbe9073d14 --- /dev/null +++ b/syntaxnet/dragnn/core/compute_session_impl.h @@ -0,0 +1,157 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_COMPUTE_SESSION_IMPL_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_COMPUTE_SESSION_IMPL_H_ + +#include + +#include "dragnn/components/util/bulk_feature_extractor.h" +#include "dragnn/core/compute_session.h" +#include "dragnn/core/index_translator.h" +#include "dragnn/core/input_batch_cache.h" +#include "dragnn/protos/data.pb.h" +#include "dragnn/protos/spec.pb.h" +#include "dragnn/protos/trace.pb.h" + +namespace syntaxnet { +namespace dragnn { + +class ComputeSessionImpl : public ComputeSession { + public: + // Creates a ComputeSessionImpl with the provided component builder function. 
+ ComputeSessionImpl( + int id, + std::function(const string &component_name, + const string &backend_type)> + component_builder); + + void Init(const MasterSpec &master_spec, + const GridPoint &hyperparams) override; + + void InitializeComponentData(const string &component_name, + int max_beam_size) override; + + int BatchSize(const string &component_name) const override; + + int BeamSize(const string &component_name) const override; + + const ComponentSpec &Spec(const string &component_name) const override; + + int SourceComponentBeamSize(const string &component_name, + int channel_id) override; + + void AdvanceFromOracle(const string &component_name) override; + + void AdvanceFromPrediction(const string &component_name, + const float score_matrix[], + int score_matrix_length) override; + + int GetInputFeatures(const string &component_name, + std::function allocate_indices, + std::function allocate_ids, + std::function allocate_weights, + int channel_id) const override; + + int BulkGetInputFeatures(const string &component_name, + const BulkFeatureExtractor &extractor) override; + + std::vector GetTranslatedLinkFeatures( + const string &component_name, int channel_id) override; + + std::vector> EmitOracleLabels( + const string &component_name) override; + + bool IsTerminal(const string &component_name) override; + + void FinalizeData(const string &component_name) override; + + std::vector GetSerializedPredictions() override; + + std::vector GetTraceProtos() override; + + void SetInputData(const std::vector &data) override; + + void ResetSession() override; + + void SetTracing(bool tracing_on) override; + + int Id() const override; + + string GetDescription(const string &component_name) const override; + + const std::vector Translators( + const string &component_name) const override; + + private: + // Get a given component. Fails if the component is not found. + Component *GetComponent(const string &component_name) const; + + // Get a given component. 
CHECK-fail if the component's IsReady method + // returns false. + Component *GetReadiedComponent(const string &component_name) const; + + // Get the index translators for the given component. + const std::vector &GetTranslators( + const string &component_name) const; + + // Create an index translator. + std::unique_ptr CreateTranslator( + const LinkedFeatureChannel &channel, Component *start_component); + + // Perform initialization on the given Component. + void InitComponent(Component *component); + + // Holds all of the components owned by this ComputeSession, associated with + // their names in the MasterSpec. + std::map> components_; + + // Holds a vector of translators for each component, indexed by the name + // of the component they belong to. + std::map> translators_; + + // Holds ownership of all the IndexTranslators for this compute session. + std::vector> owned_translators_; + + // The predecessor component for every component. + // If a component is not in this map, it has no predecessor component and + // will have its beam initialized without any data from other components. + std::map predecessors_; + + // Holds the current input data for this ComputeSession. + std::unique_ptr input_data_; + + // Function that, given a string, will return a Component. + std::function(const string &component_name, + const string &backend_type)> + component_builder_; + + // The master spec for this compute session. + MasterSpec spec_; + + // The hyperparameters for this compute session. + GridPoint grid_point_; + + // Unique identifier, assigned at construction. + int id_; + + // Whether or not to perform tracing. 
+ bool do_tracing_ = false; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_COMPUTE_SESSION_IMPL_H_ diff --git a/syntaxnet/dragnn/core/compute_session_impl_test.cc b/syntaxnet/dragnn/core/compute_session_impl_test.cc new file mode 100644 index 0000000000000000000000000000000000000000..74bef1f5a6b27a95ff54364f90a9d601de996a36 --- /dev/null +++ b/syntaxnet/dragnn/core/compute_session_impl_test.cc @@ -0,0 +1,1172 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "dragnn/core/compute_session_impl.h" + +#include +#include + +#include "dragnn/components/util/bulk_feature_extractor.h" +#include "dragnn/core/component_registry.h" +#include "dragnn/core/compute_session.h" +#include "dragnn/core/compute_session_pool.h" +#include "dragnn/core/interfaces/component.h" +#include "dragnn/core/test/generic.h" +#include "dragnn/core/test/mock_component.h" +#include "dragnn/core/test/mock_transition_state.h" +#include "tensorflow/core/platform/test.h" + +namespace syntaxnet { +namespace dragnn { + +using syntaxnet::test::EqualsProto; +using testing::_; +using testing::ElementsAre; +using testing::Return; +using testing::NotNull; + +// ***************************************************************************** +// Test-internal class definitions. 
+// ***************************************************************************** + +// Define a test component to validate registered construction. +class TestComponentType1 : public Component { + public: + TestComponentType1() {} + void InitializeComponent(const ComponentSpec &spec) override { + name_ = spec.name(); + } + void InitializeData( + const std::vector> &states, + int max_beam_size, InputBatchCache *input_data) override {} + void InitializeTracing() override {} + void DisableTracing() override {} + bool IsReady() const override { return true; } + string Name() const override { return name_; } + int BeamSize() const override { return 3; } + int BatchSize() const override { return 1; } + int StepsTaken(int batch_index) const override { return 0; } + int GetBeamIndexAtStep(int step, int current_index, + int batch) const override { + return 0; + } + int GetSourceBeamIndex(int current_index, int batch) const override { + return 0; + } + void AdvanceFromPrediction(const float transition_matrix[], + int matrix_length) override {} + void AdvanceFromOracle() override {} + bool IsTerminal() const override { return true; } + std::function GetStepLookupFunction( + const string &method) override { + return nullptr; + } + std::vector> GetBeam() override { + std::vector> states; + return states; + } + int GetFixedFeatures(std::function allocate_indices, + std::function allocate_ids, + std::function allocate_weights, + int channel_id) const override { + return 0; + } + int BulkGetFixedFeatures(const BulkFeatureExtractor &extractor) override { + return 0; + } + std::vector GetRawLinkFeatures(int channel_id) const override { + std::vector ret; + return ret; + } + std::vector> GetOracleLabels() const override { + std::vector> ret; + return ret; + } + void FinalizeData() override {} + void ResetComponent() override {} + + std::vector> GetTraceProtos() const override { + std::vector> ret; + return ret; + } + void AddTranslatedLinkFeaturesToTrace( + const std::vector 
&features, int channel_id) override {} + + string name_; +}; + +REGISTER_DRAGNN_COMPONENT(TestComponentType1); + +// Define a second test component to validate registered construction. +class TestComponentType2 : public Component { + public: + TestComponentType2() {} + void InitializeComponent(const ComponentSpec &spec) override { + name_ = spec.name(); + } + void InitializeData( + const std::vector> &states, + int max_beam_size, InputBatchCache *input_data) override {} + void InitializeTracing() override {} + void DisableTracing() override {} + bool IsReady() const override { return true; } + string Name() const override { return name_; } + int BeamSize() const override { return 4; } + int BatchSize() const override { return 2; } + int StepsTaken(int batch_index) const override { return 0; } + int GetBeamIndexAtStep(int step, int current_index, + int batch) const override { + return 0; + } + int GetSourceBeamIndex(int current_index, int batch) const override { + return 0; + } + void AdvanceFromPrediction(const float transition_matrix[], + int matrix_length) override {} + void AdvanceFromOracle() override {} + bool IsTerminal() const override { return true; } + std::function GetStepLookupFunction( + const string &method) override { + return nullptr; + } + std::vector> GetBeam() override { + std::vector> states; + return states; + } + int GetFixedFeatures(std::function allocate_indices, + std::function allocate_ids, + std::function allocate_weights, + int channel_id) const override { + return 0; + } + int BulkGetFixedFeatures(const BulkFeatureExtractor &extractor) override { + return 0; + } + std::vector GetRawLinkFeatures(int channel_id) const override { + std::vector ret; + return ret; + } + std::vector> GetOracleLabels() const override { + std::vector> ret; + return ret; + } + void FinalizeData() override {} + void ResetComponent() override {} + + std::vector> GetTraceProtos() const override { + std::vector> ret; + return ret; + } + void 
AddTranslatedLinkFeaturesToTrace( + const std::vector &features, int channel_id) override {} + + string name_; +}; + +REGISTER_DRAGNN_COMPONENT(TestComponentType2); + +// Define a component that returns false for IsReady and IsTerminal. +class UnreadyComponent : public Component { + public: + UnreadyComponent() {} + void InitializeComponent(const ComponentSpec &spec) override { + name_ = spec.name(); + } + void InitializeData( + const std::vector> &states, + int max_beam_size, InputBatchCache *input_data) override {} + void InitializeTracing() override {} + void DisableTracing() override {} + bool IsReady() const override { return false; } + string Name() const override { return name_; } + int BeamSize() const override { return 1; } + int BatchSize() const override { return 2; } + int StepsTaken(int batch_index) const override { return 0; } + int GetBeamIndexAtStep(int step, int current_index, + int batch) const override { + return 0; + } + int GetSourceBeamIndex(int current_index, int batch) const override { + return 0; + } + void AdvanceFromPrediction(const float transition_matrix[], + int matrix_length) override {} + void AdvanceFromOracle() override {} + bool IsTerminal() const override { return false; } + std::function GetStepLookupFunction( + const string &method) override { + return nullptr; + } + std::vector> GetBeam() override { + std::vector> states; + return states; + } + int GetFixedFeatures(std::function allocate_indices, + std::function allocate_ids, + std::function allocate_weights, + int channel_id) const override { + return 0; + } + int BulkGetFixedFeatures(const BulkFeatureExtractor &extractor) override { + return 0; + } + std::vector GetRawLinkFeatures(int channel_id) const override { + std::vector ret; + return ret; + } + std::vector> GetOracleLabels() const override { + std::vector> ret; + return ret; + } + void FinalizeData() override {} + void ResetComponent() override {} + std::vector> GetTraceProtos() const override { + std::vector> ret; + 
return ret; + } + void AddTranslatedLinkFeaturesToTrace( + const std::vector &features, int channel_id) override {} + string name_; +}; + +REGISTER_DRAGNN_COMPONENT(UnreadyComponent); + +class ComputeSessionImplTestPoolAccessor { + public: + static void SetComponentBuilder( + ComputeSessionPool *pool, + std::function(const string &component_name, + const string &backend_type)> + component_builder_function) { + pool->SetComponentBuilder(std::move(component_builder_function)); + } +}; + +// ***************************************************************************** +// Tests begin here. +// ***************************************************************************** + +// Helper function to validate a translation path against a vector of expected +// component name strings. +void ValidatePath(const std::vector &expected_path, + const std::vector &path) { + EXPECT_EQ(expected_path.size(), path.size()); + for (int i = 0; i < expected_path.size(); ++i) { + EXPECT_EQ(expected_path.at(i), path.at(i)->Name()); + } +} + +void AddComponentToSpec(const string &component_name, + const string &backend_name, MasterSpec *spec) { + auto component_spec = spec->add_component(); + component_spec->set_name(component_name); + auto backend = component_spec->mutable_backend(); + backend->set_registered_name(backend_name); +} + +void AddTranslatorToSpec(const string &source_name, const string &dest_name, + const string &type, MasterSpec *spec) { + // Find the destination component. + ComponentSpec *dest_spec = nullptr; + for (int i = 0; i < spec->component_size(); ++i) { + if (spec->component(i).name() == dest_name) { + dest_spec = spec->mutable_component(i); + break; + } + } + + // Make sure it's not null... + EXPECT_NE(dest_spec, nullptr); + + // Set up the translator. 
+ auto linked_feature = dest_spec->add_linked_feature(); + linked_feature->set_source_component(source_name); + linked_feature->set_source_translator(type); +} + +TEST(ComputeSessionImplTest, CreatesComponent) { + // Define a spec that creates an instance of TestComponentType1. + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType1", &spec); + + // Create a pool so we can get a session. + ComputeSessionPool pool(spec, hyperparams); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + // Make sure that the component exists and is of type TestComponentType1. + string Type1ComponentDesc = "component_one"; + constexpr int kType1BatchSize = 1; + EXPECT_EQ(Type1ComponentDesc, session->GetDescription("component_one")); + EXPECT_EQ(kType1BatchSize, session->BatchSize("component_one")); +} + +TEST(ComputeSessionImplTest, ReturnsComponentSpec) { + // Define a spec that creates an instance of TestComponentType1 and + // TestComponentType2. + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType1", &spec); + AddComponentToSpec("component_two", "TestComponentType2", &spec); + + // Create a pool so we can get a session. + ComputeSessionPool pool(spec, hyperparams); + + auto session = pool.GetSession(); + EXPECT_EQ(spec.component(1).DebugString(), + session->Spec("component_two").DebugString()); +} + +TEST(ComputeSessionImplTest, CreatesMultipleComponents) { + // Define a spec that creates an instance of TestComponentType1 and + // TestComponentType2. + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType1", &spec); + AddComponentToSpec("component_two", "TestComponentType2", &spec); + + // Create a pool so we can get a session. 
+ ComputeSessionPool pool(spec, hyperparams); + + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + // Make sure that the components exist and are the correct type. + string Type1ComponentDesc = "component_one"; + constexpr int kType1BatchSize = 1; + EXPECT_EQ(Type1ComponentDesc, session->GetDescription("component_one")); + EXPECT_EQ(kType1BatchSize, session->BatchSize("component_one")); + + string Type2ComponentDesc = "component_two"; + constexpr int kType2BatchSize = 2; + EXPECT_EQ(Type2ComponentDesc, session->GetDescription("component_two")); + EXPECT_EQ(kType2BatchSize, session->BatchSize("component_two")); +} + +TEST(ComputeSessionImplTest, InitializesComponents) { + // Define a spec that creates an instance of TestComponentType1 and + // TestComponentType2. + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType1", &spec); + AddComponentToSpec("component_two", "TestComponentType2", &spec); + + // Create a map to hold references to mock components. Expect the correct + // initialization call (with the appropriate proto passed in). + std::map mock_components; + auto builder_function = [&mock_components, spec](const string &name, + const string &backend_type) { + VLOG(2) << "Mocking for name: " << name; + std::unique_ptr component(new MockComponent()); + if (name == "component_one") { + EXPECT_CALL(*component, + InitializeComponent(EqualsProto(spec.component(0)))); + } else { + EXPECT_CALL(*component, + InitializeComponent(EqualsProto(spec.component(1)))); + } + mock_components[name] = component.get(); + return component; + }; + + // Create a pool so we can get a session. 
+ ComputeSessionPool pool(spec, hyperparams); + ComputeSessionImplTestPoolAccessor::SetComponentBuilder(&pool, + builder_function); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); +} + +TEST(ComputeSessionImplTest, CreatesTranslator) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType1", &spec); + AddComponentToSpec("component_two", "TestComponentType2", &spec); + + // Add a translator from component 1 to component 2. + AddTranslatorToSpec("component_one", "component_two", "identity", &spec); + + // Create a pool so we can get a session. + ComputeSessionPool pool(spec, hyperparams); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + auto linked_features = session->Translators("component_two"); + EXPECT_EQ(1, linked_features.size()); + ValidatePath({"component_two", "component_one"}, + linked_features.at(0)->path()); + EXPECT_EQ(linked_features.at(0)->method(), "identity"); +} + +TEST(ComputeSessionImplTest, CreatesTranslatorWithLongWalk) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType2", &spec); + AddComponentToSpec("component_two", "TestComponentType1", &spec); + AddComponentToSpec("component_three", "TestComponentType2", &spec); + + // Add a translator from component 3 to component 1. + AddTranslatorToSpec("component_one", "component_three", "identity", &spec); + + // Create a pool so we can get a session. + ComputeSessionPool pool(spec, hyperparams); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + // Get and validate the linked feature vector for component 3. 
+ auto linked_features = session->Translators("component_three"); + EXPECT_EQ(1, linked_features.size()); + ValidatePath({"component_three", "component_two", "component_one"}, + linked_features.at(0)->path()); + EXPECT_EQ(linked_features.at(0)->method(), "identity"); +} + +TEST(ComputeSessionImplTest, CreatesTranslatorForMultipleComponents) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType2", &spec); + AddComponentToSpec("component_two", "TestComponentType1", &spec); + AddComponentToSpec("component_three", "TestComponentType2", &spec); + + // Add a translator from component 3 to component 1. + AddTranslatorToSpec("component_one", "component_three", "identity", &spec); + + // Add a translator from component 3 to component 2. + AddTranslatorToSpec("component_two", "component_three", "history", &spec); + + // Add a translator from component 2 to component 1. + AddTranslatorToSpec("component_one", "component_two", "history", &spec); + + // Create a pool so we can get a session. + ComputeSessionPool pool(spec, hyperparams); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + // Get and validate the linked feature vector for component 3. + auto linked_features = session->Translators("component_three"); + EXPECT_EQ(2, linked_features.size()); + ValidatePath({"component_three", "component_two", "component_one"}, + linked_features.at(0)->path()); + EXPECT_EQ(linked_features.at(0)->method(), "identity"); + ValidatePath({"component_three", "component_two"}, + linked_features.at(1)->path()); + EXPECT_EQ(linked_features.at(1)->method(), "history"); + + // Get and validate the linked feature vector for component 2. 
+ auto linked_features_2 = session->Translators("component_two"); + EXPECT_EQ(1, linked_features_2.size()); + ValidatePath({"component_two", "component_one"}, + linked_features_2.at(0)->path()); + EXPECT_EQ(linked_features_2.at(0)->method(), "history"); +} + +TEST(ComputeSessionImplTest, CreatesMultipleTranslatorsBetweenSameComponents) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType2", &spec); + AddComponentToSpec("component_two", "TestComponentType1", &spec); + + // Add a translator from component 2 to component 1. + AddTranslatorToSpec("component_one", "component_two", "identity", &spec); + + // Add a translator from component 2 to component 1. + AddTranslatorToSpec("component_one", "component_two", "history", &spec); + + // Create a pool so we can get a session. + ComputeSessionPool pool(spec, hyperparams); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + // Get and validate the linked feature vector for component 2. + auto linked_features = session->Translators("component_two"); + EXPECT_EQ(2, linked_features.size()); + ValidatePath({"component_two", "component_one"}, + linked_features.at(0)->path()); + EXPECT_EQ(linked_features.at(0)->method(), "identity"); + ValidatePath({"component_two", "component_one"}, + linked_features.at(1)->path()); + EXPECT_EQ(linked_features.at(1)->method(), "history"); +} + +TEST(ComputeSessionImplTest, CreatesSelfReferentialTranslator) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType2", &spec); + AddComponentToSpec("component_two", "TestComponentType1", &spec); + + // Add a translator from component 1 to component 1. + AddTranslatorToSpec("component_one", "component_one", "identity", &spec); + + // Create a pool so we can get a session. 
+ ComputeSessionPool pool(spec, hyperparams); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + // Get and validate the linked feature vector for component 1. + auto linked_features = session->Translators("component_one"); + EXPECT_EQ(1, linked_features.size()); + ValidatePath({"component_one"}, linked_features.at(0)->path()); +} + +TEST(ComputeSessionImplTest, CreateTranslatorFailsWithWrongNameDeathTest) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType2", &spec); + AddComponentToSpec("component_two", "TestComponentType1", &spec); + + // Add a translator from a nonexistent component to component 1. + AddTranslatorToSpec("NONEXISTENT_COMPONENT_THIS_WILL_DIE", "component_one", + "identity", &spec); + + // Create a pool so we can get a session. + ComputeSessionPool pool(spec, hyperparams); + + EXPECT_DEATH(pool.GetSession(), "Unable to find source component"); +} + +TEST(ComputeSessionImplTest, GetsSourceComponentBeamSize) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType1", &spec); + AddComponentToSpec("component_two", "TestComponentType2", &spec); + + // Add a translator from component 1 to component 2. + AddTranslatorToSpec("component_one", "component_two", "identity", &spec); + + // Create a pool so we can get a session. + ComputeSessionPool pool(spec, hyperparams); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + constexpr int kChannelId = 0; + constexpr int kType1BeamSize = 3; + EXPECT_EQ(kType1BeamSize, + session->SourceComponentBeamSize("component_two", kChannelId)); +} + +TEST(ComputeSessionImplTest, GetsTranslatedLinkFeatures) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType1", &spec); + AddComponentToSpec("component_two", "TestComponentType2", &spec); + + // Add a translator from component 1 to component 2. 
+ AddTranslatorToSpec("component_one", "component_two", "identity", &spec); + + // Create a map to hold references to mock components. + std::map mock_components; + auto builder_function = [&mock_components, spec](const string &name, + const string &backend_type) { + VLOG(2) << "Mocking for name: " << name; + std::unique_ptr component(new MockComponent()); + EXPECT_CALL(*component, InitializeComponent(_)); + EXPECT_CALL(*component, IsReady()).WillRepeatedly(Return(true)); + mock_components[name] = component.get(); + return component; + }; + + // Create a pool, substituting a mock component builder. + ComputeSessionPool pool(spec, hyperparams); + ComputeSessionImplTestPoolAccessor::SetComponentBuilder(&pool, + builder_function); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + // Create a link features vector to return from the destination component. + std::vector features; + LinkFeatures feature_one; + feature_one.set_batch_idx(12); + feature_one.set_beam_idx(23); + feature_one.set_feature_value(34); + features.push_back(feature_one); + LinkFeatures feature_two; + feature_two.set_batch_idx(45); + feature_two.set_beam_idx(56); + feature_two.set_feature_value(67); + features.push_back(feature_two); + + // This link feature should remain empty. + LinkFeatures padding_feature; + features.push_back(padding_feature); + + // The session should request the raw link features for the specified channel. + constexpr int kChannelId = 0; + EXPECT_CALL(*mock_components["component_two"], GetRawLinkFeatures(kChannelId)) + .WillOnce(Return(features)); + + // The session will request the source beam index for both features. 
+ constexpr int kSourceBeamOneIndex = 7; + EXPECT_CALL( + *mock_components["component_two"], + GetSourceBeamIndex(feature_one.beam_idx(), feature_one.batch_idx())) + .WillOnce(Return(kSourceBeamOneIndex)); + constexpr int kSourceBeamTwoIndex = 77; + EXPECT_CALL( + *mock_components["component_two"], + GetSourceBeamIndex(feature_two.beam_idx(), feature_two.batch_idx())) + .WillOnce(Return(kSourceBeamTwoIndex)); + + // The translate call should use the 'identity' translator on the step index. + // This means that the GetBeamIndexAtStep call will have the values from + // the linked feature proto (since we also don't have an intermediate + // component.) + constexpr int kFeatureOneBeamIndex = 9; + EXPECT_CALL(*mock_components["component_one"], + GetBeamIndexAtStep(feature_one.feature_value(), + kSourceBeamOneIndex, feature_one.batch_idx())) + .WillOnce(Return(kFeatureOneBeamIndex)); + + constexpr int kFeatureTwoBeamIndex = 99; + EXPECT_CALL(*mock_components["component_one"], + GetBeamIndexAtStep(feature_two.feature_value(), + kSourceBeamTwoIndex, feature_two.batch_idx())) + .WillOnce(Return(kFeatureTwoBeamIndex)); + + auto translated_features = + session->GetTranslatedLinkFeatures("component_two", kChannelId); + + auto translated_one = translated_features.at(0); + EXPECT_EQ(translated_one.batch_idx(), feature_one.batch_idx()); + EXPECT_EQ(translated_one.beam_idx(), kFeatureOneBeamIndex); + EXPECT_EQ(translated_one.step_idx(), feature_one.feature_value()); + + auto translated_two = translated_features.at(1); + EXPECT_EQ(translated_two.batch_idx(), feature_two.batch_idx()); + EXPECT_EQ(translated_two.beam_idx(), kFeatureTwoBeamIndex); + EXPECT_EQ(translated_two.step_idx(), feature_two.feature_value()); + + // The third feature is a padding feature, and so should be empty. 
+ auto translated_three = translated_features.at(2); + EXPECT_FALSE(translated_three.has_batch_idx()); + EXPECT_FALSE(translated_three.has_beam_idx()); + EXPECT_FALSE(translated_three.has_step_idx()); + EXPECT_FALSE(translated_three.has_feature_value()); +} + +TEST(ComputeSessionImplTest, InitializesComponentDataWithNoSource) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType1", &spec); + + // Create a map to hold references to mock components. + std::map mock_components; + auto builder_function = [&mock_components, spec](const string &name, + const string &backend_type) { + VLOG(2) << "Mocking for name: " << name; + std::unique_ptr component(new MockComponent()); + EXPECT_CALL(*component, InitializeComponent(_)); + mock_components[name] = component.get(); + return component; + }; + + // Create a pool, substituting a mock component builder. + ComputeSessionPool pool(spec, hyperparams); + ComputeSessionImplTestPoolAccessor::SetComponentBuilder(&pool, + builder_function); + auto session = pool.GetSession(); + + // Set expectations and get a session, then get the component. + // The initialization should be called with an empty state vector, but with + // a non-null input batch cache pointer. + constexpr int kMaxBeamSize = 11; + EXPECT_CALL(*(mock_components["component_one"]), + InitializeData(testing::IsEmpty(), kMaxBeamSize, NotNull())); + session->SetInputData({"arbitrary_data"}); + session->InitializeComponentData("component_one", kMaxBeamSize); +} + +TEST(ComputeSessionImplTest, InitializesComponentWithSource) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType2", &spec); + AddComponentToSpec("component_two", "TestComponentType1", &spec); + + // Create a map to hold references to mock components. 
+ std::map mock_components; + auto builder_function = [&mock_components, spec](const string &name, + const string &backend_type) { + VLOG(2) << "Mocking for name: " << name; + std::unique_ptr component(new MockComponent()); + EXPECT_CALL(*component, InitializeComponent(_)); + mock_components[name] = component.get(); + return component; + }; + + // Create a pool, substituting a mock component builder. + ComputeSessionPool pool(spec, hyperparams); + ComputeSessionImplTestPoolAccessor::SetComponentBuilder(&pool, + builder_function); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + // Set expectations. + constexpr int kMaxBeamSize = 11; + MockTransitionState mock_transition_state; + std::vector> beam( + {{&mock_transition_state}}); + + // Expect that the first component will report that it is terminal and return + // a beam. + EXPECT_CALL(*mock_components["component_one"], IsTerminal()) + .WillOnce(Return(true)); + EXPECT_CALL(*mock_components["component_one"], GetBeam()) + .WillOnce(Return(beam)); + + // Expect that the second component will receive that beam. + EXPECT_CALL(*mock_components["component_two"], + InitializeData(beam, kMaxBeamSize, NotNull())); + + // Attempt to initialize the component. 
+ session->InitializeComponentData("component_two", kMaxBeamSize); +} + +TEST(ComputeSessionImplTest, + InitializeDataFailsWhenInputDataNotProvidedDeathTest) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType2", &spec); + + ComputeSessionPool pool(spec, hyperparams); + auto session = pool.GetSession(); + + constexpr int kMaxBeamSize = 3; + EXPECT_DEATH(session->InitializeComponentData("component_one", kMaxBeamSize), + "without providing input data"); +} + +TEST(ComputeSessionImplTest, + InitializeDataFailsWhenComponentDoesNotExistdeathTest) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType2", &spec); + + ComputeSessionPool pool(spec, hyperparams); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + constexpr int kMaxBeamSize = 3; + EXPECT_DEATH( + session->InitializeComponentData("DOES_NOT_EXIST_DIE", kMaxBeamSize), + "Could not find component"); +} + +TEST(ComputeSessionImplTest, + InitializeDataFailsWhenSourceIsNotTerminalDeathTest) { + auto function_that_will_die = []() { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType2", &spec); + AddComponentToSpec("component_two", "TestComponentType1", &spec); + + // Create a map to hold references to mock components. + std::map mock_components; + auto builder_function = [&mock_components, spec]( + const string &name, + const string &backend_type) { + VLOG(2) << "Mocking for name: " << name; + std::unique_ptr component(new MockComponent()); + EXPECT_CALL(*component, InitializeComponent(_)); + mock_components[name] = component.get(); + return component; + }; + + // Create a pool, substituting a mock component builder. 
+ ComputeSessionPool pool(spec, hyperparams); + ComputeSessionImplTestPoolAccessor::SetComponentBuilder(&pool, + builder_function); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + // Expect that the first component will report that it is not terminal + EXPECT_CALL(*mock_components["component_one"], IsTerminal()) + .WillOnce(Return(false)); + + // Attempt to initialize the component. + constexpr int kMaxBeamSize = 11; + session->InitializeComponentData("component_two", kMaxBeamSize); + }; + + // The death expectation is interacting strangely with this test, so I need + // to wrap the function in a lambda. + EXPECT_DEATH(function_that_will_die(), "Source is not terminal"); +} + +TEST(ComputeSessionImplTest, ResetSessionResetsAllComponents) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType2", &spec); + AddComponentToSpec("component_two", "TestComponentType1", &spec); + + // Create a map to hold references to mock components. + std::map mock_components; + auto builder_function = [&mock_components, spec](const string &name, + const string &backend_type) { + VLOG(2) << "Mocking for name: " << name; + std::unique_ptr component(new MockComponent()); + EXPECT_CALL(*component, InitializeComponent(_)); + mock_components[name] = component.get(); + return component; + }; + + // Create a pool, substituting a mock component builder. 
+ ComputeSessionPool pool(spec, hyperparams); + ComputeSessionImplTestPoolAccessor::SetComponentBuilder(&pool, + builder_function); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + // Expect that both components will be reset. + EXPECT_CALL(*mock_components["component_one"], ResetComponent()); + EXPECT_CALL(*mock_components["component_two"], ResetComponent()); + + session->ResetSession(); +} + +TEST(ComputeSessionImplTest, SetTracingPropagatesToAllComponents) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType2", &spec); + AddComponentToSpec("component_two", "TestComponentType1", &spec); + + // Add a translator from component 1 to component 2. + AddTranslatorToSpec("component_one", "component_two", "identity", &spec); + + // Create a map to hold references to mock components. + std::map mock_components; + auto builder_function = [&mock_components, spec](const string &name, + const string &backend_type) { + VLOG(2) << "Mocking for name: " << name; + std::unique_ptr component(new MockComponent()); + EXPECT_CALL(*component, InitializeComponent(_)); + mock_components[name] = component.get(); + return component; + }; + + // Create a pool, substituting a mock component builder. + ComputeSessionPool pool(spec, hyperparams); + ComputeSessionImplTestPoolAccessor::SetComponentBuilder(&pool, + builder_function); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + // Enable tracing on the session. + session->SetTracing(true); + + // Initialize the first component, along with its tracing. 
+ constexpr int kMaxBeamSize = 1; + EXPECT_CALL(*mock_components["component_one"], + InitializeData(testing::IsEmpty(), kMaxBeamSize, NotNull())); + EXPECT_CALL(*mock_components["component_one"], InitializeTracing()); + session->InitializeComponentData("component_one", kMaxBeamSize); + + MockTransitionState mock_transition_state; + std::vector> beam( + {{&mock_transition_state}}); + EXPECT_CALL(*mock_components["component_one"], IsTerminal()) + .WillOnce(Return(true)); + EXPECT_CALL(*mock_components["component_one"], GetBeam()) + .WillOnce(Return(beam)); + + // Expect that the second component will receive that beam, and then its + // tracing will be initialized. + EXPECT_CALL(*mock_components["component_two"], + InitializeData(beam, kMaxBeamSize, NotNull())); + EXPECT_CALL(*mock_components["component_two"], InitializeTracing()); + session->InitializeComponentData("component_two", kMaxBeamSize); + + // Expect that all components will see the tracing value. + EXPECT_CALL(*mock_components["component_one"], IsReady()) + .WillRepeatedly(Return(true)); + EXPECT_CALL(*mock_components["component_two"], IsReady()) + .WillRepeatedly(Return(true)); + + std::vector features; + LinkFeatures feature_one; + feature_one.set_beam_idx(0); + feature_one.set_batch_idx(0); + feature_one.set_feature_value(34); + features.push_back(feature_one); + + // Translated version: feature_value is copied to step_idx. + std::vector translated; + feature_one.set_step_idx(feature_one.feature_value()); + translated.push_back(feature_one); + + // The session should request the raw link features for the specified channel. + constexpr int kChannelId = 0; + EXPECT_CALL(*mock_components["component_two"], GetRawLinkFeatures(kChannelId)) + .WillRepeatedly(Return(features)); + + // Identity will not change the features. 
+ EXPECT_CALL(*mock_components["component_two"], + AddTranslatedLinkFeaturesToTrace( + ElementsAre(EqualsProto(translated[0])), kChannelId)); + session->GetTranslatedLinkFeatures("component_two", kChannelId); + + // Now disable tracing. This time we don't expect any tracing to be called. + EXPECT_CALL(*mock_components["component_one"], DisableTracing()); + EXPECT_CALL(*mock_components["component_two"], DisableTracing()); + session->SetTracing(false); + EXPECT_CALL(*mock_components["component_two"], + AddTranslatedLinkFeaturesToTrace( + ElementsAre(EqualsProto(translated[0])), kChannelId)) + .Times(0); + session->GetTranslatedLinkFeatures("component_two", kChannelId); +} + +TEST(ComputeSessionImplTest, TraceSourceBeamPath) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType1", &spec); + AddComponentToSpec("component_two", "TestComponentType1", &spec); + AddComponentToSpec("component_three", "TestComponentType1", &spec); + + // Create a map to hold references to mock components. + std::map mock_components; + auto builder_function = [&mock_components, spec](const string &name, + const string &backend_type) { + VLOG(2) << "Mocking for name: " << name; + std::unique_ptr component(new MockComponent()); + EXPECT_CALL(*component, InitializeComponent(_)); + mock_components[name] = component.get(); + return component; + }; + + // Create a pool, substituting a mock component builder. + ComputeSessionPool pool(spec, hyperparams); + ComputeSessionImplTestPoolAccessor::SetComponentBuilder(&pool, + builder_function); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + ComponentTrace trace; + + // Test logic: verify that the traces correspond only to the paths taken to + // reach the final states in component 3. This requires backtracking to + // retrace the path of the beam. 
 In this case, we expect three paths: + // + // Component 1 -> Component 2 -> Component 3 + // batch 0, beam 6 -> batch 0, beam 4 -> batch 0, beam 0 + // batch 0, beam 6 -> batch 0, beam 3 -> batch 0, beam 1 + // batch 1, beam 1 -> batch 1, beam 2 -> batch 1, beam 0 + + // Fill the component traces with some dummy values of the appropriate beam + // sizes for each batch. + + // Component 1: batch 0 has beam size 7, batch 1 has beam size 2. + std::vector> component_one_trace = { + {trace, trace, trace, trace, trace, trace, trace}, {trace, trace}}; + + // Component 2: batch 0 has beam size 5, batch 1 has beam size 3. + std::vector> component_two_trace = { + {trace, trace, trace, trace, trace}, {trace, trace, trace}}; + + // Component 3: batch 0 has beam size 2, batch 1 has beam size 1. + std::vector> component_three_trace = { + {trace, trace}, {trace}}; + + // The Session will get all traces from every component. + EXPECT_CALL(*mock_components["component_one"], GetTraceProtos()) + .WillOnce(Return(component_one_trace)); + EXPECT_CALL(*mock_components["component_two"], GetTraceProtos()) + .WillOnce(Return(component_two_trace)); + EXPECT_CALL(*mock_components["component_three"], GetTraceProtos()) + .WillOnce(Return(component_three_trace)); + + // Final beam has 2 states in batch 0, 1 state in batch 1. So we expect three + // chains. + MockTransitionState mock_transition_state; + std::vector> beam( + {{&mock_transition_state, &mock_transition_state}, + {&mock_transition_state}}); + + EXPECT_CALL(*mock_components["component_three"], GetBeam()) + .WillOnce(Return(beam)); + + // First test chain. + EXPECT_CALL(*mock_components["component_three"], GetSourceBeamIndex(0, 0)) + .WillOnce(Return(4)); + EXPECT_CALL(*mock_components["component_two"], GetSourceBeamIndex(4, 0)) + .WillOnce(Return(6)); + + // Second test chain. 
+ EXPECT_CALL(*mock_components["component_three"], GetSourceBeamIndex(1, 0)) + .WillOnce(Return(3)); + EXPECT_CALL(*mock_components["component_two"], GetSourceBeamIndex(3, 0)) + .WillOnce(Return(6)); + + // Third test chain. + EXPECT_CALL(*mock_components["component_three"], GetSourceBeamIndex(0, 1)) + .WillOnce(Return(2)); + EXPECT_CALL(*mock_components["component_two"], GetSourceBeamIndex(2, 1)) + .WillOnce(Return(1)); + + // Execute the call. + session->GetTraceProtos(); +} + +TEST(ComputeSessionImplTest, InterfacePassesThrough) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "TestComponentType2", &spec); + AddComponentToSpec("component_two", "TestComponentType1", &spec); + + // Create a map to hold references to mock components. + std::map mock_components; + auto builder_function = [&mock_components, spec](const string &name, + const string &backend_type) { + VLOG(2) << "Mocking for name: " << name; + std::unique_ptr component(new MockComponent()); + EXPECT_CALL(*component, InitializeComponent(_)); + mock_components[name] = component.get(); + return component; + }; + + // Create a pool, substituting a mock component builder. + ComputeSessionPool pool(spec, hyperparams); + ComputeSessionImplTestPoolAccessor::SetComponentBuilder(&pool, + builder_function); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + // Expect that the first component will report that it is ready. 
+ EXPECT_CALL(*mock_components["component_one"], IsReady()) + .WillRepeatedly(Return(true)); + + // BatchSize() + int batch_size = 3; + EXPECT_CALL(*mock_components["component_one"], BatchSize()) + .WillOnce(Return(batch_size)); + EXPECT_EQ(batch_size, session->BatchSize("component_one")); + + // BeamSize() + int beam_size = 32; + EXPECT_CALL(*mock_components["component_one"], BeamSize()) + .WillOnce(Return(beam_size)); + EXPECT_EQ(beam_size, session->BeamSize("component_one")); + + // AdvanceFromOracle() + EXPECT_CALL(*mock_components["component_one"], AdvanceFromOracle()); + session->AdvanceFromOracle("component_one"); + + // AdvanceFromPrediction() + constexpr int kScoreMatrixLength = 3; + const float score_matrix[kScoreMatrixLength] = {1.0, 2.3, 4.5}; + EXPECT_CALL(*mock_components["component_one"], + AdvanceFromPrediction(score_matrix, kScoreMatrixLength)); + session->AdvanceFromPrediction("component_one", score_matrix, + kScoreMatrixLength); + + // GetFixedFeatures + auto allocate_indices = [](int size) -> int32 * { return nullptr; }; + auto allocate_ids = [](int size) -> int64 * { return nullptr; }; + auto allocate_weights = [](int size) -> float * { return nullptr; }; + constexpr int kChannelId = 3; + EXPECT_CALL(*mock_components["component_one"], + GetFixedFeatures(_, _, _, kChannelId)) + .WillOnce(Return(0)); + EXPECT_EQ( + 0, session->GetInputFeatures("component_one", allocate_indices, + allocate_ids, allocate_weights, kChannelId)); + + // BulkGetFixedFeatures + BulkFeatureExtractor extractor(nullptr, nullptr, nullptr, false, 0, 0); + EXPECT_CALL(*mock_components["component_one"], BulkGetFixedFeatures(_)) + .WillOnce(Return(0)); + EXPECT_EQ(0, session->BulkGetInputFeatures("component_one", extractor)); + + // EmitOracleLabels() + std::vector> oracle_labels = {{0, 1}, {2, 3}}; + EXPECT_CALL(*mock_components["component_one"], GetOracleLabels()) + .WillOnce(Return(oracle_labels)); + EXPECT_EQ(oracle_labels, session->EmitOracleLabels("component_one")); + + 
 // IsTerminal() + bool is_terminal = true; + EXPECT_CALL(*mock_components["component_one"], IsTerminal()) + .WillOnce(Return(is_terminal)); + EXPECT_EQ(is_terminal, session->IsTerminal("component_one")); + + // FinalizeData() + EXPECT_CALL(*mock_components["component_one"], FinalizeData()); + session->FinalizeData("component_one"); +} + +TEST(ComputeSessionImplTest, InterfaceRequiresReady) { + MasterSpec spec; + GridPoint hyperparams; + + AddComponentToSpec("component_one", "UnreadyComponent", &spec); + + // Create a pool so we can get a session. + ComputeSessionPool pool(spec, hyperparams); + auto session = pool.GetSession(); + session->SetInputData({"arbitrary_data"}); + + // Call the functions which should die if the component isn't ready. + EXPECT_DEATH(session->BatchSize("component_one"), + "without first initializing it"); + EXPECT_DEATH(session->BeamSize("component_one"), + "without first initializing it"); + EXPECT_DEATH(session->AdvanceFromOracle("component_one"), + "without first initializing it"); + EXPECT_DEATH(session->EmitOracleLabels("component_one"), + "without first initializing it"); + EXPECT_DEATH(session->IsTerminal("component_one"), + "without first initializing it"); + EXPECT_DEATH(session->FinalizeData("component_one"), + "without first initializing it"); + + constexpr int kScoreMatrixLength = 3; + const float score_matrix[kScoreMatrixLength] = {1.0, 2.3, 4.5}; + EXPECT_DEATH(session->AdvanceFromPrediction("component_one", score_matrix, + kScoreMatrixLength), + "without first initializing it"); + constexpr int kArbitraryChannelId = 3; + EXPECT_DEATH(session->GetInputFeatures("component_one", nullptr, nullptr, + nullptr, kArbitraryChannelId), + "without first initializing it"); + BulkFeatureExtractor extractor(nullptr, nullptr, nullptr, false, 0, 0); + EXPECT_DEATH(session->BulkGetInputFeatures("component_one", extractor), + "without first initializing it"); + EXPECT_DEATH( + session->GetTranslatedLinkFeatures("component_one", 
kArbitraryChannelId), + "without first initializing it"); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/compute_session_pool.cc b/syntaxnet/dragnn/core/compute_session_pool.cc new file mode 100644 index 0000000000000000000000000000000000000000..a1a38ad652aeb14fa2bf0c6be221b31f08804887 --- /dev/null +++ b/syntaxnet/dragnn/core/compute_session_pool.cc @@ -0,0 +1,104 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "dragnn/core/compute_session_pool.h" + +#include + +#include "dragnn/core/component_registry.h" +#include "dragnn/core/compute_session_impl.h" +#include "tensorflow/core/platform/logging.h" + +namespace syntaxnet { +namespace dragnn { + +using tensorflow::mutex_lock; + +ComputeSessionPool::ComputeSessionPool(const MasterSpec &master_spec, + const GridPoint &hyperparams) + : master_spec_(master_spec), + hyperparams_(hyperparams), + num_unique_sessions_(0) { + // Create a default component builder function. This function looks up + // components in the component registry and returns them. 
+ component_builder_ = []( + const string &component_name, + const string &backend_type) -> std::unique_ptr { + VLOG(2) << "Creating component " << component_name << " with backend " + << backend_type; + std::unique_ptr component(Component::Create(backend_type)); + return component; + }; + + // Create a default session builder function. This function returns a + // ComputeSessionImpl that uses the currently set component_builder_ + // function to create its components. + session_builder_ = [this]() { + return std::unique_ptr( + new ComputeSessionImpl(num_unique_sessions_, this->component_builder_)); + }; +} + +ComputeSessionPool::~ComputeSessionPool() { + LOG(INFO) << "Destroying pool: total number of sessions created = " + << num_unique_sessions_; + if (sessions_.size() < num_unique_sessions_) { + LOG(WARNING) << "Destroying pool: number of unreturned sessions = " + << (num_unique_sessions_ - sessions_.size()); + } +} + +void ComputeSessionPool::SetComputeSessionBuilder( + std::function()> session_builder) { + mutex_lock lock(lock_); + session_builder_ = std::move(session_builder); +} + +void ComputeSessionPool::SetComponentBuilder( + std::function(const string &component_name, + const string &backend_type)> + component_builder) { + mutex_lock lock(lock_); + component_builder_ = std::move(component_builder); +} + +std::unique_ptr ComputeSessionPool::GetSession() { + mutex_lock lock(lock_); + std::unique_ptr session_ptr; + if (sessions_.empty()) { + // There are no available sessions, so create and initialize one. + VLOG(2) << "Creating new session."; + session_ptr = session_builder_(); + num_unique_sessions_++; + session_ptr->Init(master_spec_, hyperparams_); + } else { + // Get the last free session, and remove it from the free sessions vector. 
+ VLOG(2) << "Reusing session from pool of size " << sessions_.size(); + session_ptr = std::move(sessions_.back()); + sessions_.pop_back(); + + session_ptr->ResetSession(); + } + return session_ptr; +} + +void ComputeSessionPool::ReturnSession( + std::unique_ptr session) { + mutex_lock lock(lock_); + sessions_.push_back(std::move(session)); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/compute_session_pool.h b/syntaxnet/dragnn/core/compute_session_pool.h new file mode 100644 index 0000000000000000000000000000000000000000..fba1e71e86a0f0492eaf43922ca18d9738a229b2 --- /dev/null +++ b/syntaxnet/dragnn/core/compute_session_pool.h @@ -0,0 +1,102 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_COMPUTE_SESSION_POOL_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_COMPUTE_SESSION_POOL_H_ + +#include + +#include "dragnn/core/compute_session.h" +#include "dragnn/protos/spec.pb.h" +#include "tensorflow/core/platform/mutex.h" + +namespace syntaxnet { +namespace dragnn { + +// This pool creates and manages the reuse of ComputeSession objects. + +class ComputeSessionPool { + public: + // Create a ComputeSessionPool that creates ComputeSessions for the given + // MasterSpec and hyperparameters. 
+ ComputeSessionPool(const MasterSpec &master_spec, + const GridPoint &hyperparams); + + virtual ~ComputeSessionPool(); + + // Get a ComputeSession. This function will attempt to use an already-created + // ComputeSession, but if none are available a new one will be created. + std::unique_ptr GetSession(); + + // Returns a ComputeSession to the backing pool. + void ReturnSession(std::unique_ptr session); + + // Returns the count of outstanding unique sessions. + int num_outstanding_sessions() { + tensorflow::mutex_lock lock(lock_); + return num_unique_sessions_ - sessions_.size(); + } + + private: + friend class ComputeSessionImplTestPoolAccessor; + friend class ComputeSessionPoolTestPoolAccessor; + + // This is a creational injection setter. It should be used for tests + // where we want our ComputeSessionPool to prepare and return + // MockComputeSessions instead of actual ComputeSessionImpls. + void SetComputeSessionBuilder( + std::function()> session_builder); + + // This injector will cause ComputeSessions built in this pool to use the + // passed function to create Components. This is useful when you want a + // ComputeSession to create MockComponents instead of real ones. + void SetComponentBuilder( + std::function(const string &component_name, + const string &backend_type)> + component_builder); + + // The MasterSpec that will be used to initialize ComputeSessions from this + // pool. + const MasterSpec master_spec_; + + // The hyperparameters that will be used to initialize ComputeSessions from + // this pool. + const GridPoint hyperparams_; + + // The function that is used to create ComputeSessions. + std::function()> session_builder_; + + // The function passed to ComputeSessions that will be used by that session + // to create components. + std::function(const string &component_name, + const string &backend_type)> + component_builder_; + + // ComputeSessions that are not currently being used. 
These sessions are not + // reset until they are requested by another thread. + std::vector> sessions_; + + // Count of the number of unique ComputeSession objects that have been + // created. Used to assign IDs to new Sessions. + int num_unique_sessions_; + + // Mutex that protects accesses to all members of this object. + tensorflow::mutex lock_; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_COMPUTE_SESSION_POOL_H_ diff --git a/syntaxnet/dragnn/core/compute_session_pool_test.cc b/syntaxnet/dragnn/core/compute_session_pool_test.cc new file mode 100644 index 0000000000000000000000000000000000000000..82d0735d5e1198f9af166adb26a7862bfe94cac6 --- /dev/null +++ b/syntaxnet/dragnn/core/compute_session_pool_test.cc @@ -0,0 +1,226 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#include "dragnn/core/compute_session_pool.h" + +#include + +#include + +#include "dragnn/core/compute_session.h" +#include "dragnn/core/test/generic.h" +#include "dragnn/core/test/mock_component.h" +#include "dragnn/core/test/mock_compute_session.h" +#include "tensorflow/core/platform/env.h" +#include "tensorflow/core/platform/test.h" + +namespace syntaxnet { +namespace dragnn { + +using syntaxnet::test::EqualsProto; +using testing::Return; +using testing::Invoke; +using testing::MockFunction; + +class ComputeSessionPoolTestPoolAccessor { + public: + static void SetComponentBuilder( + ComputeSessionPool *pool, + std::function(const string &component_name, + const string &backend_type)> + component_builder_function) { + pool->SetComponentBuilder(std::move(component_builder_function)); + } + + static void SetSessionBuilder(ComputeSessionPool *pool, + std::function()> + session_builder_function) { + pool->SetComputeSessionBuilder(std::move(session_builder_function)); + } +}; + +TEST(ComputeSessionPoolTest, DefaultConstructorWorks) { + MasterSpec spec; + GridPoint hyperparams; + ComputeSessionPool pool(spec, hyperparams); + auto request = pool.GetSession(); + EXPECT_NE(request, nullptr); +} + +TEST(ComputeSessionPoolTest, ComponentBuilderInjectionWorks) { + MasterSpec spec; + auto component = spec.add_component(); + component->set_name("test_component_name"); + auto backend = component->mutable_backend(); + backend->set_registered_name("arbitrary_component"); + GridPoint hyperparams; + + ComputeSessionPool pool(spec, hyperparams); + + // Set up a mock component builder. 
+ MockFunction(const string &component_name, + const string &backend_type)> + mock_component_builder; + auto mock_creation_function = [](string, string) { + return std::unique_ptr(new MockComponent()); + }; + EXPECT_CALL(mock_component_builder, + Call("test_component_name", "arbitrary_component")) + .WillOnce(Invoke(mock_creation_function)); + ComputeSessionPoolTestPoolAccessor::SetComponentBuilder( + &pool, mock_component_builder.AsStdFunction()); + + // Now, when the session is requested, the mock component builder should see + // the expected call. + auto request = pool.GetSession(); + EXPECT_NE(request, nullptr); +} + +TEST(ComputeSessionPoolTest, CreatesNewSessionIfNoSessionsExist) { + // We don't need to fill these for this test. + MasterSpec spec; + GridPoint hyperparams; + ComputeSessionPool pool(spec, hyperparams); + + // Create a function that will track calls to the session builder. + MockFunction()> mock_session_builder; + + // Initialize expectations for a request for a ComputeSession. + std::unique_ptr session_one(new MockComputeSession()); + MockComputeSession *session_one_ptr = session_one.get(); + auto mock_creation_function = [&session_one]() { + return std::move(session_one); + }; + EXPECT_CALL(mock_session_builder, Call()) + .WillOnce(Invoke(mock_creation_function)) + .RetiresOnSaturation(); + EXPECT_CALL(*session_one_ptr, + Init(EqualsProto(spec), EqualsProto(hyperparams))); + + // Initialize expectations for another request for a ComputeSession. + std::unique_ptr session_two(new MockComputeSession()); + MockComputeSession *session_two_ptr = session_two.get(); + auto mock_creation_function_two = [&session_two]() { + return std::move(session_two); + }; + EXPECT_CALL(mock_session_builder, Call()) + .WillOnce(Invoke(mock_creation_function_two)) + .RetiresOnSaturation(); + EXPECT_CALL(*session_two_ptr, + Init(EqualsProto(spec), EqualsProto(hyperparams))); + + // Inject the function to the pool. 
+ ComputeSessionPoolTestPoolAccessor::SetSessionBuilder( + &pool, mock_session_builder.AsStdFunction()); + + // The first call will receive the second session because of how the mocks go. + auto first_request = pool.GetSession(); + EXPECT_EQ(first_request.get(), session_two_ptr); + + auto second_request = pool.GetSession(); + EXPECT_EQ(second_request.get(), session_one_ptr); +} + +TEST(ComputeSessionPoolTest, ReusesAvailableSessions) { + // We don't need to fill these for this test. + MasterSpec spec; + GridPoint hyperparams; + ComputeSessionPool pool(spec, hyperparams); + + // Create a function that will track calls to the session builder. + MockFunction()> mock_session_builder; + + // Initialize expectations for a request for a ComputeSession. + std::unique_ptr session_one(new MockComputeSession()); + MockComputeSession *session_one_ptr = session_one.get(); + auto mock_creation_function = [&session_one]() { + return std::move(session_one); + }; + EXPECT_CALL(mock_session_builder, Call()) + .WillOnce(Invoke(mock_creation_function)) + .RetiresOnSaturation(); + EXPECT_CALL(*session_one_ptr, + Init(EqualsProto(spec), EqualsProto(hyperparams))); + + // Initialize expectations for another request for a ComputeSession. + std::unique_ptr session_two(new MockComputeSession()); + MockComputeSession *session_two_ptr = session_two.get(); + auto mock_creation_function_two = [&session_two]() { + return std::move(session_two); + }; + EXPECT_CALL(mock_session_builder, Call()) + .WillOnce(Invoke(mock_creation_function_two)) + .RetiresOnSaturation(); + EXPECT_CALL(*session_two_ptr, + Init(EqualsProto(spec), EqualsProto(hyperparams))); + + // Inject the function to the pool. + ComputeSessionPoolTestPoolAccessor::SetSessionBuilder( + &pool, mock_session_builder.AsStdFunction()); + + // The first call will receive the second session because of how the mocks go. 
+ auto first_request = pool.GetSession(); + EXPECT_EQ(1, pool.num_outstanding_sessions()); + EXPECT_EQ(first_request.get(), session_two_ptr); + + // Return the first pointer. After this, the second request should get that + // pointer. + EXPECT_CALL(*session_two_ptr, ResetSession()); + pool.ReturnSession(std::move(first_request)); + EXPECT_EQ(0, pool.num_outstanding_sessions()); + auto second_request = pool.GetSession(); + EXPECT_EQ(1, pool.num_outstanding_sessions()); + EXPECT_EQ(second_request.get(), session_two_ptr); + + // There are now no spare sessions, so the next session request should + // create a second session. + auto third_request = pool.GetSession(); + EXPECT_EQ(2, pool.num_outstanding_sessions()); + EXPECT_EQ(third_request.get(), session_one_ptr); +} + +TEST(ComputeSessionPoolTest, AssignsUniqueIds) { + MasterSpec spec; + GridPoint hyperparams; + ComputeSessionPool pool(spec, hyperparams); + auto session = pool.GetSession(); + auto session_2 = pool.GetSession(); + EXPECT_NE(session->Id(), session_2->Id()); +} + +TEST(ComputeSessionPoolTest, SupportsMultithreadedAccess) { + MasterSpec spec; + GridPoint hyperparams; + ComputeSessionPool pool(spec, hyperparams); + + std::vector> request_threads; + constexpr int kNumThreadsToTest = 100; + for (int i = 0; i < kNumThreadsToTest; ++i) { + request_threads.push_back(std::unique_ptr( + tensorflow::Env::Default()->StartThread( + tensorflow::ThreadOptions(), "thread", + [this, &pool] { auto session = pool.GetSession(); }))); + } + + // Deleting a tensorflow::Thread blocks until the thread exits, + // so clearing the vector blocks until all threads have exited. + request_threads.clear(); + + // Make sure all the threads got their session. 
+ EXPECT_EQ(kNumThreadsToTest, pool.num_outstanding_sessions()); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/index_translator.cc b/syntaxnet/dragnn/core/index_translator.cc new file mode 100644 index 0000000000000000000000000000000000000000..028494a8855ea655873472d6639a0bb7ff4fef6c --- /dev/null +++ b/syntaxnet/dragnn/core/index_translator.cc @@ -0,0 +1,82 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "dragnn/core/index_translator.h" + +#include "tensorflow/core/platform/logging.h" + +namespace syntaxnet { +namespace dragnn { + +using Index = IndexTranslator::Index; + +IndexTranslator::IndexTranslator(const std::vector &path, + const string &method) + : path_(path), method_(method) { + if (method_ == "identity") { + // Identity lookup: Return the feature index. + step_lookup_ = [](int batch_index, int beam_index, int feature) { + return feature; + }; + } else if (method_ == "history") { + // History lookup: Return the number of steps taken less the feature. 
+ step_lookup_ = [this](int batch_index, int beam_index, int feature) { + if (feature > path_.back()->StepsTaken(batch_index) - 1) { + VLOG(2) << "Translation to outside: feature is " << feature + << " and steps_taken is " + << path_.back()->StepsTaken(batch_index); + return -1; + } + return ((path_.back()->StepsTaken(batch_index) - 1) - feature); + }; + } else { + // Component defined lookup: Get the lookup function from the component. + // If the lookup function is not defined, this function will CHECK. + step_lookup_ = path_.back()->GetStepLookupFunction(method_); + } +} + +Index IndexTranslator::Translate(int batch_index, int beam_index, + int feature_value) { + Index translated_index; + translated_index.batch_index = batch_index; + VLOG(2) << "Translation requested (type: " << method_ << ") for batch " + << batch_index << " beam " << beam_index << " feature " + << feature_value; + + // For all save the last item in the path, get the source index for the + // previous component. + int current_beam_index = beam_index; + VLOG(2) << "Beam index before walk is " << current_beam_index; + for (int i = 0; i < path_.size() - 1; ++i) { + // Backtrack through previous components. For each non-final component, + // figure out what state in the prior component was used to initialize the + // state at the current beam index. 
+ current_beam_index = + path_.at(i)->GetSourceBeamIndex(current_beam_index, batch_index); + VLOG(2) << "Beam index updated to " << current_beam_index; + } + VLOG(2) << "Beam index after walk is " << current_beam_index; + translated_index.step_index = + step_lookup_(batch_index, current_beam_index, feature_value); + VLOG(2) << "Translated step index is " << translated_index.step_index; + translated_index.beam_index = path_.back()->GetBeamIndexAtStep( + translated_index.step_index, current_beam_index, batch_index); + VLOG(2) << "Translated beam index is " << translated_index.beam_index; + return translated_index; +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/index_translator.h b/syntaxnet/dragnn/core/index_translator.h new file mode 100644 index 0000000000000000000000000000000000000000..556e3588fb66601959dbd249fe4d33c00cf568b5 --- /dev/null +++ b/syntaxnet/dragnn/core/index_translator.h @@ -0,0 +1,83 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INDEX_TRANSLATOR_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INDEX_TRANSLATOR_H_ + +#include +#include + +#include "dragnn/core/interfaces/component.h" +#include "dragnn/core/interfaces/transition_state.h" + +namespace syntaxnet { +namespace dragnn { + +// An IndexTranslator provides an interface into the data of another component. +// It allows one component to look up a translated array index from the history +// or state of another component. +// +// When it is created, it is passed a pointer to the source component (that is, +// the component whose data it will be accessing) and a string representing the +// type of data access it will perform. There are two universal data access +// methods - "identity" and "history" - and components can declare more via +// their GetStepLookupFunction function. + +class IndexTranslator { + public: + // Index into a TensorArray. Provides a given step, and the beam index within + // that step, for TensorArray access to data in the given batch. + struct Index { + int batch_index = -1; + int beam_index = -1; + int step_index = -1; + }; + + // Creates a new IndexTranslator with access method as determined by the + // passed string. The Translator will walk the path "path" in order, and will + // translate from the last Component in the path. + IndexTranslator(const std::vector &path, const string &method); + + // Returns an index in (step, beam, batch) index space as computed from the + // given feature value. + Index Translate(int batch_index, int beam_index, int feature_value); + + // Returns the path to be walked by this translator. + const std::vector &path() const { return path_; } + + // Returns the method to be used by this translator. 
+ const string &method() const { return method_; } + + private: + // The ordered list of components that must be walked to get from the + // requesting component to the source component. This vector has the + // requesting component at index 0 and the source component at the end. If + // the requesting component is the source component, this vector has only one + // entry. + const std::vector path_; + + // The function this translator will use to look up the step in the source + // component. The function is invoked as: + // step_lookup_(batch_index, beam_index, feature). + std::function step_lookup_; + + // This translator's method. + string method_; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INDEX_TRANSLATOR_H_ diff --git a/syntaxnet/dragnn/core/index_translator_test.cc b/syntaxnet/dragnn/core/index_translator_test.cc new file mode 100644 index 0000000000000000000000000000000000000000..0972ed85591d459a233ebf4cd222a3945c6d2201 --- /dev/null +++ b/syntaxnet/dragnn/core/index_translator_test.cc @@ -0,0 +1,195 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#include "dragnn/core/index_translator.h" + +#include "dragnn/core/test/mock_component.h" +#include "dragnn/core/test/mock_transition_state.h" +#include +#include "tensorflow/core/platform/test.h" + +namespace syntaxnet { +namespace dragnn { + +using testing::MockFunction; +using testing::Return; + +TEST(IndexTranslatorTest, PerformsIdentityTranslation) { + MockComponent mock_component; + + // We are testing the Identity lookup with a single component (so, self- + // referencing) and thus we expect the translator to call GetBeamIndexAtStep + // for the step and index we pass in. + constexpr int kBeam = 4; + constexpr int kFeature = 2; + constexpr int kResultIndex = 3; + constexpr int kBatch = 99; + EXPECT_CALL(mock_component, GetBeamIndexAtStep(kFeature, kBeam, kBatch)) + .WillOnce(Return(kResultIndex)); + + // Execute! + IndexTranslator translator({&mock_component}, "identity"); + auto result = translator.Translate(kBatch, kBeam, kFeature); + EXPECT_EQ(kResultIndex, result.beam_index); + EXPECT_EQ(kFeature, result.step_index); + EXPECT_EQ(kBatch, result.batch_index); +} + +TEST(IndexTranslatorTest, PerformsHistoryTranslation) { + MockComponent mock_component; + + // We are testing the History lookup with a single component (so, self- + // referencing) and thus we expect the translator to call StepsTaken() to get + // the number of steps taken and GetBeamIndexAtStep with (total-desired). + constexpr int kBeam = 4; + constexpr int kFeature = 2; + constexpr int kTotalNumberSteps = 8; + constexpr int kBatch = 99; + + // Here, the expected step result is two in from the final index, so + // (8-1) - 2, or 5. 
+ constexpr int kExpectedResult = 5; + constexpr int kResultIndex = 3; + EXPECT_CALL(mock_component, StepsTaken(kBatch)) + .WillRepeatedly(Return(kTotalNumberSteps)); + EXPECT_CALL(mock_component, + GetBeamIndexAtStep(kExpectedResult, kBeam, kBatch)) + .WillOnce(Return(kResultIndex)); + + // Execute! + IndexTranslator translator({&mock_component}, "history"); + auto result = translator.Translate(kBatch, kBeam, kFeature); + EXPECT_EQ(kResultIndex, result.beam_index); + EXPECT_EQ(kExpectedResult, result.step_index); + EXPECT_EQ(kBatch, result.batch_index); +} + +TEST(IndexTranslatorTest, TraversesPathToLookup) { + MockComponent mock_component_a; + MockComponent mock_component_b; + MockComponent mock_component_c; + constexpr int kBatch = 99; + + // The translator should request the source index from mock component A. + constexpr int kBeam = 4; + constexpr int kSourceBIndex = 3; + EXPECT_CALL(mock_component_a, GetSourceBeamIndex(kBeam, kBatch)) + .WillOnce(Return(kSourceBIndex)); + + // The translator should use the source index from A in a source index request + // to component B. + constexpr int kSourceCIndex = 17; + EXPECT_CALL(mock_component_b, GetSourceBeamIndex(kSourceBIndex, kBatch)) + .WillOnce(Return(kSourceCIndex)); + + // The translator should request the beam index at the requested step in + // component C, using the beam index from the source index request to B. + constexpr int kFeature = 2; + constexpr int kResultIndex = 1157; + + // This is testing with an identity translator, so kFeature == kStep. + EXPECT_CALL(mock_component_c, + GetBeamIndexAtStep(kFeature, kSourceCIndex, kBatch)) + .WillOnce(Return(kResultIndex)); + + // Execute! 
+ IndexTranslator translator( + {&mock_component_a, &mock_component_b, &mock_component_c}, "identity"); + auto result = translator.Translate(kBatch, kBeam, kFeature); + EXPECT_EQ(kResultIndex, result.beam_index); + EXPECT_EQ(kFeature, result.step_index); + EXPECT_EQ(kBatch, result.batch_index); +} + +TEST(IndexTranslatorTest, RequestsArbitraryTranslationFunction) { + MockComponent mock_component; + MockFunction mock_function; + + // This test ensures that we can get an arbitrary translation function + // from the component and execute it properly. + constexpr int kBeam = 4; + constexpr int kFeature = 2; + constexpr int kFunctionResult = 10; + constexpr int kResultIndex = 3; + constexpr int kBatch = 99; + + // The arbitrary function should be called with the desired input. + EXPECT_CALL(mock_function, Call(kBatch, kBeam, kFeature)) + .WillOnce(Return(kFunctionResult)); + + // The translator should request the function from the component. + EXPECT_CALL(mock_component, GetStepLookupFunction("arbitrary_function")) + .WillOnce(Return(mock_function.AsStdFunction())); + + // The translator should call GetBeamIndexAtStep with the result of calling + // the function. + EXPECT_CALL(mock_component, + GetBeamIndexAtStep(kFunctionResult, kBeam, kBatch)) + .WillOnce(Return(kResultIndex)); + + // Execute! + IndexTranslator translator({&mock_component}, "arbitrary_function"); + auto result = translator.Translate(kBatch, kBeam, kFeature); + EXPECT_EQ(kResultIndex, result.beam_index); + EXPECT_EQ(kFunctionResult, result.step_index); + EXPECT_EQ(kBatch, result.batch_index); +} + +// This test ensures that the translation function is queried with the beam +// index for that component, and that the translation function is taken from +// the correct component. 
+TEST(IndexTranslatorTest, RequestsArbitraryTranslationAcrossComponents) { + MockComponent mock_component_a; + MockComponent mock_component_b; + MockFunction mock_function; + + // This test ensures that we can get an arbitrary translation function + // from the component and execute it properly. + constexpr int kFeature = 2; + constexpr int kFunctionResult = 10; + constexpr int kResultIndex = 3; + constexpr int kBatch = 99; + + // The translator should request the source index from mock component A. + constexpr int kBeam = 4; + constexpr int kSourceBIndex = 3; + EXPECT_CALL(mock_component_a, GetSourceBeamIndex(kBeam, kBatch)) + .WillOnce(Return(kSourceBIndex)); + + // The translator should request the function from the component. + EXPECT_CALL(mock_component_b, GetStepLookupFunction("arbitrary_function")) + .WillOnce(Return(mock_function.AsStdFunction())); + + // The arbitrary function should be called with the desired input. + EXPECT_CALL(mock_function, Call(kBatch, kSourceBIndex, kFeature)) + .WillOnce(Return(kFunctionResult)); + + // The translator should call GetBeamIndexAtStep with the result of calling + // the function. + EXPECT_CALL(mock_component_b, + GetBeamIndexAtStep(kFunctionResult, kSourceBIndex, kBatch)) + .WillOnce(Return(kResultIndex)); + + // Execute! + IndexTranslator translator({&mock_component_a, &mock_component_b}, + "arbitrary_function"); + auto result = translator.Translate(kBatch, kBeam, kFeature); + EXPECT_EQ(kResultIndex, result.beam_index); + EXPECT_EQ(kFunctionResult, result.step_index); + EXPECT_EQ(kBatch, result.batch_index); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/input_batch_cache.h b/syntaxnet/dragnn/core/input_batch_cache.h new file mode 100644 index 0000000000000000000000000000000000000000..1f3ef977cf8fc8cc7382c316426b6ab170b67959 --- /dev/null +++ b/syntaxnet/dragnn/core/input_batch_cache.h @@ -0,0 +1,93 @@ +// Copyright 2017 Google Inc. All Rights Reserved. 
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INPUT_BATCH_CACHE_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INPUT_BATCH_CACHE_H_ + +#include +#include +#include + +#include "dragnn/core/interfaces/input_batch.h" +#include "tensorflow/core/platform/logging.h" + +namespace syntaxnet { +namespace dragnn { + +// An InputBatchCache holds data converted to a DRAGNN internal representation. +// It performs the conversion lazily via Data objects and caches the result. + +class InputBatchCache { + public: + // Creates an empty cache. + InputBatchCache() : stored_type_(std::type_index(typeid(void))) {} + + // Creates an InputBatchCache from a single example. This copies the string. + explicit InputBatchCache(const string &data) + : stored_type_(std::type_index(typeid(void))), source_data_({data}) {} + + // Creates an InputBatchCache from a vector of examples. The vector is copied. + explicit InputBatchCache(const std::vector &data) + : stored_type_(std::type_index(typeid(void))), source_data_(data) {} + + // Adds a single string to the cache. Only useable before GetAs() has been + // called. 
+ void AddData(const string &data) { + CHECK(stored_type_ == std::type_index(typeid(void))) + << "You may not add data to an InputBatchCache after the cache has " + "been converted via GetAs()."; + source_data_.emplace_back(data); + } + + // Converts the stored strings into protos and returns them in a specific + // InputBatch subclass. T must always derive from InputBatch. After this + // method is called once, all further calls must be of the same data type. + template + T *GetAs() { + if (!converted_data_) { + stored_type_ = std::type_index(typeid(T)); + converted_data_.reset(new T()); + converted_data_->SetData(source_data_); + } + CHECK(std::type_index(typeid(T)) == stored_type_) + << "Attempted to convert to two object types! Existing object type was " + << stored_type_.name() << ", new object type was " + << std::type_index(typeid(T)).name(); + + return dynamic_cast(converted_data_.get()); + } + + // Returns the serialized representation of the data held in the input batch + // object within this cache. + const std::vector SerializedData() const { + CHECK(converted_data_) << "Cannot return batch without data."; + return converted_data_->GetSerializedData(); + } + + private: + // The typeid of the stored data. + std::type_index stored_type_; + + // The raw data. + std::vector source_data_; + + // The converted data, contained in an InputBatch object. + std::unique_ptr converted_data_; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INPUT_BATCH_CACHE_H_ diff --git a/syntaxnet/dragnn/core/input_batch_cache_test.cc b/syntaxnet/dragnn/core/input_batch_cache_test.cc new file mode 100644 index 0000000000000000000000000000000000000000..463663444e3480c66b8422a04daef408aabcb686 --- /dev/null +++ b/syntaxnet/dragnn/core/input_batch_cache_test.cc @@ -0,0 +1,122 @@ +// Copyright 2017 Google Inc. All Rights Reserved. 
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "dragnn/core/input_batch_cache.h" + +#include "dragnn/core/interfaces/input_batch.h" +#include +#include "tensorflow/core/platform/test.h" + +namespace syntaxnet { +namespace dragnn { + +class StringData : public InputBatch { + public: + StringData() {} + + void SetData(const std::vector &data) override { + for (const auto &element : data) { + data_.push_back(element + "_converted"); + } + } + + const std::vector GetSerializedData() const override { return data_; } + + std::vector *data() { return &data_; } + + private: + std::vector data_; +}; + +class DifferentStringData : public InputBatch { + public: + DifferentStringData() {} + + void SetData(const std::vector &data) override { + for (const auto &element : data) { + data_.push_back(element + "_also_converted"); + } + } + + const std::vector GetSerializedData() const override { return data_; } + + std::vector *data() { return &data_; } + + private: + std::vector data_; +}; + +TEST(InputBatchCacheTest, ConvertsSingleInput) { + string test_string = "Foo"; + InputBatchCache generic_set(test_string); + auto data = generic_set.GetAs(); + EXPECT_EQ(data->data()->size(), 1); + EXPECT_EQ(data->data()->at(0), "Foo_converted"); +} + +TEST(InputBatchCacheTest, ConvertsAddedInput) { + string test_string = "Foo"; + InputBatchCache generic_set; + 
generic_set.AddData(test_string); + auto data = generic_set.GetAs(); + EXPECT_EQ(data->data()->size(), 1); + EXPECT_EQ(data->data()->at(0), "Foo_converted"); +} + +TEST(InputBatchCacheTest, ConvertsVectorOfInputs) { + std::vector test_inputs; + test_inputs.push_back("Foo"); + test_inputs.push_back("Bar"); + test_inputs.push_back("Baz"); + InputBatchCache generic_set(test_inputs); + auto data = generic_set.GetAs(); + EXPECT_EQ(data->data()->size(), test_inputs.size()); + EXPECT_EQ(data->data()->at(0), "Foo_converted"); + EXPECT_EQ(data->data()->at(1), "Bar_converted"); + EXPECT_EQ(data->data()->at(2), "Baz_converted"); +} + +TEST(InputBatchCacheTest, ConvertingMultipleDataTypesCausesCheck) { + string test_string = "Foo"; + InputBatchCache generic_set(test_string); + auto data = generic_set.GetAs(); + EXPECT_EQ(data->data()->at(0), "Foo_converted"); + ASSERT_DEATH(generic_set.GetAs(), + "Attempted to convert to two object types!.*"); +} + +TEST(InputBatchCacheTest, ReturnsSingleInput) { + string test_string = "Foo"; + InputBatchCache generic_set(test_string); + auto data = generic_set.GetAs(); + EXPECT_NE(nullptr, data); + auto returned = generic_set.SerializedData(); + EXPECT_EQ(returned.size(), 1); + EXPECT_EQ(returned.at(0), "Foo_converted"); +} + +TEST(InputBatchCacheTest, ConvertsAddedInputDiesAfterGetAs) { + string test_string = "Foo"; + InputBatchCache generic_set; + generic_set.AddData(test_string); + auto data = generic_set.GetAs(); + EXPECT_EQ(data->data()->size(), 1); + EXPECT_EQ(data->data()->at(0), "Foo_converted"); + EXPECT_DEATH(generic_set.AddData("YOU MAY NOT DO THIS AND IT WILL DIE."), + "after the cache has been converted"); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/interfaces/BUILD b/syntaxnet/dragnn/core/interfaces/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..3dcb59781ce356a95e213966bc1de4ce524d71c1 --- /dev/null +++ b/syntaxnet/dragnn/core/interfaces/BUILD @@ -0,0 +1,40 
@@ +package( + default_visibility = ["//visibility:public"], + features = ["-layering_check"], +) + +cc_library( + name = "cloneable_transition_state", + hdrs = ["cloneable_transition_state.h"], + deps = [":transition_state"], +) + +cc_library( + name = "component", + hdrs = ["component.h"], + deps = [ + ":transition_state", + "//dragnn/components/util:bulk_feature_extractor", + "//dragnn/core:input_batch_cache", + "//dragnn/protos:spec_proto", + "//dragnn/protos:trace_proto", + "//syntaxnet:base", + "//syntaxnet:registry", + ], +) + +cc_library( + name = "input_batch", + hdrs = ["input_batch.h"], + deps = [ + "//syntaxnet:base", + ], +) + +cc_library( + name = "transition_state", + hdrs = ["transition_state.h"], + deps = [ + "//syntaxnet:base", + ], +) diff --git a/syntaxnet/dragnn/core/interfaces/cloneable_transition_state.h b/syntaxnet/dragnn/core/interfaces/cloneable_transition_state.h new file mode 100644 index 0000000000000000000000000000000000000000..aa1355b0269382cbafac7cdd0959f867cce5258c --- /dev/null +++ b/syntaxnet/dragnn/core/interfaces/cloneable_transition_state.h @@ -0,0 +1,67 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INTERFACES_CLONEABLE_TRANSITION_STATE_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INTERFACES_CLONEABLE_TRANSITION_STATE_H_ + +#include +#include + +#include "dragnn/core/interfaces/transition_state.h" + +namespace syntaxnet { +namespace dragnn { + +// This defines a TransitionState object that can be used with the Beam class. +// Any class designed to be used with the Beam must inherit from +// CloneableTransitionState, not TransitionState. + +template +class CloneableTransitionState : public TransitionState { + public: + ~CloneableTransitionState() override {} + + // Initialize this TransitionState from a previous TransitionState. The + // ParentBeamIndex is the location of that previous TransitionState in the + // provided beam. + void Init(const TransitionState &parent) override = 0; + + // Return the beam index of the state passed into the initializer of this + // TransitionState. + const int ParentBeamIndex() const override = 0; + + // Get the current beam index for this state. + const int GetBeamIndex() const override = 0; + + // Set the current beam index for this state. + void SetBeamIndex(const int index) override = 0; + + // Get the score associated with this transition state. + const float GetScore() const override = 0; + + // Set the score associated with this transition state. + void SetScore(const float score) override = 0; + + // Depicts this state as an HTML-language string. + string HTMLRepresentation() const override = 0; + + // Produces a new state with the same backing data as this state. 
+ virtual std::unique_ptr Clone() const = 0; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INTERFACES_CLONEABLE_TRANSITION_STATE_H_ diff --git a/syntaxnet/dragnn/core/interfaces/component.h b/syntaxnet/dragnn/core/interfaces/component.h new file mode 100644 index 0000000000000000000000000000000000000000..b382ab0a88a66c02b65a8508a61b817cff86a51a --- /dev/null +++ b/syntaxnet/dragnn/core/interfaces/component.h @@ -0,0 +1,141 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INTERFACES_COMPONENT_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INTERFACES_COMPONENT_H_ + +#include + +#include "dragnn/components/util/bulk_feature_extractor.h" +#include "dragnn/core/input_batch_cache.h" +#include "dragnn/core/interfaces/transition_state.h" +#include "dragnn/protos/spec.pb.h" +#include "dragnn/protos/trace.pb.h" +#include "syntaxnet/registry.h" + +namespace syntaxnet { +namespace dragnn { + +class Component : public RegisterableClass { + public: + virtual ~Component() {} + + // Initializes this component from the spec. + virtual void InitializeComponent(const ComponentSpec &spec) = 0; + + // Provides the previous beam to the component. 
+ virtual void InitializeData( + const std::vector> &states, + int max_beam_size, InputBatchCache *input_data) = 0; + + // Returns true if the component has had InitializeData called on it since + // the last time it was reset. + virtual bool IsReady() const = 0; + + // Initializes the component for tracing execution, resetting any existing + // traces. This will typically have the side effect of slowing down all + // subsequent Component calculations and storing a trace in memory that can be + // returned by GetTraceProtos(). + virtual void InitializeTracing() = 0; + + // Disables tracing, freeing any associated traces and avoiding triggering + // additional computation in the future. + virtual void DisableTracing() = 0; + + // Returns the string name of this component. + virtual string Name() const = 0; + + // Returns the current batch size of the component's underlying data. + virtual int BatchSize() const = 0; + + // Returns the maximum beam size of this component. + virtual int BeamSize() const = 0; + + // Returns the number of steps taken by this component so far. + virtual int StepsTaken(int batch_index) const = 0; + + // Return the beam index of the item which is currently at index + // 'index', when the beam was at step 'step', for batch element 'batch'. + virtual int GetBeamIndexAtStep(int step, int current_index, + int batch) const = 0; + + // Return the source index of the item which is currently at index 'index' + // for batch element 'batch'. This index is into the final beam of the + // Component that this Component was initialized from. + virtual int GetSourceBeamIndex(int current_index, int batch) const = 0; + + // Request a translation function based on the given method string. + // The translation function will be called with arguments (batch, beam, value) + // and should return the step index corresponding to the given value, for the + // data in the given beam and batch. 
+ virtual std::function GetStepLookupFunction( + const string &method) = 0; + + // Advances this component from the given transition matrix. + virtual void AdvanceFromPrediction(const float transition_matrix[], + int transition_matrix_length) = 0; + + // Advances this component from the state oracles. + virtual void AdvanceFromOracle() = 0; + + // Returns true if all states within this component are terminal. + virtual bool IsTerminal() const = 0; + + // Returns the current batch of beams for this component. + virtual std::vector> GetBeam() = 0; + + // Extracts and populates the vector of FixedFeatures for the specified + // channel. Each functor allocates storage space for the indices, the IDs, and + // the weights (respectively). + virtual int GetFixedFeatures( + std::function allocate_indices, + std::function allocate_ids, + std::function allocate_weights, + int channel_id) const = 0; + + // Extracts and populates all FixedFeatures for all channels, advancing this + // component via the oracle until it is terminal. This call uses a + // BulkFeatureExtractor object to contain the functors and other information. + virtual int BulkGetFixedFeatures(const BulkFeatureExtractor &extractor) = 0; + + // Extracts and returns the vector of LinkFeatures for the specified + // channel. Note: these are NOT translated. + virtual std::vector GetRawLinkFeatures( + int channel_id) const = 0; + + // Returns a vector of oracle labels for each element in the beam and + // batch. + virtual std::vector> GetOracleLabels() const = 0; + + // Annotate the underlying data object with the results of this Component's + // calculation. + virtual void FinalizeData() = 0; + + // Reset this component. + virtual void ResetComponent() = 0; + + // Get a vector of all traces managed by this component. + virtual std::vector> GetTraceProtos() const = 0; + + // Add the translated link features (done outside the component) to the traces + // managed by this component. 
+ virtual void AddTranslatedLinkFeaturesToTrace( + const std::vector &features, int channel_id) = 0; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INTERFACES_COMPONENT_H_ diff --git a/syntaxnet/dragnn/core/interfaces/input_batch.h b/syntaxnet/dragnn/core/interfaces/input_batch.h new file mode 100644 index 0000000000000000000000000000000000000000..100ec76b115138a5db66095753495e688e8dd362 --- /dev/null +++ b/syntaxnet/dragnn/core/interfaces/input_batch.h @@ -0,0 +1,45 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INTERFACES_INPUT_BATCH_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INTERFACES_INPUT_BATCH_H_ + +#include +#include + +#include "syntaxnet/base.h" + +namespace syntaxnet { +namespace dragnn { + +// An InputBatch object converts strings into a given data type. It is used to +// abstract DRAGNN internal data typing. Each internal DRAGNN data type should +// subclass InputBatch, with a public accessor to the type in question. + +class InputBatch { + public: + virtual ~InputBatch() {} + + // Set the data to translate to the subclass' data type. + virtual void SetData(const std::vector &data) = 0; + + // Translate the underlying data back to a vector of strings, as appropriate. 
+ virtual const std::vector GetSerializedData() const = 0; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INTERFACES_INPUT_BATCH_H_ diff --git a/syntaxnet/dragnn/core/interfaces/transition_state.h b/syntaxnet/dragnn/core/interfaces/transition_state.h new file mode 100644 index 0000000000000000000000000000000000000000..c00409b98f02526f9436451eac9516a0f68885a4 --- /dev/null +++ b/syntaxnet/dragnn/core/interfaces/transition_state.h @@ -0,0 +1,68 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INTERFACES_TRANSITION_STATE_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INTERFACES_TRANSITION_STATE_H_ + +#include +#include + +#include "syntaxnet/base.h" + +namespace syntaxnet { +namespace dragnn { + +// TransitionState defines the minimal interface required to pass data between +// Component objects. It is used to initialize one Component from the output of +// another, and every backend should define one. Note that inheriting from +// TransitionState directly is not sufficient to use the Beam class, which +// requires extra functionality given by inheriting from the +// ClonableTransitionState interface. 
(ClonableTransitionState is a subclass +// of TransitionState, so inheriting from ClonableTransitionState is sufficient +// to allow Components to pass your backing states.) + +class TransitionState { + public: + virtual ~TransitionState() {} + + // Initialize this TransitionState from a previous TransitionState. The + // ParentBeamIndex is the location of that previous TransitionState in the + // provided beam. + virtual void Init(const TransitionState &parent) = 0; + + // Return the beam index of the state passed into the initializer of this + // TransitionState. + virtual const int ParentBeamIndex() const = 0; + + // Get the current beam index for this state. + virtual const int GetBeamIndex() const = 0; + + // Set the current beam index for this state. + virtual void SetBeamIndex(const int index) = 0; + + // Get the score associated with this transition state. + virtual const float GetScore() const = 0; + + // Set the score associated with this transition state. + virtual void SetScore(const float score) = 0; + + // Depicts this state as an HTML-language string. + virtual string HTMLRepresentation() const = 0; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_INTERFACES_TRANSITION_STATE_H_ diff --git a/syntaxnet/dragnn/core/interfaces/transition_state_starter_test.cc b/syntaxnet/dragnn/core/interfaces/transition_state_starter_test.cc new file mode 100644 index 0000000000000000000000000000000000000000..a12edf845c6921d06b6d9b10add3a5de6437ff1c --- /dev/null +++ b/syntaxnet/dragnn/core/interfaces/transition_state_starter_test.cc @@ -0,0 +1,128 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "dragnn/core/test/mock_transition_state.h" +#include +#include "testing/base/public/googletest.h" +#include "testing/base/public/gunit.h" + +// This test suite is intended to validate the contracts that the DRAGNN +// system expects from all transition state subclasses. Developers creating +// new TransitionState subclasses should copy this test and modify it as needed, +// using it to ensure their state conforms to DRAGNN expectations. + +namespace syntaxnet { +namespace dragnn { + +using testing::Return; + +// When this test is instantiated, this function should be changed to +// instantiate a TransitionState subclass of the appropriate type instead +// of TransitionState. +std::unique_ptr CreateState() { + std::unique_ptr test_state(new TransitionState()); + return test_state; +} + +// Validates the consistency of the beam index setter and getter. +TEST(TransitionStateInterfaceTest, CanSetAndGetBeamIndex) { + // Create and initialize a test state. + MockTransitionState mock_state; + auto test_state = CreateState(); + test_state->Init(mock_state); + + constexpr int kOldBeamIndex = 12; + test_state->SetBeamIndex(kOldBeamIndex); + EXPECT_EQ(test_state->GetBeamIndex(), kOldBeamIndex); + + constexpr int kNewBeamIndex = 7; + test_state->SetBeamIndex(kNewBeamIndex); + EXPECT_EQ(test_state->GetBeamIndex(), kNewBeamIndex); +} + +// Validates the consistency of the score setter and getter.
+TEST(TransitionStateInterfaceTest, CanSetAndGetScore) { + // Create and initialize a test state. + MockTransitionState mock_state; + auto test_state = CreateState(); + test_state->Init(mock_state); + + constexpr float kOldScore = 12.1; + test_state->SetScore(kOldScore); + EXPECT_EQ(test_state->GetScore(), kOldScore); + + constexpr float kNewScore = 7.2; + test_state->SetScore(kNewScore); + EXPECT_EQ(test_state->GetScore(), kNewScore); +} + +// This test ensures that the initializing state's current index is saved +// as the parent beam index of the state being initialized. +TEST(TransitionStateInterfaceTest, ReportsParentBeamIndex) { + // Create a mock transition state that will report a specific current index. + // This index should become the parent state index for the test state. + MockTransitionState mock_state; + constexpr int kParentBeamIndex = 1138; + EXPECT_CALL(mock_state, GetBeamIndex()) + .WillRepeatedly(Return(kParentBeamIndex)); + + auto test_state = CreateState(); + test_state->Init(mock_state); + EXPECT_EQ(test_state->ParentBeamIndex(), kParentBeamIndex); +} + +// This test ensures that the initializing state's current score is saved +// as the current score of the state being initialized. +TEST(TransitionStateInterfaceTest, InitializationCopiesParentScore) { + // Create a mock transition state that will report a specific current score. + // This score should become the current score of the test state. + MockTransitionState mock_state; + constexpr float kParentScore = 24.12; + EXPECT_CALL(mock_state, GetScore()).WillRepeatedly(Return(kParentScore)); + + auto test_state = CreateState(); + test_state->Init(mock_state); + EXPECT_EQ(test_state->GetScore(), kParentScore); +} + +// This test ensures that calling Clone maintains the state data (parent beam +// index, beam index, score, etc.) of the state that was cloned.
+TEST(TransitionStateInterfaceTest, CloningMaintainsState) { + // Create and initialize the state. + MockTransitionState mock_state; + constexpr int kParentBeamIndex = 1138; + EXPECT_CALL(mock_state, GetBeamIndex()) + .WillRepeatedly(Return(kParentBeamIndex)); + auto test_state = CreateState(); + test_state->Init(mock_state); + + // Validate the internal state of the test state. + constexpr float kOldScore = 20.0; + test_state->SetScore(kOldScore); + EXPECT_EQ(test_state->GetScore(), kOldScore); + constexpr int kOldBeamIndex = 12; + test_state->SetBeamIndex(kOldBeamIndex); + EXPECT_EQ(test_state->GetBeamIndex(), kOldBeamIndex); + + auto clone = test_state->Clone(); + + // The clone should have identical state to the old state. + EXPECT_EQ(clone->ParentBeamIndex(), kParentBeamIndex); + EXPECT_EQ(clone->GetScore(), kOldScore); + EXPECT_EQ(clone->GetBeamIndex(), kOldBeamIndex); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/ops/compute_session_op.cc b/syntaxnet/dragnn/core/ops/compute_session_op.cc new file mode 100644 index 0000000000000000000000000000000000000000..e27e32935562ac604e2ebcea5d24f811aad35288 --- /dev/null +++ b/syntaxnet/dragnn/core/ops/compute_session_op.cc @@ -0,0 +1,85 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License.
+// ============================================================================= + +#include "dragnn/core/ops/compute_session_op.h" + +#include "dragnn/core/compute_session.h" +#include "dragnn/core/resource_container.h" +#include "tensorflow/core/framework/tensor_shape.h" + +namespace syntaxnet { +namespace dragnn { + +using tensorflow::OpKernel; +using tensorflow::OpKernelConstruction; +using tensorflow::OpKernelContext; +using tensorflow::Tensor; +using tensorflow::TensorShape; +using tensorflow::errors::InvalidArgument; + +typedef ResourceContainer ComputeSessionResource; + +ComputeSessionOp::ComputeSessionOp(OpKernelConstruction *context) + : OpKernel(context) { + OP_REQUIRES(context, context->num_inputs() > 0, + InvalidArgument("Must declare at least one input of type string " + "for the ComputeSession handle.")); + OP_REQUIRES(context, context->input_type(0) == tensorflow::DT_STRING, + InvalidArgument("Must declare at least one input of type string " + "for the ComputeSession handle.")); + OP_REQUIRES_OK(context, context->GetAttr("component", &component_name_)); +} + +// Compute extracts the state from the resource manager and calls +// ComputeWithState(). If OutputsHandle() is true, also outputs the handle for +// subsequent ops. +void ComputeSessionOp::Compute(OpKernelContext *context) { + // Validates the input/output tensors and the op attrs.
+ if (RequiresComponentName()) { + OP_REQUIRES(context, !component_name_.empty(), + InvalidArgument("Required \"component\" attribute is empty.")); + } + if (OutputsHandle()) { + OP_REQUIRES(context, context->num_outputs() > 0, + InvalidArgument( + "Must declare at least one output of type string " + "for the ComputeSession handle if OutputsHandle is true.")); + OP_REQUIRES(context, + context->expected_output_dtype(0) == tensorflow::DT_STRING, + InvalidArgument( + "Must declare at least one output of type string " + "for the ComputeSession handle if OutputsHandle is true.")); + } + + // Gets the relevant ComputeSessionResource and computes with it. + auto handle = context->input(0).vec(); + ComputeSessionResource *session_resource; + OP_REQUIRES_OK(context, + context->resource_manager()->Lookup( + handle(0), handle(1), &session_resource)); + ComputeWithState(context, session_resource->get()); + + // Outputs the passed handle, if necessary, allowing op dependency chains. + if (OutputsHandle()) { + Tensor *output; + OP_REQUIRES_OK(context, + context->allocate_output(0, TensorShape({2}), &output)); + output->vec() = handle; + } + session_resource->Unref(); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/ops/compute_session_op.h b/syntaxnet/dragnn/core/ops/compute_session_op.h new file mode 100644 index 0000000000000000000000000000000000000000..9780ca817e3a2a6bba5f720fa7a14120d286a634 --- /dev/null +++ b/syntaxnet/dragnn/core/ops/compute_session_op.h @@ -0,0 +1,69 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_OPS_COMPUTE_SESSION_OP_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_OPS_COMPUTE_SESSION_OP_H_ + +#include + +#include "dragnn/core/compute_session.h" +#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor" +#include "tensorflow/core/framework/op_kernel.h" +#include "tensorflow/core/framework/tensor.h" + +namespace syntaxnet { +namespace dragnn { + +// Abstract base class: Given a MasterState and a component name, runs some op +// on the component state. The first input is always the handle. If +// OutputsHandle() is true in the derived class, then the first output will also +// be the handle. +class ComputeSessionOp : public tensorflow::OpKernel { + public: + explicit ComputeSessionOp(tensorflow::OpKernelConstruction *context); + + // Virtual Compute()-like function that assumes the state has been extracted + // from the handle. + virtual void ComputeWithState(tensorflow::OpKernelContext *context, + ComputeSession *compute_session) = 0; + + // Compute extracts the state from the resource manager and calls + // ComputeWithState(). If OutputsHandle() is true, also outputs the handle for + // subsequent ops. + void Compute(tensorflow::OpKernelContext *context) override; + + protected: + // If true, then the handle will be the first output of this op. + virtual bool OutputsHandle() const = 0; + + // If true, then Compute() will check that the "component" attribute is + // non-empty.
+ virtual bool RequiresComponentName() const = 0; + + // Returns the component name. + string component_name() const { + CHECK(RequiresComponentName()); + return component_name_; + } + + private: + // Name of the component used by this op. + string component_name_; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_OPS_COMPUTE_SESSION_OP_H_ diff --git a/syntaxnet/dragnn/core/ops/dragnn_bulk_op_kernels.cc b/syntaxnet/dragnn/core/ops/dragnn_bulk_op_kernels.cc new file mode 100644 index 0000000000000000000000000000000000000000..28355557c6139dcc85d89d0d76f7fb1245490f2a --- /dev/null +++ b/syntaxnet/dragnn/core/ops/dragnn_bulk_op_kernels.cc @@ -0,0 +1,411 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#include +#include +#include +#include + +#include "dragnn/core/ops/compute_session_op.h" +#include "dragnn/core/resource_container.h" +#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor" +#include "tensorflow/core/framework/numeric_types.h" +#include "tensorflow/core/framework/op_kernel.h" +#include "tensorflow/core/framework/tensor.h" +#include "tensorflow/core/framework/tensor_shape.h" +#include "tensorflow/core/framework/types.pb.h" +#include "tensorflow/core/lib/core/status.h" +#include "tensorflow/core/platform/types.h" + +using std::vector; + +using tensorflow::DEVICE_CPU; +using tensorflow::DT_FLOAT; +using tensorflow::DT_INT32; +using tensorflow::DT_INT64; +using tensorflow::DT_STRING; +using tensorflow::DataType; +using tensorflow::OpKernel; +using tensorflow::OpKernelConstruction; +using tensorflow::OpKernelContext; +using tensorflow::quint8; +using tensorflow::Status; +using tensorflow::Tensor; +using tensorflow::TensorShape; +using tensorflow::uint8; + +namespace syntaxnet { +namespace dragnn { + +namespace { + +// Helper struct for resource manager. +struct VectorTriple { + std::unique_ptr>>> + index_vectors; + std::unique_ptr>>> id_vectors; + std::unique_ptr>>> + weight_vectors; +}; + +} // namespace + +typedef ResourceContainer VectorTripleResource; + +// See docstring in dragnn_bulk_ops.cc. +class BulkFixedFeatures : public ComputeSessionOp { + public: + explicit BulkFixedFeatures(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, context->GetAttr("num_channels", &num_channels_)); + + // Input: state handle. + vector input_types(1, DT_STRING); + + // Output: indices, ids and weights for every fixed feature channel. 
+ vector output_types; + output_types.push_back(DT_STRING); + for (int c = 0; c < num_channels_; ++c) output_types.push_back(DT_INT32); + for (int c = 0; c < num_channels_; ++c) output_types.push_back(DT_INT64); + for (int c = 0; c < num_channels_; ++c) output_types.push_back(DT_FLOAT); + output_types.push_back(DT_INT32); + OP_REQUIRES_OK(context, context->MatchSignature(input_types, output_types)); + } + + bool OutputsHandle() const override { return true; } + bool RequiresComponentName() const override { return true; } + + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + constexpr int kTensorOffset = 1; + auto indices_allocator = [context, kTensorOffset](int channel, + int num_elements) { + Tensor *output; + CHECK(context + ->allocate_output(channel + kTensorOffset, + TensorShape({num_elements}), &output) + .ok()); + return output->vec().data(); + }; + + const int num_channels = num_channels_; + auto ids_allocator = [context, num_channels, kTensorOffset]( + int channel, int num_elements) { + Tensor *output; + CHECK(context + ->allocate_output(num_channels + channel + kTensorOffset, + TensorShape({num_elements}), &output) + .ok()); + return output->vec().data(); + }; + auto weights_allocator = [context, num_channels, kTensorOffset]( + int channel, int num_elements) { + Tensor *output; + CHECK(context + ->allocate_output(2 * num_channels + channel + kTensorOffset, + TensorShape({num_elements}), &output) + .ok()); + return output->vec().data(); + }; + + BulkFeatureExtractor extractor(indices_allocator, ids_allocator, + weights_allocator); + + int num_steps = session->BulkGetInputFeatures(component_name(), extractor); + VLOG(2) << "Extracted " << num_steps; + Tensor *num_steps_tensor; + OP_REQUIRES_OK( + context, context->allocate_output(3 * num_channels_ + 1, + TensorShape({}), &num_steps_tensor)); + num_steps_tensor->scalar()() = num_steps; + } + + private: + // Number of fixed feature channels. 
+ int num_channels_; +}; + +REGISTER_KERNEL_BUILDER(Name("BulkFixedFeatures").Device(DEVICE_CPU), + BulkFixedFeatures); + +// See docstring in dragnn_bulk_ops.cc. +class BulkFixedEmbeddings : public ComputeSessionOp { + public: + explicit BulkFixedEmbeddings(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, context->GetAttr("num_channels", &num_channels_)); + + // Input: state handle. + vector input_types; + input_types.push_back(DT_STRING); + for (int c = 0; c < num_channels_; ++c) input_types.push_back(DT_FLOAT); + const vector output_types = {DT_STRING, DT_FLOAT, DT_INT32}; + OP_REQUIRES_OK(context, context->MatchSignature(input_types, output_types)); + OP_REQUIRES_OK(context, context->GetAttr("pad_to_batch", &pad_to_batch_)); + OP_REQUIRES_OK(context, context->GetAttr("pad_to_steps", &pad_to_steps_)); + use_padding_ = (pad_to_steps_ != -1) || (pad_to_batch_ != -1); + VLOG(2) << "Created a BulkFixedEmbeddings with use_padding = " + << use_padding_; + } + + bool OutputsHandle() const override { return true; } + bool RequiresComponentName() const override { return true; } + + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + const int batch_size = session->BatchSize(component_name()); + tensorflow::ResourceMgr *rmgr = context->resource_manager(); + + // Create the pool for this container, or re-use one that was allocated in a + // previous call. 
+ auto create = [this](VectorTripleResource **resource) { + LOG(INFO) << "Creating new VectorTripleResource"; + std::unique_ptr triple(new VectorTriple()); + *resource = new VectorTripleResource(std::move(triple)); + (*resource)->get()->index_vectors.reset( + new std::vector>>(num_channels_)); + (*resource)->get()->id_vectors.reset( + new std::vector>>(num_channels_)); + (*resource)->get()->weight_vectors.reset( + new std::vector>>(num_channels_)); + for (int i = 0; i < num_channels_; ++i) { + (*resource)->get()->index_vectors->at(i).reset( + new std::vector()); + (*resource)->get()->id_vectors->at(i).reset(new std::vector()); + (*resource)->get()->weight_vectors->at(i).reset( + new std::vector()); + } + return Status::OK(); + }; + + VectorTripleResource *vector_triple; + auto handle = context->input(0).vec(); + OP_REQUIRES_OK(context, rmgr->LookupOrCreate( + handle(0), handle(1), &vector_triple, create)); + + std::vector>> *indices = + vector_triple->get()->index_vectors.get(); + std::vector>> *ids = + vector_triple->get()->id_vectors.get(); + std::vector>> *weights = + vector_triple->get()->weight_vectors.get(); + + auto indices_allocator = [context, &indices](int channel, int size) { + (*indices)[channel]->resize(size); + return (*indices)[channel]->data(); + }; + auto ids_allocator = [context, &ids](int channel, int size) { + (*ids)[channel]->resize(size); + return (*ids)[channel]->data(); + }; + auto weights_allocator = [context, &weights](int channel, int size) { + (*weights)[channel]->resize(size); + return (*weights)[channel]->data(); + }; + + BulkFeatureExtractor extractor(indices_allocator, ids_allocator, + weights_allocator, use_padding_, + pad_to_steps_, pad_to_batch_); + + int num_steps = session->BulkGetInputFeatures(component_name(), extractor); + VLOG(2) << "Extracted " << num_steps; + + Tensor *num_steps_tensor; + OP_REQUIRES_OK(context, context->allocate_output(2, TensorShape({}), + &num_steps_tensor)); + num_steps_tensor->scalar()() = num_steps; 
+ + // Looks up and outputs embedding vectors. + const auto &spec = session->Spec(component_name()); + + int embedding_size = 0; + for (int channel = 0; channel < num_channels_; ++channel) { + embedding_size += context->input(1 + channel).shape().dim_size(1) * + spec.fixed_feature(channel).size(); + } + + const int padded_batch = std::max(pad_to_batch_, batch_size); + const int padded_num_steps = std::max(pad_to_steps_, num_steps); + Tensor *embedding_vectors; + OP_REQUIRES_OK( + context, + context->allocate_output( + 1, TensorShape({padded_num_steps * padded_batch, embedding_size}), + &embedding_vectors)); + embedding_vectors->flat().setZero(); + + int channel_offset = 0; + for (int channel = 0; channel < num_channels_; ++channel) { + ExtractForChannel(*(indices->at(channel)), *(ids->at(channel)), + *(weights->at(channel)), channel_offset, + context->input(1 + channel), embedding_vectors); + channel_offset += context->input(1 + channel).shape().dim_size(1) * + spec.fixed_feature(channel).size(); + } + vector_triple->Unref(); + } + + private: + void ExtractForChannel(const std::vector &indices, + const std::vector &ids, + const std::vector &weights, int channel_base, + const Tensor &embeddings, Tensor *output) { + // Just turn this into a feature-size matrix, then the index is just the + // X coordinate into it. Run up the row (known length!) and sum. 
+ int num_elements = output->shape().dim_size(0); + int embedding_length = embeddings.shape().dim_size(1); + VLOG(2) << "Num elements: " << num_elements; + VLOG(2) << "Embedding length: " << embedding_length; + auto output_matrix = output->matrix(); + auto embedding_matrix = embeddings.matrix(); + VLOG(2) << "Channel base:" << channel_base; + for (int i = 0; i < indices.size(); ++i) { + VLOG(2) << "Feature: ind:" << indices[i] << ", id: " << ids[i] + << ", wt: " << weights[i]; + int y_base = + (indices[i] / num_elements) * embedding_length + channel_base; + int x_base = indices[i] % num_elements; + VLOG(2) << "Extracting to (x,y) = (" << x_base << "," << y_base << ")"; + for (int j = 0; j < embedding_length; ++j) { + output_matrix(x_base, y_base + j) += + embedding_matrix(ids[i], j) * weights[i]; + } + } + } + + // Number of fixed feature channels. + int num_channels_; + + // Will pad output to at least this many batch elements. + int pad_to_batch_ = -1; + + // Will pad output to at least this many steps. + int pad_to_steps_ = -1; + + // Set if either pad_to_batch or pad_to_steps is not -1. + bool use_padding_ = false; + + TF_DISALLOW_COPY_AND_ASSIGN(BulkFixedEmbeddings); +}; + +REGISTER_KERNEL_BUILDER(Name("BulkFixedEmbeddings").Device(DEVICE_CPU), + BulkFixedEmbeddings); + +// See docstring in dragnn_bulk_ops.cc. +class BulkAdvanceFromOracle : public ComputeSessionOp { + public: + explicit BulkAdvanceFromOracle(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, + context->MatchSignature({DT_STRING}, {DT_STRING, DT_INT32})); + } + + bool OutputsHandle() const override { return true; } + bool RequiresComponentName() const override { return true; } + + // Advances all transition states along the oracle path. 
+ void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + const int batch_size = session->BatchSize(component_name()); + const int beam_size = session->BeamSize(component_name()); + const int num_items = batch_size * beam_size; + vector>> gold; + + int num_steps = 0; + while (!session->IsTerminal(component_name())) { + gold.emplace_back(session->EmitOracleLabels(component_name())); + + // Advance the component. + session->AdvanceFromOracle(component_name()); + ++num_steps; + } + + // Fills output tensor with oracle labels where possible, or -1. + Tensor *gold_output; + OP_REQUIRES_OK(context, + context->allocate_output( + 1, TensorShape({num_items * num_steps}), &gold_output)); + int item = 0; + for (int batch_ix = 0; batch_ix < batch_size; ++batch_ix) { + for (int beam_ix = 0; beam_ix < beam_size; ++beam_ix, ++item) { + for (int step = 0; step < num_steps; ++step) { + gold_output->vec()(item * num_steps + step) = + step < gold.size() ? gold[step][batch_ix][beam_ix] : -1; + } + } + } + } + + private: + TF_DISALLOW_COPY_AND_ASSIGN(BulkAdvanceFromOracle); +}; + +REGISTER_KERNEL_BUILDER(Name("BulkAdvanceFromOracle").Device(DEVICE_CPU), + BulkAdvanceFromOracle); + +// See docstring in dragnn_bulk_ops.cc. +template +class BulkAdvanceFromPrediction : public ComputeSessionOp { + public: + explicit BulkAdvanceFromPrediction(OpKernelConstruction *context) + : ComputeSessionOp(context) { + const DataType dt = tensorflow::DataTypeToEnum::v(); + OP_REQUIRES_OK(context, + context->MatchSignature({DT_STRING, dt}, {DT_STRING})); + } + + bool OutputsHandle() const override { return true; } + bool RequiresComponentName() const override { return true; } + + // Advances all transition states as much as possible using the given scores. 
+ void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + const Tensor &scores_tensor = context->input(1); + const auto &scores = scores_tensor.matrix(); + const int num_items = (session->BatchSize(component_name()) * + session->BeamSize(component_name())); + const int num_actions = scores_tensor.shape().dim_size(1); + const int num_steps = scores_tensor.shape().dim_size(0) / num_items; + vector scores_per_step(num_items * num_actions); + for (int step = 0; step < num_steps; ++step) { + for (int item = 0; item < num_items; ++item) { + for (int action = 0; action < num_actions; ++action) { + scores_per_step[item * num_actions + action] = + scores(item * num_steps + step, action); + } + } + if (!session->IsTerminal(component_name())) { + session->AdvanceFromPrediction(component_name(), scores_per_step.data(), + scores_per_step.size()); + } + } + } + + private: + TF_DISALLOW_COPY_AND_ASSIGN(BulkAdvanceFromPrediction); +}; + +#define REGISTER_BULK_ADVANCE(type) \ + REGISTER_KERNEL_BUILDER(Name("BulkAdvanceFromPrediction") \ + .Device(DEVICE_CPU) \ + .TypeConstraint("T"), \ + BulkAdvanceFromPrediction) + +REGISTER_BULK_ADVANCE(float); +REGISTER_BULK_ADVANCE(quint8); +REGISTER_BULK_ADVANCE(uint8); + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/ops/dragnn_bulk_op_kernels_test.cc b/syntaxnet/dragnn/core/ops/dragnn_bulk_op_kernels_test.cc new file mode 100644 index 0000000000000000000000000000000000000000..01b776f6cc20758c9dbabe69a6ab7e133a66fd90 --- /dev/null +++ b/syntaxnet/dragnn/core/ops/dragnn_bulk_op_kernels_test.cc @@ -0,0 +1,603 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "dragnn/components/util/bulk_feature_extractor.h" +#include "dragnn/core/compute_session_pool.h" +#include "dragnn/core/resource_container.h" +#include "dragnn/core/test/mock_compute_session.h" + +#include +#include "tensorflow/core/framework/fake_input.h" +#include "tensorflow/core/framework/node_def_builder.h" +#include "tensorflow/core/kernels/ops_testutil.h" + +namespace syntaxnet { +namespace dragnn { + +using tensorflow::AllocatorAttributes; +using tensorflow::DT_FLOAT; +using tensorflow::DT_STRING; +using tensorflow::FrameAndIter; +using tensorflow::NodeDefBuilder; +using tensorflow::OpKernelContext; +using tensorflow::ResourceMgr; +using tensorflow::ScopedStepContainer; +using tensorflow::Status; +using tensorflow::TensorShape; +using tensorflow::checkpoint::TensorSliceReaderCacheWrapper; +using tensorflow::test::SetOutputAttrs; + +using testing::Return; +using testing::_; + +typedef ResourceContainer ComputeSessionResource; +typedef ResourceContainer ComputeSessionPoolResource; + +class DragnnBulkOpKernelsTest : public tensorflow::OpsTestBase { + public: + static const int kEmbeddingSize = 2; + static const int kNumActions = 3; + static const int kNumChannels = 2; + static const int kNumIds = 8; + static const int kNumItems = 3; + static const int kNumSteps = 3; + const string kComponentName = "TESTING_COMPONENT_NAME"; + + MockComputeSession *GetMockSession() { + TF_CHECK_OK(InitOp()); + + // Set the input data. 
+ const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + // Create a MockComputeSession and set expectations. + std::unique_ptr mock_session(new MockComputeSession()); + MockComputeSession *mock_session_ptr = mock_session.get(); + + // Wrap the ComputeSessionResource and put it into the resource manager. + TF_CHECK_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(std::move(mock_session)))); + return mock_session_ptr; + } + + void ResetOpKernelContext() { + params_.reset(new OpKernelContext::Params); + params_->device = device_.get(); + params_->frame_iter = FrameAndIter(0, 0); + params_->inputs = &inputs_; + params_->op_kernel = kernel_.get(); + step_container_.reset(new ScopedStepContainer(0, [](const string &) {})); + params_->step_container = step_container_.get(); + attrs_.clear(); + SetOutputAttrs(params_.get(), &attrs_); + TensorSliceReaderCacheWrapper slice_reader_cache_wrapper; + params_->slice_reader_cache = &slice_reader_cache_wrapper; + params_->resource_manager = device_->resource_manager(); + context_.reset(new OpKernelContext(params_.get())); + } + + Status RunOpKernelWithContext() { + device_->Compute(kernel_.get(), context_.get()); + return context_->status(); + } + + // Accessor for the underlying resource manager. + ResourceMgr *resource_mgr() { return params_->resource_manager; } + /* + // Returns a vector with dimensions: channel x batch x step. + // For each item we return features for three steps: + // feature step 0: (5, 1) + // feature step 1: (5, 0.5), (6, 0.7) + // feature step 2: (3, 0.1), (7, [empty]) <- Default weight is 1.0. 
+ void ExpectFeatures(MockComputeSession *mock_session) { + vector feature_step_zero, feature_step_one, feature_step_two; + for (int item = 0; item < kNumItems; ++item) { + feature_step_zero.emplace_back(); + feature_step_zero.back().add_id(5); + feature_step_zero.back().add_weight(1.0); + feature_step_one.emplace_back(); + feature_step_one.back().add_id(5); + feature_step_one.back().add_weight(0.5); + feature_step_one.back().add_id(6); + feature_step_one.back().add_weight(0.7); + feature_step_two.emplace_back(); + feature_step_two.back().add_id(3); + feature_step_two.back().add_weight(0.1); + feature_step_two.back().add_id(7); + } + for (int channel = 0; channel < kNumChannels; ++channel) { + EXPECT_CALL(*mock_session, GetInputFeatures(kComponentName, channel)) + .Times(3) + .WillOnce(Return(feature_step_zero)) + .WillOnce(Return(feature_step_one)) + .WillOnce(Return(feature_step_two)); + } + } + + // Returns a vector with dimensions: channel x batch x step. + // For each item we return features for three steps with ids only: + // feature step 0: id=5 + // feature step 1: id=6 + // feature step 2: id=3 + void ExpectFeatureIds(MockComputeSession *mock_session) { + vector feature_step_zero, feature_step_one, feature_step_two; + for (int item = 0; item < kNumItems; ++item) { + feature_step_zero.emplace_back(); + feature_step_zero.back().add_id(5); + feature_step_one.emplace_back(); + feature_step_one.back().add_id(6); + feature_step_two.emplace_back(); + feature_step_two.back().add_id(3); + } + for (int channel = 0; channel < kNumChannels; ++channel) { + EXPECT_CALL(*mock_session, GetInputFeatures(kComponentName, channel)) + .Times(3) + .WillOnce(Return(feature_step_zero)) + .WillOnce(Return(feature_step_one)) + .WillOnce(Return(feature_step_two)); + } + } + */ + // This needs to maintain its existence throughout the compute call. 
+ std::vector attrs_; +}; + +const int DragnnBulkOpKernelsTest::kEmbeddingSize; +const int DragnnBulkOpKernelsTest::kNumActions; +const int DragnnBulkOpKernelsTest::kNumChannels; +const int DragnnBulkOpKernelsTest::kNumIds; +const int DragnnBulkOpKernelsTest::kNumItems; +const int DragnnBulkOpKernelsTest::kNumSteps; + +// The ExtractFixedFeatures op should return a set of fixed feature vectors +// as described below. +TEST_F(DragnnBulkOpKernelsTest, BulkFixedFeatures) { + // Create and initialize the kernel under test. + TF_ASSERT_OK( + NodeDefBuilder("BulkFixedFeatures", "BulkFixedFeatures") + .Attr("component", kComponentName) + .Attr("num_channels", kNumChannels) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Finalize(node_def())); + + MockComputeSession *mock_session = GetMockSession(); + const std::vector expected_indices({0, 2, 1, 0, 1}); + const std::vector expected_ids({5, 5, 6, 3, 7}); + const std::vector expected_weights({1.0, 0.5, 0.7, 0.1, 1.0}); + + // This function takes the allocator functions passed into GetBulkFF, uses + // them to allocate a tensor, then fills that tensor based on channel. + auto assigner_function = [=](string, const BulkFeatureExtractor &extractor) { + constexpr int kFeatureCount = 3; + constexpr int kTotalFeatures = 5; + constexpr int kNumSteps = 3; + for (int i = 0; i < kNumChannels; ++i) { + // Allocate a new tensor set for every channel. + int32 *indices = + extractor.AllocateIndexMemory(i, kTotalFeatures * kNumSteps); + int64 *ids = extractor.AllocateIdMemory(i, kTotalFeatures * kNumSteps); + float *weights = + extractor.AllocateWeightMemory(i, kTotalFeatures * kNumSteps); + + // Fill the tensor. 
+ int array_index = 0; + for (int step = 0; step < kNumSteps; step++) { + for (int j = 0; j < kTotalFeatures; ++j) { + int offset = i + 1; + indices[array_index] = + (expected_indices[j] + step * kFeatureCount) * offset; + ids[array_index] = expected_ids[j] * offset; + weights[array_index] = expected_weights[j] * offset; + ++array_index; + } + } + } + return kNumSteps; + }; + + EXPECT_CALL(*mock_session, BulkGetInputFeatures(kComponentName, _)) + .WillOnce(testing::Invoke(assigner_function)); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); + + // Validate the outputs. + // In this case, for every channel we should have: + // indices = [0 , 2 , 1 , 0 , 1 ] + // [3 , 5 , 4 , 3 , 4 ] + // [6 , 8 , 7 , 6 , 7 ] + // ids = [5 , 5 , 6 , 3 , 7 ] + // [5 , 5 , 6 , 3 , 7 ] + // [5 , 5 , 6 , 3 , 7 ] + // weights = [1.0, 0.5, 0.7, 0.1, 1.0] + // [1.0, 0.5, 0.7, 0.1, 1.0] + // [1.0, 0.5, 0.7, 0.1, 1.0] + + for (int i = 0; i < kNumChannels * 3; ++i) { + EXPECT_EQ(expected_indices.size() * kNumSteps, + GetOutput(i + 1)->NumElements()); + } + for (int channel = 0; channel < kNumChannels; ++channel) { + LOG(INFO) << "Channel " << channel; + for (int step = 0; step < kNumSteps; ++step) { + for (int i = 0; i < expected_indices.size(); ++i) { + const int j = i + step * expected_indices.size(); + + // Note that the expectation on the indices changes per step, unlike the + // expectation for ids and weights. + int offset = channel + 1; + EXPECT_EQ((expected_indices[i] + step * kNumItems) * offset, + GetOutput(channel + 1)->vec()(j)); + EXPECT_EQ(expected_ids[i] * offset, + GetOutput(kNumChannels + channel + 1)->vec()(j)); + EXPECT_EQ(expected_weights[i] * offset, + GetOutput(2 * kNumChannels + channel + 1)->vec()(j)); + } + } + } + EXPECT_EQ(kNumSteps, GetOutput(3 * kNumChannels + 1)->scalar()()); +} + +TEST_F(DragnnBulkOpKernelsTest, BulkFixedEmbeddings) { + // Create and initialize the kernel under test. 
+ TF_ASSERT_OK( + NodeDefBuilder("BulkFixedEmbeddings", "BulkFixedEmbeddings") + .Attr("component", kComponentName) + .Attr("num_channels", kNumChannels) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Input(FakeInput(DT_FLOAT)) // Embedding matrices. + .Finalize(node_def())); + MockComputeSession *mock_session = GetMockSession(); + ComponentSpec spec; + spec.set_name(kComponentName); + auto chan0_spec = spec.add_fixed_feature(); + chan0_spec->set_size(2); + auto chan1_spec = spec.add_fixed_feature(); + chan1_spec->set_size(1); + EXPECT_CALL(*mock_session, Spec(kComponentName)) + .WillOnce(testing::ReturnRef(spec)); + + EXPECT_CALL(*mock_session, BatchSize(kComponentName)) + .WillOnce(Return(kNumItems)); + + const std::vector feature_step_1({0, 1, 2, 1, 2, 2, 1, 0, 1, 0}); + const std::vector feature_index_1({0, 0, 0, 0, 0, 1, 1, 1, 1, 1}); + const std::vector feature_ids_1({5, 6, 3, 5, 7, 5, 6, 3, 5, 7}); + const std::vector feature_weights_1( + {1.0, 0.7, 0.1, 0.5, 1.0, 10, 7, 1, 5, 10}); + + const std::vector feature_step_2({0, 1, 2, 1, 2}); + const std::vector feature_index_2({0, 0, 0, 0, 0}); + const std::vector feature_ids_2({5, 6, 3, 5, 7}); + const std::vector feature_weights_2({1.0, 0.7, 0.1, 0.5, 1.0}); + + const std::vector> feature_steps_by_channel( + {feature_step_1, feature_step_2}); + const std::vector> feature_index_by_channel( + {feature_index_1, feature_index_2}); + const std::vector> feature_ids_by_channel( + {feature_ids_1, feature_ids_2}); + const std::vector> feature_weights_by_channel( + {feature_weights_1, feature_weights_2}); + + // This function takes the allocator functions passed into GetBulkFF, uses + // them to allocate a tensor, then fills that tensor based on channel. 
+ auto assigner_function = [=](string, const BulkFeatureExtractor &extractor) { + constexpr int kNumElements = 3; + constexpr int kNumSteps = 3; + for (int i = 0; i < kNumChannels; ++i) { + auto feature_step = feature_steps_by_channel.at(i); + auto feature_index = feature_index_by_channel.at(i); + auto feature_ids = feature_ids_by_channel.at(i); + auto feature_weights = feature_weights_by_channel.at(i); + + // Allocate a new tensor set for every channel. + int32 *indices = + extractor.AllocateIndexMemory(i, kNumElements * feature_step.size()); + int64 *ids = + extractor.AllocateIdMemory(i, kNumElements * feature_step.size()); + float *weights = + extractor.AllocateWeightMemory(i, kNumElements * feature_step.size()); + + // Fill the tensor. + int array_index = 0; + + for (int element = 0; element < kNumElements; ++element) { + for (int feature = 0; feature < feature_step.size(); ++feature) { + indices[array_index] = extractor.GetIndex( + kNumSteps, kNumElements, feature_index[feature], element, + feature_step[feature]); + ids[array_index] = feature_ids[feature]; + weights[array_index] = feature_weights[feature]; + ++array_index; + } + } + } + return kNumSteps; + }; + + EXPECT_CALL(*mock_session, BulkGetInputFeatures(kComponentName, _)) + .WillOnce(testing::Invoke(assigner_function)); + + // Embedding matrices as additional inputs. + // For channel 0, the embeddings are [id, 0]. + // For channel 1, the embeddings are [0, id]. + vector embedding_matrix_a; + vector embedding_matrix_b; + for (int id = 0; id < kNumIds; ++id) { + embedding_matrix_a.push_back(id); + embedding_matrix_a.push_back(0); + embedding_matrix_b.push_back(0); + embedding_matrix_b.push_back(id); + } + AddInputFromArray(TensorShape({8, 2}), embedding_matrix_a); + AddInputFromArray(TensorShape({8, 2}), embedding_matrix_b); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); + + // Validate the outputs. 
+ // In this case we should have, for every item, these three steps: + const vector> expected_embeddings = {{5.0, 0, 73, 0, 0, 5.0}, + {6.7, 0, 67, 0, 0, 6.7}, + {7.3, 0, 50, 0, 0, 7.3}}; + EXPECT_EQ(kNumSteps * kNumItems, GetOutput(1)->shape().dim_size(0)); + constexpr int kNumFeatures = 3; + EXPECT_EQ(kNumFeatures * kEmbeddingSize, GetOutput(1)->shape().dim_size(1)); + for (int item = 0; item < kNumItems; ++item) { + for (int step = 0; step < kNumSteps; ++step) { + for (int col = 0; col < kNumChannels * kEmbeddingSize; ++col) { + const int row = item * kNumSteps + step; + EXPECT_EQ(expected_embeddings[step][col], + GetOutput(1)->matrix()(row, col)) + << "step: " << step << ", row: " << row << ", col: " << col; + } + } + } + + EXPECT_EQ(kNumSteps, GetOutput(2)->scalar()()); +} + +TEST_F(DragnnBulkOpKernelsTest, BulkFixedEmbeddingsWithPadding) { + // Create and initialize the kernel under test. + constexpr int kPaddedNumSteps = 5; + constexpr int kPaddedBatchSize = 4; + TF_ASSERT_OK( + NodeDefBuilder("BulkFixedEmbeddings", "BulkFixedEmbeddings") + .Attr("component", kComponentName) + .Attr("num_channels", kNumChannels) + .Attr("pad_to_steps", kPaddedNumSteps) + .Attr("pad_to_batch", kPaddedBatchSize) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Input(FakeInput(DT_FLOAT)) // Embedding matrices. 
+ .Finalize(node_def())); + MockComputeSession *mock_session = GetMockSession(); + ComponentSpec spec; + spec.set_name(kComponentName); + auto chan0_spec = spec.add_fixed_feature(); + chan0_spec->set_size(2); + auto chan1_spec = spec.add_fixed_feature(); + chan1_spec->set_size(1); + EXPECT_CALL(*mock_session, Spec(kComponentName)) + .WillOnce(testing::ReturnRef(spec)); + + EXPECT_CALL(*mock_session, BatchSize(kComponentName)) + .WillOnce(Return(kNumItems)); + + const std::vector feature_step_1({0, 1, 2, 1, 2, 2, 1, 0, 1, 0}); + const std::vector feature_index_1({0, 0, 0, 0, 0, 1, 1, 1, 1, 1}); + const std::vector feature_ids_1({5, 6, 3, 5, 7, 5, 6, 3, 5, 7}); + const std::vector feature_weights_1( + {1.0, 0.7, 0.1, 0.5, 1.0, 10, 7, 1, 5, 10}); + + const std::vector feature_step_2({0, 1, 2, 1, 2}); + const std::vector feature_index_2({0, 0, 0, 0, 0}); + const std::vector feature_ids_2({5, 6, 3, 5, 7}); + const std::vector feature_weights_2({1.0, 0.7, 0.1, 0.5, 1.0}); + + const std::vector> feature_steps_by_channel( + {feature_step_1, feature_step_2}); + const std::vector> feature_index_by_channel( + {feature_index_1, feature_index_2}); + const std::vector> feature_ids_by_channel( + {feature_ids_1, feature_ids_2}); + const std::vector> feature_weights_by_channel( + {feature_weights_1, feature_weights_2}); + + // This function takes the allocator functions passed into GetBulkFF, uses + // them to allocate a tensor, then fills that tensor based on channel. + auto assigner_function = [=](string, const BulkFeatureExtractor &extractor) { + constexpr int kNumElements = 3; + constexpr int kNumSteps = 3; + for (int i = 0; i < kNumChannels; ++i) { + auto feature_step = feature_steps_by_channel.at(i); + auto feature_index = feature_index_by_channel.at(i); + auto feature_ids = feature_ids_by_channel.at(i); + auto feature_weights = feature_weights_by_channel.at(i); + + // Allocate a new tensor set for every channel. 
+ int32 *indices = + extractor.AllocateIndexMemory(i, kNumElements * feature_step.size()); + int64 *ids = + extractor.AllocateIdMemory(i, kNumElements * feature_step.size()); + float *weights = + extractor.AllocateWeightMemory(i, kNumElements * feature_step.size()); + + // Fill the tensor. + int array_index = 0; + + for (int element = 0; element < kNumElements; ++element) { + for (int feature = 0; feature < feature_step.size(); ++feature) { + indices[array_index] = extractor.GetIndex( + kNumSteps, kNumElements, feature_index[feature], element, + feature_step[feature]); + ids[array_index] = feature_ids[feature]; + weights[array_index] = feature_weights[feature]; + ++array_index; + } + } + } + return kNumSteps; + }; + + EXPECT_CALL(*mock_session, BulkGetInputFeatures(kComponentName, _)) + .WillOnce(testing::Invoke(assigner_function)); + + // Embedding matrices as additional inputs. + // For channel 0, the embeddings are [id, 0]. + // For channel 1, the embeddings are [0, id]. + vector embedding_matrix_a; + vector embedding_matrix_b; + for (int id = 0; id < kNumIds; ++id) { + embedding_matrix_a.push_back(id); + embedding_matrix_a.push_back(0); + embedding_matrix_b.push_back(0); + embedding_matrix_b.push_back(id); + } + AddInputFromArray(TensorShape({8, 2}), embedding_matrix_a); + AddInputFromArray(TensorShape({8, 2}), embedding_matrix_b); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); + + // Validate the outputs. 
+ // In this case we should have, for every item, these three steps: + const vector> expected_embeddings = {{5.0, 0, 73, 0, 0, 5.0}, + {6.7, 0, 67, 0, 0, 6.7}, + {7.3, 0, 50, 0, 0, 7.3}}; + EXPECT_EQ(kPaddedNumSteps * kPaddedBatchSize, + GetOutput(1)->shape().dim_size(0)); + + constexpr int kNumFeatures = 3; + EXPECT_EQ(kNumFeatures * kEmbeddingSize, GetOutput(1)->shape().dim_size(1)); + for (int item = 0; item < kNumItems; ++item) { + for (int step = 0; step < kNumSteps; ++step) { + for (int col = 0; col < kNumChannels * kEmbeddingSize; ++col) { + const int row = item * kPaddedNumSteps + step; + EXPECT_EQ(expected_embeddings[step][col], + GetOutput(1)->matrix()(row, col)) + << "step: " << step << ", row: " << row << ", col: " << col; + } + } + } + + EXPECT_EQ(kNumSteps, GetOutput(2)->scalar()()); +} + +TEST_F(DragnnBulkOpKernelsTest, BulkAdvanceFromOracle) { + // Create and initialize the kernel under test. + TF_ASSERT_OK( + NodeDefBuilder("BulkAdvanceFromOracle", "BulkAdvanceFromOracle") + .Attr("component", kComponentName) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Finalize(node_def())); + MockComputeSession *mock_session = GetMockSession(); + EXPECT_CALL(*mock_session, IsTerminal(kComponentName)) + .WillOnce(Return(false)) + .WillOnce(Return(false)) + .WillOnce(Return(false)) + .WillOnce(Return(true)); + EXPECT_CALL(*mock_session, AdvanceFromOracle(kComponentName)) + .Times(kNumSteps); + const vector>> gold = { + {{1}, {1}, {1}}, {{2}, {2}, {2}}, {{3}, {3}, {3}}, + }; + EXPECT_CALL(*mock_session, EmitOracleLabels(kComponentName)) + .WillOnce(Return(gold[0])) + .WillOnce(Return(gold[1])) + .WillOnce(Return(gold[2])); + EXPECT_CALL(*mock_session, BeamSize(kComponentName)).WillOnce(Return(1)); + EXPECT_CALL(*mock_session, BatchSize(kComponentName)) + .WillOnce(Return(kNumItems)); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); + + // Validate the outputs. 
+ // For every item we should have: + const vector expected_gold = {1, 2, 3}; + EXPECT_EQ(kNumSteps * kNumItems, GetOutput(1)->NumElements()); + for (int item = 0; item < kNumItems; ++item) { + for (int step = 0; step < kNumSteps; ++step) { + EXPECT_EQ(expected_gold[step], + GetOutput(1)->vec()(step + item * kNumSteps)); + } + } +} + +string ArrayToString(const float *array, const int size) { + string str = "[ "; + for (int i = 0; i < size; ++i) { + str += tensorflow::strings::Printf("%.1f ", array[i]); + } + return str + "]"; +} + +MATCHER(CheckScoresAreConsecutiveIntegersDivTen, "") { + const int size = + DragnnBulkOpKernelsTest::kNumItems * DragnnBulkOpKernelsTest::kNumActions; + for (int i(0), score(arg[0] * 10); i < size; ++i, ++score) { + EXPECT_NEAR(score / 10.0f, arg[i], 1e-4) + << "i: " << i << ", scores: " << ArrayToString(arg, size); + } + return true; +} + +TEST_F(DragnnBulkOpKernelsTest, BulkAdvanceFromPrediction) { + // Create and initialize the kernel under test. + TF_ASSERT_OK( + NodeDefBuilder("BulkAdvanceFromPrediction", "BulkAdvanceFromPrediction") + .Attr("component", kComponentName) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Input(FakeInput(DT_FLOAT)) // Prediction scores for advancing. + .Finalize(node_def())); + MockComputeSession *mock_session = GetMockSession(); + + // Creates an input tensor such that each step will see a list of consecutive + // integers divided by 10 as scores. 
+ vector scores(kNumItems * kNumSteps * kNumActions); + for (int step(0), cnt(0); step < kNumSteps; ++step) { + for (int item = 0; item < kNumItems; ++item) { + for (int action = 0; action < kNumActions; ++action, ++cnt) { + scores[action + kNumActions * (step + item * kNumSteps)] = cnt / 10.0f; + } + } + } + AddInputFromArray(TensorShape({kNumItems * kNumSteps, kNumActions}), + scores); + + EXPECT_CALL(*mock_session, BeamSize(kComponentName)).WillOnce(Return(1)); + EXPECT_CALL(*mock_session, BatchSize(kComponentName)) + .WillOnce(Return(kNumItems)); + EXPECT_CALL(*mock_session, IsTerminal(kComponentName)) + .Times(kNumSteps) + .WillRepeatedly(Return(false)); + EXPECT_CALL(*mock_session, + AdvanceFromPrediction(kComponentName, + CheckScoresAreConsecutiveIntegersDivTen(), + kNumItems * kNumActions)) + .Times(kNumSteps); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/ops/dragnn_bulk_ops.cc b/syntaxnet/dragnn/core/ops/dragnn_bulk_ops.cc new file mode 100644 index 0000000000000000000000000000000000000000..4c2d444d2ce70c84ed3f711f82ebfe6567a845a3 --- /dev/null +++ b/syntaxnet/dragnn/core/ops/dragnn_bulk_ops.cc @@ -0,0 +1,134 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#include "tensorflow/core/framework/op.h" +#include "tensorflow/core/framework/shape_inference.h" + +namespace syntaxnet { +namespace dragnn { + +REGISTER_OP("BulkFixedFeatures") + .Input("handle: string") + .Output("output_handle: string") + .Output("indices: num_channels * int32") + .Output("ids: num_channels * int64") + .Output("weights: num_channels * float") + .Output("num_steps: int32") + .Attr("component: string") + .Attr("num_channels: int") + .Doc(R"doc( +Given a ComputeSession and a component, outputs fixed features for all steps. + +This op outputs features for the entire oracle path of the component. Unlike +ExtractFixedFeatures, this op mutates the master state, advancing all of its +states until they are final. For every channel, indices[channel], ids[channel], +and weights[channel] have the same length, ie. the number of predicates, +ordered by batch, beam, step. + +handle: A handle to a ComputeSession. +output_handle: A handle to the same ComputeSession after advancement. +indices: (num_channels vectors of int32) If indices[i] = j, then + embedding_sum[j] += embedding_matrix[ids[i]] * weights[i]. +ids: (num_channels vectors of int64) Ids to lookup in embedding matrices. +weights: (num_channels vectors of float) Weight for each embedding. +num_steps: (int32 scalar) The batch was unrolled for this many steps. +component: The name of a Component instance, matching the ComponentSpec.name. +num_channels: The number of FixedFeature channels. 
+)doc"); + +REGISTER_OP("BulkFixedEmbeddings") + .Input("handle: string") + .Input("embedding_matrix: num_channels * T") + .Output("output_handle: string") + .Output("embedding_vectors: T") + .Output("num_steps: int32") + .Attr("component: string") + .Attr("num_channels: int") + .Attr("T: type") + .Attr("pad_to_batch: int=-1") + .Attr("pad_to_steps: int=-1") + .SetIsStateful() + .Doc(R"doc( +This op is a more efficient version of BulkFixedFeatures. + +It is intended to be run with large batch sizes at inference time. The op takes +a handle to ComputeSession and embedding matrices as tensor inputs, and directly +outputs concatenated embedding vectors. + +handle: A handle to ComputeSession. +embedding_matrix: Embedding matrices. +output_handle: A handle to the same ComputeSession after advancement. +embedding_vectors: (matrix of float) Concatenated embeddings, + shaped as (batch * beam * token) x sum_channel(embedding_dim[channel]). +num_steps: The batch was unrolled for these many steps. +component: The name of a Component instance, matching the ComponentSpec.name. +num_channels: The number of FixedFeature channels. +T: The datatype to emit. +pad_to_batch: If set, the op will pad/truncate to this number of elements. +pad_to_steps: If set, the op will pad/truncate to this number of steps. +)doc"); + +REGISTER_OP("BulkAdvanceFromOracle") + .Input("handle: string") + .Output("output_handle: string") + .Output("gold_labels: int32") + .Attr("component: string") + .Doc(R"doc( +Given a ComputeSession, advances until all states are final. + +Note that, unlike AdvanceFromOracle, this op does mutate the master state, by +advancing all of its states until they are final. + +handle: A handle to a ComputeSession. +output_handle: A handle to the same ComputeSession, after it has advanced. 
+gold_labels: [batch_size * beam_size * max_num_steps] vector of oracle actions, + where max_num_steps is the maximum number of steps in the oracle + action sequences for every state in the batch of beams. Each + sub-segment of length max_num_steps provides the oracle action + sequence for the corresponding state in the batch of beams, padded + with trailing -1s. +component: The name of a Component instance, matching the ComponentSpec.name. +)doc"); + +REGISTER_OP("BulkAdvanceFromPrediction") + .Input("handle: string") + .Input("scores: T") + .Output("output_handle: string") + .Attr("component: string") + .Attr("T: type") + .SetShapeFn([](tensorflow::shape_inference::InferenceContext *c) { + tensorflow::shape_inference::ShapeHandle handle; + TF_RETURN_IF_ERROR(c->Merge(c->input(0), c->Vector(2), &handle)); + c->set_output(0, handle); + + auto scores = c->input(1); + TF_RETURN_IF_ERROR(c->WithRank(scores, 2, &scores)); + return tensorflow::Status::OK(); + }) + .Doc(R"doc( +Given a ComputeSession and a tensor of scores, advances the state. + +The state will be advanced until all scores are used up or all states are final. + +handle: A handle to a ComputeSession. +scores: A tensor of scores with shape + {batch_size * beam_size * num_steps, num_actions}. +output_handle: handle to the same ComputeSession after advancement. +component: The name of a Component instance, matching the ComponentSpec.name. +T: The datatype to emit. +)doc"); + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/ops/dragnn_op_kernels.cc b/syntaxnet/dragnn/core/ops/dragnn_op_kernels.cc new file mode 100644 index 0000000000000000000000000000000000000000..a01b3772488fab0851acf0c38a16edd7c2a3ef17 --- /dev/null +++ b/syntaxnet/dragnn/core/ops/dragnn_op_kernels.cc @@ -0,0 +1,646 @@ +// Copyright 2017 Google Inc. All Rights Reserved. 
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include +#include +#include + +#include "dragnn/core/compute_session.h" +#include "dragnn/core/compute_session_pool.h" +#include "dragnn/core/ops/compute_session_op.h" +#include "dragnn/core/resource_container.h" +#include "dragnn/protos/data.pb.h" +#include "dragnn/protos/spec.pb.h" +#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor" +#include "tensorflow/core/framework/op_kernel.h" +#include "tensorflow/core/framework/resource_mgr.h" +#include "tensorflow/core/framework/tensor.h" +#include "tensorflow/core/framework/tensor_shape.h" +#include "tensorflow/core/lib/core/status.h" +#include "tensorflow/core/lib/core/threadpool.h" +#include "tensorflow/core/platform/logging.h" +#include "tensorflow/core/platform/mutex.h" + +using tensorflow::DEVICE_CPU; +using tensorflow::DT_BOOL; +using tensorflow::DT_FLOAT; +using tensorflow::DT_INT32; +using tensorflow::DT_INT64; +using tensorflow::DT_STRING; +using tensorflow::DataType; +using tensorflow::OpKernel; +using tensorflow::OpKernelConstruction; +using tensorflow::OpKernelContext; +using tensorflow::ResourceMgr; +using tensorflow::Status; +using tensorflow::Tensor; +using tensorflow::TensorShape; + +namespace syntaxnet { +namespace dragnn { + +typedef ResourceContainer ComputeSessionResource; +typedef ResourceContainer ComputeSessionPoolResource; + +// Given a MasterSpec 
proto, outputs a handle to a ComputeSession. +class GetSession : public OpKernel { + public: + explicit GetSession(OpKernelConstruction *context) : OpKernel(context) { + string master_spec_str; + string grid_point_spec_str; + OP_REQUIRES_OK(context, context->GetAttr("master_spec", &master_spec_str)); + OP_REQUIRES_OK(context, + context->GetAttr("grid_point", &grid_point_spec_str)); + CHECK(master_spec_.ParseFromString(master_spec_str)); + CHECK(grid_point_.ParseFromString(grid_point_spec_str)); + OP_REQUIRES_OK(context, context->MatchSignature({DT_STRING}, {DT_STRING})); + } + + void Compute(OpKernelContext *context) override { + const string container = context->input(0).scalar()(); + ResourceMgr *rmgr = context->resource_manager(); + + // Create the pool for this container, or re-use one that was allocated in a + // previous call. + auto create_pool = [this, + &container](ComputeSessionPoolResource **resource) { + LOG(INFO) << "Creating new ComputeSessionPool in container handle: " + << container; + std::unique_ptr pool( + new ComputeSessionPool(master_spec_, grid_point_)); + *resource = new ComputeSessionPoolResource(std::move(pool)); + return Status::OK(); + }; + + ComputeSessionPoolResource *pool_resource; + + // Synchronize access to the resource manager when getting or creating the + // ComputeSessionPool. + // Scoping for minimal mutex locking. + { + mutex_lock lock(lock_); + OP_REQUIRES_OK(context, + rmgr->LookupOrCreate( + container, "pool", &pool_resource, create_pool)); + } + ComputeSessionPool *pool = pool_resource->get(); + CHECK(pool != nullptr); + + // Get a new Session for this computation from the pool. + std::unique_ptr session = pool->GetSession(); + const string id = std::to_string(session->Id()); + + // Store it in the ResourceManager. 
+ OP_REQUIRES_OK( + context, + rmgr->Create( + container, id, new ComputeSessionResource(std::move(session)))); + + Tensor *output; + OP_REQUIRES_OK(context, + context->allocate_output(0, TensorShape({2}), &output)); + output->vec()(0) = container; + output->vec()(1) = id; + + // Unref the pool so it gets destroyed properly. + pool_resource->Unref(); + VLOG(1) << "Returning session: " << id; + } + + private: + MasterSpec master_spec_; + GridPoint grid_point_; + + // Mutex that serializes accesses to the resource manager. (These would block + // in the compute session pool anyways, so there's no regression there, and + // we need to protect from racy multiple initialization.) + tensorflow::mutex lock_; + + TF_DISALLOW_COPY_AND_ASSIGN(GetSession); +}; + +REGISTER_KERNEL_BUILDER(Name("GetSession").Device(DEVICE_CPU), GetSession); + +// Given a handle to a ComputeSession, returns it to the pool. As long as we +// start with "GetSession", DRAGNN graphs are thread-safe and there is no need +// for explicit multi-thread logic. As long as we end with "ReleaseSession", +// then memory usage will be constrained to the maximum number of concurrent +// requests. +class ReleaseSession : public OpKernel { + public: + explicit ReleaseSession(OpKernelConstruction *context) : OpKernel(context) { + string master_spec_str; + OP_REQUIRES_OK(context, context->MatchSignature({DT_STRING}, {})); + } + + void Compute(OpKernelContext *context) override { + auto handle = context->input(0).vec(); + const string &container = handle(0); + const string &id = handle(1); + VLOG(1) << "Releasing session: " << id; + ResourceMgr *rmgr = context->resource_manager(); + + // Get the pool for this container. + ComputeSessionPoolResource *pool_resource; + TF_CHECK_OK(rmgr->Lookup(container, "pool", + &pool_resource)); + auto *pool = pool_resource->get(); + CHECK(pool != nullptr); + + // Get the compute session. 
+ ComputeSessionResource *session_resource; + TF_CHECK_OK( + rmgr->Lookup(container, id, &session_resource)); + + // We need to release the ComputeSession from both the ResourceMgr and + // the ComputeSessionPool. The order of release is critical. If the + // resource is not first Delete()-ed from the ResourceMgr, then another + // thread may try to Create() the same resource, resulting in an + // "Already exists" error. + // + // First, delete the ResourceMgr reference so it can be used in the future. + TF_CHECK_OK(rmgr->Delete(container, id)); + + // Second, return the ComputeSession to the pool. + pool->ReturnSession(session_resource->release()); + + // Unref the resources so they get destroyed properly. + session_resource->Unref(); + pool_resource->Unref(); + } + + private: + TF_DISALLOW_COPY_AND_ASSIGN(ReleaseSession); +}; + +REGISTER_KERNEL_BUILDER(Name("ReleaseSession").Device(DEVICE_CPU), + ReleaseSession); + +/******************************************************************************* + * ComputeSessionOps below here. + ******************************************************************************/ + +// Given a handle to a BatchedBeamComponentState, advances based on the next +// oracle (gold) action. +class AdvanceFromOracle : public ComputeSessionOp { + public: + explicit AdvanceFromOracle(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, context->MatchSignature({DT_STRING}, {DT_STRING})); + } + + bool OutputsHandle() const override { return true; } + bool RequiresComponentName() const override { return true; } + + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + session->AdvanceFromOracle(component_name()); + } + + private: + TF_DISALLOW_COPY_AND_ASSIGN(AdvanceFromOracle); +}; + +REGISTER_KERNEL_BUILDER(Name("AdvanceFromOracle").Device(DEVICE_CPU), + AdvanceFromOracle); + +// Given a handle to a BatchedBeamComponentState and a tensor of scores, +// advances the state. 
The tensor of scores has shape batch_size x beam_size +// x num_actions. +class AdvanceFromPrediction : public ComputeSessionOp { + public: + explicit AdvanceFromPrediction(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, + context->MatchSignature({DT_STRING, DT_FLOAT}, {DT_STRING})); + } + + bool OutputsHandle() const override { return true; } + bool RequiresComponentName() const override { return true; } + + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + const Tensor &scores = context->input(1); + session->AdvanceFromPrediction(component_name(), + scores.tensor().data(), + scores.NumElements()); + } + + private: + TF_DISALLOW_COPY_AND_ASSIGN(AdvanceFromPrediction); +}; + +REGISTER_KERNEL_BUILDER(Name("AdvanceFromPrediction").Device(DEVICE_CPU), + AdvanceFromPrediction); + +// Given a handle to a ComputeSession and a channel index, outputs fixed +// features. +// Fixed features are returned as 3 vectors of equal length: +// - ids: specifies which rows should be looked up in the embedding +// matrix, +// - weights: specifies a scale for each embedding vector, +// - indices: sorted vector that assigns the same index to embedding +// vectors +// that should be summed together.
+// +// For example if we have 3 features, for a given channel, we might have: +// feature a: (5, 1) +// feature b: (5, 0.5), (6, 0.5) +// feature c: (7, 1) +// In this case: +// indices should look like: [0, 1, 1, 2] +// ids should be [5, 5, 6, 7] +// weights should be [1, 0.5, 0.5, 1] +class ExtractFixedFeatures : public ComputeSessionOp { + public: + explicit ExtractFixedFeatures(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, context->GetAttr("channel_id", &channel_id_)); + OP_REQUIRES_OK(context, context->MatchSignature( + {DT_STRING}, {DT_INT32, DT_INT64, DT_FLOAT})); + } + + bool OutputsHandle() const override { return false; } + bool RequiresComponentName() const override { return true; } + + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + // Allocates output tensors. + auto indices_allocator = [context](int num_elements) { + Tensor *output; + CHECK(context->allocate_output(0, TensorShape({num_elements}), &output) + .ok()); + return output->vec().data(); + }; + auto ids_allocator = [context](int num_elements) { + Tensor *ids_tensor; + CHECK( + context->allocate_output(1, TensorShape({num_elements}), &ids_tensor) + .ok()); + return ids_tensor->vec().data(); + }; + auto weights_allocator = [context](int num_elements) { + Tensor *output; + CHECK(context->allocate_output(2, TensorShape({num_elements}), &output) + .ok()); + return output->vec().data(); + }; + int num_features = session->GetInputFeatures( + component_name(), indices_allocator, ids_allocator, weights_allocator, + channel_id_); + VLOG(2) << "Extracted " << num_features; + } + + private: + int channel_id_; + TF_DISALLOW_COPY_AND_ASSIGN(ExtractFixedFeatures); +}; + +REGISTER_KERNEL_BUILDER(Name("ExtractFixedFeatures").Device(DEVICE_CPU), + ExtractFixedFeatures); + +// Given a handle to a ComputeSession and a channel index, outputs link +// features. 
Link features are returned as two vectors of length batch_size * +// beam_size * channel_size: +// - step_idx: specifies the element to read in a tensor array of activations, +// - idx: specifies the row within the tensor array element. +class ExtractLinkFeatures : public ComputeSessionOp { + public: + explicit ExtractLinkFeatures(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, context->GetAttr("channel_id", &channel_id_)); + OP_REQUIRES_OK(context, + context->MatchSignature({DT_STRING}, {DT_INT32, DT_INT32})); + } + + bool OutputsHandle() const override { return false; } + bool RequiresComponentName() const override { return true; } + + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + auto features = + session->GetTranslatedLinkFeatures(component_name(), channel_id_); + + // Computes output size. + const int64 num_indices = features.size(); + + // Allocates output tensors. + Tensor *step_idx_output; + Tensor *idx_output; + OP_REQUIRES_OK(context, + context->allocate_output(0, TensorShape({num_indices}), + &step_idx_output)); + OP_REQUIRES_OK(context, context->allocate_output( + 1, TensorShape({num_indices}), &idx_output)); + + const int source_beam_size = + session->SourceComponentBeamSize(component_name(), channel_id_); + VLOG(2) << "source_beam_size:" << source_beam_size; + + // Clip step_idx for all features. If a feature is empty, set the step + // index to -1. + for (int i = 0; i < features.size(); ++i) { + if (!features[i].has_step_idx() || features[i].step_idx() < -1) { + features[i].set_step_idx(-1); + } + } + + // Fills output tensors. + for (int i = 0; i < features.size(); ++i) { + // Sets the element to read from a tensor array of activations. + step_idx_output->vec()(i) = features[i].step_idx(); + + // Within the tensor array element the id has to account for beam index + // and batch index for this specific component state. 
+ idx_output->vec()(i) = + features[i].step_idx() >= 0 + ? OutputLinearIndex(features[i], source_beam_size) + : 0; + + VLOG(2) << "features[" << i << "]: " << features[i].ShortDebugString(); + } + } + + private: + // Given the beam index and the batch index in a LinkFeatures proto, returns + // the corresponding linear index, assuming that the matrix we're indexing + // into has shape {batch_size * beam_size, activation_size}, reshaped from a + // tensor of shape {batch_size, beam_size, activation_size}. + static uint64 OutputLinearIndex(const LinkFeatures &feature, + const int beam_size) { + VLOG(2) << "OutputLinearIndex batch_idx:" << feature.batch_idx() + << " beam_size:" << beam_size << " beam_idx:" << feature.beam_idx(); + return feature.batch_idx() * beam_size + feature.beam_idx(); + } + + int channel_id_; + TF_DISALLOW_COPY_AND_ASSIGN(ExtractLinkFeatures); +}; + +REGISTER_KERNEL_BUILDER(Name("ExtractLinkFeatures").Device(DEVICE_CPU), + ExtractLinkFeatures); + +// Given a handle to a BatchedBeamComponentState, emits a vector of gold +// labels. +// The vector of gold labels has size batch_size * beam_size. 
+class EmitOracleLabels : public ComputeSessionOp { + public: + explicit EmitOracleLabels(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, context->MatchSignature({DT_STRING}, {DT_INT32})); + } + bool OutputsHandle() const override { return false; } + bool RequiresComponentName() const override { return true; } + + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + VLOG(2) << "state->BatchSize: " << session->BatchSize(component_name()); + VLOG(2) << "state->BeamSize: " << session->BeamSize(component_name()); + Tensor *output; + OP_REQUIRES_OK(context, + context->allocate_output( + 0, + TensorShape({session->BatchSize(component_name()) * + session->BeamSize(component_name())}), + &output)); + std::vector> batched_labels = + session->EmitOracleLabels(component_name()); + int raw_index = 0; + for (const auto &batch_vector : batched_labels) { + for (const auto &label : batch_vector) { + output->vec()(raw_index) = label; + ++raw_index; + } + } + } + + private: + TF_DISALLOW_COPY_AND_ASSIGN(EmitOracleLabels); +}; + +REGISTER_KERNEL_BUILDER(Name("EmitOracleLabels").Device(DEVICE_CPU), + EmitOracleLabels); + +// Given a handle to a ComponentState, emits a single bool indicating +// whether all elements in the batch contain beams containing all final states. 
+class EmitAllFinal : public ComputeSessionOp { + public: + explicit EmitAllFinal(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, context->MatchSignature({DT_STRING}, {DT_BOOL})); + } + + bool OutputsHandle() const override { return false; } + bool RequiresComponentName() const override { return true; } + + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + Tensor *output; + OP_REQUIRES_OK(context, + context->allocate_output(0, TensorShape({1}), &output)); + const bool is_terminal = session->IsTerminal(component_name()); + VLOG(2) << "EmitAllFinal: is_terminal = " << is_terminal; + output->vec()(0) = is_terminal; + } + + private: + TF_DISALLOW_COPY_AND_ASSIGN(EmitAllFinal); +}; + +REGISTER_KERNEL_BUILDER(Name("EmitAllFinal").Device(DEVICE_CPU), EmitAllFinal); + +// Prepares the given component for computation. +class InitComponentData : public ComputeSessionOp { + public: + explicit InitComponentData(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, + context->MatchSignature({DT_STRING, DT_INT32}, {DT_STRING})); + } + + bool OutputsHandle() const override { return true; } + + bool RequiresComponentName() const override { return true; } + + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + const int beam_size = context->input(1).scalar()(); + session->InitializeComponentData(component_name(), beam_size); + } +}; + +REGISTER_KERNEL_BUILDER(Name("InitComponentData").Device(DEVICE_CPU), + InitComponentData); + +// Returns the given component's batch size. 
+class BatchSize : public ComputeSessionOp { + public: + explicit BatchSize(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, context->MatchSignature({DT_STRING}, {DT_INT32})); + } + + bool OutputsHandle() const override { return false; } + bool RequiresComponentName() const override { return true; } + + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + Tensor *output; + OP_REQUIRES_OK(context, + context->allocate_output(0, TensorShape({}), &output)); + output->scalar()() = session->BatchSize(component_name()); + } +}; + +REGISTER_KERNEL_BUILDER(Name("BatchSize").Device(DEVICE_CPU), BatchSize); + +// Attaches a data source to the master. +class AttachDataReader : public ComputeSessionOp { + public: + explicit AttachDataReader(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK( + context, context->MatchSignature({DT_STRING, DT_STRING}, {DT_STRING})); + } + + bool OutputsHandle() const override { return true; } + bool RequiresComponentName() const override { return false; } + + // Calls SetInputData() on the ComputeSession. + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + auto input_data(context->input(1).vec()); + + std::vector data; + for (int i = 0; i < input_data.size(); ++i) { + data.push_back(input_data(i)); + } + session->SetInputData(data); + } +}; + +REGISTER_KERNEL_BUILDER(Name("AttachDataReader").Device(DEVICE_CPU), + AttachDataReader); + +// Sets the tracing flag on the master state, which will enable or disable +// tracing as inference / training is run. 
+class SetTracing : public ComputeSessionOp { + public: + explicit SetTracing(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, + context->MatchSignature({DT_STRING, DT_BOOL}, {DT_STRING})); + } + + bool OutputsHandle() const override { return true; } + bool RequiresComponentName() const override { return false; } + + // Calls SetTracing() on the ComputeSession. + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + auto tracing_on = context->input(1).scalar()(); + session->SetTracing(tracing_on); + } +}; + +REGISTER_KERNEL_BUILDER(Name("SetTracing").Device(DEVICE_CPU), SetTracing); + +class WriteAnnotations : public ComputeSessionOp { + public: + explicit WriteAnnotations(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, context->MatchSignature({DT_STRING}, {DT_STRING})); + } + + bool OutputsHandle() const override { return true; } + bool RequiresComponentName() const override { return true; } + + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + session->FinalizeData(component_name()); + } +}; + +REGISTER_KERNEL_BUILDER(Name("WriteAnnotations").Device(DEVICE_CPU), + WriteAnnotations); + +// Given a handle to a ComponentState, emits a vector of strings +// corresponding to the serialized predictions of the model. +class EmitAnnotations : public ComputeSessionOp { + public: + explicit EmitAnnotations(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, context->MatchSignature({DT_STRING}, {DT_STRING})); + } + + bool OutputsHandle() const override { return false; } + bool RequiresComponentName() const override { return true; } + + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + // Get the annotations from the state. + auto annotations = session->GetSerializedPredictions(); + + // Copy annotations to the output. 
+ Tensor *output; + const int64 output_size = annotations.size(); + OP_REQUIRES_OK(context, context->allocate_output( + 0, TensorShape({output_size}), &output)); + auto annotations_output = output->vec(); + for (int i = 0; i < annotations.size(); ++i) { + annotations_output(i) = annotations[i]; + } + } + + private: + TF_DISALLOW_COPY_AND_ASSIGN(EmitAnnotations); +}; + +REGISTER_KERNEL_BUILDER(Name("EmitAnnotations").Device(DEVICE_CPU), + EmitAnnotations); + +// Get the component trace. +class GetComponentTrace : public ComputeSessionOp { + public: + explicit GetComponentTrace(OpKernelConstruction *context) + : ComputeSessionOp(context) { + OP_REQUIRES_OK(context, context->MatchSignature({DT_STRING}, {DT_STRING})); + } + + bool OutputsHandle() const override { return false; } + bool RequiresComponentName() const override { return true; } + + void ComputeWithState(OpKernelContext *context, + ComputeSession *session) override { + auto traces = session->GetTraceProtos(); + + const int64 size = traces.size(); + Tensor *trace_output_tensor; + OP_REQUIRES_OK(context, context->allocate_output(0, TensorShape({size}), + &trace_output_tensor)); + auto trace_output = trace_output_tensor->vec(); + for (int i = 0; i < size; ++i) { + CHECK(traces[i].SerializeToString(&trace_output(i))); + } + } + + TF_DISALLOW_COPY_AND_ASSIGN(GetComponentTrace); +}; + +REGISTER_KERNEL_BUILDER(Name("GetComponentTrace").Device(DEVICE_CPU), + GetComponentTrace); + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/ops/dragnn_op_kernels_test.cc b/syntaxnet/dragnn/core/ops/dragnn_op_kernels_test.cc new file mode 100644 index 0000000000000000000000000000000000000000..92444fa920e06530942e6589249bdb78fb075793 --- /dev/null +++ b/syntaxnet/dragnn/core/ops/dragnn_op_kernels_test.cc @@ -0,0 +1,866 @@ +// Copyright 2017 Google Inc. All Rights Reserved. 
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include +#include +#include + +#include "dragnn/core/compute_session.h" +#include "dragnn/core/compute_session_pool.h" +#include "dragnn/core/resource_container.h" +#include "dragnn/core/test/generic.h" +#include "dragnn/core/test/mock_compute_session.h" + +#include + +#include "tensorflow/core/framework/allocator.h" +#include "tensorflow/core/framework/control_flow.h" +#include "tensorflow/core/framework/fake_input.h" +#include "tensorflow/core/framework/node_def_builder.h" +#include "tensorflow/core/framework/op_kernel.h" +#include "tensorflow/core/framework/tensor.h" +#include "tensorflow/core/framework/tensor_testutil.h" +#include "tensorflow/core/framework/types.h" +#include "tensorflow/core/framework/types.pb.h" +#include "tensorflow/core/kernels/ops_testutil.h" +#include "tensorflow/core/kernels/ops_util.h" +#include "tensorflow/core/lib/core/status_test_util.h" +#include "tensorflow/core/platform/test.h" + +namespace syntaxnet { +namespace dragnn { + +using tensorflow::AllocatorAttributes; +using tensorflow::checkpoint::TensorSliceReaderCacheWrapper; +using tensorflow::DT_BOOL; +using tensorflow::DT_FLOAT; +using tensorflow::DT_STRING; +using tensorflow::DT_INT32; +using tensorflow::FrameAndIter; +using tensorflow::DataType; +using tensorflow::NodeDefBuilder; +using tensorflow::OpKernelContext; +using 
tensorflow::ResourceMgr; +using tensorflow::ScopedStepContainer; +using tensorflow::Status; +using tensorflow::test::SetOutputAttrs; +using tensorflow::TensorShape; + +using testing::_; +using testing::ElementsAreArray; +using testing::Invoke; +using testing::Pointwise; +using testing::Return; + +typedef ResourceContainer ComputeSessionResource; +typedef ResourceContainer ComputeSessionPoolResource; + +class DragnnOpKernelsTest : public tensorflow::OpsTestBase { + public: + void ResetOpKernelContext() { + params_.reset(new OpKernelContext::Params); + params_->device = device_.get(); + params_->frame_iter = FrameAndIter(0, 0); + params_->inputs = &inputs_; + params_->op_kernel = kernel_.get(); + step_container_.reset(new ScopedStepContainer(0, [](const string &) {})); + params_->step_container = step_container_.get(); + attrs_.clear(); + SetOutputAttrs(params_.get(), &attrs_); + TensorSliceReaderCacheWrapper slice_reader_cache_wrapper; + params_->slice_reader_cache = &slice_reader_cache_wrapper; + params_->resource_manager = device_->resource_manager(); + context_.reset(new OpKernelContext(params_.get())); + } + + Status RunOpKernelWithContext() { + device_->Compute(kernel_.get(), context_.get()); + return context_->status(); + } + + // Accessor for the underlying resource manager. + ResourceMgr *resource_mgr() { return params_->resource_manager; } + + // This needs to maintain its existence throughout the compute call. + std::vector attrs_; +}; + +// Helper function to build LinkFeatures. +LinkFeatures MakeFeatures(int batch_index, int beam_index, int step) { + LinkFeatures features; + features.set_batch_idx(batch_index); + features.set_beam_idx(beam_index); + features.set_step_idx(step); + return features; +} + +// The GetSessionOp should +// 1. create a ComputeSessionPool resource and store it in the ResourceMgr, +// 2. create a ComputeSession resource and store it in the ResourceMgr, +// 3. return the container and id strings in its output. 
+TEST_F(DragnnOpKernelsTest, GetSessionOpTest) { + // Create a MasterSpec and GridPoint string to pass into the attrs for this + // op. + MasterSpec spec; + spec.set_debug_tracing(true); + string master_spec_str; + spec.SerializeToString(&master_spec_str); + + GridPoint hyperparams; + string hyperparams_str; + hyperparams.SerializeToString(&hyperparams_str); + + // Create and initialize the kernel under test. + TF_ASSERT_OK( + NodeDefBuilder("get_session", "GetSession") + .Attr("master_spec", master_spec_str) + .Attr("grid_point", hyperparams_str) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data. + const string container_string = "container_str"; + AddInputFromList(TensorShape({1}), {container_string}); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); + + // Expect that the 0th output contains two strings, and that the ResourceMgr + // contains a ComputeSessionResource associated with those two strings. + const string container_str = GetOutput(0)->vec()(0); + const string id_str = GetOutput(0)->vec()(1); + VLOG(2) << "container: " << container_str << " id: " << id_str; + + // The first compute session should have id "0". + EXPECT_EQ("0", id_str); + ComputeSessionResource *session_resource; + TF_EXPECT_OK(resource_mgr()->Lookup( + container_str, id_str, &session_resource)); + + // Expect that the ResourceMgr also contains a ComputeSessionPoolResource. + const string pool_id_str = "pool"; + ComputeSessionPoolResource *pool_resource; + TF_EXPECT_OK(resource_mgr()->Lookup( + container_str, pool_id_str, &pool_resource)); + + // Unref the managed resources so they get destroyed properly. + session_resource->Unref(); + pool_resource->Unref(); +} + +// The ReleaseSessionOp should take a session stored in the resource manager +// and return it to the ComputeSessionPool.
+TEST_F(DragnnOpKernelsTest, ReleaseSessionOpTest) { + // Create and initialize the kernel under test. + TF_ASSERT_OK( + NodeDefBuilder("release_session", "ReleaseSession") + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data. + const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + // Create a ComputeSessionPool. + MasterSpec spec; + GridPoint hyperparams; + std::unique_ptr pool( + new ComputeSessionPool(spec, hyperparams)); + + // Get an unowned pointer to the ComputeSessionPool before moving + // the pool to the resource manager. + ComputeSessionPool *pool_ptr = pool.get(); + TF_ASSERT_OK(resource_mgr()->Create( + container_string, "pool", + new ComputeSessionPoolResource(std::move(pool)))); + + // Create a ComputeSession and move it to the resource manager. + TF_ASSERT_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(pool_ptr->GetSession()))); + + // At this point, the pool should report that it has one outstanding session. + EXPECT_EQ(1, pool_ptr->num_outstanding_sessions()); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); + + // At this point, the pool should report that it has no outstanding sessions. + EXPECT_EQ(0, pool_ptr->num_outstanding_sessions()); + + // The resource manager should no longer contain the session object. + ComputeSessionResource *null_resource = nullptr; + auto result = resource_mgr()->Lookup( + container_string, id_string, &null_resource); + EXPECT_NE(Status::OK(), result); + EXPECT_EQ(null_resource, nullptr); +} + +// The AdvanceFromOracle op should call AdvanceFromOracle on the specified +// component name. 
+TEST_F(DragnnOpKernelsTest, AdvanceFromOracleOpTest) { + // Create and initialize the kernel under test. + const string component_name = "TESTING_COMPONENT_NAME"; + TF_ASSERT_OK( + NodeDefBuilder("advance_from_oracle", "AdvanceFromOracle") + .Attr("component", component_name) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data. + const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + // Create a MockComputeSession and set expectations. + std::unique_ptr mock_session(new MockComputeSession()); + MockComputeSession *mock_session_ptr = mock_session.get(); + + // Wrap the ComputeSessionResource and put it into the resource manager. + TF_ASSERT_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(std::move(mock_session)))); + + // Set expectations on the mock session. + EXPECT_CALL(*mock_session_ptr, AdvanceFromOracle(component_name)); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); +} + +// The AdvanceFromPrediction op should call AdvanceFromPrediction on the +// specified component with the passed scores. +TEST_F(DragnnOpKernelsTest, AdvanceFromPredictionOpTest) { + // Create and initialize the kernel under test. + const string component_name = "TESTING_COMPONENT_NAME"; + TF_ASSERT_OK( + NodeDefBuilder("advance_from_prediction", "AdvanceFromPrediction") + .Attr("component", component_name) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Input(FakeInput(DT_FLOAT)) // The prediction tensor. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data.
+ const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + const std::vector weights = {1.1, 2.2, 3.3, 4.4}; + AddInputFromArray(TensorShape({2, 2}), weights); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + // Create a MockComputeSession and set expectations. + std::unique_ptr mock_session(new MockComputeSession()); + MockComputeSession *mock_session_ptr = mock_session.get(); + + // Wrap the ComputeSessionResource and put it into the resource manager. + TF_ASSERT_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(std::move(mock_session)))); + + // Set expectations on the mock session. + auto validator_function = [weights](const string &component_name, + const float score_matrix[], + int score_matrix_length) { + EXPECT_EQ(weights.size(), score_matrix_length); + for (int i = 0; i < weights.size(); ++i) { + EXPECT_EQ(weights[i], score_matrix[i]); + } + }; + EXPECT_CALL(*mock_session_ptr, AdvanceFromPrediction(component_name, _, _)) + .WillOnce(Invoke(validator_function)); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); +} + +// The ExtractFixedFeatures op should return a set of fixed feature vectors +// as described below. +TEST_F(DragnnOpKernelsTest, ExtractFixedFeaturesOpTest) { + // Create and initialize the kernel under test. + const string component_name = "TESTING_COMPONENT_NAME"; + constexpr int kChannelId = 78; + TF_ASSERT_OK( + NodeDefBuilder("extract_fixed_features", "ExtractFixedFeatures") + .Attr("component", component_name) + .Attr("channel_id", kChannelId) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data.
+ const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + // Create a MockComputeSession and set expectations. + std::unique_ptr mock_session(new MockComputeSession()); + MockComputeSession *mock_session_ptr = mock_session.get(); + + // Wrap the ComputeSessionResource and put it into the resource manager. + TF_ASSERT_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(std::move(mock_session)))); + + // If we have 3 features, for a given channel, we might have: + // feature a: (5, 1) + // feature b: (5, 0.5), (6, 0.7) + // feature c: (3, 0.1), (7, [empty]) <- Empty weights are equivalent to 1.0. + // In this case: + // indices should look like [0 , 1 , 1 , 2 , 2 ] + // ids should be [5 , 5 , 6 , 3 , 7 ] + // weights should be [1.0, 0.5, 0.7, 0.1, 1.0] + const std::vector expected_indices({0, 1, 1, 2, 2}); + const std::vector expected_ids({5, 5, 6, 3, 7}); + const std::vector expected_weights({1.0, 0.5, 0.7, 0.1, 1.0}); + + auto assigner_function = + [=](string, std::function indices_allocator, + std::function ids_allocator, + std::function weights_allocator, int) { + constexpr int kFeatureCount = 5; + int32 *indices = indices_allocator(kFeatureCount); + int64 *ids = ids_allocator(kFeatureCount); + float *weights = weights_allocator(kFeatureCount); + for (int i = 0; i < kFeatureCount; ++i) { + indices[i] = expected_indices[i]; + ids[i] = expected_ids[i]; + weights[i] = expected_weights[i]; + } + return kFeatureCount; + }; + + EXPECT_CALL(*mock_session_ptr, + GetInputFeatures(component_name, _, _, _, kChannelId)) + .WillOnce(testing::Invoke(assigner_function)); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); + + // Validate the outputs. 
+ EXPECT_EQ(expected_indices.size(), GetOutput(0)->NumElements()); + for (int i = 0; i < expected_indices.size(); ++i) { + EXPECT_EQ(expected_indices[i], GetOutput(0)->vec()(i)); + } + EXPECT_EQ(expected_ids.size(), GetOutput(1)->NumElements()); + for (int i = 0; i < expected_ids.size(); ++i) { + EXPECT_EQ(expected_ids[i], GetOutput(1)->vec()(i)); + } + EXPECT_EQ(expected_weights.size(), GetOutput(2)->NumElements()); + for (int i = 0; i < expected_weights.size(); ++i) { + EXPECT_EQ(expected_weights[i], GetOutput(2)->vec()(i)); + } +} + +// The ExtractLinkFeatures op should return a set of linked feature vectors +// as described below. +TEST_F(DragnnOpKernelsTest, ExtractLinkFeaturesOpTest) { + // TODO(googleuser): Is a 2-vector output the correct way to do this? + // Why reshape instead of passing [batch, beam, index] or just + // [batch,index] ? + // Create and initialize the kernel under test. + const string component_name = "TESTING_COMPONENT_NAME"; + constexpr int kChannelId = 3421; + TF_ASSERT_OK( + NodeDefBuilder("extract_link_features", "ExtractLinkFeatures") + .Attr("component", component_name) + .Attr("channel_id", kChannelId) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data. + const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + std::unique_ptr mock_session(new MockComputeSession()); + MockComputeSession *mock_session_ptr = mock_session.get(); + + // This op will return link features in two flat arrays using batch-major + // ordering. 
So, if we have a batch of 2 and a beam of 3, with data as follows + // (note that the features are {batch,beam,step} and [] is 'empty') + // batch 1 features: {{02,03,[]},{01,00,04},{08,06,01}} + // batch 2 features: {{12,13,14},{11,12,-1},{18,16,20}} + // + // and a **source component** beam size of 5 should result in output tensors: + // step_idx (tensor 0): {-1, 4, 1, 14, -1, 20} + // array_idx (tensor 1): { 0, 5, 46, 73, 0, 106} + // (0 [step=-1]),(5=1*5+0),(46=8*5+6),(73=12*5+13),(0 [step=-1]),(106=18*5+16) + constexpr int kSourceComponentBeamSize = 5; + + std::vector features; + features.push_back(MakeFeatures(2, 3, -1)); + features.back().clear_step_idx(); // step_idx is now empty. + features.push_back(MakeFeatures(1, 0, 4)); + features.push_back(MakeFeatures(8, 6, 1)); + features.push_back(MakeFeatures(12, 13, 14)); + features.push_back(MakeFeatures(11, 12, -1)); + features.push_back(MakeFeatures(18, 16, 20)); + + const std::vector expected_step_idx({-1, 4, 1, 14, -1, 20}); + const std::vector expected_array_idx({0, 5, 46, 73, 0, 106}); + + EXPECT_CALL(*mock_session_ptr, + SourceComponentBeamSize(component_name, kChannelId)) + .WillRepeatedly(Return(kSourceComponentBeamSize)); + EXPECT_CALL(*mock_session_ptr, + GetTranslatedLinkFeatures(component_name, kChannelId)) + .WillOnce(Return(features)); + + // Wrap the ComputeSessionResource and put it into the resource manager. + TF_ASSERT_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(std::move(mock_session)))); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); + + // Validate the outputs.
+ EXPECT_EQ(expected_step_idx.size(), GetOutput(0)->NumElements()); + for (int i = 0; i < expected_step_idx.size(); ++i) { + EXPECT_EQ(expected_step_idx[i], GetOutput(0)->vec()(i)); + } + EXPECT_EQ(expected_array_idx.size(), GetOutput(1)->NumElements()); + for (int i = 0; i < expected_array_idx.size(); ++i) { + EXPECT_EQ(expected_array_idx[i], GetOutput(1)->vec()(i)); + } +} + +// The EmitOracleLabels op should return a set of oracle labels for all +// elements in all beams in all batches. +TEST_F(DragnnOpKernelsTest, EmitOracleLabelsOpTest) { + // Create and initialize the kernel under test. + const string component_name = "TESTING_COMPONENT_NAME"; + TF_ASSERT_OK( + NodeDefBuilder("emit_oracle_labels", "EmitOracleLabels") + .Attr("component", component_name) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data. + const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + // Create a MockComputeSession and set expectations. + std::unique_ptr mock_session(new MockComputeSession()); + MockComputeSession *mock_session_ptr = mock_session.get(); + + // Wrap the ComputeSessionResource and put it into the resource manager. + TF_ASSERT_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(std::move(mock_session)))); + + // The op should request the batch and beam size, then request the oracle + // labels. 
They should be returned batch major, so: + // batch 1 oracle labels: {1, 3, 5, 7} + // batch 2 oracle labels: {2, 4, 6, 8} + // should result in an output tensor as follows: + // {1, 3, 5, 7, 2, 4, 6, 8} + + constexpr int kBatchSize = 2; + constexpr int kBeamSize = 4; + const std::vector> oracle_labels( + {{1, 3, 5, 7}, {2, 4, 6, 8}}); + + EXPECT_CALL(*mock_session_ptr, BatchSize(component_name)) + .WillRepeatedly(Return(kBatchSize)); + EXPECT_CALL(*mock_session_ptr, BeamSize(component_name)) + .WillRepeatedly(Return(kBeamSize)); + EXPECT_CALL(*mock_session_ptr, EmitOracleLabels(component_name)) + .WillOnce(Return(oracle_labels)); + + const std::vector expected_labels({1, 3, 5, 7, 2, 4, 6, 8}); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); + + // Validate the outputs. + EXPECT_EQ(expected_labels.size(), GetOutput(0)->NumElements()); + for (int i = 0; i < expected_labels.size(); ++i) { + EXPECT_EQ(expected_labels[i], GetOutput(0)->vec()(i)); + } +} + +// The EmitAllFinal op should return the result of IsTerminal(component_name). +TEST_F(DragnnOpKernelsTest, EmitAllFinalOpTest) { + // Create and initialize the kernel under test. + const string component_name = "TESTING_COMPONENT_NAME"; + TF_ASSERT_OK( + NodeDefBuilder("emit_all_final", "EmitAllFinal") + .Attr("component", component_name) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data. + const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + // Create a MockComputeSession and set expectations. + std::unique_ptr mock_session(new MockComputeSession()); + MockComputeSession *mock_session_ptr = mock_session.get(); + + // Wrap the ComputeSessionResource and put it into the resource manager. 
+ TF_ASSERT_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(std::move(mock_session)))); + + // Set up mocks. + constexpr bool kIsTerminal = true; + EXPECT_CALL(*mock_session_ptr, IsTerminal(component_name)) + .WillOnce(Return(kIsTerminal)); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); + + // Validate the outputs. + EXPECT_EQ(1, GetOutput(0)->NumElements()); + EXPECT_EQ(kIsTerminal, GetOutput(0)->vec()(0)); +} + +// The InitComponent op should initialize the given component with the given +// beam size. +// TODO(googleuser): Should we just store the beam size somewhere in the +// ComputeSession? +TEST_F(DragnnOpKernelsTest, InitComponentDataOpTest) { + // Create and initialize the kernel under test. + const string component_name = "TESTING_COMPONENT_NAME"; + TF_ASSERT_OK( + NodeDefBuilder("init_component_data", "InitComponentData") + .Attr("component", component_name) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Input(FakeInput(DT_INT32)) // The beam size. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data. + const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + constexpr int32 kBeamSize = 9001; + AddInputFromList(TensorShape({1}), {kBeamSize}); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + // Create a MockComputeSession and set expectations. + std::unique_ptr mock_session(new MockComputeSession()); + MockComputeSession *mock_session_ptr = mock_session.get(); + + // Wrap the ComputeSessionResource and put it into the resource manager. + TF_ASSERT_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(std::move(mock_session)))); + + // Set up mocks. + EXPECT_CALL(*mock_session_ptr, + InitializeComponentData(component_name, kBeamSize)); + + // Run the kernel. 
+ TF_EXPECT_OK(RunOpKernelWithContext()); + + // Output should be the input handle. + EXPECT_EQ(container_string, GetOutput(0)->vec()(0)); + EXPECT_EQ(id_string, GetOutput(0)->vec()(1)); +} + +// The BatchSize op should call BatchSize on the ComputeSession with the given +// component as argument. +TEST_F(DragnnOpKernelsTest, BatchSizeOpTest) { + // Create and initialize the kernel under test. + const string component_name = "TESTING_COMPONENT_NAME"; + TF_ASSERT_OK( + NodeDefBuilder("batch_size", "BatchSize") + .Attr("component", component_name) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data. + const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + // Create a MockComputeSession and set expectations. + std::unique_ptr mock_session(new MockComputeSession()); + MockComputeSession *mock_session_ptr = mock_session.get(); + + // Wrap the ComputeSessionResource and put it into the resource manager. + TF_ASSERT_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(std::move(mock_session)))); + + // Set up mocks. + constexpr int kBatchSize = 8; + EXPECT_CALL(*mock_session_ptr, BatchSize(component_name)) + .WillOnce(Return(kBatchSize)); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); + + // Output should be the batch size. + EXPECT_EQ(kBatchSize, GetOutput(0)->scalar()()); +} + +// The AttachDataReader op should push the given vector of strings into the +// session. +TEST_F(DragnnOpKernelsTest, AttachDataReaderOpTest) { + // Create and initialize the kernel under test. 
+ const string component_name = "TESTING_COMPONENT_NAME"; + TF_ASSERT_OK( + NodeDefBuilder("attach_data_reader", "AttachDataReader") + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Input(FakeInput(DT_STRING)) // The data to pass to the session. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data. + const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + + const std::vector data( + {"one string", "two string", "red string", "blue string"}); + AddInputFromArray(TensorShape({4}), data); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + // Create a MockComputeSession and set expectations. + std::unique_ptr mock_session(new MockComputeSession()); + MockComputeSession *mock_session_ptr = mock_session.get(); + + // Wrap the ComputeSessionResource and put it into the resource manager. + TF_ASSERT_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(std::move(mock_session)))); + + // Set up mocks. + EXPECT_CALL(*mock_session_ptr, SetInputData(ElementsAreArray(data))); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); +} + +// The SetTracingOp should pass its argument through to the underlying +// ComputeSession. +TEST_F(DragnnOpKernelsTest, SetTracingOpTest) { + // Create and initialize the kernel under test. + const string component_name = "TESTING_COMPONENT_NAME"; + TF_ASSERT_OK( + NodeDefBuilder("set_tracing", "SetTracing") + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Input(FakeInput(DT_BOOL)) // The boolean to set tracing to. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data. 
+ const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + constexpr bool kSetTracing = true; + AddInputFromList(TensorShape({1}), {kSetTracing}); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + // Create a MockComputeSession and set expectations. + std::unique_ptr mock_session(new MockComputeSession()); + MockComputeSession *mock_session_ptr = mock_session.get(); + + // Wrap the ComputeSessionResource and put it into the resource manager. + TF_ASSERT_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(std::move(mock_session)))); + + // Set expectations on the mock session. + EXPECT_CALL(*mock_session_ptr, SetTracing(kSetTracing)); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); +} + +// The WriteAnnotations op should call FinalizeData on the current component. +TEST_F(DragnnOpKernelsTest, WriteAnnotationsOpTest) { + // Create and initialize the kernel under test. + const string component_name = "TESTING_COMPONENT_NAME"; + TF_ASSERT_OK( + NodeDefBuilder("write_annotations", "WriteAnnotations") + .Attr("component", component_name) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data. + const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + // Create a MockComputeSession and set expectations. + std::unique_ptr mock_session(new MockComputeSession()); + MockComputeSession *mock_session_ptr = mock_session.get(); + + // Wrap the ComputeSessionResource and put it into the resource manager. 
+ TF_ASSERT_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(std::move(mock_session)))); + + // Set expectations on the mock session. + EXPECT_CALL(*mock_session_ptr, FinalizeData(component_name)); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); +} + +// The EmitAnnotations op should return a vector of annotated strings as +// described below. +TEST_F(DragnnOpKernelsTest, EmitAnnotationsOpTest) { + // Create and initialize the kernel under test. + const string component_name = "TESTING_COMPONENT_NAME"; + TF_ASSERT_OK( + NodeDefBuilder("emit_annotations", "EmitAnnotations") + .Attr("component", component_name) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data. + const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + std::unique_ptr mock_session(new MockComputeSession()); + MockComputeSession *mock_session_ptr = mock_session.get(); + + constexpr int kBatchSize = 2; + std::vector predictions({"one", "two"}); + + EXPECT_CALL(*mock_session_ptr, BatchSize(component_name)) + .WillRepeatedly(Return(kBatchSize)); + EXPECT_CALL(*mock_session_ptr, GetSerializedPredictions()) + .WillOnce(Return(predictions)); + + // The output vector is batch_size. + const std::vector expected_output({"one", "two"}); + + // Wrap the ComputeSessionResource and put it into the resource manager. + TF_ASSERT_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(std::move(mock_session)))); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); + + // Validate the outputs. 
+ EXPECT_EQ(expected_output.size(), GetOutput(0)->NumElements()); + for (int i = 0; i < expected_output.size(); ++i) { + EXPECT_EQ(expected_output[i], GetOutput(0)->vec()(i)); + } +} + +// The GetComponentTrace op should return a vector of serialized trace protos. +TEST_F(DragnnOpKernelsTest, GetComponentTraceOpTest) { + // Create and initialize the kernel under test. + const string component_name = "TESTING_COMPONENT_NAME"; + TF_ASSERT_OK( + NodeDefBuilder("get_component_trace", "GetComponentTrace") + .Attr("component", component_name) + .Input(FakeInput(DT_STRING)) // The handle for the ComputeSession. + .Finalize(node_def())); + TF_ASSERT_OK(InitOp()); + + // Set the input data. + const string container_string = "container_str"; + const string id_string = "id_str"; + AddInputFromList(TensorShape({2}), {container_string, id_string}); + + // Reset the test context to ensure it's clean. + ResetOpKernelContext(); + + std::unique_ptr mock_session(new MockComputeSession()); + MockComputeSession *mock_session_ptr = mock_session.get(); + + // This op will request a set of MasterTraces from GetTraceProtos(), then + // return them. + + MasterTrace trace; + auto component_trace = trace.add_component_trace(); + component_trace->set_name("arbitrary_component_name_for_html"); + auto component_trace_2 = trace.add_component_trace(); + component_trace_2->set_name("arbitrary_component_name_2_for_html"); + const std::vector master_traces({trace}); + + EXPECT_CALL(*mock_session_ptr, GetTraceProtos()) + .WillOnce(Return(master_traces)); + + // Wrap the ComputeSessionResource and put it into the resource manager. + TF_ASSERT_OK(resource_mgr()->Create( + container_string, id_string, + new ComputeSessionResource(std::move(mock_session)))); + + // Run the kernel. + TF_EXPECT_OK(RunOpKernelWithContext()); + + // Validate the outputs. 
+ EXPECT_EQ(master_traces.size(), GetOutput(0)->NumElements()); + for (int i = 0; i < master_traces.size(); ++i) { + string expected; + master_traces.at(i).SerializeToString(&expected); + EXPECT_EQ(expected, GetOutput(0)->vec()(i)); + } +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/ops/dragnn_ops.cc b/syntaxnet/dragnn/core/ops/dragnn_ops.cc new file mode 100644 index 0000000000000000000000000000000000000000..f2bd653b0d906a8eaa663275fce9c439a43efe24 --- /dev/null +++ b/syntaxnet/dragnn/core/ops/dragnn_ops.cc @@ -0,0 +1,260 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "tensorflow/core/framework/op.h" + +namespace syntaxnet { +namespace dragnn { + +REGISTER_OP("GetSession") + .Input("container: string") + .Attr("master_spec: string") + .Attr("grid_point: string") + .Output("handle: string") + .SetIsStateful() + .Doc(R"doc( +Given MasterSpec and GridPoint protos, outputs a handle to a ComputeSession. + +container: A unique identifier for the ComputeSessionPool from which a + ComputeSession will be allocated. +master_spec: A serialized syntaxnet.dragnn.MasterSpec proto. +grid_point: A serialized syntaxnet.dragnn.GridPoint proto. +handle: A string handle to a ComputeSession. 
+)doc"); + +REGISTER_OP("ReleaseSession").Input("handle: string").SetIsStateful().Doc(R"doc( +Given a ComputeSession, return it to the ComputeSession pool. + +This ComputeSession will no longer be available after this op returns. + +handle: A handle to a ComputeSession that will be returned to the backing pool. +)doc"); + +REGISTER_OP("InitComponentData") + .Input("handle: string") + .Input("beam_size: int32") + .Attr("component: string") + .Output("output_handle: string") + .Doc(R"doc( +Initialize a component with the given beam size for a given ComputeSession. + +handle: A handle to a ComputeSession. +beam_size: The size of the beam to use on the component. +component: The name of a Component instance, matching the ComponentSpec.name. +output_handle: The handle to the same ComputeSession after initialization. +)doc"); + +REGISTER_OP("BatchSize") + .Input("handle: string") + .Attr("component: string") + .Output("batch_size: int32") + .Doc(R"doc( +Given a ComputeSession and a component name, return the component batch size. + +handle: A handle to a ComputeSession. +component: The name of a Component instance, matching the ComponentSpec.name. +batch_size: The size of the given component's batch. +)doc"); + +REGISTER_OP("SetTracing") + .Input("handle: string") + .Input("tracing_on: bool") + .Attr("component: string = 'NOT_USED_FOR_THIS_OP'") + .Output("output_handle: string") + .Doc(R"doc( +Given a ComputeSession, turns on or off tracing for all components. + +handle: A handle to a ComputeSession. +tracing_on: Whether or not to record traces. +output_handle: The handle to the same ComputeSession, with the tracing status changed. +)doc"); + +REGISTER_OP("AttachDataReader") + .Input("handle: string") + .Input("input_spec: string") + .Attr("component: string = 'NOT_USED_FOR_THIS_OP'") + .Output("output_handle: string") + .Doc(R"doc( +Given a ComputeSession, attach a data source. + +This op is agnostic to the type of input data. 
The vector of input strings is +interpreted by the backend. + +handle: A handle to a ComputeSession. +input_spec: A vector of strings, where each string represents one batch item. +output_handle: The handle to the same ComputeSession after attachment. +)doc"); + +REGISTER_OP("AdvanceFromOracle") + .Input("handle: string") + .Attr("component: string") + .Output("output_handle: string") + .Doc(R"doc( +Given a ComputeSession and a Component name, advance the component via oracle. + +handle: A handle to a ComputeSession. +component: The name of a Component instance, matching the ComponentSpec.name. +output_handle: The handle to the same ComputeSession after advancement. +)doc"); + +REGISTER_OP("AdvanceFromPrediction") + .Input("handle: string") + .Input("scores: float") + .Attr("component: string") + .Output("output_handle: string") + .Doc(R"doc( +Given a ComputeSession, a Component name, and a score tensor, advance the state. + +handle: A handle to a ComputeSession. +scores: A tensor of scores, ordered by {batch_size, beam_size, num_actions}. +component: The name of a Component instance, matching the ComponentSpec.name. +output_handle: A handle to the same ComputeSession after advancement. +)doc"); + +REGISTER_OP("DragnnEmbeddingInitializer") + .Output("embeddings: float") + .Attr("embedding_input: string") + .Attr("vocab: string") + .Attr("scaling_coefficient: float = 1.0") + .Attr("seed: int = 0") + .Attr("seed2: int = 0") + .Doc(R"doc( +*** PLACEHOLDER OP - FUNCTIONALITY NOT YET IMPLEMENTED *** + +Read embeddings from an input for every key specified in a text vocab file. + +embeddings: A tensor containing embeddings from the specified sstable. +embedding_input: Path to location with embedding vectors. +vocab: Path to list of keys corresponding to the input. +scaling_coefficient: A scaling coefficient for the embedding matrix. +seed: If either `seed` or `seed2` are set to be non-zero, the random number + generator is seeded by the given seed. 
Otherwise, it is seeded by a + random seed. +seed2: A second seed to avoid seed collision. +)doc"); + +REGISTER_OP("ExtractFixedFeatures") + .Input("handle: string") + .Output("indices: int32") + .Output("ids: int64") + .Output("weights: float") + .Attr("component: string") + .Attr("channel_id: int") + .Doc(R"doc( +Given a ComputeSession, Component, and channel index, output fixed features. + +Fixed features are returned as 3 vectors, 'indices', 'ids', and 'weights', of equal +length. 'ids' specifies which rows should be looked up in the embedding +matrix. 'weights' specifies a scale for each embedding vector. 'indices' is a +sorted vector that assigns the same index to embedding vectors that should be +summed together. + +handle: A handle to a ComputeSession. +indices: The row to add the feature to. +ids: The indices into embedding matrices for each feature. +weights: The weight for each looked up feature. +component: The name of a Component instance, matching the ComponentSpec.name. +channel_id: The feature channel to extract features for. +)doc"); + +REGISTER_OP("ExtractLinkFeatures") + .Input("handle: string") + .Output("step_idx: int32") + .Output("idx: int32") + .Attr("component: string") + .Attr("channel_id: int") + .Doc(R"doc( +Given a ComputeSession, Component, and a channel index, outputs link features. + +Output indices have shape {batch_size * beam_size * channel_size}. + +handle: A handle to a ComputeSession. +step_idx: The step indices to read activations from. +idx: The index within a step to read the activations from. +component: The name of a Component instance, matching the ComponentSpec.name. +channel_id: The feature channel to extract features for. +)doc"); + +REGISTER_OP("EmitOracleLabels") + .Input("handle: string") + .Output("gold_labels: int32") + .Attr("component: string") + .Doc(R"doc( +Given a ComputeSession and Component, emit a vector of gold labels. + +handle: A handle to a ComputeSession. 
+gold_labels: A [batch_size * beam_size] vector of gold labels for the current + ComputeSession. +component: The name of a Component instance, matching the ComponentSpec.name. +)doc"); + +REGISTER_OP("EmitAllFinal") + .Input("handle: string") + .Output("all_final: bool") + .Attr("component: string") + .Doc(R"doc( +Given a ComputeSession and Component, returns whether the Component is final. + +A component is considered final when all elements in the batch have beams +containing all final states. + +handle: A handle to a ComputeSession. +all_final: Whether every element in the specified component is 'final'. +component: The name of a Component instance, matching the ComponentSpec.name. +)doc"); + +REGISTER_OP("WriteAnnotations") + .Input("handle: string") + .Output("output_handle: string") + .Attr("component: string") + .Doc(R"doc( +Given a ComputeSession, has the given component write out its annotations. + +The annotations are written to the underlying data objects passed in at the +beginning of the computation. + +handle: A handle to a ComputeSession. +output_handle: A handle to the same ComputeSession after writing. +component: The name of a Component instance, matching the ComponentSpec.name. +)doc"); + +REGISTER_OP("EmitAnnotations") + .Input("handle: string") + .Output("annotations: string") + .Attr("component: string") + .Doc(R"doc( +Given a ComputeSession, emits strings with final predictions for the model. + +Predictions are given for each element in the final component's batch. + +handle: A handle to a ComputeSession. +annotations: A vector of strings representing the annotated data. +component: The name of a Component instance, matching the ComponentSpec.name. +)doc"); + +REGISTER_OP("GetComponentTrace") + .Input("handle: string") + .Output("trace: string") + .Attr("component: string") + .Doc(R"doc( +Gets the raw MasterTrace proto for each batch, state, and beam slot. + +handle: A handle to a ComputeSession. +trace: A vector of MasterTrace protos. 
+component: The name of a Component instance, matching the ComponentSpec.name. +)doc"); + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/resource_container.h b/syntaxnet/dragnn/core/resource_container.h new file mode 100644 index 0000000000000000000000000000000000000000..7ca72a05ccf97ec4f8d61d536a1ffb8a1b6d7379 --- /dev/null +++ b/syntaxnet/dragnn/core/resource_container.h @@ -0,0 +1,51 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_RESOURCE_CONTAINER_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_RESOURCE_CONTAINER_H_ + +#include + +#include "syntaxnet/base.h" +#include "tensorflow/core/framework/resource_mgr.h" + +namespace syntaxnet { +namespace dragnn { + +using tensorflow::strings::StrCat; + +// Wrapper to store a data type T in the ResourceMgr. There should be one per +// Session->Run() call that may happen concurrently. 
+template +class ResourceContainer : public tensorflow::ResourceBase { + public: + explicit ResourceContainer(std::unique_ptr data) + : data_(std::move(data)) {} + + ~ResourceContainer() override {} + + T *get() { return data_.get(); } + std::unique_ptr release() { return std::move(data_); } + + string DebugString() override { return "ResourceContainer"; } + + private: + std::unique_ptr data_; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_RESOURCE_CONTAINER_H_ diff --git a/syntaxnet/dragnn/core/resource_container_test.cc b/syntaxnet/dragnn/core/resource_container_test.cc new file mode 100644 index 0000000000000000000000000000000000000000..10316c0554f0d65d0c06d99788d431b5d2cda7ae --- /dev/null +++ b/syntaxnet/dragnn/core/resource_container_test.cc @@ -0,0 +1,64 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +// Tests the methods of ResourceContainer. +// +// NOTE(danielandor): For all tests: ResourceContainer is derived from +// RefCounted, which requires the use of Unref to reduce the ref count +// to zero and automatically delete the pointer. 
+#include "dragnn/core/resource_container.h" + +#include +#include "tensorflow/core/platform/test.h" + +namespace syntaxnet { +namespace dragnn { + +class MockDatatype {}; + +TEST(ResourceContainerTest, Get) { + std::unique_ptr data(new MockDatatype()); + MockDatatype *data_ptr = data.get(); + auto *container = new ResourceContainer(std::move(data)); + EXPECT_EQ(data_ptr, container->get()); + container->Unref(); +} + +TEST(ResourceContainerTest, Release) { + std::unique_ptr data(new MockDatatype()); + MockDatatype *data_ptr = data.get(); + auto *container = new ResourceContainer(std::move(data)); + std::unique_ptr data_again = container->release(); + container->Unref(); + EXPECT_EQ(data_ptr, data_again.get()); +} + +TEST(ResourceContainerTest, NullptrOnGetAfterRelease) { + std::unique_ptr data(new MockDatatype()); + auto *container = new ResourceContainer(std::move(data)); + container->release(); + EXPECT_EQ(nullptr, container->get()); + container->Unref(); +} + +TEST(ResourceContainerTest, DebugString) { + std::unique_ptr data(new MockDatatype()); + auto *container = new ResourceContainer(std::move(data)); + EXPECT_EQ("ResourceContainer", container->DebugString()); + container->Unref(); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/test/BUILD b/syntaxnet/dragnn/core/test/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..9810bd2533d0af457064fa94ff8c2635e1650bd9 --- /dev/null +++ b/syntaxnet/dragnn/core/test/BUILD @@ -0,0 +1,56 @@ +package( + default_visibility = ["//visibility:public"], + features = ["-layering_check"], +) + +cc_library( + name = "mock_component", + testonly = True, + hdrs = ["mock_component.h"], + deps = [ + "//dragnn/components/util:bulk_feature_extractor", + "//dragnn/core:index_translator", + "//dragnn/core/interfaces:component", + "//dragnn/core/interfaces:transition_state", + "//dragnn/protos:data_proto", + "//dragnn/protos:spec_proto", + "//syntaxnet:base", + 
"//syntaxnet:test_main", + ], +) + +cc_library( + name = "mock_compute_session", + testonly = True, + hdrs = ["mock_compute_session.h"], + deps = [ + "//dragnn/components/util:bulk_feature_extractor", + "//dragnn/core:compute_session", + "//dragnn/protos:data_proto", + "//dragnn/protos:spec_proto", + "//syntaxnet:base", + "//syntaxnet:test_main", + ], +) + +cc_library( + name = "mock_transition_state", + testonly = True, + hdrs = ["mock_transition_state.h"], + deps = [ + "//dragnn/core/interfaces:transition_state", + "//syntaxnet:base", + "//syntaxnet:test_main", + ], +) + +cc_library( + name = "generic", + testonly = True, + srcs = ["generic.cc"], + hdrs = ["generic.h"], + deps = [ + "//syntaxnet:base", + "//syntaxnet:test_main", + ], +) diff --git a/syntaxnet/dragnn/core/test/generic.cc b/syntaxnet/dragnn/core/test/generic.cc new file mode 100644 index 0000000000000000000000000000000000000000..6e7cddbcec67cbd5a82d45ccce31046cd5c365d6 --- /dev/null +++ b/syntaxnet/dragnn/core/test/generic.cc @@ -0,0 +1,36 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#include "dragnn/core/test/generic.h" + +#include "tensorflow/core/lib/io/path.h" + +namespace syntaxnet { +namespace test { + +string GetTestDataPrefix() { + const char *env = getenv("TEST_SRCDIR"); + const char *workspace = getenv("TEST_WORKSPACE"); + if (!env || env[0] == '\0' || !workspace || workspace[0] == '\0') { + LOG(FATAL) << "Test directories not set up"; + } + return tensorflow::io::JoinPath( + + env, workspace + ); +} + +} // namespace test +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/core/test/generic.h b/syntaxnet/dragnn/core/test/generic.h new file mode 100644 index 0000000000000000000000000000000000000000..5624856bbc7c82d087655bf0b7fab4feddfb8209 --- /dev/null +++ b/syntaxnet/dragnn/core/test/generic.h @@ -0,0 +1,40 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_TEST_GENERIC_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_TEST_GENERIC_H_ + +#include + +#include + +#include "syntaxnet/base.h" +#include "tensorflow/core/platform/protobuf.h" +#include "tensorflow/core/platform/test.h" + +namespace syntaxnet { +namespace test { + +MATCHER_P(EqualsProto, a, "Protos are not equivalent:") { + return a.DebugString() == arg.DebugString(); +} + +// Returns the prefix for where the test data is stored. +string GetTestDataPrefix(); + +} // namespace test +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_TEST_GENERIC_H_ diff --git a/syntaxnet/dragnn/core/test/mock_component.h b/syntaxnet/dragnn/core/test/mock_component.h new file mode 100644 index 0000000000000000000000000000000000000000..74d9986b4bdf9689572561b06ad7eddf7f0e7e65 --- /dev/null +++ b/syntaxnet/dragnn/core/test/mock_component.h @@ -0,0 +1,78 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_TEST_MOCK_COMPONENT_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_TEST_MOCK_COMPONENT_H_ + +#include + +#include "dragnn/components/util/bulk_feature_extractor.h" +#include "dragnn/core/index_translator.h" +#include "dragnn/core/interfaces/component.h" +#include "dragnn/core/interfaces/transition_state.h" +#include "dragnn/protos/data.pb.h" +#include "dragnn/protos/spec.pb.h" +#include "syntaxnet/base.h" +#include "tensorflow/core/platform/test.h" + +namespace syntaxnet { +namespace dragnn { + +class MockComponent : public Component { + public: + MOCK_METHOD1(InitializeComponent, void(const ComponentSpec &spec)); + MOCK_METHOD3( + InitializeData, + void(const std::vector> &states, + int max_beam_size, InputBatchCache *input_data)); + MOCK_CONST_METHOD0(IsReady, bool()); + MOCK_METHOD0(InitializeTracing, void()); + MOCK_METHOD0(DisableTracing, void()); + MOCK_CONST_METHOD0(Name, string()); + MOCK_CONST_METHOD0(BatchSize, int()); + MOCK_CONST_METHOD0(BeamSize, int()); + MOCK_CONST_METHOD1(StepsTaken, int(int batch_index)); + MOCK_CONST_METHOD3(GetBeamIndexAtStep, + int(int step, int current_index, int batch)); + MOCK_CONST_METHOD2(GetSourceBeamIndex, int(int current_index, int batch)); + MOCK_METHOD2(AdvanceFromPrediction, + void(const float transition_matrix[], int matrix_length)); + MOCK_METHOD0(AdvanceFromOracle, void()); + MOCK_CONST_METHOD0(IsTerminal, bool()); + MOCK_METHOD0(GetBeam, std::vector>()); + MOCK_CONST_METHOD4(GetFixedFeatures, + int(std::function allocate_indices, + std::function allocate_ids, + std::function allocate_weights, + int channel_id)); + MOCK_METHOD1(BulkGetFixedFeatures, + int(const BulkFeatureExtractor &extractor)); + MOCK_CONST_METHOD1(GetRawLinkFeatures, + std::vector(int channel_id)); + MOCK_CONST_METHOD0(GetOracleLabels, std::vector>()); + MOCK_METHOD0(ResetComponent, void()); + 
MOCK_METHOD1(GetStepLookupFunction, + std::function(const string &method)); + MOCK_METHOD0(FinalizeData, void()); + MOCK_CONST_METHOD0(GetTraceProtos, + std::vector>()); + MOCK_METHOD2(AddTranslatedLinkFeaturesToTrace, + void(const std::vector &features, int channel_id)); +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_TEST_MOCK_COMPONENT_H_ diff --git a/syntaxnet/dragnn/core/test/mock_compute_session.h b/syntaxnet/dragnn/core/test/mock_compute_session.h new file mode 100644 index 0000000000000000000000000000000000000000..8df455c4eb2aecf8531a67659a234b189111ceb0 --- /dev/null +++ b/syntaxnet/dragnn/core/test/mock_compute_session.h @@ -0,0 +1,76 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_TEST_MOCK_COMPUTE_SESSION_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_TEST_MOCK_COMPUTE_SESSION_H_ + +#include + +#include "dragnn/components/util/bulk_feature_extractor.h" +#include "dragnn/core/compute_session.h" +#include "dragnn/protos/data.pb.h" +#include "dragnn/protos/spec.pb.h" +#include "syntaxnet/base.h" +#include "tensorflow/core/platform/test.h" + +namespace syntaxnet { +namespace dragnn { + +class MockComputeSession : public ComputeSession { + public: + MOCK_METHOD2(Init, void(const MasterSpec &master_spec, + const GridPoint &hyperparams)); + MOCK_METHOD2(InitializeComponentData, + void(const string &component_name, int max_beam_size)); + MOCK_CONST_METHOD1(BatchSize, int(const string &component_name)); + MOCK_CONST_METHOD1(BeamSize, int(const string &component_name)); + MOCK_CONST_METHOD1(Spec, const ComponentSpec &(const string &component_name)); + MOCK_METHOD2(SourceComponentBeamSize, + int(const string &component_name, int channel_id)); + MOCK_METHOD1(AdvanceFromOracle, void(const string &component_name)); + MOCK_METHOD3(AdvanceFromPrediction, + void(const string &component_name, const float score_matrix[], + int score_matrix_length)); + MOCK_CONST_METHOD5(GetInputFeatures, + int(const string &component_name, + std::function allocate_indices, + std::function allocate_ids, + std::function allocate_weights, + int channel_id)); + MOCK_METHOD2(BulkGetInputFeatures, + int(const string &component_name, + const BulkFeatureExtractor &extractor)); + MOCK_METHOD2(GetTranslatedLinkFeatures, + std::vector(const string &component_name, + int channel_id)); + MOCK_METHOD1(EmitOracleLabels, + std::vector>(const string &component_name)); + MOCK_METHOD1(IsTerminal, bool(const string &component_name)); + MOCK_METHOD1(FinalizeData, void(const string &component_name)); + MOCK_METHOD0(GetSerializedPredictions, std::vector()); + 
MOCK_METHOD0(GetTraceProtos, std::vector()); + MOCK_METHOD1(SetInputData, void(const std::vector &data)); + MOCK_METHOD0(ResetSession, void()); + MOCK_METHOD1(SetTracing, void(bool tracing_on)); + MOCK_CONST_METHOD0(Id, int()); + MOCK_CONST_METHOD1(GetDescription, string(const string &component_name)); + MOCK_CONST_METHOD1(Translators, const std::vector( + const string &component_name)); +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_TEST_MOCK_COMPUTE_SESSION_H_ diff --git a/syntaxnet/dragnn/core/test/mock_transition_state.h b/syntaxnet/dragnn/core/test/mock_transition_state.h new file mode 100644 index 0000000000000000000000000000000000000000..a6737cb3ef47e12b6a2ba08d355d4076f80389bb --- /dev/null +++ b/syntaxnet/dragnn/core/test/mock_transition_state.h @@ -0,0 +1,45 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_CORE_TEST_MOCK_TRANSITION_STATE_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_CORE_TEST_MOCK_TRANSITION_STATE_H_ + +#include + +#include + +#include "dragnn/core/interfaces/transition_state.h" +#include "syntaxnet/base.h" +#include "tensorflow/core/platform/test.h" + +namespace syntaxnet { +namespace dragnn { + +class MockTransitionState : public TransitionState { + public: + MOCK_METHOD1(Init, void(const TransitionState &parent)); + MOCK_CONST_METHOD0(Clone, std::unique_ptr()); + MOCK_CONST_METHOD0(ParentBeamIndex, const int()); + MOCK_METHOD1(SetBeamIndex, void(const int index)); + MOCK_CONST_METHOD0(GetBeamIndex, const int()); + MOCK_CONST_METHOD0(GetScore, const float()); + MOCK_METHOD1(SetScore, void(const float score)); + MOCK_CONST_METHOD0(HTMLRepresentation, string()); +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_CORE_TEST_MOCK_TRANSITION_STATE_H_ diff --git a/syntaxnet/dragnn/core/testdata/brain-parser-model b/syntaxnet/dragnn/core/testdata/brain-parser-model new file mode 100644 index 0000000000000000000000000000000000000000..0ad278116f27a30400b9add779f99ba848fce803 Binary files /dev/null and b/syntaxnet/dragnn/core/testdata/brain-parser-model differ diff --git a/syntaxnet/dragnn/core/testdata/master_spec_link.textproto b/syntaxnet/dragnn/core/testdata/master_spec_link.textproto new file mode 100644 index 0000000000000000000000000000000000000000..aea09690c5f6c9f31684a87909167838fa03da02 --- /dev/null +++ b/syntaxnet/dragnn/core/testdata/master_spec_link.textproto @@ -0,0 +1,86 @@ +component { + name: "parser" + num_actions: 100 + transition_system { + registered_name: "arc-standard" + parameters { + key: "entity_name_tokenizer" + value: "pre-tokenized" + } + parameters { + key: "language" + value: "en" + } + parameters { + key: "neurosis_feature_syntax_version" + value: "2" + } + parameters { + 
key: "parser_skip_deterministic" + value: "false" + } + parameters { + key: "parser_transition_system" + value: "arc-standard" + } + } + resource { + name: "tag-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.tag-map" + file_format: "text" + } + } + resource { + name: "word-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.word-map" + file_format: "text" + } + } + resource { + name: "label-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.label-map" + file_format: "text" + } + } + fixed_feature { + name: "tags" + fml: "input.tag stack.tag stack(1).tag" + embedding_dim: 32 + vocabulary_size: 51 + size: 3 + predicate_map: "hashed" + } + fixed_feature { + name: "words" + fml: "input.word" + embedding_dim: 64 + vocabulary_size: 39396 + size: 1 + predicate_map: "hashed" + } + linked_feature { + name: "rnn" + fml: "stack.focus" + embedding_dim: 32 + size: 1 + source_component: "parser" + source_translator: "shift-reduce-step" + source_layer: "layer_0" + } + backend { + registered_name: 'SyntaxNetComponent' + } + component_builder { + registered_name: 'DynamicComponentBuilder' + } + network_unit { + registered_name: 'FeedForwardNetwork' + parameters { + key: 'hidden_layer_sizes' + value: '64' + } + } +} diff --git a/syntaxnet/dragnn/core/testdata/repository b/syntaxnet/dragnn/core/testdata/repository new file mode 100644 index 0000000000000000000000000000000000000000..1bb1180529fce623cdc07a704372318fe7ba9efb Binary files /dev/null and b/syntaxnet/dragnn/core/testdata/repository differ diff --git a/syntaxnet/dragnn/core/testdata/simple-tagger.brain-parser-model b/syntaxnet/dragnn/core/testdata/simple-tagger.brain-parser-model new file mode 100644 index 0000000000000000000000000000000000000000..8fa7dfca794eeb6c1584a25f87ad2b1fb2f9b34e Binary files /dev/null and b/syntaxnet/dragnn/core/testdata/simple-tagger.brain-parser-model differ diff --git a/syntaxnet/dragnn/core/testdata/simple-tagger.repository 
b/syntaxnet/dragnn/core/testdata/simple-tagger.repository new file mode 100644 index 0000000000000000000000000000000000000000..b3cc9e04b07d009b0faff002d8c02c4ca3a84449 Binary files /dev/null and b/syntaxnet/dragnn/core/testdata/simple-tagger.repository differ diff --git a/syntaxnet/dragnn/core/testdata/simple-tagger.tag-map b/syntaxnet/dragnn/core/testdata/simple-tagger.tag-map new file mode 100644 index 0000000000000000000000000000000000000000..8fac8c5c6f1a02cb6a1ec3b40fdec276e56a1f3b --- /dev/null +++ b/syntaxnet/dragnn/core/testdata/simple-tagger.tag-map @@ -0,0 +1,46 @@ +45 +NN 132998 +IN 98554 +NNP 91466 +DT 81832 +JJ 61217 +NNS 59856 +, 48727 +. 39478 +CD 36568 +RB 30907 +VBD 29889 +VB 26438 +CC 23959 +TO 22357 +VBZ 21672 +VBN 20024 +PRP 17436 +VBG 14846 +VBP 12491 +MD 9803 +POS 8701 +PRP$ 8407 +$ 7372 +`` 7092 +'' 6919 +: 4772 +WDT 4294 +JJR 3238 +NNPS 2673 +RP 2662 +WP 2363 +WRB 2143 +JJS 1947 +RBR 1768 +-RRB- 1376 +-LRB- 1366 +EX 863 +RBS 451 +PDT 368 +FW 234 +WP$ 168 +# 142 +UH 97 +SYM 58 +LS 36 diff --git a/syntaxnet/dragnn/core/testdata/simple_parser_master_spec.textproto b/syntaxnet/dragnn/core/testdata/simple_parser_master_spec.textproto new file mode 100644 index 0000000000000000000000000000000000000000..2f98236c206345e6eaa53f0967a5f412e1de04f7 --- /dev/null +++ b/syntaxnet/dragnn/core/testdata/simple_parser_master_spec.textproto @@ -0,0 +1,59 @@ +component { + name: "parser" + num_actions : 93 + transition_system { + registered_name: "arc-standard" + } + resource { + name: "tag-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.tag-map" + file_format: "text" + } + } + resource { + name: "word-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.word-map" + file_format: "text" + } + } + resource { + name: "label-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.label-map" + file_format: "text" + } + } + fixed_feature { + name: "words" + fml: "input(-1).word input(-2).word input(-3).word input.word input(1).word input(2).word 
input(3).word" + embedding_dim: 64 + vocabulary_size: 39397 + size: 7 + } + linked_feature { + name: "rnn" + fml: "stack.focus stack(1).focus" + embedding_dim: 32 + size: 2 + source_component: "parser" + source_translator: "shift-reduce-step" + source_layer: "layer_0" + } + backend { + registered_name: "SyntaxNetComponent" + } + component_builder { + registered_name: 'DynamicComponentBuilder' + } + network_unit { + registered_name: 'FeedForwardNetwork' + parameters { + key: 'hidden_layer_sizes' + value: '64' + } + } + + inference_beam_size: 4 +} diff --git a/syntaxnet/dragnn/core/testdata/simple_tagger_lstm_master_spec.textproto b/syntaxnet/dragnn/core/testdata/simple_tagger_lstm_master_spec.textproto new file mode 100644 index 0000000000000000000000000000000000000000..a418f7e2d7f675d99f8d39a6a5dc4dd5d43d5c3e --- /dev/null +++ b/syntaxnet/dragnn/core/testdata/simple_tagger_lstm_master_spec.textproto @@ -0,0 +1,52 @@ +component { + name: "tagger" + num_actions : 49 + transition_system { + registered_name: "tagger" + parameters { + key: "join_category_to_pos" + value: "true" + } + } + resource { + name: "tag-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.tag-map" + file_format: "text" + } + } + resource { + name: "word-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.word-map" + file_format: "text" + } + } + resource { + name: "label-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.label-map" + file_format: "text" + } + } + fixed_feature { + name: "words" + fml: "input(-1).word input(-2).word input(-3).word input.word input(1).word input(2).word input(3).word" + embedding_dim: 64 + vocabulary_size: 39397 + size: 7 + } + backend { + registered_name: "SyntaxNetComponent" + } + component_builder { + registered_name: "DynamicComponentBuilder" + } + network_unit { + registered_name: "LSTMNetwork" + parameters { + key: "hidden_layer_sizes" + value: "64" + } + } +} diff --git a/syntaxnet/dragnn/core/testdata/simple_tagger_master_spec.textproto 
b/syntaxnet/dragnn/core/testdata/simple_tagger_master_spec.textproto new file mode 100644 index 0000000000000000000000000000000000000000..502286335a58794d569a9313b87feeea6b37b37a --- /dev/null +++ b/syntaxnet/dragnn/core/testdata/simple_tagger_master_spec.textproto @@ -0,0 +1,63 @@ +component { + name: "tagger" + num_actions : 49 + transition_system { + registered_name: "tagger" + parameters { + key: "join_category_to_pos" + value: "true" + } + } + resource { + name: "tag-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.tag-map" + file_format: "text" + } + } + resource { + name: "word-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.word-map" + file_format: "text" + } + } + resource { + name: "label-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.label-map" + file_format: "text" + } + } + fixed_feature { + name: "words" + fml: "input(-1).word input(-2).word input(-3).word input.word input(1).word input(2).word input(3).word" + embedding_dim: 64 + vocabulary_size: 39397 + size: 7 + } + linked_feature { + name: "rnn" + fml: "stack.focus" + embedding_dim: 32 + size: 1 + source_component: "tagger" + source_translator: "shift-reduce-step" + source_layer: "layer_0" + } + backend { + registered_name: "SyntaxNetComponent" + } + component_builder { + registered_name: "DynamicComponentBuilder" + } + network_unit { + registered_name: "FeedForwardNetwork" + parameters { + key: "hidden_layer_sizes" + value: "64" + } + } + training_beam_size: 1 + inference_beam_size: 3 +} diff --git a/syntaxnet/dragnn/core/testdata/simple_tagger_wrapped_lstm_master_spec.textproto b/syntaxnet/dragnn/core/testdata/simple_tagger_wrapped_lstm_master_spec.textproto new file mode 100644 index 0000000000000000000000000000000000000000..24d9e35abd4936f96709f2a5e808462d9facdaa8 --- /dev/null +++ b/syntaxnet/dragnn/core/testdata/simple_tagger_wrapped_lstm_master_spec.textproto @@ -0,0 +1,65 @@ +component { + name: "tagger" + num_actions : 49 + transition_system { + registered_name: 
"tagger" + parameters { + key: "join_category_to_pos" + value: "true" + } + } + resource { + name: "tag-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.tag-map" + file_format: "text" + } + } + resource { + name: "word-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.word-map" + file_format: "text" + } + } + resource { + name: "label-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.label-map" + file_format: "text" + } + } + fixed_feature { + name: "words" + fml: "input(-1).word input(-2).word input(-3).word input.word input(1).word input(2).word input(3).word" + embedding_dim: 64 + vocabulary_size: 39397 + size: 7 + predicate_map: "hashed" + } + backend { + registered_name: "SyntaxNetComponent" + } + component_builder { + registered_name: "DynamicComponentBuilder" + } + network_unit { + registered_name: "wrapped_units.LayerNormBasicLSTMNetwork" + parameters { + key: "hidden_layer_sizes" + value: "64,64,64" + } + parameters { + key: "input_dropout_rate" + value: "0.9" + } + parameters { + key: "recurrent_dropout_rate" + value: "0.9" + } + parameters { + key: "layer_norm" + value: "true" + } + } +} diff --git a/syntaxnet/dragnn/core/testdata/split_tagger_master_spec.textproto b/syntaxnet/dragnn/core/testdata/split_tagger_master_spec.textproto new file mode 100644 index 0000000000000000000000000000000000000000..388405c744ca2eacc97a036570f4e4d3ebef5e45 --- /dev/null +++ b/syntaxnet/dragnn/core/testdata/split_tagger_master_spec.textproto @@ -0,0 +1,111 @@ +component { + name: "tagger-features" + num_actions : 49 + transition_system { + registered_name: "tagger" + parameters { + key: "join_category_to_pos" + value: "true" + } + } + resource { + name: "tag-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.tag-map" + file_format: "text" + } + } + resource { + name: "word-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.word-map" + file_format: "text" + } + } + resource { + name: "label-map" + part { + file_pattern: 
"TESTDATA/syntaxnet_tagger.label-map" + file_format: "text" + } + } + fixed_feature { + name: "words" + fml: "input(-1).word input(-2).word input(-3).word input.word input(1).word input(2).word input(3).word" + embedding_dim: 64 + vocabulary_size: 39397 + size: 7 + } + linked_feature { + name: "rnn" + fml: "stack.focus" + embedding_dim: 32 + size: 1 + source_component: "tagger-features" + source_translator: "shift-reduce-step" + source_layer: "layer_0" + } + backend { + registered_name: "SyntaxNetComponent" + } + component_builder { + registered_name: "DynamicComponentBuilder" + } + network_unit { + registered_name: "FeedForwardNetwork" + parameters { + key: "hidden_layer_sizes" + value: "64" + } + } +} +component { + name: "tagger" + num_actions : 49 + transition_system { + registered_name: "tagger" + parameters { + key: "join_category_to_pos" + value: "true" + } + } + resource { + name: "tag-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.tag-map" + file_format: "text" + } + } + resource { + name: "word-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.word-map" + file_format: "text" + } + } + resource { + name: "label-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.label-map" + file_format: "text" + } + } + linked_feature { + name: "features" + fml: "input.focus" + embedding_dim: -1 + size: 1 + source_component: "tagger-features" + source_translator: "identity" + source_layer: "logits" + } + backend { + registered_name: "SyntaxNetComponent" + } + component_builder { + registered_name: "DynamicComponentBuilder" + } + network_unit { + registered_name: "IdentityNetwork" + } +} diff --git a/syntaxnet/dragnn/core/testdata/syntaxnet_tagger.label-map b/syntaxnet/dragnn/core/testdata/syntaxnet_tagger.label-map new file mode 100644 index 0000000000000000000000000000000000000000..8fdd1fc86d9f33e2e639d794bb2b719a0767bc75 --- /dev/null +++ b/syntaxnet/dragnn/core/testdata/syntaxnet_tagger.label-map @@ -0,0 +1,47 @@ +46 +punct 243160 +prep 194627 
+pobj 186958 +det 170592 +nsubj 144821 +nn 144800 +amod 117242 +ROOT 90592 +dobj 88551 +aux 76523 +advmod 72893 +conj 59384 +cc 57532 +num 36350 +poss 35117 +dep 34986 +ccomp 29470 +cop 25991 +mark 25141 +xcomp 25111 +rcmod 16234 +auxpass 15740 +advcl 14996 +possessive 14866 +nsubjpass 14133 +pcomp 12488 +appos 11112 +partmod 11106 +neg 11090 +number 10658 +prt 7123 +quantmod 6653 +tmod 5418 +infmod 5134 +npadvmod 3213 +parataxis 3012 +mwe 2793 +expl 2712 +iobj 1642 +acomp 1632 +discourse 1381 +csubj 1225 +predet 1160 +preconj 749 +goeswith 146 +csubjpass 41 diff --git a/syntaxnet/dragnn/core/testdata/syntaxnet_tagger.tag-map b/syntaxnet/dragnn/core/testdata/syntaxnet_tagger.tag-map new file mode 100644 index 0000000000000000000000000000000000000000..2cad1a73b010ace29854dc80296c79728e9b3c52 --- /dev/null +++ b/syntaxnet/dragnn/core/testdata/syntaxnet_tagger.tag-map @@ -0,0 +1,50 @@ +49 +NN 285194 +IN 228165 +DT 179147 +NNP 175147 +JJ 125667 +NNS 115732 +, 97481 +. 85938 +RB 78513 +VB 63952 +CC 57554 +VBD 56635 +CD 55674 +PRP 55244 +VBZ 48126 +VBN 44458 +VBG 34524 +VBP 33669 +TO 28772 +MD 22364 +PRP$ 20706 +HYPH 18526 +POS 14905 +`` 12193 +'' 12154 +WDT 10267 +: 8713 +$ 7993 +WP 7336 +RP 7335 +WRB 6634 +JJR 6295 +NNPS 5917 +-RRB- 3904 +-LRB- 3840 +JJS 3596 +RBR 3186 +EX 2733 +UH 1521 +RBS 1467 +PDT 1271 +FW 928 +NFP 844 +SYM 652 +ADD 476 +LS 392 +WP$ 332 +GW 184 +AFX 42 diff --git a/syntaxnet/dragnn/core/testdata/syntaxnet_tagger.word-map b/syntaxnet/dragnn/core/testdata/syntaxnet_tagger.word-map new file mode 100644 index 0000000000000000000000000000000000000000..4b9e22b6f7ddd35776803739392c31328612f2b7 --- /dev/null +++ b/syntaxnet/dragnn/core/testdata/syntaxnet_tagger.word-map @@ -0,0 +1,4 @@ +3 +sentence 4 +. 
3 +0 2 diff --git a/syntaxnet/dragnn/core/testdata/tagger_parser_master_spec.textproto b/syntaxnet/dragnn/core/testdata/tagger_parser_master_spec.textproto new file mode 100644 index 0000000000000000000000000000000000000000..5a60df0044c3f8cbf395f19742690b117af86280 --- /dev/null +++ b/syntaxnet/dragnn/core/testdata/tagger_parser_master_spec.textproto @@ -0,0 +1,185 @@ +component { + name: "features" + num_actions : 1 + transition_system { + registered_name: "shift-only" + } + resource { + name: "tag-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.tag-map" + file_format: "text" + } + } + resource { + name: "word-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.word-map" + file_format: "text" + } + } + resource { + name: "label-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.label-map" + file_format: "text" + } + } + fixed_feature { + name: "words" + fml: "input(-1).word input(-2).word input(-3).word input.word input(1).word input(2).word input(3).word" + embedding_dim: 64 + vocabulary_size: 39397 + size: 7 + } + backend { + registered_name: "SyntaxNetComponent" + } + component_builder { + registered_name: "DynamicComponentBuilder" + } + network_unit { + registered_name: "IdentityNetwork" + } + inference_beam_size: 1 +} +component { + name: "tagger" + num_actions : 49 + transition_system { + registered_name: "tagger" + parameters { + key: "join_category_to_pos" + value: "true" + } + } + resource { + name: "tag-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.tag-map" + file_format: "text" + } + } + resource { + name: "word-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.word-map" + file_format: "text" + } + } + resource { + name: "label-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.label-map" + file_format: "text" + } + } + linked_feature { + name: "words" + fml: "input.focus" + embedding_dim: -1 + size: 1 + source_component: "features" + source_translator: "identity" + source_layer: "input_embeddings" + } + 
linked_feature { + name: "rnn" + fml: "stack.focus" + embedding_dim: 32 + size: 1 + source_component: "tagger" + source_translator: "shift-reduce-step" + source_layer: "layer_0" + } + backend { + registered_name: "SyntaxNetComponent" + } + component_builder { + registered_name: "DynamicComponentBuilder" + } + network_unit { + registered_name: "FeedForwardNetwork" + parameters { + key: "hidden_layer_sizes" + value: "64" + } + } + inference_beam_size: 1 +} +component { + name: "parser" + num_actions : 93 + transition_system { + registered_name: "arc-standard" + } + resource { + name: "tag-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.tag-map" + file_format: "text" + } + } + resource { + name: "word-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.word-map" + file_format: "text" + } + } + resource { + name: "label-map" + part { + file_pattern: "TESTDATA/syntaxnet_tagger.label-map" + file_format: "text" + } + } + fixed_feature { + name: "action" + fml: "last-action" + embedding_dim: 32 + vocabulary_size: 93 + size: 1 + } + linked_feature { + name: "words" + fml: "input.focus" + embedding_dim: -1 + size: 1 + source_component: "features" + source_translator: "identity" + source_layer: "input_embeddings" + } + linked_feature { + name: "tagger" + fml: "stack.focus" + embedding_dim: 32 + size: 1 + source_component: "tagger" + source_translator: "identity" + source_layer: "layer_0" + } + linked_feature { + name: "rnn" + fml: "stack.focus" + embedding_dim: 32 + size: 1 + source_component: "parser" + source_translator: "shift-reduce-step" + source_layer: "layer_0" + } + backend { + registered_name: "SyntaxNetComponent" + } + component_builder { + registered_name: "DynamicComponentBuilder" + } + network_unit { + registered_name: "FeedForwardNetwork" + parameters { + key: "hidden_layer_sizes" + value: "64" + } + } + inference_beam_size: 5 +} diff --git a/syntaxnet/dragnn/core/testdata/ud-hungarian.master-spec 
b/syntaxnet/dragnn/core/testdata/ud-hungarian.master-spec new file mode 100644 index 0000000000000000000000000000000000000000..9090c225974211b598c3e941db53c8965c0e2bf8 --- /dev/null +++ b/syntaxnet/dragnn/core/testdata/ud-hungarian.master-spec @@ -0,0 +1,213 @@ +component { + name: "rl_rnn" + transition_system { + registered_name: "shift-only" + parameters { + key: "left-to-right" + value: "false" + } + parameters { + key: "parser_skip_deterministic" + value: "false" + } + } + resource { + name: "char-ngram-map" + part { + file_pattern: "TOPDIR/ud-hungarian.char-ngram-map" + file_format: "text" + record_format: "" + } + } + resource { + name: "word-map" + part { + file_pattern: "TOPDIR/ud-hungarian.word-map" + file_format: "text" + record_format: "" + } + } + resource { + name: "label-map" + part { + file_pattern: "TOPDIR/ud-hungarian.label-map" + file_format: "text" + record_format: "" + } + } + fixed_feature { + name: "char_ngram" + fml: "input.token.char-ngram" + embedding_dim: 16 + vocabulary_size: 9943 + size: 1 + } + fixed_feature { + name: "other" + fml: "input.token {digit hyphen punctuation-amount quote }" + embedding_dim: 8 + vocabulary_size: 5 + size: 4 + } + fixed_feature { + name: "words" + fml: "input.word" + embedding_dim: 64 + vocabulary_size: 11090 + size: 1 + } + network_unit { + registered_name: "wrapped_units.LayerNormBasicLSTMNetwork" + parameters { + key: "hidden_layer_sizes" + value: "256" + } + } + component_builder { + registered_name: 'DynamicComponentBuilder' + } + backend { + registered_name: "SyntaxNetComponent" + } + num_actions: 1 + attention_component: "" +} +component { + name: "tagger" + transition_system { + registered_name: "tagger" + parameters { + key: "join_category_to_pos" + value: "true" + } + parameters { + key: "parser_skip_deterministic" + value: "false" + } + } + resource { + name: "tag-map" + part { + file_pattern: "TOPDIR/ud-hungarian.tag-map" + file_format: "text" + record_format: "" + } + } + resource { + name: 
"label-map" + part { + file_pattern: "TOPDIR/ud-hungarian.label-map" + file_format: "text" + record_format: "" + } + } + fixed_feature { + name: "action" + fml: "last-action" + embedding_dim: 32 + vocabulary_size: 100 + size: 1 + } + linked_feature { + name: "encoder" + fml: "input.focus" + embedding_dim: 64 + size: 1 + source_component: "rl_rnn" + source_translator: "reverse-token" + source_layer: "state_h_0" + } + network_unit { + registered_name: "wrapped_units.LayerNormBasicLSTMNetwork" + parameters { + key: "hidden_layer_sizes" + value: "256" + } + } + component_builder { + registered_name: 'DynamicComponentBuilder' + } + backend { + registered_name: "SyntaxNetComponent" + } + num_actions: 642 + attention_component: "" +} +component { + name: "parser" + transition_system { + registered_name: "arc-standard" + parameters { + key: "parser_skip_deterministic" + value: "false" + } + } + resource { + name: "label-map" + part { + file_pattern: "TOPDIR/ud-hungarian.label-map" + file_format: "text" + record_format: "" + } + } + fixed_feature { + name: "action" + fml: "last-action" + embedding_dim: 32 + vocabulary_size: 100 + size: 1 + } + fixed_feature { + name: "labels" + fml: "stack.child(1).label stack.child(1).sibling(-1).label stack.child(-1).label stack.child(-1).sibling(1).label stack(1).child(1).label stack(1).child(1).sibling(-1).label stack(1).child(-1).label stack(1).child(-1).sibling(1).label stack.child(2).label stack.child(-2).label stack(1).child(2).label stack(1).child(-2).label" + embedding_dim: 16 + vocabulary_size: 57 + size: 12 + } + linked_feature { + name: "encoder" + fml: "input.focus" + embedding_dim: 64 + size: 1 + source_component: "rl_rnn" + source_translator: "reverse-token" + source_layer: "state_h_0" + } + linked_feature { + name: "parser-rnn" + fml: "stack.focus stack(1).focus" + embedding_dim: 64 + size: 2 + source_component: "parser" + source_translator: "shift-reduce-step" + source_layer: "layer_0" + } + linked_feature { + name: 
"tagger" + fml: "input.focus stack.focus stack(1).focus" + embedding_dim: 64 + size: 3 + source_component: "tagger" + source_translator: "identity" + source_layer: "state_h_0" + } + network_unit { + registered_name: 'FeedForwardNetwork' + parameters { + key: "hidden_layer_sizes" + value: "256,256" + } + parameters { + key: "layer_norm_hidden" + value: "True" + } + } + component_builder { + registered_name: 'DynamicComponentBuilder' + } + backend { + registered_name: "SyntaxNetComponent" + } + num_actions: 109 + attention_component: "" +} diff --git a/syntaxnet/dragnn/io/BUILD b/syntaxnet/dragnn/io/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..6a101679045fb366cd48217eaeecaec86b061c91 --- /dev/null +++ b/syntaxnet/dragnn/io/BUILD @@ -0,0 +1,34 @@ +package(default_visibility = ["//visibility:public"]) + +cc_library( + name = "sentence_input_batch", + srcs = ["sentence_input_batch.cc"], + hdrs = ["sentence_input_batch.h"], + deps = [ + ":syntaxnet_sentence", + "//dragnn/core/interfaces:input_batch", + "//syntaxnet:base", + "//syntaxnet:sentence_proto", + ], +) + +cc_library( + name = "syntaxnet_sentence", + hdrs = ["syntaxnet_sentence.h"], + deps = [ + "//syntaxnet:sentence_proto", + "//syntaxnet:workspace", + ], +) + +cc_test( + name = "sentence_input_batch_test", + srcs = ["sentence_input_batch_test.cc"], + deps = [ + ":sentence_input_batch", + "//dragnn/core/test:generic", + "//syntaxnet:sentence_proto", + "//syntaxnet:test_main", + "@org_tensorflow//tensorflow/core:test", + ], +) diff --git a/syntaxnet/dragnn/io/sentence_input_batch.cc b/syntaxnet/dragnn/io/sentence_input_batch.cc new file mode 100644 index 0000000000000000000000000000000000000000..8ea595c344ac311cdb365d0f6426c82e68b668d3 --- /dev/null +++ b/syntaxnet/dragnn/io/sentence_input_batch.cc @@ -0,0 +1,46 @@ +// Copyright 2017 Google Inc. All Rights Reserved. 
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "dragnn/io/sentence_input_batch.h" + +#include "syntaxnet/sentence.pb.h" + +namespace syntaxnet { +namespace dragnn { + +void SentenceInputBatch::SetData( + const std::vector<string> &stringified_sentence_protos) { + for (const auto &stringified_proto : stringified_sentence_protos) { + std::unique_ptr<Sentence> sentence(new Sentence); + std::unique_ptr<WorkspaceSet> workspace_set(new WorkspaceSet); + CHECK(sentence->ParseFromString(stringified_proto)) + << "Unable to parse string input as syntaxnet.Sentence."; + SyntaxNetSentence aug_sentence(std::move(sentence), + std::move(workspace_set)); + data_.push_back(std::move(aug_sentence)); + } +} + +const std::vector<string> SentenceInputBatch::GetSerializedData() const { + std::vector<string> output_data; + output_data.resize(data_.size()); + for (int i = 0; i < data_.size(); ++i) { + data_[i].sentence()->SerializeToString(&(output_data[i])); + } + return output_data; +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/io/sentence_input_batch.h b/syntaxnet/dragnn/io/sentence_input_batch.h new file mode 100644 index 0000000000000000000000000000000000000000..7c355813a770c0ba650fe154a33ff7dbfabcf4db --- /dev/null +++ b/syntaxnet/dragnn/io/sentence_input_batch.h @@ -0,0 +1,52 @@ +// Copyright 2017 Google Inc. All Rights Reserved. 
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_IO_SENTENCE_INPUT_BATCH_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_IO_SENTENCE_INPUT_BATCH_H_ + +#include <string> +#include <vector> + +#include "dragnn/core/interfaces/input_batch.h" +#include "dragnn/io/syntaxnet_sentence.h" +#include "syntaxnet/base.h" + +namespace syntaxnet { +namespace dragnn { + +// Data accessor backed by a syntaxnet::Sentence object. +class SentenceInputBatch : public InputBatch { + public: + SentenceInputBatch() {} + + // Translates from a vector of stringified Sentence protos. + void SetData( + const std::vector<string> &stringified_sentence_protos) override; + + // Translates to a vector of stringified Sentence protos. + const std::vector<string> GetSerializedData() const override; + + // Get the underlying Sentences. + std::vector<SyntaxNetSentence> *data() { return &data_; } + + private: + // The backing Sentence protos. + std::vector<SyntaxNetSentence> data_; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_IO_SENTENCE_INPUT_BATCH_H_ diff --git a/syntaxnet/dragnn/io/sentence_input_batch_test.cc b/syntaxnet/dragnn/io/sentence_input_batch_test.cc new file mode 100644 index 0000000000000000000000000000000000000000..0a3f6e69ed28f38c3d370e16af35a99a7115ebaa --- /dev/null +++ b/syntaxnet/dragnn/io/sentence_input_batch_test.cc @@ -0,0 +1,69 @@ +// Copyright 2017 Google Inc. All Rights Reserved. 
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// ============================================================================= + +#include "dragnn/io/sentence_input_batch.h" + +#include "dragnn/core/test/generic.h" +#include "syntaxnet/sentence.pb.h" +#include "tensorflow/core/platform/test.h" + +namespace syntaxnet { +namespace dragnn { + +using syntaxnet::test::EqualsProto; + +TEST(SentenceInputBatchTest, ConvertsFromStringifiedProtos) { + // Create some distinct Sentence protos. + Sentence sentence_one; + sentence_one.set_docid("foo"); + Sentence sentence_two; + sentence_two.set_docid("bar"); + std::vector<Sentence> protos({sentence_one, sentence_two}); + + // Create stringified versions. + std::vector<string> strings; + for (const auto &sentence : protos) { + string str; + sentence.SerializeToString(&str); + strings.push_back(str); + } + + // Create a SentenceInputBatch. The data inside it should match the protos. + SentenceInputBatch set; + set.SetData(strings); + auto converted_data = set.data(); + for (int i = 0; i < protos.size(); ++i) { + EXPECT_THAT(*(converted_data->at(i).sentence()), EqualsProto(protos.at(i))); + EXPECT_NE(converted_data->at(i).workspace(), nullptr); + } + + // Get the data back out. The strings should be identical. 
+ auto output = set.GetSerializedData(); + EXPECT_EQ(output.size(), strings.size()); + EXPECT_NE(output.size(), 0); + for (int i = 0; i < output.size(); ++i) { + EXPECT_EQ(strings.at(i), output.at(i)); + } +} + +TEST(SentenceInputBatchTest, BadlyFormedProtosDie) { + // Create an input batch with malformed data. This should cause a CHECK fail. + SentenceInputBatch set; + EXPECT_DEATH(set.SetData({"BADLY FORMATTED DATA. SHOULD CAUSE A CHECK"}), + "Unable to parse string input"); +} + +} // namespace dragnn +} // namespace syntaxnet diff --git a/syntaxnet/dragnn/io/syntaxnet_sentence.h b/syntaxnet/dragnn/io/syntaxnet_sentence.h new file mode 100644 index 0000000000000000000000000000000000000000..d9076b44f1646f4400f87e4dc6affbd4bcbf6f69 --- /dev/null +++ b/syntaxnet/dragnn/io/syntaxnet_sentence.h @@ -0,0 +1,42 @@ +// Copyright 2017 Google Inc. All Rights Reserved. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+// ============================================================================= + +#ifndef NLP_SAFT_OPENSOURCE_DRAGNN_IO_SYNTAXNET_SENTENCE_H_ +#define NLP_SAFT_OPENSOURCE_DRAGNN_IO_SYNTAXNET_SENTENCE_H_ + +#include "syntaxnet/sentence.pb.h" +#include "syntaxnet/workspace.h" + +namespace syntaxnet { +namespace dragnn { + +class SyntaxNetSentence { + public: + SyntaxNetSentence(std::unique_ptr<Sentence> sentence, + std::unique_ptr<WorkspaceSet> workspace) + : sentence_(std::move(sentence)), workspace_(std::move(workspace)) {} + + Sentence *sentence() const { return sentence_.get(); } + WorkspaceSet *workspace() const { return workspace_.get(); } + + private: + std::unique_ptr<Sentence> sentence_; + std::unique_ptr<WorkspaceSet> workspace_; +}; + +} // namespace dragnn +} // namespace syntaxnet + +#endif // NLP_SAFT_OPENSOURCE_DRAGNN_IO_SYNTAXNET_SENTENCE_H_ diff --git a/syntaxnet/dragnn/protos/BUILD b/syntaxnet/dragnn/protos/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..1559232760bb8a70651667c514829daf98d827f0 --- /dev/null +++ b/syntaxnet/dragnn/protos/BUILD @@ -0,0 +1,43 @@ +package(default_visibility = ["//visibility:public"]) + +load( + "//syntaxnet:syntaxnet.bzl", + "tf_proto_library", + "tf_proto_library_py", +) + +# Protos. 
+ +tf_proto_library( + name = "data_proto", + srcs = ["data.proto"], +) + +tf_proto_library( + name = "trace_proto", + srcs = ["trace.proto"], + deps = [ + ":data_proto", + ], +) + +tf_proto_library( + name = "spec_proto", + srcs = ["spec.proto"], +) + +tf_proto_library_py( + name = "data_py_pb2", + srcs = ["data.proto"], +) + +tf_proto_library_py( + name = "trace_py_pb2", + srcs = ["trace.proto"], + deps = [":data_py_pb2"], +) + +tf_proto_library_py( + name = "spec_py_pb2", + srcs = ["spec.proto"], +) diff --git a/syntaxnet/dragnn/protos/data.proto b/syntaxnet/dragnn/protos/data.proto new file mode 100644 index 0000000000000000000000000000000000000000..222bfea5726ec29e5fc16852e4d78b774f37eae3 --- /dev/null +++ b/syntaxnet/dragnn/protos/data.proto @@ -0,0 +1,39 @@ +// DRAGNN data proto. See go/dragnn-design for more information. + +syntax = "proto2"; + +package syntaxnet.dragnn; + +// A fixed sparse bag of features in DRAGNN. The id, weight, and description +// fields are all aligned if present (i.e., any of these that are non-empty should +// have the same # items). If weight is omitted, 1.0 is used. +// +// These features are interpreted as multiple firings of a single feature +// template: e.g., for a single focus word, a bag of ngrams. +message FixedFeatures { + repeated uint64 id = 1; + repeated float weight = 2; + + // string-valued description of each *feature value*. (Only used for + // debugging.) + repeated string value_name = 3; + + // string-valued name of feature. (Only used for debugging.) + optional string feature_name = 4; +} + +// A feature in DRAGNN that links a component to another or a component to +// itself recurrently. If batch_idx or beam_idx are omitted, 0 is used. +message LinkFeatures { + // Index into the {step x batch x beam} activations workspace generated by + // the previous computation. + optional int64 batch_idx = 1; + optional int64 beam_idx = 2; + optional int64 step_idx = 3; + + // Values in the original feature space. 
This is ignored in TensorFlow. + optional int64 feature_value = 4; + + // string-valued name of feature. (Only used for debugging.) + optional string feature_name = 5; +} diff --git a/syntaxnet/dragnn/protos/spec.proto b/syntaxnet/dragnn/protos/spec.proto new file mode 100644 index 0000000000000000000000000000000000000000..e4d4fd31f7c15636d60adfbe2252f95d24449749 --- /dev/null +++ b/syntaxnet/dragnn/protos/spec.proto @@ -0,0 +1,272 @@ +// DRAGNN Configuration proto. See go/dragnn-design for more information. + +syntax = "proto2"; + +package syntaxnet.dragnn; + +// Proto to specify a set of DRAGNN components (transition systems) that are +// trained and evaluated jointly. Each component gets one ComponentSpec. +// +// The order of component is important: a component can only link to components +// that come before (for now.) +// NEXT ID: 6 +message MasterSpec { + repeated ComponentSpec component = 1; + + // Whether to extract debug traces. + optional bool debug_tracing = 4 [default = false]; + + reserved 2, 3, 5; +} + +// Complete specification for a single task. +message ComponentSpec { + // Name for this component: this is used in linked features via the + // "source_component" field. + optional string name = 1; + + // TransitionSystem to use. + optional RegisteredModuleSpec transition_system = 2; + + // Resources that this component depends on. These are copied to TaskInputs + // when calling SAFT code. + repeated Resource resource = 3; + + // Feature space configurations. + repeated FixedFeatureChannel fixed_feature = 4; + repeated LinkedFeatureChannel linked_feature = 5; + + // Neural Network builder specification. + optional RegisteredModuleSpec network_unit = 6; + + // The registered C++ implementation of the dragnn::Component class; e.g. + // "SyntaxNetComponent". + optional RegisteredModuleSpec backend = 7; + + // Number of possible actions from every state. 
+ optional int32 num_actions = 8; + + // Specify the name of the lower level component on which it has attention. + optional string attention_component = 9 [default = ""]; + + // Options for the ComponentBuilder. If this is empty, the regular + // tf.while_loop based builder is assumed. + optional RegisteredModuleSpec component_builder = 10; + + // Default max number of active states for beam training. + optional int32 training_beam_size = 11 [default = 1]; + + // Default max number of active states for beam inference. + optional int32 inference_beam_size = 12 [default = 1]; +} + +// Super generic container for any registered sub-piece of DRAGNN. +message RegisteredModuleSpec { + // Name of the registered class. + optional string registered_name = 1; + + // Parameters to set while initializing this system; these are copied to + // Parameters in a TaskSpec when calling SAFT code, or via kwargs in TF Python + // code. + map<string, string> parameters = 2; +} + +// Fixed resources that will be converted into TaskInput's when calling SAFT +// code. +message Resource { + optional string name = 1; + repeated Part part = 2; +} + +// The Parts here should be more or less compatible with TaskInput. +message Part { + optional string file_pattern = 1; + optional string file_format = 2; + optional string record_format = 3; +} + +// ------------------------------------------------------------------------ +// Feature specifications. +// +// A *feature channel* is a named collection of feature templates that share an +// embedding matrix. Thus all features in the channel are assumed to use the +// same vocabulary: e.g., words, POS tags, hidden layer activations, etc. These +// are extracted, embedded, and then concatenated together as a group. + +// Specification for a feature channel that is a *fixed* function of the input. +// NEXT_ID: 10 +message FixedFeatureChannel { + // Interpretable name for this feature channel. 
NN builders might depend on + // this to determine how to hook different channels up internally. + optional string name = 1; + + // String describing the FML for this feature channel. + optional string fml = 2; + + // Size of parameters for this space: + + // Dimensions of embedding space, or -1 if the feature should not be embedded. + optional int32 embedding_dim = 3; + + // No. of possible values returned. + optional int32 vocabulary_size = 4; + + // No. of different feature templates in the channel, i.e. the # of features + // that will be concatenated but share the embedding for this channel. + optional int32 size = 5; + + // Whether the embeddings for this channel should be held constant at their + // pretrained values, instead of being trained. Pretrained embeddings are + // required when true. + optional bool is_constant = 9; + + // Resources for this space: + + // Predicate map for compacting feature values. + optional string predicate_map = 6; + + // Pointer to a pretrained embedding matrix for this feature set. + optional Resource pretrained_embedding_matrix = 7; + + // Vocab file, containing all vocabulary words one per line. + optional Resource vocab = 8; +} + +// Specification for a feature channel that *links* to component +// activations. Note that the "vocabulary" of these features is the activations +// that they are linked to, so it is determined by the other components in the +// spec. +message LinkedFeatureChannel { + // Interpretable name for this feature channel. NN builders might depend on + // this to determine how to hook different channels up internally. + optional string name = 1; + + // Feature function specification. Note: these should all be of type + // LinkedFeatureType. + optional string fml = 2; + + // Embedding dimension, or -1 if the link should not be embedded. + optional int32 embedding_dim = 3; + + // No. of different feature templates in the channel, i.e. 
the # of features + // that will be concatenated but share the embedding for this channel. + optional int32 size = 4; + + // Component to use for translation, e.g. "tagger" + optional string source_component = 5; + + // Translator target, e.g. "token" or "last_action", to translate raw feature + // values into indices. This must be interpretable by the Component referenced + // by source_component. + optional string source_translator = 6; + + // Layer that these features should connect to. + optional string source_layer = 7; +} + +// A vector of hyperparameter configurations to search over. +message TrainingGridSpec { + // Grid points to search over. + repeated GridPoint grid_point = 1; + + // Training targets to create in the graph builder stage. + repeated TrainTarget target = 2; +} + +// A hyperparameter configuration for a training run. +// NEXT ID: 22 +message GridPoint { + // Global learning rate initialization point. + optional double learning_rate = 1 [default = 0.1]; + + // Momentum coefficient when using MomentumOptimizer. + optional double momentum = 2 [default = 0.9]; + + // Decay rate and base for global learning rate decay. The learning rate is + // reduced by a factor of |decay_base| every |decay_steps|. + optional double decay_base = 16 [default = 0.96]; + optional int32 decay_steps = 3 [default = 1000]; + + // Whether to decay the learning rate in a "staircase" manner. If true, the + // rate is adjusted exactly once every |decay_steps|. Otherwise, the rate is + // adjusted in smaller increments on every step, such that the overall rate of + // decay is still |decay_base| every |decay_steps|. + optional bool decay_staircase = 17 [default = true]; + + // Random seed to initialize parameters. + optional int32 seed = 4 [default = 0]; + + // Specify the optimizer used in training, the default is MomentumOptimizer. + optional string learning_method = 7 [default = 'momentum']; + + // Whether or not to use a moving average of the weights in inference time. 
+ optional bool use_moving_average = 8 [default = false]; + + // Rolling average update co-efficient. + optional double average_weight = 9 [default = 0.9999]; + + // The dropout *keep* probability rate used in the model. 1.0 = no dropout. + optional double dropout_rate = 10 [default = 1.0]; + + // The dropout *keep* probability rate for recurrent connections. If < 0.0, + // recurrent connections should use |dropout_rate| instead. 1.0 = no dropout. + optional double recurrent_dropout_rate = 20 [default = -1.0]; + + // Gradient clipping threshold, applied if greater than zero. A value in the + // range 1-20 seems to work well to prevent large learning rates from causing + // problems for updates at the start of training. + optional double gradient_clip_norm = 11 [default = 0.0]; + + // A spec for using multiple optimization methods. + message CompositeOptimizerSpec { + // First optimizer. + optional GridPoint method1 = 1; + + // Second optimizer. + optional GridPoint method2 = 2; + + // After this number of steps, switch from first to second. + optional int32 switch_after_steps = 3; + } + optional CompositeOptimizerSpec composite_optimizer_spec = 12; + + // Parameters for Adam training. + optional double adam_beta1 = 13 [default = 0.01]; + optional double adam_beta2 = 14 [default = 0.9999]; + optional double adam_eps = 15 [default = 1e-8]; + + // Coefficient for global L2 regularization. + optional double l2_regularization_coefficient = 18 [default = 1e-4]; + + // Coefficient for global self normalization regularization. + // A value of zero turns it off. + optional double self_norm_alpha = 19 [default = 0.0]; + + // Comma separated list of components to which self_norm_alpha + // should be restricted. If left empty, no filtering will take + // place. Typically a single component. + optional string self_norm_components_filter = 21; + + reserved 5, 6; +} + +// Training target to be built into the graph. +message TrainTarget { + // Name for this target. 
This should be unique across all targets. + optional string name = 1; + + // Specify the weights for different components. This should be the same size + // as the number of components in the spec, or empty (defaults to equal + // weights). Weights are normalized across the components being trained to sum + // to one. + repeated double component_weights = 2; + + // Specify whether to train a component using supervised signal or not. This + // should be the same size as the number of components in the spec, or empty + // (defaults to all true). + repeated bool unroll_using_oracle = 3; + + // Maximum length of the pipeline to train. E.g. if max_index is 1, then only + // the first component will be trained via this target. + optional int32 max_index = 4 [default = -1]; +} diff --git a/syntaxnet/dragnn/protos/trace.proto b/syntaxnet/dragnn/protos/trace.proto new file mode 100644 index 0000000000000000000000000000000000000000..2da051fe93baca72dd8e0e17a80fa5e76986407e --- /dev/null +++ b/syntaxnet/dragnn/protos/trace.proto @@ -0,0 +1,78 @@ +syntax = "proto2"; + +import "dragnn/protos/data.proto"; + + +package syntaxnet.dragnn; + +// Describes single embedding "group", e.g., 'words', 'tags'. Each group shares +// an embedding space. +message FixedFeatureChannelTrace { + // string-valued name of the group, e.g., 'words'. + optional string name = 1; + + // The feature functions active in this embedding group. + repeated FixedFeatures value_trace = 2; +} + +// Trace for an entire linked feature channel. +message LinkedFeatureChannelTrace { + // Name of the embedding space. + optional string name = 1; + + // The component that this feature links to. + optional string source_component = 2; + + // The string-valued name of the translator function that maps a feature value + // to a step index. + optional string source_translator = 3; + + // The name of the layer that we are extracting from the identified step. 
+ optional string source_layer = 4; + + // Individual features within this group. + repeated LinkFeatures value_trace = 5; +} + +// The trace for a single step of a single Component. +message ComponentStepTrace { + // A caption/description to describe this step. This should fit in a graphical + // node rendered to the screen. + optional string caption = 1; + + repeated FixedFeatureChannelTrace fixed_feature_trace = 2; + repeated LinkedFeatureChannelTrace linked_feature_trace = 3; + + // An *HTML-language* representation of the current state. + optional string html_representation = 4; + + // The scores for each potential decision. (The mapping from index to name is + // managed by the component.) + repeated double outcome_score = 5; + + // Set to true once the step is finished. (This allows us to open a step after + // each transition, without having to know if it will be used.) + optional bool step_finished = 6 [default = false]; +} + +// The traces for all steps for a single Component. +message ComponentTrace { + // Name of the component; should match the ComponentSpec. + optional string name = 1; + + // The steps that have been taken by this Component. + repeated ComponentStepTrace step_trace = 2; +} + +// The traces for all Components. +message MasterTrace { + repeated ComponentTrace component_trace = 1; +} + +// Main proto being used to trace parsing. +message DragnnTrace { + + // For each sentence, there is a sequence of state sets storing tracing + // information. 
+ repeated MasterTrace master_trace = 1; +} diff --git a/syntaxnet/dragnn/python/BUILD b/syntaxnet/dragnn/python/BUILD new file mode 100644 index 0000000000000000000000000000000000000000..c784ce4e4155d68e3fc8a41acd0fe7dbbe96179f --- /dev/null +++ b/syntaxnet/dragnn/python/BUILD @@ -0,0 +1,374 @@ +package(default_visibility = ["//visibility:public"]) + +cc_binary( + name = "dragnn_cc_impl.so", + linkopts = select({ + "//conditions:default": ["-lm"], + "@org_tensorflow//tensorflow:darwin": [], + }), + linkshared = 1, + linkstatic = 1, + deps = [ + "//dragnn/components/stateless:stateless_component", + "//dragnn/components/syntaxnet:syntaxnet_component", + "//dragnn/core:dragnn_bulk_ops_cc", + "//dragnn/core:dragnn_ops_cc", + ], +) + +py_library( + name = "load_dragnn_cc_impl_py", + srcs = ["load_dragnn_cc_impl.py"], + data = [":dragnn_cc_impl.so"], +) + +py_library( + name = "bulk_component", + srcs = [ + "bulk_component.py", + ], + deps = [ + ":dragnn_ops", + ":network_units", + "//syntaxnet/util:check", + "@org_tensorflow//tensorflow:tensorflow_py", + "@org_tensorflow//tensorflow/core:protos_all_py", + ], +) + +py_library( + name = "components", + srcs = [ + "component.py", + ], + deps = [ + ":bulk_component", + ":dragnn_ops", + ":network_units", + "//syntaxnet/util:check", + "//syntaxnet/util:pyregistry", + "@org_tensorflow//tensorflow:tensorflow_py", + "@org_tensorflow//tensorflow/core:protos_all_py", + ], +) + +py_library( + name = "composite_optimizer", + srcs = ["composite_optimizer.py"], + deps = [ + "@org_tensorflow//tensorflow:tensorflow_py", + "@org_tensorflow//tensorflow/core:protos_all_py", + ], +) + +py_library( + name = "dragnn_ops", + srcs = ["dragnn_ops.py"], + deps = [], +) + +py_library( + name = "graph_builder", + srcs = ["graph_builder.py"], + deps = [ + ":biaffine_units", + ":components", + ":composite_optimizer", + ":dragnn_ops", + ":network_units", + ":wrapped_units", + "//dragnn/protos:spec_py_pb2", + "//syntaxnet/util:check", + 
"@org_tensorflow//tensorflow:tensorflow_py", + "@org_tensorflow//tensorflow/core:protos_all_py", + ], +) + +py_library( + name = "network_units", + srcs = ["network_units.py"], + deps = [ + ":dragnn_ops", + "//syntaxnet/util:check", + "//syntaxnet/util:pyregistry", + "@org_tensorflow//tensorflow:tensorflow_py", + "@org_tensorflow//tensorflow/core:protos_all_py", + ], +) + +py_library( + name = "render_parse_tree_graphviz", + srcs = ["render_parse_tree_graphviz.py"], + deps = [ + ], +) + +py_test( + name = "render_parse_tree_graphviz_test", + srcs = ["render_parse_tree_graphviz_test.py"], + deps = [ + ":render_parse_tree_graphviz", + "//syntaxnet:sentence_py_pb2", + "@org_tensorflow//tensorflow:tensorflow_py", + ], +) + +py_library( + name = "render_spec_with_graphviz", + srcs = ["render_spec_with_graphviz.py"], + deps = [ + "//dragnn/protos:spec_py_pb2", + ], +) + +py_test( + name = "render_spec_with_graphviz_test", + srcs = ["render_spec_with_graphviz_test.py"], + deps = [ + ":render_spec_with_graphviz", + ":spec_builder", + "@org_tensorflow//tensorflow:tensorflow_py", + ], +) + +py_library( + name = "sentence_io", + srcs = ["sentence_io.py"], + deps = [ + "//syntaxnet:parser_ops", + ], +) + +py_binary( + name = "visualization", + srcs = ["visualization.py"], + data = [ + "//dragnn/viz:viz-min-js-gz", + ], + deps = [ + "//dragnn/protos:trace_py_pb2", + ], +) + +py_test( + name = "visualization_test", + srcs = ["visualization_test.py"], + deps = [ + ":visualization", + "//dragnn/protos:spec_py_pb2", + "//dragnn/protos:trace_py_pb2", + "@org_tensorflow//tensorflow:tensorflow_py", + ], +) + +py_library( + name = "wrapped_units", + srcs = ["wrapped_units.py"], + deps = [ + ":network_units", + "//syntaxnet/util:check", + "@org_tensorflow//tensorflow:tensorflow_py", + "@org_tensorflow//tensorflow/core:protos_all_py", + ], +) + +# Tests + +py_test( + name = "bulk_component_test", + srcs = [ + "bulk_component_test.py", + ], + deps = [ + ":bulk_component", + ":components", 
+ ":dragnn_ops", + ":load_dragnn_cc_impl_py", + ":network_units", + "//dragnn/core:dragnn_bulk_ops", + "//dragnn/core:dragnn_ops", + "//dragnn/protos:spec_py_pb2", + "//syntaxnet:load_parser_ops_py", + "//syntaxnet:sentence_py_pb2", + "@org_tensorflow//tensorflow:tensorflow_py", + "@org_tensorflow//tensorflow/core:protos_all_py", + ], +) + +py_test( + name = "composite_optimizer_test", + srcs = ["composite_optimizer_test.py"], + deps = [ + ":composite_optimizer", + ":load_dragnn_cc_impl_py", + "//dragnn/core:dragnn_bulk_ops", + "//dragnn/core:dragnn_ops", + "//syntaxnet:load_parser_ops_py", + "@org_tensorflow//tensorflow:tensorflow_py", + "@org_tensorflow//tensorflow/core:protos_all_py", + ], +) + +py_test( + name = "graph_builder_test", + size = "large", + srcs = ["graph_builder_test.py"], + data = [ + "//dragnn/core:testdata", + ], + tags = [ + "notsan", + ], + deps = [ + ":dragnn_ops", + ":graph_builder", + ":load_dragnn_cc_impl_py", + "//dragnn/core:dragnn_bulk_ops", + "//dragnn/core:dragnn_ops", + "//dragnn/protos:spec_py_pb2", + "//dragnn/protos:trace_py_pb2", + "//syntaxnet:load_parser_ops_py", + "//syntaxnet:sentence_py_pb2", + "@org_tensorflow//tensorflow:tensorflow_py", + "@org_tensorflow//tensorflow/core:protos_all_py", + ], +) + +py_test( + name = "network_units_test", + size = "small", + srcs = ["network_units_test.py"], + deps = [ + ":load_dragnn_cc_impl_py", + ":network_units", + "//dragnn/core:dragnn_bulk_ops", + "//dragnn/core:dragnn_ops", + "//dragnn/protos:spec_py_pb2", + "//syntaxnet:load_parser_ops_py", + "@org_tensorflow//tensorflow:tensorflow_py", + "@org_tensorflow//tensorflow/core:protos_all_py", + ], +) + +py_test( + name = "sentence_io_test", + srcs = ["sentence_io_test.py"], + data = ["//syntaxnet:testdata"], + deps = [ + ":sentence_io", + "//syntaxnet:load_parser_ops_py", + "//syntaxnet:parser_ops", + "//syntaxnet:sentence_py_pb2", + "@org_tensorflow//tensorflow:tensorflow_py", + "@org_tensorflow//tensorflow/core:protos_all_py", + ], +) 
+ +py_library( + name = "trainer_lib", + srcs = ["trainer_lib.py"], + deps = [ + "//dragnn/protos:spec_py_pb2", + "//syntaxnet:parser_ops", + "//syntaxnet:sentence_py_pb2", + "//syntaxnet:task_spec_py_pb2", + "@org_tensorflow//tensorflow:tensorflow_py", + "@org_tensorflow//tensorflow/core:protos_all_py", + ], +) + +py_library( + name = "lexicon", + srcs = ["lexicon.py"], + deps = [ + "//syntaxnet:parser_ops", + "//syntaxnet:task_spec_py_pb2", + "@org_tensorflow//tensorflow:tensorflow_py", + ], +) + +py_test( + name = "lexicon_test", + srcs = ["lexicon_test.py"], + deps = [ + ":lexicon", + "//syntaxnet:load_parser_ops_py", + "//syntaxnet:parser_ops", + "//syntaxnet:parser_trainer", + "@org_tensorflow//tensorflow:tensorflow_py", + ], +) + +py_library( + name = "evaluation", + srcs = ["evaluation.py"], + deps = [ + "//syntaxnet:sentence_py_pb2", + "//syntaxnet/util:check", + "@org_tensorflow//tensorflow:tensorflow_py", + ], +) + +py_test( + name = "evaluation_test", + srcs = ["evaluation_test.py"], + deps = [ + ":evaluation", + "//syntaxnet:sentence_py_pb2", + "@org_tensorflow//tensorflow:tensorflow_py", + ], +) + +py_library( + name = "spec_builder", + srcs = ["spec_builder.py"], + deps = [ + ":lexicon", + "//dragnn/protos:spec_py_pb2", + "//syntaxnet:parser_ops", + "//syntaxnet/util:check", + "@org_tensorflow//tensorflow:tensorflow_py", + ], +) + +py_test( + name = "spec_builder_test", + srcs = ["spec_builder_test.py"], + deps = [ + ":spec_builder", + "//dragnn/protos:spec_py_pb2", + "//syntaxnet:load_parser_ops_py", + "//syntaxnet:parser_ops", + "//syntaxnet:parser_trainer", + "@org_tensorflow//tensorflow:tensorflow_py", + ], +) + +py_library( + name = "digraph_ops", + srcs = ["digraph_ops.py"], + deps = [ + "//syntaxnet/util:check", + "@org_tensorflow//tensorflow:tensorflow_py", + ], +) + +py_test( + name = "digraph_ops_test", + srcs = ["digraph_ops_test.py"], + deps = [ + ":digraph_ops", + "@org_tensorflow//tensorflow:tensorflow_py", + ], +) + +py_library( + name 
= "biaffine_units", + srcs = ["biaffine_units.py"], + deps = [ + ":digraph_ops", + ":network_units", + "//syntaxnet/util:check", + "@org_tensorflow//tensorflow:tensorflow_py", + ], +) diff --git a/syntaxnet/dragnn/python/biaffine_units.py b/syntaxnet/dragnn/python/biaffine_units.py new file mode 100644 index 0000000000000000000000000000000000000000..c34a2ed6a3c6dfb7117d8f8299dffdb757c0b469 --- /dev/null +++ b/syntaxnet/dragnn/python/biaffine_units.py @@ -0,0 +1,255 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Network units used in the Dozat and Manning (2017) biaffine parser.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import tensorflow as tf + +from dragnn.python import digraph_ops +from dragnn.python import network_units +from syntaxnet.util import check + + +class BiaffineDigraphNetwork(network_units.NetworkUnitInterface): + """Network unit that computes biaffine digraph scores. + + The D&M parser uses two MLPs to create two activation vectors for each token, + which represent the token when it is used as the source or target of an arc. 
+ Arcs are scored using a "biaffine" function that includes a bilinear and + linear term: + + sources[s] * arc_weights * targets[t] + sources[s] * source_weights + + The digraph is "unlabeled" in that there is at most one arc between any pair + of tokens. If labels are required, the BiaffineLabelNetwork can be used to + label a set of selected arcs. + + Note that in the typical use case where the source and target activations are + the same dimension and are produced by single-layer MLPs, it is arithmetically + equivalent to produce the source and target activations using a single MLP of + twice the size, and then split those activations in half. The |SplitNetwork| + can be used for this purpose. + + Parameters: + None. + + Features: + sources: [B * N, S] matrix of batched activations for source tokens. + targets: [B * N, T] matrix of batched activations for target tokens. + + Layers: + adjacency: [B * N, N] matrix where entry b*N+s,t is the score of the arc + from s to t in batch b, if s != t, or the score for selecting t + as a root, if s == t. + """ + + def __init__(self, component): + """Initializes weights and layers. + + Args: + component: Parent ComponentBuilderBase object. + """ + super(BiaffineDigraphNetwork, self).__init__(component) + + check.Eq(len(self._fixed_feature_dims.items()), 0, + 'Expected no fixed features') + check.Eq(len(self._linked_feature_dims.items()), 2, + 'Expected two linked features') + + check.In('sources', self._linked_feature_dims, + 'Missing required linked feature') + check.In('targets', self._linked_feature_dims, + 'Missing required linked feature') + self._source_dim = self._linked_feature_dims['sources'] + self._target_dim = self._linked_feature_dims['targets'] + + # TODO(googleuser): Make parameter initialization configurable. 
+ self._weights = [] + self._weights.append(tf.get_variable( + 'weights_arc', [self._source_dim, self._target_dim], tf.float32, + tf.random_normal_initializer(stddev=1e-4))) + self._weights.append(tf.get_variable( + 'weights_source', [self._source_dim], tf.float32, + tf.random_normal_initializer(stddev=1e-4))) + self._weights.append(tf.get_variable( + 'root', [self._source_dim], tf.float32, + tf.random_normal_initializer(stddev=1e-4))) + + self._params.extend(self._weights) + self._regularized_weights.extend(self._weights) + + # Negative Layer.dim indicates that the dimension is dynamic. + self._layers.append(network_units.Layer(self, 'adjacency', -1)) + + def create(self, + fixed_embeddings, + linked_embeddings, + context_tensor_arrays, + attention_tensor, + during_training, + stride=None): + """Requires |stride|; otherwise see base class.""" + check.NotNone(stride, + 'BiaffineDigraphNetwork requires "stride" and must be called ' + 'in the bulk feature extractor component.') + + # TODO(googleuser): Add dropout during training. + del during_training + + # Retrieve (possibly averaged) weights. + weights_arc = self._component.get_variable('weights_arc') + weights_source = self._component.get_variable('weights_source') + root = self._component.get_variable('root') + + # Extract the source and target token activations. Use |stride| to collapse + # batch and beam into a single dimension. + sources = network_units.lookup_named_tensor('sources', linked_embeddings) + targets = network_units.lookup_named_tensor('targets', linked_embeddings) + source_tokens_bxnxs = tf.reshape(sources.tensor, + [stride, -1, self._source_dim]) + target_tokens_bxnxt = tf.reshape(targets.tensor, + [stride, -1, self._target_dim]) + num_tokens = tf.shape(source_tokens_bxnxs)[1] + + # Compute the arc, source, and root potentials. 
+ arcs_bxnxn = digraph_ops.ArcPotentialsFromTokens( + source_tokens_bxnxs, target_tokens_bxnxt, weights_arc) + sources_bxnxn = digraph_ops.ArcSourcePotentialsFromTokens( + source_tokens_bxnxs, weights_source) + roots_bxn = digraph_ops.RootPotentialsFromTokens( + root, target_tokens_bxnxt, weights_arc) + + # Combine them into a single matrix with the roots on the diagonal. + adjacency_bxnxn = digraph_ops.CombineArcAndRootPotentials( + arcs_bxnxn + sources_bxnxn, roots_bxn) + + return [tf.reshape(adjacency_bxnxn, [-1, num_tokens])] + + +class BiaffineLabelNetwork(network_units.NetworkUnitInterface): + """Network unit that computes biaffine label scores. + + D&M parser uses a slightly modified version of the arc scoring function to + score labels. The differences are: + + 1. Each label has its own source and target MLPs and biaffine weights. + 2. A linear term for the target token is added. + 3. A bias term is added. + + Parameters: + num_labels: The number of dependency labels, L. + + Features: + sources: [B * N, S] matrix of batched activations for source tokens. + targets: [B * N, T] matrix of batched activations for target tokens. + + Layers: + labels: [B * N, L] matrix where entry b*N+t,l is the score of the label of + the inbound arc for token t in batch b. + """ + + def __init__(self, component): + """Initializes weights and layers. + + Args: + component: Parent ComponentBuilderBase object. 
+ """ + super(BiaffineLabelNetwork, self).__init__(component) + + parameters = component.spec.network_unit.parameters + self._num_labels = int(parameters['num_labels']) + + check.Gt(self._num_labels, 0, 'Expected some labels') + check.Eq(len(self._fixed_feature_dims.items()), 0, + 'Expected no fixed features') + check.Eq(len(self._linked_feature_dims.items()), 2, + 'Expected two linked features') + + check.In('sources', self._linked_feature_dims, + 'Missing required linked feature') + check.In('targets', self._linked_feature_dims, + 'Missing required linked feature') + + self._source_dim = self._linked_feature_dims['sources'] + self._target_dim = self._linked_feature_dims['targets'] + + # TODO(googleuser): Make parameter initialization configurable. + self._weights = [] + self._weights.append(tf.get_variable( + 'weights_pair', [self._num_labels, self._source_dim, self._target_dim], + tf.float32, tf.random_normal_initializer(stddev=1e-4))) + self._weights.append(tf.get_variable( + 'weights_source', [self._num_labels, self._source_dim], tf.float32, + tf.random_normal_initializer(stddev=1e-4))) + self._weights.append(tf.get_variable( + 'weights_target', [self._num_labels, self._target_dim], tf.float32, + tf.random_normal_initializer(stddev=1e-4))) + + self._biases = [] + self._biases.append(tf.get_variable( + 'biases', [self._num_labels], tf.float32, + tf.random_normal_initializer(stddev=1e-4))) + + self._params.extend(self._weights + self._biases) + self._regularized_weights.extend(self._weights) + + self._layers.append(network_units.Layer(self, 'labels', self._num_labels)) + + def create(self, + fixed_embeddings, + linked_embeddings, + context_tensor_arrays, + attention_tensor, + during_training, + stride=None): + """Requires |stride|; otherwise see base class.""" + check.NotNone(stride, + 'BiaffineLabelNetwork requires "stride" and must be called ' + 'in the bulk feature extractor component.') + + # TODO(googleuser): Add dropout during training. 
+ del during_training + + # Retrieve (possibly averaged) weights. + weights_pair = self._component.get_variable('weights_pair') + weights_source = self._component.get_variable('weights_source') + weights_target = self._component.get_variable('weights_target') + biases = self._component.get_variable('biases') + + # Extract and shape the source and target token activations. Use |stride| + # to collapse batch and beam into a single dimension. + sources = network_units.lookup_named_tensor('sources', linked_embeddings) + targets = network_units.lookup_named_tensor('targets', linked_embeddings) + sources_bxnxs = tf.reshape(sources.tensor, [stride, -1, self._source_dim]) + targets_bxnxt = tf.reshape(targets.tensor, [stride, -1, self._target_dim]) + + # Compute the pair, source, and target potentials. + pairs_bxnxl = digraph_ops.LabelPotentialsFromTokenPairs(sources_bxnxs, + targets_bxnxt, + weights_pair) + sources_bxnxl = digraph_ops.LabelPotentialsFromTokens(sources_bxnxs, + weights_source) + targets_bxnxl = digraph_ops.LabelPotentialsFromTokens(targets_bxnxt, + weights_target) + + # Combine them with the biases. + labels_bxnxl = pairs_bxnxl + sources_bxnxl + targets_bxnxl + biases + + # Flatten out the batch dimension. + return [tf.reshape(labels_bxnxl, [-1, self._num_labels])] diff --git a/syntaxnet/dragnn/python/bulk_component.py b/syntaxnet/dragnn/python/bulk_component.py new file mode 100644 index 0000000000000000000000000000000000000000..f00ac92fed7a2b914d2dcbf4071c30c1b9d92ffc --- /dev/null +++ b/syntaxnet/dragnn/python/bulk_component.py @@ -0,0 +1,477 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Component builders for non-recurrent networks in DRAGNN.""" + + +import tensorflow as tf +from tensorflow.python.platform import tf_logging as logging + +from dragnn.python import component +from dragnn.python import dragnn_ops +from dragnn.python import network_units +from syntaxnet.util import check + + +def fetch_linked_embedding(comp, network_states, feature_spec): + """Looks up linked embeddings in other components. + + Args: + comp: ComponentBuilder object with respect to which the feature is to be + fetched + network_states: dictionary of NetworkState objects + feature_spec: FeatureSpec proto for the linked feature to be looked up + + Returns: + NamedTensor containing the linked feature tensor + + Raises: + NotImplementedError: if a linked feature with source translator other than + 'identity' is configured. + RuntimeError: if a recurrent linked feature is configured. 
+ """ + if feature_spec.source_translator != 'identity': + raise NotImplementedError(feature_spec.source_translator) + if feature_spec.source_component == comp.name: + raise RuntimeError( + 'Recurrent linked features are not supported in bulk extraction.') + tf.logging.info('[%s] Adding linked feature "%s"', comp.name, + feature_spec.name) + source = comp.master.lookup_component[feature_spec.source_component] + + return network_units.NamedTensor( + network_states[source.name].activations[ + feature_spec.source_layer].bulk_tensor, + feature_spec.name) + + +def _validate_embedded_fixed_features(comp): + """Checks that the embedded fixed features of |comp| are set up properly.""" + for feature in comp.spec.fixed_feature: + check.Gt(feature.embedding_dim, 0, + 'Embeddings requested for non-embedded feature: %s' % feature) + if feature.is_constant: + check.IsTrue(feature.HasField('pretrained_embedding_matrix'), + 'Constant embeddings must be pretrained: %s' % feature) + + +def fetch_differentiable_fixed_embeddings(comp, state, stride): + """Looks up fixed features with separate, differentiable, embedding lookup. + + Args: + comp: Component whose fixed features we wish to look up. + state: live MasterState object for the component. + stride: Tensor containing current batch * beam size. 
+ + Returns: + state handle: updated state handle to be used after this call + fixed_embeddings: list of NamedTensor objects + """ + _validate_embedded_fixed_features(comp) + num_channels = len(comp.spec.fixed_feature) + if not num_channels: + return state.handle, [] + + state.handle, indices, ids, weights, num_steps = ( + dragnn_ops.bulk_fixed_features( + state.handle, component=comp.name, num_channels=num_channels)) + fixed_embeddings = [] + for channel, feature_spec in enumerate(comp.spec.fixed_feature): + differentiable_or_constant = ('constant' if feature_spec.is_constant else + 'differentiable') + tf.logging.info('[%s] Adding %s fixed feature "%s"', comp.name, + differentiable_or_constant, feature_spec.name) + size = stride * num_steps * feature_spec.size + fixed_embedding = network_units.embedding_lookup( + comp.get_variable(network_units.fixed_embeddings_name(channel)), + indices[channel], ids[channel], weights[channel], size) + if feature_spec.is_constant: + fixed_embedding = tf.stop_gradient(fixed_embedding) + fixed_embeddings.append( + network_units.NamedTensor(fixed_embedding, feature_spec.name)) + + return state.handle, fixed_embeddings + + +def fetch_fast_fixed_embeddings(comp, state): + """Looks up fixed features with fast, non-differentiable, op. + + Since BulkFixedEmbeddings is non-differentiable with respect to the + embeddings, the idea is to call this function only when the graph is + not being used for training. + + Args: + comp: Component whose fixed features we wish to look up. + state: live MasterState object for the component. 
+ + Returns: + state handle: updated state handle to be used after this call + fixed_embeddings: list of NamedTensor objects + """ + _validate_embedded_fixed_features(comp) + num_channels = len(comp.spec.fixed_feature) + if not num_channels: + return state.handle, [] + tf.logging.info('[%s] Adding %d fast fixed features', comp.name, num_channels) + + state.handle, bulk_embeddings, _ = dragnn_ops.bulk_fixed_embeddings( + state.handle, [ + comp.get_variable(network_units.fixed_embeddings_name(c)) + for c in range(num_channels) + ], + component=comp.name) + + bulk_embeddings = network_units.NamedTensor(bulk_embeddings, + 'bulk-%s-fixed-features' % + comp.name) + return state.handle, [bulk_embeddings] + + +def extract_fixed_feature_ids(comp, state, stride): + """Extracts fixed feature IDs. + + Args: + comp: Component whose fixed feature IDs we wish to extract. + state: Live MasterState object for the component. + stride: Tensor containing current batch * beam size. + + Returns: + state handle: Updated state handle to be used after this call. + ids: List of [stride * num_steps, 1] feature IDs per channel. Missing IDs + (e.g., due to batch padding) are set to -1. + """ + num_channels = len(comp.spec.fixed_feature) + if not num_channels: + return state.handle, [] + + for feature_spec in comp.spec.fixed_feature: + check.Eq(feature_spec.size, 1, 'All features must have size=1') + check.Lt(feature_spec.embedding_dim, 0, 'All features must be non-embedded') + + state.handle, indices, ids, _, num_steps = dragnn_ops.bulk_fixed_features( + state.handle, component=comp.name, num_channels=num_channels) + size = stride * num_steps + + fixed_ids = [] + for channel, feature_spec in enumerate(comp.spec.fixed_feature): + tf.logging.info('[%s] Adding fixed feature IDs "%s"', comp.name, + feature_spec.name) + + # The +1 and -1 increments ensure that missing IDs default to -1. + # + # TODO(googleuser): This formula breaks if multiple IDs are extracted at some + # step. 
Try using tf.unique() to enforce the unique-IDs precondition. + sums = tf.unsorted_segment_sum(ids[channel] + 1, indices[channel], size) - 1 + sums = tf.expand_dims(sums, axis=1) + fixed_ids.append(network_units.NamedTensor(sums, feature_spec.name, dim=1)) + return state.handle, fixed_ids + + +def update_network_states(comp, tensors, network_states, stride): + """Stores Tensor objects corresponding to layer outputs. + + For use in subsequent tasks. + + Args: + comp: Component for which the tensor handles are being stored. + tensors: list of Tensors to store + network_states: dictionary of component NetworkState objects + stride: stride of the stored tensor. + """ + network_state = network_states[comp.name] + with tf.name_scope(comp.name + '/stored_act'): + for index, network_tensor in enumerate(tensors): + network_state.activations[comp.network.layers[index].name] = ( + network_units.StoredActivations(tensor=network_tensor, stride=stride, + dim=comp.network.layers[index].dim)) + + +def build_cross_entropy_loss(logits, gold): + """Constructs a cross entropy from logits and integer gold label indices. + + Supports skipping rows where the gold label is the magic -1 value. + + Args: + logits: float Tensor of scores. + gold: int Tensor of gold label indices; -1 marks rows to skip. + + Returns: + cost, correct, total: the mean cost over valid labels, the total number of + correctly predicted labels, and the total number of valid labels. + """ + valid = tf.reshape(tf.where(tf.greater(gold, -1)), [-1]) + gold = tf.gather(gold, valid) + logits = tf.gather(logits, valid) + correct = tf.reduce_sum(tf.to_int32(tf.nn.in_top_k(logits, gold, 1))) + total = tf.size(gold) + cost = tf.reduce_sum( + tf.contrib.nn.deprecated_flipped_sparse_softmax_cross_entropy_with_logits( + logits, tf.cast(gold, tf.int64))) / tf.cast(total, tf.float32) + return cost, correct, total + + +class BulkFeatureExtractorComponentBuilder(component.ComponentBuilderBase): + """A component builder to bulk extract features. 
+ + Both fixed and linked features are supported, with some restrictions: + 1. Fixed features may not be recurrent. Fixed features are extracted along the + gold path, which does not work during inference. + 2. Linked features may not be recurrent and are 'untranslated'. For now, + linked features are extracted without passing them through any transition + system or source translator. + """ + + def build_greedy_training(self, state, network_states): + """Extracts features and advances a batch using the oracle path. + + Args: + state: MasterState from the 'AdvanceMaster' op that advances the + underlying master to this component. + network_states: dictionary of component NetworkState objects + + Returns: + state handle: final state after advancing + cost: regularization cost, possibly associated with embedding matrices + correct: since no gold path is available, 0. + total: since no gold path is available, 0. + """ + logging.info('Building component: %s', self.spec.name) + stride = state.current_batch_size * self.training_beam_size + with tf.variable_scope(self.name, reuse=True): + state.handle, fixed_embeddings = fetch_differentiable_fixed_embeddings( + self, state, stride) + + linked_embeddings = [ + fetch_linked_embedding(self, network_states, spec) + for spec in self.spec.linked_feature + ] + + with tf.variable_scope(self.name, reuse=True): + tensors = self.network.create( + fixed_embeddings, linked_embeddings, None, None, True, stride=stride) + update_network_states(self, tensors, network_states, stride) + cost = self.add_regularizer(tf.constant(0.)) + + correct, total = tf.constant(0), tf.constant(0) + return state.handle, cost, correct, total + + def build_greedy_inference(self, state, network_states, + during_training=False): + """Extracts features and advances a batch using the oracle path. + + NOTE(danielandor) For now this method cannot be called during training. + That is to say, unroll_using_oracle for this component must be set to true. 
+ This will be fixed by separating train_with_oracle and train_with_inference. + + Args: + state: MasterState from the 'AdvanceMaster' op that advances the + underlying master to this component. + network_states: dictionary of component NetworkState objects + during_training: whether the graph is being constructed during training + + Returns: + state handle: final state after advancing + """ + logging.info('Building component: %s', self.spec.name) + if during_training: + stride = state.current_batch_size * self.training_beam_size + else: + stride = state.current_batch_size * self.inference_beam_size + + with tf.variable_scope(self.name, reuse=True): + if during_training: + state.handle, fixed_embeddings = fetch_differentiable_fixed_embeddings( + self, state, stride) + else: + state.handle, fixed_embeddings = fetch_fast_fixed_embeddings(self, + state) + + linked_embeddings = [ + fetch_linked_embedding(self, network_states, spec) + for spec in self.spec.linked_feature + ] + + with tf.variable_scope(self.name, reuse=True): + tensors = self.network.create( + fixed_embeddings, + linked_embeddings, + None, + None, + during_training=during_training, + stride=stride) + + update_network_states(self, tensors, network_states, stride) + return state.handle + + +class BulkFeatureIdExtractorComponentBuilder(component.ComponentBuilderBase): + """A component builder to bulk extract feature IDs. + + This is a variant of BulkFeatureExtractorComponentBuilder that only supports + fixed features, and extracts raw feature IDs instead of feature embeddings. + Since the extracted feature IDs are integers, the results produced by this + component are in general not differentiable. + """ + + def __init__(self, master, component_spec): + """Initializes the feature ID extractor component. + + Args: + master: dragnn.MasterBuilder object. + component_spec: dragnn.ComponentSpec proto to be built. 
+ """ + super(BulkFeatureIdExtractorComponentBuilder, self).__init__( + master, component_spec) + check.Eq(len(self.spec.linked_feature), 0, 'Linked features are forbidden') + for feature_spec in self.spec.fixed_feature: + check.Lt(feature_spec.embedding_dim, 0, + 'Features must be non-embedded: %s' % feature_spec) + + def build_greedy_training(self, state, network_states): + """See base class.""" + state.handle = self._extract_feature_ids(state, network_states, True) + cost = self.add_regularizer(tf.constant(0.)) + correct, total = tf.constant(0), tf.constant(0) + return state.handle, cost, correct, total + + def build_greedy_inference(self, state, network_states, + during_training=False): + """See base class.""" + return self._extract_feature_ids(state, network_states, during_training) + + def _extract_feature_ids(self, state, network_states, during_training): + """Extracts feature IDs and advances a batch using the oracle path. + + Args: + state: MasterState from the 'AdvanceMaster' op that advances the + underlying master to this component. + network_states: Dictionary of component NetworkState objects. + during_training: Whether the graph is being constructed during training. + + Returns: + state handle: Final state after advancing. + """ + logging.info('Building component: %s', self.spec.name) + + if during_training: + stride = state.current_batch_size * self.training_beam_size + else: + stride = state.current_batch_size * self.inference_beam_size + + with tf.variable_scope(self.name, reuse=True): + state.handle, ids = extract_fixed_feature_ids(self, state, stride) + + with tf.variable_scope(self.name, reuse=True): + tensors = self.network.create( + ids, [], None, None, during_training, stride=stride) + update_network_states(self, tensors, network_states, stride) + return state.handle + + +class BulkAnnotatorComponentBuilder(component.ComponentBuilderBase): + """A component builder to bulk annotate or compute the cost of a gold path. 
+ + This component can be used with features that don't depend on the + transition system state. + + Since no feature extraction is performed, only non-recurrent + 'identity' linked features are supported. + + If a FeedForwardNetwork is configured with no hidden units, this component + acts as a 'bulk softmax' component. + """ + + def build_greedy_training(self, state, network_states): + """Advances a batch using oracle paths, returning the overall CE cost. + + Args: + state: MasterState from the 'AdvanceMaster' op that advances the + underlying master to this component. + network_states: dictionary of component NetworkState objects + + Returns: + (state handle, cost, correct, total): TF ops corresponding to the final + state after unrolling, the total cost, the total number of correctly + predicted actions, and the total number of actions. + + Raises: + RuntimeError: if fixed features are configured. + """ + logging.info('Building component: %s', self.spec.name) + if self.spec.fixed_feature: + raise RuntimeError( + 'Fixed features are not compatible with bulk annotation. ' + 'Use the "bulk-features" component instead.') + linked_embeddings = [ + fetch_linked_embedding(self, network_states, spec) + for spec in self.spec.linked_feature + ] + + stride = state.current_batch_size * self.training_beam_size + with tf.variable_scope(self.name, reuse=True): + network_tensors = self.network.create([], linked_embeddings, None, None, + True, stride) + + update_network_states(self, network_tensors, network_states, stride) + + logits = self.network.get_logits(network_tensors) + state.handle, gold = dragnn_ops.bulk_advance_from_oracle( + state.handle, component=self.name) + + cost, correct, total = build_cross_entropy_loss(logits, gold) + cost = self.add_regularizer(cost) + + return state.handle, cost, correct, total + + def build_greedy_inference(self, state, network_states, + during_training=False): + """Annotates a batch of documents using network scores. 
+ + Args: + state: MasterState from the 'AdvanceMaster' op that advances the + underlying master to this component. + network_states: dictionary of component NetworkState objects + during_training: whether the graph is being constructed during training + + Returns: + Handle to the state once inference is complete for this Component. + + Raises: + RuntimeError: if fixed features are configured + """ + logging.info('Building component: %s', self.spec.name) + if self.spec.fixed_feature: + raise RuntimeError( + 'Fixed features are not compatible with bulk annotation. ' + 'Use the "bulk-features" component instead.') + linked_embeddings = [ + fetch_linked_embedding(self, network_states, spec) + for spec in self.spec.linked_feature + ] + + if during_training: + stride = state.current_batch_size * self.training_beam_size + else: + stride = state.current_batch_size * self.inference_beam_size + + with tf.variable_scope(self.name, reuse=True): + network_tensors = self.network.create( + [], linked_embeddings, None, None, during_training, stride) + + update_network_states(self, network_tensors, network_states, stride) + + logits = self.network.get_logits(network_tensors) + return dragnn_ops.bulk_advance_from_prediction( + state.handle, logits, component=self.name) diff --git a/syntaxnet/dragnn/python/bulk_component_test.py b/syntaxnet/dragnn/python/bulk_component_test.py new file mode 100644 index 0000000000000000000000000000000000000000..5db5f0565421f715c322fcbef2307a83f3e33d2a --- /dev/null +++ b/syntaxnet/dragnn/python/bulk_component_test.py @@ -0,0 +1,478 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for bulk_component. + +Verifies that: +1. BulkFeatureExtractor and BulkAnnotator both raise NotImplementedError when + non-identity translator configured. +2. BulkFeatureExtractor and BulkAnnotator both raise RuntimeError when + recurrent linked features are configured. +3. BulkAnnotator raises RuntimeError when fixed features are configured. +4. BulkFeatureIdExtractor raises ValueError when linked features are configured, + or when the fixed features are invalid. +""" + +import os.path + + +import tensorflow as tf + +from tensorflow.python.framework import test_util +from tensorflow.python.platform import googletest +from google.protobuf import text_format + +from dragnn.protos import spec_pb2 +from dragnn.python import bulk_component +from dragnn.python import component +from dragnn.python import dragnn_ops +from dragnn.python import network_units +from syntaxnet import sentence_pb2 + +import dragnn.python.load_dragnn_cc_impl +import syntaxnet.load_parser_ops + +FLAGS = tf.app.flags.FLAGS + + +class MockNetworkUnit(object): + + def get_layer_size(self, unused_layer_name): + return 64 + + +class MockComponent(object): + + def __init__(self): + self.name = 'mock' + self.network = MockNetworkUnit() + + +class MockMaster(object): + + def __init__(self): + self.spec = spec_pb2.MasterSpec() + self.hyperparams = spec_pb2.GridPoint() + self.lookup_component = {'mock': MockComponent()} + + +def _create_fake_corpus(): + """Returns a list of fake serialized sentences for 
tests.""" + num_docs = 4 + corpus = [] + for num_tokens in range(1, num_docs + 1): + sentence = sentence_pb2.Sentence() + sentence.text = 'x' * num_tokens + for i in range(num_tokens): + token = sentence.token.add() + token.word = 'x' + token.start = i + token.end = i + corpus.append(sentence.SerializeToString()) + return corpus + + +class BulkComponentTest(test_util.TensorFlowTestCase): + + def setUp(self): + self.master = MockMaster() + self.master_state = component.MasterState( + handle='handle', current_batch_size=2) + self.network_states = { + 'mock': component.NetworkState(), + 'test': component.NetworkState(), + } + + def testFailsOnNonIdentityTranslator(self): + component_spec = spec_pb2.ComponentSpec() + text_format.Parse(""" + name: "test" + network_unit { + registered_name: "IdentityNetwork" + } + linked_feature { + name: "features" embedding_dim: -1 size: 1 + source_translator: "history" + source_component: "mock" + } + """, component_spec) + + # For feature extraction: + with tf.Graph().as_default(): + comp = bulk_component.BulkFeatureExtractorComponentBuilder( + self.master, component_spec) + + # Expect feature extraction to generate an error due to the "history" + # translator. 
+ with self.assertRaises(NotImplementedError): + comp.build_greedy_training(self.master_state, self.network_states) + + # As well as annotation: + with tf.Graph().as_default(): + comp = bulk_component.BulkAnnotatorComponentBuilder( + self.master, component_spec) + + with self.assertRaises(NotImplementedError): + comp.build_greedy_training(self.master_state, self.network_states) + + def testFailsOnRecurrentLinkedFeature(self): + component_spec = spec_pb2.ComponentSpec() + text_format.Parse(""" + name: "test" + network_unit { + registered_name: "FeedForwardNetwork" + parameters { + key: 'hidden_layer_sizes' value: '64' + } + } + linked_feature { + name: "features" embedding_dim: -1 size: 1 + source_translator: "identity" + source_component: "test" + source_layer: "layer_0" + } + """, component_spec) + + # For feature extraction: + with tf.Graph().as_default(): + comp = bulk_component.BulkFeatureExtractorComponentBuilder( + self.master, component_spec) + + # Expect feature extraction to generate an error due to the recurrent + # linked feature. 
+ with self.assertRaises(RuntimeError): + comp.build_greedy_training(self.master_state, self.network_states) + + # As well as annotation: + with tf.Graph().as_default(): + comp = bulk_component.BulkAnnotatorComponentBuilder( + self.master, component_spec) + + with self.assertRaises(RuntimeError): + comp.build_greedy_training(self.master_state, self.network_states) + + def testConstantFixedFeatureFailsIfNotPretrained(self): + component_spec = spec_pb2.ComponentSpec() + text_format.Parse(""" + name: "test" + network_unit { + registered_name: "IdentityNetwork" + } + fixed_feature { + name: "fixed" embedding_dim: 32 size: 1 + is_constant: true + } + component_builder { + registered_name: "bulk_component.BulkFeatureExtractorComponentBuilder" + } + """, component_spec) + with tf.Graph().as_default(): + comp = bulk_component.BulkFeatureExtractorComponentBuilder( + self.master, component_spec) + + with self.assertRaisesRegexp(ValueError, + 'Constant embeddings must be pretrained'): + comp.build_greedy_training(self.master_state, self.network_states) + with self.assertRaisesRegexp(ValueError, + 'Constant embeddings must be pretrained'): + comp.build_greedy_inference( + self.master_state, self.network_states, during_training=True) + with self.assertRaisesRegexp(ValueError, + 'Constant embeddings must be pretrained'): + comp.build_greedy_inference( + self.master_state, self.network_states, during_training=False) + + def testNormalFixedFeaturesAreDifferentiable(self): + component_spec = spec_pb2.ComponentSpec() + text_format.Parse(""" + name: "test" + network_unit { + registered_name: "IdentityNetwork" + } + fixed_feature { + name: "fixed" embedding_dim: 32 size: 1 + pretrained_embedding_matrix { part {} } + vocab { part {} } + } + component_builder { + registered_name: "bulk_component.BulkFeatureExtractorComponentBuilder" + } + """, component_spec) + with tf.Graph().as_default(): + comp = bulk_component.BulkFeatureExtractorComponentBuilder( + self.master, component_spec) + + 
# Get embedding matrix variables. + with tf.variable_scope(comp.name, reuse=True): + fixed_embedding_matrix = tf.get_variable( + network_units.fixed_embeddings_name(0)) + + # Get output layer. + comp.build_greedy_training(self.master_state, self.network_states) + activations = self.network_states[comp.name].activations + outputs = activations[comp.network.layers[0].name].bulk_tensor + + # Compute the gradient of the output layer w.r.t. the embedding matrix. + # This should be well-defined in the normal case. + gradients = tf.gradients(outputs, fixed_embedding_matrix) + self.assertEqual(len(gradients), 1) + self.assertFalse(gradients[0] is None) + + def testConstantFixedFeaturesAreNotDifferentiableButOthersAre(self): + component_spec = spec_pb2.ComponentSpec() + text_format.Parse(""" + name: "test" + network_unit { + registered_name: "IdentityNetwork" + } + fixed_feature { + name: "constant" embedding_dim: 32 size: 1 + is_constant: true + pretrained_embedding_matrix { part {} } + vocab { part {} } + } + fixed_feature { + name: "trainable" embedding_dim: 32 size: 1 + pretrained_embedding_matrix { part {} } + vocab { part {} } + } + component_builder { + registered_name: "bulk_component.BulkFeatureExtractorComponentBuilder" + } + """, component_spec) + with tf.Graph().as_default(): + comp = bulk_component.BulkFeatureExtractorComponentBuilder( + self.master, component_spec) + + # Get embedding matrix variables. + with tf.variable_scope(comp.name, reuse=True): + constant_embedding_matrix = tf.get_variable( + network_units.fixed_embeddings_name(0)) + trainable_embedding_matrix = tf.get_variable( + network_units.fixed_embeddings_name(1)) + + # Get output layer. + comp.build_greedy_training(self.master_state, self.network_states) + activations = self.network_states[comp.name].activations + outputs = activations[comp.network.layers[0].name].bulk_tensor + + # The constant embeddings are non-differentiable. 
+ constant_gradients = tf.gradients(outputs, constant_embedding_matrix) + self.assertEqual(len(constant_gradients), 1) + self.assertTrue(constant_gradients[0] is None) + + # The trainable embeddings are differentiable. + trainable_gradients = tf.gradients(outputs, trainable_embedding_matrix) + self.assertEqual(len(trainable_gradients), 1) + self.assertFalse(trainable_gradients[0] is None) + + def testFailsOnFixedFeature(self): + component_spec = spec_pb2.ComponentSpec() + text_format.Parse(""" + name: "annotate" + network_unit { + registered_name: "IdentityNetwork" + } + fixed_feature { + name: "fixed" embedding_dim: 32 size: 1 + } + """, component_spec) + with tf.Graph().as_default(): + comp = bulk_component.BulkAnnotatorComponentBuilder( + self.master, component_spec) + + # Expect feature extraction to generate a runtime error due to the + # fixed feature. + with self.assertRaises(RuntimeError): + comp.build_greedy_training(self.master_state, self.network_states) + + def testBulkFeatureIdExtractorOkWithOneFixedFeature(self): + component_spec = spec_pb2.ComponentSpec() + text_format.Parse(""" + name: "test" + network_unit { + registered_name: "IdentityNetwork" + } + fixed_feature { + name: "fixed" embedding_dim: -1 size: 1 + } + """, component_spec) + with tf.Graph().as_default(): + comp = bulk_component.BulkFeatureIdExtractorComponentBuilder( + self.master, component_spec) + + # Should not raise errors. 
+ self.network_states[component_spec.name] = component.NetworkState() + comp.build_greedy_training(self.master_state, self.network_states) + self.network_states[component_spec.name] = component.NetworkState() + comp.build_greedy_inference(self.master_state, self.network_states) + + def testBulkFeatureIdExtractorFailsOnLinkedFeature(self): + component_spec = spec_pb2.ComponentSpec() + text_format.Parse(""" + name: "test" + network_unit { + registered_name: "IdentityNetwork" + } + fixed_feature { + name: "fixed" embedding_dim: -1 size: 1 + } + linked_feature { + name: "linked" embedding_dim: -1 size: 1 + source_translator: "identity" + source_component: "mock" + } + """, component_spec) + with tf.Graph().as_default(): + with self.assertRaises(ValueError): + unused_comp = bulk_component.BulkFeatureIdExtractorComponentBuilder( + self.master, component_spec) + + def testBulkFeatureIdExtractorOkWithMultipleFixedFeatures(self): + component_spec = spec_pb2.ComponentSpec() + text_format.Parse(""" + name: "test" + network_unit { + registered_name: "IdentityNetwork" + } + fixed_feature { + name: "fixed1" embedding_dim: -1 size: 1 + } + fixed_feature { + name: "fixed2" embedding_dim: -1 size: 1 + } + fixed_feature { + name: "fixed3" embedding_dim: -1 size: 1 + } + """, component_spec) + with tf.Graph().as_default(): + comp = bulk_component.BulkFeatureIdExtractorComponentBuilder( + self.master, component_spec) + + # Should not raise errors. 
+ self.network_states[component_spec.name] = component.NetworkState() + comp.build_greedy_training(self.master_state, self.network_states) + self.network_states[component_spec.name] = component.NetworkState() + comp.build_greedy_inference(self.master_state, self.network_states) + + def testBulkFeatureIdExtractorFailsOnEmbeddedFixedFeature(self): + component_spec = spec_pb2.ComponentSpec() + text_format.Parse(""" + name: "test" + network_unit { + registered_name: "IdentityNetwork" + } + fixed_feature { + name: "fixed" embedding_dim: 2 size: 1 + } + """, component_spec) + with tf.Graph().as_default(): + with self.assertRaises(ValueError): + unused_comp = bulk_component.BulkFeatureIdExtractorComponentBuilder( + self.master, component_spec) + + def testBulkFeatureIdExtractorExtractFocusWithOffset(self): + path = os.path.join(tf.test.get_temp_dir(), 'label-map') + with open(path, 'w') as label_map_file: + label_map_file.write('0\n') + + master_spec = spec_pb2.MasterSpec() + text_format.Parse(""" + component { + name: "test" + transition_system { + registered_name: "shift-only" + } + resource { + name: "label-map" + part { + file_pattern: "%s" + file_format: "text" + } + } + network_unit { + registered_name: "ExportFixedFeaturesNetwork" + } + backend { + registered_name: "SyntaxNetComponent" + } + fixed_feature { + name: "focus1" embedding_dim: -1 size: 1 fml: "input.focus" + predicate_map: "none" + } + fixed_feature { + name: "focus2" embedding_dim: -1 size: 1 fml: "input(1).focus" + predicate_map: "none" + } + fixed_feature { + name: "focus3" embedding_dim: -1 size: 1 fml: "input(2).focus" + predicate_map: "none" + } + } + """ % path, master_spec) + + with tf.Graph().as_default(): + corpus = _create_fake_corpus() + corpus = tf.constant(corpus, shape=[len(corpus)]) + handle = dragnn_ops.get_session( + container='test', + master_spec=master_spec.SerializeToString(), + grid_point='') + handle = dragnn_ops.attach_data_reader(handle, corpus) + handle = 
dragnn_ops.init_component_data( + handle, beam_size=1, component='test') + batch_size = dragnn_ops.batch_size(handle, component='test') + master_state = component.MasterState(handle, batch_size) + + extractor = bulk_component.BulkFeatureIdExtractorComponentBuilder( + self.master, master_spec.component[0]) + network_state = component.NetworkState() + self.network_states['test'] = network_state + handle = extractor.build_greedy_inference(master_state, + self.network_states) + focus1 = network_state.activations['focus1'].bulk_tensor + focus2 = network_state.activations['focus2'].bulk_tensor + focus3 = network_state.activations['focus3'].bulk_tensor + + with self.test_session() as sess: + focus1, focus2, focus3 = sess.run([focus1, focus2, focus3]) + tf.logging.info('focus1=\n%s', focus1) + tf.logging.info('focus2=\n%s', focus2) + tf.logging.info('focus3=\n%s', focus3) + + self.assertAllEqual( + focus1, + [[0], [-1], [-1], [-1], + [0], [1], [-1], [-1], + [0], [1], [2], [-1], + [0], [1], [2], [3]]) + + self.assertAllEqual( + focus2, + [[-1], [-1], [-1], [-1], + [1], [-1], [-1], [-1], + [1], [2], [-1], [-1], + [1], [2], [3], [-1]]) + + self.assertAllEqual( + focus3, + [[-1], [-1], [-1], [-1], + [-1], [-1], [-1], [-1], + [2], [-1], [-1], [-1], + [2], [3], [-1], [-1]]) + + +if __name__ == '__main__': + googletest.main() diff --git a/syntaxnet/dragnn/python/component.py b/syntaxnet/dragnn/python/component.py new file mode 100644 index 0000000000000000000000000000000000000000..e38a216fe208696c0c81061f8530fbe39b4a24fb --- /dev/null +++ b/syntaxnet/dragnn/python/component.py @@ -0,0 +1,629 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Builds a DRAGNN graph for local training.""" + +from abc import ABCMeta +from abc import abstractmethod + +import tensorflow as tf +from tensorflow.python.platform import tf_logging as logging + +from dragnn.python import dragnn_ops +from dragnn.python import network_units +from syntaxnet.util import check +from syntaxnet.util import registry + + +class NetworkState(object): + """Simple utility to manage the state of a DRAGNN network. + + This class encapsulates the variables that are specific to any + particular instance of a DRAGNN stack, as constructed by the + MasterBuilder below. + + Attributes: + activations: Dictionary mapping layer names to StoredActivation objects. + """ + + def __init__(self): + self.activations = {} + + +class MasterState(object): + """Simple utility to encapsulate tensors associated with the master state. + + Attributes: + handle: string tensor handle to the underlying nlp_saft::dragnn::MasterState + current_batch_size: int tensor containing the batch size following the most + recent MasterState::Reset(). + """ + + def __init__(self, handle, current_batch_size): + self.handle = handle + self.current_batch_size = current_batch_size + + +@registry.RegisteredClass +class ComponentBuilderBase(object): + """Utility to build a single Component in a DRAGNN stack of models. + + This class handles converting a ComponentSpec proto into various TF + sub-graphs. 
It will stitch together various neural units with dynamic + unrolling inside a tf.while loop. + + All variables for parameters are created during the constructor within the + scope of the component's name, e.g. 'tagger/embedding_matrix_0' for a + component named 'tagger'. + + As part of the specification, ComponentBuilder will wrap an underlying + NetworkUnit which generates the actual network layout. + """ + + __metaclass__ = ABCMeta # required for @abstractmethod + + def __init__(self, master, component_spec, attr_defaults=None): + """Initializes the ComponentBuilder from specifications. + + Args: + master: dragnn.MasterBuilder object. + component_spec: dragnn.ComponentSpec proto to be built. + attr_defaults: Optional dict of component attribute defaults. If not + provided or if empty, attributes are not extracted. + """ + self.master = master + self.num_actions = component_spec.num_actions + self.name = component_spec.name + self.spec = component_spec + self.moving_average = None + + # Determine if this component should apply self-normalization. + self.eligible_for_self_norm = ( + not self.master.hyperparams.self_norm_components_filter or self.name in + self.master.hyperparams.self_norm_components_filter.split(',')) + + # Extract component attributes before make_network(), so the network unit + # can access them. 
+ self._attrs = {} + if attr_defaults: + self._attrs = network_units.get_attrs_with_defaults( + self.spec.component_builder.parameters, attr_defaults) + + with tf.variable_scope(self.name): + self.training_beam_size = tf.constant( + self.spec.training_beam_size, name='TrainingBeamSize') + self.inference_beam_size = tf.constant( + self.spec.inference_beam_size, name='InferenceBeamSize') + self.locally_normalize = tf.constant(False, name='LocallyNormalize') + self._step = tf.get_variable( + 'step', [], initializer=tf.zeros_initializer(), dtype=tf.int32) + self._total = tf.get_variable( + 'total', [], initializer=tf.zeros_initializer(), dtype=tf.int32) + + # Construct network variables. + self.network = self.make_network(self.spec.network_unit) + + # Construct moving average. + if self.master.hyperparams.use_moving_average: + self.moving_average = tf.train.ExponentialMovingAverage( + decay=self.master.hyperparams.average_weight, num_updates=self._step) + self.avg_ops = [self.moving_average.apply(self.network.params)] + + def make_network(self, network_unit): + """Makes a NetworkUnitInterface object based on the network_unit spec. + + Components may override this method to exert control over the + network unit construction, such as which network units are supported. + + Args: + network_unit: RegisteredModuleSpec proto defining the network unit. + + Returns: + An implementation of NetworkUnitInterface. + + Raises: + ValueError: if the requested network unit is not found in the registry. + """ + network_type = network_unit.registered_name + + with tf.variable_scope(self.name): + # Raises ValueError if not found. + return network_units.NetworkUnitInterface.Create(network_type, self) + + @abstractmethod + def build_greedy_training(self, state, network_states): + """Builds a training graph for this component. + + Two assumptions are made about the resulting graph: + 1. An oracle will be used to unroll the state and compute the cost. + 2. 
The graph will be differentiable when the cost is being minimized. + + Args: + state: MasterState from the 'AdvanceMaster' op that advances the + underlying master to this component. + network_states: dictionary of component NetworkState objects. + + Returns: + (state, cost, correct, total) -- These are TF ops corresponding to + the final state after unrolling, the total cost, the total number of + correctly predicted actions, and the total number of actions. + """ + pass + + def build_structured_training(self, state, network_states): + """Builds a beam search based training loop for this component. + + The default implementation builds a dummy graph and raises a + TensorFlow runtime exception to indicate that structured training + is not implemented. + + Args: + state: MasterState from the 'AdvanceMaster' op that advances the + underlying master to this component. + network_states: dictionary of component NetworkState objects. + + Returns: + (handle, cost, correct, total) -- These are TF ops corresponding + to the final handle after unrolling, the total cost, and the + total number of actions. Since the number of correctly predicted + actions is not applicable in the structured training setting, a + dummy value should be returned. + """ + del network_states # Unused. + with tf.control_dependencies([tf.Assert(False, ['Not implemented.'])]): + handle = tf.identity(state.handle) + cost = tf.constant(0.) + correct, total = tf.constant(0), tf.constant(0) + return handle, cost, correct, total + + @abstractmethod + def build_greedy_inference(self, state, network_states, + during_training=False): + """Builds an inference graph for this component. + + If this graph is being constructed 'during_training', then it needs to be + differentiable even though it doesn't return an explicit cost. + + There may be other cases where the distinction between training and eval is + important. The handling of dropout is an example of this. 
+ + Args: + state: MasterState from the 'AdvanceMaster' op that advances the + underlying master to this component. + network_states: dictionary of component NetworkState objects. + during_training: whether the graph is being constructed during training + + Returns: + Handle to the state once inference is complete for this Component. + """ + pass + + def get_summaries(self): + """Constructs a set of summaries for this component. + + Returns: + List of Summary ops to get parameter norms, progress reports, and + so forth for this component. + """ + + def combine_norm(matrices): + # Handles None in cases where the optimizer or moving average slot is + # not present. + squares = [tf.reduce_sum(tf.square(m)) for m in matrices if m is not None] + + # Some components may not have any parameters, in which case we simply + # return zero. + if squares: + return tf.sqrt(tf.add_n(squares)) + else: + return tf.constant(0, tf.float32) + + summaries = [] + summaries.append(tf.summary.scalar('%s step' % self.name, self._step)) + summaries.append(tf.summary.scalar('%s total' % self.name, self._total)) + if self.network.params: + summaries.append( + tf.summary.scalar('%s parameter Norm' % self.name, + combine_norm(self.network.params))) + slot_names = self.master.optimizer.get_slot_names() + for name in slot_names: + slot_params = [ + self.master.optimizer.get_slot(p, name) for p in self.network.params + ] + summaries.append( + tf.summary.scalar('%s %s Norm' % (self.name, name), + combine_norm(slot_params))) + + # Construct moving average. + if self.master.hyperparams.use_moving_average: + summaries.append( + tf.summary.scalar('%s avg Norm' % self.name, + combine_norm([ + self.moving_average.average(p) + for p in self.network.params + ]))) + + return summaries + + def get_variable(self, var_name=None, var_params=None): + """Returns either the original or averaged version of a given variable. 
+ + If the master.read_from_avg flag is set to True, and the + ExponentialMovingAverage (EMA) object has been attached, then this will ask + the EMA object for the given variable. + + This is to allow executing inference from the averaged version of + parameters. + + Arguments: + var_name: Name of the variable. + var_params: tf.Variable for which to retrieve an average. + + Only one of |var_name| or |var_params| needs to be provided. If both are + provided, |var_params| takes precedence. + + Returns: + tf.Variable object corresponding to original or averaged version. + """ + if var_params: + var_name = var_params.name + else: + check.NotNone(var_name, 'specify at least one of var_name or var_params') + var_params = tf.get_variable(var_name) + + if self.moving_average and self.master.read_from_avg: + logging.info('Retrieving average for: %s', var_name) + var_params = self.moving_average.average(var_params) + assert var_params + logging.info('Returning: %s', var_params.name) + return var_params + + def advance_counters(self, total): + """Returns ops to advance the per-component step and total counters. + + Args: + total: Total number of actions to increment counters by. + + Returns: + tf.Group op incrementing 'step' by 1 and 'total' by total. + """ + update_total = tf.assign_add(self._total, total, use_locking=True) + update_step = tf.assign_add(self._step, 1, use_locking=True) + return tf.group(update_total, update_step) + + def add_regularizer(self, cost): + """Adds L2 regularization for parameters which have it turned on. + + Args: + cost: float cost before regularization. + + Returns: + Updated cost optionally including regularization. 
+ """ + if self.network is None: + return cost + regularized_weights = self.network.get_l2_regularized_weights() + if not regularized_weights: + return cost + l2_coeff = self.master.hyperparams.l2_regularization_coefficient + if l2_coeff == 0.0: + return cost + tf.logging.info('[%s] Regularizing parameters: %s', self.name, + [w.name for w in regularized_weights]) + l2_costs = [tf.nn.l2_loss(p) for p in regularized_weights] + return tf.add(cost, l2_coeff * tf.add_n(l2_costs), name='regularizer') + + def build_post_restore_hook(self): + """Builds a post restore graph for this component. + + This is a run-once graph that prepares any state necessary for the + inference portion of the component. It is generally a no-op. + + Returns: + A no-op state. + """ + logging.info('Building default post restore hook for component: %s', + self.spec.name) + return tf.no_op(name='setup_%s' % self.spec.name) + + def attr(self, name): + """Returns the value of the component attribute with the |name|.""" + return self._attrs[name] + + +def update_tensor_arrays(network_tensors, arrays): + """Updates a list of tensor arrays from the network's output tensors. + + Arguments: + network_tensors: Output tensors from the underlying NN unit. + arrays: TensorArrays to be updated. + + Returns: + New list of TensorArrays after writing activations. + """ + # TODO(googleuser): Only store activations that will be used later in linked + # feature specifications. + next_arrays = [] + for index, network_tensor in enumerate(network_tensors): + array = arrays[index] + size = array.size() + array = array.write(size, network_tensor) + next_arrays.append(array) + return next_arrays + + +class DynamicComponentBuilder(ComponentBuilderBase): + """Component builder for recurrent DRAGNN networks. + + Feature extraction and annotation are done sequentially in a tf.while_loop + so fixed and linked features can be recurrent. 
+ """ + + def build_greedy_training(self, state, network_states): + """Builds a training loop for this component. + + This loop repeatedly evaluates the network and computes the loss, but it + does not advance using the predictions of the network. Instead, it advances + using the oracle defined in the underlying transition system. The final + state will always correspond to the gold annotation. + + Args: + state: MasterState from the 'AdvanceMaster' op that advances the + underlying master to this component. + network_states: NetworkState object containing component TensorArrays. + + Returns: + (state, cost, correct, total) -- These are TF ops corresponding to + the final state after unrolling, the total cost, the total number of + correctly predicted actions, and the total number of actions. + """ + logging.info('Building component: %s', self.spec.name) + with tf.control_dependencies([tf.assert_equal(self.training_beam_size, 1)]): + stride = state.current_batch_size * self.training_beam_size + + cost = tf.constant(0.) + correct = tf.constant(0) + total = tf.constant(0) + + def cond(handle, *_): + all_final = dragnn_ops.emit_all_final(handle, component=self.name) + return tf.logical_not(tf.reduce_all(all_final)) + + def body(handle, cost, correct, total, *arrays): + """Runs the network and advances the state by a step.""" + + with tf.control_dependencies([handle, cost, correct, total] + + [x.flow for x in arrays]): + # Get a copy of the network inside this while loop. + updated_state = MasterState(handle, state.current_batch_size) + network_tensors = self._feedforward_unit( + updated_state, arrays, network_states, stride, during_training=True) + + # Every layer is written to a TensorArray, so that it can be backprop'd. + next_arrays = update_tensor_arrays(network_tensors, arrays) + with tf.control_dependencies([x.flow for x in next_arrays]): + with tf.name_scope('compute_loss'): + # A gold label > -1 determines that the sentence is still + # in a valid state. 
Otherwise, the sentence has ended. + # + # We add only the valid sentences to the loss, in the following way: + # 1. We compute 'valid_ix', the indices in gold that contain + # valid oracle actions. + # 2. We compute the cost function by comparing logits and gold + # only for the valid indices. + gold = dragnn_ops.emit_oracle_labels(handle, component=self.name) + gold.set_shape([None]) + valid = tf.greater(gold, -1) + valid_ix = tf.reshape(tf.where(valid), [-1]) + gold = tf.gather(gold, valid_ix) + + logits = self.network.get_logits(network_tensors) + logits = tf.gather(logits, valid_ix) + + cost += tf.reduce_sum( + tf.nn.sparse_softmax_cross_entropy_with_logits( + labels=tf.cast(gold, tf.int64), logits=logits)) + + if (self.eligible_for_self_norm and + self.master.hyperparams.self_norm_alpha > 0): + log_z = tf.reduce_logsumexp(logits, [1]) + cost += (self.master.hyperparams.self_norm_alpha * + tf.nn.l2_loss(log_z)) + + correct += tf.reduce_sum( + tf.to_int32(tf.nn.in_top_k(logits, gold, 1))) + total += tf.size(gold) + + with tf.control_dependencies([cost, correct, total, gold]): + handle = dragnn_ops.advance_from_oracle(handle, component=self.name) + return [handle, cost, correct, total] + next_arrays + + with tf.name_scope(self.name + '/train_state'): + init_arrays = [] + for layer in self.network.layers: + init_arrays.append(layer.create_array(state.current_batch_size)) + + output = tf.while_loop( + cond, + body, [state.handle, cost, correct, total] + init_arrays, + name='train_%s' % self.name) + + # Saves completed arrays and returns final state and cost. + state.handle = output[0] + correct = output[2] + total = output[3] + arrays = output[4:] + cost = output[1] + + # Store handles to the final output for use in subsequent tasks. 
+ network_state = network_states[self.name] + with tf.name_scope(self.name + '/stored_act'): + for index, layer in enumerate(self.network.layers): + network_state.activations[layer.name] = network_units.StoredActivations( + array=arrays[index]) + + # Normalize the objective by the total # of steps taken. + with tf.control_dependencies([tf.assert_greater(total, 0)]): + cost /= tf.to_float(total) + + # Adds regularization for the hidden weights. + cost = self.add_regularizer(cost) + + with tf.control_dependencies([x.flow for x in arrays]): + return tf.identity(state.handle), cost, correct, total + + def build_greedy_inference(self, state, network_states, + during_training=False): + """Builds an inference loop for this component. + + Repeatedly evaluates the network and advances the underlying state according + to the predicted scores. + + Args: + state: MasterState from the 'AdvanceMaster' op that advances the + underlying master to this component. + network_states: NetworkState object containing component TensorArrays. + during_training: whether the graph is being constructed during training + + Returns: + Handle to the state once inference is complete for this Component. + """ + logging.info('Building component: %s', self.spec.name) + if during_training: + stride = state.current_batch_size * self.training_beam_size + else: + stride = state.current_batch_size * self.inference_beam_size + + def cond(handle, *_): + all_final = dragnn_ops.emit_all_final(handle, component=self.name) + return tf.logical_not(tf.reduce_all(all_final)) + + def body(handle, *arrays): + """Runs the network and advances the state by a step.""" + + with tf.control_dependencies([handle] + [x.flow for x in arrays]): + # Get a copy of the network inside this while loop. 
+ updated_state = MasterState(handle, state.current_batch_size) + network_tensors = self._feedforward_unit( + updated_state, + arrays, + network_states, + stride, + during_training=during_training) + next_arrays = update_tensor_arrays(network_tensors, arrays) + with tf.control_dependencies([x.flow for x in next_arrays]): + logits = self.network.get_logits(network_tensors) + logits = tf.cond(self.locally_normalize, + lambda: tf.nn.log_softmax(logits), lambda: logits) + handle = dragnn_ops.advance_from_prediction( + handle, logits, component=self.name) + return [handle] + next_arrays + + # Create the TensorArray's to store activations for downstream/recurrent + # connections. + with tf.name_scope(self.name + '/inference_state'): + init_arrays = [] + for layer in self.network.layers: + init_arrays.append(layer.create_array(stride)) + output = tf.while_loop( + cond, + body, [state.handle] + init_arrays, + name='inference_%s' % self.name) + + # Saves completed arrays and returns final state. + state.handle = output[0] + arrays = output[1:] + network_state = network_states[self.name] + with tf.name_scope(self.name + '/stored_act'): + for index, layer in enumerate(self.network.layers): + network_state.activations[layer.name] = network_units.StoredActivations( + array=arrays[index]) + with tf.control_dependencies([x.flow for x in arrays]): + return tf.identity(state.handle) + + def _feedforward_unit(self, state, arrays, network_states, stride, + during_training): + """Constructs a single instance of a feed-forward cell. + + Given an input state and access to the arrays storing activations, this + function encapsulates creation of a single network unit. This will *not* + create new variables. + + Args: + state: MasterState for the state that will be used to extract features. + arrays: List of TensorArrays corresponding to network outputs from this + component. 
These are used for recurrent link features; the arrays from + other components are used for stack-prop style connections. + network_states: NetworkState object containing the TensorArrays from + *all* components. + stride: int Tensor with the current beam * batch size. + during_training: Whether to build a unit for training (vs inference). + + Returns: + List of tensors generated by the underlying network implementation. + """ + with tf.variable_scope(self.name, reuse=True): + fixed_embeddings = [] + for channel_id, feature_spec in enumerate(self.spec.fixed_feature): + fixed_embedding = network_units.fixed_feature_lookup( + self, state, channel_id, stride) + if feature_spec.is_constant: + fixed_embedding.tensor = tf.stop_gradient(fixed_embedding.tensor) + fixed_embeddings.append(fixed_embedding) + + linked_embeddings = [] + for channel_id, feature_spec in enumerate(self.spec.linked_feature): + if feature_spec.source_component == self.name: + # Recurrent feature: pull from the local arrays. + index = self.network.get_layer_index(feature_spec.source_layer) + source_array = arrays[index] + source_layer_size = self.network.layers[index].dim + linked_embeddings.append( + network_units.activation_lookup_recurrent( + self, state, channel_id, source_array, source_layer_size, + stride)) + else: + # Stackprop style feature: pull from another component's arrays. 
+ source = self.master.lookup_component[feature_spec.source_component] + source_tensor = network_states[source.name].activations[ + feature_spec.source_layer] + source_layer_size = source.network.get_layer_size( + feature_spec.source_layer) + linked_embeddings.append( + network_units.activation_lookup_other( + self, state, channel_id, source_tensor.dynamic_tensor, + source_layer_size)) + + context_tensor_arrays = [] + for context_layer in self.network.context_layers: + index = self.network.get_layer_index(context_layer.name) + context_tensor_arrays.append(arrays[index]) + + if self.spec.attention_component: + logging.info('%s component has attention over %s', self.name, + self.spec.attention_component) + source = self.master.lookup_component[self.spec.attention_component] + network_state = network_states[self.spec.attention_component] + with tf.control_dependencies( + [tf.assert_equal(state.current_batch_size, 1)]): + attention_tensor = tf.identity( + network_state.activations['layer_0'].bulk_tensor) + + else: + attention_tensor = None + + return self.network.create(fixed_embeddings, linked_embeddings, + context_tensor_arrays, attention_tensor, + during_training) diff --git a/syntaxnet/dragnn/python/composite_optimizer.py b/syntaxnet/dragnn/python/composite_optimizer.py new file mode 100644 index 0000000000000000000000000000000000000000..71aff9b2162bd69919f8cc0bf13c948326138e5d --- /dev/null +++ b/syntaxnet/dragnn/python/composite_optimizer.py @@ -0,0 +1,70 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""An optimizer that switches between several methods.""" + +import tensorflow as tf +from tensorflow.python.training import optimizer + + +class CompositeOptimizer(optimizer.Optimizer): + """Optimizer that switches between several methods. + """ + + def __init__(self, + optimizer1, + optimizer2, + switch, + use_locking=False, + name='Composite'): + """Construct a new Composite optimizer. + + Args: + optimizer1: A tf.python.training.optimizer.Optimizer object. + optimizer2: A tf.python.training.optimizer.Optimizer object. + switch: A tf.bool Tensor, selecting whether to use the first or the second + optimizer. + use_locking: Bool. If True, use locks to prevent concurrent updates + to variables. + name: Optional name prefix for the operations created when applying + gradients. Defaults to "Composite". 
+ """ + super(CompositeOptimizer, self).__init__(use_locking, name) + self._optimizer1 = optimizer1 + self._optimizer2 = optimizer2 + self._switch = switch + + def apply_gradients(self, grads_and_vars, global_step=None, name=None): + + return tf.cond( + self._switch, + lambda: self._optimizer1.apply_gradients(grads_and_vars, + global_step, name), + lambda: self._optimizer2.apply_gradients(grads_and_vars, + global_step, name) + ) + + + def get_slot(self, var, name): + slot1 = self._optimizer1.get_slot(var, name) + slot2 = self._optimizer2.get_slot(var, name) + if slot1 is not None and slot2 is not None: + raise LookupError('Slot named %s for variable %s populated for both ' + 'optimizers' % (name, var.name)) + return slot1 if slot1 is not None else slot2 + + def get_slot_names(self): + return sorted(self._optimizer1.get_slot_names() + + self._optimizer2.get_slot_names()) diff --git a/syntaxnet/dragnn/python/composite_optimizer_test.py b/syntaxnet/dragnn/python/composite_optimizer_test.py new file mode 100644 index 0000000000000000000000000000000000000000..ea18982d30eb17f32e0fbee39047faa04c7a724c --- /dev/null +++ b/syntaxnet/dragnn/python/composite_optimizer_test.py @@ -0,0 +1,126 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for CompositeOptimizer.""" + + +import numpy as np +import tensorflow as tf + +from tensorflow.python.framework import test_util +from tensorflow.python.platform import googletest +from tensorflow.python.platform import tf_logging as logging + +from dragnn.python import composite_optimizer + + +class MockAdamOptimizer(tf.train.AdamOptimizer): + + def __init__(self, + learning_rate=0.001, + beta1=0.9, + beta2=0.999, + epsilon=1e-8, + use_locking=False, + name="Adam"): + super(MockAdamOptimizer, self).__init__(learning_rate, beta1, beta2, + epsilon, use_locking, name) + + def _create_slots(self, var_list): + super(MockAdamOptimizer, self)._create_slots(var_list) + for v in var_list: + self._zeros_slot(v, "adam_counter", self._name) + + def _apply_dense(self, grad, var): + train_op = super(MockAdamOptimizer, self)._apply_dense(grad, var) + counter = self.get_slot(var, "adam_counter") + return tf.group(train_op, tf.assign_add(counter, [1.0])) + + +class MockMomentumOptimizer(tf.train.MomentumOptimizer): + + def __init__(self, + learning_rate, + momentum, + use_locking=False, + name="Momentum", + use_nesterov=False): + super(MockMomentumOptimizer, self).__init__(learning_rate, momentum, + use_locking, name, use_nesterov) + + def _create_slots(self, var_list): + super(MockMomentumOptimizer, self)._create_slots(var_list) + for v in var_list: + self._zeros_slot(v, "momentum_counter", self._name) + + def _apply_dense(self, grad, var): + train_op = super(MockMomentumOptimizer, self)._apply_dense(grad, var) + counter = self.get_slot(var, "momentum_counter") + return tf.group(train_op, tf.assign_add(counter, [1.0])) + + +class CompositeOptimizerTest(test_util.TensorFlowTestCase): + + def test_switching(self): + with self.test_session() as sess: + # Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3 + x_data = np.random.rand(100).astype(np.float32) + y_data = x_data * 0.1 + 
0.3 + + # Try to find values for w and b that compute y_data = w * x_data + b + # (We know that w should be 0.1 and b 0.3, but TensorFlow will + # figure that out for us.) + w = tf.Variable(tf.random_uniform([1], -1.0, 1.0)) + b = tf.Variable(tf.zeros([1])) + y = w * x_data + b + + # Minimize the mean squared errors. + loss = tf.reduce_mean(tf.square(y - y_data)) + + # Set up optimizers. + step = tf.get_variable( + "step", + shape=[], + initializer=tf.zeros_initializer(), + trainable=False, + dtype=tf.int32) + optimizer1 = MockAdamOptimizer(0.05) + optimizer2 = MockMomentumOptimizer(0.05, 0.5) + switch = tf.less(step, 100) + optimizer = composite_optimizer.CompositeOptimizer(optimizer1, optimizer2, + switch) + train_op = optimizer.minimize(loss) + + sess.run(tf.global_variables_initializer()) + + # Fit the line. + for iteration in range(201): + self.assertEqual(sess.run(switch), iteration < 100) + sess.run(train_op) + sess.run(tf.assign_add(step, 1)) + slot_names = optimizer.get_slot_names() + self.assertItemsEqual( + slot_names, + ["m", "v", "momentum", "adam_counter", "momentum_counter"]) + adam_counter = sess.run(optimizer.get_slot(w, "adam_counter")) + momentum_counter = sess.run(optimizer.get_slot(w, "momentum_counter")) + self.assertEqual(adam_counter, min(iteration + 1, 100)) + self.assertEqual(momentum_counter, max(iteration - 99, 0)) + if iteration % 20 == 0: + logging.info("%d %s %d %d", iteration, sess.run([switch, step, w, b]), + adam_counter, momentum_counter) + +if __name__ == "__main__": + googletest.main() diff --git a/syntaxnet/dragnn/python/digraph_ops.py b/syntaxnet/dragnn/python/digraph_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..7e6953152c65deff95e07717eaa3774d3b7e1524 --- /dev/null +++ b/syntaxnet/dragnn/python/digraph_ops.py @@ -0,0 +1,356 @@ +# Copyright 2017 Google Inc. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""TensorFlow ops for directed graphs.""" + +import tensorflow as tf + +from syntaxnet.util import check + + +def ArcPotentialsFromTokens(source_tokens, target_tokens, weights): + r"""Returns arc potentials computed from token activations and weights. + + For each batch of source and target token activations, computes a scalar + potential for each arc as the 3-way product between the activation vectors of + the source and target of the arc and the |weights|. Specifically, + + arc[b,s,t] = + \sum_{i,j} source_tokens[b,s,i] * weights[i,j] * target_tokens[b,t,j] + + Note that the token activations can be extended with bias terms to implement a + "biaffine" model (Dozat and Manning, 2017). + + Args: + source_tokens: [B,N,S] tensor of batched activations for the source token in + each arc. + target_tokens: [B,N,T] tensor of batched activations for the target token in + each arc. + weights: [S,T] matrix of weights. + + B,N may be statically-unknown, but S,T must be statically-known. The dtype + of all arguments must be compatible. + + Returns: + [B,N,N] tensor A of arc potentials where A_{b,s,t} is the potential of the + arc from s to t in batch element b. The dtype of A is the same as that of + the arguments. Note that the diagonal entries (i.e., where s==t) represent + self-loops and may not be meaningful. 
+ """ + # All arguments must have statically-known rank. + check.Eq(source_tokens.get_shape().ndims, 3, 'source_tokens must be rank 3') + check.Eq(target_tokens.get_shape().ndims, 3, 'target_tokens must be rank 3') + check.Eq(weights.get_shape().ndims, 2, 'weights must be a matrix') + + # All activation dimensions must be statically-known. + num_source_activations = weights.get_shape().as_list()[0] + num_target_activations = weights.get_shape().as_list()[1] + check.NotNone(num_source_activations, 'unknown source activation dimension') + check.NotNone(num_target_activations, 'unknown target activation dimension') + check.Eq(source_tokens.get_shape().as_list()[2], num_source_activations, + 'dimension mismatch between weights and source_tokens') + check.Eq(target_tokens.get_shape().as_list()[2], num_target_activations, + 'dimension mismatch between weights and target_tokens') + + # All arguments must share the same type. + check.Same([weights.dtype.base_dtype, + source_tokens.dtype.base_dtype, + target_tokens.dtype.base_dtype], + 'dtype mismatch') + + source_tokens_shape = tf.shape(source_tokens) + target_tokens_shape = tf.shape(target_tokens) + batch_size = source_tokens_shape[0] + num_tokens = source_tokens_shape[1] + with tf.control_dependencies([ + tf.assert_equal(batch_size, target_tokens_shape[0]), + tf.assert_equal(num_tokens, target_tokens_shape[1])]): + # Flatten out the batch dimension so we can use one big multiplication. + targets_bnxt = tf.reshape(target_tokens, [-1, num_target_activations]) + + # Matrices are row-major, so we arrange for the RHS argument of each matmul + # to have its transpose flag set. That way no copying is required to align + # the rows of the LHS with the columns of the RHS. + weights_targets_bnxs = tf.matmul(targets_bnxt, weights, transpose_b=True) + + # The next computation is over pairs of tokens within each batch element, so + # restore the batch dimension. 
+ weights_targets_bxnxs = tf.reshape( + weights_targets_bnxs, [batch_size, num_tokens, num_source_activations]) + + # Note that this multiplication is repeated across the batch dimension, + # instead of being one big multiplication as in the first matmul. There + # doesn't seem to be a way to arrange this as a single multiplication given + # the pairwise nature of this computation. + arcs_bxnxn = tf.matmul(source_tokens, weights_targets_bxnxs, + transpose_b=True) + return arcs_bxnxn + + +def ArcSourcePotentialsFromTokens(tokens, weights): + r"""Returns arc source potentials computed from tokens and weights. + + For each batch of token activations, computes a scalar potential for each arc + as the product between the activations of the source token and the |weights|. + Specifically, + + arc[b,s,:] = \sum_{i} weights[i] * tokens[b,s,i] + + Args: + tokens: [B,N,S] tensor of batched activations for source tokens. + weights: [S] vector of weights. + + B,N may be statically-unknown, but S must be statically-known. The dtype of + all arguments must be compatible. + + Returns: + [B,N,N] tensor A of arc potentials as defined above. The dtype of A is the + same as that of the arguments. Note that the diagonal entries (i.e., where + s==t) represent self-loops and may not be meaningful. + """ + # All arguments must have statically-known rank. + check.Eq(tokens.get_shape().ndims, 3, 'tokens must be rank 3') + check.Eq(weights.get_shape().ndims, 1, 'weights must be a vector') + + # All activation dimensions must be statically-known. + num_source_activations = weights.get_shape().as_list()[0] + check.NotNone(num_source_activations, 'unknown source activation dimension') + check.Eq(tokens.get_shape().as_list()[2], num_source_activations, + 'dimension mismatch between weights and tokens') + + # All arguments must share the same type. 
+ check.Same([weights.dtype.base_dtype, + tokens.dtype.base_dtype], + 'dtype mismatch') + + tokens_shape = tf.shape(tokens) + batch_size = tokens_shape[0] + num_tokens = tokens_shape[1] + + # Flatten out the batch dimension so we can use a couple big matmuls. + tokens_bnxs = tf.reshape(tokens, [-1, num_source_activations]) + weights_sx1 = tf.expand_dims(weights, 1) + sources_bnx1 = tf.matmul(tokens_bnxs, weights_sx1) + sources_bnxn = tf.tile(sources_bnx1, [1, num_tokens]) + + # Restore the batch dimension in the output. + sources_bxnxn = tf.reshape(sources_bnxn, [batch_size, num_tokens, num_tokens]) + return sources_bxnxn + + +def RootPotentialsFromTokens(root, tokens, weights): + r"""Returns root selection potentials computed from tokens and weights. + + For each batch of token activations, computes a scalar potential for each root + selection as the 3-way product between the activations of the artificial root + token, the token activations, and the |weights|. Specifically, + + roots[b,r] = \sum_{i,j} root[i] * weights[i,j] * tokens[b,r,j] + + Args: + root: [S] vector of activations for the artificial root token. + tokens: [B,N,T] tensor of batched activations for root tokens. + weights: [S,T] matrix of weights. + + B,N may be statically-unknown, but S,T must be statically-known. The dtype + of all arguments must be compatible. + + Returns: + [B,N] matrix R of root-selection potentials as defined above. The dtype of + R is the same as that of the arguments. + """ + # All arguments must have statically-known rank. + check.Eq(root.get_shape().ndims, 1, 'root must be a vector') + check.Eq(tokens.get_shape().ndims, 3, 'tokens must be rank 3') + check.Eq(weights.get_shape().ndims, 2, 'weights must be a matrix') + + # All activation dimensions must be statically-known. 
+ num_source_activations = weights.get_shape().as_list()[0] + num_target_activations = weights.get_shape().as_list()[1] + check.NotNone(num_source_activations, 'unknown source activation dimension') + check.NotNone(num_target_activations, 'unknown target activation dimension') + check.Eq(root.get_shape().as_list()[0], num_source_activations, + 'dimension mismatch between weights and root') + check.Eq(tokens.get_shape().as_list()[2], num_target_activations, + 'dimension mismatch between weights and tokens') + + # All arguments must share the same type. + check.Same([weights.dtype.base_dtype, + root.dtype.base_dtype, + tokens.dtype.base_dtype], + 'dtype mismatch') + + root_1xs = tf.expand_dims(root, 0) + + tokens_shape = tf.shape(tokens) + batch_size = tokens_shape[0] + num_tokens = tokens_shape[1] + + # Flatten out the batch dimension so we can use a couple big matmuls. + tokens_bnxt = tf.reshape(tokens, [-1, num_target_activations]) + weights_targets_bnxs = tf.matmul(tokens_bnxt, weights, transpose_b=True) + roots_1xbn = tf.matmul(root_1xs, weights_targets_bnxs, transpose_b=True) + + # Restore the batch dimension in the output. + roots_bxn = tf.reshape(roots_1xbn, [batch_size, num_tokens]) + return roots_bxn + + +def CombineArcAndRootPotentials(arcs, roots): + """Combines arc and root potentials into a single set of potentials. + + Args: + arcs: [B,N,N] tensor of batched arc potentials. + roots: [B,N] matrix of batched root potentials. + + Returns: + [B,N,N] tensor P of combined potentials where + P_{b,s,t} = s == t ? roots[b,t] : arcs[b,s,t] + """ + # All arguments must have statically-known rank. + check.Eq(arcs.get_shape().ndims, 3, 'arcs must be rank 3') + check.Eq(roots.get_shape().ndims, 2, 'roots must be a matrix') + + # All arguments must share the same type. 
+ dtype = arcs.dtype.base_dtype + check.Same([dtype, roots.dtype.base_dtype], 'dtype mismatch') + + roots_shape = tf.shape(roots) + arcs_shape = tf.shape(arcs) + batch_size = roots_shape[0] + num_tokens = roots_shape[1] + with tf.control_dependencies([ + tf.assert_equal(batch_size, arcs_shape[0]), + tf.assert_equal(num_tokens, arcs_shape[1]), + tf.assert_equal(num_tokens, arcs_shape[2])]): + return tf.matrix_set_diag(arcs, roots) + + +def LabelPotentialsFromTokens(tokens, weights): + r"""Computes label potentials from tokens and weights. + + For each batch of token activations, computes a scalar potential for each + label as the product between the activations of the source token and the + |weights|. Specifically, + + labels[b,t,l] = \sum_{i} weights[l,i] * tokens[b,t,i] + + Args: + tokens: [B,N,T] tensor of batched token activations. + weights: [L,T] matrix of weights. + + B,N may be dynamic, but L,T must be static. The dtype of all arguments must + be compatible. + + Returns: + [B,N,L] tensor of label potentials as defined above, with the same dtype as + the arguments. + """ + check.Eq(tokens.get_shape().ndims, 3, 'tokens must be rank 3') + check.Eq(weights.get_shape().ndims, 2, 'weights must be a matrix') + + num_labels = weights.get_shape().as_list()[0] + num_activations = weights.get_shape().as_list()[1] + check.NotNone(num_labels, 'unknown number of labels') + check.NotNone(num_activations, 'unknown activation dimension') + check.Eq(tokens.get_shape().as_list()[2], num_activations, + 'activation mismatch between weights and tokens') + tokens_shape = tf.shape(tokens) + batch_size = tokens_shape[0] + num_tokens = tokens_shape[1] + + check.Same([tokens.dtype.base_dtype, + weights.dtype.base_dtype], + 'dtype mismatch') + + # Flatten out the batch dimension so we can use one big matmul(). + tokens_bnxt = tf.reshape(tokens, [-1, num_activations]) + labels_bnxl = tf.matmul(tokens_bnxt, weights, transpose_b=True) + + # Restore the batch dimension in the output. 
+ labels_bxnxl = tf.reshape(labels_bnxl, [batch_size, num_tokens, num_labels]) + return labels_bxnxl + + +def LabelPotentialsFromTokenPairs(sources, targets, weights): + r"""Computes label potentials from source and target tokens and weights. + + For each aligned pair of source and target token activations, computes a + scalar potential for each label on the arc from the source to the target. + Specifically, + + labels[b,t,l] = \sum_{i,j} sources[b,t,i] * weights[l,i,j] * targets[b,t,j] + + Args: + sources: [B,N,S] tensor of batched source token activations. + targets: [B,N,T] tensor of batched target token activations. + weights: [L,S,T] tensor of weights. + + B,N may be dynamic, but L,S,T must be static. The dtype of all arguments + must be compatible. + + Returns: + [B,N,L] tensor of label potentials as defined above, with the same dtype as + the arguments. + """ + check.Eq(sources.get_shape().ndims, 3, 'sources must be rank 3') + check.Eq(targets.get_shape().ndims, 3, 'targets must be rank 3') + check.Eq(weights.get_shape().ndims, 3, 'weights must be rank 3') + + num_labels = weights.get_shape().as_list()[0] + num_source_activations = weights.get_shape().as_list()[1] + num_target_activations = weights.get_shape().as_list()[2] + check.NotNone(num_labels, 'unknown number of labels') + check.NotNone(num_source_activations, 'unknown source activation dimension') + check.NotNone(num_target_activations, 'unknown target activation dimension') + check.Eq(sources.get_shape().as_list()[2], num_source_activations, + 'activation mismatch between weights and source tokens') + check.Eq(targets.get_shape().as_list()[2], num_target_activations, + 'activation mismatch between weights and target tokens') + + check.Same([sources.dtype.base_dtype, + targets.dtype.base_dtype, + weights.dtype.base_dtype], + 'dtype mismatch') + + sources_shape = tf.shape(sources) + targets_shape = tf.shape(targets) + batch_size = sources_shape[0] + num_tokens = sources_shape[1] + with 
tf.control_dependencies([tf.assert_equal(batch_size, targets_shape[0]), + tf.assert_equal(num_tokens, targets_shape[1])]): + # For each token, we must compute a vector-3tensor-vector product. There is + # no op for this, but we can use reshape() and matmul() to compute it. + + # Reshape |weights| and |targets| so we can use a single matmul(). + weights_lsxt = tf.reshape(weights, [num_labels * num_source_activations, + num_target_activations]) + targets_bnxt = tf.reshape(targets, [-1, num_target_activations]) + weights_targets_bnxls = tf.matmul(targets_bnxt, weights_lsxt, + transpose_b=True) + + # Restore all dimensions. + weights_targets_bxnxlxs = tf.reshape( + weights_targets_bnxls, + [batch_size, num_tokens, num_labels, num_source_activations]) + + # Incorporate the source activations. In this case, we perform a batched + # matmul() between the trailing [L,S] matrices of the current result and the + # trailing [S] vectors of the tokens. + sources_bxnx1xs = tf.expand_dims(sources, 2) + labels_bxnxlx1 = tf.matmul(weights_targets_bxnxlxs, sources_bxnx1xs, + transpose_b=True) + labels_bxnxl = tf.squeeze(labels_bxnxlx1, [3]) + return labels_bxnxl diff --git a/syntaxnet/dragnn/python/digraph_ops_test.py b/syntaxnet/dragnn/python/digraph_ops_test.py new file mode 100644 index 0000000000000000000000000000000000000000..e38109f4df8933537943138e98e952d6e18cd8a2 --- /dev/null +++ b/syntaxnet/dragnn/python/digraph_ops_test.py @@ -0,0 +1,178 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for digraph ops.""" + +import tensorflow as tf + +from dragnn.python import digraph_ops + + +class DigraphOpsTest(tf.test.TestCase): + """Testing rig.""" + + def testArcPotentialsFromTokens(self): + with self.test_session(): + # Batch of two, where the second batch item is the reverse of the first. + source_tokens = tf.constant([[[1, 2], + [2, 3], + [3, 4]], + [[3, 4], + [2, 3], + [1, 2]]], tf.float32) + target_tokens = tf.constant([[[4, 5, 6], + [5, 6, 7], + [6, 7, 8]], + [[6, 7, 8], + [5, 6, 7], + [4, 5, 6]]], tf.float32) + weights = tf.constant([[2, 3, 5], + [7, 11, 13]], + tf.float32) + + arcs = digraph_ops.ArcPotentialsFromTokens(source_tokens, target_tokens, + weights) + + # For example, + # ((1 * 2 * 4 + 1 * 3 * 5 + 1 * 5 * 6) + + # (2 * 7 * 4 + 2 * 11 * 5 + 2 * 13 * 6)) = 375 + self.assertAllEqual(arcs.eval(), + [[[375, 447, 519], + [589, 702, 815], + [803, 957, 1111]], + [[1111, 957, 803], # reflected through the center + [815, 702, 589], + [519, 447, 375]]]) + + def testArcSourcePotentialsFromTokens(self): + with self.test_session(): + tokens = tf.constant([[[4, 5, 6], + [5, 6, 7], + [6, 7, 8]], + [[6, 7, 8], + [5, 6, 7], + [4, 5, 6]]], tf.float32) + weights = tf.constant([2, 3, 5], tf.float32) + + arcs = digraph_ops.ArcSourcePotentialsFromTokens(tokens, weights) + + self.assertAllEqual(arcs.eval(), [[[53, 53, 53], + [63, 63, 63], + [73, 73, 73]], + [[73, 73, 73], + [63, 63, 63], + [53, 53, 53]]]) + + def testRootPotentialsFromTokens(self): + with self.test_session(): + root = tf.constant([1, 2], tf.float32) + tokens = tf.constant([[[4, 5, 6], + [5, 6, 7], + [6, 7, 8]], + [[6, 7, 8], + [5, 6, 7], + [4, 5, 6]]], tf.float32) + weights = tf.constant([[2, 3, 5], + [7, 11, 13]], + tf.float32) + + roots = digraph_ops.RootPotentialsFromTokens(root, tokens, weights) + 
+ self.assertAllEqual(roots.eval(), [[375, 447, 519], + [519, 447, 375]]) + + def testCombineArcAndRootPotentials(self): + with self.test_session(): + arcs = tf.constant([[[1, 2, 3], + [2, 3, 4], + [3, 4, 5]], + [[3, 4, 5], + [2, 3, 4], + [1, 2, 3]]], tf.float32) + roots = tf.constant([[6, 7, 8], + [8, 7, 6]], tf.float32) + + potentials = digraph_ops.CombineArcAndRootPotentials(arcs, roots) + + self.assertAllEqual(potentials.eval(), [[[6, 2, 3], + [2, 7, 4], + [3, 4, 8]], + [[8, 4, 5], + [2, 7, 4], + [1, 2, 6]]]) + + def testLabelPotentialsFromTokens(self): + with self.test_session(): + tokens = tf.constant([[[1, 2], + [3, 4], + [5, 6]], + [[6, 5], + [4, 3], + [2, 1]]], tf.float32) + + + weights = tf.constant([[ 2, 3], + [ 5, 7], + [11, 13]], tf.float32) + + labels = digraph_ops.LabelPotentialsFromTokens(tokens, weights) + + self.assertAllEqual(labels.eval(), + + [[[ 8, 19, 37], + [ 18, 43, 85], + [ 28, 67, 133]], + [[ 27, 65, 131], + [ 17, 41, 83], + [ 7, 17, 35]]]) + + def testLabelPotentialsFromTokenPairs(self): + with self.test_session(): + sources = tf.constant([[[1, 2], + [3, 4], + [5, 6]], + [[6, 5], + [4, 3], + [2, 1]]], tf.float32) + targets = tf.constant([[[3, 4], + [5, 6], + [7, 8]], + [[8, 7], + [6, 5], + [4, 3]]], tf.float32) + + + weights = tf.constant([[[ 2, 3], + [ 5, 7]], + [[11, 13], + [17, 19]], + [[23, 29], + [31, 37]]], tf.float32) + + labels = digraph_ops.LabelPotentialsFromTokenPairs(sources, targets, + weights) + + self.assertAllEqual(labels.eval(), + + [[[ 104, 339, 667], + [ 352, 1195, 2375], + [ 736, 2531, 5043]], + [[ 667, 2419, 4857], + [ 303, 1115, 2245], + [ 75, 291, 593]]]) + + +if __name__ == "__main__": + tf.test.main() diff --git a/syntaxnet/dragnn/python/dragnn_ops.py b/syntaxnet/dragnn/python/dragnn_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..8a640107e0c82ead207cb34a8cf4199d0585ca98 --- /dev/null +++ b/syntaxnet/dragnn/python/dragnn_ops.py @@ -0,0 +1,24 @@ +# Copyright 2017 Google Inc. 
All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Groups the DRAGNN TensorFlow ops in one module.""" + + +try: + from dragnn.core.ops.gen_dragnn_bulk_ops import * + from dragnn.core.ops.gen_dragnn_ops import * +except ImportError as e: + raise e + diff --git a/syntaxnet/dragnn/python/evaluation.py b/syntaxnet/dragnn/python/evaluation.py new file mode 100644 index 0000000000000000000000000000000000000000..a028502252d419882c90d2e9cde6e44a3ec4b04b --- /dev/null +++ b/syntaxnet/dragnn/python/evaluation.py @@ -0,0 +1,117 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +"""Parser evaluation utils.""" + +from __future__ import division + +import tensorflow as tf + +from syntaxnet import sentence_pb2 +from syntaxnet.util import check + + +def calculate_parse_metrics(gold_corpus, annotated_corpus): + """Calculate POS/UAS/LAS accuracy based on gold and annotated sentences.""" + check.Eq(len(gold_corpus), len(annotated_corpus), 'Corpora are not aligned') + num_tokens = 0 + num_correct_pos = 0 + num_correct_uas = 0 + num_correct_las = 0 + for gold_str, annotated_str in zip(gold_corpus, annotated_corpus): + gold = sentence_pb2.Sentence() + annotated = sentence_pb2.Sentence() + gold.ParseFromString(gold_str) + annotated.ParseFromString(annotated_str) + check.Eq(gold.text, annotated.text, 'Text is not aligned') + check.Eq(len(gold.token), len(annotated.token), 'Tokens are not aligned') + tokens = list(zip(gold.token, annotated.token)) + num_tokens += len(tokens) + num_correct_pos += sum(1 for x, y in tokens if x.tag == y.tag) + num_correct_uas += sum(1 for x, y in tokens if x.head == y.head) + num_correct_las += sum(1 for x, y in tokens + if x.head == y.head and x.label == y.label) + + tf.logging.info('Total num documents: %d', len(annotated_corpus)) + tf.logging.info('Total num tokens: %d', num_tokens) + pos = num_correct_pos * 100.0 / num_tokens + uas = num_correct_uas * 100.0 / num_tokens + las = num_correct_las * 100.0 / num_tokens + tf.logging.info('POS: %.2f%%', pos) + tf.logging.info('UAS: %.2f%%', uas) + tf.logging.info('LAS: %.2f%%', las) + return pos, uas, las + + +def parser_summaries(gold_corpus, annotated_corpus): + """Computes parser evaluation summaries for gold and annotated sentences.""" + pos, uas, las = calculate_parse_metrics(gold_corpus, annotated_corpus) + return {'POS': pos, 'LAS': las, 'UAS': uas, 'eval_metric': las} + + +def calculate_segmentation_metrics(gold_corpus, annotated_corpus): + """Calculate precision/recall/f1 based on gold and 
annotated sentences.""" + check.Eq(len(gold_corpus), len(annotated_corpus), 'Corpora are not aligned') + num_gold_tokens = 0 + num_test_tokens = 0 + num_correct_tokens = 0 + def token_span(token): + check.Ge(token.end, token.start) + return (token.start, token.end) + + def ratio(numerator, denominator): + check.Ge(numerator, 0) + check.Ge(denominator, 0) + if denominator > 0: + return numerator / denominator + elif numerator == 0: + return 0.0 # map 0/0 to 0 + else: + return float('inf') # map x/0 to inf + + for gold_str, annotated_str in zip(gold_corpus, annotated_corpus): + gold = sentence_pb2.Sentence() + annotated = sentence_pb2.Sentence() + gold.ParseFromString(gold_str) + annotated.ParseFromString(annotated_str) + check.Eq(gold.text, annotated.text, 'Text is not aligned') + gold_spans = set() + test_spans = set() + for token in gold.token: + check.NotIn(token_span(token), gold_spans, 'Duplicate token') + gold_spans.add(token_span(token)) + for token in annotated.token: + check.NotIn(token_span(token), test_spans, 'Duplicate token') + test_spans.add(token_span(token)) + num_gold_tokens += len(gold_spans) + num_test_tokens += len(test_spans) + num_correct_tokens += len(gold_spans.intersection(test_spans)) + + tf.logging.info('Total num documents: %d', len(annotated_corpus)) + tf.logging.info('Total gold tokens: %d', num_gold_tokens) + tf.logging.info('Total test tokens: %d', num_test_tokens) + precision = 100 * ratio(num_correct_tokens, num_test_tokens) + recall = 100 * ratio(num_correct_tokens, num_gold_tokens) + f1 = ratio(2 * precision * recall, precision + recall) + tf.logging.info('Precision: %.2f%%', precision) + tf.logging.info('Recall: %.2f%%', recall) + tf.logging.info('F1: %.2f%%', f1) + + return round(precision, 2), round(recall, 2), round(f1, 2) + + +def segmentation_summaries(gold_corpus, annotated_corpus): + """Computes segmentation eval summaries for gold and annotated sentences.""" + prec, rec, f1 = calculate_segmentation_metrics(gold_corpus, 
annotated_corpus) + return {'precision': prec, 'recall': rec, 'f1': f1, 'eval_metric': f1} diff --git a/syntaxnet/dragnn/python/evaluation_test.py b/syntaxnet/dragnn/python/evaluation_test.py new file mode 100644 index 0000000000000000000000000000000000000000..7be0fc4be1716a56a72529b56d91ce3ad732338c --- /dev/null +++ b/syntaxnet/dragnn/python/evaluation_test.py @@ -0,0 +1,108 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Tests for parser evaluation.""" + +import tensorflow as tf + +from dragnn.python import evaluation +from syntaxnet import sentence_pb2 + + +class EvaluationTest(tf.test.TestCase): + + def _add_sentence(self, tags, heads, labels, corpus): + """Adds a sentence to the corpus.""" + sentence = sentence_pb2.Sentence() + for tag, head, label in zip(tags, heads, labels): + sentence.token.add(word='x', start=0, end=0, + tag=tag, head=head, label=label) + corpus.append(sentence.SerializeToString()) + + def setUp(self): + self._gold_corpus = [] + self._test_corpus = [] + + # A correct sentence. + self._add_sentence(['DT'], [-1], ['ROOT'], self._gold_corpus) + self._add_sentence(['DT'], [-1], ['ROOT'], self._test_corpus) + + # An incorrect sentence. There is one POS mistake, two head mistakes, and + # one label mistake. 
NB: Since the label mistake occurs on the one token + # with a correct head, this sentence has three mistakes w.r.t. LAS. + self._add_sentence(['DT', 'JJ', 'NN'], [2, 2, -1], ['det', 'amod', 'ROOT'], + self._gold_corpus) + self._add_sentence(['xx', 'JJ', 'NN'], [1, 0, -1], ['det', 'amod', 'xxxx'], + self._test_corpus) + + def testCalculateParseMetrics(self): + pos, uas, las = evaluation.calculate_parse_metrics(self._gold_corpus, + self._test_corpus) + self.assertEqual(75, pos) + self.assertEqual(50, uas) + self.assertEqual(25, las) + + def testCalculateSegmentationMetrics(self): + self._gold_corpus = [] + self._test_corpus = [] + + def add_sentence_for_segment_eval(starts, ends, corpus): + """Adds a sentence to the corpus.""" + sentence = sentence_pb2.Sentence() + for start, end in zip(starts, ends): + sentence.token.add(word='x', start=start, end=end) + corpus.append(sentence.SerializeToString()) + + # A test case with 5 gold words, 4 test words and 3 are correct. + # -gold tokens: 'This is a gold sentence' + # -test tokens: 'Thisis a gold sentence' + add_sentence_for_segment_eval( + [0, 5, 8, 10, 15], [3, 6, 8, 13, 22], self._gold_corpus) + add_sentence_for_segment_eval( + [0, 8, 10, 15], [6, 8, 13, 22], self._test_corpus) + + # Another test case with 3 gold words, 5 test words and 2 correct words. 
+ # -gold tokens: 'another gold sentence' + # -test tokens: 'another gold sen tence' + add_sentence_for_segment_eval([0, 8, 13], [6, 11, 20], self._gold_corpus) + add_sentence_for_segment_eval([0, 8, 13, 17, 21], [6, 11, 15, 19, 22], + self._test_corpus) + prec, rec, f1 = evaluation.calculate_segmentation_metrics(self._gold_corpus, + self._test_corpus) + self.assertEqual(55.56, prec) + self.assertEqual(62.50, rec) + self.assertEqual(58.82, f1) + + summaries = evaluation.segmentation_summaries(self._gold_corpus, + self._test_corpus) + self.assertEqual({ + 'precision': 55.56, + 'recall': 62.50, + 'f1': 58.82, + 'eval_metric': 58.82 + }, summaries) + + def testParserSummaries(self): + summaries = evaluation.parser_summaries(self._gold_corpus, + self._test_corpus) + self.assertEqual({ + 'POS': 75, + 'UAS': 50, + 'LAS': 25, + 'eval_metric': 25 # equals LAS + }, summaries) + + +if __name__ == '__main__': + tf.test.main() diff --git a/syntaxnet/dragnn/python/graph_builder.py b/syntaxnet/dragnn/python/graph_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..014fd4a97e4eec5e90b87034ab05f6b870739f3e --- /dev/null +++ b/syntaxnet/dragnn/python/graph_builder.py @@ -0,0 +1,606 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Builds a DRAGNN graph for local training.""" + + +import tensorflow as tf +from tensorflow.core.protobuf import saver_pb2 +from tensorflow.python.platform import tf_logging as logging + +from dragnn.protos import spec_pb2 +from dragnn.python import component +from dragnn.python import composite_optimizer +from dragnn.python import dragnn_ops +from syntaxnet.util import check + +try: + tf.NotDifferentiable('ExtractFixedFeatures') +except KeyError, e: + logging.info(str(e)) + + +def _create_learning_rate(hyperparams, step_var): + """Creates learning rate var, with decay and switching for CompositeOptimizer. + + Args: + hyperparams: a GridPoint proto containing optimizer spec, particularly + learning_method to determine optimizer class to use. + step_var: tf.Variable, global training step. + + Returns: + a scalar `Tensor`, the learning rate based on current step and hyperparams. + """ + if hyperparams.learning_method != 'composite': + base_rate = hyperparams.learning_rate + else: + spec = hyperparams.composite_optimizer_spec + switch = tf.less(step_var, spec.switch_after_steps) + base_rate = tf.cond(switch, lambda: tf.constant(spec.method1.learning_rate), + lambda: tf.constant(spec.method2.learning_rate)) + return tf.train.exponential_decay( + base_rate, + step_var, + hyperparams.decay_steps, + hyperparams.decay_base, + staircase=hyperparams.decay_staircase) + + +def _create_optimizer(hyperparams, learning_rate_var, step_var=None): + """Creates an optimizer object for a given spec, learning rate and step var. + + Args: + hyperparams: a GridPoint proto containing optimizer spec, particularly + learning_method to determine optimizer class to use. + learning_rate_var: a `tf.Tensor`, the learning rate. + step_var: a `tf.Variable`, global training step. + + Returns: + a `tf.train.Optimizer` object that was built. 
+ """ + if hyperparams.learning_method == 'gradient_descent': + return tf.train.GradientDescentOptimizer( + learning_rate_var, use_locking=True) + elif hyperparams.learning_method == 'adam': + return tf.train.AdamOptimizer( + learning_rate_var, + beta1=hyperparams.adam_beta1, + beta2=hyperparams.adam_beta2, + epsilon=hyperparams.adam_eps, + use_locking=True) + elif hyperparams.learning_method == 'lazyadam': + return tf.contrib.opt.LazyAdamOptimizer( + learning_rate_var, + beta1=hyperparams.adam_beta1, + beta2=hyperparams.adam_beta2, + epsilon=hyperparams.adam_eps, + use_locking=True) + elif hyperparams.learning_method == 'momentum': + return tf.train.MomentumOptimizer( + learning_rate_var, hyperparams.momentum, use_locking=True) + elif hyperparams.learning_method == 'composite': + spec = hyperparams.composite_optimizer_spec + optimizer1 = _create_optimizer(spec.method1, learning_rate_var, step_var) + optimizer2 = _create_optimizer(spec.method2, learning_rate_var, step_var) + if step_var is None: + logging.fatal('step_var is required for CompositeOptimizer') + switch = tf.less(step_var, spec.switch_after_steps) + return composite_optimizer.CompositeOptimizer( + optimizer1, optimizer2, switch, use_locking=True) + else: + logging.fatal('Unknown learning method (optimizer)') + + +class MasterBuilder(object): + """A builder for a DRAGNN stack of models. + + This class is the major factory for all DRAGNN models. It provides + common hooks to build training and evaluation targets from a single + MasterSpec and hyperparameter configuration. + + The key concept is as follows: to execute a DRAGNN graph, one needs + two stateful pieces: + + 1. A handle to a C++ dragnn state, managed outside of TensorFlow and + accessed via the custom dragnn ops. + 2. A set of StoredActivations, one for each component, that contain network + activations that can be used across components. + + TODO(googleuser): Update these comments to be accurate. 
+ Both of these can be handled automatically "under-the-hood" by the + MasterBuilder API. For #1, the key consideration is that each C++ + ComputeSession is allocated statically, meaning memory is shared + across different tensorflow::Session invocations. ComputeSessions are + allocated from pools. The `pool_scope` identifies the pool, unique to this + MasterBuilder, from which the ComputeSession is allocated. From there, + GetSession takes care of handing out ComputeSessions with unique handles. + Each ComputeSession can then be run concurrently. + + Attributes: + spec: the MasterSpec proto. + hyperparams: the GridPoint proto containing hyperparameters. + pool_scope: string identifier for the ComputeSession pool to use. + components: a list of ComponentBuilders in the order they are defined + in the MasterSpec. + lookup_component: a dictionary to lookup ComponentBuilders by name. + optimizer: handle to the tf.train Optimizer object used to train this model. + master_vars: dictionary of globally shared tf.Variable objects (e.g. + the global training step and learning rate.) + """ + + def __init__(self, master_spec, hyperparam_config=None, pool_scope='shared'): + """Initializes the MasterBuilder from specifications. + + During construction, all components are initialized along with their + parameter tf.Variables. + + Args: + master_spec: dragnn.MasterSpec proto. + hyperparam_config: dragnn.GridPoint proto specifying hyperparameters. + Defaults to empty specification. + pool_scope: string identifier for the compute session pool to use. + + Raises: + ValueError: if a component is not found in the registry. + """ + self.spec = master_spec + self.hyperparams = (spec_pb2.GridPoint() + if hyperparam_config is None else hyperparam_config) + self.pool_scope = pool_scope + + # Set the graph-level random seed before creating the Components so the ops + # they create will use this seed. 
+ tf.set_random_seed(self.hyperparams.seed) + + # Construct all utility classes and variables for each Component. + self.components = [] + self.lookup_component = {} + for component_spec in master_spec.component: + component_type = component_spec.component_builder.registered_name + + # Raises ValueError if not found. + comp = component.ComponentBuilderBase.Create(component_type, self, + component_spec) + + self.lookup_component[comp.name] = comp + self.components.append(comp) + + # Add global step variable. + self.master_vars = {} + with tf.variable_scope('master', reuse=False): + self.master_vars['step'] = tf.get_variable( + 'step', [], initializer=tf.zeros_initializer(), dtype=tf.int32) + self.master_vars['learning_rate'] = _create_learning_rate( + self.hyperparams, self.master_vars['step']) + + # Construct optimizer. + self.optimizer = _create_optimizer(self.hyperparams, + self.master_vars['learning_rate'], + self.master_vars['step']) + + @property + def component_names(self): + return tuple(c.name for c in self.components) + + def _get_compute_session(self): + """Returns a new ComputeSession handle.""" + return dragnn_ops.get_session( + self.pool_scope, + master_spec=self.spec.SerializeToString(), + grid_point=self.hyperparams.SerializeToString(), + name='GetSession') + + def _get_session_with_reader(self, enable_tracing): + """Utility to create ComputeSession management ops. + + Creates a new ComputeSession handle and provides the following + named nodes: + + ComputeSession/InputBatch -- a placeholder for attaching a string + specification for AttachReader. + ComputeSession/AttachReader -- the AttachReader op. + + Args: + enable_tracing: bool, whether to enable tracing before attaching the data. + + Returns: + handle: handle to a new ComputeSession returned by the AttachReader op. + input_batch: InputBatch placeholder. 
+ """ + with tf.name_scope('ComputeSession'): + input_batch = tf.placeholder( + dtype=tf.string, shape=[None], name='InputBatch') + + # Get the ComputeSession and chain some essential ops. + handle = self._get_compute_session() + if enable_tracing: + handle = dragnn_ops.set_tracing(handle, True) + handle = dragnn_ops.attach_data_reader( + handle, input_batch, name='AttachReader') + + return handle, input_batch + + def _outputs_with_release(self, handle, inputs, outputs): + """Ensures ComputeSession is released before outputs are returned. + + Args: + handle: Handle to ComputeSession on which all computation until now has + depended. It will be released and assigned to the output 'run'. + inputs: list of nodes we want to pass through without any dependencies. + outputs: list of nodes whose access should ensure the ComputeSession is + safely released. + + Returns: + A dictionary of both input and output nodes. + """ + with tf.control_dependencies(outputs.values()): + with tf.name_scope('ComputeSession'): + release_op = dragnn_ops.release_session(handle) + run_op = tf.group(release_op, name='run') + for output in outputs: + with tf.control_dependencies([release_op]): + outputs[output] = tf.identity(outputs[output], name=output) + all_nodes = inputs.copy() + all_nodes.update(outputs) + + # Add an alias for simply running without collecting outputs. + # Common, for instance, with training. + all_nodes['run'] = run_op + return all_nodes + + def build_training(self, + handle, + compute_gradients=True, + use_moving_average=False, + advance_counters=True, + component_weights=None, + unroll_using_oracle=None, + max_index=-1): + """Builds a training pipeline. + + Args: + handle: Handle tensor for the ComputeSession. + compute_gradients: Whether to generate gradients and an optimizer op. + When False, build_training will return a 'dry run' training op, + used normally only for oracle tracing. 
+ use_moving_average: Whether or not to read from the moving + average variables instead of the true parameters. Note: it is not + possible to make gradient updates when this is True. + advance_counters: Whether or not this loop should increment the + per-component step counters. + component_weights: If set, this is a list of relative weights + each component's cost should get in the pipeline. Defaults to 1.0 for + each component. + unroll_using_oracle: If set, this is a list of booleans indicating + whether or not to use the gold decodings for each component. Defaults + to True for each component. + max_index: Training will use only the first max_index components, + or -1 for all components. + + Returns: + handle: to the ComputeSession, conditioned on completing training step. + outputs: a dictionary of useful training tensors. + + Raises: + IndexError: if max_index is positive but out of bounds. + """ + check.IsFalse(compute_gradients and use_moving_average, + 'It is not possible to make gradient updates when reading ' + 'from the moving average variables.') + + self.read_from_avg = use_moving_average + if max_index < 0: + max_index = len(self.components) + else: + if not 0 < max_index <= len(self.components): + raise IndexError('Invalid max_index {} for components {}; handle {}'. + format(max_index, self.component_names, handle.name)) + + # By default, we train every component supervised. + if not component_weights: + component_weights = [1] * max_index + if not unroll_using_oracle: + unroll_using_oracle = [True] * max_index + + component_weights = component_weights[:max_index] + total_weight = (float)(sum(component_weights)) + component_weights = [w / total_weight for w in component_weights] + + unroll_using_oracle = unroll_using_oracle[:max_index] + + logging.info('Creating training target:') + logging.info('\tWeights: %s', component_weights) + logging.info('\tOracle: %s', unroll_using_oracle) + + metrics_list = [] + cost = tf.constant(0.) 
+ effective_batch = tf.constant(0) + + avg_ops = [] + params_to_train = [] + + network_states = {} + for component_index in range(0, max_index): + comp = self.components[component_index] + network_states[comp.name] = component.NetworkState() + + logging.info('Initializing data for component "%s"', comp.name) + handle = dragnn_ops.init_component_data( + handle, beam_size=comp.training_beam_size, component=comp.name) + # TODO(googleuser): Phase out component.MasterState. + master_state = component.MasterState(handle, + dragnn_ops.batch_size( + handle, component=comp.name)) + with tf.control_dependencies([handle, cost]): + args = (master_state, network_states) + if unroll_using_oracle[component_index]: + + handle, component_cost, component_correct, component_total = (tf.cond( + comp.training_beam_size > 1, + lambda: comp.build_structured_training(*args), + lambda: comp.build_greedy_training(*args))) + + else: + handle = comp.build_greedy_inference(*args, during_training=True) + component_cost = tf.constant(0.) + component_correct, component_total = tf.constant(0), tf.constant(0) + + weighted_component_cost = tf.multiply( + component_cost, + tf.constant((float)(component_weights[component_index])), + name='weighted_component_cost') + + cost += weighted_component_cost + effective_batch += component_total + metrics_list += [[component_total], [component_correct]] + + if advance_counters: + with tf.control_dependencies( + [comp.advance_counters(component_total)]): + cost = tf.identity(cost) + + # Keep track of which parameters will be trained, and any moving + # average updates to apply for these parameters. + params_to_train += comp.network.params + if self.hyperparams.use_moving_average: + avg_ops += comp.avg_ops + + # Concatenate evaluation results + metrics = tf.concat(metrics_list, 0) + + # If gradient computation is requested, then: + # 1. compute the gradients, + # 2. add an optimizer to update the parameters using the gradients, + # 3. 
make the ComputeSession handle depend on the optimizer. + if compute_gradients: + logging.info('Creating train op with %d variables:\n\t%s', + len(params_to_train), + '\n\t'.join([x.name for x in params_to_train])) + + grads_and_vars = self.optimizer.compute_gradients( + cost, var_list=params_to_train) + clipped_gradients = [(self._clip_gradients(g), v) + for g, v in grads_and_vars] + minimize_op = self.optimizer.apply_gradients( + clipped_gradients, global_step=self.master_vars['step']) + + if self.hyperparams.use_moving_average: + with tf.control_dependencies([minimize_op]): + minimize_op = tf.group(*avg_ops) + + # Make sure all the side-effecting minimization ops finish before + # proceeding. + with tf.control_dependencies([minimize_op]): + handle = tf.identity(handle) + + # Reset so that subsequent builds don't read from the averages by default. + self.read_from_avg = False + + # Return named access to common outputs. + outputs = { + 'cost': cost, + 'batch': effective_batch, + 'metrics': metrics, + } + return handle, outputs + + def _clip_gradients(self, grad): + """Clips gradients if the hyperparameter `gradient_clip_norm` requires it. + + Sparse tensors, in the form of IndexedSlices returned for the + gradients of embeddings, require special handling. + + Args: + grad: Gradient Tensor, IndexedSlices, or None. + + Returns: + Optionally clipped gradient. + """ + if grad is not None and self.hyperparams.gradient_clip_norm > 0: + logging.info('Clipping gradient %s', grad) + if isinstance(grad, tf.IndexedSlices): + tmp = tf.clip_by_norm(grad.values, self.hyperparams.gradient_clip_norm) + return tf.IndexedSlices(tmp, grad.indices, grad.dense_shape) + else: + return tf.clip_by_norm(grad, self.hyperparams.gradient_clip_norm) + else: + return grad + + def build_post_restore_hook(self): + """Builds a graph that should be executed after the restore op. + + This graph is intended to be run once, before the inference pipeline is + run. 
+ + Returns: + setup_op - An op that, when run, guarantees all setup ops will run. + """ + with tf.control_dependencies( + [comp.build_post_restore_hook() for comp in self.components]): + return tf.no_op(name='post_restore_hook_master') + + def build_inference(self, handle, use_moving_average=False): + """Builds an inference pipeline. + + This always uses the whole pipeline. + + Args: + handle: Handle tensor for the ComputeSession. + use_moving_average: Whether or not to read from the moving + average variables instead of the true parameters. Note: it is not + possible to make gradient updates when this is True. + + Returns: + handle: Handle after annotation. + """ + self.read_from_avg = use_moving_average + network_states = {} + + for comp in self.components: + network_states[comp.name] = component.NetworkState() + handle = dragnn_ops.init_component_data( + handle, beam_size=comp.inference_beam_size, component=comp.name) + master_state = component.MasterState(handle, + dragnn_ops.batch_size( + handle, component=comp.name)) + with tf.control_dependencies([handle]): + handle = comp.build_greedy_inference(master_state, network_states) + handle = dragnn_ops.write_annotations(handle, component=comp.name) + + self.read_from_avg = False + return handle + + def add_training_from_config(self, + target_config, + prefix='train-', + trace_only=False, + **kwargs): + """Constructs a training pipeline from a TrainTarget proto. + + This constructs a separately managed pipeline for a given target: + it has its own ComputeSession, InputSpec placeholder, etc. The ops + are given standardized names to allow access from the C++ API. It + passes the values in target_config to build_training() above. 
+ + For the default prefix ('train-'), and a target named 'target', this will + construct the following targets in the graph: + + train-target/ComputeSession/* (the standard ComputeSession controls) + train-target/run (handle to a completed training step) + train-target/metrics (per-decision metrics from gold oracles) + train-target/cost (total cost across all components) + + Enabling `trace_only` effectively creates a graph that is a 'dry run'. + There will be no side effects. In addition, the gradients won't be computed + and the model parameters will not be updated. + + Args: + target_config: the TrainTarget proto. + prefix: Prepends target_config.name with this to construct + a unique identifier. + trace_only: Enabling this will result in: + 1. Tracing will be enabled for the ComputeSession. + 2. A 'traces' node will be added to the outputs. + 3. Gradients will not be computed. + **kwargs: Passed on to build_training() above. + + Returns: + Dictionary of training targets. + """ + logging.info('Creating new training target ' + '%s' + ' from config: %s', target_config.name, str(target_config)) + scope_id = prefix + target_config.name + with tf.name_scope(scope_id): + # Construct training targets. Tracing is enabled only if `trace_only`. + handle, input_batch = self._get_session_with_reader(trace_only) + + # If `trace_only` is True, the training graph shouldn't have any + # side effects. Otherwise, the standard training scenario should + # generate gradients and update counters. + handle, outputs = self.build_training( + handle, + compute_gradients=not trace_only, + advance_counters=not trace_only, + component_weights=target_config.component_weights, + unroll_using_oracle=target_config.unroll_using_oracle, + max_index=target_config.max_index, + **kwargs) + if trace_only: + outputs['traces'] = dragnn_ops.get_component_trace( + handle, component=self.spec.component[-1].name) + else: + # Standard training keeps track of the number of training steps. 
+ outputs['target_step'] = tf.get_variable( + scope_id + '/TargetStep', [], + initializer=tf.zeros_initializer(), + dtype=tf.int32) + increment_target_step = tf.assign_add( + outputs['target_step'], 1, use_locking=True) + + with tf.control_dependencies([increment_target_step]): + handle = tf.identity(handle) + + return self._outputs_with_release(handle, {'input_batch': input_batch}, + outputs) + + def add_annotation(self, name_scope='annotation', enable_tracing=False): + """Adds an annotation pipeline to the graph. + + This will create the following additional named targets by default, for use + in C++ annotation code (as well as regular ComputeSession targets): + annotation/ComputeSession/session_id (placeholder for giving unique id) + annotation/EmitAnnotations (get annotated data) + annotation/GetComponentTrace (get trace data) + annotation/SetTracing (sets tracing based on annotation/tracing_on) + + Args: + name_scope: Scope for the annotation pipeline. + enable_tracing: Enabling this will result in two things: + 1. Tracing will be enabled during inference. + 2. A 'traces' node will be added to the outputs. + + Returns: + A dictionary of input and output nodes. 
+ """ + with tf.name_scope(name_scope): + handle, input_batch = self._get_session_with_reader(enable_tracing) + handle = self.build_inference(handle, use_moving_average=True) + + annotations = dragnn_ops.emit_annotations( + handle, component=self.spec.component[-1].name) + outputs = {'annotations': annotations} + + if enable_tracing: + outputs['traces'] = dragnn_ops.get_component_trace( + handle, component=self.spec.component[-1].name) + + return self._outputs_with_release(handle, {'input_batch': input_batch}, + outputs) + + def add_post_restore_hook(self, name_scope): + """Adds the post restore ops.""" + with tf.name_scope(name_scope): + return self.build_post_restore_hook() + + def add_saver(self): + """Adds a Saver for all variables in the graph.""" + logging.info('Saving non-quantized variables:\n\t%s', '\n\t'.join( + [x.name for x in tf.global_variables() if 'quantized' not in x.name])) + self.saver = tf.train.Saver( + var_list=[ + x for x in tf.global_variables() if 'quantized' not in x.name + ], + write_version=saver_pb2.SaverDef.V1) diff --git a/syntaxnet/dragnn/python/graph_builder_test.py b/syntaxnet/dragnn/python/graph_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..3ca81599a91e56dd90aa3b4ef62b62ff0f806ce5 --- /dev/null +++ b/syntaxnet/dragnn/python/graph_builder_test.py @@ -0,0 +1,687 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for graph_builder.""" + + +import collections +import os.path + + +import numpy as np +import tensorflow as tf + +from google.protobuf import text_format + +from dragnn.protos import spec_pb2 +from dragnn.protos import trace_pb2 +from dragnn.python import dragnn_ops +from dragnn.python import graph_builder +from syntaxnet import sentence_pb2 + +from tensorflow.python.framework import test_util +from tensorflow.python.platform import googletest +from tensorflow.python.platform import tf_logging as logging + +import dragnn.python.load_dragnn_cc_impl +import syntaxnet.load_parser_ops + +FLAGS = tf.app.flags.FLAGS +if not hasattr(FLAGS, 'test_srcdir'): + FLAGS.test_srcdir = '' +if not hasattr(FLAGS, 'test_tmpdir'): + FLAGS.test_tmpdir = tf.test.get_temp_dir() + +_DUMMY_GOLD_SENTENCE = """ +token { + word: "sentence" start: 0 end: 7 tag: "NN" category: "NOUN" label: "ROOT" +} +token { + word: "0" start: 9 end: 9 head: 0 tag: "CD" category: "NUM" label: "num" +} +token { + word: "." start: 10 end: 10 head: 0 tag: "." category: "." label: "punct" +} +""" + +# The second sentence has different length, to test the effect of +# mixed-length batches. +_DUMMY_GOLD_SENTENCE_2 = """ +token { + word: "sentence" start: 0 end: 7 tag: "NN" category: "NOUN" label: "ROOT" +} +""" + +# The test sentence is the gold sentence with the tags and parse information +# removed. +_DUMMY_TEST_SENTENCE = """ +token { + word: "sentence" start: 0 end: 7 +} +token { + word: "0" start: 9 end: 9 +} +token { + word: "." start: 10 end: 10 +} +""" + +_DUMMY_TEST_SENTENCE_2 = """ +token { + word: "sentence" start: 0 end: 7 +} +""" + +_TAGGER_EXPECTED_SENTENCES = [ + """ +token { + word: "sentence" start: 0 end: 7 tag: "NN" +} +token { + word: "0" start: 9 end: 9 tag: "CD" +} +token { + word: "." start: 10 end: 10 tag: "." 
+} +""", """ +token { + word: "sentence" start: 0 end: 7 tag: "NN" +} +""" +] + +_TAGGER_PARSER_EXPECTED_SENTENCES = [ + """ +token { + word: "sentence" start: 0 end: 7 tag: "NN" label: "ROOT" +} +token { + word: "0" start: 9 end: 9 head: 0 tag: "CD" label: "num" +} +token { + word: "." start: 10 end: 10 head: 0 tag: "." label: "punct" +} +""", """ +token { + word: "sentence" start: 0 end: 7 tag: "NN" label: "ROOT" +} +""" +] + +_UNLABELED_PARSER_EXPECTED_SENTENCES = [ + """ +token { + word: "sentence" start: 0 end: 7 label: "punct" +} +token { + word: "0" start: 9 end: 9 head: 0 label: "punct" +} +token { + word: "." start: 10 end: 10 head: 0 label: "punct" +} +""", """ +token { + word: "sentence" start: 0 end: 7 label: "punct" +} +""" +] + +_LABELED_PARSER_EXPECTED_SENTENCES = [ + """ +token { + word: "sentence" start: 0 end: 7 label: "ROOT" +} +token { + word: "0" start: 9 end: 9 head: 0 label: "num" +} +token { + word: "." start: 10 end: 10 head: 0 label: "punct" +} +""", """ +token { + word: "sentence" start: 0 end: 7 label: "ROOT" +} +""" +] + + +def _as_op(x): + """Always returns the tf.Operation associated with a node.""" + return x.op if isinstance(x, tf.Tensor) else x + + +def _find_input_path(src, dst_predicate): + """Finds an input path from `src` to a node that satisfies `dst_predicate`. + + TensorFlow graphs are directed. We generate paths from outputs to inputs, + recursively searching both direct (i.e. data) and control inputs. Graphs with + while_loop control flow may contain cycles. Therefore we eliminate loops + during the DFS. + + Args: + src: tf.Tensor or tf.Operation root node. + dst_predicate: function taking one argument (a node), returning true iff a + target node has been found. + + Returns: + a path from `src` to the first node that satisfies `dst_predicate`, or the + empty list otherwise. 
+ """ + path_to = {src: None} + + def dfs(x): + if dst_predicate(x): + return x + x_op = _as_op(x) + for y in x_op.control_inputs + list(x_op.inputs): + # Check if we've already visited node `y`. + if y not in path_to: + path_to[y] = x + res = dfs(y) + if res is not None: + return res + return None + + dst = dfs(src) + path = [] + while dst in path_to: + path.append(dst) + dst = path_to[dst] + return list(reversed(path)) + + +def _find_input_path_to_type(src, dst_type): + """Finds a path from `src` to a node with type (i.e. kernel) `dst_type`.""" + return _find_input_path(src, lambda x: _as_op(x).type == dst_type) + + +class GraphBuilderTest(test_util.TensorFlowTestCase): + + def assertEmpty(self, container, msg=None): + """Assert that an object has zero length. + + Args: + container: Anything that implements the collections.Sized interface. + msg: Optional message to report on failure. + """ + if not isinstance(container, collections.Sized): + self.fail('Expected a Sized object, got: ' + '{!r}'.format(type(container).__name__), msg) + + # explicitly check the length since some Sized objects (e.g. numpy.ndarray) + # have strange __nonzero__/__bool__ behavior. + if len(container): + self.fail('{!r} has length of {}.'.format(container, len(container)), msg) + + def assertNotEmpty(self, container, msg=None): + """Assert that an object has non-zero length. + + Args: + container: Anything that implements the collections.Sized interface. + msg: Optional message to report on failure. + """ + if not isinstance(container, collections.Sized): + self.fail('Expected a Sized object, got: ' + '{!r}'.format(type(container).__name__), msg) + + # explicitly check the length since some Sized objects (e.g. numpy.ndarray) + # have strange __nonzero__/__bool__ behavior. 
+ if not len(container): + self.fail('{!r} has length of 0.'.format(container), msg) + + def LoadSpec(self, spec_path): + master_spec = spec_pb2.MasterSpec() + testdata = os.path.join(FLAGS.test_srcdir, + 'dragnn/core/testdata') + with file(os.path.join(testdata, spec_path), 'r') as fin: + text_format.Parse(fin.read().replace('TESTDATA', testdata), master_spec) + return master_spec + + def MakeHyperparams(self, **kwargs): + hyperparam_config = spec_pb2.GridPoint() + for key in kwargs: + setattr(hyperparam_config, key, kwargs[key]) + return hyperparam_config + + def RunTraining(self, hyperparam_config): + master_spec = self.LoadSpec('master_spec_link.textproto') + + self.assertTrue(isinstance(hyperparam_config, spec_pb2.GridPoint)) + gold_doc = sentence_pb2.Sentence() + text_format.Parse(_DUMMY_GOLD_SENTENCE, gold_doc) + gold_doc_2 = sentence_pb2.Sentence() + text_format.Parse(_DUMMY_GOLD_SENTENCE_2, gold_doc_2) + reader_strings = [ + gold_doc.SerializeToString(), gold_doc_2.SerializeToString() + ] + tf.logging.info('Generating graph with config: %s', hyperparam_config) + with tf.Graph().as_default(): + builder = graph_builder.MasterBuilder(master_spec, hyperparam_config) + + target = spec_pb2.TrainTarget() + target.name = 'testTraining-all' + train = builder.add_training_from_config(target) + with self.test_session() as sess: + logging.info('Initializing') + sess.run(tf.global_variables_initializer()) + + # Run one iteration of training and verify nothing crashes. 
+ logging.info('Training') + sess.run(train['run'], feed_dict={train['input_batch']: reader_strings}) + + def testTraining(self): + """Tests the default hyperparameter settings.""" + self.RunTraining(self.MakeHyperparams()) + + def testTrainingWithGradientClipping(self): + """Adds code coverage for gradient clipping.""" + self.RunTraining(self.MakeHyperparams(gradient_clip_norm=1.25)) + + def testTrainingWithAdamAndAveraging(self): + """Adds code coverage for ADAM and the use of moving averaging.""" + self.RunTraining( + self.MakeHyperparams(learning_method='adam', use_moving_average=True)) + + def testTrainingWithCompositeOptimizer(self): + """Adds code coverage for CompositeOptimizer.""" + grid_point = self.MakeHyperparams(learning_method='composite') + grid_point.composite_optimizer_spec.method1.learning_method = 'adam' + grid_point.composite_optimizer_spec.method2.learning_method = 'momentum' + grid_point.composite_optimizer_spec.method2.momentum = 0.9 + self.RunTraining(grid_point) + + def RunFullTrainingAndInference(self, + test_name, + master_spec_path=None, + master_spec=None, + component_weights=None, + unroll_using_oracle=None, + num_evaluated_components=1, + expected_num_actions=None, + expected=None, + batch_size_limit=None): + if not master_spec: + master_spec = self.LoadSpec(master_spec_path) + + gold_doc = sentence_pb2.Sentence() + text_format.Parse(_DUMMY_GOLD_SENTENCE, gold_doc) + gold_doc_2 = sentence_pb2.Sentence() + text_format.Parse(_DUMMY_GOLD_SENTENCE_2, gold_doc_2) + gold_reader_strings = [ + gold_doc.SerializeToString(), gold_doc_2.SerializeToString() + ] + + test_doc = sentence_pb2.Sentence() + text_format.Parse(_DUMMY_TEST_SENTENCE, test_doc) + test_doc_2 = sentence_pb2.Sentence() + text_format.Parse(_DUMMY_TEST_SENTENCE_2, test_doc_2) + test_reader_strings = [ + test_doc.SerializeToString(), test_doc.SerializeToString(), + test_doc_2.SerializeToString(), test_doc.SerializeToString() + ] + + if batch_size_limit is not None: + 
gold_reader_strings = gold_reader_strings[:batch_size_limit] + test_reader_strings = test_reader_strings[:batch_size_limit] + + with tf.Graph().as_default(): + tf.set_random_seed(1) + hyperparam_config = spec_pb2.GridPoint() + builder = graph_builder.MasterBuilder( + master_spec, hyperparam_config, pool_scope=test_name) + target = spec_pb2.TrainTarget() + target.name = 'testFullInference-train-%s' % test_name + if component_weights: + target.component_weights.extend(component_weights) + else: + target.component_weights.extend([0] * len(master_spec.component)) + target.component_weights[-1] = 1.0 + if unroll_using_oracle: + target.unroll_using_oracle.extend(unroll_using_oracle) + else: + target.unroll_using_oracle.extend([False] * len(master_spec.component)) + target.unroll_using_oracle[-1] = True + train = builder.add_training_from_config(target) + oracle_trace = builder.add_training_from_config( + target, prefix='train_traced-', trace_only=True) + builder.add_saver() + + anno = builder.add_annotation(test_name) + trace = builder.add_annotation(test_name + '-traced', enable_tracing=True) + + # Verifies that the summaries can be built. + for component in builder.components: + component.get_summaries() + + config = tf.ConfigProto( + intra_op_parallelism_threads=0, inter_op_parallelism_threads=0) + with self.test_session(config=config) as sess: + logging.info('Initializing') + sess.run(tf.global_variables_initializer()) + + logging.info('Dry run oracle trace...') + traces = sess.run( + oracle_trace['traces'], + feed_dict={oracle_trace['input_batch']: gold_reader_strings}) + + # Check that the oracle traces are not empty. 
+ for serialized_trace in traces: + master_trace = trace_pb2.MasterTrace() + master_trace.ParseFromString(serialized_trace) + self.assertTrue(master_trace.component_trace) + self.assertTrue(master_trace.component_trace[0].step_trace) + + logging.info('Simulating training...') + break_iter = 400 + is_resolved = False + for i in range(0, + 400): # needs ~100 iterations, but is not deterministic + cost, eval_res_val = sess.run( + [train['cost'], train['metrics']], + feed_dict={train['input_batch']: gold_reader_strings}) + logging.info('cost = %s', cost) + self.assertFalse(np.isnan(cost)) + total_val = eval_res_val.reshape((-1, 2))[:, 0].sum() + correct_val = eval_res_val.reshape((-1, 2))[:, 1].sum() + if correct_val == total_val and not is_resolved: + logging.info('... converged on iteration %d with (correct, total) ' + '= (%d, %d)', i, correct_val, total_val) + is_resolved = True + # Run for slightly longer than convergence to help with quantized + # weight tiebreakers. + break_iter = i + 50 + + if i == break_iter: + break + + # If training failed, report total/correct actions for each component. + if not expected_num_actions: + expected_num_actions = 4 * num_evaluated_components + if (correct_val != total_val or correct_val != expected_num_actions or + total_val != expected_num_actions): + for c in xrange(len(master_spec.component)): + logging.error('component %s:\nname=%s\ntotal=%s\ncorrect=%s', c, + master_spec.component[c].name, eval_res_val[2 * c], + eval_res_val[2 * c + 1]) + + assert correct_val == total_val, 'Did not converge! %d vs %d.' 
% ( + correct_val, total_val) + + self.assertEqual(expected_num_actions, correct_val) + self.assertEqual(expected_num_actions, total_val) + + builder.saver.save(sess, os.path.join(FLAGS.test_tmpdir, 'model')) + + logging.info('Running test.') + logging.info('Printing annotations') + annotations = sess.run( + anno['annotations'], + feed_dict={anno['input_batch']: test_reader_strings}) + logging.info('Put %d inputs in, got %d annotations out.', + len(test_reader_strings), len(annotations)) + + # Also run the annotation graph with tracing enabled. + annotations_with_trace, traces = sess.run( + [trace['annotations'], trace['traces']], + feed_dict={trace['input_batch']: test_reader_strings}) + + # The result of the two annotation graphs should be identical. + self.assertItemsEqual(annotations, annotations_with_trace) + + # Check that the inference traces are not empty. + for serialized_trace in traces: + master_trace = trace_pb2.MasterTrace() + master_trace.ParseFromString(serialized_trace) + self.assertTrue(master_trace.component_trace) + self.assertTrue(master_trace.component_trace[0].step_trace) + + self.assertEqual(len(test_reader_strings), len(annotations)) + pred_sentences = [] + for annotation in annotations: + pred_sentences.append(sentence_pb2.Sentence()) + pred_sentences[-1].ParseFromString(annotation) + + if expected is None: + expected = _TAGGER_EXPECTED_SENTENCES + + expected_sentences = [expected[i] for i in [0, 0, 1, 0]] + + for i, pred_sentence in enumerate(pred_sentences): + self.assertProtoEquals(expected_sentences[i], pred_sentence) + + def testSimpleTagger(self): + self.RunFullTrainingAndInference('simple-tagger', + 'simple_tagger_master_spec.textproto') + + def testSimpleTaggerLayerNorm(self): + spec = self.LoadSpec('simple_tagger_master_spec.textproto') + spec.component[0].network_unit.parameters['layer_norm_hidden'] = 'True' + spec.component[0].network_unit.parameters['layer_norm_input'] = 'True' + self.RunFullTrainingAndInference('simple-tagger', 
master_spec=spec) + + def testSimpleTaggerLSTM(self): + self.RunFullTrainingAndInference('simple-tagger-lstm', + 'simple_tagger_lstm_master_spec.textproto') + + def testSimpleTaggerWrappedLSTM(self): + self.RunFullTrainingAndInference( + 'simple-tagger-wrapped-lstm', + 'simple_tagger_wrapped_lstm_master_spec.textproto') + + def testSplitTagger(self): + self.RunFullTrainingAndInference('split-tagger', + 'split_tagger_master_spec.textproto') + + def testTaggerParser(self): + self.RunFullTrainingAndInference( + 'tagger-parser', + 'tagger_parser_master_spec.textproto', + component_weights=[0., 1., 1.], + unroll_using_oracle=[False, True, True], + expected_num_actions=12, + expected=_TAGGER_PARSER_EXPECTED_SENTENCES) + + def testTaggerParserWithAttention(self): + spec = self.LoadSpec('tagger_parser_master_spec.textproto') + + # Make the 'parser' component attend to the 'tagger' component. + self.assertEqual('tagger', spec.component[1].name) + self.assertEqual('parser', spec.component[2].name) + spec.component[2].attention_component = 'tagger' + + # Attention + beam decoding is not yet supported. + spec.component[2].inference_beam_size = 1 + + # Running with batch size equal to 1 should be fine. + self.RunFullTrainingAndInference( + 'tagger-parser', + master_spec=spec, + batch_size_limit=1, + component_weights=[0., 1., 1.], + unroll_using_oracle=[False, True, True], + expected_num_actions=9, + expected=_TAGGER_PARSER_EXPECTED_SENTENCES) + + def testTaggerParserWithAttentionBatchDeath(self): + spec = self.LoadSpec('tagger_parser_master_spec.textproto') + + # Make the 'parser' component attend to the 'tagger' component. 
+ self.assertEqual('tagger', spec.component[1].name) + self.assertEqual('parser', spec.component[2].name) + spec.component[2].attention_component = 'tagger' + + # Trying to run with a batch size greater than 1 should fail: + with self.assertRaises(tf.errors.InvalidArgumentError): + self.RunFullTrainingAndInference( + 'tagger-parser', + master_spec=spec, + component_weights=[0., 1., 1.], + unroll_using_oracle=[False, True, True], + expected_num_actions=9, + expected=_TAGGER_PARSER_EXPECTED_SENTENCES) + + def testStructuredTrainingNotImplementedDeath(self): + spec = self.LoadSpec('simple_parser_master_spec.textproto') + + # Make the 'parser' component have a beam at training time. + self.assertEqual('parser', spec.component[0].name) + spec.component[0].training_beam_size = 8 + + # The training run should fail at runtime rather than build time. + with self.assertRaisesRegexp(tf.errors.InvalidArgumentError, + r'\[Not implemented.\]'): + self.RunFullTrainingAndInference( + 'simple-parser', + master_spec=spec, + expected_num_actions=8, + component_weights=[1], + expected=_LABELED_PARSER_EXPECTED_SENTENCES) + + def testSimpleParser(self): + self.RunFullTrainingAndInference( + 'simple-parser', + 'simple_parser_master_spec.textproto', + expected_num_actions=8, + component_weights=[1], + expected=_LABELED_PARSER_EXPECTED_SENTENCES) + + def checkOpOrder(self, name, endpoint, expected_op_order): + """Checks that ops ending up at root are called in the expected order. + + To check the order, we find a path along the directed graph formed by + the inputs of each op. If op X has a chain of inputs to op Y, then X + cannot be executed before Y. There may be multiple paths between any two + ops, but the ops along any path are executed in that order. Therefore, we + look up the expected ops in reverse order. + + Args: + name: string name of the endpoint, for logging. + endpoint: node whose execution we want to check. 
+ expected_op_order: string list of op types, in the order we expect them + to be executed leading up to `endpoint`. + """ + for target in reversed(expected_op_order): + path = _find_input_path_to_type(endpoint, target) + self.assertNotEmpty(path) + logging.info('path[%d] from %s to %s: %s', + len(path), name, target, [_as_op(x).type for x in path]) + endpoint = path[-1] + + def getBuilderAndTarget( + self, test_name, master_spec_path='simple_parser_master_spec.textproto'): + """Generates a MasterBuilder and TrainTarget based on a simple spec.""" + master_spec = self.LoadSpec(master_spec_path) + hyperparam_config = spec_pb2.GridPoint() + target = spec_pb2.TrainTarget() + target.name = 'test-%s-train' % test_name + target.component_weights.extend([0] * len(master_spec.component)) + target.component_weights[-1] = 1.0 + target.unroll_using_oracle.extend([False] * len(master_spec.component)) + target.unroll_using_oracle[-1] = True + builder = graph_builder.MasterBuilder( + master_spec, hyperparam_config, pool_scope=test_name) + return builder, target + + def testGetSessionReleaseSession(self): + """Checks that GetSession and ReleaseSession are called in order.""" + test_name = 'get-session-release-session' + + with tf.Graph().as_default(): + # Build the actual graphs. The choice of spec is arbitrary, as long as + # training and annotation nodes can be constructed. + builder, target = self.getBuilderAndTarget(test_name) + train = builder.add_training_from_config(target) + anno = builder.add_annotation(test_name) + + # We want to ensure that certain ops are executed in the correct order. + # Specifically, the ops GetSession and ReleaseSession must both be called, + # and in that order. + # + # First of all, the path to a non-existent node type should be empty. + path = _find_input_path_to_type(train['run'], 'foo') + self.assertEmpty(path) + + # The train['run'] is expected to start by calling GetSession, and to end + # by calling ReleaseSession. 
+ self.checkOpOrder('train', train['run'], ['GetSession', 'ReleaseSession']) + + # A similar contract applies to the annotations. + self.checkOpOrder('annotations', anno['annotations'], + ['GetSession', 'ReleaseSession']) + + def testAttachDataReader(self): + """Checks that train['run'] and 'annotations' call AttachDataReader.""" + test_name = 'attach-data-reader' + + with tf.Graph().as_default(): + builder, target = self.getBuilderAndTarget(test_name) + train = builder.add_training_from_config(target) + anno = builder.add_annotation(test_name) + + # AttachDataReader should be called between GetSession and ReleaseSession. + self.checkOpOrder('train', train['run'], + ['GetSession', 'AttachDataReader', 'ReleaseSession']) + + # A similar contract applies to the annotations. + self.checkOpOrder('annotations', anno['annotations'], + ['GetSession', 'AttachDataReader', 'ReleaseSession']) + + def testSetTracingFalse(self): + """Checks that 'annotations' doesn't call SetTracing if disabled.""" + test_name = 'set-tracing-false' + + with tf.Graph().as_default(): + builder, _ = self.getBuilderAndTarget(test_name) + + # Note: "enable_tracing=False" is the default. + anno = builder.add_annotation(test_name, enable_tracing=False) + + # ReleaseSession should still be there. + path = _find_input_path_to_type(anno['annotations'], 'ReleaseSession') + self.assertNotEmpty(path) + + # As should AttachDataReader. + path = _find_input_path_to_type(path[-1], 'AttachDataReader') + self.assertNotEmpty(path) + + # But SetTracing should not be called. + set_tracing_path = _find_input_path_to_type(path[-1], 'SetTracing') + self.assertEmpty(set_tracing_path) + + # Instead, we should go to GetSession. 
+ path = _find_input_path_to_type(path[-1], 'GetSession') + self.assertNotEmpty(path) + + def testSetTracingTrue(self): + """Checks that 'annotations' does call SetTracing if enabled.""" + test_name = 'set-tracing-true' + + with tf.Graph().as_default(): + builder, _ = self.getBuilderAndTarget(test_name) + anno = builder.add_annotation(test_name, enable_tracing=True) + + # Check SetTracing is called after GetSession but before AttachDataReader. + self.checkOpOrder('annotations', anno['annotations'], [ + 'GetSession', 'SetTracing', 'AttachDataReader', 'ReleaseSession' + ]) + + # Same for the 'traces' output, if that's what you were to call. + self.checkOpOrder('traces', anno['traces'], [ + 'GetSession', 'SetTracing', 'AttachDataReader', 'ReleaseSession' + ]) + + +if __name__ == '__main__': + googletest.main() diff --git a/syntaxnet/dragnn/python/lexicon.py b/syntaxnet/dragnn/python/lexicon.py new file mode 100644 index 0000000000000000000000000000000000000000..b56ca0e8235d531c443aeb924c387321e21a9748 --- /dev/null +++ b/syntaxnet/dragnn/python/lexicon.py @@ -0,0 +1,73 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== +"""SyntaxNet lexicon utils.""" + +import os.path + + +import tensorflow as tf + +from syntaxnet import task_spec_pb2 +from syntaxnet.ops import gen_parser_ops + + +def create_lexicon_context(path): + """Construct a SyntaxNet TaskContext file for standard lexical resources.""" + context = task_spec_pb2.TaskSpec() + for name in [ + 'word-map', 'tag-map', 'tag-to-category', 'lcword-map', 'category-map', + 'char-map', 'char-ngram-map', 'label-map', 'prefix-table', 'suffix-table' + ]: + context.input.add(name=name).part.add(file_pattern=os.path.join(path, name)) + return context + + +def build_lexicon(output_path, + training_corpus_path, + tf_master='', + training_corpus_format='conll-sentence', + morph_to_pos=False, + **kwargs): + """Constructs a SyntaxNet lexicon at the given path. + + Args: + output_path: Location to construct the lexicon. + training_corpus_path: Path to CONLL formatted training data. + tf_master: TensorFlow master executor (string, defaults to '' to use the + local instance). + training_corpus_format: Format of the training corpus (defaults to CONLL; + search for REGISTER_SYNTAXNET_DOCUMENT_FORMAT for other formats). + morph_to_pos: Whether to serialize morph attributes to the tag field, + combined with category and fine POS tag. + **kwargs: Forwarded to the LexiconBuilder op. + """ + context = create_lexicon_context(output_path) + if morph_to_pos: + context.parameter.add(name='join_category_to_pos', value='true') + context.parameter.add(name='add_pos_as_attribute', value='true') + context.parameter.add(name='serialize_morph_to_pos', value='true') + + # Add the training data to the context. + resource = context.input.add() + resource.name = 'corpus' + resource.record_format.extend([training_corpus_format]) + part = resource.part.add() + part.file_pattern = training_corpus_path + + # Run the lexicon builder op. 
+ with tf.Session(tf_master) as sess: + sess.run( + gen_parser_ops.lexicon_builder( + task_context_str=str(context), corpus_name='corpus', **kwargs)) diff --git a/syntaxnet/dragnn/python/lexicon_test.py b/syntaxnet/dragnn/python/lexicon_test.py new file mode 100644 index 0000000000000000000000000000000000000000..d23442bc031cb7fff0be93c21230b1ac73786645 --- /dev/null +++ b/syntaxnet/dragnn/python/lexicon_test.py @@ -0,0 +1,79 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Tests for SyntaxNet lexicon.""" + +import os +import os.path + +import tensorflow as tf + +from google.protobuf import text_format + +from dragnn.python import lexicon + +# Imported for FLAGS.tf_master, which is used in the lexicon module. 
+ +from syntaxnet import parser_trainer +from syntaxnet import task_spec_pb2 + +import syntaxnet.load_parser_ops + +FLAGS = tf.app.flags.FLAGS +if not hasattr(FLAGS, 'test_srcdir'): + FLAGS.test_srcdir = '' +if not hasattr(FLAGS, 'test_tmpdir'): + FLAGS.test_tmpdir = tf.test.get_temp_dir() + + +_EXPECTED_CONTEXT = r""" +input { name: "word-map" Part { file_pattern: "/tmp/word-map" } } +input { name: "tag-map" Part { file_pattern: "/tmp/tag-map" } } +input { name: "tag-to-category" Part { file_pattern: "/tmp/tag-to-category" } } +input { name: "lcword-map" Part { file_pattern: "/tmp/lcword-map" } } +input { name: "category-map" Part { file_pattern: "/tmp/category-map" } } +input { name: "char-map" Part { file_pattern: "/tmp/char-map" } } +input { name: "char-ngram-map" Part { file_pattern: "/tmp/char-ngram-map" } } +input { name: "label-map" Part { file_pattern: "/tmp/label-map" } } +input { name: "prefix-table" Part { file_pattern: "/tmp/prefix-table" } } +input { name: "suffix-table" Part { file_pattern: "/tmp/suffix-table" } } +""" + + +class LexiconTest(tf.test.TestCase): + + def testCreateLexiconContext(self): + expected_context = task_spec_pb2.TaskSpec() + text_format.Parse(_EXPECTED_CONTEXT, expected_context) + self.assertProtoEquals( + lexicon.create_lexicon_context('/tmp'), expected_context) + + def testBuildLexicon(self): + empty_input_path = os.path.join(FLAGS.test_tmpdir, 'empty-input') + lexicon_output_path = os.path.join(FLAGS.test_tmpdir, 'lexicon-output') + + with open(empty_input_path, 'w'): + pass + + # The directory may already exist when running locally multiple times. + if not os.path.exists(lexicon_output_path): + os.mkdir(lexicon_output_path) + + # Just make sure this doesn't crash; the lexicon builder op is already + # exercised in its own unit test. 
+ lexicon.build_lexicon(lexicon_output_path, empty_input_path) + + +if __name__ == '__main__': + tf.test.main() diff --git a/syntaxnet/dragnn/python/load_dragnn_cc_impl.py b/syntaxnet/dragnn/python/load_dragnn_cc_impl.py new file mode 100644 index 0000000000000000000000000000000000000000..913d4eda3b4f9156aeb6001eaa838095f09bb005 --- /dev/null +++ b/syntaxnet/dragnn/python/load_dragnn_cc_impl.py @@ -0,0 +1,22 @@ +# Copyright 2016 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Loads dragnn_ops shared library.""" + +import os.path +import tensorflow as tf + +tf.load_op_library( + os.path.join(tf.resource_loader.get_data_files_path(), 'dragnn_cc_impl.so')) diff --git a/syntaxnet/dragnn/python/network_units.py b/syntaxnet/dragnn/python/network_units.py new file mode 100644 index 0000000000000000000000000000000000000000..6ba2d8d228081df15a5dd79f72e8b2857f480eae --- /dev/null +++ b/syntaxnet/dragnn/python/network_units.py @@ -0,0 +1,1603 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Basic network units used in assembling DRAGNN graphs.""" + +from abc import ABCMeta +from abc import abstractmethod + + +import tensorflow as tf +from tensorflow.python.ops import nn +from tensorflow.python.ops import tensor_array_ops as ta +from tensorflow.python.platform import tf_logging as logging + +from dragnn.python import dragnn_ops +from syntaxnet.util import check +from syntaxnet.util import registry + + +def linked_embeddings_name(channel_id): + """Returns the name of the linked embedding matrix for some channel ID.""" + return 'linked_embedding_matrix_%d' % channel_id + + +def fixed_embeddings_name(channel_id): + """Returns the name of the fixed embedding matrix for some channel ID.""" + return 'fixed_embedding_matrix_%d' % channel_id + + +class StoredActivations(object): + """Wrapper around stored activation vectors. + + Because activations are produced and consumed in different layouts by bulk + vs. dynamic components, this class provides a simple common + interface/conversion API. It can be constructed from either a TensorArray + (dynamic) or a Tensor (bulk), and the resulting object to use for lookups is + either bulk_tensor (for bulk components) or dynamic_tensor (for dynamic + components). + """ + + def __init__(self, tensor=None, array=None, stride=None, dim=None): + """Creates ops for converting the input to either format. 
+ + If 'tensor' is used, then a conversion from [stride * steps, dim] to + [steps + 1, stride, dim] is performed for dynamic_tensor reads. + + If 'array' is used, then a conversion from [steps + 1, stride, dim] to + [stride * steps, dim] is performed for bulk_tensor reads. + + Args: + tensor: Bulk tensor input. + array: TensorArray dynamic input. + stride: stride of bulk tensor. Not used for dynamic. + dim: dim of bulk tensor. Not used for dynamic. + """ + if tensor is not None: + check.IsNone(array, 'Cannot initialize from tensor and array') + check.NotNone(stride, 'Stride is required for bulk tensor') + check.NotNone(dim, 'Dim is required for bulk tensor') + + self._bulk_tensor = tensor + with tf.name_scope('convert_to_dyn'): + tensor = tf.reshape(tensor, [stride, -1, dim]) + tensor = tf.transpose(tensor, perm=[1, 0, 2]) + pad = tf.zeros([1, stride, dim], dtype=tensor.dtype) + self._array_tensor = tf.concat([pad, tensor], 0) + + if array is not None: + check.IsNone(tensor, 'Cannot initialize from both tensor and array') + with tf.name_scope('convert_to_bulk'): + self._bulk_tensor = convert_network_state_tensorarray(array) + with tf.name_scope('convert_to_dyn'): + self._array_tensor = array.stack() + + @property + def bulk_tensor(self): + return self._bulk_tensor + + @property + def dynamic_tensor(self): + return self._array_tensor + + +class NamedTensor(object): + """Container for a tensor with associated name and dimension attributes.""" + + def __init__(self, tensor, name, dim=None): + """Inits NamedTensor with tensor, name and optional dim.""" + self.tensor = tensor + self.name = name + self.dim = dim + + +def add_embeddings(channel_id, feature_spec, seed=None): + """Adds a variable for the embedding of a given fixed feature. + + Supports pre-trained or randomly initialized embeddings. In both cases, an extra + vector is reserved for out-of-vocabulary words, so the embedding matrix has + the size of [feature_spec.vocabulary_size + 1, feature_spec.embedding_dim]. 
+ + Args: + channel_id: Numeric id of the fixed feature channel + feature_spec: Feature spec protobuf of type FixedFeatureChannel + seed: used for random initializer + + Returns: + tf.Variable object corresponding to the embedding for that feature. + + Raises: + RuntimeError: if the pretrained embeddings are specified in resources + containing more than one part. + """ + check.Gt(feature_spec.embedding_dim, 0, + 'Embeddings requested for non-embedded feature: %s' % feature_spec) + name = fixed_embeddings_name(channel_id) + shape = [feature_spec.vocabulary_size + 1, feature_spec.embedding_dim] + if feature_spec.HasField('pretrained_embedding_matrix'): + if len(feature_spec.pretrained_embedding_matrix.part) > 1: + raise RuntimeError('pretrained_embedding_matrix resource contains ' + 'more than one part:\n%s' % + str(feature_spec.pretrained_embedding_matrix)) + if len(feature_spec.vocab.part) > 1: + raise RuntimeError('vocab resource contains more than one part:\n%s' % + str(feature_spec.vocab)) + seed1, seed2 = tf.get_seed(seed) + embeddings = dragnn_ops.dragnn_embedding_initializer( + embedding_input=feature_spec.pretrained_embedding_matrix.part[0] + .file_pattern, + vocab=feature_spec.vocab.part[0].file_pattern, + scaling_coefficient=1.0, + seed=seed1, + seed2=seed2) + return tf.get_variable(name, initializer=tf.reshape(embeddings, shape)) + else: + return tf.get_variable( + name, + shape, + initializer=tf.random_normal_initializer( + stddev=1.0 / feature_spec.embedding_dim**.5, seed=seed)) + + +def embedding_lookup(embedding_matrix, indices, ids, weights, size): + """Performs a weighted embedding lookup. + + Args: + embedding_matrix: float Tensor from which to do the lookup. + indices: int Tensor for the output rows of the looked up vectors. + ids: int Tensor vectors to look up in the embedding_matrix. + weights: float Tensor weights to apply to the looked up vectors. + size: int number of output rows. Needed since some output rows may be + empty. 
+ + Returns: + Weighted embedding vectors. + """ + embeddings = tf.nn.embedding_lookup([embedding_matrix], ids) + # TODO(googleuser): allow skipping weights. + broadcast_weights_shape = tf.concat([tf.shape(weights), [1]], 0) + embeddings *= tf.reshape(weights, broadcast_weights_shape) + embeddings = tf.unsorted_segment_sum(embeddings, indices, size) + return embeddings + + +def fixed_feature_lookup(component, state, channel_id, stride): + """Looks up fixed features and passes them through embeddings. + + Embedding vectors may be scaled by weights if the features specify it. + + Args: + component: Component object in which to look up the fixed features. + state: MasterState object for the live nlp_saft::dragnn::MasterState. + channel_id: int id of the fixed feature to look up. + stride: int Tensor of current batch * beam size. + + Returns: + NamedTensor object containing the embedding vectors. + """ + feature_spec = component.spec.fixed_feature[channel_id] + check.Gt(feature_spec.embedding_dim, 0, + 'Embeddings requested for non-embedded feature: %s' % feature_spec) + embedding_matrix = component.get_variable(fixed_embeddings_name(channel_id)) + + with tf.op_scope([embedding_matrix], 'fixed_embedding_' + feature_spec.name): + indices, ids, weights = dragnn_ops.extract_fixed_features( + state.handle, component=component.name, channel_id=channel_id) + size = stride * feature_spec.size + embeddings = embedding_lookup(embedding_matrix, indices, ids, weights, size) + dim = feature_spec.size * feature_spec.embedding_dim + return NamedTensor( + tf.reshape(embeddings, [-1, dim]), feature_spec.name, dim=dim) + + +def get_input_tensor(fixed_embeddings, linked_embeddings): + """Helper function for constructing an input tensor from all the features. 
+ + Args: + fixed_embeddings: list of NamedTensor objects for fixed feature channels + linked_embeddings: list of NamedTensor objects for linked feature channels + + Returns: + a tensor of shape [N, D], where D is the total input dimension of the + concatenated feature channels + + Raises: + RuntimeError: if no features, fixed or linked, are configured. + """ + embeddings = fixed_embeddings + linked_embeddings + if not embeddings: + raise RuntimeError('There needs to be at least one feature set defined.') + + # Concat_v2 takes care of optimizing away the concatenation + # operation in the case when there is exactly one embedding input. + return tf.concat([e.tensor for e in embeddings], 1) + + +def get_input_tensor_with_stride(fixed_embeddings, linked_embeddings, stride): + """Constructs an input tensor with a separate dimension for steps. + + Args: + fixed_embeddings: list of NamedTensor objects for fixed feature channels + linked_embeddings: list of NamedTensor objects for linked feature channels + stride: int stride (i.e. beam * batch) to use to reshape the input + + Returns: + a tensor of shape [stride, num_steps, D], where D is the total input + dimension of the concatenated feature channels + """ + input_tensor = get_input_tensor(fixed_embeddings, linked_embeddings) + shape = tf.shape(input_tensor) + return tf.reshape(input_tensor, [stride, -1, shape[1]]) + + +def convert_network_state_tensorarray(tensorarray): + """Converts a source TensorArray to a source Tensor. + + Performs a permutation between the steps * [stride, D] shape of a + source TensorArray and the (flattened) [stride * steps, D] shape of + a source Tensor. + + The TensorArrays used during recurrence have an additional zeroth step that + needs to be removed. + + Args: + tensorarray: TensorArray object to be converted. + + Returns: + Tensor object after conversion. + """ + tensor = tensorarray.stack() # Results in a [steps, stride, D] tensor. 
+ tensor = tf.slice(tensor, [1, 0, 0], [-1, -1, -1]) # Lop off the 0th step. + tensor = tf.transpose(tensor, [1, 0, 2]) # Switch steps and stride. + return tf.reshape(tensor, [-1, tf.shape(tensor)[2]]) + + +def pass_through_embedding_matrix(act_block, embedding_matrix, step_idx): + """Passes the activations through the embedding_matrix. + + Takes care to handle out of bounds lookups. + + Args: + act_block: matrix of activations. + embedding_matrix: matrix of weights. + step_idx: vector containing step indices, with -1 indicating out of bounds. + + Returns: + the embedded activations. + """ + # Indicator vector for out of bounds lookups. + step_idx_mask = tf.expand_dims(tf.equal(step_idx, -1), -1) + + # Pad the last column of the activation vectors with the indicator. + act_block = tf.concat([act_block, tf.to_float(step_idx_mask)], 1) + return tf.matmul(act_block, embedding_matrix) + + +def lookup_named_tensor(name, named_tensors): + """Retrieves a NamedTensor by name. + + Args: + name: Name of the tensor to retrieve. + named_tensors: List of NamedTensor objects to search. + + Returns: + The NamedTensor in |named_tensors| with the |name|. + + Raises: + KeyError: If the |name| is not found among the |named_tensors|. + """ + for named_tensor in named_tensors: + if named_tensor.name == name: + return named_tensor + raise KeyError('Name "%s" not found in named tensors: %s' % + (name, named_tensors)) + + +def activation_lookup_recurrent(component, state, channel_id, source_array, + source_layer_size, stride): + """Looks up activations from tensor arrays. + + If the linked feature's embedding_dim is set to -1, the feature vectors are + not passed through (i.e. multiplied by) an embedding matrix. + + Args: + component: Component object in which to look up the linked features. + state: MasterState object for the live nlp_saft::dragnn::MasterState. + channel_id: int id of the linked feature to look up. 
+ source_array: TensorArray from which to fetch feature vectors, expected to + have size [steps + 1] elements of shape [stride, D] each. + source_layer_size: int length of feature vectors before embedding. + stride: int Tensor of current batch * beam size. + + Returns: + NamedTensor object containing the embedding vectors. + """ + feature_spec = component.spec.linked_feature[channel_id] + + with tf.name_scope('activation_lookup_recurrent_%s' % feature_spec.name): + # Linked features are returned as a pair of tensors, one indexing into + # steps, and one indexing within the activation tensor (beam x batch) + # stored for a step. + step_idx, idx = dragnn_ops.extract_link_features( + state.handle, component=component.name, channel_id=channel_id) + + # We take the [steps, batch*beam, ...] tensor array, gather and concat + # the steps we might need into a [some_steps*batch*beam, ...] tensor, + # and flatten 'idx' to dereference this new tensor. + # + # The first element of each tensor array is reserved for an + # initialization variable, so we offset all step indices by +1. + # + # TODO(googleuser): It would be great to not have to extract + # the steps in their entirety, forcing a copy of much of the + # TensorArray at each step. Better would be to support a + # TensorArray.gather_nd to pick the specific elements directly. + # TODO(googleuser): In the interim, a small optimization would + # be to use tf.unique instead of tf.range. 
+ step_min = tf.reduce_min(step_idx) + ta_range = tf.range(step_min + 1, tf.reduce_max(step_idx) + 2) + act_block = source_array.gather(ta_range) + act_block = tf.reshape(act_block, + tf.concat([[-1], tf.shape(act_block)[2:]], 0)) + flat_idx = (step_idx - step_min) * stride + idx + act_block = tf.gather(act_block, flat_idx) + act_block = tf.reshape(act_block, [-1, source_layer_size]) + + if feature_spec.embedding_dim != -1: + embedding_matrix = component.get_variable( + linked_embeddings_name(channel_id)) + act_block = pass_through_embedding_matrix(act_block, embedding_matrix, + step_idx) + dim = feature_spec.size * feature_spec.embedding_dim + else: + # If embedding_dim is -1, just output concatenation of activations. + dim = feature_spec.size * source_layer_size + + return NamedTensor( + tf.reshape(act_block, [-1, dim]), feature_spec.name, dim=dim) + + +def activation_lookup_other(component, state, channel_id, source_tensor, + source_layer_size): + """Looks up activations from tensors. + + If the linked feature's embedding_dim is set to -1, the feature vectors are + not passed through (i.e. multiplied by) an embedding matrix. + + Args: + component: Component object in which to look up the linked features. + state: MasterState object for the live nlp_saft::dragnn::MasterState. + channel_id: int id of the linked feature to look up. + source_tensor: Tensor from which to fetch feature vectors. Expected to have + shape [steps + 1, stride, D]. + source_layer_size: int length of feature vectors before embedding (D). It + would in principle be possible to get this dimension dynamically from + the last dimension of source_tensor. However, having it statically is + more convenient. + + Returns: + NamedTensor object containing the embedding vectors. 
+ """ + feature_spec = component.spec.linked_feature[channel_id] + + with tf.name_scope('activation_lookup_other_%s' % feature_spec.name): + # Linked features are returned as a pair of tensors, one indexing into + # steps, and one indexing within the stride (beam x batch) of each step. + step_idx, idx = dragnn_ops.extract_link_features( + state.handle, component=component.name, channel_id=channel_id) + + # The first element of each tensor array is reserved for an + # initialization variable, so we offset all step indices by +1. + indices = tf.stack([step_idx + 1, idx], axis=1) + act_block = tf.gather_nd(source_tensor, indices) + act_block = tf.reshape(act_block, [-1, source_layer_size]) + + if feature_spec.embedding_dim != -1: + embedding_matrix = component.get_variable( + linked_embeddings_name(channel_id)) + act_block = pass_through_embedding_matrix(act_block, embedding_matrix, + step_idx) + dim = feature_spec.size * feature_spec.embedding_dim + else: + # If embedding_dim is -1, just output concatenation of activations. + dim = feature_spec.size * source_layer_size + + return NamedTensor( + tf.reshape(act_block, [-1, dim]), feature_spec.name, dim=dim) + + +class LayerNorm(object): + """Utility to add layer normalization to any tensor. + + Layer normalization implementation is based on: + + https://arxiv.org/abs/1607.06450. "Layer Normalization" + Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton + + This object will construct additional variables that need to be optimized, and + these variables can be accessed via params(). + + Attributes: + params: List of additional parameters to be trained. + """ + + def __init__(self, component, name, shape, dtype): + """Construct variables to normalize an input of given shape. + + Arguments: + component: ComponentBuilder handle. + name: Human readable name to organize the variables. + shape: Shape of the layer to be normalized. + dtype: Type of the layer to be normalized. 
+ """ + self._name = name + self._shape = shape + self._component = component + beta = tf.get_variable( + 'beta_%s' % name, + shape=shape, + dtype=dtype, + initializer=tf.zeros_initializer()) + gamma = tf.get_variable( + 'gamma_%s' % name, + shape=shape, + dtype=dtype, + initializer=tf.ones_initializer()) + self._params = [beta, gamma] + + @property + def params(self): + return self._params + + def normalize(self, inputs): + """Apply normalization to input. + + The shape must match the declared shape in the constructor. + [This is copied from tf.contrib.rnn.LayerNormBasicLSTMCell.] + + Args: + inputs: Input tensor + + Returns: + Normalized version of input tensor. + + Raises: + ValueError: if inputs has undefined rank. + """ + inputs_shape = inputs.get_shape() + inputs_rank = inputs_shape.ndims + if inputs_rank is None: + raise ValueError('Inputs %s has undefined rank.' % inputs.name) + axis = range(1, inputs_rank) + + beta = self._component.get_variable('beta_%s' % self._name) + gamma = self._component.get_variable('gamma_%s' % self._name) + + with tf.variable_scope('layer_norm_%s' % self._name): + # Calculate the moments on the last axis (layer activations). + mean, variance = nn.moments(inputs, axis, keep_dims=True) + + # Compute layer normalization using the batch_normalization function. + variance_epsilon = 1E-12 + outputs = nn.batch_normalization( + inputs, mean, variance, beta, gamma, variance_epsilon) + outputs.set_shape(inputs_shape) + return outputs + + +class Layer(object): + """A layer in a feed-forward network. + + Attributes: + component: ComponentBuilderBase that produces this layer. + name: Name of this layer. + dim: Dimension of this layer, or negative if dynamic. 
+ """ + + def __init__(self, component, name, dim): + check.NotNone(dim, 'Dimension is required') + self.component = component + self.name = name + self.dim = dim + + def __str__(self): + return 'Layer: %s/%s[%d]' % (self.component.name, self.name, self.dim) + + def create_array(self, stride): + """Creates a new tensor array to store this layer's activations. + + Arguments: + stride: Possibly dynamic batch * beam size with which to initialize the + tensor array + + Returns: + TensorArray object + """ + check.Gt(self.dim, 0, 'Cannot create array when dimension is dynamic') + tensor_array = ta.TensorArray(dtype=tf.float32, + size=0, + dynamic_size=True, + clear_after_read=False, + infer_shape=False, + name='%s_array' % self.name) + + # Start each array with all zeros. Special values will still be learned via + # the extra embedding dimension stored for each linked feature channel. + initial_value = tf.zeros([stride, self.dim]) + return tensor_array.write(0, initial_value) + + +def get_attrs_with_defaults(parameters, defaults): + """Populates a dictionary with run-time attributes. + + Given defaults, populates any overrides from 'parameters' with their + corresponding converted values. 'defaults' should be typed. This is useful + for specifying NetworkUnit-specific configuration options. + + Args: + parameters: a map. + defaults: a typed set of default values. + + Returns: + dictionary populated with any overrides. + + Raises: + RuntimeError: if a key in parameters is not present in defaults. + """ + attrs = defaults + for key, value in parameters.iteritems(): + check.In(key, defaults, 'Unknown attribute: %s' % key) + if isinstance(defaults[key], bool): + attrs[key] = value.lower() == 'true' + else: + attrs[key] = type(defaults[key])(value) + return attrs + + +def maybe_apply_dropout(inputs, keep_prob, per_sequence, stride=None): + """Applies dropout, if so configured, to an input tensor. 
+ + The input may be rank 2 or 3 depending on whether the stride (i.e., batch + size) has been incorporated into the shape. + + Args: + inputs: [stride * num_steps, dim] or [stride, num_steps, dim] input tensor. + keep_prob: Scalar probability of keeping each input element. If >= 1.0, no + dropout is performed. + per_sequence: If true, sample the dropout mask once per sequence, instead of + once per step. Requires |stride| when true. + stride: Scalar batch size. Optional if |per_sequence| is false. + + Returns: + [stride * num_steps, dim] or [stride, num_steps, dim] tensor, matching the + shape of |inputs|, containing the masked or original inputs, depending on + whether dropout was actually performed. + """ + check.Ge(inputs.get_shape().ndims, 2, 'inputs must be rank 2 or 3') + check.Le(inputs.get_shape().ndims, 3, 'inputs must be rank 2 or 3') + flat = (inputs.get_shape().ndims == 2) + + if keep_prob >= 1.0: + return inputs + + if not per_sequence: + return tf.nn.dropout(inputs, keep_prob) + + check.NotNone(stride, 'per-sequence dropout requires stride') + dim = inputs.get_shape().as_list()[-1] + check.NotNone(dim, 'inputs must have static activation dimension, but have ' + 'static shape %s' % inputs.get_shape().as_list()) + + # If needed, restore the batch dimension to separate the sequences. + inputs_sxnxd = tf.reshape(inputs, [stride, -1, dim]) if flat else inputs + + # Replace |num_steps| with 1 in |noise_shape|, so the dropout mask broadcasts + # to all steps for a particular sequence. + noise_shape = [stride, 1, dim] + masked_sxnxd = tf.nn.dropout(inputs_sxnxd, keep_prob, noise_shape) + + # If needed, flatten out the batch dimension in the return value. + return tf.reshape(masked_sxnxd, [-1, dim]) if flat else masked_sxnxd + + +@registry.RegisteredClass +class NetworkUnitInterface(object): + """Base class to implement NN specifications. 
+ + This class contains the required functionality to build a network inside of a + DRAGNN graph: (1) initializing TF variables during __init__(), and (2) + creating particular instances from extracted features in create(). + + Attributes: + params (list): List of tf.Variable objects representing trainable + parameters. + layers (list): List of Layer objects to track network layers that should + be written to Tensors during training and inference. + """ + __metaclass__ = ABCMeta # required for @abstractmethod + + def __init__(self, component, init_layers=None, init_context_layers=None): + """Initializes parameters for embedding matrices. + + The subclass may provide optional lists of initial layers and context layers + to allow this base class constructor to use accessors like get_layer_size(), + which is required for networks that may be used self-recurrently. + + Args: + component: parent ComponentBuilderBase object. + init_layers: optional initial layers. + init_context_layers: optional initial context layers. + """ + self._component = component + self._params = [] + self._layers = init_layers if init_layers else [] + self._regularized_weights = [] + self._context_layers = init_context_layers if init_context_layers else [] + self._fixed_feature_dims = {} # mapping from name to dimension + self._linked_feature_dims = {} # mapping from name to dimension + + # Allocate parameters for all embedding channels. Note that for both Fixed + # and Linked embedding matrices, we store an additional +1 embedding that's + # used when the index is out of scope. 
+ for channel_id, spec in enumerate(component.spec.fixed_feature): + check.NotIn(spec.name, self._fixed_feature_dims, + 'Duplicate fixed feature') + check.Gt(spec.size, 0, 'Invalid fixed feature size') + if spec.embedding_dim > 0: + fixed_dim = spec.embedding_dim + self._params.append(add_embeddings(channel_id, spec)) + else: + fixed_dim = 1 # assume feature ID extraction; only one ID per step + self._fixed_feature_dims[spec.name] = spec.size * fixed_dim + + for channel_id, spec in enumerate(component.spec.linked_feature): + check.NotIn(spec.name, self._linked_feature_dims, + 'Duplicate linked feature') + check.Gt(spec.size, 0, 'Invalid linked feature size') + if spec.source_component == component.name: + source_array_dim = self.get_layer_size(spec.source_layer) + else: + source = component.master.lookup_component[spec.source_component] + source_array_dim = source.network.get_layer_size(spec.source_layer) + + if spec.embedding_dim != -1: + check.Gt(source_array_dim, 0, + 'Cannot embed linked feature with dynamic dimension') + self._params.append( + tf.get_variable( + linked_embeddings_name(channel_id), + [source_array_dim + 1, spec.embedding_dim], + initializer=tf.random_normal_initializer( + stddev=1 / spec.embedding_dim**.5))) + + self._linked_feature_dims[spec.name] = spec.size * spec.embedding_dim + else: + # If embedding_dim is -1, linked features are not embedded. + self._linked_feature_dims[spec.name] = spec.size * source_array_dim + + # Compute the cumulative dimension of all inputs. If any input has dynamic + # dimension, then the result is -1. + input_dims = (self._fixed_feature_dims.values() + + self._linked_feature_dims.values()) + if any(x < 0 for x in input_dims): + self._concatenated_input_dim = -1 + else: + self._concatenated_input_dim = sum(input_dims) + tf.logging.info('component %s concat_input_dim %s', component.name, + self._concatenated_input_dim) + + # Allocate attention parameters. 
+ if self._component.spec.attention_component: + attention_source_component = self._component.master.lookup_component[ + self._component.spec.attention_component] + attention_hidden_layer_sizes = map( + int, attention_source_component.spec.network_unit.parameters[ + 'hidden_layer_sizes'].split(',')) + attention_hidden_layer_size = attention_hidden_layer_sizes[-1] + + hidden_layer_sizes = map(int, component.spec.network_unit.parameters[ + 'hidden_layer_sizes'].split(',')) + # The attention function is built on the last layer of hidden embeddings. + hidden_layer_size = hidden_layer_sizes[-1] + self._params.append( + tf.get_variable( + 'attention_weights_pm_0', + [attention_hidden_layer_size, hidden_layer_size], + initializer=tf.random_normal_initializer(stddev=1e-4))) + + self._params.append( + tf.get_variable( + 'attention_weights_hm_0', [hidden_layer_size, hidden_layer_size], + initializer=tf.random_normal_initializer(stddev=1e-4))) + + self._params.append( + tf.get_variable( + 'attention_bias_0', [1, hidden_layer_size], + initializer=tf.zeros_initializer())) + + self._params.append( + tf.get_variable( + 'attention_bias_1', [1, hidden_layer_size], + initializer=tf.zeros_initializer())) + + self._params.append( + tf.get_variable( + 'attention_weights_pu', + [attention_hidden_layer_size, component.num_actions], + initializer=tf.random_normal_initializer(stddev=1e-4))) + + @abstractmethod + def create(self, + fixed_embeddings, + linked_embeddings, + context_tensor_arrays, + attention_tensor, + during_training, + stride=None): + """Constructs a feed-forward unit based on the features and context tensors. + + Args: + fixed_embeddings: list of NamedTensor objects + linked_embeddings: list of NamedTensor objects + context_tensor_arrays: optional list of TensorArray objects used for + implicit recurrence. + attention_tensor: optional Tensor used for attention. + during_training: whether to create a network for training (vs inference). 
+ stride: int scalar tensor containing the stride required for + bulk computation. + + Returns: + A list of tensors corresponding to the list of layers. + """ + pass + + @property + def layers(self): + return self._layers + + @property + def params(self): + return self._params + + @property + def regularized_weights(self): + return self._regularized_weights + + @property + def context_layers(self): + return self._context_layers + + def get_layer_index(self, layer_name): + """Gets the index of the given named layer of the network.""" + return [x.name for x in self.layers].index(layer_name) + + def get_layer_size(self, layer_name): + """Gets the size of the given named layer of the network. + + Args: + layer_name: string name of layer to look up. + + Returns: + the size of the layer. + + Raises: + KeyError: if the layer_name to look up doesn't exist. + """ + for layer in self.layers: + if layer.name == layer_name: + return layer.dim + raise KeyError('Layer {} not found in component {}'.format( + layer_name, self._component.name)) + + def get_logits(self, network_tensors): + """Pulls out the logits from the tensors produced by this unit. + + Args: + network_tensors: list of tensors as output by create(). + + Raises: + NotImplementedError: by default a 'logits' tensor need not be implemented. + """ + raise NotImplementedError() + + def get_l2_regularized_weights(self): + """Gets the weights that need to be regularized.""" + return self.regularized_weights + + def attention(self, last_layer, attention_tensor): + """Compute the attention term for the network unit.""" + h_tensor = attention_tensor + + # Compute the attentions. 
+ # Using feed-forward net to map the two inputs into the same dimension + focus_tensor = tf.nn.tanh( + tf.matmul( + h_tensor, + self._component.get_variable('attention_weights_pm_0'), + name='h_x_pm') + self._component.get_variable('attention_bias_0')) + + context_tensor = tf.nn.tanh( + tf.matmul( + last_layer, + self._component.get_variable('attention_weights_hm_0'), + name='l_x_hm') + self._component.get_variable('attention_bias_1')) + # The tf.multiply in the following expression broadcasts along the 0 dim: + z_vec = tf.reduce_sum(tf.multiply(focus_tensor, context_tensor), 1) + p_vec = tf.nn.softmax(tf.reshape(z_vec, [1, -1])) + # The tf.multiply in the following expression broadcasts along the 1 dim: + r_vec = tf.expand_dims( + tf.reduce_sum( + tf.multiply( + h_tensor, tf.reshape(p_vec, [-1, 1]), name='time_together2'), + 0), + 0) + return tf.matmul( + r_vec, + self._component.get_variable('attention_weights_pu'), + name='time_together3') + + +class IdentityNetwork(NetworkUnitInterface): + """A network that returns concatenated input embeddings and activations.""" + + def __init__(self, component): + super(IdentityNetwork, self).__init__(component) + self._layers = [ + Layer( + component, + name='input_embeddings', + dim=self._concatenated_input_dim) + ] + + def create(self, + fixed_embeddings, + linked_embeddings, + context_tensor_arrays, + attention_tensor, + during_training, + stride=None): + return [get_input_tensor(fixed_embeddings, linked_embeddings)] + + def get_layer_size(self, layer_name): + # Note that get_layer_size is called by super.__init__ before any layers are + # constructed if and only if there are recurrent links. 
+ assert hasattr(self, + '_layers'), 'IdentityNetwork cannot have recurrent links' + return super(IdentityNetwork, self).get_layer_size(layer_name) + + def get_logits(self, network_tensors): + return network_tensors[-1] + + def get_context_layers(self): + return [] + + +class FeedForwardNetwork(NetworkUnitInterface): + """Implementation of C&M style feedforward network. + + Supports dropout and optional layer normalization. + + Layers: + layer_<i>: Activations for i'th hidden layer (0-origin). + last_layer: Activations for the last hidden layer. This is a convenience + alias for "layer_<n-1>", where n is the number of hidden layers. + logits: Logits associated with component actions. + """ + + def __init__(self, component): + """Initializes parameters required to run this network. + + Args: + component: parent ComponentBuilderBase object. + + Parameters used to construct the network: + hidden_layer_sizes: comma-separated list of ints, indicating the + number of hidden units in each hidden layer. + layer_norm_input (False): Whether or not to apply layer normalization + on the concatenated input to the network. + layer_norm_hidden (False): Whether or not to apply layer normalization + to the first set of hidden layer activations. + nonlinearity ('relu'): Name of function from module "tf.nn" to apply to + each hidden layer; e.g., "relu" or "elu". + dropout_keep_prob (-1.0): The probability that an input is not dropped. + If >= 1.0, disables dropout. If < 0.0, uses the global |dropout_rate| + hyperparameter. + dropout_per_sequence (False): If true, sample the dropout mask once per + sequence, instead of once per step. See Gal and Ghahramani + (https://arxiv.org/abs/1512.05287). + dropout_all_layers (False): If true, apply dropout to the input of all + hidden layers, instead of just applying it to the network input. + + Hyperparameters used: + dropout_rate: The probability that an input is not dropped. Only used + when the |dropout_keep_prob| parameter is negative. 
+ """ + self._attrs = get_attrs_with_defaults( + component.spec.network_unit.parameters, defaults={ + 'hidden_layer_sizes': '', + 'layer_norm_input': False, + 'layer_norm_hidden': False, + 'nonlinearity': 'relu', + 'dropout_keep_prob': -1.0, + 'dropout_per_sequence': False, + 'dropout_all_layers': False}) + + # Initialize the hidden layer sizes before running the base initializer, as + # the base initializer may need to know the size of the hidden layer for + # recurrent connections. + self._hidden_layer_sizes = ( + map(int, self._attrs['hidden_layer_sizes'].split(',')) + if self._attrs['hidden_layer_sizes'] else []) + super(FeedForwardNetwork, self).__init__(component) + + # Infer dropout rate from network parameters and grid hyperparameters. + self._dropout_rate = self._attrs['dropout_keep_prob'] + if self._dropout_rate < 0.0: + self._dropout_rate = component.master.hyperparams.dropout_rate + + # Add layer norm if specified. + self._layer_norm_input = None + self._layer_norm_hidden = None + if self._attrs['layer_norm_input']: + self._layer_norm_input = LayerNorm(self._component, 'concat_input', + self._concatenated_input_dim, + tf.float32) + self._params.extend(self._layer_norm_input.params) + + if self._attrs['layer_norm_hidden']: + self._layer_norm_hidden = LayerNorm(self._component, 'layer_0', + self._hidden_layer_sizes[0], + tf.float32) + self._params.extend(self._layer_norm_hidden.params) + + # Extract nonlinearity from |tf.nn|. + self._nonlinearity = getattr(tf.nn, self._attrs['nonlinearity']) + + # TODO(googleuser): add initializer stddevs as part of the network unit's + # configuration. + self._weights = [] + last_layer_dim = self._concatenated_input_dim + + # Initialize variables for the parameters, and add Layer objects for + # cross-component bookkeeping. 
+ for index, hidden_layer_size in enumerate(self._hidden_layer_sizes): + weights = tf.get_variable( + 'weights_%d' % index, [last_layer_dim, hidden_layer_size], + initializer=tf.random_normal_initializer(stddev=1e-4)) + self._params.append(weights) + if index > 0 or self._layer_norm_hidden is None: + self._params.append( + tf.get_variable( + 'bias_%d' % index, [hidden_layer_size], + initializer=tf.constant_initializer( + 0.2, dtype=tf.float32))) + + self._weights.append(weights) + self._layers.append( + Layer( + component, name='layer_%d' % index, dim=hidden_layer_size)) + last_layer_dim = hidden_layer_size + + # Add a convenience alias for the last hidden layer, if any. + if self._hidden_layer_sizes: + self._layers.append(Layer(component, 'last_layer', last_layer_dim)) + + # By default, regularize only the weights. + self._regularized_weights.extend(self._weights) + + if component.num_actions: + self._params.append( + tf.get_variable( + 'weights_softmax', [last_layer_dim, component.num_actions], + initializer=tf.random_normal_initializer(stddev=1e-4))) + self._params.append( + tf.get_variable( + 'bias_softmax', [component.num_actions], + initializer=tf.zeros_initializer())) + self._layers.append( + Layer( + component, name='logits', dim=component.num_actions)) + + def create(self, + fixed_embeddings, + linked_embeddings, + context_tensor_arrays, + attention_tensor, + during_training, + stride=None): + """See base class.""" + input_tensor = get_input_tensor(fixed_embeddings, linked_embeddings) + + if during_training: + input_tensor.set_shape([None, self._concatenated_input_dim]) + input_tensor = self._maybe_apply_dropout(input_tensor, stride) + + if self._layer_norm_input: + input_tensor = self._layer_norm_input.normalize(input_tensor) + + tensors = [] + last_layer = input_tensor + for index, hidden_layer_size in enumerate(self._hidden_layer_sizes): + acts = tf.matmul(last_layer, + self._component.get_variable('weights_%d' % index)) + + # Note that the first layer 
was already handled before this loop. + # TODO(googleuser): Refactor this loop so dropout and layer normalization + # are applied consistently. + if during_training and self._attrs['dropout_all_layers'] and index > 0: + acts.set_shape([None, hidden_layer_size]) + acts = self._maybe_apply_dropout(acts, stride) + + # Don't add a bias term if we're going to apply layer norm, since layer + # norm includes a bias already. + if index == 0 and self._layer_norm_hidden: + acts = self._layer_norm_hidden.normalize(acts) + else: + acts = tf.nn.bias_add(acts, + self._component.get_variable('bias_%d' % index)) + + last_layer = self._nonlinearity(acts) + tensors.append(last_layer) + + # Add a convenience alias for the last hidden layer, if any. + if self._hidden_layer_sizes: + tensors.append(last_layer) + + if self._layers[-1].name == 'logits': + logits = tf.matmul( + last_layer, self._component.get_variable( + 'weights_softmax')) + self._component.get_variable('bias_softmax') + + if self._component.spec.attention_component: + logits += self.attention(last_layer, attention_tensor) + + logits = tf.identity(logits, name=self._layers[-1].name) + tensors.append(logits) + return tensors + + def get_layer_size(self, layer_name): + if layer_name == 'logits': + return self._component.num_actions + + if layer_name == 'last_layer': + return self._hidden_layer_sizes[-1] + + if not layer_name.startswith('layer_'): + logging.fatal( + 'Invalid layer name: "%s" Can only retrieve from "logits", ' + '"last_layer", and "layer_*".', + layer_name) + + # NOTE(danielandor): Since get_layer_size is called before the + # model has been built, we compute the layer size directly from + # the hyperparameters rather than from self._layers. 
+ layer_index = int(layer_name.split('_')[1]) + return self._hidden_layer_sizes[layer_index] + + def get_logits(self, network_tensors): + return network_tensors[-1] + + def _maybe_apply_dropout(self, inputs, stride): + return maybe_apply_dropout(inputs, self._dropout_rate, + self._attrs['dropout_per_sequence'], stride) + + +class LSTMNetwork(NetworkUnitInterface): + """Implementation of action LSTM style network.""" + + def __init__(self, component): + assert component.num_actions > 0, 'Component num actions must be positive.' + network_unit_spec = component.spec.network_unit + self._hidden_layer_sizes = ( + int)(network_unit_spec.parameters['hidden_layer_sizes']) + + self._input_dropout_rate = component.master.hyperparams.dropout_rate + self._recurrent_dropout_rate = ( + component.master.hyperparams.recurrent_dropout_rate) + if self._recurrent_dropout_rate < 0.0: + self._recurrent_dropout_rate = component.master.hyperparams.dropout_rate + + super(LSTMNetwork, self).__init__(component) + layer_input_dim = self._concatenated_input_dim + + self._context_layers = [] + + # TODO(googleuser): should we choose a different initializer, + # e.g. truncated_normal_initializer? 
+ self._x2i = tf.get_variable( + 'x2i', [layer_input_dim, self._hidden_layer_sizes], + initializer=tf.random_normal_initializer(stddev=1e-4)) + self._h2i = tf.get_variable( + 'h2i', [self._hidden_layer_sizes, self._hidden_layer_sizes], + initializer=tf.random_normal_initializer(stddev=1e-4)) + self._c2i = tf.get_variable( + 'c2i', [self._hidden_layer_sizes, self._hidden_layer_sizes], + initializer=tf.random_normal_initializer(stddev=1e-4)) + self._bi = tf.get_variable( + 'bi', [self._hidden_layer_sizes], + initializer=tf.random_normal_initializer(stddev=1e-4)) + + self._x2o = tf.get_variable( + 'x2o', [layer_input_dim, self._hidden_layer_sizes], + initializer=tf.random_normal_initializer(stddev=1e-4)) + self._h2o = tf.get_variable( + 'h2o', [self._hidden_layer_sizes, self._hidden_layer_sizes], + initializer=tf.random_normal_initializer(stddev=1e-4)) + self._c2o = tf.get_variable( + 'c2o', [self._hidden_layer_sizes, self._hidden_layer_sizes], + initializer=tf.random_normal_initializer(stddev=1e-4)) + self._bo = tf.get_variable( + 'bo', [self._hidden_layer_sizes], + initializer=tf.random_normal_initializer(stddev=1e-4)) + + self._x2c = tf.get_variable( + 'x2c', [layer_input_dim, self._hidden_layer_sizes], + initializer=tf.random_normal_initializer(stddev=1e-4)) + self._h2c = tf.get_variable( + 'h2c', [self._hidden_layer_sizes, self._hidden_layer_sizes], + initializer=tf.random_normal_initializer(stddev=1e-4)) + self._bc = tf.get_variable( + 'bc', [self._hidden_layer_sizes], + initializer=tf.random_normal_initializer(stddev=1e-4)) + + self._params.extend([ + self._x2i, self._h2i, self._c2i, self._bi, self._x2o, self._h2o, + self._c2o, self._bo, self._x2c, self._h2c, self._bc]) + + lstm_h_layer = Layer(component, name='lstm_h', dim=self._hidden_layer_sizes) + lstm_c_layer = Layer(component, name='lstm_c', dim=self._hidden_layer_sizes) + + self._context_layers.append(lstm_h_layer) + self._context_layers.append(lstm_c_layer) + + self._layers.extend(self._context_layers) 
+ + self._layers.append( + Layer( + component, name='layer_0', dim=self._hidden_layer_sizes)) + + self.params.append(tf.get_variable( + 'weights_softmax', [self._hidden_layer_sizes, component.num_actions], + initializer=tf.random_normal_initializer(stddev=1e-4))) + self.params.append( + tf.get_variable( + 'bias_softmax', [component.num_actions], + initializer=tf.zeros_initializer())) + + self._layers.append( + Layer( + component, name='logits', dim=component.num_actions)) + + def create(self, + fixed_embeddings, + linked_embeddings, + context_tensor_arrays, + attention_tensor, + during_training, + stride=None): + """See base class.""" + input_tensor = get_input_tensor(fixed_embeddings, linked_embeddings) + + # context_tensor_arrays[0] is lstm_h + # context_tensor_arrays[1] is lstm_c + assert len(context_tensor_arrays) == 2 + length = context_tensor_arrays[0].size() + + # Get the (possibly averaged) parameters to execute the network. + x2i = self._component.get_variable('x2i') + h2i = self._component.get_variable('h2i') + c2i = self._component.get_variable('c2i') + bi = self._component.get_variable('bi') + x2o = self._component.get_variable('x2o') + h2o = self._component.get_variable('h2o') + c2o = self._component.get_variable('c2o') + bo = self._component.get_variable('bo') + x2c = self._component.get_variable('x2c') + h2c = self._component.get_variable('h2c') + bc = self._component.get_variable('bc') + + # i_h_tm1, i_c_tm1 = h_{t-1}, c_{t-1} + i_h_tm1 = context_tensor_arrays[0].read(length - 1) + i_c_tm1 = context_tensor_arrays[1].read(length - 1) + + # apply dropout according to http://arxiv.org/pdf/1409.2329v5.pdf + if during_training and self._input_dropout_rate < 1: + input_tensor = tf.nn.dropout(input_tensor, self._input_dropout_rate) + + # input -- i_t = sigmoid(affine(x_t, h_{t-1}, c_{t-1})) + i_ait = tf.matmul(input_tensor, x2i) + tf.matmul(i_h_tm1, h2i) + tf.matmul( + i_c_tm1, c2i) + bi + i_it = tf.sigmoid(i_ait) + + # forget -- f_t = 1 - i_t + i_ft = 
tf.ones([1, 1]) - i_it + + # write memory cell -- tanh(affine(x_t, h_{t-1})) + i_awt = tf.matmul(input_tensor, x2c) + tf.matmul(i_h_tm1, h2c) + bc + i_wt = tf.tanh(i_awt) + + # c_t = f_t \odot c_{t-1} + i_t \odot tanh(affine(x_t, h_{t-1})) + ct = tf.add( + tf.multiply(i_it, i_wt), tf.multiply(i_ft, i_c_tm1), name='lstm_c') + + # output -- o_t = sigmoid(affine(x_t, h_{t-1}, c_t)) + i_aot = tf.matmul(input_tensor, x2o) + tf.matmul(ct, c2o) + tf.matmul( + i_h_tm1, h2o) + bo + + i_ot = tf.sigmoid(i_aot) + + # ht = o_t \odot tanh(ct) + ph_t = tf.tanh(ct) + ht = tf.multiply(i_ot, ph_t, name='lstm_h') + + if during_training and self._recurrent_dropout_rate < 1: + ht = tf.nn.dropout( + ht, self._recurrent_dropout_rate, name='lstm_h_dropout') + + h = tf.identity(ht, name='layer_0') + + logits = tf.nn.xw_plus_b(ht, tf.get_variable('weights_softmax'), + tf.get_variable('bias_softmax')) + + if self._component.spec.attention_component: + logits += self.attention(ht, attention_tensor) + + logits = tf.identity(logits, name='logits') + # tensors will be consistent with the layers: + # [lstm_h, lstm_c, layer_0, logits] + tensors = [ht, ct, h, logits] + return tensors + + def get_layer_size(self, layer_name): + assert layer_name == 'layer_0', 'Can only retrieve from first hidden layer.' + return self._hidden_layer_sizes + + def get_logits(self, network_tensors): + return network_tensors[self.get_layer_index('logits')] + + +class ConvNetwork(NetworkUnitInterface): + """Implementation of a convolutional feed forward network.""" + + def __init__(self, component): + """Initializes kernels and biases for this convolutional net. + + Args: + component: parent ComponentBuilderBase object. + + Parameters used to construct the network: + widths: comma separated list of ints, number of steps input to the + convolutional kernel at every layer. + depths: comma separated list of ints, number of channels input to the + convolutional kernel at every layer. 
+ output_embedding_dim: int, number of output channels for the convolutional + kernel of the last layer, which receives no ReLU activation and + therefore can be used in a softmax output. If zero, this final + layer is disabled entirely. + nonlinearity ('relu'): Name of function from module "tf.nn" to apply to + each hidden layer; e.g., "relu" or "elu". + dropout_keep_prob (-1.0): The probability that an input is not dropped. + If >= 1.0, disables dropout. If < 0.0, uses the global |dropout_rate| + hyperparameter. + dropout_per_sequence (False): If true, sample the dropout mask once per + sequence, instead of once per step. See Gal and Ghahramani + (https://arxiv.org/abs/1512.05287). + + Hyperparameters used: + dropout_rate: The probability that an input is not dropped. Only used + when the |dropout_keep_prob| parameter is negative. + """ + + super(ConvNetwork, self).__init__(component) + self._attrs = get_attrs_with_defaults( + component.spec.network_unit.parameters, defaults={ + 'widths': '', + 'depths': '', + 'output_embedding_dim': 0, + 'nonlinearity': 'relu', + 'dropout_keep_prob': -1.0, + 'dropout_per_sequence': False}) + + self._weights = [] + self._biases = [] + self._widths = map(int, self._attrs['widths'].split(',')) + self._depths = map(int, self._attrs['depths'].split(',')) + self._output_dim = self._attrs['output_embedding_dim'] + if self._output_dim: + self._depths.append(self._output_dim) + self.kernel_shapes = [] + for i in range(len(self._depths) - 1): + self.kernel_shapes.append( + [1, self._widths[i], self._depths[i], self._depths[i + 1]]) + for i in range(len(self._depths) - 1): + with tf.variable_scope('conv%d' % i): + self._weights.append( + tf.get_variable( + 'weights', + self.kernel_shapes[i], + initializer=tf.random_normal_initializer(stddev=1e-4), + dtype=tf.float32)) + bias_init = 0.0 if (i == len(self._widths) - 1) else 0.2 + self._biases.append( + tf.get_variable( + 'biases', + self.kernel_shapes[i][-1], + 
initializer=tf.constant_initializer(bias_init), + dtype=tf.float32)) + + # Extract nonlinearity from |tf.nn|. + self._nonlinearity = getattr(tf.nn, self._attrs['nonlinearity']) + + # Infer dropout rate from network parameters and grid hyperparameters. + self._dropout_rate = self._attrs['dropout_keep_prob'] + if self._dropout_rate < 0.0: + self._dropout_rate = component.master.hyperparams.dropout_rate + + self._params.extend(self._weights + self._biases) + self._layers.append( + Layer( + component, name='conv_output', dim=self._depths[-1])) + self._regularized_weights.extend(self._weights[:-1] if self._output_dim else + self._weights) + + def create(self, + fixed_embeddings, + linked_embeddings, + context_tensor_arrays, + attention_tensor, + during_training, + stride=None): + """Requires |stride|; otherwise see base class.""" + if stride is None: + raise RuntimeError("ConvNetwork needs 'stride' and must be called in the " + "bulk feature extractor component.") + input_tensor = get_input_tensor_with_stride(fixed_embeddings, + linked_embeddings, stride) + + # TODO(googleuser): Add context and attention. + del context_tensor_arrays, attention_tensor + + # On CPU, add a dimension so that the 'image' has shape + # [stride, 1, num_steps, D]. 
+ conv = tf.expand_dims(input_tensor, 1) + for i in range(len(self._depths) - 1): + with tf.variable_scope('conv%d' % i, reuse=True) as scope: + if during_training: + conv.set_shape([None, 1, None, self._depths[i]]) + conv = self._maybe_apply_dropout(conv, stride) + conv = tf.nn.conv2d( + conv, + self._component.get_variable('weights'), [1, 1, 1, 1], + padding='SAME') + conv = tf.nn.bias_add(conv, self._component.get_variable('biases')) + if i < (len(self._weights) - 1) or not self._output_dim: + conv = self._nonlinearity(conv, name=scope.name) + return [ + tf.reshape( + conv, [-1, self._depths[-1]], name='reshape_activations') + ] + + def _maybe_apply_dropout(self, inputs, stride): + # The |inputs| are rank 4 (one 1xN "image" per sequence). Squeeze out and + # restore the singleton image height, so dropout is applied to the normal + # rank 3 batched input tensor. + inputs = tf.squeeze(inputs, [1]) + inputs = maybe_apply_dropout(inputs, self._dropout_rate, + self._attrs['dropout_per_sequence'], stride) + inputs = tf.expand_dims(inputs, 1) + return inputs + + +class PairwiseConvNetwork(NetworkUnitInterface): + """Implementation of a pairwise 2D convolutional feed forward network. + + For a sequence of N tokens, all N^2 pairs of concatenated input features are + constructed. If each input vector is of length D, then the sequence is + represented by an image of dimensions [N, N] with 2*D channels per pixel. + I.e. pixel [i, j] has a representation that is the concatenation of the + representations of the tokens at i and at j. + + To use this network for graph edge scoring, for instance by using the "heads" + transition system, the output layer needs to have dimensions [N, N] and only + a single channel. The network takes care of outputting an [N, N] sized layer, + but the user needs to ensure that the output depth equals 1. 
+ + TODO(googleuser): Like Dozat and Manning, we will need an + additional network to label the edges, and the ability to read head + and modifier representations from different inputs. + """ + + def __init__(self, component): + """Initializes kernels and biases for this convolutional net. + + Parameters used to construct the network: + depths: comma separated list of ints, number of channels input to the + convolutional kernel at every layer. + widths: comma separated list of ints, number of steps input to the + convolutional kernel at every layer. + relu_layers: comma separated list of ints, the id of layers after which + to apply a relu activation. *By default, all but the final layer will + have a relu activation applied.* + + To generate a network with M layers, both 'depths' and 'widths' must be of + length M. The input depth of the first layer is inferred from the total + concatenated size of the input features. + + Args: + component: parent ComponentBuilderBase object. + + Raises: + RuntimeError: if the numbers of depths and widths are not equal. + ValueError: if the final depth is not equal to 1. + """ + parameters = component.spec.network_unit.parameters + super(PairwiseConvNetwork, self).__init__(component) + + # Each input pixel will comprise the concatenation of two tokens, so the + # input depth is double that for a single token. 
+ self._depths = [self._concatenated_input_dim * 2] + self._depths.extend(map(int, parameters['depths'].split(','))) + self._widths = map(int, parameters['widths'].split(',')) + self._num_layers = len(self._widths) + if len(self._depths) != self._num_layers + 1: + raise RuntimeError('Unmatched depths/widths %s/%s' % + (parameters['depths'], parameters['widths'])) + if self._depths[-1] != 1: + raise ValueError('Final depth is not equal to 1 in %s' % + parameters['depths']) + + self._kernel_shapes = [] + for i, width in enumerate(self._widths): + self._kernel_shapes.append( + [width, width, self._depths[i], self._depths[i + 1]]) + if parameters['relu_layers']: + self._relu_layers = set(map(int, parameters['relu_layers'].split(','))) + else: + self._relu_layers = set(range(self._num_layers - 1)) + + self._weights = [] + self._biases = [] + for i, kernel_shape in enumerate(self._kernel_shapes): + with tf.variable_scope('conv%d' % i): + self._weights.append( + tf.get_variable( + 'weights', + kernel_shape, + initializer=tf.random_normal_initializer(stddev=1e-4), + dtype=tf.float32)) + bias_init = 0.0 if i in self._relu_layers else 0.2 + self._biases.append( + tf.get_variable( + 'biases', + kernel_shape[-1], + initializer=tf.constant_initializer(bias_init), + dtype=tf.float32)) + + self._params.extend(self._weights + self._biases) + self._layers.append(Layer(component, name='conv_output', dim=-1)) + self._regularized_weights.extend(self._weights[:-1]) + + def create(self, + fixed_embeddings, + linked_embeddings, + context_tensor_arrays, + attention_tensor, + during_training, + stride=None): + """Requires |stride|; otherwise see base class.""" + # TODO(googleuser): Normalize the arguments to create(). 'stride' + # is unused by the recurrent network units, while 'context_tensor_arrays' + # and 'attention_tensor' are unused by bulk network units. 
b/33587044 + if stride is None: + raise ValueError("PairwiseConvNetwork needs 'stride'") + + input_tensor = get_input_tensor_with_stride(fixed_embeddings, + linked_embeddings, stride) + + # TODO(googleuser): Add dropout. + del context_tensor_arrays, attention_tensor, during_training # Unused. + + num_steps = tf.shape(input_tensor)[1] + arg1 = tf.expand_dims(input_tensor, 1) + arg1 = tf.tile(arg1, tf.stack([1, num_steps, 1, 1])) + arg2 = tf.expand_dims(input_tensor, 2) + arg2 = tf.tile(arg2, tf.stack([1, 1, num_steps, 1])) + conv = tf.concat([arg1, arg2], 3) + for i in xrange(self._num_layers): + with tf.variable_scope('conv%d' % i, reuse=True) as scope: + conv = tf.nn.conv2d( + conv, + self._component.get_variable('weights'), [1, 1, 1, 1], + padding='SAME') + conv = tf.nn.bias_add(conv, self._component.get_variable('biases')) + if i in self._relu_layers: + conv = tf.nn.relu(conv, name=scope.name) + return [tf.reshape(conv, [-1, num_steps], name='reshape_activations')] + + +class ExportFixedFeaturesNetwork(NetworkUnitInterface): + """A network that exports fixed features as layers. + + Each fixed feature embedding is output as a layer whose name and dimension are + set to the name and dimension of the corresponding fixed feature. 
+ """ + + def __init__(self, component): + """Initializes exported layers.""" + super(ExportFixedFeaturesNetwork, self).__init__(component) + for feature_spec in component.spec.fixed_feature: + name = feature_spec.name + dim = self._fixed_feature_dims[name] + self._layers.append(Layer(component, name, dim)) + + def create(self, + fixed_embeddings, + linked_embeddings, + context_tensor_arrays, + attention_tensor, + during_training, + stride=None): + """See base class.""" + check.Eq(len(self.layers), len(fixed_embeddings)) + for index in range(len(fixed_embeddings)): + check.Eq(self.layers[index].name, fixed_embeddings[index].name) + return [fixed_embedding.tensor for fixed_embedding in fixed_embeddings] + + +class SplitNetwork(NetworkUnitInterface): + """Network unit that splits its input into slices of equal dimension. + + Parameters: + num_slices: The number of slices to split the input into, S. The input must + have static dimension D, where D % S == 0. + + Features: + All inputs are concatenated before being split. + + Layers: + slice_0: [B * N, D / S] The first slice of the input. + slice_1: [B * N, D / S] The second slice of the input. + ... + """ + + def __init__(self, component): + """Initializes weights and layers. + + Args: + component: Parent ComponentBuilderBase object. 
+ """ + super(SplitNetwork, self).__init__(component) + + parameters = component.spec.network_unit.parameters + self._num_slices = int(parameters['num_slices']) + check.Gt(self._num_slices, 0, 'Invalid number of slices.') + check.Eq(self._concatenated_input_dim % self._num_slices, 0, + 'Input dimension %s does not evenly divide into %s slices' % + (self._concatenated_input_dim, self._num_slices)) + self._slice_dim = int(self._concatenated_input_dim / self._num_slices) + + for slice_index in xrange(self._num_slices): + self._layers.append( + Layer(self, 'slice_%s' % slice_index, self._slice_dim)) + + def create(self, + fixed_embeddings, + linked_embeddings, + context_tensor_arrays, + attention_tensor, + during_training, + stride=None): + input_bnxd = get_input_tensor(fixed_embeddings, linked_embeddings) + return tf.split(input_bnxd, self._num_slices, axis=1) diff --git a/syntaxnet/dragnn/python/network_units_test.py b/syntaxnet/dragnn/python/network_units_test.py new file mode 100644 index 0000000000000000000000000000000000000000..d913c5263ca332796d48ea708f7aa0ffedbebafa --- /dev/null +++ b/syntaxnet/dragnn/python/network_units_test.py @@ -0,0 +1,159 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + +"""Tests for network_units.""" + + +import tensorflow as tf +from tensorflow.python.framework import test_util +from tensorflow.python.platform import googletest + +from dragnn.protos import spec_pb2 +from dragnn.python import network_units + +import dragnn.python.load_dragnn_cc_impl +import syntaxnet.load_parser_ops + +FLAGS = tf.app.flags.FLAGS + + +class NetworkUnitsConverterTest(test_util.TensorFlowTestCase): + + def testConvertNetworkStateTensorarray(self): + with self.test_session() as session: + ta = tf.TensorArray( + dtype=tf.float32, + size=0, + dynamic_size=True, + clear_after_read=False, + infer_shape=False) + # Create a 3-step x 2-stride x 2-feature-dim source array. + ta = ta.write(0, [[0., 0.]] * 2) # The zeroth step will be removed. + ta = ta.write(1, [[1., 10.]] * 2) + ta = ta.write(2, [[2., 20.]] * 2) + ta = ta.write(3, [[3., 30.]] * 2) + tensor = network_units.convert_network_state_tensorarray(ta) + actual = session.run(tensor) + self.assertEqual(actual.shape, (6, 2)) + + # The arrangement of the values is expected to be stride * steps. + expected = [[1., 10.], [2., 20.], [3., 30.], [1., 10.], [2., 20.], + [3., 30.]] + self.assertAllEqual(actual, expected) + + +class MockComponent(object): + + def __init__(self, master, component_spec): + self.master = master + self.spec = component_spec + self.name = component_spec.name + self.beam_size = 1 + self._attrs = {} + + def attr(self, name): + return self._attrs[name] + + +class MockMaster(object): + + def __init__(self): + self.spec = spec_pb2.MasterSpec() + self.hyperparams = spec_pb2.GridPoint() + self.lookup_component = { + 'previous': MockComponent(self, spec_pb2.ComponentSpec()) + } + + +class NetworkUnitsLookupTest(test_util.TensorFlowTestCase): + + def setUp(self): + # Clear the graph and all existing variables. Otherwise, variables created + # in different tests may collide with each other. 
+ tf.reset_default_graph() + + self._master = MockMaster() + self._master.spec = spec_pb2.MasterSpec() + + # Add a component with a linked feature. + component_spec = self._master.spec.component.add() + component_spec.name = 'fake_linked' + component_spec.backend.registered_name = 'FakeComponent' + linked_feature = component_spec.linked_feature.add() + linked_feature.source_component = 'fake_linked' + linked_feature.source_translator = 'identity' + linked_feature.embedding_dim = -1 + linked_feature.size = 2 + self._linked_component = MockComponent(self._master, component_spec) + + # Add a component with a fixed feature. + component_spec = self._master.spec.component.add() + component_spec.name = 'fake_fixed' + component_spec.backend.registered_name = 'FakeComponent' + fixed_feature = component_spec.fixed_feature.add() + fixed_feature.fml = 'input.word' + fixed_feature.embedding_dim = 1 + fixed_feature.size = 1 + self._fixed_component = MockComponent(self._master, component_spec) + + def testExportFixedFeaturesNetworkWithEnabledEmbeddingMatrix(self): + network = network_units.ExportFixedFeaturesNetwork(self._fixed_component) + self.assertEqual(1, len(network.params)) + + def testExportFixedFeaturesNetworkWithDisabledEmbeddingMatrix(self): + self._fixed_component.spec.fixed_feature[0].embedding_dim = -1 + network = network_units.ExportFixedFeaturesNetwork(self._fixed_component) + self.assertEqual(0, len(network.params)) + + +class GetAttrsWithDefaultsTest(test_util.TensorFlowTestCase): + + def MakeAttrs(self, defaults, key=None, value=None): + """Returns attrs based on the |defaults| and one |key|,|value| override.""" + spec = spec_pb2.RegisteredModuleSpec() + if key and value: + spec.parameters[key] = value + return network_units.get_attrs_with_defaults(spec.parameters, defaults) + + def testFalseValues(self): + + def _assert_attr_is_false(value=None): + key = 'foo' + attrs = self.MakeAttrs({key: False}, key, value) + self.assertFalse(attrs[key]) + + 
_assert_attr_is_false() + _assert_attr_is_false('false') + _assert_attr_is_false('False') + _assert_attr_is_false('FALSE') + _assert_attr_is_false('no') + _assert_attr_is_false('whatever') + _assert_attr_is_false(' ') + _assert_attr_is_false('') + + def testTrueValues(self): + + def _assert_attr_is_true(value=None): + key = 'foo' + attrs = self.MakeAttrs({key: False}, key, value) + self.assertTrue(attrs[key]) + + _assert_attr_is_true('true') + _assert_attr_is_true('True') + _assert_attr_is_true('TRUE') + + +if __name__ == '__main__': + googletest.main() diff --git a/syntaxnet/dragnn/python/render_parse_tree_graphviz.py b/syntaxnet/dragnn/python/render_parse_tree_graphviz.py new file mode 100644 index 0000000000000000000000000000000000000000..528a9f1527618cef158c67864490091d9e7e6047 --- /dev/null +++ b/syntaxnet/dragnn/python/render_parse_tree_graphviz.py @@ -0,0 +1,71 @@ +# -*- coding: utf-8 -*- +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Renders parse trees with Graphviz.""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import base64 +import warnings + +import pygraphviz + + +def parse_tree_graph(sentence): + """Constructs a parse tree graph. + + Args: + sentence: syntaxnet.Sentence instance. + + Returns: + HTML graph contents, as a string. 
+ """ + graph = pygraphviz.AGraph(directed=True, strict=False, rankdir="TB") + + for i, token in enumerate(sentence.token): + node_id = "tok_{}".format(i) + graph.add_node(node_id, label=token.word) + if token.head >= 0: + src_id = "tok_{}".format(token.head) + graph.add_edge( + src_id, + node_id, + label=token.label, + key="parse_{}_{}".format(node_id, src_id)) + + with warnings.catch_warnings(): + # Fontconfig spews some warnings, suppress them for now. (Especially because + # they can clutter IPython notebooks). + warnings.simplefilter("ignore") + svg = graph.draw(format="svg", prog="dot") + + svg = unicode(svg, "utf-8") + + # For both inline and "new window" displays, we show the tokens with the + # graph. (The sentence order of nodes is sometimes difficult to read.) + image_and_text = u"

Text: {}

{}".format(" ".join( + token.word for token in sentence.token), svg) + + # We generate a base64 URI. This is not too big, but older browsers may not + # handle it well. + new_window_html = (u"" + + image_and_text).encode("utf-8") + as_uri = "data:text/html;charset=utf-8;base64,{}".format( + base64.b64encode(new_window_html)) + + return u"{}

Open in new window

".format( + image_and_text, as_uri) diff --git a/syntaxnet/dragnn/python/render_parse_tree_graphviz_test.py b/syntaxnet/dragnn/python/render_parse_tree_graphviz_test.py new file mode 100644 index 0000000000000000000000000000000000000000..fc7190c8fe2f0776e78e6b4b08c317348da78dfa --- /dev/null +++ b/syntaxnet/dragnn/python/render_parse_tree_graphviz_test.py @@ -0,0 +1,42 @@ +# Copyright 2017 Google Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for ....dragnn.python.render_parse_tree_graphviz.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from tensorflow.python.platform import googletest +from dragnn.python import render_parse_tree_graphviz +from syntaxnet import sentence_pb2 + + +class RenderParseTreeGraphvizTest(googletest.TestCase): + + def testGiveMeAName(self): + document = sentence_pb2.Sentence() + document.token.add(start=0, end=0, word='hi', head=1, label='something') + document.token.add(start=1, end=1, word='there') + contents = render_parse_tree_graphviz.parse_tree_graph(document) + self.assertIn('{name}
+ {transition_name}
+ {network_name}
+ {num_actions_str}
+ hidden: {num_hidden} + >""".format( + name=component.name, + transition_name=component.transition_system.registered_name, + network_name=component.network_unit.registered_name, + num_actions_str="{} action{}".format(component.num_actions, "s" if + component.num_actions != 1 else ""), + num_hidden=component.network_unit.parameters.get("hidden_layer_sizes", + "not specified")) + + +def _linked_feature_label(linked_feature): + """Generates the label on edges between components. + + Args: + linked_feature: spec_pb2.LinkedFeatureChannel proto + + Returns: + String label + """ + return """< + {name}
+ F={num_features} D={projected_dim}
+ {fml}
+ {source_translator}
+ {source_layer} + >""".format( + name=linked_feature.name, + num_features=linked_feature.size, + projected_dim=linked_feature.embedding_dim, + fml=linked_feature.fml, + source_translator=linked_feature.source_translator, + source_layer=linked_feature.source_layer) + + +def master_spec_graph(master_spec): + """Constructs a master spec graph. + + Args: + master_spec: MasterSpec proto. + + Raises: + TypeError, if master_spec is not the right type. N.B. that this may be + raised if you import proto classes in non-standard ways (e.g. dynamically). + + Returns: + SVG graph contents as a string. + """ + if not isinstance(master_spec, spec_pb2.MasterSpec): + raise TypeError("master_spec_graph() expects a MasterSpec input.") + + graph = pygraphviz.AGraph(directed=True) + + graph.node_attr.update( + shape="box", + style="filled", + fillcolor="white", + fontname="roboto, helvetica, arial", + fontsize=11) + graph.edge_attr.update(fontname="roboto, helvetica, arial", fontsize=11) + + for component in master_spec.component: + graph.add_node(component.name, label=_component_contents(component)) + + for component in master_spec.component: + for linked_feature in component.linked_feature: + graph.add_edge( + linked_feature.source_component, + component.name, + label=_linked_feature_label(linked_feature)) + + with warnings.catch_warnings(): + # Fontconfig spews some warnings, suppress them for now. (Especially because + # they can clutter IPython notebooks). + warnings.simplefilter("ignore") + return graph.draw(format="svg", prog="dot") diff --git a/syntaxnet/dragnn/python/render_spec_with_graphviz_test.py b/syntaxnet/dragnn/python/render_spec_with_graphviz_test.py new file mode 100644 index 0000000000000000000000000000000000000000..5dfb0013ba9d5efebe8a19d908dca277c62c5cb5 --- /dev/null +++ b/syntaxnet/dragnn/python/render_spec_with_graphviz_test.py @@ -0,0 +1,75 @@ +# Copyright 2017 Google Inc. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== + +"""Tests for render_spec_with_graphviz.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +from tensorflow.python.platform import googletest +from dragnn.protos import spec_pb2 +from dragnn.python import render_spec_with_graphviz +from dragnn.python import spec_builder + + +def _make_basic_master_spec(): + """Constructs a simple spec. + + Modified version of nlp/saft/opensource/dragnn/tools/parser_trainer.py + + Returns: + spec_pb2.MasterSpec instance. + """ + # Construct the "lookahead" ComponentSpec. This is a simple right-to-left RNN + # sequence model, which encodes the context to the right of each token. It has + # no loss except for the downstream components. + lookahead = spec_builder.ComponentSpecBuilder('lookahead') + lookahead.set_network_unit( + name='FeedForwardNetwork', hidden_layer_sizes='256') + lookahead.set_transition_system(name='shift-only', left_to_right='true') + lookahead.add_fixed_feature(name='words', fml='input.word', embedding_dim=64) + lookahead.add_rnn_link(embedding_dim=-1) + + # Construct the ComponentSpec for parsing. 
+ parser = spec_builder.ComponentSpecBuilder('parser') + parser.set_network_unit(name='FeedForwardNetwork', hidden_layer_sizes='256') + parser.set_transition_system(name='arc-standard') + parser.add_token_link(source=lookahead, fml='input.focus', embedding_dim=32) + + master_spec = spec_pb2.MasterSpec() + master_spec.component.extend([lookahead.spec, parser.spec]) + return master_spec + + +class RenderSpecWithGraphvizTest(googletest.TestCase): + + def test_constructs_simple_graph(self): + master_spec = _make_basic_master_spec() + contents = render_spec_with_graphviz.master_spec_graph(master_spec) + self.assertIn('lookahead', contents) + self.assertIn('