Commit 9beaea41 authored by Alexander Gorban

Merge remote-tracking branch 'upstream/master'

parents 6159b593 3a3c5b9d
# Learning to Protect Communications with Adversarial Neural Cryptography
This is a slightly-updated model used for the paper
["Learning to Protect Communications with Adversarial Neural
Cryptography"](https://arxiv.org/abs/1610.06918).
> We ask whether neural networks can learn to use secret keys to protect
> information from other neural networks. Specifically, we focus on ensuring
> confidentiality properties in a multiagent system, and we specify those
> properties in terms of an adversary. Thus, a system may consist of neural
> networks named Alice and Bob, and we aim to limit what a third neural
> network named Eve learns from eavesdropping on the communication between
> Alice and Bob. We do not prescribe specific cryptographic algorithms to
> these neural networks; instead, we train end-to-end, adversarially.
> We demonstrate that the neural networks can learn how to perform forms of
> encryption and decryption, and also how to apply these operations
> selectively in order to meet confidentiality goals.
This code allows you to train an encoder/decoder/adversary triplet
and evaluate their effectiveness on randomly generated input and key
pairs.
## Prerequisites
The only software requirement for running the encoder and decoder is
TensorFlow; r0.12 or later is required.
## Training and evaluating
After installing TensorFlow and ensuring that your paths are configured
appropriately:
python train_eval.py
This will begin training a fresh model. If and when the model becomes
sufficiently well-trained, it will reset the Eve model multiple times
and retrain it from scratch, outputting the accuracy thus obtained
in each run.
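The learning rate and batch size are exposed as command-line flags in
`train_eval.py`, so the defaults can be overridden when launching a run.
As an illustrative sketch (the values shown are simply the defaults):

```shell
python train_eval.py --learning_rate=0.0008 --batch_size=4096
```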
## Model differences from the paper
The model has been simplified slightly from the one described in
the paper: the convolutional layer width was reduced by a factor
of two. In the version in the paper, there was a nonlinear unit
after the fully-connected layer; that nonlinearity has been removed
here. These changes improve the robustness of training. The
initializer for the convolution layers has been switched to the
tf.contrib.layers default of xavier_initializer instead of
a simpler truncated_normal.
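To make the initializer difference concrete, here is a minimal sketch (not
part of the shipped model, which simply relies on the layer's default) of
how one would pass each initializer explicitly to
`tf.contrib.layers.convolution`; the `stddev` value is only illustrative:

```python
import tensorflow as tf

def conv_layer(inputs, use_xavier=True):
  """Sketch: the two initializer choices discussed above."""
  init = (tf.contrib.layers.xavier_initializer() if use_xavier
          else tf.truncated_normal_initializer(stddev=0.1))
  return tf.contrib.layers.convolution(
      inputs, 2, 2, 2, 'SAME',
      weights_initializer=init,
      activation_fn=tf.nn.sigmoid)
```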
## Contact information
This model repository is maintained by David G. Andersen
([dave-andersen](https://github.com/dave-andersen)).
# Copyright 2016 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Adversarial training to learn trivial encryption functions,
from the paper "Learning to Protect Communications with
Adversarial Neural Cryptography", Abadi & Andersen, 2016.
https://arxiv.org/abs/1610.06918
This program creates and trains three neural networks,
termed Alice, Bob, and Eve. Alice takes inputs
in_m (message), in_k (key) and outputs 'ciphertext'.
Bob takes inputs in_k, ciphertext and tries to reconstruct
the message.
Eve is an adversarial network that takes input ciphertext
and also tries to reconstruct the message.
The main function attempts to train these networks and then
evaluates them, all on random plaintext and key values.
"""
# TensorFlow Python 3 compatibility
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import signal
import sys
from six.moves import xrange # pylint: disable=redefined-builtin
import tensorflow as tf
flags = tf.app.flags
flags.DEFINE_float('learning_rate', 0.0008, 'Constant learning rate')
flags.DEFINE_integer('batch_size', 4096, 'Batch size')
FLAGS = flags.FLAGS
# Input and output configuration.
TEXT_SIZE = 16
KEY_SIZE = 16
# Training parameters.
ITERS_PER_ACTOR = 1
EVE_MULTIPLIER = 2 # Train Eve 2x for every step of Alice/Bob
# Train until either max loops or Alice/Bob "good enough":
MAX_TRAINING_LOOPS = 850000
BOB_LOSS_THRESH = 0.02 # Exit when Bob loss < 0.02 and Eve > 7.7 bits
EVE_LOSS_THRESH = 7.7
# Logging and evaluation.
PRINT_EVERY = 200 # In training, log every 200 steps.
EVE_EXTRA_ROUNDS = 2000 # At end, train eve a bit more.
RETRAIN_EVE_ITERS = 10000 # Retrain eve up to ITERS*LOOPS times.
RETRAIN_EVE_LOOPS = 25 # With an evaluation each loop
NUMBER_OF_EVE_RESETS = 5 # And do this up to 5 times with a fresh eve.
# Use EVAL_BATCHES samples each time we check accuracy.
EVAL_BATCHES = 1
def batch_of_random_bools(batch_size, n):
"""Return a batch of random "boolean" numbers.
Args:
batch_size: Batch size dimension of returned tensor.
n: number of entries per batch.
Returns:
A [batch_size, n] tensor of "boolean" numbers, where each number is
represented as -1 or 1.
"""
as_int = tf.random_uniform(
[batch_size, n], minval=0, maxval=2, dtype=tf.int32)
expanded_range = (as_int * 2) - 1
return tf.cast(expanded_range, tf.float32)
class AdversarialCrypto(object):
"""Primary model implementation class for Adversarial Neural Crypto.
This class contains the code for the model itself,
and when created, plumbs the pathways from Alice to Bob and
Eve, creates the optimizers and loss functions, etc.
Attributes:
eve_loss: Eve's loss function.
bob_loss: Bob's loss function. Different units from eve_loss.
eve_optimizer: A tf op that runs Eve's optimizer.
bob_optimizer: A tf op that runs Bob's optimizer.
bob_reconstruction_loss: Bob's message reconstruction loss,
which is comparable to eve_loss.
reset_eve_vars: Execute this op to completely reset Eve.
"""
def get_message_and_key(self):
"""Generate random pseudo-boolean key and message values."""
batch_size = tf.placeholder_with_default(FLAGS.batch_size, shape=[])
in_m = batch_of_random_bools(batch_size, TEXT_SIZE)
in_k = batch_of_random_bools(batch_size, KEY_SIZE)
return in_m, in_k
def model(self, collection, message, key=None):
"""The model for Alice, Bob, and Eve. If key=None, the first FC layer
takes only the Key as inputs. Otherwise, it uses both the key
and the message.
Args:
collection: The graph keys collection to add new vars to.
message: The input message to process.
key: The input key (if any) to use.
"""
if key is not None:
combined_message = tf.concat(1, [message, key])
else:
combined_message = message
# Ensure that all variables created are in the specified collection.
with tf.contrib.framework.arg_scope(
[tf.contrib.layers.fully_connected, tf.contrib.layers.convolution],
variables_collections=[collection]):
fc = tf.contrib.layers.fully_connected(
combined_message,
TEXT_SIZE + KEY_SIZE,
biases_initializer=tf.constant_initializer(0.0),
activation_fn=None)
# Perform a sequence of 1D convolutions (by expanding the message out to 2D
# and then squeezing it back down).
fc = tf.expand_dims(fc, 2)
# 2,1 -> 1,2
conv = tf.contrib.layers.convolution(
fc, 2, 2, 2, 'SAME', activation_fn=tf.nn.sigmoid)
# 1,2 -> 1, 2
conv = tf.contrib.layers.convolution(
conv, 2, 1, 1, 'SAME', activation_fn=tf.nn.sigmoid)
# 1,2 -> 1, 1
conv = tf.contrib.layers.convolution(
conv, 1, 1, 1, 'SAME', activation_fn=tf.nn.tanh)
conv = tf.squeeze(conv, 2)
return conv
def __init__(self):
in_m, in_k = self.get_message_and_key()
encrypted = self.model('alice', in_m, in_k)
decrypted = self.model('bob', encrypted, in_k)
eve_out = self.model('eve', encrypted, None)
self.reset_eve_vars = tf.group(
*[w.initializer for w in tf.get_collection('eve')])
optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)
# Eve's goal is to decrypt the entire message:
eve_bits_wrong = tf.reduce_sum(
tf.abs((eve_out + 1.0) / 2.0 - (in_m + 1.0) / 2.0), [1])
self.eve_loss = tf.reduce_sum(eve_bits_wrong)
self.eve_optimizer = optimizer.minimize(
self.eve_loss, var_list=tf.get_collection('eve'))
# Alice and Bob want to be accurate...
self.bob_bits_wrong = tf.reduce_sum(
tf.abs((decrypted + 1.0) / 2.0 - (in_m + 1.0) / 2.0), [1])
# ... and to not let Eve do better than guessing.
self.bob_reconstruction_loss = tf.reduce_sum(self.bob_bits_wrong)
bob_eve_error_deviation = tf.abs(float(TEXT_SIZE) / 2.0 - eve_bits_wrong)
# 7-9 bits wrong is OK too, so we squish the error function a bit.
# Without doing this, we often tend to hang out at 0.25 / 7.5 error,
# and it seems bad to have continued, high communication error.
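# Per example this is ((TEXT_SIZE/2 - eve_bits_wrong) / (TEXT_SIZE/2))**2,
# which is 0 when Eve is at chance (half the bits wrong) and 1 when she
# recovers all of the bits (or inverts all of them, which leaks as much).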
bob_eve_loss = tf.reduce_sum(
tf.square(bob_eve_error_deviation) / (TEXT_SIZE / 2)**2)
# Rescale the losses to [0, 1] per example and combine.
self.bob_loss = (self.bob_reconstruction_loss / TEXT_SIZE + bob_eve_loss)
self.bob_optimizer = optimizer.minimize(
self.bob_loss,
var_list=(tf.get_collection('alice') + tf.get_collection('bob')))
def doeval(s, ac, n, itercount):
"""Evaluate the current network on n batches of random examples.
Args:
s: The current TensorFlow session
ac: an instance of the AdversarialCrypto class
n: The number of iterations to run.
itercount: Iteration count label for logging.
Returns:
Bob's and Eve's loss, as the average number of bits wrong per message.
"""
bob_loss_accum = 0
eve_loss_accum = 0
for _ in xrange(n):
bl, el = s.run([ac.bob_reconstruction_loss, ac.eve_loss])
bob_loss_accum += bl
eve_loss_accum += el
bob_loss_percent = bob_loss_accum / (n * FLAGS.batch_size)
eve_loss_percent = eve_loss_accum / (n * FLAGS.batch_size)
print('%d %.2f %.2f' % (itercount, bob_loss_percent, eve_loss_percent))
sys.stdout.flush()
return bob_loss_percent, eve_loss_percent
def train_until_thresh(s, ac):
for j in xrange(MAX_TRAINING_LOOPS):
for _ in xrange(ITERS_PER_ACTOR):
s.run(ac.bob_optimizer)
for _ in xrange(ITERS_PER_ACTOR * EVE_MULTIPLIER):
s.run(ac.eve_optimizer)
if j % PRINT_EVERY == 0:
bob_avg_loss, eve_avg_loss = doeval(s, ac, EVAL_BATCHES, j)
if (bob_avg_loss < BOB_LOSS_THRESH and eve_avg_loss > EVE_LOSS_THRESH):
print('Target losses achieved.')
return True
return False
def train_and_evaluate():
"""Run the full training and evaluation loop."""
ac = AdversarialCrypto()
init = tf.global_variables_initializer()
with tf.Session() as s:
s.run(init)
print('# Batch size: ', FLAGS.batch_size)
print('# Iter Bob_Recon_Error Eve_Recon_Error')
if train_until_thresh(s, ac):
for _ in xrange(EVE_EXTRA_ROUNDS):
s.run(ac.eve_optimizer)
print('Loss after eve extra training:')
doeval(s, ac, EVAL_BATCHES * 2, 0)
for _ in xrange(NUMBER_OF_EVE_RESETS):
print('Resetting Eve')
s.run(ac.reset_eve_vars)
eve_counter = 0
for _ in xrange(RETRAIN_EVE_LOOPS):
for _ in xrange(RETRAIN_EVE_ITERS):
eve_counter += 1
s.run(ac.eve_optimizer)
doeval(s, ac, EVAL_BATCHES, eve_counter)
doeval(s, ac, EVAL_BATCHES, eve_counter)
def main(unused_argv):
# Exit more quietly with Ctrl-C.
signal.signal(signal.SIGINT, signal.SIG_DFL)
train_and_evaluate()
if __name__ == '__main__':
tf.app.run()
...@@ -8,14 +8,14 @@ Open Sourced By: Xin Pan (xpan@google.com, github: panyx0718)
<Introduction>
Machine learning techniques based on neural networks are achieving remarkable
results in a wide variety of domains. Often, the training of models requires
large, representative datasets, which may be crowdsourced and contain sensitive
information. The models should not expose private information in these datasets.
Addressing this goal, we develop new algorithmic techniques for learning and a
refined analysis of privacy costs within the framework of differential privacy.
Our implementation and experiments demonstrate that we can train deep neural
networks with non-convex objectives, under a modest privacy budget, and at a
manageable cost in software complexity, training efficiency, and model quality.
paper: https://arxiv.org/abs/1607.00133
...@@ -46,7 +46,7 @@ https://github.com/panyx0718/models/tree/master/slim
# Download the data to the data/ directory.
# List the codes.
$ ls -R differential_privacy/
differential_privacy/:
dp_sgd __init__.py privacy_accountant README.md
...@@ -72,16 +72,16 @@ differential_privacy/privacy_accountant/tf:
accountant.py accountant_test.py BUILD
# List the data.
$ ls -R data/
./data:
mnist_test.tfrecord mnist_train.tfrecord
# Build the codes.
$ bazel build -c opt differential_privacy/...
# Run the mnist differential privacy training codes.
$ bazel-bin/differential_privacy/dp_sgd/dp_mnist/dp_mnist \
--training_data_path=data/mnist_train.tfrecord \
--eval_data_path=data/mnist_test.tfrecord \
--save_path=/tmp/mnist_dir
...@@ -102,6 +102,6 @@ train_accuracy: 0.53
eval_accuracy: 0.53
...
$ ls /tmp/mnist_dir/
checkpoint ckpt ckpt.meta results-0.json
```
...@@ -367,6 +367,13 @@ I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:206] Initialize HostPo
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:202] Started server with target: grpc://localhost:2222
```
If you compiled TensorFlow (v1.1-rc3 or later) with VERBS support and you have
the required device and InfiniBand verbs software stack, you can specify
`--protocol='grpc+verbs'` to use Verbs RDMA for tensor passing between workers
and parameter servers. The `--protocol` flag must be added to all tasks (ps and
workers); the default protocol is TensorFlow's standard grpc.
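As a sketch, adding the flag to a distributed Inception worker invocation from
the instructions above might look like the following (host names and the other
flags are placeholders; pass the same `--protocol` value to the ps tasks too):

```shell
# Hypothetical worker task with RDMA tensor passing enabled.
$ bazel-bin/inception/imagenet_distributed_train \
    --job_name='worker' \
    --task_id=0 \
    --ps_hosts='ps0.example.com:2222' \
    --worker_hosts='worker0.example.com:2222,worker1.example.com:2222' \
    --protocol='grpc+verbs'
```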
[Congratulations!](https://www.youtube.com/watch?v=9bZkp7q19f0) You are now
training Inception in a distributed manner.
...@@ -749,7 +756,7 @@ batch-splitting the model across multiple GPUs.
permit training the model with higher learning rates.
* Often the GPU memory is a bottleneck that prevents employing larger batch
sizes. Employing more GPUs allows one to use larger batch sizes because
this model splits the batch across the GPUs.
**NOTE** If one wishes to train this model with *asynchronous* gradient updates,
...
...@@ -45,7 +45,8 @@ def main(unused_args):
{'ps': ps_hosts,
'worker': worker_hosts},
job_name=FLAGS.job_name,
task_index=FLAGS.task_id,
protocol=FLAGS.protocol)
if FLAGS.job_name == 'ps':
# `ps` jobs wait for incoming connections from the workers.
...
...@@ -42,6 +42,9 @@ tf.app.flags.DEFINE_string('worker_hosts', '',
"""Comma-separated list of hostname:port for the """
"""worker jobs. e.g. """
"""'machine1:2222,machine2:1111,machine2:2222'""")
tf.app.flags.DEFINE_string('protocol', 'grpc',
"""Communication protocol to use in distributed """
"""execution (default grpc) """)
tf.app.flags.DEFINE_string('train_dir', '/tmp/imagenet_train',
"""Directory where to write event logs """
...
...@@ -73,7 +73,7 @@ LSTM-8192-2048 (50\% Dropout) | 32.2 | 3.3
<b>How To Run</b>
Prerequisites:
* Install TensorFlow.
* Install Bazel.
...@@ -97,7 +97,7 @@ Pre-requesite:
[link](http://download.tensorflow.org/models/LM_LSTM_CNN/vocab-2016-09-10.txt)
* test dataset: link
[link](http://download.tensorflow.org/models/LM_LSTM_CNN/test/news.en.heldout-00000-of-00050)
* It is recommended to run on a modern desktop instead of a laptop.
```shell
# 1. Clone the code to your workspace.
...@@ -105,7 +105,7 @@ Pre-requesite:
# 3. Create an empty WORKSPACE file in your workspace.
# 4. Create an empty output directory in your workspace.
# Example directory structure below:
$ ls -R
.:
data lm_1b output WORKSPACE
...@@ -121,13 +121,13 @@ BUILD data_utils.py lm_1b_eval.py README.md
./output:
# Build the codes.
$ bazel build -c opt lm_1b/...
# Run sample mode:
$ bazel-bin/lm_1b/lm_1b_eval --mode sample \
--prefix "I love that I" \
--pbtxt data/graph-2016-09-10.pbtxt \
--vocab_file data/vocab-2016-09-10.txt \
--ckpt 'data/ckpt-*'
...(omitted some TensorFlow output)
I love
I love that
...@@ -138,11 +138,11 @@ I love that I find that amazing
...(omitted)
# Run eval mode:
$ bazel-bin/lm_1b/lm_1b_eval --mode eval \
--pbtxt data/graph-2016-09-10.pbtxt \
--vocab_file data/vocab-2016-09-10.txt \
--input_data data/news.en.heldout-00000-of-00050 \
--ckpt 'data/ckpt-*'
...(omitted some TensorFlow output)
Loaded step 14108582.
# perplexity is high initially because words without context are harder to
...@@ -166,28 +166,28 @@ Eval Step: 4531, Average Perplexity: 29.285674.
...(omitted. At convergence, it should be around 30.)
# Run dump_emb mode:
$ bazel-bin/lm_1b/lm_1b_eval --mode dump_emb \
--pbtxt data/graph-2016-09-10.pbtxt \
--vocab_file data/vocab-2016-09-10.txt \
--ckpt 'data/ckpt-*' \
--save_dir output
...(omitted some TensorFlow output)
Finished softmax weights
Finished word embedding 0/793471
Finished word embedding 1/793471
Finished word embedding 2/793471
...(omitted)
$ ls output/
embeddings_softmax.npy ...
# Run dump_lstm_emb mode:
$ bazel-bin/lm_1b/lm_1b_eval --mode dump_lstm_emb \
--pbtxt data/graph-2016-09-10.pbtxt \
--vocab_file data/vocab-2016-09-10.txt \
--ckpt 'data/ckpt-*' \
--sentence "I love who I am ." \
--save_dir output
$ ls output/
lstm_emb_step_0.npy lstm_emb_step_2.npy lstm_emb_step_4.npy
lstm_emb_step_6.npy lstm_emb_step_1.npy lstm_emb_step_3.npy
lstm_emb_step_5.npy
...
...@@ -34,7 +34,7 @@ to tf.SequenceExample.
<b>How to run:</b>
```shell
$ ls -R
.:
data next_frame_prediction WORKSPACE
...@@ -52,18 +52,18 @@ cross_conv2.png cross_conv3.png cross_conv.png
# Build everything.
$ bazel build -c opt next_frame_prediction/...
# The following example runs the generated 2d objects.
# For Sprites dataset, image_size should be 60, norm_scale should be 255.0.
# Batch size is normally 16~64, depending on your memory size.
#
# Run training.
$ bazel-bin/next_frame_prediction/cross_conv/train \
--batch_size=1 \
--data_filepattern=data/tfrecords \
--image_size=64 \
--log_root=/tmp/predict
step: 1, loss: 24.428671
step: 2, loss: 19.211605
...@@ -75,11 +75,11 @@ step: 7, loss: 1.747665
step: 8, loss: 1.572436
step: 9, loss: 1.586816
step: 10, loss: 1.434191
#
# Run eval.
$ bazel-bin/next_frame_prediction/cross_conv/eval \
--batch_size=1 \
--data_filepattern=data/tfrecords_test \
--image_size=64 \
--log_root=/tmp/predict
```
...@@ -23,7 +23,7 @@ https://arxiv.org/pdf/1605.07146v1.pdf
<b>Settings:</b>
* Random split 50k training set into 45k/5k train/eval split.
* Pad to 36x36 and random crop. Horizontal flip. Per-image whitening.
* Momentum optimizer 0.9.
* Learning rate schedule: 0.1 (40k), 0.01 (60k), 0.001 (>60k).
* L2 weight decay: 0.002.
...@@ -65,40 +65,40 @@ curl -o cifar-100-binary.tar.gz https://www.cs.toronto.edu/~kriz/cifar-100-binar
<b>How to run:</b>
```shell
# cd to the models repository and run with bash. Expected command output shown.
# The directory should contain an empty WORKSPACE file, the resnet code, and the cifar10 dataset.
# Note: The user can split 5k from train set for eval set.
$ ls -R
.:
cifar10 resnet WORKSPACE
./cifar10:
data_batch_1.bin data_batch_2.bin data_batch_3.bin data_batch_4.bin
data_batch_5.bin test_batch.bin
./resnet:
BUILD cifar_input.py g3doc README.md resnet_main.py resnet_model.py
# Build everything for GPU.
$ bazel build -c opt --config=cuda resnet/...
# Train the model.
$ bazel-bin/resnet/resnet_main --train_data_path=cifar10/data_batch* \
--log_root=/tmp/resnet_model \
--train_dir=/tmp/resnet_model/train \
--dataset='cifar10' \
--num_gpus=1
# While the model is training, you can also check on its progress using tensorboard:
$ tensorboard --logdir=/tmp/resnet_model
# Evaluate the model.
# Avoid running on the same GPU as the training job at the same time,
# otherwise, you might run out of memory.
$ bazel-bin/resnet/resnet_main --eval_data_path=cifar10/test_batch.bin \
--log_root=/tmp/resnet_model \
--eval_dir=/tmp/resnet_model/test \
--mode=eval \
--dataset='cifar10' \
--num_gpus=0
```
...@@ -85,7 +85,7 @@ class ResNet(object):
# comparably good performance.
# https://arxiv.org/pdf/1605.07146v1.pdf
# filters = [16, 160, 320, 640]
# Update hps.num_residual_units to 4
with tf.variable_scope('unit_1_0'):
x = res_func(x, filters[0], filters[1], self._stride_arr(strides[0]),
...
...@@ -178,12 +178,12 @@ image classification dataset.
In the table below, we list each model, the corresponding
TensorFlow model file, the link to the model checkpoint, and the top 1 and top 5
accuracy (on the imagenet test set).
Note that the VGG and ResNet V1 parameters have been converted from their original
caffe formats
([here](https://github.com/BVLC/caffe/wiki/Model-Zoo#models-used-by-the-vgg-team-in-ilsvrc-2014)
and
[here](https://github.com/KaimingHe/deep-residual-networks)),
whereas the Inception and ResNet V2 parameters have been trained internally at
Google. Also be aware that these accuracies were computed by evaluating using a
single image crop. Some academic papers report higher accuracy by using multiple
crops at multiple scales.
...@@ -195,12 +195,19 @@ Model | TF-Slim File | Checkpoint | Top-1 Accuracy| Top-5 Accuracy |
[Inception V3](http://arxiv.org/abs/1512.00567)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_v3.py)|[inception_v3_2016_08_28.tar.gz](http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz)|78.0|93.9|
[Inception V4](http://arxiv.org/abs/1602.07261)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_v4.py)|[inception_v4_2016_09_09.tar.gz](http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz)|80.2|95.2|
[Inception-ResNet-v2](http://arxiv.org/abs/1602.07261)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_resnet_v2.py)|[inception_resnet_v2.tar.gz](http://download.tensorflow.org/models/inception_resnet_v2_2016_08_30.tar.gz)|80.4|95.3|
[ResNet V1 50](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_50.tar.gz](http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz)|75.2|92.2|
[ResNet V1 101](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_101.tar.gz](http://download.tensorflow.org/models/resnet_v1_101_2016_08_28.tar.gz)|76.4|92.9|
[ResNet V1 152](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_152.tar.gz](http://download.tensorflow.org/models/resnet_v1_152_2016_08_28.tar.gz)|76.8|93.2|
[ResNet V2 50](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_50.tar.gz](http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz)|75.6|92.8|
[ResNet V2 101](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_101.tar.gz](http://download.tensorflow.org/models/resnet_v2_101_2017_04_14.tar.gz)|77.0|93.7|
[ResNet V2 152](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_152.tar.gz](http://download.tensorflow.org/models/resnet_v2_152_2017_04_14.tar.gz)|77.8|94.1|
[VGG 16](http://arxiv.org/abs/1409.1556.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/vgg.py)|[vgg_16.tar.gz](http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz)|71.5|89.8|
[VGG 19](http://arxiv.org/abs/1409.1556.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/vgg.py)|[vgg_19.tar.gz](http://download.tensorflow.org/models/vgg_19_2016_08_28.tar.gz)|71.1|89.8|
^ ResNet V2 models use Inception pre-processing and an input image size of 299 (use
`--preprocessing_name inception --eval_image_size 299` when using
`eval_image_classifier.py`). Performance numbers for ResNet V2 models are
reported on the ImageNet validation set.
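For example, a ResNet V2 50 evaluation might be launched as sketched below
(the paths, dataset directory, and checkpoint file name are placeholders; the
preprocessing flags are the ones noted above):

```shell
$ python eval_image_classifier.py \
    --checkpoint_path=/tmp/checkpoints/resnet_v2_50.ckpt \
    --dataset_name=imagenet \
    --dataset_split_name=validation \
    --dataset_dir=/tmp/imagenet \
    --model_name=resnet_v2_50 \
    --preprocessing_name=inception \
    --eval_image_size=299
```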
Here is an example of how to download the Inception V3 checkpoint:
...@@ -344,10 +351,10 @@ following error:
```bash
InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1001] rhs shape= [1000]
```
This is due to the fact that the VGG and ResNet V1 final layers have only 1000
outputs rather than 1001.
To fix this issue, you can set the `--labels_offset=1` flag. This results in
the ImageNet labels being shifted down by one:
...
...@@ -16,7 +16,7 @@ The results described below are based on model trained on multi-gpu and
multi-machine settings. It has been simplified to run on only one machine
for open source purpose.
<b>Dataset</b>
We used the Gigaword dataset described in [Rush et al. A Neural Attention Model
for Sentence Summarization](https://arxiv.org/abs/1509.00685).
...
...@@ -157,7 +157,7 @@ class PTBModel(object):
(cell_output, state) = cell(inputs[:, time_step, :], state)
outputs.append(cell_output)
output = tf.reshape(tf.stack(axis=1, values=outputs), [-1, size])
softmax_w = tf.get_variable(
"softmax_w", [size, vocab_size], dtype=data_type())
softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type())
...