Commit 9beaea41 authored by Alexander Gorban

Merge remote-tracking branch 'upstream/master'

parents 6159b593 3a3c5b9d
# Learning to Protect Communications with Adversarial Neural Cryptography
This is a slightly updated version of the model used for the paper
["Learning to Protect Communications with Adversarial Neural
Cryptography"](https://arxiv.org/abs/1610.06918).
> We ask whether neural networks can learn to use secret keys to protect
> information from other neural networks. Specifically, we focus on ensuring
> confidentiality properties in a multiagent system, and we specify those
> properties in terms of an adversary. Thus, a system may consist of neural
> networks named Alice and Bob, and we aim to limit what a third neural
> network named Eve learns from eavesdropping on the communication between
> Alice and Bob. We do not prescribe specific cryptographic algorithms to
> these neural networks; instead, we train end-to-end, adversarially.
> We demonstrate that the neural networks can learn how to perform forms of
> encryption and decryption, and also how to apply these operations
> selectively in order to meet confidentiality goals.
This code allows you to train an encoder/decoder/adversary triplet
and evaluate their effectiveness on randomly generated input and key
pairs.
## Prerequisites
The only software requirement for running the encoder and decoder is
TensorFlow r0.12 or later.
## Training and evaluating
After installing TensorFlow and ensuring that your paths are configured
appropriately:
    python train_eval.py
This will begin training a fresh model. If and when the model becomes
sufficiently well-trained, it will reset the Eve model multiple times
and retrain it from scratch, outputting the accuracy thus obtained
in each run.
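Both hyperparameters defined in the script can be overridden on the command
line, e.g. `python train_eval.py --learning_rate=0.0008 --batch_size=4096`
(the values shown are the defaults).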
## Model differences from the paper
The model has been simplified slightly from the one described in
the paper: the convolutional layer width was reduced by a factor
of two. In the version in the paper, there was a nonlinear unit
after the fully-connected layer; that nonlinearity has been removed
here. These changes improve the robustness of training. The
initializer for the convolution layers has been switched to the
tf.contrib.layers default of xavier_initializer instead of
a simpler truncated_normal.
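For reference, here is a minimal sketch of what reverting a convolution layer
to the paper's simpler initializer might look like; the `stddev` value and the
placeholder input are illustrative assumptions, not taken from the paper:

```python
import tensorflow as tf

# Hypothetical: pass an explicit truncated-normal weights initializer instead
# of relying on the tf.contrib.layers default (xavier_initializer).
fc = tf.placeholder(tf.float32, [None, 32, 1])  # stand-in for the FC output
conv = tf.contrib.layers.convolution(
    fc, 2, 2, 2, 'SAME',
    weights_initializer=tf.truncated_normal_initializer(stddev=0.1),
    activation_fn=tf.nn.sigmoid)
```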
## Contact information
This model repository is maintained by David G. Andersen
([dave-andersen](https://github.com/dave-andersen)).
# Copyright 2016 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Adversarial training to learn trivial encryption functions,
from the paper "Learning to Protect Communications with
Adversarial Neural Cryptography", Abadi & Andersen, 2016.
https://arxiv.org/abs/1610.06918
This program creates and trains three neural networks,
termed Alice, Bob, and Eve. Alice takes inputs
in_m (message), in_k (key) and outputs 'ciphertext'.
Bob takes inputs in_k, ciphertext and tries to reconstruct
the message.
Eve is an adversarial network that takes input ciphertext
and also tries to reconstruct the message.
The main function attempts to train these networks and then
evaluates them, all on random plaintext and key values.
"""
# TensorFlow Python 3 compatibility
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import signal
import sys
from six.moves import xrange # pylint: disable=redefined-builtin
import tensorflow as tf
flags = tf.app.flags
flags.DEFINE_float('learning_rate', 0.0008, 'Constant learning rate')
flags.DEFINE_integer('batch_size', 4096, 'Batch size')
FLAGS = flags.FLAGS
# Input and output configuration.
TEXT_SIZE = 16
KEY_SIZE = 16
# Training parameters.
ITERS_PER_ACTOR = 1
EVE_MULTIPLIER = 2 # Train Eve 2x for every step of Alice/Bob
# Train until either max loops or Alice/Bob "good enough":
MAX_TRAINING_LOOPS = 850000
BOB_LOSS_THRESH = 0.02 # Exit when Bob loss < 0.02 and Eve > 7.7 bits
EVE_LOSS_THRESH = 7.7
# Logging and evaluation.
PRINT_EVERY = 200 # In training, log every 200 steps.
EVE_EXTRA_ROUNDS = 2000 # At end, train eve a bit more.
RETRAIN_EVE_ITERS = 10000 # Retrain eve up to ITERS*LOOPS times.
RETRAIN_EVE_LOOPS = 25 # With an evaluation each loop
NUMBER_OF_EVE_RESETS = 5 # And do this up to 5 times with a fresh eve.
# Use EVAL_BATCHES samples each time we check accuracy.
EVAL_BATCHES = 1
def batch_of_random_bools(batch_size, n):
  """Return a batch of random "boolean" numbers.

  Args:
    batch_size: Batch size dimension of returned tensor.
    n: number of entries per batch.

  Returns:
    A [batch_size, n] tensor of "boolean" numbers, where each number is
    represented as -1 or 1.
  """
  as_int = tf.random_uniform(
      [batch_size, n], minval=0, maxval=2, dtype=tf.int32)
  expanded_range = (as_int * 2) - 1
  return tf.cast(expanded_range, tf.float32)
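
# Example (illustrative): evaluating batch_of_random_bools(2, 4) might yield
#   [[ 1., -1., -1.,  1.],
#    [-1.,  1.,  1., -1.]]
# i.e. each bit is encoded as a -1.0 / 1.0 float rather than 0 / 1.
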
class AdversarialCrypto(object):
  """Primary model implementation class for Adversarial Neural Crypto.

  This class contains the code for the model itself,
  and when created, plumbs the pathways from Alice to Bob and
  Eve, creates the optimizers and loss functions, etc.

  Attributes:
    eve_loss: Eve's loss function.
    bob_loss: Bob's loss function. Different units from eve_loss.
    eve_optimizer: A tf op that runs Eve's optimizer.
    bob_optimizer: A tf op that runs Bob's optimizer.
    bob_reconstruction_loss: Bob's message reconstruction loss,
      which is comparable to eve_loss.
    reset_eve_vars: Execute this op to completely reset Eve.
  """
  def get_message_and_key(self):
    """Generate random pseudo-boolean key and message values."""
    batch_size = tf.placeholder_with_default(FLAGS.batch_size, shape=[])

    in_m = batch_of_random_bools(batch_size, TEXT_SIZE)
    in_k = batch_of_random_bools(batch_size, KEY_SIZE)
    return in_m, in_k
  def model(self, collection, message, key=None):
    """The model for Alice, Bob, and Eve.

    If key=None, the first fully connected layer takes only the message
    as input. Otherwise, it uses both the key and the message.

    Args:
      collection: The graph keys collection to add new vars to.
      message: The input message to process.
      key: The input key (if any) to use.
    """

    if key is not None:
      combined_message = tf.concat(1, [message, key])
    else:
      combined_message = message

    # Ensure that all variables created are in the specified collection.
    with tf.contrib.framework.arg_scope(
        [tf.contrib.layers.fully_connected, tf.contrib.layers.convolution],
        variables_collections=[collection]):

      fc = tf.contrib.layers.fully_connected(
          combined_message,
          TEXT_SIZE + KEY_SIZE,
          biases_initializer=tf.constant_initializer(0.0),
          activation_fn=None)

      # Perform a sequence of 1D convolutions (by expanding the message out
      # to 2D and then squeezing it back down).
      fc = tf.expand_dims(fc, 2)

      # 2,1 -> 1,2: two filters of width 2, stride 2.
      conv = tf.contrib.layers.convolution(
          fc, 2, 2, 2, 'SAME', activation_fn=tf.nn.sigmoid)
      # 1,2 -> 1,2: two filters of width 1, stride 1.
      conv = tf.contrib.layers.convolution(
          conv, 2, 1, 1, 'SAME', activation_fn=tf.nn.sigmoid)
      # 1,2 -> 1,1: collapse to one channel with a tanh output.
      conv = tf.contrib.layers.convolution(
          conv, 1, 1, 1, 'SAME', activation_fn=tf.nn.tanh)
      conv = tf.squeeze(conv, 2)
      return conv
  def __init__(self):
    in_m, in_k = self.get_message_and_key()
    encrypted = self.model('alice', in_m, in_k)
    decrypted = self.model('bob', encrypted, in_k)
    eve_out = self.model('eve', encrypted, None)

    self.reset_eve_vars = tf.group(
        *[w.initializer for w in tf.get_collection('eve')])

    optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)

    # Eve's goal is to decrypt the entire message:
    eve_bits_wrong = tf.reduce_sum(
        tf.abs((eve_out + 1.0) / 2.0 - (in_m + 1.0) / 2.0), [1])
    self.eve_loss = tf.reduce_sum(eve_bits_wrong)
    self.eve_optimizer = optimizer.minimize(
        self.eve_loss, var_list=tf.get_collection('eve'))

    # Alice and Bob want to be accurate...
    self.bob_bits_wrong = tf.reduce_sum(
        tf.abs((decrypted + 1.0) / 2.0 - (in_m + 1.0) / 2.0), [1])
    # ... and to not let Eve do better than guessing.
    self.bob_reconstruction_loss = tf.reduce_sum(self.bob_bits_wrong)
    bob_eve_error_deviation = tf.abs(float(TEXT_SIZE) / 2.0 - eve_bits_wrong)
    # 7-9 bits wrong is OK too, so we squish the error function a bit.
    # Without doing this, we often tend to hang out at 0.25 / 7.5 error,
    # and it seems bad to have continued, high communication error.
    bob_eve_loss = tf.reduce_sum(
        tf.square(bob_eve_error_deviation) / (TEXT_SIZE / 2)**2)

    # Rescale the losses to [0, 1] per example and combine.
    self.bob_loss = (self.bob_reconstruction_loss / TEXT_SIZE + bob_eve_loss)

    self.bob_optimizer = optimizer.minimize(
        self.bob_loss,
        var_list=(tf.get_collection('alice') + tf.get_collection('bob')))
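
# Sanity check of the deviation term above (illustrative arithmetic): with
# TEXT_SIZE = 16, an Eve that guesses randomly gets about 8 of 16 bits wrong,
# so the deviation |16/2 - 8| is ~0 and Alice/Bob pay no penalty; a perfectly
# right (or perfectly wrong) Eve gives deviation 8, i.e. 8**2 / 8**2 = 1.0,
# the maximum per-example penalty.
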
def doeval(s, ac, n, itercount):
  """Evaluate the current network on n batches of random examples.

  Args:
    s: The current TensorFlow session
    ac: an instance of the AdversarialCrypto class
    n: The number of iterations to run.
    itercount: Iteration count label for logging.

  Returns:
    Bob's and Eve's average loss per example (mean number of bits incorrect).
  """
  bob_loss_accum = 0
  eve_loss_accum = 0
  for _ in xrange(n):
    bl, el = s.run([ac.bob_reconstruction_loss, ac.eve_loss])
    bob_loss_accum += bl
    eve_loss_accum += el
  bob_loss_percent = bob_loss_accum / (n * FLAGS.batch_size)
  eve_loss_percent = eve_loss_accum / (n * FLAGS.batch_size)
  print('%d %.2f %.2f' % (itercount, bob_loss_percent, eve_loss_percent))
  sys.stdout.flush()
  return bob_loss_percent, eve_loss_percent
def train_until_thresh(s, ac):
  for j in xrange(MAX_TRAINING_LOOPS):
    for _ in xrange(ITERS_PER_ACTOR):
      s.run(ac.bob_optimizer)
    for _ in xrange(ITERS_PER_ACTOR * EVE_MULTIPLIER):
      s.run(ac.eve_optimizer)
    if j % PRINT_EVERY == 0:
      bob_avg_loss, eve_avg_loss = doeval(s, ac, EVAL_BATCHES, j)
      if (bob_avg_loss < BOB_LOSS_THRESH and eve_avg_loss > EVE_LOSS_THRESH):
        print('Target losses achieved.')
        return True
  return False
def train_and_evaluate():
  """Run the full training and evaluation loop."""
  ac = AdversarialCrypto()
  init = tf.global_variables_initializer()

  with tf.Session() as s:
    s.run(init)
    print('# Batch size: ', FLAGS.batch_size)
    print('# Iter Bob_Recon_Error Eve_Recon_Error')

    if train_until_thresh(s, ac):
      for _ in xrange(EVE_EXTRA_ROUNDS):
        s.run(ac.eve_optimizer)
      print('Loss after eve extra training:')
      doeval(s, ac, EVAL_BATCHES * 2, 0)
      for _ in xrange(NUMBER_OF_EVE_RESETS):
        print('Resetting Eve')
        s.run(ac.reset_eve_vars)
        eve_counter = 0
        for _ in xrange(RETRAIN_EVE_LOOPS):
          for _ in xrange(RETRAIN_EVE_ITERS):
            eve_counter += 1
            s.run(ac.eve_optimizer)
          doeval(s, ac, EVAL_BATCHES, eve_counter)
        doeval(s, ac, EVAL_BATCHES, eve_counter)
def main(unused_argv):
  # Exit more quietly with Ctrl-C.
  signal.signal(signal.SIGINT, signal.SIG_DFL)
  train_and_evaluate()


if __name__ == '__main__':
  tf.app.run()
@@ -8,14 +8,14 @@ Open Sourced By: Xin Pan (xpan@google.com, github: panyx0718)
<Introduction>
Machine learning techniques based on neural networks are achieving remarkable
results in a wide variety of domains. Often, the training of models requires
large, representative datasets, which may be crowdsourced and contain sensitive
information. The models should not expose private information in these datasets.
Addressing this goal, we develop new algorithmic techniques for learning and a
refined analysis of privacy costs within the framework of differential privacy.
Our implementation and experiments demonstrate that we can train deep neural
networks with non-convex objectives, under a modest privacy budget, and at a
manageable cost in software complexity, training efficiency, and model quality.
paper: https://arxiv.org/abs/1607.00133
@@ -46,7 +46,7 @@ https://github.com/panyx0718/models/tree/master/slim
# Download the data to the data/ directory.
# List the code.
$ ls -R differential_privacy/
differential_privacy/:
dp_sgd __init__.py privacy_accountant README.md
@@ -72,16 +72,16 @@ differential_privacy/privacy_accountant/tf:
accountant.py accountant_test.py BUILD
# List the data.
$ ls -R data/
./data:
mnist_test.tfrecord mnist_train.tfrecord
# Build the code.
$ bazel build -c opt differential_privacy/...
# Run the MNIST differential privacy training code.
$ bazel-bin/differential_privacy/dp_sgd/dp_mnist/dp_mnist \
--training_data_path=data/mnist_train.tfrecord \
--eval_data_path=data/mnist_test.tfrecord \
--save_path=/tmp/mnist_dir
@@ -102,6 +102,6 @@ train_accuracy: 0.53
eval_accuracy: 0.53
...
$ ls /tmp/mnist_dir/
checkpoint ckpt ckpt.meta results-0.json
```
@@ -367,6 +367,13 @@ I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:206] Initialize HostPo
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:202] Started server with target: grpc://localhost:2222
```
If you compiled TensorFlow (from v1.1-rc3) with VERBS support and you have the
required device and IB verbs SW stack, you can specify --protocol='grpc+verbs'
in order to use Verbs RDMA for tensor passing between workers and ps.
You need to add the --protocol flag in all tasks (ps and workers).
The default protocol is grpc.
[Congratulations!](https://www.youtube.com/watch?v=9bZkp7q19f0) You are now
training Inception in a distributed manner.
@@ -749,7 +756,7 @@ batch-splitting the model across multiple GPUs.
permit training the model with higher learning rates.
* Often the GPU memory is a bottleneck that prevents employing larger batch
sizes. Employing more GPUs allows one to use larger batch sizes because
this model splits the batch across the GPUs.
**NOTE** If one wishes to train this model with *asynchronous* gradient updates,
@@ -45,7 +45,8 @@ def main(unused_args):
      {'ps': ps_hosts,
       'worker': worker_hosts},
      job_name=FLAGS.job_name,
      task_index=FLAGS.task_id,
      protocol=FLAGS.protocol)
  if FLAGS.job_name == 'ps':
    # `ps` jobs wait for incoming connections from the workers.
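
For context, a minimal sketch (assuming the surrounding script creates a
standard `tf.train.Server`, as the snippet above suggests) of how the new flag
reaches the server:

```python
import tensorflow as tf

# Hypothetical sketch: how the new --protocol flag reaches the TF server.
# Host lists are illustrative; in the real script they come from the
# ps_hosts/worker_hosts flags.
ps_hosts = ['localhost:2222']
worker_hosts = ['localhost:2223']
cluster_spec = tf.train.ClusterSpec({'ps': ps_hosts,
                                     'worker': worker_hosts})
server = tf.train.Server(cluster_spec,
                         job_name='worker',
                         task_index=0,
                         protocol='grpc')  # or 'grpc+verbs' with a VERBS-enabled build
```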
@@ -42,6 +42,9 @@ tf.app.flags.DEFINE_string('worker_hosts', '',
"""Comma-separated list of hostname:port for the """
"""worker jobs. e.g. """
"""'machine1:2222,machine2:1111,machine2:2222'""")
tf.app.flags.DEFINE_string('protocol', 'grpc',
"""Communication protocol to use in distributed """
"""execution (default grpc) """)
tf.app.flags.DEFINE_string('train_dir', '/tmp/imagenet_train',
"""Directory where to write event logs """
@@ -73,7 +73,7 @@ LSTM-8192-2048 (50\% Dropout) | 32.2 | 3.3
<b>How To Run</b>
Prerequisites:
* Install TensorFlow.
* Install Bazel.
@@ -97,7 +97,7 @@ Pre-requesite:
[link](http://download.tensorflow.org/models/LM_LSTM_CNN/vocab-2016-09-10.txt)
* test dataset:
[link](http://download.tensorflow.org/models/LM_LSTM_CNN/test/news.en.heldout-00000-of-00050)
* It is recommended to run on a modern desktop instead of a laptop.
```shell
# 1. Clone the code to your workspace.
@@ -105,7 +105,7 @@ Pre-requesite:
# 3. Create an empty WORKSPACE file in your workspace.
# 4. Create an empty output directory in your workspace.
# Example directory structure below:
$ ls -R
.:
data lm_1b output WORKSPACE
@@ -121,13 +121,13 @@ BUILD data_utils.py lm_1b_eval.py README.md
./output:
# Build the code.
$ bazel build -c opt lm_1b/...
# Run sample mode:
$ bazel-bin/lm_1b/lm_1b_eval --mode sample \
--prefix "I love that I" \
--pbtxt data/graph-2016-09-10.pbtxt \
--vocab_file data/vocab-2016-09-10.txt \
--ckpt 'data/ckpt-*'
...(omitted some TensorFlow output)
I love
I love that
@@ -138,11 +138,11 @@ I love that I find that amazing
...(omitted)
# Run eval mode:
$ bazel-bin/lm_1b/lm_1b_eval --mode eval \
--pbtxt data/graph-2016-09-10.pbtxt \
--vocab_file data/vocab-2016-09-10.txt \
--input_data data/news.en.heldout-00000-of-00050 \
--ckpt 'data/ckpt-*'
...(omitted some TensorFlow output)
Loaded step 14108582.
# perplexity is high initially because words without context are harder to
@@ -166,28 +166,28 @@ Eval Step: 4531, Average Perplexity: 29.285674.
...(omitted. At convergence, it should be around 30.)
# Run dump_emb mode:
$ bazel-bin/lm_1b/lm_1b_eval --mode dump_emb \
--pbtxt data/graph-2016-09-10.pbtxt \
--vocab_file data/vocab-2016-09-10.txt \
--ckpt 'data/ckpt-*' \
--save_dir output
...(omitted some TensorFlow output)
Finished softmax weights
Finished word embedding 0/793471
Finished word embedding 1/793471
Finished word embedding 2/793471
...(omitted)
$ ls output/
embeddings_softmax.npy ...
# Run dump_lstm_emb mode:
$ bazel-bin/lm_1b/lm_1b_eval --mode dump_lstm_emb \
--pbtxt data/graph-2016-09-10.pbtxt \
--vocab_file data/vocab-2016-09-10.txt \
--ckpt 'data/ckpt-*' \
--sentence "I love who I am ." \
--save_dir output
$ ls output/
lstm_emb_step_0.npy lstm_emb_step_2.npy lstm_emb_step_4.npy
lstm_emb_step_6.npy lstm_emb_step_1.npy lstm_emb_step_3.npy
lstm_emb_step_5.npy
@@ -34,7 +34,7 @@ to tf.SequenceExample.
<b>How to run:</b>
```shell
$ ls -R
.:
data next_frame_prediction WORKSPACE
@@ -52,18 +52,18 @@ cross_conv2.png cross_conv3.png cross_conv.png
# Build everything.
$ bazel build -c opt next_frame_prediction/...
# The following example runs on the generated 2d objects.
# For Sprites dataset, image_size should be 60, norm_scale should be 255.0.
# Batch size is normally 16~64, depending on your memory size.
#
# Run training.
$ bazel-bin/next_frame_prediction/cross_conv/train \
--batch_size=1 \
--data_filepattern=data/tfrecords \
--image_size=64 \
--log_root=/tmp/predict
step: 1, loss: 24.428671
step: 2, loss: 19.211605
......@@ -75,11 +75,11 @@ step: 7, loss: 1.747665
step: 8, loss: 1.572436
step: 9, loss: 1.586816
step: 10, loss: 1.434191
#
# Run eval.
$ bazel-bin/next_frame_prediction/cross_conv/eval \
--batch_size=1 \
--data_filepattern=data/tfrecords_test \
--image_size=64 \
--log_root=/tmp/predict
```
@@ -23,7 +23,7 @@ https://arxiv.org/pdf/1605.07146v1.pdf
<b>Settings:</b>
* Random split 50k training set into 45k/5k train/eval split.
* Pad to 36x36 and random crop. Horizontal flip. Per-image whitening.
* Momentum optimizer 0.9.
* Learning rate schedule: 0.1 (40k), 0.01 (60k), 0.001 (>60k).
* L2 weight decay: 0.002.
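
As a sketch, the learning-rate schedule above can be expressed as a
piecewise-constant function of the training step (the function name is
hypothetical; step boundaries are taken from the list above):

```python
def learning_rate(step):
  """Piecewise-constant schedule from the settings above."""
  if step < 40000:    # first 40k steps
    return 0.1
  elif step < 60000:  # 40k-60k steps
    return 0.01
  return 0.001        # beyond 60k steps
```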
@@ -65,40 +65,40 @@ curl -o cifar-100-binary.tar.gz https://www.cs.toronto.edu/~kriz/cifar-100-binar
<b>How to run:</b>
```shell
# cd to the models repository and run with bash. Expected command output shown.
# The directory should contain an empty WORKSPACE file, the resnet code, and the cifar10 dataset.
# Note: The user can split 5k from train set for eval set.
$ ls -R
.:
cifar10 resnet WORKSPACE
./cifar10:
data_batch_1.bin data_batch_2.bin data_batch_3.bin data_batch_4.bin
data_batch_5.bin test_batch.bin
./resnet:
BUILD cifar_input.py g3doc README.md resnet_main.py resnet_model.py
# Build everything for GPU.
$ bazel build -c opt --config=cuda resnet/...
# Train the model.
$ bazel-bin/resnet/resnet_main --train_data_path=cifar10/data_batch* \
--log_root=/tmp/resnet_model \
--train_dir=/tmp/resnet_model/train \
--dataset='cifar10' \
--num_gpus=1
# While the model is training, you can also check on its progress using tensorboard:
$ tensorboard --logdir=/tmp/resnet_model
# Evaluate the model.
# Avoid running on the same GPU as the training job at the same time,
# otherwise, you might run out of memory.
$ bazel-bin/resnet/resnet_main --eval_data_path=cifar10/test_batch.bin \
--log_root=/tmp/resnet_model \
--eval_dir=/tmp/resnet_model/test \
--mode=eval \
--dataset='cifar10' \
--num_gpus=0
```
@@ -85,7 +85,7 @@ class ResNet(object):
# comparably good performance.
# https://arxiv.org/pdf/1605.07146v1.pdf
# filters = [16, 160, 320, 640]
# Update hps.num_residual_units to 4
    with tf.variable_scope('unit_1_0'):
      x = res_func(x, filters[0], filters[1], self._stride_arr(strides[0]),
@@ -178,12 +178,12 @@ image classification dataset.
In the table below, we list each model, the corresponding
TensorFlow model file, the link to the model checkpoint, and the top 1 and top 5
accuracy (on the imagenet test set).
Note that the VGG and ResNet V1 parameters have been converted from their original
caffe formats
([here](https://github.com/BVLC/caffe/wiki/Model-Zoo#models-used-by-the-vgg-team-in-ilsvrc-2014)
and
[here](https://github.com/KaimingHe/deep-residual-networks)),
whereas the Inception and ResNet V2 parameters have been trained internally at
Google. Also be aware that these accuracies were computed by evaluating using a
single image crop. Some academic papers report higher accuracy by using multiple
crops at multiple scales.
@@ -195,12 +195,19 @@ Model | TF-Slim File | Checkpoint | Top-1 Accuracy| Top-5 Accuracy |
[Inception V3](http://arxiv.org/abs/1512.00567)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_v3.py)|[inception_v3_2016_08_28.tar.gz](http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz)|78.0|93.9|
[Inception V4](http://arxiv.org/abs/1602.07261)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_v4.py)|[inception_v4_2016_09_09.tar.gz](http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz)|80.2|95.2|
[Inception-ResNet-v2](http://arxiv.org/abs/1602.07261)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_resnet_v2.py)|[inception_resnet_v2.tar.gz](http://download.tensorflow.org/models/inception_resnet_v2_2016_08_30.tar.gz)|80.4|95.3|
[ResNet V1 50](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_50.tar.gz](http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz)|75.2|92.2|
[ResNet V1 101](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_101.tar.gz](http://download.tensorflow.org/models/resnet_v1_101_2016_08_28.tar.gz)|76.4|92.9|
[ResNet V1 152](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_152.tar.gz](http://download.tensorflow.org/models/resnet_v1_152_2016_08_28.tar.gz)|76.8|93.2|
[ResNet V2 50](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_50.tar.gz](http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz)|75.6|92.8|
[ResNet V2 101](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_101.tar.gz](http://download.tensorflow.org/models/resnet_v2_101_2017_04_14.tar.gz)|77.0|93.7|
[ResNet V2 152](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_152.tar.gz](http://download.tensorflow.org/models/resnet_v2_152_2017_04_14.tar.gz)|77.8|94.1|
[VGG 16](http://arxiv.org/abs/1409.1556.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/vgg.py)|[vgg_16.tar.gz](http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz)|71.5|89.8|
[VGG 19](http://arxiv.org/abs/1409.1556.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/vgg.py)|[vgg_19.tar.gz](http://download.tensorflow.org/models/vgg_19_2016_08_28.tar.gz)|71.1|89.8|
^ ResNet V2 models use Inception pre-processing and an input image size of 299
(use `--preprocessing_name inception --eval_image_size 299` when using
`eval_image_classifier.py`). Performance numbers for ResNet V2 models are
reported on the ImageNet validation set.
Here is an example of how to download the Inception V3 checkpoint:
@@ -344,10 +351,10 @@ following error:
```bash
InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1001] rhs shape= [1000]
```
This is due to the fact that the VGG and ResNet V1 final layers have only 1000
outputs rather than 1001.
To fix this issue, you can set the `--labels_offset=1` flag. This results in
the ImageNet labels being shifted down by one:
@@ -16,7 +16,7 @@ The results described below are based on model trained on multi-gpu and
multi-machine settings. It has been simplified to run on only one machine
for open-source purposes.
<b>Dataset</b>
We used the Gigaword dataset described in [Rush et al. A Neural Attention Model
for Sentence Summarization](https://arxiv.org/abs/1509.00685).
@@ -157,7 +157,7 @@ class PTBModel(object):
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)

      output = tf.reshape(tf.stack(axis=1, values=outputs), [-1, size])
      softmax_w = tf.get_variable(
          "softmax_w", [size, vocab_size], dtype=data_type())
      softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type())
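
# Note (illustrative): tf.stack(axis=1) turns the list of per-step
# [batch_size, size] outputs into [batch_size, num_steps, size]; reshaping to
# [-1, size] then yields one row per (batch, step) pair, the same result the
# earlier tf.concat(axis=1, ...) + reshape produced.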