Commit 9beaea41 authored by Alexander Gorban

Merge remote-tracking branch 'upstream/master'

parents 6159b593 3a3c5b9d
# Learning to Protect Communications with Adversarial Neural Cryptography
This is a slightly-updated model used for the paper
["Learning to Protect Communications with Adversarial Neural
Cryptography"](https://arxiv.org/abs/1610.06918).
> We ask whether neural networks can learn to use secret keys to protect
> information from other neural networks. Specifically, we focus on ensuring
> confidentiality properties in a multiagent system, and we specify those
> properties in terms of an adversary. Thus, a system may consist of neural
> networks named Alice and Bob, and we aim to limit what a third neural
> network named Eve learns from eavesdropping on the communication between
> Alice and Bob. We do not prescribe specific cryptographic algorithms to
> these neural networks; instead, we train end-to-end, adversarially.
> We demonstrate that the neural networks can learn how to perform forms of
> encryption and decryption, and also how to apply these operations
> selectively in order to meet confidentiality goals.
This code allows you to train an encoder/decoder/adversary triplet
and evaluate their effectiveness on randomly generated input and key
pairs.
## Prerequisites
The only software requirement for running the encoder and decoder is
TensorFlow; r0.12 or later is required.
## Training and evaluating
After installing TensorFlow and ensuring that your paths are configured
appropriately:
python train_eval.py
This will begin training a fresh model. If and when the model becomes
sufficiently well-trained, it will reset the Eve model multiple times
and retrain it from scratch, outputting the accuracy thus obtained
in each run.
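The learning rate and batch size are exposed as command-line flags in
`train_eval.py`, so the defaults can be overridden when launching a run.
As an illustrative sketch (the values shown are simply the defaults):

```shell
python train_eval.py --learning_rate=0.0008 --batch_size=4096
```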
## Model differences from the paper
The model has been simplified slightly from the one described in
the paper: the convolutional layer width was reduced by a factor
of two. In the version in the paper, there was a nonlinear unit
after the fully-connected layer; that nonlinearity has been removed
here. These changes improve the robustness of training. The
initializer for the convolution layers has been switched to the
tf.contrib.layers default of xavier_initializer instead of
a simpler truncated_normal.
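To make the initializer difference concrete, here is a minimal sketch (not
part of the shipped model, which simply relies on the layer's default) of
how one would pass each initializer explicitly to
`tf.contrib.layers.convolution`; the `stddev` value is only illustrative:

```python
import tensorflow as tf

def conv_layer(inputs, use_xavier=True):
  """Sketch: the two initializer choices discussed above."""
  init = (tf.contrib.layers.xavier_initializer() if use_xavier
          else tf.truncated_normal_initializer(stddev=0.1))
  return tf.contrib.layers.convolution(
      inputs, 2, 2, 2, 'SAME',
      weights_initializer=init,
      activation_fn=tf.nn.sigmoid)
```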
## Contact information
This model repository is maintained by David G. Andersen
([dave-andersen](https://github.com/dave-andersen)).
# Copyright 2016 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Adversarial training to learn trivial encryption functions,
from the paper "Learning to Protect Communications with
Adversarial Neural Cryptography", Abadi & Andersen, 2016.
https://arxiv.org/abs/1610.06918
This program creates and trains three neural networks,
termed Alice, Bob, and Eve. Alice takes inputs
in_m (message), in_k (key) and outputs 'ciphertext'.
Bob takes inputs in_k, ciphertext and tries to reconstruct
the message.
Eve is an adversarial network that takes input ciphertext
and also tries to reconstruct the message.
The main function attempts to train these networks and then
evaluates them, all on random plaintext and key values.
"""
# TensorFlow Python 3 compatibility
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import signal
import sys
from six.moves import xrange # pylint: disable=redefined-builtin
import tensorflow as tf
flags = tf.app.flags
flags.DEFINE_float('learning_rate', 0.0008, 'Constant learning rate')
flags.DEFINE_integer('batch_size', 4096, 'Batch size')
FLAGS = flags.FLAGS
# Input and output configuration.
TEXT_SIZE = 16
KEY_SIZE = 16
# Training parameters.
ITERS_PER_ACTOR = 1
EVE_MULTIPLIER = 2 # Train Eve 2x for every step of Alice/Bob
# Train until either max loops or Alice/Bob "good enough":
MAX_TRAINING_LOOPS = 850000
BOB_LOSS_THRESH = 0.02 # Exit when Bob loss < 0.02 and Eve > 7.7 bits
EVE_LOSS_THRESH = 7.7
# Logging and evaluation.
PRINT_EVERY = 200 # In training, log every 200 steps.
EVE_EXTRA_ROUNDS = 2000 # At end, train eve a bit more.
RETRAIN_EVE_ITERS = 10000 # Retrain eve up to ITERS*LOOPS times.
RETRAIN_EVE_LOOPS = 25 # With an evaluation each loop
NUMBER_OF_EVE_RESETS = 5 # And do this up to 5 times with a fresh eve.
# Use EVAL_BATCHES samples each time we check accuracy.
EVAL_BATCHES = 1
def batch_of_random_bools(batch_size, n):
"""Return a batch of random "boolean" numbers.
Args:
batch_size: Batch size dimension of returned tensor.
n: number of entries per batch.
Returns:
A [batch_size, n] tensor of "boolean" numbers, where each number is
represented as -1 or 1.
"""
as_int = tf.random_uniform(
[batch_size, n], minval=0, maxval=2, dtype=tf.int32)
expanded_range = (as_int * 2) - 1
return tf.cast(expanded_range, tf.float32)
class AdversarialCrypto(object):
"""Primary model implementation class for Adversarial Neural Crypto.
This class contains the code for the model itself,
and when created, plumbs the pathways from Alice to Bob and
Eve, creates the optimizers and loss functions, etc.
Attributes:
eve_loss: Eve's loss function.
bob_loss: Bob's loss function. Different units from eve_loss.
eve_optimizer: A tf op that runs Eve's optimizer.
bob_optimizer: A tf op that runs Bob's optimizer.
bob_reconstruction_loss: Bob's message reconstruction loss,
which is comparable to eve_loss.
reset_eve_vars: Execute this op to completely reset Eve.
"""
def get_message_and_key(self):
"""Generate random pseudo-boolean key and message values."""
batch_size = tf.placeholder_with_default(FLAGS.batch_size, shape=[])
in_m = batch_of_random_bools(batch_size, TEXT_SIZE)
in_k = batch_of_random_bools(batch_size, KEY_SIZE)
return in_m, in_k
def model(self, collection, message, key=None):
"""The model for Alice, Bob, and Eve. If key=None, the first FC layer
takes only the Key as inputs. Otherwise, it uses both the key
and the message.
Args:
collection: The graph keys collection to add new vars to.
message: The input message to process.
key: The input key (if any) to use.
"""
if key is not None:
combined_message = tf.concat(1, [message, key])
else:
combined_message = message
# Ensure that all variables created are in the specified collection.
with tf.contrib.framework.arg_scope(
[tf.contrib.layers.fully_connected, tf.contrib.layers.convolution],
variables_collections=[collection]):
fc = tf.contrib.layers.fully_connected(
combined_message,
TEXT_SIZE + KEY_SIZE,
biases_initializer=tf.constant_initializer(0.0),
activation_fn=None)
# Perform a sequence of 1D convolutions (by expanding the message out to 2D
# and then squeezing it back down).
fc = tf.expand_dims(fc, 2)
# 2,1 -> 1,2
conv = tf.contrib.layers.convolution(
fc, 2, 2, 2, 'SAME', activation_fn=tf.nn.sigmoid)
# 1,2 -> 1, 2
conv = tf.contrib.layers.convolution(
conv, 2, 1, 1, 'SAME', activation_fn=tf.nn.sigmoid)
# 1,2 -> 1, 1
conv = tf.contrib.layers.convolution(
conv, 1, 1, 1, 'SAME', activation_fn=tf.nn.tanh)
conv = tf.squeeze(conv, 2)
return conv
def __init__(self):
in_m, in_k = self.get_message_and_key()
encrypted = self.model('alice', in_m, in_k)
decrypted = self.model('bob', encrypted, in_k)
eve_out = self.model('eve', encrypted, None)
self.reset_eve_vars = tf.group(
*[w.initializer for w in tf.get_collection('eve')])
optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)
# Eve's goal is to decrypt the entire message:
eve_bits_wrong = tf.reduce_sum(
tf.abs((eve_out + 1.0) / 2.0 - (in_m + 1.0) / 2.0), [1])
self.eve_loss = tf.reduce_sum(eve_bits_wrong)
self.eve_optimizer = optimizer.minimize(
self.eve_loss, var_list=tf.get_collection('eve'))
# Alice and Bob want to be accurate...
self.bob_bits_wrong = tf.reduce_sum(
tf.abs((decrypted + 1.0) / 2.0 - (in_m + 1.0) / 2.0), [1])
# ... and to not let Eve do better than guessing.
self.bob_reconstruction_loss = tf.reduce_sum(self.bob_bits_wrong)
bob_eve_error_deviation = tf.abs(float(TEXT_SIZE) / 2.0 - eve_bits_wrong)
# 7-9 bits wrong is OK too, so we squish the error function a bit.
# Without doing this, we often tend to hang out at 0.25 / 7.5 error,
# and it seems bad to have continued, high communication error.
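# Per example this is ((TEXT_SIZE/2 - eve_bits_wrong) / (TEXT_SIZE/2))**2,
# which is 0 when Eve is at chance (half the bits wrong) and 1 when she
# recovers all of the bits (or inverts all of them, which leaks as much).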
bob_eve_loss = tf.reduce_sum(
tf.square(bob_eve_error_deviation) / (TEXT_SIZE / 2)**2)
# Rescale the losses to [0, 1] per example and combine.
self.bob_loss = (self.bob_reconstruction_loss / TEXT_SIZE + bob_eve_loss)
self.bob_optimizer = optimizer.minimize(
self.bob_loss,
var_list=(tf.get_collection('alice') + tf.get_collection('bob')))
def doeval(s, ac, n, itercount):
"""Evaluate the current network on n batches of random examples.
Args:
s: The current TensorFlow session
ac: an instance of the AdversarialCrypto class
n: The number of iterations to run.
itercount: Iteration count label for logging.
Returns:
Bob's and Eve's loss, as the average number of bits wrong per message.
"""
bob_loss_accum = 0
eve_loss_accum = 0
for _ in xrange(n):
bl, el = s.run([ac.bob_reconstruction_loss, ac.eve_loss])
bob_loss_accum += bl
eve_loss_accum += el
bob_loss_percent = bob_loss_accum / (n * FLAGS.batch_size)
eve_loss_percent = eve_loss_accum / (n * FLAGS.batch_size)
print('%d %.2f %.2f' % (itercount, bob_loss_percent, eve_loss_percent))
sys.stdout.flush()
return bob_loss_percent, eve_loss_percent
def train_until_thresh(s, ac):
for j in xrange(MAX_TRAINING_LOOPS):
for _ in xrange(ITERS_PER_ACTOR):
s.run(ac.bob_optimizer)
for _ in xrange(ITERS_PER_ACTOR * EVE_MULTIPLIER):
s.run(ac.eve_optimizer)
if j % PRINT_EVERY == 0:
bob_avg_loss, eve_avg_loss = doeval(s, ac, EVAL_BATCHES, j)
if (bob_avg_loss < BOB_LOSS_THRESH and eve_avg_loss > EVE_LOSS_THRESH):
print('Target losses achieved.')
return True
return False
def train_and_evaluate():
"""Run the full training and evaluation loop."""
ac = AdversarialCrypto()
init = tf.global_variables_initializer()
with tf.Session() as s:
s.run(init)
print('# Batch size: ', FLAGS.batch_size)
print('# Iter Bob_Recon_Error Eve_Recon_Error')
if train_until_thresh(s, ac):
for _ in xrange(EVE_EXTRA_ROUNDS):
s.run(ac.eve_optimizer)
print('Loss after eve extra training:')
doeval(s, ac, EVAL_BATCHES * 2, 0)
for _ in xrange(NUMBER_OF_EVE_RESETS):
print('Resetting Eve')
s.run(ac.reset_eve_vars)
eve_counter = 0
for _ in xrange(RETRAIN_EVE_LOOPS):
for _ in xrange(RETRAIN_EVE_ITERS):
eve_counter += 1
s.run(ac.eve_optimizer)
doeval(s, ac, EVAL_BATCHES, eve_counter)
doeval(s, ac, EVAL_BATCHES, eve_counter)
def main(unused_argv):
# Exit more quietly with Ctrl-C.
signal.signal(signal.SIGINT, signal.SIG_DFL)
train_and_evaluate()
if __name__ == '__main__':
tf.app.run()
...@@ -8,14 +8,14 @@ Open Sourced By: Xin Pan (xpan@google.com, github: panyx0718)
<Introduction>
Machine learning techniques based on neural networks are achieving remarkable
results in a wide variety of domains. Often, the training of models requires
large, representative datasets, which may be crowdsourced and contain sensitive
information. The models should not expose private information in these datasets.
Addressing this goal, we develop new algorithmic techniques for learning and a
refined analysis of privacy costs within the framework of differential privacy.
Our implementation and experiments demonstrate that we can train deep neural
networks with non-convex objectives, under a modest privacy budget, and at a
manageable cost in software complexity, training efficiency, and model quality.
paper: https://arxiv.org/abs/1607.00133
...@@ -46,7 +46,7 @@ https://github.com/panyx0718/models/tree/master/slim
# Download the data to the data/ directory.
# List the codes.
$ ls -R differential_privacy/
differential_privacy/:
dp_sgd __init__.py privacy_accountant README.md
...@@ -72,16 +72,16 @@ differential_privacy/privacy_accountant/tf:
accountant.py accountant_test.py BUILD
# List the data.
$ ls -R data/
./data:
mnist_test.tfrecord mnist_train.tfrecord
# Build the codes.
$ bazel build -c opt differential_privacy/...
# Run the mnist differential privacy training codes.
$ bazel-bin/differential_privacy/dp_sgd/dp_mnist/dp_mnist \
--training_data_path=data/mnist_train.tfrecord \
--eval_data_path=data/mnist_test.tfrecord \
--save_path=/tmp/mnist_dir
...@@ -102,6 +102,6 @@ train_accuracy: 0.53
eval_accuracy: 0.53
...
$ ls /tmp/mnist_dir/
checkpoint ckpt ckpt.meta results-0.json
```
...@@ -367,6 +367,13 @@ I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:206] Initialize HostPo
I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:202] Started server with target: grpc://localhost:2222
```
If you compiled TensorFlow (v1.1-rc3 or later) with VERBS support and you have
the required device and InfiniBand verbs software stack, you can specify
`--protocol='grpc+verbs'` to use Verbs RDMA for tensor passing between workers
and parameter servers. The `--protocol` flag must be added to all tasks (ps and
workers); the default protocol is TensorFlow's standard grpc.
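As a sketch, adding the flag to a distributed Inception worker invocation from
the instructions above might look like the following (host names and the other
flags are placeholders; pass the same `--protocol` value to the ps tasks too):

```shell
# Hypothetical worker task with RDMA tensor passing enabled.
$ bazel-bin/inception/imagenet_distributed_train \
    --job_name='worker' \
    --task_id=0 \
    --ps_hosts='ps0.example.com:2222' \
    --worker_hosts='worker0.example.com:2222,worker1.example.com:2222' \
    --protocol='grpc+verbs'
```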
[Congratulations!](https://www.youtube.com/watch?v=9bZkp7q19f0) You are now
training Inception in a distributed manner.
...@@ -749,7 +756,7 @@ batch-splitting the model across multiple GPUs.
permit training the model with higher learning rates.
* Often the GPU memory is a bottleneck that prevents employing larger batch
sizes. Employing more GPUs allows one to use larger batch sizes because
this model splits the batch across the GPUs.
**NOTE** If one wishes to train this model with *asynchronous* gradient updates,
...
...@@ -45,7 +45,8 @@ def main(unused_args):
{'ps': ps_hosts,
'worker': worker_hosts},
job_name=FLAGS.job_name,
task_index=FLAGS.task_id,
protocol=FLAGS.protocol)
if FLAGS.job_name == 'ps':
# `ps` jobs wait for incoming connections from the workers.
...
...@@ -42,6 +42,9 @@ tf.app.flags.DEFINE_string('worker_hosts', '',
"""Comma-separated list of hostname:port for the """
"""worker jobs. e.g. """
"""'machine1:2222,machine2:1111,machine2:2222'""")
tf.app.flags.DEFINE_string('protocol', 'grpc',
"""Communication protocol to use in distributed """
"""execution (default grpc) """)
tf.app.flags.DEFINE_string('train_dir', '/tmp/imagenet_train',
"""Directory where to write event logs """
...
...@@ -73,7 +73,7 @@ LSTM-8192-2048 (50\% Dropout) | 32.2 | 3.3
<b>How To Run</b>
Prerequisites:
* Install TensorFlow.
* Install Bazel.
...@@ -97,7 +97,7 @@ Pre-requesite:
[link](http://download.tensorflow.org/models/LM_LSTM_CNN/vocab-2016-09-10.txt)
* test dataset: link
[link](http://download.tensorflow.org/models/LM_LSTM_CNN/test/news.en.heldout-00000-of-00050)
* It is recommended to run on a modern desktop instead of a laptop.
```shell
# 1. Clone the code to your workspace.
...@@ -105,7 +105,7 @@ Pre-requesite:
# 3. Create an empty WORKSPACE file in your workspace.
# 4. Create an empty output directory in your workspace.
# Example directory structure below:
$ ls -R
.:
data lm_1b output WORKSPACE
...@@ -121,13 +121,13 @@ BUILD data_utils.py lm_1b_eval.py README.md
./output:
# Build the codes.
$ bazel build -c opt lm_1b/...
# Run sample mode:
$ bazel-bin/lm_1b/lm_1b_eval --mode sample \
--prefix "I love that I" \
--pbtxt data/graph-2016-09-10.pbtxt \
--vocab_file data/vocab-2016-09-10.txt \
--ckpt 'data/ckpt-*'
...(omitted some TensorFlow output)
I love
I love that
...@@ -138,11 +138,11 @@ I love that I find that amazing
...(omitted)
# Run eval mode:
$ bazel-bin/lm_1b/lm_1b_eval --mode eval \
--pbtxt data/graph-2016-09-10.pbtxt \
--vocab_file data/vocab-2016-09-10.txt \
--input_data data/news.en.heldout-00000-of-00050 \
--ckpt 'data/ckpt-*'
...(omitted some TensorFlow output)
Loaded step 14108582.
# perplexity is high initially because words without context are harder to
...@@ -166,28 +166,28 @@ Eval Step: 4531, Average Perplexity: 29.285674.
...(omitted. At convergence, it should be around 30.)
# Run dump_emb mode:
$ bazel-bin/lm_1b/lm_1b_eval --mode dump_emb \
--pbtxt data/graph-2016-09-10.pbtxt \
--vocab_file data/vocab-2016-09-10.txt \
--ckpt 'data/ckpt-*' \
--save_dir output
...(omitted some TensorFlow output)
Finished softmax weights
Finished word embedding 0/793471
Finished word embedding 1/793471
Finished word embedding 2/793471
...(omitted)
$ ls output/
embeddings_softmax.npy ...
# Run dump_lstm_emb mode:
$ bazel-bin/lm_1b/lm_1b_eval --mode dump_lstm_emb \
--pbtxt data/graph-2016-09-10.pbtxt \
--vocab_file data/vocab-2016-09-10.txt \
--ckpt 'data/ckpt-*' \
--sentence "I love who I am ." \
--save_dir output
$ ls output/
lstm_emb_step_0.npy lstm_emb_step_2.npy lstm_emb_step_4.npy
lstm_emb_step_6.npy lstm_emb_step_1.npy lstm_emb_step_3.npy
lstm_emb_step_5.npy
...
...@@ -34,7 +34,7 @@ to tf.SequenceExample.
<b>How to run:</b>
```shell
$ ls -R
.:
data next_frame_prediction WORKSPACE
...@@ -52,18 +52,18 @@ cross_conv2.png cross_conv3.png cross_conv.png
# Build everything.
$ bazel build -c opt next_frame_prediction/...
# The following example runs the generated 2d objects.
# For Sprites dataset, image_size should be 60, norm_scale should be 255.0.
# Batch size is normally 16~64, depending on your memory size.
#
# Run training.
$ bazel-bin/next_frame_prediction/cross_conv/train \
--batch_size=1 \
--data_filepattern=data/tfrecords \
--image_size=64 \
--log_root=/tmp/predict
step: 1, loss: 24.428671
step: 2, loss: 19.211605
...@@ -75,11 +75,11 @@ step: 7, loss: 1.747665
step: 8, loss: 1.572436
step: 9, loss: 1.586816
step: 10, loss: 1.434191
#
# Run eval.
$ bazel-bin/next_frame_prediction/cross_conv/eval \
--batch_size=1 \
--data_filepattern=data/tfrecords_test \
--image_size=64 \
--log_root=/tmp/predict
```
...@@ -23,7 +23,7 @@ https://arxiv.org/pdf/1605.07146v1.pdf
<b>Settings:</b>
* Random split 50k training set into 45k/5k train/eval split.
* Pad to 36x36 and random crop. Horizontal flip. Per-image whitening.
* Momentum optimizer 0.9.
* Learning rate schedule: 0.1 (40k), 0.01 (60k), 0.001 (>60k).
* L2 weight decay: 0.002.
...@@ -65,40 +65,40 @@ curl -o cifar-100-binary.tar.gz https://www.cs.toronto.edu/~kriz/cifar-100-binar
<b>How to run:</b>
```shell
# cd to the models repository and run with bash. Expected command output shown.
# The directory should contain an empty WORKSPACE file, the resnet code, and the cifar10 dataset.
# Note: The user can split 5k from train set for eval set.
$ ls -R
.:
cifar10 resnet WORKSPACE
./cifar10:
data_batch_1.bin data_batch_2.bin data_batch_3.bin data_batch_4.bin
data_batch_5.bin test_batch.bin
./resnet:
BUILD cifar_input.py g3doc README.md resnet_main.py resnet_model.py
# Build everything for GPU.
$ bazel build -c opt --config=cuda resnet/...
# Train the model.
$ bazel-bin/resnet/resnet_main --train_data_path=cifar10/data_batch* \
--log_root=/tmp/resnet_model \
--train_dir=/tmp/resnet_model/train \
--dataset='cifar10' \
--num_gpus=1
# While the model is training, you can also check on its progress using tensorboard:
$ tensorboard --logdir=/tmp/resnet_model
# Evaluate the model.
# Avoid running on the same GPU as the training job at the same time,
# otherwise, you might run out of memory.
$ bazel-bin/resnet/resnet_main --eval_data_path=cifar10/test_batch.bin \
--log_root=/tmp/resnet_model \
--eval_dir=/tmp/resnet_model/test \
--mode=eval \
--dataset='cifar10' \
--num_gpus=0
```
...@@ -85,7 +85,7 @@ class ResNet(object):
# comparably good performance.
# https://arxiv.org/pdf/1605.07146v1.pdf
# filters = [16, 160, 320, 640]
# Update hps.num_residual_units to 4
with tf.variable_scope('unit_1_0'):
x = res_func(x, filters[0], filters[1], self._stride_arr(strides[0]),
...
...@@ -178,12 +178,12 @@ image classification dataset.
In the table below, we list each model, the corresponding
TensorFlow model file, the link to the model checkpoint, and the top 1 and top 5
accuracy (on the imagenet test set).
Note that the VGG and ResNet V1 parameters have been converted from their original
caffe formats
([here](https://github.com/BVLC/caffe/wiki/Model-Zoo#models-used-by-the-vgg-team-in-ilsvrc-2014)
and
[here](https://github.com/KaimingHe/deep-residual-networks)),
whereas the Inception and ResNet V2 parameters have been trained internally at
Google. Also be aware that these accuracies were computed by evaluating using a
single image crop. Some academic papers report higher accuracy by using multiple
crops at multiple scales.
...@@ -195,12 +195,19 @@ Model | TF-Slim File | Checkpoint | Top-1 Accuracy| Top-5 Accuracy |
[Inception V3](http://arxiv.org/abs/1512.00567)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_v3.py)|[inception_v3_2016_08_28.tar.gz](http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz)|78.0|93.9|
[Inception V4](http://arxiv.org/abs/1602.07261)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_v4.py)|[inception_v4_2016_09_09.tar.gz](http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz)|80.2|95.2|
[Inception-ResNet-v2](http://arxiv.org/abs/1602.07261)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/inception_resnet_v2.py)|[inception_resnet_v2.tar.gz](http://download.tensorflow.org/models/inception_resnet_v2_2016_08_30.tar.gz)|80.4|95.3|
[ResNet V1 50](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_50.tar.gz](http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz)|75.2|92.2|
[ResNet V1 101](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_101.tar.gz](http://download.tensorflow.org/models/resnet_v1_101_2016_08_28.tar.gz)|76.4|92.9|
[ResNet V1 152](https://arxiv.org/abs/1512.03385)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v1.py)|[resnet_v1_152.tar.gz](http://download.tensorflow.org/models/resnet_v1_152_2016_08_28.tar.gz)|76.8|93.2|
[ResNet V2 50](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_50.tar.gz](http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz)|75.6|92.8|
[ResNet V2 101](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_101.tar.gz](http://download.tensorflow.org/models/resnet_v2_101_2017_04_14.tar.gz)|77.0|93.7|
[ResNet V2 152](https://arxiv.org/abs/1603.05027)^|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/resnet_v2.py)|[resnet_v2_152.tar.gz](http://download.tensorflow.org/models/resnet_v2_152_2017_04_14.tar.gz)|77.8|94.1|
[VGG 16](http://arxiv.org/abs/1409.1556.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/vgg.py)|[vgg_16.tar.gz](http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz)|71.5|89.8|
[VGG 19](http://arxiv.org/abs/1409.1556.pdf)|[Code](https://github.com/tensorflow/models/blob/master/slim/nets/vgg.py)|[vgg_19.tar.gz](http://download.tensorflow.org/models/vgg_19_2016_08_28.tar.gz)|71.1|89.8|
^ ResNet V2 models use Inception pre-processing and an input image size of 299 (use
`--preprocessing_name inception --eval_image_size 299` when using
`eval_image_classifier.py`). Performance numbers for ResNet V2 models are
reported on the ImageNet validation set.
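For example, a ResNet V2 50 evaluation might be launched as sketched below
(the paths, dataset directory, and checkpoint file name are placeholders; the
preprocessing flags are the ones noted above):

```shell
$ python eval_image_classifier.py \
    --checkpoint_path=/tmp/checkpoints/resnet_v2_50.ckpt \
    --dataset_name=imagenet \
    --dataset_split_name=validation \
    --dataset_dir=/tmp/imagenet \
    --model_name=resnet_v2_50 \
    --preprocessing_name=inception \
    --eval_image_size=299
```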
Here is an example of how to download the Inception V3 checkpoint:
...@@ -344,10 +351,10 @@ following error:
```bash
InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1001] rhs shape= [1000]
```
This is due to the fact that the VGG and ResNet V1 final layers have only 1000
outputs rather than 1001.
To fix this issue, you can set the `--labels_offset=1` flag. This results in
the ImageNet labels being shifted down by one:
...
...@@ -16,7 +16,7 @@ The results described below are based on model trained on multi-gpu and
multi-machine settings. It has been simplified to run on only one machine
for open source purpose.
<b>Dataset</b>
We used the Gigaword dataset described in [Rush et al. A Neural Attention Model
for Sentence Summarization](https://arxiv.org/abs/1509.00685).
...
...@@ -157,7 +157,7 @@ class PTBModel(object):
(cell_output, state) = cell(inputs[:, time_step, :], state)
outputs.append(cell_output)
output = tf.reshape(tf.stack(axis=1, values=outputs), [-1, size])
softmax_w = tf.get_variable(
"softmax_w", [size, vocab_size], dtype=data_type())
softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type())
...