"vscode:/vscode.git/clone" did not exist on "798e1d0cfaa0dfc44e9925fb73eecb93ea0c71f1"
Unverified commit 84da970e authored by Yanhui Liang, committed by GitHub

Add minigo (#3955)

* Add minigo

* Fix comments and make python version compatible
# MiniGo
This is a simplified implementation of MiniGo based on the code provided by the authors: [MiniGo](https://github.com/tensorflow/minigo).
MiniGo is a minimalist Go engine modeled after AlphaGo Zero and built on MuGo. The current implementation consists of three main modules: the DualNet model, Monte Carlo Tree Search (MCTS), and Go domain knowledge. The **model** part is currently our focus.
This implementation supports model training and validation, and also provides evaluation between two Go models.
## DualNet Model
The input to the neural network is a [board_size * board_size * 17] image stack
comprising 17 binary feature planes. Eight feature planes consist of binary values
indicating the presence of the current player's stones; a further eight feature
planes represent the corresponding features for the opponent's stones; the final
feature plane represents the color to play, and has a constant value of either 1
if black is to play or 0 if white is to play. Check `features.py` for more details.
In the MiniGo implementation, the input features are processed by a residual tower
that consists of a single convolutional block followed by either 9 or 19
residual blocks.
The convolutional block applies the following modules:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
Each residual block applies the following modules sequentially to its input:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
5. Batch normalization
6. A skip connection that adds the input to the block
7. A rectifier non-linearity
Note: num_filter is 128 for 19 x 19 board size, and 32 for 9 x 9 board size.
The output of the residual tower is passed into two separate "heads" for
computing the policy and value respectively. The policy head applies the
following modules:
1. A convolution of 2 filters of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer that outputs a vector of size (board_size * board_size + 1) corresponding to logit probabilities for all intersections and the pass move
The value head applies the following modules:
1. A convolution of 1 filter of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer to a hidden layer of size 256 for 19 x 19
board size and 64 for 9x9 board size
5. A rectifier non-linearity
6. A fully connected linear layer to a scalar
7. A tanh non-linearity outputting a scalar in the range [-1, 1]
The overall network depth, in the 10 or 20 block network, is 19 or 39
parameterized layers respectively for the residual tower, plus an additional 2
layers for the policy head and 3 layers for the value head.
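For reference, the convolutional and residual blocks listed above can be sketched in a few lines of TensorFlow 1.x. This is an illustrative sketch only; the actual implementation, including the policy and value heads, lives in `dualnet_model.py`.
```
import tensorflow as tf

def conv_block(inputs, num_filters, training):
  # 3x3 convolution -> batch normalization -> ReLU, as described above.
  conv = tf.layers.conv2d(inputs, num_filters, [3, 3], padding='same')
  return tf.nn.relu(tf.layers.batch_normalization(conv, training=training))

def residual_block(inputs, num_filters, training):
  # Two convolution/batch-norm stages, a skip connection, and a final ReLU.
  out = conv_block(inputs, num_filters, training)
  out = tf.layers.conv2d(out, num_filters, [3, 3], padding='same')
  out = tf.layers.batch_normalization(out, training=training)
  return tf.nn.relu(inputs + out)
```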
## Getting Started
Please follow the [instructions](https://github.com/tensorflow/minigo/blob/master/README.md#getting-started) in the original MiniGo repo to set up the environment.
## Training Model
One iteration of reinforcement learning consists of the following steps:
- Bootstrap: initializes a random model.
- Selfplay: plays games with the latest model, producing data used for training.
- Gather: groups games played with the same model into larger files of tf.Examples.
- Train: trains a new model with the selfplay results from the most recent N
generations.
Run `minigo.py`.
```
python minigo.py
```
## Validating Model
Run `minigo.py` with the `--validation` argument:
```
python minigo.py --validation
```
The `--validation` argument generates a holdout dataset for model validation.
## Evaluating MiniGo Models
Run `minigo.py` with the `--evaluation` argument:
```
python minigo.py --evaluation
```
The `--evaluation` argument invokes an evaluation between the latest model and the current best model.
## Testing Pipeline
As the whole RL pipeline may take hours to train even for a 9x9 board size, we provide a dummy model with a `--debug` mode for testing purposes.
Run `minigo.py` with the `--debug` argument:
```
python minigo.py --debug
```
The `--debug` argument runs the pipeline with a dummy model for testing purposes.
Validation and evaluation can also be tested with the dummy model by combining their corresponding arguments with `--debug`.
To test validation, run the following command:
```
python minigo.py --debug --validation
```
To test evaluation, run the following command:
```
python minigo.py --debug --evaluation
```
To test both validation and evaluation, run the following command:
```
python minigo.py --debug --validation --evaluation
```
## MCTS and Go features (TODO)
Cleanup of the MCTS and Go features code is still to be done.
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Logic for dealing with coordinates.
This introduces some helpers and terminology that are used throughout MiniGo.
MiniGo Coordinate: This is a tuple of the form (row, column) indexed from
(0, 0) at the upper-left.
Flattened Coordinate: This is a number ranging from 0 to N^2 (so N^2 + 1
possible values). The extra value N^2 is used to mark a 'pass' move.
SGF Coordinate: Coordinate used for SGF serialization format. Coordinates use
two-letter pairs having the form (column, row) indexed from the upper-left
where 0, 0 = 'aa'.
KGS Coordinate: Human-readable coordinate string indexed from bottom left, with
the first character a capital letter for the column and the second a number
from 1-19 for the row. Note that KGS chooses to skip the letter 'I' due to
its similarity with 'l' (lowercase 'L').
PYGTP Coordinate: Tuple coordinate indexed starting at 1,1 from bottom-left
in the format (column, row)
So, for a 19x19,
Coord Type      upper_left      upper_right     pass
-------------------------------------------------------
minigo coord    (0, 0)          (0, 18)         None
flat            0               18              361
SGF             'aa'            'sa'            ''
KGS             'A19'           'T19'           'pass'
pygtp           (1, 19)         (19, 19)        (0, 0)
"""
import gtp
# We provide more than 19 entries here in case of boards larger than 19 x 19.
_SGF_COLUMNS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
_KGS_COLUMNS = 'ABCDEFGHJKLMNOPQRSTUVWXYZ'
def from_flat(board_size, flat):
"""Converts from a flattened coordinate to a MiniGo coordinate."""
if flat == board_size * board_size:
return None
return divmod(flat, board_size)
def to_flat(board_size, coord):
"""Converts from a MiniGo coordinate to a flattened coordinate."""
if coord is None:
return board_size * board_size
return board_size * coord[0] + coord[1]
def from_sgf(sgfc):
"""Converts from an SGF coordinate to a MiniGo coordinate."""
if not sgfc:
return None
return _SGF_COLUMNS.index(sgfc[1]), _SGF_COLUMNS.index(sgfc[0])
def to_sgf(coord):
"""Converts from a MiniGo coordinate to an SGF coordinate."""
if coord is None:
return ''
return _SGF_COLUMNS[coord[1]] + _SGF_COLUMNS[coord[0]]
def from_kgs(board_size, kgsc):
"""Converts from a KGS coordinate to a MiniGo coordinate."""
if kgsc == 'pass':
return None
kgsc = kgsc.upper()
col = _KGS_COLUMNS.index(kgsc[0])
row_from_bottom = int(kgsc[1:])
return board_size - row_from_bottom, col
def to_kgs(board_size, coord):
"""Converts from a MiniGo coordinate to a KGS coordinate."""
if coord is None:
return 'pass'
y, x = coord
return '{}{}'.format(_KGS_COLUMNS[x], board_size - y)
def from_pygtp(board_size, pygtpc):
"""Converts from a pygtp coordinate to a MiniGo coordinate."""
# GTP has a notion of both a Pass and a Resign, both of which are mapped to
# None, so the conversion is not precisely bijective.
if pygtpc in (gtp.PASS, gtp.RESIGN):
return None
return board_size - pygtpc[1], pygtpc[0] - 1
def to_pygtp(board_size, coord):
"""Converts from a MiniGo coordinate to a pygtp coordinate."""
if coord is None:
return gtp.PASS
return coord[1] + 1, board_size - coord[0]
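# Example (illustrative) of how the conversions above relate on a 19x19 board:
#   from_flat(19, 72)      == (3, 15)
#   to_kgs(19, (3, 15))    == 'Q16'
#   to_sgf((3, 15))        == 'pd'
#   to_pygtp(19, (3, 15))  == (16, 16)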
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains utility and supporting functions for DualNet.
This module provides the model interface, including functions for DualNet model
bootstrap, training, validation, loading and exporting.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import tensorflow as tf # pylint: disable=g-bad-import-order
import dualnet_model
import features
import preprocessing
import symmetries
class DualNetRunner(object):
"""The DualNetRunner class for the complete model with graph and weights.
This class can restore the model from saved files, and provide inference for
given examples.
"""
def __init__(self, save_file, params):
"""Initialize the dual network from saved model/checkpoints.
Args:
save_file: Path where model parameters were previously saved. For example:
'/tmp/minigo/models_dir/000000-bootstrap/'
params: An object with hyperparameters for DualNetRunner
"""
self.save_file = save_file
self.hparams = params
self.inference_input = None
self.inference_output = None
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
self.sess = tf.Session(graph=tf.Graph(), config=config)
self.initialize_graph()
def initialize_graph(self):
"""Initialize the graph with saved model."""
with self.sess.graph.as_default():
input_features, labels = get_inference_input(self.hparams)
estimator_spec = dualnet_model.model_fn(
input_features, labels, tf.estimator.ModeKeys.PREDICT, self.hparams)
self.inference_input = input_features
self.inference_output = estimator_spec.predictions
if self.save_file is not None:
self.initialize_weights(self.save_file)
else:
self.sess.run(tf.global_variables_initializer())
def initialize_weights(self, save_file):
"""Initialize the weights from the given save_file.
Assumes that the graph has been constructed, and the save_file contains
weights that match the graph. Used to set the weights to a different version
of the player without redefining the entire graph.
Args:
save_file: Path where model parameters were previously saved.
"""
tf.train.Saver().restore(self.sess, save_file)
def run(self, position, use_random_symmetry=True):
"""Compute the policy and value output for a given position.
Args:
position: A given go board status
use_random_symmetry: Apply random symmetry (defined in symmetries.py) to
the extracted feature (defined in features.py) of the given position
Returns:
prob, value: The policy and value output (defined in dualnet_model.py)
"""
probs, values = self.run_many(
[position], use_random_symmetry=use_random_symmetry)
return probs[0], values[0]
def run_many(self, positions, use_random_symmetry=True):
"""Compute the policy and value output for given positions.
Args:
positions: A list of positions for go board status
use_random_symmetry: Apply random symmetry (defined in symmetries.py) to
the extracted features (defined in features.py) of the given positions
Returns:
probabilities, value: The policy and value outputs (defined in
dualnet_model.py)
"""
def _extract_features(positions):
return features.extract_features(self.hparams.board_size, positions)
processed = list(map(_extract_features, positions))
# processed = [
# features.extract_features(self.hparams.board_size, p) for p in positions]
if use_random_symmetry:
syms_used, processed = symmetries.randomize_symmetries_feat(processed)
# feed_dict is a dict object to provide the input examples for the step of
# inference. sess.run() returns the inference predictions (indicated by
# self.inference_output) of the given input as outputs
outputs = self.sess.run(
self.inference_output, feed_dict={self.inference_input: processed})
probabilities, value = outputs['policy_output'], outputs['value_output']
if use_random_symmetry:
probabilities = symmetries.invert_symmetries_pi(
self.hparams.board_size, syms_used, probabilities)
return probabilities, value
def get_inference_input(params):
"""Set up placeholders for input features/labels.
Args:
params: An object to indicate the hyperparameters of the model.
Returns:
The features and output tensors that get passed into model_fn. Check
dualnet_model.py for more details on the model's input and output.
"""
input_features = tf.placeholder(
tf.float32, [None, params.board_size, params.board_size,
features.NEW_FEATURES_PLANES],
name='pos_tensor')
labels = {
'pi_tensor': tf.placeholder(
tf.float32, [None, params.board_size * params.board_size + 1]),
'value_tensor': tf.placeholder(tf.float32, [None])
}
return input_features, labels
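# For example (illustrative): with params.board_size == 9, input_features is a
# placeholder of shape [None, 9, 9, 17] and labels['pi_tensor'] has shape
# [None, 82] (81 intersections plus the pass move).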
def bootstrap(working_dir, params):
"""Initialize a tf.Estimator run with random initial weights.
Args:
working_dir: The directory where tf.estimator will drop logs,
checkpoints, and so on
params: hyperparams of the model.
"""
# Forge an initial checkpoint with the name that subsequent Estimator will
# expect to find.
estimator_initial_checkpoint_name = 'model.ckpt-1'
save_file = os.path.join(working_dir,
estimator_initial_checkpoint_name)
sess = tf.Session()
with sess.graph.as_default():
input_features, labels = get_inference_input(params)
dualnet_model.model_fn(
input_features, labels, tf.estimator.ModeKeys.PREDICT, params)
sess.run(tf.global_variables_initializer())
tf.train.Saver().save(sess, save_file)
def export_model(working_dir, model_path):
"""Take the latest checkpoint and export it to model_path for selfplay.
Assumes that all relevant model files are prefixed by the same name.
(For example, foo.index, foo.meta and foo.data-00000-of-00001).
Args:
working_dir: The directory where tf.estimator keeps its checkpoints.
model_path: Either a local path or a gs:// path to export model to.
"""
latest_checkpoint = tf.train.latest_checkpoint(working_dir)
all_checkpoint_files = tf.gfile.Glob(latest_checkpoint + '*')
for filename in all_checkpoint_files:
suffix = filename.partition(latest_checkpoint)[2]
destination_path = model_path + suffix
tf.gfile.Copy(filename, destination_path)
def train(working_dir, tf_records, generation_num, params):
"""Train the model for a specific generation.
Args:
working_dir: The model working directory to save model parameters,
drop logs, checkpoints, and so on.
tf_records: A list of tf_record filenames for training input.
generation_num: The generation to be trained.
params: hyperparams of the model.
Raises:
ValueError: if generation_num is not greater than 0.
"""
if generation_num <= 0:
raise ValueError('Model 0 is random weights')
estimator = tf.estimator.Estimator(
dualnet_model.model_fn, model_dir=working_dir, params=params)
max_steps = (generation_num * params.examples_per_generation
// params.batch_size)
profiler_hook = tf.train.ProfilerHook(output_dir=working_dir, save_secs=600)
def input_fn():
return preprocessing.get_input_tensors(
params, params.batch_size, tf_records)
estimator.train(
input_fn, hooks=[profiler_hook], max_steps=max_steps)
def validate(working_dir, tf_records, params):
"""Perform model validation on the hold out data.
Args:
working_dir: The model working directory.
tf_records: A list of tf_records filenames for holdout data.
params: hyperparams of the model.
"""
estimator = tf.estimator.Estimator(
dualnet_model.model_fn, model_dir=working_dir, params=params)
def input_fn():
return preprocessing.get_input_tensors(
params, params.batch_size, tf_records, filter_amount=0.05)
estimator.evaluate(input_fn, steps=1000)
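# Illustrative end-to-end use of this module (a sketch only; `params` is
# assumed to be the hyperparameter object constructed in minigo.py, and the
# paths and tf_records lists are placeholders):
#   bootstrap(working_dir, params)              # write random initial weights
#   export_model(working_dir, '/tmp/minigo/models/000000-bootstrap')
#   train(working_dir, tf_records, generation_num=1, params=params)
#   validate(working_dir, holdout_tf_records, params)
#   net = DualNetRunner('/tmp/minigo/models/000000-bootstrap', params)
#   probs, value = net.run(position)            # position: a go.Position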
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Defines DualNet model, the architecture of the policy and value network.
The input to the neural network is a [board_size * board_size * 17] image stack
comprising 17 binary feature planes. Eight feature planes consist of binary values
indicating the presence of the current player's stones; a further eight feature
planes represent the corresponding features for the opponent's stones; the final
feature plane represents the color to play, and has a constant value of either 1
if black is to play or 0 if white is to play. Check 'features.py' for more details.
In the MiniGo implementation, the input features are processed by a residual tower
that consists of a single convolutional block followed by either 9 or 19
residual blocks.
The convolutional block applies the following modules:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
Each residual block applies the following modules sequentially to its input:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
5. Batch normalization
6. A skip connection that adds the input to the block
7. A rectifier non-linearity
Note: num_filter is 128 for 19 x 19 board size, and 32 for 9 x 9 board size.
The output of the residual tower is passed into two separate "heads" for
computing the policy and value respectively. The policy head applies the
following modules:
1. A convolution of 2 filters of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer that outputs a vector of size 19^2 + 1 = 362
corresponding to logit probabilities for all intersections and the pass move
The value head applies the following modules:
1. A convolution of 1 filter of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer to a hidden layer of size 256 for 19 x 19
board size and 64 for 9x9 board size
5. A rectifier non-linearity
6. A fully connected linear layer to a scalar
7. A tanh non-linearity outputting a scalar in the range [-1, 1]
The overall network depth, in the 10 or 20 block network, is 19 or 39
parameterized layers respectively for the residual tower, plus an additional 2
layers for the policy head and 3 layers for the value head.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
_BATCH_NORM_DECAY = 0.997
_BATCH_NORM_EPSILON = 1e-5
def _batch_norm(inputs, training, center=True, scale=True):
"""Performs a batch normalization using a standard set of parameters."""
return tf.layers.batch_normalization(
inputs=inputs, momentum=_BATCH_NORM_DECAY, epsilon=_BATCH_NORM_EPSILON,
center=center, scale=scale, fused=True, training=training)
def _conv2d(inputs, filters, kernel_size):
"""Performs 2D convolution with a standard set of parameters."""
return tf.layers.conv2d(
inputs=inputs, filters=filters, kernel_size=kernel_size,
padding='same')
def _conv_block(inputs, filters, kernel_size, training):
"""A convolutional block.
Args:
inputs: A tensor representing a batch of input features with shape
[BATCH_SIZE, board_size, board_size, features.NEW_FEATURES_PLANES].
filters: The number of filters for network layers in residual tower.
kernel_size: The kernel to be used in conv2d.
training: Either True or False, whether we are currently training the
model. Needed for batch norm.
Returns:
The output tensor of the convolutional block layer.
"""
conv = _conv2d(inputs, filters, kernel_size)
batchn = _batch_norm(conv, training)
output = tf.nn.relu(batchn)
return output
def _res_block(inputs, filters, kernel_size, training):
"""A residual block.
Args:
inputs: A tensor representing a batch of input features with shape
[BATCH_SIZE, board_size, board_size, features.NEW_FEATURES_PLANES].
filters: The number of filters for network layers in residual tower.
kernel_size: The kernel to be used in conv2d.
training: Either True or False, whether we are currently training the
model. Needed for batch norm.
Returns:
The output tensor of the residual block layer.
"""
initial_output = _conv_block(inputs, filters, kernel_size, training)
int_layer2_conv = _conv2d(initial_output, filters, kernel_size)
int_layer2_batchn = _batch_norm(int_layer2_conv, training)
output = tf.nn.relu(inputs + int_layer2_batchn)
return output
class Model(object):
"""Base class for building the DualNet Model."""
def __init__(self, num_filters, num_shared_layers, fc_width, board_size):
"""Initialize a model for computing the policy and value in RL.
Args:
num_filters: Number of filters (AlphaGoZero used 256). We use 128 by
default for a 19x19 go board, and 32 for 9x9 size.
num_shared_layers: Number of shared residual blocks. AGZ used both 19
and 39. Here we use 19 for 19x19 size and 9 for 9x9 size because it's
faster to train.
fc_width: Dimensionality of the fully connected linear layer.
board_size: A single integer for the board size.
"""
self.num_filters = num_filters
self.num_shared_layers = num_shared_layers
self.fc_width = fc_width
self.board_size = board_size
self.kernel_size = [3, 3] # kernel size is from AGZ paper
def __call__(self, inputs, training):
"""Add operations to classify a batch of input Go features.
Args:
inputs: A Tensor representing a batch of input Go features with shape
[BATCH_SIZE, board_size, board_size, features.NEW_FEATURES_PLANES]
training: A boolean. Set to True to add operations required only when
training the classifier.
Returns:
policy_logits: A vector of size self.board_size * self.board_size + 1
corresponding to the policy logit probabilities for all intersections
and the pass move.
value_logits: A scalar for the value logits output
"""
initial_output = _conv_block(
inputs=inputs, filters=self.num_filters,
kernel_size=self.kernel_size, training=training)
# the shared stack
shared_output = initial_output
for _ in range(self.num_shared_layers):
shared_output = _res_block(
inputs=shared_output, filters=self.num_filters,
kernel_size=self.kernel_size, training=training)
# policy head
policy_conv2d = _conv2d(inputs=shared_output, filters=2, kernel_size=[1, 1])
policy_batchn = _batch_norm(inputs=policy_conv2d, training=training,
center=False, scale=False)
policy_relu = tf.nn.relu(policy_batchn)
policy_logits = tf.layers.dense(
tf.reshape(policy_relu, [-1, self.board_size * self.board_size * 2]),
self.board_size * self.board_size + 1)
# value head
value_conv2d = _conv2d(shared_output, filters=1, kernel_size=[1, 1])
value_batchn = _batch_norm(value_conv2d, training,
center=False, scale=False)
value_relu = tf.nn.relu(value_batchn)
value_fc_hidden = tf.nn.relu(tf.layers.dense(
tf.reshape(value_relu, [-1, self.board_size * self.board_size]),
self.fc_width))
value_logits = tf.reshape(tf.layers.dense(value_fc_hidden, 1), [-1])
return policy_logits, value_logits
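# For example (illustrative): with board_size == 9 and a batch of 16 input
# feature stacks of shape [16, 9, 9, 17], __call__ returns policy_logits of
# shape [16, 82] (81 intersections plus the pass move) and value_logits of
# shape [16].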
def model_fn(features, labels, mode, params, config=None): # pylint: disable=unused-argument
"""DualNet model function.
Args:
features: tensor with shape
[BATCH_SIZE, self.board_size, self.board_size,
features.NEW_FEATURES_PLANES]
labels: dict from string to tensor with shape
'pi_tensor': [BATCH_SIZE, self.board_size * self.board_size + 1]
'value_tensor': [BATCH_SIZE]
mode: a tf.estimator.ModeKeys (batchnorm params update for TRAIN only)
params: an object of hyperparams
config: ignored; is required by Estimator API.
Returns:
EstimatorSpec parameterized according to the input params and the current
mode.
"""
model = Model(params.num_filters, params.num_shared_layers, params.fc_width,
params.board_size)
policy_logits, value_logits = model(
features, mode == tf.estimator.ModeKeys.TRAIN)
policy_output = tf.nn.softmax(policy_logits, name='policy_output')
value_output = tf.nn.tanh(value_logits, name='value_output')
# Calculate the model loss. The loss function is the sum of the mean-squared
# error, the cross-entropy loss and the L2 regularization term.
# Cross-entropy of policy
policy_entropy = -tf.reduce_mean(tf.reduce_sum(
policy_output * tf.log(policy_output), axis=1))
policy_cost = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(
logits=policy_logits, labels=labels['pi_tensor']))
# Mean squared error
value_cost = tf.reduce_mean(
tf.square(value_output - labels['value_tensor']))
# L2 term
l2_cost = params.l2_strength * tf.add_n(
[tf.nn.l2_loss(v) for v in tf.trainable_variables()
if 'bias' not in v.name])
# The loss function
combined_cost = policy_cost + value_cost + l2_cost
# Get model train ops
global_step = tf.train.get_or_create_global_step()
boundaries = [int(1e6), int(2e6)]
values = [1e-2, 1e-3, 1e-4]
learning_rate = tf.train.piecewise_constant(
global_step, boundaries, values)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train_op = tf.train.MomentumOptimizer(
learning_rate, params.momentum).minimize(
combined_cost, global_step=global_step)
# Create multiple tensors for logging purpose
metric_ops = {
'accuracy': tf.metrics.accuracy(labels=labels['pi_tensor'],
predictions=policy_output,
name='accuracy_op'),
'policy_cost': tf.metrics.mean(policy_cost),
'value_cost': tf.metrics.mean(value_cost),
'l2_cost': tf.metrics.mean(l2_cost),
'policy_entropy': tf.metrics.mean(policy_entropy),
'combined_cost': tf.metrics.mean(combined_cost),
}
for metric_name, metric_op in metric_ops.items():
tf.summary.scalar(metric_name, metric_op[1])
# Return tf.estimator.EstimatorSpec
return tf.estimator.EstimatorSpec(
mode=mode,
predictions={
'policy_output': policy_output,
'value_output': value_output,
},
loss=combined_cost,
train_op=train_op,
eval_metric_ops=metric_ops)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Evaluation of playing games between two neural nets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import time
import go
from gtp_wrapper import MCTSPlayer
import sgf_wrapper
def play_match(params, black_net, white_net, games, readouts,
sgf_dir, verbosity):
"""Plays matches between two neural nets.
Black is declared the winner if it wins by a margin of more than eval_win_rate of the games; otherwise white is declared the winner.
Args:
params: An object of hyperparameters.
black_net: Instance of the DualNetRunner class to play as black.
white_net: Instance of the DualNetRunner class to play as white.
games: Number of games to play; the games are played one after another.
readouts: Number of readouts to perform for each step in each game.
sgf_dir: Directory to write the sgf results.
verbosity: Verbosity to show evaluation process.
Returns:
go.BLACK_NAME if the winner is black_net, otherwise go.WHITE_NAME.
"""
# Create one black player and one white player; they play `games` games in turn.
black = MCTSPlayer(
params.board_size, black_net, verbosity=verbosity, two_player_mode=True,
num_parallel=params.simultaneous_leaves)
white = MCTSPlayer(
params.board_size, white_net, verbosity=verbosity, two_player_mode=True,
num_parallel=params.simultaneous_leaves)
black_name = os.path.basename(black_net.save_file)
white_name = os.path.basename(white_net.save_file)
black_win_counts = 0
white_win_counts = 0
for i in range(games):
num_move = 0 # The move number of the current game
black.initialize_game()
white.initialize_game()
while True:
start = time.time()
active = white if num_move % 2 else black
inactive = black if num_move % 2 else white
current_readouts = active.root.N
while active.root.N < current_readouts + readouts:
active.tree_search()
# print some stats on the search
if verbosity >= 3:
print(active.root.position)
# First, check the roots for hopeless games.
if active.should_resign(): # Force resign
active.set_result(-active.root.position.to_play, was_resign=True)
inactive.set_result(
active.root.position.to_play, was_resign=True)
if active.is_done():
fname = '{:d}-{:s}-vs-{:s}-{:d}.sgf'.format(
int(time.time()), white_name, black_name, i)
with open(os.path.join(sgf_dir, fname), 'w') as f:
sgfstr = sgf_wrapper.make_sgf(
params.board_size, active.position.recent, active.result_string,
black_name=black_name, white_name=white_name)
f.write(sgfstr)
print('Finished game', i, active.result_string)
if active.result_string is not None:
if active.result_string[0] == 'B':
black_win_counts += 1
elif active.result_string[0] == 'W':
white_win_counts += 1
break
move = active.pick_move()
active.play_move(move)
inactive.play_move(move)
dur = time.time() - start
num_move += 1
if (verbosity > 1) or (verbosity == 1 and num_move % 10 == 9):
timeper = (dur / readouts) * 100.0
print(active.root.position)
print('{:d}: {:d} readouts, {:.3f} s/100. ({:.2f} sec)'.format(
num_move, readouts, timeper, dur))
if (black_win_counts - white_win_counts) > params.eval_win_rate * games:
return go.BLACK_NAME
else:
return go.WHITE_NAME
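# Illustrative call (a sketch only; `params` is assumed to provide board_size,
# simultaneous_leaves and eval_win_rate, as wired up in minigo.py):
#   winner = play_match(params, black_net, white_net, games=16, readouts=200,
#                       sgf_dir='/tmp/minigo/sgf', verbosity=1)
#   # winner is go.BLACK_NAME or go.WHITE_NAME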
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Features used by AlphaGo Zero, in approximate order of importance.
Feature          #   Notes
Stone History    16  The stones of each color during the last 8 moves.
Color to play    1   A constant plane: 1 if black is to play, 0 if white.
All features with 8 planes are 1-hot encoded, with plane i marked with 1
only if the feature was equal to i. Any features >= 8 would be marked as 8.
This file includes the features from AlphaGo Zero (AGZ) as NEW_FEATURES.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import go
import numpy as np
def planes(num_planes):
# Decorator that records the number of planes a feature function produces. For
# example, for a 19x19 go board, the stone feature has shape [19, 19, 16],
# where the third dimension is num_planes.
def deco(f):
f.planes = num_planes
return f
return deco
@planes(16)
def stone_features(board_size, position):
"""Create the 16 planes of features for a given position.
Args:
board_size: the go board size.
position: a given go board status.
Returns:
The 16 plane features.
"""
# a bit easier to calculate it with axis 0 being the 16 board states,
# and then roll axis 0 to the end.
features = np.zeros([16, board_size, board_size], dtype=np.uint8)
num_deltas_avail = position.board_deltas.shape[0]
cumulative_deltas = np.cumsum(position.board_deltas, axis=0)
last_eight = np.tile(position.board, [8, 1, 1])
# apply deltas to compute previous board states
last_eight[1:num_deltas_avail + 1] -= cumulative_deltas
# if no more deltas are available, just repeat oldest board.
last_eight[num_deltas_avail + 1:] = last_eight[num_deltas_avail].reshape(
1, board_size, board_size)
features[::2] = last_eight == position.to_play
features[1::2] = last_eight == -position.to_play
return np.rollaxis(features, 0, 3)
@planes(1)
def color_to_play_feature(board_size, position):
if position.to_play == go.BLACK:
return np.ones([board_size, board_size, 1], dtype=np.uint8)
else:
return np.zeros([board_size, board_size, 1], dtype=np.uint8)
NEW_FEATURES = [
stone_features,
color_to_play_feature
]
NEW_FEATURES_PLANES = sum(f.planes for f in NEW_FEATURES)
def extract_features(board_size, position, features=None):
if features is None:
features = NEW_FEATURES
return np.concatenate([feature(board_size, position) for feature in features],
axis=2)
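# For example (illustrative): for a 9x9 position `pos`,
#   extract_features(9, pos).shape == (9, 9, 17)
# since NEW_FEATURES_PLANES == 16 + 1 == 17.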
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Describe the Go game status.
A board is a NxN numpy array.
A Coordinate is a tuple index into the board.
A Move is a (Coordinate c | None).
A PlayerMove is a (Color, Move) tuple
(0, 0) is considered to be the upper left corner of the board, and (18, 0)
is the lower left.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import namedtuple
import copy
import itertools
import coords
import numpy as np
# Represent a board as a numpy array, with 0 for empty, 1 for black, -1 for white.
# This means that swapping colors is as simple as multiplying array by -1.
WHITE, EMPTY, BLACK, FILL, KO, UNKNOWN = range(-1, 5)
# Represents "group not found" in the LibertyTracker object
MISSING_GROUP_ID = -1
BLACK_NAME = 'BLACK'
WHITE_NAME = 'WHITE'
def _check_bounds(board_size, c):
return c[0] % board_size == c[0] and c[1] % board_size == c[1]
def get_neighbors_diagonals(board_size):
all_coords = [(i, j) for i in range(board_size) for j in range(board_size)]
def check_bounds(c):
return _check_bounds(board_size, c)
neighbors = {(x, y): list(filter(check_bounds, [
(x+1, y), (x-1, y), (x, y+1), (x, y-1)])) for x, y in all_coords}
diagonals = {(x, y): list(filter(check_bounds, [
(x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1)])) for x, y in all_coords}
return neighbors, diagonals
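# For example (illustrative), on a 9x9 board the corner point (0, 0) has two
# neighbors and a single diagonal:
#   neighbors, diagonals = get_neighbors_diagonals(9)
#   neighbors[(0, 0)]  == [(1, 0), (0, 1)]
#   diagonals[(0, 0)]  == [(1, 1)]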
class IllegalMove(Exception):
pass
class PlayerMove(namedtuple('PlayerMove', ['color', 'move'])):
pass
class PositionWithContext(namedtuple('SgfPosition',
['position', 'next_move', 'result'])):
pass
def place_stones(board, color, stones):
for s in stones:
board[s] = color
def replay_position(board_size, position, result):
"""Wrapper for a go.Position which replays its history.
Assumes an empty start position! (i.e. no handicap, and history must
be exhaustive.)
Result must be passed in, since a resign cannot be inferred from position
history alone.
for position_w_context in replay_position(board_size, position, result):
    print(position_w_context.position)
"""
if position.n != len(position.recent):
raise ValueError('Position history is incomplete!')
pos = Position(board_size=board_size, komi=position.komi)
for player_move in position.recent:
color, next_move = player_move
yield PositionWithContext(pos, next_move, result)
pos = pos.play_move(next_move, color=color)
def find_reached(board_size, board, c):
color = board[c]
chain = set([c])
reached = set()
frontier = [c]
neighbors, _ = get_neighbors_diagonals(board_size)
while frontier:
current = frontier.pop()
chain.add(current)
for n in neighbors[current]:
if board[n] == color and n not in chain:
frontier.append(n)
elif board[n] != color:
reached.add(n)
return chain, reached
def is_koish(board_size, board, c):
"""Check if c is surrounded on all sides by 1 color, and return that color."""
if board[c] != EMPTY:
return None
full_neighbors, _ = get_neighbors_diagonals(board_size)
neighbors = {board[n] for n in full_neighbors[c]}
if len(neighbors) == 1 and EMPTY not in neighbors:
return list(neighbors)[0]
else:
return None
def is_eyeish(board_size, board, c):
"""Check if c is an eye, for the purpose of restricting MC rollouts."""
# pass is fine.
if c is None:
return
color = is_koish(board_size, board, c)
if color is None:
return None
diagonal_faults = 0
_, all_diagonals = get_neighbors_diagonals(board_size)
diagonals = all_diagonals[c]
if len(diagonals) < 4:
diagonal_faults += 1
for d in diagonals:
if not board[d] in (color, EMPTY):
diagonal_faults += 1
if diagonal_faults > 1:
return None
else:
return color
class Group(namedtuple('Group', ['id', 'stones', 'liberties', 'color'])):
"""
stones: a frozenset of Coordinates belonging to this group
liberties: a frozenset of Coordinates that are empty and adjacent to
this group.
color: color of this group
"""
def __eq__(self, other):
return (self.stones == other.stones and self.liberties == other.liberties
and self.color == other.color)
class LibertyTracker(object):
@staticmethod
def from_board(board_size, board):
board = np.copy(board)
curr_group_id = 0
lib_tracker = LibertyTracker(board_size)
for color in (WHITE, BLACK):
while color in board:
curr_group_id += 1
found_color = np.where(board == color)
coord = found_color[0][0], found_color[1][0]
chain, reached = find_reached(board_size, board, coord)
liberties = frozenset(r for r in reached if board[r] == EMPTY)
new_group = Group(curr_group_id, frozenset(
chain), liberties, color)
lib_tracker.groups[curr_group_id] = new_group
for s in chain:
lib_tracker.group_index[s] = curr_group_id
place_stones(board, FILL, chain)
lib_tracker.max_group_id = curr_group_id
liberty_counts = np.zeros([board_size, board_size], dtype=np.uint8)
for group in lib_tracker.groups.values():
num_libs = len(group.liberties)
for s in group.stones:
liberty_counts[s] = num_libs
lib_tracker.liberty_cache = liberty_counts
return lib_tracker
def __init__(self, board_size, group_index=None, groups=None,
liberty_cache=None, max_group_id=1):
# group_index: a NxN numpy array of group_ids. -1 means no group
# groups: a dict of group_id to groups
# liberty_cache: a NxN numpy array of liberty counts
self.board_size = board_size
self.group_index = group_index if group_index is not None else - \
np.ones([board_size, board_size], dtype=np.int32)
self.groups = groups or {}
self.liberty_cache = liberty_cache if liberty_cache is not None else \
np.zeros([board_size, board_size], dtype=np.uint8)
self.max_group_id = max_group_id
self.neighbors, _ = get_neighbors_diagonals(board_size)
def __deepcopy__(self, memodict={}):
new_group_index = np.copy(self.group_index)
new_lib_cache = np.copy(self.liberty_cache)
# shallow copy
new_groups = copy.copy(self.groups)
return LibertyTracker(
self.board_size, new_group_index, new_groups,
liberty_cache=new_lib_cache, max_group_id=self.max_group_id)
def add_stone(self, color, c):
assert self.group_index[c] == MISSING_GROUP_ID
captured_stones = set()
opponent_neighboring_group_ids = set()
friendly_neighboring_group_ids = set()
empty_neighbors = set()
for n in self.neighbors[c]:
neighbor_group_id = self.group_index[n]
if neighbor_group_id != MISSING_GROUP_ID:
neighbor_group = self.groups[neighbor_group_id]
if neighbor_group.color == color:
friendly_neighboring_group_ids.add(neighbor_group_id)
else:
opponent_neighboring_group_ids.add(neighbor_group_id)
else:
empty_neighbors.add(n)
new_group = self._create_group(color, c, empty_neighbors)
for group_id in friendly_neighboring_group_ids:
new_group = self._merge_groups(group_id, new_group.id)
# new_group becomes stale as _update_liberties and
# _handle_captures are called; must refetch with self.groups[new_group.id]
for group_id in opponent_neighboring_group_ids:
neighbor_group = self.groups[group_id]
if len(neighbor_group.liberties) == 1:
captured = self._capture_group(group_id)
captured_stones.update(captured)
else:
self._update_liberties(group_id, remove={c})
self._handle_captures(captured_stones)
# suicide is illegal
if len(self.groups[new_group.id].liberties) == 0:
raise IllegalMove('Move at {} would commit suicide!\n'.format(c))
return captured_stones
def _create_group(self, color, c, liberties):
self.max_group_id += 1
new_group = Group(self.max_group_id, frozenset([c]), liberties, color)
self.groups[new_group.id] = new_group
self.group_index[c] = new_group.id
self.liberty_cache[c] = len(liberties)
return new_group
def _merge_groups(self, group1_id, group2_id):
group1 = self.groups[group1_id]
group2 = self.groups[group2_id]
self.groups[group1_id] = Group(
group1_id, group1.stones | group2.stones, group1.liberties,
group1.color)
del self.groups[group2_id]
for s in group2.stones:
self.group_index[s] = group1_id
self._update_liberties(
group1_id, add=group2.liberties, remove=group2.stones)
return group1
def _capture_group(self, group_id):
dead_group = self.groups[group_id]
del self.groups[group_id]
for s in dead_group.stones:
self.group_index[s] = MISSING_GROUP_ID
self.liberty_cache[s] = 0
return dead_group.stones
def _update_liberties(self, group_id, add=set(), remove=set()):
group = self.groups[group_id]
new_libs = (group.liberties | add) - remove
self.groups[group_id] = Group(
group_id, group.stones, new_libs, group.color)
new_lib_count = len(new_libs)
for s in self.groups[group_id].stones:
self.liberty_cache[s] = new_lib_count
def _handle_captures(self, captured_stones):
for s in captured_stones:
for n in self.neighbors[s]:
group_id = self.group_index[n]
if group_id != MISSING_GROUP_ID:
self._update_liberties(group_id, add={s})
class Position(object):
def __init__(self, board_size, board=None, n=0, komi=7.5, caps=(0, 0),
lib_tracker=None, ko=None, recent=tuple(),
board_deltas=None, to_play=BLACK):
"""
board_size: the go board size.
board: a numpy array
n: an int representing moves played so far
komi: a float, representing points given to the second player.
caps: a (int, int) tuple of captures for B, W.
lib_tracker: a LibertyTracker object
ko: a Move
recent: a tuple of PlayerMoves, such that recent[-1] is the last move.
board_deltas: a np.array of shape (n, go.N, go.N) representing changes
made to the board at each move (played move and captures).
Should satisfy next_pos.board - next_pos.board_deltas[0] == pos.board
to_play: BLACK or WHITE
"""
assert type(recent) is tuple
self.board_size = board_size
self.board = board if board is not None else \
np.zeros([board_size, board_size], dtype=np.int8)
self.n = n
self.komi = komi
self.caps = caps
self.lib_tracker = lib_tracker or LibertyTracker.from_board(
self.board_size, self.board)
self.ko = ko
self.recent = recent
self.board_deltas = board_deltas if board_deltas is not None else \
np.zeros([0, board_size, board_size], dtype=np.int8)
self.to_play = to_play
self.last_eight = None
self.neighbors, _ = get_neighbors_diagonals(board_size)
def __deepcopy__(self, memodict={}):
new_board = np.copy(self.board)
new_lib_tracker = copy.deepcopy(self.lib_tracker)
return Position(
self.board_size, new_board, self.n, self.komi, self.caps,
new_lib_tracker, self.ko, self.recent, self.board_deltas, self.to_play)
def __str__(self):
pretty_print_map = {
WHITE: '\x1b[0;31;47mO',
EMPTY: '\x1b[0;31;43m.',
BLACK: '\x1b[0;31;40mX',
FILL: '#',
KO: '*',
}
board = np.copy(self.board)
captures = self.caps
if self.ko is not None:
place_stones(board, KO, [self.ko])
raw_board_contents = []
for i in range(self.board_size):
row = []
for j in range(self.board_size):
appended = '<' if (
self.recent and (i, j) == self.recent[-1].move) else ' '
row.append(pretty_print_map[board[i, j]] + appended)
row.append('\x1b[0m')
raw_board_contents.append(''.join(row))
row_labels = ['%2d ' % i for i in range(self.board_size, 0, -1)]
annotated_board_contents = [''.join(r) for r in zip(
row_labels, raw_board_contents, row_labels)]
header_footer_rows = [
' ' + ' '.join('ABCDEFGHJKLMNOPQRST'[:self.board_size]) + ' ']
annotated_board = '\n'.join(itertools.chain(
header_footer_rows, annotated_board_contents, header_footer_rows))
details = '\nMove: {}. Captures X: {} O: {}\n'.format(
self.n, *captures)
return annotated_board + details
def is_move_suicidal(self, move):
potential_libs = set()
for n in self.neighbors[move]:
neighbor_group_id = self.lib_tracker.group_index[n]
if neighbor_group_id == MISSING_GROUP_ID:
# at least one liberty after playing here, so not a suicide
return False
neighbor_group = self.lib_tracker.groups[neighbor_group_id]
if neighbor_group.color == self.to_play:
potential_libs |= neighbor_group.liberties
elif len(neighbor_group.liberties) == 1:
# would capture an opponent group if they only had one lib.
return False
# it's possible to suicide by connecting several friendly groups
# each of which had one liberty.
potential_libs -= set([move])
return not potential_libs
def is_move_legal(self, move):
"""Checks that a move is on an empty space, not on ko, and not suicide."""
if move is None:
return True
if self.board[move] != EMPTY:
return False
if move == self.ko:
return False
if self.is_move_suicidal(move):
return False
return True
def all_legal_moves(self):
"""Returns a np.array of size go.N**2 + 1, with 1 = legal, 0 = illegal."""
# by default, every move is legal
legal_moves = np.ones([self.board_size, self.board_size], dtype=np.int8)
# ...unless there is already a stone there
legal_moves[self.board != EMPTY] = 0
# calculate which spots have 4 stones next to them
# padding is because the edge always counts as a lost liberty.
adjacent = np.ones([self.board_size+2, self.board_size+2], dtype=np.int8)
adjacent[1:-1, 1:-1] = np.abs(self.board)
num_adjacent_stones = (adjacent[:-2, 1:-1] + adjacent[1:-1, :-2] +
adjacent[2:, 1:-1] + adjacent[1:-1, 2:])
# Surrounded spots are those that are empty and have 4 adjacent stones.
surrounded_spots = np.multiply(
(self.board == EMPTY),
(num_adjacent_stones == 4))
# Such spots are possibly illegal, unless they are capturing something.
# Iterate over and manually check each spot.
for coord in np.transpose(np.nonzero(surrounded_spots)):
if self.is_move_suicidal(tuple(coord)):
legal_moves[tuple(coord)] = 0
# ...and retaking ko is always illegal
if self.ko is not None:
legal_moves[self.ko] = 0
# and pass is always legal
return np.concatenate([legal_moves.ravel(), [1]])
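# For example (illustrative): on an empty 9x9 board every move (and pass) is
# legal, so Position(board_size=9).all_legal_moves() is a vector of 82 ones.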
def pass_move(self, mutate=False):
pos = self if mutate else copy.deepcopy(self)
pos.n += 1
pos.recent += (PlayerMove(pos.to_play, None),)
pos.board_deltas = np.concatenate((
np.zeros([1, self.board_size, self.board_size], dtype=np.int8),
pos.board_deltas[:6]))
pos.to_play *= -1
pos.ko = None
return pos
def flip_playerturn(self, mutate=False):
pos = self if mutate else copy.deepcopy(self)
pos.ko = None
pos.to_play *= -1
return pos
def get_liberties(self):
return self.lib_tracker.liberty_cache
def play_move(self, c, color=None, mutate=False):
# Obeys CGOS Rules of Play. In short:
# No suicides
# Chinese/area scoring
# Positional superko (this is very crudely approximate at the moment.)
if color is None:
color = self.to_play
pos = self if mutate else copy.deepcopy(self)
if c is None:
pos = pos.pass_move(mutate=mutate)
return pos
if not self.is_move_legal(c):
raise IllegalMove('{} move at {} is illegal: \n{}'.format(
'Black' if self.to_play == BLACK else 'White',
coords.to_kgs(self.board_size, c), self))
potential_ko = is_koish(self.board_size, self.board, c)
place_stones(pos.board, color, [c])
captured_stones = pos.lib_tracker.add_stone(color, c)
place_stones(pos.board, EMPTY, captured_stones)
opp_color = color * -1
new_board_delta = np.zeros([self.board_size, self.board_size],
dtype=np.int8)
new_board_delta[c] = color
place_stones(new_board_delta, color, captured_stones)
if len(captured_stones) == 1 and potential_ko == opp_color:
new_ko = list(captured_stones)[0]
else:
new_ko = None
if pos.to_play == BLACK:
new_caps = (pos.caps[0] + len(captured_stones), pos.caps[1])
else:
new_caps = (pos.caps[0], pos.caps[1] + len(captured_stones))
pos.n += 1
pos.caps = new_caps
pos.ko = new_ko
pos.recent += (PlayerMove(color, c),)
# keep a rolling history of last 7 deltas - that's all we'll need to
# extract the last 8 board states.
pos.board_deltas = np.concatenate((
new_board_delta.reshape(1, self.board_size, self.board_size),
pos.board_deltas[:6]))
pos.to_play *= -1
return pos
def is_game_over(self):
return (len(self.recent) >= 2
and self.recent[-1].move is None
and self.recent[-2].move is None)
def score(self):
"""Return score from B perspective. If W is winning, score is negative."""
working_board = np.copy(self.board)
while EMPTY in working_board:
unassigned_spaces = np.where(working_board == EMPTY)
c = unassigned_spaces[0][0], unassigned_spaces[1][0]
territory, borders = find_reached(self.board_size, working_board, c)
border_colors = set(working_board[b] for b in borders)
X_border = BLACK in border_colors
O_border = WHITE in border_colors
if X_border and not O_border:
territory_color = BLACK
elif O_border and not X_border:
territory_color = WHITE
else:
territory_color = UNKNOWN # dame, or seki
place_stones(working_board, territory_color, territory)
return np.count_nonzero(working_board == BLACK) - np.count_nonzero(
working_board == WHITE) - self.komi
def result(self):
score = self.score()
if score > 0:
return 1
elif score < 0:
return -1
else:
return 0
def result_string(self):
score = self.score()
if score > 0:
return 'B+' + '{:.1f}'.format(score)
elif score < 0:
return 'W+' + '{:.1f}'.format(abs(score))
else:
return 'DRAW'
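# Illustrative usage of Position (a sketch):
#   pos = Position(board_size=9)
#   pos = pos.play_move((4, 4))    # black plays the center point
#   pos = pos.play_move((2, 2))    # white replies
#   print(pos)                     # pretty-printed board with the last move marked
#   pos.score()                    # area score from black's perspective, komi applied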
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Extends gtp.py."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import itertools
import coords
import go
import gtp
import sgf_wrapper
def parse_message(message):
message = gtp.pre_engine(message).strip()
first, rest = (message.split(" ", 1) + [None])[:2]
if first.isdigit():
message_id = int(first)
if rest is not None:
command, arguments = (rest.split(" ", 1) + [None])[:2]
else:
command, arguments = None, None
else:
message_id = None
command, arguments = first, rest
command = command.replace("-", "_") # for kgs extensions.
return message_id, command, arguments
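# For example (illustrative, assuming gtp.pre_engine leaves these strings
# unchanged):
#   parse_message('1 genmove black')          == (1, 'genmove', 'black')
#   parse_message('kgs-chat private bob hi')  == (None, 'kgs_chat', 'private bob hi')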
class KgsExtensionsMixin(gtp.Engine):
def __init__(self, game_obj, name="gtp (python, kgs-chat extensions)",
version="0.1"):
super().__init__(game_obj=game_obj, name=name, version=version)
self.known_commands += ["kgs-chat"]
def send(self, message):
message_id, command, arguments = parse_message(message)
if command in self.known_commands:
try:
retval = getattr(self, "cmd_" + command)(arguments)
response = gtp.format_success(message_id, retval)
sys.stderr.flush()
return response
except ValueError as exception:
return gtp.format_error(message_id, exception.args[0])
else:
return gtp.format_error(message_id, "unknown command: " + command)
# Nice to implement this, as KGS sends it each move.
def cmd_time_left(self, arguments):
pass
def cmd_showboard(self, arguments):
return self._game.showboard()
def cmd_kgs_chat(self, arguments):
try:
arg_list = arguments.split()
msg_type, sender, text = arg_list[0], arg_list[1], arg_list[2:]
text = " ".join(text)
except ValueError:
return "Unparseable message, args: %r" % arguments
return self._game.chat(msg_type, sender, text)
class RegressionsMixin(gtp.Engine):
def cmd_loadsgf(self, arguments):
args = arguments.split()
if len(args) == 2:
file_, movenum = args
movenum = int(movenum)
print("movenum =", movenum, file=sys.stderr)
else:
file_ = args[0]
movenum = None
try:
with open(file_, 'r') as f:
contents = f.read()
except:
raise ValueError("Unreadable file: " + file_)
try:
# This is kinda bad, because replay_sgf is already calling
# 'play move' on its internal position objects, but we really
# want to advance the engine along with us rather than try to
# push in some finished Position object.
for idx, p in enumerate(sgf_wrapper.replay_sgf(contents)):
print("playing #", idx, p.next_move, file=sys.stderr)
self._game.play_move(p.next_move)
if movenum and idx == movenum:
break
except:
raise
class GoGuiMixin(gtp.Engine):
""" GTP extensions of 'analysis commands' for gogui.
We reach into the game_obj (an instance of the players in strategies.py),
and extract stuff from its root nodes, etc. These could be extracted into
methods on the Player object, but it's a little weird to do that on a Player,
which doesn't really care about GTP commands, etc. So instead, we just
violate encapsulation a bit... Suggestions welcome :) """
def __init__(self, game_obj, name="gtp (python, gogui extensions)",
version="0.1"):
super().__init__(game_obj=game_obj, name=name, version=version)
self.known_commands += ["gogui-analyze_commands"]
def cmd_gogui_analyze_commands(self, arguments):
return "\n".join(["var/Most Read Variation/nextplay",
"var/Think a spell/spin",
"pspairs/Visit Heatmap/visit_heatmap",
"pspairs/Q Heatmap/q_heatmap"])
def cmd_nextplay(self, arguments):
return self._game.root.mvp_gg()
def cmd_visit_heatmap(self, arguments):
sort_order = list(range(self._game.size * self._game.size + 1))
sort_order.sort(key=lambda i: self._game.root.child_N[i], reverse=True)
return self.heatmap(sort_order, self._game.root, 'child_N')
def cmd_q_heatmap(self, arguments):
sort_order = list(range(self._game.size * self._game.size + 1))
reverse = True if self._game.root.position.to_play is go.BLACK else False
sort_order.sort(
key=lambda i: self._game.root.child_Q[i], reverse=reverse)
return self.heatmap(sort_order, self._game.root, 'child_Q')
def heatmap(self, sort_order, node, prop):
return "\n".join(["{!s:6} {}".format(
coords.to_kgs(self._game.size, coords.from_flat(self._game.size, key)),
node.__dict__.get(prop)[key])
for key in sort_order if node.child_N[key] > 0][:20])
def cmd_spin(self, arguments):
for i in range(50):
for j in range(100):
self._game.tree_search()
moves = self.cmd_nextplay(None).lower()
moves = moves.split()
colors = "bw" if self._game.root.position.to_play is go.BLACK else "wb"
moves_cols = " ".join(['{} {}'.format(*z)
for z in zip(itertools.cycle(colors), moves)])
print("gogui-gfx: TEXT", "{:.3f} after {}".format(
self._game.root.Q, self._game.root.N), file=sys.stderr, flush=True)
print("gogui-gfx: VAR", moves_cols, file=sys.stderr, flush=True)
return self.cmd_nextplay(None)
class GTPDeluxe(KgsExtensionsMixin, RegressionsMixin, GoGuiMixin):
pass
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A wrapper of gtp and gtp_extensions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import datetime
import os
import sys
import coords
from dualnet import DualNetRunner
import go
import gtp
import gtp_extensions
from strategies import MCTSPlayerMixin, CGOSPlayerMixin
def translate_gtp_colors(gtp_color):
if gtp_color == gtp.BLACK:
return go.BLACK
elif gtp_color == gtp.WHITE:
return go.WHITE
else:
return go.EMPTY
class GtpInterface(object):
def __init__(self, board_size):
self.size = board_size
self.position = None
self.komi = 6.5
self.board_size = board_size
def set_size(self, n):
if n != self.board_size:
raise ValueError(
("Can't handle boardsize {n}!"
"Restart with env var BOARD_SIZE={n}").format(n=n))
def set_komi(self, komi):
self.komi = komi
self.position.komi = komi
def clear(self):
if self.position and len(self.position.recent) > 1:
try:
sgf = self.to_sgf()
with open(datetime.datetime.now().strftime(
"%Y-%m-%d-%H:%M.sgf"), 'w') as f:
f.write(sgf)
except NotImplementedError:
pass
except:
print("Error saving sgf", file=sys.stderr, flush=True)
self.position = go.Position(komi=self.komi)
self.initialize_game(self.position)
def accomodate_out_of_turn(self, color):
if not translate_gtp_colors(color) == self.position.to_play:
self.position.flip_playerturn(mutate=True)
def make_move(self, color, vertex):
c = coords.from_pygtp(vertex)
# let's assume this never happens for now.
# self.accomodate_out_of_turn(color)
return self.play_move(c)
def get_move(self, color):
self.accomodate_out_of_turn(color)
move = self.suggest_move(self.position)
if self.should_resign():
return gtp.RESIGN
return coords.to_pygtp(move)
def final_score(self):
return self.position.result_string()
def showboard(self):
print('\n\n' + str(self.position) + '\n\n', file=sys.stderr)
return True
def should_resign(self):
raise NotImplementedError
def get_score(self):
return self.position.result_string()
def suggest_move(self, position):
raise NotImplementedError
def play_move(self, c):
raise NotImplementedError
def initialize_game(self, position=None):
raise NotImplementedError
def chat(self, msg_type, sender, text):
raise NotImplementedError
def to_sgf(self):
raise NotImplementedError
class MCTSPlayer(MCTSPlayerMixin, GtpInterface):
pass
class CGOSPlayer(CGOSPlayerMixin, GtpInterface):
pass
def make_gtp_instance(board_size, read_file, readouts_per_move=100,
verbosity=1, cgos_mode=False):
n = DualNetRunner(read_file)
if cgos_mode:
instance = CGOSPlayer(board_size, n, seconds_per_move=5,
verbosity=verbosity, two_player_mode=True)
else:
instance = MCTSPlayer(board_size, n, simulations_per_move=readouts_per_move,
verbosity=verbosity, two_player_mode=True)
name = "Somebot-" + os.path.basename(read_file)
gtp_engine = gtp_extensions.GTPDeluxe(instance, name=name)
return gtp_engine
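# A minimal usage sketch (not part of the pipeline): wiring the GTP engine to
# stdin/stdout. It assumes the `send` method of the bundled gtp.Engine class;
# the model path is only illustrative.
#
#   engine = make_gtp_instance(9, '/tmp/minigo/models/000001-somename')
#   for line in sys.stdin:
#       sys.stdout.write(engine.send(line))
#       sys.stdout.flush()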
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Monte Carlo Tree Search implementation.
All terminology here (Q, U, N, p_UCT) uses the same notation as in the
AlphaGo (AG) paper, and more details can be found in the paper. Here is a brief
description:
Q: the action value (mean evaluation) of a move
U: the exploration bonus that guides search control
N: the visit count of a state
p_UCT: the PUCT rule used for action selection
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import coords
import numpy as np
# Exploration constant
c_PUCT = 1.38
# Dirichlet noise, as a function of board_size
def D_NOISE_ALPHA(board_size): return 0.03 * 361 / (board_size ** 2)
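# For example, on a 9x9 board D_NOISE_ALPHA(9) = 0.03 * 361 / 81 ~= 0.134, so the
# Dirichlet noise is flatter (less concentrated) than the 0.03 used for 19x19,
# keeping the expected amount of noise per legal move roughly comparable.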
class DummyNode(object):
"""A fake node of a MCTS search tree.
This node is intended to be a placeholder for the root node, which would
otherwise have no parent node. If all nodes have parents, code becomes
simpler."""
def __init__(self, board_size):
self.board_size = board_size
self.parent = None
self.child_N = collections.defaultdict(float)
self.child_W = collections.defaultdict(float)
class MCTSNode(object):
"""A node of a MCTS search tree.
A node knows how to compute the action scores of all of its children,
so that a decision can be made about which move to explore next. Upon
selecting a move, the children dictionary is updated with a new node.
position: A go.Position instance
fmove: A move (coordinate) that led to this position, as a flattened coord
(raw number between 0-N^2, with None a pass)
parent: A parent MCTSNode.
"""
def __init__(self, board_size, position, fmove=None, parent=None):
if parent is None:
parent = DummyNode(board_size)
self.board_size = board_size
self.parent = parent
self.fmove = fmove # move that led to this position, as flattened coords
self.position = position
self.is_expanded = False
self.losses_applied = 0 # number of virtual losses on this node
# using child_() allows vectorized computation of action score.
self.illegal_moves = 1000 * (1 - self.position.all_legal_moves())
self.child_N = np.zeros([board_size * board_size + 1], dtype=np.float32)
self.child_W = np.zeros([board_size * board_size + 1], dtype=np.float32)
# save a copy of the original prior before it gets mutated by d-noise.
self.original_prior = np.zeros([board_size * board_size + 1],
dtype=np.float32)
self.child_prior = np.zeros([board_size * board_size + 1], dtype=np.float32)
self.children = {} # map of flattened moves to resulting MCTSNode
def __repr__(self):
return "<MCTSNode move=%s, N=%s, to_play=%s>" % (
self.position.recent[-1:], self.N, self.position.to_play)
@property
def child_action_score(self):
return (self.child_Q * self.position.to_play
+ self.child_U - self.illegal_moves)
@property
def child_Q(self):
return self.child_W / (1 + self.child_N)
@property
def child_U(self):
return (c_PUCT * math.sqrt(1 + self.N) *
self.child_prior / (1 + self.child_N))
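# Taken together, the three properties above implement the PUCT action score
# from the AlphaGo paper referenced in the module docstring:
#   score(a) = Q(a) * to_play + c_PUCT * sqrt(1 + N_parent) * P(a) / (1 + N(a))
# minus a large penalty for illegal moves, where Q(a) = W(a) / (1 + N(a)).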
@property
def Q(self):
return self.W / (1 + self.N)
@property
def N(self):
return self.parent.child_N[self.fmove]
@N.setter
def N(self, value):
self.parent.child_N[self.fmove] = value
@property
def W(self):
return self.parent.child_W[self.fmove]
@W.setter
def W(self, value):
self.parent.child_W[self.fmove] = value
@property
def Q_perspective(self):
"Return value of position, from perspective of player to play."
return self.Q * self.position.to_play
def select_leaf(self):
current = self
pass_move = self.board_size * self.board_size
while True:
current.N += 1
# if a node has never been evaluated, we have no basis to select a child.
if not current.is_expanded:
break
# HACK: if last move was a pass, always investigate double-pass first
# to avoid situations where we auto-lose by passing too early.
if (current.position.recent
and current.position.recent[-1].move is None
and current.child_N[pass_move] == 0):
current = current.maybe_add_child(pass_move)
continue
best_move = np.argmax(current.child_action_score)
current = current.maybe_add_child(best_move)
return current
def maybe_add_child(self, fcoord):
"""Add child node for fcoord if it doesn't already exist, and returns it."""
if fcoord not in self.children:
new_position = self.position.play_move(
coords.from_flat(self.board_size, fcoord))
self.children[fcoord] = MCTSNode(
self.board_size, new_position, fmove=fcoord, parent=self)
return self.children[fcoord]
def add_virtual_loss(self, up_to):
"""Propagate a virtual loss up to the root node.
Args:
up_to: The node to propagate until. (Keep track of this! You'll
need it to reverse the virtual loss later.)
"""
self.losses_applied += 1
# This is a "win" for the current node; hence a loss for its parent node
# who will be deciding whether to investigate this node again.
loss = self.position.to_play
self.W += loss
if self.parent is None or self is up_to:
return
self.parent.add_virtual_loss(up_to)
def revert_virtual_loss(self, up_to):
self.losses_applied -= 1
revert = -1 * self.position.to_play
self.W += revert
if self.parent is None or self is up_to:
return
self.parent.revert_virtual_loss(up_to)
def revert_visits(self, up_to):
"""Revert visit increments.
Sometimes, repeated calls to select_leaf return the same node.
This is rare and we're okay with the wasted computation to evaluate
the position multiple times by the dual_net. But select_leaf has the
side effect of incrementing visit counts. Since we want the value to
only count once for the repeatedly selected node, we also have to
revert the incremented visit counts.
"""
self.N -= 1
if self.parent is None or self is up_to:
return
self.parent.revert_visits(up_to)
def incorporate_results(self, move_probabilities, value, up_to):
assert move_probabilities.shape == (self.board_size * self.board_size + 1,)
# A finished game should not be going through this code path - should
# directly call backup_value() on the result of the game.
assert not self.position.is_game_over()
if self.is_expanded:
self.revert_visits(up_to=up_to)
return
self.is_expanded = True
self.original_prior = self.child_prior = move_probabilities
# initialize child Q as current node's value, to prevent dynamics where
# if B is winning, then B will only ever explore 1 move, because the Q
# estimation will be so much larger than the 0 of the other moves.
#
# Conversely, if W is winning, then B will explore all 362 moves before
# continuing to explore the most favorable move. This is a waste of search.
#
# The value seeded here acts as a prior, and gets averaged into
# Q calculations.
self.child_W = np.ones([self.board_size * self.board_size + 1],
dtype=np.float32) * value
self.backup_value(value, up_to=up_to)
def backup_value(self, value, up_to):
"""Propagates a value estimation up to the root node.
Args:
value: the value to be propagated (1 = black wins, -1 = white wins)
up_to: the node to propagate until.
"""
self.W += value
if self.parent is None or self is up_to:
return
self.parent.backup_value(value, up_to)
def is_done(self):
'''True if the last two moves were Pass or if the position is at a move
greater than the max depth.
'''
max_depth = (self.board_size ** 2) * 1.4 # 505 moves for 19x19, 113 for 9x9
return self.position.is_game_over() or self.position.n >= max_depth
def inject_noise(self):
dirch = np.random.dirichlet([D_NOISE_ALPHA(self.board_size)] * (
(self.board_size * self.board_size) + 1))
self.child_prior = self.child_prior * 0.75 + dirch * 0.25
def children_as_pi(self, squash=False):
"""Returns the child visit counts as a probability distribution, pi
If squash is true, exponentiate the probabilities by a temperature
slightly larger than unity to encourage diversity in early play and
hopefully to move away from 3-3s
"""
probs = self.child_N
if squash:
probs = probs ** .95
return probs / np.sum(probs)
def most_visited_path(self):
node = self
output = []
while node.children:
next_kid = np.argmax(node.child_N)
node = node.children.get(next_kid)
if node is None:
output.append("GAME END")
break
output.append("%s (%d) ==> " % (
coords.to_kgs(self.board_size,
coords.from_flat(self.board_size, node.fmove)), node.N))
output.append("Q: {:.5f}\n".format(node.Q))
return ''.join(output)
def mvp_gg(self):
""" Returns most visited path in go-gui VAR format e.g. 'b r3 w c17..."""
node = self
output = []
while node.children and max(node.child_N) > 1:
next_kid = np.argmax(node.child_N)
node = node.children[next_kid]
output.append("%s" % coords.to_kgs(
self.board_size, coords.from_flat(self.board_size, node.fmove)))
return ' '.join(output)
def describe(self):
sort_order = list(range(self.board_size * self.board_size + 1))
sort_order.sort(key=lambda i: (
self.child_N[i], self.child_action_score[i]), reverse=True)
soft_n = self.child_N / sum(self.child_N)
p_delta = soft_n - self.child_prior
p_rel = p_delta / self.child_prior
# Dump out some statistics
output = []
output.append("{q:.4f}\n".format(q=self.Q))
output.append(self.most_visited_path())
output.append(
"move: action Q U P P-Dir N soft-N" +
" p-delta p-rel\n")
output.append(
"\n".join(["{!s:6}: {: .3f}, {: .3f}, {:.3f}, {:.3f}, {:.3f}, {:4d} {:.4f} {: .5f} {: .2f}".format(
coords.to_kgs(self.board_size, coords.from_flat(self.board_size, key)),
self.child_action_score[key],
self.child_Q[key],
self.child_U[key],
self.child_prior[key],
self.original_prior[key],
int(self.child_N[key]),
soft_n[key],
p_delta[key],
p_rel[key])
for key in sort_order][:15]))
return "".join(output)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Train MiniGo with several iterations of RL learning.
One iteration of RL learning consists of bootstrap, selfplay, gather and train:
bootstrap: Initialize a random model
selfplay: Play games with the latest model to produce data used for training
gather: Group games played with the same model into larger files of tfexamples
train: Train a new model with the selfplay results from the most recent
N generations.
After training, validation can be performed on the holdout data.
Given two models, evaluation can be applied to choose a stronger model.
The training pipeline consists of multiple RL learning iterations to achieve
better models.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import random
import socket
import sys
import time
import tensorflow as tf # pylint: disable=g-bad-import-order
import dualnet
import evaluation
import go
import model_params
import preprocessing
import selfplay_mcts
import utils
_TF_RECORD_SUFFIX = '.tfrecord.zz'
def _ensure_dir_exists(directory):
"""Check if directory exists. If not, create it.
Args:
directory: A given directory
"""
if not os.path.isdir(directory):
tf.gfile.MakeDirs(directory)
def bootstrap(estimator_model_dir, trained_models_dir, params):
"""Initialize the model with random weights.
Args:
estimator_model_dir: tf.estimator model directory.
trained_models_dir: Dir to save the trained models. Here to export the first
bootstrapped generation.
params: An object of hyperparameters for the model.
"""
bootstrap_name = utils.generate_model_name(0)
_ensure_dir_exists(trained_models_dir)
bootstrap_model_path = os.path.join(trained_models_dir, bootstrap_name)
_ensure_dir_exists(estimator_model_dir)
print('Bootstrapping with working dir {}\n Model 0 exported to {}'.format(
estimator_model_dir, bootstrap_model_path))
dualnet.bootstrap(estimator_model_dir, params)
dualnet.export_model(estimator_model_dir, bootstrap_model_path)
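# Note: utils.generate_model_name(0) yields '000000-bootstrap', so the first
# export lands at '<trained_models_dir>/000000-bootstrap', which later
# selfplay/train steps pick up via utils.get_latest_model.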
def selfplay(model_name, trained_models_dir, selfplay_dir, holdout_dir, sgf_dir,
params):
"""Perform selfplay with a specific model.
Args:
model_name: The name of the model used for selfplay.
trained_models_dir: The path to the model files.
selfplay_dir: Where to write the games. Set as 'base_dir/data/selfplay/'.
holdout_dir: Where to write the holdout data. Set as
'base_dir/data/holdout/'.
sgf_dir: Where to write the sgf (Smart Game Format) files. Set as
'base_dir/sgf/'.
params: An object of hyperparameters for the model.
"""
print('Playing a game with model {}'.format(model_name))
# Set paths for the model with 'model_name'
model_path = os.path.join(trained_models_dir, model_name)
output_dir = os.path.join(selfplay_dir, model_name)
holdout_dir = os.path.join(holdout_dir, model_name)
# clean_sgf is to write sgf file without comments.
# full_sgf is to write sgf file with comments.
clean_sgf = os.path.join(sgf_dir, model_name, 'clean')
full_sgf = os.path.join(sgf_dir, model_name, 'full')
_ensure_dir_exists(output_dir)
_ensure_dir_exists(holdout_dir)
_ensure_dir_exists(clean_sgf)
_ensure_dir_exists(full_sgf)
with utils.logged_timer('Loading weights from {} ... '.format(model_path)):
network = dualnet.DualNetRunner(model_path, params)
with utils.logged_timer('Playing game'):
player = selfplay_mcts.play(
params.board_size, network, params.selfplay_readouts,
params.selfplay_resign_threshold, params.simultaneous_leaves,
params.selfplay_verbose)
output_name = '{}-{}'.format(int(time.time()), socket.gethostname())
def _write_sgf_data(dir_sgf, use_comments):
with tf.gfile.GFile(
os.path.join(dir_sgf, '{}.sgf'.format(output_name)), 'w') as f:
f.write(player.to_sgf(use_comments=use_comments))
_write_sgf_data(clean_sgf, use_comments=False)
_write_sgf_data(full_sgf, use_comments=True)
game_data = player.extract_data()
tf_examples = preprocessing.make_dataset_from_selfplay(game_data, params)
# Hold out a fraction (params.holdout_pct) of games for validation.
if random.random() < params.holdout_pct:
fname = os.path.join(
holdout_dir, ('{}'+_TF_RECORD_SUFFIX).format(output_name))
else:
fname = os.path.join(
output_dir, ('{}'+_TF_RECORD_SUFFIX).format(output_name))
preprocessing.write_tf_examples(fname, tf_examples)
def gather(selfplay_dir, training_chunk_dir, params):
"""Gather selfplay data into large training chunk.
Args:
selfplay_dir: Where to look for games. Set as 'base_dir/data/selfplay/'.
training_chunk_dir: where to put collected games. Set as
'base_dir/data/training_chunks/'.
params: An object of hyperparameters for the model.
"""
# Check the selfplay data from the most recent 50 models.
_ensure_dir_exists(training_chunk_dir)
sorted_model_dirs = sorted(tf.gfile.ListDirectory(selfplay_dir))
models = [model_dir.strip('/')
for model_dir in sorted_model_dirs[-params.gather_generation:]]
with utils.logged_timer('Finding existing tfrecords...'):
model_gamedata = {
model: tf.gfile.Glob(
os.path.join(selfplay_dir, model, '*'+_TF_RECORD_SUFFIX))
for model in models
}
print('Found {} models'.format(len(models)))
for model_name, record_files in sorted(model_gamedata.items()):
print(' {}: {} files'.format(model_name, len(record_files)))
meta_file = os.path.join(training_chunk_dir, 'meta.txt')
try:
with tf.gfile.GFile(meta_file, 'r') as f:
already_processed = set(f.read().split())
except tf.errors.NotFoundError:
already_processed = set()
num_already_processed = len(already_processed)
for model_name, record_files in sorted(model_gamedata.items()):
if set(record_files) <= already_processed:
continue
print('Gathering files from {}:'.format(model_name))
tf_examples = preprocessing.shuffle_tf_examples(
params.shuffle_buffer_size, params.examples_per_chunk, record_files)
# tqdm to make the loops show a smart progress meter
for i, example_batch in enumerate(tf_examples):
output_record = os.path.join(
training_chunk_dir,
('{}-{}'+_TF_RECORD_SUFFIX).format(model_name, str(i)))
preprocessing.write_tf_examples(
output_record, example_batch, serialize=False)
already_processed.update(record_files)
print('Processed {} new files'.format(
len(already_processed) - num_already_processed))
with tf.gfile.GFile(meta_file, 'w') as f:
f.write('\n'.join(sorted(already_processed)))
def train(trained_models_dir, estimator_model_dir, training_chunk_dir, params):
"""Train the latest model from gathered data.
Args:
trained_models_dir: Where to export the completed generation.
estimator_model_dir: tf.estimator model directory.
training_chunk_dir: Directory where gathered training chunks are.
params: An object of hyperparameters for the model.
"""
model_num, model_name = utils.get_latest_model(trained_models_dir)
print('Initializing from model {}'.format(model_name))
new_model_name = utils.generate_model_name(model_num + 1)
print('New model will be {}'.format(new_model_name))
save_file = os.path.join(trained_models_dir, new_model_name)
tf_records = sorted(
tf.gfile.Glob(os.path.join(training_chunk_dir, '*'+_TF_RECORD_SUFFIX)))
tf_records = tf_records[
-(params.train_window_size // params.examples_per_chunk):]
print('Training from: {} to {}'.format(tf_records[0], tf_records[-1]))
with utils.logged_timer('Training'):
dualnet.train(estimator_model_dir, tf_records, model_num + 1, params)
dualnet.export_model(estimator_model_dir, save_file)
def validate(trained_models_dir, holdout_dir, estimator_model_dir, params):
"""Validate the latest model on the holdout dataset.
Args:
trained_models_dir: Directories where the completed generations/models are.
holdout_dir: Directories where holdout data are.
estimator_model_dir: tf.estimator model directory.
params: An object of hyperparameters for the model.
"""
model_num, _ = utils.get_latest_model(trained_models_dir)
# Get the holdout game data
nums_names = utils.get_models(trained_models_dir)
# Model N was trained on games up through model N-1, so the validation set
# should only be for models through N-1 as well, thus the (model_num) term.
models = [num_name for num_name in nums_names if num_name[0] < model_num]
# pair is a tuple of (model_num, model_name), like (13, 000013-modelname)
holdout_dirs = [os.path.join(holdout_dir, pair[1])
for pair in models[-params.holdout_generation:]]
tf_records = []
with utils.logged_timer('Building lists of holdout files'):
for record_dir in holdout_dirs:
if os.path.exists(record_dir): # make sure holdout dir exists
tf_records.extend(
tf.gfile.Glob(os.path.join(record_dir, '*'+_TF_RECORD_SUFFIX)))
print('The length of tf_records is {}.'.format(len(tf_records)))
first_tf_record = os.path.basename(tf_records[0])
last_tf_record = os.path.basename(tf_records[-1])
with utils.logged_timer('Validating from {} to {}'.format(
first_tf_record, last_tf_record)):
dualnet.validate(estimator_model_dir, tf_records, params)
def evaluate(trained_models_dir, black_model_name, white_model_name,
evaluate_dir, params):
"""Evaluate with two models.
With the model names, construct two DualNetRunners to play as black and white
in a Go match. The two models play several games, and the model that achieves
a win rate of at least eval_win_rate (55% by default) is declared the winner.
Args:
trained_models_dir: Directories where the completed generations/models are.
black_model_name: The name of the model playing black.
white_model_name: The name of the model playing white.
evaluate_dir: Where to write the evaluation results. Set as
'base_dir/sgf/evaluate/''
params: An object of hyperparameters for the model.
Returns:
The model name of the winner.
Raises:
ValueError: if neither `WHITE` nor `BLACK` is returned.
"""
black_model = os.path.join(trained_models_dir, black_model_name)
white_model = os.path.join(trained_models_dir, white_model_name)
print('Evaluate models between {} and {}'.format(
black_model_name, white_model_name))
_ensure_dir_exists(evaluate_dir)
with utils.logged_timer('Loading weights'):
black_net = dualnet.DualNetRunner(black_model, params)
white_net = dualnet.DualNetRunner(white_model, params)
with utils.logged_timer('{} games'.format(params.eval_games)):
winner = evaluation.play_match(
params, black_net, white_net, params.eval_games,
params.eval_readouts, evaluate_dir, params.eval_verbose)
if winner != go.WHITE_NAME and winner != go.BLACK_NAME:
raise ValueError('Winner should be either White or Black!')
return black_model_name if winner == go.BLACK_NAME else white_model_name
def _set_params_from_board_size(board_size):
"""Set hyperparameters from board size."""
params = model_params.MiniGoParams()
k = utils.round_power_of_two(board_size ** 2 / 3)
params.num_filters = k # Number of filters in the convolution layer
params.fc_width = 2 * k # Width of each fully connected layer
params.num_shared_layers = board_size # Number of shared trunk layers
params.board_size = board_size # Board size
# How many positions can fit on a graphics card. 256 for 9s, 16 or 32 for 19s.
if board_size == 9:
params.batch_size = 256
else:
params.batch_size = 32
return params
def main(_):
"""Run the reinforcement learning loop."""
tf.logging.set_verbosity(tf.logging.INFO)
params = _set_params_from_board_size(FLAGS.board_size)
# A dummy model for debug/testing purpose with fewer games and iterations
if FLAGS.debug:
params = model_params.DummyMiniGoParams()
# Set directories for models and datasets
base_dir = FLAGS.base_dir + str(FLAGS.board_size) + '_board_size/'
dirs = utils.MiniGoDirectory(base_dir)
# if no models have been trained, start from bootstrap model
if not os.path.isdir(base_dir):
print('No trained model exists! Starting from Bootstrap...')
print('Creating random initial weights...')
bootstrap(dirs.estimator_model_dir, dirs.trained_models_dir, params)
else:
print('A MiniGo base directory has been found! ')
print('Start from the last checkpoint...')
_, best_model_so_far = utils.get_latest_model(dirs.trained_models_dir)
for rl_iter in range(params.max_iters_per_pipeline):
print('RL_iteration: {}'.format(rl_iter))
# Self-play to generate at least params.max_games_per_generation games
selfplay(best_model_so_far, dirs.trained_models_dir, dirs.selfplay_dir,
dirs.holdout_dir, dirs.sgf_dir, params)
games = tf.gfile.Glob(
os.path.join(dirs.selfplay_dir, best_model_so_far, '*.zz'))
while len(games) < params.max_games_per_generation:
selfplay(best_model_so_far, dirs.trained_models_dir, dirs.selfplay_dir,
dirs.holdout_dir, dirs.sgf_dir, params)
if FLAGS.validation:
params = model_params.DummyValidationParams()
selfplay(best_model_so_far, dirs.trained_models_dir, dirs.selfplay_dir,
dirs.holdout_dir, dirs.sgf_dir, params)
games = tf.gfile.Glob(
os.path.join(dirs.selfplay_dir, best_model_so_far, '*.zz'))
print('Gathering game output...')
gather(dirs.selfplay_dir, dirs.training_chunk_dir, params)
print('Training on gathered game data...')
train(dirs.trained_models_dir, dirs.estimator_model_dir,
dirs.training_chunk_dir, params)
if FLAGS.validation:
print('Validating on the holdout game data...')
validate(dirs.trained_models_dir, dirs.holdout_dir,
dirs.estimator_model_dir, params)
_, current_model = utils.get_latest_model(dirs.trained_models_dir)
if FLAGS.evaluation: # Perform evaluation if needed
print('Evaluating the latest model...')
best_model_so_far = evaluate(
dirs.trained_models_dir, best_model_so_far, current_model,
dirs.evaluate_dir, params)
print('Winner: {}!'.format(best_model_so_far))
else:
best_model_so_far = current_model
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--base_dir',
type=str,
default='/tmp/minigo/',
metavar='BD',
help='Base directory for the MiniGo models and datasets.')
parser.add_argument(
'--board_size',
type=int,
default=9,
metavar='N',
choices=[9, 19],
help='Go board size. The default size is 9.')
parser.add_argument(
'--evaluation',
action='store_true',
help='A boolean to specify evaluation in the RL pipeline.')
parser.add_argument(
'--debug',
action='store_true',
help='A boolean to indicate debug mode for testing purpose.')
parser.add_argument(
'--validation',
action='store_true',
help='A boolean to explicitly generate holdout data for validation.')
FLAGS, unparsed = parser.parse_known_args()
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Defines MiniGo parameters."""
class MiniGoParams(object):
"""Parameters for MiniGo."""
# Go params
board_size = 9
# RL pipeline
max_games_per_generation = 10 # Number of games per selfplay generation
max_iters_per_pipeline = 2 # Number of RL iterations in one pipeline
# The shuffle buffer size determines how far an example could end up from
# where it started; this and the interleave parameters in preprocessing can
# give us an approximation of a uniform sampling. The default of 2M positions
# is used in training, but smaller numbers can be used for aggregation or validation.
shuffle_buffer_size = 2000000 # shuffle buffer size in preprocessing
# dual_net
# How many positions to look at per generation.
# Per AlphaGo Zero (AGZ), 2048 minibatch * 1k = 2M positions/generation
examples_per_generation = 2000000
# for learning rate
l2_strength = 1e-4 # Regularization strength
momentum = 0.9 # Momentum used in SGD
kernel_size = [3, 3] # kernel size of conv and res blocks is from AGZ paper
# selfplay
selfplay_readouts = 100 # How many simulations to run per move
selfplay_verbose = 1 # >=2 will print debug info, >=3 will print boards
# an absolute value of threshold to resign at
selfplay_resign_threshold = 0.95
# the number of simultaneous leaves in MCTS
simultaneous_leaves = 8
holdout_pct = 0.05 # How many games to hold out for validation
holdout_generation = 50 # How many recent generations/models for holdout data
# gather
gather_generation = 50 # How many recent generations/models for gathered data
# How many positions we should aggregate per 'chunk'.
examples_per_chunk = 10000
# How many positions to draw from for our training window.
# AGZ used the most recent 500k games, which, assuming 250 moves/game = 125M
train_window_size = 125000000
# evaluation
eval_games = 50 # The number of games to play in evaluation
eval_readouts = 100 # How many readouts to make per move in evaluation
eval_verbose = 1 # How verbose the players should be in evaluation
eval_win_rate = 0.55 # The winner needs a win rate of at least 55%.
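# With the defaults above, that is 0.55 * 50 = 27.5, i.e. at least 28 of the
# 50 evaluation games, assuming eval_win_rate is applied as a simple
# win-count threshold in evaluation.play_match.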
class DummyMiniGoParams(MiniGoParams):
"""Parameters for a dummy model."""
num_filters = 8 # Number of filters in the convolution layer
fc_width = 16 # Width of each fully connected layer
num_shared_layers = 1 # Number of shared trunk layers
batch_size = 16
examples_per_generation = 64
max_games_per_generation = 2
max_iters_per_pipeline = 1
selfplay_readouts = 10
shuffle_buffer_size = 1000
# evaluation
eval_games = 10 # The number of games to play in evaluation
eval_readouts = 10 # How many readouts to make per move in evaluation
eval_verbose = 1 # How verbose the players should be in evaluation
class DummyValidationParams(DummyMiniGoParams, MiniGoParams):
"""Parameters for a dummy model."""
holdout_pct = 1 # Set holdout percent as 1 for validation testing purpose
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Preprocessing step to create, read, write tf.Examples."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import random
import tensorflow as tf # pylint: disable=g-bad-import-order
import coords
import features as features_lib
import numpy as np
import sgf_wrapper
TF_RECORD_CONFIG = tf.python_io.TFRecordOptions(
tf.python_io.TFRecordCompressionType.ZLIB)
# Constructing tf.Examples
def _one_hot(board_size, index):
onehot = np.zeros([board_size * board_size + 1], dtype=np.float32)
onehot[index] = 1
return onehot
def make_tf_example(features, pi, value):
"""
Args:
features: [N, N, FEATURE_DIM] nparray of uint8
pi: [N * N + 1] nparray of float32
value: float
"""
return tf.train.Example(
features=tf.train.Features(
feature={
'x': tf.train.Feature(
bytes_list=tf.train.BytesList(value=[features.tostring()])),
'pi': tf.train.Feature(
bytes_list=tf.train.BytesList(value=[pi.tostring()])),
'outcome': tf.train.Feature(
float_list=tf.train.FloatList(value=[value]))
}))
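# For a 9x9 board the expected shapes are: features
# [9, 9, features_lib.NEW_FEATURES_PLANES] as uint8, pi [82] as float32 (one
# entry per intersection plus the pass move), and a scalar outcome in [-1, 1].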
# Write tf.Example to files
def write_tf_examples(filename, tf_examples, serialize=True):
"""
Args:
filename: Where to write tf.records
tf_examples: An iterable of tf.Example
serialize: whether to serialize the examples.
"""
with tf.python_io.TFRecordWriter(
filename, options=TF_RECORD_CONFIG) as writer:
for ex in tf_examples:
if serialize:
writer.write(ex.SerializeToString())
else:
writer.write(ex)
# Read tf.Example from files
def _batch_parse_tf_example(board_size, batch_size, example_batch):
"""
Args:
example_batch: a batch of tf.Example
Returns:
A tuple (feature_tensor, dict of output tensors)
"""
features = {
'x': tf.FixedLenFeature([], tf.string),
'pi': tf.FixedLenFeature([], tf.string),
'outcome': tf.FixedLenFeature([], tf.float32),
}
parsed = tf.parse_example(example_batch, features)
x = tf.decode_raw(parsed['x'], tf.uint8)
x = tf.cast(x, tf.float32)
x = tf.reshape(x, [batch_size, board_size, board_size,
features_lib.NEW_FEATURES_PLANES])
pi = tf.decode_raw(parsed['pi'], tf.float32)
pi = tf.reshape(pi, [batch_size, board_size * board_size + 1])
outcome = parsed['outcome']
outcome.set_shape([batch_size])
return (x, {'pi_tensor': pi, 'value_tensor': outcome})
def read_tf_records(
shuffle_buffer_size, batch_size, tf_records, num_repeats=None,
shuffle_records=True, shuffle_examples=True, filter_amount=1.0):
"""
Args:
batch_size: batch size to return
tf_records: a list of tf_record filenames
num_repeats: how many times the data should be read (default: infinite)
shuffle_records: whether to shuffle the order of files read
shuffle_examples: whether to shuffle the tf.Examples
shuffle_buffer_size: how big of a buffer to fill before shuffling.
filter_amount: what fraction of records to keep
Returns:
a tf dataset of batched tensors
"""
if shuffle_buffer_size is None:
raise ValueError('shuffle_buffer_size must be specified')
if shuffle_records:
random.shuffle(tf_records)
record_list = tf.data.Dataset.from_tensor_slices(tf_records)
# compression_type here must agree with write_tf_examples
# cycle_length = how many tfrecord files are read in parallel
# block_length = how many tf.Examples are read from each file before
# moving to the next file
# The idea is to shuffle both the order of the files being read,
# and the examples being read from the files.
dataset = record_list.interleave(
lambda x: tf.data.TFRecordDataset(x, compression_type='ZLIB'),
cycle_length=64, block_length=16)
dataset = dataset.filter(lambda x: tf.less(
tf.random_uniform([1]), filter_amount)[0])
# TODO(amj): apply py_func for transforms here.
if num_repeats is not None:
dataset = dataset.repeat(num_repeats)
else:
dataset = dataset.repeat()
if shuffle_examples:
dataset = dataset.shuffle(buffer_size=shuffle_buffer_size)
dataset = dataset.batch(batch_size)
return dataset
def get_input_tensors(params, batch_size, tf_records, num_repeats=None,
shuffle_records=True, shuffle_examples=True,
filter_amount=0.05):
"""Read tf.Records and prepare them for ingestion by dual_net. See
`read_tf_records` for parameter documentation.
Returns a dict of tensors (see return value of batch_parse_tf_example)
"""
shuffle_buffer_size = params.shuffle_buffer_size
dataset = read_tf_records(
shuffle_buffer_size, batch_size, tf_records, num_repeats=num_repeats,
shuffle_records=shuffle_records, shuffle_examples=shuffle_examples,
filter_amount=filter_amount)
dataset = dataset.filter(lambda t: tf.equal(tf.shape(t)[0], batch_size))
def batch_parse_tf_example(batch_size, dataset):
return _batch_parse_tf_example(params.board_size, batch_size, dataset)
dataset = dataset.map(functools.partial(
batch_parse_tf_example, batch_size))
return dataset.make_one_shot_iterator().get_next()
# End-to-end utility functions
def make_dataset_from_selfplay(data_extracts, params):
"""Make an iterable of tf.Examples.
Args:
data_extracts: An iterable of (position, pi, result) tuples
Returns an iterable of tf.Examples.
"""
board_size = params.board_size
tf_examples = (make_tf_example(features_lib.extract_features(
board_size, pos), pi, result) for pos, pi, result in data_extracts)
return tf_examples
def make_dataset_from_sgf(board_size, sgf_filename, tf_record):
pwcs = sgf_wrapper.replay_sgf_file(board_size, sgf_filename)
def make_tf_example_from_pwc(pwcs):
return _make_tf_example_from_pwc(board_size, pwcs)
tf_examples = map(make_tf_example_from_pwc, pwcs)
write_tf_examples(tf_record, tf_examples)
def _make_tf_example_from_pwc(board_size, position_w_context):
features = features_lib.extract_features(
board_size, position_w_context.position)
pi = _one_hot(board_size, coords.to_flat(position_w_context.next_move))
value = position_w_context.result
return make_tf_example(features, pi, value)
def shuffle_tf_examples(shuffle_buffer_size, gather_size, records_to_shuffle):
"""Read through tf.Record and yield shuffled, but unparsed tf.Examples.
Args:
shuffle_buffer_size: the size for shuffle buffer
gather_size: The number of tf.Examples to be gathered together
records_to_shuffle: A list of filenames
Returns:
An iterator yielding lists of bytes, which are serialized tf.Examples.
"""
dataset = read_tf_records(shuffle_buffer_size, gather_size,
records_to_shuffle, num_repeats=1)
batch = dataset.make_one_shot_iterator().get_next()
sess = tf.Session()
while True:
try:
result = sess.run(batch)
yield list(result)
except tf.errors.OutOfRangeError:
break
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Play a self-play match with a given DualNet model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import random
import sys
import time
import coords
from gtp_wrapper import MCTSPlayer
def play(board_size, network, readouts, resign_threshold, simultaneous_leaves,
verbosity=0):
"""Plays out a self-play match.
Args:
board_size: the go board size
network: the DualNet model
readouts: the number of readouts in MCTS
resign_threshold: the threshold to resign at in the match
simultaneous_leaves: the number of simultaneous leaves in MCTS
verbosity: the verbosity of the self-play match
Returns:
the MCTSPlayer instance holding the finished game. Its extract_data() method
yields (position, pi, result) tuples for each of the n moves played, where
pi is a vector of length board_size * board_size + 1 of MCTS search
probabilities and result is the game outcome.
"""
player = MCTSPlayer(board_size, network, resign_threshold=resign_threshold,
verbosity=verbosity, num_parallel=simultaneous_leaves)
# Disable resign in 5% of games
if random.random() < 0.05:
player.resign_threshold = -1.0
player.initialize_game()
# Must run this once at the start, so that noise injection actually
# affects the first move of the game.
first_node = player.root.select_leaf()
prob, val = network.run(first_node.position)
first_node.incorporate_results(prob, val, first_node)
while True:
start = time.time()
player.root.inject_noise()
current_readouts = player.root.N
# we want to do "X additional readouts", rather than "up to X readouts".
while player.root.N < current_readouts + readouts:
player.tree_search()
if verbosity >= 3:
print(player.root.position)
print(player.root.describe())
if player.should_resign():
player.set_result(-1 * player.root.position.to_play, was_resign=True)
break
move = player.pick_move()
player.play_move(move)
if player.root.is_done():
player.set_result(player.root.position.result(), was_resign=False)
break
if (verbosity >= 2) or (
verbosity >= 1 and player.root.position.n % 10 == 9):
print("Q: {:.5f}".format(player.root.Q))
dur = time.time() - start
print("%d: %d readouts, %.3f s/100. (%.2f sec)" % (
player.root.position.n, readouts, dur / readouts * 100.0, dur))
if verbosity >= 3:
print("Played >>",
coords.to_kgs(coords.from_flat(player.root.fmove)))
if verbosity >= 2:
print("%s: %.3f" % (player.result_string, player.root.Q), file=sys.stderr)
print(player.root.position,
player.root.position.score(), file=sys.stderr)
return player
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Code to extract a series of positions + their next moves from an SGF.
Most of the complexity here is dealing with two features of SGF:
- Stones can be added via "play move" or "add move", the latter being used
to configure L+D puzzles, but also for initial handicap placement.
- Plays don't necessarily alternate colors; they can be repeated B or W moves.
This feature is used to handle free handicap placement.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import namedtuple
import coords
import go
from go import Position, PositionWithContext
import numpy as np
import sgf
import utils
SGF_TEMPLATE = """(;GM[1]FF[4]CA[UTF-8]AP[Minigo_sgfgenerator]RU[{ruleset}]
SZ[{boardsize}]KM[{komi}]PW[{white_name}]PB[{black_name}]RE[{result}]
{game_moves})"""
PROGRAM_IDENTIFIER = "Minigo"
def translate_sgf_move_qs(player_move, q):
return "{move}C[{q:.4f}]".format(
move=translate_sgf_move(player_move, None), q=q)
def translate_sgf_move(player_move, comment):
if player_move.color not in (go.BLACK, go.WHITE):
raise ValueError("Can't translate color %s to sgf" % player_move.color)
c = coords.to_sgf(player_move.move)
color = 'B' if player_move.color == go.BLACK else 'W'
if comment is not None:
comment = comment.replace(']', r'\]')
comment_node = "C[{}]".format(comment)
else:
comment_node = ""
return ";{color}[{coords}]{comment_node}".format(
color=color, coords=c, comment_node=comment_node)
def make_sgf(board_size,
move_history,
result_string,
ruleset="Chinese",
komi=7.5,
white_name=PROGRAM_IDENTIFIER,
black_name=PROGRAM_IDENTIFIER,
comments=[]
):
"""Turn a game into SGF.
Doesn't handle handicap games or positions with incomplete history.
Args:
move_history: iterable of PlayerMoves
result_string: "B+R", "W+0.5", etc.
comments: iterable of string/None. Will be zipped with move_history.
"""
try:
# Python 2
from itertools import izip_longest
zip_longest = izip_longest
except ImportError:
# Python 3
from itertools import zip_longest
boardsize = board_size
game_moves = ''.join(translate_sgf_move(*z)
for z in zip_longest(move_history, comments))
result = result_string
return SGF_TEMPLATE.format(**locals())
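# Sketch of the resulting header for a 9x9 self-play game (values illustrative):
#   (;GM[1]FF[4]CA[UTF-8]AP[Minigo_sgfgenerator]RU[Chinese]
#   SZ[9]KM[7.5]PW[Minigo]PB[Minigo]RE[B+R]
#   ;B[ee]C[comment];W[cc]...)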
def sgf_prop(value_list):
'Converts raw sgf library output to sensible value'
if value_list is None:
return None
if len(value_list) == 1:
return value_list[0]
else:
return value_list
def sgf_prop_get(props, key, default):
return sgf_prop(props.get(key, default))
def handle_node(board_size, pos, node):
'A node can either add B+W stones, play as B, or play as W.'
props = node.properties
black_stones_added = [coords.from_sgf(board_size,
c) for c in props.get('AB', [])]
white_stones_added = [coords.from_sgf(board_size,
c) for c in props.get('AW', [])]
if black_stones_added or white_stones_added:
return add_stones(pos, black_stones_added, white_stones_added)
# If B/W props are not present, then there is no move. But if it is present
# and equal to the empty string, then the move was a pass.
elif 'B' in props:
black_move = coords.from_sgf(board_size, props.get('B', [''])[0])
return pos.play_move(black_move, color=go.BLACK)
elif 'W' in props:
white_move = coords.from_sgf(board_size, props.get('W', [''])[0])
return pos.play_move(white_move, color=go.WHITE)
else:
return pos
def add_stones(pos, black_stones_added, white_stones_added):
working_board = np.copy(pos.board)
go.place_stones(working_board, go.BLACK, black_stones_added)
go.place_stones(working_board, go.WHITE, white_stones_added)
new_position = Position(board=working_board, n=pos.n, komi=pos.komi,
caps=pos.caps, ko=pos.ko, recent=pos.recent, to_play=pos.to_play)
return new_position
def get_next_move(board_size, node):
props = node.next.properties
if 'W' in props:
return coords.from_sgf(board_size, props['W'][0])
else:
return coords.from_sgf(board_size, props['B'][0])
def maybe_correct_next(pos, next_node):
if (('B' in next_node.properties and not pos.to_play == go.BLACK) or
('W' in next_node.properties and not pos.to_play == go.WHITE)):
pos.flip_playerturn(mutate=True)
def replay_sgf(board_size, sgf_contents):
"""
Wrapper for sgf files, returning go.PositionWithContext instances.
It does NOT return the very final position, as there is no follow up.
To get the final position, call pwc.position.play_move(pwc.next_move)
on the last PositionWithContext returned.
Example usage:
with open(filename) as f:
for position_w_context in replay_sgf(board_size, f.read()):
print(position_w_context.position)
"""
collection = sgf.parse(sgf_contents)
game = collection.children[0]
props = game.root.properties
assert int(sgf_prop(props.get('GM', ['1']))) == 1, "Not a Go SGF!"
komi = 0
if props.get('KM') is not None:
komi = float(sgf_prop(props.get('KM')))
result = utils.parse_game_result(sgf_prop(props.get('RE')))
pos = Position(komi=komi)
current_node = game.root
while pos is not None and current_node.next is not None:
pos = handle_node(board_size, pos, current_node)
maybe_correct_next(pos, current_node.next)
next_move = get_next_move(board_size, current_node)
yield PositionWithContext(pos, next_move, result)
current_node = current_node.next
def replay_sgf_file(board_size, sgf_file):
with open(sgf_file) as f:
for pwc in replay_sgf(board_size, f.read()):
yield pwc
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""The strategy to play each move with MCTS."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import random
import sys
import time
import coords
import go
from mcts import MCTSNode
import numpy as np
import sgf_wrapper
def time_recommendation(move_num, seconds_per_move=5, time_limit=15*60,
decay_factor=0.98):
"""
Given current move number and "desired" seconds per move,
return how much time should actually be used. To be used specifically
for CGOS time controls, which are absolute 15 minute time.
The strategy is to spend the maximum time possible using seconds_per_move,
and then switch to an exponentially decaying time usage, calibrated so that
we have enough time for an infinite number of moves.
"""
# divide by two since you only play half the moves in a game.
player_move_num = move_num / 2
# sum of geometric series maxes out at endgame_time seconds.
endgame_time = seconds_per_move / (1 - decay_factor)
if endgame_time > time_limit:
# there is so little main time that we're already in "endgame" mode.
base_time = time_limit * (1 - decay_factor)
return base_time * decay_factor ** player_move_num
# leave over endgame_time seconds for the end, and play at seconds_per_move
# for as long as possible
core_time = time_limit - endgame_time
core_moves = core_time / seconds_per_move
if player_move_num < core_moves:
return seconds_per_move
else:
return seconds_per_move * decay_factor ** (player_move_num - core_moves)
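# Worked example with the defaults (5 s/move, 15 min limit, decay 0.98):
# endgame_time = 5 / (1 - 0.98) = 250 s, core_time = 900 - 250 = 650 s,
# core_moves = 130, so the player spends 5 s/move for its first 130 moves and
# then decays exponentially, keeping the total within the absolute limit.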
def _get_temperature_cutoff(board_size):
# When to do deterministic move selection. ~30 moves on a 19x19, ~8 on 9x9
return int((board_size * board_size) / 12)
class MCTSPlayerMixin(object):
# If 'simulations_per_move' is nonzero, it will perform that many readouts
# before playing. Otherwise, it uses 'seconds_per_move' of wall time.
def __init__(self, board_size, network, seconds_per_move=5,
simulations_per_move=0, resign_threshold=-0.90,
verbosity=0, two_player_mode=False, num_parallel=8):
self.board_size = board_size
self.network = network
self.seconds_per_move = seconds_per_move
self.simulations_per_move = simulations_per_move
self.verbosity = verbosity
self.two_player_mode = two_player_mode
if two_player_mode:
self.temp_threshold = -1
else:
self.temp_threshold = _get_temperature_cutoff(board_size)
self.num_parallel = num_parallel
self.qs = []
self.comments = []
self.searches_pi = []
self.root = None
self.result = 0
self.result_string = None
self.resign_threshold = -abs(resign_threshold)
super(MCTSPlayerMixin, self).__init__(board_size)
def initialize_game(self, position=None):
if position is None:
position = go.Position(self.board_size)
self.root = MCTSNode(self.board_size, position)
self.result = 0
self.result_string = None
self.comments = []
self.searches_pi = []
self.qs = []
def suggest_move(self, position):
""" Used for playing a single game.
For parallel play, use initialize_move, select_leaf,
incorporate_results, and pick_move
"""
start = time.time()
if self.simulations_per_move == 0:
while time.time() - start < self.seconds_per_move:
self.tree_search()
else:
current_readouts = self.root.N
while self.root.N < current_readouts + self.simulations_per_move:
self.tree_search()
if self.verbosity > 0:
print("%d: Searched %d times in %s seconds\n\n" % (
position.n, self.simulations_per_move, time.time() - start),
file=sys.stderr)
# print some stats on anything with probability > 1%
if self.verbosity > 2:
print(self.root.describe(), file=sys.stderr)
print('\n\n', file=sys.stderr)
if self.verbosity > 3:
print(self.root.position, file=sys.stderr)
return self.pick_move()
def play_move(self, c):
"""
Notable side effects:
- appends the probability distribution derived from
this root's visit counts to the class' running tally, `searches_pi`
- Makes the node associated with this move the root, for future
`inject_noise` calls.
"""
if not self.two_player_mode:
self.searches_pi.append(
self.root.children_as_pi(self.root.position.n < self.temp_threshold))
self.qs.append(self.root.Q) # Save our resulting Q.
self.comments.append(self.root.describe())
self.root = self.root.maybe_add_child(coords.to_flat(self.board_size, c))
self.position = self.root.position # for showboard
del self.root.parent.children
return True # GTP requires positive result.
def pick_move(self):
"""Picks a move to play, based on MCTS readout statistics.
Highest N is most robust indicator. In the early stage of the game, pick
a move weighted by visit count; later on, pick the absolute max."""
if self.root.position.n > self.temp_threshold:
fcoord = np.argmax(self.root.child_N)
else:
cdf = self.root.child_N.cumsum()
cdf /= cdf[-1]
selection = random.random()
fcoord = cdf.searchsorted(selection)
assert self.root.child_N[fcoord] != 0
return coords.from_flat(self.board_size, fcoord)
def tree_search(self, num_parallel=None):
if num_parallel is None:
num_parallel = self.num_parallel
leaves = []
failsafe = 0
while len(leaves) < num_parallel and failsafe < num_parallel * 2:
failsafe += 1
leaf = self.root.select_leaf()
if self.verbosity >= 4:
print(self.show_path_to_root(leaf))
# if game is over, override the value estimate with the true score
if leaf.is_done():
value = 1 if leaf.position.score() > 0 else -1
leaf.backup_value(value, up_to=self.root)
continue
leaf.add_virtual_loss(up_to=self.root)
leaves.append(leaf)
if leaves:
move_probs, values = self.network.run_many(
[leaf.position for leaf in leaves])
for leaf, move_prob, value in zip(leaves, move_probs, values):
leaf.revert_virtual_loss(up_to=self.root)
leaf.incorporate_results(move_prob, value, up_to=self.root)
def show_path_to_root(self, node):
MAX_DEPTH = (self.board_size ** 2) * 1.4 # 505 moves for 19x19, 113 for 9x9
pos = node.position
diff = node.position.n - self.root.position.n
if len(pos.recent) == 0:
return
def fmt(move):
return "{}-{}".format('b' if move.color == 1 else 'w',
coords.to_kgs(self.board_size, move.move))
path = " ".join(fmt(move) for move in pos.recent[-diff:])
if node.position.n >= MAX_DEPTH:
path += " (depth cutoff reached) %0.1f" % node.position.score()
elif node.position.is_game_over():
path += " (game over) %0.1f" % node.position.score()
return path
def should_resign(self):
"""Returns true if the player resigned.
No further moves should be played.
"""
return self.root.Q_perspective < self.resign_threshold
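# With the default resign_threshold of -0.90, this corresponds to the player to
# move estimating roughly a (Q + 1) / 2 = 5% chance of winning.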
def set_result(self, winner, was_resign):
self.result = winner
if was_resign:
string = "B+R" if winner == go.BLACK else "W+R"
else:
string = self.root.position.result_string()
self.result_string = string
def to_sgf(self, use_comments=True):
assert self.result_string is not None
pos = self.root.position
if use_comments:
comments = self.comments or ['No comments.']
comments[0] = ("Resign Threshold: %0.3f\n" %
self.resign_threshold) + comments[0]
else:
comments = []
return sgf_wrapper.make_sgf(
self.board_size, pos.recent, self.result_string,
white_name=os.path.basename(self.network.save_file) or "Unknown",
black_name=os.path.basename(self.network.save_file) or "Unknown",
comments=comments)
def is_done(self):
return self.result != 0 or self.root.is_done()
def extract_data(self):
assert len(self.searches_pi) == self.root.position.n
assert self.result != 0
for pwc, pi in zip(go.replay_position(
self.board_size, self.root.position, self.result), self.searches_pi):
yield pwc.position, pi, pwc.result
def chat(self, msg_type, sender, text):
default_response = (
"Supported commands are 'winrate', 'nextplay', 'fortune', and 'help'.")
if self.root is None or self.root.position.n == 0:
return "I'm not playing right now. " + default_response
if 'winrate' in text.lower():
wr = (abs(self.root.Q) + 1.0) / 2.0
color = "Black" if self.root.Q > 0 else "White"
return "{:s} {:.2f}%".format(color, wr * 100.0)
elif 'nextplay' in text.lower():
return "I'm thinking... " + self.root.most_visited_path()
elif 'fortune' in text.lower():
return "You're feeling lucky!"
elif 'help' in text.lower():
return "I can't help much with go -- try ladders! Otherwise: {}".format(
default_response)
else:
return default_response
class CGOSPlayerMixin(MCTSPlayerMixin):
def suggest_move(self, position):
self.seconds_per_move = time_recommendation(position.n)
return super().suggest_move(position)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Define symmetries for feature transformation.
Allowable symmetries:
identity [12][34]
rot90 [24][13]
rot180 [43][21]
rot270 [31][42]
flip [13][24]
fliprot90 [34][12]
fliprot180 [42][31]
fliprot270 [21][43]
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import random
import numpy as np
INVERSES = {
'identity': 'identity',
'rot90': 'rot270',
'rot180': 'rot180',
'rot270': 'rot90',
'flip': 'flip',
'fliprot90': 'fliprot90',
'fliprot180': 'fliprot180',
'fliprot270': 'fliprot270',
}
IMPLS = {
'identity': lambda x: x,
'rot90': np.rot90,
'rot180': functools.partial(np.rot90, k=2),
'rot270': functools.partial(np.rot90, k=3),
'flip': lambda x: np.rot90(np.fliplr(x)),
'fliprot90': np.flipud,
'fliprot180': lambda x: np.rot90(np.flipud(x)),
'fliprot270': np.fliplr,
}
assert set(INVERSES.keys()) == set(IMPLS.keys())
SYMMETRIES = list(INVERSES.keys())
# A symmetry is just a string describing the transformation.
def invert_symmetry(s):
return INVERSES[s]
def apply_symmetry_feat(s, features):
return IMPLS[s](features)
def apply_symmetry_pi(board_size, s, pi):
pi = np.copy(pi)
# rotate all moves except for the pass move at end
pi[:-1] = IMPLS[s](pi[:-1].reshape([board_size, board_size])).ravel()
return pi
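# Illustrative usage (a sketch, not part of the original file): applying a
# symmetry to a policy vector and then its inverse recovers the original, with
# the trailing pass-move probability left untouched.  For a 9x9 board:
#   pi = np.random.dirichlet([1.0] * (9 * 9 + 1))
#   rot = apply_symmetry_pi(9, 'rot90', pi)
#   assert np.allclose(pi, apply_symmetry_pi(9, invert_symmetry('rot90'), rot))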
def randomize_symmetries_feat(features):
symmetries_used = [random.choice(SYMMETRIES) for _ in features]
return symmetries_used, [apply_symmetry_feat(s, f)
for s, f in zip(symmetries_used, features)]
def invert_symmetries_pi(board_size, symmetries, pis):
return [apply_symmetry_pi(board_size, invert_symmetry(s), pi)
for s, pi in zip(symmetries, pis)]
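# Typical use of these helpers (a sketch inferred from this file, with
# `network.run_many` as a hypothetical inference call): randomly transform a
# batch of feature planes before inference, then map the predicted policies
# back with the inverse transforms:
#   syms, feats = randomize_symmetries_feat(features)
#   probs, values = network.run_many(feats)
#   probs = invert_symmetries_pi(board_size, syms, probs)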
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Utilities for MiniGo and DualNet model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from contextlib import contextmanager
import functools
import itertools
import math
import operator
import os
import random
import re
import string
import time
import tensorflow as tf # pylint: disable=g-bad-import-order
# Regular expressions for the model number and the model name.
MODEL_NUM_REGEX = r'^\d{6}'  # model_num consists of six digits
# model_name consists of six digits followed by a dash-separated suffix,
# e.g. '000013-modelname'.
MODEL_NAME_REGEX = r'^\d{6}(-\w+)+'
def random_generator(size=6, chars=string.ascii_letters + string.digits):
return ''.join(random.choice(chars) for x in range(size))
def generate_model_name(model_num):
"""Generate a full model name for the given model number.
Args:
model_num: The number/generation of the model.
Returns:
The model's full name: model_num-model_name.
"""
if model_num == 0: # Model number for bootstrap model
new_name = 'bootstrap'
else:
new_name = random_generator()
full_name = '{:06d}-{}'.format(model_num, new_name)
return full_name
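# Example (illustrative): generate_model_name(0) -> '000000-bootstrap', while
# generate_model_name(13) -> '000013-' plus a random six-character suffix,
# e.g. '000013-aB3xYz'.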
def detect_model_num(full_name):
"""Take the full name of a model and extract its model number.
Args:
full_name: The full name of a model.
Returns:
The model number. For example: '000000-bootstrap.index' => 0.
"""
match = re.match(MODEL_NUM_REGEX, full_name)
if match:
return int(match.group())
else:
return None
def detect_model_name(full_name):
"""Take the full name of a model and extract its model name.
Args:
full_name: The full name of a model.
Returns:
The model name. For example: '000000-bootstrap.index' => '000000-bootstrap'.
"""
match = re.match(MODEL_NAME_REGEX, full_name)
if match:
return match.group()
else:
return None
def get_models(models_dir):
"""Get all models.
Args:
models_dir: The directory of all models.
Returns:
A list of (model number, model name) tuples sorted in increasing order.
For example: [(13, '000013-modelname'), (17, '000017-modelname'), ...]
"""
all_models = tf.gfile.Glob(os.path.join(models_dir, '*.meta'))
model_filenames = [os.path.basename(m) for m in all_models]
model_numbers_names = sorted([
(detect_model_num(m), detect_model_name(m))
for m in model_filenames])
return model_numbers_names
def get_latest_model(models_dir):
"""Find the latest model.
Args:
models_dir: The directory of all models.
Returns:
The model number and name of the latest model. For example:
(17, 000017-modelname)
"""
models = get_models(models_dir)
if not models:  # get_models returns an empty list when no model exists yet
models = [(0, '000000-bootstrap')]
return models[-1]
def round_power_of_two(n):
"""Finds the nearest power of 2 to a number.
Thus 84 -> 64, 120 -> 128, etc.
Args:
n: The given number.
Returns:
The nearest 2-power number to n.
"""
return 2 ** int(round(math.log(n, 2)))
def parse_game_result(result):
if re.match(r'[bB]\+', result):
return 1
elif re.match(r'[wW]\+', result):
return -1
else:
return 0
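# Example (illustrative): parse_game_result('B+R') -> 1,
# parse_game_result('W+3.5') -> -1, and anything else -> 0.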
def product(numbers):
return functools.reduce(operator.mul, numbers)
def take_n(n, iterable):
return list(itertools.islice(iterable, n))
def iter_chunks(chunk_size, iterator):
iterator = iter(iterator)
while True:
next_chunk = take_n(chunk_size, iterator)
# If len(iterable) % chunk_size == 0, don't return an empty chunk.
if next_chunk:
yield next_chunk
else:
break
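# Example (illustrative): list(iter_chunks(3, range(7))) yields
# [[0, 1, 2], [3, 4, 5], [6]] -- the final partial chunk is kept, and no empty
# chunk is produced when the length divides evenly.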
def shuffler(iterator, pool_size=10**5, refill_threshold=0.9):
yields_between_refills = round(pool_size * (1 - refill_threshold))
# initialize pool; this step may or may not exhaust the iterator.
pool = take_n(pool_size, iterator)
while True:
random.shuffle(pool)
for _ in range(yields_between_refills):
yield pool.pop()
next_batch = take_n(yields_between_refills, iterator)
if not next_batch:
break
pool.extend(next_batch)
# finish consuming whatever's left - no need for further randomization.
# yield from pool
print(type(pool))
for p in pool:
yield p
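# Note (added for clarity): with the defaults, the pool holds up to 100,000
# items and is reshuffled and topped up from the iterator after every 10,000
# yields (pool_size * (1 - refill_threshold)).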
@contextmanager
def timer(message):
tick = time.time()
yield
tock = time.time()
print('{}: {:.3} seconds'.format(message, (tock - tick)))
@contextmanager
def logged_timer(message):
tick = time.time()
yield
tock = time.time()
print('{}: {:.3} seconds'.format(message, (tock - tick)))
tf.logging.info('{}: {:.3} seconds'.format(message, (tock - tick)))
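# Illustrative usage (a sketch, not in the original file):
#   with logged_timer('selfplay batch'):
#       play_games()  # hypothetical call
# prints and logs something like 'selfplay batch: 12.3 seconds'.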
class MiniGoDirectory(object):
"""The class to set up directories of MiniGo."""
def __init__(self, base_dir):
self.trained_models_dir = os.path.join(base_dir, 'models')
self.estimator_model_dir = os.path.join(base_dir, 'estimator_model_dir/')
self.selfplay_dir = os.path.join(base_dir, 'data/selfplay/')
self.holdout_dir = os.path.join(base_dir, 'data/holdout/')
self.training_chunk_dir = os.path.join(base_dir, 'data/training_chunks/')
self.sgf_dir = os.path.join(base_dir, 'sgf/')
self.evaluate_dir = os.path.join(base_dir, 'sgf/evaluate/')
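# Example (illustrative): MiniGoDirectory('/tmp/minigo') points the pipeline at
# paths such as /tmp/minigo/models, /tmp/minigo/data/selfplay/ and
# /tmp/minigo/sgf/evaluate/.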