"...test_cli/git@developer.sourcefind.cn:wangsen/mineru.git" did not exist on "1b71bb9309de2857bc152b94138a2296c7df0e68"
Commit 6f1e3b38 authored by Brian Lee, committed by Hongkun Yu

Remove unmaintained fork of Minigo code (#7605)

The reference implementation can be found at
https://github.com/tensorflow/minigo

This fork was originally created to experiment with performance upgrades
for MLPerf, but since MLPerf work is focused in the original repo,
this fork's existence only serves to confuse.
parent 497989e0
# MiniGo
This is a simplified implementation of MiniGo based on the code provided by the authors: [MiniGo](https://github.com/tensorflow/minigo).
MiniGo is a minimalist Go engine modeled after AlphaGo Zero, ["Mastering the Game of Go without Human
Knowledge"](https://www.nature.com/articles/nature24270). An useful one-diagram overview of Alphago Zero can be found in the [cheat sheet](https://medium.com/applied-data-science/alphago-zero-explained-in-one-diagram-365f5abf67e0).
The implementation of MiniGo consists of three main components: the DualNet model, the Monte Carlo Tree Search (MCTS), and Go domain knowledge. Currently, the **DualNet model** is our focus.
## DualNet Architecture
DualNet is the neural network used in MiniGo. It's based on residual blocks with two output heads. The following is a brief overview of the DualNet architecture.
### Input Features
The input to the neural network is a [board_size * board_size * 17] image stack
comprising 17 binary feature planes: 8 feature planes indicate the presence of the
current player's stones, a further 8 planes represent the corresponding features
for the opponent's stones, and the final plane represents the color to play, with
a constant value of 1 if black is to play or 0 if white is to play. Check
[features.py](features.py) for more details.
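As a rough illustration (a sketch only; the real planes are built by [features.py](features.py)), the input stack for a single position is just a `[board_size, board_size, 17]` array:
```
import numpy as np

board_size = 9
# 16 binary planes of stone history (8 for the current player, 8 for the
# opponent), plus 1 plane for the color to play.
input_stack = np.zeros([board_size, board_size, 17], dtype=np.uint8)
# If black is to play, the final plane is all ones; if white, all zeros.
input_stack[:, :, 16] = 1
print(input_stack.shape)  # (9, 9, 17)
```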
### Neural Network Structure
In the MiniGo implementation, the input features are processed by a residual tower
that consists of a single convolutional block followed by either 9 or 19
residual blocks.
The convolutional block applies the following modules:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
Each residual block applies the following modules sequentially to its input:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
5. Batch normalization
6. A skip connection that adds the input to the block
7. A rectifier non-linearity
Note: num_filter is 128 for a 19 x 19 board size and 32 for a 9 x 9 board size in the MiniGo implementation.
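A condensed sketch of one residual block, mirroring the modules listed above (the full implementation is `_res_block` in [dualnet_model.py](dualnet_model.py); `tf` is TensorFlow 1.x):
```
def residual_block(inputs, filters, training):
  """Conv -> batch norm -> ReLU -> conv -> batch norm -> skip -> ReLU."""
  conv1 = tf.layers.conv2d(inputs, filters, [3, 3], padding='same')
  relu1 = tf.nn.relu(tf.layers.batch_normalization(conv1, training=training))
  conv2 = tf.layers.conv2d(relu1, filters, [3, 3], padding='same')
  bn2 = tf.layers.batch_normalization(conv2, training=training)
  # The skip connection adds the block's input before the final rectifier.
  return tf.nn.relu(inputs + bn2)
```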
### Dual Heads Output
The output of the residual tower is passed into two separate "heads" for
computing the policy and value respectively. The policy head applies the
following modules:
1. A convolution of 2 filters of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer that outputs a vector of size (board_size * board_size + 1) corresponding to logit probabilities for all intersections and the pass move
The value head applies the following modules:
1. A convolution of 1 filter of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer to a hidden layer of size 256 for 19 x 19
board size and 64 for 9x9 board size
5. A rectifier non-linearity
6. A fully connected linear layer to a scalar
7. A tanh non-linearity outputting a scalar in the range [-1, 1]
In MiniGo, the overall network depth, in the 10 or 20 block network, is 19 or 39
parameterized layers respectively for the residual tower, plus an additional 2
layers for the policy head and 3 layers for the value head.
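Both heads are implemented in [dualnet_model.py](dualnet_model.py); roughly (with `tower_output`, `training`, `board_size`, and `fc_width` assumed to be in scope, and some batch-norm details omitted), they look like:
```
# Policy head: 1x1 conv -> batch norm -> ReLU -> dense over all moves + pass.
policy = tf.layers.conv2d(tower_output, filters=2, kernel_size=[1, 1])
policy = tf.nn.relu(tf.layers.batch_normalization(policy, training=training))
policy_logits = tf.layers.dense(
    tf.reshape(policy, [-1, board_size * board_size * 2]),
    board_size * board_size + 1)

# Value head: 1x1 conv -> batch norm -> ReLU -> hidden dense -> scalar -> tanh.
value = tf.layers.conv2d(tower_output, filters=1, kernel_size=[1, 1])
value = tf.nn.relu(tf.layers.batch_normalization(value, training=training))
value_hidden = tf.nn.relu(tf.layers.dense(
    tf.reshape(value, [-1, board_size * board_size]), fc_width))
value_output = tf.nn.tanh(tf.reshape(tf.layers.dense(value_hidden, 1), [-1]))
```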
## Getting Started
This project assumes you have virtualenv, TensorFlow (>= 1.5), and two other Go-related
packages installed: pygtp (>= 0.4) and sgf (== 0.5). An example setup is shown below.
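For example, inside a fresh virtualenv the dependencies can be installed with pip (package names as listed above; adjust versions to your environment):
```
virtualenv minigo_env
source minigo_env/bin/activate
pip install "tensorflow>=1.5" "pygtp>=0.4" "sgf==0.5"
```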
## Training Model
One iteration of reinforcement learning (RL) consists of the following steps:
- Bootstrap: initializes a random DualNet model. If the estimator directory already exists, the model is initialized from the last checkpoint.
- Selfplay: plays games with the latest model, or with the best model so far as identified by evaluation, producing data used for training.
- Gather: groups games played with the same model into larger files of tf.Examples.
- Train: trains a new model with the selfplay results from the most recent N generations.
To run the RL pipeline, issue the following command:
```
python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256
```
Arguments:
* `--base_dir`: Base directory for MiniGo data and models. If not specified, it's set as /tmp/minigo/ by default.
* `--board_size`: Go board size. It can be either 9 or 19. By default, it's 9.
* `--batch_size`: Batch size for model training. If not specified, it's calculated based on go board size.
Use the `--help` or `-h` flag to get a full list of possible arguments. Besides these arguments, other parameters for the RL pipeline and the DualNet model can be found and configured in [model_params.py](model_params.py).
Suppose the base directory argument `base_dir` is `$HOME/minigo/` and we use 9 as the `board_size`. After model training, the following directories are created to store models and game data:
```
$HOME/minigo                      # base directory
├── 9_size                        # directory for 9x9 board size
│   ├── data
│   │   ├── holdout               # holdout data for model validation
│   │   ├── selfplay              # data generated by selfplay of each model
│   │   └── training_chunks       # gathered tf_examples for model training
│   ├── estimator_model_dir       # estimator working directory
│   ├── trained_models            # all the trained models
│   └── sgf                       # sgf (smart go files) folder
│       ├── 000000-bootstrap      # model name
│       │   ├── clean             # clean sgf files of model selfplay
│       │   └── full              # full sgf files of model selfplay
│       ├── ...
│       └── evaluate              # clean sgf files of model evaluation
└── ...
```
## Validating Model
To validate the trained model, issue the following command with the `--validation` argument:
```
python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --validation
```
## Evaluating Models
The performance of two models is compared in the evaluation step. Given two models, one plays black and the other plays white. They play several games (the number of games can be configured by the parameter `eval_games` in [model_params.py](model_params.py)), and the one that wins by a margin of 55% is declared the winner.
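Concretely, the decision mirrors the check at the end of `play_match` in the evaluation code: black is declared the winner only if its win count exceeds white's by more than `eval_win_rate * games`:
```
if (black_win_counts - white_win_counts) > params.eval_win_rate * games:
  winner = 'BLACK'
else:
  winner = 'WHITE'
```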
To include the evaluation step in the RL pipeline, specify the `--evaluation` argument to compare the performance of the `current_trained_model` and the `best_model_so_far`. The winner is used to update `best_model_so_far`. Run the following command to include the evaluation step in the pipeline:
```
python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --evaluation
```
## Testing Pipeline
As the whole RL pipeline may take hours to train even for a 9x9 board size, a `--test` argument is provided to test the pipeline quickly with a dummy neural network model.
To test the RL pipeline with a dummy model, issue the following command:
```
python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --test
```
## Running Self-play Only
A self-play-only option is provided to run the selfplay step on its own, so training data can be generated in parallel. Issue the following command to run selfplay only with the latest trained model:
```
python minigo.py --selfplay
```
Other optional arguments:
* `--selfplay_model_name`: The name of the model used for selfplay. If not specified, the latest trained model will be used.
* `--selfplay_max_games`: The maximum number of games for selfplay to generate. If not specified, the default parameter `max_games_per_generation` is used.
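For example, selfplay can be pointed at a specific model (here the bootstrap model from the directory layout above) and capped at a small number of games:
```
python minigo.py --selfplay --selfplay_model_name=000000-bootstrap --selfplay_max_games=8
```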
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Logic for dealing with coordinates.
This introduces some helpers and terminology that are used throughout MiniGo.
MiniGo Coordinate: This is a tuple of the form (row, column) that is indexed
starting out at (0, 0) from the upper-left.
Flattened Coordinate: this is a number ranging from 0 - N^2 (so N^2+1
possible values). The extra value N^2 is used to mark a 'pass' move.
SGF Coordinate: Coordinate used for SGF serialization format. Coordinates use
two-letter pairs having the form (column, row) indexed from the upper-left
where 0, 0 = 'aa'.
KGS Coordinate: Human-readable coordinate string indexed from bottom left, with
the first character a capital letter for the column and the second a number
from 1-19 for the row. Note that KGS chooses to skip the letter 'I' due to
its similarity with 'l' (lowercase 'L').
PYGTP Coordinate: Tuple coordinate indexed starting at 1,1 from bottom-left
in the format (column, row)
So, for a 19x19,
Coord Type upper_left upper_right pass
-------------------------------------------------------
minigo coord (0, 0) (0, 18) None
flat 0 18 361
SGF 'aa' 'sa' ''
KGS 'A19' 'T19' 'pass'
pygtp (1, 19) (19, 19) (0, 0)
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import gtp
# We provide more than 19 entries here in case of boards larger than 19 x 19.
_SGF_COLUMNS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
_KGS_COLUMNS = 'ABCDEFGHJKLMNOPQRSTUVWXYZ'
def from_flat(board_size, flat):
"""Converts from a flattened coordinate to a MiniGo coordinate."""
if flat == board_size * board_size:
return None
return divmod(flat, board_size)
def to_flat(board_size, coord):
"""Converts from a MiniGo coordinate to a flattened coordinate."""
if coord is None:
return board_size * board_size
return board_size * coord[0] + coord[1]
def from_sgf(sgfc):
"""Converts from an SGF coordinate to a MiniGo coordinate."""
if not sgfc:
return None
return _SGF_COLUMNS.index(sgfc[1]), _SGF_COLUMNS.index(sgfc[0])
def to_sgf(coord):
"""Converts from a MiniGo coordinate to an SGF coordinate."""
if coord is None:
return ''
return _SGF_COLUMNS[coord[1]] + _SGF_COLUMNS[coord[0]]
def from_kgs(board_size, kgsc):
"""Converts from a KGS coordinate to a MiniGo coordinate."""
if kgsc == 'pass':
return None
kgsc = kgsc.upper()
col = _KGS_COLUMNS.index(kgsc[0])
row_from_bottom = int(kgsc[1:])
return board_size - row_from_bottom, col
def to_kgs(board_size, coord):
"""Converts from a MiniGo coordinate to a KGS coordinate."""
if coord is None:
return 'pass'
y, x = coord
return '{}{}'.format(_KGS_COLUMNS[x], board_size - y)
def from_pygtp(board_size, pygtpc):
"""Converts from a pygtp coordinate to a MiniGo coordinate."""
# GTP has a notion of both a Pass and a Resign, both of which are mapped to
# None, so the conversion is not precisely bijective.
if pygtpc in (gtp.PASS, gtp.RESIGN):
return None
return board_size - pygtpc[1], pygtpc[0] - 1
def to_pygtp(board_size, coord):
"""Converts from a MiniGo coordinate to a pygtp coordinate."""
if coord is None:
return gtp.PASS
return coord[1] + 1, board_size - coord[0]
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for coords."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf # pylint: disable=g-bad-import-order
import coords
import numpy
import utils_test
tf.logging.set_verbosity(tf.logging.ERROR)
class TestCoords(utils_test.MiniGoUnitTest):
def test_upperleft(self):
self.assertEqual(coords.from_sgf('aa'), (0, 0))
self.assertEqual(coords.from_flat(utils_test.BOARD_SIZE, 0), (0, 0))
self.assertEqual(coords.from_kgs(utils_test.BOARD_SIZE, 'A9'), (0, 0))
self.assertEqual(coords.from_pygtp(utils_test.BOARD_SIZE, (1, 9)), (0, 0))
self.assertEqual(coords.to_sgf((0, 0)), 'aa')
self.assertEqual(coords.to_flat(utils_test.BOARD_SIZE, (0, 0)), 0)
self.assertEqual(coords.to_kgs(utils_test.BOARD_SIZE, (0, 0)), 'A9')
self.assertEqual(coords.to_pygtp(utils_test.BOARD_SIZE, (0, 0)), (1, 9))
def test_topleft(self):
self.assertEqual(coords.from_sgf('ia'), (0, 8))
self.assertEqual(coords.from_flat(utils_test.BOARD_SIZE, 8), (0, 8))
self.assertEqual(coords.from_kgs(utils_test.BOARD_SIZE, 'J9'), (0, 8))
self.assertEqual(coords.from_pygtp(utils_test.BOARD_SIZE, (9, 9)), (0, 8))
self.assertEqual(coords.to_sgf((0, 8)), 'ia')
self.assertEqual(coords.to_flat(utils_test.BOARD_SIZE, (0, 8)), 8)
self.assertEqual(coords.to_kgs(utils_test.BOARD_SIZE, (0, 8)), 'J9')
self.assertEqual(coords.to_pygtp(utils_test.BOARD_SIZE, (0, 8)), (9, 9))
def test_pass(self):
self.assertEqual(coords.from_sgf(''), None)
self.assertEqual(coords.from_flat(utils_test.BOARD_SIZE, 81), None)
self.assertEqual(coords.from_kgs(utils_test.BOARD_SIZE, 'pass'), None)
self.assertEqual(coords.from_pygtp(utils_test.BOARD_SIZE, (0, 0)), None)
self.assertEqual(coords.to_sgf(None), '')
self.assertEqual(coords.to_flat(utils_test.BOARD_SIZE, None), 81)
self.assertEqual(coords.to_kgs(utils_test.BOARD_SIZE, None), 'pass')
self.assertEqual(coords.to_pygtp(utils_test.BOARD_SIZE, None), (0, 0))
def test_parsing_9x9(self):
self.assertEqual(coords.from_sgf('aa'), (0, 0))
self.assertEqual(coords.from_sgf('ac'), (2, 0))
self.assertEqual(coords.from_sgf('ca'), (0, 2))
self.assertEqual(coords.from_sgf(''), None)
self.assertEqual(coords.to_sgf(None), '')
self.assertEqual('aa', coords.to_sgf(coords.from_sgf('aa')))
self.assertEqual('sa', coords.to_sgf(coords.from_sgf('sa')))
self.assertEqual((1, 17), coords.from_sgf(coords.to_sgf((1, 17))))
self.assertEqual(coords.from_kgs(utils_test.BOARD_SIZE, 'A1'), (8, 0))
self.assertEqual(coords.from_kgs(utils_test.BOARD_SIZE, 'A9'), (0, 0))
self.assertEqual(coords.from_kgs(utils_test.BOARD_SIZE, 'C2'), (7, 2))
self.assertEqual(coords.from_kgs(utils_test.BOARD_SIZE, 'J2'), (7, 8))
self.assertEqual(coords.from_pygtp(utils_test.BOARD_SIZE, (1, 1)), (8, 0))
self.assertEqual(coords.from_pygtp(utils_test.BOARD_SIZE, (1, 9)), (0, 0))
self.assertEqual(coords.from_pygtp(utils_test.BOARD_SIZE, (3, 2)), (7, 2))
self.assertEqual(coords.to_pygtp(utils_test.BOARD_SIZE, (8, 0)), (1, 1))
self.assertEqual(coords.to_pygtp(utils_test.BOARD_SIZE, (0, 0)), (1, 9))
self.assertEqual(coords.to_pygtp(utils_test.BOARD_SIZE, (7, 2)), (3, 2))
self.assertEqual(coords.to_kgs(utils_test.BOARD_SIZE, (0, 8)), 'J9')
self.assertEqual(coords.to_kgs(utils_test.BOARD_SIZE, (8, 0)), 'A1')
def test_flatten(self):
self.assertEqual(coords.to_flat(utils_test.BOARD_SIZE, (0, 0)), 0)
self.assertEqual(coords.to_flat(utils_test.BOARD_SIZE, (0, 3)), 3)
self.assertEqual(coords.to_flat(utils_test.BOARD_SIZE, (3, 0)), 27)
self.assertEqual(coords.from_flat(utils_test.BOARD_SIZE, 27), (3, 0))
self.assertEqual(coords.from_flat(utils_test.BOARD_SIZE, 10), (1, 1))
self.assertEqual(coords.from_flat(utils_test.BOARD_SIZE, 80), (8, 8))
self.assertEqual(coords.to_flat(
utils_test.BOARD_SIZE, coords.from_flat(utils_test.BOARD_SIZE, 10)), 10)
self.assertEqual(coords.from_flat(
utils_test.BOARD_SIZE, coords.to_flat(
utils_test.BOARD_SIZE, (5, 4))), (5, 4))
def test_from_flat_ndindex_equivalence(self):
ndindices = list(numpy.ndindex(
utils_test.BOARD_SIZE, utils_test.BOARD_SIZE))
flat_coords = list(range(
utils_test.BOARD_SIZE * utils_test.BOARD_SIZE))
def _from_flat(flat_coords):
return coords.from_flat(utils_test.BOARD_SIZE, flat_coords)
self.assertEqual(
list(map(_from_flat, flat_coords)), ndindices)
if __name__ == '__main__':
tf.test.main()
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains utility and supporting functions for DualNet.
This module provides the model interface, including functions for DualNet model
bootstrap, training, validation, loading and exporting.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import tensorflow as tf # pylint: disable=g-bad-import-order
import dualnet_model
import features
import preprocessing
import symmetries
class DualNetRunner(object):
"""The DualNetRunner class for the complete model with graph and weights.
This class can restore the model from saved files, and provide inference for
given examples.
"""
def __init__(self, save_file, params):
"""Initialize the dual network from saved model/checkpoints.
Args:
save_file: Path where model parameters were previously saved. For example:
'/tmp/minigo/models_dir/000000-bootstrap/'
params: An object with hyperparameters for DualNetRunner
"""
self.save_file = save_file
self.hparams = params
self.inference_input = None
self.inference_output = None
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
self.sess = tf.Session(graph=tf.Graph(), config=config)
self.initialize_graph()
def initialize_graph(self):
"""Initialize the graph with saved model."""
with self.sess.graph.as_default():
input_features, labels = get_inference_input(self.hparams)
estimator_spec = dualnet_model.model_fn(
input_features, labels, tf.estimator.ModeKeys.PREDICT, self.hparams)
self.inference_input = input_features
self.inference_output = estimator_spec.predictions
if self.save_file is not None:
self.initialize_weights(self.save_file)
else:
self.sess.run(tf.global_variables_initializer())
def initialize_weights(self, save_file):
"""Initialize the weights from the given save_file.
Assumes that the graph has been constructed, and the save_file contains
weights that match the graph. Used to set the weights to a different version
of the player without redefining the entire graph.
Args:
save_file: Path where model parameters were previously saved.
"""
tf.train.Saver().restore(self.sess, save_file)
def run(self, position, use_random_symmetry=True):
"""Compute the policy and value output for a given position.
Args:
position: A given go board status
use_random_symmetry: Apply random symmetry (defined in symmetries.py) to
the extracted feature (defined in features.py) of the given position
Returns:
prob, value: The policy and value output (defined in dualnet_model.py)
"""
probs, values = self.run_many(
[position], use_random_symmetry=use_random_symmetry)
return probs[0], values[0]
def run_many(self, positions, use_random_symmetry=True):
"""Compute the policy and value output for given positions.
Args:
positions: A list of positions for go board status
use_random_symmetry: Apply random symmetry (defined in symmetries.py) to
the extracted features (defined in features.py) of the given positions
Returns:
probabilities, value: The policy and value outputs (defined in
dualnet_model.py)
"""
def _extract_features(positions):
return features.extract_features(self.hparams.board_size, positions)
processed = list(map(_extract_features, positions))
# processed = [
# features.extract_features(self.hparams.board_size, p) for p in positions]
if use_random_symmetry:
syms_used, processed = symmetries.randomize_symmetries_feat(processed)
# feed_dict is a dict object to provide the input examples for the step of
# inference. sess.run() returns the inference predictions (indicated by
# self.inference_output) of the given input as outputs
outputs = self.sess.run(
self.inference_output, feed_dict={self.inference_input: processed})
probabilities, value = outputs['policy_output'], outputs['value_output']
if use_random_symmetry:
probabilities = symmetries.invert_symmetries_pi(
self.hparams.board_size, syms_used, probabilities)
return probabilities, value
def get_inference_input(params):
"""Set up placeholders for input features/labels.
Args:
params: An object to indicate the hyperparameters of the model.
Returns:
The features and output tensors that get passed into model_fn. Check
dualnet_model.py for more details on the models input and output.
"""
input_features = tf.placeholder(
tf.float32, [None, params.board_size, params.board_size,
features.NEW_FEATURES_PLANES],
name='pos_tensor')
labels = {
'pi_tensor': tf.placeholder(
tf.float32, [None, params.board_size * params.board_size + 1]),
'value_tensor': tf.placeholder(tf.float32, [None])
}
return input_features, labels
def bootstrap(working_dir, params):
"""Initialize a tf.Estimator run with random initial weights.
Args:
working_dir: The directory where tf.estimator will drop logs,
checkpoints, and so on
params: hyperparams of the model.
"""
# Forge an initial checkpoint with the name that subsequent Estimator will
# expect to find.
estimator_initial_checkpoint_name = 'model.ckpt-1'
save_file = os.path.join(working_dir,
estimator_initial_checkpoint_name)
sess = tf.Session()
with sess.graph.as_default():
input_features, labels = get_inference_input(params)
dualnet_model.model_fn(
input_features, labels, tf.estimator.ModeKeys.PREDICT, params)
sess.run(tf.global_variables_initializer())
tf.train.Saver().save(sess, save_file)
def export_model(working_dir, model_path):
"""Take the latest checkpoint and export it to model_path for selfplay.
Assumes that all relevant model files are prefixed by the same name.
(For example, foo.index, foo.meta and foo.data-00000-of-00001).
Args:
working_dir: The directory where tf.estimator keeps its checkpoints.
model_path: Either a local path or a gs:// path to export model to.
"""
latest_checkpoint = tf.train.latest_checkpoint(working_dir)
all_checkpoint_files = tf.gfile.Glob(latest_checkpoint + '*')
for filename in all_checkpoint_files:
suffix = filename.partition(latest_checkpoint)[2]
destination_path = model_path + suffix
tf.gfile.Copy(filename, destination_path)
def train(working_dir, tf_records, generation, params):
"""Train the model for a specific generation.
Args:
working_dir: The model working directory to save model parameters,
drop logs, checkpoints, and so on.
tf_records: A list of tf_record filenames for training input.
generation: The generation to be trained.
params: hyperparams of the model.
Raises:
ValueError: if generation is not greater than 0.
"""
if generation <= 0:
raise ValueError('Model 0 is random weights')
estimator = tf.estimator.Estimator(
dualnet_model.model_fn, model_dir=working_dir, params=params)
max_steps = (generation * params.examples_per_generation
// params.batch_size)
profiler_hook = tf.train.ProfilerHook(output_dir=working_dir, save_secs=600)
def input_fn():
return preprocessing.get_input_tensors(
params, params.batch_size, tf_records)
estimator.train(
input_fn, hooks=[profiler_hook], max_steps=max_steps)
def validate(working_dir, tf_records, params):
"""Perform model validation on the hold out data.
Args:
working_dir: The model working directory.
tf_records: A list of tf_records filenames for holdout data.
params: hyperparams of the model.
"""
estimator = tf.estimator.Estimator(
dualnet_model.model_fn, model_dir=working_dir, params=params)
def input_fn():
return preprocessing.get_input_tensors(
params, params.batch_size, tf_records, filter_amount=0.05)
estimator.evaluate(input_fn, steps=1000)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Defines DualNet model, the architecture of the policy and value network.
The input to the neural network is a [board_size * board_size * 17] image stack
comprising 17 binary feature planes. 8 feature planes consist of binary values
indicating the presence of the current player's stones; A further 8 feature
planes represent the corresponding features for the opponent's stones; The final
feature plane represents the color to play, and has a constant value of either 1
if black is to play or 0 if white is to play. Check 'features.py' for more details.
In the MiniGo implementation, the input features are processed by a residual tower
that consists of a single convolutional block followed by either 9 or 19
residual blocks.
The convolutional block applies the following modules:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
Each residual block applies the following modules sequentially to its input:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
5. Batch normalization
6. A skip connection that adds the input to the block
7. A rectifier non-linearity
Note: num_filter is 128 for 19 x 19 board size, and 32 for 9 x 9 board size.
The output of the residual tower is passed into two separate "heads" for
computing the policy and value respectively. The policy head applies the
following modules:
1. A convolution of 2 filters of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer that outputs a vector of size 19^2 + 1 = 362
corresponding to logit probabilities for all intersections and the pass move
The value head applies the following modules:
1. A convolution of 1 filter of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer to a hidden layer of size 256 for 19 x 19
board size and 64 for 9x9 board size
5. A rectifier non-linearity
6. A fully connected linear layer to a scalar
7. A tanh non-linearity outputting a scalar in the range [-1, 1]
The overall network depth, in the 10 or 20 block network, is 19 or 39
parameterized layers respectively for the residual tower, plus an additional 2
layers for the policy head and 3 layers for the value head.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
_BATCH_NORM_DECAY = 0.997
_BATCH_NORM_EPSILON = 1e-5
def _batch_norm(inputs, training, center=True, scale=True):
"""Performs a batch normalization using a standard set of parameters."""
return tf.layers.batch_normalization(
inputs=inputs, momentum=_BATCH_NORM_DECAY, epsilon=_BATCH_NORM_EPSILON,
center=center, scale=scale, fused=True, training=training)
def _conv2d(inputs, filters, kernel_size):
"""Performs 2D convolution with a standard set of parameters."""
return tf.layers.conv2d(
inputs=inputs, filters=filters, kernel_size=kernel_size,
padding='same')
def _conv_block(inputs, filters, kernel_size, training):
"""A convolutional block.
Args:
inputs: A tensor representing a batch of input features with shape
[BATCH_SIZE, board_size, board_size, features.NEW_FEATURES_PLANES].
filters: The number of filters for network layers in residual tower.
kernel_size: The kernel to be used in conv2d.
training: Either True or False, whether we are currently training the
model. Needed for batch norm.
Returns:
The output tensor of the convolutional block layer.
"""
conv = _conv2d(inputs, filters, kernel_size)
batchn = _batch_norm(conv, training)
output = tf.nn.relu(batchn)
return output
def _res_block(inputs, filters, kernel_size, training):
"""A residual block.
Args:
inputs: A tensor representing a batch of input features with shape
[BATCH_SIZE, board_size, board_size, features.NEW_FEATURES_PLANES].
filters: The number of filters for network layers in residual tower.
kernel_size: The kernel to be used in conv2d.
training: Either True or False, whether we are currently training the
model. Needed for batch norm.
Returns:
The output tensor of the residual block layer.
"""
initial_output = _conv_block(inputs, filters, kernel_size, training)
int_layer2_conv = _conv2d(initial_output, filters, kernel_size)
int_layer2_batchn = _batch_norm(int_layer2_conv, training)
output = tf.nn.relu(inputs + int_layer2_batchn)
return output
class Model(object):
"""Base class for building the DualNet Model."""
def __init__(self, num_filters, num_shared_layers, fc_width, board_size):
"""Initialize a model for computing the policy and value in RL.
Args:
num_filters: Number of filters (AlphaGoZero used 256). We use 128 by
default for a 19x19 go board, and 32 for 9x9 size.
num_shared_layers: Number of shared residual blocks. AGZ used both 19
and 39. Here we use 19 for 19x19 size and 9 for 9x9 size because it's
faster to train.
fc_width: Dimensionality of the fully connected linear layer.
board_size: A single integer for the board size.
"""
self.num_filters = num_filters
self.num_shared_layers = num_shared_layers
self.fc_width = fc_width
self.board_size = board_size
self.kernel_size = [3, 3] # kernel size is from AGZ paper
def __call__(self, inputs, training):
"""Add operations to classify a batch of input Go features.
Args:
inputs: A Tensor representing a batch of input Go features with shape
[BATCH_SIZE, board_size, board_size, features.NEW_FEATURES_PLANES]
training: A boolean. Set to True to add operations required only when
training the classifier.
Returns:
policy_logits: A vector of size self.board_size * self.board_size + 1
corresponding to the policy logit probabilities for all intersections
and the pass move.
value_logits: A scalar for the value logits output
"""
initial_output = _conv_block(
inputs=inputs, filters=self.num_filters,
kernel_size=self.kernel_size, training=training)
# the shared stack
shared_output = initial_output
for _ in range(self.num_shared_layers):
shared_output = _res_block(
inputs=shared_output, filters=self.num_filters,
kernel_size=self.kernel_size, training=training)
# policy head
policy_conv2d = _conv2d(inputs=shared_output, filters=2, kernel_size=[1, 1])
policy_batchn = _batch_norm(inputs=policy_conv2d, training=training,
center=False, scale=False)
policy_relu = tf.nn.relu(policy_batchn)
policy_logits = tf.layers.dense(
tf.reshape(policy_relu, [-1, self.board_size * self.board_size * 2]),
self.board_size * self.board_size + 1)
# value head
value_conv2d = _conv2d(shared_output, filters=1, kernel_size=[1, 1])
value_batchn = _batch_norm(value_conv2d, training,
center=False, scale=False)
value_relu = tf.nn.relu(value_batchn)
value_fc_hidden = tf.nn.relu(tf.layers.dense(
tf.reshape(value_relu, [-1, self.board_size * self.board_size]),
self.fc_width))
value_logits = tf.reshape(tf.layers.dense(value_fc_hidden, 1), [-1])
return policy_logits, value_logits
def model_fn(features, labels, mode, params, config=None): # pylint: disable=unused-argument
"""DualNet model function.
Args:
features: tensor with shape
[BATCH_SIZE, self.board_size, self.board_size,
features.NEW_FEATURES_PLANES]
labels: dict from string to tensor with shape
'pi_tensor': [BATCH_SIZE, self.board_size * self.board_size + 1]
'value_tensor': [BATCH_SIZE]
mode: a tf.estimator.ModeKeys (batchnorm params update for TRAIN only)
params: an object of hyperparams
config: ignored; is required by Estimator API.
Returns:
EstimatorSpec parameterized according to the input params and the current
mode.
"""
model = Model(params.num_filters, params.num_shared_layers, params.fc_width,
params.board_size)
policy_logits, value_logits = model(
features, mode == tf.estimator.ModeKeys.TRAIN)
policy_output = tf.nn.softmax(policy_logits, name='policy_output')
value_output = tf.nn.tanh(value_logits, name='value_output')
# Calculate model loss. The loss function sums over the mean-squared error,
# the cross-entropy losses and the l2 regularization term.
# Cross-entropy of policy
policy_entropy = -tf.reduce_mean(tf.reduce_sum(
policy_output * tf.log(policy_output), axis=1))
policy_cost = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(
logits=policy_logits, labels=labels['pi_tensor']))
# Mean squared error
value_cost = tf.reduce_mean(
tf.square(value_output - labels['value_tensor']))
# L2 term
l2_cost = params.l2_strength * tf.add_n(
[tf.nn.l2_loss(v) for v in tf.trainable_variables()
if 'bias' not in v.name])
# The loss function
combined_cost = policy_cost + value_cost + l2_cost
# Get model train ops
global_step = tf.train.get_or_create_global_step()
boundaries = [int(1e6), int(2e6)]
values = [1e-2, 1e-3, 1e-4]
learning_rate = tf.train.piecewise_constant(
global_step, boundaries, values)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train_op = tf.train.MomentumOptimizer(
learning_rate, params.momentum).minimize(
combined_cost, global_step=global_step)
# Create multiple tensors for logging purpose
metric_ops = {
'accuracy': tf.metrics.accuracy(labels=labels['pi_tensor'],
predictions=policy_output,
name='accuracy_op'),
'policy_cost': tf.metrics.mean(policy_cost),
'value_cost': tf.metrics.mean(value_cost),
'l2_cost': tf.metrics.mean(l2_cost),
'policy_entropy': tf.metrics.mean(policy_entropy),
'combined_cost': tf.metrics.mean(combined_cost),
}
for metric_name, metric_op in metric_ops.items():
tf.summary.scalar(metric_name, metric_op[1])
# Return tf.estimator.EstimatorSpec
return tf.estimator.EstimatorSpec(
mode=mode,
predictions={
'policy_output': policy_output,
'value_output': value_output,
},
loss=combined_cost,
train_op=train_op,
eval_metric_ops=metric_ops)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for dualnet and dualnet_model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import tempfile
import tensorflow as tf # pylint: disable=g-bad-import-order
import dualnet
import go
import model_params
import preprocessing
import utils_test
tf.logging.set_verbosity(tf.logging.ERROR)
class TestDualNet(utils_test.MiniGoUnitTest):
def test_train(self):
with tempfile.TemporaryDirectory() as working_dir, \
tempfile.NamedTemporaryFile() as tf_record:
preprocessing.make_dataset_from_sgf(
utils_test.BOARD_SIZE, 'example_game.sgf', tf_record.name)
dualnet.train(
working_dir, [tf_record.name], 1, model_params.DummyMiniGoParams())
def test_inference(self):
with tempfile.TemporaryDirectory() as working_dir, \
tempfile.TemporaryDirectory() as export_dir:
dualnet.bootstrap(working_dir, model_params.DummyMiniGoParams())
exported_model = os.path.join(export_dir, 'bootstrap-model')
dualnet.export_model(working_dir, exported_model)
n1 = dualnet.DualNetRunner(
exported_model, model_params.DummyMiniGoParams())
n1.run(go.Position(utils_test.BOARD_SIZE))
n2 = dualnet.DualNetRunner(
exported_model, model_params.DummyMiniGoParams())
n2.run(go.Position(utils_test.BOARD_SIZE))
if __name__ == '__main__':
tf.test.main()
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Evaluation of playing games between two neural nets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import time
import go
from gtp_wrapper import MCTSPlayer
import sgf_wrapper
def play_match(params, black_net, white_net, games, readouts,
sgf_dir, verbosity):
"""Plays matches between two neural nets.
The net that wins by a margin of 55% is declared the winner.
Args:
params: An object of hyperparameters.
black_net: Instance of the DualNetRunner class to play as black.
white_net: Instance of the DualNetRunner class to play as white.
games: Number of games to play. We play all the games at the same time.
readouts: Number of readouts to perform for each step in each game.
sgf_dir: Directory to write the sgf results.
verbosity: Verbosity to show evaluation process.
Returns:
'B' if the winner is black_net, otherwise 'W'.
"""
# For n games, we create lists of n black and n white players
black = MCTSPlayer(
params.board_size, black_net, verbosity=verbosity, two_player_mode=True,
num_parallel=params.simultaneous_leaves)
white = MCTSPlayer(
params.board_size, white_net, verbosity=verbosity, two_player_mode=True,
num_parallel=params.simultaneous_leaves)
black_name = os.path.basename(black_net.save_file)
white_name = os.path.basename(white_net.save_file)
black_win_counts = 0
white_win_counts = 0
for i in range(games):
num_move = 0 # The move number of the current game
black.initialize_game()
white.initialize_game()
while True:
start = time.time()
active = white if num_move % 2 else black
inactive = black if num_move % 2 else white
current_readouts = active.root.N
while active.root.N < current_readouts + readouts:
active.tree_search()
# print some stats on the search
if verbosity >= 3:
print(active.root.position)
# First, check the roots for hopeless games.
if active.should_resign(): # Force resign
active.set_result(-active.root.position.to_play, was_resign=True)
inactive.set_result(
active.root.position.to_play, was_resign=True)
if active.is_done():
fname = '{:d}-{:s}-vs-{:s}-{:d}.sgf'.format(
int(time.time()), white_name, black_name, i)
with open(os.path.join(sgf_dir, fname), 'w') as f:
sgfstr = sgf_wrapper.make_sgf(
params.board_size, active.position.recent, active.result_string,
black_name=black_name, white_name=white_name)
f.write(sgfstr)
print('Finished game', i, active.result_string)
if active.result_string is not None:
if active.result_string[0] == 'B':
black_win_counts += 1
elif active.result_string[0] == 'W':
white_win_counts += 1
break
move = active.pick_move()
active.play_move(move)
inactive.play_move(move)
dur = time.time() - start
num_move += 1
if (verbosity > 1) or (verbosity == 1 and num_move % 10 == 9):
timeper = (dur / readouts) * 100.0
print(active.root.position)
print('{:d}: {:d} readouts, {:.3f} s/100. ({:.2f} sec)'.format(
num_move, readouts, timeper, dur))
if (black_win_counts - white_win_counts) > params.eval_win_rate * games:
return go.BLACK_NAME
else:
return go.WHITE_NAME
(;GM[1]FF[4]CA[UTF-8]AP[CGoban:3]ST[2]
RU[Japanese]SZ[9]KM[0.00]
PW[White]PB[Black]RE[B+4.00]
;B[de]
;W[fe]
;B[ee]
;W[fd]
;B[ff]
;W[gf]
;B[gg]
;W[fg]
;B[ef]
;W[gh]
;B[hg]
;W[hh]
;B[eg]
;W[fh]
;B[ge]
;W[hf]
;B[he]
;W[ig]
;B[fc]
;W[gd]
;B[gc]
;W[hd]
;B[ed]
;W[be]
;B[hc]
;W[ie]
;B[bc]
;W[cg]
;B[cf]
;W[bf]
;B[ch]
(;W[dg]
;B[dh]
;W[bh]
;B[eh]
;W[cc]
;B[cb])
(;W[cc]
;B[cb]
(;W[bh]
;B[dh])
(;W[dg]
;B[dh]
;W[bh]
;B[eh]
;W[dc]
;B[bd]
;W[ec]
;B[cd]
;W[fb]
;B[gb]
(;W[db])
(;W[bb]
;B[eb]
;W[db]
;B[fa]
;W[ca]
;B[ea]
;W[da]
;B[df]
;W[bg]
;B[bi]
;W[ab]
;B[ah]
;W[ci]
;B[di]
;W[ag]
;B[ae]
;W[ac]
;B[ad]
;W[ha]
;B[hb]
;W[fi]
;B[ce]
;W[ai]
;B[ci]
;W[ei]
;B[ah]
;W[ic]
;B[ib]
;W[ai]
;B[ba]
;W[aa]
;B[ah]
;W[ga]
;B[ia]
;W[ai]
;B[ga]
;W[id]
;B[ah]
;W[dd]
;B[af]TW[ba][cb][ge][he][if][gg][hg][ih][gi][hi][ii]TB[ha][fb][be][bf][ag][bg][cg][dg][bh][ai]))))
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Features used by AlphaGo Zero, in approximate order of importance.
Feature # Notes
Stone History 16 The stones of each color during the last 8 moves.
Ones 1 Constant plane of 1s
All features with 8 planes are 1-hot encoded, with plane i marked with 1
only if the feature was equal to i. Any features >= 8 would be marked as 8.
This file includes the features from AlphaGo Zero (AGZ) as NEW_FEATURES.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import go
import numpy as np
def planes(num_planes):
# Decorator to specify the number of planes in a feature. For example, for a 19x19
# go board, the input stone feature will be in the shape of [19, 19, 16],
# where the third dimension is the num_planes.
def deco(f):
f.planes = num_planes
return f
return deco
@planes(16)
def stone_features(board_size, position):
"""Create the 16 planes of features for a given position.
Args:
board_size: the go board size.
position: a given go board status.
Returns:
The 16 plane features.
"""
# a bit easier to calculate it with axis 0 being the 16 board states,
# and then roll axis 0 to the end.
features = np.zeros([16, board_size, board_size], dtype=np.uint8)
num_deltas_avail = position.board_deltas.shape[0]
cumulative_deltas = np.cumsum(position.board_deltas, axis=0)
last_eight = np.tile(position.board, [8, 1, 1])
# apply deltas to compute previous board states
last_eight[1:num_deltas_avail + 1] -= cumulative_deltas
# if no more deltas are available, just repeat oldest board.
last_eight[num_deltas_avail + 1:] = last_eight[num_deltas_avail].reshape(
1, board_size, board_size)
features[::2] = last_eight == position.to_play
features[1::2] = last_eight == -position.to_play
return np.rollaxis(features, 0, 3)
@planes(1)
def color_to_play_feature(board_size, position):
if position.to_play == go.BLACK:
return np.ones([board_size, board_size, 1], dtype=np.uint8)
else:
return np.zeros([board_size, board_size, 1], dtype=np.uint8)
NEW_FEATURES = [
stone_features,
color_to_play_feature
]
NEW_FEATURES_PLANES = sum(f.planes for f in NEW_FEATURES)
def extract_features(board_size, position, features=None):
if features is None:
features = NEW_FEATURES
return np.concatenate([feature(board_size, position) for feature in features],
axis=2)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for features."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf # pylint: disable=g-bad-import-order
import features
import go
import numpy as np
import utils_test
tf.logging.set_verbosity(tf.logging.ERROR)
EMPTY_ROW = '.' * utils_test.BOARD_SIZE + '\n'
TEST_BOARD = utils_test.load_board('''
.X.....OO
X........
XXXXXXXXX
''' + EMPTY_ROW * 6)
TEST_POSITION = go.Position(
utils_test.BOARD_SIZE,
board=TEST_BOARD,
n=3,
komi=6.5,
caps=(1, 2),
ko=None,
recent=(go.PlayerMove(go.BLACK, (0, 1)),
go.PlayerMove(go.WHITE, (0, 8)),
go.PlayerMove(go.BLACK, (1, 0))),
to_play=go.BLACK,
)
TEST_BOARD2 = utils_test.load_board('''
.XOXXOO..
XO.OXOX..
XXO..X...
''' + EMPTY_ROW * 6)
TEST_POSITION2 = go.Position(
utils_test.BOARD_SIZE,
board=TEST_BOARD2,
n=0,
komi=6.5,
caps=(0, 0),
ko=None,
recent=tuple(),
to_play=go.BLACK,
)
TEST_POSITION3 = go.Position(utils_test.BOARD_SIZE)
for coord in ((0, 0), (0, 1), (0, 2), (0, 3), (1, 1)):
TEST_POSITION3.play_move(coord, mutate=True)
# resulting position should look like this:
# X.XO.....
# .X.......
# .........
class TestFeatureExtraction(utils_test.MiniGoUnitTest):
def test_stone_features(self):
f = features.stone_features(utils_test.BOARD_SIZE, TEST_POSITION3)
self.assertEqual(TEST_POSITION3.to_play, go.WHITE)
self.assertEqual(f.shape, (9, 9, 16))
self.assertEqualNPArray(f[:, :, 0], utils_test.load_board('''
...X.....
.........''' + EMPTY_ROW * 7))
self.assertEqualNPArray(f[:, :, 1], utils_test.load_board('''
X.X......
.X.......''' + EMPTY_ROW * 7))
self.assertEqualNPArray(f[:, :, 2], utils_test.load_board('''
.X.X.....
.........''' + EMPTY_ROW * 7))
self.assertEqualNPArray(f[:, :, 3], utils_test.load_board('''
X.X......
.........''' + EMPTY_ROW * 7))
self.assertEqualNPArray(f[:, :, 4], utils_test.load_board('''
.X.......
.........''' + EMPTY_ROW * 7))
self.assertEqualNPArray(f[:, :, 5], utils_test.load_board('''
X.X......
.........''' + EMPTY_ROW * 7))
for i in range(10, 16):
self.assertEqualNPArray(
f[:, :, i], np.zeros([utils_test.BOARD_SIZE, utils_test.BOARD_SIZE]))
if __name__ == '__main__':
tf.test.main()
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Describe the Go game status.
A board is a NxN numpy array.
A Coordinate is a tuple index into the board.
A Move is a (Coordinate c | None).
A PlayerMove is a (Color, Move) tuple
(0, 0) is considered to be the upper left corner of the board, and (18, 0)
is the lower left.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import namedtuple
import copy
import itertools
import coords
import numpy as np
# Represent a board as a numpy array, with 0 empty, 1 is black, -1 is white.
# This means that swapping colors is as simple as multiplying array by -1.
WHITE, EMPTY, BLACK, FILL, KO, UNKNOWN = range(-1, 5)
# Represents "group not found" in the LibertyTracker object
MISSING_GROUP_ID = -1
BLACK_NAME = 'BLACK'
WHITE_NAME = 'WHITE'
def _check_bounds(board_size, c):
return c[0] % board_size == c[0] and c[1] % board_size == c[1]
def get_neighbors_diagonals(board_size):
"""Return coordinates of neighbors and diagonals for a go board."""
all_coords = [(i, j) for i in range(board_size) for j in range(board_size)]
def check_bounds(c):
return _check_bounds(board_size, c)
neighbors = {(x, y): list(filter(check_bounds, [
(x+1, y), (x-1, y), (x, y+1), (x, y-1)])) for x, y in all_coords}
diagonals = {(x, y): list(filter(check_bounds, [
(x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1)])) for x, y in all_coords}
return neighbors, diagonals
class IllegalMove(Exception):
pass
class PlayerMove(namedtuple('PlayerMove', ['color', 'move'])):
pass
class PositionWithContext(namedtuple('SgfPosition',
['position', 'next_move', 'result'])):
pass
def place_stones(board, color, stones):
for s in stones:
board[s] = color
def replay_position(board_size, position, result):
"""Wrapper for a go.Position which replays its history."""
# Assumes an empty start position! (i.e. no handicap, and history must
# be exhaustive.)
# Result must be passed in, since a resign cannot be inferred from position
# history alone.
# for position_w_context in replay_position(position):
# print(position_w_context.position)
if position.n != len(position.recent):
raise ValueError('Position history is incomplete!')
pos = Position(board_size=board_size, komi=position.komi)
for player_move in position.recent:
color, next_move = player_move
yield PositionWithContext(pos, next_move, result)
pos = pos.play_move(next_move, color=color)
def find_reached(board_size, board, c):
"""Find the chain to reach c."""
color = board[c]
chain = set([c])
reached = set()
frontier = [c]
neighbors, _ = get_neighbors_diagonals(board_size)
while frontier:
current = frontier.pop()
chain.add(current)
for n in neighbors[current]:
if board[n] == color and n not in chain:
frontier.append(n)
elif board[n] != color:
reached.add(n)
return chain, reached
def is_koish(board_size, board, c):
"""Check if c is surrounded on all sides by 1 color, and return that color."""
if board[c] != EMPTY:
return None
full_neighbors, _ = get_neighbors_diagonals(board_size)
neighbors = {board[n] for n in full_neighbors[c]}
if len(neighbors) == 1 and EMPTY not in neighbors:
return list(neighbors)[0]
else:
return None
def is_eyeish(board_size, board, c):
"""Check if c is an eye, for the purpose of restricting MC rollouts."""
# pass is fine.
if c is None:
return
color = is_koish(board_size, board, c)
if color is None:
return None
diagonal_faults = 0
_, all_diagonals = get_neighbors_diagonals(board_size)
diagonals = all_diagonals[c]
if len(diagonals) < 4:
diagonal_faults += 1
for d in diagonals:
if board[d] not in (color, EMPTY):
diagonal_faults += 1
if diagonal_faults > 1:
return None
else:
return color
class Group(namedtuple('Group', ['id', 'stones', 'liberties', 'color'])):
"""Group class.
stones: a frozenset of Coordinates belonging to this group
liberties: a frozenset of Coordinates that are empty and adjacent to
this group.
color: color of this group
"""
def __eq__(self, other):
return (self.stones == other.stones and self.liberties == other.liberties
and self.color == other.color)
class LibertyTracker(object):
"""LibertyTracker class."""
@staticmethod
def from_board(board_size, board):
board = np.copy(board)
curr_group_id = 0
lib_tracker = LibertyTracker(board_size)
for color in (WHITE, BLACK):
while color in board:
curr_group_id += 1
found_color = np.where(board == color)
coord = found_color[0][0], found_color[1][0]
chain, reached = find_reached(board_size, board, coord)
liberties = frozenset(r for r in reached if board[r] == EMPTY)
new_group = Group(curr_group_id, frozenset(
chain), liberties, color)
lib_tracker.groups[curr_group_id] = new_group
for s in chain:
lib_tracker.group_index[s] = curr_group_id
place_stones(board, FILL, chain)
lib_tracker.max_group_id = curr_group_id
liberty_counts = np.zeros([board_size, board_size], dtype=np.uint8)
for group in lib_tracker.groups.values():
num_libs = len(group.liberties)
for s in group.stones:
liberty_counts[s] = num_libs
lib_tracker.liberty_cache = liberty_counts
return lib_tracker
def __init__(self, board_size, group_index=None, groups=None,
liberty_cache=None, max_group_id=1):
# group_index: a NxN numpy array of group_ids. -1 means no group
# groups: a dict of group_id to groups
# liberty_cache: a NxN numpy array of liberty counts
self.board_size = board_size
self.group_index = (group_index if group_index is not None else
-np.ones([board_size, board_size], dtype=np.int32))
self.groups = groups or {}
self.liberty_cache = (
liberty_cache if liberty_cache is not None
else np.zeros([board_size, board_size], dtype=np.uint8))
self.max_group_id = max_group_id
self.neighbors, _ = get_neighbors_diagonals(board_size)
def __deepcopy__(self, memodict=None):
new_group_index = np.copy(self.group_index)
new_lib_cache = np.copy(self.liberty_cache)
# shallow copy
new_groups = copy.copy(self.groups)
return LibertyTracker(
self.board_size, new_group_index, new_groups,
liberty_cache=new_lib_cache, max_group_id=self.max_group_id)
def add_stone(self, color, c):
assert self.group_index[c] == MISSING_GROUP_ID
captured_stones = set()
opponent_neighboring_group_ids = set()
friendly_neighboring_group_ids = set()
empty_neighbors = set()
for n in self.neighbors[c]:
neighbor_group_id = self.group_index[n]
if neighbor_group_id != MISSING_GROUP_ID:
neighbor_group = self.groups[neighbor_group_id]
if neighbor_group.color == color:
friendly_neighboring_group_ids.add(neighbor_group_id)
else:
opponent_neighboring_group_ids.add(neighbor_group_id)
else:
empty_neighbors.add(n)
new_group = self._create_group(color, c, empty_neighbors)
for group_id in friendly_neighboring_group_ids:
new_group = self._merge_groups(group_id, new_group.id)
# new_group becomes stale as _update_liberties and
# _handle_captures are called; must refetch with self.groups[new_group.id]
for group_id in opponent_neighboring_group_ids:
neighbor_group = self.groups[group_id]
if len(neighbor_group.liberties) == 1:
captured = self._capture_group(group_id)
captured_stones.update(captured)
else:
self._update_liberties(group_id, remove={c})
self._handle_captures(captured_stones)
# suicide is illegal
if not self.groups[new_group.id].liberties:
raise IllegalMove('Move at {} would commit suicide!\n'.format(c))
return captured_stones
def _create_group(self, color, c, liberties):
self.max_group_id += 1
new_group = Group(self.max_group_id, frozenset([c]), liberties, color)
self.groups[new_group.id] = new_group
self.group_index[c] = new_group.id
self.liberty_cache[c] = len(liberties)
return new_group
def _merge_groups(self, group1_id, group2_id):
group1 = self.groups[group1_id]
group2 = self.groups[group2_id]
self.groups[group1_id] = Group(
group1_id, group1.stones | group2.stones, group1.liberties,
group1.color)
del self.groups[group2_id]
for s in group2.stones:
self.group_index[s] = group1_id
self._update_liberties(
group1_id, add=group2.liberties, remove=group2.stones)
return group1
def _capture_group(self, group_id):
dead_group = self.groups[group_id]
del self.groups[group_id]
for s in dead_group.stones:
self.group_index[s] = MISSING_GROUP_ID
self.liberty_cache[s] = 0
return dead_group.stones
def _update_liberties(self, group_id, add=set(), remove=set()):
group = self.groups[group_id]
new_libs = (group.liberties | add) - remove
self.groups[group_id] = Group(
group_id, group.stones, new_libs, group.color)
new_lib_count = len(new_libs)
for s in self.groups[group_id].stones:
self.liberty_cache[s] = new_lib_count
def _handle_captures(self, captured_stones):
for s in captured_stones:
for n in self.neighbors[s]:
group_id = self.group_index[n]
if group_id != MISSING_GROUP_ID:
self._update_liberties(group_id, add={s})
class Position(object):
def __init__(self, board_size, board=None, n=0, komi=7.5, caps=(0, 0),
lib_tracker=None, ko=None, recent=tuple(),
board_deltas=None, to_play=BLACK):
"""Initialize position class.
Args:
board_size: the go board size.
board: a numpy array
n: an int representing moves played so far
komi: a float, representing points given to the second player.
caps: an (int, int) tuple of captures for B, W.
lib_tracker: a LibertyTracker object
ko: a Move
recent: a tuple of PlayerMoves, such that recent[-1] is the last move.
board_deltas: a np.array of shape (n, board_size, board_size) representing changes
made to the board at each move (played move and captures).
Should satisfy next_pos.board - next_pos.board_deltas[0] == pos.board
to_play: BLACK or WHITE
"""
if not isinstance(recent, tuple):
raise TypeError('Recent must be a tuple!')
self.board_size = board_size
self.board = (board if board is not None else
np.zeros([board_size, board_size], dtype=np.int8))
self.n = n
self.komi = komi
self.caps = caps
self.lib_tracker = lib_tracker or LibertyTracker.from_board(
self.board_size, self.board)
self.ko = ko
self.recent = recent
self.board_deltas = (board_deltas if board_deltas is not None else
np.zeros([0, board_size, board_size], dtype=np.int8))
self.to_play = to_play
self.last_eight = None
self.neighbors, _ = get_neighbors_diagonals(board_size)
def __deepcopy__(self, memodict=None):
new_board = np.copy(self.board)
new_lib_tracker = copy.deepcopy(self.lib_tracker)
return Position(
self.board_size, new_board, self.n, self.komi, self.caps,
new_lib_tracker, self.ko, self.recent, self.board_deltas, self.to_play)
def __str__(self):
pretty_print_map = {
WHITE: '\x1b[0;31;47mO',
EMPTY: '\x1b[0;31;43m.',
BLACK: '\x1b[0;31;40mX',
FILL: '#',
KO: '*',
}
board = np.copy(self.board)
captures = self.caps
if self.ko is not None:
place_stones(board, KO, [self.ko])
raw_board_contents = []
for i in range(self.board_size):
row = []
for j in range(self.board_size):
appended = '<' if (
self.recent and (i, j) == self.recent[-1].move) else ' '
row.append(pretty_print_map[board[i, j]] + appended)
row.append('\x1b[0m')
raw_board_contents.append(''.join(row))
row_labels = ['%2d ' % i for i in range(self.board_size, 0, -1)]
annotated_board_contents = [''.join(r) for r in zip(
row_labels, raw_board_contents, row_labels)]
header_footer_rows = [
' ' + ' '.join('ABCDEFGHJKLMNOPQRST'[:self.board_size]) + ' ']
annotated_board = '\n'.join(itertools.chain(
header_footer_rows, annotated_board_contents, header_footer_rows))
details = '\nMove: {}. Captures X: {} O: {}\n'.format(
self.n, *captures)
return annotated_board + details
def is_move_suicidal(self, move):
potential_libs = set()
for n in self.neighbors[move]:
neighbor_group_id = self.lib_tracker.group_index[n]
if neighbor_group_id == MISSING_GROUP_ID:
# at least one liberty after playing here, so not a suicide
return False
neighbor_group = self.lib_tracker.groups[neighbor_group_id]
if neighbor_group.color == self.to_play:
potential_libs |= neighbor_group.liberties
elif len(neighbor_group.liberties) == 1:
# would capture an opponent group if they only had one lib.
return False
# it's possible to suicide by connecting several friendly groups
# each of which had one liberty.
potential_libs -= set([move])
return not potential_libs
def is_move_legal(self, move):
"""Checks that a move is on an empty space, not on ko, and not suicide."""
if move is None:
return True
if self.board[move] != EMPTY:
return False
if move == self.ko:
return False
if self.is_move_suicidal(move):
return False
return True
def all_legal_moves(self):
"""Returns a np.array of size go.N**2 + 1, with 1 = legal, 0 = illegal."""
# by default, every move is legal
legal_moves = np.ones([self.board_size, self.board_size], dtype=np.int8)
# ...unless there is already a stone there
legal_moves[self.board != EMPTY] = 0
# calculate which spots have 4 stones next to them
# padding is because the edge always counts as a lost liberty.
adjacent = np.ones([self.board_size+2, self.board_size+2], dtype=np.int8)
adjacent[1:-1, 1:-1] = np.abs(self.board)
num_adjacent_stones = (adjacent[:-2, 1:-1] + adjacent[1:-1, :-2] +
adjacent[2:, 1:-1] + adjacent[1:-1, 2:])
# Surrounded spots are those that are empty and have 4 adjacent stones.
surrounded_spots = np.multiply(
(self.board == EMPTY),
(num_adjacent_stones == 4))
# Such spots are possibly illegal, unless they are capturing something.
# Iterate over and manually check each spot.
for coord in np.transpose(np.nonzero(surrounded_spots)):
if self.is_move_suicidal(tuple(coord)):
legal_moves[tuple(coord)] = 0
# ...and retaking ko is always illegal
if self.ko is not None:
legal_moves[self.ko] = 0
# and pass is always legal
return np.concatenate([legal_moves.ravel(), [1]])
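  # Worked example of the shift-and-sum above (for reference): on an empty 3x3
  # board the padded array is all ones on its border and zeros inside, so the
  # four shifted slices give num_adjacent_stones = 2 at the corners, 1 on the
  # edges and 0 in the centre; only points whose four neighbours are all stones
  # or board edges can ever reach a count of 4.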
def pass_move(self, mutate=False):
pos = self if mutate else copy.deepcopy(self)
pos.n += 1
pos.recent += (PlayerMove(pos.to_play, None),)
pos.board_deltas = np.concatenate((
np.zeros([1, self.board_size, self.board_size], dtype=np.int8),
pos.board_deltas[:6]))
pos.to_play *= -1
pos.ko = None
return pos
def flip_playerturn(self, mutate=False):
pos = self if mutate else copy.deepcopy(self)
pos.ko = None
pos.to_play *= -1
return pos
def get_liberties(self):
return self.lib_tracker.liberty_cache
def play_move(self, c, color=None, mutate=False):
"""Obeys CGOS Rules of Play.
In short:
No suicides
Chinese/area scoring
Positional superko (this is very crudely approximate at the moment.)
Args:
c: the coordinate to play at.
color: the color of the player to play.
mutate: whether to modify this Position in place instead of returning a copy.
Returns:
The Position after the move has been played.
Raises:
IllegalMove: if the input c is an illegal move.
"""
if color is None:
color = self.to_play
pos = self if mutate else copy.deepcopy(self)
if c is None:
pos = pos.pass_move(mutate=mutate)
return pos
if not self.is_move_legal(c):
raise IllegalMove('{} move at {} is illegal: \n{}'.format(
'Black' if self.to_play == BLACK else 'White',
coords.to_kgs(self.board_size, c), self))
potential_ko = is_koish(self.board_size, self.board, c)
place_stones(pos.board, color, [c])
captured_stones = pos.lib_tracker.add_stone(color, c)
place_stones(pos.board, EMPTY, captured_stones)
opp_color = -1 * color
new_board_delta = np.zeros([self.board_size, self.board_size],
dtype=np.int8)
new_board_delta[c] = color
place_stones(new_board_delta, color, captured_stones)
if len(captured_stones) == 1 and potential_ko == opp_color:
new_ko = list(captured_stones)[0]
else:
new_ko = None
if pos.to_play == BLACK:
new_caps = (pos.caps[0] + len(captured_stones), pos.caps[1])
else:
new_caps = (pos.caps[0], pos.caps[1] + len(captured_stones))
pos.n += 1
pos.caps = new_caps
pos.ko = new_ko
pos.recent += (PlayerMove(color, c),)
# keep a rolling history of last 7 deltas - that's all we'll need to
# extract the last 8 board states.
pos.board_deltas = np.concatenate((
new_board_delta.reshape(1, self.board_size, self.board_size),
pos.board_deltas[:6]))
pos.to_play *= -1
return pos
def is_game_over(self):
return (len(self.recent) >= 2
and self.recent[-1].move is None
and self.recent[-2].move is None)
def score(self):
"""Return score from B perspective. If W is winning, score is negative."""
working_board = np.copy(self.board)
while EMPTY in working_board:
unassigned_spaces = np.where(working_board == EMPTY)
c = unassigned_spaces[0][0], unassigned_spaces[1][0]
territory, borders = find_reached(self.board_size, working_board, c)
border_colors = set(working_board[b] for b in borders)
X_border = BLACK in border_colors # pylint: disable=invalid-name
O_border = WHITE in border_colors # pylint: disable=invalid-name
if X_border and not O_border:
territory_color = BLACK
elif O_border and not X_border:
territory_color = WHITE
else:
territory_color = UNKNOWN # dame, or seki
place_stones(working_board, territory_color, territory)
return np.count_nonzero(working_board == BLACK) - np.count_nonzero(
working_board == WHITE) - self.komi
def result(self):
score = self.score()
if score > 0:
return 1
elif score < 0:
return -1
else:
return 0
def result_string(self):
score = self.score()
if score > 0:
return 'B+' + '{:.1f}'.format(score)
elif score < 0:
return 'W+' + '{:.1f}'.format(abs(score))
else:
return 'DRAW'
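# Illustrative usage sketch (not part of the original module): exercising the
# Position API defined above on an empty board, using only names from this file.
def _example_position_usage(board_size=9):
  """Plays two stones, passes twice, and returns the final result string."""
  pos = Position(board_size)
  assert pos.is_move_legal((2, 2))
  pos = pos.play_move((2, 2))        # Black plays; a new Position is returned.
  pos = pos.play_move((6, 6))        # White replies.
  pos = pos.pass_move().pass_move()  # Two consecutive passes end the game.
  assert pos.is_game_over()
  return pos.result_string()         # Komi favors White in this tiny example.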
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the 'License');
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an 'AS IS' BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Extends gtp.py."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
import sys
import coords
import go
import gtp
import sgf_wrapper
def parse_message(message):
message = gtp.pre_engine(message).strip()
first, rest = (message.split(' ', 1) + [None])[:2]
if first.isdigit():
message_id = int(first)
if rest is not None:
command, arguments = (rest.split(' ', 1) + [None])[:2]
else:
command, arguments = None, None
else:
message_id = None
command, arguments = first, rest
command = command.replace('-', '_') # for kgs extensions.
return message_id, command, arguments
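# Rough worked examples of parse_message (derived from the code above, assuming
# gtp.pre_engine leaves these particular strings unchanged):
#   parse_message('1 genmove b')  -> (1, 'genmove', 'b')
#   parse_message('boardsize 19') -> (None, 'boardsize', '19')
#   parse_message('kgs-chat ...') -> the command is normalized to 'kgs_chat'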
class KgsExtensionsMixin(gtp.Engine):
def __init__(self, game_obj, name='gtp (python, kgs-chat extensions)',
version='0.1'):
super().__init__(game_obj=game_obj, name=name, version=version)
self.known_commands += ['kgs-chat']
def send(self, message):
message_id, command, arguments = parse_message(message)
if command in self.known_commands:
try:
retval = getattr(self, 'cmd_' + command)(arguments)
response = gtp.format_success(message_id, retval)
sys.stderr.flush()
return response
except ValueError as exception:
return gtp.format_error(message_id, exception.args[0])
else:
return gtp.format_error(message_id, 'unknown command: ' + command)
# Nice to implement this, as KGS sends it each move.
def cmd_time_left(self, arguments):
pass
def cmd_showboard(self, arguments):
return self._game.showboard()
def cmd_kgs_chat(self, arguments):
try:
arg_list = arguments.split()
msg_type, sender, text = arg_list[0], arg_list[1], arg_list[2:]
text = ' '.join(text)
except ValueError:
return 'Unparseable message, args: %r' % arguments
return self._game.chat(msg_type, sender, text)
class RegressionsMixin(gtp.Engine):
def cmd_loadsgf(self, arguments):
args = arguments.split()
if len(args) == 2:
file_, movenum = args
movenum = int(movenum)
print('movenum =', movenum, file=sys.stderr)
else:
file_ = args[0]
movenum = None
try:
with open(file_, 'r') as f:
contents = f.read()
except IOError:
raise ValueError('Unreadable file: ' + file_)
try:
# This is kinda bad, because replay_sgf is already calling
# 'play move' on its internal position objects, but we really
# want to advance the engine along with us rather than try to
# push in some finished Position object.
for idx, p in enumerate(sgf_wrapper.replay_sgf(contents)):
print('playing #', idx, p.next_move, file=sys.stderr)
self._game.play_move(p.next_move)
if movenum and idx == movenum:
break
except:
raise
class GoGuiMixin(gtp.Engine):
"""GTP extensions of 'analysis commands' for gogui.
We reach into the game_obj (an instance of the players in strategies.py),
and extract stuff from its root nodes, etc. These could be extracted into
methods on the Player object, but it's a little weird to do that on a Player,
which doesn't really care about GTP commands, etc. So instead, we just
violate encapsulation a bit.
"""
def __init__(self, game_obj, name='gtp (python, gogui extensions)',
version='0.1'):
super().__init__(game_obj=game_obj, name=name, version=version)
self.known_commands += ['gogui-analyze_commands']
def cmd_gogui_analyze_commands(self, arguments):
return '\n'.join(['var/Most Read Variation/nextplay',
'var/Think a spell/spin',
'pspairs/Visit Heatmap/visit_heatmap',
'pspairs/Q Heatmap/q_heatmap'])
def cmd_nextplay(self, arguments):
return self._game.root.mvp_gg()
def cmd_visit_heatmap(self, arguments):
sort_order = list(range(self._game.size * self._game.size + 1))
sort_order.sort(key=lambda i: self._game.root.child_N[i], reverse=True)
return self.heatmap(sort_order, self._game.root, 'child_N')
def cmd_q_heatmap(self, arguments):
sort_order = list(range(self._game.size * self._game.size + 1))
reverse = True if self._game.root.position.to_play is go.BLACK else False
sort_order.sort(
key=lambda i: self._game.root.child_Q[i], reverse=reverse)
return self.heatmap(sort_order, self._game.root, 'child_Q')
def heatmap(self, sort_order, node, prop):
return '\n'.join(['{!s:6} {}'.format(
coords.to_kgs(coords.from_flat(key)), node.__dict__.get(prop)[key])
for key in sort_order if node.child_N[key] > 0][:20])
def cmd_spin(self, arguments):
for _ in range(50):
for _ in range(100):
self._game.tree_search()
moves = self.cmd_nextplay(None).lower()
moves = moves.split()
colors = 'bw' if self._game.root.position.to_play is go.BLACK else 'wb'
moves_cols = ' '.join(['{} {}'.format(*z)
for z in zip(itertools.cycle(colors), moves)])
print('gogui-gfx: TEXT', '{:.3f} after {}'.format(
self._game.root.Q, self._game.root.N), file=sys.stderr, flush=True)
print('gogui-gfx: VAR', moves_cols, file=sys.stderr, flush=True)
return self.cmd_nextplay(None)
class GTPDeluxe(KgsExtensionsMixin, RegressionsMixin, GoGuiMixin):
pass
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A wrapper of gtp and gtp_extensions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import datetime
import os
import sys
import coords
from dualnet import DualNetRunner
import go
import gtp
import gtp_extensions
from strategies import MCTSPlayerMixin, CGOSPlayerMixin
def translate_gtp_colors(gtp_color):
if gtp_color == gtp.BLACK:
return go.BLACK
elif gtp_color == gtp.WHITE:
return go.WHITE
else:
return go.EMPTY
class GtpInterface(object):
def __init__(self, board_size):
self.size = 9
self.position = None
self.komi = 6.5
self.board_size = board_size
def set_size(self, n):
if n != self.board_size:
raise ValueError((
"Can't handle boardsize {}! Please check the board size.").format(n))
def set_komi(self, komi):
self.komi = komi
self.position.komi = komi
def clear(self):
if self.position and len(self.position.recent) > 1:
try:
sgf = self.to_sgf()
with open(datetime.datetime.now().strftime(
'%Y-%m-%d-%H:%M.sgf'), 'w') as f:
f.write(sgf)
except NotImplementedError:
pass
except:
print('Error saving sgf', file=sys.stderr, flush=True)
self.position = go.Position(self.board_size, komi=self.komi)
self.initialize_game(self.position)
def accomodate_out_of_turn(self, color):
if translate_gtp_colors(color) != self.position.to_play:
self.position.flip_playerturn(mutate=True)
def make_move(self, color, vertex):
c = coords.from_pygtp(self.board_size, vertex)
# let's assume this never happens for now.
# self.accomodate_out_of_turn(color)
return self.play_move(c)
def get_move(self, color):
self.accomodate_out_of_turn(color)
move = self.suggest_move(self.position)
if self.should_resign():
return gtp.RESIGN
return coords.to_pygtp(self.board_size, move)
def final_score(self):
return self.position.result_string()
def showboard(self):
print('\n\n' + str(self.position) + '\n\n', file=sys.stderr)
return True
def should_resign(self):
raise NotImplementedError
def get_score(self):
return self.position.result_string()
def suggest_move(self, position):
raise NotImplementedError
def play_move(self, c):
raise NotImplementedError
def initialize_game(self, position=None):
raise NotImplementedError
def chat(self, msg_type, sender, text):
raise NotImplementedError
def to_sgf(self):
raise NotImplementedError
class MCTSPlayer(MCTSPlayerMixin, GtpInterface):
pass
class CGOSPlayer(CGOSPlayerMixin, GtpInterface):
pass
def make_gtp_instance(board_size, read_file, readouts_per_move=100,
verbosity=1, cgos_mode=False):
n = DualNetRunner(read_file)
if cgos_mode:
instance = CGOSPlayer(board_size, n, seconds_per_move=5,
verbosity=verbosity, two_player_mode=True)
else:
instance = MCTSPlayer(board_size, n, simulations_per_move=readouts_per_move,
verbosity=verbosity, two_player_mode=True)
name = 'Somebot-' + os.path.basename(read_file)
gtp_engine = gtp_extensions.GTPDeluxe(instance, name=name)
return gtp_engine
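# Minimal sketch (an assumption, not part of the original file): driving an
# engine built by make_gtp_instance from stdin. Only send(), defined in
# gtp_extensions.KgsExtensionsMixin, is relied upon; response framing is left
# to the gtp library.
def _example_gtp_loop(board_size, read_file):
  engine = make_gtp_instance(board_size, read_file)
  for line in sys.stdin:
    line = line.strip()
    if not line:
      continue
    print(engine.send(line), flush=True)
    if line == 'quit':
      break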
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Monte Carlo Tree Search implementation.
All terminology here (Q, U, N, p_UCT) uses the same notation as in the
AlphaGo (AG) paper, and more details can be found in the paper. Here is a brief
description:
Q: the action value of a position
U: the exploration bonus that steers the search
N: the visit count of a state
p_UCT: the PUCT algorithm for action selection
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import coords
import numpy as np
# Exploration constant
c_PUCT = 1.38 # pylint: disable=invalid-name
# Dirichlet noise, as a function of board_size
def D_NOISE_ALPHA(board_size): # pylint: disable=invalid-name
return 0.03 * 361 / (board_size ** 2)
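# For reference, a quick sanity check of the formula above (not original code):
#   D_NOISE_ALPHA(19) = 0.03 * 361 / 361 = 0.03    (the AlphaGo Zero value)
#   D_NOISE_ALPHA(9)  = 0.03 * 361 / 81  ~= 0.134  (broader noise on small boards)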
class DummyNode(object):
"""A fake node of a MCTS search tree.
This node is intended to be a placeholder for the root node, which would
otherwise have no parent node. If all nodes have parents, code becomes
simpler.
"""
# pylint: disable=invalid-name
def __init__(self, board_size):
self.board_size = board_size
self.parent = None
self.child_N = collections.defaultdict(float)
self.child_W = collections.defaultdict(float)
class MCTSNode(object):
"""A node of a MCTS search tree.
A node knows how to compute the action scores of all of its children,
so that a decision can be made about which move to explore next. Upon
selecting a move, the children dictionary is updated with a new node.
position: A go.Position instance
fmove: A move (coordinate) that led to this position, as a flattened coord
(a raw number between 0 and N^2; None if this is the root)
parent: A parent MCTSNode.
"""
# pylint: disable=invalid-name
def __init__(self, board_size, position, fmove=None, parent=None):
if parent is None:
parent = DummyNode(board_size)
self.board_size = board_size
self.parent = parent
self.fmove = fmove # move that led to this position, as flattened coords
self.position = position
self.is_expanded = False
self.losses_applied = 0 # number of virtual losses on this node
# using child_() allows vectorized computation of action score.
self.illegal_moves = 1000 * (1 - self.position.all_legal_moves())
self.child_N = np.zeros([board_size * board_size + 1], dtype=np.float32)
self.child_W = np.zeros([board_size * board_size + 1], dtype=np.float32)
# save a copy of the original prior before it gets mutated by d-noise.
self.original_prior = np.zeros([board_size * board_size + 1],
dtype=np.float32)
self.child_prior = np.zeros([board_size * board_size + 1], dtype=np.float32)
self.children = {} # map of flattened moves to resulting MCTSNode
def __repr__(self):
return '<MCTSNode move={}, N={}, to_play={}>'.format(
self.position.recent[-1:], self.N, self.position.to_play)
@property
def child_action_score(self):
return (self.child_Q * self.position.to_play
+ self.child_U - self.illegal_moves)
@property
def child_Q(self):
return self.child_W / (1 + self.child_N)
@property
def child_U(self):
return (c_PUCT * math.sqrt(1 + self.N) *
self.child_prior / (1 + self.child_N))
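  # In equation form, mirroring the three properties above (AGZ notation):
  #   Q(s, a)            = W(s, a) / (1 + N(s, a))
  #   U(s, a)            = c_PUCT * sqrt(1 + N(s)) * P(s, a) / (1 + N(s, a))
  #   action_score(s, a) = Q(s, a) * to_play + U(s, a) - illegal_move_penalty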
@property
def Q(self):
return self.W / (1 + self.N)
@property
def N(self):
return self.parent.child_N[self.fmove]
@N.setter
def N(self, value):
self.parent.child_N[self.fmove] = value
@property
def W(self):
return self.parent.child_W[self.fmove]
@W.setter
def W(self, value):
self.parent.child_W[self.fmove] = value
@property
def Q_perspective(self):
"""Return value of position, from perspective of player to play."""
return self.Q * self.position.to_play
def select_leaf(self):
current = self
pass_move = self.board_size * self.board_size
while True:
current.N += 1
# if a node has never been evaluated, we have no basis to select a child.
if not current.is_expanded:
break
# HACK: if last move was a pass, always investigate double-pass first
# to avoid situations where we auto-lose by passing too early.
if (current.position.recent
and current.position.recent[-1].move is None
and current.child_N[pass_move] == 0):
current = current.maybe_add_child(pass_move)
continue
best_move = np.argmax(current.child_action_score)
current = current.maybe_add_child(best_move)
return current
def maybe_add_child(self, fcoord):
"""Add child node for fcoord if it doesn't already exist, and returns it."""
if fcoord not in self.children:
new_position = self.position.play_move(
coords.from_flat(self.board_size, fcoord))
self.children[fcoord] = MCTSNode(
self.board_size, new_position, fmove=fcoord, parent=self)
return self.children[fcoord]
def add_virtual_loss(self, up_to):
"""Propagate a virtual loss up to the root node.
Args:
up_to: The node to propagate until. (Keep track of this! You'll
need it to reverse the virtual loss later.)
"""
self.losses_applied += 1
# This is a "win" for the current node; hence a loss for its parent node
# who will be deciding whether to investigate this node again.
loss = self.position.to_play
self.W += loss
if self.parent is None or self is up_to:
return
self.parent.add_virtual_loss(up_to)
def revert_virtual_loss(self, up_to):
self.losses_applied -= 1
revert = -self.position.to_play
self.W += revert
if self.parent is None or self is up_to:
return
self.parent.revert_virtual_loss(up_to)
def revert_visits(self, up_to):
"""Revert visit increments."""
# Sometimes, repeated calls to select_leaf return the same node.
# This is rare and we're okay with the wasted computation to evaluate
# the position multiple times by the dual_net. But select_leaf has the
# side effect of incrementing visit counts. Since we want the value to
# only count once for the repeatedly selected node, we also have to
# revert the incremented visit counts.
self.N -= 1
if self.parent is None or self is up_to:
return
self.parent.revert_visits(up_to)
def incorporate_results(self, move_probabilities, value, up_to):
assert move_probabilities.shape == (self.board_size * self.board_size + 1,)
# A finished game should not be going through this code path - should
# directly call backup_value() on the result of the game.
assert not self.position.is_game_over()
if self.is_expanded:
self.revert_visits(up_to=up_to)
return
self.is_expanded = True
self.original_prior = self.child_prior = move_probabilities
# initialize child Q as current node's value, to prevent dynamics where
# if B is winning, then B will only ever explore 1 move, because the Q
# estimation will be so much larger than the 0 of the other moves.
#
# Conversely, if W is winning, then B will explore all 362 moves before
# continuing to explore the most favorable move. This is a waste of search.
#
# The value seeded here acts as a prior, and gets averaged into
# Q calculations.
self.child_W = np.ones([self.board_size * self.board_size + 1],
dtype=np.float32) * value
self.backup_value(value, up_to=up_to)
def backup_value(self, value, up_to):
"""Propagates a value estimation up to the root node.
Args:
value: the value to be propagated (1 = black wins, -1 = white wins)
up_to: the node to propagate until.
"""
self.W += value
if self.parent is None or self is up_to:
return
self.parent.backup_value(value, up_to)
def is_done(self):
# True if the last two moves were Pass or if the position is at a move
# greater than the max depth.
max_depth = (self.board_size ** 2) * 1.4 # 505 moves for 19x19, 113 for 9x9
return self.position.is_game_over() or self.position.n >= max_depth
def inject_noise(self):
dirch = np.random.dirichlet([D_NOISE_ALPHA(self.board_size)] * (
(self.board_size * self.board_size) + 1))
self.child_prior = self.child_prior * 0.75 + dirch * 0.25
def children_as_pi(self, squash=False):
"""Returns the child visit counts as a probability distribution, pi."""
# If squash is true, exponentiate the probabilities by a temperature
# slightly larger than unity to encourage diversity in early play and
# hopefully to move away from 3-3s
probs = self.child_N
if squash:
probs **= .95
return probs / np.sum(probs)
def most_visited_path(self):
node = self
output = []
while node.children:
next_kid = np.argmax(node.child_N)
node = node.children.get(next_kid)
if node is None:
output.append('GAME END')
break
output.append('{} ({}) ==> '.format(
coords.to_kgs(
self.board_size,
coords.from_flat(self.board_size, node.fmove)), node.N))
output.append('Q: {:.5f}\n'.format(node.Q))
return ''.join(output)
def mvp_gg(self):
""" Returns most visited path in go-gui VAR format e.g. 'b r3 w c17..."""
node = self
output = []
while node.children and max(node.child_N) > 1:
next_kid = np.argmax(node.child_N)
node = node.children[next_kid]
output.append('{}'.format(coords.to_kgs(
self.board_size, coords.from_flat(self.board_size, node.fmove))))
return ' '.join(output)
def describe(self):
sort_order = list(range(self.board_size * self.board_size + 1))
sort_order.sort(key=lambda i: (
self.child_N[i], self.child_action_score[i]), reverse=True)
soft_n = self.child_N / sum(self.child_N)
p_delta = soft_n - self.child_prior
p_rel = p_delta / self.child_prior
# Dump out some statistics
output = []
output.append('{q:.4f}\n'.format(q=self.Q))
output.append(self.most_visited_path())
output.append(
'''move: action Q U P P-Dir N soft-N
p-delta p-rel\n''')
output.append(
'\n'.join([
'''{!s:6}: {: .3f}, {: .3f}, {:.3f}, {:.3f}, {:.3f}, {:4d} {:.4f}
{: .5f} {: .2f}'''.format(
coords.to_kgs(self.board_size, coords.from_flat(
self.board_size, key)),
self.child_action_score[key],
self.child_Q[key],
self.child_U[key],
self.child_prior[key],
self.original_prior[key],
int(self.child_N[key]),
soft_n[key],
p_delta[key],
p_rel[key])
for key in sort_order][:15]))
return ''.join(output)
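# Illustrative sketch (an assumption about typical usage, not original code):
# one simulation through the tree above, where evaluate_fn stands in for the
# dual network and returns (move_probabilities, value) for a position.
def _example_tree_search_step(root, evaluate_fn):
  leaf = root.select_leaf()
  if leaf.is_done():
    # Terminal positions are scored directly instead of querying the network.
    value = 1 if leaf.position.score() > 0 else -1
    leaf.backup_value(value, up_to=root)
    return leaf
  leaf.add_virtual_loss(up_to=root)
  move_probs, value = evaluate_fn(leaf.position)
  leaf.revert_virtual_loss(up_to=root)
  leaf.incorporate_results(move_probs, value, up_to=root)
  return leaf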
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for mcts."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import tensorflow as tf # pylint: disable=g-bad-import-order
import coords
import go
from mcts import MCTSNode
import numpy as np
import utils_test
tf.logging.set_verbosity(tf.logging.ERROR)
ALMOST_DONE_BOARD = utils_test.load_board('''
.XO.XO.OO
X.XXOOOO.
XXXXXOOOO
XXXXXOOOO
.XXXXOOO.
XXXXXOOOO
.XXXXOOO.
XXXXXOOOO
XXXXOOOOO
''')
TEST_POSITION = go.Position(
utils_test.BOARD_SIZE,
board=ALMOST_DONE_BOARD,
n=105,
komi=2.5,
caps=(1, 4),
ko=None,
recent=(go.PlayerMove(go.BLACK, (0, 1)),
go.PlayerMove(go.WHITE, (0, 8))),
to_play=go.BLACK
)
SEND_TWO_RETURN_ONE = go.Position(
utils_test.BOARD_SIZE,
board=ALMOST_DONE_BOARD,
n=75,
komi=0.5,
caps=(0, 0),
ko=None,
recent=(
go.PlayerMove(go.BLACK, (0, 1)),
go.PlayerMove(go.WHITE, (0, 8)),
go.PlayerMove(go.BLACK, (1, 0))),
to_play=go.WHITE
)
MAX_DEPTH = (utils_test.BOARD_SIZE ** 2) * 1.4
class TestMctsNodes(utils_test.MiniGoUnitTest):
def test_action_flipping(self):
np.random.seed(1)
probs = np.array([.02] * (
utils_test.BOARD_SIZE * utils_test.BOARD_SIZE + 1))
probs += np.random.random(
[utils_test.BOARD_SIZE * utils_test.BOARD_SIZE + 1]) * 0.001
black_root = MCTSNode(
utils_test.BOARD_SIZE, go.Position(utils_test.BOARD_SIZE))
white_root = MCTSNode(utils_test.BOARD_SIZE, go.Position(
utils_test.BOARD_SIZE, to_play=go.WHITE))
black_root.select_leaf().incorporate_results(probs, 0, black_root)
white_root.select_leaf().incorporate_results(probs, 0, white_root)
# No matter who is to play, when we know nothing else, the priors
# should be respected, and the same move should be picked
black_leaf = black_root.select_leaf()
white_leaf = white_root.select_leaf()
self.assertEqual(black_leaf.fmove, white_leaf.fmove)
self.assertEqualNPArray(
black_root.child_action_score, white_root.child_action_score)
def test_select_leaf(self):
flattened = coords.to_flat(utils_test.BOARD_SIZE, coords.from_kgs(
utils_test.BOARD_SIZE, 'D9'))
probs = np.array([.02] * (
utils_test.BOARD_SIZE * utils_test.BOARD_SIZE + 1))
probs[flattened] = 0.4
root = MCTSNode(utils_test.BOARD_SIZE, SEND_TWO_RETURN_ONE)
root.select_leaf().incorporate_results(probs, 0, root)
self.assertEqual(root.position.to_play, go.WHITE)
self.assertEqual(root.select_leaf(), root.children[flattened])
def test_backup_incorporate_results(self):
probs = np.array([.02] * (
utils_test.BOARD_SIZE * utils_test.BOARD_SIZE + 1))
root = MCTSNode(utils_test.BOARD_SIZE, SEND_TWO_RETURN_ONE)
root.select_leaf().incorporate_results(probs, 0, root)
leaf = root.select_leaf()
leaf.incorporate_results(probs, -1, root) # white wins!
# Root was visited twice: first at the root, then at this child.
self.assertEqual(root.N, 2)
# Root has 0 as a prior and two visits with value 0, -1
self.assertAlmostEqual(root.Q, -1/3) # average of 0, 0, -1
# Leaf should have one visit
self.assertEqual(root.child_N[leaf.fmove], 1)
self.assertEqual(leaf.N, 1)
# And that leaf's value had its parent's Q (0) as a prior, so the Q
# should now be the average of 0, -1
self.assertAlmostEqual(root.child_Q[leaf.fmove], -0.5)
self.assertAlmostEqual(leaf.Q, -0.5)
# We're assuming that select_leaf() returns a leaf like:
# root
# \
# leaf
# \
# leaf2
# which happens in this test because root is W to play and leaf was a W win.
self.assertEqual(root.position.to_play, go.WHITE)
leaf2 = root.select_leaf()
leaf2.incorporate_results(probs, -0.2, root) # another white semi-win
self.assertEqual(root.N, 3)
# average of 0, 0, -1, -0.2
self.assertAlmostEqual(root.Q, -0.3)
self.assertEqual(leaf.N, 2)
self.assertEqual(leaf2.N, 1)
# average of 0, -1, -0.2
self.assertAlmostEqual(leaf.Q, root.child_Q[leaf.fmove])
self.assertAlmostEqual(leaf.Q, -0.4)
# average of -1, -0.2
self.assertAlmostEqual(leaf.child_Q[leaf2.fmove], -0.6)
self.assertAlmostEqual(leaf2.Q, -0.6)
def test_do_not_explore_past_finish(self):
probs = np.array([0.02] * (
utils_test.BOARD_SIZE * utils_test.BOARD_SIZE + 1), dtype=np.float32)
root = MCTSNode(utils_test.BOARD_SIZE, go.Position(utils_test.BOARD_SIZE))
root.select_leaf().incorporate_results(probs, 0, root)
first_pass = root.maybe_add_child(
coords.to_flat(utils_test.BOARD_SIZE, None))
first_pass.incorporate_results(probs, 0, root)
second_pass = first_pass.maybe_add_child(
coords.to_flat(utils_test.BOARD_SIZE, None))
with self.assertRaises(AssertionError):
second_pass.incorporate_results(probs, 0, root)
node_to_explore = second_pass.select_leaf()
# should just stop exploring at the end position.
self.assertEqual(node_to_explore, second_pass)
def test_add_child(self):
root = MCTSNode(utils_test.BOARD_SIZE, go.Position(utils_test.BOARD_SIZE))
child = root.maybe_add_child(17)
self.assertIn(17, root.children)
self.assertEqual(child.parent, root)
self.assertEqual(child.fmove, 17)
def test_add_child_idempotency(self):
root = MCTSNode(utils_test.BOARD_SIZE, go.Position(utils_test.BOARD_SIZE))
child = root.maybe_add_child(17)
current_children = copy.copy(root.children)
child2 = root.maybe_add_child(17)
self.assertEqual(child, child2)
self.assertEqual(current_children, root.children)
def test_never_select_illegal_moves(self):
probs = np.array([0.02] * (
utils_test.BOARD_SIZE * utils_test.BOARD_SIZE + 1))
# let's say the NN were to accidentally put a high weight on an illegal move
probs[1] = 0.99
root = MCTSNode(utils_test.BOARD_SIZE, SEND_TWO_RETURN_ONE)
root.incorporate_results(probs, 0, root)
# and let's say the root were visited a lot of times, which pumps up the
# action score for unvisited moves...
root.N = 100000
root.child_N[root.position.all_legal_moves()] = 10000
# this should not throw an error...
leaf = root.select_leaf()
# the returned leaf should not be the illegal move
self.assertNotEqual(leaf.fmove, 1)
# and even after injecting noise, we should still not select an illegal move
for _ in range(10):
root.inject_noise()
leaf = root.select_leaf()
self.assertNotEqual(leaf.fmove, 1)
def test_dont_pick_unexpanded_child(self):
probs = np.array([0.001] * (
utils_test.BOARD_SIZE * utils_test.BOARD_SIZE + 1))
# make one move really likely so that tree search goes down that path twice
# even with a virtual loss
probs[17] = 0.999
root = MCTSNode(utils_test.BOARD_SIZE, go.Position(utils_test.BOARD_SIZE))
root.incorporate_results(probs, 0, root)
leaf1 = root.select_leaf()
self.assertEqual(leaf1.fmove, 17)
leaf1.add_virtual_loss(up_to=root)
# the second select_leaf pick should return the same thing, since the child
# hasn't yet been sent to neural net for eval + result incorporation
leaf2 = root.select_leaf()
self.assertIs(leaf1, leaf2)
if __name__ == '__main__':
tf.test.main()
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Train MiniGo with several iterations of RL learning.
One iteration of RL learning consists of bootstrap, selfplay, gather and train:
bootstrap: Initialize a random model
selfplay: Play games with the latest model to produce data used for training
gather: Group games played with the same model into larger files of tfexamples
train: Train a new model with the selfplay results from the most recent
N generations.
After training, validation can be performed on the holdout data.
Given two models, evaluation can be applied to choose a stronger model.
The training pipeline consists of multiple RL learning iterations to achieve
better models.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import random
import socket
import sys
import time
import tensorflow as tf # pylint: disable=g-bad-import-order
import dualnet
import evaluation
import go
import model_params
import preprocessing
import selfplay_mcts
import utils
_TF_RECORD_SUFFIX = '.tfrecord.zz'
def _ensure_dir_exists(directory):
"""Check if directory exists. If not, create it.
Args:
directory: A given directory
"""
if not os.path.isdir(directory):
tf.gfile.MakeDirs(directory)
def bootstrap(estimator_model_dir, trained_models_dir, params):
"""Initialize the model with random weights.
Args:
estimator_model_dir: tf.estimator model directory.
trained_models_dir: Dir to save the trained models. Here to export the first
bootstrapped generation.
params: A MiniGoParams instance of hyperparameters for the model.
"""
bootstrap_name = utils.generate_model_name(0)
_ensure_dir_exists(trained_models_dir)
bootstrap_model_path = os.path.join(trained_models_dir, bootstrap_name)
_ensure_dir_exists(estimator_model_dir)
print('Bootstrapping with working dir {}\n Model 0 exported to {}'.format(
estimator_model_dir, bootstrap_model_path))
dualnet.bootstrap(estimator_model_dir, params)
dualnet.export_model(estimator_model_dir, bootstrap_model_path)
def selfplay(selfplay_dirs, selfplay_model, params):
"""Perform selfplay with a specific model.
Args:
selfplay_dirs: A dict to specify the directories used in selfplay.
selfplay_dirs = {
'output_dir': output_dir,
'holdout_dir': holdout_dir,
'clean_sgf': clean_sgf,
'full_sgf': full_sgf
}
selfplay_model: The actual Dualnet runner for selfplay.
params: A MiniGoParams instance of hyperparameters for the model.
"""
with utils.logged_timer('Playing game'):
player = selfplay_mcts.play(
params.board_size, selfplay_model, params.selfplay_readouts,
params.selfplay_resign_threshold, params.simultaneous_leaves,
params.selfplay_verbose)
output_name = '{}-{}'.format(int(time.time()), socket.gethostname())
def _write_sgf_data(dir_sgf, use_comments):
with tf.gfile.GFile(
os.path.join(dir_sgf, '{}.sgf'.format(output_name)), 'w') as f:
f.write(player.to_sgf(use_comments=use_comments))
_write_sgf_data(selfplay_dirs['clean_sgf'], use_comments=False)
_write_sgf_data(selfplay_dirs['full_sgf'], use_comments=True)
game_data = player.extract_data()
tf_examples = preprocessing.make_dataset_from_selfplay(game_data, params)
# Hold out 5% of games for evaluation.
if random.random() < params.holdout_pct:
fname = os.path.join(
selfplay_dirs['holdout_dir'], output_name + _TF_RECORD_SUFFIX)
else:
fname = os.path.join(
selfplay_dirs['output_dir'], output_name + _TF_RECORD_SUFFIX)
preprocessing.write_tf_examples(fname, tf_examples)
def gather(selfplay_dir, training_chunk_dir, params):
"""Gather selfplay data into large training chunk.
Args:
selfplay_dir: Where to look for games. Set as 'base_dir/data/selfplay/'.
training_chunk_dir: where to put collected games. Set as
'base_dir/data/training_chunks/'.
params: A MiniGoParams instance of hyperparameters for the model.
"""
# Check the selfplay data from the most recent 50 models.
_ensure_dir_exists(training_chunk_dir)
sorted_model_dirs = sorted(tf.gfile.ListDirectory(selfplay_dir))
models = [model_dir.strip('/')
for model_dir in sorted_model_dirs[-params.gather_generation:]]
with utils.logged_timer('Finding existing tfrecords...'):
model_gamedata = {
model: tf.gfile.Glob(
os.path.join(selfplay_dir, model, '*'+_TF_RECORD_SUFFIX))
for model in models
}
print('Found {} models'.format(len(models)))
for model_name, record_files in sorted(model_gamedata.items()):
print(' {}: {} files'.format(model_name, len(record_files)))
meta_file = os.path.join(training_chunk_dir, 'meta.txt')
try:
with tf.gfile.GFile(meta_file, 'r') as f:
already_processed = set(f.read().split())
except tf.errors.NotFoundError:
already_processed = set()
num_already_processed = len(already_processed)
for model_name, record_files in sorted(model_gamedata.items()):
if set(record_files) <= already_processed:
continue
print('Gathering files from {}:'.format(model_name))
tf_examples = preprocessing.shuffle_tf_examples(
params.shuffle_buffer_size, params.examples_per_chunk, record_files)
# Write each shuffled batch of examples out as its own training chunk file.
for i, example_batch in enumerate(tf_examples):
output_record = os.path.join(
training_chunk_dir,
('{}-{}'+_TF_RECORD_SUFFIX).format(model_name, str(i)))
preprocessing.write_tf_examples(
output_record, example_batch, serialize=False)
already_processed.update(record_files)
print('Processed {} new files'.format(
len(already_processed) - num_already_processed))
with tf.gfile.GFile(meta_file, 'w') as f:
f.write('\n'.join(sorted(already_processed)))
def train(trained_models_dir, estimator_model_dir, training_chunk_dir,
generation, params):
"""Train the latest model from gathered data.
Args:
trained_models_dir: Where to export the completed generation.
estimator_model_dir: tf.estimator model directory.
training_chunk_dir: Directory where gathered training chunks are.
generation: Which generation you are training.
params: A MiniGoParams instance of hyperparameters for the model.
"""
new_model_name = utils.generate_model_name(generation)
print('New model will be {}'.format(new_model_name))
new_model = os.path.join(trained_models_dir, new_model_name)
print('Training on gathered game data...')
tf_records = sorted(
tf.gfile.Glob(os.path.join(training_chunk_dir, '*'+_TF_RECORD_SUFFIX)))
tf_records = tf_records[
-(params.train_window_size // params.examples_per_chunk):]
print('Training from: {} to {}'.format(tf_records[0], tf_records[-1]))
with utils.logged_timer('Training'):
dualnet.train(estimator_model_dir, tf_records, generation, params)
dualnet.export_model(estimator_model_dir, new_model)
def validate(trained_models_dir, holdout_dir, estimator_model_dir, params):
"""Validate the latest model on the holdout dataset.
Args:
trained_models_dir: Directories where the completed generations/models are.
holdout_dir: Directories where holdout data are.
estimator_model_dir: tf.estimator model directory.
params: A MiniGoParams instance of hyperparameters for the model.
"""
model_num, _ = utils.get_latest_model(trained_models_dir)
# Get the holdout game data
nums_names = utils.get_models(trained_models_dir)
# Model N was trained on games up through model N-1, so the validation set
# should only be for models through N-1 as well, thus the (model_num) term.
models = [num_name for num_name in nums_names if num_name[0] < model_num]
# pair is a tuple of (model_num, model_name), like (13, 000013-modelname)
holdout_dirs = [os.path.join(holdout_dir, pair[1])
for pair in models[-params.holdout_generation:]]
tf_records = []
with utils.logged_timer('Building lists of holdout files'):
for record_dir in holdout_dirs:
if os.path.exists(record_dir): # make sure holdout dir exists
tf_records.extend(
tf.gfile.Glob(os.path.join(record_dir, '*'+_TF_RECORD_SUFFIX)))
if not tf_records:
print('No holdout dataset for validation! '
'Please check your holdout directory: {}'.format(holdout_dir))
return
print('The length of tf_records is {}.'.format(len(tf_records)))
first_tf_record = os.path.basename(tf_records[0])
last_tf_record = os.path.basename(tf_records[-1])
with utils.logged_timer('Validating from {} to {}'.format(
first_tf_record, last_tf_record)):
dualnet.validate(estimator_model_dir, tf_records, params)
def evaluate(black_model_name, black_net, white_model_name, white_net,
evaluate_dir, params):
"""Evaluate with two models.
Two DualNetRunners play as black and white over a series of Go games; the
model that wins at least 55% of the games is declared the winner.
Args:
black_model_name: The name of the model playing black.
black_net: The DualNetRunner model for black
white_model_name: The name of the model playing white.
white_net: The DualNetRunner model for white.
evaluate_dir: Where to write the evaluation results. Set as
'base_dir/sgf/evaluate/'.
params: A MiniGoParams instance of hyperparameters for the model.
Returns:
The model name of the winner.
Raises:
ValueError: if neither `WHITE` nor `BLACK` is returned.
"""
with utils.logged_timer('{} games'.format(params.eval_games)):
winner = evaluation.play_match(
params, black_net, white_net, params.eval_games,
params.eval_readouts, evaluate_dir, params.eval_verbose)
if winner != go.WHITE_NAME and winner != go.BLACK_NAME:
raise ValueError('Winner should be either White or Black!')
return black_model_name if winner == go.BLACK_NAME else white_model_name
def _set_params(flags):
"""Set hyperparameters from board size.
Args:
flags: Flags from Argparser.
Returns:
A MiniGoParams instance of hyperparameters.
"""
params = model_params.MiniGoParams()
k = utils.round_power_of_two(flags.board_size ** 2 / 3)
params.num_filters = k # Number of filters in the convolution layer
params.fc_width = 2 * k # Width of each fully connected layer
params.num_shared_layers = flags.board_size # Number of shared trunk layers
params.board_size = flags.board_size # Board size
# How many positions can fit on a graphics card. 256 for 9s, 16 or 32 for 19s.
if flags.batch_size is None:
if flags.board_size == 9:
params.batch_size = 256
else:
params.batch_size = 32
else:
params.batch_size = flags.batch_size
return params
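# Worked example of the sizing rule above (derived from the code, assuming
# utils.round_power_of_two rounds to the nearest power of two):
#   board_size = 9 : 81 / 3 = 27    -> 32 filters,  fc_width = 64
#   board_size = 19: 361 / 3 ~= 120 -> 128 filters, fc_width = 256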
def _prepare_selfplay(
model_name, trained_models_dir, selfplay_dir, holdout_dir, sgf_dir, params):
"""Set directories and load the network for selfplay.
Args:
model_name: The name of the model for self-play
trained_models_dir: Directories where the completed generations/models are.
selfplay_dir: Where to write the games. Set as 'base_dir/data/selfplay/'.
holdout_dir: Where to write the holdout data. Set as
'base_dir/data/holdout/'.
sgf_dir: Where to write the sgf (Smart Game Format) files. Set as
'base_dir/sgf/'.
params: A MiniGoParams instance of hyperparameters for the model.
Returns:
The directories and network model for selfplay.
"""
# Set paths for the model with 'model_name'
model_path = os.path.join(trained_models_dir, model_name)
output_dir = os.path.join(selfplay_dir, model_name)
holdout_dir = os.path.join(holdout_dir, model_name)
# clean_sgf is to write sgf file without comments.
# full_sgf is to write sgf file with comments.
clean_sgf = os.path.join(sgf_dir, model_name, 'clean')
full_sgf = os.path.join(sgf_dir, model_name, 'full')
_ensure_dir_exists(output_dir)
_ensure_dir_exists(holdout_dir)
_ensure_dir_exists(clean_sgf)
_ensure_dir_exists(full_sgf)
selfplay_dirs = {
'output_dir': output_dir,
'holdout_dir': holdout_dir,
'clean_sgf': clean_sgf,
'full_sgf': full_sgf
}
# cache the network model for self-play
with utils.logged_timer('Loading weights from {} ... '.format(model_path)):
network = dualnet.DualNetRunner(model_path, params)
return selfplay_dirs, network
def run_selfplay(selfplay_model, selfplay_games, dirs, params):
"""Run selfplay to generate training data.
Args:
selfplay_model: The model name for selfplay.
selfplay_games: The number of selfplay games.
dirs: A MiniGoDirectory instance of directories used in each step.
params: A MiniGoParams instance of hyperparameters for the model.
"""
selfplay_dirs, network = _prepare_selfplay(
selfplay_model, dirs.trained_models_dir, dirs.selfplay_dir,
dirs.holdout_dir, dirs.sgf_dir, params)
print('Self-play with model: {}'.format(selfplay_model))
for _ in range(selfplay_games):
selfplay(selfplay_dirs, network, params)
def main(_):
"""Run the reinforcement learning loop."""
tf.logging.set_verbosity(tf.logging.INFO)
params = _set_params(FLAGS)
# A dummy model for debug/testing purpose with fewer games and iterations
if FLAGS.test:
params = model_params.DummyMiniGoParams()
base_dir = FLAGS.base_dir + str(FLAGS.board_size) + '_size_dummy/'
else:
# Set directories for models and datasets
base_dir = FLAGS.base_dir + str(FLAGS.board_size) + '_size/'
dirs = utils.MiniGoDirectory(base_dir)
# Run selfplay only if user specifies the argument.
if FLAGS.selfplay:
selfplay_model_name = FLAGS.selfplay_model_name or utils.get_latest_model(
dirs.trained_models_dir)[1]
max_games = FLAGS.selfplay_max_games or params.max_games_per_generation
run_selfplay(selfplay_model_name, max_games, dirs, params)
return
# Run the RL pipeline
# if no models have been trained, start from bootstrap model
if not os.path.isdir(dirs.trained_models_dir):
print('No trained model exists! Starting from Bootstrap...')
print('Creating random initial weights...')
bootstrap(dirs.estimator_model_dir, dirs.trained_models_dir, params)
else:
print('A MiniGo base directory has been found! ')
print('Start from the last checkpoint...')
_, best_model_so_far = utils.get_latest_model(dirs.trained_models_dir)
for rl_iter in range(params.max_iters_per_pipeline):
print('RL_iteration: {}'.format(rl_iter))
# Self-play with the best model to generate training data
run_selfplay(
best_model_so_far, params.max_games_per_generation, dirs, params)
# gather selfplay data for training
print('Gathering game output...')
gather(dirs.selfplay_dir, dirs.training_chunk_dir, params)
# train the next generation model
model_num, _ = utils.get_latest_model(dirs.trained_models_dir)
print('Training on gathered game data...')
train(dirs.trained_models_dir, dirs.estimator_model_dir,
dirs.training_chunk_dir, model_num + 1, params)
# validate the latest model if needed
if FLAGS.validation:
print('Validating on the holdout game data...')
validate(dirs.trained_models_dir, dirs.holdout_dir,
dirs.estimator_model_dir, params)
_, current_model = utils.get_latest_model(dirs.trained_models_dir)
if FLAGS.evaluation: # Perform evaluation if needed
print('Evaluate models between {} and {}'.format(
best_model_so_far, current_model))
black_model = os.path.join(dirs.trained_models_dir, best_model_so_far)
white_model = os.path.join(dirs.trained_models_dir, current_model)
_ensure_dir_exists(dirs.evaluate_dir)
with utils.logged_timer('Loading weights'):
black_net = dualnet.DualNetRunner(black_model, params)
white_net = dualnet.DualNetRunner(white_model, params)
best_model_so_far = evaluate(
best_model_so_far, black_net, current_model, white_net,
dirs.evaluate_dir, params)
print('Winner of evaluation: {}!'.format(best_model_so_far))
else:
best_model_so_far = current_model
if __name__ == '__main__':
parser = argparse.ArgumentParser()
# flags to run the RL pipeline
parser.add_argument(
'--base_dir',
type=str,
default='/tmp/minigo/',
metavar='BD',
help='Base directory for the MiniGo models and datasets.')
parser.add_argument(
'--board_size',
type=int,
default=9,
metavar='N',
choices=[9, 19],
help='Go board size. The default size is 9.')
parser.add_argument(
'--batch_size',
type=int,
default=None,
metavar='BS',
help='Batch size for training. If None, a default based on board size is used.')
# Test the pipeline with a dummy model
parser.add_argument(
'--test',
action='store_true',
help='A boolean to test RL pipeline with a dummy model.')
# Run RL pipeline with the validation step
parser.add_argument(
'--validation',
action='store_true',
help='A boolean to specify validation in the RL pipeline.')
# Run RL pipeline with the evaluation step
parser.add_argument(
'--evaluation',
action='store_true',
help='A boolean to specify evaluation in the RL pipeline.')
# self-play only
parser.add_argument(
'--selfplay',
action='store_true',
help='A boolean to run self-play only.')
parser.add_argument(
'--selfplay_model_name',
type=str,
default=None,
metavar='SM',
help='The model used for self-play only.')
parser.add_argument(
'--selfplay_max_games',
type=int,
default=None,
metavar='SMG',
help='The number of games to generate when running self-play only.')
FLAGS, unparsed = parser.parse_known_args()
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Defines MiniGo parameters."""
class MiniGoParams(object):
"""Parameters for MiniGo."""
# Go board size
board_size = 9
# RL pipeline
max_games_per_generation = 10 # Number of games per selfplay generation
max_iters_per_pipeline = 2 # Number of RL iterations in one pipeline
# The shuffle buffer size determines how far an example could end up from
# where it started; this and the interleave parameters in preprocessing can
# give us an approximation of uniform sampling. This default is used in
# training, but smaller numbers can be used for aggregation or validation.
shuffle_buffer_size = 2000000 # shuffle buffer size in preprocessing
# dual_net
# How many positions to look at per generation.
# Per AlphaGo Zero (AGZ), 2048 minibatch * 1k = 2M positions/generation
examples_per_generation = 2000000
# for learning rate
l2_strength = 1e-4 # Regularization strength
momentum = 0.9 # Momentum used in SGD
kernel_size = [3, 3] # kernel size of conv and res blocks is from AGZ paper
# selfplay
selfplay_readouts = 100 # How many simulations to run per move
selfplay_verbose = 1 # >=2 will print debug info, >=3 will print boards
# an absolute value of threshold to resign at
selfplay_resign_threshold = 0.95
# the number of simultaneous leaves in MCTS
simultaneous_leaves = 8
# holdout data for validation
holdout_pct = 0.05 # How many games to hold out for validation
holdout_generation = 50 # How many recent generations/models for holdout data
# gather
gather_generation = 50 # How many recent generations/models for gathered data
# How many positions we should aggregate per 'chunk'.
examples_per_chunk = 10000
# How many positions to draw from for our training window.
# AGZ used the most recent 500k games, which, assuming 250 moves/game = 125M
train_window_size = 125000000
# evaluation with two models
eval_games = 50 # The number of games to play in evaluation
eval_readouts = 100 # How many readouts to make per move in evaluation
eval_verbose = 1 # How verbose the players should be in evaluation
eval_win_rate = 0.55 # The winner must achieve a win rate of at least 55%.
class DummyMiniGoParams(MiniGoParams):
"""Parameters for a dummy model."""
num_filters = 8 # Number of filters in the convolution layer
fc_width = 16 # Width of each fully connected layer
num_shared_layers = 1 # Number of shared trunk layers
batch_size = 16
examples_per_generation = 64
max_games_per_generation = 2
max_iters_per_pipeline = 1
selfplay_readouts = 10
shuffle_buffer_size = 1000
# evaluation
eval_games = 10 # The number of games to play in evaluation
eval_readouts = 10 # How many readouts to make per move in evaluation
eval_verbose = 1 # How verbose the players should be in evaluation
class DummyValidationParams(DummyMiniGoParams, MiniGoParams):
"""Parameters for a dummy model."""
holdout_pct = 1 # Set holdout percent as 1 for validation testing purpose
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Preprocessing step to create, read, write tf.Examples."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import random
import tensorflow as tf # pylint: disable=g-bad-import-order
import coords
import features as features_lib
import numpy as np
import sgf_wrapper
TF_RECORD_CONFIG = tf.python_io.TFRecordOptions(
tf.python_io.TFRecordCompressionType.ZLIB)
# Constructing tf.Examples
def _one_hot(board_size, index):
onehot = np.zeros([board_size * board_size + 1], dtype=np.float32)
onehot[index] = 1
return onehot
def make_tf_example(features, pi, value):
"""Make tf examples.
Args:
features: [N, N, FEATURE_DIM] nparray of uint8
pi: [N * N + 1] nparray of float32
value: float
Returns:
tf example.
"""
return tf.train.Example(
features=tf.train.Features(
feature={
'x': tf.train.Feature(
bytes_list=tf.train.BytesList(value=[features.tostring()])),
'pi': tf.train.Feature(
bytes_list=tf.train.BytesList(value=[pi.tostring()])),
'outcome': tf.train.Feature(
float_list=tf.train.FloatList(value=[value]))
}))
def write_tf_examples(filename, tf_examples, serialize=True):
"""Write tf.Example to files.
Args:
filename: Where to write tf.records
tf_examples: An iterable of tf.Example
serialize: whether to serialize the examples.
"""
with tf.python_io.TFRecordWriter(
filename, options=TF_RECORD_CONFIG) as writer:
for ex in tf_examples:
if serialize:
writer.write(ex.SerializeToString())
else:
writer.write(ex)
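# Illustrative sketch (not part of the original module): building and writing a
# single tf.Example with the helpers above. Shapes assume a 9x9 board with
# features_lib.NEW_FEATURES_PLANES input planes; the output filename is
# hypothetical.
#
#   board_size = 9
#   features = np.zeros(
#       [board_size, board_size, features_lib.NEW_FEATURES_PLANES],
#       dtype=np.uint8)
#   pi = _one_hot(board_size, 0)  # all policy mass on the first intersection
#   example = make_tf_example(features, pi, value=1.0)  # outcome for this position
#   write_tf_examples('example.tfrecord.zz', [example])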
# Read tf.Example from files
def _batch_parse_tf_example(board_size, batch_size, example_batch):
"""Parse tf examples.
Args:
board_size: the go board size
batch_size: the batch size
example_batch: a batch of tf.Example
Returns:
A tuple (feature_tensor, dict of output tensors)
"""
features = {
'x': tf.FixedLenFeature([], tf.string),
'pi': tf.FixedLenFeature([], tf.string),
'outcome': tf.FixedLenFeature([], tf.float32),
}
parsed = tf.parse_example(example_batch, features)
x = tf.decode_raw(parsed['x'], tf.uint8)
x = tf.cast(x, tf.float32)
x = tf.reshape(x, [batch_size, board_size, board_size,
features_lib.NEW_FEATURES_PLANES])
pi = tf.decode_raw(parsed['pi'], tf.float32)
pi = tf.reshape(pi, [batch_size, board_size * board_size + 1])
outcome = parsed['outcome']
outcome.set_shape([batch_size])
return (x, {'pi_tensor': pi, 'value_tensor': outcome})
def read_tf_records(
shuffle_buffer_size, batch_size, tf_records, num_repeats=None,
shuffle_records=True, shuffle_examples=True, filter_amount=1.0):
"""Read tf records.
Args:
shuffle_buffer_size: how big of a buffer to fill before shuffling
batch_size: batch size to return
tf_records: a list of tf_record filenames
num_repeats: how many times the data should be read (default: infinite)
shuffle_records: whether to shuffle the order of files read
shuffle_examples: whether to shuffle the tf.Examples
filter_amount: what fraction of records to keep
Returns:
a tf dataset of batched tensors
"""
if shuffle_records:
random.shuffle(tf_records)
record_list = tf.data.Dataset.from_tensor_slices(tf_records)
# compression_type here must agree with write_tf_examples
# cycle_length = how many tfrecord files are read in parallel
# block_length = how many tf.Examples are read from each file before
# moving to the next file
# The idea is to shuffle both the order of the files being read,
# and the examples being read from the files.
dataset = record_list.interleave(
lambda x: tf.data.TFRecordDataset(x, compression_type='ZLIB'),
cycle_length=64, block_length=16)
dataset = dataset.filter(
lambda x: tf.less(tf.random_uniform([1]), filter_amount)[0])
# TODO(amj): apply py_func for transforms here.
if num_repeats is not None:
dataset = dataset.repeat(num_repeats)
else:
dataset = dataset.repeat()
if shuffle_examples:
dataset = dataset.shuffle(buffer_size=shuffle_buffer_size)
dataset = dataset.batch(batch_size)
return dataset
def get_input_tensors(params, batch_size, tf_records, num_repeats=None,
shuffle_records=True, shuffle_examples=True,
filter_amount=0.05):
"""Read tf.Records and prepare them for ingestion by dualnet.
Args:
params: An object of hyperparameters
batch_size: batch size to return
tf_records: a list of tf_record filenames
num_repeats: how many times the data should be read (default: infinite)
shuffle_records: whether to shuffle the order of files read
shuffle_examples: whether to shuffle the tf.Examples
filter_amount: what fraction of records to keep
Returns:
A tuple (feature tensor, dict of output tensors) from a one-shot iterator (see _batch_parse_tf_example)
"""
shuffle_buffer_size = params.shuffle_buffer_size
dataset = read_tf_records(
shuffle_buffer_size, batch_size, tf_records, num_repeats=num_repeats,
shuffle_records=shuffle_records, shuffle_examples=shuffle_examples,
filter_amount=filter_amount)
dataset = dataset.filter(lambda t: tf.equal(tf.shape(t)[0], batch_size))
def batch_parse_tf_example(batch_size, dataset):
return _batch_parse_tf_example(params.board_size, batch_size, dataset)
dataset = dataset.map(functools.partial(
batch_parse_tf_example, batch_size))
return dataset.make_one_shot_iterator().get_next()
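# Illustrative sketch (not part of the original module): pulling one parsed
# training batch. `params` is assumed to expose board_size and
# shuffle_buffer_size (e.g. a MiniGoParams instance); the tf.records filename
# is hypothetical.
#
#   features_tensor, labels = get_input_tensors(
#       params, batch_size=16, tf_records=['train-000.tfrecord.zz'],
#       num_repeats=1, filter_amount=1.0)
#   with tf.Session() as sess:
#     x, y = sess.run([features_tensor, labels])
#   # x: [16, board_size, board_size, NEW_FEATURES_PLANES] float32
#   # y: {'pi_tensor': [16, board_size**2 + 1], 'value_tensor': [16]}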
# End-to-end utility functions
def make_dataset_from_selfplay(data_extracts, params):
"""Make an iterable of tf.Examples.
Args:
data_extracts: An iterable of (position, pi, result) tuples
params: An object of hyperparameters
Returns:
An iterable of tf.Examples.
"""
board_size = params.board_size
tf_examples = (make_tf_example(features_lib.extract_features(
board_size, pos), pi, result) for pos, pi, result in data_extracts)
return tf_examples
def make_dataset_from_sgf(board_size, sgf_filename, tf_record):
"""Replay an SGF file and write one tf.Example per position to tf_record."""
pwcs = sgf_wrapper.replay_sgf_file(board_size, sgf_filename)
def make_tf_example_from_pwc(pwcs):
return _make_tf_example_from_pwc(board_size, pwcs)
tf_examples = map(make_tf_example_from_pwc, pwcs)
write_tf_examples(tf_record, tf_examples)
def _make_tf_example_from_pwc(board_size, position_w_context):
features = features_lib.extract_features(
board_size, position_w_context.position)
pi = _one_hot(board_size, coords.to_flat(
board_size, position_w_context.next_move))
value = position_w_context.result
return make_tf_example(features, pi, value)
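# Illustrative sketch (not part of the original module): converting one SGF game
# record into a compressed tf.records file. Each position in the game becomes a
# tf.Example whose policy target is a one-hot vector for the move actually
# played. The filenames are hypothetical.
#
#   make_dataset_from_sgf(9, 'some_game.sgf', 'some_game.tfrecord.zz')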
def shuffle_tf_examples(shuffle_buffer_size, gather_size, records_to_shuffle):
"""Read through tf.Record and yield shuffled, but unparsed tf.Examples.
Args:
shuffle_buffer_size: the size for shuffle buffer
gather_size: The number of tf.Examples to be gathered together
records_to_shuffle: A list of filenames
Yields:
An iterator yielding lists of bytes, which are serialized tf.Examples.
"""
dataset = read_tf_records(shuffle_buffer_size, gather_size,
records_to_shuffle, num_repeats=1)
batch = dataset.make_one_shot_iterator().get_next()
sess = tf.Session()
while True:
try:
result = sess.run(batch)
yield list(result)
except tf.errors.OutOfRangeError:
break
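# Illustrative sketch (not part of the original module): gathering shuffled,
# still-serialized examples into fixed-size chunks, roughly as the gather step
# might do with examples_per_chunk from the params file. Filenames are
# hypothetical.
#
#   record_files = ['selfplay-000.tfrecord.zz', 'selfplay-001.tfrecord.zz']
#   for i, chunk in enumerate(shuffle_tf_examples(1000, 10000, record_files)):
#     # each chunk is a list of serialized tf.Example byte strings
#     write_tf_examples('chunk-%03d.tfrecord.zz' % i, chunk, serialize=False)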