"...test_cli/git@developer.sourcefind.cn:wangsen/mineru.git" did not exist on "1b71bb9309de2857bc152b94138a2296c7df0e68"
Commit 6f1e3b38 authored by Brian Lee, committed by Hongkun Yu

Remove unmaintained fork of Minigo code (#7605)

The reference implementation can be found at
https://github.com/tensorflow/minigo

This fork was originally created to experiment with performance upgrades
for MLPerf, but since MLPerf work is focused in the original repo,
this fork's existence only serves to confuse.
parent 497989e0
# MiniGo
This is a simplified implementation of MiniGo based on the code provided by the authors: [MiniGo](https://github.com/tensorflow/minigo).
MiniGo is a minimalist Go engine modeled after AlphaGo Zero, ["Mastering the Game of Go without Human
Knowledge"](https://www.nature.com/articles/nature24270). An useful one-diagram overview of Alphago Zero can be found in the [cheat sheet](https://medium.com/applied-data-science/alphago-zero-explained-in-one-diagram-365f5abf67e0).
The implementation of MiniGo consists of three main components: the DualNet model, the Monte Carlo Tree Search (MCTS), and Go domain knowledge. Currently, the **DualNet model** is our focus.
## DualNet Architecture
DualNet is the neural network used in MiniGo. It's based on residual blocks with two output heads. The following is a brief overview of the DualNet architecture.
### Input Features
The input to the neural network is a [board_size * board_size * 17] image stack
comprising 17 binary feature planes: 8 feature planes indicate the presence of the
current player's stones, a further 8 planes represent the corresponding features
for the opponent's stones, and the final plane represents the color to play, with
a constant value of 1 if black is to play or 0 if white is to play. Check
[features.py](features.py) for more details.
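As a rough illustration (a sketch only; the real planes are built by [features.py](features.py)), the input stack for a single position is just a `[board_size, board_size, 17]` array:
```
import numpy as np

board_size = 9
# 16 binary planes of stone history (8 for the current player, 8 for the
# opponent), plus 1 plane for the color to play.
input_stack = np.zeros([board_size, board_size, 17], dtype=np.uint8)
# If black is to play, the final plane is all ones; if white, all zeros.
input_stack[:, :, 16] = 1
print(input_stack.shape)  # (9, 9, 17)
```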
### Neural Network Structure
In the MiniGo implementation, the input features are processed by a residual tower
that consists of a single convolutional block followed by either 9 or 19
residual blocks.
The convolutional block applies the following modules:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
Each residual block applies the following modules sequentially to its input:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
5. Batch normalization
6. A skip connection that adds the input to the block
7. A rectifier non-linearity
Note: num_filter is 128 for a 19 x 19 board size and 32 for a 9 x 9 board size in the MiniGo implementation.
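A condensed sketch of one residual block, mirroring the modules listed above (the full implementation is `_res_block` in [dualnet_model.py](dualnet_model.py); `tf` is TensorFlow 1.x):
```
def residual_block(inputs, filters, training):
  """Conv -> batch norm -> ReLU -> conv -> batch norm -> skip -> ReLU."""
  conv1 = tf.layers.conv2d(inputs, filters, [3, 3], padding='same')
  relu1 = tf.nn.relu(tf.layers.batch_normalization(conv1, training=training))
  conv2 = tf.layers.conv2d(relu1, filters, [3, 3], padding='same')
  bn2 = tf.layers.batch_normalization(conv2, training=training)
  # The skip connection adds the block's input before the final rectifier.
  return tf.nn.relu(inputs + bn2)
```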
### Dual Heads Output
The output of the residual tower is passed into two separate "heads" for
computing the policy and value respectively. The policy head applies the
following modules:
1. A convolution of 2 filters of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer that outputs a vector of size (board_size * board_size + 1) corresponding to logit probabilities for all intersections and the pass move
The value head applies the following modules:
1. A convolution of 1 filter of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer to a hidden layer of size 256 for 19 x 19
board size and 64 for 9x9 board size
5. A rectifier non-linearity
6. A fully connected linear layer to a scalar
7. A tanh non-linearity outputting a scalar in the range [-1, 1]
In MiniGo, the overall network depth, in the 10 or 20 block network, is 19 or 39
parameterized layers respectively for the residual tower, plus an additional 2
layers for the policy head and 3 layers for the value head.
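Both heads are implemented in [dualnet_model.py](dualnet_model.py); roughly (with `tower_output`, `training`, `board_size`, and `fc_width` assumed to be in scope, and some batch-norm details omitted), they look like:
```
# Policy head: 1x1 conv -> batch norm -> ReLU -> dense over all moves + pass.
policy = tf.layers.conv2d(tower_output, filters=2, kernel_size=[1, 1])
policy = tf.nn.relu(tf.layers.batch_normalization(policy, training=training))
policy_logits = tf.layers.dense(
    tf.reshape(policy, [-1, board_size * board_size * 2]),
    board_size * board_size + 1)

# Value head: 1x1 conv -> batch norm -> ReLU -> hidden dense -> scalar -> tanh.
value = tf.layers.conv2d(tower_output, filters=1, kernel_size=[1, 1])
value = tf.nn.relu(tf.layers.batch_normalization(value, training=training))
value_hidden = tf.nn.relu(tf.layers.dense(
    tf.reshape(value, [-1, board_size * board_size]), fc_width))
value_output = tf.nn.tanh(tf.reshape(tf.layers.dense(value_hidden, 1), [-1]))
```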
## Getting Started
This project assumes you have virtualenv, TensorFlow (>= 1.5), and two other Go-related
packages installed: pygtp (>= 0.4) and sgf (== 0.5). An example setup is shown below.
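For example, inside a fresh virtualenv the dependencies can be installed with pip (package names as listed above; adjust versions to your environment):
```
virtualenv minigo_env
source minigo_env/bin/activate
pip install "tensorflow>=1.5" "pygtp>=0.4" "sgf==0.5"
```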
## Training Model
One iteration of reinforcement learning (RL) consists of the following steps:
- Bootstrap: initializes a random DualNet model. If the estimator directory already exists, the model is initialized from the last checkpoint.
- Selfplay: plays games with the latest model, or with the best model so far as identified by evaluation, producing data used for training.
- Gather: groups games played with the same model into larger files of tf.Examples.
- Train: trains a new model with the selfplay results from the most recent N generations.
To run the RL pipeline, issue the following command:
```
python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256
```
Arguments:
* `--base_dir`: Base directory for MiniGo data and models. If not specified, it's set as /tmp/minigo/ by default.
* `--board_size`: Go board size. It can be either 9 or 19. By default, it's 9.
* `--batch_size`: Batch size for model training. If not specified, it's calculated based on go board size.
Use the `--help` or `-h` flag to get a full list of possible arguments. Besides these arguments, other parameters for the RL pipeline and the DualNet model can be found and configured in [model_params.py](model_params.py).
Suppose the base directory argument `base_dir` is `$HOME/minigo/` and we use 9 as the `board_size`. After model training, the following directories are created to store models and game data:
```
$HOME/minigo                      # base directory
├── 9_size                        # directory for 9x9 board size
│   ├── data
│   │   ├── holdout               # holdout data for model validation
│   │   ├── selfplay              # data generated by selfplay of each model
│   │   └── training_chunks       # gathered tf_examples for model training
│   ├── estimator_model_dir       # estimator working directory
│   ├── trained_models            # all the trained models
│   └── sgf                       # sgf (smart go files) folder
│       ├── 000000-bootstrap      # model name
│       │   ├── clean             # clean sgf files of model selfplay
│       │   └── full              # full sgf files of model selfplay
│       ├── ...
│       └── evaluate              # clean sgf files of model evaluation
└── ...
```
## Validating Model
To validate the trained model, issue the following command with the `--validation` argument:
```
python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --validation
```
## Evaluating Models
The performance of two models is compared in the evaluation step. Given two models, one plays black and the other plays white. They play several games (the number of games can be configured by the parameter `eval_games` in [model_params.py](model_params.py)), and the one that wins by a margin of 55% is declared the winner.
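Concretely, the decision mirrors the check at the end of `play_match` in the evaluation code: black is declared the winner only if its win count exceeds white's by more than `eval_win_rate * games`:
```
if (black_win_counts - white_win_counts) > params.eval_win_rate * games:
  winner = 'BLACK'
else:
  winner = 'WHITE'
```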
To include the evaluation step in the RL pipeline, specify the `--evaluation` argument to compare the performance of the `current_trained_model` and the `best_model_so_far`. The winner is used to update `best_model_so_far`. Run the following command to include the evaluation step in the pipeline:
```
python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --evaluation
```
## Testing Pipeline
As the whole RL pipeline may take hours to train even for a 9x9 board size, a `--test` argument is provided to test the pipeline quickly with a dummy neural network model.
To test the RL pipeline with a dummy model, issue the following command:
```
python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --test
```
## Running Self-play Only
A self-play-only option is provided to run the selfplay step on its own, so training data can be generated in parallel. Issue the following command to run selfplay only with the latest trained model:
```
python minigo.py --selfplay
```
Other optional arguments:
* `--selfplay_model_name`: The name of the model used for selfplay. If not specified, the latest trained model will be used.
* `--selfplay_max_games`: The maximum number of games for selfplay to generate. If not specified, the default parameter `max_games_per_generation` is used.
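For example, selfplay can be pointed at a specific model (here the bootstrap model from the directory layout above) and capped at a small number of games:
```
python minigo.py --selfplay --selfplay_model_name=000000-bootstrap --selfplay_max_games=8
```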
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Logic for dealing with coordinates.
This introduces some helpers and terminology that are used throughout MiniGo.
MiniGo Coordinate: This is a tuple of the form (row, column) that is indexed
starting out at (0, 0) from the upper-left.
Flattened Coordinate: this is a number ranging from 0 - N^2 (so N^2+1
possible values). The extra value N^2 is used to mark a 'pass' move.
SGF Coordinate: Coordinate used for SGF serialization format. Coordinates use
two-letter pairs having the form (column, row) indexed from the upper-left
where 0, 0 = 'aa'.
KGS Coordinate: Human-readable coordinate string indexed from bottom left, with
the first character a capital letter for the column and the second a number
from 1-19 for the row. Note that KGS chooses to skip the letter 'I' due to
its similarity with 'l' (lowercase 'L').
PYGTP Coordinate: Tuple coordinate indexed starting at 1,1 from bottom-left
in the format (column, row)
So, for a 19x19,
Coord Type upper_left upper_right pass
-------------------------------------------------------
minigo coord (0, 0) (0, 18) None
flat 0 18 361
SGF 'aa' 'sa' ''
KGS 'A19' 'T19' 'pass'
pygtp (1, 19) (19, 19) (0, 0)
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import gtp
# We provide more than 19 entries here in case of boards larger than 19 x 19.
_SGF_COLUMNS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
_KGS_COLUMNS = 'ABCDEFGHJKLMNOPQRSTUVWXYZ'
def from_flat(board_size, flat):
"""Converts from a flattened coordinate to a MiniGo coordinate."""
if flat == board_size * board_size:
return None
return divmod(flat, board_size)
def to_flat(board_size, coord):
"""Converts from a MiniGo coordinate to a flattened coordinate."""
if coord is None:
return board_size * board_size
return board_size * coord[0] + coord[1]
def from_sgf(sgfc):
"""Converts from an SGF coordinate to a MiniGo coordinate."""
if not sgfc:
return None
return _SGF_COLUMNS.index(sgfc[1]), _SGF_COLUMNS.index(sgfc[0])
def to_sgf(coord):
"""Converts from a MiniGo coordinate to an SGF coordinate."""
if coord is None:
return ''
return _SGF_COLUMNS[coord[1]] + _SGF_COLUMNS[coord[0]]
def from_kgs(board_size, kgsc):
"""Converts from a KGS coordinate to a MiniGo coordinate."""
if kgsc == 'pass':
return None
kgsc = kgsc.upper()
col = _KGS_COLUMNS.index(kgsc[0])
row_from_bottom = int(kgsc[1:])
return board_size - row_from_bottom, col
def to_kgs(board_size, coord):
"""Converts from a MiniGo coordinate to a KGS coordinate."""
if coord is None:
return 'pass'
y, x = coord
return '{}{}'.format(_KGS_COLUMNS[x], board_size - y)
def from_pygtp(board_size, pygtpc):
"""Converts from a pygtp coordinate to a MiniGo coordinate."""
# GTP has a notion of both a Pass and a Resign, both of which are mapped to
# None, so the conversion is not precisely bijective.
if pygtpc in (gtp.PASS, gtp.RESIGN):
return None
return board_size - pygtpc[1], pygtpc[0] - 1
def to_pygtp(board_size, coord):
"""Converts from a MiniGo coordinate to a pygtp coordinate."""
if coord is None:
return gtp.PASS
return coord[1] + 1, board_size - coord[0]
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for coords."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf # pylint: disable=g-bad-import-order
import coords
import numpy
import utils_test
tf.logging.set_verbosity(tf.logging.ERROR)
class TestCoords(utils_test.MiniGoUnitTest):
def test_upperleft(self):
self.assertEqual(coords.from_sgf('aa'), (0, 0))
self.assertEqual(coords.from_flat(utils_test.BOARD_SIZE, 0), (0, 0))
self.assertEqual(coords.from_kgs(utils_test.BOARD_SIZE, 'A9'), (0, 0))
self.assertEqual(coords.from_pygtp(utils_test.BOARD_SIZE, (1, 9)), (0, 0))
self.assertEqual(coords.to_sgf((0, 0)), 'aa')
self.assertEqual(coords.to_flat(utils_test.BOARD_SIZE, (0, 0)), 0)
self.assertEqual(coords.to_kgs(utils_test.BOARD_SIZE, (0, 0)), 'A9')
self.assertEqual(coords.to_pygtp(utils_test.BOARD_SIZE, (0, 0)), (1, 9))
def test_topleft(self):
self.assertEqual(coords.from_sgf('ia'), (0, 8))
self.assertEqual(coords.from_flat(utils_test.BOARD_SIZE, 8), (0, 8))
self.assertEqual(coords.from_kgs(utils_test.BOARD_SIZE, 'J9'), (0, 8))
self.assertEqual(coords.from_pygtp(utils_test.BOARD_SIZE, (9, 9)), (0, 8))
self.assertEqual(coords.to_sgf((0, 8)), 'ia')
self.assertEqual(coords.to_flat(utils_test.BOARD_SIZE, (0, 8)), 8)
self.assertEqual(coords.to_kgs(utils_test.BOARD_SIZE, (0, 8)), 'J9')
self.assertEqual(coords.to_pygtp(utils_test.BOARD_SIZE, (0, 8)), (9, 9))
def test_pass(self):
self.assertEqual(coords.from_sgf(''), None)
self.assertEqual(coords.from_flat(utils_test.BOARD_SIZE, 81), None)
self.assertEqual(coords.from_kgs(utils_test.BOARD_SIZE, 'pass'), None)
self.assertEqual(coords.from_pygtp(utils_test.BOARD_SIZE, (0, 0)), None)
self.assertEqual(coords.to_sgf(None), '')
self.assertEqual(coords.to_flat(utils_test.BOARD_SIZE, None), 81)
self.assertEqual(coords.to_kgs(utils_test.BOARD_SIZE, None), 'pass')
self.assertEqual(coords.to_pygtp(utils_test.BOARD_SIZE, None), (0, 0))
def test_parsing_9x9(self):
self.assertEqual(coords.from_sgf('aa'), (0, 0))
self.assertEqual(coords.from_sgf('ac'), (2, 0))
self.assertEqual(coords.from_sgf('ca'), (0, 2))
self.assertEqual(coords.from_sgf(''), None)
self.assertEqual(coords.to_sgf(None), '')
self.assertEqual('aa', coords.to_sgf(coords.from_sgf('aa')))
self.assertEqual('sa', coords.to_sgf(coords.from_sgf('sa')))
self.assertEqual((1, 17), coords.from_sgf(coords.to_sgf((1, 17))))
self.assertEqual(coords.from_kgs(utils_test.BOARD_SIZE, 'A1'), (8, 0))
self.assertEqual(coords.from_kgs(utils_test.BOARD_SIZE, 'A9'), (0, 0))
self.assertEqual(coords.from_kgs(utils_test.BOARD_SIZE, 'C2'), (7, 2))
self.assertEqual(coords.from_kgs(utils_test.BOARD_SIZE, 'J2'), (7, 8))
self.assertEqual(coords.from_pygtp(utils_test.BOARD_SIZE, (1, 1)), (8, 0))
self.assertEqual(coords.from_pygtp(utils_test.BOARD_SIZE, (1, 9)), (0, 0))
self.assertEqual(coords.from_pygtp(utils_test.BOARD_SIZE, (3, 2)), (7, 2))
self.assertEqual(coords.to_pygtp(utils_test.BOARD_SIZE, (8, 0)), (1, 1))
self.assertEqual(coords.to_pygtp(utils_test.BOARD_SIZE, (0, 0)), (1, 9))
self.assertEqual(coords.to_pygtp(utils_test.BOARD_SIZE, (7, 2)), (3, 2))
self.assertEqual(coords.to_kgs(utils_test.BOARD_SIZE, (0, 8)), 'J9')
self.assertEqual(coords.to_kgs(utils_test.BOARD_SIZE, (8, 0)), 'A1')
def test_flatten(self):
self.assertEqual(coords.to_flat(utils_test.BOARD_SIZE, (0, 0)), 0)
self.assertEqual(coords.to_flat(utils_test.BOARD_SIZE, (0, 3)), 3)
self.assertEqual(coords.to_flat(utils_test.BOARD_SIZE, (3, 0)), 27)
self.assertEqual(coords.from_flat(utils_test.BOARD_SIZE, 27), (3, 0))
self.assertEqual(coords.from_flat(utils_test.BOARD_SIZE, 10), (1, 1))
self.assertEqual(coords.from_flat(utils_test.BOARD_SIZE, 80), (8, 8))
self.assertEqual(coords.to_flat(
utils_test.BOARD_SIZE, coords.from_flat(utils_test.BOARD_SIZE, 10)), 10)
self.assertEqual(coords.from_flat(
utils_test.BOARD_SIZE, coords.to_flat(
utils_test.BOARD_SIZE, (5, 4))), (5, 4))
def test_from_flat_ndindex_equivalence(self):
ndindices = list(numpy.ndindex(
utils_test.BOARD_SIZE, utils_test.BOARD_SIZE))
flat_coords = list(range(
utils_test.BOARD_SIZE * utils_test.BOARD_SIZE))
def _from_flat(flat_coords):
return coords.from_flat(utils_test.BOARD_SIZE, flat_coords)
self.assertEqual(
list(map(_from_flat, flat_coords)), ndindices)
if __name__ == '__main__':
tf.test.main()
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains utility and supporting functions for DualNet.
This module provides the model interface, including functions for DualNet model
bootstrap, training, validation, loading and exporting.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import tensorflow as tf # pylint: disable=g-bad-import-order
import dualnet_model
import features
import preprocessing
import symmetries
class DualNetRunner(object):
"""The DualNetRunner class for the complete model with graph and weights.
This class can restore the model from saved files, and provide inference for
given examples.
"""
def __init__(self, save_file, params):
"""Initialize the dual network from saved model/checkpoints.
Args:
save_file: Path where model parameters were previously saved. For example:
'/tmp/minigo/models_dir/000000-bootstrap/'
params: An object with hyperparameters for DualNetRunner
"""
self.save_file = save_file
self.hparams = params
self.inference_input = None
self.inference_output = None
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
self.sess = tf.Session(graph=tf.Graph(), config=config)
self.initialize_graph()
def initialize_graph(self):
"""Initialize the graph with saved model."""
with self.sess.graph.as_default():
input_features, labels = get_inference_input(self.hparams)
estimator_spec = dualnet_model.model_fn(
input_features, labels, tf.estimator.ModeKeys.PREDICT, self.hparams)
self.inference_input = input_features
self.inference_output = estimator_spec.predictions
if self.save_file is not None:
self.initialize_weights(self.save_file)
else:
self.sess.run(tf.global_variables_initializer())
def initialize_weights(self, save_file):
"""Initialize the weights from the given save_file.
Assumes that the graph has been constructed, and the save_file contains
weights that match the graph. Used to set the weights to a different version
of the player without redefining the entire graph.
Args:
save_file: Path where model parameters were previously saved.
"""
tf.train.Saver().restore(self.sess, save_file)
def run(self, position, use_random_symmetry=True):
"""Compute the policy and value output for a given position.
Args:
position: A given go board status
use_random_symmetry: Apply random symmetry (defined in symmetries.py) to
the extracted feature (defined in features.py) of the given position
Returns:
prob, value: The policy and value output (defined in dualnet_model.py)
"""
probs, values = self.run_many(
[position], use_random_symmetry=use_random_symmetry)
return probs[0], values[0]
def run_many(self, positions, use_random_symmetry=True):
"""Compute the policy and value output for given positions.
Args:
positions: A list of positions for go board status
use_random_symmetry: Apply random symmetry (defined in symmetries.py) to
the extracted features (defined in features.py) of the given positions
Returns:
probabilities, value: The policy and value outputs (defined in
dualnet_model.py)
"""
def _extract_features(positions):
return features.extract_features(self.hparams.board_size, positions)
processed = list(map(_extract_features, positions))
# processed = [
# features.extract_features(self.hparams.board_size, p) for p in positions]
if use_random_symmetry:
syms_used, processed = symmetries.randomize_symmetries_feat(processed)
# feed_dict is a dict object to provide the input examples for the step of
# inference. sess.run() returns the inference predictions (indicated by
# self.inference_output) of the given input as outputs
outputs = self.sess.run(
self.inference_output, feed_dict={self.inference_input: processed})
probabilities, value = outputs['policy_output'], outputs['value_output']
if use_random_symmetry:
probabilities = symmetries.invert_symmetries_pi(
self.hparams.board_size, syms_used, probabilities)
return probabilities, value
def get_inference_input(params):
"""Set up placeholders for input features/labels.
Args:
params: An object to indicate the hyperparameters of the model.
Returns:
The features and output tensors that get passed into model_fn. Check
dualnet_model.py for more details on the models input and output.
"""
input_features = tf.placeholder(
tf.float32, [None, params.board_size, params.board_size,
features.NEW_FEATURES_PLANES],
name='pos_tensor')
labels = {
'pi_tensor': tf.placeholder(
tf.float32, [None, params.board_size * params.board_size + 1]),
'value_tensor': tf.placeholder(tf.float32, [None])
}
return input_features, labels
def bootstrap(working_dir, params):
"""Initialize a tf.Estimator run with random initial weights.
Args:
working_dir: The directory where tf.estimator will drop logs,
checkpoints, and so on
params: hyperparams of the model.
"""
# Forge an initial checkpoint with the name that subsequent Estimator will
# expect to find.
estimator_initial_checkpoint_name = 'model.ckpt-1'
save_file = os.path.join(working_dir,
estimator_initial_checkpoint_name)
sess = tf.Session()
with sess.graph.as_default():
input_features, labels = get_inference_input(params)
dualnet_model.model_fn(
input_features, labels, tf.estimator.ModeKeys.PREDICT, params)
sess.run(tf.global_variables_initializer())
tf.train.Saver().save(sess, save_file)
def export_model(working_dir, model_path):
"""Take the latest checkpoint and export it to model_path for selfplay.
Assumes that all relevant model files are prefixed by the same name.
(For example, foo.index, foo.meta and foo.data-00000-of-00001).
Args:
working_dir: The directory where tf.estimator keeps its checkpoints.
model_path: Either a local path or a gs:// path to export model to.
"""
latest_checkpoint = tf.train.latest_checkpoint(working_dir)
all_checkpoint_files = tf.gfile.Glob(latest_checkpoint + '*')
for filename in all_checkpoint_files:
suffix = filename.partition(latest_checkpoint)[2]
destination_path = model_path + suffix
tf.gfile.Copy(filename, destination_path)
def train(working_dir, tf_records, generation, params):
"""Train the model for a specific generation.
Args:
working_dir: The model working directory to save model parameters,
drop logs, checkpoints, and so on.
tf_records: A list of tf_record filenames for training input.
generation: The generation to be trained.
params: hyperparams of the model.
Raises:
ValueError: if generation is not greater than 0.
"""
if generation <= 0:
raise ValueError('Model 0 is random weights')
estimator = tf.estimator.Estimator(
dualnet_model.model_fn, model_dir=working_dir, params=params)
max_steps = (generation * params.examples_per_generation
// params.batch_size)
profiler_hook = tf.train.ProfilerHook(output_dir=working_dir, save_secs=600)
def input_fn():
return preprocessing.get_input_tensors(
params, params.batch_size, tf_records)
estimator.train(
input_fn, hooks=[profiler_hook], max_steps=max_steps)
def validate(working_dir, tf_records, params):
"""Perform model validation on the hold out data.
Args:
working_dir: The model working directory.
tf_records: A list of tf_records filenames for holdout data.
params: hyperparams of the model.
"""
estimator = tf.estimator.Estimator(
dualnet_model.model_fn, model_dir=working_dir, params=params)
def input_fn():
return preprocessing.get_input_tensors(
params, params.batch_size, tf_records, filter_amount=0.05)
estimator.evaluate(input_fn, steps=1000)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Defines DualNet model, the architecture of the policy and value network.
The input to the neural network is a [board_size * board_size * 17] image stack
comprising 17 binary feature planes. 8 feature planes consist of binary values
indicating the presence of the current player's stones; A further 8 feature
planes represent the corresponding features for the opponent's stones; The final
feature plane represents the color to play, and has a constant value of either 1
if black is to play or 0 if white is to play. Check 'features.py' for more details.
In the MiniGo implementation, the input features are processed by a residual tower
that consists of a single convolutional block followed by either 9 or 19
residual blocks.
The convolutional block applies the following modules:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
Each residual block applies the following modules sequentially to its input:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
5. Batch normalization
6. A skip connection that adds the input to the block
7. A rectifier non-linearity
Note: num_filter is 128 for 19 x 19 board size, and 32 for 9 x 9 board size.
The output of the residual tower is passed into two separate "heads" for
computing the policy and value respectively. The policy head applies the
following modules:
1. A convolution of 2 filters of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer that outputs a vector of size 19^2 + 1 = 362
corresponding to logit probabilities for all intersections and the pass move
The value head applies the following modules:
1. A convolution of 1 filter of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer to a hidden layer of size 256 for 19 x 19
board size and 64 for 9x9 board size
5. A rectifier non-linearity
6. A fully connected linear layer to a scalar
7. A tanh non-linearity outputting a scalar in the range [-1, 1]
The overall network depth, in the 10 or 20 block network, is 19 or 39
parameterized layers respectively for the residual tower, plus an additional 2
layers for the policy head and 3 layers for the value head.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
_BATCH_NORM_DECAY = 0.997
_BATCH_NORM_EPSILON = 1e-5
def _batch_norm(inputs, training, center=True, scale=True):
"""Performs a batch normalization using a standard set of parameters."""
return tf.layers.batch_normalization(
inputs=inputs, momentum=_BATCH_NORM_DECAY, epsilon=_BATCH_NORM_EPSILON,
center=center, scale=scale, fused=True, training=training)
def _conv2d(inputs, filters, kernel_size):
"""Performs 2D convolution with a standard set of parameters."""
return tf.layers.conv2d(
inputs=inputs, filters=filters, kernel_size=kernel_size,
padding='same')
def _conv_block(inputs, filters, kernel_size, training):
"""A convolutional block.
Args:
inputs: A tensor representing a batch of input features with shape
[BATCH_SIZE, board_size, board_size, features.NEW_FEATURES_PLANES].
filters: The number of filters for network layers in residual tower.
kernel_size: The kernel to be used in conv2d.
training: Either True or False, whether we are currently training the
model. Needed for batch norm.
Returns:
The output tensor of the convolutional block layer.
"""
conv = _conv2d(inputs, filters, kernel_size)
batchn = _batch_norm(conv, training)
output = tf.nn.relu(batchn)
return output
def _res_block(inputs, filters, kernel_size, training):
"""A residual block.
Args:
inputs: A tensor representing a batch of input features with shape
[BATCH_SIZE, board_size, board_size, features.NEW_FEATURES_PLANES].
filters: The number of filters for network layers in residual tower.
kernel_size: The kernel to be used in conv2d.
training: Either True or False, whether we are currently training the
model. Needed for batch norm.
Returns:
The output tensor of the residual block layer.
"""
initial_output = _conv_block(inputs, filters, kernel_size, training)
int_layer2_conv = _conv2d(initial_output, filters, kernel_size)
int_layer2_batchn = _batch_norm(int_layer2_conv, training)
output = tf.nn.relu(inputs + int_layer2_batchn)
return output
class Model(object):
"""Base class for building the DualNet Model."""
def __init__(self, num_filters, num_shared_layers, fc_width, board_size):
"""Initialize a model for computing the policy and value in RL.
Args:
num_filters: Number of filters (AlphaGoZero used 256). We use 128 by
default for a 19x19 go board, and 32 for 9x9 size.
num_shared_layers: Number of shared residual blocks. AGZ used both 19
and 39. Here we use 19 for 19x19 size and 9 for 9x9 size because it's
faster to train.
fc_width: Dimensionality of the fully connected linear layer.
board_size: A single integer for the board size.
"""
self.num_filters = num_filters
self.num_shared_layers = num_shared_layers
self.fc_width = fc_width
self.board_size = board_size
self.kernel_size = [3, 3] # kernel size is from AGZ paper
def __call__(self, inputs, training):
"""Add operations to classify a batch of input Go features.
Args:
inputs: A Tensor representing a batch of input Go features with shape
[BATCH_SIZE, board_size, board_size, features.NEW_FEATURES_PLANES]
training: A boolean. Set to True to add operations required only when
training the classifier.
Returns:
policy_logits: A vector of size self.board_size * self.board_size + 1
corresponding to the policy logit probabilities for all intersections
and the pass move.
value_logits: A scalar for the value logits output
"""
initial_output = _conv_block(
inputs=inputs, filters=self.num_filters,
kernel_size=self.kernel_size, training=training)
# the shared stack
shared_output = initial_output
for _ in range(self.num_shared_layers):
shared_output = _res_block(
inputs=shared_output, filters=self.num_filters,
kernel_size=self.kernel_size, training=training)
# policy head
policy_conv2d = _conv2d(inputs=shared_output, filters=2, kernel_size=[1, 1])
policy_batchn = _batch_norm(inputs=policy_conv2d, training=training,
center=False, scale=False)
policy_relu = tf.nn.relu(policy_batchn)
policy_logits = tf.layers.dense(
tf.reshape(policy_relu, [-1, self.board_size * self.board_size * 2]),
self.board_size * self.board_size + 1)
# value head
value_conv2d = _conv2d(shared_output, filters=1, kernel_size=[1, 1])
value_batchn = _batch_norm(value_conv2d, training,
center=False, scale=False)
value_relu = tf.nn.relu(value_batchn)
value_fc_hidden = tf.nn.relu(tf.layers.dense(
tf.reshape(value_relu, [-1, self.board_size * self.board_size]),
self.fc_width))
value_logits = tf.reshape(tf.layers.dense(value_fc_hidden, 1), [-1])
return policy_logits, value_logits
def model_fn(features, labels, mode, params, config=None): # pylint: disable=unused-argument
"""DualNet model function.
Args:
features: tensor with shape
[BATCH_SIZE, self.board_size, self.board_size,
features.NEW_FEATURES_PLANES]
labels: dict from string to tensor with shape
'pi_tensor': [BATCH_SIZE, self.board_size * self.board_size + 1]
'value_tensor': [BATCH_SIZE]
mode: a tf.estimator.ModeKeys (batchnorm params update for TRAIN only)
params: an object of hyperparams
config: ignored; is required by Estimator API.
Returns:
EstimatorSpec parameterized according to the input params and the current
mode.
"""
model = Model(params.num_filters, params.num_shared_layers, params.fc_width,
params.board_size)
policy_logits, value_logits = model(
features, mode == tf.estimator.ModeKeys.TRAIN)
policy_output = tf.nn.softmax(policy_logits, name='policy_output')
value_output = tf.nn.tanh(value_logits, name='value_output')
# Calculate model loss. The loss function sums over the mean-squared error,
# the cross-entropy losses and the l2 regularization term.
# Cross-entropy of policy
policy_entropy = -tf.reduce_mean(tf.reduce_sum(
policy_output * tf.log(policy_output), axis=1))
policy_cost = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(
logits=policy_logits, labels=labels['pi_tensor']))
# Mean squared error
value_cost = tf.reduce_mean(
tf.square(value_output - labels['value_tensor']))
# L2 term
l2_cost = params.l2_strength * tf.add_n(
[tf.nn.l2_loss(v) for v in tf.trainable_variables()
if 'bias' not in v.name])
# The loss function
combined_cost = policy_cost + value_cost + l2_cost
# Get model train ops
global_step = tf.train.get_or_create_global_step()
boundaries = [int(1e6), int(2e6)]
values = [1e-2, 1e-3, 1e-4]
learning_rate = tf.train.piecewise_constant(
global_step, boundaries, values)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train_op = tf.train.MomentumOptimizer(
learning_rate, params.momentum).minimize(
combined_cost, global_step=global_step)
# Create multiple tensors for logging purpose
metric_ops = {
'accuracy': tf.metrics.accuracy(labels=labels['pi_tensor'],
predictions=policy_output,
name='accuracy_op'),
'policy_cost': tf.metrics.mean(policy_cost),
'value_cost': tf.metrics.mean(value_cost),
'l2_cost': tf.metrics.mean(l2_cost),
'policy_entropy': tf.metrics.mean(policy_entropy),
'combined_cost': tf.metrics.mean(combined_cost),
}
for metric_name, metric_op in metric_ops.items():
tf.summary.scalar(metric_name, metric_op[1])
# Return tf.estimator.EstimatorSpec
return tf.estimator.EstimatorSpec(
mode=mode,
predictions={
'policy_output': policy_output,
'value_output': value_output,
},
loss=combined_cost,
train_op=train_op,
eval_metric_ops=metric_ops)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for dualnet and dualnet_model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import tempfile
import tensorflow as tf # pylint: disable=g-bad-import-order
import dualnet
import go
import model_params
import preprocessing
import utils_test
tf.logging.set_verbosity(tf.logging.ERROR)
class TestDualNet(utils_test.MiniGoUnitTest):
def test_train(self):
with tempfile.TemporaryDirectory() as working_dir, \
tempfile.NamedTemporaryFile() as tf_record:
preprocessing.make_dataset_from_sgf(
utils_test.BOARD_SIZE, 'example_game.sgf', tf_record.name)
dualnet.train(
working_dir, [tf_record.name], 1, model_params.DummyMiniGoParams())
def test_inference(self):
with tempfile.TemporaryDirectory() as working_dir, \
tempfile.TemporaryDirectory() as export_dir:
dualnet.bootstrap(working_dir, model_params.DummyMiniGoParams())
exported_model = os.path.join(export_dir, 'bootstrap-model')
dualnet.export_model(working_dir, exported_model)
n1 = dualnet.DualNetRunner(
exported_model, model_params.DummyMiniGoParams())
n1.run(go.Position(utils_test.BOARD_SIZE))
n2 = dualnet.DualNetRunner(
exported_model, model_params.DummyMiniGoParams())
n2.run(go.Position(utils_test.BOARD_SIZE))
if __name__ == '__main__':
tf.test.main()
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Evaluation of playing games between two neural nets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import time
import go
from gtp_wrapper import MCTSPlayer
import sgf_wrapper
def play_match(params, black_net, white_net, games, readouts,
sgf_dir, verbosity):
"""Plays matches between two neural nets.
The net that wins by a margin of 55% is declared the winner.
Args:
params: An object of hyperparameters.
black_net: Instance of the DualNetRunner class to play as black.
white_net: Instance of the DualNetRunner class to play as white.
games: Number of games to play. We play all the games at the same time.
readouts: Number of readouts to perform for each step in each game.
sgf_dir: Directory to write the sgf results.
verbosity: Verbosity to show evaluation process.
Returns:
'B' if the winner is black_net, otherwise 'W'.
"""
# For n games, we create lists of n black and n white players
black = MCTSPlayer(
params.board_size, black_net, verbosity=verbosity, two_player_mode=True,
num_parallel=params.simultaneous_leaves)
white = MCTSPlayer(
params.board_size, white_net, verbosity=verbosity, two_player_mode=True,
num_parallel=params.simultaneous_leaves)
black_name = os.path.basename(black_net.save_file)
white_name = os.path.basename(white_net.save_file)
black_win_counts = 0
white_win_counts = 0
for i in range(games):
num_move = 0 # The move number of the current game
black.initialize_game()
white.initialize_game()
while True:
start = time.time()
active = white if num_move % 2 else black
inactive = black if num_move % 2 else white
current_readouts = active.root.N
while active.root.N < current_readouts + readouts:
active.tree_search()
# print some stats on the search
if verbosity >= 3:
print(active.root.position)
# First, check the roots for hopeless games.
if active.should_resign(): # Force resign
active.set_result(-active.root.position.to_play, was_resign=True)
inactive.set_result(
active.root.position.to_play, was_resign=True)
if active.is_done():
fname = '{:d}-{:s}-vs-{:s}-{:d}.sgf'.format(
int(time.time()), white_name, black_name, i)
with open(os.path.join(sgf_dir, fname), 'w') as f:
sgfstr = sgf_wrapper.make_sgf(
params.board_size, active.position.recent, active.result_string,
black_name=black_name, white_name=white_name)
f.write(sgfstr)
print('Finished game', i, active.result_string)
if active.result_string is not None:
if active.result_string[0] == 'B':
black_win_counts += 1
elif active.result_string[0] == 'W':
white_win_counts += 1
break
move = active.pick_move()
active.play_move(move)
inactive.play_move(move)
dur = time.time() - start
num_move += 1
if (verbosity > 1) or (verbosity == 1 and num_move % 10 == 9):
timeper = (dur / readouts) * 100.0
print(active.root.position)
print('{:d}: {:d} readouts, {:.3f} s/100. ({:.2f} sec)'.format(
num_move, readouts, timeper, dur))
if (black_win_counts - white_win_counts) > params.eval_win_rate * games:
return go.BLACK_NAME
else:
return go.WHITE_NAME
(;GM[1]FF[4]CA[UTF-8]AP[CGoban:3]ST[2]
RU[Japanese]SZ[9]KM[0.00]
PW[White]PB[Black]RE[B+4.00]
;B[de]
;W[fe]
;B[ee]
;W[fd]
;B[ff]
;W[gf]
;B[gg]
;W[fg]
;B[ef]
;W[gh]
;B[hg]
;W[hh]
;B[eg]
;W[fh]
;B[ge]
;W[hf]
;B[he]
;W[ig]
;B[fc]
;W[gd]
;B[gc]
;W[hd]
;B[ed]
;W[be]
;B[hc]
;W[ie]
;B[bc]
;W[cg]
;B[cf]
;W[bf]
;B[ch]
(;W[dg]
;B[dh]
;W[bh]
;B[eh]
;W[cc]
;B[cb])
(;W[cc]
;B[cb]
(;W[bh]
;B[dh])
(;W[dg]
;B[dh]
;W[bh]
;B[eh]
;W[dc]
;B[bd]
;W[ec]
;B[cd]
;W[fb]
;B[gb]
(;W[db])
(;W[bb]
;B[eb]
;W[db]
;B[fa]
;W[ca]
;B[ea]
;W[da]
;B[df]
;W[bg]
;B[bi]
;W[ab]
;B[ah]
;W[ci]
;B[di]
;W[ag]
;B[ae]
;W[ac]
;B[ad]
;W[ha]
;B[hb]
;W[fi]
;B[ce]
;W[ai]
;B[ci]
;W[ei]
;B[ah]
;W[ic]
;B[ib]
;W[ai]
;B[ba]
;W[aa]
;B[ah]
;W[ga]
;B[ia]
;W[ai]
;B[ga]
;W[id]
;B[ah]
;W[dd]
;B[af]TW[ba][cb][ge][he][if][gg][hg][ih][gi][hi][ii]TB[ha][fb][be][bf][ag][bg][cg][dg][bh][ai]))))
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Features used by AlphaGo Zero, in approximate order of importance.
Feature # Notes
Stone History 16 The stones of each color during the last 8 moves.
Ones 1 Constant plane of 1s
All features with 8 planes are 1-hot encoded, with plane i marked with 1
only if the feature was equal to i. Any features >= 8 would be marked as 8.
This file includes the features from AlphaGo Zero (AGZ) as NEW_FEATURES.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import go
import numpy as np
def planes(num_planes):
# Decorator to specify the number of planes in a feature. For example, for a 19x19
# go board, the input stone feature will be in the shape of [19, 19, 16],
# where the third dimension is the num_planes.
def deco(f):
f.planes = num_planes
return f
return deco
@planes(16)
def stone_features(board_size, position):
"""Create the 16 planes of features for a given position.
Args:
board_size: the go board size.
position: a given go board status.
Returns:
The 16 plane features.
"""
# a bit easier to calculate it with axis 0 being the 16 board states,
# and then roll axis 0 to the end.
features = np.zeros([16, board_size, board_size], dtype=np.uint8)
num_deltas_avail = position.board_deltas.shape[0]
cumulative_deltas = np.cumsum(position.board_deltas, axis=0)
last_eight = np.tile(position.board, [8, 1, 1])
# apply deltas to compute previous board states
last_eight[1:num_deltas_avail + 1] -= cumulative_deltas
# if no more deltas are available, just repeat oldest board.
last_eight[num_deltas_avail + 1:] = last_eight[num_deltas_avail].reshape(
1, board_size, board_size)
features[::2] = last_eight == position.to_play
features[1::2] = last_eight == -position.to_play
return np.rollaxis(features, 0, 3)
@planes(1)
def color_to_play_feature(board_size, position):
if position.to_play == go.BLACK:
return np.ones([board_size, board_size, 1], dtype=np.uint8)
else:
return np.zeros([board_size, board_size, 1], dtype=np.uint8)
NEW_FEATURES = [
stone_features,
color_to_play_feature
]
NEW_FEATURES_PLANES = sum(f.planes for f in NEW_FEATURES)
def extract_features(board_size, position, features=None):
if features is None:
features = NEW_FEATURES
return np.concatenate([feature(board_size, position) for feature in features],
axis=2)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for features."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf # pylint: disable=g-bad-import-order
import features
import go
import numpy as np
import utils_test
tf.logging.set_verbosity(tf.logging.ERROR)
EMPTY_ROW = '.' * utils_test.BOARD_SIZE + '\n'
TEST_BOARD = utils_test.load_board('''
.X.....OO
X........
XXXXXXXXX
''' + EMPTY_ROW * 6)
TEST_POSITION = go.Position(
utils_test.BOARD_SIZE,
board=TEST_BOARD,
n=3,
komi=6.5,
caps=(1, 2),
ko=None,
recent=(go.PlayerMove(go.BLACK, (0, 1)),
go.PlayerMove(go.WHITE, (0, 8)),
go.PlayerMove(go.BLACK, (1, 0))),
to_play=go.BLACK,
)
TEST_BOARD2 = utils_test.load_board('''
.XOXXOO..
XO.OXOX..
XXO..X...
''' + EMPTY_ROW * 6)
TEST_POSITION2 = go.Position(
utils_test.BOARD_SIZE,
board=TEST_BOARD2,
n=0,
komi=6.5,
caps=(0, 0),
ko=None,
recent=tuple(),
to_play=go.BLACK,
)
TEST_POSITION3 = go.Position(utils_test.BOARD_SIZE)
for coord in ((0, 0), (0, 1), (0, 2), (0, 3), (1, 1)):
TEST_POSITION3.play_move(coord, mutate=True)
# resulting position should look like this:
# X.XO.....
# .X.......
# .........
class TestFeatureExtraction(utils_test.MiniGoUnitTest):
def test_stone_features(self):
f = features.stone_features(utils_test.BOARD_SIZE, TEST_POSITION3)
self.assertEqual(TEST_POSITION3.to_play, go.WHITE)
self.assertEqual(f.shape, (9, 9, 16))
self.assertEqualNPArray(f[:, :, 0], utils_test.load_board('''
...X.....
.........''' + EMPTY_ROW * 7))
self.assertEqualNPArray(f[:, :, 1], utils_test.load_board('''
X.X......
.X.......''' + EMPTY_ROW * 7))
self.assertEqualNPArray(f[:, :, 2], utils_test.load_board('''
.X.X.....
.........''' + EMPTY_ROW * 7))
self.assertEqualNPArray(f[:, :, 3], utils_test.load_board('''
X.X......
.........''' + EMPTY_ROW * 7))
self.assertEqualNPArray(f[:, :, 4], utils_test.load_board('''
.X.......
.........''' + EMPTY_ROW * 7))
self.assertEqualNPArray(f[:, :, 5], utils_test.load_board('''
X.X......
.........''' + EMPTY_ROW * 7))
for i in range(10, 16):
self.assertEqualNPArray(
f[:, :, i], np.zeros([utils_test.BOARD_SIZE, utils_test.BOARD_SIZE]))
if __name__ == '__main__':
tf.test.main()
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Describe the Go game status.
A board is a NxN numpy array.
A Coordinate is a tuple index into the board.
A Move is a (Coordinate c | None).
A PlayerMove is a (Color, Move) tuple
(0, 0) is considered to be the upper left corner of the board, and (18, 0)
is the lower left.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import namedtuple
import copy
import itertools
import coords
import numpy as np
# Represent a board as a numpy array, with 0 empty, 1 is black, -1 is white.
# This means that swapping colors is as simple as multiplying array by -1.
WHITE, EMPTY, BLACK, FILL, KO, UNKNOWN = range(-1, 5)
# Represents "group not found" in the LibertyTracker object
MISSING_GROUP_ID = -1
BLACK_NAME = 'BLACK'
WHITE_NAME = 'WHITE'
def _check_bounds(board_size, c):
return c[0] % board_size == c[0] and c[1] % board_size == c[1]
def get_neighbors_diagonals(board_size):
"""Return coordinates of neighbors and diagonals for a go board."""
all_coords = [(i, j) for i in range(board_size) for j in range(board_size)]
def check_bounds(c):
return _check_bounds(board_size, c)
neighbors = {(x, y): list(filter(check_bounds, [
(x+1, y), (x-1, y), (x, y+1), (x, y-1)])) for x, y in all_coords}
diagonals = {(x, y): list(filter(check_bounds, [
(x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1)])) for x, y in all_coords}
return neighbors, diagonals
class IllegalMove(Exception):
pass
class PlayerMove(namedtuple('PlayerMove', ['color', 'move'])):
pass
class PositionWithContext(namedtuple('SgfPosition',
['position', 'next_move', 'result'])):
pass
def place_stones(board, color, stones):
for s in stones:
board[s] = color
def replay_position(board_size, position, result):
"""Wrapper for a go.Position which replays its history."""
# Assumes an empty start position! (i.e. no handicap, and history must
# be exhaustive.)
# Result must be passed in, since a resign cannot be inferred from position
# history alone.
# for position_w_context in replay_position(position):
# print(position_w_context.position)
if position.n != len(position.recent):
raise ValueError('Position history is incomplete!')
pos = Position(board_size=board_size, komi=position.komi)
for player_move in position.recent:
color, next_move = player_move
yield PositionWithContext(pos, next_move, result)
pos = pos.play_move(next_move, color=color)
def find_reached(board_size, board, c):
"""Find the chain to reach c."""
color = board[c]
chain = set([c])
reached = set()
frontier = [c]
neighbors, _ = get_neighbors_diagonals(board_size)
while frontier:
current = frontier.pop()
chain.add(current)
for n in neighbors[current]:
if board[n] == color and n not in chain:
frontier.append(n)
elif board[n] != color:
reached.add(n)
return chain, reached
def is_koish(board_size, board, c):
"""Check if c is surrounded on all sides by 1 color, and return that color."""
if board[c] != EMPTY:
return None
full_neighbors, _ = get_neighbors_diagonals(board_size)
neighbors = {board[n] for n in full_neighbors[c]}
if len(neighbors) == 1 and EMPTY not in neighbors:
return list(neighbors)[0]
else:
return None
def is_eyeish(board_size, board, c):
"""Check if c is an eye, for the purpose of restricting MC rollouts."""
# pass is fine.
if c is None:
return
color = is_koish(board_size, board, c)
if color is None:
return None
diagonal_faults = 0
_, all_diagonals = get_neighbors_diagonals(board_size)
diagonals = all_diagonals[c]
if len(diagonals) < 4:
diagonal_faults += 1
for d in diagonals:
if board[d] not in (color, EMPTY):
diagonal_faults += 1
if diagonal_faults > 1:
return None
else:
return color
class Group(namedtuple('Group', ['id', 'stones', 'liberties', 'color'])):
"""Group class.
stones: a frozenset of Coordinates belonging to this group
liberties: a frozenset of Coordinates that are empty and adjacent to
this group.
color: color of this group
"""
def __eq__(self, other):
return (self.stones == other.stones and self.liberties == other.liberties
and self.color == other.color)
class LibertyTracker(object):
"""LibertyTracker class."""
@staticmethod
def from_board(board_size, board):
board = np.copy(board)
curr_group_id = 0
lib_tracker = LibertyTracker(board_size)
for color in (WHITE, BLACK):
while color in board:
curr_group_id += 1
found_color = np.where(board == color)
coord = found_color[0][0], found_color[1][0]
chain, reached = find_reached(board_size, board, coord)
liberties = frozenset(r for r in reached if board[r] == EMPTY)
new_group = Group(curr_group_id, frozenset(
chain), liberties, color)
lib_tracker.groups[curr_group_id] = new_group
for s in chain:
lib_tracker.group_index[s] = curr_group_id
place_stones(board, FILL, chain)
lib_tracker.max_group_id = curr_group_id
liberty_counts = np.zeros([board_size, board_size], dtype=np.uint8)
for group in lib_tracker.groups.values():
num_libs = len(group.liberties)
for s in group.stones:
liberty_counts[s] = num_libs
lib_tracker.liberty_cache = liberty_counts
return lib_tracker
def __init__(self, board_size, group_index=None, groups=None,
liberty_cache=None, max_group_id=1):
# group_index: a NxN numpy array of group_ids. -1 means no group
# groups: a dict of group_id to groups
# liberty_cache: a NxN numpy array of liberty counts
self.board_size = board_size
self.group_index = (group_index if group_index is not None else
-np.ones([board_size, board_size], dtype=np.int32))
self.groups = groups or {}
self.liberty_cache = (
liberty_cache if liberty_cache is not None
else np.zeros([board_size, board_size], dtype=np.uint8))
self.max_group_id = max_group_id
self.neighbors, _ = get_neighbors_diagonals(board_size)
def __deepcopy__(self, memodict=None):
new_group_index = np.copy(self.group_index)
new_lib_cache = np.copy(self.liberty_cache)
# shallow copy
new_groups = copy.copy(self.groups)
return LibertyTracker(
self.board_size, new_group_index, new_groups,
liberty_cache=new_lib_cache, max_group_id=self.max_group_id)
def add_stone(self, color, c):
assert self.group_index[c] == MISSING_GROUP_ID
captured_stones = set()
opponent_neighboring_group_ids = set()
friendly_neighboring_group_ids = set()
empty_neighbors = set()
for n in self.neighbors[c]:
neighbor_group_id = self.group_index[n]
if neighbor_group_id != MISSING_GROUP_ID:
neighbor_group = self.groups[neighbor_group_id]
if neighbor_group.color == color:
friendly_neighboring_group_ids.add(neighbor_group_id)
else:
opponent_neighboring_group_ids.add(neighbor_group_id)
else:
empty_neighbors.add(n)
new_group = self._create_group(color, c, empty_neighbors)
for group_id in friendly_neighboring_group_ids:
new_group = self._merge_groups(group_id, new_group.id)
# new_group becomes stale as _update_liberties and
# _handle_captures are called; must refetch with self.groups[new_group.id]
for group_id in opponent_neighboring_group_ids:
neighbor_group = self.groups[group_id]
if len(neighbor_group.liberties) == 1:
captured = self._capture_group(group_id)
captured_stones.update(captured)
else:
self._update_liberties(group_id, remove={c})
self._handle_captures(captured_stones)
# suicide is illegal
if not self.groups[new_group.id].liberties:
raise IllegalMove('Move at {} would commit suicide!\n'.format(c))
return captured_stones
def _create_group(self, color, c, liberties):
self.max_group_id += 1
new_group = Group(self.max_group_id, frozenset([c]), liberties, color)
self.groups[new_group.id] = new_group
self.group_index[c] = new_group.id
self.liberty_cache[c] = len(liberties)
return new_group
def _merge_groups(self, group1_id, group2_id):
group1 = self.groups[group1_id]
group2 = self.groups[group2_id]
self.groups[group1_id] = Group(
group1_id, group1.stones | group2.stones, group1.liberties,
group1.color)
del self.groups[group2_id]
for s in group2.stones:
self.group_index[s] = group1_id
self._update_liberties(
group1_id, add=group2.liberties, remove=group2.stones)
return group1
def _capture_group(self, group_id):
dead_group = self.groups[group_id]
del self.groups[group_id]
for s in dead_group.stones:
self.group_index[s] = MISSING_GROUP_ID
self.liberty_cache[s] = 0
return dead_group.stones
def _update_liberties(self, group_id, add=set(), remove=set()):
group = self.groups[group_id]
new_libs = (group.liberties | add) - remove
self.groups[group_id] = Group(
group_id, group.stones, new_libs, group.color)
new_lib_count = len(new_libs)
for s in self.groups[group_id].stones:
self.liberty_cache[s] = new_lib_count
def _handle_captures(self, captured_stones):
for s in captured_stones:
for n in self.neighbors[s]:
group_id = self.group_index[n]
if group_id != MISSING_GROUP_ID:
self._update_liberties(group_id, add={s})
class Position(object):
def __init__(self, board_size, board=None, n=0, komi=7.5, caps=(0, 0),
lib_tracker=None, ko=None, recent=tuple(),
board_deltas=None, to_play=BLACK):
"""Initialize position class.
Args:
board_size: the go board size.
board: a numpy array
n: an int representing moves played so far
komi: a float, representing points given to the second player.
caps: an (int, int) tuple of captures for B, W.
lib_tracker: a LibertyTracker object
ko: a Move
recent: a tuple of PlayerMoves, such that recent[-1] is the last move.
board_deltas: a np.array of shape (n, board_size, board_size) representing changes
made to the board at each move (played move and captures).
Should satisfy next_pos.board - next_pos.board_deltas[0] == pos.board
to_play: BLACK or WHITE
"""
if not isinstance(recent, tuple):
raise TypeError('Recent must be a tuple!')
self.board_size = board_size
self.board = (board if board is not None else
np.zeros([board_size, board_size], dtype=np.int8))
self.n = n
self.komi = komi
self.caps = caps
self.lib_tracker = lib_tracker or LibertyTracker.from_board(
self.board_size, self.board)
self.ko = ko
self.recent = recent
self.board_deltas = (board_deltas if board_deltas is not None else
np.zeros([0, board_size, board_size], dtype=np.int8))
self.to_play = to_play
self.last_eight = None
self.neighbors, _ = get_neighbors_diagonals(board_size)
def __deepcopy__(self, memodict=None):
new_board = np.copy(self.board)
new_lib_tracker = copy.deepcopy(self.lib_tracker)
return Position(
self.board_size, new_board, self.n, self.komi, self.caps,
new_lib_tracker, self.ko, self.recent, self.board_deltas, self.to_play)
def __str__(self):
pretty_print_map = {
WHITE: '\x1b[0;31;47mO',
EMPTY: '\x1b[0;31;43m.',
BLACK: '\x1b[0;31;40mX',
FILL: '#',
KO: '*',
}
board = np.copy(self.board)
captures = self.caps
if self.ko is not None:
place_stones(board, KO, [self.ko])
raw_board_contents = []
for i in range(self.board_size):
row = []
for j in range(self.board_size):
appended = '<' if (
self.recent and (i, j) == self.recent[-1].move) else ' '
row.append(pretty_print_map[board[i, j]] + appended)
row.append('\x1b[0m')
raw_board_contents.append(''.join(row))
row_labels = ['%2d ' % i for i in range(self.board_size, 0, -1)]
annotated_board_contents = [''.join(r) for r in zip(
row_labels, raw_board_contents, row_labels)]
header_footer_rows = [
' ' + ' '.join('ABCDEFGHJKLMNOPQRST'[:self.board_size]) + ' ']
annotated_board = '\n'.join(itertools.chain(
header_footer_rows, annotated_board_contents, header_footer_rows))
details = '\nMove: {}. Captures X: {} O: {}\n'.format(
self.n, *captures)
return annotated_board + details
def is_move_suicidal(self, move):
potential_libs = set()
for n in self.neighbors[move]:
neighbor_group_id = self.lib_tracker.group_index[n]
if neighbor_group_id == MISSING_GROUP_ID:
# at least one liberty after playing here, so not a suicide
return False
neighbor_group = self.lib_tracker.groups[neighbor_group_id]
if neighbor_group.color == self.to_play:
potential_libs |= neighbor_group.liberties
elif len(neighbor_group.liberties) == 1:
# would capture an opponent group if they only had one lib.
return False
# it's possible to suicide by connecting several friendly groups
# each of which had one liberty.
potential_libs -= set([move])
return not potential_libs
def is_move_legal(self, move):
"""Checks that a move is on an empty space, not on ko, and not suicide."""
if move is None:
return True
if self.board[move] != EMPTY:
return False
if move == self.ko:
return False
if self.is_move_suicidal(move):
return False
return True
def all_legal_moves(self):
"""Returns a np.array of size go.N**2 + 1, with 1 = legal, 0 = illegal."""
# by default, every move is legal
legal_moves = np.ones([self.board_size, self.board_size], dtype=np.int8)
# ...unless there is already a stone there
legal_moves[self.board != EMPTY] = 0
# calculate which spots have 4 stones next to them
# padding is because the edge always counts as a lost liberty.
adjacent = np.ones([self.board_size+2, self.board_size+2], dtype=np.int8)
adjacent[1:-1, 1:-1] = np.abs(self.board)
num_adjacent_stones = (adjacent[:-2, 1:-1] + adjacent[1:-1, :-2] +
adjacent[2:, 1:-1] + adjacent[1:-1, 2:])
# Surrounded spots are those that are empty and have 4 adjacent stones.
surrounded_spots = np.multiply(
(self.board == EMPTY),
(num_adjacent_stones == 4))
# Such spots are possibly illegal, unless they are capturing something.
# Iterate over and manually check each spot.
for coord in np.transpose(np.nonzero(surrounded_spots)):
if self.is_move_suicidal(tuple(coord)):
legal_moves[tuple(coord)] = 0
# ...and retaking ko is always illegal
if self.ko is not None:
legal_moves[self.ko] = 0
# and pass is always legal
return np.concatenate([legal_moves.ravel(), [1]])
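  # Worked example of the shift-and-sum above (for reference): on an empty 3x3
  # board the padded array is all ones on its border and zeros inside, so the
  # four shifted slices give num_adjacent_stones = 2 at the corners, 1 on the
  # edges and 0 in the centre; only points whose four neighbours are all stones
  # or board edges can ever reach a count of 4.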
def pass_move(self, mutate=False):
pos = self if mutate else copy.deepcopy(self)
pos.n += 1
pos.recent += (PlayerMove(pos.to_play, None),)
pos.board_deltas = np.concatenate((
np.zeros([1, self.board_size, self.board_size], dtype=np.int8),
pos.board_deltas[:6]))
pos.to_play *= -1
pos.ko = None
return pos
def flip_playerturn(self, mutate=False):
pos = self if mutate else copy.deepcopy(self)
pos.ko = None
pos.to_play *= -1
return pos
def get_liberties(self):
return self.lib_tracker.liberty_cache
def play_move(self, c, color=None, mutate=False):
"""Obeys CGOS Rules of Play.
In short:
No suicides
Chinese/area scoring
Positional superko (this is very crudely approximate at the moment.)
Args:
c: the coordinate to play at.
color: the color of the player to play.
mutate: whether to modify this Position in place instead of returning a copy.
Returns:
The Position after the move has been played.
Raises:
IllegalMove: if the input c is an illegal move.
"""
if color is None:
color = self.to_play
pos = self if mutate else copy.deepcopy(self)
if c is None:
pos = pos.pass_move(mutate=mutate)
return pos
if not self.is_move_legal(c):
raise IllegalMove('{} move at {} is illegal: \n{}'.format(
'Black' if self.to_play == BLACK else 'White',
coords.to_kgs(self.board_size, c), self))
potential_ko = is_koish(self.board_size, self.board, c)
place_stones(pos.board, color, [c])
captured_stones = pos.lib_tracker.add_stone(color, c)
place_stones(pos.board, EMPTY, captured_stones)
opp_color = -1 * color
new_board_delta = np.zeros([self.board_size, self.board_size],
dtype=np.int8)
new_board_delta[c] = color
place_stones(new_board_delta, color, captured_stones)
if len(captured_stones) == 1 and potential_ko == opp_color:
new_ko = list(captured_stones)[0]
else:
new_ko = None
if pos.to_play == BLACK:
new_caps = (pos.caps[0] + len(captured_stones), pos.caps[1])
else:
new_caps = (pos.caps[0], pos.caps[1] + len(captured_stones))
pos.n += 1
pos.caps = new_caps
pos.ko = new_ko
pos.recent += (PlayerMove(color, c),)
# keep a rolling history of last 7 deltas - that's all we'll need to
# extract the last 8 board states.
pos.board_deltas = np.concatenate((
new_board_delta.reshape(1, self.board_size, self.board_size),
pos.board_deltas[:6]))
pos.to_play *= -1
return pos
def is_game_over(self):
return (len(self.recent) >= 2
and self.recent[-1].move is None
and self.recent[-2].move is None)
def score(self):
"""Return score from B perspective. If W is winning, score is negative."""
working_board = np.copy(self.board)
while EMPTY in working_board:
unassigned_spaces = np.where(working_board == EMPTY)
c = unassigned_spaces[0][0], unassigned_spaces[1][0]
territory, borders = find_reached(self.board_size, working_board, c)
border_colors = set(working_board[b] for b in borders)
X_border = BLACK in border_colors # pylint: disable=invalid-name
O_border = WHITE in border_colors # pylint: disable=invalid-name
if X_border and not O_border:
territory_color = BLACK
elif O_border and not X_border:
territory_color = WHITE
else:
territory_color = UNKNOWN # dame, or seki
place_stones(working_board, territory_color, territory)
return np.count_nonzero(working_board == BLACK) - np.count_nonzero(
working_board == WHITE) - self.komi
def result(self):
score = self.score()
if score > 0:
return 1
elif score < 0:
return -1
else:
return 0
def result_string(self):
score = self.score()
if score > 0:
return 'B+' + '{:.1f}'.format(score)
elif score < 0:
return 'W+' + '{:.1f}'.format(abs(score))
else:
return 'DRAW'
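# Illustrative usage sketch (not part of the original module): exercising the
# Position API defined above on an empty board, using only names from this file.
def _example_position_usage(board_size=9):
  """Plays two stones, passes twice, and returns the final result string."""
  pos = Position(board_size)
  assert pos.is_move_legal((2, 2))
  pos = pos.play_move((2, 2))        # Black plays; a new Position is returned.
  pos = pos.play_move((6, 6))        # White replies.
  pos = pos.pass_move().pass_move()  # Two consecutive passes end the game.
  assert pos.is_game_over()
  return pos.result_string()         # Komi favors White in this tiny example.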
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the 'License');
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an 'AS IS' BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Extends gtp.py."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
import sys
import coords
import go
import gtp
import sgf_wrapper
def parse_message(message):
message = gtp.pre_engine(message).strip()
first, rest = (message.split(' ', 1) + [None])[:2]
if first.isdigit():
message_id = int(first)
if rest is not None:
command, arguments = (rest.split(' ', 1) + [None])[:2]
else:
command, arguments = None, None
else:
message_id = None
command, arguments = first, rest
command = command.replace('-', '_') # for kgs extensions.
return message_id, command, arguments
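# Rough worked examples of parse_message (derived from the code above, assuming
# gtp.pre_engine leaves these particular strings unchanged):
#   parse_message('1 genmove b')  -> (1, 'genmove', 'b')
#   parse_message('boardsize 19') -> (None, 'boardsize', '19')
#   parse_message('kgs-chat ...') -> the command is normalized to 'kgs_chat'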
class KgsExtensionsMixin(gtp.Engine):
def __init__(self, game_obj, name='gtp (python, kgs-chat extensions)',
version='0.1'):
super().__init__(game_obj=game_obj, name=name, version=version)
self.known_commands += ['kgs-chat']
def send(self, message):
message_id, command, arguments = parse_message(message)
if command in self.known_commands:
try:
retval = getattr(self, 'cmd_' + command)(arguments)
response = gtp.format_success(message_id, retval)
sys.stderr.flush()
return response
except ValueError as exception:
return gtp.format_error(message_id, exception.args[0])
else:
return gtp.format_error(message_id, 'unknown command: ' + command)
# Nice to implement this, as KGS sends it each move.
def cmd_time_left(self, arguments):
pass
def cmd_showboard(self, arguments):
return self._game.showboard()
def cmd_kgs_chat(self, arguments):
try:
arg_list = arguments.split()
msg_type, sender, text = arg_list[0], arg_list[1], arg_list[2:]
text = ' '.join(text)
except ValueError:
return 'Unparseable message, args: %r' % arguments
return self._game.chat(msg_type, sender, text)
class RegressionsMixin(gtp.Engine):
def cmd_loadsgf(self, arguments):
args = arguments.split()
if len(args) == 2:
file_, movenum = args
movenum = int(movenum)
print('movenum =', movenum, file=sys.stderr)
else:
file_ = args[0]
movenum = None
try:
with open(file_, 'r') as f:
contents = f.read()
except IOError:
raise ValueError('Unreadable file: ' + file_)
try:
# This is kinda bad, because replay_sgf is already calling
# 'play move' on its internal position objects, but we really
# want to advance the engine along with us rather than try to
# push in some finished Position object.
for idx, p in enumerate(sgf_wrapper.replay_sgf(contents)):
print('playing #', idx, p.next_move, file=sys.stderr)
self._game.play_move(p.next_move)
if movenum and idx == movenum:
break
except:
raise
class GoGuiMixin(gtp.Engine):
"""GTP extensions of 'analysis commands' for gogui.
We reach into the game_obj (an instance of the players in strategies.py),
and extract stuff from its root nodes, etc. These could be extracted into
methods on the Player object, but it's a little weird to do that on a Player,
which doesn't really care about GTP commands, etc. So instead, we just
violate encapsulation a bit.
"""
def __init__(self, game_obj, name='gtp (python, gogui extensions)',
version='0.1'):
super().__init__(game_obj=game_obj, name=name, version=version)
self.known_commands += ['gogui-analyze_commands']
def cmd_gogui_analyze_commands(self, arguments):
return '\n'.join(['var/Most Read Variation/nextplay',
'var/Think a spell/spin',
'pspairs/Visit Heatmap/visit_heatmap',
'pspairs/Q Heatmap/q_heatmap'])
def cmd_nextplay(self, arguments):
return self._game.root.mvp_gg()
def cmd_visit_heatmap(self, arguments):
sort_order = list(range(self._game.size * self._game.size + 1))
sort_order.sort(key=lambda i: self._game.root.child_N[i], reverse=True)
return self.heatmap(sort_order, self._game.root, 'child_N')
def cmd_q_heatmap(self, arguments):
sort_order = list(range(self._game.size * self._game.size + 1))
reverse = True if self._game.root.position.to_play is go.BLACK else False
sort_order.sort(
key=lambda i: self._game.root.child_Q[i], reverse=reverse)
return self.heatmap(sort_order, self._game.root, 'child_Q')
def heatmap(self, sort_order, node, prop):
return '\n'.join(['{!s:6} {}'.format(
coords.to_kgs(coords.from_flat(key)), node.__dict__.get(prop)[key])
for key in sort_order if node.child_N[key] > 0][:20])
def cmd_spin(self, arguments):
for _ in range(50):
for _ in range(100):
self._game.tree_search()
moves = self.cmd_nextplay(None).lower()
moves = moves.split()
colors = 'bw' if self._game.root.position.to_play is go.BLACK else 'wb'
moves_cols = ' '.join(['{} {}'.format(*z)
for z in zip(itertools.cycle(colors), moves)])
print('gogui-gfx: TEXT', '{:.3f} after {}'.format(
self._game.root.Q, self._game.root.N), file=sys.stderr, flush=True)
print('gogui-gfx: VAR', moves_cols, file=sys.stderr, flush=True)
return self.cmd_nextplay(None)
class GTPDeluxe(KgsExtensionsMixin, RegressionsMixin, GoGuiMixin):
pass
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A wrapper of gtp and gtp_extensions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import datetime
import os
import sys
import coords
from dualnet import DualNetRunner
import go
import gtp
import gtp_extensions
from strategies import MCTSPlayerMixin, CGOSPlayerMixin
def translate_gtp_colors(gtp_color):
if gtp_color == gtp.BLACK:
return go.BLACK
elif gtp_color == gtp.WHITE:
return go.WHITE
else:
return go.EMPTY
class GtpInterface(object):
def __init__(self, board_size):
self.size = 9
self.position = None
self.komi = 6.5
self.board_size = board_size
def set_size(self, n):
if n != self.board_size:
raise ValueError((
"Can't handle boardsize {}! Please check the board size.").format(n))
def set_komi(self, komi):
self.komi = komi
self.position.komi = komi
def clear(self):
if self.position and len(self.position.recent) > 1:
try:
sgf = self.to_sgf()
with open(datetime.datetime.now().strftime(
'%Y-%m-%d-%H:%M.sgf'), 'w') as f:
f.write(sgf)
except NotImplementedError:
pass
except:
print('Error saving sgf', file=sys.stderr, flush=True)
self.position = go.Position(self.board_size, komi=self.komi)
self.initialize_game(self.position)
def accomodate_out_of_turn(self, color):
if translate_gtp_colors(color) != self.position.to_play:
self.position.flip_playerturn(mutate=True)
def make_move(self, color, vertex):
c = coords.from_pygtp(self.board_size, vertex)
# let's assume this never happens for now.
# self.accomodate_out_of_turn(color)
return self.play_move(c)
def get_move(self, color):
self.accomodate_out_of_turn(color)
move = self.suggest_move(self.position)
if self.should_resign():
return gtp.RESIGN
return coords.to_pygtp(self.board_size, move)
def final_score(self):
return self.position.result_string()
def showboard(self):
print('\n\n' + str(self.position) + '\n\n', file=sys.stderr)
return True
def should_resign(self):
raise NotImplementedError
def get_score(self):
return self.position.result_string()
def suggest_move(self, position):
raise NotImplementedError
def play_move(self, c):
raise NotImplementedError
def initialize_game(self, position=None):
raise NotImplementedError
def chat(self, msg_type, sender, text):
raise NotImplementedError
def to_sgf(self):
raise NotImplementedError
class MCTSPlayer(MCTSPlayerMixin, GtpInterface):
pass
class CGOSPlayer(CGOSPlayerMixin, GtpInterface):
pass
def make_gtp_instance(board_size, read_file, readouts_per_move=100,
verbosity=1, cgos_mode=False):
n = DualNetRunner(read_file)
if cgos_mode:
instance = CGOSPlayer(board_size, n, seconds_per_move=5,
verbosity=verbosity, two_player_mode=True)
else:
instance = MCTSPlayer(board_size, n, simulations_per_move=readouts_per_move,
verbosity=verbosity, two_player_mode=True)
name = 'Somebot-' + os.path.basename(read_file)
gtp_engine = gtp_extensions.GTPDeluxe(instance, name=name)
return gtp_engine
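# Minimal sketch (an assumption, not part of the original file): driving an
# engine built by make_gtp_instance from stdin. Only send(), defined in
# gtp_extensions.KgsExtensionsMixin, is relied upon; response framing is left
# to the gtp library.
def _example_gtp_loop(board_size, read_file):
  engine = make_gtp_instance(board_size, read_file)
  for line in sys.stdin:
    line = line.strip()
    if not line:
      continue
    print(engine.send(line), flush=True)
    if line == 'quit':
      break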
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Monte Carlo Tree Search implementation.
All terminology here (Q, U, N, p_UCT) uses the same notation as in the
AlphaGo (AG) paper, and more details can be found in the paper. Here is a brief
description:
Q: the action value of a position
U: the exploration bonus that steers the search
N: the visit count of a state
p_UCT: the PUCT algorithm for action selection
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import coords
import numpy as np
# Exploration constant
c_PUCT = 1.38 # pylint: disable=invalid-name
# Dirichlet noise, as a function of board_size
def D_NOISE_ALPHA(board_size): # pylint: disable=invalid-name
return 0.03 * 361 / (board_size ** 2)
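# For reference, a quick sanity check of the formula above (not original code):
#   D_NOISE_ALPHA(19) = 0.03 * 361 / 361 = 0.03    (the AlphaGo Zero value)
#   D_NOISE_ALPHA(9)  = 0.03 * 361 / 81  ~= 0.134  (broader noise on small boards)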
class DummyNode(object):
"""A fake node of a MCTS search tree.
This node is intended to be a placeholder for the root node, which would
otherwise have no parent node. If all nodes have parents, code becomes
simpler.
"""
# pylint: disable=invalid-name
def __init__(self, board_size):
self.board_size = board_size
self.parent = None
self.child_N = collections.defaultdict(float)
self.child_W = collections.defaultdict(float)
class MCTSNode(object):
"""A node of a MCTS search tree.
A node knows how to compute the action scores of all of its children,
so that a decision can be made about which move to explore next. Upon
selecting a move, the children dictionary is updated with a new node.
position: A go.Position instance
fmove: A move (coordinate) that led to this position, as a flattened coord
(a raw number between 0 and N^2; None if this is the root)
parent: A parent MCTSNode.
"""
# pylint: disable=invalid-name
def __init__(self, board_size, position, fmove=None, parent=None):
if parent is None:
parent = DummyNode(board_size)
self.board_size = board_size
self.parent = parent
self.fmove = fmove # move that led to this position, as flattened coords
self.position = position
self.is_expanded = False
self.losses_applied = 0 # number of virtual losses on this node
# using child_() allows vectorized computation of action score.
self.illegal_moves = 1000 * (1 - self.position.all_legal_moves())
self.child_N = np.zeros([board_size * board_size + 1], dtype=np.float32)
self.child_W = np.zeros([board_size * board_size + 1], dtype=np.float32)
# save a copy of the original prior before it gets mutated by d-noise.
self.original_prior = np.zeros([board_size * board_size + 1],
dtype=np.float32)
self.child_prior = np.zeros([board_size * board_size + 1], dtype=np.float32)
self.children = {} # map of flattened moves to resulting MCTSNode
def __repr__(self):
return '<MCTSNode move={}, N={}, to_play={}>'.format(
self.position.recent[-1:], self.N, self.position.to_play)
@property
def child_action_score(self):
return (self.child_Q * self.position.to_play
+ self.child_U - self.illegal_moves)
@property
def child_Q(self):
return self.child_W / (1 + self.child_N)
@property
def child_U(self):
return (c_PUCT * math.sqrt(1 + self.N) *
self.child_prior / (1 + self.child_N))
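  # In equation form, mirroring the three properties above (AGZ notation):
  #   Q(s, a)            = W(s, a) / (1 + N(s, a))
  #   U(s, a)            = c_PUCT * sqrt(1 + N(s)) * P(s, a) / (1 + N(s, a))
  #   action_score(s, a) = Q(s, a) * to_play + U(s, a) - illegal_move_penalty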
@property
def Q(self):
return self.W / (1 + self.N)
@property
def N(self):
return self.parent.child_N[self.fmove]
@N.setter
def N(self, value):
self.parent.child_N[self.fmove] = value
@property
def W(self):
return self.parent.child_W[self.fmove]
@W.setter
def W(self, value):
self.parent.child_W[self.fmove] = value
@property
def Q_perspective(self):
"""Return value of position, from perspective of player to play."""
return self.Q * self.position.to_play
def select_leaf(self):
current = self
pass_move = self.board_size * self.board_size
while True:
current.N += 1
# if a node has never been evaluated, we have no basis to select a child.
if not current.is_expanded:
break
# HACK: if last move was a pass, always investigate double-pass first
# to avoid situations where we auto-lose by passing too early.
if (current.position.recent
and current.position.recent[-1].move is None
and current.child_N[pass_move] == 0):
current = current.maybe_add_child(pass_move)
continue
best_move = np.argmax(current.child_action_score)
current = current.maybe_add_child(best_move)
return current
def maybe_add_child(self, fcoord):
"""Add child node for fcoord if it doesn't already exist, and returns it."""
if fcoord not in self.children:
new_position = self.position.play_move(
coords.from_flat(self.board_size, fcoord))
self.children[fcoord] = MCTSNode(
self.board_size, new_position, fmove=fcoord, parent=self)
return self.children[fcoord]
def add_virtual_loss(self, up_to):
"""Propagate a virtual loss up to the root node.
Args:
up_to: The node to propagate until. (Keep track of this! You'll
need it to reverse the virtual loss later.)
"""
self.losses_applied += 1
# This is a "win" for the current node; hence a loss for its parent node
# who will be deciding whether to investigate this node again.
loss = self.position.to_play
self.W += loss
if self.parent is None or self is up_to:
return
self.parent.add_virtual_loss(up_to)
def revert_virtual_loss(self, up_to):
self.losses_applied -= 1
revert = -self.position.to_play
self.W += revert
if self.parent is None or self is up_to:
return
self.parent.revert_virtual_loss(up_to)
def revert_visits(self, up_to):
"""Revert visit increments."""
# Sometimes, repeated calls to select_leaf return the same node.
# This is rare and we're okay with the wasted computation to evaluate
# the position multiple times by the dual_net. But select_leaf has the
# side effect of incrementing visit counts. Since we want the value to
# only count once for the repeatedly selected node, we also have to
# revert the incremented visit counts.
self.N -= 1
if self.parent is None or self is up_to:
return
self.parent.revert_visits(up_to)
def incorporate_results(self, move_probabilities, value, up_to):
assert move_probabilities.shape == (self.board_size * self.board_size + 1,)
# A finished game should not be going through this code path - should
# directly call backup_value() on the result of the game.
assert not self.position.is_game_over()
if self.is_expanded:
self.revert_visits(up_to=up_to)
return
self.is_expanded = True
self.original_prior = self.child_prior = move_probabilities
# initialize child Q as current node's value, to prevent dynamics where
# if B is winning, then B will only ever explore 1 move, because the Q
# estimation will be so much larger than the 0 of the other moves.
#
# Conversely, if W is winning, then B will explore all 362 moves before
# continuing to explore the most favorable move. This is a waste of search.
#
# The value seeded here acts as a prior, and gets averaged into
# Q calculations.
self.child_W = np.ones([self.board_size * self.board_size + 1],
dtype=np.float32) * value
self.backup_value(value, up_to=up_to)
def backup_value(self, value, up_to):
"""Propagates a value estimation up to the root node.
Args:
value: the value to be propagated (1 = black wins, -1 = white wins)
up_to: the node to propagate until.
"""
self.W += value
if self.parent is None or self is up_to:
return
self.parent.backup_value(value, up_to)
def is_done(self):
# True if the last two moves were Pass or if the position is at a move
# greater than the max depth.
max_depth = (self.board_size ** 2) * 1.4 # 505 moves for 19x19, 113 for 9x9
return self.position.is_game_over() or self.position.n >= max_depth
def inject_noise(self):
dirch = np.random.dirichlet([D_NOISE_ALPHA(self.board_size)] * (
(self.board_size * self.board_size) + 1))
self.child_prior = self.child_prior * 0.75 + dirch * 0.25
def children_as_pi(self, squash=False):
"""Returns the child visit counts as a probability distribution, pi."""
# If squash is true, exponentiate the probabilities by a temperature
# slightly larger than unity to encourage diversity in early play and
# hopefully to move away from 3-3s
probs = self.child_N
if squash:
probs **= .95
return probs / np.sum(probs)
def most_visited_path(self):
node = self
output = []
while node.children:
next_kid = np.argmax(node.child_N)
node = node.children.get(next_kid)
if node is None:
output.append('GAME END')
break
output.append('{} ({}) ==> '.format(
coords.to_kgs(
self.board_size,
coords.from_flat(self.board_size, node.fmove)), node.N))
output.append('Q: {:.5f}\n'.format(node.Q))
return ''.join(output)
def mvp_gg(self):
""" Returns most visited path in go-gui VAR format e.g. 'b r3 w c17..."""
node = self
output = []
while node.children and max(node.child_N) > 1:
next_kid = np.argmax(node.child_N)
node = node.children[next_kid]
output.append('{}'.format(coords.to_kgs(
self.board_size, coords.from_flat(self.board_size, node.fmove))))
return ' '.join(output)
def describe(self):
sort_order = list(range(self.board_size * self.board_size + 1))
sort_order.sort(key=lambda i: (
self.child_N[i], self.child_action_score[i]), reverse=True)
soft_n = self.child_N / sum(self.child_N)
p_delta = soft_n - self.child_prior
p_rel = p_delta / self.child_prior
# Dump out some statistics
output = []
output.append('{q:.4f}\n'.format(q=self.Q))
output.append(self.most_visited_path())
output.append(
'''move: action Q U P P-Dir N soft-N
p-delta p-rel\n''')
output.append(
'\n'.join([
'''{!s:6}: {: .3f}, {: .3f}, {:.3f}, {:.3f}, {:.3f}, {:4d} {:.4f}
{: .5f} {: .2f}'''.format(
coords.to_kgs(self.board_size, coords.from_flat(
self.board_size, key)),
self.child_action_score[key],
self.child_Q[key],
self.child_U[key],
self.child_prior[key],
self.original_prior[key],
int(self.child_N[key]),
soft_n[key],
p_delta[key],
p_rel[key])
for key in sort_order][:15]))
return ''.join(output)
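# Illustrative sketch (an assumption about typical usage, not original code):
# one simulation through the tree above, where evaluate_fn stands in for the
# dual network and returns (move_probabilities, value) for a position.
def _example_tree_search_step(root, evaluate_fn):
  leaf = root.select_leaf()
  if leaf.is_done():
    # Terminal positions are scored directly instead of querying the network.
    value = 1 if leaf.position.score() > 0 else -1
    leaf.backup_value(value, up_to=root)
    return leaf
  leaf.add_virtual_loss(up_to=root)
  move_probs, value = evaluate_fn(leaf.position)
  leaf.revert_virtual_loss(up_to=root)
  leaf.incorporate_results(move_probs, value, up_to=root)
  return leaf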
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for mcts."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import tensorflow as tf # pylint: disable=g-bad-import-order
import coords
import go
from mcts import MCTSNode
import numpy as np
import utils_test
tf.logging.set_verbosity(tf.logging.ERROR)
ALMOST_DONE_BOARD = utils_test.load_board('''
.XO.XO.OO
X.XXOOOO.
XXXXXOOOO
XXXXXOOOO
.XXXXOOO.
XXXXXOOOO
.XXXXOOO.
XXXXXOOOO
XXXXOOOOO
''')
TEST_POSITION = go.Position(
utils_test.BOARD_SIZE,
board=ALMOST_DONE_BOARD,
n=105,
komi=2.5,
caps=(1, 4),
ko=None,
recent=(go.PlayerMove(go.BLACK, (0, 1)),
go.PlayerMove(go.WHITE, (0, 8))),
to_play=go.BLACK
)
SEND_TWO_RETURN_ONE = go.Position(
utils_test.BOARD_SIZE,
board=ALMOST_DONE_BOARD,
n=75,
komi=0.5,
caps=(0, 0),
ko=None,
recent=(
go.PlayerMove(go.BLACK, (0, 1)),
go.PlayerMove(go.WHITE, (0, 8)),
go.PlayerMove(go.BLACK, (1, 0))),
to_play=go.WHITE
)
MAX_DEPTH = (utils_test.BOARD_SIZE ** 2) * 1.4
class TestMctsNodes(utils_test.MiniGoUnitTest):
def test_action_flipping(self):
np.random.seed(1)
probs = np.array([.02] * (
utils_test.BOARD_SIZE * utils_test.BOARD_SIZE + 1))
probs += np.random.random(
[utils_test.BOARD_SIZE * utils_test.BOARD_SIZE + 1]) * 0.001
black_root = MCTSNode(
utils_test.BOARD_SIZE, go.Position(utils_test.BOARD_SIZE))
white_root = MCTSNode(utils_test.BOARD_SIZE, go.Position(
utils_test.BOARD_SIZE, to_play=go.WHITE))
black_root.select_leaf().incorporate_results(probs, 0, black_root)
white_root.select_leaf().incorporate_results(probs, 0, white_root)
# No matter who is to play, when we know nothing else, the priors
# should be respected, and the same move should be picked
black_leaf = black_root.select_leaf()
white_leaf = white_root.select_leaf()
self.assertEqual(black_leaf.fmove, white_leaf.fmove)
self.assertEqualNPArray(
black_root.child_action_score, white_root.child_action_score)
def test_select_leaf(self):
flattened = coords.to_flat(utils_test.BOARD_SIZE, coords.from_kgs(
utils_test.BOARD_SIZE, 'D9'))
probs = np.array([.02] * (
utils_test.BOARD_SIZE * utils_test.BOARD_SIZE + 1))
probs[flattened] = 0.4
root = MCTSNode(utils_test.BOARD_SIZE, SEND_TWO_RETURN_ONE)
root.select_leaf().incorporate_results(probs, 0, root)
self.assertEqual(root.position.to_play, go.WHITE)
self.assertEqual(root.select_leaf(), root.children[flattened])
def test_backup_incorporate_results(self):
probs = np.array([.02] * (
utils_test.BOARD_SIZE * utils_test.BOARD_SIZE + 1))
root = MCTSNode(utils_test.BOARD_SIZE, SEND_TWO_RETURN_ONE)
root.select_leaf().incorporate_results(probs, 0, root)
leaf = root.select_leaf()
leaf.incorporate_results(probs, -1, root) # white wins!
# Root was visited twice: first at the root, then at this child.
self.assertEqual(root.N, 2)
# Root has 0 as a prior and two visits with value 0, -1
self.assertAlmostEqual(root.Q, -1/3) # average of 0, 0, -1
# Leaf should have one visit
self.assertEqual(root.child_N[leaf.fmove], 1)
self.assertEqual(leaf.N, 1)
# And that leaf's value had its parent's Q (0) as a prior, so the Q
# should now be the average of 0, -1
self.assertAlmostEqual(root.child_Q[leaf.fmove], -0.5)
self.assertAlmostEqual(leaf.Q, -0.5)
# We're assuming that select_leaf() returns a leaf like:
# root
# \
# leaf
# \
# leaf2
# which happens in this test because root is W to play and leaf was a W win.
self.assertEqual(root.position.to_play, go.WHITE)
leaf2 = root.select_leaf()
leaf2.incorporate_results(probs, -0.2, root) # another white semi-win
self.assertEqual(root.N, 3)
# average of 0, 0, -1, -0.2
self.assertAlmostEqual(root.Q, -0.3)
self.assertEqual(leaf.N, 2)
self.assertEqual(leaf2.N, 1)
# average of 0, -1, -0.2
self.assertAlmostEqual(leaf.Q, root.child_Q[leaf.fmove])
self.assertAlmostEqual(leaf.Q, -0.4)
# average of -1, -0.2
self.assertAlmostEqual(leaf.child_Q[leaf2.fmove], -0.6)
self.assertAlmostEqual(leaf2.Q, -0.6)
def test_do_not_explore_past_finish(self):
probs = np.array([0.02] * (
utils_test.BOARD_SIZE * utils_test.BOARD_SIZE + 1), dtype=np.float32)
root = MCTSNode(utils_test.BOARD_SIZE, go.Position(utils_test.BOARD_SIZE))
root.select_leaf().incorporate_results(probs, 0, root)
first_pass = root.maybe_add_child(
coords.to_flat(utils_test.BOARD_SIZE, None))
first_pass.incorporate_results(probs, 0, root)
second_pass = first_pass.maybe_add_child(
coords.to_flat(utils_test.BOARD_SIZE, None))
with self.assertRaises(AssertionError):
second_pass.incorporate_results(probs, 0, root)
node_to_explore = second_pass.select_leaf()
# should just stop exploring at the end position.
self.assertEqual(node_to_explore, second_pass)
def test_add_child(self):
root = MCTSNode(utils_test.BOARD_SIZE, go.Position(utils_test.BOARD_SIZE))
child = root.maybe_add_child(17)
self.assertIn(17, root.children)
self.assertEqual(child.parent, root)
self.assertEqual(child.fmove, 17)
def test_add_child_idempotency(self):
root = MCTSNode(utils_test.BOARD_SIZE, go.Position(utils_test.BOARD_SIZE))
child = root.maybe_add_child(17)
current_children = copy.copy(root.children)
child2 = root.maybe_add_child(17)
self.assertEqual(child, child2)
self.assertEqual(current_children, root.children)
def test_never_select_illegal_moves(self):
probs = np.array([0.02] * (
utils_test.BOARD_SIZE * utils_test.BOARD_SIZE + 1))
# let's say the NN were to accidentally put a high weight on an illegal move
probs[1] = 0.99
root = MCTSNode(utils_test.BOARD_SIZE, SEND_TWO_RETURN_ONE)
root.incorporate_results(probs, 0, root)
# and let's say the root were visited a lot of times, which pumps up the
# action score for unvisited moves...
root.N = 100000
root.child_N[root.position.all_legal_moves()] = 10000
# this should not throw an error...
leaf = root.select_leaf()
# the returned leaf should not be the illegal move
self.assertNotEqual(leaf.fmove, 1)
# and even after injecting noise, we should still not select an illegal move
for _ in range(10):
root.inject_noise()
leaf = root.select_leaf()
self.assertNotEqual(leaf.fmove, 1)
def test_dont_pick_unexpanded_child(self):
probs = np.array([0.001] * (
utils_test.BOARD_SIZE * utils_test.BOARD_SIZE + 1))
# make one move really likely so that tree search goes down that path twice
# even with a virtual loss
probs[17] = 0.999
root = MCTSNode(utils_test.BOARD_SIZE, go.Position(utils_test.BOARD_SIZE))
root.incorporate_results(probs, 0, root)
leaf1 = root.select_leaf()
self.assertEqual(leaf1.fmove, 17)
leaf1.add_virtual_loss(up_to=root)
# the second select_leaf pick should return the same thing, since the child
# hasn't yet been sent to neural net for eval + result incorporation
leaf2 = root.select_leaf()
self.assertIs(leaf1, leaf2)
if __name__ == '__main__':
tf.test.main()
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Train MiniGo with several iterations of RL learning.
One iteration of RL learning consists of bootstrap, selfplay, gather and train:
bootstrap: Initialize a random model
selfplay: Play games with the latest model to produce data used for training
gather: Group games played with the same model into larger files of tfexamples
train: Train a new model with the selfplay results from the most recent
N generations.
After training, validation can be performed on the holdout data.
Given two models, evaluation can be applied to choose a stronger model.
The training pipeline consists of multiple RL learning iterations to achieve
better models.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import random
import socket
import sys
import time
import tensorflow as tf # pylint: disable=g-bad-import-order
import dualnet
import evaluation
import go
import model_params
import preprocessing
import selfplay_mcts
import utils
_TF_RECORD_SUFFIX = '.tfrecord.zz'
def _ensure_dir_exists(directory):
"""Check if directory exists. If not, create it.
Args:
directory: A given directory
"""
if not os.path.isdir(directory):
tf.gfile.MakeDirs(directory)
def bootstrap(estimator_model_dir, trained_models_dir, params):
"""Initialize the model with random weights.
Args:
estimator_model_dir: tf.estimator model directory.
trained_models_dir: Dir to save the trained models. Here to export the first
bootstrapped generation.
params: A MiniGoParams instance of hyperparameters for the model.
"""
bootstrap_name = utils.generate_model_name(0)
_ensure_dir_exists(trained_models_dir)
bootstrap_model_path = os.path.join(trained_models_dir, bootstrap_name)
_ensure_dir_exists(estimator_model_dir)
print('Bootstrapping with working dir {}\n Model 0 exported to {}'.format(
estimator_model_dir, bootstrap_model_path))
dualnet.bootstrap(estimator_model_dir, params)
dualnet.export_model(estimator_model_dir, bootstrap_model_path)
def selfplay(selfplay_dirs, selfplay_model, params):
"""Perform selfplay with a specific model.
Args:
selfplay_dirs: A dict to specify the directories used in selfplay.
selfplay_dirs = {
'output_dir': output_dir,
'holdout_dir': holdout_dir,
'clean_sgf': clean_sgf,
'full_sgf': full_sgf
}
selfplay_model: The actual Dualnet runner for selfplay.
params: A MiniGoParams instance of hyperparameters for the model.
"""
with utils.logged_timer('Playing game'):
player = selfplay_mcts.play(
params.board_size, selfplay_model, params.selfplay_readouts,
params.selfplay_resign_threshold, params.simultaneous_leaves,
params.selfplay_verbose)
output_name = '{}-{}'.format(int(time.time()), socket.gethostname())
def _write_sgf_data(dir_sgf, use_comments):
with tf.gfile.GFile(
os.path.join(dir_sgf, '{}.sgf'.format(output_name)), 'w') as f:
f.write(player.to_sgf(use_comments=use_comments))
_write_sgf_data(selfplay_dirs['clean_sgf'], use_comments=False)
_write_sgf_data(selfplay_dirs['full_sgf'], use_comments=True)
game_data = player.extract_data()
tf_examples = preprocessing.make_dataset_from_selfplay(game_data, params)
# Hold out 5% of games for evaluation.
if random.random() < params.holdout_pct:
fname = os.path.join(
selfplay_dirs['holdout_dir'], output_name + _TF_RECORD_SUFFIX)
else:
fname = os.path.join(
selfplay_dirs['output_dir'], output_name + _TF_RECORD_SUFFIX)
preprocessing.write_tf_examples(fname, tf_examples)
def gather(selfplay_dir, training_chunk_dir, params):
"""Gather selfplay data into large training chunk.
Args:
selfplay_dir: Where to look for games. Set as 'base_dir/data/selfplay/'.
training_chunk_dir: where to put collected games. Set as
'base_dir/data/training_chunks/'.
params: A MiniGoParams instance of hyperparameters for the model.
"""
# Check the selfplay data from the most recent 50 models.
_ensure_dir_exists(training_chunk_dir)
sorted_model_dirs = sorted(tf.gfile.ListDirectory(selfplay_dir))
models = [model_dir.strip('/')
for model_dir in sorted_model_dirs[-params.gather_generation:]]
with utils.logged_timer('Finding existing tfrecords...'):
model_gamedata = {
model: tf.gfile.Glob(
os.path.join(selfplay_dir, model, '*'+_TF_RECORD_SUFFIX))
for model in models
}
print('Found {} models'.format(len(models)))
for model_name, record_files in sorted(model_gamedata.items()):
print(' {}: {} files'.format(model_name, len(record_files)))
meta_file = os.path.join(training_chunk_dir, 'meta.txt')
try:
with tf.gfile.GFile(meta_file, 'r') as f:
already_processed = set(f.read().split())
except tf.errors.NotFoundError:
already_processed = set()
num_already_processed = len(already_processed)
for model_name, record_files in sorted(model_gamedata.items()):
if set(record_files) <= already_processed:
continue
print('Gathering files from {}:'.format(model_name))
tf_examples = preprocessing.shuffle_tf_examples(
params.shuffle_buffer_size, params.examples_per_chunk, record_files)
# Write each shuffled batch of examples out as its own training chunk file.
for i, example_batch in enumerate(tf_examples):
output_record = os.path.join(
training_chunk_dir,
('{}-{}'+_TF_RECORD_SUFFIX).format(model_name, str(i)))
preprocessing.write_tf_examples(
output_record, example_batch, serialize=False)
already_processed.update(record_files)
print('Processed {} new files'.format(
len(already_processed) - num_already_processed))
with tf.gfile.GFile(meta_file, 'w') as f:
f.write('\n'.join(sorted(already_processed)))
def train(trained_models_dir, estimator_model_dir, training_chunk_dir,
generation, params):
"""Train the latest model from gathered data.
Args:
trained_models_dir: Where to export the completed generation.
estimator_model_dir: tf.estimator model directory.
training_chunk_dir: Directory where gathered training chunks are.
generation: Which generation you are training.
params: A MiniGoParams instance of hyperparameters for the model.
"""
new_model_name = utils.generate_model_name(generation)
print('New model will be {}'.format(new_model_name))
new_model = os.path.join(trained_models_dir, new_model_name)
print('Training on gathered game data...')
tf_records = sorted(
tf.gfile.Glob(os.path.join(training_chunk_dir, '*'+_TF_RECORD_SUFFIX)))
tf_records = tf_records[
-(params.train_window_size // params.examples_per_chunk):]
print('Training from: {} to {}'.format(tf_records[0], tf_records[-1]))
with utils.logged_timer('Training'):
dualnet.train(estimator_model_dir, tf_records, generation, params)
dualnet.export_model(estimator_model_dir, new_model)
def validate(trained_models_dir, holdout_dir, estimator_model_dir, params):
"""Validate the latest model on the holdout dataset.
Args:
trained_models_dir: Directories where the completed generations/models are.
holdout_dir: Directories where holdout data are.
estimator_model_dir: tf.estimator model directory.
params: A MiniGoParams instance of hyperparameters for the model.
"""
model_num, _ = utils.get_latest_model(trained_models_dir)
# Get the holdout game data
nums_names = utils.get_models(trained_models_dir)
# Model N was trained on games up through model N-1, so the validation set
# should only be for models through N-1 as well, thus the (model_num) term.
models = [num_name for num_name in nums_names if num_name[0] < model_num]
# pair is a tuple of (model_num, model_name), like (13, 000013-modelname)
holdout_dirs = [os.path.join(holdout_dir, pair[1])
for pair in models[-params.holdout_generation:]]
tf_records = []
with utils.logged_timer('Building lists of holdout files'):
for record_dir in holdout_dirs:
if os.path.exists(record_dir): # make sure holdout dir exists
tf_records.extend(
tf.gfile.Glob(os.path.join(record_dir, '*'+_TF_RECORD_SUFFIX)))
if not tf_records:
print('No holdout dataset for validation! '
'Please check your holdout directory: {}'.format(holdout_dir))
return
print('The length of tf_records is {}.'.format(len(tf_records)))
first_tf_record = os.path.basename(tf_records[0])
last_tf_record = os.path.basename(tf_records[-1])
with utils.logged_timer('Validating from {} to {}'.format(
first_tf_record, last_tf_record)):
dualnet.validate(estimator_model_dir, tf_records, params)
def evaluate(black_model_name, black_net, white_model_name, white_net,
evaluate_dir, params):
"""Evaluate with two models.
Two DualNetRunners play as black and white over a series of Go games; the
model that wins at least 55% of the games is declared the winner.
Args:
black_model_name: The name of the model playing black.
black_net: The DualNetRunner model for black
white_model_name: The name of the model playing white.
white_net: The DualNetRunner model for white.
evaluate_dir: Where to write the evaluation results. Set as
'base_dir/sgf/evaluate/'.
params: A MiniGoParams instance of hyperparameters for the model.
Returns:
The model name of the winner.
Raises:
ValueError: if neither `WHITE` nor `BLACK` is returned.
"""
with utils.logged_timer('{} games'.format(params.eval_games)):
winner = evaluation.play_match(
params, black_net, white_net, params.eval_games,
params.eval_readouts, evaluate_dir, params.eval_verbose)
if winner != go.WHITE_NAME and winner != go.BLACK_NAME:
raise ValueError('Winner should be either White or Black!')
return black_model_name if winner == go.BLACK_NAME else white_model_name
def _set_params(flags):
"""Set hyperparameters from board size.
Args:
flags: Flags from Argparser.
Returns:
A MiniGoParams instance of hyperparameters.
"""
params = model_params.MiniGoParams()
k = utils.round_power_of_two(flags.board_size ** 2 / 3)
params.num_filters = k # Number of filters in the convolution layer
params.fc_width = 2 * k # Width of each fully connected layer
params.num_shared_layers = flags.board_size # Number of shared trunk layers
params.board_size = flags.board_size # Board size
# How many positions can fit on a graphics card. 256 for 9s, 16 or 32 for 19s.
if flags.batch_size is None:
if flags.board_size == 9:
params.batch_size = 256
else:
params.batch_size = 32
else:
params.batch_size = flags.batch_size
return params
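# Worked example of the sizing rule above (derived from the code, assuming
# utils.round_power_of_two rounds to the nearest power of two):
#   board_size = 9 : 81 / 3 = 27    -> 32 filters,  fc_width = 64
#   board_size = 19: 361 / 3 ~= 120 -> 128 filters, fc_width = 256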
def _prepare_selfplay(
model_name, trained_models_dir, selfplay_dir, holdout_dir, sgf_dir, params):
"""Set directories and load the network for selfplay.
Args:
model_name: The name of the model for self-play
trained_models_dir: Directories where the completed generations/models are.
selfplay_dir: Where to write the games. Set as 'base_dir/data/selfplay/'.
holdout_dir: Where to write the holdout data. Set as
'base_dir/data/holdout/'.
sgf_dir: Where to write the sgf (Smart Game Format) files. Set as
'base_dir/sgf/'.
params: A MiniGoParams instance of hyperparameters for the model.
Returns:
The directories and network model for selfplay.
"""
# Set paths for the model with 'model_name'
model_path = os.path.join(trained_models_dir, model_name)
output_dir = os.path.join(selfplay_dir, model_name)
holdout_dir = os.path.join(holdout_dir, model_name)
# clean_sgf is to write sgf file without comments.
# full_sgf is to write sgf file with comments.
clean_sgf = os.path.join(sgf_dir, model_name, 'clean')
full_sgf = os.path.join(sgf_dir, model_name, 'full')
_ensure_dir_exists(output_dir)
_ensure_dir_exists(holdout_dir)
_ensure_dir_exists(clean_sgf)
_ensure_dir_exists(full_sgf)
selfplay_dirs = {
'output_dir': output_dir,
'holdout_dir': holdout_dir,
'clean_sgf': clean_sgf,
'full_sgf': full_sgf
}
# cache the network model for self-play
with utils.logged_timer('Loading weights from {} ... '.format(model_path)):
network = dualnet.DualNetRunner(model_path, params)
return selfplay_dirs, network
def run_selfplay(selfplay_model, selfplay_games, dirs, params):
"""Run selfplay to generate training data.
Args:
selfplay_model: The model name for selfplay.
selfplay_games: The number of selfplay games.
dirs: A MiniGoDirectory instance of directories used in each step.
params: A MiniGoParams instance of hyperparameters for the model.
"""
selfplay_dirs, network = _prepare_selfplay(
selfplay_model, dirs.trained_models_dir, dirs.selfplay_dir,
dirs.holdout_dir, dirs.sgf_dir, params)
print('Self-play with model: {}'.format(selfplay_model))
for _ in range(selfplay_games):
selfplay(selfplay_dirs, network, params)
def main(_):
"""Run the reinforcement learning loop."""
tf.logging.set_verbosity(tf.logging.INFO)
params = _set_params(FLAGS)
# A dummy model for debug/testing purpose with fewer games and iterations
if FLAGS.test:
params = model_params.DummyMiniGoParams()
base_dir = FLAGS.base_dir + str(FLAGS.board_size) + '_size_dummy/'
else:
# Set directories for models and datasets
base_dir = FLAGS.base_dir + str(FLAGS.board_size) + '_size/'
dirs = utils.MiniGoDirectory(base_dir)
# Run selfplay only if user specifies the argument.
if FLAGS.selfplay:
selfplay_model_name = FLAGS.selfplay_model_name or utils.get_latest_model(
dirs.trained_models_dir)[1]
max_games = FLAGS.selfplay_max_games or params.max_games_per_generation
run_selfplay(selfplay_model_name, max_games, dirs, params)
return
# Run the RL pipeline
# if no models have been trained, start from bootstrap model
if not os.path.isdir(dirs.trained_models_dir):
print('No trained model exists! Starting from Bootstrap...')
print('Creating random initial weights...')
bootstrap(dirs.estimator_model_dir, dirs.trained_models_dir, params)
else:
print('A MiniGo base directory has been found! ')
print('Start from the last checkpoint...')
_, best_model_so_far = utils.get_latest_model(dirs.trained_models_dir)
for rl_iter in range(params.max_iters_per_pipeline):
print('RL_iteration: {}'.format(rl_iter))
# Self-play with the best model to generate training data
run_selfplay(
best_model_so_far, params.max_games_per_generation, dirs, params)
# gather selfplay data for training
print('Gathering game output...')
gather(dirs.selfplay_dir, dirs.training_chunk_dir, params)
# train the next generation model
model_num, _ = utils.get_latest_model(dirs.trained_models_dir)
print('Training on gathered game data...')
train(dirs.trained_models_dir, dirs.estimator_model_dir,
dirs.training_chunk_dir, model_num + 1, params)
# validate the latest model if needed
if FLAGS.validation:
print('Validating on the holdout game data...')
validate(dirs.trained_models_dir, dirs.holdout_dir,
dirs.estimator_model_dir, params)
_, current_model = utils.get_latest_model(dirs.trained_models_dir)
if FLAGS.evaluation: # Perform evaluation if needed
print('Evaluate models between {} and {}'.format(
best_model_so_far, current_model))
black_model = os.path.join(dirs.trained_models_dir, best_model_so_far)
white_model = os.path.join(dirs.trained_models_dir, current_model)
_ensure_dir_exists(dirs.evaluate_dir)
with utils.logged_timer('Loading weights'):
black_net = dualnet.DualNetRunner(black_model, params)
white_net = dualnet.DualNetRunner(white_model, params)
best_model_so_far = evaluate(
best_model_so_far, black_net, current_model, white_net,
dirs.evaluate_dir, params)
print('Winner of evaluation: {}!'.format(best_model_so_far))
else:
best_model_so_far = current_model
if __name__ == '__main__':
parser = argparse.ArgumentParser()
# flags to run the RL pipeline
parser.add_argument(
'--base_dir',
type=str,
default='/tmp/minigo/',
metavar='BD',
help='Base directory for the MiniGo models and datasets.')
parser.add_argument(
'--board_size',
type=int,
default=9,
metavar='N',
choices=[9, 19],
help='Go board size. The default size is 9.')
parser.add_argument(
'--batch_size',
type=int,
default=None,
metavar='BS',
help='Batch size for training. If None, a default based on board size is used.')
# Test the pipeline with a dummy model
parser.add_argument(
'--test',
action='store_true',
help='A boolean to test RL pipeline with a dummy model.')
# Run RL pipeline with the validation step
parser.add_argument(
'--validation',
action='store_true',
help='A boolean to specify validation in the RL pipeline.')
# Run RL pipeline with the evaluation step
parser.add_argument(
'--evaluation',
action='store_true',
help='A boolean to specify evaluation in the RL pipeline.')
# self-play only
parser.add_argument(
'--selfplay',
action='store_true',
help='A boolean to run self-play only.')
parser.add_argument(
'--selfplay_model_name',
type=str,
default=None,
metavar='SM',
help='The model used for self-play only.')
parser.add_argument(
'--selfplay_max_games',
type=int,
default=None,
metavar='SMG',
help='The number of games to generate when running self-play only.')
FLAGS, unparsed = parser.parse_known_args()
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Defines MiniGo parameters."""
class MiniGoParams(object):
"""Parameters for MiniGo."""
# Go board size
board_size = 9
# RL pipeline
max_games_per_generation = 10 # Number of games per selfplay generation
max_iters_per_pipeline = 2 # Number of RL iterations in one pipeline
# The shuffle buffer size determines how far an example could end up from
# where it started; this and the interleave parameters in preprocessing can
# give us an approximation of uniform sampling. This default is used in
# training, but smaller numbers can be used for aggregation or validation.
shuffle_buffer_size = 2000000 # shuffle buffer size in preprocessing
# dual_net
# How many positions to look at per generation.
# Per AlphaGo Zero (AGZ), 2048 minibatch * 1k = 2M positions/generation
examples_per_generation = 2000000
# for learning rate
l2_strength = 1e-4 # Regularization strength
momentum = 0.9 # Momentum used in SGD
kernel_size = [3, 3] # kernel size of conv and res blocks is from AGZ paper
# selfplay
selfplay_readouts = 100 # How many simulations to run per move
selfplay_verbose = 1 # >=2 will print debug info, >=3 will print boards
# an absolute value of threshold to resign at
selfplay_resign_threshold = 0.95
# the number of simultaneous leaves in MCTS
simultaneous_leaves = 8
# holdout data for validation
holdout_pct = 0.05 # How many games to hold out for validation
holdout_generation = 50 # How many recent generations/models for holdout data
# gather
gather_generation = 50 # How many recent generations/models for gathered data
# How many positions we should aggregate per 'chunk'.
examples_per_chunk = 10000
# How many positions to draw from for our training window.
# AGZ used the most recent 500k games, which, assuming 250 moves/game = 125M
train_window_size = 125000000
# evaluation with two models
eval_games = 50 # The number of games to play in evaluation
eval_readouts = 100 # How many readouts to make per move in evaluation
eval_verbose = 1 # How verbose the players should be in evaluation
eval_win_rate = 0.55 # The winner must achieve a win rate of at least 55%.
class DummyMiniGoParams(MiniGoParams):
"""Parameters for a dummy model."""
num_filters = 8 # Number of filters in the convolution layer
fc_width = 16 # Width of each fully connected layer
num_shared_layers = 1 # Number of shared trunk layers
batch_size = 16
examples_per_generation = 64
max_games_per_generation = 2
max_iters_per_pipeline = 1
selfplay_readouts = 10
shuffle_buffer_size = 1000
# evaluation
eval_games = 10 # The number of games to play in evaluation
eval_readouts = 10 # How many readouts to make per move in evaluation
eval_verbose = 1 # How verbose the players should be in evaluation
class DummyValidationParams(DummyMiniGoParams, MiniGoParams):
"""Parameters for a dummy model."""
holdout_pct = 1 # Set holdout percent as 1 for validation testing purpose
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Preprocessing step to create, read, write tf.Examples."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import random
import tensorflow as tf # pylint: disable=g-bad-import-order
import coords
import features as features_lib
import numpy as np
import sgf_wrapper
TF_RECORD_CONFIG = tf.python_io.TFRecordOptions(
tf.python_io.TFRecordCompressionType.ZLIB)
# Constructing tf.Examples
def _one_hot(board_size, index):
onehot = np.zeros([board_size * board_size + 1], dtype=np.float32)
onehot[index] = 1
return onehot
def make_tf_example(features, pi, value):
"""Make tf examples.
Args:
features: [N, N, FEATURE_DIM] nparray of uint8
pi: [N * N + 1] nparray of float32
value: float
Returns:
tf example.
"""
return tf.train.Example(
features=tf.train.Features(
feature={
'x': tf.train.Feature(
bytes_list=tf.train.BytesList(value=[features.tostring()])),
'pi': tf.train.Feature(
bytes_list=tf.train.BytesList(value=[pi.tostring()])),
'outcome': tf.train.Feature(
float_list=tf.train.FloatList(value=[value]))
}))
def write_tf_examples(filename, tf_examples, serialize=True):
"""Write tf.Example to files.
Args:
filename: Where to write tf.records
tf_examples: An iterable of tf.Example
serialize: whether to serialize the examples.
"""
with tf.python_io.TFRecordWriter(
filename, options=TF_RECORD_CONFIG) as writer:
for ex in tf_examples:
if serialize:
writer.write(ex.SerializeToString())
else:
writer.write(ex)
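# Illustrative sketch (not part of the original module): building and writing a
# single tf.Example with the helpers above. Shapes assume a 9x9 board with
# features_lib.NEW_FEATURES_PLANES input planes; the output filename is
# hypothetical.
#
#   board_size = 9
#   features = np.zeros(
#       [board_size, board_size, features_lib.NEW_FEATURES_PLANES],
#       dtype=np.uint8)
#   pi = _one_hot(board_size, 0)  # all policy mass on the first intersection
#   example = make_tf_example(features, pi, value=1.0)  # outcome for this position
#   write_tf_examples('example.tfrecord.zz', [example])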
# Read tf.Example from files
def _batch_parse_tf_example(board_size, batch_size, example_batch):
"""Parse tf examples.
Args:
board_size: the go board size
batch_size: the batch size
example_batch: a batch of tf.Example
Returns:
A tuple (feature_tensor, dict of output tensors)
"""
features = {
'x': tf.FixedLenFeature([], tf.string),
'pi': tf.FixedLenFeature([], tf.string),
'outcome': tf.FixedLenFeature([], tf.float32),
}
parsed = tf.parse_example(example_batch, features)
x = tf.decode_raw(parsed['x'], tf.uint8)
x = tf.cast(x, tf.float32)
x = tf.reshape(x, [batch_size, board_size, board_size,
features_lib.NEW_FEATURES_PLANES])
pi = tf.decode_raw(parsed['pi'], tf.float32)
pi = tf.reshape(pi, [batch_size, board_size * board_size + 1])
outcome = parsed['outcome']
outcome.set_shape([batch_size])
return (x, {'pi_tensor': pi, 'value_tensor': outcome})
def read_tf_records(
shuffle_buffer_size, batch_size, tf_records, num_repeats=None,
shuffle_records=True, shuffle_examples=True, filter_amount=1.0):
"""Read tf records.
Args:
shuffle_buffer_size: how big of a buffer to fill before shuffling
batch_size: batch size to return
tf_records: a list of tf_record filenames
num_repeats: how many times the data should be read (default: infinite)
shuffle_records: whether to shuffle the order of files read
shuffle_examples: whether to shuffle the tf.Examples
filter_amount: what fraction of records to keep
Returns:
a tf dataset of batched tensors
"""
if shuffle_records:
random.shuffle(tf_records)
record_list = tf.data.Dataset.from_tensor_slices(tf_records)
# compression_type here must agree with write_tf_examples
# cycle_length = how many tfrecord files are read in parallel
# block_length = how many tf.Examples are read from each file before
# moving to the next file
# The idea is to shuffle both the order of the files being read,
# and the examples being read from the files.
dataset = record_list.interleave(
lambda x: tf.data.TFRecordDataset(x, compression_type='ZLIB'),
cycle_length=64, block_length=16)
dataset = dataset.filter(
lambda x: tf.less(tf.random_uniform([1]), filter_amount)[0])
# TODO(amj): apply py_func for transforms here.
if num_repeats is not None:
dataset = dataset.repeat(num_repeats)
else:
dataset = dataset.repeat()
if shuffle_examples:
dataset = dataset.shuffle(buffer_size=shuffle_buffer_size)
dataset = dataset.batch(batch_size)
return dataset
def get_input_tensors(params, batch_size, tf_records, num_repeats=None,
shuffle_records=True, shuffle_examples=True,
filter_amount=0.05):
"""Read tf.Records and prepare them for ingestion by dualnet.
Args:
params: An object of hyperparameters
batch_size: batch size to return
tf_records: a list of tf_record filenames
num_repeats: how many times the data should be read (default: infinite)
shuffle_records: whether to shuffle the order of files read
shuffle_examples: whether to shuffle the tf.Examples
filter_amount: what fraction of records to keep
Returns:
A tuple (feature tensor, dict of output tensors) from a one-shot iterator (see _batch_parse_tf_example)
"""
shuffle_buffer_size = params.shuffle_buffer_size
dataset = read_tf_records(
shuffle_buffer_size, batch_size, tf_records, num_repeats=num_repeats,
shuffle_records=shuffle_records, shuffle_examples=shuffle_examples,
filter_amount=filter_amount)
dataset = dataset.filter(lambda t: tf.equal(tf.shape(t)[0], batch_size))
def batch_parse_tf_example(batch_size, dataset):
return _batch_parse_tf_example(params.board_size, batch_size, dataset)
dataset = dataset.map(functools.partial(
batch_parse_tf_example, batch_size))
return dataset.make_one_shot_iterator().get_next()
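# Illustrative sketch (not part of the original module): pulling one parsed
# training batch. `params` is assumed to expose board_size and
# shuffle_buffer_size (e.g. a MiniGoParams instance); the tf.records filename
# is hypothetical.
#
#   features_tensor, labels = get_input_tensors(
#       params, batch_size=16, tf_records=['train-000.tfrecord.zz'],
#       num_repeats=1, filter_amount=1.0)
#   with tf.Session() as sess:
#     x, y = sess.run([features_tensor, labels])
#   # x: [16, board_size, board_size, NEW_FEATURES_PLANES] float32
#   # y: {'pi_tensor': [16, board_size**2 + 1], 'value_tensor': [16]}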
# End-to-end utility functions
def make_dataset_from_selfplay(data_extracts, params):
"""Make an iterable of tf.Examples.
Args:
data_extracts: An iterable of (position, pi, result) tuples
params: An object of hyperparameters
Returns:
An iterable of tf.Examples.
"""
board_size = params.board_size
tf_examples = (make_tf_example(features_lib.extract_features(
board_size, pos), pi, result) for pos, pi, result in data_extracts)
return tf_examples
def make_dataset_from_sgf(board_size, sgf_filename, tf_record):
"""Replay an SGF file and write one tf.Example per position to tf_record."""
pwcs = sgf_wrapper.replay_sgf_file(board_size, sgf_filename)
def make_tf_example_from_pwc(pwcs):
return _make_tf_example_from_pwc(board_size, pwcs)
tf_examples = map(make_tf_example_from_pwc, pwcs)
write_tf_examples(tf_record, tf_examples)
def _make_tf_example_from_pwc(board_size, position_w_context):
features = features_lib.extract_features(
board_size, position_w_context.position)
pi = _one_hot(board_size, coords.to_flat(
board_size, position_w_context.next_move))
value = position_w_context.result
return make_tf_example(features, pi, value)
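# Illustrative sketch (not part of the original module): converting one SGF game
# record into a compressed tf.records file. Each position in the game becomes a
# tf.Example whose policy target is a one-hot vector for the move actually
# played. The filenames are hypothetical.
#
#   make_dataset_from_sgf(9, 'some_game.sgf', 'some_game.tfrecord.zz')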
def shuffle_tf_examples(shuffle_buffer_size, gather_size, records_to_shuffle):
"""Read through tf.Record and yield shuffled, but unparsed tf.Examples.
Args:
shuffle_buffer_size: the size for shuffle buffer
gather_size: The number of tf.Examples to be gathered together
records_to_shuffle: A list of filenames
Yields:
An iterator yielding lists of bytes, which are serialized tf.Examples.
"""
dataset = read_tf_records(shuffle_buffer_size, gather_size,
records_to_shuffle, num_repeats=1)
batch = dataset.make_one_shot_iterator().get_next()
sess = tf.Session()
while True:
try:
result = sess.run(batch)
yield list(result)
except tf.errors.OutOfRangeError:
break
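# Illustrative sketch (not part of the original module): gathering shuffled,
# still-serialized examples into fixed-size chunks, roughly as the gather step
# might do with examples_per_chunk from the params file. Filenames are
# hypothetical.
#
#   record_files = ['selfplay-000.tfrecord.zz', 'selfplay-001.tfrecord.zz']
#   for i, chunk in enumerate(shuffle_tf_examples(1000, 10000, record_files)):
#     # each chunk is a list of serialized tf.Example byte strings
#     write_tf_examples('chunk-%03d.tfrecord.zz' % i, chunk, serialize=False)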