"vscode:/vscode.git/clone" did not exist on "798e1d0cfaa0dfc44e9925fb73eecb93ea0c71f1"
Unverified commit 84da970e authored by Yanhui Liang, committed by GitHub

Add minigo (#3955)

* Add minigo

* Fix comments and make python version compatible
# MiniGo
This is a simplified implementation of MiniGo based on the code provided by the authors: [MiniGo](https://github.com/tensorflow/minigo).
MiniGo is a minimalist Go engine modeled after AlphaGo Zero and built on MuGo. The current implementation consists of three main modules: the DualNet model, Monte Carlo Tree Search (MCTS), and Go domain knowledge. The **model** part is currently our focus.
This implementation supports model training and validation, and also provides evaluation between two Go models.
## DualNet Model
The input to the neural network is a [board_size * board_size * 17] image stack
comprising 17 binary feature planes. Eight feature planes consist of binary values
indicating the presence of the current player's stones; a further eight feature
planes represent the corresponding features for the opponent's stones; the final
feature plane represents the color to play, and has a constant value of either 1
if black is to play or 0 if white is to play. Check `features.py` for more details.
In the MiniGo implementation, the input features are processed by a residual tower
that consists of a single convolutional block followed by either 9 or 19
residual blocks.
The convolutional block applies the following modules:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
Each residual block applies the following modules sequentially to its input:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
5. Batch normalization
6. A skip connection that adds the input to the block
7. A rectifier non-linearity
Note: num_filter is 128 for 19 x 19 board size, and 32 for 9 x 9 board size.
The output of the residual tower is passed into two separate "heads" for
computing the policy and value respectively. The policy head applies the
following modules:
1. A convolution of 2 filters of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer that outputs a vector of size (board_size * board_size + 1) corresponding to logit probabilities for all intersections and the pass move
The value head applies the following modules:
1. A convolution of 1 filter of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer to a hidden layer of size 256 for 19 x 19
board size and 64 for 9x9 board size
5. A rectifier non-linearity
6. A fully connected linear layer to a scalar
7. A tanh non-linearity outputting a scalar in the range [-1, 1]
The overall network depth, in the 10 or 20 block network, is 19 or 39
parameterized layers respectively for the residual tower, plus an additional 2
layers for the policy head and 3 layers for the value head.
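For reference, the convolutional and residual blocks listed above can be sketched in a few lines of TensorFlow 1.x. This is an illustrative sketch only; the actual implementation, including the policy and value heads, lives in `dualnet_model.py`.
```
import tensorflow as tf

def conv_block(inputs, num_filters, training):
  # 3x3 convolution -> batch normalization -> ReLU, as described above.
  conv = tf.layers.conv2d(inputs, num_filters, [3, 3], padding='same')
  return tf.nn.relu(tf.layers.batch_normalization(conv, training=training))

def residual_block(inputs, num_filters, training):
  # Two convolution/batch-norm stages, a skip connection, and a final ReLU.
  out = conv_block(inputs, num_filters, training)
  out = tf.layers.conv2d(out, num_filters, [3, 3], padding='same')
  out = tf.layers.batch_normalization(out, training=training)
  return tf.nn.relu(inputs + out)
```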
## Getting Started
Please follow the [instructions](https://github.com/tensorflow/minigo/blob/master/README.md#getting-started) in the original MiniGo repo to set up the environment.
## Training Model
One iteration of reinforcement learning consists of the following steps:
- Bootstrap: initializes a random model.
- Selfplay: plays games with the latest model, producing data used for training.
- Gather: groups games played with the same model into larger files of tf.Examples.
- Train: trains a new model with the selfplay results from the most recent N
generations.
Run `minigo.py`.
```
python minigo.py
```
## Validating Model
Run `minigo.py` with the `--validation` argument:
```
python minigo.py --validation
```
The `--validation` argument generates a holdout dataset for model validation.
## Evaluating MiniGo Models
Run `minigo.py` with the `--evaluation` argument:
```
python minigo.py --evaluation
```
The `--evaluation` argument invokes an evaluation between the latest model and the current best model.
## Testing Pipeline
As the whole RL pipeline may take hours to train even for a 9x9 board size, we provide a dummy model with a `--debug` mode for testing purposes.
Run `minigo.py` with the `--debug` argument:
```
python minigo.py --debug
```
The `--debug` argument runs the pipeline with a dummy model for testing purposes.
Validation and evaluation can also be tested with the dummy model by combining their corresponding arguments with `--debug`.
To test validation, run the following command:
```
python minigo.py --debug --validation
```
To test evaluation, run the following command:
```
python minigo.py --debug --evaluation
```
To test both validation and evaluation, run the following command:
```
python minigo.py --debug --validation --evaluation
```
## MCTS and Go features (TODO)
Cleanup of the MCTS and Go features code is still to be done.
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Logic for dealing with coordinates.
This introduces some helpers and terminology that are used throughout MiniGo.
MiniGo Coordinate: This is a tuple of the form (row, column) indexed from
(0, 0) at the upper-left.
Flattened Coordinate: This is a number ranging from 0 to N^2 (so N^2 + 1
possible values). The extra value N^2 is used to mark a 'pass' move.
SGF Coordinate: Coordinate used for SGF serialization format. Coordinates use
two-letter pairs having the form (column, row) indexed from the upper-left
where 0, 0 = 'aa'.
KGS Coordinate: Human-readable coordinate string indexed from bottom left, with
the first character a capital letter for the column and the second a number
from 1-19 for the row. Note that KGS chooses to skip the letter 'I' due to
its similarity with 'l' (lowercase 'L').
PYGTP Coordinate: Tuple coordinate indexed starting at 1,1 from bottom-left
in the format (column, row)
So, for a 19x19,
Coord Type      upper_left      upper_right     pass
-------------------------------------------------------
minigo coord    (0, 0)          (0, 18)         None
flat            0               18              361
SGF             'aa'            'sa'            ''
KGS             'A19'           'T19'           'pass'
pygtp           (1, 19)         (19, 19)        (0, 0)
"""
import gtp
# We provide more than 19 entries here in case of boards larger than 19 x 19.
_SGF_COLUMNS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
_KGS_COLUMNS = 'ABCDEFGHJKLMNOPQRSTUVWXYZ'
def from_flat(board_size, flat):
"""Converts from a flattened coordinate to a MiniGo coordinate."""
if flat == board_size * board_size:
return None
return divmod(flat, board_size)
def to_flat(board_size, coord):
"""Converts from a MiniGo coordinate to a flattened coordinate."""
if coord is None:
return board_size * board_size
return board_size * coord[0] + coord[1]
def from_sgf(sgfc):
"""Converts from an SGF coordinate to a MiniGo coordinate."""
if not sgfc:
return None
return _SGF_COLUMNS.index(sgfc[1]), _SGF_COLUMNS.index(sgfc[0])
def to_sgf(coord):
"""Converts from a MiniGo coordinate to an SGF coordinate."""
if coord is None:
return ''
return _SGF_COLUMNS[coord[1]] + _SGF_COLUMNS[coord[0]]
def from_kgs(board_size, kgsc):
"""Converts from a KGS coordinate to a MiniGo coordinate."""
if kgsc == 'pass':
return None
kgsc = kgsc.upper()
col = _KGS_COLUMNS.index(kgsc[0])
row_from_bottom = int(kgsc[1:])
return board_size - row_from_bottom, col
def to_kgs(board_size, coord):
"""Converts from a MiniGo coordinate to a KGS coordinate."""
if coord is None:
return 'pass'
y, x = coord
return '{}{}'.format(_KGS_COLUMNS[x], board_size - y)
def from_pygtp(board_size, pygtpc):
"""Converts from a pygtp coordinate to a MiniGo coordinate."""
# GTP has a notion of both a Pass and a Resign, both of which are mapped to
# None, so the conversion is not precisely bijective.
if pygtpc in (gtp.PASS, gtp.RESIGN):
return None
return board_size - pygtpc[1], pygtpc[0] - 1
def to_pygtp(board_size, coord):
"""Converts from a MiniGo coordinate to a pygtp coordinate."""
if coord is None:
return gtp.PASS
return coord[1] + 1, board_size - coord[0]
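# Example (illustrative) of how the conversions above relate on a 19x19 board:
#   from_flat(19, 72)      == (3, 15)
#   to_kgs(19, (3, 15))    == 'Q16'
#   to_sgf((3, 15))        == 'pd'
#   to_pygtp(19, (3, 15))  == (16, 16)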
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains utility and supporting functions for DualNet.
This module provides the model interface, including functions for DualNet model
bootstrap, training, validation, loading and exporting.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import tensorflow as tf # pylint: disable=g-bad-import-order
import dualnet_model
import features
import preprocessing
import symmetries
class DualNetRunner(object):
"""The DualNetRunner class for the complete model with graph and weights.
This class can restore the model from saved files, and provide inference for
given examples.
"""
def __init__(self, save_file, params):
"""Initialize the dual network from saved model/checkpoints.
Args:
save_file: Path where model parameters were previously saved. For example:
'/tmp/minigo/models_dir/000000-bootstrap/'
params: An object with hyperparameters for DualNetRunner
"""
self.save_file = save_file
self.hparams = params
self.inference_input = None
self.inference_output = None
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
self.sess = tf.Session(graph=tf.Graph(), config=config)
self.initialize_graph()
def initialize_graph(self):
"""Initialize the graph with saved model."""
with self.sess.graph.as_default():
input_features, labels = get_inference_input(self.hparams)
estimator_spec = dualnet_model.model_fn(
input_features, labels, tf.estimator.ModeKeys.PREDICT, self.hparams)
self.inference_input = input_features
self.inference_output = estimator_spec.predictions
if self.save_file is not None:
self.initialize_weights(self.save_file)
else:
self.sess.run(tf.global_variables_initializer())
def initialize_weights(self, save_file):
"""Initialize the weights from the given save_file.
Assumes that the graph has been constructed, and the save_file contains
weights that match the graph. Used to set the weights to a different version
of the player without redefining the entire graph.
Args:
save_file: Path where model parameters were previously saved.
"""
tf.train.Saver().restore(self.sess, save_file)
def run(self, position, use_random_symmetry=True):
"""Compute the policy and value output for a given position.
Args:
position: A given go board status
use_random_symmetry: Apply random symmetry (defined in symmetries.py) to
the extracted feature (defined in features.py) of the given position
Returns:
prob, value: The policy and value output (defined in dualnet_model.py)
"""
probs, values = self.run_many(
[position], use_random_symmetry=use_random_symmetry)
return probs[0], values[0]
def run_many(self, positions, use_random_symmetry=True):
"""Compute the policy and value output for given positions.
Args:
positions: A list of positions for go board status
use_random_symmetry: Apply random symmetry (defined in symmetries.py) to
the extracted features (defined in features.py) of the given positions
Returns:
probabilities, value: The policy and value outputs (defined in
dualnet_model.py)
"""
def _extract_features(positions):
return features.extract_features(self.hparams.board_size, positions)
processed = list(map(_extract_features, positions))
# processed = [
# features.extract_features(self.hparams.board_size, p) for p in positions]
if use_random_symmetry:
syms_used, processed = symmetries.randomize_symmetries_feat(processed)
# feed_dict is a dict object to provide the input examples for the step of
# inference. sess.run() returns the inference predictions (indicated by
# self.inference_output) of the given input as outputs
outputs = self.sess.run(
self.inference_output, feed_dict={self.inference_input: processed})
probabilities, value = outputs['policy_output'], outputs['value_output']
if use_random_symmetry:
probabilities = symmetries.invert_symmetries_pi(
self.hparams.board_size, syms_used, probabilities)
return probabilities, value
def get_inference_input(params):
"""Set up placeholders for input features/labels.
Args:
params: An object to indicate the hyperparameters of the model.
Returns:
The features and output tensors that get passed into model_fn. Check
dualnet_model.py for more details on the model's input and output.
"""
input_features = tf.placeholder(
tf.float32, [None, params.board_size, params.board_size,
features.NEW_FEATURES_PLANES],
name='pos_tensor')
labels = {
'pi_tensor': tf.placeholder(
tf.float32, [None, params.board_size * params.board_size + 1]),
'value_tensor': tf.placeholder(tf.float32, [None])
}
return input_features, labels
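# For example (illustrative): with params.board_size == 9, input_features is a
# placeholder of shape [None, 9, 9, 17] and labels['pi_tensor'] has shape
# [None, 82] (81 intersections plus the pass move).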
def bootstrap(working_dir, params):
"""Initialize a tf.Estimator run with random initial weights.
Args:
working_dir: The directory where tf.estimator will drop logs,
checkpoints, and so on
params: hyperparams of the model.
"""
# Forge an initial checkpoint with the name that subsequent Estimator will
# expect to find.
estimator_initial_checkpoint_name = 'model.ckpt-1'
save_file = os.path.join(working_dir,
estimator_initial_checkpoint_name)
sess = tf.Session()
with sess.graph.as_default():
input_features, labels = get_inference_input(params)
dualnet_model.model_fn(
input_features, labels, tf.estimator.ModeKeys.PREDICT, params)
sess.run(tf.global_variables_initializer())
tf.train.Saver().save(sess, save_file)
def export_model(working_dir, model_path):
"""Take the latest checkpoint and export it to model_path for selfplay.
Assumes that all relevant model files are prefixed by the same name.
(For example, foo.index, foo.meta and foo.data-00000-of-00001).
Args:
working_dir: The directory where tf.estimator keeps its checkpoints.
model_path: Either a local path or a gs:// path to export model to.
"""
latest_checkpoint = tf.train.latest_checkpoint(working_dir)
all_checkpoint_files = tf.gfile.Glob(latest_checkpoint + '*')
for filename in all_checkpoint_files:
suffix = filename.partition(latest_checkpoint)[2]
destination_path = model_path + suffix
tf.gfile.Copy(filename, destination_path)
def train(working_dir, tf_records, generation_num, params):
"""Train the model for a specific generation.
Args:
working_dir: The model working directory to save model parameters,
drop logs, checkpoints, and so on.
tf_records: A list of tf_record filenames for training input.
generation_num: The generation to be trained.
params: hyperparams of the model.
Raises:
ValueError: if generation_num is not greater than 0.
"""
if generation_num <= 0:
raise ValueError('Model 0 is random weights')
estimator = tf.estimator.Estimator(
dualnet_model.model_fn, model_dir=working_dir, params=params)
max_steps = (generation_num * params.examples_per_generation
// params.batch_size)
profiler_hook = tf.train.ProfilerHook(output_dir=working_dir, save_secs=600)
def input_fn():
return preprocessing.get_input_tensors(
params, params.batch_size, tf_records)
estimator.train(
input_fn, hooks=[profiler_hook], max_steps=max_steps)
def validate(working_dir, tf_records, params):
"""Perform model validation on the hold out data.
Args:
working_dir: The model working directory.
tf_records: A list of tf_records filenames for holdout data.
params: hyperparams of the model.
"""
estimator = tf.estimator.Estimator(
dualnet_model.model_fn, model_dir=working_dir, params=params)
def input_fn():
return preprocessing.get_input_tensors(
params, params.batch_size, tf_records, filter_amount=0.05)
estimator.evaluate(input_fn, steps=1000)
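# Illustrative end-to-end use of this module (a sketch only; `params` is
# assumed to be the hyperparameter object constructed in minigo.py, and the
# paths and tf_records lists are placeholders):
#   bootstrap(working_dir, params)              # write random initial weights
#   export_model(working_dir, '/tmp/minigo/models/000000-bootstrap')
#   train(working_dir, tf_records, generation_num=1, params=params)
#   validate(working_dir, holdout_tf_records, params)
#   net = DualNetRunner('/tmp/minigo/models/000000-bootstrap', params)
#   probs, value = net.run(position)            # position: a go.Position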
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Defines DualNet model, the architecture of the policy and value network.
The input to the neural network is a [board_size * board_size * 17] image stack
comprising 17 binary feature planes. Eight feature planes consist of binary values
indicating the presence of the current player's stones; a further eight feature
planes represent the corresponding features for the opponent's stones; the final
feature plane represents the color to play, and has a constant value of either 1
if black is to play or 0 if white is to play. Check 'features.py' for more details.
In the MiniGo implementation, the input features are processed by a residual tower
that consists of a single convolutional block followed by either 9 or 19
residual blocks.
The convolutional block applies the following modules:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
Each residual block applies the following modules sequentially to its input:
1. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A convolution of num_filter filters of kernel size 3 x 3 with stride 1
5. Batch normalization
6. A skip connection that adds the input to the block
7. A rectifier non-linearity
Note: num_filter is 128 for 19 x 19 board size, and 32 for 9 x 9 board size.
The output of the residual tower is passed into two separate "heads" for
computing the policy and value respectively. The policy head applies the
following modules:
1. A convolution of 2 filters of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer that outputs a vector of size 19^2 + 1 = 362
corresponding to logit probabilities for all intersections and the pass move
The value head applies the following modules:
1. A convolution of 1 filter of kernel size 1 x 1 with stride 1
2. Batch normalization
3. A rectifier non-linearity
4. A fully connected linear layer to a hidden layer of size 256 for 19 x 19
board size and 64 for 9x9 board size
5. A rectifier non-linearity
6. A fully connected linear layer to a scalar
7. A tanh non-linearity outputting a scalar in the range [-1, 1]
The overall network depth, in the 10 or 20 block network, is 19 or 39
parameterized layers respectively for the residual tower, plus an additional 2
layers for the policy head and 3 layers for the value head.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
_BATCH_NORM_DECAY = 0.997
_BATCH_NORM_EPSILON = 1e-5
def _batch_norm(inputs, training, center=True, scale=True):
"""Performs a batch normalization using a standard set of parameters."""
return tf.layers.batch_normalization(
inputs=inputs, momentum=_BATCH_NORM_DECAY, epsilon=_BATCH_NORM_EPSILON,
center=center, scale=scale, fused=True, training=training)
def _conv2d(inputs, filters, kernel_size):
"""Performs 2D convolution with a standard set of parameters."""
return tf.layers.conv2d(
inputs=inputs, filters=filters, kernel_size=kernel_size,
padding='same')
def _conv_block(inputs, filters, kernel_size, training):
"""A convolutional block.
Args:
inputs: A tensor representing a batch of input features with shape
[BATCH_SIZE, board_size, board_size, features.NEW_FEATURES_PLANES].
filters: The number of filters for network layers in residual tower.
kernel_size: The kernel to be used in conv2d.
training: Either True or False, whether we are currently training the
model. Needed for batch norm.
Returns:
The output tensor of the convolutional block layer.
"""
conv = _conv2d(inputs, filters, kernel_size)
batchn = _batch_norm(conv, training)
output = tf.nn.relu(batchn)
return output
def _res_block(inputs, filters, kernel_size, training):
"""A residual block.
Args:
inputs: A tensor representing a batch of input features with shape
[BATCH_SIZE, board_size, board_size, features.NEW_FEATURES_PLANES].
filters: The number of filters for network layers in residual tower.
kernel_size: The kernel to be used in conv2d.
training: Either True or False, whether we are currently training the
model. Needed for batch norm.
Returns:
The output tensor of the residual block layer.
"""
initial_output = _conv_block(inputs, filters, kernel_size, training)
int_layer2_conv = _conv2d(initial_output, filters, kernel_size)
int_layer2_batchn = _batch_norm(int_layer2_conv, training)
output = tf.nn.relu(inputs + int_layer2_batchn)
return output
class Model(object):
"""Base class for building the DualNet Model."""
def __init__(self, num_filters, num_shared_layers, fc_width, board_size):
"""Initialize a model for computing the policy and value in RL.
Args:
num_filters: Number of filters (AlphaGoZero used 256). We use 128 by
default for a 19x19 go board, and 32 for 9x9 size.
num_shared_layers: Number of shared residual blocks. AGZ used both 19
and 39. Here we use 19 for 19x19 size and 9 for 9x9 size because it's
faster to train.
fc_width: Dimensionality of the fully connected linear layer.
board_size: A single integer for the board size.
"""
self.num_filters = num_filters
self.num_shared_layers = num_shared_layers
self.fc_width = fc_width
self.board_size = board_size
self.kernel_size = [3, 3] # kernel size is from AGZ paper
def __call__(self, inputs, training):
"""Add operations to classify a batch of input Go features.
Args:
inputs: A Tensor representing a batch of input Go features with shape
[BATCH_SIZE, board_size, board_size, features.NEW_FEATURES_PLANES]
training: A boolean. Set to True to add operations required only when
training the classifier.
Returns:
policy_logits: A vector of size self.board_size * self.board_size + 1
corresponding to the policy logit probabilities for all intersections
and the pass move.
value_logits: A scalar for the value logits output
"""
initial_output = _conv_block(
inputs=inputs, filters=self.num_filters,
kernel_size=self.kernel_size, training=training)
# the shared stack
shared_output = initial_output
for _ in range(self.num_shared_layers):
shared_output = _res_block(
inputs=shared_output, filters=self.num_filters,
kernel_size=self.kernel_size, training=training)
# policy head
policy_conv2d = _conv2d(inputs=shared_output, filters=2, kernel_size=[1, 1])
policy_batchn = _batch_norm(inputs=policy_conv2d, training=training,
center=False, scale=False)
policy_relu = tf.nn.relu(policy_batchn)
policy_logits = tf.layers.dense(
tf.reshape(policy_relu, [-1, self.board_size * self.board_size * 2]),
self.board_size * self.board_size + 1)
# value head
value_conv2d = _conv2d(shared_output, filters=1, kernel_size=[1, 1])
value_batchn = _batch_norm(value_conv2d, training,
center=False, scale=False)
value_relu = tf.nn.relu(value_batchn)
value_fc_hidden = tf.nn.relu(tf.layers.dense(
tf.reshape(value_relu, [-1, self.board_size * self.board_size]),
self.fc_width))
value_logits = tf.reshape(tf.layers.dense(value_fc_hidden, 1), [-1])
return policy_logits, value_logits
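# For example (illustrative): with board_size == 9 and a batch of 16 input
# feature stacks of shape [16, 9, 9, 17], __call__ returns policy_logits of
# shape [16, 82] (81 intersections plus the pass move) and value_logits of
# shape [16].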
def model_fn(features, labels, mode, params, config=None): # pylint: disable=unused-argument
"""DualNet model function.
Args:
features: tensor with shape
[BATCH_SIZE, self.board_size, self.board_size,
features.NEW_FEATURES_PLANES]
labels: dict from string to tensor with shape
'pi_tensor': [BATCH_SIZE, self.board_size * self.board_size + 1]
'value_tensor': [BATCH_SIZE]
mode: a tf.estimator.ModeKeys (batchnorm params update for TRAIN only)
params: an object of hyperparams
config: ignored; is required by Estimator API.
Returns:
EstimatorSpec parameterized according to the input params and the current
mode.
"""
model = Model(params.num_filters, params.num_shared_layers, params.fc_width,
params.board_size)
policy_logits, value_logits = model(
features, mode == tf.estimator.ModeKeys.TRAIN)
policy_output = tf.nn.softmax(policy_logits, name='policy_output')
value_output = tf.nn.tanh(value_logits, name='value_output')
# Calculate the model loss. The loss function is the sum of the mean-squared
# error, the cross-entropy loss and the L2 regularization term.
# Cross-entropy of policy
policy_entropy = -tf.reduce_mean(tf.reduce_sum(
policy_output * tf.log(policy_output), axis=1))
policy_cost = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(
logits=policy_logits, labels=labels['pi_tensor']))
# Mean squared error
value_cost = tf.reduce_mean(
tf.square(value_output - labels['value_tensor']))
# L2 term
l2_cost = params.l2_strength * tf.add_n(
[tf.nn.l2_loss(v) for v in tf.trainable_variables()
if 'bias' not in v.name])
# The loss function
combined_cost = policy_cost + value_cost + l2_cost
# Get model train ops
global_step = tf.train.get_or_create_global_step()
boundaries = [int(1e6), int(2e6)]
values = [1e-2, 1e-3, 1e-4]
learning_rate = tf.train.piecewise_constant(
global_step, boundaries, values)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train_op = tf.train.MomentumOptimizer(
learning_rate, params.momentum).minimize(
combined_cost, global_step=global_step)
# Create multiple tensors for logging purpose
metric_ops = {
'accuracy': tf.metrics.accuracy(labels=labels['pi_tensor'],
predictions=policy_output,
name='accuracy_op'),
'policy_cost': tf.metrics.mean(policy_cost),
'value_cost': tf.metrics.mean(value_cost),
'l2_cost': tf.metrics.mean(l2_cost),
'policy_entropy': tf.metrics.mean(policy_entropy),
'combined_cost': tf.metrics.mean(combined_cost),
}
for metric_name, metric_op in metric_ops.items():
tf.summary.scalar(metric_name, metric_op[1])
# Return tf.estimator.EstimatorSpec
return tf.estimator.EstimatorSpec(
mode=mode,
predictions={
'policy_output': policy_output,
'value_output': value_output,
},
loss=combined_cost,
train_op=train_op,
eval_metric_ops=metric_ops)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Evaluation of playing games between two neural nets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import time
import go
from gtp_wrapper import MCTSPlayer
import sgf_wrapper
def play_match(params, black_net, white_net, games, readouts,
sgf_dir, verbosity):
"""Plays matches between two neural nets.
Black is declared the winner if it wins by a margin of more than eval_win_rate of the games; otherwise white is declared the winner.
Args:
params: An object of hyperparameters.
black_net: Instance of the DualNetRunner class to play as black.
white_net: Instance of the DualNetRunner class to play as white.
games: Number of games to play; the games are played one after another.
readouts: Number of readouts to perform for each step in each game.
sgf_dir: Directory to write the sgf results.
verbosity: Verbosity to show evaluation process.
Returns:
go.BLACK_NAME if the winner is black_net, otherwise go.WHITE_NAME.
"""
# Create one black player and one white player; they play `games` games in turn.
black = MCTSPlayer(
params.board_size, black_net, verbosity=verbosity, two_player_mode=True,
num_parallel=params.simultaneous_leaves)
white = MCTSPlayer(
params.board_size, white_net, verbosity=verbosity, two_player_mode=True,
num_parallel=params.simultaneous_leaves)
black_name = os.path.basename(black_net.save_file)
white_name = os.path.basename(white_net.save_file)
black_win_counts = 0
white_win_counts = 0
for i in range(games):
num_move = 0 # The move number of the current game
black.initialize_game()
white.initialize_game()
while True:
start = time.time()
active = white if num_move % 2 else black
inactive = black if num_move % 2 else white
current_readouts = active.root.N
while active.root.N < current_readouts + readouts:
active.tree_search()
# print some stats on the search
if verbosity >= 3:
print(active.root.position)
# First, check the roots for hopeless games.
if active.should_resign(): # Force resign
active.set_result(-active.root.position.to_play, was_resign=True)
inactive.set_result(
active.root.position.to_play, was_resign=True)
if active.is_done():
fname = '{:d}-{:s}-vs-{:s}-{:d}.sgf'.format(
int(time.time()), white_name, black_name, i)
with open(os.path.join(sgf_dir, fname), 'w') as f:
sgfstr = sgf_wrapper.make_sgf(
params.board_size, active.position.recent, active.result_string,
black_name=black_name, white_name=white_name)
f.write(sgfstr)
print('Finished game', i, active.result_string)
if active.result_string is not None:
if active.result_string[0] == 'B':
black_win_counts += 1
elif active.result_string[0] == 'W':
white_win_counts += 1
break
move = active.pick_move()
active.play_move(move)
inactive.play_move(move)
dur = time.time() - start
num_move += 1
if (verbosity > 1) or (verbosity == 1 and num_move % 10 == 9):
timeper = (dur / readouts) * 100.0
print(active.root.position)
print('{:d}: {:d} readouts, {:.3f} s/100. ({:.2f} sec)'.format(
num_move, readouts, timeper, dur))
if (black_win_counts - white_win_counts) > params.eval_win_rate * games:
return go.BLACK_NAME
else:
return go.WHITE_NAME
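# Illustrative call (a sketch only; `params` is assumed to provide board_size,
# simultaneous_leaves and eval_win_rate, as wired up in minigo.py):
#   winner = play_match(params, black_net, white_net, games=16, readouts=200,
#                       sgf_dir='/tmp/minigo/sgf', verbosity=1)
#   # winner is go.BLACK_NAME or go.WHITE_NAME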
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Features used by AlphaGo Zero, in approximate order of importance.
Feature          #   Notes
Stone History    16  The stones of each color during the last 8 moves.
Color to play    1   A constant plane: 1 if black is to play, 0 if white.
All features with 8 planes are 1-hot encoded, with plane i marked with 1
only if the feature was equal to i. Any features >= 8 would be marked as 8.
This file includes the features from AlphaGo Zero (AGZ) as NEW_FEATURES.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import go
import numpy as np
def planes(num_planes):
# Decorator that records the number of planes a feature function produces. For
# example, for a 19x19 go board, the stone feature has shape [19, 19, 16],
# where the third dimension is num_planes.
def deco(f):
f.planes = num_planes
return f
return deco
@planes(16)
def stone_features(board_size, position):
"""Create the 16 planes of features for a given position.
Args:
board_size: the go board size.
position: a given go board status.
Returns:
The 16 plane features.
"""
# a bit easier to calculate it with axis 0 being the 16 board states,
# and then roll axis 0 to the end.
features = np.zeros([16, board_size, board_size], dtype=np.uint8)
num_deltas_avail = position.board_deltas.shape[0]
cumulative_deltas = np.cumsum(position.board_deltas, axis=0)
last_eight = np.tile(position.board, [8, 1, 1])
# apply deltas to compute previous board states
last_eight[1:num_deltas_avail + 1] -= cumulative_deltas
# if no more deltas are available, just repeat oldest board.
last_eight[num_deltas_avail + 1:] = last_eight[num_deltas_avail].reshape(
1, board_size, board_size)
features[::2] = last_eight == position.to_play
features[1::2] = last_eight == -position.to_play
return np.rollaxis(features, 0, 3)
@planes(1)
def color_to_play_feature(board_size, position):
if position.to_play == go.BLACK:
return np.ones([board_size, board_size, 1], dtype=np.uint8)
else:
return np.zeros([board_size, board_size, 1], dtype=np.uint8)
NEW_FEATURES = [
stone_features,
color_to_play_feature
]
NEW_FEATURES_PLANES = sum(f.planes for f in NEW_FEATURES)
def extract_features(board_size, position, features=None):
if features is None:
features = NEW_FEATURES
return np.concatenate([feature(board_size, position) for feature in features],
axis=2)
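# For example (illustrative): for a 9x9 position `pos`,
#   extract_features(9, pos).shape == (9, 9, 17)
# since NEW_FEATURES_PLANES == 16 + 1 == 17.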
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Describe the Go game status.
A board is a NxN numpy array.
A Coordinate is a tuple index into the board.
A Move is a (Coordinate c | None).
A PlayerMove is a (Color, Move) tuple
(0, 0) is considered to be the upper left corner of the board, and (18, 0)
is the lower left.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import namedtuple
import copy
import itertools
import coords
import numpy as np
# Represent a board as a numpy array, with 0 for empty, 1 for black, -1 for white.
# This means that swapping colors is as simple as multiplying array by -1.
WHITE, EMPTY, BLACK, FILL, KO, UNKNOWN = range(-1, 5)
# Represents "group not found" in the LibertyTracker object
MISSING_GROUP_ID = -1
BLACK_NAME = 'BLACK'
WHITE_NAME = 'WHITE'
def _check_bounds(board_size, c):
return c[0] % board_size == c[0] and c[1] % board_size == c[1]
def get_neighbors_diagonals(board_size):
all_coords = [(i, j) for i in range(board_size) for j in range(board_size)]
def check_bounds(c):
return _check_bounds(board_size, c)
neighbors = {(x, y): list(filter(check_bounds, [
(x+1, y), (x-1, y), (x, y+1), (x, y-1)])) for x, y in all_coords}
diagonals = {(x, y): list(filter(check_bounds, [
(x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1)])) for x, y in all_coords}
return neighbors, diagonals
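# For example (illustrative), on a 9x9 board the corner point (0, 0) has two
# neighbors and a single diagonal:
#   neighbors, diagonals = get_neighbors_diagonals(9)
#   neighbors[(0, 0)]  == [(1, 0), (0, 1)]
#   diagonals[(0, 0)]  == [(1, 1)]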
class IllegalMove(Exception):
pass
class PlayerMove(namedtuple('PlayerMove', ['color', 'move'])):
pass
class PositionWithContext(namedtuple('SgfPosition',
['position', 'next_move', 'result'])):
pass
def place_stones(board, color, stones):
for s in stones:
board[s] = color
def replay_position(board_size, position, result):
"""Wrapper for a go.Position which replays its history.
Assumes an empty start position! (i.e. no handicap, and history must
be exhaustive.)
Result must be passed in, since a resign cannot be inferred from position
history alone.
for position_w_context in replay_position(board_size, position, result):
    print(position_w_context.position)
"""
if position.n != len(position.recent):
raise ValueError('Position history is incomplete!')
pos = Position(board_size=board_size, komi=position.komi)
for player_move in position.recent:
color, next_move = player_move
yield PositionWithContext(pos, next_move, result)
pos = pos.play_move(next_move, color=color)
def find_reached(board_size, board, c):
color = board[c]
chain = set([c])
reached = set()
frontier = [c]
neighbors, _ = get_neighbors_diagonals(board_size)
while frontier:
current = frontier.pop()
chain.add(current)
for n in neighbors[current]:
if board[n] == color and n not in chain:
frontier.append(n)
elif board[n] != color:
reached.add(n)
return chain, reached
def is_koish(board_size, board, c):
"""Check if c is surrounded on all sides by 1 color, and return that color."""
if board[c] != EMPTY:
return None
full_neighbors, _ = get_neighbors_diagonals(board_size)
neighbors = {board[n] for n in full_neighbors[c]}
if len(neighbors) == 1 and EMPTY not in neighbors:
return list(neighbors)[0]
else:
return None
def is_eyeish(board_size, board, c):
"""Check if c is an eye, for the purpose of restricting MC rollouts."""
# pass is fine.
if c is None:
return
color = is_koish(board_size, board, c)
if color is None:
return None
diagonal_faults = 0
_, all_diagonals = get_neighbors_diagonals(board_size)
diagonals = all_diagonals[c]
if len(diagonals) < 4:
diagonal_faults += 1
for d in diagonals:
if not board[d] in (color, EMPTY):
diagonal_faults += 1
if diagonal_faults > 1:
return None
else:
return color
class Group(namedtuple('Group', ['id', 'stones', 'liberties', 'color'])):
"""
stones: a frozenset of Coordinates belonging to this group
liberties: a frozenset of Coordinates that are empty and adjacent to
this group.
color: color of this group
"""
def __eq__(self, other):
return (self.stones == other.stones and self.liberties == other.liberties
and self.color == other.color)
class LibertyTracker(object):
@staticmethod
def from_board(board_size, board):
board = np.copy(board)
curr_group_id = 0
lib_tracker = LibertyTracker(board_size)
for color in (WHITE, BLACK):
while color in board:
curr_group_id += 1
found_color = np.where(board == color)
coord = found_color[0][0], found_color[1][0]
chain, reached = find_reached(board_size, board, coord)
liberties = frozenset(r for r in reached if board[r] == EMPTY)
new_group = Group(curr_group_id, frozenset(
chain), liberties, color)
lib_tracker.groups[curr_group_id] = new_group
for s in chain:
lib_tracker.group_index[s] = curr_group_id
place_stones(board, FILL, chain)
lib_tracker.max_group_id = curr_group_id
liberty_counts = np.zeros([board_size, board_size], dtype=np.uint8)
for group in lib_tracker.groups.values():
num_libs = len(group.liberties)
for s in group.stones:
liberty_counts[s] = num_libs
lib_tracker.liberty_cache = liberty_counts
return lib_tracker
def __init__(self, board_size, group_index=None, groups=None,
liberty_cache=None, max_group_id=1):
# group_index: a NxN numpy array of group_ids. -1 means no group
# groups: a dict of group_id to groups
# liberty_cache: a NxN numpy array of liberty counts
self.board_size = board_size
self.group_index = group_index if group_index is not None else - \
np.ones([board_size, board_size], dtype=np.int32)
self.groups = groups or {}
self.liberty_cache = liberty_cache if liberty_cache is not None else \
np.zeros([board_size, board_size], dtype=np.uint8)
self.max_group_id = max_group_id
self.neighbors, _ = get_neighbors_diagonals(board_size)
def __deepcopy__(self, memodict={}):
new_group_index = np.copy(self.group_index)
new_lib_cache = np.copy(self.liberty_cache)
# shallow copy
new_groups = copy.copy(self.groups)
return LibertyTracker(
self.board_size, new_group_index, new_groups,
liberty_cache=new_lib_cache, max_group_id=self.max_group_id)
def add_stone(self, color, c):
assert self.group_index[c] == MISSING_GROUP_ID
captured_stones = set()
opponent_neighboring_group_ids = set()
friendly_neighboring_group_ids = set()
empty_neighbors = set()
for n in self.neighbors[c]:
neighbor_group_id = self.group_index[n]
if neighbor_group_id != MISSING_GROUP_ID:
neighbor_group = self.groups[neighbor_group_id]
if neighbor_group.color == color:
friendly_neighboring_group_ids.add(neighbor_group_id)
else:
opponent_neighboring_group_ids.add(neighbor_group_id)
else:
empty_neighbors.add(n)
new_group = self._create_group(color, c, empty_neighbors)
for group_id in friendly_neighboring_group_ids:
new_group = self._merge_groups(group_id, new_group.id)
# new_group becomes stale as _update_liberties and
# _handle_captures are called; must refetch with self.groups[new_group.id]
for group_id in opponent_neighboring_group_ids:
neighbor_group = self.groups[group_id]
if len(neighbor_group.liberties) == 1:
captured = self._capture_group(group_id)
captured_stones.update(captured)
else:
self._update_liberties(group_id, remove={c})
self._handle_captures(captured_stones)
# suicide is illegal
if len(self.groups[new_group.id].liberties) == 0:
raise IllegalMove('Move at {} would commit suicide!\n'.format(c))
return captured_stones
def _create_group(self, color, c, liberties):
self.max_group_id += 1
new_group = Group(self.max_group_id, frozenset([c]), liberties, color)
self.groups[new_group.id] = new_group
self.group_index[c] = new_group.id
self.liberty_cache[c] = len(liberties)
return new_group
def _merge_groups(self, group1_id, group2_id):
group1 = self.groups[group1_id]
group2 = self.groups[group2_id]
self.groups[group1_id] = Group(
group1_id, group1.stones | group2.stones, group1.liberties,
group1.color)
del self.groups[group2_id]
for s in group2.stones:
self.group_index[s] = group1_id
self._update_liberties(
group1_id, add=group2.liberties, remove=group2.stones)
return group1
def _capture_group(self, group_id):
dead_group = self.groups[group_id]
del self.groups[group_id]
for s in dead_group.stones:
self.group_index[s] = MISSING_GROUP_ID
self.liberty_cache[s] = 0
return dead_group.stones
def _update_liberties(self, group_id, add=set(), remove=set()):
group = self.groups[group_id]
new_libs = (group.liberties | add) - remove
self.groups[group_id] = Group(
group_id, group.stones, new_libs, group.color)
new_lib_count = len(new_libs)
for s in self.groups[group_id].stones:
self.liberty_cache[s] = new_lib_count
def _handle_captures(self, captured_stones):
for s in captured_stones:
for n in self.neighbors[s]:
group_id = self.group_index[n]
if group_id != MISSING_GROUP_ID:
self._update_liberties(group_id, add={s})
class Position(object):
def __init__(self, board_size, board=None, n=0, komi=7.5, caps=(0, 0),
lib_tracker=None, ko=None, recent=tuple(),
board_deltas=None, to_play=BLACK):
"""
board_size: the go board size.
board: a numpy array
n: an int representing moves played so far
komi: a float, representing points given to the second player.
caps: a (int, int) tuple of captures for B, W.
lib_tracker: a LibertyTracker object
ko: a Move
recent: a tuple of PlayerMoves, such that recent[-1] is the last move.
board_deltas: a np.array of shape (n, go.N, go.N) representing changes
made to the board at each move (played move and captures).
Should satisfy next_pos.board - next_pos.board_deltas[0] == pos.board
to_play: BLACK or WHITE
"""
assert type(recent) is tuple
self.board_size = board_size
self.board = board if board is not None else \
np.zeros([board_size, board_size], dtype=np.int8)
self.n = n
self.komi = komi
self.caps = caps
self.lib_tracker = lib_tracker or LibertyTracker.from_board(
self.board_size, self.board)
self.ko = ko
self.recent = recent
self.board_deltas = board_deltas if board_deltas is not None else \
np.zeros([0, board_size, board_size], dtype=np.int8)
self.to_play = to_play
self.last_eight = None
self.neighbors, _ = get_neighbors_diagonals(board_size)
def __deepcopy__(self, memodict={}):
new_board = np.copy(self.board)
new_lib_tracker = copy.deepcopy(self.lib_tracker)
return Position(
self.board_size, new_board, self.n, self.komi, self.caps,
new_lib_tracker, self.ko, self.recent, self.board_deltas, self.to_play)
def __str__(self):
pretty_print_map = {
WHITE: '\x1b[0;31;47mO',
EMPTY: '\x1b[0;31;43m.',
BLACK: '\x1b[0;31;40mX',
FILL: '#',
KO: '*',
}
board = np.copy(self.board)
captures = self.caps
if self.ko is not None:
place_stones(board, KO, [self.ko])
raw_board_contents = []
for i in range(self.board_size):
row = []
for j in range(self.board_size):
appended = '<' if (
self.recent and (i, j) == self.recent[-1].move) else ' '
row.append(pretty_print_map[board[i, j]] + appended)
row.append('\x1b[0m')
raw_board_contents.append(''.join(row))
row_labels = ['%2d ' % i for i in range(self.board_size, 0, -1)]
annotated_board_contents = [''.join(r) for r in zip(
row_labels, raw_board_contents, row_labels)]
header_footer_rows = [
' ' + ' '.join('ABCDEFGHJKLMNOPQRST'[:self.board_size]) + ' ']
annotated_board = '\n'.join(itertools.chain(
header_footer_rows, annotated_board_contents, header_footer_rows))
details = '\nMove: {}. Captures X: {} O: {}\n'.format(
self.n, *captures)
return annotated_board + details
def is_move_suicidal(self, move):
potential_libs = set()
for n in self.neighbors[move]:
neighbor_group_id = self.lib_tracker.group_index[n]
if neighbor_group_id == MISSING_GROUP_ID:
# at least one liberty after playing here, so not a suicide
return False
neighbor_group = self.lib_tracker.groups[neighbor_group_id]
if neighbor_group.color == self.to_play:
potential_libs |= neighbor_group.liberties
elif len(neighbor_group.liberties) == 1:
# would capture an opponent group if they only had one lib.
return False
# it's possible to suicide by connecting several friendly groups
# each of which had one liberty.
potential_libs -= set([move])
return not potential_libs
def is_move_legal(self, move):
"""Checks that a move is on an empty space, not on ko, and not suicide."""
if move is None:
return True
if self.board[move] != EMPTY:
return False
if move == self.ko:
return False
if self.is_move_suicidal(move):
return False
return True
def all_legal_moves(self):
"""Returns a np.array of size go.N**2 + 1, with 1 = legal, 0 = illegal."""
# by default, every move is legal
legal_moves = np.ones([self.board_size, self.board_size], dtype=np.int8)
# ...unless there is already a stone there
legal_moves[self.board != EMPTY] = 0
# calculate which spots have 4 stones next to them
# padding is because the edge always counts as a lost liberty.
adjacent = np.ones([self.board_size+2, self.board_size+2], dtype=np.int8)
adjacent[1:-1, 1:-1] = np.abs(self.board)
num_adjacent_stones = (adjacent[:-2, 1:-1] + adjacent[1:-1, :-2] +
adjacent[2:, 1:-1] + adjacent[1:-1, 2:])
# Surrounded spots are those that are empty and have 4 adjacent stones.
surrounded_spots = np.multiply(
(self.board == EMPTY),
(num_adjacent_stones == 4))
# Such spots are possibly illegal, unless they are capturing something.
# Iterate over and manually check each spot.
for coord in np.transpose(np.nonzero(surrounded_spots)):
if self.is_move_suicidal(tuple(coord)):
legal_moves[tuple(coord)] = 0
# ...and retaking ko is always illegal
if self.ko is not None:
legal_moves[self.ko] = 0
# and pass is always legal
return np.concatenate([legal_moves.ravel(), [1]])
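# For example (illustrative): on an empty 9x9 board every move (and pass) is
# legal, so Position(board_size=9).all_legal_moves() is a vector of 82 ones.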
def pass_move(self, mutate=False):
pos = self if mutate else copy.deepcopy(self)
pos.n += 1
pos.recent += (PlayerMove(pos.to_play, None),)
pos.board_deltas = np.concatenate((
np.zeros([1, self.board_size, self.board_size], dtype=np.int8),
pos.board_deltas[:6]))
pos.to_play *= -1
pos.ko = None
return pos
def flip_playerturn(self, mutate=False):
pos = self if mutate else copy.deepcopy(self)
pos.ko = None
pos.to_play *= -1
return pos
def get_liberties(self):
return self.lib_tracker.liberty_cache
def play_move(self, c, color=None, mutate=False):
# Obeys CGOS Rules of Play. In short:
# No suicides
# Chinese/area scoring
# Positional superko (this is very crudely approximate at the moment.)
if color is None:
color = self.to_play
pos = self if mutate else copy.deepcopy(self)
if c is None:
pos = pos.pass_move(mutate=mutate)
return pos
if not self.is_move_legal(c):
raise IllegalMove('{} move at {} is illegal: \n{}'.format(
'Black' if self.to_play == BLACK else 'White',
coords.to_kgs(self.board_size, c), self))
potential_ko = is_koish(self.board_size, self.board, c)
place_stones(pos.board, color, [c])
captured_stones = pos.lib_tracker.add_stone(color, c)
place_stones(pos.board, EMPTY, captured_stones)
opp_color = color * -1
new_board_delta = np.zeros([self.board_size, self.board_size],
dtype=np.int8)
new_board_delta[c] = color
place_stones(new_board_delta, color, captured_stones)
if len(captured_stones) == 1 and potential_ko == opp_color:
new_ko = list(captured_stones)[0]
else:
new_ko = None
if pos.to_play == BLACK:
new_caps = (pos.caps[0] + len(captured_stones), pos.caps[1])
else:
new_caps = (pos.caps[0], pos.caps[1] + len(captured_stones))
pos.n += 1
pos.caps = new_caps
pos.ko = new_ko
pos.recent += (PlayerMove(color, c),)
# keep a rolling history of last 7 deltas - that's all we'll need to
# extract the last 8 board states.
pos.board_deltas = np.concatenate((
new_board_delta.reshape(1, self.board_size, self.board_size),
pos.board_deltas[:6]))
pos.to_play *= -1
return pos
def is_game_over(self):
return (len(self.recent) >= 2
and self.recent[-1].move is None
and self.recent[-2].move is None)
def score(self):
"""Return score from B perspective. If W is winning, score is negative."""
working_board = np.copy(self.board)
while EMPTY in working_board:
unassigned_spaces = np.where(working_board == EMPTY)
c = unassigned_spaces[0][0], unassigned_spaces[1][0]
territory, borders = find_reached(self.board_size, working_board, c)
border_colors = set(working_board[b] for b in borders)
X_border = BLACK in border_colors
O_border = WHITE in border_colors
if X_border and not O_border:
territory_color = BLACK
elif O_border and not X_border:
territory_color = WHITE
else:
territory_color = UNKNOWN # dame, or seki
place_stones(working_board, territory_color, territory)
return np.count_nonzero(working_board == BLACK) - np.count_nonzero(
working_board == WHITE) - self.komi
def result(self):
score = self.score()
if score > 0:
return 1
elif score < 0:
return -1
else:
return 0
def result_string(self):
score = self.score()
if score > 0:
return 'B+' + '{:.1f}'.format(score)
elif score < 0:
return 'W+' + '{:.1f}'.format(abs(score))
else:
return 'DRAW'
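# Illustrative usage of Position (a sketch):
#   pos = Position(board_size=9)
#   pos = pos.play_move((4, 4))    # black plays the center point
#   pos = pos.play_move((2, 2))    # white replies
#   print(pos)                     # pretty-printed board with the last move marked
#   pos.score()                    # area score from black's perspective, komi applied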
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Extends gtp.py."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import itertools
import coords
import go
import gtp
import sgf_wrapper
def parse_message(message):
message = gtp.pre_engine(message).strip()
first, rest = (message.split(" ", 1) + [None])[:2]
if first.isdigit():
message_id = int(first)
if rest is not None:
command, arguments = (rest.split(" ", 1) + [None])[:2]
else:
command, arguments = None, None
else:
message_id = None
command, arguments = first, rest
command = command.replace("-", "_") # for kgs extensions.
return message_id, command, arguments
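# For example (illustrative, assuming gtp.pre_engine leaves these strings
# unchanged):
#   parse_message('1 genmove black')          == (1, 'genmove', 'black')
#   parse_message('kgs-chat private bob hi')  == (None, 'kgs_chat', 'private bob hi')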
class KgsExtensionsMixin(gtp.Engine):
def __init__(self, game_obj, name="gtp (python, kgs-chat extensions)",
version="0.1"):
super().__init__(game_obj=game_obj, name=name, version=version)
self.known_commands += ["kgs-chat"]
def send(self, message):
message_id, command, arguments = parse_message(message)
if command in self.known_commands:
try:
retval = getattr(self, "cmd_" + command)(arguments)
response = gtp.format_success(message_id, retval)
sys.stderr.flush()
return response
except ValueError as exception:
return gtp.format_error(message_id, exception.args[0])
else:
return gtp.format_error(message_id, "unknown command: " + command)
# Nice to implement this, as KGS sends it each move.
def cmd_time_left(self, arguments):
pass
def cmd_showboard(self, arguments):
return self._game.showboard()
def cmd_kgs_chat(self, arguments):
try:
arg_list = arguments.split()
msg_type, sender, text = arg_list[0], arg_list[1], arg_list[2:]
text = " ".join(text)
except ValueError:
return "Unparseable message, args: %r" % arguments
return self._game.chat(msg_type, sender, text)
class RegressionsMixin(gtp.Engine):
def cmd_loadsgf(self, arguments):
args = arguments.split()
if len(args) == 2:
file_, movenum = args
movenum = int(movenum)
print("movenum =", movenum, file=sys.stderr)
else:
file_ = args[0]
movenum = None
try:
with open(file_, 'r') as f:
contents = f.read()
except:
raise ValueError("Unreadable file: " + file_)
try:
# This is kinda bad, because replay_sgf is already calling
# 'play move' on its internal position objects, but we really
# want to advance the engine along with us rather than try to
# push in some finished Position object.
for idx, p in enumerate(sgf_wrapper.replay_sgf(contents)):
print("playing #", idx, p.next_move, file=sys.stderr)
self._game.play_move(p.next_move)
if movenum and idx == movenum:
break
except:
raise
class GoGuiMixin(gtp.Engine):
""" GTP extensions of 'analysis commands' for gogui.
We reach into the game_obj (an instance of the players in strategies.py),
and extract stuff from its root nodes, etc. These could be extracted into
methods on the Player object, but it's a little weird to do that on a Player,
which doesn't really care about GTP commands, etc. So instead, we just
violate encapsulation a bit... Suggestions welcome :) """
def __init__(self, game_obj, name="gtp (python, gogui extensions)",
version="0.1"):
super().__init__(game_obj=game_obj, name=name, version=version)
self.known_commands += ["gogui-analyze_commands"]
def cmd_gogui_analyze_commands(self, arguments):
return "\n".join(["var/Most Read Variation/nextplay",
"var/Think a spell/spin",
"pspairs/Visit Heatmap/visit_heatmap",
"pspairs/Q Heatmap/q_heatmap"])
def cmd_nextplay(self, arguments):
return self._game.root.mvp_gg()
def cmd_visit_heatmap(self, arguments):
sort_order = list(range(self._game.size * self._game.size + 1))
sort_order.sort(key=lambda i: self._game.root.child_N[i], reverse=True)
return self.heatmap(sort_order, self._game.root, 'child_N')
def cmd_q_heatmap(self, arguments):
sort_order = list(range(self._game.size * self._game.size + 1))
reverse = True if self._game.root.position.to_play is go.BLACK else False
sort_order.sort(
key=lambda i: self._game.root.child_Q[i], reverse=reverse)
return self.heatmap(sort_order, self._game.root, 'child_Q')
def heatmap(self, sort_order, node, prop):
return "\n".join(["{!s:6} {}".format(
coords.to_kgs(self._game.size, coords.from_flat(self._game.size, key)),
node.__dict__.get(prop)[key])
for key in sort_order if node.child_N[key] > 0][:20])
def cmd_spin(self, arguments):
for i in range(50):
for j in range(100):
self._game.tree_search()
moves = self.cmd_nextplay(None).lower()
moves = moves.split()
colors = "bw" if self._game.root.position.to_play is go.BLACK else "wb"
moves_cols = " ".join(['{} {}'.format(*z)
for z in zip(itertools.cycle(colors), moves)])
print("gogui-gfx: TEXT", "{:.3f} after {}".format(
self._game.root.Q, self._game.root.N), file=sys.stderr, flush=True)
print("gogui-gfx: VAR", moves_cols, file=sys.stderr, flush=True)
return self.cmd_nextplay(None)
class GTPDeluxe(KgsExtensionsMixin, RegressionsMixin, GoGuiMixin):
pass
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A wrapper of gtp and gtp_extensions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import datetime
import os
import sys
import coords
from dualnet import DualNetRunner
import go
import gtp
import gtp_extensions
from strategies import MCTSPlayerMixin, CGOSPlayerMixin
def translate_gtp_colors(gtp_color):
if gtp_color == gtp.BLACK:
return go.BLACK
elif gtp_color == gtp.WHITE:
return go.WHITE
else:
return go.EMPTY
class GtpInterface(object):
def __init__(self, board_size):
self.size = board_size
self.position = None
self.komi = 6.5
self.board_size = board_size
def set_size(self, n):
if n != self.board_size:
raise ValueError(
("Can't handle boardsize {n}!"
"Restart with env var BOARD_SIZE={n}").format(n=n))
def set_komi(self, komi):
self.komi = komi
self.position.komi = komi
def clear(self):
if self.position and len(self.position.recent) > 1:
try:
sgf = self.to_sgf()
with open(datetime.datetime.now().strftime(
"%Y-%m-%d-%H:%M.sgf"), 'w') as f:
f.write(sgf)
except NotImplementedError:
pass
except:
print("Error saving sgf", file=sys.stderr, flush=True)
self.position = go.Position(komi=self.komi)
self.initialize_game(self.position)
def accomodate_out_of_turn(self, color):
if not translate_gtp_colors(color) == self.position.to_play:
self.position.flip_playerturn(mutate=True)
def make_move(self, color, vertex):
c = coords.from_pygtp(vertex)
# let's assume this never happens for now.
# self.accomodate_out_of_turn(color)
return self.play_move(c)
def get_move(self, color):
self.accomodate_out_of_turn(color)
move = self.suggest_move(self.position)
if self.should_resign():
return gtp.RESIGN
return coords.to_pygtp(move)
def final_score(self):
return self.position.result_string()
def showboard(self):
print('\n\n' + str(self.position) + '\n\n', file=sys.stderr)
return True
def should_resign(self):
raise NotImplementedError
def get_score(self):
return self.position.result_string()
def suggest_move(self, position):
raise NotImplementedError
def play_move(self, c):
raise NotImplementedError
def initialize_game(self, position=None):
raise NotImplementedError
def chat(self, msg_type, sender, text):
raise NotImplementedError
def to_sgf(self):
raise NotImplementedError
class MCTSPlayer(MCTSPlayerMixin, GtpInterface):
pass
class CGOSPlayer(CGOSPlayerMixin, GtpInterface):
pass
def make_gtp_instance(board_size, read_file, readouts_per_move=100,
verbosity=1, cgos_mode=False):
n = DualNetRunner(read_file)
if cgos_mode:
instance = CGOSPlayer(board_size, n, seconds_per_move=5,
verbosity=verbosity, two_player_mode=True)
else:
instance = MCTSPlayer(board_size, n, simulations_per_move=readouts_per_move,
verbosity=verbosity, two_player_mode=True)
name = "Somebot-" + os.path.basename(read_file)
gtp_engine = gtp_extensions.GTPDeluxe(instance, name=name)
return gtp_engine
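# A minimal usage sketch (not part of the pipeline): wiring the GTP engine to
# stdin/stdout. It assumes the `send` method of the bundled gtp.Engine class;
# the model path is only illustrative.
#
#   engine = make_gtp_instance(9, '/tmp/minigo/models/000001-somename')
#   for line in sys.stdin:
#       sys.stdout.write(engine.send(line))
#       sys.stdout.flush()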
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Monte Carlo Tree Search implementation.
All terminology here (Q, U, N, p_UCT) uses the same notation as in the
AlphaGo (AG) paper, and more details can be found in the paper. Here is a brief
description:
Q: the action value (mean evaluation) of a move
U: the exploration bonus that guides search control
N: the visit count of a state
p_UCT: the PUCT rule used for action selection
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
import coords
import numpy as np
# Exploration constant
c_PUCT = 1.38
# Dirichlet noise, as a function of board_size
def D_NOISE_ALPHA(board_size): return 0.03 * 361 / (board_size ** 2)
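# For example, on a 9x9 board D_NOISE_ALPHA(9) = 0.03 * 361 / 81 ~= 0.134, so the
# Dirichlet noise is flatter (less concentrated) than the 0.03 used for 19x19,
# keeping the expected amount of noise per legal move roughly comparable.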
class DummyNode(object):
"""A fake node of a MCTS search tree.
This node is intended to be a placeholder for the root node, which would
otherwise have no parent node. If all nodes have parents, code becomes
simpler."""
def __init__(self, board_size):
self.board_size = board_size
self.parent = None
self.child_N = collections.defaultdict(float)
self.child_W = collections.defaultdict(float)
class MCTSNode(object):
"""A node of a MCTS search tree.
A node knows how to compute the action scores of all of its children,
so that a decision can be made about which move to explore next. Upon
selecting a move, the children dictionary is updated with a new node.
position: A go.Position instance
fmove: A move (coordinate) that led to this position, as a flattened coord
(raw number between 0-N^2, with None a pass)
parent: A parent MCTSNode.
"""
def __init__(self, board_size, position, fmove=None, parent=None):
if parent is None:
parent = DummyNode(board_size)
self.board_size = board_size
self.parent = parent
self.fmove = fmove # move that led to this position, as flattened coords
self.position = position
self.is_expanded = False
self.losses_applied = 0 # number of virtual losses on this node
# using child_() allows vectorized computation of action score.
self.illegal_moves = 1000 * (1 - self.position.all_legal_moves())
self.child_N = np.zeros([board_size * board_size + 1], dtype=np.float32)
self.child_W = np.zeros([board_size * board_size + 1], dtype=np.float32)
# save a copy of the original prior before it gets mutated by d-noise.
self.original_prior = np.zeros([board_size * board_size + 1],
dtype=np.float32)
self.child_prior = np.zeros([board_size * board_size + 1], dtype=np.float32)
self.children = {} # map of flattened moves to resulting MCTSNode
def __repr__(self):
return "<MCTSNode move=%s, N=%s, to_play=%s>" % (
self.position.recent[-1:], self.N, self.position.to_play)
@property
def child_action_score(self):
return (self.child_Q * self.position.to_play
+ self.child_U - self.illegal_moves)
@property
def child_Q(self):
return self.child_W / (1 + self.child_N)
@property
def child_U(self):
return (c_PUCT * math.sqrt(1 + self.N) *
self.child_prior / (1 + self.child_N))
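# Taken together, the three properties above implement the PUCT action score
# from the AlphaGo paper referenced in the module docstring:
#   score(a) = Q(a) * to_play + c_PUCT * sqrt(1 + N_parent) * P(a) / (1 + N(a))
# minus a large penalty for illegal moves, where Q(a) = W(a) / (1 + N(a)).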
@property
def Q(self):
return self.W / (1 + self.N)
@property
def N(self):
return self.parent.child_N[self.fmove]
@N.setter
def N(self, value):
self.parent.child_N[self.fmove] = value
@property
def W(self):
return self.parent.child_W[self.fmove]
@W.setter
def W(self, value):
self.parent.child_W[self.fmove] = value
@property
def Q_perspective(self):
"Return value of position, from perspective of player to play."
return self.Q * self.position.to_play
def select_leaf(self):
current = self
pass_move = self.board_size * self.board_size
while True:
current.N += 1
# if a node has never been evaluated, we have no basis to select a child.
if not current.is_expanded:
break
# HACK: if last move was a pass, always investigate double-pass first
# to avoid situations where we auto-lose by passing too early.
if (current.position.recent
and current.position.recent[-1].move is None
and current.child_N[pass_move] == 0):
current = current.maybe_add_child(pass_move)
continue
best_move = np.argmax(current.child_action_score)
current = current.maybe_add_child(best_move)
return current
def maybe_add_child(self, fcoord):
"""Add child node for fcoord if it doesn't already exist, and returns it."""
if fcoord not in self.children:
new_position = self.position.play_move(
coords.from_flat(self.board_size, fcoord))
self.children[fcoord] = MCTSNode(
self.board_size, new_position, fmove=fcoord, parent=self)
return self.children[fcoord]
def add_virtual_loss(self, up_to):
"""Propagate a virtual loss up to the root node.
Args:
up_to: The node to propagate until. (Keep track of this! You'll
need it to reverse the virtual loss later.)
"""
self.losses_applied += 1
# This is a "win" for the current node; hence a loss for its parent node
# who will be deciding whether to investigate this node again.
loss = self.position.to_play
self.W += loss
if self.parent is None or self is up_to:
return
self.parent.add_virtual_loss(up_to)
def revert_virtual_loss(self, up_to):
self.losses_applied -= 1
revert = -1 * self.position.to_play
self.W += revert
if self.parent is None or self is up_to:
return
self.parent.revert_virtual_loss(up_to)
def revert_visits(self, up_to):
"""Revert visit increments.
Sometimes, repeated calls to select_leaf return the same node.
This is rare and we're okay with the wasted computation to evaluate
the position multiple times by the dual_net. But select_leaf has the
side effect of incrementing visit counts. Since we want the value to
only count once for the repeatedly selected node, we also have to
revert the incremented visit counts.
"""
self.N -= 1
if self.parent is None or self is up_to:
return
self.parent.revert_visits(up_to)
def incorporate_results(self, move_probabilities, value, up_to):
assert move_probabilities.shape == (self.board_size * self.board_size + 1,)
# A finished game should not be going through this code path - should
# directly call backup_value() on the result of the game.
assert not self.position.is_game_over()
if self.is_expanded:
self.revert_visits(up_to=up_to)
return
self.is_expanded = True
self.original_prior = self.child_prior = move_probabilities
# initialize child Q as current node's value, to prevent dynamics where
# if B is winning, then B will only ever explore 1 move, because the Q
# estimation will be so much larger than the 0 of the other moves.
#
# Conversely, if W is winning, then B will explore all 362 moves before
# continuing to explore the most favorable move. This is a waste of search.
#
# The value seeded here acts as a prior, and gets averaged into
# Q calculations.
self.child_W = np.ones([self.board_size * self.board_size + 1],
dtype=np.float32) * value
self.backup_value(value, up_to=up_to)
def backup_value(self, value, up_to):
"""Propagates a value estimation up to the root node.
Args:
value: the value to be propagated (1 = black wins, -1 = white wins)
up_to: the node to propagate until.
"""
self.W += value
if self.parent is None or self is up_to:
return
self.parent.backup_value(value, up_to)
def is_done(self):
'''True if the last two moves were Pass or if the position is at a move
greater than the max depth.
'''
max_depth = (self.board_size ** 2) * 1.4 # 505 moves for 19x19, 113 for 9x9
return self.position.is_game_over() or self.position.n >= max_depth
def inject_noise(self):
dirch = np.random.dirichlet([D_NOISE_ALPHA(self.board_size)] * (
(self.board_size * self.board_size) + 1))
self.child_prior = self.child_prior * 0.75 + dirch * 0.25
def children_as_pi(self, squash=False):
"""Returns the child visit counts as a probability distribution, pi
If squash is true, exponentiate the probabilities by a temperature
slightly larger than unity to encourage diversity in early play and
hopefully to move away from 3-3s
"""
probs = self.child_N
if squash:
probs = probs ** .95
return probs / np.sum(probs)
def most_visited_path(self):
node = self
output = []
while node.children:
next_kid = np.argmax(node.child_N)
node = node.children.get(next_kid)
if node is None:
output.append("GAME END")
break
output.append("%s (%d) ==> " % (
coords.to_kgs(self.board_size,
coords.from_flat(self.board_size, node.fmove)), node.N))
output.append("Q: {:.5f}\n".format(node.Q))
return ''.join(output)
def mvp_gg(self):
""" Returns most visited path in go-gui VAR format e.g. 'b r3 w c17..."""
node = self
output = []
while node.children and max(node.child_N) > 1:
next_kid = np.argmax(node.child_N)
node = node.children[next_kid]
output.append("%s" % coords.to_kgs(
self.board_size, coords.from_flat(self.board_size, node.fmove)))
return ' '.join(output)
def describe(self):
sort_order = list(range(self.board_size * self.board_size + 1))
sort_order.sort(key=lambda i: (
self.child_N[i], self.child_action_score[i]), reverse=True)
soft_n = self.child_N / sum(self.child_N)
p_delta = soft_n - self.child_prior
p_rel = p_delta / self.child_prior
# Dump out some statistics
output = []
output.append("{q:.4f}\n".format(q=self.Q))
output.append(self.most_visited_path())
output.append(
"move: action Q U P P-Dir N soft-N" +
" p-delta p-rel\n")
output.append(
"\n".join(["{!s:6}: {: .3f}, {: .3f}, {:.3f}, {:.3f}, {:.3f}, {:4d} {:.4f} {: .5f} {: .2f}".format(
coords.to_kgs(self.board_size, coords.from_flat(self.board_size, key)),
self.child_action_score[key],
self.child_Q[key],
self.child_U[key],
self.child_prior[key],
self.original_prior[key],
int(self.child_N[key]),
soft_n[key],
p_delta[key],
p_rel[key])
for key in sort_order][:15]))
return "".join(output)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Train MiniGo with several iterations of RL learning.
One iteration of RL learning consists of bootstrap, selfplay, gather and train:
bootstrap: Initialize a random model
selfplay: Play games with the latest model to produce data used for training
gather: Group games played with the same model into larger files of tfexamples
train: Train a new model with the selfplay results from the most recent
N generations.
After training, validation can be performed on the holdout data.
Given two models, evaluation can be applied to choose a stronger model.
The training pipeline consists of multiple RL learning iterations to achieve
better models.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import random
import socket
import sys
import time
import tensorflow as tf # pylint: disable=g-bad-import-order
import dualnet
import evaluation
import go
import model_params
import preprocessing
import selfplay_mcts
import utils
_TF_RECORD_SUFFIX = '.tfrecord.zz'
def _ensure_dir_exists(directory):
"""Check if directory exists. If not, create it.
Args:
directory: A given directory
"""
if not os.path.isdir(directory):
tf.gfile.MakeDirs(directory)
def bootstrap(estimator_model_dir, trained_models_dir, params):
"""Initialize the model with random weights.
Args:
estimator_model_dir: tf.estimator model directory.
trained_models_dir: Dir to save the trained models. Here to export the first
bootstrapped generation.
params: An object of hyperparameters for the model.
"""
bootstrap_name = utils.generate_model_name(0)
_ensure_dir_exists(trained_models_dir)
bootstrap_model_path = os.path.join(trained_models_dir, bootstrap_name)
_ensure_dir_exists(estimator_model_dir)
print('Bootstrapping with working dir {}\n Model 0 exported to {}'.format(
estimator_model_dir, bootstrap_model_path))
dualnet.bootstrap(estimator_model_dir, params)
dualnet.export_model(estimator_model_dir, bootstrap_model_path)
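# Note: utils.generate_model_name(0) yields '000000-bootstrap', so the first
# export lands at '<trained_models_dir>/000000-bootstrap', which later
# selfplay/train steps pick up via utils.get_latest_model.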
def selfplay(model_name, trained_models_dir, selfplay_dir, holdout_dir, sgf_dir,
params):
"""Perform selfplay with a specific model.
Args:
model_name: The name of the model used for selfplay.
trained_models_dir: The path to the model files.
selfplay_dir: Where to write the games. Set as 'base_dir/data/selfplay/'.
holdout_dir: Where to write the holdout data. Set as
'base_dir/data/holdout/'.
sgf_dir: Where to write the sgf (Smart Game Format) files. Set as
'base_dir/sgf/'.
params: An object of hyperparameters for the model.
"""
print('Playing a game with model {}'.format(model_name))
# Set paths for the model with 'model_name'
model_path = os.path.join(trained_models_dir, model_name)
output_dir = os.path.join(selfplay_dir, model_name)
holdout_dir = os.path.join(holdout_dir, model_name)
# clean_sgf is to write sgf file without comments.
# full_sgf is to write sgf file with comments.
clean_sgf = os.path.join(sgf_dir, model_name, 'clean')
full_sgf = os.path.join(sgf_dir, model_name, 'full')
_ensure_dir_exists(output_dir)
_ensure_dir_exists(holdout_dir)
_ensure_dir_exists(clean_sgf)
_ensure_dir_exists(full_sgf)
with utils.logged_timer('Loading weights from {} ... '.format(model_path)):
network = dualnet.DualNetRunner(model_path, params)
with utils.logged_timer('Playing game'):
player = selfplay_mcts.play(
params.board_size, network, params.selfplay_readouts,
params.selfplay_resign_threshold, params.simultaneous_leaves,
params.selfplay_verbose)
output_name = '{}-{}'.format(int(time.time()), socket.gethostname())
def _write_sgf_data(dir_sgf, use_comments):
with tf.gfile.GFile(
os.path.join(dir_sgf, '{}.sgf'.format(output_name)), 'w') as f:
f.write(player.to_sgf(use_comments=use_comments))
_write_sgf_data(clean_sgf, use_comments=False)
_write_sgf_data(full_sgf, use_comments=True)
game_data = player.extract_data()
tf_examples = preprocessing.make_dataset_from_selfplay(game_data, params)
# Hold out a fraction (params.holdout_pct) of games for validation.
if random.random() < params.holdout_pct:
fname = os.path.join(
holdout_dir, ('{}'+_TF_RECORD_SUFFIX).format(output_name))
else:
fname = os.path.join(
output_dir, ('{}'+_TF_RECORD_SUFFIX).format(output_name))
preprocessing.write_tf_examples(fname, tf_examples)
def gather(selfplay_dir, training_chunk_dir, params):
"""Gather selfplay data into large training chunk.
Args:
selfplay_dir: Where to look for games. Set as 'base_dir/data/selfplay/'.
training_chunk_dir: where to put collected games. Set as
'base_dir/data/training_chunks/'.
params: An object of hyperparameters for the model.
"""
# Check the selfplay data from the most recent 50 models.
_ensure_dir_exists(training_chunk_dir)
sorted_model_dirs = sorted(tf.gfile.ListDirectory(selfplay_dir))
models = [model_dir.strip('/')
for model_dir in sorted_model_dirs[-params.gather_generation:]]
with utils.logged_timer('Finding existing tfrecords...'):
model_gamedata = {
model: tf.gfile.Glob(
os.path.join(selfplay_dir, model, '*'+_TF_RECORD_SUFFIX))
for model in models
}
print('Found {} models'.format(len(models)))
for model_name, record_files in sorted(model_gamedata.items()):
print(' {}: {} files'.format(model_name, len(record_files)))
meta_file = os.path.join(training_chunk_dir, 'meta.txt')
try:
with tf.gfile.GFile(meta_file, 'r') as f:
already_processed = set(f.read().split())
except tf.errors.NotFoundError:
already_processed = set()
num_already_processed = len(already_processed)
for model_name, record_files in sorted(model_gamedata.items()):
if set(record_files) <= already_processed:
continue
print('Gathering files from {}:'.format(model_name))
tf_examples = preprocessing.shuffle_tf_examples(
params.shuffle_buffer_size, params.examples_per_chunk, record_files)
# tqdm to make the loops show a smart progress meter
for i, example_batch in enumerate(tf_examples):
output_record = os.path.join(
training_chunk_dir,
('{}-{}'+_TF_RECORD_SUFFIX).format(model_name, str(i)))
preprocessing.write_tf_examples(
output_record, example_batch, serialize=False)
already_processed.update(record_files)
print('Processed {} new files'.format(
len(already_processed) - num_already_processed))
with tf.gfile.GFile(meta_file, 'w') as f:
f.write('\n'.join(sorted(already_processed)))
def train(trained_models_dir, estimator_model_dir, training_chunk_dir, params):
"""Train the latest model from gathered data.
Args:
trained_models_dir: Where to export the completed generation.
estimator_model_dir: tf.estimator model directory.
training_chunk_dir: Directory where gathered training chunks are.
params: An object of hyperparameters for the model.
"""
model_num, model_name = utils.get_latest_model(trained_models_dir)
print('Initializing from model {}'.format(model_name))
new_model_name = utils.generate_model_name(model_num + 1)
print('New model will be {}'.format(new_model_name))
save_file = os.path.join(trained_models_dir, new_model_name)
tf_records = sorted(
tf.gfile.Glob(os.path.join(training_chunk_dir, '*'+_TF_RECORD_SUFFIX)))
tf_records = tf_records[
-(params.train_window_size // params.examples_per_chunk):]
print('Training from: {} to {}'.format(tf_records[0], tf_records[-1]))
with utils.logged_timer('Training'):
dualnet.train(estimator_model_dir, tf_records, model_num + 1, params)
dualnet.export_model(estimator_model_dir, save_file)
def validate(trained_models_dir, holdout_dir, estimator_model_dir, params):
"""Validate the latest model on the holdout dataset.
Args:
trained_models_dir: Directories where the completed generations/models are.
holdout_dir: Directories where holdout data are.
estimator_model_dir: tf.estimator model directory.
params: An object of hyperparameters for the model.
"""
model_num, _ = utils.get_latest_model(trained_models_dir)
# Get the holdout game data
nums_names = utils.get_models(trained_models_dir)
# Model N was trained on games up through model N-1, so the validation set
# should only be for models through N-1 as well, thus the (model_num) term.
models = [num_name for num_name in nums_names if num_name[0] < model_num]
# pair is a tuple of (model_num, model_name), like (13, 000013-modelname)
holdout_dirs = [os.path.join(holdout_dir, pair[1])
for pair in models[-params.holdout_generation:]]
tf_records = []
with utils.logged_timer('Building lists of holdout files'):
for record_dir in holdout_dirs:
if os.path.exists(record_dir): # make sure holdout dir exists
tf_records.extend(
tf.gfile.Glob(os.path.join(record_dir, '*'+_TF_RECORD_SUFFIX)))
print('The length of tf_records is {}.'.format(len(tf_records)))
first_tf_record = os.path.basename(tf_records[0])
last_tf_record = os.path.basename(tf_records[-1])
with utils.logged_timer('Validating from {} to {}'.format(
first_tf_record, last_tf_record)):
dualnet.validate(estimator_model_dir, tf_records, params)
def evaluate(trained_models_dir, black_model_name, white_model_name,
evaluate_dir, params):
"""Evaluate with two models.
With the model names, construct two DualNetRunners to play as black and white
in a Go match. The two models play several games, and the model that achieves
a win rate of at least eval_win_rate (55% by default) is declared the winner.
Args:
trained_models_dir: Directories where the completed generations/models are.
black_model_name: The name of the model playing black.
white_model_name: The name of the model playing white.
evaluate_dir: Where to write the evaluation results. Set as
'base_dir/sgf/evaluate/''
params: An object of hyperparameters for the model.
Returns:
The model name of the winner.
Raises:
ValueError: if neither `WHITE` nor `BLACK` is returned.
"""
black_model = os.path.join(trained_models_dir, black_model_name)
white_model = os.path.join(trained_models_dir, white_model_name)
print('Evaluate models between {} and {}'.format(
black_model_name, white_model_name))
_ensure_dir_exists(evaluate_dir)
with utils.logged_timer('Loading weights'):
black_net = dualnet.DualNetRunner(black_model, params)
white_net = dualnet.DualNetRunner(white_model, params)
with utils.logged_timer('{} games'.format(params.eval_games)):
winner = evaluation.play_match(
params, black_net, white_net, params.eval_games,
params.eval_readouts, evaluate_dir, params.eval_verbose)
if winner != go.WHITE_NAME and winner != go.BLACK_NAME:
raise ValueError('Winner should be either White or Black!')
return black_model_name if winner == go.BLACK_NAME else white_model_name
def _set_params_from_board_size(board_size):
"""Set hyperparameters from board size."""
params = model_params.MiniGoParams()
k = utils.round_power_of_two(board_size ** 2 / 3)
params.num_filters = k # Number of filters in the convolution layer
params.fc_width = 2 * k # Width of each fully connected layer
params.num_shared_layers = board_size # Number of shared trunk layers
params.board_size = board_size # Board size
# How many positions can fit on a graphics card. 256 for 9s, 16 or 32 for 19s.
if board_size == 9:
params.batch_size = 256
else:
params.batch_size = 32
return params
def main(_):
"""Run the reinforcement learning loop."""
tf.logging.set_verbosity(tf.logging.INFO)
params = _set_params_from_board_size(FLAGS.board_size)
# A dummy model for debug/testing purpose with fewer games and iterations
if FLAGS.debug:
params = model_params.DummyMiniGoParams()
# Set directories for models and datasets
base_dir = FLAGS.base_dir + str(FLAGS.board_size) + '_board_size/'
dirs = utils.MiniGoDirectory(base_dir)
# if no models have been trained, start from bootstrap model
if not os.path.isdir(base_dir):
print('No trained model exists! Starting from Bootstrap...')
print('Creating random initial weights...')
bootstrap(dirs.estimator_model_dir, dirs.trained_models_dir, params)
else:
print('A MiniGo base directory has been found! ')
print('Start from the last checkpoint...')
_, best_model_so_far = utils.get_latest_model(dirs.trained_models_dir)
for rl_iter in range(params.max_iters_per_pipeline):
print('RL_iteration: {}'.format(rl_iter))
# Self-play to generate at least params.max_games_per_generation games
selfplay(best_model_so_far, dirs.trained_models_dir, dirs.selfplay_dir,
dirs.holdout_dir, dirs.sgf_dir, params)
games = tf.gfile.Glob(
os.path.join(dirs.selfplay_dir, best_model_so_far, '*.zz'))
while len(games) < params.max_games_per_generation:
selfplay(best_model_so_far, dirs.trained_models_dir, dirs.selfplay_dir,
dirs.holdout_dir, dirs.sgf_dir, params)
if FLAGS.validation:
params = model_params.DummyValidationParams()
selfplay(best_model_so_far, dirs.trained_models_dir, dirs.selfplay_dir,
dirs.holdout_dir, dirs.sgf_dir, params)
games = tf.gfile.Glob(
os.path.join(dirs.selfplay_dir, best_model_so_far, '*.zz'))
print('Gathering game output...')
gather(dirs.selfplay_dir, dirs.training_chunk_dir, params)
print('Training on gathered game data...')
train(dirs.trained_models_dir, dirs.estimator_model_dir,
dirs.training_chunk_dir, params)
if FLAGS.validation:
print('Validating on the holdout game data...')
validate(dirs.trained_models_dir, dirs.holdout_dir,
dirs.estimator_model_dir, params)
_, current_model = utils.get_latest_model(dirs.trained_models_dir)
if FLAGS.evaluation: # Perform evaluation if needed
print('Evaluating the latest model...')
best_model_so_far = evaluate(
dirs.trained_models_dir, best_model_so_far, current_model,
dirs.evaluate_dir, params)
print('Winner: {}!'.format(best_model_so_far))
else:
best_model_so_far = current_model
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--base_dir',
type=str,
default='/tmp/minigo/',
metavar='BD',
help='Base directory for the MiniGo models and datasets.')
parser.add_argument(
'--board_size',
type=int,
default=9,
metavar='N',
choices=[9, 19],
help='Go board size. The default size is 9.')
parser.add_argument(
'--evaluation',
action='store_true',
help='A boolean to specify evaluation in the RL pipeline.')
parser.add_argument(
'--debug',
action='store_true',
help='A boolean to indicate debug mode for testing purpose.')
parser.add_argument(
'--validation',
action='store_true',
help='A boolean to explicitly generate holdout data for validation.')
FLAGS, unparsed = parser.parse_known_args()
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Defines MiniGo parameters."""
class MiniGoParams(object):
"""Parameters for MiniGo."""
# Go params
board_size = 9
# RL pipeline
max_games_per_generation = 10 # Number of games per selfplay generation
max_iters_per_pipeline = 2 # Number of RL iterations in one pipeline
# The shuffle buffer size determines how far an example could end up from
# where it started; this and the interleave parameters in preprocessing can
# give us an approximation of a uniform sampling. The default of 2M positions
# is used in training, but smaller numbers can be used for aggregation or validation.
shuffle_buffer_size = 2000000 # shuffle buffer size in preprocessing
# dual_net
# How many positions to look at per generation.
# Per AlphaGo Zero (AGZ), 2048 minibatch * 1k = 2M positions/generation
examples_per_generation = 2000000
# for learning rate
l2_strength = 1e-4 # Regularization strength
momentum = 0.9 # Momentum used in SGD
kernel_size = [3, 3] # kernel size of conv and res blocks is from AGZ paper
# selfplay
selfplay_readouts = 100 # How many simulations to run per move
selfplay_verbose = 1 # >=2 will print debug info, >=3 will print boards
# an absolute value of threshold to resign at
selfplay_resign_threshold = 0.95
# the number of simultaneous leaves in MCTS
simultaneous_leaves = 8
holdout_pct = 0.05 # How many games to hold out for validation
holdout_generation = 50 # How many recent generations/models for holdout data
# gather
gather_generation = 50 # How many recent generations/models for gathered data
# How many positions we should aggregate per 'chunk'.
examples_per_chunk = 10000
# How many positions to draw from for our training window.
# AGZ used the most recent 500k games, which, assuming 250 moves/game = 125M
train_window_size = 125000000
# evaluation
eval_games = 50 # The number of games to play in evaluation
eval_readouts = 100 # How many readouts to make per move in evaluation
eval_verbose = 1 # How verbose the players should be in evaluation
eval_win_rate = 0.55 # The winner needs a win rate of at least 55%.
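# With the defaults above, that is 0.55 * 50 = 27.5, i.e. at least 28 of the
# 50 evaluation games, assuming eval_win_rate is applied as a simple
# win-count threshold in evaluation.play_match.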
class DummyMiniGoParams(MiniGoParams):
"""Parameters for a dummy model."""
num_filters = 8 # Number of filters in the convolution layer
fc_width = 16 # Width of each fully connected layer
num_shared_layers = 1 # Number of shared trunk layers
batch_size = 16
examples_per_generation = 64
max_games_per_generation = 2
max_iters_per_pipeline = 1
selfplay_readouts = 10
shuffle_buffer_size = 1000
# evaluation
eval_games = 10 # The number of games to play in evaluation
eval_readouts = 10 # How many readouts to make per move in evaluation
eval_verbose = 1 # How verbose the players should be in evaluation
class DummyValidationParams(DummyMiniGoParams, MiniGoParams):
"""Parameters for a dummy model."""
holdout_pct = 1 # Set holdout percent as 1 for validation testing purpose
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Preprocessing step to create, read, write tf.Examples."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import random
import tensorflow as tf # pylint: disable=g-bad-import-order
import coords
import features as features_lib
import numpy as np
import sgf_wrapper
TF_RECORD_CONFIG = tf.python_io.TFRecordOptions(
tf.python_io.TFRecordCompressionType.ZLIB)
# Constructing tf.Examples
def _one_hot(board_size, index):
onehot = np.zeros([board_size * board_size + 1], dtype=np.float32)
onehot[index] = 1
return onehot
def make_tf_example(features, pi, value):
"""
Args:
features: [N, N, FEATURE_DIM] nparray of uint8
pi: [N * N + 1] nparray of float32
value: float
"""
return tf.train.Example(
features=tf.train.Features(
feature={
'x': tf.train.Feature(
bytes_list=tf.train.BytesList(value=[features.tostring()])),
'pi': tf.train.Feature(
bytes_list=tf.train.BytesList(value=[pi.tostring()])),
'outcome': tf.train.Feature(
float_list=tf.train.FloatList(value=[value]))
}))
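# For a 9x9 board the expected shapes are: features
# [9, 9, features_lib.NEW_FEATURES_PLANES] as uint8, pi [82] as float32 (one
# entry per intersection plus the pass move), and a scalar outcome in [-1, 1].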
# Write tf.Example to files
def write_tf_examples(filename, tf_examples, serialize=True):
"""
Args:
filename: Where to write tf.records
tf_examples: An iterable of tf.Example
serialize: whether to serialize the examples.
"""
with tf.python_io.TFRecordWriter(
filename, options=TF_RECORD_CONFIG) as writer:
for ex in tf_examples:
if serialize:
writer.write(ex.SerializeToString())
else:
writer.write(ex)
# Read tf.Example from files
def _batch_parse_tf_example(board_size, batch_size, example_batch):
"""
Args:
example_batch: a batch of tf.Example
Returns:
A tuple (feature_tensor, dict of output tensors)
"""
features = {
'x': tf.FixedLenFeature([], tf.string),
'pi': tf.FixedLenFeature([], tf.string),
'outcome': tf.FixedLenFeature([], tf.float32),
}
parsed = tf.parse_example(example_batch, features)
x = tf.decode_raw(parsed['x'], tf.uint8)
x = tf.cast(x, tf.float32)
x = tf.reshape(x, [batch_size, board_size, board_size,
features_lib.NEW_FEATURES_PLANES])
pi = tf.decode_raw(parsed['pi'], tf.float32)
pi = tf.reshape(pi, [batch_size, board_size * board_size + 1])
outcome = parsed['outcome']
outcome.set_shape([batch_size])
return (x, {'pi_tensor': pi, 'value_tensor': outcome})
def read_tf_records(
shuffle_buffer_size, batch_size, tf_records, num_repeats=None,
shuffle_records=True, shuffle_examples=True, filter_amount=1.0):
"""
Args:
batch_size: batch size to return
tf_records: a list of tf_record filenames
num_repeats: how many times the data should be read (default: infinite)
shuffle_records: whether to shuffle the order of files read
shuffle_examples: whether to shuffle the tf.Examples
shuffle_buffer_size: how big of a buffer to fill before shuffling.
filter_amount: what fraction of records to keep
Returns:
a tf dataset of batched tensors
"""
if shuffle_buffer_size is None:
raise ValueError('shuffle_buffer_size must be specified')
if shuffle_records:
random.shuffle(tf_records)
record_list = tf.data.Dataset.from_tensor_slices(tf_records)
# compression_type here must agree with write_tf_examples
# cycle_length = how many tfrecord files are read in parallel
# block_length = how many tf.Examples are read from each file before
# moving to the next file
# The idea is to shuffle both the order of the files being read,
# and the examples being read from the files.
dataset = record_list.interleave(
lambda x: tf.data.TFRecordDataset(x, compression_type='ZLIB'),
cycle_length=64, block_length=16)
dataset = dataset.filter(lambda x: tf.less(
tf.random_uniform([1]), filter_amount)[0])
# TODO(amj): apply py_func for transforms here.
if num_repeats is not None:
dataset = dataset.repeat(num_repeats)
else:
dataset = dataset.repeat()
if shuffle_examples:
dataset = dataset.shuffle(buffer_size=shuffle_buffer_size)
dataset = dataset.batch(batch_size)
return dataset
def get_input_tensors(params, batch_size, tf_records, num_repeats=None,
shuffle_records=True, shuffle_examples=True,
filter_amount=0.05):
"""Read tf.Records and prepare them for ingestion by dual_net. See
`read_tf_records` for parameter documentation.
Returns a dict of tensors (see return value of batch_parse_tf_example)
"""
shuffle_buffer_size = params.shuffle_buffer_size
dataset = read_tf_records(
shuffle_buffer_size, batch_size, tf_records, num_repeats=num_repeats,
shuffle_records=shuffle_records, shuffle_examples=shuffle_examples,
filter_amount=filter_amount)
dataset = dataset.filter(lambda t: tf.equal(tf.shape(t)[0], batch_size))
def batch_parse_tf_example(batch_size, dataset):
return _batch_parse_tf_example(params.board_size, batch_size, dataset)
dataset = dataset.map(functools.partial(
batch_parse_tf_example, batch_size))
return dataset.make_one_shot_iterator().get_next()
# End-to-end utility functions
def make_dataset_from_selfplay(data_extracts, params):
"""Make an iterable of tf.Examples.
Args:
data_extracts: An iterable of (position, pi, result) tuples
Returns an iterable of tf.Examples.
"""
board_size = params.board_size
tf_examples = (make_tf_example(features_lib.extract_features(
board_size, pos), pi, result) for pos, pi, result in data_extracts)
return tf_examples
def make_dataset_from_sgf(board_size, sgf_filename, tf_record):
pwcs = sgf_wrapper.replay_sgf_file(board_size, sgf_filename)
def make_tf_example_from_pwc(pwcs):
return _make_tf_example_from_pwc(board_size, pwcs)
tf_examples = map(make_tf_example_from_pwc, pwcs)
write_tf_examples(tf_record, tf_examples)
def _make_tf_example_from_pwc(board_size, position_w_context):
features = features_lib.extract_features(
board_size, position_w_context.position)
pi = _one_hot(board_size, coords.to_flat(position_w_context.next_move))
value = position_w_context.result
return make_tf_example(features, pi, value)
def shuffle_tf_examples(shuffle_buffer_size, gather_size, records_to_shuffle):
"""Read through tf.Record and yield shuffled, but unparsed tf.Examples.
Args:
shuffle_buffer_size: the size for shuffle buffer
gather_size: The number of tf.Examples to be gathered together
records_to_shuffle: A list of filenames
Returns:
An iterator yielding lists of bytes, which are serialized tf.Examples.
"""
dataset = read_tf_records(shuffle_buffer_size, gather_size,
records_to_shuffle, num_repeats=1)
batch = dataset.make_one_shot_iterator().get_next()
sess = tf.Session()
while True:
try:
result = sess.run(batch)
yield list(result)
except tf.errors.OutOfRangeError:
break
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Play a self-play match with a given DualNet model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import random
import sys
import time
import coords
from gtp_wrapper import MCTSPlayer
def play(board_size, network, readouts, resign_threshold, simultaneous_leaves,
verbosity=0):
"""Plays out a self-play match.
Args:
board_size: the go board size
network: the DualNet model
readouts: the number of readouts in MCTS
resign_threshold: the threshold to resign at in the match
simultaneous_leaves: the number of simultaneous leaves in MCTS
verbosity: the verbosity of the self-play match
Returns:
the MCTSPlayer instance holding the finished game. Its extract_data() method
yields (position, pi, result) tuples for each of the n moves played, where
pi is a vector of length board_size * board_size + 1 of MCTS search
probabilities and result is the game outcome.
"""
player = MCTSPlayer(board_size, network, resign_threshold=resign_threshold,
verbosity=verbosity, num_parallel=simultaneous_leaves)
# Disable resign in 5% of games
if random.random() < 0.05:
player.resign_threshold = -1.0
player.initialize_game()
# Must run this once at the start, so that noise injection actually
# affects the first move of the game.
first_node = player.root.select_leaf()
prob, val = network.run(first_node.position)
first_node.incorporate_results(prob, val, first_node)
while True:
start = time.time()
player.root.inject_noise()
current_readouts = player.root.N
# we want to do "X additional readouts", rather than "up to X readouts".
while player.root.N < current_readouts + readouts:
player.tree_search()
if verbosity >= 3:
print(player.root.position)
print(player.root.describe())
if player.should_resign():
player.set_result(-1 * player.root.position.to_play, was_resign=True)
break
move = player.pick_move()
player.play_move(move)
if player.root.is_done():
player.set_result(player.root.position.result(), was_resign=False)
break
if (verbosity >= 2) or (
verbosity >= 1 and player.root.position.n % 10 == 9):
print("Q: {:.5f}".format(player.root.Q))
dur = time.time() - start
print("%d: %d readouts, %.3f s/100. (%.2f sec)" % (
player.root.position.n, readouts, dur / readouts * 100.0, dur))
if verbosity >= 3:
print("Played >>",
coords.to_kgs(coords.from_flat(player.root.fmove)))
if verbosity >= 2:
print("%s: %.3f" % (player.result_string, player.root.Q), file=sys.stderr)
print(player.root.position,
player.root.position.score(), file=sys.stderr)
return player
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Code to extract a series of positions + their next moves from an SGF.
Most of the complexity here is dealing with two features of SGF:
- Stones can be added via "play move" or "add move", the latter being used
to configure L+D puzzles, but also for initial handicap placement.
- Plays don't necessarily alternate colors; they can be repeated B or W moves.
This feature is used to handle free handicap placement.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import namedtuple
import coords
import go
from go import Position, PositionWithContext
import numpy as np
import sgf
import utils
SGF_TEMPLATE = """(;GM[1]FF[4]CA[UTF-8]AP[Minigo_sgfgenerator]RU[{ruleset}]
SZ[{boardsize}]KM[{komi}]PW[{white_name}]PB[{black_name}]RE[{result}]
{game_moves})"""
PROGRAM_IDENTIFIER = "Minigo"
def translate_sgf_move_qs(player_move, q):
return "{move}C[{q:.4f}]".format(
move=translate_sgf_move(player_move, None), q=q)
def translate_sgf_move(player_move, comment):
if player_move.color not in (go.BLACK, go.WHITE):
raise ValueError("Can't translate color %s to sgf" % player_move.color)
c = coords.to_sgf(player_move.move)
color = 'B' if player_move.color == go.BLACK else 'W'
if comment is not None:
comment = comment.replace(']', r'\]')
comment_node = "C[{}]".format(comment)
else:
comment_node = ""
return ";{color}[{coords}]{comment_node}".format(
color=color, coords=c, comment_node=comment_node)
def make_sgf(board_size,
move_history,
result_string,
ruleset="Chinese",
komi=7.5,
white_name=PROGRAM_IDENTIFIER,
black_name=PROGRAM_IDENTIFIER,
comments=[]
):
"""Turn a game into SGF.
Doesn't handle handicap games or positions with incomplete history.
Args:
move_history: iterable of PlayerMoves
result_string: "B+R", "W+0.5", etc.
comments: iterable of string/None. Will be zipped with move_history.
"""
try:
# Python 2
from itertools import izip_longest
zip_longest = izip_longest
except ImportError:
# Python 3
from itertools import zip_longest
boardsize = board_size
game_moves = ''.join(translate_sgf_move(*z)
for z in zip_longest(move_history, comments))
result = result_string
return SGF_TEMPLATE.format(**locals())
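# Sketch of the resulting header for a 9x9 self-play game (values illustrative):
#   (;GM[1]FF[4]CA[UTF-8]AP[Minigo_sgfgenerator]RU[Chinese]
#   SZ[9]KM[7.5]PW[Minigo]PB[Minigo]RE[B+R]
#   ;B[ee]C[comment];W[cc]...)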
def sgf_prop(value_list):
'Converts raw sgf library output to sensible value'
if value_list is None:
return None
if len(value_list) == 1:
return value_list[0]
else:
return value_list
def sgf_prop_get(props, key, default):
return sgf_prop(props.get(key, default))
def handle_node(board_size, pos, node):
'A node can either add B+W stones, play as B, or play as W.'
props = node.properties
black_stones_added = [coords.from_sgf(board_size,
c) for c in props.get('AB', [])]
white_stones_added = [coords.from_sgf(board_size,
c) for c in props.get('AW', [])]
if black_stones_added or white_stones_added:
return add_stones(pos, black_stones_added, white_stones_added)
# If B/W props are not present, then there is no move. But if it is present
# and equal to the empty string, then the move was a pass.
elif 'B' in props:
black_move = coords.from_sgf(board_size, props.get('B', [''])[0])
return pos.play_move(black_move, color=go.BLACK)
elif 'W' in props:
white_move = coords.from_sgf(board_size, props.get('W', [''])[0])
return pos.play_move(white_move, color=go.WHITE)
else:
return pos
def add_stones(pos, black_stones_added, white_stones_added):
working_board = np.copy(pos.board)
go.place_stones(working_board, go.BLACK, black_stones_added)
go.place_stones(working_board, go.WHITE, white_stones_added)
new_position = Position(board=working_board, n=pos.n, komi=pos.komi,
caps=pos.caps, ko=pos.ko, recent=pos.recent, to_play=pos.to_play)
return new_position
def get_next_move(board_size, node):
props = node.next.properties
if 'W' in props:
return coords.from_sgf(board_size, props['W'][0])
else:
return coords.from_sgf(board_size, props['B'][0])
def maybe_correct_next(pos, next_node):
if (('B' in next_node.properties and not pos.to_play == go.BLACK) or
('W' in next_node.properties and not pos.to_play == go.WHITE)):
pos.flip_playerturn(mutate=True)
def replay_sgf(board_size, sgf_contents):
"""
Wrapper for sgf files, returning go.PositionWithContext instances.
It does NOT return the very final position, as there is no follow up.
To get the final position, call pwc.position.play_move(pwc.next_move)
on the last PositionWithContext returned.
Example usage:
with open(filename) as f:
for position_w_context in replay_sgf(board_size, f.read()):
print(position_w_context.position)
"""
collection = sgf.parse(sgf_contents)
game = collection.children[0]
props = game.root.properties
assert int(sgf_prop(props.get('GM', ['1']))) == 1, "Not a Go SGF!"
komi = 0
if props.get('KM') is not None:
komi = float(sgf_prop(props.get('KM')))
result = utils.parse_game_result(sgf_prop(props.get('RE')))
pos = Position(komi=komi)
current_node = game.root
while pos is not None and current_node.next is not None:
pos = handle_node(board_size, pos, current_node)
maybe_correct_next(pos, current_node.next)
next_move = get_next_move(board_size, current_node)
yield PositionWithContext(pos, next_move, result)
current_node = current_node.next
def replay_sgf_file(board_size, sgf_file):
with open(sgf_file) as f:
for pwc in replay_sgf(board_size, f.read()):
yield pwc
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""The strategy to play each move with MCTS."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import random
import sys
import time
import coords
import go
from mcts import MCTSNode
import numpy as np
import sgf_wrapper
def time_recommendation(move_num, seconds_per_move=5, time_limit=15*60,
decay_factor=0.98):
"""
Given current move number and "desired" seconds per move,
return how much time should actually be used. To be used specifically
for CGOS time controls, which are absolute 15 minute time.
The strategy is to spend the maximum time possible using seconds_per_move,
and then switch to an exponentially decaying time usage, calibrated so that
we have enough time for an infinite number of moves.
"""
# divide by two since you only play half the moves in a game.
player_move_num = move_num / 2
# sum of geometric series maxes out at endgame_time seconds.
endgame_time = seconds_per_move / (1 - decay_factor)
if endgame_time > time_limit:
# there is so little main time that we're already in "endgame" mode.
base_time = time_limit * (1 - decay_factor)
return base_time * decay_factor ** player_move_num
# leave over endgame_time seconds for the end, and play at seconds_per_move
# for as long as possible
core_time = time_limit - endgame_time
core_moves = core_time / seconds_per_move
if player_move_num < core_moves:
return seconds_per_move
else:
return seconds_per_move * decay_factor ** (player_move_num - core_moves)
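# Worked example with the defaults (5 s/move, 15 min limit, decay 0.98):
# endgame_time = 5 / (1 - 0.98) = 250 s, core_time = 900 - 250 = 650 s,
# core_moves = 130, so the player spends 5 s/move for its first 130 moves and
# then decays exponentially, keeping the total within the absolute limit.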
def _get_temperature_cutoff(board_size):
# When to do deterministic move selection. ~30 moves on a 19x19, ~8 on 9x9
return int((board_size * board_size) / 12)
class MCTSPlayerMixin(object):
# If 'simulations_per_move' is nonzero, it will perform that many readouts
# before playing. Otherwise, it uses 'seconds_per_move' of wall time.
def __init__(self, board_size, network, seconds_per_move=5,
simulations_per_move=0, resign_threshold=-0.90,
verbosity=0, two_player_mode=False, num_parallel=8):
self.board_size = board_size
self.network = network
self.seconds_per_move = seconds_per_move
self.simulations_per_move = simulations_per_move
self.verbosity = verbosity
self.two_player_mode = two_player_mode
if two_player_mode:
self.temp_threshold = -1
else:
self.temp_threshold = _get_temperature_cutoff(board_size)
self.num_parallel = num_parallel
self.qs = []
self.comments = []
self.searches_pi = []
self.root = None
self.result = 0
self.result_string = None
self.resign_threshold = -abs(resign_threshold)
super(MCTSPlayerMixin, self).__init__(board_size)
def initialize_game(self, position=None):
if position is None:
position = go.Position(self.board_size)
self.root = MCTSNode(self.board_size, position)
self.result = 0
self.result_string = None
self.comments = []
self.searches_pi = []
self.qs = []
def suggest_move(self, position):
""" Used for playing a single game.
For parallel play, use initialize_move, select_leaf,
incorporate_results, and pick_move
"""
start = time.time()
if self.simulations_per_move == 0:
while time.time() - start < self.seconds_per_move:
self.tree_search()
else:
current_readouts = self.root.N
while self.root.N < current_readouts + self.simulations_per_move:
self.tree_search()
if self.verbosity > 0:
print("%d: Searched %d times in %s seconds\n\n" % (
position.n, self.simulations_per_move, time.time() - start),
file=sys.stderr)
# print some stats on anything with probability > 1%
if self.verbosity > 2:
print(self.root.describe(), file=sys.stderr)
print('\n\n', file=sys.stderr)
if self.verbosity > 3:
print(self.root.position, file=sys.stderr)
return self.pick_move()
def play_move(self, c):
"""
Notable side effects:
- appends the probability distribution derived from
this root's visit counts to the class' running tally, `searches_pi`
- Makes the node associated with this move the root, for future
`inject_noise` calls.
"""
if not self.two_player_mode:
self.searches_pi.append(
self.root.children_as_pi(self.root.position.n < self.temp_threshold))
self.qs.append(self.root.Q) # Save our resulting Q.
self.comments.append(self.root.describe())
self.root = self.root.maybe_add_child(coords.to_flat(self.board_size, c))
self.position = self.root.position # for showboard
del self.root.parent.children
return True # GTP requires positive result.
def pick_move(self):
"""Picks a move to play, based on MCTS readout statistics.
Highest N is most robust indicator. In the early stage of the game, pick
a move weighted by visit count; later on, pick the absolute max."""
if self.root.position.n > self.temp_threshold:
fcoord = np.argmax(self.root.child_N)
else:
cdf = self.root.child_N.cumsum()
cdf /= cdf[-1]
selection = random.random()
fcoord = cdf.searchsorted(selection)
assert self.root.child_N[fcoord] != 0
return coords.from_flat(self.board_size, fcoord)
def tree_search(self, num_parallel=None):
if num_parallel is None:
num_parallel = self.num_parallel
leaves = []
failsafe = 0
while len(leaves) < num_parallel and failsafe < num_parallel * 2:
failsafe += 1
leaf = self.root.select_leaf()
if self.verbosity >= 4:
print(self.show_path_to_root(leaf))
# if game is over, override the value estimate with the true score
if leaf.is_done():
value = 1 if leaf.position.score() > 0 else -1
leaf.backup_value(value, up_to=self.root)
continue
leaf.add_virtual_loss(up_to=self.root)
leaves.append(leaf)
if leaves:
move_probs, values = self.network.run_many(
[leaf.position for leaf in leaves])
for leaf, move_prob, value in zip(leaves, move_probs, values):
leaf.revert_virtual_loss(up_to=self.root)
leaf.incorporate_results(move_prob, value, up_to=self.root)
def show_path_to_root(self, node):
MAX_DEPTH = (self.board_size ** 2) * 1.4 # 505 moves for 19x19, 113 for 9x9
pos = node.position
diff = node.position.n - self.root.position.n
if len(pos.recent) == 0:
return
def fmt(move):
return "{}-{}".format('b' if move.color == 1 else 'w',
coords.to_kgs(self.board_size, move.move))
path = " ".join(fmt(move) for move in pos.recent[-diff:])
if node.position.n >= MAX_DEPTH:
path += " (depth cutoff reached) %0.1f" % node.position.score()
elif node.position.is_game_over():
path += " (game over) %0.1f" % node.position.score()
return path
def should_resign(self):
"""Returns true if the player resigned.
No further moves should be played.
"""
return self.root.Q_perspective < self.resign_threshold
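# With the default resign_threshold of -0.90, this corresponds to the player to
# move estimating roughly a (Q + 1) / 2 = 5% chance of winning.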
def set_result(self, winner, was_resign):
self.result = winner
if was_resign:
string = "B+R" if winner == go.BLACK else "W+R"
else:
string = self.root.position.result_string()
self.result_string = string
def to_sgf(self, use_comments=True):
assert self.result_string is not None
pos = self.root.position
if use_comments:
comments = self.comments or ['No comments.']
comments[0] = ("Resign Threshold: %0.3f\n" %
self.resign_threshold) + comments[0]
else:
comments = []
return sgf_wrapper.make_sgf(
self.board_size, pos.recent, self.result_string,
white_name=os.path.basename(self.network.save_file) or "Unknown",
black_name=os.path.basename(self.network.save_file) or "Unknown",
comments=comments)
def is_done(self):
return self.result != 0 or self.root.is_done()
def extract_data(self):
assert len(self.searches_pi) == self.root.position.n
assert self.result != 0
for pwc, pi in zip(go.replay_position(
self.board_size, self.root.position, self.result), self.searches_pi):
yield pwc.position, pi, pwc.result
def chat(self, msg_type, sender, text):
default_response = (
"Supported commands are 'winrate', 'nextplay', 'fortune', and 'help'.")
if self.root is None or self.root.position.n == 0:
return "I'm not playing right now. " + default_response
if 'winrate' in text.lower():
wr = (abs(self.root.Q) + 1.0) / 2.0
color = "Black" if self.root.Q > 0 else "White"
return "{:s} {:.2f}%".format(color, wr * 100.0)
elif 'nextplay' in text.lower():
return "I'm thinking... " + self.root.most_visited_path()
elif 'fortune' in text.lower():
return "You're feeling lucky!"
elif 'help' in text.lower():
return "I can't help much with go -- try ladders! Otherwise: {}".format(
default_response)
else:
return default_response
class CGOSPlayerMixin(MCTSPlayerMixin):
def suggest_move(self, position):
self.seconds_per_move = time_recommendation(position.n)
return super().suggest_move(position)
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Define symmetries for feature transformation.
Allowable symmetries:
identity [12][34]
rot90 [24][13]
rot180 [43][21]
rot270 [31][42]
flip [13][24]
fliprot90 [34][12]
fliprot180 [42][31]
fliprot270 [21][43]
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import random
import numpy as np
INVERSES = {
'identity': 'identity',
'rot90': 'rot270',
'rot180': 'rot180',
'rot270': 'rot90',
'flip': 'flip',
'fliprot90': 'fliprot90',
'fliprot180': 'fliprot180',
'fliprot270': 'fliprot270',
}
IMPLS = {
'identity': lambda x: x,
'rot90': np.rot90,
'rot180': functools.partial(np.rot90, k=2),
'rot270': functools.partial(np.rot90, k=3),
'flip': lambda x: np.rot90(np.fliplr(x)),
'fliprot90': np.flipud,
'fliprot180': lambda x: np.rot90(np.flipud(x)),
'fliprot270': np.fliplr,
}
assert set(INVERSES.keys()) == set(IMPLS.keys())
SYMMETRIES = list(INVERSES.keys())
# A symmetry is just a string describing the transformation.
def invert_symmetry(s):
return INVERSES[s]
def apply_symmetry_feat(s, features):
return IMPLS[s](features)
def apply_symmetry_pi(board_size, s, pi):
pi = np.copy(pi)
# rotate all moves except for the pass move at end
pi[:-1] = IMPLS[s](pi[:-1].reshape([board_size, board_size])).ravel()
return pi
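# Illustrative usage (a sketch, not part of the original file): applying a
# symmetry to a policy vector and then its inverse recovers the original, with
# the trailing pass-move probability left untouched.  For a 9x9 board:
#   pi = np.random.dirichlet([1.0] * (9 * 9 + 1))
#   rot = apply_symmetry_pi(9, 'rot90', pi)
#   assert np.allclose(pi, apply_symmetry_pi(9, invert_symmetry('rot90'), rot))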
def randomize_symmetries_feat(features):
symmetries_used = [random.choice(SYMMETRIES) for _ in features]
return symmetries_used, [apply_symmetry_feat(s, f)
for s, f in zip(symmetries_used, features)]
def invert_symmetries_pi(board_size, symmetries, pis):
return [apply_symmetry_pi(board_size, invert_symmetry(s), pi)
for s, pi in zip(symmetries, pis)]
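# Typical use of these helpers (a sketch inferred from this file, with
# `network.run_many` as a hypothetical inference call): randomly transform a
# batch of feature planes before inference, then map the predicted policies
# back with the inverse transforms:
#   syms, feats = randomize_symmetries_feat(features)
#   probs, values = network.run_many(feats)
#   probs = invert_symmetries_pi(board_size, syms, probs)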
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Utilities for MiniGo and DualNet model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from contextlib import contextmanager
import functools
import itertools
import math
import operator
import os
import random
import re
import string
import time
import tensorflow as tf # pylint: disable=g-bad-import-order
# Regular expressions for the model number and the model name.
MODEL_NUM_REGEX = r'^\d{6}'  # model_num consists of six digits
# model_name consists of six digits followed by a dash-separated suffix,
# e.g. '000013-modelname'.
MODEL_NAME_REGEX = r'^\d{6}(-\w+)+'
def random_generator(size=6, chars=string.ascii_letters + string.digits):
return ''.join(random.choice(chars) for x in range(size))
def generate_model_name(model_num):
"""Generate a full model name for the given model number.
Args:
model_num: The number/generation of the model.
Returns:
The model's full name: model_num-model_name.
"""
if model_num == 0: # Model number for bootstrap model
new_name = 'bootstrap'
else:
new_name = random_generator()
full_name = '{:06d}-{}'.format(model_num, new_name)
return full_name
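# Example (illustrative): generate_model_name(0) -> '000000-bootstrap', while
# generate_model_name(13) -> '000013-' plus a random six-character suffix,
# e.g. '000013-aB3xYz'.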
def detect_model_num(full_name):
"""Take the full name of a model and extract its model number.
Args:
full_name: The full name of a model.
Returns:
The model number. For example: '000000-bootstrap.index' => 0.
"""
match = re.match(MODEL_NUM_REGEX, full_name)
if match:
return int(match.group())
else:
return None
def detect_model_name(full_name):
"""Take the full name of a model and extract its model name.
Args:
full_name: The full name of a model.
Returns:
The model name. For example: '000000-bootstrap.index' => '000000-bootstrap'.
"""
match = re.match(MODEL_NAME_REGEX, full_name)
if match:
return match.group()
else:
return None
def get_models(models_dir):
"""Get all models.
Args:
models_dir: The directory of all models.
Returns:
A list of (model number, model name) tuples sorted in increasing order.
For example: [(13, '000013-modelname'), (17, '000017-modelname'), ...]
"""
all_models = tf.gfile.Glob(os.path.join(models_dir, '*.meta'))
model_filenames = [os.path.basename(m) for m in all_models]
model_numbers_names = sorted([
(detect_model_num(m), detect_model_name(m))
for m in model_filenames])
return model_numbers_names
def get_latest_model(models_dir):
"""Find the latest model.
Args:
models_dir: The directory of all models.
Returns:
The model number and name of the latest model. For example:
(17, 000017-modelname)
"""
models = get_models(models_dir)
if not models:  # get_models returns an empty list when no model exists yet
models = [(0, '000000-bootstrap')]
return models[-1]
def round_power_of_two(n):
"""Finds the nearest power of 2 to a number.
Thus 84 -> 64, 120 -> 128, etc.
Args:
n: The given number.
Returns:
The nearest 2-power number to n.
"""
return 2 ** int(round(math.log(n, 2)))
def parse_game_result(result):
if re.match(r'[bB]\+', result):
return 1
elif re.match(r'[wW]\+', result):
return -1
else:
return 0
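# Example (illustrative): parse_game_result('B+R') -> 1,
# parse_game_result('W+3.5') -> -1, and anything else -> 0.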
def product(numbers):
return functools.reduce(operator.mul, numbers)
def take_n(n, iterable):
return list(itertools.islice(iterable, n))
def iter_chunks(chunk_size, iterator):
iterator = iter(iterator)
while True:
next_chunk = take_n(chunk_size, iterator)
# If len(iterable) % chunk_size == 0, don't return an empty chunk.
if next_chunk:
yield next_chunk
else:
break
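# Example (illustrative): list(iter_chunks(3, range(7))) yields
# [[0, 1, 2], [3, 4, 5], [6]] -- the final partial chunk is kept, and no empty
# chunk is produced when the length divides evenly.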
def shuffler(iterator, pool_size=10**5, refill_threshold=0.9):
yields_between_refills = round(pool_size * (1 - refill_threshold))
# initialize pool; this step may or may not exhaust the iterator.
pool = take_n(pool_size, iterator)
while True:
random.shuffle(pool)
for _ in range(yields_between_refills):
yield pool.pop()
next_batch = take_n(yields_between_refills, iterator)
if not next_batch:
break
pool.extend(next_batch)
# finish consuming whatever's left - no need for further randomization.
# yield from pool
print(type(pool))
for p in pool:
yield p
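# Note (added for clarity): with the defaults, the pool holds up to 100,000
# items and is reshuffled and topped up from the iterator after every 10,000
# yields (pool_size * (1 - refill_threshold)).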
@contextmanager
def timer(message):
tick = time.time()
yield
tock = time.time()
print('{}: {:.3} seconds'.format(message, (tock - tick)))
@contextmanager
def logged_timer(message):
tick = time.time()
yield
tock = time.time()
print('{}: {:.3} seconds'.format(message, (tock - tick)))
tf.logging.info('{}: {:.3} seconds'.format(message, (tock - tick)))
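# Illustrative usage (a sketch, not in the original file):
#   with logged_timer('selfplay batch'):
#       play_games()  # hypothetical call
# prints and logs something like 'selfplay batch: 12.3 seconds'.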
class MiniGoDirectory(object):
"""The class to set up directories of MiniGo."""
def __init__(self, base_dir):
self.trained_models_dir = os.path.join(base_dir, 'models')
self.estimator_model_dir = os.path.join(base_dir, 'estimator_model_dir/')
self.selfplay_dir = os.path.join(base_dir, 'data/selfplay/')
self.holdout_dir = os.path.join(base_dir, 'data/holdout/')
self.training_chunk_dir = os.path.join(base_dir, 'data/training_chunks/')
self.sgf_dir = os.path.join(base_dir, 'sgf/')
self.evaluate_dir = os.path.join(base_dir, 'sgf/evaluate/')
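# Example (illustrative): MiniGoDirectory('/tmp/minigo') points the pipeline at
# paths such as /tmp/minigo/models, /tmp/minigo/data/selfplay/ and
# /tmp/minigo/sgf/evaluate/.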