Unverified Commit a82b4a3b authored by RayMeng8, committed by GitHub

add PBT tuner (#2139)

parent c261146a
...@@ -21,6 +21,7 @@ Currently, we support the following algorithms:
|[__BOHB__](#BOHB)|BOHB is a follow-up work to Hyperband. It targets the weakness of Hyperband that new configurations are generated randomly without leveraging finished trials. For the name BOHB, HB means Hyperband, BO means Bayesian Optimization. BOHB leverages finished trials by building multiple TPE models, a proportion of new configurations are generated through these models. [Reference Paper](https://arxiv.org/abs/1807.01774)|
|[__GP Tuner__](#GPTuner)|Gaussian Process Tuner is a sequential model-based optimization (SMBO) approach with Gaussian Process as the surrogate. [Reference Paper](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf), [Github Repo](https://github.com/fmfn/BayesianOptimization)|
|[__PPO Tuner__](#PPOTuner)|PPO Tuner is a Reinforcement Learning tuner based on PPO algorithm. [Reference Paper](https://arxiv.org/abs/1707.06347)|
|[__PBT Tuner__](#PBTTuner)|PBT Tuner is a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. [Reference Paper](https://arxiv.org/abs/1711.09846v1)|
## Usage of Built-in Tuners
...@@ -453,6 +454,34 @@ tuner:
  classArgs:
    optimize_mode: maximize
```
<a name="PBTTuner"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `PBT Tuner`
> Built-in Tuner Name: **PBTTuner**
**Suggested scenario**
Population Based Training (PBT) bridges and extends parallel search methods and sequential optimization methods. It has a wallclock run time no greater than that of a single optimization process, does not require sequential runs, and can use fewer computational resources than naive search methods. It is therefore effective when you want to save computational resources and time. Besides, PBT produces a schedule of hyperparameters rather than a single fixed configuration; if you don't need one specific configuration, but just expect good results, this tuner is a good choice. Note that, unlike other tuners, our implementation involves checkpoint storage: a trial consists of several training epochs, and the loading and saving of checkpoints must be handled in the trial code. Additionally, if the experiment is not in local mode, users should provide a path in a shared storage which can be accessed by all the trials. You could try it on a very simple task, such as the [mnist-pbt-tuner-pytorch](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-pbt-tuner-pytorch) example; a minimal sketch of the checkpoint contract follows below. [See details](./PBTTuner.md)
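To make that contract concrete, a trial's main loop typically looks like the following minimal sketch, modeled on the mnist-pbt-tuner-pytorch example; `build_model`, `train_one_epoch`, and `evaluate` are hypothetical placeholders, not NNI APIs:

```python
import os
import torch
import nni

params = nni.get_next_parameter()  # PBTTuner injects 'load_checkpoint_dir' and 'save_checkpoint_dir'
model = build_model(params)        # hypothetical: build the model from the sampled hyperparameters

# Exploit: resume from the checkpoint the tuner points at, if one exists.
load_path = os.path.join(params['load_checkpoint_dir'], 'model.pth')
if os.path.isfile(load_path):
    model.load_state_dict(torch.load(load_path))

for epoch in range(params['epochs']):  # the epochs of one trial form one PBT step
    train_one_epoch(model, params)     # hypothetical training loop body
    nni.report_intermediate_result(evaluate(model))

nni.report_final_result(evaluate(model))  # PBTTuner ranks the population on this value

# Save so the next step (possibly another population member) can resume from here.
os.makedirs(params['save_checkpoint_dir'], exist_ok=True)
torch.save(model.state_dict(), os.path.join(params['save_checkpoint_dir'], 'model.pth'))
```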
**classArgs requirements:**
* **optimize_mode** (*'maximize' or 'minimize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **all_checkpoint_dir** (*str, optional, default = None*) - Directory for trials to load and save checkpoints. If not specified, it defaults to "~/nni/checkpoint/<exp-id>". Note that if the experiment is not in local mode, users should provide a path in a shared storage which can be accessed by all the trials.
* **population_size** (*int, optional, default = 10*) - Number of trials for each step. In our implementation, one step corresponds to running each trial for a specific number of training epochs set by users.
* **factors** (*tuple, optional, default = (1.2, 0.8)*) - Factors for perturbation of hyperparameters: a perturbed value is the old value multiplied by a randomly chosen factor (e.g., with the defaults, 0.01 becomes 0.012 or 0.008).
* **fraction** (*float, optional, default = 0.2*) - Fraction for selecting bottom and top trials.
**Usage example**
```yaml
# config.yml
tuner:
builtinTunerName: PBTTuner
classArgs:
optimize_mode: maximize
```
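The optional arguments can be supplied in the same place. For instance (the values shown match the documented defaults, and the shared-storage path is purely illustrative):

```yaml
# config.yml
tuner:
  builtinTunerName: PBTTuner
  classArgs:
    optimize_mode: maximize
    all_checkpoint_dir: /mnt/shared/nni-checkpoints   # must be shared storage in non-local mode
    population_size: 10
    fraction: 0.2
```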
## **Reference and Feedback**
* To [report a bug](https://github.com/microsoft/nni/issues/new?template=bug-report.md) for this feature in GitHub;
* To [file a feature or improvement request](https://github.com/microsoft/nni/issues/new?template=enhancement.md) for this feature in GitHub;
...
PBT Tuner on NNI
===
## PBTTuner
Population Based Training (PBT) comes from [Population Based Training of Neural Networks](https://arxiv.org/abs/1711.09846v1). It's a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training.
PBTTuner initializes a population with several trials. Users can set a specific number of training epochs per trial. After that many epochs, the parameters and hyperparameters of a trial with bad metrics are replaced with those of a better trial (exploit). Then the hyperparameters are perturbed (explore).
In our implementation, the training epochs in the trial code are regarded as one step of PBT, which differs from other tuners. At the end of each step, the PBT tuner performs exploitation and exploration -- replacing some trials with new trials. This is implemented by constantly modifying the values of `load_checkpoint_dir` and `save_checkpoint_dir`: changing `load_checkpoint_dir` replaces parameters and hyperparameters, while `save_checkpoint_dir` determines where the checkpoint loaded in the next step is saved. To this end, we need a shared folder which is accessible to all trials.
If the experiment is running in local mode, users could provide an argument `all_checkpoint_dir`, which will be the base folder of `load_checkpoint_dir` and `save_checkpoint_dir` (`checkpoint_dir` is set to `all_checkpoint_dir/<population-id>/<step>`). By default, `all_checkpoint_dir` is set to `~/nni/experiments/<exp-id>/checkpoint`. If the experiment is in non-local mode, users should instead provide a path in a shared storage folder which is mounted at `all_checkpoint_dir` on the worker machines (though it need not be available on the machine that runs the tuner). A sketch of the resulting directory bookkeeping follows.
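The following sketch illustrates the layout described above; `checkpoint_dirs` is a hypothetical helper, not part of the tuner's API:

```python
import os

def checkpoint_dirs(all_checkpoint_dir, population_id, step):
    """Illustrative: where a population member loads from and saves to at a given step."""
    base = os.path.join(all_checkpoint_dir, str(population_id))
    save_dir = os.path.join(base, str(step))
    # At step 0 there is nothing to load yet, so load and save coincide; afterwards a
    # trial loads whatever the previous step saved (which the exploit phase may have
    # redirected to a better member's folder).
    load_dir = save_dir if step == 0 else os.path.join(base, str(step - 1))
    return load_dir, save_dir

# e.g. checkpoint_dirs('/mnt/shared/ckpt', 3, 2)
#   -> ('/mnt/shared/ckpt/3/1', '/mnt/shared/ckpt/3/2')
```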
...@@ -23,3 +23,4 @@ Tuner receives metrics from `Trial` to evaluate the performance of a specific pa
    Hyperband <Tuner/HyperbandAdvisor>
    BOHB <Tuner/BohbAdvisor>
    PPO Tuner <Tuner/PPOTuner>
PBT Tuner <Tuner/PBTTuner>
authorName: default
experimentName: example_mnist_pbt_tuner_pytorch
trialConcurrency: 3
maxExecDuration: 2h
maxTrialNum: 100
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
# codeDir: ~/nni/src/sdk/pynni/nni/pbt_tuner
# classFileName: pbt_tuner.py
# className: PBTTuner
builtinTunerName: PBTTuner
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
trial:
command: python3 mnist.py
codeDir: .
gpuNum: 1
import argparse
import logging
import os
import nni
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
logger = logging.getLogger('mnist_pbt_tuner_pytorch_AutoML')
class Net(nn.Module):
def __init__(self, hidden_size):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(1, 20, 5, 1)
self.conv2 = nn.Conv2d(20, 50, 5, 1)
self.fc1 = nn.Linear(4*4*50, hidden_size)
self.fc2 = nn.Linear(hidden_size, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(x, 2, 2)
x = F.relu(self.conv2(x))
x = F.max_pool2d(x, 2, 2)
x = x.view(-1, 4*4*50)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim=1)
def train(args, model, device, train_loader, optimizer, epoch):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % args['log_interval'] == 0:
logger.info('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
def test(args, model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
# sum up batch loss
test_loss += F.nll_loss(output, target, reduction='sum').item()
# get the index of the max log-probability
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
accuracy = 100. * correct / len(test_loader.dataset)
logger.info('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
test_loss, correct, len(test_loader.dataset), accuracy))
return accuracy
def save_checkpoint(model, checkpoint_path):
torch.save(model.state_dict(), checkpoint_path)
def load_checkpoint(checkpoint_path):
model_state_dict = torch.load(checkpoint_path)
return model_state_dict
def main(args):
use_cuda = not args['no_cuda'] and torch.cuda.is_available()
torch.manual_seed(args['seed'])
device = torch.device("cuda" if use_cuda else "cpu")
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
data_dir = os.path.join(args['data_dir'], nni.get_trial_id())
train_loader = torch.utils.data.DataLoader(
datasets.MNIST(data_dir, train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=args['batch_size'], shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST(data_dir, train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=1000, shuffle=True, **kwargs)
hidden_size = args['hidden_size']
model = Net(hidden_size=hidden_size).to(device)
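    # PBTTuner injects 'load_checkpoint_dir' and 'save_checkpoint_dir' into the
    # received hyperparameters; resume from the previous step's checkpoint if present.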
save_checkpoint_dir = args['save_checkpoint_dir']
save_checkpoint_path = os.path.join(save_checkpoint_dir, 'model.pth')
load_checkpoint_path = os.path.join(args['load_checkpoint_dir'], 'model.pth')
if os.path.isfile(load_checkpoint_path):
model_state_dict = load_checkpoint(load_checkpoint_path)
logger.info("test : " + load_checkpoint_path)
logger.info(type(model_state_dict))
model.load_state_dict(model_state_dict)
optimizer = optim.SGD(model.parameters(), lr=args['lr'],
momentum=args['momentum'])
    # the epochs of one trial constitute one PBT step (the perturbation interval)
for epoch in range(1, args['epochs'] + 1):
train(args, model, device, train_loader, optimizer, epoch)
test_acc = test(args, model, device, test_loader)
if epoch < args['epochs']:
# report intermediate result
nni.report_intermediate_result(test_acc)
logger.debug('test accuracy %g', test_acc)
logger.debug('Pipe send intermediate result done.')
else:
# report final result
nni.report_final_result(test_acc)
logger.debug('Final result is %g', test_acc)
logger.debug('Send final result done.')
if not os.path.exists(save_checkpoint_dir):
os.makedirs(save_checkpoint_dir)
save_checkpoint(model, save_checkpoint_path)
def get_params():
# Training settings
parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
parser.add_argument("--data_dir", type=str,
default='./tmp/pytorch/mnist/input_data', help="data directory")
parser.add_argument('--batch_size', type=int, default=64, metavar='N',
help='input batch size for training (default: 64)')
parser.add_argument("--hidden_size", type=int, default=512, metavar='N',
help='hidden layer size (default: 512)')
parser.add_argument('--lr', type=float, default=0.01, metavar='LR',
help='learning rate (default: 0.01)')
parser.add_argument('--momentum', type=float, default=0.5, metavar='M',
help='SGD momentum (default: 0.5)')
parser.add_argument('--epochs', type=int, default=10, metavar='N',
help='number of epochs to train (default: 10)')
parser.add_argument('--seed', type=int, default=1, metavar='S',
help='random seed (default: 1)')
parser.add_argument('--no_cuda', action='store_true', default=False,
help='disables CUDA training')
parser.add_argument('--log_interval', type=int, default=1000, metavar='N',
help='how many batches to wait before logging training status')
parser.add_argument('--save_checkpoint_dir', type=str,
help='where to save checkpoint of this trial')
parser.add_argument('--load_checkpoint_dir', type=str,
help='where to load the model')
args, _ = parser.parse_known_args()
return args
if __name__ == '__main__':
try:
# get parameters form tuner
tuner_params = nni.get_next_parameter()
logger.debug(tuner_params)
params = vars(get_params())
params.update(tuner_params)
main(params)
except Exception as exception:
logger.exception(exception)
raise
{
    "batch_size": {"_type": "choice", "_value": [16, 32, 64, 128]},
    "hidden_size": {"_type": "choice", "_value": [128, 256, 512, 1024]},
    "lr": {"_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1]},
    "momentum": {"_type": "uniform", "_value": [0, 1]}
}
...@@ -178,7 +178,7 @@ export namespace ValidationSchemas {
            gpuIndices: joi.string()
        }),
        tuner: joi.object({
            builtinTunerName: joi.string().valid('TPE', 'Random', 'Anneal', 'Evolution', 'SMAC', 'BatchTuner', 'GridSearch', 'NetworkMorphism', 'MetisTuner', 'GPTuner', 'PPOTuner', 'PBTTuner'),
            codeDir: joi.string(),
            classFileName: joi.string(),
            className: joi.string(),
...
...@@ -15,7 +15,8 @@ ModuleName = {
    'Curvefitting': 'nni.curvefitting_assessor.curvefitting_assessor',
    'MetisTuner': 'nni.metis_tuner.metis_tuner',
    'GPTuner': 'nni.gp_tuner.gp_tuner',
    'PPOTuner': 'nni.ppo_tuner.ppo_tuner',
    'PBTTuner': 'nni.pbt_tuner.pbt_tuner'
}

ClassName = {
...@@ -30,6 +31,7 @@ ClassName = {
    'MetisTuner':'MetisTuner',
    'GPTuner':'GPTuner',
    'PPOTuner': 'PPOTuner',
    'PBTTuner': 'PBTTuner',
    'Medianstop': 'MedianstopAssessor',
    'Curvefitting': 'CurvefittingAssessor'
...
...@@ -10,97 +10,8 @@ import random
import numpy as np

from nni.tuner import Tuner
from nni.utils import OptimizeMode, extract_scalar_reward, split_index, json2parameter, json2space
(Removed here: the local `json2space` and `json2parameter` helpers and the `parameter_expressions` import. Identical definitions are added to `nni.utils`; see the `utils.py` hunk below.)
class Individual:
    """
...
...@@ -9,7 +9,7 @@ import numpy as np
from unittest import TestCase, main

from nni.utils import json2space, json2parameter

class EvolutionTunerTestCase(TestCase):
...
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import copy
import logging
import os
import numpy as np
import nni
from nni.tuner import Tuner
from nni.utils import OptimizeMode, extract_scalar_reward, split_index, json2parameter, json2space
logger = logging.getLogger('pbt_tuner_AutoML')
def exploit_and_explore(bot_trial_info, top_trial_info, factors, epoch, search_space):
"""
Replace checkpoint of bot_trial with top, and perturb hyperparameters
Parameters
----------
bot_trial_info : TrialInfo
bottom model whose parameters should be replaced
top_trial_info : TrialInfo
better model
    factors : tuple
        factors for perturbation
epoch : int
step of PBTTuner
search_space : dict
search_space to keep perturbed hyperparameters in range
"""
bot_checkpoint_dir = bot_trial_info.checkpoint_dir
top_hyper_parameters = top_trial_info.hyper_parameters
hyper_parameters = copy.deepcopy(top_hyper_parameters)
# TODO think about different type of hyperparameters for 1.perturbation 2.within search space
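    # Rewire the checkpoint directories so the bottom trial resumes from the top
    # trial's checkpoint, and multiplicatively perturb the float hyperparameters.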
for key in hyper_parameters.keys():
if key == 'load_checkpoint_dir':
hyper_parameters[key] = hyper_parameters['save_checkpoint_dir']
elif key == 'save_checkpoint_dir':
hyper_parameters[key] = os.path.join(bot_checkpoint_dir, str(epoch))
elif isinstance(hyper_parameters[key], float):
perturb = np.random.choice(factors)
val = hyper_parameters[key] * perturb
lb, ub = search_space[key]["_value"][:2]
if search_space[key]["_type"] in ("uniform", "normal"):
val = np.clip(val, lb, ub).item()
hyper_parameters[key] = val
else:
continue
bot_trial_info.hyper_parameters = hyper_parameters
bot_trial_info.clean_id()
class TrialInfo:
"""
Information of each trial, refresh for each epoch
"""
def __init__(self, checkpoint_dir=None, hyper_parameters=None, parameter_id=None, score=None):
self.checkpoint_dir = checkpoint_dir
self.hyper_parameters = hyper_parameters
self.parameter_id = parameter_id
self.score = score
def clean_id(self):
self.parameter_id = None
class PBTTuner(Tuner):
def __init__(self, optimize_mode="maximize", all_checkpoint_dir=None, population_size=10, factors=(1.2, 0.8), fraction=0.2):
"""
Initialization
Parameters
----------
optimize_mode : str
maximize or minimize
all_checkpoint_dir : str
directory to store training model checkpoint
population_size : int
number of trials for each epoch
factors : tuple
factors for perturbation
fraction : float
fraction for selecting bottom and top trials
"""
self.optimize_mode = OptimizeMode(optimize_mode)
if all_checkpoint_dir is None:
all_checkpoint_dir = os.getenv('NNI_CHECKPOINT_DIRECTORY')
logger.info("Checkpoint dir is set to %s by default.", all_checkpoint_dir)
self.all_checkpoint_dir = all_checkpoint_dir
self.population_size = population_size
self.factors = factors
self.fraction = fraction
# defined in trial code
#self.perturbation_interval = perturbation_interval
self.population = None
self.pos = -1
self.param_ids = []
self.running = {}
self.finished = []
self.credit = 0
self.finished_trials = 0
self.epoch = 0
self.searchspace_json = None
self.space = None
self.send_trial_callback = None
logger.info('PBT tuner initialization')
def update_search_space(self, search_space):
"""
Get search space
Parameters
----------
search_space : dict
Search space
"""
logger.info('Update search space %s', search_space)
self.searchspace_json = search_space
self.space = json2space(self.searchspace_json)
self.random_state = np.random.RandomState()
self.population = []
is_rand = dict()
for item in self.space:
is_rand[item] = True
for i in range(self.population_size):
hyper_parameters = json2parameter(
self.searchspace_json, is_rand, self.random_state)
checkpoint_dir = os.path.join(self.all_checkpoint_dir, str(i))
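            # At epoch 0 both directories point to the same (initially empty) folder,
            # so the first step starts training from scratch.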
hyper_parameters['load_checkpoint_dir'] = os.path.join(checkpoint_dir, str(self.epoch))
hyper_parameters['save_checkpoint_dir'] = os.path.join(checkpoint_dir, str(self.epoch))
self.population.append(TrialInfo(checkpoint_dir=checkpoint_dir, hyper_parameters=hyper_parameters))
def generate_multiple_parameters(self, parameter_id_list, **kwargs):
"""
Returns multiple sets of trial (hyper-)parameters, as iterable of serializable objects.
Parameters
----------
parameter_id_list : list of int
Unique identifiers for each set of requested hyper-parameters.
These will later be used in :meth:`receive_trial_result`.
**kwargs
Used for send_trial_callback.
Returns
-------
list
A list of newly generated configurations
"""
result = []
self.send_trial_callback = kwargs['st_callback']
for parameter_id in parameter_id_list:
had_exception = False
try:
logger.debug("generating param for %s", parameter_id)
res = self.generate_parameters(parameter_id, **kwargs)
except nni.NoMoreTrialError:
had_exception = True
if not had_exception:
result.append(res)
return result
def generate_parameters(self, parameter_id, **kwargs):
"""
        Generate parameters. If no trial configuration is available for now, ``self.credit`` is increased by one so that the configuration can be sent later.
Parameters
----------
parameter_id : int
Unique identifier for requested hyper-parameters.
This will later be used in :meth:`receive_trial_result`.
**kwargs
Not used
Returns
-------
dict
One newly generated configuration
"""
if self.pos == self.population_size - 1:
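            # The whole population is already running; bank this request and
            # answer it once the next epoch's population is ready.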
logger.debug('Credit added by one in parameters request')
self.credit += 1
self.param_ids.append(parameter_id)
raise nni.NoMoreTrialError('No more parameters now.')
self.pos += 1
trial_info = self.population[self.pos]
trial_info.parameter_id = parameter_id
self.running[parameter_id] = trial_info
logger.info('Generate parameter : %s', trial_info.hyper_parameters)
return split_index(trial_info.hyper_parameters)
def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
"""
        Receive a trial's result. If the number of finished trials equals ``self.population_size``, start the next epoch to
        train the model.
Parameters
----------
parameter_id : int
Unique identifier of used hyper-parameters, same with :meth:`generate_parameters`.
parameters : dict
Hyper-parameters generated by :meth:`generate_parameters`.
value : dict
Result from trial (the return value of :func:`nni.report_final_result`).
"""
logger.info('Get one trial result, id = %d, value = %s', parameter_id, value)
value = extract_scalar_reward(value)
if self.optimize_mode == OptimizeMode.Minimize:
value = -value
trial_info = self.running.pop(parameter_id, None)
trial_info.score = value
self.finished.append(trial_info)
self.finished_trials += 1
if self.finished_trials == self.population_size:
logger.info('Proceeding to next epoch')
self.epoch += 1
self.population = []
self.pos = -1
self.running = {}
#exploit and explore
self.finished = sorted(self.finished, key=lambda x: x.score, reverse=True)
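            # np.ceil guarantees at least one top and one bottom trial even for small populations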
cutoff = int(np.ceil(self.fraction * len(self.finished)))
tops = self.finished[:cutoff]
bottoms = self.finished[self.finished_trials - cutoff:]
for bottom in bottoms:
top = np.random.choice(tops)
exploit_and_explore(bottom, top, self.factors, self.epoch, self.searchspace_json)
for trial in self.finished:
if trial not in bottoms:
trial.clean_id()
trial.hyper_parameters['load_checkpoint_dir'] = trial.hyper_parameters['save_checkpoint_dir']
trial.hyper_parameters['save_checkpoint_dir'] = os.path.join(trial.checkpoint_dir, str(self.epoch))
self.finished_trials = 0
for _ in range(self.population_size):
trial_info = self.finished.pop()
self.population.append(trial_info)
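            # Serve the parameter requests that were deferred while this epoch was finishing.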
while self.credit > 0 and self.pos + 1 < len(self.population):
self.credit -= 1
self.pos += 1
parameter_id = self.param_ids.pop()
trial_info = self.population[self.pos]
trial_info.parameter_id = parameter_id
self.running[parameter_id] = trial_info
self.send_trial_callback(parameter_id, split_index(trial_info.hyper_parameters))
def import_data(self, data):
pass
...@@ -2,13 +2,16 @@
# Licensed under the MIT license.

import os
import copy
import functools
from enum import Enum, unique

import json_tricks

from . import parameter_expressions
from .common import init_logger
from .env_vars import dispatcher_env_vars

to_json = functools.partial(json_tricks.dumps, allow_nan=True)

@unique
...@@ -124,3 +127,92 @@ def init_dispatcher_logger():
    if dispatcher_env_vars.NNI_LOG_DIRECTORY is not None:
        logger_file_path = os.path.join(dispatcher_env_vars.NNI_LOG_DIRECTORY, logger_file_path)
    init_logger(logger_file_path, dispatcher_env_vars.NNI_LOG_LEVEL)
def json2space(x, oldy=None, name=NodeType.ROOT):
"""
Change search space from json format to hyperopt format
"""
y = list()
if isinstance(x, dict):
if NodeType.TYPE in x.keys():
_type = x[NodeType.TYPE]
name = name + '-' + _type
if _type == 'choice':
if oldy is not None:
_index = oldy[NodeType.INDEX]
y += json2space(x[NodeType.VALUE][_index],
oldy[NodeType.VALUE], name=name+'[%d]' % _index)
else:
y += json2space(x[NodeType.VALUE], None, name=name)
y.append(name)
else:
for key in x.keys():
y += json2space(x[key], oldy[key] if oldy else None, name+"[%s]" % str(key))
elif isinstance(x, list):
for i, x_i in enumerate(x):
if isinstance(x_i, dict):
if NodeType.NAME not in x_i.keys():
raise RuntimeError('\'_name\' key is not found in this nested search space.')
y += json2space(x_i, oldy[i] if oldy else None, name + "[%d]" % i)
return y
def json2parameter(x, is_rand, random_state, oldy=None, Rand=False, name=NodeType.ROOT):
"""
    Json to parameters.
"""
if isinstance(x, dict):
if NodeType.TYPE in x.keys():
_type = x[NodeType.TYPE]
_value = x[NodeType.VALUE]
name = name + '-' + _type
Rand |= is_rand[name]
if Rand is True:
if _type == 'choice':
_index = random_state.randint(len(_value))
y = {
NodeType.INDEX: _index,
NodeType.VALUE: json2parameter(
x[NodeType.VALUE][_index],
is_rand,
random_state,
None,
Rand,
name=name+"[%d]" % _index
)
}
else:
y = getattr(parameter_expressions, _type)(*(_value + [random_state]))
else:
y = copy.deepcopy(oldy)
else:
y = dict()
for key in x.keys():
y[key] = json2parameter(
x[key],
is_rand,
random_state,
oldy[key] if oldy else None,
Rand,
name + "[%s]" % str(key)
)
elif isinstance(x, list):
y = list()
for i, x_i in enumerate(x):
if isinstance(x_i, dict):
if NodeType.NAME not in x_i.keys():
raise RuntimeError('\'_name\' key is not found in this nested search space.')
y.append(json2parameter(
x_i,
is_rand,
random_state,
oldy[i] if oldy else None,
Rand,
name + "[%d]" % i
))
else:
y = copy.deepcopy(x)
return y
...@@ -8,6 +8,7 @@ import os
import random
import shutil
import sys
from collections import deque
from unittest import TestCase, main

from nni.batch_tuner.batch_tuner import BatchTuner
...@@ -16,6 +17,8 @@ from nni.gp_tuner.gp_tuner import GPTuner
from nni.gridsearch_tuner.gridsearch_tuner import GridSearchTuner
from nni.hyperopt_tuner.hyperopt_tuner import HyperoptTuner
from nni.metis_tuner.metis_tuner import MetisTuner
from nni.msg_dispatcher import _pack_parameter, MsgDispatcher
from nni.pbt_tuner.pbt_tuner import PBTTuner

try:
    from nni.smac_tuner.smac_tuner import SMACTuner
...@@ -23,6 +26,7 @@ except ImportError:
    assert sys.platform == "win32"
from nni.tuner import Tuner

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('test_tuner')
...@@ -44,18 +48,29 @@ class BuiltinTunersTestCase(TestCase):
        self.params_each_round = 50
        self.exhaustive = False
    def send_trial_callback(self, param_queue):
        def receive(*args):
            param_queue.append(tuple(args))
        return receive

    def search_space_test_one(self, tuner_factory, search_space):
        tuner = tuner_factory()
        self.assertIsInstance(tuner, Tuner)
        tuner.update_search_space(search_space)

        for i in range(self.test_round):
            queue = deque()
            parameters = tuner.generate_multiple_parameters(list(range(i * self.params_each_round,
                                                                       (i + 1) * self.params_each_round)),
                                                            st_callback=self.send_trial_callback(queue))
            logger.debug(parameters)
            self.check_range(parameters, search_space)
            for k in range(min(len(parameters), self.params_each_round)):
                tuner.receive_trial_result(self.params_each_round * i + k, parameters[k], random.uniform(-100, 100))
            while queue:
                id_, params = queue.popleft()
                self.check_range([params], search_space)
                tuner.receive_trial_result(id_, params, random.uniform(-100, 100))
            if not parameters and not self.exhaustive:
                raise ValueError("No parameters generated")
...@@ -65,6 +80,9 @@ class BuiltinTunersTestCase(TestCase):
        if self._testMethodName == "test_batch":
            param = {list(search_space.keys())[0]: param}
        for k, v in param.items():
            if k == "load_checkpoint_dir" or k == "save_checkpoint_dir":
                self.assertIsInstance(v, str)
                continue
            if k.startswith("_mutable_layer"):
                _, block, layer, choice = k.split("/")
                cand = search_space[block]["_value"][layer].get(choice)
...@@ -124,8 +142,8 @@ class BuiltinTunersTestCase(TestCase):
            if any(single.startswith(t) for t in ignore_types):
                continue
            expected_fail = not any(single.startswith(t) for t in supported_types) or \
                any(single.startswith(t) for t in fail_types) or \
                "fail" in single  # name contains fail (fail on all)
            single_search_space = {single: space}
            if not expected_fail:
                # supports this key
...@@ -270,6 +288,16 @@ class BuiltinTunersTestCase(TestCase):
    def test_ppo(self):
        pass

    def test_pbt(self):
        self.search_space_test_all(lambda: PBTTuner(
            all_checkpoint_dir=os.path.expanduser("~/nni/checkpoint/test/"),
            population_size=12
        ))
        self.search_space_test_all(lambda: PBTTuner(
            all_checkpoint_dir=os.path.expanduser("~/nni/checkpoint/test/"),
            population_size=100
        ))
    def tearDown(self):
        file_list = glob.glob("smac3*") + ["param_config_space.pcs", "scenario.txt", "model_path"]
        for file in file_list:
...
...@@ -153,6 +153,18 @@ tuner_schema_dict = {
        Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
        Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
    },
'PBTTuner': {
'builtinTunerName': 'PBTTuner',
'classArgs': {
'optimize_mode': setChoice('optimize_mode', 'maximize', 'minimize'),
Optional('all_checkpoint_dir'): setType('all_checkpoint_dir', str),
Optional('population_size'): setNumberRange('population_size', int, 0, 99999),
Optional('factors'): setType('factors', tuple),
Optional('fraction'): setType('fraction', float),
},
Optional('includeIntermediateResults'): setType('includeIntermediateResults', bool),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
},
    'customized': {
        'codeDir': setPathCheck('codeDir'),
        'classFileName': setType('classFileName', str),
...