Unverified Commit a911b856 authored by Yuge Zhang's avatar Yuge Zhang Committed by GitHub
Browse files

Resolve conflicts for #4760 (#4762)

parent 14d2966b
ProxylessNAS on NNI
===================
Introduction
------------
The paper `ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware <https://arxiv.org/pdf/1812.00332.pdf>`__ removes proxy, it directly learns the architectures for large-scale target tasks and target hardware platforms. They address high memory consumption issue of differentiable NAS and reduce the computational cost to the same level of regular training while still allowing a large candidate set. Please refer to the paper for the details.
Usage
-----
To use ProxylessNAS training/searching approach, users need to specify search space in their model using `NNI NAS interface <./MutationPrimitives.rst>`__\ , e.g., ``LayerChoice``\ , ``InputChoice``. After defining and instantiating the model, the following work can be leaved to ProxylessNasTrainer by instantiating the trainer and passing the model to it.
.. code-block:: python
trainer = ProxylessTrainer(model,
loss=LabelSmoothingLoss(),
dataset=None,
optimizer=optimizer,
metrics=lambda output, target: accuracy(output, target, topk=(1, 5,)),
num_epochs=120,
log_frequency=10,
grad_reg_loss_type=args.grad_reg_loss_type,
grad_reg_loss_params=grad_reg_loss_params,
applied_hardware=args.applied_hardware, dummy_input=(1, 3, 224, 224),
ref_latency=args.reference_latency)
trainer.train()
trainer.export(args.arch_path)
The complete example code can be found :githublink:`here <examples/nas/oneshot/proxylessnas>`.
**Input arguments of ProxylessNasTrainer**
* **model** (*PyTorch model, required*\ ) - The model that users want to tune/search. It has mutables to specify search space.
* **metrics** (*PyTorch module, required*\ ) - The main term of the loss function for model train. Receives logits and ground truth label, return a loss tensor.
* **optimizer** (*PyTorch Optimizer, required*\) - The optimizer used for optimizing the model.
* **num_epochs** (*int, optional, default = 120*\ ) - The number of epochs to train/search.
* **dataset** (*PyTorch dataset, required*\ ) - Dataset for training. Will be split for training weights and architecture weights.
* **warmup_epochs** (*int, optional, default = 0*\ ) - The number of epochs to do during warmup.
* **batch_size** (*int, optional, default = 64*\ ) - Batch size.
* **workers** (*int, optional, default = 4*\ ) - Workers for data loading.
* **device** (*device, optional, default = 'cpu'*\ ) - The devices that users provide to do the train/search. The trainer applies data parallel on the model for users.
* **log_frequency** (*int, optional, default = None*\ ) - Step count per logging.
* **arc_learning_rate** (*float, optional, default = 1e-3*\ ) - The learning rate of the architecture parameters optimizer.
* **grad_reg_loss_type** (*'mul#log', 'add#linear', or None, optional, default = 'add#linear'*\ ) - Regularization type to add hardware related loss. The trainer will not apply loss regularization when grad_reg_loss_type is set as None.
* **grad_reg_loss_params** (*dict, optional, default = None*\ ) - Regularization params. 'alpha' and 'beta' is required when ``grad_reg_loss_type`` is 'mul#log', 'lambda' is required when ``grad_reg_loss_type`` is 'add#linear'.
* **applied_hardware** (*string, optional, default = None*\ ) - Applied hardware for to constraint the model's latency. Latency is predicted by Microsoft nn-Meter (https://github.com/microsoft/nn-Meter).
* **dummy_input** (*tuple, optional, default = (1, 3, 224, 224)*\ ) - The dummy input shape when applied to the target hardware.
* **ref_latency** (*float, optional, default = 65.0*\ ) - Reference latency value in the applied hardware (ms).
Implementation
--------------
The implementation on NNI is based on the `offical implementation <https://github.com/mit-han-lab/ProxylessNAS>`__. The official implementation supports two training approaches: gradient descent and RL based. In our current implementation on NNI, gradient descent training approach is supported. The complete support of ProxylessNAS is ongoing.
The official implementation supports different targeted hardware, including 'mobile', 'cpu', 'gpu8', 'flops'. In NNI repo, the hardware latency prediction is supported by `Microsoft nn-Meter <https://github.com/microsoft/nn-Meter>`__. nn-Meter is an accurate inference latency predictor for DNN models on diverse edge devices. nn-Meter support four hardwares up to now, including *'cortexA76cpu_tflite21'*, *'adreno640gpu_tflite21'*, *'adreno630gpu_tflite21'*, and *'myriadvpu_openvino2019r2'*. Users can find more information about nn-Meter on its website. More hardware will be supported in the future. Users could find more details about applying ``nn-Meter`` `here <./HardwareAwareNAS.rst>`__ .
Below we will describe implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. For users to flexibly define their own search space and use built-in ProxylessNAS training approach, we put the specified search space in :githublink:`example code <examples/nas/oneshot/proxylessnas>` using :githublink:`NNI NAS interface <nni/retiarii/oneshot/pytorch/proxyless>`.
.. image:: ../../img/proxylessnas.png
:target: ../../img/proxylessnas.png
:alt:
ProxylessNAS training approach is composed of ProxylessLayerChoice and ProxylessNasTrainer. ProxylessLayerChoice instantiates MixedOp for each mutable (i.e., LayerChoice), and manage architecture weights in MixedOp. **For DataParallel**\ , architecture weights should be included in user model. Specifically, in ProxylessNAS implementation, we add MixedOp to the corresponding mutable (i.e., LayerChoice) as a member variable. The ProxylessLayerChoice class also exposes two member functions, i.e., ``resample``\ , ``finalize_grad``\ , for the trainer to control the training of architecture weights.
ProxylessNasMutator also implements the forward logic of the mutables (i.e., LayerChoice).
Reproduce Results
-----------------
To reproduce the result, we first run the search, we found that though it runs many epochs the chosen architecture converges at the first several epochs. This is probably induced by hyper-parameters or the implementation, we are working on it.
\ No newline at end of file
Quick Start of Retiarii on NNI
==============================
.. contents::
In this quick start, we use multi-trial NAS as an example to show how to construct and explore a model space. There are mainly three crucial components for a neural architecture search task, namely,
* Model search space that defines a set of models to explore.
* A proper strategy as the method to explore this model space.
* A model evaluator that reports the performance of every model in the space.
The tutorial for One-shot NAS can be found `here <./OneshotTrainer.rst>`__.
Currently, PyTorch is the only supported framework by Retiarii, and we have only tested **PyTorch 1.7 to 1.10**. This documentation assumes PyTorch context but it should also apply to other frameworks, which is in our future plan.
Define your Model Space
-----------------------
Model space is defined by users to express a set of models that users want to explore, which contains potentially good-performing models. In this framework, a model space is defined with two parts: a base model and possible mutations on the base model.
Define Base Model
^^^^^^^^^^^^^^^^^
Defining a base model is almost the same as defining a PyTorch (or TensorFlow) model. Usually, you only need to replace the code ``import torch.nn as nn`` with ``import nni.retiarii.nn.pytorch as nn`` to use our wrapped PyTorch modules.
Below is a very simple example of defining a base model.
.. code-block:: python
import torch
import torch.nn.functional as F
import nni.retiarii.nn.pytorch as nn
from nni.retiarii import model_wrapper
@model_wrapper # this decorator should be put on the out most
class Net(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout(0.25)
self.dropout2 = nn.Dropout(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(self.conv2(x), 2)
x = torch.flatten(self.dropout1(x), 1)
x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
output = F.log_softmax(x, dim=1)
return output
.. tip:: Always keep in mind that you should use ``import nni.retiarii.nn.pytorch as nn`` and :meth:`nni.retiarii.model_wrapper`. Many mistakes are a result of forgetting one of those. Also, please use ``torch.nn`` for submodules of ``nn.init``, e.g., ``torch.nn.init`` instead of ``nn.init``.
Define Model Mutations
^^^^^^^^^^^^^^^^^^^^^^
A base model is only one concrete model not a model space. We provide `APIs and primitives <./MutationPrimitives.rst>`__ for users to express how the base model can be mutated. That is, to build a model space which includes many models.
Based on the above base model, we can define a model space as below.
.. code-block:: diff
import torch
import torch.nn.functional as F
import nni.retiarii.nn.pytorch as nn
from nni.retiarii import model_wrapper
@model_wrapper
class Net(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
- self.conv2 = nn.Conv2d(32, 64, 3, 1)
+ self.conv2 = nn.LayerChoice([
+ nn.Conv2d(32, 64, 3, 1),
+ DepthwiseSeparableConv(32, 64)
+ ])
- self.dropout1 = nn.Dropout(0.25)
+ self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75]))
self.dropout2 = nn.Dropout(0.5)
- self.fc1 = nn.Linear(9216, 128)
- self.fc2 = nn.Linear(128, 10)
+ feature = nn.ValueChoice([64, 128, 256])
+ self.fc1 = nn.Linear(9216, feature)
+ self.fc2 = nn.Linear(feature, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(self.conv2(x), 2)
x = torch.flatten(self.dropout1(x), 1)
x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
output = F.log_softmax(x, dim=1)
return output
This example uses two mutation APIs, ``nn.LayerChoice`` and ``nn.ValueChoice``. ``nn.LayerChoice`` takes a list of candidate modules (two in this example), one will be chosen for each sampled model. It can be used like normal PyTorch module. ``nn.ValueChoice`` takes a list of candidate values, one will be chosen to take effect for each sampled model.
More detailed API description and usage can be found `here <./construct_space.rst>`__ .
.. note:: We are actively enriching the mutation APIs, to facilitate easy construction of model space. If the currently supported mutation APIs cannot express your model space, please refer to `this doc <./Mutators.rst>`__ for customizing mutators.
Explore the Defined Model Space
-------------------------------
There are basically two exploration approaches: (1) search by evaluating each sampled model independently, which is the search approach in multi-trial NAS and (2) one-shot weight-sharing based search, which is used in one-shot NAS. We demonstrate the first approach in this tutorial. Users can refer to `here <./OneshotTrainer.rst>`__ for the second approach.
First, users need to pick a proper exploration strategy to explore the defined model space. Second, users need to pick or customize a model evaluator to evaluate the performance of each explored model.
Pick an exploration strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Retiarii supports many `exploration strategies <./ExplorationStrategies.rst>`__.
Simply choosing (i.e., instantiate) an exploration strategy as below.
.. code-block:: python
import nni.retiarii.strategy as strategy
search_strategy = strategy.Random(dedup=True) # dedup=False if deduplication is not wanted
Pick or customize a model evaluator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training and validating each generated model to obtain the model's performance. The performance is sent to the exploration strategy for the strategy to generate better models.
Retiarii has provided `built-in model evaluators <./ModelEvaluators.rst>`__, but to start with, it is recommended to use ``FunctionalEvaluator``, that is, to wrap your own training and evaluation code with one single function. This function should receive one single model class and uses ``nni.report_final_result`` to report the final score of this model.
An example here creates a simple evaluator that runs on MNIST dataset, trains for 2 epochs, and reports its validation accuracy.
.. code-block:: python
def evaluate_model(model_cls):
# "model_cls" is a class, need to instantiate
model = model_cls()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
transf = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = DataLoader(MNIST('data/mnist', download=True, transform=transf), batch_size=64, shuffle=True)
test_loader = DataLoader(MNIST('data/mnist', download=True, train=False, transform=transf), batch_size=64)
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
for epoch in range(3):
# train the model for one epoch
train_epoch(model, device, train_loader, optimizer, epoch)
# test the model for one epoch
accuracy = test_epoch(model, device, test_loader)
# call report intermediate result. Result can be float or dict
nni.report_intermediate_result(accuracy)
# report final test result
nni.report_final_result(accuracy)
# Create the evaluator
evaluator = nni.retiarii.evaluator.FunctionalEvaluator(evaluate_model)
The ``train_epoch`` and ``test_epoch`` here can be any customized function, where users can write their own training recipe. See :githublink:`examples/nas/multi-trial/mnist/search.py` for the full example.
It is recommended that the ``evaluate_model`` here accepts no additional arguments other than ``model_cls``. However, in the `advanced tutorial <./ModelEvaluators.rst>`__, we will show how to use additional arguments in case you actually need those. In future, we will support mutation on the arguments of evaluators, which is commonly called "Hyper-parmeter tuning".
Launch an Experiment
--------------------
After all the above are prepared, it is time to start an experiment to do the model search. An example is shown below.
.. code-block:: python
exp = RetiariiExperiment(base_model, evaluator, [], search_strategy)
exp_config = RetiariiExeConfig('local')
exp_config.experiment_name = 'mnist_search'
exp_config.trial_concurrency = 2
exp_config.max_trial_number = 20
exp_config.training_service.use_active_gpu = False
exp.run(exp_config, 8081)
The complete code of this example can be found :githublink:`here <examples/nas/multi-trial/mnist/search.py>`. Users can also run Retiarii Experiment with `different training services <../training_services.rst>`__ besides ``local`` training service.
Visualize the Experiment
------------------------
Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment. For example, open ``localhost:8081`` in your browser, 8081 is the port that you set in ``exp.run``. Please refer to `here <../Tutorial/WebUI.rst>`__ for details.
We support visualizing models with 3rd-party visualization engines (like `Netron <https://netron.app/>`__). This can be used by clicking ``Visualization`` in detail panel for each trial. Note that current visualization is based on `onnx <https://onnx.ai/>`__ , thus visualization is not feasible if the model cannot be exported into onnx.
Built-in evaluators (e.g., Classification) will automatically export the model into a file. For your own evaluator, you need to save your file into ``$NNI_OUTPUT_DIR/model.onnx`` to make this work. For instance,
.. code-block:: python
def evaluate_model(model_cls):
model = model_cls()
# dump the model into an onnx
if 'NNI_OUTPUT_DIR' in os.environ:
torch.onnx.export(model, (dummy_input, ),
Path(os.environ['NNI_OUTPUT_DIR']) / 'model.onnx')
# the rest are training and evaluation
Export Top Models
-----------------
Users can export top models after the exploration is done using ``export_top_models``.
.. code-block:: python
for model_code in exp.export_top_models(formatter='dict'):
print(model_code)
The output is `json` object which records the mutation actions of the top model. If users want to output source code of the top model, they can use graph-based execution engine for the experiment, by simply adding the following two lines.
.. code-block:: python
exp_config.execution_engine = 'base'
export_formatter = 'code'
.. d6ad1b913b292469c647ca68ac158840
快速入门 Retiarii
==============================
.. contents::
在快速入门教程中,我们以 multi-trial NAS 为例来展示如何构建和探索模型空间。 神经网络架构搜索任务主要有三个关键组件,即:
* 模型搜索空间(Model search space),定义了要探索的模型集合。
* 一个适当的策略(strategy),作为探索这个搜索空间的方法。
* 一个模型评估器(model evaluator),报告一个给定模型的性能。
One-shot NAS 教程在 `这里 <./OneshotTrainer.rst>`__
.. note:: 目前,PyTorch Retiarii 唯一支持的框架,我们只用 **PyTorch 1.7 1.10** 进行了测试。 本文档基于 PyTorch 的背景,但它也应该适用于其他框架,这在我们未来的计划中。
定义模型空间
-----------------------
模型空间是由用户定义的,用来表达用户想要探索、认为包含性能良好模型的一组模型。 模型空间是由用户定义的,用来表达用户想要探索、认为包含性能良好模型的一组模型。 在这个框架中,模型空间由两部分组成:基本模型和基本模型上可能的突变。
定义基本模型
^^^^^^^^^^^^^^^^^
定义基本模型与定义 PyTorch(或 TensorFlow)模型几乎相同, 只有两个小区别。 对于 PyTorch 模块(例如 ``nn.Conv2d``, ``nn.ReLU``),将代码 ``import torch.nn as nn`` 替换为 ``import nni.retiarii.nn.pytorch as nn``
下面是定义基本模型的一个简单的示例,它与定义 PyTorch 模型几乎相同。
.. code-block:: python
import torch
import torch.nn.functional as F
import nni.retiarii.nn.pytorch as nn
from nni.retiarii import model_wrapper
@model_wrapper # this decorator should be put on the out most
class Net(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout(0.25)
self.dropout2 = nn.Dropout(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(self.conv2(x), 2)
x = torch.flatten(self.dropout1(x), 1)
x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
output = F.log_softmax(x, dim=1)
return output
.. tip:: 记得使用 ``import nni.retiarii.nn.pytorch as nn`` :meth:`nni.retiarii.model_wrapper`. 许多错误都源于忘记使用它们。同时,对于 ``nn`` 的子模块(例如 ``nn.init``)请使用 ``torch.nn``,比如,``torch.nn.init`` 而不是 ``nn.init``
定义模型突变
^^^^^^^^^^^^^^^^^^^^^^
基本模型只是一个具体模型,而不是模型空间。 我们为用户提供 `API 和原语 <./MutationPrimitives.rst>`__,用于把基本模型变形成包含多个模型的模型空间。
基于上面定义的基本模型,我们可以这样定义一个模型空间:
.. code-block:: diff
import torch
import torch.nn.functional as F
import nni.retiarii.nn.pytorch as nn
from nni.retiarii import model_wrapper
@model_wrapper
class Net(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
- self.conv2 = nn.Conv2d(32, 64, 3, 1)
+ self.conv2 = nn.LayerChoice([
+ nn.Conv2d(32, 64, 3, 1),
+ DepthwiseSeparableConv(32, 64)
+ ])
- self.dropout1 = nn.Dropout(0.25)
+ self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75]))
self.dropout2 = nn.Dropout(0.5)
- self.fc1 = nn.Linear(9216, 128)
- self.fc2 = nn.Linear(128, 10)
+ feature = nn.ValueChoice([64, 128, 256])
+ self.fc1 = nn.Linear(9216, feature)
+ self.fc2 = nn.Linear(feature, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(self.conv2(x), 2)
x = torch.flatten(self.dropout1(x), 1)
x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
output = F.log_softmax(x, dim=1)
return output
在这个例子中我们使用了两个突变 API ``nn.LayerChoice`` ``nn.ValueChoice`` ``nn.LayerChoice`` 的输入参数是一个候选模块的列表(在这个例子中是两个),每个采样到的模型会选择其中的一个,然后它就可以像一般的 PyTorch 模块一样被使用。 ``nn.ValueChoice`` 输入一系列候选的值,然后对于每个采样到的模型,其中的一个值会生效。
更多的 API 描述和用法可以请阅读 `这里 <./construct_space.rst>`__
.. note:: 我们正在积极的丰富突变 API,以简化模型空间的构建。如果我们提供的 API 不能满足您表达模型空间的需求,请阅读 `这个文档 <./Mutators.rst>`__ 以获得更多定制突变的资讯。
探索定义的模型空间
-------------------------------
简单来说,探索模型空间有两种方法:(1) 通过独立评估每个采样模型进行搜索;(2) 基于 One-Shot 的权重共享式搜索。 我们在本教程中演示了下面的第一种方法。 第二种方法可以参考 `这里 <./OneshotTrainer.rst>`__
首先,用户需要选择合适的探索策略来探索模型空间。然后,用户需要选择或自定义模型评估器来评估每个采样模型的性能。
选择搜索策略
^^^^^^^^^^^^^^^^^^^^^^^^
Retiarii 支持许多 `探索策略(exploration strategies <./ExplorationStrategies.rst>`__
简单地选择(即实例化)一个探索策略:
.. code-block:: python
import nni.retiarii.strategy as strategy
search_strategy = strategy.Random(dedup=True) # dedup=False 如果不希望有重复数据删除
选择或编写模型评估器
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NAS 过程中,探索策略反复生成新模型。 模型评估器用于训练和验证每个生成的模型。 生成的模型所获得的性能被收集起来,并送至探索策略以生成更好的模型。
Retiarii 提供了诸多的 `内置模型评估器 <./ModelEvaluators.rst>`__,但是作为第一步,我们还是推荐使用 ``FunctionalEvaluator``,也就是说,将您自己的训练和测试代码用一个函数包起来。这个函数的输入参数是一个模型的类,然后使用 ``nni.report_final_result`` 来汇报模型的效果。
这里的一个例子创建了一个简单的评估器,它在 MNIST 数据集上运行,训练 2 Epoch,并报告其在验证集上的准确率。
.. code-block:: python
def evaluate_model(model_cls):
# "model_cls" 是一个类,需要初始化
model = model_cls()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
transf = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = DataLoader(MNIST('data/mnist', download=True, transform=transf), batch_size=64, shuffle=True)
test_loader = DataLoader(MNIST('data/mnist', download=True, train=False, transform=transf), batch_size=64)
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
for epoch in range(3):
# 训练模型,1 epoch
train_epoch(model, device, train_loader, optimizer, epoch)
# 测试模型,1 epoch
accuracy = test_epoch(model, device, test_loader)
# 汇报中间结果,可以是 float 或者 dict 类型
nni.report_intermediate_result(accuracy)
# 汇报最终结果
nni.report_final_result(accuracy)
# 创建模型评估器
evaluator = nni.retiarii.evaluator.FunctionalEvaluator(evaluate_model)
在这里 ``train_epoch`` ``test_epoch`` 可以是任意自定义的函数,用户可以写自己的训练流程。完整的样例可以参见 :githublink:`examples/nas/multi-trial/mnist/search.py`
我们建议 ``evaluate_model`` 不接受 ``model_cls`` 以外的其他参数。但是,我们在 `高级教程 <./ModelEvaluators.rst>`__ 中展示了其他参数的用法,如果您真的需要的话。另外,我们会在未来支持这些参数的突变(这通常会成为 "超参调优")。
发起 Experiment
--------------------
一切准备就绪,就可以发起 Experiment 以进行模型搜索了。 样例如下:
.. code-block:: python
exp = RetiariiExperiment(base_model, evaluator, [], search_strategy)
exp_config = RetiariiExeConfig('local')
exp_config.experiment_name = 'mnist_search'
exp_config.trial_concurrency = 2
exp_config.max_trial_number = 20
exp_config.training_service.use_active_gpu = False
exp.run(exp_config, 8081)
一个简单 MNIST 示例的完整代码在 :githublink:`这里 <examples/nas/multi-trial/mnist/search.py>` 除了本地训练平台,用户还可以在除了本地机器以外的 `不同的训练平台 <../training_services.rst>`__ 上运行 Retiarii 的实验。
可视化 Experiment
------------------------
用户可以像可视化普通的超参数调优 Experiment 一样可视化他们的 Experiment 例如,在浏览器里打开 ``localhost:8081``8081 是在 ``exp.run`` 里设置的端口。 参考 `这里 <../Tutorial/WebUI.rst>`__ 了解更多细节。
我们支持使用第三方工具(例如 `Netron <https://netron.app/>`__)可视化搜索过程中采样到的模型。您可以点击每个 trial 面板下的 ``Visualization``。注意,目前的可视化是基于导出成 `onnx <https://onnx.ai/>`__ 格式的模型实现的,所以如果模型无法导出成 onnx,那么可视化就无法进行。
内置的模型评估器(比如 Classification)已经自动将模型导出成了一个文件。如果您自定义了模型,您需要将模型导出到 ``$NNI_OUTPUT_DIR/model.onnx``。例如,
.. code-block:: python
def evaluate_model(model_cls):
model = model_cls()
# 把模型导出成 onnx
if 'NNI_OUTPUT_DIR' in os.environ:
torch.onnx.export(model, (dummy_input, ),
Path(os.environ['NNI_OUTPUT_DIR']) / 'model.onnx')
# 剩下的就是训练和测试流程
导出最佳模型
-----------------
探索完成后,用户可以使用 ``export_top_models`` 导出最佳模型。
.. code-block:: python
for model_code in exp.export_top_models(formatter='dict'):
print(model_code)
导出的 `json` 记录的是最佳模型的突变记录。如果用户想要最佳模型的代码,可以简单的使用基于图的执行引擎,增加如下两行代码即可:
.. code-block:: python
exp_config.execution_engine = 'base'
export_formatter = 'code'
Single Path One-Shot (SPOS)
===========================
Introduction
------------
Proposed in `Single Path One-Shot Neural Architecture Search with Uniform Sampling <https://arxiv.org/abs/1904.00420>`__ is a one-shot NAS method that addresses the difficulties in training One-Shot NAS models by constructing a simplified supernet trained with an uniform path sampling method, so that all underlying architectures (and their weights) get trained fully and equally. An evolutionary algorithm is then applied to efficiently search for the best-performing architectures without any fine tuning.
Implementation on NNI is based on `official repo <https://github.com/megvii-model/SinglePathOneShot>`__. We implement a trainer that trains the supernet and a evolution tuner that leverages the power of NNI framework that speeds up the evolutionary search phase.
Examples
--------
Here is a use case, which is the search space in paper. However, we applied latency limit instead of flops limit to perform the architecture search phase.
:githublink:`Example code <examples/nas/oneshot/spos>`
Requirements
^^^^^^^^^^^^
Prepare ImageNet in the standard format (follow the script `here <https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4>`__\ ). Linking it to ``data/imagenet`` will be more convenient.
Download the checkpoint file from `here <https://1drv.ms/u/s!Am_mmG2-KsrnajesvSdfsq_cN48?e=aHVppN>`__ (maintained by `Megvii <https://github.com/megvii-model>`__\ ) if you don't want to retrain the supernet.
Put ``checkpoint-150000.pth.tar`` under ``data`` directory.
After preparation, it's expected to have the following code structure:
.. code-block:: bash
spos
├── architecture_final.json
├── blocks.py
├── data
│ ├── imagenet
│ │ ├── train
│ │ └── val
│ └── checkpoint-150000.pth.tar
├── network.py
├── readme.md
├── supernet.py
├── evaluation.py
├── search.py
└── utils.py
Step 1. Train Supernet
^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash
python supernet.py
Will export the checkpoint to ``checkpoints`` directory, for the next step.
NOTE: The data loading used in the official repo is `slightly different from usual <https://github.com/megvii-model/SinglePathOneShot/issues/5>`__\ , as they use BGR tensor and keep the values between 0 and 255 intentionally to align with their own DL framework. The option ``--spos-preprocessing`` will simulate the behavior used originally and enable you to use the checkpoints pretrained.
Step 2. Evolution Search
^^^^^^^^^^^^^^^^^^^^^^^^
Single Path One-Shot leverages evolution algorithm to search for the best architecture. In the paper, the search module, which is responsible for testing the sampled architecture, recalculates all the batch norm for a subset of training images, and evaluates the architecture on the full validation set.
In this example, we have an incomplete implementation of the evolution search. The example only support training from scratch. Inheriting weights from pretrained supernet is not supported yet. To search with the regularized evolution strategy, run
.. code-block:: bash
python search.py
The final architecture exported from every epoch of evolution can be found in ``trials`` under the working directory of your tuner, which, by default, is ``$HOME/nni-experiments/your_experiment_id/trials``.
Step 3. Train for Evaluation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash
python evaluation.py
By default, it will use ``architecture_final.json``. This architecture is provided by the official repo (converted into NNI format). You can use any architecture (e.g., the architecture found in step 2) with ``--fixed-arc`` option.
Reference
---------
PyTorch
^^^^^^^
.. autoclass:: nni.retiarii.oneshot.pytorch.SinglePathTrainer
:noindex:
Known Limitations
-----------------
* Block search only. Channel search is not supported yet.
* In the search phase, training from the scratch is required. Inheriting weights from supernet is not supported yet.
Current Reproduction Results
----------------------------
Reproduction is still undergoing. Due to the gap between official release and original paper, we compare our current results with official repo (our run) and paper.
* Evolution phase is almost aligned with official repo. Our evolution algorithm shows a converging trend and reaches ~65% accuracy at the end of search. Nevertheless, this result is not on par with paper. For details, please refer to `this issue <https://github.com/megvii-model/SinglePathOneShot/issues/6>`__.
* Retrain phase is not aligned. Our retraining code, which uses the architecture released by the authors, reaches 72.14% accuracy, still having a gap towards 73.61% by official release and 74.3% reported in original paper.
#####################
Construct Model Space
#####################
NNI provides powerful APIs for users to easily express model space (or search space). First, users can use mutation primitives (e.g., ValueChoice, LayerChoice) to inline a space in their model. Second, NNI provides simple interface for users to customize new mutators for expressing more complicated model spaces. In most cases, the mutation primitives are enough to express users' model spaces.
.. toctree::
:maxdepth: 1
Mutation Primitives <MutationPrimitives>
Customize Mutators <Mutators>
Hypermodule Lib <Hypermodules>
\ No newline at end of file
.. bb39a6ac0ae1f5554bc38604c77fb616
#####################
构建模型空间
#####################
NNI为用户提供了强大的API,以方便表达模型空间(或搜索空间)。 首先,用户可以使用 mutation 原语(如 ValueChoice、LayerChoice)在他们的模型中内联一个空间。 其次,NNI为用户提供了简单的接口,可以定制新的 mutators 来表达更复杂的模型空间。 在大多数情况下,mutation 原语足以表达用户的模型空间。
.. toctree::
:maxdepth: 1
mutation 原语 <MutationPrimitives>
定制 mutator <Mutators>
Hypermodule Lib <Hypermodules>
\ No newline at end of file
Multi-trial NAS
===============
In multi-trial NAS, users need model evaluator to evaluate the performance of each sampled model, and need an exploration strategy to sample models from a defined model space. Here, users could use NNI provided model evaluators or write their own model evalutor. They can simply choose a exploration strategy. Advanced users can also customize new exploration strategy. For a simple example about how to run a multi-trial NAS experiment, please refer to `Quick Start <./QuickStart.rst>`__.
.. toctree::
:maxdepth: 2
Model Evaluators <ModelEvaluators>
Exploration Strategies <ExplorationStrategies>
Execution Engines <ExecutionEngines>
Serialization <Serialization>
.. 51734c9945d4eca0f9b5633929d8fadf
Multi-trial NAS
===============
在 multi-trial NAS 中,用户需要模型评估器来评估每个采样模型的性能,并且需要一个探索策略来从定义的模型空间中采样模型。 在这里,用户可以使用 NNI 提供的模型评估器或编写自己的模型评估器。 他们可以简单地选择一种探索策略。 高级用户还可以自定义新的探索策略。 关于如何运行 multi-trial NAS 实验的简单例子,请参考 `快速入门 <./QuickStart.rst>`__。
.. toctree::
:maxdepth: 1
模型评估器 <ModelEvaluators>
探索策略 <ExplorationStrategies>
执行引擎 <ExecutionEngines>
序列化 <Serialization>
One-shot NAS
============
One-shot NAS algorithms leverage weight sharing among models in neural architecture search space to train a supernet, and use this supernet to guide the selection of better models. This type of algorihtms greatly reduces computational resource compared to independently training each model from scratch (which we call "Multi-trial NAS"). NNI has supported many popular One-shot NAS algorithms as following.
.. toctree::
:maxdepth: 1
Run One-shot NAS <OneshotTrainer>
ENAS <ENAS>
DARTS <DARTS>
SPOS <SPOS>
ProxylessNAS <Proxylessnas>
FBNet <FBNet>
Customize One-shot NAS <WriteOneshot>
.. c9ab8a1f91c587ad72d66b6c43e06528
One-shot NAS
============
One-Shot NAS 算法利用了搜索空间中模型间的权重共享来训练超网络,并使用超网络来指导选择出更好的模型。 与从头训练每个模型(我们称之为 "Multi-trial NAS")算法相比,此类算法大大减少了使用的计算资源。 NNI 支持下列流行的 One-Shot NAS 算法。
.. toctree::
:maxdepth: 1
运行 One-shot NAS <OneshotTrainer>
ENAS <ENAS>
DARTS <DARTS>
SPOS <SPOS>
ProxylessNAS <Proxylessnas>
FBNet <FBNet>
自定义 One-shot NAS <WriteOneshot>
\ No newline at end of file
.. 6e45ee0ddd5d0315e5c946149d4f9c31
概述
========
NNI (Neural Network Intelligence) 是一个工具包,可有效的帮助用户设计并调优机器学习模型的神经网络架构,复杂系统的参数(如超参)等。 NNI 的特性包括:易于使用,可扩展,灵活,高效。
* **易于使用**:NNI 可通过 pip 安装。 只需要在代码中添加几行,就可以利用 NNI 来调优参数。 可使用命令行工具或 Web 界面来查看 Experiment。
* **可扩展**:调优超参或网络结构通常需要大量的计算资源。NNI 在设计时就支持了多种不同的计算资源,如远程服务器组,训练平台(如:OpenPAI,Kubernetes),等等。 根据您配置的培训平台的能力,可以并行运行数百个 Trial 。
* **灵活**:除了内置的算法,NNI 中还可以轻松集成自定义的超参调优算法,神经网络架构搜索算法,提前终止算法等等。 还可以将 NNI 连接到更多的训练平台上,如云中的虚拟机集群,Kubernetes 服务等等。 此外,NNI 还可以连接到外部环境中的特殊应用和模型上。
* **高效**:NNI 在系统及算法级别上不断地进行优化。 例如:通过早期的反馈来加速调优过程。
下图显示了 NNI 的体系结构。
.. raw:: html
<p align="center">
<img src="https://user-images.githubusercontent.com/16907603/92089316-94147200-ee00-11ea-9944-bf3c4544257f.png" alt="drawing" width="700"/>
</p>
主要概念
------------
*
*Experiment(实验)*: 表示一次任务,例如,寻找模型的最佳超参组合,或最好的神经网络架构等。 它由 Trial 和自动机器学习算法所组成。
*
*搜索空间*:是模型调优的范围。 例如,超参的取值范围。
*
*Configuration(配置)*:配置是来自搜索空间的实例,每个超参都会有特定的值。
*
*Trial*:是一次独立的尝试,它会使用某组配置(例如,一组超参值,或者特定的神经网络架构)。 Trial 会基于提供的配置来运行。
*
*Tuner(调优器)*:一种自动机器学习算法,会为下一个 Trial 生成新的配置。 新的 Trial 会使用这组配置来运行。
*
*Assessor(评估器)*:分析 Trial 的中间结果(例如,定期评估数据集上的精度),来确定 Trial 是否应该被提前终止。
*
*训练平台*:是 Trial 的执行环境。 根据 Experiment 的配置,可以是本机,远程服务器组,或其它大规模训练平台(如,OpenPAI,Kubernetes)。
Experiment 的运行过程为:Tuner 接收搜索空间并生成配置。 这些配置将被提交到训练平台,如本机,远程服务器组或训练集群。 执行的性能结果会被返回给 Tuner。 然后,再生成并提交新的配置。
每次 Experiment 执行时,用户只需要定义搜索空间,改动几行代码,就能利用 NNI 内置的 Tuner/Assessor 和训练平台来搜索最好的超参组合以及神经网络结构。 基本上分为三步:
..
步骤一:`定义搜索空间 <Tutorial/SearchSpaceSpec.rst>`__
步骤二:`改动模型代码 <TrialExample/Trials.rst>`__
步骤三:`定义实验配置 <Tutorial/ExperimentConfig.rst>`__
.. raw:: html
<p align="center">
<img src="https://user-images.githubusercontent.com/23273522/51816627-5d13db80-2302-11e9-8f3e-627e260203d5.jpg" alt="drawing"/>
</p>
可查看 `快速入门 <Tutorial/QuickStart.rst>`__ 来调优你的模型或系统。
核心功能
-------------
NNI 提供了并行运行多个实例以查找最佳参数组合的能力。 此功能可用于各种领域,例如,为深度学习模型查找最佳超参数,或查找具有真实数据的数据库和其他复杂系统的最佳配置。
NNI 还希望提供用于机器学习和深度学习的算法工具包,尤其是神经体系结构搜索(NAS)算法,模型压缩算法和特征工程算法。
超参调优
^^^^^^^^^^^^^^^^^^^^^
这是 NNI 最核心、基本的功能,其中提供了许多流行的 `自动调优算法 <Tuner/BuiltinTuner.rst>`__ (即 Tuner) 以及 `提前终止算法 <Assessor/BuiltinAssessor.rst>`__ (即 Assessor)。 可查看 `快速入门 <Tutorial/QuickStart.rst>`__ 来调优你的模型或系统。 基本上通过以上三步,就能开始 NNI Experiment。
通用 NAS 框架
^^^^^^^^^^^^^^^^^^^^^
此 NAS 框架可供用户轻松指定候选的神经体系结构,例如,可以为单个层指定多个候选操作(例如,可分离的 conv、扩张 conv),并指定可能的跳过连接。 NNI 将自动找到最佳候选。 另一方面,NAS 框架为其他类型的用户(如,NAS 算法研究人员)提供了简单的接口,以实现新的 NAS 算法。 NAS 详情及用法参考 `这里 <NAS/Overview.rst>`__。
NNI 通过 Trial SDK 支持多种 one-shot(一次性) NAS 算法,如:ENAS、DARTS。 使用这些算法时,不需启动 NNI Experiment。 在 Trial 代码中加入算法,直接运行即可。 如果要调整算法中的超参数,或运行多个实例,可以使用 Tuner 并启动 NNI Experiment。
除了 one-shot NAS 外,NAS 还能以 NNI 模式运行,其中每个候选的网络结构都作为独立 Trial 任务运行。 在此模式下,与超参调优类似,必须启动 NNI Experiment 并为 NAS 选择 Tuner。
模型压缩
^^^^^^^^^^^^^^^^^
NNI 提供了一个易于使用的模型压缩框架来压缩深度神经网络,压缩后的网络通常具有更小的模型尺寸和更快的推理速度,
模型性能也不会有明显的下降。 NNI 上的模型压缩包括剪枝和量化算法。 这些算法通过 NNI Trial SDK 提供
。 可以直接在 Trial 代码中使用,并在不启动 NNI Experiment 的情况下运行 Trial 代码。 用户还可以使用 NNI 模型压缩框架集成自定义的剪枝和量化算法。
模型压缩的详细说明和算法可在 `这里 <Compression/Overview.rst>`__ 找到。
自动特征工程
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
自动特征工程,可以为下游任务找到最有效的特征。 自动特征工程及其用法的详细说明可在 `这里 <FeatureEngineering/Overview.rst>`__ 找到。 通过 NNI Trial SDK 支持,不必创建 NNI Experiment, 只需在 Trial 代码中加入内置的自动特征工程算法,然后直接运行 Trial 代码。
自动特征工程算法通常有一些超参。 如果要自动调整这些超参,可以利用 NNI 的超参数调优,即选择调优算法(即 Tuner)并启动 NNI Experiment。
了解更多信息
--------------------
* `入门 <Tutorial/QuickStart.rst>`__
* `如何为 NNI 调整代码? <TrialExample/Trials.rst>`__
* `NNI 支持哪些 Tuner? <Tuner/BuiltinTuner.rst>`__
* `如何自定义 Tuner? <Tuner/CustomizeTuner.rst>`__
* `NNI 支持哪些 Assessor? <Assessor/BuiltinAssessor.rst>`__
* `如何自定义 Assessor? <Assessor/CustomizeAssessor.rst>`__
* `如何在本机上运行 Experiment? <TrainingService/LocalMode.rst>`__
* `如何在多机上运行 Experiment? <TrainingService/RemoteMachineMode.rst>`__
* `如何在 OpenPAI 上运行 Experiment? <TrainingService/PaiMode.rst>`__
* `示例 <TrialExample/MnistExamples.rst>`__
* `NNI 上的神经网络架构搜索 <NAS/Overview.rst>`__
* `NNI 上的自动模型压缩 <Compression/Overview.rst>`__
* `NNI 上的自动特征工程 <FeatureEngineering/Overview.rst>`__
**Run an Experiment on DLTS**
=================================
NNI supports running an experiment on `DLTS <https://github.com/microsoft/DLWorkspace.git>`__\ , called dlts mode. Before starting to use NNI dlts mode, you should have an account to access DLTS dashboard.
Setup Environment
-----------------
Step 1. Choose a cluster from DLTS dashboard, ask administrator for the cluster dashboard URL.
.. image:: ../../img/dlts-step1.png
:target: ../../img/dlts-step1.png
:alt: Choose Cluster
Step 2. Prepare a NNI config YAML like the following:
.. code-block:: yaml
# Set this field to "dlts"
trainingServicePlatform: dlts
authorName: your_name
experimentName: auto_mnist
trialConcurrency: 2
maxExecDuration: 3h
maxTrialNum: 100
searchSpacePath: search_space.json
useAnnotation: false
tuner:
builtinTunerName: TPE
classArgs:
optimize_mode: maximize
trial:
command: python3 mnist.py
codeDir: .
gpuNum: 1
image: msranni/nni
# Configuration to access DLTS
dltsConfig:
dashboard: # Ask administrator for the cluster dashboard URL
Remember to fill the cluster dashboard URL to the last line.
Step 3. Open your working directory of the cluster, paste the NNI config as well as related code to a directory.
.. image:: ../../img/dlts-step3.png
:target: ../../img/dlts-step3.png
:alt: Copy Config
Step 4. Submit a NNI manager job to the specified cluster.
.. image:: ../../img/dlts-step4.png
:target: ../../img/dlts-step4.png
:alt: Submit Job
Step 5. Go to Endpoints tab of the newly created job, click the Port 40000 link to check trial's information.
.. image:: ../../img/dlts-step5.png
:target: ../../img/dlts-step5.png
:alt: View NNI WebUI
**Tutorial: Create and Run an Experiment on local with NNI API**
================================================================
In this tutorial, we will use the example in [nni/examples/trials/mnist-pytorch] to explain how to create and run an experiment on local with NNI API.
..
Before starts
You have an implementation for MNIST classifer using convolutional layers, the Python code is similar to ``mnist.py``.
..
Step 1 - Update model codes
To enable NNI API, make the following changes:
1.1 Declare NNI API: include ``import nni`` in your trial code to use NNI APIs.
1.2 Get predefined parameters
Use the following code snippet:
.. code-block:: python
tuner_params = nni.get_next_parameter()
to get hyper-parameters' values assigned by tuner. ``tuner_params`` is an object, for example:
.. code-block:: json
{"batch_size": 32, "hidden_size": 128, "lr": 0.01, "momentum": 0.2029}
..
1.3 Report NNI results: Use the API: ``nni.report_intermediate_result(accuracy)`` to send ``accuracy`` to assessor. Use the API: ``nni.report_final_result(accuracy)`` to send `accuracy` to tuner.
**NOTE**\ :
.. code-block:: bash
accuracy - The `accuracy` could be any python object, but if you use NNI built-in tuner/assessor, `accuracy` should be a numerical variable (e.g. float, int).
tuner - The tuner will generate next parameters/architecture based on the explore history (final result of all trials).
assessor - The assessor will decide which trial should early stop based on the history performance of trial (intermediate result of one trial).
..
Step 2 - Define SearchSpace
The hyper-parameters used in ``Step 1.2 - Get predefined parameters`` is defined in a ``search_space.json`` file like below:
.. code-block:: bash
{
"batch_size": {"_type":"choice", "_value": [16, 32, 64, 128]},
"hidden_size":{"_type":"choice","_value":[128, 256, 512, 1024]},
"lr":{"_type":"choice","_value":[0.0001, 0.001, 0.01, 0.1]},
"momentum":{"_type":"uniform","_value":[0, 1]}
}
Refer to `define search space <../Tutorial/SearchSpaceSpec.rst>`__ to learn more about search space.
..
Step 3 - Define Experiment
..
To run an experiment in NNI, you only needed:
* Provide a runnable trial
* Provide or choose a tuner
* Provide a YAML experiment configure file
* (optional) Provide or choose an assessor
**Prepare trial**\ :
..
You can download nni source code and a set of examples can be found in ``nni/examples``, run ``ls nni/examples/trials`` to see all the trial examples.
Let's use a simple trial example, e.g. mnist, provided by NNI. After you cloned NNI source, NNI examples have been put in ~/nni/examples, run ``ls ~/nni/examples/trials`` to see all the trial examples. You can simply execute the following command to run the NNI mnist example:
.. code-block:: bash
python ~/nni/examples/trials/mnist-pytorch/mnist.py
This command will be filled in the YAML configure file below. Please refer to `here <../TrialExample/Trials.rst>`__ for how to write your own trial.
**Prepare tuner**\ : NNI supports several popular automl algorithms, including Random Search, Tree of Parzen Estimators (TPE), Evolution algorithm etc. Users can write their own tuner (refer to `here <../Tuner/CustomizeTuner.rst>`__\ ), but for simplicity, here we choose a tuner provided by NNI as below:
.. code-block:: bash
tuner:
name: TPE
classArgs:
optimize_mode: maximize
*name* is used to specify a tuner in NNI, *classArgs* are the arguments pass to the tuner (the spec of builtin tuners can be found `here <../Tuner/BuiltinTuner.rst>`__\ ), *optimization_mode* is to indicate whether you want to maximize or minimize your trial's result.
**Prepare configure file**\ : Since you have already known which trial code you are going to run and which tuner you are going to use, it is time to prepare the YAML configure file. NNI provides a demo configure file for each trial example, ``cat ~/nni/examples/trials/mnist-pytorch/config.yml`` to see it. Its content is basically shown below:
.. code-block:: yaml
experimentName: local training service example
searchSpaceFile ~/nni/examples/trials/mnist-pytorch/search_space.json
trailCommand: python3 mnist.py
trialCodeDirectory: ~/nni/examples/trials/mnist-pytorch
trialGpuNumber: 0
trialConcurrency: 1
maxExperimentDuration: 3h
maxTrialNumber: 10
trainingService:
platform: local
tuner:
name: TPE
classArgs:
optimize_mode: maximize
With all these steps done, we can run the experiment with the following command:
.. code-block:: bash
nnictl create --config ~/nni/examples/trials/mnist-pytorch/config.yml
You can refer to `here <../Tutorial/Nnictl.rst>`__ for more usage guide of *nnictl* command line tool.
View experiment results
-----------------------
The experiment has been running now. Other than *nnictl*\ , NNI also provides WebUI for you to view experiment progress, to control your experiment, and some other appealing features.
Using multiple local GPUs to speed up search
--------------------------------------------
The following steps assume that you have 4 NVIDIA GPUs installed at local and PyTorch with CUDA support. The demo enables 4 concurrent trail jobs and each trail job uses 1 GPU.
**Prepare configure file**\ : NNI provides a demo configuration file for the setting above, ``cat ~/nni/examples/trials/mnist-pytorch/config_detailed.yml`` to see it. The trailConcurrency and trialGpuNumber are different from the basic configure file:
.. code-block:: bash
...
trialGpuNumber: 1
trialConcurrency: 4
...
trainingService:
platform: local
useActiveGpu: false # set to "true" if you are using graphical OS like Windows 10 and Ubuntu desktop
We can run the experiment with the following command:
.. code-block:: bash
nnictl create --config ~/nni/examples/trials/mnist-pytorch/config_detailed.yml
You can use *nnictl* command line tool or WebUI to trace the training progress. *nvidia_smi* command line tool can also help you to monitor the GPU usage during training.
.. 263c2dcfaee0c2fd06c19b5e90b96374
.. role:: raw-html(raw)
:format: html
实现 NNI 的 Trial(试验)代码
=================================
**Trial(试验)** 是将一组参数组合(例如,超参)在模型上独立的一次尝试。
定义 NNI 的 Trial,需要首先定义参数组(例如,搜索空间),并更新模型代码。 有两种方法来定义一个 Trial:`NNI Python API <#nni-api>`__ 和 `NNI Python annotation <#nni-annotation>`__。 参考 `这里 <#more-examples>`__ 更多 Trial 示例。
:raw-html:`<a name="nni-api"></a>`
NNI Trial API
-------------
第一步:准备搜索空间参数文件。
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
样例如下:
.. code-block:: json
{
"dropout_rate":{"_type":"uniform","_value":[0.1,0.5]},
"conv_size":{"_type":"choice","_value":[2,3,5,7]},
"hidden_size":{"_type":"choice","_value":[124, 512, 1024]},
"learning_rate":{"_type":"uniform","_value":[0.0001, 0.1]}
}
参考 `SearchSpaceSpec.rst <../Tutorial/SearchSpaceSpec.rst>`__ 进一步了解搜索空间。 Tuner 会根据搜索空间来生成配置,即从每个超参的范围中选一个值。
第二步:更新模型代码
^^^^^^^^^^^^^^^^^^^^^^^^^^
* Import NNI
在 Trial 代码中加上 ``import nni`` 。
* 从 Tuner 获得参数值
.. code-block:: python
RECEIVED_PARAMS = nni.get_next_parameter()
``RECEIVED_PARAMS`` 是一个对象,如:
``{"conv_size": 2, "hidden_size": 124, "learning_rate": 0.0307, "dropout_rate": 0.2029}``
* 定期返回指标数据(可选)
.. code-block:: python
nni.report_intermediate_result(metrics)
``指标`` 可以是任意的 Python 对象。 如果使用了 NNI 内置的 Tuner/Assessor,``指标`` 只可以是两种类型:1) 数值类型,如 float、int, 2) dict 对象,其中必须有键名为 ``default`` ,值为数值的项目。 ``指标`` 会发送给 `assessor <../Assessor/BuiltinAssessor.rst>`__。 通常,``指标`` 包含了定期评估的损失值或精度。
* 返回配置的最终性能
.. code-block:: python
nni.report_final_result(metrics)
``指标`` 可以是任意的 Python 对象。 如果使用了内置的 Tuner/Assessor,``指标`` 格式和 ``report_intermediate_result`` 中一样,这个数值表示模型的性能,如精度、损失值等。 ``指标`` 会发送给 `tuner <../Tuner/BuiltinTuner.rst>`__。
第三步:启动 NNI Experiment (实验)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
启动 NNI 实验,提供搜索空间文件的路径,即第一步中定义的文件:
.. code-block:: yaml
searchSpacePath: /path/to/your/search_space.json
参考 `这里 <../Tutorial/ExperimentConfig.rst>`__ 进一步了解如何配置 Experiment。
参考 `这里 <../sdk_reference.rst>`__ ,了解更多 NNI Trial API (例如:``nni.get_sequence_id()``)。
:raw-html:`<a name="nni-annotation"></a>`
NNI Annotation
---------------------
另一种实现 Trial 的方法是使用 Python 注释来标记 NNI。 NNI Annotation 很简单,类似于注释。 不必对现有代码进行结构更改。 只需要添加一些 NNI Annotation,就能够:
* 标记需要调整的参数变量
* 指定要在其中调整的变量的范围
* 标记哪个变量需要作为中间结果范围给 ``assessor``
* 标记哪个变量需要作为最终结果(例如:模型精度) 返回给 ``tuner``
同样以 MNIST 为例,只需要两步就能用 NNI Annotation 来实现 Trial 代码。
第一步:在代码中加入 Annotation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
下面是加入了 Annotation 的 TensorFlow 代码片段,高亮的 4 行 Annotation 用于:
#. 调优 batch_size 和 dropout_rate
#. 每执行 100 步返回 test_acc
#. 最后返回 test_acc 作为最终结果。
值得注意的是,新添加的代码都是注释,不会影响以前的执行逻辑。因此这些代码仍然能在没有安装 NNI 的环境中运行。
.. code-block:: diff
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
+ """@nni.variable(nni.choice(50, 250, 500), name=batch_size)"""
batch_size = 128
for i in range(10000):
batch = mnist.train.next_batch(batch_size)
+ """@nni.variable(nni.choice(0.1, 0.5), name=dropout_rate)"""
dropout_rate = 0.5
mnist_network.train_step.run(feed_dict={mnist_network.images: batch[0],
mnist_network.labels: batch[1],
mnist_network.keep_prob: dropout_rate})
if i % 100 == 0:
test_acc = mnist_network.accuracy.eval(
feed_dict={mnist_network.images: mnist.test.images,
mnist_network.labels: mnist.test.labels,
mnist_network.keep_prob: 1.0})
+ """@nni.report_intermediate_result(test_acc)"""
test_acc = mnist_network.accuracy.eval(
feed_dict={mnist_network.images: mnist.test.images,
mnist_network.labels: mnist.test.labels,
mnist_network.keep_prob: 1.0})
+ """@nni.report_final_result(test_acc)"""
**注意**:
* ``@nni.variable`` 会对它的下面一行进行修改,左边被赋值变量必须与 ``@nni.variable`` 的关键字 ``name`` 相同。
* ``@nni.report_intermediate_result``\ /\ ``@nni.report_final_result`` 会将数据发送给 assessor/tuner。
Annotation 的语法和用法等,参考 `Annotation <../Tutorial/AnnotationSpec.rst>`__。
第二步:启用 Annotation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
在 YAML 配置文件中设置 *useAnnotation* 为 true 来启用 Annotation:
.. code-block:: bash
useAnnotation: true
用于调试的独立模式
-----------------------------
NNI 支持独立模式,使 Trial 代码无需启动 NNI 实验即可运行。 这样能更容易的找出 Trial 代码中的 Bug。 NNI Annotation 天然支持独立模式,因为添加的 NNI 相关的行都是注释的形式。 NNI Trial API 在独立模式下的行为有所变化,某些 API 返回虚拟值,而某些 API 不报告值。 有关这些 API 的完整列表,请参阅下表。
.. code-block:: python
# 注意:请为 Trial 代码中的超参分配默认值
nni.get_next_parameter # 返回 {}
nni.report_final_result # 已在 stdout 上打印日志,但不报告
nni.report_intermediate_result # 已在 stdout 上打印日志,但不报告
nni.get_experiment_id # 返回 "STANDALONE"
nni.get_trial_id # 返回 "STANDALONE"
nni.get_sequence_id # 返回 0
可使用 :githublink:`mnist 示例 <examples/trials/mnist-pytorch>` 来尝试独立模式。 只需在代码目录下运行 ``python3 mnist.py``。 Trial 代码会使用默认超参成功运行。
更多调试的信息,可参考 `How to Debug <../Tutorial/HowToDebug.rst>`__。
Trial 存放在什么地方?
----------------------------------------
本机模式
^^^^^^^^^^
每个 Trial 都有单独的目录来输出自己的数据。 在每次 Trial 运行后,环境变量 ``NNI_OUTPUT_DIR`` 定义的目录都会被导出。 在这个目录中可以看到 Trial 的代码、数据和日志。 此外,Trial 的日志(包括 stdout)还会被重定向到此目录中的 ``trial.log`` 文件。
如果使用了 Annotation 方法,转换后的 Trial 代码会存放在另一个临时目录中。 可以在 ``run.sh`` 文件中的 ``NNI_OUTPUT_DIR`` 变量找到此目录。 文件中的第二行(即:``cd``)会切换到代码所在的实际路径。 ``run.sh`` 文件示例:
.. code-block:: bash
#!/bin/bash
cd /tmp/user_name/nni/annotation/tmpzj0h72x6 #This is the actual directory
export NNI_PLATFORM=local
export NNI_SYS_DIR=/home/user_name/nni-experiments/$experiment_id$/trials/$trial_id$
export NNI_TRIAL_JOB_ID=nrbb2
export NNI_OUTPUT_DIR=/home/user_name/nni-experiments/$eperiment_id$/trials/$trial_id$
export NNI_TRIAL_SEQ_ID=1
export MULTI_PHASE=false
export CUDA_VISIBLE_DEVICES=
eval python3 mnist.py 2>/home/user_name/nni-experiments/$experiment_id$/trials/$trial_id$/stderr
echo $? `date +%s%3N` >/home/user_name/nni-experiments/$experiment_id$/trials/$trial_id$/.nni/state
其它模式
^^^^^^^^^^^
当 Trial 运行在 OpenPAI 这样的远程服务器上时,``NNI_OUTPUT_DIR`` 仅会指向 Trial 的输出目录,而 ``run.sh`` 不会在此目录中。 ``trial.log`` 文件会被复制回本机的 Trial 目录中。目录的默认位置在 ``~/nni-experiments/$experiment_id$/trials/$trial_id$/``。
更多调试的信息,可参考 `How to Debug <../Tutorial/HowToDebug.rst>`__。
:raw-html:`<a name="more-examples"></a>`
更多 Trial 的示例
-------------------
* `将日志写入 TensorBoard 的 Trial 输出目录 <../Tutorial/Tensorboard.rst>`__
* `MNIST 示例 <MnistExamples.rst>`__
* `为 CIFAR 10 分类找到最佳的 optimizer <Cifar10Examples.rst>`__
* `如何在 NNI 调优 SciKit-learn 的参数 <SklearnExamples.rst>`__
* `在阅读理解上使用自动模型架构搜索。 <SquadEvolutionExamples.rst>`__
* `如何在 NNI 上调优 GBDT <GbdtExample.rst>`__
* `在 NNI 上调优 RocksDB <RocksdbExamples.rst>`__
Anneal Tuner
============
This simple annealing algorithm begins by sampling from the prior but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
Usage
-----
classArgs Requirements
^^^^^^^^^^^^^^^^^^^^^^
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
Example Configuration
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
# config.yml
tuner:
name: Anneal
classArgs:
optimize_mode: maximize
Batch Tuner
===========
Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type ``choice`` in the `search space spec <../Tutorial/SearchSpaceSpec.rst>`__.
Suggested scenario: If the configurations you want to try have been decided, you can list them in the SearchSpace file (using ``choice``) and run them using the batch tuner.
Usage
-----
Example Configuration
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
# config.yml
tuner:
name: BatchTuner
Note that the search space for BatchTuner should look like:
.. code-block:: json
{
"combine_params":
{
"_type" : "choice",
"_value" : [{"optimizer": "Adam", "learning_rate": 0.00001},
{"optimizer": "Adam", "learning_rate": 0.0001},
{"optimizer": "Adam", "learning_rate": 0.001},
{"optimizer": "SGD", "learning_rate": 0.01},
{"optimizer": "SGD", "learning_rate": 0.005},
{"optimizer": "SGD", "learning_rate": 0.0002}]
}
}
The search space file should include the high-level key ``combine_params``. The type of params in the search space must be ``choice`` and the ``values`` must include all the combined params values.
Grid Search Tuner
=================
Grid Search performs an exhaustive search through a search space.
For uniform and normal distributed parameters, grid search tuner samples them at progressively decreased intervals.
Usage
-----
Grid search tuner has no argument.
Example Configuration
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
tuner:
name: GridSearch
Network Morphism Tuner
======================
`Autokeras <https://arxiv.org/abs/1806.10282>`__ is a popular autoML tool using Network Morphism. The basic idea of Autokeras is to use Bayesian Regression to estimate the metric of the Neural Network Architecture. Each time, it generates several child networks from father networks. Then it uses a naïve Bayesian regression to estimate its metric value from the history of trained results of network and metric value pairs. Next, it chooses the child which has the best, estimated performance and adds it to the training queue. Inspired by the work of Autokeras and referring to its `code <https://github.com/jhfjhfj1/autokeras>`__, we implemented our Network Morphism method on the NNI platform.
If you want to know more about network morphism trial usage, please see the :githublink:`Readme.md <examples/trials/network_morphism/README.rst>`.
Usage
-----
Installation
^^^^^^^^^^^^
NetworkMorphism requires :githublink:`PyTorch <examples/trials/network_morphism/requirements.txt>`.
classArgs Requirements
^^^^^^^^^^^^^^^^^^^^^^
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
* **task** (*('cv'), optional, default = 'cv'*) - The domain of the experiment. For now, this tuner only supports the computer vision (CV) domain.
* **input_width** (*int, optional, default = 32*) - input image width
* **input_channel** (*int, optional, default = 3*) - input image channel
* **n_output_node** (*int, optional, default = 10*) - number of classes
Config File
^^^^^^^^^^^
To use Network Morphism, you should modify the following spec in your ``config.yml`` file:
.. code-block:: yaml
tuner:
#choice: NetworkMorphism
name: NetworkMorphism
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
#for now, this tuner only supports cv domain
task: cv
#modify to fit your input image width
input_width: 32
#modify to fit your input image channel
input_channel: 3
#modify to fit your number of classes
n_output_node: 10
Example Configuration
^^^^^^^^^^^^^^^^^^^^^
.. code-block:: yaml
# config.yml
tuner:
name: NetworkMorphism
classArgs:
optimize_mode: maximize
task: cv
input_width: 32
input_channel: 3
n_output_node: 10
In the training procedure, it generates a JSON file which represents a Network Graph. Users can call the "json_to_graph()" function to build a PyTorch or Keras model from this JSON file.
.. code-block:: python
import nni
from nni.networkmorphism_tuner.graph import json_to_graph
def build_graph_from_json(ir_model_json):
"""build a pytorch model from json representation
"""
graph = json_to_graph(ir_model_json)
model = graph.produce_torch_model()
return model
# trial get next parameter from network morphism tuner
RCV_CONFIG = nni.get_next_parameter()
# call the function to build pytorch model or keras model
net = build_graph_from_json(RCV_CONFIG)
# training procedure
# ....
# report the final accuracy to NNI
nni.report_final_result(best_acc)
If you want to save and load the **best model**, the following methods are recommended.
.. code-block:: python
# 1. Use NNI API
## You can get the best model ID from WebUI
## or `nni-experiments/experiment_id/log/model_path/best_model.txt'
## read the json string from model file and load it with NNI API
with open("best-model.json") as json_file:
json_of_model = json_file.read()
model = build_graph_from_json(json_of_model)
# 2. Use Framework API (Related to Framework)
## 2.1 Keras API
## Save the model with Keras API in the trial code
## it's better to save model with id in nni local mode
model_id = nni.get_sequence_id()
## serialize model to JSON
model_json = model.to_json()
with open("model-{}.json".format(model_id), "w") as json_file:
json_file.write(model_json)
## serialize weights to HDF5
model.save_weights("model-{}.h5".format(model_id))
## Load the model with Keras API if you want to reuse the model
## load json and create model
model_id = "" # id of the model you want to reuse
with open('model-{}.json'.format(model_id), 'r') as json_file:
loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json)
## load weights into new model
loaded_model.load_weights("model-{}.h5".format(model_id))
## 2.2 PyTorch API
## Save the model with PyTorch API in the trial code
model_id = nni.get_sequence_id()
torch.save(model, "model-{}.pt".format(model_id))
## Load the model with PyTorch API if you want to reuse the model
model_id = "" # id of the model you want to reuse
loaded_model = torch.load("model-{}.pt".format(model_id))
File Structure
--------------
The tuner has a lot of different files, functions, and classes. Here, we will give most of those files only a brief introduction:
*
``networkmorphism_tuner.py`` is a tuner which uses network morphism techniques.
*
``bayesian.py`` is a Bayesian method to estimate the metric of unseen model based on the models we have already searched.
* ``graph.py`` is the meta graph data structure. The class Graph represents the neural architecture graph of a model.
* Graph extracts the neural architecture graph from a model.
* Each node in the graph is an intermediate tensor between layers.
* Each layer is an edge in the graph.
* Notably, multiple edges may refer to the same layer.
*
``graph_transformer.py`` includes some graph transformers which widen, deepen, or add skip-connections to the graph.
*
``layers.py`` includes all the layers we use in our model.
* ``layer_transformer.py`` includes some layer transformers which widen, deepen, or add skip-connections to the layer.
* ``nn.py`` includes the class which generates the initial network.
* ``metric.py`` some metric classes including Accuracy and MSE.
* ``utils.py`` is the example search network architectures for the ``cifar10`` dataset, using Keras.
The Network Representation Json Example
---------------------------------------
Here is an example of the intermediate representation JSON file we defined, which is passed from the tuner to the trial in the architecture search procedure. Users can call the "json_to_graph()" function in the trial code to build a PyTorch or Keras model from this JSON file.
.. code-block:: json
{
"input_shape": [32, 32, 3],
"weighted": false,
"operation_history": [],
"layer_id_to_input_node_ids": {"0": [0],"1": [1],"2": [2],"3": [3],"4": [4],"5": [5],"6": [6],"7": [7],"8": [8],"9": [9],"10": [10],"11": [11],"12": [12],"13": [13],"14": [14],"15": [15],"16": [16]
},
"layer_id_to_output_node_ids": {"0": [1],"1": [2],"2": [3],"3": [4],"4": [5],"5": [6],"6": [7],"7": [8],"8": [9],"9": [10],"10": [11],"11": [12],"12": [13],"13": [14],"14": [15],"15": [16],"16": [17]
},
"adj_list": {
"0": [[1, 0]],
"1": [[2, 1]],
"2": [[3, 2]],
"3": [[4, 3]],
"4": [[5, 4]],
"5": [[6, 5]],
"6": [[7, 6]],
"7": [[8, 7]],
"8": [[9, 8]],
"9": [[10, 9]],
"10": [[11, 10]],
"11": [[12, 11]],
"12": [[13, 12]],
"13": [[14, 13]],
"14": [[15, 14]],
"15": [[16, 15]],
"16": [[17, 16]],
"17": []
},
"reverse_adj_list": {
"0": [],
"1": [[0, 0]],
"2": [[1, 1]],
"3": [[2, 2]],
"4": [[3, 3]],
"5": [[4, 4]],
"6": [[5, 5]],
"7": [[6, 6]],
"8": [[7, 7]],
"9": [[8, 8]],
"10": [[9, 9]],
"11": [[10, 10]],
"12": [[11, 11]],
"13": [[12, 12]],
"14": [[13, 13]],
"15": [[14, 14]],
"16": [[15, 15]],
"17": [[16, 16]]
},
"node_list": [
[0, [32, 32, 3]],
[1, [32, 32, 3]],
[2, [32, 32, 64]],
[3, [32, 32, 64]],
[4, [16, 16, 64]],
[5, [16, 16, 64]],
[6, [16, 16, 64]],
[7, [16, 16, 64]],
[8, [8, 8, 64]],
[9, [8, 8, 64]],
[10, [8, 8, 64]],
[11, [8, 8, 64]],
[12, [4, 4, 64]],
[13, [64]],
[14, [64]],
[15, [64]],
[16, [64]],
[17, [10]]
],
"layer_list": [
[0, ["StubReLU", 0, 1]],
[1, ["StubConv2d", 1, 2, 3, 64, 3]],
[2, ["StubBatchNormalization2d", 2, 3, 64]],
[3, ["StubPooling2d", 3, 4, 2, 2, 0]],
[4, ["StubReLU", 4, 5]],
[5, ["StubConv2d", 5, 6, 64, 64, 3]],
[6, ["StubBatchNormalization2d", 6, 7, 64]],
[7, ["StubPooling2d", 7, 8, 2, 2, 0]],
[8, ["StubReLU", 8, 9]],
[9, ["StubConv2d", 9, 10, 64, 64, 3]],
[10, ["StubBatchNormalization2d", 10, 11, 64]],
[11, ["StubPooling2d", 11, 12, 2, 2, 0]],
[12, ["StubGlobalPooling2d", 12, 13]],
[13, ["StubDropout2d", 13, 14, 0.25]],
[14, ["StubDense", 14, 15, 64, 64]],
[15, ["StubReLU", 15, 16]],
[16, ["StubDense", 16, 17, 64, 10]]
]
}
You can consider the model to be a `directed acyclic graph <https://en.wikipedia.org/wiki/Directed_acyclic_graph>`__. The definition of each model is a JSON object where:
* ``input_shape`` is a list of integers which do not include the batch axis.
* ``weighted`` means whether the weights and biases in the neural network should be included in the graph.
* ``operation_history`` is a list saving all the network morphism operations.
* ``layer_id_to_input_node_ids`` is a dictionary mapping from layer identifiers to their input nodes identifiers.
* ``layer_id_to_output_node_ids`` is a dictionary mapping from layer identifiers to their output nodes identifiers
* ``adj_list`` is a two-dimensional list; the adjacency list of the graph. The first dimension is identified by tensor identifiers. In each edge list, the elements are two-element tuples of (tensor identifier, layer identifier).
* ``reverse_adj_list`` is a reverse adjacent list in the same format as adj_list.
* ``node_list`` is a list of integers. The indices of the list are the identifiers.
*
``layer_list`` is a list of stub layers. The indices of the list are the identifiers.
*
For ``StubConv (StubConv1d, StubConv2d, StubConv3d)``, the numbering follows the format: its node input id (or id list), node output id, input_channel, filters, kernel_size, stride, and padding.
*
For ``StubDense``, the numbering follows the format: its node input id (or id list), node output id, input_units, and units.
*
For ``StubBatchNormalization (StubBatchNormalization1d, StubBatchNormalization2d, StubBatchNormalization3d)``, the numbering follows the format: its node input id (or id list), node output id, and features numbers.
*
For ``StubDropout(StubDropout1d, StubDropout2d, StubDropout3d)``, the numbering follows the format: its node input id (or id list), node output id, and dropout rate.
*
For ``StubPooling (StubPooling1d, StubPooling2d, StubPooling3d)``, the numbering follows the format: its node input id (or id list), node output id, kernel_size, stride, and padding.
*
For else layers, the numbering follows the format: its node input id (or id list) and node output id.
TODO
----
Next step, we will change the API from s fixed network generator to a network generator with more available operators. We will use ONNX instead of JSON later as the intermediate representation spec in the future.
Random Tuner
============
In `Random Search for Hyper-Parameter Optimization <http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf>`__ we show that Random Search might be surprisingly effective despite its simplicity.
We suggest using Random Search as a baseline when no knowledge about the prior distribution of hyper-parameters is available.
Usage
-----
Example Configuration
.. code-block:: yaml
tuner:
name: Random
classArgs:
seed: 100 # optional
TPE Tuner
=========
The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach.
SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements,
and then subsequently choose new hyperparameters to test based on this model.
The TPE approach models P(x|y) and P(y) where x represents hyperparameters and y the associated evaluation matric.
P(x|y) is modeled by transforming the generative process of hyperparameters,
replacing the distributions of the configuration prior with non-parametric densities.
This optimization approach is described in detail in `Algorithms for Hyper-Parameter Optimization <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__.
Parallel TPE optimization
^^^^^^^^^^^^^^^^^^^^^^^^^
TPE approaches were actually run asynchronously in order to make use of multiple compute nodes and to avoid wasting time waiting for trial evaluations to complete.
The original algorithm design was optimized for sequential computation.
If we were to use TPE with much concurrency, its performance will be bad.
We have optimized this case using the Constant Liar algorithm.
For these principles of optimization, please refer to our `research blog <../CommunitySharings/ParallelizingTpeSearch.rst>`__.
Usage
-----
To use TPE, you should add the following spec in your experiment's YAML config file:
.. code-block:: yaml
## minimal config ##
tuner:
name: TPE
classArgs:
optimize_mode: minimize
.. code-block:: yaml
## advanced config ##
tuner:
name: TPE
classArgs:
optimize_mode: maximize
seed: 12345
tpe_args:
constant_liar_type: 'mean'
n_startup_jobs: 10
n_ei_candidates: 20
linear_forgetting: 100
prior_weight: 0
gamma: 0.5
classArgs
^^^^^^^^^
.. list-table::
:widths: 10 20 10 60
:header-rows: 1
* - Field
- Type
- Default
- Description
* - ``optimize_mode``
- ``'minimize' | 'maximize'``
- ``'minimize'``
- Whether to minimize or maximize trial metrics.
* - ``seed``
- ``int | null``
- ``null``
- The random seed.
* - ``tpe_args.constant_liar_type``
- ``'best' | 'worst' | 'mean' | null``
- ``'best'``
- TPE algorithm itself does not support parallel tuning. This parameter specifies how to optimize for trial_concurrency > 1. How each liar works is explained in paper's section 6.1.
In general ``best`` suit for small trial number and ``worst`` suit for large trial number.
* - ``tpe_args.n_startup_jobs``
- ``int``
- ``20``
- The first N hyper-parameters are generated fully randomly for warming up.
If the search space is large, you can increase this value. Or if max_trial_number is small, you may want to decrease it.
* - ``tpe_args.n_ei_candidates``
- ``int``
- ``24``
- For each iteration TPE samples EI for N sets of parameters and choose the best one. (loosely speaking)
* - ``tpe_args.linear_forgetting``
- ``int``
- ``25``
- TPE will lower the weights of old trials. This controls how many iterations it takes for a trial to start decay.
* - ``tpe_args.prior_weight``
- ``float``
- ``1.0``
- TPE treats user provided search space as prior.
When generating new trials, it also incorporates the prior in trial history by transforming the search space to
one trial configuration (i.e., each parameter of this configuration chooses the mean of its candidate range).
Here, prior_weight determines the weight of this trial configuration in the history trial configurations.
With prior weight 1.0, the search space is treated as one good trial.
For example, "normal(0, 1)" effectly equals to a trial with x = 0 which has yielded good result.
* - ``tpe_args.gamma``
- ``float``
- ``0.25``
- Controls how many trials are considered "good".
The number is calculated as "min(gamma * sqrt(N), linear_forgetting)".
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment