"src/git@developer.sourcefind.cn:OpenDAS/nni.git" did not exist on "3fdbbdb3afbfe9564222a8e8a2381e1651a08a62"
Unverified Commit c785655e authored by SparkSnail's avatar SparkSnail Committed by GitHub
Browse files

Merge pull request #207 from microsoft/master

merge master
parents 9fae194a d6b61e2f
...@@ -10,12 +10,17 @@ jobs: ...@@ -10,12 +10,17 @@ jobs:
steps: steps:
- script: python3 -m pip install --upgrade pip setuptools --user - script: python3 -m pip install --upgrade pip setuptools --user
displayName: 'Install python tools' displayName: 'Install python tools'
- script: |
python3 -m pip install torch==0.4.1 --user
python3 -m pip install torchvision==0.2.1 --user
python3 -m pip install tensorflow==1.12.0 --user
displayName: 'Install dependencies for integration'
- script: | - script: |
source install.sh source install.sh
displayName: 'Install nni toolkit via source code' displayName: 'Install nni toolkit via source code'
- script: | - script: |
python3 -m pip install flake8 --user python3 -m pip install flake8 --user
IGNORE=./tools/nni_annotation/testcase/*:F821,./examples/trials/mnist-nas/*/mnist*.py:F821 IGNORE=./tools/nni_annotation/testcase/*:F821,./examples/trials/mnist-nas/*/mnist*.py:F821,./examples/trials/nas_cifar10/src/cifar10/general_child.py:F821
python3 -m flake8 . --count --per-file-ignores=$IGNORE --select=E9,F63,F72,F82 --show-source --statistics python3 -m flake8 . --count --per-file-ignores=$IGNORE --select=E9,F63,F72,F82 --show-source --statistics
displayName: 'Run flake8 tests to find Python syntax errors and undefined names' displayName: 'Run flake8 tests to find Python syntax errors and undefined names'
- script: | - script: |
...@@ -50,6 +55,11 @@ jobs: ...@@ -50,6 +55,11 @@ jobs:
steps: steps:
- script: python3 -m pip install --upgrade pip setuptools - script: python3 -m pip install --upgrade pip setuptools
displayName: 'Install python tools' displayName: 'Install python tools'
- script: |
python3 -m pip install torch==0.4.1 --user
python3 -m pip install torchvision==0.2.1 --user
python3 -m pip install tensorflow==1.13.1 --user
displayName: 'Install dependencies for integration'
- script: | - script: |
source install.sh source install.sh
displayName: 'Install nni toolkit via source code' displayName: 'Install nni toolkit via source code'
......
...@@ -63,7 +63,7 @@ sudo mount -t nfs 10.10.10.10:/tmp/nni/shared /mnt/nfs/nni ...@@ -63,7 +63,7 @@ sudo mount -t nfs 10.10.10.10:/tmp/nni/shared /mnt/nfs/nni
``` ```
where `10.10.10.10` should be replaced by the real IP of NFS server machine in practice. where `10.10.10.10` should be replaced by the real IP of NFS server machine in practice.
## Asynchornous Dispatcher Mode for trial dependency control ## Asynchronous Dispatcher Mode for trial dependency control
The feature of weight sharing enables trials from different machines, in which most of the time **read after write** consistency must be assured. After all, the child model should not load parent model before parent trial finishes training. To deal with this, users can enable **asynchronous dispatcher mode** with `multiThread: true` in `config.yml` in NNI, where the dispatcher assign a tuner thread each time a `NEW_TRIAL` request comes in, and the tuner thread can decide when to submit a new trial by blocking and unblocking the thread itself. For example: The feature of weight sharing enables trials from different machines, in which most of the time **read after write** consistency must be assured. After all, the child model should not load parent model before parent trial finishes training. To deal with this, users can enable **asynchronous dispatcher mode** with `multiThread: true` in `config.yml` in NNI, where the dispatcher assign a tuner thread each time a `NEW_TRIAL` request comes in, and the tuner thread can decide when to submit a new trial by blocking and unblocking the thread itself. For example:
```python ```python
def generate_parameters(self, parameter_id): def generate_parameters(self, parameter_id):
......
...@@ -8,8 +8,6 @@ Typically each trial job gets a single configuration (e.g., hyperparameters) fro ...@@ -8,8 +8,6 @@ Typically each trial job gets a single configuration (e.g., hyperparameters) fro
The above cases can be supported by the same feature, i.e., multi-phase execution. To support those cases, basically a trial job should be able to request multiple configurations from tuner. Tuner is aware of whether two configuration requests are from the same trial job or different ones. Also in multi-phase a trial job can report multiple final results. The above cases can be supported by the same feature, i.e., multi-phase execution. To support those cases, basically a trial job should be able to request multiple configurations from tuner. Tuner is aware of whether two configuration requests are from the same trial job or different ones. Also in multi-phase a trial job can report multiple final results.
Note that, `nni.get_next_parameter()` and `nni.report_final_result()` should be called sequentially: __call the former one, then call the later one; and repeat this pattern__. If `nni.get_next_parameter()` is called multiple times consecutively, and then `nni.report_final_result()` is called once, the result is associated to the last configuration, which is retrieved from the last get_next_parameter call. So there is no result associated to previous get_next_parameter calls, and it may cause some multi-phase algorithm broken.
## Create multi-phase experiment ## Create multi-phase experiment
### Write trial code which leverages multi-phase: ### Write trial code which leverages multi-phase:
...@@ -23,6 +21,9 @@ It is pretty simple to use multi-phase in trial code, an example is shown below: ...@@ -23,6 +21,9 @@ It is pretty simple to use multi-phase in trial code, an example is shown below:
for i in range(5): for i in range(5):
# get parameter from tuner # get parameter from tuner
tuner_param = nni.get_next_parameter() tuner_param = nni.get_next_parameter()
# nni.get_next_parameter returns None if there is no more hyper parameters can be generated by tuner.
if tuner_param is None:
break
# consume the params # consume the params
# ... # ...
...@@ -32,6 +33,10 @@ It is pretty simple to use multi-phase in trial code, an example is shown below: ...@@ -32,6 +33,10 @@ It is pretty simple to use multi-phase in trial code, an example is shown below:
# ... # ...
``` ```
In multi-phase experiments, at each time the API ```nni.get_next_parameter()``` is called, it returns a new hyper parameter generated by tuner, then the trail code consume this new hyper parameter and report final result of this hyper parameter. `nni.get_next_parameter()` and `nni.report_final_result()` should be called sequentially: __call the former one, then call the later one; and repeat this pattern__. If `nni.get_next_parameter()` is called multiple times consecutively, and then `nni.report_final_result()` is called once, the result is associated to the last configuration, which is retrieved from the last get_next_parameter call. So there is no result associated to previous get_next_parameter calls, and it may cause some multi-phase algorithm broken.
Note that, ```nni.get_next_parameter``` returns None if there is no more hyper parameters can be generated by tuner.
__2. Experiment configuration__ __2. Experiment configuration__
To enable multi-phase, you should also add `multiPhase: true` in your experiment YAML configure file. If this line is not added, `nni.get_next_parameter()` would always return the same configuration. To enable multi-phase, you should also add `multiPhase: true` in your experiment YAML configure file. If this line is not added, `nni.get_next_parameter()` would always return the same configuration.
......
# Automatic Model Compression on NNI
TBD.
\ No newline at end of file
# Compressor
NNI provides an easy-to-use toolkit to help user design and use compression algorithms. It supports Tensorflow and PyTorch with unified interface. For users to compress their models, they only need to add several lines in their code. There are some popular model compression algorithms built-in in NNI. Users could further use NNI's auto tuning power to find the best compressed model, which is detailed in [Auto Model Compression](./AutoCompression.md). On the other hand, users could easily customize their new compression algorithms using NNI's interface, refer to the tutorial [here](#customize-new-compression-algorithms).
## Supported algorithms
We have provided two naive compression algorithms and four popular ones for users, including three pruning algorithms and three quantization algorithms:
|Name|Brief Introduction of Algorithm|
|---|---|
| [Level Pruner](./Pruner.md#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
| [AGP Pruner](./Pruner.md#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
| [Sensitivity Pruner](./Pruner.md#sensitivity-pruner) | Learning both Weights and Connections for Efficient Neural Networks. [Reference Paper](https://arxiv.org/abs/1506.02626)|
| [Naive Quantizer](./Quantizer.md#naive-quantizer) | Quantize weights to default 8 bits |
| [QAT Quantizer](./Quantizer.md#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
| [DoReFa Quantizer](./Quantizer.md#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160)|
## Usage of built-in compression algorithms
We use a simple example to show how to modify your trial code in order to apply the compression algorithms. Let's say you want to prune all weight to 80% sparsity with Level Pruner, you can add the following three lines into your code before training your model ([here](https://github.com/microsoft/nni/tree/master/examples/model_compress) is complete code).
Tensorflow code
```python
from nni.compression.tensorflow import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': 'default' }]
pruner = LevelPruner(config_list)
pruner(tf.get_default_graph())
```
PyTorch code
```python
from nni.compression.torch import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': 'default' }]
pruner = LevelPruner(config_list)
pruner(model)
```
You can use other compression algorithms in the package of `nni.compression`. The algorithms are implemented in both PyTorch and Tensorflow, under `nni.compression.torch` and `nni.compression.tensorflow` respectively. You can refer to [Pruner](./Pruner.md) and [Quantizer](./Quantizer.md) for detail description of supported algorithms.
The function call `pruner(model)` receives user defined model (in Tensorflow the model can be obtained with `tf.get_default_graph()`, while in PyTorch the model is the defined model class), and the model is modified with masks inserted. Then when you run the model, the masks take effect. The masks can be adjusted at runtime by the algorithms.
When instantiate a compression algorithm, there is `config_list` passed in. We describe how to write this config below.
### User configuration for a compression algorithm
When compressing a model, users may want to specify the ratio for sparsity, to specify different ratios for different types of operations, to exclude certain types of operations, or to compress only a certain types of operations. For users to express these kinds of requirements, we define a configuration specification. It can be seen as a python `list` object, where each element is a `dict` object. In each `dict`, there are some keys commonly supported by NNI compression:
* __op_types__: This is to specify what types of operations to be compressed. 'default' means following the algorithm's default setting.
* __op_names__: This is to specify by name what operations to be compressed. If this field is omitted, operations will not be filtered by it.
* __exclude__: Default is False. If this field is True, it means the operations with specified types and names will be excluded from the compression.
There are also other keys in the `dict`, but they are specific for every compression algorithm. For example, some , some.
The `dict`s in the `list` are applied one by one, that is, the configurations in latter `dict` will overwrite the configurations in former ones on the operations that are within the scope of both of them.
A simple example of configuration is shown below:
```python
[
{
'sparsity': 0.8,
'op_types': 'default'
},
{
'sparsity': 0.6,
'op_names': ['op_name1', 'op_name2']
},
{
'exclude': True,
'op_names': ['op_name3']
}
]
```
It means following the algorithm's default setting for compressed operations with sparsity 0.8, but for `op_name1` and `op_name2` use sparsity 0.6, and please do not compress `op_name3`.
### Other APIs
Some compression algorithms use epochs to control the progress of compression (e.g. [AGP](./Pruner.md#agp-pruner)), and some algorithms need to do something after every minibatch. Therefore, we provide another two APIs for users to invoke. One is `update_epoch`, you can use it as follows:
Tensorflow code
```python
pruner.update_epoch(epoch, sess)
```
PyTorch code
```python
pruner.update_epoch(epoch)
```
The other is `step`, it can be called with `pruner.step()` after each minibatch. Note that not all algorithms need these two APIs, for those that do not need them, calling them is allowed but has no effect.
__[TODO]__ The last API is for users to export the compressed model. You will get a compressed model when you finish the training using this API. It also exports another file storing the values of masks.
## Customize new compression algorithms
To simplify writing a new compression algorithm, we design programming interfaces which are simple but flexible enough. There are interfaces for pruner and quantizer respectively.
### Pruning algorithm
If you want to write a new pruning algorithm, you can write a class that inherits `nni.compression.tensorflow.Pruner` or `nni.compression.torch.Pruner` depending on which framework you use. Then, override the member functions with the logic of your algorithm.
```python
# This is writing a pruner in tensorflow.
# For writing a pruner in PyTorch, you can simply replace
# nni.compression.tensorflow.Pruner with
# nni.compression.torch.Pruner
class YourPruner(nni.compression.tensorflow.Pruner):
def __init__(self, config_list):
# suggest you to use the NNI defined spec for config
super().__init__(config_list)
def bind_model(self, model):
# this func can be used to remember the model or its weights
# in member variables, for getting their values during training
pass
def calc_mask(self, weight, config, **kwargs):
# weight is the target weight tensor
# config is the selected dict object in config_list for this layer
# kwargs contains op, op_type, and op_name
# design your mask and return your mask
return your_mask
# note for pytorch version, there is no sess in input arguments
def update_epoch(self, epoch_num, sess):
pass
# note for pytorch version, there is no sess in input arguments
def step(self, sess):
# can do some processing based on the model or weights binded
# in the func bind_model
pass
```
For the simpliest algorithm, you only need to override `calc_mask`. It receives each layer's weight and selected configuration, as well as op information. You generate the mask for this weight in this function and return. Then NNI applies the mask for you.
Some algorithms generate mask based on training progress, i.e., epoch number. We provide `update_epoch` for the pruner to be aware of the training progress.
Some algorithms may want global information for generating masks, for example, all weights of the model (for statistic information), model optimizer's information. NNI supports this requirement using `bind_model`. `bind_model` receives the complete model, thus, it could record any information (e.g., reference to weights) it cares about. Then `step` can process or update the information according to the algorithm. You can refer to [source code of built-in algorithms](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/compressors) for example implementations.
### Quantization algorithm
The interface for customizing quantization algorithm is similar to that of pruning algorithms. The only difference is that `calc_mask` is replaced with `quantize_weight`. `quantize_weight` directly returns the quantized weights rather than mask, because for quantization the quantized weights cannot be obtained by applying mask.
```python
# This is writing a Quantizer in tensorflow.
# For writing a Quantizer in PyTorch, you can simply replace
# nni.compression.tensorflow.Quantizer with
# nni.compression.torch.Quantizer
class YourPruner(nni.compression.tensorflow.Quantizer):
def __init__(self, config_list):
# suggest you to use the NNI defined spec for config
super().__init__(config_list)
def bind_model(self, model):
# this func can be used to remember the model or its weights
# in member variables, for getting their values during training
pass
def quantize_weight(self, weight, config, **kwargs):
# weight is the target weight tensor
# config is the selected dict object in config_list for this layer
# kwargs contains op, op_type, and op_name
# design your quantizer and return new weight
return new_weight
# note for pytorch version, there is no sess in input arguments
def update_epoch(self, epoch_num, sess):
pass
# note for pytorch version, there is no sess in input arguments
def step(self, sess):
# can do some processing based on the model or weights binded
# in the func bind_model
pass
# you can also design your method
def your_method(self, your_input):
#your code
def bind_model(self, model):
#preprocess model
```
__[TODO]__ Will add another member function `quantize_layer_output`, as some quantization algorithms also quantize layers' output.
### Usage of user customized compression algorithm
__[TODO]__ ...
Pruner on NNI Compressor
===
## Level Pruner
This is one basic pruner: you can set a target sparsity level (expressed as a fraction, 0.6 means we will prune 60%).
We first sort the weights in the specified layer by their absolute values. And then mask to zero the smallest magnitude weights until the desired sparsity level is reached.
### Usage
Tensorflow code
```
from nni.compression.tensorflow import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': 'default' }]
pruner = LevelPruner(config_list)
pruner(model_graph)
```
PyTorch code
```
from nni.compression.torch import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': 'default' }]
pruner = LevelPruner(config_list)
pruner(model)
```
#### User configuration for Level Pruner
* **sparsity:** This is to specify the sparsity operations to be compressed to
***
## AGP Pruner
In [To prune, or not to prune: exploring the efficacy of pruning for model compression](https://arxiv.org/abs/1710.01878), authors Michael Zhu and Suyog Gupta provide an algorithm to prune the weight gradually.
>We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value si (usually 0) to a final sparsity value sf over a span of n pruning steps, starting at training step t0 and with pruning frequency ∆t:
![](../../img/agp_pruner.png)
>The binary weight masks are updated every ∆t steps as the network is trained to gradually increase the sparsity of the network while allowing the network training steps to recover from any pruning-induced loss in accuracy. In our experience, varying the pruning frequency ∆t between 100 and 1000 training steps had a negligible impact on the final model quality. Once the model achieves the target sparsity sf , the weight masks are no longer updated. The intuition behind this sparsity function in equation
### Usage
You can prune all weight from 0% to 80% sparsity in 10 epoch with the code below.
First, you should import pruner and add mask to model.
Tensorflow code
```python
from nni.compression.tensorflow import AGP_Pruner
config_list = [{
'initial_sparsity': 0,
'final_sparsity': 0.8,
'start_epoch': 1,
'end_epoch': 10,
'frequency': 1,
'op_types': 'default'
}]
pruner = AGP_Pruner(config_list)
pruner(tf.get_default_graph())
```
PyTorch code
```python
from nni.compression.torch import AGP_Pruner
config_list = [{
'initial_sparsity': 0,
'final_sparsity': 0.8,
'start_epoch': 1,
'end_epoch': 10,
'frequency': 1,
'op_types': 'default'
}]
pruner = AGP_Pruner(config_list)
pruner(model)
```
Second, you should add code below to update epoch number when you finish one epoch in your training code.
Tensorflow code
```python
pruner.update_epoch(epoch, sess)
```
PyTorch code
```python
pruner.update_epoch(epoch)
```
You can view example for more information
#### User configuration for AGP Pruner
* **initial_sparsity:** This is to specify the sparsity when compressor starts to compress
* **final_sparsity:** This is to specify the sparsity when compressor finishes to compress
* **start_epoch:** This is to specify the epoch number when compressor starts to compress
* **end_epoch:** This is to specify the epoch number when compressor finishes to compress
* **frequency:** This is to specify every *frequency* number epochs compressor compress once
***
## Sensitivity Pruner
In [Learning both Weights and Connections for Efficient Neural Networks](https://arxiv.org/abs/1506.02626), author Song Han and provide an algorithm to find the sensitivity of each layer and set the pruning threshold to each layer.
>We used the sensitivity results to find each layer’s threshold: for example, the smallest threshold was applied to the most sensitive layer, which is the first convolutional layer... The pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layer’s weights
### Usage
You can prune weight step by step and reach one target sparsity by Sensitivity Pruner with the code below.
Tensorflow code
```python
from nni.compression.tensorflow import SensitivityPruner
config_list = [{ 'sparsity':0.8, 'op_types': 'default' }]
pruner = SensitivityPruner(config_list)
pruner(tf.get_default_graph())
```
PyTorch code
```python
from nni.compression.torch import SensitivityPruner
config_list = [{ 'sparsity':0.8, 'op_types': 'default' }]
pruner = SensitivityPruner(config_list)
pruner(model)
```
Like AGP Pruner, you should update mask information every epoch by adding code below
Tensorflow code
```python
pruner.update_epoch(epoch, sess)
```
PyTorch code
```python
pruner.update_epoch(epoch)
```
You can view example for more information
#### User configuration for Sensitivity Pruner
* **sparsity:** This is to specify the sparsity operations to be compressed to
***
Quantizer on NNI Compressor
===
## Naive Quantizer
We provide Naive Quantizer to quantizer weight to default 8 bits, you can use it to test quantize algorithm without any configure.
### Usage
tensorflow
```python
nni.compressors.tensorflow.NaiveQuantizer()(model_graph)
```
pytorch
```python
nni.compressors.torch.NaiveQuantizer()(model)
```
***
## QAT Quantizer
In [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf), authors Benoit Jacob and Skirmantas Kligys provide an algorithm to quantize the model with training.
>We propose an approach that simulates quantization effects in the forward pass of training. Backpropagation still happens as usual, and all weights and biases are stored in floating point so that they can be easily nudged by small amounts. The forward propagation pass however simulates quantized inference as it will happen in the inference engine, by implementing in floating-point arithmetic the rounding behavior of the quantization scheme
>* Weights are quantized before they are convolved with the input. If batch normalization (see [17]) is used for the layer, the batch normalization parameters are “folded into” the weights before quantization.
>* Activations are quantized at points where they would be during inference, e.g. after the activation function is applied to a convolutional or fully connected layer’s output, or after a bypass connection adds or concatenates the outputs of several layers together such as in ResNets.
### Usage
You can quantize your model to 8 bits with the code below before your training code.
Tensorflow code
```python
from nni.compressors.tensorflow import QAT_Quantizer
config_list = [{ 'q_bits': 8, 'op_types': 'default' }]
quantizer = QAT_Quantizer(config_list)
quantizer(tf.get_default_graph())
```
PyTorch code
```python
from nni.compressors.torch import QAT_Quantizer
config_list = [{ 'q_bits': 8, 'op_types': 'default' }]
quantizer = QAT_Quantizer(config_list)
quantizer(model)
```
You can view example for more information
#### User configuration for QAT Quantizer
* **q_bits:** This is to specify the q_bits operations to be quantized to
***
## DoReFa Quantizer
In [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients](https://arxiv.org/abs/1606.06160), authors Shuchang Zhou and Yuxin Wu provide an algorithm named DoReFa to quantize the weight, activation and gradients with training.
### Usage
To implement DoReFa Quantizer, you can add code below before your training code
Tensorflow code
```python
from nni.compressors.tensorflow import DoReFaQuantizer
config_list = [{ 'q_bits': 8, 'op_types': 'default' }]
quantizer = DoReFaQuantizer(config_list)
quantizer(tf.get_default_graph())
```
PyTorch code
```python
from nni.compressors.torch import DoReFaQuantizer
config_list = [{ 'q_bits': 8, 'op_types': 'default' }]
quantizer = DoReFaQuantizer(config_list)
quantizer(model)
```
You can view example for more information
#### User configuration for QAT Quantizer
* **q_bits:** This is to specify the q_bits operations to be quantized to
...@@ -20,6 +20,7 @@ Currently we support the following algorithms: ...@@ -20,6 +20,7 @@ Currently we support the following algorithms:
|[__Metis Tuner__](#MetisTuner)|Metis offers the following benefits when it comes to tuning parameters: While most tools only predict the optimal configuration, Metis gives you two outputs: (a) current prediction of optimal configuration, and (b) suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter. [Reference Paper](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/)| |[__Metis Tuner__](#MetisTuner)|Metis offers the following benefits when it comes to tuning parameters: While most tools only predict the optimal configuration, Metis gives you two outputs: (a) current prediction of optimal configuration, and (b) suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter. [Reference Paper](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/)|
|[__BOHB__](#BOHB)|BOHB is a follow-up work of Hyperband. It targets the weakness of Hyperband that new configurations are generated randomly without leveraging finished trials. For the name BOHB, HB means Hyperband, BO means Bayesian Optimization. BOHB leverages finished trials by building multiple TPE models, a proportion of new configurations are generated through these models. [Reference Paper](https://arxiv.org/abs/1807.01774)| |[__BOHB__](#BOHB)|BOHB is a follow-up work of Hyperband. It targets the weakness of Hyperband that new configurations are generated randomly without leveraging finished trials. For the name BOHB, HB means Hyperband, BO means Bayesian Optimization. BOHB leverages finished trials by building multiple TPE models, a proportion of new configurations are generated through these models. [Reference Paper](https://arxiv.org/abs/1807.01774)|
|[__GP Tuner__](#GPTuner)|Gaussian Process Tuner is a sequential model-based optimization (SMBO) approach with Gaussian Process as the surrogate. [Reference Paper](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf), [Github Repo](https://github.com/fmfn/BayesianOptimization)| |[__GP Tuner__](#GPTuner)|Gaussian Process Tuner is a sequential model-based optimization (SMBO) approach with Gaussian Process as the surrogate. [Reference Paper](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf), [Github Repo](https://github.com/fmfn/BayesianOptimization)|
|[__PPO Tuner__](#PPOTuner)|PPO Tuner is an Reinforcement Learning tuner based on PPO algorithm. [Reference Paper](https://arxiv.org/abs/1707.06347)|
## Usage of Built-in Tuners ## Usage of Built-in Tuners
...@@ -38,7 +39,7 @@ Note: Please follow the format when you write your `config.yml` file. Some built ...@@ -38,7 +39,7 @@ Note: Please follow the format when you write your `config.yml` file. Some built
TPE, as a black-box optimization, can be used in various scenarios and shows good performance in general. Especially when you have limited computation resource and can only try a small number of trials. From a large amount of experiments, we could found that TPE is far better than Random Search. [Detailed Description](./HyperoptTuner.md) TPE, as a black-box optimization, can be used in various scenarios and shows good performance in general. Especially when you have limited computation resource and can only try a small number of trials. From a large amount of experiments, we could found that TPE is far better than Random Search. [Detailed Description](./HyperoptTuner.md)
**Requirement of classArg** **Requirement of classArgs**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics. * **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
...@@ -66,7 +67,7 @@ tuner: ...@@ -66,7 +67,7 @@ tuner:
Random search is suggested when each trial does not take too long (e.g., each trial can be completed very soon, or early stopped by assessor quickly), and you have enough computation resource. Or you want to uniformly explore the search space. Random Search could be considered as baseline of search algorithm. [Detailed Description](./HyperoptTuner.md) Random search is suggested when each trial does not take too long (e.g., each trial can be completed very soon, or early stopped by assessor quickly), and you have enough computation resource. Or you want to uniformly explore the search space. Random Search could be considered as baseline of search algorithm. [Detailed Description](./HyperoptTuner.md)
**Requirement of classArg:** **Requirement of classArgs**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics. * **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
...@@ -91,7 +92,7 @@ tuner: ...@@ -91,7 +92,7 @@ tuner:
Anneal is suggested when each trial does not take too long, and you have enough computation resource(almost same with Random Search). Or the variables in search space could be sample from some prior distribution. [Detailed Description](./HyperoptTuner.md) Anneal is suggested when each trial does not take too long, and you have enough computation resource(almost same with Random Search). Or the variables in search space could be sample from some prior distribution. [Detailed Description](./HyperoptTuner.md)
**Requirement of classArg** **Requirement of classArgs**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics. * **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
...@@ -117,7 +118,7 @@ tuner: ...@@ -117,7 +118,7 @@ tuner:
Its requirement of computation resource is relatively high. Specifically, it requires large initial population to avoid falling into local optimum. If your trial is short or leverages assessor, this tuner is a good choice. And, it is more suggested when your trial code supports weight transfer, that is, the trial could inherit the converged weights from its parent(s). This can greatly speed up the training progress. [Detailed Description](./EvolutionTuner.md) Its requirement of computation resource is relatively high. Specifically, it requires large initial population to avoid falling into local optimum. If your trial is short or leverages assessor, this tuner is a good choice. And, it is more suggested when your trial code supports weight transfer, that is, the trial could inherit the converged weights from its parent(s). This can greatly speed up the training progress. [Detailed Description](./EvolutionTuner.md)
**Requirement of classArg** **Requirement of classArgs**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics. * **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
...@@ -156,7 +157,7 @@ nnictl package install --name=SMAC ...@@ -156,7 +157,7 @@ nnictl package install --name=SMAC
Similar to TPE, SMAC is also a black-box tuner which can be tried in various scenarios, and is suggested when computation resource is limited. It is optimized for discrete hyperparameters, thus, suggested when most of your hyperparameters are discrete. [Detailed Description](./SmacTuner.md) Similar to TPE, SMAC is also a black-box tuner which can be tried in various scenarios, and is suggested when computation resource is limited. It is optimized for discrete hyperparameters, thus, suggested when most of your hyperparameters are discrete. [Detailed Description](./SmacTuner.md)
**Requirement of classArg** **Requirement of classArgs**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics. * **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
...@@ -243,7 +244,7 @@ tuner: ...@@ -243,7 +244,7 @@ tuner:
It is suggested when you have limited computation resource but have relatively large search space. It performs well in the scenario that intermediate result (e.g., accuracy) can reflect good or bad of final result (e.g., accuracy) to some extent. [Detailed Description](./HyperbandAdvisor.md) It is suggested when you have limited computation resource but have relatively large search space. It performs well in the scenario that intermediate result (e.g., accuracy) can reflect good or bad of final result (e.g., accuracy) to some extent. [Detailed Description](./HyperbandAdvisor.md)
**Requirement of classArg** **Requirement of classArgs**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics. * **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **R** (*int, optional, default = 60*) - the maximum budget given to a trial (could be the number of mini-batches or epochs) can be allocated to a trial. Each trial should use TRIAL_BUDGET to control how long it runs. * **R** (*int, optional, default = 60*) - the maximum budget given to a trial (could be the number of mini-batches or epochs) can be allocated to a trial. Each trial should use TRIAL_BUDGET to control how long it runs.
...@@ -277,7 +278,7 @@ NetworkMorphism requires [PyTorch](https://pytorch.org/get-started/locally) and ...@@ -277,7 +278,7 @@ NetworkMorphism requires [PyTorch](https://pytorch.org/get-started/locally) and
It is suggested that you want to apply deep learning methods to your task (your own dataset) but you have no idea of how to choose or design a network. You modify the [example](https://github.com/Microsoft/nni/tree/master/examples/trials/network_morphism/cifar10/cifar10_keras.py) to fit your own dataset and your own data augmentation method. Also you can change the batch size, learning rate or optimizer. It is feasible for different tasks to find a good network architecture. Now this tuner only supports the computer vision domain. [Detailed Description](./NetworkmorphismTuner.md) It is suggested that you want to apply deep learning methods to your task (your own dataset) but you have no idea of how to choose or design a network. You modify the [example](https://github.com/Microsoft/nni/tree/master/examples/trials/network_morphism/cifar10/cifar10_keras.py) to fit your own dataset and your own data augmentation method. Also you can change the batch size, learning rate or optimizer. It is feasible for different tasks to find a good network architecture. Now this tuner only supports the computer vision domain. [Detailed Description](./NetworkmorphismTuner.md)
**Requirement of classArg** **Requirement of classArgs**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics. * **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **task** (*('cv'), optional, default = 'cv'*) - The domain of experiment, for now, this tuner only supports the computer vision(cv) domain. * **task** (*('cv'), optional, default = 'cv'*) - The domain of experiment, for now, this tuner only supports the computer vision(cv) domain.
...@@ -307,13 +308,12 @@ tuner: ...@@ -307,13 +308,12 @@ tuner:
> Built-in Tuner Name: **MetisTuner** > Built-in Tuner Name: **MetisTuner**
Note that the only acceptable types of search space are `choice`, `quniform`, `uniform` and `randint`. Note that the only acceptable types of search space are `quniform`, `uniform` and `randint` and numerical `choice`. Only numerical values are supported since the values will be used to evaluate the 'distance' between different points.
**Suggested scenario** **Suggested scenario**
Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than other approaches such as random search. Furthermore, Metis provides guidance on the subsequent trial. Here is an [example](https://github.com/Microsoft/nni/tree/master/examples/trials/auto-gbdt/search_space_metis.json) about the use of Metis. User only need to send the final result like `accuracy` to tuner, by calling the NNI SDK. [Detailed Description](./MetisTuner.md) Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than other approaches such as random search. Furthermore, Metis provides guidance on the subsequent trial. Here is an [example](https://github.com/Microsoft/nni/tree/master/examples/trials/auto-gbdt/search_space_metis.json) about the use of Metis. User only need to send the final result like `accuracy` to tuner, by calling the NNI SDK. [Detailed Description](./MetisTuner.md)
**Requirement of classArg** **Requirement of classArgs**
* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics. * **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
...@@ -347,7 +347,7 @@ nnictl package install --name=BOHB ...@@ -347,7 +347,7 @@ nnictl package install --name=BOHB
Similar to Hyperband, it is suggested when you have limited computation resource but have relatively large search space. It performs well in the scenario that intermediate result (e.g., accuracy) can reflect good or bad of final result (e.g., accuracy) to some extent. In this case, it may converges to a better configuration due to Bayesian optimization usage. [Detailed Description](./BohbAdvisor.md) Similar to Hyperband, it is suggested when you have limited computation resource but have relatively large search space. It performs well in the scenario that intermediate result (e.g., accuracy) can reflect good or bad of final result (e.g., accuracy) to some extent. In this case, it may converges to a better configuration due to Bayesian optimization usage. [Detailed Description](./BohbAdvisor.md)
**Requirement of classArg** **Requirement of classArgs**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will target to maximize metrics. If 'minimize', tuner will target to minimize metrics. * **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will target to maximize metrics. If 'minimize', tuner will target to minimize metrics.
* **min_budget** (*int, optional, default = 1*) - The smallest budget assign to a trial job, (budget could be the number of mini-batches or epochs). Needs to be positive. * **min_budget** (*int, optional, default = 1*) - The smallest budget assign to a trial job, (budget could be the number of mini-batches or epochs). Needs to be positive.
...@@ -380,13 +380,13 @@ advisor: ...@@ -380,13 +380,13 @@ advisor:
> Built-in Tuner Name: **GPTuner** > Built-in Tuner Name: **GPTuner**
Note that the only acceptable types of search space are `choice`, `randint`, `uniform`, `quniform`, `loguniform`, `qloguniform`. Note that the only acceptable types of search space are `randint`, `uniform`, `quniform`, `loguniform`, `qloguniform`, and numerical `choice`. Only numerical values are supported since the values will be used to evaluate the 'distance' between different points.
**Suggested scenario** **Suggested scenario**
As a strategy in Sequential Model-based Global Optimization(SMBO) algorithm, GP Tuner uses a proxy optimization problem (finding the maximum of the acquisition function) that, albeit still a hard problem, is cheaper (in the computational sense) and common tools can be employed. Therefore GP Tuner is most adequate for situations where the function to be optimized is a very expensive endeavor. GP can be used when the computation resource is limited. While GP Tuner has a computational cost that grows at *O(N^3)* due to the requirement of inverting the Gram matrix, so it's not suitable when lots of trials are needed. [Detailed Description](./GPTuner.md) As a strategy in Sequential Model-based Global Optimization(SMBO) algorithm, GP Tuner uses a proxy optimization problem (finding the maximum of the acquisition function) that, albeit still a hard problem, is cheaper (in the computational sense) and common tools can be employed. Therefore GP Tuner is most adequate for situations where the function to be optimized is a very expensive endeavor. GP can be used when the computation resource is limited. While GP Tuner has a computational cost that grows at *O(N^3)* due to the requirement of inverting the Gram matrix, so it's not suitable when lots of trials are needed. [Detailed Description](./GPTuner.md)
**Requirement of classArg** **Requirement of classArgs**
* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics. * **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **utility** (*'ei', 'ucb' or 'poi', optional, default = 'ei'*) - The kind of utility function(acquisition function). 'ei', 'ucb' and 'poi' corresponds to 'Expected Improvement', 'Upper Confidence Bound' and 'Probability of Improvement' respectively. * **utility** (*'ei', 'ucb' or 'poi', optional, default = 'ei'*) - The kind of utility function(acquisition function). 'ei', 'ucb' and 'poi' corresponds to 'Expected Improvement', 'Upper Confidence Bound' and 'Probability of Improvement' respectively.
...@@ -415,3 +415,39 @@ tuner: ...@@ -415,3 +415,39 @@ tuner:
selection_num_warm_up: 100000 selection_num_warm_up: 100000
selection_num_starting_points: 250 selection_num_starting_points: 250
``` ```
<a name="PPOTuner"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `PPO Tuner`
> Built-in Tuner Name: **PPOTuner**
Note that the only acceptable type of search space is `mutable_layer`. `optional_input_size` can only be 0, 1, or [0, 1].
**Suggested scenario**
PPOTuner is a Reinforcement Learning tuner based on PPO algorithm. When you are using NNI NAS interface in your trial code to do neural architecture search, PPOTuner is recommended. It has relatively high data efficiency but is suggested when you have large amount of computation resource. You could try it on very simple task, such as the [mnist-nas](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas) example. [Detailed Description](./PPOTuner.md)
**Requirement of classArgs**
* **optimize_mode** (*'maximize' or 'minimize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **trials_per_update** (*int, optional, default = 20*) - The number of trials to be used for one update. This number is recommended to be larger than `trialConcurrency` and `trialConcurrency` be a aliquot devisor of `trials_per_update`. Note that trials_per_update should be divisible by minibatch_size.
* **epochs_per_update** (*int, optional, default = 4*) - The number of epochs for one update.
* **minibatch_size** (*int, optional, default = 4*) - Mini-batch size (i.e., number of trials for a mini-batch) for the update. Note that, trials_per_update should be divisible by minibatch_size.
* **ent_coef** (*float, optional, default = 0.0*) - Policy entropy coefficient in the optimization objective.
* **lr** (*float, optional, default = 3e-4*) - Learning rate of the model (lstm network), constant.
* **vf_coef** (*float, optional, default = 0.5*) - Value function loss coefficient in the optimization objective.
* **max_grad_norm** (*float, optional, default = 0.5*) - Gradient norm clipping coefficient.
* **gamma** (*float, optional, default = 0.99*) - Discounting factor.
* **lam** (*float, optional, default = 0.95*) - Advantage estimation discounting factor (lambda in the paper).
* **cliprange** (*float, optional, default = 0.2*) - Cliprange in the PPO algorithm, constant.
**Usage example**
```yaml
# config.yml
tuner:
builtinTunerName: PPOTuner
classArgs:
optimize_mode: maximize
```
\ No newline at end of file
# **How To** - Customize Your Own Advisor # **How To** - Customize Your Own Advisor
*Advisor targets the scenario that the automl algorithm wants the methods of both tuner and assessor. Advisor is similar to tuner on that it receives trial parameters request, final results, and generate trial parameters. Also, it is similar to assessor on that it receives intermediate results, trial's end state, and could send trial kill command. Note that, if you use Advisor, tuner and assessor are not allowed to be used at the same time.* *Warning: API is subject to change in future releases.*
So, if user want to implement a customized Advisor, she/he only need to: Advisor targets the scenario that the automl algorithm wants the methods of both tuner and assessor. Advisor is similar to tuner on that it receives trial parameters request, final results, and generate trial parameters. Also, it is similar to assessor on that it receives intermediate results, trial's end state, and could send trial kill command. Note that, if you use Advisor, tuner and assessor are not allowed to be used at the same time.
1. Define an Advisor inheriting from the MsgDispatcherBase class If a user want to implement a customized Advisor, she/he only needs to:
1. Implement the methods with prefix `handle_` except `handle_request`
1. Configure your customized Advisor in experiment YAML config file
Here is an example: **1. Define an Advisor inheriting from the MsgDispatcherBase class.** For example:
**1) Define an Advisor inheriting from the MsgDispatcherBase class**
```python ```python
from nni.msg_dispatcher_base import MsgDispatcherBase from nni.msg_dispatcher_base import MsgDispatcherBase
...@@ -20,13 +16,11 @@ class CustomizedAdvisor(MsgDispatcherBase): ...@@ -20,13 +16,11 @@ class CustomizedAdvisor(MsgDispatcherBase):
... ...
``` ```
**2) Implement the methods with prefix `handle_` except `handle_request`** **2. Implement the methods with prefix `handle_` except `handle_request`**.. You might find [docs](https://nni.readthedocs.io/en/latest/sdk_reference.html#nni.msg_dispatcher_base.MsgDispatcherBase) for `MsgDispatcherBase` helpful.
Please refer to the implementation of Hyperband ([src/sdk/pynni/nni/hyperband_advisor/hyperband_advisor.py](https://github.com/Microsoft/nni/tree/master/src/sdk/pynni/nni/hyperband_advisor/hyperband_advisor.py)) for how to implement the methods.
**3) Configure your customized Advisor in experiment YAML config file** **3. Configure your customized Advisor in experiment YAML config file.**
Similar to tuner and assessor. NNI needs to locate your customized Advisor class and instantiate the class, so you need to specify the location of the customized Advisor class and pass literal values as parameters to the \_\_init__ constructor. Similar to tuner and assessor. NNI needs to locate your customized Advisor class and instantiate the class, so you need to specify the location of the customized Advisor class and pass literal values as parameters to the `__init__` constructor.
```yaml ```yaml
advisor: advisor:
...@@ -38,3 +32,7 @@ advisor: ...@@ -38,3 +32,7 @@ advisor:
classArgs: classArgs:
arg1: value1 arg1: value1
``` ```
## Example
Here we provide an [example](../../../examples/tuners/mnist_keras_customized_advisor).
...@@ -7,4 +7,6 @@ Bayesian optimization works by constructing a posterior distribution of function ...@@ -7,4 +7,6 @@ Bayesian optimization works by constructing a posterior distribution of function
GP Tuner is designed to minimize/maximize the number of steps required to find a combination of parameters that are close to the optimal combination. To do so, this method uses a proxy optimization problem (finding the maximum of the acquisition function) that, albeit still a hard problem, is cheaper (in the computational sense) and common tools can be employed. Therefore Bayesian Optimization is most adequate for situations where sampling the function to be optimized is a very expensive endeavor. GP Tuner is designed to minimize/maximize the number of steps required to find a combination of parameters that are close to the optimal combination. To do so, this method uses a proxy optimization problem (finding the maximum of the acquisition function) that, albeit still a hard problem, is cheaper (in the computational sense) and common tools can be employed. Therefore Bayesian Optimization is most adequate for situations where sampling the function to be optimized is a very expensive endeavor.
Note that the only acceptable types of search space are `randint`, `uniform`, `quniform`, `loguniform`, `qloguniform`, and numerical `choice`.
This optimization approach is described in Section 3 of [Algorithms for Hyper-Parameter Optimization](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf). This optimization approach is described in Section 3 of [Algorithms for Hyper-Parameter Optimization](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf).
...@@ -15,6 +15,6 @@ It finds the global optimal point in the Gaussian Process space. This point repr ...@@ -15,6 +15,6 @@ It finds the global optimal point in the Gaussian Process space. This point repr
It identifies the next hyper-parameter candidate. This is achieved by inferring the potential information gain of exploration, exploitation, and re-sampling. It identifies the next hyper-parameter candidate. This is achieved by inferring the potential information gain of exploration, exploitation, and re-sampling.
Note that the only acceptable types of search space are `choice`, `quniform`, `uniform` and `randint`. Note that the only acceptable types of search space are `quniform`, `uniform` and `randint` and numerical `choice`.
More details can be found in our paper: https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/ More details can be found in our paper: https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/
\ No newline at end of file
PPO Tuner on NNI
===
## PPOTuner
This is a tuner generally for NNI's NAS interface, it uses [ppo algorithm](https://arxiv.org/abs/1707.06347). The implementation inherits the main logic of the implementation [here](https://github.com/openai/baselines/tree/master/baselines/ppo2) (i.e., ppo2 from OpenAI), and is adapted for NAS scenario.
It could successfully tune the [mnist-nas example](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas), and has the following result:
![](../../img/ppo_mnist.png)
We also tune [the macro search space for image classification in the enas paper](https://github.com/microsoft/nni/tree/master/examples/trials/nas_cifar10) (with limited epoch number for each trial, i.e., 8 epochs), which is implemented using the NAS interface and tuned with PPOTuner. Use Figure 7 in the [enas paper](https://arxiv.org/pdf/1802.03268.pdf) to show how the search space looks like
![](../../img/enas_search_space.png)
The figure above is a chosen architecture, we use it to show how the search space looks like. Each square is a layer whose operation can be chosen from 6 operations. Each dash line is a skip connection, each square layer could choose 0 or 1 skip connection getting the output of a previous layer. __Note that__ in original macro search space each square layer could choose any number of skip connections, while in our implementation it is only allowed to choose 0 or 1.
The result is shown in figure below (with the experiment config [here](https://github.com/microsoft/nni/blob/master/examples/trials/nas_cifar10/config_ppo.yml)):
![](../../img/ppo_cifar10.png)
...@@ -35,7 +35,7 @@ tuner: ...@@ -35,7 +35,7 @@ tuner:
classArgs: classArgs:
#choice: maximize, minimize #choice: maximize, minimize
optimize_mode: optimize_mode:
gpuNum: gpuIndices:
trial: trial:
command: command:
codeDir: codeDir:
...@@ -71,14 +71,13 @@ tuner: ...@@ -71,14 +71,13 @@ tuner:
classArgs: classArgs:
#choice: maximize, minimize #choice: maximize, minimize
optimize_mode: optimize_mode:
gpuNum: gpuIndices:
assessor: assessor:
#choice: Medianstop #choice: Medianstop
builtinAssessorName: builtinAssessorName:
classArgs: classArgs:
#choice: maximize, minimize #choice: maximize, minimize
optimize_mode: optimize_mode:
gpuNum:
trial: trial:
command: command:
codeDir: codeDir:
...@@ -113,14 +112,13 @@ tuner: ...@@ -113,14 +112,13 @@ tuner:
classArgs: classArgs:
#choice: maximize, minimize #choice: maximize, minimize
optimize_mode: optimize_mode:
gpuNum: gpuIndices:
assessor: assessor:
#choice: Medianstop #choice: Medianstop
builtinAssessorName: builtinAssessorName:
classArgs: classArgs:
#choice: maximize, minimize #choice: maximize, minimize
optimize_mode: optimize_mode:
gpuNum:
trial: trial:
command: command:
codeDir: codeDir:
...@@ -245,11 +243,11 @@ machineList: ...@@ -245,11 +243,11 @@ machineList:
* __builtinTunerName__ and __classArgs__ * __builtinTunerName__ and __classArgs__
* __builtinTunerName__ * __builtinTunerName__
__builtinTunerName__ specifies the name of system tuner, NNI sdk provides four kinds of tuner, including {__TPE__, __Random__, __Anneal__, __Evolution__, __BatchTuner__, __GridSearch__} __builtinTunerName__ specifies the name of system tuner, NNI sdk provides different tuners introduced [here](../Tuner/BuiltinTuner.md).
* __classArgs__ * __classArgs__
__classArgs__ specifies the arguments of tuner algorithm. If the __builtinTunerName__ is in {__TPE__, __Random__, __Anneal__, __Evolution__}, user should set __optimize_mode__. __classArgs__ specifies the arguments of tuner algorithm. Please refer to [this file](../Tuner/BuiltinTuner.md) for the configurable arguments of each built-in tuner.
* __codeDir__, __classFileName__, __className__ and __classArgs__ * __codeDir__, __classFileName__, __className__ and __classArgs__
* __codeDir__ * __codeDir__
...@@ -264,16 +262,16 @@ machineList: ...@@ -264,16 +262,16 @@ machineList:
__classArgs__ specifies the arguments of tuner algorithm. __classArgs__ specifies the arguments of tuner algorithm.
* __gpuNum__ * __gpuIndices__
__gpuNum__ specifies the gpu number to run the tuner process. The value of this field should be a positive number. If the field is not set, NNI will not set `CUDA_VISIBLE_DEVICES` in script (that is, will not control the visibility of GPUs on trial command through `CUDA_VISIBLE_DEVICES`), and will not manage gpu resource.
Note: users could only specify one way to set tuner, for example, set {tunerName, optimizationMode} or {tunerCommand, tunerCwd}, and could not set them both. __gpuIndices__ specifies the gpus that can be used by the tuner process. Single or multiple GPU indices can be specified, multiple GPU indices are seperated by comma(,), such as `1` or `0,1,3`. If the field is not set, `CUDA_VISIBLE_DEVICES` will be '' in script, that is, no GPU is visible to tuner.
* __includeIntermediateResults__ * __includeIntermediateResults__
If __includeIntermediateResults__ is true, the last intermediate result of the trial that is early stopped by assessor is sent to tuner as final result. The default value of __includeIntermediateResults__ is false. If __includeIntermediateResults__ is true, the last intermediate result of the trial that is early stopped by assessor is sent to tuner as final result. The default value of __includeIntermediateResults__ is false.
Note: users could only use one way to specify tuner, either specifying `builtinTunerName` and `classArgs`, or specifying `codeDir`, `classFileName`, `className` and `classArgs`.
* __assessor__ * __assessor__
* Description * Description
...@@ -282,7 +280,7 @@ machineList: ...@@ -282,7 +280,7 @@ machineList:
* __builtinAssessorName__ and __classArgs__ * __builtinAssessorName__ and __classArgs__
* __builtinAssessorName__ * __builtinAssessorName__
__builtinAssessorName__ specifies the name of system assessor, NNI sdk provides one kind of assessor {__Medianstop__} __builtinAssessorName__ specifies the name of built-in assessor, NNI sdk provides different assessors introducted [here](../Assessor/BuiltinAssessor.md).
* __classArgs__ * __classArgs__
__classArgs__ specifies the arguments of assessor algorithm __classArgs__ specifies the arguments of assessor algorithm
...@@ -305,11 +303,39 @@ machineList: ...@@ -305,11 +303,39 @@ machineList:
__classArgs__ specifies the arguments of assessor algorithm. __classArgs__ specifies the arguments of assessor algorithm.
* __gpuNum__ Note: users could only use one way to specify assessor, either specifying `builtinAssessorName` and `classArgs`, or specifying `codeDir`, `classFileName`, `className` and `classArgs`. If users do not want to use assessor, assessor fileld should leave to empty.
* __advisor__
* Description
__gpuNum__ specifies the gpu number to run the assessor process. The value of this field should be a positive number. __advisor__ specifies the advisor algorithm in the experiment, there are two kinds of ways to specify advisor. One way is to use advisor provided by NNI sdk, need to set __builtinAdvisorName__ and __classArgs__. Another way is to use users' own advisor file, and need to set __codeDirectory__, __classFileName__, __className__ and __classArgs__.
* __builtinAdvisorName__ and __classArgs__
* __builtinAdvisorName__
Note: users' could only specify one way to set assessor, for example,set {assessorName, optimizationMode} or {assessorCommand, assessorCwd}, and users could not set them both.If users do not want to use assessor, assessor fileld should leave to empty. __builtinAdvisorName__ specifies the name of a built-in advisor, NNI sdk provides [different advisors](../Tuner/BuiltinTuner.md).
* __classArgs__
__classArgs__ specifies the arguments of the advisor algorithm. Please refer to [this file](../Tuner/BuiltinTuner.md) for the configurable arguments of each built-in advisor.
* __codeDir__, __classFileName__, __className__ and __classArgs__
* __codeDir__
__codeDir__ specifies the directory of advisor code.
* __classFileName__
__classFileName__ specifies the name of advisor file.
* __className__
__className__ specifies the name of advisor class.
* __classArgs__
__classArgs__ specifies the arguments of advisor algorithm.
* __gpuIndices__
__gpuIndices__ specifies the gpus that can be used by the tuner process. Single or multiple GPU indices can be specified, multiple GPU indices are seperated by comma(,), such as `1` or `0,1,3`. If the field is not set, `CUDA_VISIBLE_DEVICES` will be '' in script, that is, no GPU is visible to tuner.
Note: users could only use one way to specify advisor, either specifying `builtinAdvisorName` and `classArgs`, or specifying `codeDir`, `classFileName`, `className` and `classArgs`.
* __trial(local, remote)__ * __trial(local, remote)__
...@@ -560,7 +586,6 @@ machineList: ...@@ -560,7 +586,6 @@ machineList:
classArgs: classArgs:
#choice: maximize, minimize #choice: maximize, minimize
optimize_mode: maximize optimize_mode: maximize
gpuNum: 0
trial: trial:
command: python3 mnist.py command: python3 mnist.py
codeDir: /nni/mnist codeDir: /nni/mnist
...@@ -586,14 +611,12 @@ machineList: ...@@ -586,14 +611,12 @@ machineList:
classArgs: classArgs:
#choice: maximize, minimize #choice: maximize, minimize
optimize_mode: maximize optimize_mode: maximize
gpuNum: 0
assessor: assessor:
#choice: Medianstop #choice: Medianstop
builtinAssessorName: Medianstop builtinAssessorName: Medianstop
classArgs: classArgs:
#choice: maximize, minimize #choice: maximize, minimize
optimize_mode: maximize optimize_mode: maximize
gpuNum: 0
trial: trial:
command: python3 mnist.py command: python3 mnist.py
codeDir: /nni/mnist codeDir: /nni/mnist
...@@ -620,7 +643,6 @@ machineList: ...@@ -620,7 +643,6 @@ machineList:
classArgs: classArgs:
#choice: maximize, minimize #choice: maximize, minimize
optimize_mode: maximize optimize_mode: maximize
gpuNum: 0
assessor: assessor:
codeDir: /nni/assessor codeDir: /nni/assessor
classFileName: myassessor.py classFileName: myassessor.py
...@@ -628,7 +650,6 @@ machineList: ...@@ -628,7 +650,6 @@ machineList:
classArgs: classArgs:
#choice: maximize, minimize #choice: maximize, minimize
optimize_mode: maximize optimize_mode: maximize
gpuNum: 0
trial: trial:
command: python3 mnist.py command: python3 mnist.py
codeDir: /nni/mnist codeDir: /nni/mnist
...@@ -656,7 +677,6 @@ machineList: ...@@ -656,7 +677,6 @@ machineList:
classArgs: classArgs:
#choice: maximize, minimize #choice: maximize, minimize
optimize_mode: maximize optimize_mode: maximize
gpuNum: 0
trial: trial:
command: python3 mnist.py command: python3 mnist.py
codeDir: /nni/mnist codeDir: /nni/mnist
...@@ -780,7 +800,6 @@ machineList: ...@@ -780,7 +800,6 @@ machineList:
builtinAssessorName: Medianstop builtinAssessorName: Medianstop
classArgs: classArgs:
optimize_mode: maximize optimize_mode: maximize
gpuNum: 0
trial: trial:
codeDir: . codeDir: .
worker: worker:
......
...@@ -10,6 +10,7 @@ nnictl support commands: ...@@ -10,6 +10,7 @@ nnictl support commands:
* [nnictl create](#create) * [nnictl create](#create)
* [nnictl resume](#resume) * [nnictl resume](#resume)
* [nnictl view](#view)
* [nnictl stop](#stop) * [nnictl stop](#stop)
* [nnictl update](#update) * [nnictl update](#update)
* [nnictl trial](#trial) * [nnictl trial](#trial)
...@@ -104,6 +105,35 @@ Debug mode will disable version check function in Trialkeeper. ...@@ -104,6 +105,35 @@ Debug mode will disable version check function in Trialkeeper.
nnictl resume [experiment_id] --port 8088 nnictl resume [experiment_id] --port 8088
``` ```
<a name="view"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `nnictl view`
* Description
You can use this command to view a stopped experiment.
* Usage
```bash
nnictl view [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| True| |The id of the experiment you want to view|
|--port, -p| False| |Rest port of the experiment you want to view|
* Example
> view an experiment with specified port 8088
```bash
nnictl view [experiment_id] --port 8088
```
<a name="stop"></a> <a name="stop"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `nnictl stop` ![](https://placehold.it/15/1589F0/000000?text=+) `nnictl stop`
......
...@@ -94,7 +94,7 @@ All types of sampling strategies and their parameter are listed here: ...@@ -94,7 +94,7 @@ All types of sampling strategies and their parameter are listed here:
Known Limitations: Known Limitations:
* Note that Metis Tuner only supports numerical `choice` now * GP Tuner and Metis Tuner support only **numerical values** in search space(`choice` type values can be no-numeraical with other tuners, e.g. string values). Both GP Tuner and Metis Tuner use Gaussian Process Regressor(GPR). GPR make predictions based on a kernel function and the 'distance' between different points, it's hard to get the true distance between no-numerical values.
* Note that for nested search space: * Note that for nested search space:
......
...@@ -50,6 +50,9 @@ Assessor ...@@ -50,6 +50,9 @@ Assessor
Advisor Advisor
------------------------ ------------------------
.. autoclass:: nni.msg_dispatcher_base.MsgDispatcherBase
:members:
.. autoclass:: nni.hyperband_advisor.hyperband_advisor.Hyperband .. autoclass:: nni.hyperband_advisor.hyperband_advisor.Hyperband
:members: :members:
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment