It is pretty simple to use multi-phase in trial code; an example is shown below:
```python
# ...
for i in range(5):
    # get parameter from tuner; returns None if the tuner cannot
    # generate any more hyperparameters
    tuner_param = nni.get_next_parameter()

    # consume the parameter: train and evaluate, producing `result`
    # ...

    # report final result somewhere for the parameter retrieved above
    nni.report_final_result(result)
    # ...
# ...
```
In multi-phase experiments, each time the API `nni.get_next_parameter()` is called, it returns a new hyperparameter generated by the tuner; the trial code then consumes this hyperparameter and reports its final result. `nni.get_next_parameter()` and `nni.report_final_result()` should be called sequentially: __call the former, then call the latter, and repeat this pattern__. If `nni.get_next_parameter()` is called multiple times consecutively and `nni.report_final_result()` is then called once, the result is associated only with the configuration retrieved by the last `get_next_parameter` call. No results are associated with the earlier `get_next_parameter` calls, which may break some multi-phase algorithms.
Note that `nni.get_next_parameter` returns None if the tuner cannot generate any more hyperparameters.
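As a sketch of this call pattern together with the None check (`train_and_evaluate` is a hypothetical user function standing in for your training code):
```python
import nni

while True:
    params = nni.get_next_parameter()
    if params is None:
        # the tuner cannot generate more hyperparameters; stop the trial
        break
    result = train_and_evaluate(params)  # hypothetical user function
    nni.report_final_result(result)
```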
__2. Experiment configuration__
...
To enable multi-phase, you should also add `multiPhase: true` in your experiment configuration file.
Multi-phase experiment configuration example:
```yaml
authorName: default
experimentName: multiphase experiment
trialConcurrency: 2
...
multiPhase: true
...
trial:
  ...
```
### Write a tuner that leverages multi-phase:
Before writing a multi-phase tuner, we highly suggest you go through [Customize Tuner](https://nni.readthedocs.io/en/latest/Tuner/CustomizeTuner.html). Same as writing a normal tuner, your tuner needs to inherit from the `Tuner` class. When you enable multi-phase through configuration (set `multiPhase` to true), your tuner will get an additional parameter `trial_job_id` via the following tuner methods:
```text
generate_parameters
generate_multiple_parameters
receive_trial_result
receive_customized_trial_result
trial_end
```
With this information, the tuner knows which trial is requesting a configuration and which trial is reporting results. This provides enough flexibility for your tuner to deal with different trials and different phases. For example, you may want to use the `trial_job_id` parameter of `generate_parameters` to generate hyperparameters for a specific trial job, as in the sketch below.
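Here is a minimal sketch of such a tuner; only the multi-phase-relevant methods are shown, and the per-phase learning-rate schedule is made up for illustration:
```python
from nni.tuner import Tuner

class MyMultiPhaseTuner(Tuner):
    def __init__(self):
        self.history = {}  # trial_job_id -> list of (parameters, result)

    def update_search_space(self, search_space):
        self.search_space = search_space

    def generate_parameters(self, parameter_id, trial_job_id=None, **kwargs):
        # trial_job_id identifies which trial job is asking, so each job can
        # receive parameters conditioned on its own earlier phases
        phase = len(self.history.get(trial_job_id, []))
        return {'learning_rate': 0.1 / (phase + 1)}  # hypothetical schedule

    def receive_trial_result(self, parameter_id, parameters, value, trial_job_id=None, **kwargs):
        # associate the reported result with the job that produced it
        self.history.setdefault(trial_job_id, []).append((parameters, value))
```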
We are glad to announce the alpha release of the model compression toolkit on top of NNI. It is still in the experimental phase and may be updated based on user feedback. We sincerely invite you to use it, give feedback, and even contribute.
NNI provides an easy-to-use toolkit to help users design and use compression algorithms. It supports TensorFlow and PyTorch with a unified interface. To compress their models, users only need to add several lines to their code. There are some popular model compression algorithms built into NNI. Users could further use NNI's auto tuning power to find the best compressed model, which is detailed in [Auto Model Compression](./AutoCompression.md). On the other hand, users could easily customize their own new compression algorithms using NNI's interface; refer to the tutorial [here](#customize-new-compression-algorithms).
## Supported algorithms
We have provided two naive compression algorithms and three popular ones for users, including two pruning algorithms and three quantization algorithms:
|Name|Brief Introduction of Algorithm|
|---|---|
...
We use a simple example to show how to modify your trial code in order to apply the compression algorithms. Let's say you want to prune all weights to 80% sparsity with Level Pruner; you can add the following three lines into your code before training your model ([here](https://github.com/microsoft/nni/tree/master/examples/model_compress) is the complete code).
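The snippet itself is elided in this excerpt; a PyTorch-flavored sketch consistent with the configuration format described below might look like this (the `nni.compression.torch` import path and the `pruner(model)` call convention are assumptions):
```python
from nni.compression.torch import LevelPruner  # import path assumed

# prune weights of all default op types to 80% sparsity
config_list = [{'sparsity': 0.8, 'op_types': 'default'}]
pruner = LevelPruner(config_list)
pruner(model)  # instrument the model so masks are applied during training
```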
There are also other keys in the `dict`, but they are specific to each compression algorithm.
The `dict`s in the `list` are applied one by one; that is, the configurations in a latter `dict` will overwrite the configurations in former ones for the operations that are within the scope of both.
A simple example of configuration is shown below:
```python
[
    {
        'sparsity': 0.8,
        'op_types': 'default'
    },
    {
        'sparsity': 0.6,
        'op_names': ['op_name1', 'op_name2']
    },
    {
        'exclude': True,
        'op_names': ['op_name3']
    }
]
```
This means: follow the algorithm's default setting for compressed operations with sparsity 0.8, but use sparsity 0.6 for `op_name1` and `op_name2`, and do not compress `op_name3`.
### Other APIs
...
Some compression algorithms use epochs to control the progress of compression (e.g. [AGP](./Pruner.md#agp-pruner)), and some algorithms need to do something after every minibatch. Therefore, we provide another two APIs for users to invoke. One is `update_epoch`, which you can use as follows:
TensorFlow code
```python
pruner.update_epoch(epoch, sess)
```
PyTorch code
```python
pruner.update_epoch(epoch)
```
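For instance (a sketch; `train_one_epoch` is a hypothetical user function), the PyTorch variant would be invoked once per epoch in the training loop:
```python
num_epochs = 10  # illustrative

for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)  # hypothetical user training code
    pruner.update_epoch(epoch)         # inform the pruner of training progress
```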
...
```python
class YourPruner(nni.compression.tensorflow.Pruner):
    # ... (constructor and other overridable methods elided)

    def calc_mask(self, weight, config, **kwargs):
        # generate and return the mask for this layer's weight, based on the
        # selected config and the op information passed in kwargs
        pass
```
For the simplest algorithm, you only need to override `calc_mask`. It receives each layer's weight and selected configuration, as well as op information. You generate the mask for this weight in this function and return it. Then NNI applies the mask for you.
Some algorithms generate masks based on training progress, i.e., epoch number. We provide `update_epoch` for the pruner to be aware of the training progress.
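As a sketch of how this fits together (the linear ramp schedule is made up, and the `nni.compression.torch` import path mirrors the TensorFlow one above as an assumption):
```python
import torch
from nni.compression.torch import Pruner  # import path assumed

class RampPruner(Pruner):
    def __init__(self, config_list):
        super().__init__(config_list)
        self.now_epoch = 0

    def update_epoch(self, epoch_num):
        # remember training progress for use in calc_mask
        self.now_epoch = int(epoch_num)

    def calc_mask(self, weight, config, **kwargs):
        # hypothetical schedule: ramp sparsity linearly to the target over 10 epochs
        target = config.get('sparsity', 0.8)
        sparsity = target * min(1.0, self.now_epoch / 10.0)
        if sparsity <= 0:
            return torch.ones_like(weight)
        # zero out the smallest-magnitude fraction of weights
        k = max(1, int(weight.numel() * sparsity))
        threshold = weight.abs().view(-1).kthvalue(k).values
        return (weight.abs() > threshold).type_as(weight)
```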
...
The interface for customizing a quantization algorithm is similar to that of pruning algorithms.
For writing a Quantizer in PyTorch, you can simply replace `nni.compression.tensorflow.Quantizer` with `nni.compression.torch.Quantizer`.
Currently we support the following algorithms:
|[__Metis Tuner__](#MetisTuner)|Metis offers the following benefits when it comes to tuning parameters: While most tools only predict the optimal configuration, Metis gives you two outputs: (a) current prediction of optimal configuration, and (b) suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter. [Reference Paper](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/)|
|[__BOHB__](#BOHB)|BOHB is a follow-up work of Hyperband. It targets the weakness of Hyperband that new configurations are generated randomly without leveraging finished trials. For the name BOHB, HB means Hyperband, BO means Bayesian Optimization. BOHB leverages finished trials by building multiple TPE models, a proportion of new configurations are generated through these models. [Reference Paper](https://arxiv.org/abs/1807.01774)|
|[__GP Tuner__](#GPTuner)|Gaussian Process Tuner is a sequential model-based optimization (SMBO) approach with Gaussian Process as the surrogate. [Reference Paper](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf), [Github Repo](https://github.com/fmfn/BayesianOptimization)|
|[__PPO Tuner__](#PPOTuner)|PPO Tuner is a Reinforcement Learning tuner based on the PPO algorithm. [Reference Paper](https://arxiv.org/abs/1707.06347)|
## Usage of Built-in Tuners
...
> Built-in Tuner Name: **MetisTuner**
Note that the only acceptable types of search space are `quniform`, `uniform`, `randint`, and numerical `choice`. Only numerical values are supported since the values will be used to evaluate the 'distance' between different points.
**Suggested scenario**
Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than other approaches such as random search. Furthermore, Metis provides guidance on the subsequent trial. Here is an [example](https://github.com/Microsoft/nni/tree/master/examples/trials/auto-gbdt/search_space_metis.json) of the use of Metis. Users only need to send the final result, such as `accuracy`, to the tuner by calling the NNI SDK. [Detailed Description](./MetisTuner.md)
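For example, at the end of the trial (assuming `accuracy` has been computed by your evaluation code):
```python
import nni

# report the trial's final accuracy so the tuner can use it
nni.report_final_result(accuracy)
```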
...
Note that the only acceptable type of search space is `mutable_layer`.
**Suggested scenario**
PPOTuner is a Reinforcement Learning tuner based on the PPO algorithm. When you are using the NNI NAS interface in your trial code to do neural architecture search, PPOTuner can be used. In general, Reinforcement Learning algorithms need more computing resources, though the PPO algorithm is relatively more efficient than others. It is therefore recommended to use this tuner when you have a large amount of computing resources. You could try it on a very simple task, such as the [mnist-nas](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas) example. [See details](./PPOTuner.md)
**Requirement of classArgs**
* **optimize_mode** (*'maximize' or 'minimize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **trials_per_update** (*int, optional, default = 20*) - The number of trials to be used for one update. It must be divisible by minibatch_size. `trials_per_update` is recommended to be an exact multiple of `trialConcurrency` for better concurrency of trials.
* **epochs_per_update** (*int, optional, default = 4*) - The number of epochs for one update.
* **minibatch_size** (*int, optional, default = 4*) - Mini-batch size (i.e., number of trials for a mini-batch) for the update. Note that `trials_per_update` must be divisible by `minibatch_size`.
* **ent_coef** (*float, optional, default = 0.0*) - Policy entropy coefficient in the optimization objective.
* **lr** (*float, optional, default = 3e-4*) - Learning rate of the model (LSTM network), constant.
* **vf_coef** (*float, optional, default = 0.5*) - Value function loss coefficient in the optimization objective.
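A sketch of how these arguments might appear in the experiment configuration (the values shown are just the defaults listed above):
```yaml
tuner:
  builtinTunerName: PPOTuner
  classArgs:
    optimize_mode: maximize
    trials_per_update: 20
    epochs_per_update: 4
    minibatch_size: 4
```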
* __light weight (without Annotation and Assessor)__
...
```yaml
machineList:
  - ip:
    port:
    username:
    passwd:
```
<a name="Configuration"></a>
## Configuration spec
* __authorName__
...
* __gpuIndices__
__gpuIndices__ specifies the GPUs that can be used by the advisor process. Single or multiple GPU indices can be specified; multiple GPU indices are separated by comma (`,`), such as `1` or `0,1,3`. If the field is not set, `CUDA_VISIBLE_DEVICES` will be `''` in the script, that is, no GPU is visible to the advisor.
Note: users can only use one way to specify the advisor: either specify `builtinAdvisorName` and `classArgs`, or specify `codeDir`, `classFileName`, `className` and `classArgs`.
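For example (a sketch; `BOHB` stands in for whichever built-in advisor you use):
```yaml
advisor:
  builtinAdvisorName: BOHB
  classArgs:
    optimize_mode: maximize
  gpuIndices: 0,1,3
```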
In [To prune, or not to prune: exploring the efficacy of pruning for model compression](https://arxiv.org/abs/1710.01878), authors Michael Zhu and Suyog Gupta propose an algorithm that prunes weights gradually.
In [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf), authors Benoit Jacob and Skirmantas Kligys propose an algorithm that quantizes the model during training.
> We propose an approach that simulates quantization effects in the forward pass of training. Backpropagation still happens as usual, and all weights and biases are stored in floating point so that they can be easily nudged by small amounts. The forward pass then simulates quantized inference as it will happen in the inference engine, by implementing the rounding behavior in floating-point arithmetic:
> * Weights are quantized before they are convolved with the input. If batch normalization (see [17]) is used for the layer, the batch normalization parameters are "folded into" the weights before quantization.
> * Activations are quantized at points where they would be during inference, e.g. after the activation function is applied to a convolutional or fully connected layer's output, or after a bypass connection adds or concatenates the outputs of several layers together such as in ResNets.
This is an NNI tuner commonly used with the NAS interface; it uses the [PPO algorithm](https://arxiv.org/abs/1707.06347). The implementation inherits the main logic of the implementation [here](https://github.com/openai/baselines/tree/master/baselines/ppo2) (i.e., OpenAI's PPO2) and is adapted for the NAS scenario.