Unverified commit 035d58bc authored by SparkSnail, committed by GitHub

Merge pull request #121 from Microsoft/master

merge master
parents b633c265 8e732f2c
@@ -11,10 +11,12 @@
[![Pull Requests](https://img.shields.io/github/issues-pr-raw/Microsoft/nni.svg)](https://github.com/Microsoft/nni/pulls?q=is%3Apr+is%3Aopen)
[![Version](https://img.shields.io/github/release/Microsoft/nni.svg)](https://github.com/Microsoft/nni/releases) [![Join the chat at https://gitter.im/Microsoft/nni](https://badges.gitter.im/Microsoft/nni.svg)](https://gitter.im/Microsoft/nni?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[简体中文](zh_CN/README.md)
NNI (Neural Network Intelligence) is a toolkit to help users run automated machine learning (AutoML) experiments.
The tool dispatches and runs trial jobs generated by tuning algorithms to search for the best neural architecture and/or hyper-parameters in different environments, such as local machines, remote servers, and the cloud.

### **NNI [v0.5.1](https://github.com/Microsoft/nni/releases) has been released!**
<p align="center">
<a href="#nni-v05-has-been-released"><img src="https://microsoft.github.io/nni/docs/img/overview.svg" /></a>
</p>
@@ -49,33 +51,33 @@ The tool dispatches and runs trial jobs generated by tuning algorithms to search
</ul>
</td>
<td>
<a href="docs/Builtin_Tuner.md">Tuner</a>
<ul>
<li><a href="docs/Builtin_Tuner.md#TPE">TPE</a></li>
<li><a href="docs/Builtin_Tuner.md#Random">Random Search</a></li>
<li><a href="docs/Builtin_Tuner.md#Anneal">Anneal</a></li>
<li><a href="docs/Builtin_Tuner.md#Evolution">Naive Evolution</a></li>
<li><a href="docs/Builtin_Tuner.md#SMAC">SMAC</a></li>
<li><a href="docs/Builtin_Tuner.md#Batch">Batch</a></li>
<li><a href="docs/Builtin_Tuner.md#Grid">Grid Search</a></li>
<li><a href="docs/Builtin_Tuner.md#Hyperband">Hyperband</a></li>
<li><a href="docs/Builtin_Tuner.md#NetworkMorphism">Network Morphism</a></li>
<li><a href="examples/tuners/enas_nni/README.md">ENAS</a></li>
<li><a href="docs/Builtin_Tuner.md#MetisTuner">Metis Tuner</a></li>
</ul>
<a href="docs/Builtin_Tuner.md#assessor">Assessor</a>
<ul>
<li><a href="docs/Builtin_Tuner.md#Medianstop">Median Stop</a></li>
<li><a href="docs/Builtin_Tuner.md#Curvefitting">Curve Fitting</a></li>
</ul>
</td>
<td>
<ul>
<li><a href="docs/tutorial_1_CR_exp_local_api.md">Local Machine</a></li>
<li><a href="docs/RemoteMachineMode.md">Remote Servers</a></li>
<li><a href="docs/PAIMode.md">OpenPAI</a></li>
<li><a href="docs/KubeflowMode.md">Kubeflow</a></li>
<li><a href="docs/FrameworkControllerMode.md">FrameworkController on K8S (AKS etc.)</a></li>
</ul>
</td>
</tr>
@@ -112,7 +114,7 @@ Note:
* We support Linux (Ubuntu 16.04 or higher) and macOS (10.14.1) in our current stage.
* Run the following commands in an environment that has `python >= 3.5`, `git` and `wget`.
```bash
git clone -b v0.5.1 https://github.com/Microsoft/nni.git
cd nni
source install.sh
```
@@ -124,7 +126,7 @@ For the system requirements of NNI, please refer to [Install NNI](docs/Installat
The following example is an experiment built on TensorFlow. Make sure you have **TensorFlow installed** before running it.
* Download the examples by cloning the source code.
```bash
git clone -b v0.5.1 https://github.com/Microsoft/nni.git
```
* Run the mnist example.
```bash
@@ -168,24 +170,25 @@ You can use these commands to get more information about the experiment
## **Documentation**
* [NNI overview](docs/Overview.md)
* [Quick start](docs/QuickStart.md)

## **How to**
* [Install NNI](docs/Installation.md)
* [Use command line tool nnictl](docs/NNICTLDOC.md)
* [Use NNIBoard](docs/WebUI.md)
* [How to define search space](docs/SearchSpaceSpec.md)
* [How to define a trial](docs/Trials.md)
* [How to choose tuner/search-algorithm](docs/Builtin_Tuner.md)
* [Config an experiment](docs/ExperimentConfig.md)
* [How to use annotation](docs/Trials.md#nni-python-annotation)

## **Tutorials**
* [Run an experiment on local (with multiple GPUs)?](docs/tutorial_1_CR_exp_local_api.md)
* [Run an experiment on multiple machines?](docs/RemoteMachineMode.md)
* [Run an experiment on OpenPAI?](docs/PAIMode.md)
* [Run an experiment on Kubeflow?](docs/KubeflowMode.md)
* [Try different tuners](docs/tuners.rst)
* [Try different assessors](docs/assessors.rst)
* [Implement a customized tuner](docs/Customize_Tuner.md)
* [Implement a customized assessor](examples/assessors/README.md)
* [Use Genetic Algorithm to find good model architectures for Reading Comprehension task](examples/trials/ga_squad/README.md)
......
@@ -33,7 +33,7 @@ jobs:
displayName: 'Built-in tuners / assessors tests'
- script: |
cd test
PATH=$HOME/.local/bin:$PATH python3 config_test.py --ts local --local_gpu
displayName: 'Examples and advanced features tests on local machine'
- script: |
cd test
......
Dockerfile
===
## 1. Description
This is the Dockerfile of the NNI project. It includes several popular deep learning frameworks and NNI. It is tested on `Ubuntu 16.04 LTS`:
```
CUDA 9.0, CuDNN 7.0
numpy 1.14.3, scipy 1.1.0
TensorFlow-gpu 1.10.0
Keras 2.1.6
PyTorch 0.4.1
scikit-learn 0.20.0
pandas 0.23.4
lightgbm 2.2.2
NNI v0.5.1
```
You can take this Dockerfile as a reference for your own customized Dockerfile.
@@ -26,6 +26,8 @@ __Run the docker image__
```
docker run -it nni/nni
```
Note that if you want to use TensorFlow without GPU support, please uninstall `tensorflow-gpu` and install `tensorflow` in this docker container, or modify the `Dockerfile` to install `tensorflow` (without GPU) and rebuild the docker image.
* If you want to use a GPU in the docker container, make sure you have installed [NVIDIA Container Runtime](https://github.com/NVIDIA/nvidia-docker), then run the following command
```
nvidia-docker run -it nni/nni
......
_build
_static
_templates
\ No newline at end of file
@@ -8,6 +8,7 @@ Currently we recommend sharing weights through NFS (Network File System), which
### Weight Sharing through NFS file
With the NFS setup (see below), trial code can share model weights by loading and saving files. Here we recommend that users feed the tuner the storage path:

```yaml
tuner:
  codeDir: path/to/customer_tuner
@@ -17,9 +18,10 @@ tuner:
    ...
    save_dir_root: /nfs/storage/path/
```

And let the tuner decide where to save and load weights, feeding the paths to trials through `nni.get_next_parameters()`:

<img src="https://user-images.githubusercontent.com/23273522/51817667-93ebf080-2306-11e9-8395-b18b322062bc.png" alt="drawing" width="700"/>
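As a concrete illustration of this hand-off, a tuner could derive per-trial checkpoint paths under `save_dir_root` along the following lines. This is a sketch of one possible path convention; the helper name and directory layout are our assumptions, not NNI API:

```python
import os

def checkpoint_paths(save_dir_root, trial_id, parent_id=None):
    """Hypothetical path convention for NFS weight sharing: each trial saves
    under its own id, and loads its parent's checkpoint when one exists."""
    save_path = os.path.join(save_dir_root, str(trial_id), "model.ckpt")
    load_path = (os.path.join(save_dir_root, str(parent_id), "model.ckpt")
                 if parent_id is not None else None)
    return load_path, save_path
```

The tuner would put these paths into the parameters returned by `nni.get_next_parameters()`, so the trial code itself never needs to know the sharing scheme.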
For example, in TensorFlow:
```python
@@ -80,7 +82,7 @@ The feature of weight sharing enables trials from different machines, in which m
```
## Examples
For details, please refer to this [simple weight sharing example](https://github.com/Microsoft/nni/tree/master/test/async_sharing_test). We also provide a [practice example](https://github.com/Microsoft/nni/tree/master/examples/trials/weight_sharing/ga_squad) for reading comprehension, based on the previous [ga_squad](https://github.com/Microsoft/nni/tree/master/examples/trials/ga_squad) example.

[1]: https://arxiv.org/abs/1802.03268
[2]: https://arxiv.org/abs/1707.07012
......
# NNI Annotation

## Overview

To improve user experience and reduce user effort, we design an annotation grammar. Using NNI annotation, users can adapt their code to NNI just by adding some standalone annotating strings, which do not affect the execution of the original code.

Below is an example:

```python
'''@nni.variable(nni.choice(0.1, 0.01, 0.001), name=learning_rate)'''
learning_rate = 0.1
```

In this example, NNI will choose one of several values (0.1, 0.01, 0.001) to assign to the learning_rate variable. The first line is an NNI annotation, which is a single string; it is followed by an assignment statement. What NNI does here is replace the right-hand value of the assignment statement according to the information provided by the annotation line.

In this way, users can either run the Python code directly or launch NNI to tune hyper-parameters in the code, without changing any code.

## Types of Annotation

In NNI, there are mainly four types of annotation:

### 1. Annotate variables

`'''@nni.variable(sampling_algo, name)'''`

`@nni.variable` is used in NNI to annotate a variable.

**Arguments**

- **sampling_algo**: Sampling algorithm that specifies a search space. Users should replace it with a built-in NNI sampling function whose name consists of an `nni.` identifier and a search space type specified in [SearchSpaceSpec](SearchSpaceSpec.md), such as `choice` or `uniform`.
- **name**: The name of the variable that the selected value will be assigned to. Note that this argument should be the same as the left-hand value of the following assignment statement.

An example here is:

```python
'''@nni.variable(nni.choice(0.1, 0.01, 0.001), name=learning_rate)'''
learning_rate = 0.1
```

### 2. Annotate functions

`'''@nni.function_choice(*functions, name)'''`

`@nni.function_choice` is used to choose one from several functions.

**Arguments**

- **\*functions**: The functions to select from. Note that each should be a complete function call with arguments, such as `max_pool(hidden_layer, pool_size)`.
- **name**: The name of the function that will be replaced in the following assignment statement.

An example here is:

```python
"""@nni.function_choice(max_pool(hidden_layer, pool_size), avg_pool(hidden_layer, pool_size), name=max_pool)"""
h_pooling = max_pool(hidden_layer, pool_size)
```

### 3. Annotate intermediate result

`'''@nni.report_intermediate_result(metrics)'''`

`@nni.report_intermediate_result` is used to report an intermediate result; its usage is the same as `nni.report_intermediate_result` in [Trials.md](Trials.md).

### 4. Annotate final result

`'''@nni.report_final_result(metrics)'''`

`@nni.report_final_result` is used to report the final result of the current trial; its usage is the same as `nni.report_final_result` in [Trials.md](Trials.md).
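Putting the annotation types together, a minimal annotated trial might look like the sketch below. The training function is an illustrative stand-in, not NNI API; when the script is run outside NNI, the annotations are inert strings and the written defaults are used:

```python
'''@nni.get_next_parameter()'''

'''@nni.variable(nni.choice(0.1, 0.01, 0.001), name=learning_rate)'''
learning_rate = 0.1

def train_one_epoch(lr):
    # stand-in for real training; returns a fake accuracy
    return 1.0 - lr

for epoch in range(3):
    acc = train_one_epoch(learning_rate)
    '''@nni.report_intermediate_result(acc)'''

'''@nni.report_final_result(acc)'''
```

Running this file with plain `python` executes the defaults; launching it through NNI rewrites the annotated lines to sample and report values.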
# Built-in Assessors
NNI provides state-of-the-art tuning algorithms in its built-in assessors and makes them easy to use. Below is a brief overview of NNI's current built-in assessors:
|Assessor|Brief Introduction of Algorithm|
|---|---|
|**Medianstop** [(Usage)](#MedianStop)|Medianstop is a simple early stopping rule mentioned in the [paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf). It stops a pending trial X at step S if the trial’s best objective value by step S is strictly worse than the median value of the running averages of all completed trials’ objectives reported up to step S.|
|[Curvefitting](https://github.com/Microsoft/nni/blob/master/src/sdk/pynni/nni/curvefitting_assessor/README.md) [(Usage)](#Curvefitting)|Curve Fitting Assessor is an LPA (learning, predicting, assessing) algorithm. It stops a pending trial X at step S if the prediction of the final epoch's performance is worse than the best final performance in the trial history. In this algorithm, we use 12 curves to fit the accuracy curve.|
## Usage of Builtin Assessors
Using the built-in assessors provided by the NNI SDK requires declaring the **builtinAssessorName** and **classArgs** in the `config.yml` file. In this part, we introduce the suggested scenarios, classArgs requirements, and a usage example for each assessor.

Note: Please follow the format below when you write your `config.yml` file.
<a name="MedianStop"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `Median Stop Assessor`
> Builtin Assessor Name: **Medianstop**
**Suggested scenario**
It is applicable in a wide range of performance curves, thus, can be used in various scenarios to speed up the tuning progress.
**Requirement of classArg**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the assessor will **stop** trials with smaller expectation. If 'minimize', the assessor will **stop** trials with larger expectation.
* **start_step** (*int, optional, default = 0*) - A trial is judged for stopping only after it has reported start_step intermediate results.
**Usage example:**
```yaml
# config.yml
assessor:
builtinAssessorName: Medianstop
classArgs:
optimize_mode: maximize
start_step: 5
```
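The median stopping rule itself can be sketched in a few lines. This is an illustration of the rule as stated above (maximize mode), not NNI's internal implementation:

```python
import statistics

def median_stop(trial_metrics, completed_running_avgs, step):
    """Stop a trial at `step` if its best metric so far is strictly worse than
    the median of completed trials' running averages at that step (maximize)."""
    best_so_far = max(trial_metrics[:step])
    median = statistics.median(avgs[step - 1] for avgs in completed_running_avgs)
    return best_so_far < median
```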
<br>
<a name="Curvefitting"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `Curve Fitting Assessor`
> Builtin Assessor Name: **Curvefitting**
**Suggested scenario**
It is applicable in a wide range of performance curves, thus, can be used in various scenarios to speed up the tuning progress. Even better, it's able to handle and assess curves with similar performance.
**Requirement of classArg**
* **epoch_num** (*int, **required***) - The total number of epochs. We need to know the number of epochs to determine which point to predict.
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the assessor will **stop** trials with smaller expectation. If 'minimize', the assessor will **stop** trials with larger expectation.
* **start_step** (*int, optional, default = 6*) - We start predicting whether to stop a trial only after receiving start_step intermediate results.
* **threshold** (*float, optional, default = 0.95*) - The threshold used to decide whether to early-stop a worse performance curve. For example: if threshold = 0.95, optimize_mode = maximize, and the best performance in the history is 0.9, then we will stop any trial whose predicted value is lower than 0.95 * 0.9 = 0.855.
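For the maximize case, the threshold decision above amounts to a one-line comparison (an illustrative sketch, not NNI's internal code):

```python
def curvefit_should_stop(predicted_final, best_in_history, threshold=0.95):
    # maximize mode: stop when the predicted final performance falls below
    # threshold * best final performance observed so far
    return predicted_final < threshold * best_in_history
```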
**Usage example:**
```yaml
# config.yml
assessor:
builtinAssessorName: Curvefitting
classArgs:
epoch_num: 20
optimize_mode: maximize
start_step: 6
threshold: 0.95
```
\ No newline at end of file
# Built-in Tuners
NNI provides state-of-the-art tuning algorithms as built-in tuners and makes them easy to use. Below is a brief summary of NNI's current built-in tuners:
|Tuner|Brief Introduction of Algorithm|
|---|---|
|**TPE** [(Usage)](#TPE)|The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model.|
|**Random Search** [(Usage)](#Random)|As shown in Random Search for Hyper-Parameter Optimization, Random Search might be surprisingly simple and effective. We suggest using Random Search as a baseline when there is no knowledge about the prior distribution of hyper-parameters.|
|**Anneal** [(Usage)](#Anneal)|This simple annealing algorithm begins by sampling from the prior, but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on the random search that leverages smoothness in the response surface. The annealing rate is not adaptive.|
|**Naive Evolution** [(Usage)](#Evolution)|Naive Evolution comes from Large-Scale Evolution of Image Classifiers. It randomly initializes a population based on the search space. For each generation, it chooses the better ones and performs some mutations (e.g., changing a hyperparameter, adding/removing one layer) on them to get the next generation. Naive Evolution requires many trials to work, but it is very simple and easy to extend with new features.|
|**SMAC** [(Usage)](#SMAC)|SMAC is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO, in order to handle categorical parameters. The SMAC supported by NNI is a wrapper of the SMAC3 GitHub repo. Note that SMAC needs to be installed by the `nnictl package` command.|
|**Batch tuner** [(Usage)](#Batch)|Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type choice in search space spec.|
|**Grid Search** [(Usage)](#GridSearch)|Grid Search performs an exhaustive searching through a manually specified subset of the hyperparameter space defined in the searchspace file. Note that the only acceptable types of search space are choice, quniform, qloguniform. The number q in quniform and qloguniform has special meaning (different from the spec in search space spec). It means the number of values that will be sampled evenly from the range low and high.|
|[Hyperband](https://github.com/Microsoft/nni/tree/master/src/sdk/pynni/nni/hyperband_advisor) [(Usage)](#Hyperband)|Hyperband tries to use limited resources to explore as many configurations as possible and find the promising ones to get the final result. The basic idea is to generate many configurations, run them for a small number of STEPs to find the promising ones, then further train those promising ones and select from them.|
|[Network Morphism](https://github.com/Microsoft/nni/blob/master/src/sdk/pynni/nni/networkmorphism_tuner/README.md) [(Usage)](#NetworkMorphism)|Network Morphism provides functions to automatically search for architecture of deep learning models. Every child network inherits the knowledge from its parent network and morphs into diverse types of networks, including changes of depth, width, and skip-connection. Next, it estimates the value of a child network using the historic architecture and metric pairs. Then it selects the most promising one to train.|
|**Metis Tuner** [(Usage)](#MetisTuner)|Metis offers the following benefits when it comes to tuning parameters: While most tools only predict the optimal configuration, Metis gives you two outputs: (a) current prediction of optimal configuration, and (b) suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter.|
<br>
## Usage of Builtin Tuners
Using a built-in tuner provided by the NNI SDK requires declaring the **builtinTunerName** and **classArgs** in the `config.yml` file. In this part, we introduce the suggested scenarios, classArgs requirements, and a usage example for each tuner.

Note: Please follow the format below when you write your `config.yml` file. Some built-in tuners, like SMAC, need to be installed by `nnictl package`.
<a name="TPE"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `TPE`
> Builtin Tuner Name: **TPE**
**Suggested scenario**
TPE, as a black-box optimization method, can be used in various scenarios and shows good performance in general, especially when you have limited computation resources and can only try a small number of trials. From a large number of experiments, we found that TPE is far better than Random Search.
**Requirement of classArg**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will return the hyperparameter set with larger expectation. If 'minimize', tuner will return the hyperparameter set with smaller expectation.
**Usage example:**
```yaml
# config.yml
tuner:
builtinTunerName: TPE
classArgs:
optimize_mode: maximize
```
<br>
<a name="Random"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `Random Search`
> Builtin Tuner Name: **Random**
**Suggested scenario**
Random Search is suggested when each trial does not take too long (e.g., each trial can be completed very soon, or early-stopped by the assessor quickly) and you have enough computation resources, or when you want to uniformly explore the search space. Random Search can be considered a baseline search algorithm.
**Requirement of classArg:**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will return the hyperparameter set with larger expectation. If 'minimize', tuner will return the hyperparameter set with smaller expectation.
**Usage example**
```yaml
# config.yml
tuner:
builtinTunerName: Random
classArgs:
optimize_mode: maximize
```
<br>
<a name="Anneal"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `Anneal`
> Builtin Tuner Name: **Anneal**
**Suggested scenario**
Anneal is suggested when each trial does not take too long and you have enough computation resources (almost the same as Random Search), or when the variables in the search space can be sampled from some prior distribution.
**Requirement of classArg**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will return the hyperparameter set with larger expectation. If 'minimize', tuner will return the hyperparameter set with smaller expectation.
**Usage example**
```yaml
# config.yml
tuner:
builtinTunerName: Anneal
classArgs:
optimize_mode: maximize
```
<br>
<a name="Evolution"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `Naive Evolution`
> Builtin Tuner Name: **Evolution**
**Suggested scenario**
Its requirement for computation resources is relatively high. Specifically, it requires a large initial population to avoid falling into a local optimum. If your trials are short or leverage an assessor, this tuner is a good choice. It is even more suitable when your trial code supports weight transfer, that is, when a trial can inherit the converged weights from its parent(s). This can greatly speed up the training progress.
**Requirement of classArg**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will return the hyperparameter set with larger expectation. If 'minimize', tuner will return the hyperparameter set with smaller expectation.
**Usage example**
```yaml
# config.yml
tuner:
builtinTunerName: Evolution
classArgs:
optimize_mode: maximize
```
<br>
<a name="SMAC"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `SMAC`
> Builtin Tuner Name: **SMAC**
**Installation**
SMAC needs to be installed by the following command before first use.
```bash
nnictl package install --name=SMAC
```
**Suggested scenario**
Similar to TPE, SMAC is also a black-box tuner which can be tried in various scenarios, and is suggested when computation resource is limited. It is optimized for discrete hyperparameters, thus, suggested when most of your hyperparameters are discrete.
**Requirement of classArg**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will return the hyperparameter set with larger expectation. If 'minimize', tuner will return the hyperparameter set with smaller expectation.
**Usage example**
```yaml
# config.yml
tuner:
builtinTunerName: SMAC
classArgs:
optimize_mode: maximize
```
<br>
<a name="Batch"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `Batch Tuner`
> Builtin Tuner Name: **BatchTuner**
**Suggested scenario**
If the configurations you want to try have been decided, you can list them in the search space file (using `choice`) and run them using the batch tuner.
**Usage example**
```yaml
# config.yml
tuner:
  builtinTunerName: BatchTuner
```
<br>
Note that the search space supported by BatchTuner looks like the following:
```json
{
    "combine_params":
    {
        "_type" : "choice",
        "_value" : [{"optimizer": "Adam", "learning_rate": 0.00001},
                    {"optimizer": "Adam", "learning_rate": 0.0001},
                    {"optimizer": "Adam", "learning_rate": 0.001},
                    {"optimizer": "SGD", "learning_rate": 0.01},
                    {"optimizer": "SGD", "learning_rate": 0.005},
                    {"optimizer": "SGD", "learning_rate": 0.0002}]
    }
}
```
The search space file should include the high-level key `combine_params`. The type of the params in the search space must be `choice`, and the `_value` list should include all the combined parameter sets.
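Conceptually, batch tuning simply hands out each listed combination once. Here is a minimal, purely illustrative sketch of that behavior (not NNI's BatchTuner source; the class name is ours):

```python
class SimpleBatchTuner:
    """Yield each pre-listed configuration exactly once (illustrative sketch)."""
    def __init__(self, values):
        self.values = list(values)  # the _value list from the search space
        self.index = 0

    def generate_parameters(self, parameter_id):
        if self.index >= len(self.values):
            raise IndexError("all listed configurations have been tried")
        params = self.values[self.index]
        self.index += 1
        return params

tuner = SimpleBatchTuner([
    {"optimizer": "Adam", "learning_rate": 0.00001},
    {"optimizer": "SGD", "learning_rate": 0.01},
])
print(tuner.generate_parameters(0))  # first listed combination
```

Once every listed combination has been handed out, no new trials can be generated, which is why the total number of trials is bounded by the length of the `_value` list.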
<a name="GridSearch"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `Grid Search`
> Builtin Tuner Name: **Grid Search**
**Suggested scenario**
Note that the only acceptable types of search space are `choice`, `quniform`, `qloguniform`. **The number `q` in `quniform` and `qloguniform` has a special meaning (different from the spec in [search space spec](./SearchSpaceSpec.md)). It means the number of values that will be sampled evenly between `low` and `high`.**
It is suggested when the search space is small; in that case it is feasible to exhaustively sweep the whole search space.
**Usage example**
```yaml
# config.yml
tuner:
builtinTunerName: GridSearch
```
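Under this reading of `q`, the evenly spaced values can be sketched as follows (an assumed illustration of the rule above, not NNI's exact implementation):

```python
def grid_samples(low, high, q):
    """Sample q values evenly from [low, high], as Grid Search interprets
    the q in quniform (an assumed sketch, not NNI's exact code)."""
    if q == 1:
        return [low]
    step = (high - low) / (q - 1)
    return [low + i * step for i in range(q)]

print(grid_samples(0.0, 1.0, 5))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Grid Search then sweeps the Cartesian product of such value lists across all hyperparameters, which is why it is only practical for small search spaces.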
<br>
<a name="Hyperband"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `Hyperband`
> Builtin Advisor Name: **Hyperband**
**Suggested scenario**
It is suggested when you have limited computation resources but a relatively large search space. It performs well in scenarios where the intermediate result (e.g., accuracy) can reflect, to some extent, how good the final result will be.
**Requirement of classArgs**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will return the hyperparameter set with the larger expectation. If 'minimize', the tuner will return the hyperparameter set with the smaller expectation.
* **R** (*int, optional, default = 60*) - the maximum budget of STEPS (could be the number of mini-batches or epochs) that can be allocated to a trial. Each trial should use STEPS to control how long it runs.
* **eta** (*int, optional, default = 3*) - `(eta-1)/eta` is the proportion of discarded trials
**Usage example**
```yaml
# config.yml
advisor:
  builtinAdvisorName: Hyperband
  classArgs:
    optimize_mode: maximize
    R: 60
    eta: 3
```
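To see how `R` and `eta` shape the schedule, here is a rough sketch of the successive-halving brackets Hyperband derives from them (based on the published algorithm, with budgets floored to integers; not NNI's exact code):

```python
import math

def hyperband_brackets(R=60, eta=3):
    """Enumerate Hyperband brackets: each round keeps 1/eta of the trials
    (discarding (eta-1)/eta) and multiplies each survivor's budget by eta."""
    s_max = int(math.log(R, eta))
    brackets = []
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))  # initial trial count
        r = R // eta ** s                                # initial per-trial budget
        rounds = []
        for i in range(s + 1):
            rounds.append((n, r))
            n = n // eta   # keep 1/eta of the trials
            r = r * eta    # survivors get eta times more budget
        brackets.append(rounds)
    return brackets

for rounds in hyperband_brackets():
    print(rounds)  # (num_trials, budget_per_trial) for each halving round
```

With the defaults `R=60, eta=3`, the most aggressive bracket starts 27 trials with a tiny budget and keeps only one trial to the full budget, while the most conservative bracket runs 4 trials at full budget from the start.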
<br>
<a name="NetworkMorphism"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `Network Morphism`
> Builtin Tuner Name: **NetworkMorphism**
**Installation**
NetworkMorphism requires [PyTorch](https://pytorch.org/get-started/locally), so users should install it first.
**Suggested scenario**
It is suggested when you want to apply deep learning methods to your task (your own dataset) but have no idea how to choose or design a network. You can modify the [example](https://github.com/Microsoft/nni/tree/master/examples/trials/network_morphism/cifar10/cifar10_keras.py) to fit your own dataset and your own data augmentation method. You can also change the batch size, learning rate, or optimizer. This makes it feasible to find a good network architecture for different tasks. For now, this tuner only supports the computer vision domain.
**Requirement of classArgs**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will return the hyperparameter set with the larger expectation. If 'minimize', the tuner will return the hyperparameter set with the smaller expectation.
* **task** (*('cv'), optional, default = 'cv'*) - The domain of the experiment; for now, this tuner only supports the computer vision (cv) domain.
* **input_width** (*int, optional, default = 32*) - input image width
* **input_channel** (*int, optional, default = 3*) - input image channel
* **n_output_node** (*int, optional, default = 10*) - number of classes
**Usage example**
```yaml
# config.yml
tuner:
  builtinTunerName: NetworkMorphism
  classArgs:
    optimize_mode: maximize
    task: cv
    input_width: 32
    input_channel: 3
    n_output_node: 10
```
<br>
<a name="MetisTuner"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `Metis Tuner`
> Builtin Tuner Name: **MetisTuner**
Note that the only acceptable types of search space are `choice`, `quniform`, `uniform` and `randint`.
**Installation**
Metis Tuner requires [scikit-learn](https://scikit-learn.org/), so users should install it first. It can be installed with `pip3 install scikit-learn`.
**Suggested scenario**
Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than other approaches such as random search. Furthermore, Metis provides guidance on the subsequent trial. Here is an [example](https://github.com/Microsoft/nni/tree/master/examples/trials/auto-gbdt/search_space_metis.json) of the use of Metis. Users only need to send the final result, such as `accuracy`, to the tuner by calling the NNI SDK.
**Requirement of classArgs**
* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will return the hyperparameter set with the larger expectation. If 'minimize', the tuner will return the hyperparameter set with the smaller expectation.
**Usage example**
```yaml
# config.yml
tuner:
  builtinTunerName: MetisTuner
  classArgs:
    optimize_mode: maximize
```
...@@ -6,7 +6,7 @@ Firstly, if you are unsure or afraid of anything, just ask or submit the issue o
However, for those individuals who want a bit more guidance on the best way to contribute to the project, read on. This document will cover all the points we're looking for in your contributions, raising your chances of quickly merging or addressing your contributions.
Looking for a quickstart? Get acquainted with our [Get Started](./QuickStart.md) guide.
There are a few simple guidelines that you need to follow before providing your hacks.
...@@ -30,7 +30,7 @@ Provide PRs with appropriate tags for bug fixes or enhancements to the source co
If you are looking for how to develop and debug the NNI source code, you can refer to the [How to set up NNI developer environment](./SetupNNIDeveloperEnvironment.md) doc in the `docs` folder.
Similarly for the [Quick Start](QuickStart.md). For everything else, refer to the [NNI Home page](http://nni.readthedocs.io).
## Solve Existing Issues
Head over to [issues](https://github.com/Microsoft/nni/issues) to find issues where help is needed from contributors. You can find issues tagged with 'good-first-issue' or 'help-wanted' to contribute to.
......
###############################
Contribute to NNI
###############################
.. toctree::
Development Setup<SetupNNIDeveloperEnvironment>
Contribution Guide<CONTRIBUTING>
Debug HowTo<HowToDebug>
...@@ -6,7 +6,7 @@ So, if user want to implement a customized Advisor, she/he only need to:
1. Define an Advisor inheriting from the MsgDispatcherBase class
1. Implement the methods with prefix `handle_` except `handle_request`
1. Configure your customized Advisor in experiment YAML config file
Here is an example:
...@@ -22,9 +22,9 @@ class CustomizedAdvisor(MsgDispatcherBase):
**2) Implement the methods with prefix `handle_` except `handle_request`**
Please refer to the implementation of Hyperband ([src/sdk/pynni/nni/hyperband_advisor/hyperband_advisor.py](https://github.com/Microsoft/nni/tree/master/src/sdk/pynni/nni/hyperband_advisor/hyperband_advisor.py)) for how to implement the methods.
**3) Configure your customized Advisor in experiment YAML config file**
Similar to tuner and assessor, NNI needs to locate your customized Advisor class and instantiate it, so you need to specify the location of the customized Advisor class and pass literal values as parameters to the \_\_init__ constructor.
......
# Customize Assessor
NNI supports building your own assessor to meet your tuning needs.
If you want to implement a customized Assessor, there are three things to do:
1. Inherit the base Assessor class
1. Implement assess_trial function
1. Configure your customized Assessor in experiment YAML config file
**1. Inherit the base Assessor class**
```python
from nni.assessor import Assessor

class CustomizedAssessor(Assessor):
    def __init__(self, ...):
        ...
```
**2. Implement the assess_trial function**
```python
from nni.assessor import Assessor, AssessResult

class CustomizedAssessor(Assessor):
    def __init__(self, ...):
        ...

    def assess_trial(self, trial_history):
        """
        Determines whether a trial should be killed. Must override.
        trial_history: a list of intermediate result objects.
        Returns AssessResult.Good or AssessResult.Bad.
        """
        # your implementation goes here.
        ...
```
**3. Configure your customized Assessor in experiment YAML config file**
NNI needs to locate your customized Assessor class and instantiate the class, so you need to specify the location of the customized Assessor class and pass literal values as parameters to the \_\_init__ constructor.
```yaml
assessor:
  codeDir: /home/abc/myassessor
  classFileName: my_customized_assessor.py
  className: CustomizedAssessor
  # Any parameter you need to pass to your Assessor class __init__ constructor
  # can be specified in this optional classArgs field, for example
  classArgs:
    arg1: value1
```
Please note that in **2**, the object `trial_history` is exactly the object that the Trial sends to the Assessor via the SDK function `report_intermediate_result`.
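As a concrete illustration of `assess_trial`, here is a toy assessor that kills a trial whose latest intermediate result falls below a fixed threshold. The `AssessResult` enum below is a local stand-in so the sketch runs standalone; a real assessor would import `Assessor` and `AssessResult` from `nni.assessor`:

```python
from enum import Enum

# Stand-in for nni.assessor.AssessResult so the sketch is self-contained.
class AssessResult(Enum):
    Good = True
    Bad = False

class ThresholdAssessor:
    """Toy assessor: stop a trial whose latest intermediate result
    drops below a fixed threshold (illustrative, not an NNI builtin)."""
    def __init__(self, threshold=0.5, min_history=3):
        self.threshold = threshold
        self.min_history = min_history  # don't judge a trial too early

    def assess_trial(self, trial_history):
        if len(trial_history) < self.min_history:
            return AssessResult.Good  # not enough evidence yet
        if trial_history[-1] >= self.threshold:
            return AssessResult.Good
        return AssessResult.Bad

assessor = ThresholdAssessor(threshold=0.5)
print(assessor.assess_trial([0.2, 0.4, 0.45]))  # Bad: still below threshold
```

The `threshold` and `min_history` parameters are hypothetical knobs for this sketch; a real assessor would pass such values through `classArgs`.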
For more detailed examples, see:
> * [medianstop-assessor](https://github.com/Microsoft/nni/tree/master/src/sdk/pynni/nni/medianstop_assessor)
> * [curvefitting-assessor](https://github.com/Microsoft/nni/tree/master/src/sdk/pynni/nni/curvefitting_assessor)
# Customize-Tuner
## Customize Tuner
NNI provides state-of-the-art tuning algorithms in its builtin tuners. NNI also supports building a tuner by yourself to meet your tuning needs.
If you want to implement your own tuning algorithm, you can implement a customized Tuner. There are three things to do:
1. Inherit the base Tuner class
1. Implement receive_trial_result and generate_parameters function
1. Configure your customized tuner in experiment YAML config file
Here is an example:
**1. Inherit the base Tuner class**
```python
from nni.tuner import Tuner

class CustomizedTuner(Tuner):
    def __init__(self, ...):
        ...
```
**2. Implement receive_trial_result and generate_parameters function**
```python
from nni.tuner import Tuner

class CustomizedTuner(Tuner):
    def __init__(self, ...):
        ...

    def receive_trial_result(self, parameter_id, parameters, value):
        '''
        Receive trial's final result.
        parameter_id: int
        parameters: object created by 'generate_parameters()'
        value: final metrics of the trial, including default metric
        '''
        # your code implements here.
        ...
```
...@@ -57,13 +59,14 @@ For example:
If you implement the `generate_parameters` like this:
```python
def generate_parameters(self, parameter_id):
    '''
    Returns a set of trial (hyper-)parameters, as a serializable object
    parameter_id: int
    '''
    # your code implements here.
    return {"dropout": 0.3, "learning_rate": 0.4}
```
It means your Tuner will always generate parameters `{"dropout": 0.3, "learning_rate": 0.4}`. Then Trial will receive `{"dropout": 0.3, "learning_rate": 0.4}` by calling API `nni.get_next_parameter()`. Once the trial ends with a result (normally some kind of metric), it can send the result to Tuner by calling API `nni.report_final_result()`, for example `nni.report_final_result(0.93)`. Then your Tuner's `receive_trial_result` function will receive the result like:
...@@ -83,7 +86,7 @@ _fd = open(os.path.join(_pwd, 'data.txt'), 'r')
This is because your tuner is not executed in the directory of your tuner (i.e., `pwd` is not the directory of your own tuner).
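Putting the pieces together, a minimal random-search-style tuner might look like the following sketch. The `Tuner` base class below is a local stand-in so the example runs standalone; a real tuner would inherit `nni.tuner.Tuner`, and the choice to maximize and the sampled ranges are our assumptions:

```python
import random

class Tuner:  # stand-in for nni.tuner.Tuner so the sketch is self-contained
    pass

class RandomSearchTuner(Tuner):
    """Sample dropout and learning rate uniformly; remember the best result."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.history = {}   # parameter_id -> generated parameters
        self.best = None    # (value, parameters) of the best trial so far

    def generate_parameters(self, parameter_id):
        params = {
            "dropout": round(self.rng.uniform(0.1, 0.9), 3),
            "learning_rate": round(self.rng.uniform(1e-4, 1e-1), 6),
        }
        self.history[parameter_id] = params
        return params

    def receive_trial_result(self, parameter_id, parameters, value):
        if self.best is None or value > self.best[0]:  # assumes maximize
            self.best = (value, parameters)

tuner = RandomSearchTuner()
p = tuner.generate_parameters(0)
tuner.receive_trial_result(0, p, 0.93)
print(tuner.best)
```

Each call to `generate_parameters` corresponds to one trial fetching its configuration via `nni.get_next_parameter()`, and each `receive_trial_result` corresponds to one `nni.report_final_result()` call.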
**3. Configure your customized tuner in experiment YAML config file**
NNI needs to locate your customized tuner class and instantiate the class, so you need to specify the location of the customized tuner class and pass literal values as parameters to the \_\_init__ constructor.
...@@ -96,14 +99,14 @@ tuner:
  # can be specified in this optional classArgs field, for example
  classArgs:
    arg1: value1
```
For more detailed examples, see:
> * [evolution-tuner](https://github.com/Microsoft/nni/tree/master/src/sdk/pynni/nni/evolution_tuner)
> * [hyperopt-tuner](https://github.com/Microsoft/nni/tree/master/src/sdk/pynni/nni/hyperopt_tuner)
> * [evolution-based-customized-tuner](https://github.com/Microsoft/nni/tree/master/examples/tuners/ga_customer_tuner)
### Write a more advanced automl algorithm
The methods above are usually enough to write a general tuner. However, users may also want more information, for example, intermediate results and trials' state (e.g., the methods in assessor), in order to have a more powerful automl algorithm. Therefore, we have another concept called `advisor` which directly inherits from `MsgDispatcherBase` in [`src/sdk/pynni/nni/msg_dispatcher_base.py`](https://github.com/Microsoft/nni/tree/master/src/sdk/pynni/nni/msg_dispatcher_base.py). Please refer to [here](Customize_Advisor.md) for how to write a customized advisor.
**Enable Assessor in your experiment**
===
The Assessor module assesses running trials. One common use case is early stopping, which terminates unpromising trial jobs based on their intermediate results.
## Using NNI built-in Assessor
Here we use the same example `examples/trials/mnist-annotation`, with the `Medianstop` assessor for this experiment. The YAML config file is shown below:
```yaml
authorName: your_name
experimentName: auto_mnist
# how many trials could be concurrently running
trialConcurrency: 2
# maximum experiment running duration
maxExecDuration: 3h
# empty means never stop
maxTrialNum: 100
# choice: local, remote
trainingServicePlatform: local
# choice: true, false
useAnnotation: true
tuner:
  builtinTunerName: TPE
  classArgs:
    optimize_mode: maximize
assessor:
  builtinAssessorName: Medianstop
  classArgs:
    optimize_mode: maximize
trial:
  command: python mnist.py
  codeDir: /usr/share/nni/examples/trials/mnist-annotation
  gpuNum: 0
```
For our built-in assessors, you need to fill two fields: `builtinAssessorName`, which chooses an NNI-provided assessor (refer to [here]() for built-in assessors), and `optimize_mode`, which is either maximize or minimize (whether you want to maximize or minimize your trial result).
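The idea behind `Medianstop` can be sketched as follows: stop a trial whose best intermediate result so far is worse than the median of the averaged intermediate results of completed trials at the same step. This is a simplified sketch of that rule, not NNI's implementation:

```python
from statistics import median

def medianstop(trial_history, completed_histories):
    """Return True if the running trial should be stopped (simplified sketch).

    trial_history: intermediate results of the running trial.
    completed_histories: lists of intermediate results of earlier trials.
    """
    step = len(trial_history)
    # Average of each completed trial's results up to the current step.
    averages = [sum(h[:step]) / step for h in completed_histories if len(h) >= step]
    if not averages:
        return False  # nothing to compare against yet
    return max(trial_history) < median(averages)

done = [[0.5, 0.6, 0.7], [0.4, 0.5, 0.6], [0.6, 0.7, 0.8]]
print(medianstop([0.2, 0.3], done))  # True: best so far is below the median average
```

The sketch assumes higher intermediate results are better (i.e., `optimize_mode: maximize`); for minimize, the comparison would be inverted.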
## Using user customized Assessor
You can also write your own assessor following the guidance [here](). For example, suppose you wrote an assessor for `examples/trials/mnist-annotation`. You should prepare the YAML config below:
```yaml
authorName: your_name
experimentName: auto_mnist
# how many trials could be concurrently running
trialConcurrency: 2
# maximum experiment running duration
maxExecDuration: 3h
# empty means never stop
maxTrialNum: 100
# choice: local, remote
trainingServicePlatform: local
# choice: true, false
useAnnotation: true
tuner:
  # Possible values: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    optimize_mode: maximize
assessor:
  # Your assessor code directory
  codeDir:
  # Name of the file which contains your assessor class
  classFileName:
  # Your assessor class name, must be a subclass of nni.Assessor
  className:
  # Parameter names and literal values you want to pass to
  # the __init__ constructor of your assessor class
  classArgs:
    arg1: value1
  gpuNum: 0
trial:
  command: python mnist.py
  codeDir: /usr/share/nni/examples/trials/mnist-annotation
  gpuNum: 0
```
You need to fill in `codeDir`, `classFileName`, and `className`, and pass parameters to the \_\_init__ constructor through the `classArgs` field if the \_\_init__ constructor of your assessor class has required parameters.
**Note that** if you want to access a file (e.g., ```data.txt```) in the directory of your own assessor, you cannot use ```open('data.txt', 'r')```. Instead, you should use the following:
```python
import os

_pwd = os.path.dirname(__file__)
_fd = open(os.path.join(_pwd, 'data.txt'), 'r')
```
This is because your assessor is not executed in the directory of your assessor (i.e., ```pwd``` is not the directory of your own assessor).
######################
Examples
######################
.. toctree::
:maxdepth: 2
MNIST<mnist_examples>
Cifar10<cifar10_examples>
Scikit-learn<sklearn_examples>
EvolutionSQuAD<SQuAD_evolution_examples>
GBDT<gbdt_example>
# Experiment config reference
A config file is needed when creating an experiment; the path of the config file is provided to nnictl.
The config file is written in YAML format and needs to be written correctly.
This document describes the rules for writing a config file, and provides some examples and templates.
* [Template](#Template) (the templates of a config file)
* [Configuration spec](#Configuration) (the configuration specification of every attribute in the config file)
* [Examples](#Examples) (examples of config files)
<a name="Template"></a>
## Template
* __light weight (without Annotation and Assessor)__
```yaml
authorName:
experimentName:
trialConcurrency:
...@@ -38,7 +45,7 @@ machineList:
* __Use Assessor__
```yaml
authorName:
experimentName:
trialConcurrency:
...@@ -77,7 +84,7 @@ machineList:
* __Use Annotation__
```yaml
authorName:
experimentName:
trialConcurrency:
...@@ -113,7 +120,9 @@ machineList:
passwd:
```
<a name="Configuration"></a>
## Configuration spec
* __authorName__
  * Description
...@@ -138,10 +147,12 @@ machineList:
__maxExecDuration__ specifies the max duration time of an experiment. The unit of the time is {__s__, __m__, __h__, __d__}, which means {_seconds_, _minutes_, _hours_, _days_}.
Note: The maxExecDuration spec sets the duration of an experiment, not of a trial job. If the experiment reaches the max duration time, it will not stop, but it can no longer submit new trial jobs.
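The duration value combines a number with one of those unit suffixes; as an illustration of the format (not NNI's parser), such a value maps to seconds like this:

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def duration_to_seconds(value):
    """Convert a maxExecDuration-style string like '3h' to seconds."""
    number, unit = value[:-1], value[-1]
    return int(number) * UNIT_SECONDS[unit]

print(duration_to_seconds("3h"))  # 10800
```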
* __maxTrialNum__
  * Description
__maxTrialNum__ specifies the max number of trial jobs created by NNI, including succeeded and failed jobs.
* __trainingServicePlatform__
  * Description
...@@ -150,13 +161,11 @@ machineList:
  * __local__ run an experiment on the local ubuntu machine.
  * __remote__ submit trial jobs to remote ubuntu machines, and the __machineList__ field should be filled in order to set up the SSH connection to the remote machine.
  * __pai__ submit trial jobs to [OpenPAI](https://github.com/Microsoft/pai) of Microsoft. For more details of pai configuration, please refer to the [PAIMode doc](./PAIMode.md)
  * __kubeflow__ submit trial jobs to [kubeflow](https://www.kubeflow.org/docs/about/kubeflow/); NNI supports kubeflow based on normal kubernetes and [azure kubernetes](https://azure.microsoft.com/en-us/services/kubernetes-service/).
* __searchSpacePath__
  * Description
...@@ -164,6 +173,7 @@ machineList:
__searchSpacePath__ specifies the path of the search space file, which should be a valid path on the local linux machine.
Note: if useAnnotation=True is set, the searchSpacePath field should be removed.
* __useAnnotation__
  * Description
...@@ -174,7 +184,7 @@ machineList:
* __nniManagerIp__
  * Description
__nniManagerIp__ sets the IP address of the machine on which the NNI manager process runs. This field is optional, and if it's not set, the eth0 device IP will be used instead.
Note: run ifconfig on the NNI manager's machine to check if the eth0 device exists. If not, we recommend setting nniManagerIp explicitly.
...@@ -188,15 +198,14 @@ machineList:
__logLevel__ sets the log level for the experiment; available log levels are: `trace, debug, info, warning, error, fatal`. The default value is `info`.
* __tuner__
  * Description
__tuner__ specifies the tuner algorithm in the experiment. There are two ways to set the tuner: one is to use a tuner provided by the NNI SDK, in which case you need to set __builtinTunerName__ and __classArgs__; the other is to use your own tuner file, in which case you need to set __codeDir__, __classFileName__, __className__ and __classArgs__.
  * __builtinTunerName__ and __classArgs__
    * __builtinTunerName__
__builtinTunerName__ specifies the name of a builtin tuner; the NNI SDK provides the following tuners: {__TPE__, __Random__, __Anneal__, __Evolution__, __BatchTuner__, __GridSearch__}
    * __classArgs__
__classArgs__ specifies the arguments of the tuner algorithm. If the __builtinTunerName__ is in {__TPE__, __Random__, __Anneal__, __Evolution__}, the user should set __optimize_mode__.
...@@ -223,56 +232,71 @@ machineList:
  * Description
__assessor__ specifies the assessor algorithm to run an experiment. There are two ways to set the assessor: one is to use an assessor provided by the NNI SDK, in which case you need to set __builtinAssessorName__ and __classArgs__; the other is to use your own assessor file, in which case you need to set __codeDir__, __classFileName__, __className__ and __classArgs__.
  * __builtinAssessorName__ and __classArgs__
    * __builtinAssessorName__
__builtinAssessorName__ specifies the name of a builtin assessor; the NNI SDK provides one builtin assessor: __Medianstop__
    * __classArgs__
__classArgs__ specifies the arguments of the assessor algorithm
  * __codeDir__, __classFileName__, __className__ and __classArgs__
    * __codeDir__
__codeDir__ specifies the directory of the assessor code.
    * __classFileName__
__classFileName__ specifies the name of the assessor file.
    * __className__
__className__ specifies the name of the assessor class.
    * __classArgs__
__classArgs__ specifies the arguments of the assessor algorithm.
    * __gpuNum__
__gpuNum__ specifies the number of GPUs to run the assessor process. The value of this field should be a positive number.
Note: users can specify only one way to set the assessor, i.e., set either {builtinAssessorName, classArgs} or {codeDir, classFileName, className, classArgs}, but not both. If users do not want to use an assessor, the assessor field should be left empty.
* __trial (local, remote)__
  * __command__
__command__ specifies the command to run the trial process.
  * __codeDir__
__codeDir__ specifies the directory of your own trial file.
  * __gpuNum__
__gpuNum__ specifies the number of GPUs to run the trial process. The default value is 0.
* __trial(pai)__ * __trial(pai)__
* __command__ * __command__
__command__ specifies the command to run trial process. __command__ specifies the command to run trial process.
* __codeDir__ * __codeDir__
__codeDir__ specifies the directory of the own trial file. __codeDir__ specifies the directory of the own trial file.
* __gpuNum__ * __gpuNum__
__gpuNum__ specifies the num of gpu to run the trial process. Default value is 0. __gpuNum__ specifies the num of gpu to run the trial process. Default value is 0.
* __cpuNum__ * __cpuNum__
__cpuNum__ is the cpu number of cpu to be used in pai container. __cpuNum__ is the cpu number of cpu to be used in pai container.
* __memoryMB__ * __memoryMB__
__memoryMB__ set the momory size to be used in pai's container. __memoryMB__ set the momory size to be used in pai's container.
  * __outputDir__
    __outputDir__ is the output directory on HDFS used by pai; the stdout and stderr files are stored there after the job finishes.
* __trial(kubeflow)__
  * __codeDir__
    __codeDir__ specifies the directory of your own trial file.
  * __ps(optional)__
    __ps__ is the configuration for kubeflow's tensorflow-operator.
    * __replicas__
      __replicas__ is the replica number of the __ps__ role.
  * __worker__
    __worker__ is the configuration for kubeflow's tensorflow-operator.
    * __replicas__
      __replicas__ is the replica number of the __worker__ role.
    * __image__
      __image__ sets the image to be used in __worker__.
* __machineList__
  __machineList__ should be set if __trainingServicePlatform__ is set to remote; otherwise it should be empty.
  * __ip__
    __ip__ is the IP address of the remote machine.
  * __port__
    __port__ is the SSH port used to connect to the machine.
  * __sshKeyPath__
    If users use an SSH key to log in to the remote machine, they can set __sshKeyPath__ in the config file. __sshKeyPath__ is the path of the SSH key file, which should be valid.
    Note: if users set passwd and sshKeyPath simultaneously, NNI will try passwd first.
  * __passphrase__
    __passphrase__ is used to decrypt the SSH key, and can be empty if users don't use one.
* __kubeflowConfig__
  * __operator__
    __operator__ specifies the kubeflow operator to be used; NNI supports __tf-operator__ in the current version.
  * __storage__
    __storage__ specifies the storage type used by kubeflow, e.g. {__nfs__, __azureStorage__}.
  * __keyVault__
    __keyVault__ should be set if users use Azure storage; it specifies the key vault that holds the storage account's access key.
    * __vaultName__
      __vaultName__ is the value of `--vault-name` used in the az command.
    * __name__
      __name__ is the value of `--name` used in the az command.
  * __azureStorage__
    __azureStorage__ specifies the Azure storage account and share to be used.
* __paiConfig__
  * __userName__
    __userName__ is the username of the pai account.
  * __passWord__
    __passWord__ is the password of the pai account.
  * __host__
    __host__ is the host of pai.
<a name="Examples"></a>
## Examples

* __local mode__

If users want to run trial jobs on the local machine and use annotation to generate the search space, they can use the following config:
```yaml
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai, kubeflow
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
```
You can add an assessor configuration to the config file if you want to use one:
```yaml
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai, kubeflow
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
assessor:
  #choice: Medianstop
  builtinAssessorName: Medianstop
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
```
Or you can specify your own tuner and assessor files as follows:
```yaml
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai, kubeflow
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  codeDir: /nni/tuner
  classFileName: mytuner.py
  className: MyTuner
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
assessor:
  codeDir: /nni/assessor
  classFileName: myassessor.py
  className: MyAssessor
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
```
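For reference, `mytuner.py` above is user code. A rough, hypothetical sketch of what such a file might contain is below; in a real tuner the class should inherit `nni.tuner.Tuner` (the base class is omitted here so the sketch stands alone), and the method names and search-space format are assumptions based on the SDK:

```python
import random

# Hypothetical sketch of /nni/tuner/mytuner.py.
class MyTuner:
    def __init__(self, optimize_mode="maximize"):
        # classArgs from the config file arrive as constructor arguments.
        self.optimize_mode = optimize_mode
        self.search_space = {}

    def update_search_space(self, search_space):
        # Called with the parsed contents of search_space.json.
        self.search_space = search_space

    def generate_parameters(self, parameter_id):
        # Return one configuration for a new trial; here, random sampling
        # over "choice"-typed parameters only.
        return {name: random.choice(spec["_value"])
                for name, spec in self.search_space.items()
                if spec["_type"] == "choice"}

    def receive_trial_result(self, parameter_id, parameters, value):
        # Called when a trial reports its final result.
        pass
```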
* __remote mode__

To run trial jobs on remote machines, users can specify the remote machine information in the following format:
```yaml
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai, kubeflow
trainingServicePlatform: remote
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
  gpuNum: 0
trial:
  command: python3 mnist.py
  codeDir: /nni/mnist
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: 10.10.10.10
    port: 22
    username: test
    passwd: test
  - ip: 10.10.10.11
    port: 22
    username: test
    sshKeyPath: /nni/sshkey
    passphrase: qwert
```
* __pai mode__
```yaml
authorName: test
experimentName: nni_test1
trialConcurrency: 1
maxExecDuration: 500h
maxTrialNum: 1
#choice: local, remote, pai, kubeflow
trainingServicePlatform: pai
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 main.py
  codeDir: .
  gpuNum: 4
  cpuNum: 2
  memoryMB: 10000
  #The docker image to run NNI job on pai
  image: msranni/nni:latest
  #The hdfs directory to store data on pai, format 'hdfs://host:port/directory'
  dataDir: hdfs://10.11.12.13:9000/test
  #The hdfs directory to store output data generated by NNI, format 'hdfs://host:port/directory'
  outputDir: hdfs://10.11.12.13:9000/test
paiConfig:
  #The username to login pai
  userName: test
  #The password to login pai
  passWord: test
  #The host of restful server of pai
  host: 10.10.10.10
```
* __kubeflow mode__

Kubeflow with NFS storage:
```yaml
authorName: default
experimentName: example_mni
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 1
#choice: local, remote, pai, kubeflow
trainingServicePlatform: kubeflow
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  codeDir: .
  worker:
    replicas: 1
    command: python3 mnist.py
    gpuNum: 0
    cpuNum: 1
    memoryMB: 8192
    image: msranni/nni:latest
kubeflowConfig:
  operator: tf-operator
  nfs:
    server: 10.10.10.10
    path: /var/nfs/general
```
Kubeflow with Azure storage:
```yaml
authorName: default
experimentName: example_mni
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 1
#choice: local, remote, pai, kubeflow
trainingServicePlatform: kubeflow
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
#nniManagerIp: 10.10.10.10
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
assessor:
  builtinAssessorName: Medianstop
  classArgs:
    optimize_mode: maximize
  gpuNum: 0
trial:
  codeDir: .
  worker:
    replicas: 1
    command: python3 mnist.py
    gpuNum: 0
    cpuNum: 1
    memoryMB: 4096
    image: msranni/nni:latest
kubeflowConfig:
  operator: tf-operator
  keyVault:
    vaultName: Contoso-Vault
    name: AzureStorageAccountKey
  azureStorage:
    accountName: storage
    azureShare: share01
```
# FAQ
This page answers frequently asked questions.
When you encounter errors like the one below, try cleaning up the **tmp** folder first.
> OSError: [Errno 28] No space left on device

### Cannot get trials' metrics in OpenPAI mode
In OpenPAI training mode, we start a REST server listening on port 51189 in the NNI manager to receive metrics reported from trials running in the OpenPAI cluster. If you don't see any metrics in the WebUI in OpenPAI mode, check the machine where the NNI manager runs to make sure port 51189 is open in the firewall rules.
### Segmentation Fault (core dumped) when installing

> make: *** [install-XXX] Segmentation fault (core dumped)
Please try the following solutions in turn:
* Update or reinstall your current Python's pip, e.g. `python3 -m pip install -U pip`
* Install NNI with the `--no-cache-dir` flag, e.g. `python3 -m pip install nni --no-cache-dir`
### Job management error: getIPV4Address() failed because os.networkInterfaces().eth0 is undefined.

Your machine doesn't have an eth0 device; please set [nniManagerIp](ExperimentConfig.md) in your config file manually.
### Exceed the MaxDuration but didn't stop

When the duration of an experiment reaches the maximum duration, nniManager will not create new trials, but existing trials will continue unless the user manually stops the experiment.
### Could not stop an experiment using `nnictl stop`

If you upgrade NNI or delete some NNI config files while an experiment is running, this kind of issue may happen because of the missing config files. You can use `ps -ef | grep node` to find the PID of your experiment, and `kill -9 {pid}` to kill it manually.
### Could not get `default metric` in webUI of virtual machines

Configure the network mode to bridge mode or another mode that makes the virtual machine's host accessible from external machines, and make sure the virtual machine's port is not blocked by the firewall.
NNI supports running experiments using [FrameworkController](https://github.com/Microsoft/frameworkcontroller).
1. A **Kubernetes** cluster using Kubernetes 1.8 or later. Follow this [guideline](https://kubernetes.io/docs/setup/) to set up Kubernetes.
2. Prepare a **kubeconfig** file, which will be used by NNI to interact with your Kubernetes API server. By default, NNI manager will use $(HOME)/.kube/config as the kubeconfig file's path. You can also specify other kubeconfig files by setting the **KUBECONFIG** environment variable. Refer to this [guideline](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig) to learn more about kubeconfig.
3. If your NNI trial job needs GPU resources, you should follow this [guideline](https://github.com/NVIDIA/k8s-device-plugin) to configure the **Nvidia device plugin for Kubernetes**.
4. Prepare an **NFS server** and export a general purpose mount (we recommend mapping your NFS server path with the `root_squash` option, otherwise permission issues may arise when NNI copies files to NFS; refer to this [page](https://linux.die.net/man/5/exports) to learn what the root_squash option is), or **Azure File Storage**.
5. Install the **NFS client** on the machine where you install NNI and run nnictl to create experiments. Run this command to install the NFSv4 client:
```
apt-get install nfs-common
```
6. Install **NNI**, following the install guide [here](QuickStart.md).
## Prerequisite for Azure Kubernetes Service

1. NNI supports kubeflow based on Azure Kubernetes Service; follow the [guideline](https://azure.microsoft.com/en-us/services/kubernetes-service/) to set up Azure Kubernetes Service.
2. Install [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and __kubectl__. Use `az login` to set the Azure account, and connect the kubectl client to AKS; refer to this [guideline](https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough#connect-to-the-cluster).
3. Follow the [guideline](https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=portal) to create an Azure file storage account. If you use Azure Kubernetes Service, NNI needs Azure Storage Service to store code files and output files.
4. To access Azure storage service, NNI needs the access key of the storage account, and NNI uses the [Azure Key Vault](https://azure.microsoft.com/en-us/services/key-vault/) service to protect your private key. Set up Azure Key Vault Service and add a secret to Key Vault to store the access key of the Azure storage account. Follow this [guideline](https://docs.microsoft.com/en-us/azure/key-vault/quick-create-cli) to store the access key.
## Set up FrameworkController

Follow the [guideline](https://github.com/Microsoft/frameworkcontroller/tree/master/example/run) to set up frameworkcontroller in the Kubernetes cluster; NNI supports frameworkcontroller in statefulset mode.
## Design

Please refer to the design of the [kubeflow training service](./KubeflowMode.md); the frameworkcontroller training service pipeline is similar.
```yaml
frameworkcontrollerConfig:
  storage: nfs
  nfs:
    server: {your_nfs_server}
    path: {your_nfs_server_exported_path}
```
If you use Azure Kubernetes Service, you should set `frameworkcontrollerConfig` in your config YAML file as follows:
```yaml
frameworkcontrollerConfig:
  storage: azureStorage
  keyVault:
    vaultName: {your_vault_name}
    name: {your_secret_name}
  azureStorage:
    accountName: {your_storage_account_name}
    azureShare: {your_azure_share_name}
```
Note: You should explicitly set `trainingServicePlatform: frameworkcontroller` in the NNI config YAML file if you want to start an experiment in frameworkcontroller mode.

The trial's config format for NNI frameworkcontroller mode is a simplified version of frameworkcontroller's official config; you can refer to the [tensorflow example of frameworkcontroller](https://github.com/Microsoft/frameworkcontroller/blob/master/example/framework/scenario/tensorflow/cpu/tensorflowdistributedtrainingwithcpu.yaml) for a deeper understanding.

The trial configuration in frameworkcontroller mode has the following configuration keys:
* taskRoles: you can set multiple task roles in the config file, and each task role is a basic unit to process in the Kubernetes cluster.
  * name: the name of the task role, e.g. "worker", "ps", "master".
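Putting these keys together, a trial section for frameworkcontroller mode might look roughly like the sketch below; keys other than `taskRoles` and `name` (`taskNum`, `command`, `gpuNum`, `cpuNum`, `memoryMB`, `image`) are assumed from the frameworkcontroller tensorflow example linked above:

```yaml
trial:
  codeDir: .
  taskRoles:
    - name: worker
      taskNum: 1
      command: python3 mnist.py
      gpuNum: 1
      cpuNum: 1
      memoryMB: 8192
      image: msranni/nni:latest
```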
**Get Started with NNI**
===
## **Installation**
* __Dependencies__
```bash
python >= 3.5
git
wget
```
python pip should also be correctly installed. You can use "python3 -m pip -V" to check it on Linux.
* Note: we don't support virtual environment in current releases.
* __Install NNI through pip__
```bash
python3 -m pip install --user --upgrade nni
```
* __Install NNI through source code__
```bash
git clone -b v0.5 https://github.com/Microsoft/nni.git
cd nni
source install.sh
```
## **Quick start: run a customized experiment**
An experiment is to run multiple trial jobs, each trial job tries a configuration which includes a specific neural architecture (or model) and hyper-parameter values. To run an experiment through NNI, you should:
* Provide a runnable trial
* Provide or choose a tuner
* Provide a yaml experiment configure file
* (optional) Provide or choose an assessor
**Prepare trial**: Let's use a simple trial example, e.g. mnist, provided by NNI. After you installed NNI, NNI examples have been put in ~/nni/examples, run `ls ~/nni/examples/trials` to see all the trial examples. You can simply execute the following command to run the NNI mnist example:
```bash
python3 ~/nni/examples/trials/mnist-annotation/mnist.py
```
This command will be filled in the yaml configure file below. Please refer to [here](howto_1_WriteTrial.md) for how to write your own trial.
**Prepare tuner**: NNI supports several popular automl algorithms, including Random Search, Tree of Parzen Estimators (TPE), Evolution algorithm, etc. Users can also write their own tuner (refer to [here](howto_2_CustomizedTuner.md)), but for simplicity, here we choose a tuner provided by NNI as below:
```yaml
tuner:
builtinTunerName: TPE
classArgs:
optimize_mode: maximize
```
*builtinTunerName* is used to specify a tuner in NNI, *classArgs* are the arguments passed to the tuner, and *optimize_mode* indicates whether you want to maximize or minimize your trial's result.
**Prepare configure file**: Since you have already known which trial code you are going to run and which tuner you are going to use, it is time to prepare the yaml configure file. NNI provides a demo configure file for each trial example, `cat ~/nni/examples/trials/mnist-annotation/config.yml` to see it. Its content is basically shown below:
```yaml
authorName: your_name
experimentName: auto_mnist
# how many trials could be concurrently running
trialConcurrency: 2
# maximum experiment running duration
maxExecDuration: 3h
# empty means never stop
maxTrialNum: 100
# choice: local, remote, pai
trainingServicePlatform: local
# choice: true, false
useAnnotation: true
tuner:
builtinTunerName: TPE
classArgs:
optimize_mode: maximize
trial:
command: python mnist.py
codeDir: ~/nni/examples/trials/mnist-annotation
gpuNum: 0
```
Here *useAnnotation* is true because this trial example uses our python annotation (refer to [here](../tools/annotation/README.md) for details). For the trial, we should provide the *command* to run it and the *codeDir* where the trial code is; the command will be executed in this directory. We should also specify how many GPUs a trial requires.
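To illustrate, a minimal annotated trial might look like the sketch below (annotation syntax assumed from the annotation README; `train` is a hypothetical placeholder for real training code). When *useAnnotation* is true, NNI rewrites the string annotations into API calls; without NNI, the script still runs with its default values:

```python
# Hypothetical stand-in for a real training loop that returns an accuracy.
def train(batch_size):
    return 0.9

# NNI replaces the default below with a sampled value at run time.
'''@nni.variable(nni.choice(8, 16, 32), name=batch_size)'''
batch_size = 16

acc = train(batch_size)
# Report the trial's final result back to the tuner.
'''@nni.report_final_result(acc)'''
print(acc)
```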
With all these steps done, we can run the experiment with the following command:
```bash
nnictl create --config ~/nni/examples/trials/mnist-annotation/config.yml
```
You can refer to [here](NNICTLDOC.md) for more usage guide of *nnictl* command line tool.
## View experiment results
The experiment has been running now, NNI provides WebUI for you to view experiment progress, to control your experiment, and some other appealing features. The WebUI is opened by default by `nnictl create`.
## Read more
* [Tuners supported in the latest NNI release](./HowToChooseTuner.md)
* [Overview](Overview.md)
* [Installation](Installation.md)
* [Use command line tool nnictl](NNICTLDOC.md)
* [Use NNIBoard](WebUI.md)
* [Define search space](SearchSpaceSpec.md)
* [Config an experiment](ExperimentConfig.md)
* [How to run an experiment on local (with multiple GPUs)?](tutorial_1_CR_exp_local_api.md)
* [How to run an experiment on multiple machines?](tutorial_2_RemoteMachineMode.md)
* [How to run an experiment on OpenPAI?](PAIMode.md)
* [How to create a multi-phase experiment](multiPhase.md)
# How to use Tuner that NNI supports?
For now, NNI has supported the following tuner algorithms. Note that NNI installation only installs a subset of those algorithms, other algorithms should be installed through `nnictl package install` before you use them. For example, for SMAC the installation command is `nnictl package install --name=SMAC`.
- [TPE](#TPE)
- [Random Search](#Random)
- [Anneal](#Anneal)
- [Naive Evolution](#Evolution)
- [SMAC](#SMAC) (to be installed through `nnictl`)
- [Batch Tuner](#Batch)
- [Grid Search](#Grid)
- [Hyperband](#Hyperband)
- [Network Morphism](#NetworkMorphism) (requires PyTorch)
- [Metis Tuner](#MetisTuner) (requires scikit-learn)
## Supported tuner algorithms
We will introduce some basic knowledge about each tuner algorithm, its suggested scenarios, and its example usage (for the complete usage spec, please refer to [here]()).
<a name="TPE"></a>
**TPE**
The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model.
The TPE approach models P(x|y) and P(y), where x represents hyperparameters and y the associated evaluation metric. P(x|y) is modeled by transforming the generative process of hyperparameters, replacing the distributions of the configuration prior with non-parametric densities.
This optimization approach is described in detail in [Algorithms for Hyper-Parameter Optimization][1].
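The density-ratio idea behind TPE can be sketched in a few lines: split the observed (x, y) pairs at a quantile into a "good" set modeled by a density l(x) and a "bad" set modeled by g(x), then propose the candidate that maximizes l(x)/g(x). The following is an illustrative 1-D sketch with hand-rolled kernel density estimates (maximization assumed), not NNI's implementation:

```python
import math
import random

def kde(points, bandwidth=0.1):
    """Simple Gaussian kernel density estimate over 1-D points."""
    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
                   for p in points) / (len(points) * bandwidth)
    return density

def tpe_suggest(history, gamma=0.25, n_candidates=24, rng=random):
    """Pick the next x maximizing l(x)/g(x), where l models the best
    gamma-fraction of observations and g models the rest."""
    hist = sorted(history, key=lambda xy: xy[1], reverse=True)
    n_good = max(1, int(gamma * len(hist)))
    good = [x for x, _ in hist[:n_good]]
    bad = [x for x, _ in hist[n_good:]] or good
    l, g = kde(good), kde(bad)
    candidates = [rng.uniform(0.0, 1.0) for _ in range(n_candidates)]
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))

# toy objective with its optimum near x = 0.7
random.seed(0)
history = [(x, -(x - 0.7) ** 2)
           for x in [random.uniform(0, 1) for _ in range(20)]]
next_x = tpe_suggest(history)
```

Real TPE additionally handles tree-structured search spaces and uses the prior distributions when building l and g; the sketch keeps only the density-ratio selection step.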
_Suggested scenario_: TPE, as a black-box optimization method, can be used in various scenarios and shows good performance in general, especially when you have limited computation resources and can only try a small number of trials. From a large number of experiments, we have found that TPE is far better than Random Search.
_Usage_:
```yaml
# config.yaml
tuner:
  builtinTunerName: TPE
  classArgs:
    # choice: maximize, minimize
    optimize_mode: maximize
```
<a name="Random"></a>
**Random Search**
[Random Search for Hyper-Parameter Optimization][2] shows that random search can be surprisingly simple and effective. We suggest using Random Search as a baseline when you have no knowledge about the prior distribution of hyper-parameters.
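A random-search step is just independent sampling from each variable's distribution. Below is a minimal sketch over a hypothetical search space written in the `_type`/`_value` format of the [search space spec](./SearchSpaceSpec.md), covering only three of the supported types:

```python
import math
import random

# hypothetical search space in NNI's _type/_value format
search_space = {
    "learning_rate": {"_type": "loguniform", "_value": [1e-4, 1e-1]},
    "batch_size":    {"_type": "choice",     "_value": [16, 32, 64]},
    "dropout":       {"_type": "uniform",    "_value": [0.0, 0.5]},
}

def random_sample(space, rng=random):
    """Draw one configuration by sampling every variable independently."""
    params = {}
    for name, spec in space.items():
        kind, args = spec["_type"], spec["_value"]
        if kind == "choice":
            params[name] = rng.choice(args)
        elif kind == "uniform":
            params[name] = rng.uniform(args[0], args[1])
        elif kind == "loguniform":
            params[name] = math.exp(rng.uniform(math.log(args[0]),
                                                math.log(args[1])))
    return params

random.seed(1)
trial_params = random_sample(search_space)
```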
_Suggested scenario_: Random search is suggested when each trial does not take too long (e.g., each trial can be completed very soon, or early stopped by the assessor quickly) and you have enough computation resources, or when you want to uniformly explore the search space. Random Search can be considered a baseline search algorithm.
_Usage_:
```yaml
# config.yaml
tuner:
  builtinTunerName: Random
```
<a name="Anneal"></a>
**Anneal**
This simple annealing algorithm begins by sampling from the prior, but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
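The idea can be sketched as sampling around the best point observed so far with a radius that shrinks on a fixed schedule. This is a toy 1-D illustration with hypothetical constants, not NNI's implementation (which anneals toward the best observed points within the prior):

```python
import random

def anneal_sample(best, step, low=0.0, high=1.0,
                  t0=0.5, decay=0.9, rng=random):
    """Sample around the best point so far with a shrinking radius;
    early steps are close to sampling from the (uniform) prior."""
    radius = t0 * (decay ** step) * (high - low)
    x = rng.gauss(best, radius)
    return min(max(x, low), high)     # clamp back into the range

random.seed(0)
best, best_y = random.uniform(0, 1), float("-inf")
for step in range(30):
    x = anneal_sample(best, step)
    y = -(x - 0.3) ** 2               # toy objective, optimum at 0.3
    if y > best_y:
        best, best_y = x, y
```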
_Suggested scenario_: Anneal is suggested when each trial does not take too long and you have enough computation resources (much the same as Random Search), or when the variables in the search space can be sampled from some prior distribution.
_Usage_:
```yaml
# config.yaml
tuner:
  builtinTunerName: Anneal
  classArgs:
    # choice: maximize, minimize
    optimize_mode: maximize
```
<a name="Evolution"></a>
**Naive Evolution**
Naive Evolution comes from [Large-Scale Evolution of Image Classifiers][3]. It randomly initializes a population based on the search space. For each generation, it chooses the better ones and applies some mutation (e.g., changing a hyperparameter, adding/removing one layer) to them to get the next generation. Naive Evolution requires many trials to work, but it is very simple and easy to extend with new features.
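The select-and-mutate loop can be sketched as follows, with a hypothetical discrete search space and a toy fitness function standing in for real trial results (not NNI's implementation):

```python
import random

# hypothetical discrete search space and a toy "trial result"
space = {"lr": [0.1, 0.01, 0.001], "units": [32, 64, 128]}

def fitness(params):
    # pretend the best configuration is lr=0.01, units=64
    return -abs(params["lr"] - 0.01) - abs(params["units"] - 64) / 100

def mutate(params, rng=random):
    """Change one randomly chosen hyperparameter."""
    child = dict(params)
    key = rng.choice(list(space))
    child[key] = rng.choice(space[key])
    return child

random.seed(0)
population = [{k: random.choice(v) for k, v in space.items()}
              for _ in range(8)]
for generation in range(10):
    population.sort(key=fitness, reverse=True)
    parents = population[:4]          # keep the better half
    population = parents + [mutate(random.choice(parents)) for _ in range(4)]
best = max(population, key=fitness)
```

In a real experiment each fitness evaluation is a full trial, which is why the population-based approach is computation-hungry.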
_Suggested scenario_: Its requirement for computation resources is relatively high. Specifically, it requires a large initial population to avoid falling into a local optimum. If your trials are short or leverage an assessor, this tuner is a good choice. It is even more suitable when your trial code supports weight transfer, that is, when a trial can inherit the converged weights from its parent(s); this can greatly speed up training.
_Usage_:
```yaml
# config.yaml
tuner:
  builtinTunerName: Evolution
  classArgs:
    # choice: maximize, minimize
    optimize_mode: maximize
```
<a name="SMAC"></a>
**SMAC**
[SMAC][4] is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO in order to handle categorical parameters. The SMAC supported by NNI is a wrapper around [the SMAC3 GitHub repo][5].
Note that SMAC on nni only supports a subset of the types in [search space spec](./SearchSpaceSpec.md), including `choice`, `randint`, `uniform`, `loguniform`, `quniform(q=1)`.
_Installation_:
* Install swig first. (`sudo apt-get install swig` for Ubuntu users)
* Run `nnictl package install --name=SMAC`
_Suggested scenario_: Similar to TPE, SMAC is also a black-box tuner which can be tried in various scenarios, and is suggested when computation resource is limited. It is optimized for discrete hyperparameters, thus, suggested when most of your hyperparameters are discrete.
_Usage_:
```yaml
# config.yaml
tuner:
  builtinTunerName: SMAC
  classArgs:
    # choice: maximize, minimize
    optimize_mode: maximize
```
<a name="Batch"></a>
**Batch tuner**
Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type `choice` in [search space spec](./SearchSpaceSpec.md).
_Suggested scenario_: If the configurations you want to try have already been decided, you can list them in the search space file (using `choice`) and run them with the batch tuner.
_Usage_:
```yaml
# config.yaml
tuner:
  builtinTunerName: BatchTuner
```
Note that the search space that BatchTuner supports looks like:
```json
{
    "combine_params":
    {
        "_type" : "choice",
        "_value" : [{"optimizer": "Adam", "learning_rate": 0.00001},
                    {"optimizer": "Adam", "learning_rate": 0.0001},
                    {"optimizer": "Adam", "learning_rate": 0.001},
                    {"optimizer": "SGD", "learning_rate": 0.01},
                    {"optimizer": "SGD", "learning_rate": 0.005},
                    {"optimizer": "SGD", "learning_rate": 0.0002}]
    }
}
```
The search space file includes the high-level key `combine_params`. The type of the params must be `choice`, and the `_value` must include all the combined parameter values.
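Conceptually, the batch tuner simply walks that list, handing each listed configuration to one trial as its parameters. The sketch below illustrates the structure, using a shortened, hypothetical version of the search space above:

```python
import json

# a shortened, hypothetical batch search space
search_space = json.loads("""
{
    "combine_params": {
        "_type": "choice",
        "_value": [{"optimizer": "Adam", "learning_rate": 0.0001},
                   {"optimizer": "SGD",  "learning_rate": 0.01}]
    }
}
""")

# the batch tuner hands out each listed configuration exactly once
configs = search_space["combine_params"]["_value"]
for params in configs:
    print(params["optimizer"], params["learning_rate"])
```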
<a name="Grid"></a>
**Grid Search**
Grid Search performs an exhaustive searching through a manually specified subset of the hyperparameter space defined in the searchspace file.
Note that the only acceptable types of search space are `choice`, `quniform`, `qloguniform`. **The number `q` in `quniform` and `qloguniform` has special meaning (different from the spec in [search space spec](./SearchSpaceSpec.md)). It means the number of values that will be sampled evenly from the range `low` and `high`.**
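A sketch of that interpretation of `q`: for `quniform` with `q = 5` over `[0, 1]`, grid search evaluates 5 evenly spaced values. This is one plausible reading of the note above (endpoints included), written as an illustration rather than NNI's actual code:

```python
def grid_values(low, high, q):
    """Sample q values evenly from [low, high], endpoints included."""
    if q == 1:
        return [low]
    step = (high - low) / (q - 1)
    return [low + i * step for i in range(q)]

print(grid_values(0.0, 1.0, 5))  # -> [0.0, 0.25, 0.5, 0.75, 1.0]
```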
_Suggested scenario_: It is suggested when the search space is small; then it is feasible to exhaustively sweep the whole search space.
_Usage_:
```yaml
# config.yaml
tuner:
  builtinTunerName: GridSearch
```
<a name="Hyperband"></a>
**Hyperband**
[Hyperband][6] tries to use limited resources to explore as many configurations as possible and find the promising ones for the final result. The basic idea is to generate many configurations, run them for a small number of STEPs to find the promising ones, then further train those promising ones and select the best among them. More details can be found [here](../src/sdk/pynni/nni/hyperband_advisor/README.md).
_Suggested scenario_: It is suggested when you have limited computation resources but a relatively large search space. It performs well in scenarios where the intermediate result (e.g., accuracy) can reflect the quality of the final result to some extent.
_Usage_:
```yaml
# config.yaml
advisor:
  builtinAdvisorName: Hyperband
  classArgs:
    # choice: maximize, minimize
    optimize_mode: maximize
    # R: the maximum STEPS (could be the number of mini-batches or epochs) that can be
    # allocated to a trial. Each trial should use STEPS to control how long it runs.
    R: 60
    # eta: proportion of discarded trials
    eta: 3
```
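With `R: 60` and `eta: 3` as above, the bracket schedule can be derived from the formulas in the Hyperband paper. The sketch below is illustrative only: it computes how many configurations each bracket starts and with how many STEPs, not NNI's actual scheduler:

```python
import math

def hyperband_brackets(R, eta):
    """Enumerate Hyperband brackets: each starts n configs at r STEPs,
    then keeps the best 1/eta fraction at every rung (successive halving)."""
    s_max = int(math.log(R) / math.log(eta))
    brackets = []
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))  # configs to start
        r = R * eta ** (-s)                              # STEPs per config
        brackets.append((s, n, r))
    return brackets

for s, n, r in hyperband_brackets(R=60, eta=3):
    print(f"bracket s={s}: start {n} configs at {r:.1f} STEPs each")
```

With these values, the most aggressive bracket starts 27 configurations with only about 2 STEPs each, while the most conservative runs 4 configurations for the full 60 STEPs.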
<a name="NetworkMorphism"></a>
**Network Morphism**
[Network Morphism][7] provides functions to automatically search for the architecture of deep learning models. Every child network inherits the knowledge from its parent network and morphs into diverse types of networks, including changes of depth, width, and skip-connections. Next, it estimates the value of a child network using the historical architecture and metric pairs, then selects the most promising one to train. More details can be found [here](../src/sdk/pynni/nni/networkmorphism_tuner/README.md).
_Installation_:
NetworkMorphism requires [PyTorch](https://pytorch.org/get-started/locally), so users should install it first.
_Suggested scenario_: It is suggested when you want to apply deep learning methods to your task (your own dataset) but have no idea how to choose or design a network. You can modify the [example](../examples/trials/network_morphism/cifar10/cifar10_keras.py) to fit your own dataset and data augmentation method, and you can also change the batch size, learning rate, or optimizer. It is feasible for finding a good network architecture for different tasks. For now, this tuner only supports the computer vision (cv) domain.
_Usage_:
```yaml
# config.yaml
tuner:
  builtinTunerName: NetworkMorphism
  classArgs:
    # choice: maximize, minimize
    optimize_mode: maximize
    # for now, this tuner only supports the cv domain
    task: cv
    # input image width
    input_width: 32
    # input image channel
    input_channel: 3
    # number of classes
    n_output_node: 10
```
<a name="MetisTuner"></a>
**Metis Tuner**
[Metis][10] offers the following benefits when it comes to tuning parameters:
* While most tools only predict the optimal configuration, Metis gives you two outputs: (a) the current prediction of the optimal configuration, and (b) a suggestion for the next trial. No more guesswork!
* While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter.
* While most tools are exploitation-heavy, Metis' search strategy balances exploration, exploitation, and (optional) re-sampling.
Metis belongs to the class of sequential model-based optimization (SMBO) and is based on the Bayesian optimization framework. To model the parameter-vs-performance space, Metis uses both a Gaussian Process and a GMM. Since each trial can impose a high time cost, Metis heavily trades expensive trials for cheaper inference computations. At each iteration, Metis does two tasks:
* It finds the global optimal point in the Gaussian Process space. This point represents the optimal configuration.
* It identifies the next hyper-parameter candidate. This is achieved by inferring the potential information gain of exploration, exploitation, and re-sampling.
Note that the only acceptable types of search space are `choice`, `quniform`, `uniform` and `randint`. Only numerical `choice` is supported for now; more features will be supported later.
More details can be found in our paper: https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/
_Installation_:
Metis Tuner requires [sklearn](https://scikit-learn.org/), so users should install it first, e.g., with `pip3 install scikit-learn`.
_Suggested scenario_:
Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than approaches such as random search. Furthermore, Metis provides guidance on the subsequent trial. Here is an [example](../examples/trials/auto-gbdt/search_space_metis.json) of using Metis. Users only need to send the final result (such as `accuracy`) to the tuner by calling the NNI SDK.
_Usage_:
```yaml
# config.yaml
tuner:
  builtinTunerName: MetisTuner
  classArgs:
    # choice: maximize, minimize
    optimize_mode: maximize
```
<a name="assessor"></a>
# How to use Assessor that NNI supports?
For now, NNI supports the following assessor algorithms.
- [Medianstop](#Medianstop)
- [Curvefitting](#Curvefitting)
## Supported Assessor Algorithms
<a name="Medianstop"></a>
**Medianstop**
Medianstop is a simple early stopping rule mentioned in the [paper][8]. It stops a pending trial X at step S if the trial’s best objective value by step S is strictly worse than the median value of the running averages of all completed trials’ objectives reported up to step S.
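The rule translates almost directly into code. The sketch below assumes a maximized metric and is an illustration of the description above, not NNI's implementation:

```python
from statistics import median

def should_stop(trial_history, completed_histories, step):
    """Median stopping rule: stop if the trial's best objective up to
    `step` is strictly worse than the median of completed trials'
    running averages at the same step (maximization assumed)."""
    best_so_far = max(trial_history[:step])
    running_avgs = [sum(h[:step]) / step for h in completed_histories
                    if len(h) >= step]
    return bool(running_avgs) and best_so_far < median(running_avgs)

completed = [[0.5, 0.6, 0.7, 0.8],
             [0.4, 0.5, 0.6, 0.7],
             [0.6, 0.7, 0.8, 0.9]]
print(should_stop([0.2, 0.25, 0.3], completed, step=3))   # -> True (lagging)
print(should_stop([0.7, 0.8, 0.85], completed, step=3))   # -> False (healthy)
```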
_Suggested scenario_: It is applicable to a wide range of performance curves, and thus can be used in various scenarios to speed up the tuning progress.
_Usage_:
```yaml
assessor:
  builtinAssessorName: Medianstop
  classArgs:
    # choice: maximize, minimize
    optimize_mode: maximize
    # (optional) A trial is determined to be stopped or not
    # only after receiving start_step number of reported intermediate results.
    # The default value of start_step is 0.
    start_step: 5
```
<a name="Curvefitting"></a>
**Curvefitting**
Curve Fitting Assessor is an LPA (learning, predicting, assessing) algorithm. It stops a pending trial X at step S if the prediction of the final epoch's performance is worse than the best final performance in the trial history. In this algorithm, we use 12 curves to fit the accuracy curve; this large set of parametric curve models is chosen from the [reference paper][9]. The learning curves' shape coincides with our prior knowledge about the form of learning curves: they are typically increasing, saturating functions.
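As a toy illustration of the idea, the sketch below fits a single saturating model y = a·(1 − e^(−b·t)) to early accuracies by coarse grid search and extrapolates it to the final epoch. The real assessor combines 12 parametric models, so this is only a stand-in, and all constants here are hypothetical:

```python
import math

def fit_saturating(points):
    """Fit y = a * (1 - exp(-b * t)) to (t, y) points by a coarse grid
    search (a stand-in for the assessor's 12-model ensemble)."""
    best = None
    for a in [i / 100 for i in range(1, 101)]:
        for b in [j / 10 for j in range(1, 31)]:
            err = sum((y - a * (1 - math.exp(-b * t))) ** 2
                      for t, y in points)
            if best is None or err < best[0]:
                best = (err, a, b)
    _, a, b = best
    return a, b

history = [(1, 0.30), (2, 0.45), (3, 0.52), (4, 0.56)]  # early accuracies
a, b = fit_saturating(history)
predicted_final = a * (1 - math.exp(-b * 20))   # extrapolate to epoch 20
stop = predicted_final < 0.95 * 0.80            # best history 0.80, threshold 0.95
```

Here the curve saturates around 0.6, well below 0.95 × 0.80, so the trial would be stopped early.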
_Suggested scenario_: It is applicable in a wide range of performance curves, thus, can be used in various scenarios to speed up the tuning progress. Even better, it's able to handle and assess curves with similar performance.
_Usage_:
```yaml
assessor:
  builtinAssessorName: Curvefitting
  classArgs:
    # (required) The total number of epochs.
    # We need the number of epochs to determine which point to predict.
    epoch_num: 20
    # (optional) choice: maximize, minimize. The default value is maximize.
    # Kindly note that if you choose minimize mode, please adjust the threshold to >= 1.0 (e.g., threshold=1.1)
    optimize_mode: maximize
    # (optional) A trial is determined to be stopped or not
    # only after receiving start_step number of reported intermediate results.
    # To save computing resources, we start to predict only after start_step (default 6) accuracy points have been reported.
    start_step: 6
    # (optional) The threshold used to decide to early stop a worse-performing curve. The default value is 0.95.
    # For example: if threshold = 0.95, optimize_mode = maximize, and the best performance in history is 0.9,
    # then we will stop any trial whose predicted value is lower than 0.95 * 0.9 = 0.855.
    threshold: 0.95
```
[1]: https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf
[2]: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
[3]: https://arxiv.org/pdf/1703.01041.pdf
[4]: https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf
[5]: https://github.com/automl/SMAC3
[6]: https://arxiv.org/pdf/1603.06560.pdf
[7]: https://arxiv.org/abs/1806.10282
[8]: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf
[9]: http://aad.informatik.uni-freiburg.de/papers/15-IJCAI-Extrapolation_of_Learning_Curves.pdf
[10]:https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/