NNI (Neural Network Intelligence) is a toolkit to help users run automated machine learning (AutoML) experiments.
NNI (Neural Network Intelligence) is a toolkit to help users run automated machine learning (AutoML) experiments.
The tool dispatches and runs trial jobs generated by tuning algorithms to search the best neural architecture and/or hyper-parameters in different environments like local machine, remote servers and cloud.
The tool dispatches and runs trial jobs generated by tuning algorithms to search the best neural architecture and/or hyper-parameters in different environments like local machine, remote servers and cloud.
### **NNI [v0.7](https://github.com/Microsoft/nni/releases) has been released!**
### **NNI [v0.8](https://github.com/Microsoft/nni/releases) has been released!**
Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type choice in [search space spec](SearchSpaceSpec.md).
Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type choice in [search space spec](SearchSpaceSpec.md).
Suggested sceanrio: If the configurations you want to try have been decided, you can list them in searchspace file (using choice) and run them using batch tuner.
Suggested scenario: If the configurations you want to try have been decided, you can list them in SearchSpace file (using choice) and run them using batch tuner.
Note that the only acceptable types of search space are `choice`, `quniform`, `uniform` and `randint`.
Note that the only acceptable types of search space are `choice`, `quniform`, `uniform` and `randint`.
**Installation**
Metis Tuner requires [sklearn](https://scikit-learn.org/), so users should install it first. User could use `pip3 install sklearn` to install it.
**Suggested scenario**
**Suggested scenario**
Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than other approaches such as random search. Furthermore, Metis provides guidance on the subsequent trial. Here is an [example](https://github.com/Microsoft/nni/tree/master/examples/trials/auto-gbdt/search_space_metis.json) about the use of Metis. User only need to send the final result like `accuracy` to tuner, by calling the nni SDK. [Detailed Description](./MetisTuner.md)
Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than other approaches such as random search. Furthermore, Metis provides guidance on the subsequent trial. Here is an [example](https://github.com/Microsoft/nni/tree/master/examples/trials/auto-gbdt/search_space_metis.json) about the use of Metis. User only need to send the final result like `accuracy` to tuner, by calling the nni SDK. [Detailed Description](./MetisTuner.md)
In this tutorial, we first introduce a github repo [Recommenders](https://github.com/Microsoft/Recommenders). It is a repository that provides examples and best practices for building recommendation systems, provided as Jupyter notebooks. It has various models that are popular and widely deployed in recommendation systems. To provide a complete end-to-end experience, they present each example in five key tasks, as shown below:
In this tutorial, we first introduce a github repo [Recommenders](https://github.com/Microsoft/Recommenders). It is a repository that provides examples and best practices for building recommendation systems, provided as Jupyter notebooks. It has various models that are popular and widely deployed in recommendation systems. To provide a complete end-to-end experience, they present each example in five key tasks, as shown below:
-[Prepare Data](https://github.com/Microsoft/Recommenders/blob/master/notebooks/01_prepare_data/README.md): Preparing and loading data for each recommender algorithm
-[Prepare Data](https://github.com/Microsoft/Recommenders/blob/master/notebooks/01_prepare_data/README.md): Preparing and loading data for each recommender algorithm.
-[Model](https://github.com/Microsoft/Recommenders/blob/master/notebooks/02_model/README.md): Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares ([ALS](https://spark.apache.org/docs/latest/api/python/_modules/pyspark/ml/recommendation.html#ALS)) or eXtreme Deep Factorization Machines ([xDeepFM](https://arxiv.org/abs/1803.05170)).
-[Model](https://github.com/Microsoft/Recommenders/blob/master/notebooks/02_model/README.md): Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares ([ALS](https://spark.apache.org/docs/latest/api/python/_modules/pyspark/ml/recommendation.html#ALS)) or eXtreme Deep Factorization Machines ([xDeepFM](https://arxiv.org/abs/1803.05170)).
-[Evaluate](https://github.com/Microsoft/Recommenders/blob/master/notebooks/03_evaluate/README.md): Evaluating algorithms with offline metrics
-[Evaluate](https://github.com/Microsoft/Recommenders/blob/master/notebooks/03_evaluate/README.md): Evaluating algorithms with offline metrics.
-[Model Select and Optimize](https://github.com/Microsoft/Recommenders/blob/master/notebooks/04_model_select_and_optimize/README.md): Tuning and optimizing hyperparameters for recommender models
-[Model Select and Optimize](https://github.com/Microsoft/Recommenders/blob/master/notebooks/04_model_select_and_optimize/README.md): Tuning and optimizing hyperparameters for recommender models.
-[Operationalize](https://github.com/Microsoft/Recommenders/blob/master/notebooks/05_operationalize/README.md): Operationalizing models in a production environment on Azure
-[Operationalize](https://github.com/Microsoft/Recommenders/blob/master/notebooks/05_operationalize/README.md): Operationalizing models in a production environment on Azure.
The fourth task is tuning and optimizing the model's hyperparametrs, this is where NNI could help. To give a concrete example that NNI tunes the models in Recommenders, let's demonstrate with the model [SVD](https://github.com/Microsoft/Recommenders/blob/master/notebooks/02_model/surprise_svd_deep_dive.ipynb), and data Movielens100k. There are more than 10 hyperparameters to be tuned in this model.
The fourth task is tuning and optimizing the model's hyperparameters, this is where NNI could help. To give a concrete example that NNI tunes the models in Recommenders, let's demonstrate with the model [SVD](https://github.com/Microsoft/Recommenders/blob/master/notebooks/02_model/surprise_svd_deep_dive.ipynb), and data Movielens100k. There are more than 10 hyperparameters to be tuned in this model.
[This Jupyter notebook](https://github.com/Microsoft/Recommenders/blob/master/notebooks/04_model_select_and_optimize/nni_surprise_svd.ipynb) provided by Recommenders is a very detailed step-by-step tutorial for this example. It uses different built-in tuning algorithms in NNI, including `Annealing`, `SMAC`, `Random Search`, `TPE`, `Hyperband`, `Metis` and `Evolution`. Finally, the results of different tuning algorithms are compared. Please go through this notebook to learn how to use NNI to tune SVD model, then you could further use NNI to tune other models in Recommenders.
[This Jupyter notebook](https://github.com/Microsoft/Recommenders/blob/master/notebooks/04_model_select_and_optimize/nni_surprise_svd.ipynb) provided by Recommenders is a very detailed step-by-step tutorial for this example. It uses different built-in tuning algorithms in NNI, including `Annealing`, `SMAC`, `Random Search`, `TPE`, `Hyperband`, `Metis` and `Evolution`. Finally, the results of different tuning algorithms are compared. Please go through this notebook to learn how to use NNI to tune SVD model, then you could further use NNI to tune other models in Recommenders.
@@ -24,7 +24,7 @@ Your machine don't have eth0 device, please set [nniManagerIp](ExperimentConfig.
...
@@ -24,7 +24,7 @@ Your machine don't have eth0 device, please set [nniManagerIp](ExperimentConfig.
When the duration of experiment reaches the maximum duration, nniManager will not create new trials, but the existing trials will continue unless user manually stop the experiment.
When the duration of experiment reaches the maximum duration, nniManager will not create new trials, but the existing trials will continue unless user manually stop the experiment.
### Could not stop an experiment using `nnictl stop`
### Could not stop an experiment using `nnictl stop`
If you upgrade your NNI or you delete some config files of NNI when there is an experiment running, this kind of issue may happen because the loss of config file. You could use `ps -ef | grep node` to find the pid of your experiment, and use `kill -9 {pid}` to kill it manually.
If you upgrade your NNI or you delete some config files of NNI when there is an experiment running, this kind of issue may happen because the loss of config file. You could use `ps -ef | grep node` to find the PID of your experiment, and use `kill -9 {pid}` to kill it manually.
### Could not get `default metric` in webUI of virtual machines
### Could not get `default metric` in webUI of virtual machines
Config the network mode to bridge mode or other mode that could make virtual machine's host accessible from external machine, and make sure the port of virtual machine is not forbidden by firewall.
Config the network mode to bridge mode or other mode that could make virtual machine's host accessible from external machine, and make sure the port of virtual machine is not forbidden by firewall.
...
@@ -34,7 +34,7 @@ Unable to open the WebUI may have the following reasons:
...
@@ -34,7 +34,7 @@ Unable to open the WebUI may have the following reasons:
* http://127.0.0.1, http://172.17.0.1 and http://10.0.0.15 are referred to localhost, if you start your experiment on the server or remote machine. You can replace the IP to your server IP to view the WebUI, like http://[your_server_ip]:8080
* http://127.0.0.1, http://172.17.0.1 and http://10.0.0.15 are referred to localhost, if you start your experiment on the server or remote machine. You can replace the IP to your server IP to view the WebUI, like http://[your_server_ip]:8080
* If you still can't see the WebUI after you use the server IP, you can check the proxy and the firewall of your machine. Or use the browser on the machine where you start your NNI experiment.
* If you still can't see the WebUI after you use the server IP, you can check the proxy and the firewall of your machine. Or use the browser on the machine where you start your NNI experiment.
* Another reason may be your experiment is failed and NNI may fail to get the experiment infomation. You can check the log of NNImanager in the following directory: ~/nni/experiment/[your_experiment_id] /log/nnimanager.log
* Another reason may be your experiment is failed and NNI may fail to get the experiment information. You can check the log of NNIManager in the following directory: ~/nni/experiment/[your_experiment_id] /log/nnimanager.log
# General Programming Interface for Neural Architecture Search
# General Programming Interface for Neural Architecture Search (experimental feature)
_*This is an experimental feature, currently, we only implemented the general NAS programming interface. Weight sharing and one-shot NAS based on this programming interface will be supported in the following releases._
Automatic neural architecture search is taking an increasingly important role on finding better models. Recent research works have proved the feasibility of automatic NAS, and also found some models that could beat manually designed and tuned models. Some of representative works are [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5]. There are new innovations keeping emerging. However, it takes great efforts to implement those algorithms, and it is hard to reuse code base of one algorithm for implementing another.
Automatic neural architecture search is taking an increasingly important role on finding better models. Recent research works have proved the feasibility of automatic NAS, and also found some models that could beat manually designed and tuned models. Some of representative works are [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5]. There are new innovations keeping emerging. However, it takes great efforts to implement those algorithms, and it is hard to reuse code base of one algorithm for implementing another.
To facilitate NAS innovations (e.g., design/implement new NAS models, compare different NAS models side-by-side), an easy-to-use and flexibile programming interface is crucial.
To facilitate NAS innovations (e.g., design/implement new NAS models, compare different NAS models side-by-side), an easy-to-use and flexible programming interface is crucial.
## Programming interface
## Programming interface
...
@@ -10,19 +12,21 @@ To facilitate NAS innovations (e.g., design/implement new NAS models, compare di
...
@@ -10,19 +12,21 @@ To facilitate NAS innovations (e.g., design/implement new NAS models, compare di
We designed a simple and flexible programming interface based on [NNI annotation](./AnnotationSpec.md). It is elaborated through examples below.
We designed a simple and flexible programming interface based on [NNI annotation](./AnnotationSpec.md). It is elaborated through examples below.
### Example: choose an operator for a layer
### Example: choose an operator for a layer
When designing the following model there might be several choices in the fourth layer that may make this model perform good. In the script of this model, we can use annotation for the fourth layer as shown in the figure. In this annotation, there are five fields in total:
When designing the following model there might be several choices in the fourth layer that may make this model perform good. In the script of this model, we can use annotation for the fourth layer as shown in the figure. In this annotation, there are five fields in total:


* __layer_choice__: It is a list of function calls, each function should have defined in user's script or imported libraries. The input arguments of the function should follow the format: `def XXX(inputs, arg2, arg3, ...)`, where `inputs` is a list with two elements. One is the list of `fixed_inputs`, and the other is a list of the chosen inputs from `optional_inputs`. `conv` and `pool` in the figure are examples of function definition. For the function calls in this list, no need to write the first argument (i.e., `input`). Note that only one of the function calls are chosen for this layer.
* __layer_choice__: It is a list of function calls, each function should have defined in user's script or imported libraries. The input arguments of the function should follow the format: `def XXX(inputs, arg2, arg3, ...)`, where inputs is a list with two elements. One is the list of `fixed_inputs`, and the other is a list of the chosen inputs from `optional_inputs`. `conv` and `pool` in the figure are examples of function definition. For the function calls in this list, no need to write the first argument (i.e., input). Note that only one of the function calls are chosen for this layer.
* __fixed_inputs__: It is a list of variables, the variable could be an output tensor from a previous layer. The variable could be `layer_output` of another nni.mutable_layer before this layer, or other python variables before this layer. All the variables in this list will be fed into the chosen function in `layer_choice` (as the first element of the `input` list).
* __fixed_inputs__: It is a list of variables, the variable could be an output tensor from a previous layer. The variable could be `layer_output` of another `nni.mutable_layer` before this layer, or other python variables before this layer. All the variables in this list will be fed into the chosen function in `layer_choice` (as the first element of the input list).
* __optional_inputs__: It is a list of variables, the variable could be an output tensor from a previous layer. The variable could be `layer_output` of another nni.mutable_layer before this layer, or other python variables before this layer. Only `input_num` variables will be fed into the chosen function in `layer_choice` (as the second element of the `input` list).
* __optional_inputs__: It is a list of variables, the variable could be an output tensor from a previous layer. The variable could be `layer_output` of another `nni.mutable_layer` before this layer, or other python variables before this layer. Only `optional_input_size` variables will be fed into the chosen function in `layer_choice` (as the second element of the input list).
* __optional_input_size__: It indicates how many inputs are chosen from `input_candidates`. It could be a number or a range. A range [1,3] means it chooses 1, 2, or 3 inputs.
* __optional_input_size__: It indicates how many inputs are chosen from `input_candidates`. It could be a number or a range. A range [1,3] means it chooses 1, 2, or 3 inputs.
* __layer_output__: The name of the output(s) of this layer, in this case it represents the return of the function call in `layer_choice`. This will be a variable name that can be used in the following python code or nni.mutable_layer(s).
* __layer_output__: The name of the output(s) of this layer, in this case it represents the return of the function call in `layer_choice`. This will be a variable name that can be used in the following python code or `nni.mutable_layer`.
There are two ways to write annotation for this example. For the upper one, input of the function calls is `[[],[out3]]`. For the bottom one, input is `[[out3],[]]`.
There are two ways to write annotation for this example. For the upper one, `input` of the function calls is `[[],[out3]]`. For the bottom one, `input` is `[[out3],[]]`.
__Debugging__: We provided an `nnictl trial codegen` command to help debugging your code of NAS programming on NNI. If your trial with trial_id `XXX` in your experiment `YYY` is failed, you could run `nnictl trial codegen YYY --trial_id XXX` to generate an executable code for this trial under your current directory. With this code, you can directly run the trial command without NNI to check why this trial is failed. Basically, this command is to compile your trial code and replace the NNI NAS code with the real chosen layers and inputs.
### Example: choose input connections for a layer
### Example: choose input connections for a layer
...
@@ -32,7 +36,7 @@ Designing connections of layers is critical for making a high performance model.
...
@@ -32,7 +36,7 @@ Designing connections of layers is critical for making a high performance model.
### Example: choose both operators and connections
### Example: choose both operators and connections
In this example, we choose one from the three operators and choose two connections for it. As there are multiple variables in `inputs`, we call `concat` at the beginning of the functions.
In this example, we choose one from the three operators and choose two connections for it. As there are multiple variables in inputs, we call `concat` at the beginning of the functions.


...
@@ -42,10 +46,9 @@ To illustrate the convenience of the programming interface, we use the interface
...
@@ -42,10 +46,9 @@ To illustrate the convenience of the programming interface, we use the interface


## Unified NAS search space specification
## Unified NAS search space specification
After finishing the trial code through the annotation above, users have implicitly specified the search space of neural architectures in the code. Based on the code, NNI will automatcailly generate a search space file which could be fed into tuning algorithms. This search space file follows the following `json` format.
After finishing the trial code through the annotation above, users have implicitly specified the search space of neural architectures in the code. Based on the code, NNI will automatically generate a search space file which could be fed into tuning algorithms. This search space file follows the following JSON format.
```json
```json
{
{
...
@@ -78,7 +81,7 @@ Accordingly, a specified neural architecture (generated by tuning algorithm) is
...
@@ -78,7 +81,7 @@ Accordingly, a specified neural architecture (generated by tuning algorithm) is
}
}
```
```
With the specification of the format of search space and architecture (choice) expression, users are free to implement various (general) tuning algorithms for neural architecture search on NNI. One future work is to provide a general NAS algorihtm.
With the specification of the format of search space and architecture (choice) expression, users are free to implement various (general) tuning algorithms for neural architecture search on NNI. One future work is to provide a general NAS algorithm.
@@ -90,11 +93,11 @@ NNI's annotation compiler transforms the annotated trial code to the code that c
...
@@ -90,11 +93,11 @@ NNI's annotation compiler transforms the annotated trial code to the code that c


The above figure shows how the trial code runs on NNI. `nnictl` processes user trial code to generate a search space file and compiled trial code. The former is fed to tuner, and the latter is used to run trilas.
The above figure shows how the trial code runs on NNI. `nnictl` processes user trial code to generate a search space file and compiled trial code. The former is fed to tuner, and the latter is used to run trials.
[__TODO__] Simple example of NAS on NNI.
[Simple example of NAS on NNI](https://github.com/microsoft/nni/tree/v0.8/examples/trials/mnist-nas).
### Weight sharing
### [__TODO__] Weight sharing
Sharing weights among chosen architectures (i.e., trials) could speedup model search. For example, properly inheriting weights of completed trials could speedup the converge of new trials. One-Shot NAS (e.g., ENAS, Darts) is more aggressive, the training of different architectures (i.e., subgraphs) shares the same copy of the weights in full graph.
Sharing weights among chosen architectures (i.e., trials) could speedup model search. For example, properly inheriting weights of completed trials could speedup the converge of new trials. One-Shot NAS (e.g., ENAS, Darts) is more aggressive, the training of different architectures (i.e., subgraphs) shares the same copy of the weights in full graph.
...
@@ -102,9 +105,9 @@ Sharing weights among chosen architectures (i.e., trials) could speedup model se
...
@@ -102,9 +105,9 @@ Sharing weights among chosen architectures (i.e., trials) could speedup model se
We believe weight sharing (transferring) plays a key role on speeding up NAS, while finding efficient ways of sharing weights is still a hot research topic. We provide a key-value store for users to store and load weights. Tuners and Trials use a provided KV client lib to access the storage.
We believe weight sharing (transferring) plays a key role on speeding up NAS, while finding efficient ways of sharing weights is still a hot research topic. We provide a key-value store for users to store and load weights. Tuners and Trials use a provided KV client lib to access the storage.
[__TODO__] Example of weight sharing on NNI.
Example of weight sharing on NNI.
### Support of One-Shot NAS
### [__TODO__] Support of One-Shot NAS
One-Shot NAS is a popular approach to find good neural architecture within a limited time and resource budget. Basically, it builds a full graph based on the search space, and uses gradient descent to at last find the best subgraph. There are different training approaches, such as [training subgraphs (per mini-batch)][1], [training full graph through dropout][6], [training with architecture weights (regularization)][3]. Here we focus on the first approach, i.e., training subgraphs (ENAS).
One-Shot NAS is a popular approach to find good neural architecture within a limited time and resource budget. Basically, it builds a full graph based on the search space, and uses gradient descent to at last find the best subgraph. There are different training approaches, such as [training subgraphs (per mini-batch)][1], [training full graph through dropout][6], [training with architecture weights (regularization)][3]. Here we focus on the first approach, i.e., training subgraphs (ENAS).
...
@@ -112,20 +115,20 @@ With the same annotated trial code, users could choose One-Shot NAS as execution
...
@@ -112,20 +115,20 @@ With the same annotated trial code, users could choose One-Shot NAS as execution


The design of One-Shot NAS on NNI is shown in the above figure. One-Shot NAS usually only has one trial job with full graph. NNI supports running multiple such trial jobs each of which runs independently. As One-Shot NAS is not stable, running multiple instances helps find better model. Moreover, trial jobs are also able to synchronize weights during running (i.e., there is only one copy of weights, like asynchroneous parameter-server mode). This may speedup converge.
The design of One-Shot NAS on NNI is shown in the above figure. One-Shot NAS usually only has one trial job with full graph. NNI supports running multiple such trial jobs each of which runs independently. As One-Shot NAS is not stable, running multiple instances helps find better model. Moreover, trial jobs are also able to synchronize weights during running (i.e., there is only one copy of weights, like asynchronous parameter-server mode). This may speedup converge.
[__TODO__] Example of One-Shot NAS on NNI.
Example of One-Shot NAS on NNI.
## General tuning algorithms for NAS
## [__TODO__] General tuning algorithms for NAS
Like hyperparameter tuning, a relatively general algorithm for NAS is required. The general programming interface makes this task easier to some extent. We have a RL-based tuner algorithm for NAS from our contributors. We expect efforts from community to design and implement better NAS algorithms.
Like hyperparameter tuning, a relatively general algorithm for NAS is required. The general programming interface makes this task easier to some extent. We have a RL-based tuner algorithm for NAS from our contributors. We expect efforts from community to design and implement better NAS algorithms.
[__TODO__] More tuning algorithms for NAS.
More tuning algorithms for NAS.
## Export best neural architecture and code
## [__TODO__] Export best neural architecture and code
[__TODO__] After the NNI experiment is done, users could run `nnictl experiment export --code` to export the trial code with the best neural architecture.
After the NNI experiment is done, users could run `nnictl experiment export --code` to export the trial code with the best neural architecture.
## Conclusion and Future work
## Conclusion and Future work
...
@@ -133,7 +136,6 @@ There could be different NAS algorithms and execution modes, but they could be s
...
@@ -133,7 +136,6 @@ There could be different NAS algorithms and execution modes, but they could be s
There are many interesting research topics in this area, both system and machine learning.
There are many interesting research topics in this area, both system and machine learning.
@@ -38,7 +38,7 @@ All types of sampling strategies and their parameter are listed here:
...
@@ -38,7 +38,7 @@ All types of sampling strategies and their parameter are listed here:
* {"_type":"randint","_value":[lower, upper]}
* {"_type":"randint","_value":[lower, upper]}
* For now, we implment the "randint" distribution with "quniform", which means the variable value is a value like round(uniform(lower, upper)). The type of chosen value is float. If you want to use integer value, please convert it explicitly.
* For now, we implement the "randint" distribution with "quniform", which means the variable value is a value like round(uniform(lower, upper)). The type of chosen value is float. If you want to use integer value, please convert it explicitly.
* {"_type":"uniform","_value":[low, high]}
* {"_type":"uniform","_value":[low, high]}
* Which means the variable value is a value uniformly between low and high.
* Which means the variable value is a value uniformly between low and high.
权重分配(转移)在加速 NAS 中有关键作用,而找到有效的权重共享方式仍是热门的研究课题。 NNI 提供了一个键值存储,用于存储和加载权重。 Tuner 和 Trial 使用 KV 客户端库来访问存储。
[**待实现**] NNI 上的权重共享示例。
### 支持 One-Shot NAS
One-Shot NAS 是流行的,能在有限的时间和资源预算内找到较好的神经网络结构的方法。 本质上,它会基于搜索空间来构建完整的图,并使用梯度下降最终找到最佳子图。 它有不同的训练方法,如:[training subgraphs (per mini-batch)](https://arxiv.org/abs/1802.03268) ,[training full graph through dropout](http://proceedings.mlr.press/v80/bender18a/bender18a.pdf),以及 [training with architecture weights (regularization)](https://arxiv.org/abs/1806.09055) 。 这里会关注第一种方法,即训练子图(ENAS)。