Unverified commit cb361b34, authored by chicm-ms, committed by GitHub

Merge pull request #27 from microsoft/master

pull code

parents f36758da 87b0f640
@@ -21,6 +21,7 @@ The tool dispatches and runs trial jobs generated by tuning algorithms to search
<p align="center">
<a href="#nni-has-been-released"><img src="docs/img/overview.svg" /></a>
</p>
<div>
<table>
<tbody>
<tr align="center" valign="bottom">
@@ -37,7 +38,7 @@ The tool dispatches and runs trial jobs generated by tuning algorithms to search
<img src="docs/img/bar.png"/>
</td>
</tr>
<tr valign="top">
<td>
<ul>
@@ -51,41 +52,91 @@ The tool dispatches and runs trial jobs generated by tuning algorithms to search
<li>Theano</li>
</ul>
</td>
<td align="left">
<a href="docs/en_US/Tuner/BuiltinTuner.md">Tuner</a>
<br />
<ul>
<b style="margin-left:-20px"><font size=4 color=#800000>General Tuner</font></b>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#Random"><font size=2.9>Random Search</font></a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#Evolution"><font size=2.9>Naïve Evolution</font></a></li>
<b><font size=4 color=#800000 style="margin-left:-20px">Tuner for HPO</font></b>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#TPE"><font size=2.9>TPE</font></a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#Anneal"><font size=2.9>Anneal</font></a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#SMAC"><font size=2.9>SMAC</font></a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#Batch"><font size=2.9>Batch</font></a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#GridSearch"><font size=2.9>Grid Search</font></a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#Hyperband"><font size=2.9>Hyperband</font></a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#MetisTuner"><font size=2.9>Metis Tuner</font></a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#BOHB"><font size=2.9>BOHB</font></a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#GPTuner"><font size=2.9>GP Tuner</font></a></li>
<b style="margin-left:-20px"><font size=4 color=#800000 style="margin-left:-20px">Tuner for NAS</font></b>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#NetworkMorphism"><font size=2.9>Network Morphism</font></a></li>
<li><a href="examples/tuners/enas_nni/README.md"><font size=2.9>ENAS</font></a></li>
</ul>
<a href="docs/en_US/Assessor/BuiltinAssessor.md">Assessor</a>
<ul>
<li><a href="docs/en_US/Assessor/BuiltinAssessor.md#Medianstop"><font size=2.9>Median Stop</font></a></li>
<li><a href="docs/en_US/Assessor/BuiltinAssessor.md#Curvefitting"><font size=2.9>Curve Fitting</font></a></li>
</ul>
</td>
<td>
<ul>
<li><a href="docs/en_US/TrainingService/LocalMode.md">Local Machine</a></li>
<li><a href="docs/en_US/TrainingService/RemoteMachineMode.md">Remote Servers</a></li>
<li><b>Kubernetes based services</b></li>
<ul><li><a href="docs/en_US/TrainingService/PaiMode.md">OpenPAI</a></li>
<li><a href="docs/en_US/TrainingService/KubeflowMode.md">Kubeflow</a></li>
<li><a href="docs/en_US/TrainingService/FrameworkControllerMode.md">FrameworkController on K8S (AKS etc.)</a></li>
</ul>
</ul>
</td>
</tr>
<tr align="center" valign="bottom">
<td style="border-top:#FF0000 solid 0px;">
<b>References</b>
<img src="docs/img/bar.png"/>
</td>
<td style="border-top:#FF0000 solid 0px;">
<b>References</b>
<img src="docs/img/bar.png"/>
</td>
<td style="border-top:#FF0000 solid 0px;">
<b>References</b>
<img src="docs/img/bar.png"/>
</td>
</tr>
<tr valign="top">
<td style="border-top:#FF0000 solid 0px;">
<ul>
<li><a href="docs/en_US/sdk_reference.rst">Python API</a></li>
<li><a href="docs/en_US/Tutorial/AnnotationSpec.md">NNI Annotation</a></li>
<li><a href="docs/en_US/TrialExample/Trials.md#nni-python-annotation">Annotation tutorial</a></li>
</ul>
</td>
<td style="border-top:#FF0000 solid 0px;">
<ul>
<li><a href="docs/en_US/tuners.rst">Try different tuners</a></li>
<li><a href="docs/en_US/assessors.rst">Try different assessors</a></li>
<li><a href="docs/en_US/Tuner/CustomizeTuner.md">Implement a customized tuner</a></li>
<li><a href="docs/en_US/Tuner/CustomizeAdvisor.md">Implement a customized advisor</a></li>
<li><a href="docs/en_US/Assessor/CustomizeAssessor.md">Implement a customized assessor </a></li>
<li><a href="docs/en_US/CommunitySharings/HpoComparision.md">HPO Comparison</a></li>
<li><a href="docs/en_US/CommunitySharings/NasComparision.md">NAS Comparison</a></li>
<li><a href="docs/en_US/CommunitySharings/RecommendersSvd.md">Automatically tuning SVD on NNI</a></li>
</ul>
</td>
<td style="border-top:#FF0000 solid 0px;">
<ul>
<li><a href="docs/en_US/TrainingService/HowToImplementTrainingService.md">Implement TrainingService in NNI</a></li>
<li><a href="docs/en_US/TrainingService/LocalMode.md">Run an experiment on local</a></li>
<li><a href="docs/en_US/TrainingService/KubeflowMode.md">Run an experiment on Kubeflow</a></li>
<li><a href="docs/en_US/TrainingService/PaiMode.md">Run an experiment on OpenPAI?</a></li>
<li><a href="docs/en_US/TrainingService/RemoteMachineMode.md">Run an experiment on multiple machines?</a></li>
</ul>
</td>
</tr>
</tbody>
</table>
</div>
## **Who should consider using NNI**
@@ -127,7 +178,7 @@ Note:

* `--user` can be added if you want to install NNI in your home directory; this does not require any special privileges.
* Currently, NNI on Windows supports local, remote, and OpenPAI modes. Anaconda or Miniconda is highly recommended for installing NNI on Windows.
* If you encounter an error such as `Segmentation fault`, please refer to the [FAQ](docs/en_US/Tutorial/FAQ.md)
**Install through source code**

@@ -153,9 +204,9 @@ Windows

```
powershell -ExecutionPolicy Bypass -file install.ps1
```

For the system requirements of NNI, please refer to [Install NNI](docs/en_US/Tutorial/Installation.md)

For NNI on Windows, please refer to [NNI on Windows](docs/en_US/Tutorial/NniOnWindows.md)

**Verify install**
@@ -211,7 +262,7 @@ You can use these commands to get more information about the experiment

```
-----------------------------------------------------------------------
```

* Open the `Web UI url` in your browser; you can view detailed information about the experiment and all the submitted trial jobs as shown below. [Here](docs/en_US/Tutorial/WebUI.md) are more Web UI pages.
<table style="border: none">
<th><img src="./docs/img/webui_overview_page.png" alt="drawing" width="395"/></th>

@@ -219,44 +270,63 @@ You can use these commands to get more information about the experiment

</table>

## **Documentation**

Our primary documentation is [here](https://nni.readthedocs.io/en/latest/Overview.html) and is generated from this repository.<br/>

You may want to read:
* [NNI overview](docs/en_US/Overview.md)
* [Quick start](docs/en_US/Tutorial/QuickStart.md)
* [Contributing](docs/en_US/Tutorial/Contributing.md)
* [Examples](docs/en_US/examples.rst)
* [References](docs/en_US/reference.rst)
* [WebUI tutorial](docs/en_US/Tutorial/WebUI.md)
## **How to**

* [Install NNI](docs/en_US/Tutorial/Installation.md)
* [Use command line tool nnictl](docs/en_US/Tutorial/Nnictl.md)
* [Use NNIBoard](docs/en_US/Tutorial/WebUI.md)
* [How to define search space](docs/en_US/Tutorial/SearchSpaceSpec.md)
* [How to define a trial](docs/en_US/TrialExample/Trials.md)
* [How to choose tuner/search-algorithm](docs/en_US/Tuner/BuiltinTuner.md)
* [Config an experiment](docs/en_US/Tutorial/ExperimentConfig.md)
* [How to use annotation](docs/en_US/TrialExample/Trials.md#nni-python-annotation)

## **Tutorials**
* [Run an experiment on OpenPAI?](docs/en_US/TrainingService/PaiMode.md)
* [Run an experiment on Kubeflow?](docs/en_US/TrainingService/KubeflowMode.md)
* [Run an experiment on local (with multiple GPUs)?](docs/en_US/TrainingService/LocalMode.md)
* [Run an experiment on multiple machines?](docs/en_US/TrainingService/RemoteMachineMode.md)
* [Try different tuners](docs/en_US/tuners.rst)
* [Try different assessors](docs/en_US/assessors.rst)
* [Implement a customized tuner](docs/en_US/Tuner/CustomizeTuner.md)
* [Implement a customized assessor](docs/en_US/Assessor/CustomizeAssessor.md)
* [Use Genetic Algorithm to find good model architectures for Reading Comprehension task](examples/trials/ga_squad/README.md)
## **Contribute**
This project welcomes contributions and there are many ways in which you can participate in the project, for example:
* Review [source code changes](https://github.com/microsoft/nni/pulls)
* Review the [documentation](https://github.com/microsoft/nni/tree/master/docs) and make pull requests for anything from typos to new content
* Find the issues tagged with ['good first issue'](https://github.com/Microsoft/nni/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) or ['help wanted'](https://github.com/microsoft/nni/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22); these are simple and easy to start with, and we recommend them for new contributors
Before contributing, there are a few simple guidelines you need to follow:
* [How to debug](docs/en_US/Tutorial/HowToDebug.md)
* [Code Styles & Naming Conventions](docs/en_US/Tutorial/Contributing.md)
* How to Set up [NNI developer environment](docs/en_US/Tutorial/SetupNniDeveloperEnvironment.md)
* Review the [Contributing Instruction](docs/en_US/Tutorial/Contributing.md) and get familiar with the NNI Code Contribution Guideline
## **External Repositories**
We now have some external usage examples running on NNI, contributed by the community. Thanks to our lovely contributors, and we welcome more people to join us!
* Run [ENAS](examples/tuners/enas_nni/README.md) in NNI
* Run [Neural Network Architecture Search](examples/trials/nas_cifar10/README.md) in NNI
## **Feedback**
* Open a [bug report](https://github.com/microsoft/nni/issues/new/choose).
* Request a [new feature](https://github.com/microsoft/nni/issues/new/choose).
* Discuss on the NNI [Gitter](https://gitter.im/Microsoft/nni?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge).
* Ask a question with the NNI tag on [Stack Overflow](https://stackoverflow.com/questions/tagged/nni?sort=Newest&edited=true) or [file an issue](https://github.com/microsoft/nni/issues/new/choose) on GitHub.
* The instructions for [How to Debug](docs/en_US/Tutorial/HowToDebug.md) are still under construction; you are also welcome to contribute questions or suggestions in this area.
## **License**

@@ -10,13 +10,13 @@ To facilitate NAS innovations (e.g., design/implement new NAS models, compare di
A new programming interface for designing and searching for a model is often demanded in two scenarios. 1) When designing a neural network, the designer may have multiple choices for a layer, sub-model, or connection, and may not be sure which one, or which combination, performs best. It would be appealing to have an easy way to express the candidate layers/sub-models they want to try. 2) Researchers working on automatic NAS want a unified way to express the search space of neural architectures, and to make unchanged trial code adaptable to different search algorithms.

We designed a simple and flexible programming interface based on [NNI annotation](../Tutorial/AnnotationSpec.md). It is elaborated through examples below.

### Example: choose an operator for a layer

When designing the following model, there might be several choices for the fourth layer that could make this model perform well. In the script of this model, we can use annotation for the fourth layer as shown in the figure. In this annotation, there are five fields in total:

![](../../img/example_layerchoice.png)

* __layer_choice__: a list of function calls; each function should be defined in the user's script or an imported library. The input arguments of the function should follow the format `def XXX(inputs, arg2, arg3, ...)`, where `inputs` is a list with two elements: the list of `fixed_inputs` and a list of the chosen inputs from `optional_inputs`. `conv` and `pool` in the figure are examples of function definitions. For the function calls in this list, there is no need to write the first argument (i.e., `inputs`). Note that only one of the function calls is chosen for this layer.
* __fixed_inputs__: a list of variables; a variable could be an output tensor from a previous layer. It could be the `layer_output` of another `nni.mutable_layer` before this layer, or another Python variable defined before this layer. All the variables in this list will be fed into the chosen function in `layer_choice` (as the first element of the input list).
@@ -32,19 +32,19 @@ __Debugging__: We provided an `nnictl trial codegen` command to help debugging y
Designing the connections between layers is critical for building a high-performance model. With our interface, users can annotate which connections a layer takes as inputs, choosing several from a set of candidate connections. Below is an example that chooses two inputs from three candidate inputs for `concat`. Here `concat` always takes the output of its previous layer via `fixed_inputs`.

![](../../img/example_connectchoice.png)

### Example: choose both operators and connections

In this example, we choose one of the three operators and choose two connections for it. As there are multiple variables in the inputs, we call `concat` at the beginning of the functions.

![](../../img/example_combined.png)
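The selection semantics in these examples can be sketched in plain Python. This is an illustration of the idea only, not the real NNI annotation API: `build_layer`, the choice dict, and the operator names below are all made up. An architecture choice picks one operator from `layer_choice` and a subset of `optional_inputs`, and the chosen operator receives `[fixed_inputs, chosen_optional_inputs]` as its first argument.

```python
# Hypothetical sketch of "choose an operator and its connections".
# These operators stand in for real layer functions; they just record
# which inputs they were applied to.

def conv(inputs):
    fixed, chosen = inputs           # inputs = [fixed_inputs, chosen optional inputs]
    return "conv(" + "+".join(fixed + chosen) + ")"

def pool(inputs):
    fixed, chosen = inputs
    return "pool(" + "+".join(fixed + chosen) + ")"

OPS = {"conv": conv, "pool": pool}   # the layer_choice candidates

def build_layer(choice, fixed_inputs, optional_inputs):
    """Apply the chosen operator to fixed inputs plus the chosen optional inputs."""
    chosen = [optional_inputs[i] for i in choice["input_indices"]]
    return OPS[choice["op"]]([fixed_inputs, chosen])

# One concrete architecture choice: use conv, taking optional inputs 0 and 2.
out = build_layer({"op": "conv", "input_indices": [0, 2]},
                  ["prev_layer"], ["l1", "l2", "l3"])
# out == "conv(prev_layer+l1+l3)"
```

A tuner exploring this space would simply emit different `choice` dicts and compare the resulting models.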
### Example: [ENAS][1] macro search space

To illustrate the convenience of the programming interface, we use the interface to implement the trial code of "ENAS + macro search space". The left figure is the macro search space in the ENAS paper.

![](../../img/example_enas.png)

## Unified NAS search space specification
@@ -91,7 +91,7 @@ With the specification of the format of search space and architecture (choice) e
NNI's annotation compiler transforms the annotated trial code into code that can receive an architecture choice and build the corresponding model (i.e., graph). The NAS search space can be seen as a full graph (here, "full graph" means enabling all the provided operators and connections to build a graph), and the architecture chosen by the tuning algorithm is a subgraph of it. By default, the compiled trial code only builds and executes the subgraph.

![](../../img/nas_on_nni.png)

The above figure shows how the trial code runs on NNI. `nnictl` processes the user's trial code to generate a search space file and compiled trial code. The former is fed to the tuner, and the latter is used to run trials.
@@ -101,7 +101,7 @@ The above figure shows how the trial code runs on NNI. `nnictl` processes user t
Sharing weights among chosen architectures (i.e., trials) can speed up model search. For example, properly inheriting the weights of completed trials can speed up the convergence of new trials. One-Shot NAS (e.g., ENAS, DARTS) is more aggressive: the training of different architectures (i.e., subgraphs) shares the same copy of the weights in the full graph.

![](../../img/nas_weight_share.png)

We believe weight sharing (transferring) plays a key role in speeding up NAS, while finding efficient ways of sharing weights is still a hot research topic. We provide a key-value store for users to store and load weights. Tuners and trials use a provided KV client library to access the storage.
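As a rough illustration of the key-value idea (the actual NNI client library and its API are not shown in this document, so the class and method names below are purely hypothetical), a weight store maps a key such as `(trial_id, layer_name)` to a weight blob that later trials can load and inherit:

```python
# Toy in-memory weight store; a real implementation would sit behind a
# network service shared by tuner and trials.

class WeightStore:
    def __init__(self):
        self._store = {}

    def save(self, key, weights):
        # e.g. key = (trial_id, layer_name)
        self._store[key] = list(weights)

    def load(self, key, default=None):
        return self._store.get(key, default)

store = WeightStore()
store.save(("trial-1", "conv1"), [0.1, 0.2])
inherited = store.load(("trial-1", "conv1"))   # a new trial inherits these weights
```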
@@ -111,9 +111,9 @@ Example of weight sharing on NNI.
One-Shot NAS is a popular approach to finding a good neural architecture within a limited time and resource budget. Basically, it builds a full graph based on the search space and uses gradient descent to eventually find the best subgraph. There are different training approaches, such as [training subgraphs (per mini-batch)][1], [training the full graph through dropout][6], and [training with architecture weights (regularization)][3]. Here we focus on the first approach, i.e., training subgraphs (ENAS).

With the same annotated trial code, users can choose One-Shot NAS as the execution mode on NNI. Specifically, the compiled trial code builds the full graph (rather than the subgraph demonstrated above); it receives a chosen architecture, trains this architecture on the full graph for one mini-batch, then requests the next chosen architecture. This is supported by [NNI multi-phase](MultiPhase.md). We support this training approach because training a subgraph is very fast, and rebuilding the graph for every subgraph would induce too much overhead.

![](../../img/one-shot_training.png)

The design of One-Shot NAS on NNI is shown in the above figure. One-Shot NAS usually has only one trial job with the full graph. NNI supports running multiple such trial jobs, each of which runs independently. As One-Shot NAS is not stable, running multiple instances helps find a better model. Moreover, trial jobs can also synchronize weights while running (i.e., there is only one copy of the weights, as in asynchronous parameter-server mode). This may speed up convergence.
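The per-mini-batch loop described above can be sketched as follows. The sampler and the "training step" are stand-ins for real code: in an actual trial, the architecture would come from the tuner and the update would be a framework gradient step on the shared full-graph weights.

```python
# Sketch of ENAS-style one-shot training: one long-running trial holds a
# single copy of the full-graph weights; each step it takes a sampled
# architecture, trains that subgraph for one mini-batch, then asks for the next.
import random

random.seed(0)

shared_weights = {"conv": 0.0, "pool": 0.0}   # one shared copy for all subgraphs

def sample_architecture():
    return random.choice(["conv", "pool"])    # stand-in for the tuner's choice

def train_one_minibatch(op):
    shared_weights[op] += 0.1                 # stand-in for a gradient update

for _ in range(10):
    arch = sample_architecture()              # receive a chosen architecture
    train_one_minibatch(arch)                 # train it for one mini-batch

total = round(shared_weights["conv"] + shared_weights["pool"], 6)
# total == 1.0: all ten mini-batch updates landed in the shared weights
```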
@@ -6,15 +6,15 @@ Curve Fitting Assessor is an LPA (learning, predicting, assessing) algorithm. It s

In this algorithm, we use 12 curves to fit the learning curve; this large set of parametric curve models is chosen from the [reference paper][1]. The learning curves' shape coincides with our prior knowledge about the form of learning curves: they are typically increasing, saturating functions.

![](../../img/curvefitting_learning_curve.PNG)

We combine all learning curve models into a single, more powerful model. This combined model is given by a weighted linear combination:

![](../../img/curvefitting_f_comb.gif)

where the new combined parameter vector

![](../../img/curvefitting_expression_xi.gif)

We assume additive Gaussian noise, and the noise parameter is initialized to its maximum likelihood estimate.
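The weighted linear combination above can be written down directly: each parametric model predicts the metric at step `x`, and the combined prediction is the weighted sum of the individual predictions. The two example models below (a `pow3`-style power law and a log-linear curve) are representative of the parametric families used, but the parameter values are invented for illustration.

```python
# Sketch of the combined model f_comb(x) = sum_k w_k * f_k(x).
import math

def pow3(x, c=0.9, a=0.5, alpha=0.6):
    # increasing, saturating power law approaching c as x grows
    return c - a * x ** (-alpha)

def log_linear(x, a=0.1, b=0.4):
    return a * math.log(x) + b

def f_comb(x, weights, models):
    """Weighted linear combination of learning-curve models."""
    return sum(w * f(x) for w, f in zip(weights, models))

y = f_comb(10.0, [0.7, 0.3], [pow3, log_linear])  # predicted metric at step 10
```

In the real assessor the weights and model parameters are jointly fit to the observed prefix of the learning curve before extrapolating.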
@@ -30,7 +30,7 @@ Concretely, this algorithm goes through three stages of learning, predicting and

The figure below shows the result of our algorithm on MNIST trial history data, where the green points represent the data obtained by the Assessor, the blue points represent future but unknown data, and the red line is the curve predicted by the Curve Fitting Assessor.

![](../../img/curvefitting_example.PNG)

## 2. Usage

To use Curve Fitting Assessor, you should add the following spec in your experiment's YAML config file:
@@ -5,15 +5,15 @@ Comparison of Hyperparameter Optimization algorithms on several problems.

Hyperparameter Optimization algorithms are listed below:
- [Random Search](../Tuner/BuiltinTuner.md)
- [Grid Search](../Tuner/BuiltinTuner.md)
- [Evolution](../Tuner/BuiltinTuner.md)
- [Anneal](../Tuner/BuiltinTuner.md)
- [Metis](../Tuner/BuiltinTuner.md)
- [TPE](../Tuner/BuiltinTuner.md)
- [SMAC](../Tuner/BuiltinTuner.md)
- [HyperBand](../Tuner/BuiltinTuner.md)
- [BOHB](../Tuner/BuiltinTuner.md)

All algorithms run in the NNI local environment.
...@@ -34,7 +34,7 @@ is running in docker?: no ...@@ -34,7 +34,7 @@ is running in docker?: no
### Problem Description ### Problem Description
Nonconvex problem on the hyper-parameter search of [AutoGBDT](../GbdtExample.md) example. Nonconvex problem on the hyper-parameter search of [AutoGBDT](../TrialExample/GbdtExample.md) example.
### Search Space ### Search Space
......
...@@ -7,6 +7,6 @@ In addition to the official tutorials and examples, we encourage community contri
.. toctree::
:maxdepth: 2
NNI in Recommenders <RecommendersSvd>
Neural Architecture Search Comparison <NasComparision>
Hyper-parameter Tuning Algorithm Comparison <HpoComparision>
...@@ -33,27 +33,27 @@ Basically, an experiment runs as follows: Tuner receives search space and genera
For each experiment, the user only needs to define a search space and update a few lines of code, then leverage NNI's built-in Tuner/Assessor and training platforms to search for the best hyperparameters and/or neural architecture. There are basically three steps:
>Step 1: [Define search space](Tutorial/SearchSpaceSpec.md)
>Step 2: [Update model codes](TrialExample/Trials.md)
>Step 3: [Define Experiment](Tutorial/ExperimentConfig.md)
<p align="center">
<img src="https://user-images.githubusercontent.com/23273522/51816627-5d13db80-2302-11e9-8f3e-627e260203d5.jpg" alt="drawing"/>
</p>
For more details about how to run an experiment, please refer to [Get Started](Tutorial/QuickStart.md).
## Learn More
* [Get started](Tutorial/QuickStart.md)
* [How to adapt your trial code on NNI?](TrialExample/Trials.md)
* [What are tuners supported by NNI?](Tuner/BuiltinTuner.md)
* [How to customize your own tuner?](Tuner/CustomizeTuner.md)
* [What are assessors supported by NNI?](Assessor/BuiltinAssessor.md)
* [How to customize your own assessor?](Assessor/CustomizeAssessor.md)
* [How to run an experiment on local?](TrainingService/LocalMode.md)
* [How to run an experiment on multiple machines?](TrainingService/RemoteMachineMode.md)
* [How to run an experiment on OpenPAI?](TrainingService/PaiMode.md)
* [Examples](TrialExample/MnistExamples.md)
\ No newline at end of file
...@@ -5,20 +5,20 @@
### Major Features
* General NAS programming interface
* Add `enas-mode` and `oneshot-mode` for NAS interface: [PR #1201](https://github.com/microsoft/nni/pull/1201#issue-291094510)
* [Gaussian Process Tuner with Matern kernel](Tuner/GPTuner.md)
* Multiphase experiment supports
* Added new training service support for multiphase experiments: PAI mode supports multiphase experiments since v0.9.
* Added multiphase capability for the following builtin tuners:
* TPE, Random Search, Anneal, Naïve Evolution, SMAC, Network Morphism, Metis Tuner.
For details, please refer to [Write a tuner that leverages multi-phase](AdvancedFeature/MultiPhase.md)
* Web Portal
* Enable trial comparison in the Web Portal. For details, refer to [View trials status](Tutorial/WebUI.md)
* Allow users to adjust the rendering interval of the Web Portal. For details, refer to [View Summary Page](Tutorial/WebUI.md)
* Show intermediate results in a more user-friendly way. For details, refer to [View trials status](Tutorial/WebUI.md)
* [Commandline Interface](Tutorial/Nnictl.md)
* `nnictl experiment delete`: delete one or all experiments, including logs, results, environment information, and cache. Use it to remove useless experiment results or to save disk space.
* `nnictl platform clean`: clean up disk space on a target platform. The provided YAML file includes the information of the target platform and follows the same schema as the NNI configuration file.
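As a hedged illustration of the YAML given to `nnictl platform clean` (all values below are placeholders; the release note above only says the file follows the same schema as the NNI configuration file):

```yaml
# Illustrative only: a remote-platform description for `nnictl platform clean`.
# IPs and credentials are placeholders.
trainingServicePlatform: remote
machineList:
  - ip: 192.0.2.10
    port: 22
    username: nni-user
    passwd: <password>
```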
### Bug fix and other changes
...@@ -41,7 +41,7 @@
* Run trial jobs on a GPU running non-NNI jobs
* Kubeflow v1beta2 operator
* Support Kubeflow TFJob/PyTorchJob v1beta2
* [General NAS programming interface](AdvancedFeature/GeneralNasInterfaces.md)
* Provide a NAS programming interface for users to easily express their neural architecture search space through NNI annotation
* Provide a new command `nnictl trial codegen` for debugging the NAS code
* Tutorial of the NAS programming interface, an example of NAS on MNIST, and a customized random tuner for NAS
...@@ -60,22 +60,22 @@
* Fix bug of table entries
* Nested search space refinement
* Refine 'randint' type and support lower bound
* [Comparison of different hyper-parameter tuning algorithms](CommunitySharings/HpoComparision.md)
* [Comparison of NAS algorithms](CommunitySharings/NasComparision.md)
* [NNI practice on Recommenders](CommunitySharings/RecommendersSvd.md)
## Release 0.7 - 4/29/2018
### Major Features
* [Support NNI on Windows](Tutorial/NniOnWindows.md)
* NNI running on Windows for local mode
* [New advisor: BOHB](Tuner/BohbAdvisor.md)
* Support a new advisor, BOHB, a robust and efficient hyperparameter tuning algorithm that combines the advantages of Bayesian optimization and Hyperband
* [Support import and export experiment data through nnictl](Tutorial/Nnictl.md#experiment)
* Generate an analysis results report after the experiment execution
* Support importing data to the tuner and advisor for tuning
* [Designated GPU devices for NNI trial jobs](Tutorial/ExperimentConfig.md#localConfig)
* Specify GPU devices for NNI trial jobs with the gpuIndices configuration; if gpuIndices is set in the experiment configuration file, only the specified GPU devices are used for NNI trial jobs.
* Web Portal enhancement
* Decimal format of metrics other than default on the Web UI
...@@ -151,14 +151,14 @@
#### New tuner and assessor supports
* Support [Metis tuner](Tuner/MetisTuner.md) as a new NNI tuner. The Metis algorithm has been shown to perform well for **online** hyper-parameter tuning.
* Support the [ENAS customized tuner](https://github.com/countif/enas_nni), contributed by a GitHub community user. It is an algorithm for neural network search that learns neural network architectures via reinforcement learning and achieves better performance than NAS.
* Support the [Curve fitting assessor](Assessor/CurvefittingAssessor.md) for an early-stop policy using learning curve extrapolation.
* Advanced support of [Weight Sharing](AdvancedFeature/AdvancedNas.md): enable weight sharing for NAS tuners, currently through NFS.
#### Training Service Enhancement
* [FrameworkController training service](TrainingService/FrameworkControllerMode.md): support running experiments using FrameworkController on Kubernetes
* FrameworkController is a controller on Kubernetes that is general enough to run (distributed) jobs with various machine learning frameworks, such as TensorFlow, PyTorch, and MXNet.
* NNI provides a unified and simple specification for job definition.
* MNIST example for how to use FrameworkController.
...@@ -176,11 +176,11 @@
#### New tuner supports
* Support [network morphism](Tuner/NetworkmorphismTuner.md) as a new tuner
#### Training Service improvements
* Migrate [Kubeflow training service](TrainingService/KubeflowMode.md)'s dependency from the kubectl CLI to the [Kubernetes API](https://kubernetes.io/docs/concepts/overview/kubernetes-api/) client
* [Pytorch-operator](https://github.com/kubeflow/pytorch-operator) support for the Kubeflow training service
* Improvement on uploading local code files to OpenPAI HDFS
* Fixed an OpenPAI integration WebUI bug: the WebUI didn't show the latest trial job status, which was caused by OpenPAI token expiration
...@@ -207,11 +207,11 @@
### Major Features
* [Kubeflow training service](TrainingService/KubeflowMode.md)
* Support tf-operator
* [Distributed trial example](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-distributed/dist_mnist.py) on Kubeflow
* [Grid search tuner](Tuner/GridsearchTuner.md)
* [Hyperband tuner](Tuner/HyperbandAdvisor.md)
* Support launching NNI experiments on macOS
* WebUI
* UI support for the Hyperband tuner
...@@ -246,7 +246,7 @@
```
* Support updating max trial number.
use `nnictl update --help` to learn more, or refer to [NNICTL Spec](Tutorial/Nnictl.md) for the full usage of NNICTL.
### API new features and updates
...@@ -283,7 +283,7 @@
### Others
* UI refactoring; refer to the [WebUI doc](Tutorial/WebUI.md) for how to work with the new UI.
* Continuous Integration: NNI has switched to Azure Pipelines
* [Known Issues in release 0.3.0](https://github.com/Microsoft/nni/labels/nni030knownissues).
...@@ -291,10 +291,10 @@
### Major Features
* Support the [OpenPAI](https://github.com/Microsoft/pai) training platform (see [here](TrainingService/PaiMode.md) for instructions on how to submit an NNI job in pai mode)
* Support training services in pai mode; NNI trials will be scheduled to run on the OpenPAI cluster
* NNI trial output (including logs and model files) will be copied to OpenPAI HDFS for further debugging and checking
* Support the [SMAC](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) tuner (see [here](Tuner/SmacTuner.md) for instructions on how to use the SMAC tuner)
* [SMAC](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO to handle categorical parameters. The SMAC supported by NNI is a wrapper on [SMAC3](https://github.com/automl/SMAC3)
* Support NNI installation in [conda](https://conda.io/docs/index.html) and Python virtual environments
* Others
...
...@@ -15,7 +15,7 @@ NNI supports running experiment using [FrameworkController](https://github.com/M
apt-get install nfs-common
```
6. Install **NNI** following the install guide [here](../Tutorial/QuickStart.md).
## Prerequisite for Azure Kubernetes Service
...@@ -30,7 +30,7 @@ Follow the [guideline](https://github.com/Microsoft/frameworkcontroller/tree/mas
## Design
Please refer to the design of the [Kubeflow training service](KubeflowMode.md); the FrameworkController training service pipeline is similar.
## Example
...@@ -109,7 +109,7 @@ Trial configuration in frameworkcontroller mode have the following configuration
## How to run example
After you prepare a config file, you can run your experiment with nnictl. The way to start an experiment on FrameworkController is similar to Kubeflow; please refer to the [document](KubeflowMode.md) for more information.
## Version check
...
...@@ -5,9 +5,9 @@
TrainingService is a module for platform management and job scheduling in NNI. TrainingService is designed to be easy to implement: we define an abstract class TrainingService as the parent class of all kinds of TrainingService, and users just need to inherit the parent class and complete their own child class if they want to implement a customized TrainingService.
## System architecture
![](../../img/NNIDesign.jpg)
The brief system architecture of NNI is shown in the picture. NNIManager is the core management module of the system, in charge of calling TrainingService to manage trial jobs and of the communication between different modules. Dispatcher is a message processing center responsible for message dispatch. TrainingService is a module to manage trial jobs; it communicates with the NNIManager module and has different instances for different training platforms. For the time being, NNI supports the [local platform](LocalMode.md), [remote platform](RemoteMachineMode.md), [PAI platform](PaiMode.md), [Kubeflow platform](KubeflowMode.md), and [FrameworkController platform](FrameworkControllerMode.md).
In this document, we introduce the brief design of TrainingService. If users want to add a new TrainingService instance, they just need to complete a child class implementing TrainingService, and don't need to understand the code details of NNIManager, Dispatcher, or other modules.
...@@ -154,12 +154,12 @@ NNI offers a TrialKeeper tool to help maintaining trial jobs. Users can find the
The running architecture of TrialKeeper is shown as follows:
![](../../img/trialkeeper.jpg)
When users submit a trial job to a cloud platform, they should wrap their trial command into TrialKeeper and start a TrialKeeper process on the cloud platform. Notice that TrialKeeper uses a RESTful server to communicate with TrainingService; users should start a RESTful server on the local machine to receive metrics sent from TrialKeeper. The source code of the RESTful server can be found in `nni/src/nni_manager/training_service/common/clusterJobRestServer.ts`.
## Reference
For more information about how to debug, please refer to [How to Debug](../Tutorial/HowToDebug.md).
For guidelines on how to contribute, please refer to [Contributing](../Tutorial/Contributing.md).
...@@ -16,7 +16,7 @@ Now NNI supports running experiment on [Kubeflow](https://github.com/kubeflow/ku
apt-get install nfs-common
```
7. Install **NNI** following the install guide [here](../Tutorial/QuickStart.md).
## Prerequisite for Azure Kubernetes Service
...@@ -28,7 +28,7 @@ Now NNI supports running experiment on [Kubeflow](https://github.com/kubeflow/ku
## Design
![](../../img/kubeflow_training_design.png)
Kubeflow training service instantiates a Kubernetes REST client to interact with your K8s cluster's API server.
For each trial, we will upload all the files in your local codeDir path (configured in nni_config.yml), together with NNI-generated files like parameter.cfg, into a storage volume. Right now we support two kinds of storage volumes: [NFS](https://en.wikipedia.org/wiki/Network_File_System) and [Azure File storage](https://azure.microsoft.com/en-us/services/storage/files/); you should configure the storage volume in the NNI config YAML file. After files are prepared, the Kubeflow training service will call the K8s REST API to create Kubeflow jobs ([tf-operator](https://github.com/kubeflow/tf-operator) jobs or [pytorch-operator](https://github.com/kubeflow/pytorch-operator) jobs) in K8s and mount your storage volume into the job's pod. Output files of the Kubeflow job, like stdout, stderr, trial.log, or model files, will also be copied back to the storage volume. NNI shows the storage volume's URL for each trial in the WebUI so that users can browse the log files and the job's output files.
...
...@@ -56,7 +56,7 @@ The hyper-parameters used in `Step 1.2 - Get predefined parameters` is defined i
"learning_rate":{"_type":"uniform","_value":[0.0001, 0.1]}
}
```
Refer to [SearchSpaceSpec.md](../Tutorial/SearchSpaceSpec.md) to learn more about search spaces.
>Step 3 - Define Experiment
...@@ -83,16 +83,16 @@ Let's use a simple trial example, e.g. mnist, provided by NNI. After you install
python ~/nni/examples/trials/mnist-annotation/mnist.py python ~/nni/examples/trials/mnist-annotation/mnist.py
This command will be filled in the YAML configure file below. Please refer to [here](Trials.md) for how to write your own trial. This command will be filled in the YAML configure file below. Please refer to [here](../TrialExample/Trials.md) for how to write your own trial.
**Prepare tuner**: NNI supports several popular automl algorithms, including Random Search, Tree of Parzen Estimators (TPE), Evolution algorithm etc. Users can write their own tuner (refer to [here](CustomizeTuner.md)), but for simplicity, here we choose a tuner provided by NNI as below: **Prepare tuner**: NNI supports several popular automl algorithms, including Random Search, Tree of Parzen Estimators (TPE), Evolution algorithm etc. Users can write their own tuner (refer to [here](../Tuner/CustomizeTuner.md)), but for simplicity, here we choose a tuner provided by NNI as below:
tuner: tuner:
builtinTunerName: TPE builtinTunerName: TPE
classArgs: classArgs:
optimize_mode: maximize optimize_mode: maximize
*builtinTunerName* is used to specify a tuner in NNI, *classArgs* are the arguments pass to the tuner (the spec of builtin tuners can be found [here](BuiltinTuner.md)), *optimization_mode* is to indicate whether you want to maximize or minimize your trial's result. *builtinTunerName* is used to specify a tuner in NNI, *classArgs* are the arguments pass to the tuner (the spec of builtin tuners can be found [here](../Tuner/BuiltinTuner.md)), *optimization_mode* is to indicate whether you want to maximize or minimize your trial's result.
**Prepare configure file**: Since you have already known which trial code you are going to run and which tuner you are going to use, it is time to prepare the YAML configure file. NNI provides a demo configure file for each trial example, `cat ~/nni/examples/trials/mnist-annotation/config.yml` to see it. Its content is basically shown below: **Prepare configure file**: Since you have already known which trial code you are going to run and which tuner you are going to use, it is time to prepare the YAML configure file. NNI provides a demo configure file for each trial example, `cat ~/nni/examples/trials/mnist-annotation/config.yml` to see it. Its content is basically shown below:
...@@ -124,13 +124,13 @@ trial: ...@@ -124,13 +124,13 @@ trial:
gpuNum: 0 gpuNum: 0
``` ```
Here *useAnnotation* is true because this trial example uses our python annotation (refer to [here](AnnotationSpec.md) for details). For trial, we should provide *trialCommand* which is the command to run the trial, provide *trialCodeDir* where the trial code is. The command will be executed in this directory. We should also provide how many GPUs a trial requires. Here *useAnnotation* is true because this trial example uses our python annotation (refer to [here](../Tutorial/AnnotationSpec.md) for details). For trial, we should provide *trialCommand* which is the command to run the trial, provide *trialCodeDir* where the trial code is. The command will be executed in this directory. We should also provide how many GPUs a trial requires.
With all these steps done, we can run the experiment with the following command: With all these steps done, we can run the experiment with the following command:
nnictl create --config ~/nni/examples/trials/mnist-annotation/config.yml nnictl create --config ~/nni/examples/trials/mnist-annotation/config.yml
You can refer to [here](Nnictl.md) for more usage guide of *nnictl* command line tool. You can refer to [here](../Tutorial/Nnictl.md) for more usage guide of *nnictl* command line tool.
## View experiment results ## View experiment results
The experiment has been running now. Other than *nnictl*, NNI also provides WebUI for you to view experiment progress, to control your experiment, and some other appealing features. The experiment has been running now. Other than *nnictl*, NNI also provides WebUI for you to view experiment progress, to control your experiment, and some other appealing features.
NNI supports running an experiment on [OpenPAI](https://github.com/Microsoft/pai) (aka pai), called pai mode. Before starting to use NNI pai mode, you should have an account to access an [OpenPAI](https://github.com/Microsoft/pai) cluster. See [here](https://github.com/Microsoft/pai#how-to-deploy) if you don't have an OpenPAI account and want to deploy an OpenPAI cluster. In pai mode, your trial program runs in a Docker container created by pai.
## Setup environment
Install NNI following the install guide [here](../Tutorial/QuickStart.md).
## Run an experiment
Use `examples/trials/mnist-annotation` as an example. The content of the NNI config YAML file looks like:
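The original file is not reproduced here; the sketch below is reconstructed from the keys this page describes, with illustrative placeholder values for the OpenPAI account, host, and Docker image:

```yaml
authorName: default
experimentName: example_mnist_pai
trialConcurrency: 2
maxExecDuration: 3h
maxTrialNum: 100
trainingServicePlatform: pai     # required for pai mode
useAnnotation: true
tuner:
  builtinTunerName: TPE
  classArgs:
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
  cpuNum: 1                      # pai-mode-specific key
  memoryMB: 8196                 # pai-mode-specific key
  image: msranni/nni:latest      # placeholder Docker image
paiConfig:
  userName: your_pai_nni_user    # placeholder account
  passWord: your_pai_password    # placeholder password
  host: 10.1.1.1                 # placeholder cluster address
```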
Note: You should set `trainingServicePlatform: pai` in the NNI config YAML file if you want to start the experiment in pai mode.
Compared with [LocalMode](LocalMode.md) and [RemoteMachineMode](RemoteMachineMode.md), trial configuration in pai mode has these additional keys:
* cpuNum
  * Required key. Should be a positive integer based on your trial program's CPU requirement.
* memoryMB
```
nnictl create --config exp_pai.yml
```
to start the experiment in pai mode. NNI will create an OpenPAI job for each trial, with a job name in the format `nni_exp_{experiment_id}_trial_{trial_id}`.
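The naming pattern can be sketched as below; the IDs shown are hypothetical, and this mirrors the described format rather than NNI's actual code:

```python
def pai_job_name(experiment_id: str, trial_id: str) -> str:
    """Build the OpenPAI job name following the pattern described above."""
    return f"nni_exp_{experiment_id}_trial_{trial_id}"

# Hypothetical IDs, for illustration only:
print(pai_job_name("GExvNd2Q", "Ab3de"))  # nni_exp_GExvNd2Q_trial_Ab3de
```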
You can see jobs created by NNI in the OpenPAI cluster's web portal, like:
![](../../img/nni_pai_joblist.jpg)
Notice: In pai mode, NNIManager will start a REST server listening on a port that is your NNI WebUI's port plus 1. For example, if your WebUI port is `8080`, the REST server will listen on `8081` to receive metrics from running trial jobs. So you should enable the `8081` TCP port in your firewall rule to allow incoming traffic.
Once a trial job is completed, you can go to the NNI WebUI's overview page (e.g. http://localhost:8080/oview) to check the trial's information.
Expand a trial's information in the trial list view and click the logPath link:
![](../../img/nni_webui_joblist.jpg)
You will then be redirected to the HDFS web portal to browse the output files of that trial in HDFS:
![](../../img/nni_trial_hdfs_output.jpg)
You can see there are three files in the output folder: stderr, stdout, and trial.log.
Check policy:
3. Note that the version check feature only checks the first two digits of the version. For example, NNIManager v0.6.1 could use trialKeeper v0.6 or trialKeeper v0.6.2, but could not use trialKeeper v0.5.1 or trialKeeper v0.7.
If you cannot run your experiment and want to know whether it is caused by the version check, check the WebUI; there will be an error message about the version check.
![](../../img/version_check.png)
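The two-digit rule above can be sketched in a few lines of Python; this mirrors the described policy, not NNI's actual implementation:

```python
def versions_compatible(manager_version: str, keeper_version: str) -> bool:
    """Compatible when the first two digits (major.minor) match."""
    return manager_version.split(".")[:2] == keeper_version.split(".")[:2]

print(versions_compatible("0.6.1", "0.6"))    # True
print(versions_compatible("0.6.1", "0.6.2"))  # True
print(versions_compatible("0.6.1", "0.7"))    # False
```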
## Setup NNI environment
Install NNI on each of your machines following the install guide [here](../Tutorial/QuickStart.md).
## Run an experiment