* Open the `Web UI url` in your browser to view detailed information about the experiment and all the submitted trial jobs, as shown below. [Here](docs/en_US/Tutorial/WebUI.md) are more Web UI pages.
* Review the [documentation](https://github.com/microsoft/nni/tree/master/docs) and make pull requests for anything from typos to new content
* Find the issues tagged with ['good first issue'](https://github.com/Microsoft/nni/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) or ['help-wanted'](https://github.com/microsoft/nni/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22); these are simple and easy to start with, and we recommend new contributors to begin there.
Before providing your hacks, there are a few simple guidelines that you need to follow:
* [How to debug](docs/en_US/Tutorial/HowToDebug.md)
* How to set up the [NNI developer environment](docs/en_US/Tutorial/SetupNniDeveloperEnvironment.md)
* Review the [Contributing Instruction](docs/en_US/Tutorial/Contributing.md) and get familiar with the NNI Code Contribution Guideline
## **External Repositories**
We now have some external usage examples that run in NNI, contributed by the community. Thanks to our lovely contributors, and welcome more and more people to join us!
* Run [ENAS](examples/tuners/enas_nni/README.md) in NNI
* Run [Neural Network Architecture Search](examples/trials/nas_cifar10/README.md) in NNI
## **Feedback**
* Open [bug reports](https://github.com/microsoft/nni/issues/new/choose).
* Request a [new feature](https://github.com/microsoft/nni/issues/new/choose).
* Discuss on the NNI [Gitter](https://gitter.im/Microsoft/nni?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
* Ask a question with NNI tags on [Stack Overflow](https://stackoverflow.com/questions/tagged/nni?sort=Newest&edited=true) or [file an issue](https://github.com/microsoft/nni/issues/new/choose) on GitHub.
* The instruction for [How to Debug](docs/en_US/Tutorial/HowToDebug.md) is under construction; you are welcome to contribute questions or suggestions in this area.
This project welcomes contributions and suggestions. We use [GitHub issues](https://github.com/Microsoft/nni/issues) for tracking requests and bugs.
Issues with the **good first issue** label are simple and easy-to-start ones that we recommend new contributors to start with.
To set up the environment for NNI development, refer to the instruction: [Set up NNI developer environment](docs/en_US/SetupNniDeveloperEnvironment.md)
Before you start coding, review and get familiar with the NNI Code Contribution Guideline: [Contributing](docs/en_US/Contributing.md)
The instruction for [How to Debug](docs/en_US/HowToDebug.md) is under construction; you are welcome to contribute questions or suggestions in this area.
A new programming interface for designing and searching for a model is often demanded in two scenarios. 1) When designing a neural network, the designer may have multiple choices for a layer, sub-model, or connection, and may not be sure which one, or which combination, performs best. It would be appealing to have an easy way to express the candidate layers/sub-models they want to try. 2) Researchers working on automatic NAS want a unified way to express the search space of neural architectures, so that unchanged trial code can be adapted to different search algorithms.
We designed a simple and flexible programming interface based on [NNI annotation](../Tutorial/AnnotationSpec.md). It is elaborated through examples below.
### Example: choose an operator for a layer
When designing the following model, there might be several choices for the fourth layer that could make this model perform well. In the script of this model, we can use an annotation for the fourth layer as shown in the figure. The annotation has five fields in total:


* __layer_choice__: A list of function calls; each function should be defined in the user's script or an imported library. The input arguments of each function should follow the format `def XXX(inputs, arg2, arg3, ...)`, where `inputs` is a list with two elements: the list of `fixed_inputs`, and a list of the chosen inputs from `optional_inputs`. `conv` and `pool` in the figure are examples of such function definitions. For the function calls in this list, there is no need to write the first argument (i.e., `inputs`). Note that only one of the function calls is chosen for this layer.
* __fixed_inputs__: A list of variables; each variable could be an output tensor from a previous layer, such as the `layer_output` of another `nni.mutable_layer` before this layer, or any other Python variable defined before this layer. All the variables in this list will be fed into the chosen function in `layer_choice` (as the first element of the input list).
...
...
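To make the fields concrete, here is a minimal sketch of such an annotation. Hedged assumptions: `conv` and `pool` are functions defined in the user's script as described above, the variable names are illustrative, and `optional_input_size` is assumed to be the fifth field alongside the four discussed in this document.

```python
def conv(inputs, size=3):
    ...  # build a conv layer from the fixed and chosen optional inputs

def pool(inputs):
    ...  # build a pooling layer

"""@nni.mutable_layers(
{
    layer_choice: [conv(size=3), conv(size=5), pool()],
    fixed_inputs: [layer_3_out],
    optional_inputs: [layer_1_out, layer_2_out],
    optional_input_size: 1,
    layer_output: layer_4_out
}
)"""
```

Only one call from `layer_choice` is selected; it receives the fixed inputs plus the chosen optional inputs as its first argument, and its output is bound to `layer_4_out`.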
Designing the connections of layers is critical for making a high-performance model. With our interface, users can annotate which connections a layer takes as inputs, choosing several from a set of candidate connections. Below is an example which chooses two inputs from three candidate inputs for `concat`. Here `concat` always takes the output of its previous layer using `fixed_inputs`.


### Example: choose both operators and connections
In this example, we choose one of the three operators and choose two connections for it. As there are multiple variables in the inputs, we call `concat` at the beginning of the functions.


### Example: [ENAS][1] macro search space
To illustrate the convenience of the programming interface, we use it to implement the trial code of "ENAS + macro search space". The left figure shows the macro search space from the ENAS paper.


## Unified NAS search space specification
...
...
NNI's annotation compiler transforms the annotated trial code into code that can receive an architecture choice and build the corresponding model (i.e., graph). The NAS search space can be seen as a full graph (here, a full graph means enabling all the provided operators and connections to build a graph); the architecture chosen by the tuning algorithm is a subgraph of it. By default, the compiled trial code only builds and executes the subgraph.


The above figure shows how the trial code runs on NNI. `nnictl` processes the user's trial code to generate a search space file and compiled trial code. The former is fed to the tuner, and the latter is used to run trials.
...
...
Sharing weights among chosen architectures (i.e., trials) can speed up model search. For example, properly inheriting the weights of completed trials can speed up the convergence of new trials. One-Shot NAS (e.g., ENAS, Darts) is more aggressive: the training of different architectures (i.e., subgraphs) shares the same copy of the weights of the full graph.


We believe weight sharing (transferring) plays a key role in speeding up NAS, while finding efficient ways to share weights is still a hot research topic. We provide a key-value store for users to store and load weights; tuners and trials use a provided KV client lib to access the storage. A hypothetical usage sketch is shown below.
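The sketch below illustrates the pattern only; it is hypothetical code, not NNI's actual KV client API, and all names (class, methods, paths) are placeholders.

```python
import os
import shutil

class WeightKVClient:
    """Hypothetical file-based key-value client over a shared mount (e.g. NFS)."""

    def __init__(self, root: str):
        self.root = root  # shared storage mount point

    def save(self, key: str, weights_path: str) -> None:
        # store a trial's weights under a key, e.g. its trial id
        shutil.copy(weights_path, os.path.join(self.root, key))

    def load(self, key: str, dest_path: str) -> None:
        # fetch previously stored weights for warm-starting a new trial
        shutil.copy(os.path.join(self.root, key), dest_path)

# A new trial could inherit weights from a completed parent trial:
# kv = WeightKVClient("/mnt/nfs/nni-weights")
# kv.load(parent_trial_id, "./init_weights.pt")
```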
...
...
One-Shot NAS is a popular approach to finding a good neural architecture within a limited time and resource budget. Basically, it builds a full graph based on the search space and uses gradient descent to eventually find the best subgraph. There are different training approaches, such as [training subgraphs (per mini-batch)][1], [training the full graph through dropout][6], and [training with architecture weights (regularization)][3]. Here we focus on the first approach, i.e., training subgraphs (ENAS).
With the same annotated trial code, users can choose One-Shot NAS as the execution mode on NNI. Specifically, the compiled trial code builds the full graph (rather than the subgraph demonstrated above); it receives a chosen architecture, trains that architecture on the full graph for a mini-batch, and then requests another chosen architecture. This is supported by [NNI multi-phase](MultiPhase.md). We support this training approach because training a subgraph is very fast, and rebuilding the graph for every subgraph would induce too much overhead. A rough sketch of such a trial loop follows.
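The following is a rough sketch of that loop, assuming multi-phase is enabled in the experiment configuration; `build_full_graph` and `train_one_minibatch` are hypothetical helpers, and the exact termination condition may differ in practice.

```python
import nni

def build_full_graph():
    ...  # hypothetical helper: build the full graph once

def train_one_minibatch(graph, arch):
    ...  # hypothetical helper: activate subgraph `arch`, train one mini-batch
    return 0.0  # return a metric such as accuracy

graph = build_full_graph()
while True:
    arch = nni.get_next_parameter()  # receive the next chosen architecture
    if arch is None:                 # assumed sentinel: nothing left to evaluate
        break
    metric = train_one_minibatch(graph, arch)
    nni.report_final_result(metric)  # report the result for this architecture
```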


The design of One-Shot NAS on NNI is shown in the above figure. One-Shot NAS usually has only one trial job with the full graph. NNI supports running multiple such trial jobs, each of which runs independently. As One-Shot NAS is not stable, running multiple instances helps find a better model. Moreover, trial jobs are also able to synchronize weights while running (i.e., there is only one copy of the weights, as in asynchronous parameter-server mode). This may speed up convergence.
In this algorithm, we use 12 curves to fit the learning curve; this large set of parametric curve models is chosen from the [reference paper][1]. The learning curves' shape coincides with our prior knowledge about the form of learning curves: they are typically increasing, saturating functions.


We combine all learning curve models into a single, more powerful model. This combined model is given by a weighted linear combination:
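The original equation image is not reproduced here; following the reference paper, the combination can be written as

$$f_{\mathrm{comb}}(x \mid \xi) = \sum_{k=1}^{K} w_k f_k(x \mid \theta_k),$$

where each $f_k$ is one of the $K$ parametric curve models with parameters $\theta_k$ and weight $w_k$.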


where the new combined parameter vector is
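$$\xi = (w_1, \ldots, w_K,\; \theta_1, \ldots, \theta_K,\; \sigma^2),$$

collecting the combination weights, each model's parameters, and the noise variance (again a reconstruction following the reference paper, as the original image is missing).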


We assume additive Gaussian noise, and the noise parameter is initialized to its maximum likelihood estimate.
...
...
The figure below shows the result of our algorithm on MNIST trial history data, where the green points represent the data obtained by the assessor, the blue points represent the future but unknown data, and the red line is the curve predicted by the Curve Fitting Assessor.


## 2. Usage
To use Curve Fitting Assessor, you should add the following spec in your experiment's YAML config file:
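A sketch of the spec is shown below. The built-in name `Curvefitting` and the class arguments are written from memory of the assessor reference and should be checked against it; treat the argument values as placeholders.

```yaml
assessor:
  builtinAssessorName: Curvefitting
  classArgs:
    # total number of epochs each trial will report
    epoch_num: 20
    # a trial is stopped early if its predicted final result
    # is worse than threshold * best_performance
    threshold: 0.95
```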
For each experiment, the user only needs to define a search space and update a few lines of code, and then leverage NNI's built-in Tuner/Assessor and training platforms to search for the best hyperparameters and/or neural architecture. There are basically 3 steps:
For details, please refer to [Write a tuner that leverages multi-phase](AdvancedFeature/MultiPhase.md)
* Web Portal
* Enable trial comparison in Web Portal. For details, refer to [View trials status](Tutorial/WebUI.md)
* Allow users to adjust the rendering interval of Web Portal. For details, refer to [View Summary Page](Tutorial/WebUI.md)
* Show intermediate results in a more friendly way. For details, refer to [View trials status](Tutorial/WebUI.md)
* [Commandline Interface](Tutorial/Nnictl.md)
* `nnictl experiment delete`: delete one or all experiments, including logs, results, environment information and cache. It is used to delete useless experiment results or to save disk space.
* `nnictl platform clean`: used to clean up disk space on a target platform. The provided YAML file includes the information of the target platform, and it follows the same schema as the NNI configuration file.
### Bug fix and other changes
...
...
* Run trial jobs on the GPU running non-NNI jobs
* Kubeflow v1beta2 operator
* Support Kubeflow TFJob/PyTorchJob v1beta2
* [General NAS programming interface](AdvancedFeature/GeneralNasInterfaces.md)
* Provide NAS programming interface for users to easily express their neural architecture search space through NNI annotation
* Provide a new command `nnictl trial codegen` for debugging the NAS code
* Tutorial of NAS programming interface, example of NAS on MNIST, customized random tuner for NAS
...
...
* Fix bug of table entries
* Nested search space refinement
* Refine 'randint' type and support lower bound
* [Comparison of different hyper-parameter tuning algorithms](CommunitySharings/HpoComparision.md)
* [Comparison of NAS algorithms](CommunitySharings/NasComparision.md)
* [NNI practice on Recommenders](CommunitySharings/RecommendersSvd.md)
## Release 0.7 - 4/29/2019
### Major Features
* [Support NNI on Windows](Tutorial/NniOnWindows.md)
* NNI running on Windows for local mode
* [New advisor: BOHB](Tuner/BohbAdvisor.md)
* Support a new advisor BOHB, a robust and efficient hyperparameter tuning algorithm that combines the advantages of Bayesian optimization and Hyperband
* [Support import and export experiment data through nnictl](Tutorial/Nnictl.md#experiment)
* Generate analysis results report after the experiment execution
* Support import data to tuner and advisor for tuning
* [Designated gpu devices for NNI trial jobs](Tutorial/ExperimentConfig.md#localConfig)
* Specify GPU devices for NNI trial jobs via the gpuIndices configuration; if gpuIndices is set in the experiment configuration file, only the specified GPU devices are used for NNI trial jobs.
* Web Portal enhancement
* Decimal format of metrics other than default on the Web UI
...
...
#### New tuner and assessor supports
* Support [Metis tuner](Tuner/MetisTuner.md) as a new NNI tuner. The Metis algorithm has been proven to perform well for **online** hyper-parameter tuning.
* Support [ENAS customized tuner](https://github.com/countif/enas_nni), contributed by a GitHub community user. It is an algorithm for neural network search that learns neural network architectures via reinforcement learning and achieves better performance than NAS.
* Support [Curve fitting assessor](Assessor/CurvefittingAssessor.md) for early stop policy using learning curve extrapolation.
* Advanced Support of [Weight Sharing](AdvancedFeature/AdvancedNas.md): Enable weight sharing for NAS tuners, currently through NFS.
#### Training Service Enhancement
* [FrameworkController Training service](TrainingService/FrameworkControllerMode.md): Support running experiments using FrameworkController on Kubernetes
* FrameworkController is a controller on Kubernetes that is general enough to run (distributed) jobs with various machine learning frameworks, such as TensorFlow, PyTorch, and MXNet.
* NNI provides unified and simple specification for job definition.
* MNIST example for how to use FrameworkController.
...
...
#### New tuner supports
* Support [network morphism](Tuner/NetworkmorphismTuner.md) as a new tuner
#### Training Service improvements
* Migrate [Kubeflow training service](TrainingService/KubeflowMode.md)'s dependency from kubectl CLI to [Kubernetes API](https://kubernetes.io/docs/concepts/overview/kubernetes-api/) client
* [Pytorch-operator](https://github.com/kubeflow/pytorch-operator) support for Kubeflow training service
* Improvement on local code files uploading to OpenPAI HDFS
* Fixed OpenPAI integration WebUI bug: WebUI doesn't show latest trial job status, which is caused by OpenPAI token expiration
...
...
### Major Features
* [Kubeflow Training service](TrainingService/KubeflowMode.md)
* Support tf-operator
* [Distributed trial example](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-distributed/dist_mnist.py) on Kubeflow
* [Grid search tuner](Tuner/GridsearchTuner.md)
* [Hyperband tuner](Tuner/HyperbandAdvisor.md)
* Support launching NNI experiments on macOS
* WebUI
* UI support for hyperband tuner
...
...
* Support updating max trial number.
use `nnictl update --help` to learn more, or refer to [NNICTL Spec](Tutorial/Nnictl.md) for the full usage of NNICTL.
### API new features and updates
...
...
### Others
* UI refactoring; refer to the [WebUI doc](Tutorial/WebUI.md) for how to work with the new UI.
* Continuous Integration: NNI had switched to Azure pipelines
* [Known Issues in release 0.3.0](https://github.com/Microsoft/nni/labels/nni030knownissues).
...
...
### Major Features
* Support [OpenPAI](https://github.com/Microsoft/pai) Training Platform (See [here](TrainingService/PaiMode.md) for instructions about how to submit NNI job in pai mode)
* Support training services on pai mode. NNI trials will be scheduled to run on OpenPAI cluster
* NNI trial's output (including logs and model file) will be copied to OpenPAI HDFS for further debugging and checking
* Support [SMAC](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) tuner (See [here](Tuner/SmacTuner.md) for instructions about how to use SMAC tuner)
* [SMAC](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO to handle categorical parameters. The SMAC supported by NNI is a wrapper of [SMAC3](https://github.com/automl/SMAC3)
* Support NNI installation on [conda](https://conda.io/docs/index.html) and python virtual environment
```
apt-get install nfs-common
```
6. Install **NNI** following the install guide [here](../Tutorial/QuickStart.md).
## Prerequisite for Azure Kubernetes Service
...
...
## Design
Please refer to the design of the [Kubeflow training service](KubeflowMode.md); the FrameworkController training service pipeline is similar.
## Example
...
...
## How to run example
After you prepare a config file, you can run your experiment via nnictl. The way to start an experiment on FrameworkController is similar to Kubeflow; please refer to this [document](KubeflowMode.md) for more information.
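To launch, a minimal sketch, assuming your experiment configuration is saved as `exp_frameworkcontroller.yml` (a hypothetical file name):

```
nnictl create --config exp_frameworkcontroller.yml
```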
TrainingService is a module related to platform management and job scheduling in NNI. TrainingService is designed to be easily implemented: we define an abstract class TrainingService as the parent class of all kinds of training services, and users just need to inherit the parent class and complete their own child class if they want to implement a customized TrainingService.
## System architecture


The brief system architecture of NNI is shown in the picture. NNIManager is the core management module of the system, in charge of calling TrainingService to manage trial jobs and of the communication between different modules. Dispatcher is a message processing center responsible for message dispatch. TrainingService is a module that manages trial jobs; it communicates with the NNIManager module and has a different instance for each training platform. For the time being, NNI supports the [local platform](LocalMode.md), [remote platform](RemoteMachineMode.md), [PAI platform](PaiMode.md), [Kubeflow platform](KubeflowMode.md) and [FrameworkController platform](FrameworkControllerMode.md).
In this document, we introduce the brief design of TrainingService. If users want to add a new TrainingService instance, they only need to complete a child class that implements TrainingService, without needing to understand the code details of NNIManager, Dispatcher, or other modules.
...
...
The running architecture of TrialKeeper is shown as follows:


When users submit a trial job to a cloud platform, they should wrap their trial command with TrialKeeper and start a TrialKeeper process on the cloud platform. Note that TrialKeeper uses a RESTful server to communicate with TrainingService; users should start a RESTful server on the local machine to receive metrics sent from TrialKeeper. The source code of the RESTful server can be found in `nni/src/nni_manager/training_service/common/clusterJobRestServer.ts`.
## Reference
For more information about how to debug, please refer to [How to Debug](../Tutorial/HowToDebug.md).
For the guideline on how to contribute, please refer to [Contributing](../Tutorial/Contributing.md).
```
apt-get install nfs-common
```
7. Install **NNI** following the install guide [here](../Tutorial/QuickStart.md).
## Prerequisite for Azure Kubernetes Service
...
...
## Design


Kubeflow training service instantiates a Kubernetes rest client to interact with your K8s cluster's API server.
For each trial, we will upload all the files in your local codeDir path (configured in nni_config.yml), together with NNI-generated files like parameter.cfg, into a storage volume. Right now we support two kinds of storage volumes: [NFS](https://en.wikipedia.org/wiki/Network_File_System) and [Azure File Storage](https://azure.microsoft.com/en-us/services/storage/files/); you should configure the storage volume in the NNI config YAML file. After files are prepared, the Kubeflow training service will call the K8S REST API to create Kubeflow jobs ([tf-operator](https://github.com/kubeflow/tf-operator) jobs or [pytorch-operator](https://github.com/kubeflow/pytorch-operator) jobs) in K8S, and mount your storage volume into the job's pod. Output files of the Kubeflow job, like stdout, stderr, trial.log, or model files, will also be copied back to the storage volume. NNI will show the storage volume's URL for each trial in the WebUI, so that users can browse the log files and the job's output files.
This command will be filled in the YAML configuration file below. Please refer to [here](../TrialExample/Trials.md) for how to write your own trial.
**Prepare tuner**: NNI supports several popular automl algorithms, including Random Search, Tree of Parzen Estimators (TPE), Evolution algorithm, etc. Users can write their own tuner (refer to [here](../Tuner/CustomizeTuner.md)), but for simplicity, here we choose a tuner provided by NNI as below:
```yaml
tuner:
  builtinTunerName: TPE
  classArgs:
    optimize_mode: maximize
```
*builtinTunerName* is used to specify a tuner in NNI, *classArgs* are the arguments passed to the tuner (the spec of built-in tuners can be found [here](../Tuner/BuiltinTuner.md)), and *optimize_mode* indicates whether you want to maximize or minimize your trial's result.
**Prepare configure file**: Since you already know which trial code you are going to run and which tuner you are going to use, it is time to prepare the YAML configure file. NNI provides a demo configure file for each trial example; run `cat ~/nni/examples/trials/mnist-annotation/config.yml` to see it. Its content is basically as shown below:
...
...
```yaml
trial:
  ...
  gpuNum: 0
```
Here *useAnnotation* is true because this trial example uses our Python annotation (refer to [here](../Tutorial/AnnotationSpec.md) for details). For the trial, we should provide *trialCommand*, the command to run the trial, and *trialCodeDir*, where the trial code is located. The command will be executed in this directory. We should also specify how many GPUs a trial requires.
With all these steps done, we can run the experiment with the command shown below:
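For example, using the demo configuration file mentioned above:

```
nnictl create --config ~/nni/examples/trials/mnist-annotation/config.yml
```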
You can refer to [here](../Tutorial/Nnictl.md) for more usage of the *nnictl* command line tool.
## View experiment results
The experiment is now running. Apart from *nnictl*, NNI also provides a WebUI for you to view experiment progress, control your experiment, and use other appealing features.
NNI supports running an experiment on [OpenPAI](https://github.com/Microsoft/pai) (aka pai), called pai mode. Before starting to use NNI pai mode, you should have an account to access an [OpenPAI](https://github.com/Microsoft/pai) cluster. See [here](https://github.com/Microsoft/pai#how-to-deploy) if you don't have an OpenPAI account and want to deploy an OpenPAI cluster. In pai mode, your trial program will run in pai's container created by Docker.
## Setup environment
Install NNI following the install guide [here](../Tutorial/QuickStart.md).
## Run an experiment
Use `examples/trials/mnist-annotation` as an example. The NNI config YAML file's content is like:
...
...
Note: You should set `trainingServicePlatform: pai` in NNI config YAML file if you want to start experiment in pai mode.
Compared with [LocalMode](LocalMode.md) and [RemoteMachineMode](RemoteMachineMode.md), trial configuration in pai mode has these additional keys:
* cpuNum
* Required key. Should be a positive number based on your trial program's CPU requirement
Run `nnictl create` with your prepared config file to start the experiment in pai mode. NNI will create an OpenPAI job for each trial, and the job name format is something like `nni_exp_{experiment_id}_trial_{trial_id}`.
You can see jobs created by NNI in the OpenPAI cluster's web portal, like:


Notice: In pai mode, NNIManager will start a RESTful server listening on a port that is your NNI WebUI's port plus 1. For example, if your WebUI port is `8080`, the RESTful server will listen on `8081` to receive metrics from trial jobs running in pai's containers. So you should enable TCP port `8081` in your firewall rules to allow incoming traffic; an example is given below.
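For example, on a machine using the `ufw` firewall (an assumption about your environment; adapt to your own firewall tooling):

```
# allow incoming metric traffic on the NNIManager rest port (WebUI port + 1)
sudo ufw allow 8081/tcp
```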
Once a trial job is completed, you can go to NNI WebUI's overview page (like http://localhost:8080/oview) to check the trial's information.
Expand a trial information in trial list view, click the logPath link like:


And you will be redirected to HDFS web portal to browse the output files of that trial in HDFS:


You can see there are three files in the output folder: stderr, stdout, and trial.log.
...
...
3. Note that the version check feature only checks the first two digits of the version. For example, NNIManager v0.6.1 could use trialKeeper v0.6 or trialKeeper v0.6.2, but could not use trialKeeper v0.5.1 or trialKeeper v0.7. A sketch of this rule is shown below.
If you cannot run your experiment and want to know whether it is caused by the version check, check the WebUI; there will be an error message about the version check.
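The following sketch illustrates the first-two-digits rule; it is an illustration of the policy described above, not NNI's actual implementation.

```python
def version_compatible(a: str, b: str) -> bool:
    # compare only the first two digits of each version string,
    # e.g. "0.6.1" -> ["0", "6"]
    major_minor = lambda v: v.split('.')[:2]
    return major_minor(a) == major_minor(b)

assert version_compatible("0.6.1", "0.6")        # v0.6.1 works with v0.6
assert version_compatible("0.6.1", "0.6.2")      # and with v0.6.2
assert not version_compatible("0.6.1", "0.5.1")  # but not with v0.5.1
assert not version_compatible("0.6.1", "0.7")    # nor with v0.7
```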