[Hyperband][1] is a popular AutoML algorithm. The basic idea of Hyperband is to create several buckets, each holding `n` randomly generated hyperparameter configurations, with each configuration using `r` resources (e.g., number of epochs, number of mini-batches). After the `n` configurations finish, it chooses the top `n/eta` configurations and runs them with the increased resource budget `r*eta`. Finally, it chooses the best configuration it has found so far.
## 2. Implementation with full parallelism
First, this is an example of how to write an AutoML algorithm based on MsgDispatcherBase rather than Tuner and Assessor. Hyperband is implemented this way because it integrates the functions of both Tuner and Assessor; thus, we call it an Advisor.
Second, this implementation fully leverages Hyperband's internal parallelism. Specifically, the next bucket is not started strictly after the current bucket. Instead, it starts when there are available resources.
## 3. Usage
To use Hyperband, you should add the following spec in your experiment's YAML config file:
...
...
```yaml
advisor:
  ...
  optimize_mode: maximize
```
Note that once you use an Advisor, you are not allowed to add a Tuner or Assessor spec in the config file any more.
If you use Hyperband, among the hyperparameters (i.e., key-value pairs) received by a trial, there is one more key called `TRIAL_BUDGET` besides the hyperparameters defined by the user. **By using this `TRIAL_BUDGET`, the trial can control how long it runs**.
For `report_intermediate_result(metric)` and `report_final_result(metric)` in your trial code, **`metric` should be either a number or a dict which has a key `default` with a number as its value**. This number is the one you want to maximize or minimize, for example, accuracy or loss.
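As a minimal sketch of how a trial can use this (the `learning_rate` hyperparameter and the toy `train_one_epoch` function are illustrative, not part of NNI), the trial bounds its training loop with `TRIAL_BUDGET` and reports metrics in either accepted form:

```python
import random
import nni

def train_one_epoch(lr):
    # Stand-in for real training; returns a fake accuracy influenced by `lr`.
    return random.uniform(0.5, min(1.0, 0.5 + lr))

def main():
    params = nni.get_next_parameter()          # user-defined hyperparameters plus TRIAL_BUDGET
    budget = int(params.pop('TRIAL_BUDGET'))   # e.g., the number of epochs this trial may run
    acc = 0.0
    for _ in range(budget):
        acc = max(acc, train_one_epoch(params['learning_rate']))
        nni.report_intermediate_result(acc)    # a plain number is accepted
    nni.report_final_result({'default': acc})  # or a dict with a 'default' key

if __name__ == '__main__':
    main()
```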
...
...
Here is a concrete example of `R=81` and `eta=3`:
`s` means bucket, `n` means the number of configurations generated, and the corresponding `r` means how much budget these configurations run with. `i` means round; for example, bucket 4 has 5 rounds and bucket 3 has 4 rounds.
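A small sketch that reproduces this schedule from the formulas in the Hyperband [paper][1] (`R` is the maximum budget for a single configuration):

```python
import math

R, eta = 81, 3
s_max = round(math.log(R, eta))     # 4, so buckets s = 4, 3, 2, 1, 0
for s in range(s_max, -1, -1):
    n = math.ceil((s_max + 1) * eta ** s / (s + 1))   # initial number of configurations
    r = R / eta ** s                                  # initial budget per configuration
    print(f'bucket s={s}:')
    for i in range(s + 1):                            # successive-halving rounds
        n_i = math.floor(n / eta ** i)                # configurations kept in this round
        r_i = r * eta ** i                            # budget per configuration in this round
        print(f'  round i={i}: n={n_i}, r={r_i:g}')
```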
For information about writing trial code, please refer to the instructions under `examples/trials/mnist-hyperband/`.
## 4. Future improvements
The current implementation of Hyperband can be further improved by supporting a simple early stop algorithm since it's possible that not all the configurations in the top `n/eta` perform well. Any unpromising configurations should be stopped early.
In the current implementation, configurations are generated randomly, following the design in the [paper][1]. As a further improvement, configurations could be generated more intelligently by leveraging advanced algorithms.
# TPE, Random Search, Anneal Tuners on NNI
## TPE
The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then choose new hyperparameters to test based on this model. The TPE approach models P(x|y) and P(y), where x represents hyperparameters and y the associated evaluation metric. P(x|y) is modeled by transforming the generative process of hyperparameters, replacing the distributions of the configuration prior with non-parametric densities. This optimization approach is described in detail in [Algorithms for Hyper-Parameter Optimization](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf).
### Parallel TPE optimization
TPE is run asynchronously in order to make use of multiple compute nodes and to avoid wasting time waiting for trial evaluations to complete. However, the algorithm was originally designed for sequential optimization, so its performance degrades when it is used with large concurrency. We mitigate this case using the Constant Liar algorithm. For the principles behind this optimization, please refer to our [research blog](../CommunitySharings/ParallelizingTpeSearch.md).
### Usage
...
...
```yaml
tuner:
  ...
  constant_liar_type: min
```
**classArgs requirements:**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
* **parallel_optimize** (*bool, optional, default = False*) - If True, TPE will use the Constant Liar algorithm to optimize parallel hyperparameter tuning. Otherwise, TPE will not discriminate between sequential and parallel situations.
* **constant_liar_type** (*min or max or mean, optional, default = min*) - The type of constant liar to use, determined from the values y has taken at X. The three options correspond to min{Y}, max{Y}, and mean{Y}.
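To illustrate the idea (a sketch of the Constant Liar trick, not NNI's actual code): while some configurations are still running, the tuner pretends they already finished with a constant surrogate value, so the model is discouraged from re-suggesting the same region:

```python
def constant_liar(observed_y, liar_type='min'):
    """Surrogate value assigned to still-running trials (sketch)."""
    if liar_type == 'min':
        return min(observed_y)
    if liar_type == 'max':
        return max(observed_y)
    return sum(observed_y) / len(observed_y)         # 'mean'

# Fit the TPE model on real results plus "lies" for the pending points.
observed_x, observed_y = [0.01, 0.1], [0.92, 0.88]   # finished trials: lr -> accuracy
pending_x = [0.05]                                   # configurations still running
lie = constant_liar(observed_y, 'min')
train_x = observed_x + pending_x
train_y = observed_y + [lie] * len(pending_x)        # pending points look mediocre to the model
```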
## Random Search
[Random Search for Hyper-Parameter Optimization](http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf) shows that Random Search can be surprisingly effective despite its simplicity. We suggest using Random Search as a baseline when no knowledge about the prior distribution of hyper-parameters is available.
## Anneal
This simple annealing algorithm begins by sampling from the prior but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
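A toy sketch of this idea (assuming a one-dimensional uniform prior; the fixed decay constant is illustrative):

```python
import random

def anneal_sample(best_x, low, high, t, decay=0.9):
    # Early on, the window covers the whole prior; it shrinks around the best point over time.
    width = (high - low) * decay ** t
    return min(high, max(low, random.uniform(best_x - width, best_x + width)))

best_x, best_y = random.uniform(0.0, 1.0), float('-inf')
for t in range(20):
    x = anneal_sample(best_x, 0.0, 1.0, t)
    y = -(x - 0.3) ** 2              # toy objective to maximize (optimum at x = 0.3)
    if y > best_y:
        best_x, best_y = x, y
print(best_x, best_y)
```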
## Metis Tuner
[Metis](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/) offers several benefits when it comes to tuning parameters. While most tools only predict the optimal configuration, Metis gives you two outputs: (a) a prediction of the optimal configuration and (b) a suggestion for the next trial. No more guesswork!
While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to resample a particular hyper-parameter.
While most tools have problems of being exploitation-heavy, Metis' search strategy balances exploration, exploitation, and (optional) resampling.
Metis belongs to the class of sequential model-based optimization (SMBO) algorithms, and it is based on the Bayesian Optimization framework. To model the parameter-vs-performance space, Metis uses both a Gaussian Process and a GMM. Since each trial can impose a high time cost, Metis heavily trades cheap inference computations for expensive naive trials. At each iteration, Metis does two tasks:
* It finds the global optimal point in the Gaussian Process space. This point represents the optimal configuration.
* It identifies the next hyper-parameter candidate. This is achieved by inferring the potential information gain of exploration, exploitation, and resampling.
Note that the only acceptable types in the search space are `quniform`, `uniform`, `randint`, and numerical `choice`.
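For example, a search space restricted to these types might look like this (shown as a Python dict; the parameter names and ranges are illustrative):

```python
# A search space using only the types the Metis tuner accepts.
search_space = {
    "learning_rate": {"_type": "uniform",  "_value": [0.0001, 0.1]},
    "batch_size":    {"_type": "quniform", "_value": [16, 128, 16]},
    "num_layers":    {"_type": "randint",  "_value": [1, 5]},
    "hidden_size":   {"_type": "choice",   "_value": [64, 128, 256]},  # numerical choice only
}
```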
More details can be found in our [paper](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/).
[Autokeras](https://arxiv.org/abs/1806.10282) is a popular AutoML tool that uses Network Morphism. The basic idea of Autokeras is to use Bayesian regression to estimate the metric of a neural network architecture. Each time, it generates several child networks from the father networks. It then uses naïve Bayesian regression to estimate each child's metric value from the history of trained (network, metric) pairs. Next, it chooses the child with the best estimated performance and adds it to the training queue. Inspired by this work and referring to its [code](https://github.com/jhfjhfj1/autokeras), we implemented our Network Morphism method on the NNI platform.
If you want to know more about network morphism trial usage, please see the [Readme.md](https://github.com/Microsoft/nni/blob/master/examples/trials/network_morphism/README.md) of the trial for more detail.
## 2. Usage
...
...
In the training procedure, it generates a JSON file which represents a Network Graph. Users can call the "json\_to\_graph()" function to build a PyTorch or Keras model from this JSON file.
```python
import nni
...
...
net = build_graph_from_json(RCV_CONFIG)
nni.report_final_result(best_acc)
```
If you want to save and load the **best model**, the following methods are recommended.
The tuner has a lot of different files, functions, and classes. Here, we will give most of those files only a brief introduction:
- `networkmorphism_tuner.py` is a tuner which uses network morphism techniques.
- `bayesian.py` is a Bayesian method to estimate the metric of unseen models based on the models we have already searched.
- `graph.py` is the meta graph data structure. The class Graph represents the neural architecture graph of a model.
- Graph extracts the neural architecture graph from a model.
- Each node in the graph is an intermediate tensor between layers.
- Each layer is an edge in the graph.
- Notably, multiple edges may refer to the same layer.
- `graph_transformer.py` includes some graph transformers which widen, deepen, or add skip-connections to the graph.
- `layers.py` includes all the layers we use in our model.
- `layer_transformer.py` includes some layer transformers which widen, deepen, or add skip-connections to the layer.
- `nn.py` includes the class which generates the initial network.
- `metric.py` includes some metric classes, such as Accuracy and MSE.
- `utils.py` is an example of searching network architectures on the `cifar10` dataset using Keras.
## 4. The Network Representation JSON Example
Here is an example of the intermediate representation JSON file we defined, which is passed from the tuner to the trial in the architecture search procedure. Users can call the "json\_to\_graph()" function in the trial code to build a PyTorch or Keras model from this JSON file.
```json
{
...
...
}
```
You can consider the model to be a [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph). The definition of each model is a JSON object where:
- `input_shape` is a list of integers, which does not include the batch axis.
- `weighted` means whether the weights and biases in the neural network should be included in the graph.
- `operation_history` is a list saving all the network morphism operations.
- `layer_id_to_input_node_ids` is a dictionary mapping from layer identifiers to their input nodes' identifiers.
- `layer_id_to_output_node_ids` is a dictionary mapping from layer identifiers to their output nodes' identifiers.
- `adj_list` is a two-dimensional list; the adjacency list of the graph. The first dimension is identified by tensor identifiers. In each edge list, the elements are two-element tuples of (tensor identifier, layer identifier).
- `reverse_adj_list` is a reverse adjacency list in the same format as `adj_list`.
- `node_list` is a list of integers. The indices of the list are the identifiers.
- `layer_list` is a list of stub layers. The indices of the list are the identifiers.
  - For `StubConv (StubConv1d, StubConv2d, StubConv3d)`, the numbers that follow are its node input id (or id list), node output id, input_channel, filters, kernel_size, stride, and padding.
  - For `StubDense`, the numbers that follow are its node input id (or id list), node output id, input_units, and units.
  - For `StubBatchNormalization (StubBatchNormalization1d, StubBatchNormalization2d, StubBatchNormalization3d)`, the numbers that follow are its node input id (or id list), node output id, and number of features.
  - For `StubDropout (StubDropout1d, StubDropout2d, StubDropout3d)`, the numbers that follow are its node input id (or id list), node output id, and dropout rate.
  - For `StubPooling (StubPooling1d, StubPooling2d, StubPooling3d)`, the numbers that follow are its node input id (or id list), node output id, kernel_size, stride, and padding.
  - For all other layers, the numbers that follow are its node input id (or id list) and node output id.
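As a small sketch (the concrete field values below are illustrative, following the layout described above), you can load such a file and walk its adjacency list:

```python
import json

# Minimal hand-written graph: tensor 0 flows into tensor 1 through layer 0.
graph_json = '''
{
  "input_shape": [32, 32, 3],
  "node_list": [0, 1],
  "layer_list": [["StubReLU", 0, 1]],
  "adj_list": [[[1, 0]], []]
}
'''
graph = json.loads(graph_json)
for src, edges in enumerate(graph["adj_list"]):
    for dst, layer_id in edges:   # (tensor identifier, layer identifier) pairs
        print(f"tensor {src} -> tensor {dst} via layer {graph['layer_list'][layer_id]}")
```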
## 5. TODO
As a next step, we will change the API from a fixed network generator to a network generator with more available operators. In the future, we will also use ONNX instead of JSON as the intermediate representation spec.
This is a tuner geared for NNI's Neural Architecture Search (NAS) interface. It uses the [PPO algorithm](https://arxiv.org/abs/1707.06347). The implementation inherits the main logic of the ppo2 implementation from OpenAI [here](https://github.com/openai/baselines/tree/master/baselines/ppo2) and is adapted for the NAS scenario.
It can successfully tune the [mnist-nas example](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas), and has the following result:

We also tuned [the macro search space for image classification in the ENAS paper](https://github.com/microsoft/nni/tree/master/examples/trials/nas_cifar10) (with a limited number of epochs for each trial, i.e., 8 epochs), which is implemented using the NAS interface and tuned with PPOTuner. Figure 7 from the [ENAS paper](https://arxiv.org/pdf/1802.03268.pdf) shows what the search space looks like:

The figure above shows one chosen architecture; we use it to illustrate the search space. Each square is a layer whose operation can be chosen from 6 operations. Each dashed line is a skip connection, and each square layer can choose 0 or 1 skip connections to get the output of a previous layer. __Note that__ in the original macro search space each square layer could choose any number of skip connections, while in our implementation it is only allowed to choose 0 or 1.
The results are shown in the figure below (with the experiment config [here](https://github.com/microsoft/nni/blob/master/examples/trials/nas_cifar10/config_ppo.yml)):
[SMAC](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO in order to handle categorical parameters. The SMAC supported by NNI is a wrapper on [the SMAC3 GitHub repo](https://github.com/automl/SMAC3).
Note that SMAC on NNI only supports a subset of the types in the [search space spec](../Tutorial/SearchSpaceSpec.md): `choice`, `randint`, `uniform`, `loguniform`, and `quniform`.
[Docker](https://www.docker.com/) is a tool that makes it easier for users to deploy and run applications based on their own operating system by starting containers. Docker is not a virtual machine: it does not create a virtual operating system, but it allows different applications to use the same OS kernel and isolates applications from each other in containers.
Users can start NNI experiments using Docker. NNI also provides an official Docker image, [msranni/nni](https://hub.docker.com/r/msranni/nni), on Docker Hub.
## Using Docker on a local machine
### Step 1: Installation of Docker
Before you start using Docker for NNI experiments, you should install Docker on your local machine. [See here](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
### Step 2: Start a Docker container
If you have installed Docker on your local machine, you can start a Docker container instance to run NNI examples. Note that because NNI starts a web UI process in the container and listens on a port, you need to specify a port mapping between your host machine and the Docker container to give access to the web UI from outside the container. By visiting the host IP address and port, you will be redirected to the web UI process started in the Docker container and can view the web UI content.
For example, you can start a new Docker container with the following command:
```
docker run -i -t -p [hostPort]:[containerPort] [image]
```
`-i:` Start the container in interactive mode.
`-t:` Allocate a pseudo-terminal for the container.
`-p:` Port mapping, map host port to a container port.
For more information about Docker commands, please [refer to this](https://docs.docker.com/v17.09/edge/engine/reference/run/).
Note:
```
NNI only supports Ubuntu and MacOS systems in local mode for the moment; please use the correct Docker image type. If you want to use a GPU in a Docker container, please use nvidia-docker.
```
### Step 3: Run NNI in a Docker container
If you start a Docker container using NNI's official image `msranni/nni`, you can directly start NNI experiments by using the `nnictl` command. Our official image has NNI's running environment and basic Python and deep learning frameworks preinstalled.
If you start your own Docker image, you may need to install the NNI package first; please refer to [NNI installation](InstallationLinux.md).
If you want to run NNI's official examples, you may need to clone the NNI repo in GitHub using
```
git clone https://github.com/Microsoft/nni.git
```
then you can enter `nni/examples/trials` to start an experiment.
After you prepare NNI's environment, you can start a new experiment using the `nnictl` command. [See here](QuickStart.md).
## Using Docker on a remote platform
NNI supports starting experiments in [remoteTrainingService](../TrainingService/RemoteMachineMode.md), and running trial jobs on remote machines. As Docker can start an independent Ubuntu system as an SSH server, a Docker container can be used as the remote machine in NNI's remote mode.
### Step 1: Set up a Docker environment
You should install the Docker software on your remote machine first, please [refer to this](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
To make sure your Docker container can be connected by NNI experiments, you should build your own Docker image to set an SSH server or use images with an SSH configuration. If you want to use a Docker container as an SSH server, you should configure the SSH password login or private key login; please [refer to this](https://docs.docker.com/engine/examples/running_ssh_service/).
Note:
```
NNI's official image msranni/nni does not support SSH servers for the time being; you should build your own Docker image with an SSH configuration or use other images as a remote server.
```
### Step 2: Start a Docker container on a remote machine
An SSH server needs a port, so you need to expose a Docker SSH port to NNI as the connection port. For example, if you set your container's SSH port to **`A`**, you should map the container's port **`A`** to another port **`B`** on your remote host machine. NNI will connect to port **`B`** as the SSH port, and your host machine will forward the connection from port **`B`** to port **`A`**; NNI can then connect to your Docker container.
For example, you could start your Docker container using the following command:
```
docker run -dit -p [hostPort]:[containerPort] [image]
```
The `containerPort` is the SSH port used in your Docker container and the `hostPort` is your host machine's port exposed to NNI. You can set your NNI's config file to connect to `hostPort` and the connection will be transmitted to your Docker container.
For more information about Docker commands, please [refer to this](https://docs.docker.com/v17.09/edge/engine/reference/run/).
Note:
```
If you use your own Docker image as a remote server, please make sure that this image has a basic python environment and an NNI SDK runtime environment. If you want to use a GPU in a Docker container, please use nvidia-docker.
```
### Step 3: Run NNI experiments
You can set your config file to the remote platform and set the `machineList` configuration to connect to your Docker SSH server ([refer to this](../TrainingService/RemoteMachineMode.md)). Note that you should set the correct `port`, `username`, and `passWd` or `sshKeyPath` of your host machine.
`port:` The host machine's port, mapping to Docker's SSH port.
`username:` The username of the Docker container.
`passWd:` The password of the Docker container.
`sshKeyPath:` The path of the private key of the Docker container.
After configuring the config file, you can start an experiment ([refer to this](QuickStart.md)).
Installation on Linux and macOS follows the same instructions below.
### Use NNI in a Docker image
You can also install NNI in a Docker image. Please follow the instructions [here](https://github.com/Microsoft/nni/tree/master/deployment/docker/README.md) to build an NNI Docker image. The NNI Docker image can also be retrieved from Docker Hub through the command `docker pull msranni/nni:latest`.
## Verify installation
The following example is built on TensorFlow 1.x. Make sure **TensorFlow 1.x is used** when running it.
* Download the examples by cloning the source code.
* Open the `Web UI url` in your browser; you can view detailed information about the experiment and all the submitted trial jobs as shown below. [Here](../Tutorial/WebUI.md) are more Web UI pages.

...
...
### simplejson failed when installing NNI
Make sure a C++ 14.0 compiler is installed.
>building 'simplejson._speedups' extension error: [WinError 3] The system cannot find the path specified
### Trial failed with missing DLL in command line or PowerShell
This error is caused by missing LIBIFCOREMD.DLL and LIBMMD.DLL and a failure to install SciPy. Using Anaconda or Miniconda with Python (64-bit) can solve it.
>ImportError: DLL load failed
### Trial failed on webUI
Please check the trial log file stderr for more details.
If there is a stderr file, please check it. Two possible cases are:
* forgetting to change the trial command `python3` to `python` in each experiment YAML.
* forgetting to install experiment dependencies such as TensorFlow, Keras and so on.
### Fail to use BOHB on Windows
Make sure a C++ 14.0 compiler is installed, then try running `nnictl package install --name=BOHB` to install the dependencies.
### Unsupported tuners on Windows
SMAC is not supported currently; for the specific reason refer to this [GitHub issue](https://github.com/automl/SMAC3/issues/483).
### Use a Windows server as a remote worker
Currently, you can't.
Note:
* If an error like `Segmentation fault` is encountered, please refer to the [FAQ](FAQ.md)
We currently support Linux, macOS, and Windows. Ubuntu 16.04 or higher, macOS 10.14.1, and Windows 10.1809 are tested and supported. Simply run the following `pip install` in an environment that has `python >= 3.5`.
**Linux and macOS**
...
...
Note:
* For Linux and macOS, `--user` can be added if you want to install NNI in your home directory; this does not require any special privileges.
* If there is an error like `Segmentation fault`, please refer to the [FAQ](FAQ.md).
* For the `system requirements` of NNI, please refer to [Install NNI on Linux&Mac](InstallationLinux.md) or [Windows](InstallationWin.md).
## "Hello World" example on MNIST
NNI is a toolkit to help users run automated machine learning experiments. It can automatically do the cyclic process of getting hyperparameters, running trials, testing results, and tuning hyperparameters. Here, we'll show how to use NNI to help you find the optimal hyperparameters for an MNIST model.
Here is an example script to train a CNN on the MNIST dataset **without NNI**:
```python
def run_trial(params):
...
...
if __name__ == '__main__':
    run_trial(params)
```
Note: If you want to see the full implementation, please refer to [examples/trials/mnist-tfv1/mnist_before.py](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-tfv1/mnist_before.py).
The above code can only try one set of parameters at a time; if we want to tune the learning rate, we need to manually modify the hyperparameter and start the trial again and again.
NNI was born to help users with tuning jobs; the NNI working process is presented below:
```text
input: search space, trial code, config file
output: one optimal hyperparameter configuration
...
...
7: return hyperparameter value with best final result
```
If you want to use NNI to automatically train your model and find the optimal hyper-parameters, you need to make three changes to your code:
**Three steps to start an experiment**
**Step 1**: Give a `Search Space` file in JSON, including the `name` and the `distribution` (discrete-valued or continuous-valued) of all the hyperparameters you need to search.
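For example, a minimal search space might look like this (shown as the equivalent Python dict; the real file is JSON, and the names and ranges are illustrative):

```python
# Contents of search_space.json, expressed as a Python dict for illustration.
search_space = {
    "learning_rate": {"_type": "choice", "_value": [0.0001, 0.001, 0.01]},
    "batch_size":    {"_type": "choice", "_value": [16, 32, 64]},
}
```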
**Step 2**: Update your trial code to get the hyperparameter set from NNI and report the final result to NNI.
**Step 3**: Define a `config` file in YAML which declares the `path` to the search space and trial files. It also gives other information such as the tuning algorithm, max trial number, and max duration arguments.
```yaml
authorName: default
...
...
trial:
  ...
  gpuNum: 0
```
Note, **for Windows, you need to change the trial command from `python3` to `python`**.
All the code above is already prepared and stored in [examples/trials/mnist-tfv1/](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-tfv1).
**Linux and macOS**
Run the **config.yml** file from your command line to start an MNIST experiment.
**Windows**
Run the **config_windows.yml** file from your command line to start an MNIST experiment.
Note: if you're using NNI on Windows, you need to change `python3` to `python` in the config.yml file or use the config_windows.yml file to start the experiment.
Note: `nnictl` is a command line tool that can be used to control experiments, such as starting/stopping/resuming an experiment, starting/stopping NNIBoard, etc. Click [here](Nnictl.md) for more usage of `nnictl`.
Wait for the message `INFO: Successfully started experiment!` in the command line. This message indicates that your experiment has been successfully started. And this is what we expect to get:
```text
INFO: Starting restful server...
...
...
You can use these commands to get more information about the experiment.
If you prepare the `trial`, `search space`, and `config` according to the above steps and successfully create an NNI job, NNI will automatically tune the optimal hyper-parameters and run different hyper-parameter sets for each trial according to the requirements you set. You can clearly see its progress through the NNI WebUI.
## WebUI
After you start your experiment in NNI successfully, you can find a message in the command-line interface that tells you the `Web UI url` like this:
```text
The Web UI urls are: [Your IP]:8080
```
Open the `Web UI url` (here, `[Your IP]:8080`) in your browser; you can view detailed information about the experiment and all the submitted trial jobs as shown below. If you cannot open the WebUI link in your terminal, please refer to the [FAQ](FAQ.md).
### View summary page
Click the "Overview" tab.
Information about this experiment will be shown in the WebUI, including the experiment trial profile and search space message. NNI also supports downloading this information and the parameters through the **Download** button. You can download the experiment results anytime while the experiment is running or after it has finished.

The top 10 trials will be listed on the Overview page. You can browse all the trials on the "Trials Detail" page.

### View trials detail page
Click the "Default Metric" tab to see the point graph of all trials. Hover to see specific default metrics and search space messages.

Click the "Hyper Parameter" tab to see the parallel graph.
* You can select the percentage to see the top trials.
* Choose two axes to swap their positions.

Click the "Trial Duration" tab to see the bar graph.

Below is the status of all trials. Specifically:
* Trial detail: trial's id, duration, start time, end time, status, accuracy, and search space file.
* If you run on the OpenPAI platform, you can also see the hdfsLogPath.
* Supports searching for a specific trial.
* Kill: you can kill a job that has the `Running` status.
* On the Overview tab, you can see the experiment trial profile/search space and the performance of the top trials.


* If your experiment has many trials, you can change the refresh interval here.

* You can review and download the experiment results and nni-manager/dispatcher log files from the "View" button.

* You can click the exclamation point in the error box to see a log message if the experiment's status is an error.


* You can click "Feedback" to report any questions.
## View job default metric
...
...

## View Trial Intermediate Result Graph
Click the tab "Intermediate Result" to see the line graph.

The trial may have many intermediate results in the training process. In order to see the trend of some trials more clearly, we set a filtering function for the intermediate result graph.
You may find that these trials will get better or worse at one of the intermediate results; in other words, that intermediate result is important and relevant. To take a closer look at such a point, enter its corresponding X-value at #Intermediate. Then input the range of metrics at this intermediate result. In the picture below, we choose the No. 4 intermediate result and set the range of metrics to 0.8-1.

## View trials status
Click the tab "Trials Detail" to see the status of all trials. Specifically:
* Trial detail: trial's id, trial's duration, start time, end time, status, accuracy, and search space file.

* The button named "Add column" can select which column to show on the table. If you run an experiment whose final result is a dict, you can see other keys in the table. You can choose the column "Intermediate count" to watch the trial's progress.

* If you want to compare some trials, you can select them and then click "Compare" to see the results.
...
...
* Support for searching for a specific trial by its id, status, Trial No., and parameters.

* You can use the button named "Copy as python" to copy the trial's parameters.

* If you run on the OpenPAI or Kubeflow platform, you can also see the hdfsLog.

* Intermediate Result Graph: you can see the default and other keys in this graph by clicking the operation column button.
In order to save on computing resources, NNI supports an early stopping policy and has an interface called **Assessor** to do this job.
The Assessor receives the intermediate results from a trial and decides, by a specific algorithm, whether the trial should be killed. Once the trial meets the early stopping conditions (which means the Assessor is pessimistic about the final results), the Assessor kills the trial, and the status of the trial will be `EARLY_STOPPED`.
Here is an experimental result of MNIST after using the 'Curvefitting' Assessor in 'maximize' mode. You can see that Assessor successfully **early stopped** many trials with bad hyperparameters in advance. If you use Assessor, you may get better hyperparameters using the same computing resources.
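To illustrate the interface, here is a minimal sketch of a custom assessor built on NNI's `Assessor` base class (the fixed-threshold rule is only an example, not the Curvefitting algorithm):

```python
from nni.assessor import Assessor, AssessResult

class ThresholdAssessor(Assessor):
    """Early-stop any trial whose latest intermediate result falls below a threshold."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def assess_trial(self, trial_job_id, trial_history):
        # trial_history holds the intermediate results this trial has reported so far.
        if trial_history and trial_history[-1] < self.threshold:
            return AssessResult.Bad    # NNI early-stops the trial
        return AssessResult.Good
```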
NNI provides an easy way to adopt parameter tuning algorithms; we call them **Tuners**.
The Tuner receives metrics from the `Trial` to evaluate the performance of a specific parameter/architecture configuration. The Tuner then sends the next hyper-parameter or architecture configuration to the Trial.
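To illustrate the interface, here is a minimal sketch of a custom tuner built on NNI's `Tuner` base class (random sampling over a `choice`-only search space is just an example):

```python
import random
from nni.tuner import Tuner

class RandomChoiceTuner(Tuner):
    """Sample each hyper-parameter uniformly from a `choice`-only search space."""
    def update_search_space(self, search_space):
        self.search_space = search_space

    def generate_parameters(self, parameter_id, **kwargs):
        # Produce the next configuration for a trial to evaluate.
        return {name: random.choice(spec['_value'])
                for name, spec in self.search_space.items()}

    def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
        pass   # a smarter tuner would update its model with (parameters, value) here
```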