Unverified commit 346d49d5 authored by SparkSnail, committed by GitHub

Merge pull request #156 from Microsoft/master

merge master
parents d95c3513 58b259a5
...@@ -29,4 +29,6 @@ Please fill this for deployment related issues:
- is conda or virtualenv used?:
- is running in docker?:
**need to update document (yes/no)**:
**Anything else we need to know**:
...@@ -74,7 +74,7 @@ The tool dispatches and runs trial jobs generated by tuning algorithms to search
</td>
<td>
<ul>
  <li><a href="docs/en_US/LocalMode.md">Local Machine</a></li>
  <li><a href="docs/en_US/RemoteMachineMode.md">Remote Servers</a></li>
  <li><a href="docs/en_US/PAIMode.md">OpenPAI</a></li>
  <li><a href="docs/en_US/KubeflowMode.md">Kubeflow</a></li>
...@@ -183,7 +183,7 @@ You can use these commands to get more information about the experiment
* [Config an experiment](docs/en_US/ExperimentConfig.md)
* [How to use annotation](docs/en_US/Trials.md#nni-python-annotation)
## **Tutorials**
* [Run an experiment on local (with multiple GPUs)?](docs/en_US/LocalMode.md)
* [Run an experiment on multiple machines?](docs/en_US/RemoteMachineMode.md)
* [Run an experiment on OpenPAI?](docs/en_US/PAIMode.md)
* [Run an experiment on Kubeflow?](docs/en_US/KubeflowMode.md)
......
# NAS Algorithms Comparison
*Posted by Anonymous Author*
We train and compare NAS models, including AutoKeras, DARTS, ENAS, and NAO.
Their source code is linked below:
- Autokeras: [https://github.com/jhfjhfj1/autokeras](https://github.com/jhfjhfj1/autokeras)
- DARTS: [https://github.com/quark0/darts](https://github.com/quark0/darts)
- ENAS: [https://github.com/melodyguan/enas](https://github.com/melodyguan/enas)
- NAO: [https://github.com/renqianluo/NAO](https://github.com/renqianluo/NAO)
## Experiment Description
To avoid over-fitting to **CIFAR-10**, we also compare the models on five other datasets: Fashion-MNIST, CIFAR-100, OUI-Adience-Age, ImageNet-10-1 (a subset of ImageNet), and ImageNet-10-2 (another subset of ImageNet). ImageNet-10-1 and ImageNet-10-2 are each built by sampling a subset of ImageNet with 10 different labels (a sketch of this sampling appears after the table).
| Dataset | Training Size | Number of Classes | Descriptions |
| :----------------------------------------------------------- | ------------- | ---------------- | ------------------------------------------------------------ |
| [Fashion-MNIST](<https://github.com/zalandoresearch/fashion-mnist>) | 60,000 | 10 | T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag and ankle boot. |
| [CIFAR-10](<https://www.cs.toronto.edu/~kriz/cifar.html>) | 50,000 | 10 | Airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships and trucks. |
| [CIFAR-100](<https://www.cs.toronto.edu/~kriz/cifar.html>) | 50,000 | 100 | Similar to CIFAR-10 but with 100 classes and 600 images each. |
| [OUI-Adience-Age](<https://talhassner.github.io/home/projects/Adience/Adience-data.html>) | 26,580 | 8 | 8 age groups/labels (0-2, 4-6, 8-13, 15-20, 25-32, 38-43, 48-53, 60-). |
| [ImageNet-10-1](<http://www.image-net.org/>) | 9,750 | 10 | Coffee mug, computer keyboard, dining table, wardrobe, lawn mower, microphone, swing, sewing machine, odometer and gas pump. |
| [ImageNet-10-2](<http://www.image-net.org/>) | 9,750 | 10 | Drum, banjo, whistle, grand piano, violin, organ, acoustic guitar, trombone, flute and sax. |
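
For readers who want to build a similar split, the sketch below shows one way such a 10-class ImageNet subset could be assembled. It is not the authors' script, and the directory paths are hypothetical.

```python
# A minimal sketch (not the authors' script) of building a 10-class ImageNet
# subset such as ImageNet-10-1: pick 10 class folders and copy them over.
# The paths "imagenet/train" and "imagenet-10/train" are hypothetical.
import random
import shutil
from pathlib import Path

SOURCE = Path("imagenet/train")       # full ImageNet training set, one folder per synset
TARGET = Path("imagenet-10/train")    # destination for the 10-class subset

random.seed(0)
classes = random.sample(sorted(p.name for p in SOURCE.iterdir() if p.is_dir()), 10)

for wnid in classes:
    # copy every image of the selected class into the subset
    shutil.copytree(SOURCE / wnid, TARGET / wnid)
print(f"Sampled classes: {classes}")
```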
We do not change the default fine-tuning techniques in the source code; we only modify the input image shape and the number of output classes to match each task.
The search phase for every NAS method is limited to **two days**, as is the retraining phase. Average results are reported over **three runs**. Each of our evaluation machines has one NVIDIA Tesla P100 GPU, 112GB of RAM, and one 2.60GHz CPU (Intel E5-2690).
NAO requires too many computing resources, so we only use NAO-WS, which provides the pipeline script.
For AutoKeras, we used version 0.2.18 because it was the latest release when we started the experiment.
## NAS Performance
| Dataset | AutoKeras (%) | ENAS (macro) (%) | ENAS (micro) (%) | DARTS (%) | NAO-WS (%) |
| --------------- | :-----------: | :--------------: | :--------------: | :-------: | :--------: |
| Fashion-MNIST | 91.84 | 95.44 | 95.53 | **95.74** | 95.20 |
| CIFAR-10 | 75.78 | 95.68 | **96.16** | 94.23 | 95.64 |
| CIFAR-100 | 43.61 | 78.13 | 78.84 | **79.74** | 75.75 |
| OUI-Adience-Age | 63.20 | **80.34** | 78.55 | 76.83 | 72.96 |
| ImageNet-10-1 | 61.80 | 77.07 | 79.80 | **80.48** | 77.20 |
| ImageNet-10-2 | 37.20 | 58.13 | 56.47 | 60.53 | **61.20** |
Unfortunately, we could not reproduce all of the results reported in the papers.
The best or average results reported in the papers are:
| Dataset | AutoKeras (%) | ENAS (macro) (%) | ENAS (micro) (%) | DARTS (%) | NAO-WS (%) |
| --------- | ------------ | :--------------: | :--------------: | :------------: | :---------: |
| CIFAR-10 | 88.56 (best) | 96.13 (best) | 97.11 (best) | 97.17 (average) | 96.47 (best) |
AutoKeras performs relatively poorly across all datasets because of the randomness in its network morphism.
For ENAS, the macro search space gives good results on OUI-Adience-Age, while the micro search space gives good results on CIFAR-10.
DARTS performs well on some datasets but shows high variance on others: the difference among three runs can be up to 5.37% on OUI-Adience-Age and 4.36% on ImageNet-10-1.
NAO-WS gives good results on ImageNet-10-2 but can perform very poorly on OUI-Adience-Age.
## Reference
1. Jin, Haifeng, Qingquan Song, and Xia Hu. "Efficient Neural Architecture Search with Network Morphism." arXiv preprint arXiv:1806.10282 (2018).
2. Liu, Hanxiao, Karen Simonyan, and Yiming Yang. "DARTS: Differentiable Architecture Search." arXiv preprint arXiv:1806.09055 (2018).
3. Pham, Hieu, et al. "Efficient Neural Architecture Search via Parameters Sharing." International Conference on Machine Learning (2018): 4092-4101.
4. Luo, Renqian, et al. "Neural Architecture Optimization." Neural Information Processing Systems (2018): 7827-7838.
######################
Blog
######################
.. toctree::
   :maxdepth: 2

   NAS Comparison<NASComparison>
...@@ -15,7 +15,7 @@ Currently we support the following algorithms:
|[__SMAC__](#SMAC)|SMAC is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO in order to handle categorical parameters. The SMAC supported by NNI is a wrapper of the SMAC3 GitHub repo. Note that SMAC needs to be installed via the `nnictl package` command. [Reference Paper,](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) [Github Repo](https://github.com/automl/SMAC3)|
|[__Batch tuner__](#Batch)|Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After all the configurations finish, the experiment is done. Batch tuner only supports the type choice in the search space spec.|
|[__Grid Search__](#GridSearch)|Grid Search performs an exhaustive search through a manually specified subset of the hyperparameter space defined in the search space file. Note that the only acceptable types in the search space are choice, quniform, and qloguniform. The number q in quniform and qloguniform has a special meaning (different from the search space spec): it is the number of values that will be sampled evenly from the range between low and high.|
|[__Hyperband__](#Hyperband)|Hyperband tries to use limited resources to explore as many configurations as possible and find the promising ones for the final result. The basic idea is to generate many configurations, run each of them with a small trial budget to find the promising ones, and then train the promising ones further to select the best. [Reference Paper](https://arxiv.org/pdf/1603.06560.pdf)|
|[__Network Morphism__](#NetworkMorphism)|Network Morphism provides functions to automatically search for the architecture of deep learning models. Every child network inherits the knowledge from its parent network and morphs into diverse types of networks, including changes of depth, width, and skip-connections. Next, it estimates the value of a child network using historic architecture and metric pairs. Then it selects the most promising one to train. [Reference Paper](https://arxiv.org/abs/1806.10282)|
|[__Metis Tuner__](#MetisTuner)|Metis offers the following benefits when it comes to tuning parameters: while most tools only predict the optimal configuration, Metis gives you two outputs, (a) the current prediction of the optimal configuration and (b) a suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you whether you need to re-sample a particular hyper-parameter. [Reference Paper](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/)|
...@@ -39,7 +39,7 @@ TPE, as a black-box optimization, can be used in various scenarios and shows goo
**Requirement of classArg**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
**Usage example:**
...@@ -65,7 +65,7 @@ Random search is suggested when each trial does not take too long (e.g., each tr
**Requirement of classArg:**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
**Usage example**
...@@ -89,7 +89,7 @@ Anneal is suggested when each trial does not take too long, and you have enough
**Requirement of classArg**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
**Usage example**
...@@ -145,7 +145,7 @@ Similar to TPE, SMAC is also a black-box tuner which can be tried in various sce
**Requirement of classArg**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
**Usage example**
...@@ -232,8 +232,8 @@ It is suggested when you have limited computation resource but have relatively l
**Requirement of classArg**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
* **R** (*int, optional, default = 60*) - the maximum budget (could be the number of mini-batches or epochs) that can be allocated to a trial. Each trial should use TRIAL_BUDGET to control how long it runs.
* **eta** (*int, optional, default = 3*) - `(eta-1)/eta` is the proportion of discarded trials
**Usage example**
...@@ -266,7 +266,7 @@ It is suggested that you want to apply deep learning methods to your task (your
**Requirement of classArg**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
* **task** (*('cv'), optional, default = 'cv'*) - The domain of the experiment. For now, this tuner only supports the computer vision (cv) domain.
* **input_width** (*int, optional, default = 32*) - input image width
* **input_channel** (*int, optional, default = 3*) - input image channel
...@@ -306,7 +306,7 @@ Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long
**Requirement of classArg**
* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
**Usage example**
......
...@@ -57,7 +57,7 @@ Below are the minimum system requirements for NNI on macOS. Due to potential pro
* [Use NNIBoard](WebUI.md)
* [Define search space](SearchSpaceSpec.md)
* [Config an experiment](ExperimentConfig.md)
* [How to run an experiment on local (with multiple GPUs)?](LocalMode.md)
* [How to run an experiment on multiple machines?](RemoteMachineMode.md)
* [How to run an experiment on OpenPAI?](PAIMode.md)
* [How to run an experiment on Kubernetes through Kubeflow?](KubeflowMode.md)
......
...@@ -25,7 +25,8 @@ nnictl support commands:
### Manage an experiment
<a name="create"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `nnictl create`
* Description
...@@ -47,13 +48,34 @@ nnictl support commands:
|--port, -p|False| |the port of restful server|
|--debug, -d|False||set debug mode|
* Examples
> create a new experiment with the default port: 8080
```bash
nnictl create --config nni/examples/trials/mnist/config.yml
```
> create a new experiment with specified port 8088
```bash
nnictl create --config nni/examples/trials/mnist/config.yml --port 8088
```
> create a new experiment with specified port 8088 and debug mode
```bash
nnictl create --config nni/examples/trials/mnist/config.yml --port 8088 --debug
```
Note:
```
Debug mode will disable the version check function in TrialKeeper.
```
<a name="resume"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `nnictl resume`
* Description
...@@ -69,12 +91,21 @@ nnictl support commands:
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| True| |The id of the experiment you want to resume|
|--port, -p| False| |Rest port of the experiment you want to resume|
|--debug, -d|False||set debug mode|
* Example
> resume an experiment with specified port 8088
```bash
nnictl resume [experiment_id] --port 8088
```
<a name="stop"></a> <a name="stop"></a>
* __nnictl stop__
![](https://placehold.it/15/1589F0/000000?text=+) `nnictl stop`
* Description * Description
...@@ -85,18 +116,33 @@ nnictl support commands: ...@@ -85,18 +116,33 @@ nnictl support commands:
```bash ```bash
nnictl stop [id] nnictl stop [id]
``` ```
* Details & Examples

1. If there is no id specified and there is an experiment running, stop the running experiment; otherwise, print an error message.
```bash
nnictl stop
```
2. If there is an id specified and the id matches a running experiment, nnictl will stop the corresponding experiment; otherwise, it will print an error message.
```bash
nnictl stop [experiment_id]
```
3. Users can use 'nnictl stop all' to stop all experiments.
```bash
nnictl stop all
```
4. If the id ends with *, nnictl will stop all experiments whose ids match the pattern.
5. If the id does not exist but matches the prefix of an experiment id, nnictl will stop the matched experiment.
6. If the id does not exist but matches the prefixes of multiple experiment ids, nnictl will print the id information.
<a name="update"></a> <a name="update"></a>
* __nnictl update__
![](https://placehold.it/15/1589F0/000000?text=+) `nnictl update`
* __nnictl update searchspace__ * __nnictl update searchspace__
* Description * Description
...@@ -111,10 +157,18 @@ nnictl support commands: ...@@ -111,10 +157,18 @@ nnictl support commands:
* Options * Options
|Name, shorthand|Required|Default|Description| |Name, shorthand|Required|Default|Description|
|------|------|------ |------| |------|------|------ |------|
|id| False| |ID of the experiment you want to set| |id| False| |ID of the experiment you want to set|
|--filename, -f| True| |the file storing your new search space| |--filename, -f| True| |the file storing your new search space|
* Example
> update the experiment's search space with the file 'examples/trials/mnist/search_space.json'
```bash
nnictl update searchspace [experiment_id] --filename examples/trials/mnist/search_space.json
```
* __nnictl update concurrency__
* Description
...@@ -129,10 +183,18 @@ nnictl support commands:
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
|--value, -v| True| |the number of allowed concurrent trials|
* Example
> update experiment's concurrency
```bash
nnictl update concurrency [experiment_id] --value [concurrency_number]
```
* __nnictl update duration__
...@@ -145,12 +207,21 @@ nnictl support commands:
```bash
nnictl update duration [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
|--value, -v| True| |the new experiment duration as NUMBER followed by an optional SUFFIX: 's' for seconds (the default), 'm' for minutes, 'h' for hours, or 'd' for days|
* Example
> update experiment's duration
```bash
nnictl update duration [experiment_id] --value [duration]
```
* __nnictl update trialnum__
* Description
...@@ -165,13 +236,22 @@ nnictl support commands:
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
|--value, -v| True| |the new value of maxTrialNum you want to set|
* Example
> update the experiment's maximum trial number
```bash
nnictl update trialnum [experiment_id] --value [trial_num]
```
<a name="trial"></a> <a name="trial"></a>
* __nnictl trial__
![](https://placehold.it/15/1589F0/000000?text=+) `nnictl trial`
* __nnictl trial ls__ * __nnictl trial ls__
...@@ -187,9 +267,9 @@ nnictl support commands: ...@@ -187,9 +267,9 @@ nnictl support commands:
* Options * Options
|Name, shorthand|Required|Default|Description| |Name, shorthand|Required|Default|Description|
|------|------|------ |------| |------|------|------ |------|
|id| False| |ID of the experiment you want to set| |id| False| |ID of the experiment you want to set|
* __nnictl trial kill__ * __nnictl trial kill__
...@@ -203,15 +283,40 @@ nnictl support commands: ...@@ -203,15 +283,40 @@ nnictl support commands:
nnictl trial kill [OPTIONS] nnictl trial kill [OPTIONS]
``` ```
* Options * Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the trial to be killed|
|--experiment, -E| True| |Experiment id of the trial|
* Example
> kill a trial job
```bash
nnictl trial kill [trial_id] --experiment [experiment_id]
```
* __nnictl trial export__
* Description
You can use this command to export the reward & hyper-parameters of trial jobs to a csv file (a short sketch of reading the exported file follows the options table).
* Usage
```bash
nnictl trial export [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description| |Name, shorthand|Required|Default|Description|
|------|------|------ |------| |------|------|------ |------|
|id| False| |ID of the trial to be killed| |id| False| |ID of the experiment |
|--experiment, -E| True| |Experiment id of the trial| |--file| True| |File path of the output csv file |
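
As a quick sanity check of the exported file, the sketch below reads it back with Python's csv module. The column names depend on your search space and reported metrics, so the "reward" field used here is only an assumption.

```python
# Sketch: inspect the csv written by `nnictl trial export ... --file trials.csv`.
# Column names depend on your experiment; "reward" below is an assumed metric column.
import csv

with open("trials.csv", newline="") as f:
    rows = list(csv.DictReader(f))

print("columns:", list(rows[0].keys()))             # hyper-parameter and reward columns
best = max(rows, key=lambda r: float(r["reward"]))  # assumes a numeric "reward" column
print("best trial:", best)
```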
<a name="top"></a> <a name="top"></a>
* __nnictl top__
![](https://placehold.it/15/1589F0/000000?text=+) `nnictl top`
* Description * Description
...@@ -231,7 +336,8 @@ nnictl support commands: ...@@ -231,7 +336,8 @@ nnictl support commands:
|--time, -t| False| |The interval to update the experiment status, the unit of time is second, and the default value is 3 second.| |--time, -t| False| |The interval to update the experiment status, the unit of time is second, and the default value is 3 second.|
<a name="experiment"></a> <a name="experiment"></a>
### Manage experiment information
![](https://placehold.it/15/1589F0/000000?text=+) `Manage experiment information`
* __nnictl experiment show__ * __nnictl experiment show__
...@@ -282,7 +388,8 @@ nnictl support commands: ...@@ -282,7 +388,8 @@ nnictl support commands:
``` ```
<a name="config"></a> <a name="config"></a>
* __nnictl config show__
![](https://placehold.it/15/1589F0/000000?text=+) `nnictl config show`
* Description * Description
...@@ -295,7 +402,8 @@ nnictl support commands: ...@@ -295,7 +402,8 @@ nnictl support commands:
``` ```
<a name="log"></a> <a name="log"></a>
### Manage log
![](https://placehold.it/15/1589F0/000000?text=+) `Manage log`
* __nnictl log stdout__ * __nnictl log stdout__
...@@ -318,6 +426,14 @@ nnictl support commands: ...@@ -318,6 +426,14 @@ nnictl support commands:
|--tail, -t| False| |show tail lines of stdout| |--tail, -t| False| |show tail lines of stdout|
|--path, -p| False| |show the path of stdout file| |--path, -p| False| |show the path of stdout file|
* Example
> Show the tail of stdout log content
```bash
nnictl log stdout [experiment_id] --tail [lines_number]
```
* __nnictl log stderr__
* Description
...@@ -358,12 +474,14 @@ nnictl support commands:
|--experiment, -E| False| |Experiment ID of the trial, required when id is not empty.|
<a name="webui"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `Manage webui`
* __nnictl webui url__
<a name="tensorboard"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `Manage tensorboard`
* __nnictl tensorboard start__
...@@ -411,7 +529,8 @@ nnictl support commands:
|id| False| |ID of the experiment you want to set|
<a name="package"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `Manage package`
* __nnictl package install__
* Description
...@@ -430,6 +549,14 @@ nnictl support commands:
|------|------|------ |------|
|--name| True| |The name of package to be installed|
* Example
> Install the package needed by the SMAC tuner
```bash
nnictl package install --name=SMAC
```
* __nnictl package show__
* Description
...@@ -443,7 +570,8 @@ nnictl support commands:
```
<a name="version"></a>
![](https://placehold.it/15/1589F0/000000?text=+) `Check NNI version`
* __nnictl --version__
......
...@@ -53,7 +53,7 @@ More details about how to run an experiment, please refer to [Get Started](Quick
* [How to customize your own tuner?](Customize_Tuner.md)
* [What are assessors supported by NNI?](Builtin_Assessors.md)
* [How to customize your own assessor?](Customize_Assessor.md)
* [How to run an experiment on local?](LocalMode.md)
* [How to run an experiment on multiple machines?](RemoteMachineMode.md)
* [How to run an experiment on OpenPAI?](PAIMode.md)
* [Examples](mnist_examples.md)
\ No newline at end of file
...@@ -225,7 +225,7 @@ Below is the status of all the trials. Specifically:
* [Try different Assessors](Builtin_Assessors.md)
* [How to use command line tool nnictl](NNICTLDOC.md)
* [How to write a trial](Trials.md)
* [How to run an experiment on local (with multiple GPUs)?](LocalMode.md)
* [How to run an experiment on multiple machines?](RemoteMachineMode.md)
* [How to run an experiment on OpenPAI?](PAIMode.md)
* [How to run an experiment on Kubernetes through Kubeflow?](KubeflowMode.md)
......
...@@ -17,7 +17,7 @@ advisor:
  #choice: Hyperband
  builtinAdvisorName: Hyperband
  classArgs:
    #R: the maximum trial budget
    R: 100
    #eta: proportion of discarded trials
    eta: 3
...@@ -26,13 +26,13 @@ advisor:
```
Note that once you use an advisor, you cannot add a tuner or assessor spec to the config file any more.
If you use Hyperband, among the hyperparameters (i.e., key-value pairs) received by a trial, there is one more key called `TRIAL_BUDGET` besides the hyperparameters defined by the user. **By using this `TRIAL_BUDGET`, the trial can control how long it runs**.
For `report_intermediate_result(metric)` and `report_final_result(metric)` in your trial code, **`metric` should be either a number or a dict which has a key `default` with a number as its value**. This number is the one you want to maximize or minimize, for example, accuracy or loss.
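As a concrete illustration, here is a minimal trial sketch (assuming the NNI Python SDK is importable as `nni`); `train_one_epoch` and `learning_rate` are hypothetical stand-ins for your own training code and search space.

```python
import nni

def train_one_epoch(lr):
    """Placeholder for one epoch of your own training loop (hypothetical)."""
    return 0.0  # return the metric you care about, e.g. validation accuracy

params = nni.get_next_parameter()             # hyperparameters chosen by the Hyperband advisor
budget = int(params['TRIAL_BUDGET'])          # extra key injected by Hyperband
lr = params.get('learning_rate', 0.01)        # a user-defined hyperparameter (hypothetical)

accuracy = 0.0
for epoch in range(budget):                   # use the budget to bound how long the trial runs
    accuracy = train_one_epoch(lr)
    nni.report_intermediate_result(accuracy)  # a number, or a dict with a 'default' key

nni.report_final_result(accuracy)             # final metric that the advisor maximizes/minimizes
```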
`R` and `eta` are the parameters of Hyperband that you can change. `R` means the maximum trial budget that can be allocated to a configuration; here, the trial budget could mean the number of epochs or mini-batches. The trial should use this `TRIAL_BUDGET` to control how long it runs. Refer to the example under `examples/trials/mnist-advisor/` for details.
`eta` means that `n/eta` of the `n` configurations will survive and rerun with a larger budget.
Here is a concrete example of `R=81` and `eta=3`:
...@@ -45,7 +45,7 @@ Here is a concrete example of `R=81` and `eta=3`:
|3 |3 27 |1 81 | | | |
|4 |1 81 | | | | |
`s` means the bucket, `n` means the number of configurations that are generated, and the corresponding `r` means how much budget these configurations run with. `i` means the round; for example, bucket 4 has 5 rounds and bucket 3 has 4 rounds.
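The rounds of a bucket can be generated mechanically. Below is a small sketch (not NNI's implementation) of how one bucket of successive halving unfolds: each round keeps roughly `n/eta` configurations and gives each survivor `eta` times more budget.

```python
# Sketch (not NNI's code): successive halving within one Hyperband bucket.
def bucket_schedule(n, r, R, eta=3):
    """Return (num_configs, budget_per_config) for each round of one bucket."""
    rounds = []
    while n >= 1 and r <= R:
        rounds.append((n, r))
        if n // eta < 1:
            break
        n, r = n // eta, r * eta
    return rounds

# The bucket that starts with 81 configurations at budget 1 (R=81, eta=3):
print(bucket_schedule(81, 1, R=81))  # [(81, 1), (27, 3), (9, 9), (3, 27), (1, 81)]
```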
For how to write the trial code, please refer to the instructions under `examples/trials/mnist-hyperband/`.
......
...@@ -18,4 +18,5 @@ Contents
   Reference
   FAQ
   Contribution
   Changelog<RELEASE>
   Blog<Blog/index>
...@@ -2,7 +2,7 @@ Introduction to NNI Training Services
=====================================
.. toctree::
   Local<LocalMode>
   Remote<RemoteMachineMode>
   OpenPAI<PAIMode>
   Kubeflow<KubeflowMode>
......
...@@ -73,8 +73,6 @@ tuner:
# config.yml
tuner:
  builtinTunerName: Random
```
<br />
...@@ -115,10 +113,6 @@ tuner:
This algorithm needs relatively more computation resources. It requires a very large initial population to avoid falling into a local optimum. It fits well if your trials are short or you use an assessor. It is also recommended when your trial code supports weight transfer, i.e., each trial inherits the already-converged weights from the previous round, which speeds up training considerably.
**Usage example:**
```yaml
...@@ -239,7 +233,7 @@ tuner:
**Parameters**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will suggest parameter combinations that are likely to produce larger values. If 'minimize', the tuner will suggest combinations that are likely to produce smaller values.
* **R** (*int, optional, default = 60*) - the maximum STEPS (could be the number of mini-batches or epochs) that can be allocated to a trial. Each trial should use STEPS to control how long it runs.
* **eta** (*int, optional, default = 3*) - `(eta-1)/eta` is the proportion of discarded trials.
**Usage example:**
......
...@@ -4,5 +4,4 @@
.. toctree::
   Set up the development environment<SetupNNIDeveloperEnvironment>
   Contribution guide<CONTRIBUTING>
   How to debug<HowToDebug>
...@@ -157,7 +157,7 @@ machineList:
- Description
  NNI checks the versions of the NNIManager and trialKeeper processes in remote, pai, and Kubernetes modes. If you need to disable the version check, debug should be set to true.
- **maxTrialNum**
......
...@@ -102,4 +102,8 @@ The trial configuration in frameworkcontroller mode uses the following keys:
## How to run the example
After preparing the configuration file, start the experiment by running nnictl. The way to start an experiment on FrameworkController is similar to Kubeflow; refer to the [guide](./KubeflowMode.md) for more information.
\ No newline at end of file
## Version check
Since version 0.6, NNI supports version checking; see [here](PAIMode.md) for details.
\ No newline at end of file
...@@ -202,4 +202,8 @@ The configuration in Kubeflow mode has the following keys:
After a trial job completes, you can view its information on the overview page of the NNI web UI (e.g., http://localhost:8080/oview).
## Version check
Since version 0.6, NNI supports version checking; see [here](PAIMode.md) for details.
If you encounter any problems when using Kubeflow mode, please create an issue on [NNI GitHub](https://github.com/Microsoft/nni).
\ No newline at end of file
...@@ -50,7 +50,7 @@ Commands supported by nnictl:
Note:
Debug mode will disable the version check function in TrialKeeper.
<a name="resume"></a>
...@@ -462,7 +462,7 @@ Commands supported by nnictl:
<a name="version"></a>
### NNI version check
* **nnictl --version**
......
...@@ -58,7 +58,7 @@ paiConfig:
  * Optional. Specifies the HDFS data directory from which the trial downloads data. The format should be hdfs://{your HDFS host}:9000/{your data directory}
* outputDir
  * Optional. Specifies the trial's HDFS output directory. After the trial finishes (either succeeds or fails), the trial's stdout and stderr are copied to this directory by NNI automatically. The format should be hdfs://{your HDFS host}:9000/{your output directory}
* virtualCluster
  * Optional. Sets the virtualCluster of OpenPAI. If this parameter is not set, the default virtual cluster will be used.
* shmMB
  * Optional. Sets the shmMB of OpenPAI, i.e., the shared memory of the Docker container.
...@@ -82,4 +82,16 @@ paiConfig:
If you want to store other trial outputs, such as model files, in HDFS, use `NNI_OUTPUT_DIR` in your trial code to save the files yourself; the NNI SDK will copy everything in `NNI_OUTPUT_DIR` from the trial's container to HDFS.
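A minimal sketch of this, assuming `NNI_OUTPUT_DIR` is exposed to the trial as an environment variable; the file name `results.txt` is hypothetical.

```python
# Sketch: write trial outputs under NNI_OUTPUT_DIR so NNI copies them to HDFS.
import os

out_dir = os.environ["NNI_OUTPUT_DIR"]  # set by NNI inside the trial container
with open(os.path.join(out_dir, "results.txt"), "w") as f:  # "results.txt" is a hypothetical name
    f.write("model metrics or other artifacts\n")
```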
If you encounter any problems when using pai mode, please create an issue on [NNI GitHub](https://github.com/Microsoft/nni).
\ No newline at end of file
## Version check
Since version 0.6, NNI supports version checking to make sure the NNIManager and trialKeeper versions match and to avoid compatibility errors.
Check policy:
1. NNIManager before 0.6 can run with any version of trialKeeper; trialKeeper is backward compatible.
2. Starting from NNIManager 0.6, the trialKeeper version must match exactly. For example, if NNIManager is version 0.6, trialKeeper must also be version 0.6.
3. Note that only the first two fields of the version number are checked. For example, NNIManager 0.6.1 can be used with trialKeeper 0.6 or 0.6.2, but not with trialKeeper 0.5.1 or 0.7 (see the sketch below).
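
A tiny sketch of the rule in item 3 (not NNI's actual implementation): only the first two version fields must match.

```python
# Sketch (not NNI's code): compatibility rule comparing only the first two version fields.
def versions_compatible(a: str, b: str) -> bool:
    return a.split(".")[:2] == b.split(".")[:2]

print(versions_compatible("0.6.1", "0.6"))    # True  -> compatible
print(versions_compatible("0.6.1", "0.6.2"))  # True  -> compatible
print(versions_compatible("0.6.1", "0.7"))    # False -> incompatible
```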
If the experiment fails to run and you cannot tell whether the failure is caused by a version mismatch, check the web UI for related error messages.
![](../img/version_check.png)
\ No newline at end of file