Unverified commit 45c6508e authored by Chi Song, committed by GitHub

fix format of doc, change nni to NNI, yaml to yml. (#660)

fix indents of doc,
change nni to NNI
yaml to yml(file) and YAML(doc)
parent bc9eab33
Dockerfile
===
## 1.Description
This is the Dockerfile of nni project. It includes serveral popular deep learning frameworks and NNI. It is tested on `Ubuntu 16.04 LTS`:
This is the Dockerfile of NNI project. It includes serveral popular deep learning frameworks and NNI. It is tested on `Ubuntu 16.04 LTS`:
```
CUDA 9.0, CuDNN 7.0
......
@@ -8,7 +8,8 @@ Currently we recommend sharing weights through NFS (Network File System), which
### Weight Sharing through NFS file
With the NFS setup (see below), trial code can share model weight through loading & saving files. Here we recommend that user feed the tuner with the storage path:
```yaml
```yml
tuner:
codeDir: path/to/customer_tuner
classFileName: customer_tuner.py
@@ -17,6 +18,7 @@ tuner:
...
save_dir_root: /nfs/storage/path/
```
And let tuner decide where to save & load weights and feed the paths to trials through `nni.get_next_parameters()`:
<img src="https://user-images.githubusercontent.com/23273522/51817667-93ebf080-2306-11e9-8395-b18b322062bc.png" alt="drawing" width="700"/>
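The NFS scheme above leaves it to the tuner to decide where each trial saves and loads weights. As an illustration only, here is a minimal Python sketch of that bookkeeping. It assumes one sub-directory per trial under `save_dir_root` and an optional parent trial whose weights the child inherits. The helper name `assign_weight_paths` and the `save_dir`/`load_dir` keys are hypothetical and are not part of NNI's API.

```python
import posixpath
from typing import Dict, Optional


def assign_weight_paths(save_dir_root: str, parameter_id: int,
                        parent_id: Optional[int] = None) -> Dict[str, Optional[str]]:
    """Hypothetical helper: choose where a trial saves and loads weights.

    Each trial writes to its own sub-directory of the shared NFS root. A
    child trial is pointed at the directory its parent saved to, if any.
    """
    # NFS paths are POSIX paths, so use posixpath even on other platforms.
    save_dir = posixpath.join(save_dir_root, str(parameter_id))
    load_dir = None
    if parent_id is not None:
        load_dir = posixpath.join(save_dir_root, str(parent_id))
    return {"save_dir": save_dir, "load_dir": load_dir}


# A tuner could merge these paths into the hyper-parameters it sends to a trial.
params = {"learning_rate": 0.01}
params.update(assign_weight_paths("/nfs/storage/path/", parameter_id=7, parent_id=3))
print(params)
```

Keeping path assignment in the tuner means trials never have to agree on a naming scheme among themselves. Each trial simply reads the paths from the parameters it receives.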
......
@@ -32,7 +32,7 @@ It is applicable in a wide range of performance curves, thus, can be used in var
**Usage example:**
```yaml
```yml
# config.yml
assessor:
builtinAssessorName: Medianstop
@@ -62,7 +62,7 @@ It is applicable in a wide range of performance curves, thus, can be used in var
**Usage example:**
```yaml
```yml
# config.yml
assessor:
builtinAssessorName: Curvefitting
......
@@ -39,7 +39,7 @@ TPE, as a black-box optimization, can be used in various scenarios and shows goo
**Usage example:**
```yaml
```yml
# config.yml
tuner:
builtinTunerName: TPE
@@ -65,7 +65,7 @@ Random search is suggested when each trial does not take too long (e.g., each tr
**Usage example**
```yaml
```yml
# config.yml
tuner:
builtinTunerName: Random
@@ -91,7 +91,7 @@ Anneal is suggested when each trial does not take too long, and you have enough
**Usage example**
```yaml
```yml
# config.yml
tuner:
builtinTunerName: Anneal
@@ -117,7 +117,7 @@ Its requirement of computation resource is relatively high. Specifically, it req
**Usage example**
```yaml
```yml
# config.yml
tuner:
builtinTunerName: Evolution
@@ -143,7 +143,7 @@ Similar to TPE, SMAC is also a black-box tuner which can be tried in various sce
**Usage example**
```yaml
```yml
# config.yml
tuner:
builtinTunerName: SMAC
@@ -165,7 +165,7 @@ If the configurations you want to try have been decided, you can list them in se
**Usage example**
```yaml
```yml
# config.yml
tuner:
builtinTunerName: BatchTuner
@@ -206,7 +206,7 @@ It is suggested when search space is small, it is feasible to exhaustively sweep
**Usage example**
```yaml
```yml
# config.yml
tuner:
builtinTunerName: GridSearch
@@ -232,7 +232,7 @@ It is suggested when you have limited computation resource but have relatively l
**Usage example**
```yaml
```yml
# config.yml
advisor:
builtinAdvisorName: Hyperband
@@ -268,7 +268,7 @@ It is suggested that you want to apply deep learning methods to your task (your
**Usage example**
```yaml
```yml
# config.yml
tuner:
builtinTunerName: NetworkMorphism
@@ -304,7 +304,7 @@ Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long
**Usage example**
```yaml
```yml
# config.yml
tuner:
builtinTunerName: MetisTuner
......
@@ -6,7 +6,7 @@ So, if user want to implement a customized Advisor, she/he only need to:
1. Define an Advisor inheriting from the MsgDispatcherBase class
1. Implement the methods with prefix `handle_` except `handle_request`
1. Configure your customized Advisor in experiment yaml config file
1. Configure your customized Advisor in experiment YAML config file
Here is an example:
@@ -24,11 +24,11 @@ class CustomizedAdvisor(MsgDispatcherBase):
Please refer to the implementation of Hyperband ([src/sdk/pynni/nni/hyperband_advisor/hyperband_advisor.py](../src/sdk/pynni/nni/hyperband_advisor/hyperband_advisor.py)) for how to implement the methods.
**3) Configure your customized Advisor in experiment yaml config file**
**3) Configure your customized Advisor in experiment YAML config file**
Similar to tuner and assessor. NNI needs to locate your customized Advisor class and instantiate the class, so you need to specify the location of the customized Advisor class and pass literal values as parameters to the \_\_init__ constructor.
```yaml
```yml
advisor:
codeDir: /home/abc/myadvisor
classFileName: my_customized_advisor.py
......
@@ -8,7 +8,7 @@ If you want to implement a customized Assessor, there are three things for you t
1) Inherit an assessor of a base Assessor class
2) Implement assess_trial function
3) Configure your customized Assessor in experiment yaml config file
3) Configure your customized Assessor in experiment YAML config file
**1. Inherit an assessor of a base Assessor class**
@@ -38,11 +38,11 @@ class CustomizedAssessor(Assessor):
...
```
**3. Configure your customized Assessor in experiment yaml config file**
**3. Configure your customized Assessor in experiment YAML config file**
NNI needs to locate your customized Assessor class and instantiate the class, so you need to specify the location of the customized Assessor class and pass literal values as parameters to the \_\_init__ constructor.
```yaml
```yml
assessor:
codeDir: /home/abc/myassessor
......
@@ -8,7 +8,7 @@ If you want to implement and use your own tuning algorithm, you can implement a
1) Inherit a tuner of a base Tuner class
2) Implement receive_trial_result and generate_parameter function
3) Configure your customized tuner in experiment yaml config file
3) Configure your customized tuner in experiment YAML config file
Here is an example:
@@ -91,11 +91,11 @@ _fd = open(os.path.join(_pwd, 'data.txt'), 'r')
This is because your tuner is not executed in the directory of your tuner (i.e., `pwd` is not the directory of your own tuner).
**3. Configure your customized tuner in experiment yaml config file**
**3. Configure your customized tuner in experiment YAML config file**
NNI needs to locate your customized tuner class and instantiate the class, so you need to specify the location of the customized tuner class and pass literal values as parameters to the \_\_init__ constructor.
```yaml
```yml
tuner:
codeDir: /home/abc/mytuner
......
@@ -3,7 +3,7 @@
Assessor module is for assessing running trials. One common use case is early stopping, which terminates unpromising trial jobs based on their intermediate results.
## Using NNI built-in Assessor
Here we use the same example `examples/trials/mnist-annotation`. We use `Medianstop` assessor for this experiment. The yaml configure file is shown below:
Here we use the same example `examples/trials/mnist-annotation`. We use `Medianstop` assessor for this experiment. The yml configure file is shown below:
```
authorName: your_name
experimentName: auto_mnist
@@ -33,7 +33,7 @@ trial:
For our built-in assessors, you need to fill two fields: `builtinAssessorName` which chooses NNI provided assessors (refer to [here]() for built-in assessors), `optimize_mode` which includes maximize and minimize (you want to maximize or minimize your trial result).
## Using user customized Assessor
You can also write your own assessor following the guidance [here](). For example, you wrote an assessor for `examples/trials/mnist-annotation`. You should prepare the yaml configure below:
You can also write your own assessor following the guidance [here](). For example, you wrote an assessor for `examples/trials/mnist-annotation`. You should prepare the yml configure below:
```
authorName: your_name
experimentName: auto_mnist
......
# Experiment config reference
A config file is needed when create an experiment, the path of the config file is provide to nnictl.
The config file is written in yaml format, and need to be written correctly.
The config file is written in YAML format, and need to be written correctly.
This document describes the rule to write config file, and will provide some examples and templates.
- [Template](#Template) (the templates of an config file)
@@ -149,7 +149,7 @@ machineList:
* __maxTrialNum__
* Description
__maxTrialNum__ specifies the max number of trial jobs created by nni, including succeeded and failed jobs.
__maxTrialNum__ specifies the max number of trial jobs created by NNI, including succeeded and failed jobs.
* __trainingServicePlatform__
* Description
@@ -164,7 +164,7 @@ machineList:
* __pai__ submit trial jobs to [OpenPai](https://github.com/Microsoft/pai) of Microsoft. For more details of pai configuration, please reference [PAIMOdeDoc](./PAIMode.md)
* __kubeflow__ submit trial jobs to [kubeflow](https://www.kubeflow.org/docs/about/kubeflow/), nni support kubeflow based on normal kubernetes and [azure kubernetes](https://azure.microsoft.com/en-us/services/kubernetes-service/).
* __kubeflow__ submit trial jobs to [kubeflow](https://www.kubeflow.org/docs/about/kubeflow/), NNI support kubeflow based on normal kubernetes and [azure kubernetes](https://azure.microsoft.com/en-us/services/kubernetes-service/).
* __searchSpacePath__
* Description
@@ -182,7 +182,7 @@ machineList:
* __nniManagerIp__
* Description
__nniManagerIp__ set the IP address of the machine on which nni manager process runs. This field is optional, and if it's not set, eth0 device IP will be used instead.
__nniManagerIp__ set the IP address of the machine on which NNI manager process runs. This field is optional, and if it's not set, eth0 device IP will be used instead.
Note: run ifconfig on NNI manager's machine to check if eth0 device exists. If not, we recommend to set nnimanagerIp explicitly.
@@ -200,11 +200,11 @@ machineList:
* __tuner__
* Description
__tuner__ specifies the tuner algorithm in the experiment, there are two kinds of ways to set tuner. One way is to use tuner provided by nni sdk, need to set __builtinTunerName__ and __classArgs__. Another way is to use users' own tuner file, and need to set __codeDirectory__, __classFileName__, __className__ and __classArgs__.
__tuner__ specifies the tuner algorithm in the experiment, there are two kinds of ways to set tuner. One way is to use tuner provided by NNI sdk, need to set __builtinTunerName__ and __classArgs__. Another way is to use users' own tuner file, and need to set __codeDirectory__, __classFileName__, __className__ and __classArgs__.
* __builtinTunerName__ and __classArgs__
* __builtinTunerName__
__builtinTunerName__ specifies the name of system tuner, nni sdk provides four kinds of tuner, including {__TPE__, __Random__, __Anneal__, __Evolution__, __BatchTuner__, __GridSearch__}
__builtinTunerName__ specifies the name of system tuner, NNI sdk provides four kinds of tuner, including {__TPE__, __Random__, __Anneal__, __Evolution__, __BatchTuner__, __GridSearch__}
* __classArgs__
__classArgs__ specifies the arguments of tuner algorithm. If the __builtinTunerName__ is in {__TPE__, __Random__, __Anneal__, __Evolution__}, user should set __optimize_mode__.
@@ -231,11 +231,11 @@ machineList:
* Description
__assessor__ specifies the assessor algorithm to run an experiment, there are two kinds of ways to set assessor. One way is to use assessor provided by nni sdk, users need to set __builtinAssessorName__ and __classArgs__. Another way is to use users' own assessor file, and need to set __codeDirectory__, __classFileName__, __className__ and __classArgs__.
__assessor__ specifies the assessor algorithm to run an experiment, there are two kinds of ways to set assessor. One way is to use assessor provided by NNI sdk, users need to set __builtinAssessorName__ and __classArgs__. Another way is to use users' own assessor file, and need to set __codeDirectory__, __classFileName__, __className__ and __classArgs__.
* __builtinAssessorName__ and __classArgs__
* __builtinAssessorName__
__builtinAssessorName__ specifies the name of system assessor, nni sdk provides one kind of assessor {__Medianstop__}
__builtinAssessorName__ specifies the name of system assessor, NNI sdk provides one kind of assessor {__Medianstop__}
* __classArgs__
__classArgs__ specifies the arguments of assessor algorithm
@@ -383,7 +383,7 @@ machineList:
If users use ssh key to login remote machine, could set __sshKeyPath__ in config file. __sshKeyPath__ is the path of ssh key file, which should be valid.
Note: if users set passwd and sshKeyPath simultaneously, nni will try passwd.
Note: if users set passwd and sshKeyPath simultaneously, NNI will try passwd.
* __passphrase__
@@ -393,7 +393,7 @@ machineList:
* __operator__
__operator__ specify the kubeflow's operator to be used, nni support __tf-operator__ in current version.
__operator__ specify the kubeflow's operator to be used, NNI support __tf-operator__ in current version.
* __storage__
@@ -611,11 +611,11 @@ trial:
gpuNum: 4
cpuNum: 2
memoryMB: 10000
#The docker image to run nni job on pai
#The docker image to run NNI job on pai
image: msranni/nni:latest
#The hdfs directory to store data on pai, format 'hdfs://host:port/directory'
dataDir: hdfs://10.11.12.13:9000/test
#The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory'
#The hdfs directory to store output data generated by NNI, format 'hdfs://host:port/directory'
outputDir: hdfs://10.11.12.13:9000/test
paiConfig:
#The username to login pai
......
@@ -9,14 +9,14 @@ When met errors like below, try to clean up **tmp** folder first.
> OSError: [Errno 28] No space left on device
### Cannot get trials' metrics in OpenPAI mode
In OpenPAI training mode, we start a rest server which listens on 51189 port in nniManager to receive metrcis reported from trials running in OpenPAI cluster. If you didn't see any metrics from WebUI in OpenPAI mode, check your machine where nniManager runs on to make sure 51189 port is turned on in the firewall rule.
In OpenPAI training mode, we start a rest server which listens on 51189 port in NNI Manager to receive metrcis reported from trials running in OpenPAI cluster. If you didn't see any metrics from WebUI in OpenPAI mode, check your machine where NNI manager runs on to make sure 51189 port is turned on in the firewall rule.
### Segmentation Fault (core dumped) when installing
> make: *** [install-XXX] Segmentation fault (core dumped)
Please try the following solutions in turn:
* Update or reinstall you current python's pip like `python3 -m pip install -U pip`
* Install nni with `--no-cache-dir` flag like `python3 -m pip install nni --no-cache-dir`
* Install NNI with `--no-cache-dir` flag like `python3 -m pip install nni --no-cache-dir`
### Job management error: getIPV4Address() failed because os.networkInterfaces().eth0 is undefined.
Your machine don't have eth0 device, please set nniManagerIp in your config file manually. [refer](https://github.com/Microsoft/nni/blob/master/docs/ExperimentConfig.md)
@@ -25,7 +25,7 @@ Your machine don't have eth0 device, please set nniManagerIp in your config file
When the duration of experiment reaches the maximum duration, nniManager will not create new trials, but the existing trials will continue unless user manually stop the experiment.
### Could not stop an experiment using `nnictl stop`
If you upgrade your nni or you delete some config files of nni when there is an experiment running, this kind of issue may happen because the loss of config file. You could use `ps -ef | grep node` to find the pid of your experiment, and use `kill -9 {pid}` to kill it manually.
If you upgrade your NNI or you delete some config files of NNI when there is an experiment running, this kind of issue may happen because the loss of config file. You could use `ps -ef | grep node` to find the pid of your experiment, and use `kill -9 {pid}` to kill it manually.
### Could not get `default metric` in webUI of virtual machines
Config the network mode to bridge mode or other mode that could make virtual machine's host accessible from external machine, and make sure the port of virtual machine is not forbidden by firewall.
......
@@ -4,9 +4,9 @@ NNI supports running experiment using [FrameworkController](https://github.com/M
## Prerequisite for on-premises Kubernetes Service
1. A **Kubernetes** cluster using Kubernetes 1.8 or later. Follow this [guideline](https://kubernetes.io/docs/setup/) to set up Kubernetes
2. Prepare a **kubeconfig** file, which will be used by NNI to interact with your kubernetes API server. By default, NNI manager will use $(HOME)/.kube/config as kubeconfig file's path. You can also specify other kubeconfig files by setting the **KUBECONFIG** environment variable. Refer this [guideline]( https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig) to learn more about kubeconfig.
2. Prepare a **kubeconfig** file, which will be used by NNI to interact with your kubernetes API server. By default, NNI manager will use $(HOME)/.kube/config as kubeconfig file's path. You can also specify other kubeconfig files by setting the **KUBECONFIG** environment variable. Refer this [guideline]( https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig) to learn more about kubeconfig.
3. If your NNI trial job needs GPU resource, you should follow this [guideline](https://github.com/NVIDIA/k8s-device-plugin) to configure **Nvidia device plugin for Kubernetes**.
4. Prepare a **NFS server** and export a general purpose mount (we recommend to map your NFS server path in `root_squash option`, otherwise permission issue may raise when nni copy files to NFS. Refer this [page](https://linux.die.net/man/5/exports) to learn what root_squash option is), or **Azure File Storage**.
4. Prepare a **NFS server** and export a general purpose mount (we recommend to map your NFS server path in `root_squash option`, otherwise permission issue may raise when NNI copies files to NFS. Refer this [page](https://linux.die.net/man/5/exports) to learn what root_squash option is), or **Azure File Storage**.
5. Install **NFS client** on the machine where you install NNI and run nnictl to create experiment. Run this command to install NFSv4 client:
```
apt-get install nfs-common
@@ -17,12 +17,12 @@ NNI supports running experiment using [FrameworkController](https://github.com/M
## Prerequisite for Azure Kubernetes Service
1. NNI support kubeflow based on Azure Kubernetes Service, follow the [guideline](https://azure.microsoft.com/en-us/services/kubernetes-service/) to set up Azure Kubernetes Service.
2. Install [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and __kubectl__. Use `az login` to set azure account, and connect kubectl client to AKS, refer this [guideline](https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough#connect-to-the-cluster).
3. Follow the [guideline](https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=portal) to create azure file storage account. If you use Azure Kubernetes Service, nni need Azure Storage Service to store code files and the output files.
4. To access Azure storage service, nni need the access key of the storage account, and nni use [Azure Key Vault](https://azure.microsoft.com/en-us/services/key-vault/) Service to protect your private key. Set up Azure Key Vault Service, add a secret to Key Vault to store the access key of Azure storage account. Follow this [guideline](https://docs.microsoft.com/en-us/azure/key-vault/quick-create-cli) to store the access key.
3. Follow the [guideline](https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=portal) to create azure file storage account. If you use Azure Kubernetes Service, NNI need Azure Storage Service to store code files and the output files.
4. To access Azure storage service, NNI need the access key of the storage account, and NNI uses [Azure Key Vault](https://azure.microsoft.com/en-us/services/key-vault/) Service to protect your private key. Set up Azure Key Vault Service, add a secret to Key Vault to store the access key of Azure storage account. Follow this [guideline](https://docs.microsoft.com/en-us/azure/key-vault/quick-create-cli) to store the access key.
## Set up FrameworkController
Follow the [guideline](https://github.com/Microsoft/frameworkcontroller/tree/master/example/run) to set up frameworkcontroller in the kubernetes cluster, nni support frameworkcontroller by the statefulset mode.
Follow the [guideline](https://github.com/Microsoft/frameworkcontroller/tree/master/example/run) to set up frameworkcontroller in the kubernetes cluster, NNI supports frameworkcontroller by the statefulset mode.
## Design
Please refer the design of [kubeflow training service](./KubeflowMode.md), frameworkcontroller training service pipeline is similar.
@@ -71,7 +71,7 @@ frameworkcontrollerConfig:
server: {your_nfs_server}
path: {your_nfs_server_exported_path}
```
If you use Azure Kubernetes Service, you should set `frameworkcontrollerConfig` in your config yaml file as follows:
If you use Azure Kubernetes Service, you should set `frameworkcontrollerConfig` in your config YAML file as follows:
```
frameworkcontrollerConfig:
storage: azureStorage
@@ -82,9 +82,9 @@ frameworkcontrollerConfig:
accountName: {your_storage_account_name}
azureShare: {your_azure_share_name}
```
Note: You should explicitly set `trainingServicePlatform: frameworkcontroller` in nni config yaml file if you want to start experiment in frameworkcontrollerConfig mode.
Note: You should explicitly set `trainingServicePlatform: frameworkcontroller` in NNI config YAML file if you want to start experiment in frameworkcontrollerConfig mode.
The trial's config format for nni frameworkcontroller mode is a simple version of frameworkcontroller's offical config, you could refer the [tensorflow example of frameworkcontroller](https://github.com/Microsoft/frameworkcontroller/blob/master/example/framework/scenario/tensorflow/cpu/tensorflowdistributedtrainingwithcpu.yaml) for deep understanding.
The trial's config format for NNI frameworkcontroller mode is a simple version of frameworkcontroller's offical config, you could refer the [tensorflow example of frameworkcontroller](https://github.com/Microsoft/frameworkcontroller/blob/master/example/framework/scenario/tensorflow/cpu/tensorflowdistributedtrainingwithcpu.yaml) for deep understanding.
Trial configuration in frameworkcontroller mode have the following configuration keys:
* taskRoles: you could set multiple task roles in config file, and each task role is a basic unit to process in kubernetes cluster.
* name: the name of task role specified, like "worker", "ps", "master".
......
@@ -34,7 +34,7 @@ An experiment is to run multiple trial jobs, each trial job tries a configuratio
* Provide a runnable trial
* Provide or choose a tuner
* Provide a yaml experiment configure file
* Provide a YAML experiment configure file
* (optional) Provide or choose an assessor
**Prepare trial**: Let's use a simple trial example, e.g. mnist, provided by NNI. After you installed NNI, NNI examples have been put in ~/nni/examples, run `ls ~/nni/examples/trials` to see all the trial examples. You can simply execute the following command to run the NNI mnist example:
@@ -43,11 +43,11 @@ An experiment is to run multiple trial jobs, each trial job tries a configuratio
python3 ~/nni/examples/trials/mnist-annotation/mnist.py
```
This command will be filled in the yaml configure file below. Please refer to [here](howto_1_WriteTrial.md) for how to write your own trial.
This command will be filled in the YAML configure file below. Please refer to [here](howto_1_WriteTrial.md) for how to write your own trial.
**Prepare tuner**: NNI supports several popular automl algorithms, including Random Search, Tree of Parzen Estimators (TPE), Evolution algorithm etc. Users can write their own tuner (refer to [here](howto_2_CustomizedTuner.md), but for simplicity, here we choose a tuner provided by NNI as below:
```yaml
```yml
tuner:
builtinTunerName: TPE
classArgs:
@@ -56,9 +56,9 @@ tuner:
*builtinTunerName* is used to specify a tuner in NNI, *classArgs* are the arguments pass to the tuner, *optimization_mode* is to indicate whether you want to maximize or minimize your trial's result.
**Prepare configure file**: Since you have already known which trial code you are going to run and which tuner you are going to use, it is time to prepare the yaml configure file. NNI provides a demo configure file for each trial example, `cat ~/nni/examples/trials/mnist-annotation/config.yml` to see it. Its content is basically shown below:
**Prepare configure file**: Since you have already known which trial code you are going to run and which tuner you are going to use, it is time to prepare the YAML configure file. NNI provides a demo configure file for each trial example, `cat ~/nni/examples/trials/mnist-annotation/config.yml` to see it. Its content is basically shown below:
```yaml
```yml
authorName: your_name
experimentName: auto_mnist
......
@@ -14,7 +14,7 @@ Moreover, in GridSearch Tuner, for users' convenience, the definition of `qunifo
## 2. Usage
Since Grid Search Tuner will exhaust all possible hyper-parameter combination according to the search space file without any hyper-parameter for tuner itself, all you need to do is to specify tuner name in your experiment's yaml config file:
Since Grid Search Tuner will exhaust all possible hyper-parameter combination according to the search space file without any hyper-parameter for tuner itself, all you need to do is to specify tuner name in your experiment's YAML config file:
```
tuner:
......
@@ -29,8 +29,8 @@ This optimization approach is described in detail in [Algorithms for Hyper-Param
_Suggested scenario_: TPE, as a black-box optimization, can be used in various scenarios, and shows good performance in general. Especially when you have limited computation resource and can only try a small number of trials. From a large amount of experiments, we could found that TPE is far better than Random Search.
_Usage_:
```yaml
# config.yaml
```yml
# config.yml
tuner:
builtinTunerName: TPE
classArgs:
@@ -46,8 +46,8 @@ In [Random Search for Hyper-Parameter Optimization][2] show that Random Search m
_Suggested scenario_: Random search is suggested when each trial does not take too long (e.g., each trial can be completed very soon, or early stopped by assessor quickly), and you have enough computation resource. Or you want to uniformly explore the search space. Random Search could be considered as baseline of search algorithm.
_Usage_:
```yaml
# config.yaml
```yml
# config.yml
tuner:
builtinTunerName: Random
```
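Random search, described above as a baseline that explores the search space uniformly, is simple enough to sketch in a few lines. The Python below is a minimal illustration, not NNI's Random tuner. It assumes the NNI-style `_type`/`_value` search-space layout and handles only `choice`, `uniform` and `loguniform`. The parameter names in `space` are made up for the example.

```python
import math
import random
from typing import Any, Dict


def sample_config(search_space: Dict[str, Dict[str, Any]],
                  rng: random.Random) -> Dict[str, Any]:
    """Draw one configuration at random from a search space.

    Only `choice`, `uniform` and `loguniform` are handled in this sketch;
    any other type raises ValueError.
    """
    config: Dict[str, Any] = {}
    for name, spec in search_space.items():
        kind, values = spec["_type"], spec["_value"]
        if kind == "choice":
            config[name] = rng.choice(values)
        elif kind == "uniform":
            low, high = values
            config[name] = rng.uniform(low, high)
        elif kind == "loguniform":
            low, high = values
            # Sample uniformly in log space so every order of magnitude is equally likely.
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            raise ValueError(f"unsupported type in this sketch: {kind}")
    return config


# Hypothetical search space used only for this example.
space = {
    "optimizer": {"_type": "choice", "_value": ["Adam", "SGD"]},
    "dropout_rate": {"_type": "uniform", "_value": [0.1, 0.5]},
    "learning_rate": {"_type": "loguniform", "_value": [1e-4, 1e-1]},
}
rng = random.Random(0)  # fixed seed so runs are reproducible
trials = [sample_config(space, rng) for _ in range(5)]
for trial in trials:
    print(trial)
```

Because each draw is independent of every earlier result, any number of trials can be generated up front or in parallel, which is part of why random search makes a sensible baseline.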
@@ -60,8 +60,8 @@ This simple annealing algorithm begins by sampling from the prior, but tends ove
_Suggested scenario_: Anneal is suggested when each trial does not take too long, and you have enough computation resource(almost same with Random Search). Or the variables in search space could be sample from some prior distribution.
_Usage_:
```yaml
# config.yaml
```yml
# config.yml
tuner:
builtinTunerName: Anneal
classArgs:
@@ -77,8 +77,8 @@ Naive Evolution comes from [Large-Scale Evolution of Image Classifiers][3]. It r
_Suggested scenario_: Its requirement of computation resource is relatively high. Specifically, it requires large inital population to avoid falling into local optimum. If your trial is short or leverages assessor, this tuner is a good choice. And, it is more suggested when your trial code supports weight transfer, that is, the trial could inherit the converged weights from its parent(s). This can greatly speed up the training progress.
_Usage_:
```yaml
# config.yaml
```yml
# config.yml
tuner:
builtinTunerName: Evolution
classArgs:
@@ -89,9 +89,9 @@ _Usage_:
<a name="SMAC"></a>
**SMAC**
[SMAC][4] is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO, in order to handle categorical parameters. The SMAC supported by nni is a wrapper on [the SMAC3 github repo][5].
[SMAC][4] is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO, in order to handle categorical parameters. The SMAC supported by NNI is a wrapper on [the SMAC3 github repo][5].
Note that SMAC on nni only supports a subset of the types in [search space spec](./SearchSpaceSpec.md), including `choice`, `randint`, `uniform`, `loguniform`, `quniform(q=1)`.
Note that SMAC on NNI only supports a subset of the types in [search space spec](./SearchSpaceSpec.md), including `choice`, `randint`, `uniform`, `loguniform`, `quniform(q=1)`.
_Installation_:
* Install swig first. (`sudo apt-get install swig` for Ubuntu users)
@@ -100,8 +100,8 @@ _Installation_:
_Suggested scenario_: Similar to TPE, SMAC is also a black-box tuner which can be tried in various scenarios, and is suggested when computation resource is limited. It is optimized for discrete hyperparameters, thus, suggested when most of your hyperparameters are discrete.
_Usage_:
```yaml
# config.yaml
```yml
# config.yml
tuner:
builtinTunerName: SMAC
classArgs:
@@ -117,8 +117,8 @@ Batch tuner allows users to simply provide several configurations (i.e., choices
_Suggested sceanrio_: If the configurations you want to try have been decided, you can list them in searchspace file (using `choice`) and run them using batch tuner.
_Usage_:
```yaml
# config.yaml
```yml
# config.yml
tuner:
builtinTunerName: BatchTuner
```
@@ -149,8 +149,8 @@ Note that the only acceptable types of search space are `choice`, `quniform`, `q
_Suggested scenario_: It is suggested when search space is small, it is feasible to exhaustively sweeping the whole search space.
_Usage_:
```yaml
# config.yaml
```yml
# config.yml
tuner:
builtinTunerName: GridSearch
```
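The advice above to use grid search only when the search space is small follows from its cost: the number of configurations is the product of the number of values of each parameter. Here is a minimal Python sketch of the exhaustive sweep, assuming every parameter is a `choice`. It does not reproduce how NNI's GridSearch tuner discretizes `quniform` or the other accepted types, and the parameter names are made up for the example.

```python
import itertools
from typing import Any, Dict, Iterator, List


def grid(search_space: Dict[str, Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every combination of the `choice` values in a search space.

    Sketch only: other types are rejected rather than discretized, because
    their grid semantics are tuner-specific.
    """
    names: List[str] = []
    value_lists: List[List[Any]] = []
    for name, spec in search_space.items():
        if spec["_type"] != "choice":
            raise ValueError(f"only 'choice' is handled in this sketch, got {spec['_type']}")
        names.append(name)
        value_lists.append(list(spec["_value"]))
    # itertools.product varies the last parameter fastest.
    for combo in itertools.product(*value_lists):
        yield dict(zip(names, combo))


# Hypothetical search space used only for this example.
space = {
    "batch_size": {"_type": "choice", "_value": [16, 32, 64]},
    "optimizer": {"_type": "choice", "_value": ["Adam", "SGD"]},
}
configs = list(grid(space))
print(len(configs))  # 3 * 2 = 6 combinations
```

Adding a third parameter with four values would already grow the sweep to 24 trials, which is why exhaustive search stops being practical quickly.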
@@ -163,8 +163,8 @@ _Usage_:
_Suggested scenario_: It is suggested when you have limited computation resource but have relatively large search space. It performs good in the scenario that intermediate result (e.g., accuracy) can reflect good or bad of final result (e.g., accuracy) to some extent.
_Usage_:
```yaml
# config.yaml
```yml
# config.yml
advisor:
builtinAdvisorName: Hyperband
classArgs:
@@ -189,8 +189,8 @@ NetworkMorphism requires [pyTorch](https://pytorch.org/get-started/locally), so
_Suggested scenario_: It is suggested that you want to apply deep learning methods to your task (your own dataset) but you have no idea of how to choose or design a network. You modify the [example](../examples/trials/network_morphism/cifar10/cifar10_keras.py) to fit your own dataset and your own data augmentation method. Also you can change the batch size, learning rate or optimizer. It is feasible for different tasks to find a good network architecture. Now this tuner only supports the cv domain.
_Usage_:
```yaml
# config.yaml
```yml
# config.yml
tuner:
builtinTunerName: NetworkMorphism
classArgs:
@@ -232,11 +232,11 @@ Metis Tuner requires [sklearn](https://scikit-learn.org/), so users should insta
_Suggested scenario_:
Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than other approaches such as random search. Furthermore, Metis provides guidance on the subsequent trial. Here is an [example](../examples/trials/auto-gbdt/search_space_metis.json) about the use of Metis. User only need to send the final result like `accuracy` to tuner, by calling the nni SDK.
Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than other approaches such as random search. Furthermore, Metis provides guidance on the subsequent trial. Here is an [example](../examples/trials/auto-gbdt/search_space_metis.json) of using Metis. Users only need to send the final result, such as `accuracy`, to the tuner by calling the NNI SDK.
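Below is a minimal sketch of a trial that reports only the final metric.

* `nni.get_next_parameter()` and `nni.report_final_result()` are NNI's trial APIs.
* `evaluate()` and its toy objective are hypothetical stand-ins for your own training code.
* The `learning_rate` key assumes a matching entry in your search space.

```python
def evaluate(params):
    """Hypothetical stand-in for training and validating a model.

    Returns a toy 'accuracy' that peaks when learning_rate == 0.01.
    """
    return 1.0 - abs(params["learning_rate"] - 0.01)


def main():
    import nni  # imported here so evaluate() can be exercised without NNI installed

    params = nni.get_next_parameter()  # hyperparameters proposed by the tuner
    accuracy = evaluate(params)
    nni.report_final_result(accuracy)  # the final result is all Metis needs
```

Call `main()` at the bottom of the trial script that your config file points to.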
_Usage_:
```yaml
# config.yaml
```yml
# config.yml
tuner:
builtinTunerName: MetisTuner
classArgs:
......@@ -262,7 +262,7 @@ Medianstop is a simple early stopping rule mentioned in the [paper][8]. It stops
_Suggested scenario_: It is applicable to a wide range of performance curves, and thus can be used in various scenarios to speed up the tuning progress.
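Medianstop is based on a median stopping rule, sketched below for a metric that should be maximized:

* A trial is stopped at step S if its best intermediate result up to S is strictly worse than the median of the completed trials' running averages up to S.
* The function and argument names are hypothetical.
* NNI's assessor also has options, such as a start step, that this sketch omits.

```python
from statistics import median


def should_stop(trial_results, completed_results, step):
    """Median stopping rule for a maximized metric (sketch).

    trial_results: intermediate results reported so far by the trial under review.
    completed_results: one list of intermediate results per completed trial.
    step: the step S (number of reported results) at which to compare.
    """
    if len(trial_results) < step:
        return False  # the trial has not reached step S yet
    # Running average of each completed trial's results up to step S.
    peer_averages = [sum(r[:step]) / step for r in completed_results if len(r) >= step]
    if not peer_averages:
        return False  # nothing to compare against yet
    best_so_far = max(trial_results[:step])
    return best_so_far < median(peer_averages)
```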
_Usage_:
```yaml
```yml
assessor:
builtinAssessorName: Medianstop
classArgs:
......@@ -282,7 +282,7 @@ Curve Fitting Assessor is a LPA(learning, predicting, assessing) algorithm. It s
_Suggested scenario_: It is applicable to a wide range of performance curves, and thus can be used in various scenarios to speed up the tuning progress. Even better, it is able to handle and assess curves with similar performance.
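Curve Fitting Assessor predicts a trial's final performance from its partial learning curve. The sketch below illustrates that idea with a single, much simpler model in place of the assessor's own curve models.

* The model is `y = a + b * ln(epoch)`, fitted by ordinary least squares.
* The threshold test is an assumption for a maximized metric: stop if the prediction is below `threshold` times the best final result seen so far.
* All names are hypothetical.
* At least two distinct epochs are required for the fit.

```python
import math


def fit_log_curve(epochs, values):
    """Least-squares fit of values ~ a + b * ln(epoch); returns (a, b)."""
    xs = [math.log(e) for e in epochs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(values) / len(values)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict_final(epochs, values, final_epoch):
    """Extrapolate the fitted curve to the trial's last epoch."""
    a, b = fit_log_curve(epochs, values)
    return a + b * math.log(final_epoch)


def should_stop(epochs, values, final_epoch, best_final, threshold=0.95):
    """Stop if the predicted final value is clearly worse than the best so far."""
    return predict_final(epochs, values, final_epoch) < threshold * best_final
```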
_Usage_:
```yaml
```yml
assessor:
builtinAssessorName: Curvefitting
classArgs:
......
......@@ -7,7 +7,7 @@ Now NNI supports running experiment on [Kubeflow](https://github.com/kubeflow/ku
2. Download, set up, and deploy **Kubeflow** to your Kubernetes cluster. Follow this [guideline](https://www.kubeflow.org/docs/started/getting-started/) to set up Kubeflow.
3. Prepare a **kubeconfig** file, which NNI will use to interact with your Kubernetes API server. By default, NNI manager uses $(HOME)/.kube/config as the kubeconfig file's path. You can also specify other kubeconfig files by setting the **KUBECONFIG** environment variable. Refer to this [guideline](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig) to learn more about kubeconfig.
4. If your NNI trial job needs GPU resources, follow this [guideline](https://github.com/NVIDIA/k8s-device-plugin) to configure the **NVIDIA device plugin for Kubernetes**.
5. Prepare a **NFS server** and export a general purpose mount (we recommend to map your NFS server path in `root_squash option`, otherwise permission issue may raise when nni copy files to NFS. Refer this [page](https://linux.die.net/man/5/exports) to learn what root_squash option is), or **Azure File Storage**.
5. Prepare an **NFS server** and export a general purpose mount, or use **Azure File Storage**. We recommend mapping your NFS server path with the `root_squash` option; otherwise permission issues may arise when NNI copies files to NFS. Refer to this [page](https://linux.die.net/man/5/exports) to learn what the root_squash option is.
6. Install an **NFS client** on the machine where you install NNI and run nnictl to create experiments. Run this command to install the NFSv4 client:
```
apt-get install nfs-common
......@@ -19,14 +19,14 @@ Now NNI supports running experiment on [Kubeflow](https://github.com/kubeflow/ku
1. NNI supports Kubeflow on Azure Kubernetes Service. Follow the [guideline](https://azure.microsoft.com/en-us/services/kubernetes-service/) to set up Azure Kubernetes Service.
2. Install [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and __kubectl__. Use `az login` to set your Azure account, then connect the kubectl client to AKS. Refer to this [guideline](https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough#connect-to-the-cluster).
3. Deploy Kubeflow on Azure Kubernetes Service by following the [guideline](https://www.kubeflow.org/docs/started/getting-started/).
4. Follow the [guideline](https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=portal) to create azure file storage account. If you use Azure Kubernetes Service, nni need Azure Storage Service to store code files and the output files.
5. To access Azure storage service, nni need the access key of the storage account, and nni use [Azure Key Vault](https://azure.microsoft.com/en-us/services/key-vault/) Service to protect your private key. Set up Azure Key Vault Service, add a secret to Key Vault to store the access key of Azure storage account. Follow this [guideline](https://docs.microsoft.com/en-us/azure/key-vault/quick-create-cli) to store the access key.
4. Follow the [guideline](https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=portal) to create an Azure file storage account. If you use Azure Kubernetes Service, NNI needs Azure Storage Service to store code files and output files.
5. To access Azure Storage Service, NNI needs the access key of the storage account, and NNI uses [Azure Key Vault](https://azure.microsoft.com/en-us/services/key-vault/) Service to protect your private key. Set up Azure Key Vault Service and add a secret to Key Vault to store the access key of the Azure storage account. Follow this [guideline](https://docs.microsoft.com/en-us/azure/key-vault/quick-create-cli) to store the access key.
## Design
![](./img/kubeflow_training_design.png)
Kubeflow training service instantiates a Kubernetes REST client to interact with your K8s cluster's API server.
For each trial, we will upload all the files in your local codeDir path (configured in nni_config.yaml) together with NNI generated files like parameter.cfg into a storage volumn. Right now we support two kinds of storage volumns: [nfs](https://en.wikipedia.org/wiki/Network_File_System) and [azure file storage](https://azure.microsoft.com/en-us/services/storage/files/), you should configure the storage volumn in nni config yaml file. After files are prepared, Kubeflow training service will call K8S rest API to create kubeflow jobs ([tf-operator](https://github.com/kubeflow/tf-operator) job or [pytorch-operator](https://github.com/kubeflow/pytorch-operator) job) in K8S, and mount your storage volumn into the job's pod. Output files of kubeflow job, like stdout, stderr, trial.log or model files, will also be copied back to the storage volumn. NNI will show the storage volumn's URL for each trial in WebUI, to allow user browse the log files and job's output files.
For each trial, we upload all the files in your local codeDir path (configured in nni_config.yml), together with NNI-generated files like parameter.cfg, into a storage volume. Currently we support two kinds of storage volumes: [nfs](https://en.wikipedia.org/wiki/Network_File_System) and [azure file storage](https://azure.microsoft.com/en-us/services/storage/files/). Configure the storage volume in the NNI config YAML file.

After the files are prepared, Kubeflow training service calls the K8S REST API to create Kubeflow jobs ([tf-operator](https://github.com/kubeflow/tf-operator) jobs or [pytorch-operator](https://github.com/kubeflow/pytorch-operator) jobs) in K8S, and mounts your storage volume into the job's pod. Output files of the Kubeflow job, like stdout, stderr, trial.log or model files, are also copied back to the storage volume. NNI shows the storage volume's URL for each trial in the WebUI, so that users can browse the log files and the job's output files.
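To illustrate this design, here is a Python sketch of the kind of TFJob object that mounts an NFS storage volume into a trial's pod. This is not the manifest NNI actually generates.

* The `volumes[].nfs` and `volumeMounts` fields are standard Kubernetes pod-spec fields.
* tf-operator expects a container named `tensorflow` by default.
* The `apiVersion`, volume name, mount path and `build_tfjob` helper are assumptions.

```python
def build_tfjob(job_name, image, command, nfs_server, nfs_path, workers=1):
    """Return a TFJob-shaped dict whose worker pods mount an NFS volume (sketch)."""
    volume_name = "nni-storage"  # hypothetical volume name
    container = {
        "name": "tensorflow",  # tf-operator's default container name
        "image": image,
        "command": ["sh", "-c", command],
        # Code files, parameter.cfg and trial outputs live on the shared volume.
        "volumeMounts": [{"name": volume_name, "mountPath": "/tmp/mount"}],  # hypothetical path
    }
    pod_spec = {
        "containers": [container],
        "restartPolicy": "Never",
        "volumes": [{"name": volume_name, "nfs": {"server": nfs_server, "path": nfs_path}}],
    }
    return {
        "apiVersion": "kubeflow.org/v1alpha2",  # assumption: depends on your Kubeflow version
        "kind": "TFJob",
        "metadata": {"name": job_name},
        "spec": {"tfReplicaSpecs": {"Worker": {"replicas": workers, "template": {"spec": pod_spec}}}},
    }
```

A dict like this could be submitted with the Kubernetes Python client's `CustomObjectsApi.create_namespaced_custom_object`.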
## Supported operator
NNI only supports the tf-operator and pytorch-operator of Kubeflow; other operators are not tested.
......@@ -55,7 +55,7 @@ kubeflowConfig:
# Your NFS server export path, like /var/nfs/nni
path: {your_nfs_server_export_path}
```
If you use Azure storage, you should set `kubeflowConfig` in your config yaml file as follows:
If you use Azure storage, you should set `kubeflowConfig` in your config YAML file as follows:
```
kubeflowConfig:
storage: azureStorage
......@@ -69,7 +69,7 @@ kubeflowConfig:
## Run an experiment
Use `examples/trials/mnist` as an example. This is a tensorflow job, and use tf-operator of kubeflow. The nni config yaml file's content is like:
Use `examples/trials/mnist` as an example. This is a TensorFlow job that uses the tf-operator of Kubeflow. The NNI config yml file's content looks like this:
```
authorName: default
experimentName: example_mnist
......@@ -119,7 +119,7 @@ kubeflowConfig:
path: {your_nfs_server_export_path}
```
Note: You should explicitly set `trainingServicePlatform: kubeflow` in nni config yaml file if you want to start experiment in kubeflow mode.
Note: You should explicitly set `trainingServicePlatform: kubeflow` in the NNI config yml file if you want to start the experiment in kubeflow mode.
If you want to run PyTorch jobs, you can set your config file as follows:
```
......@@ -185,9 +185,9 @@ Trial configuration in kubeflow mode have the following configuration keys:
* ps (optional). This config section is used to configure the TensorFlow parameter server role.
* master (optional). This config section is used to configure the PyTorch master role.
Once complete to fill nni experiment config file and save (for example, save as exp_kubeflow.yaml), then run the following command
Once you have filled in the NNI experiment config file and saved it (for example, as exp_kubeflow.yml), run the following command
```
nnictl create --config exp_kubeflow.yaml
nnictl create --config exp_kubeflow.yml
```
to start the experiment in kubeflow mode. NNI will create a Kubeflow tfjob or pytorchjob for each trial, and the job name format is something like `nni_exp_{experiment_id}_trial_{trial_id}`.
You can see the kubeflow tfjob created by NNI in your Kubernetes dashboard.
......
# nnictl
## Introduction
__nnictl__ is a command line tool that can be used to control experiments, for example to start, stop or resume an experiment, or to start or stop NNIBoard.
......@@ -8,41 +7,45 @@ __nnictl__ is a command line tool, which can be used to control experiments, suc
## Commands
nnictl supports the following commands:
* [nnictl create](#create)
* [nnictl resume](#resume)
* [nnictl stop](#stop)
* [nnictl update](#update)
* [nnictl trial](#trial)
* [nnictl top](#top)
* [nnictl experiment](#experiment)
* [nnictl config](#config)
* [nnictl log](#log)
* [nnictl webui](#webui)
* [nnictl tensorboard](#tensorboard)
* [nnictl package](#package)
* [nnictl --version](#version)
### Manage an experiment
<a name="create"></a>
* __nnictl create__
* Description
You can use this command to create a new experiment, using the configuration specified in the config file.
After this command succeeds, the context is set to this experiment. This means that subsequent commands you issue are associated with this experiment, unless you explicitly change the context (not supported yet).
* Usage
```bash
nnictl create [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|--config, -c| True| |YAML configuration file of the experiment|
|--port, -p|False| |the port of the RESTful server|
<a name="resume"></a>
* __nnictl resume__
......@@ -56,17 +59,16 @@ nnictl support commands:
nnictl resume [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |The id of the experiment you want to resume|
|--port, -p| False| |Rest port of the experiment you want to resume|
<a name="stop"></a>
* __nnictl stop__
* Description
You can use this command to stop a running experiment or multiple experiments.
......@@ -78,81 +80,89 @@ nnictl support commands:
```
* Detail
1. If an id is specified and it matches a running experiment, nnictl stops that experiment; otherwise it prints an error message.
2. If no id is specified and an experiment is running, nnictl stops the running experiment; otherwise it prints an error message.
3. If the id ends with `*`, nnictl stops all experiments whose ids match the pattern.
4. If the id does not exist but matches the prefix of an experiment id, nnictl stops the matched experiment.
5. If the id does not exist but matches the prefixes of multiple experiment ids, nnictl prints information about the matching ids.
6. Users can run `nnictl stop all` to stop all experiments.
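The id-matching rules for `nnictl stop` can be sketched as a small Python function. `resolve_stop_targets` is hypothetical and is not nnictl's code.

* It only looks at the IDs of running experiments.
* Raising `LookupError` stands in for nnictl printing an error or the candidate IDs.
* Refusing to choose when no id is given and several experiments are running is an assumption.

```python
def resolve_stop_targets(requested_id, running_ids):
    """Return the running experiment IDs that a stop request would act on (sketch)."""
    if requested_id is None:
        # No id: stop the running experiment, assuming exactly one is running.
        if len(running_ids) == 1:
            return list(running_ids)
        raise LookupError("no single running experiment to stop")
    if requested_id == "all":
        return list(running_ids)
    if requested_id.endswith("*"):
        # A trailing '*' matches every id with the given prefix.
        prefix = requested_id[:-1]
        return [eid for eid in running_ids if eid.startswith(prefix)]
    if requested_id in running_ids:
        return [requested_id]  # exact match
    matches = [eid for eid in running_ids if eid.startswith(requested_id)]
    if len(matches) == 1:
        return matches  # unique prefix match
    if matches:
        raise LookupError("ambiguous id, candidates: " + ", ".join(matches))
    raise LookupError("no experiment matches " + requested_id)
```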
<a name="update"></a>
* __nnictl update__
* __nnictl update searchspace__
* Description
You can use this command to update an experiment's search space.
* Usage
```bash
nnictl update searchspace [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
|--filename, -f| True| |the file storing your new search space|
* __nnictl update concurrency__
* Description
You can use this command to update an experiment's concurrency.
* Usage
```bash
nnictl update concurrency [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
|--value, -v| True| |the number of allowed concurrent trials|
* __nnictl update duration__
* Description
You can use this command to update an experiment's duration.
* Usage
```bash
nnictl update duration [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
|--value, -v| True| |the new experiment duration, given as NUMBER[SUFFIX]. SUFFIX may be 's' for seconds (the default), 'm' for minutes, 'h' for hours, or 'd' for days.|
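Here is a minimal sketch of how a NUMBER[SUFFIX] duration value can be converted to seconds. `parse_duration` is a hypothetical helper, not nnictl's implementation, and it assumes NUMBER is a whole number.

```python
import re

# Seconds per unit for the suffixes listed in the table above.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Convert e.g. '90', '5m', '2h' or '1d' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd]?)", text.strip())
    if match is None:
        raise ValueError("expected NUMBER[SUFFIX], got %r" % text)
    number, suffix = match.groups()
    return int(number) * _UNIT_SECONDS[suffix or "s"]  # no suffix means seconds
```

Under this reading, `nnictl update duration --value 2h` would set the duration to 7200 seconds.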
* __nnictl update trialnum__
* Description
You can use this command to update an experiment's maximum number of trials (maxTrialNum).
* Usage
```bash
nnictl update trialnum [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
|--value, -v| True| |the new maximum number of trials you want to set|
<a name="trial"></a>
* __nnictl trial__
......@@ -163,50 +173,56 @@ nnictl support commands:
You can use this command to show the information of trials.
* Usage
```bash
nnictl trial ls
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
* __nnictl trial kill__
* Description
You can use this command to kill a trial job.
* Usage
```bash
nnictl trial kill [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
|--trialid, -t| True| |ID of the trial you want to kill.|
<a name="top"></a>
* __nnictl top__
* Description
Monitor all running experiments.
* Usage
```bash
nnictl top
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
|--time, -t| False| |The interval, in seconds, at which the experiment status is updated. The default value is 3 seconds.|
<a name="experiment"></a>
### Manage experiment information
......@@ -222,11 +238,11 @@ nnictl support commands:
nnictl experiment show
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
* __nnictl experiment status__
......@@ -240,13 +256,14 @@ nnictl support commands:
nnictl experiment status
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
* __nnictl experiment list__
* Description
Show the information of all the (running) experiments.
......@@ -257,17 +274,19 @@ nnictl support commands:
nnictl experiment list
```
<a name="config"></a>
* __nnictl config show__
* Description
Display the current context information.
* Usage
```bash
nnictl config show
```
<a name="log"></a>
### Manage log
......@@ -283,42 +302,53 @@ nnictl support commands:
nnictl log stdout [options]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
|--head, -h| False| |show head lines of stdout|
|--tail, -t| False| |show tail lines of stdout|
|--path, -p| False| |show the path of stdout file|
* __nnictl log stderr__
* Description
Show the stderr log content.
* Usage
```bash
nnictl log stderr [options]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
|--head, -h| False| |show head lines of stderr|
|--tail, -t| False| |show tail lines of stderr|
|--path, -p| False| |show the path of stderr file|
* __nnictl log trial__
* Description
Show trial log path.
* Usage
```bash
nnictl log trial [options]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |the ID of the trial|
<a name="webui"></a>
### Manage webui
......@@ -339,13 +369,13 @@ nnictl support commands:
nnictl tensorboard start
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
|--trialid| False| |ID of the trial|
|--port| False| 6006|The port of the tensorboard process|
* Detail
......@@ -356,42 +386,65 @@ nnictl support commands:
5. If there is only one trial job, you don't need to set `--trialid`. If multiple trial jobs are running, you should set `--trialid`, or use `nnictl tensorboard start --trialid all` to map `--logdir` to all trial log paths.
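One way to point a single TensorBoard at several trial log paths is TensorBoard's `name:path` syntax. This is a comma-separated list accepted by `--logdir` in TensorBoard 1.x and by `--logdir_spec` in TensorBoard 2.x.

The sketch below builds such a value. Whether nnictl builds its argument this way is an assumption, and `tensorboard_logdir_spec` is hypothetical.

```python
def tensorboard_logdir_spec(trial_log_paths):
    """Join {trial_id: log_path} into TensorBoard's 'name:path,name:path' form."""
    # Sorting keeps the run order stable across invocations.
    return ",".join(
        "%s:%s" % (trial_id, path) for trial_id, path in sorted(trial_log_paths.items())
    )
```

On TensorBoard 2.x, the result would be passed as, for example, `tensorboard --logdir_spec "$SPEC" --port 6006`.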
* __nnictl tensorboard stop__
* Description
Stop all TensorBoard processes.
* Usage
```bash
nnictl tensorboard stop
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|id| False| |ID of the experiment you want to set|
<a name="package"></a>
### Manage package
* __nnictl package install__
* Description
Install the packages needed in NNI experiments.
* Usage
```bash
nnictl package install [OPTIONS]
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
|--name| True| |The name of package to be installed|
* __nnictl package show__
* Description
List the supported packages.
* Usage
```bash
nnictl package show
```
<a name="version"></a>
### Check NNI version
* __nnictl --version__
* Description
Show the version of NNI currently installed.
* Usage
```bash
nnictl --version
```
\ No newline at end of file
......@@ -6,9 +6,9 @@ NNI supports running an experiment on [OpenPAI](https://github.com/Microsoft/pai
To install NNI, follow the install guide [here](GetStarted.md).
## Run an experiment
Use `examples/trials/mnist-annotation` as an example. The nni config yaml file's content is like:
Use `examples/trials/mnist-annotation` as an example. The NNI config YAML file's content looks like this:
```yaml
```yml
authorName: your_name
experimentName: auto_mnist
# how many trials could be concurrently running
......@@ -41,7 +41,7 @@ paiConfig:
host: 10.1.1.1
```
Note: You should set `trainingServicePlatform: pai` in nni config yaml file if you want to start experiment in pai mode.
Note: You should set `trainingServicePlatform: pai` in the NNI config YAML file if you want to start the experiment in pai mode.
Compared with LocalMode and [RemoteMachineMode](RemoteMachineMode.md), trial configuration in pai mode has five additional keys:
* cpuNum
......@@ -49,16 +49,16 @@ Compared with LocalMode and [RemoteMachineMode](RemoteMachineMode.md), trial con
* memoryMB
* Required key. Should be a positive number based on your trial program's memory requirement.
* image
* Required key. In pai mode, your trial program will be scheduled by OpenPAI to run in [Docker container](https://www.docker.com/). This key is used to specify the Docker image used to create the container in which your traill will run.
* Required key. In pai mode, your trial program will be scheduled by OpenPAI to run in a [Docker container](https://www.docker.com/). This key specifies the Docker image used to create the container in which your trial will run.
* We have already built a Docker image [msranni/nni](https://hub.docker.com/r/msranni/nni/) on [Docker Hub](https://hub.docker.com/). It contains the NNI Python packages, Node modules, and JavaScript artifact files required to start an experiment, as well as all of NNI's dependencies. The Dockerfile used to build this image can be found [here](../deployment/Dockerfile.build.base). You can either use this image directly in your config file, or build your own image based on it.
* dataDir
* Optional key. It specifies the HDFS data directory from which the trial downloads data. The format should be something like hdfs://{your HDFS host}:9000/{your data directory}
* outputDir
* Optional key. It specifies the HDFS output directory for the trial. Once the trial completes (whether it succeeds or fails), the NNI SDK automatically copies the trial's stdout and stderr to this directory. The format should be something like hdfs://{your HDFS host}:9000/{your output directory}
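The `hdfs://{host}:9000/{directory}` format for these keys can be built and sanity-checked with Python's standard `urllib.parse`. The helper names are illustrative, the port default is taken from the example format, and NNI may validate these keys differently.

```python
from urllib.parse import urlparse


def hdfs_uri(host, directory, port=9000):
    """Build an hdfs://host:port/directory URI in the dataDir/outputDir format."""
    return "hdfs://%s:%d/%s" % (host, port, directory.lstrip("/"))


def looks_like_hdfs_uri(uri):
    """True if uri has the hdfs scheme, a host, an explicit port and a non-empty path."""
    parsed = urlparse(uri)
    return (
        parsed.scheme == "hdfs"
        and bool(parsed.hostname)
        and parsed.port is not None
        and parsed.path not in ("", "/")
    )
```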
Once complete to fill nni experiment config file and save (for example, save as exp_pai.yaml), then run the following command
Once you have filled in the NNI experiment config file and saved it (for example, as exp_pai.yml), run the following command
```
nnictl create --config exp_pai.yaml
nnictl create --config exp_pai.yml
```
to start the experiment in pai mode. NNI will create an OpenPAI job for each trial, and the job name format is something like `nni_exp_{experiment_id}_trial_{trial_id}`.
You can see the pai jobs created by NNI in your OpenPAI cluster's web portal, like:
......@@ -78,4 +78,4 @@ You can see there're three fils in output folder: stderr, stdout, and trial.log
If you also want to save other trial output, such as model files, into HDFS, use the environment variable `NNI_OUTPUT_DIR` in your trial code to save your own output files. The NNI SDK will copy all the files in `NNI_OUTPUT_DIR` from the trial's container to HDFS.
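Here is a minimal sketch of writing an extra output file into `NNI_OUTPUT_DIR` from trial code. The `save_output` helper and the `./outputs` fallback (used when the variable is not set, e.g. for a local test run) are assumptions.

```python
import os


def save_output(filename, content):
    """Write content into NNI_OUTPUT_DIR so it is copied out with the trial's outputs."""
    # Fallback directory for runs where NNI_OUTPUT_DIR is not set (assumption).
    output_dir = os.environ.get("NNI_OUTPUT_DIR", "./outputs")
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    with open(path, "w") as f:
        f.write(content)
    return path
```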
Any problems when using NNI in pai mode, plesae create issues on [NNI github repo](https://github.com/Microsoft/nni), or send mail to nni@microsoft.com
If you run into any problems when using NNI in pai mode, please create an issue on the [NNI GitHub repo](https://github.com/Microsoft/nni).
......@@ -104,9 +104,9 @@ If you want to use NNI to automatically train your model and find the optimal hy
*Implemented code directory: [mnist.py](../examples/trials/mnist/mnist.py)*
**Step 3**: Define a `config` file in yaml, which declare the `path` to search space and trial, also give `other information` such as tuning algorithm, max trial number and max runtime arguments.
**Step 3**: Define a `config` file in YAML. It declares the `path` to the search space and the trial, and gives `other information` such as the tuning algorithm, max trial number, and max runtime arguments.
```yaml
```yml
authorName: default
experimentName: example_mnist
trialConcurrency: 1
......
# ChangeLog
## Release 0.5.0 - 01/14/2019
### Major Features
#### New tuner and assessor supports
* Support [Metis tuner](./HowToChooseTuner.md#MetisTuner) as a new NNI tuner. The Metis algorithm has been proven to perform well for **online** hyper-parameter tuning.
* Support [ENAS customized tuner](https://github.com/countif/enas_nni), a tuner contributed by a GitHub community user. It is an algorithm for neural network search that learns neural network architectures via reinforcement learning and performs better than NAS.
* Support [Curve fitting assessor](./HowToChooseTuner.md#Curvefitting) for an early stopping policy using learning curve extrapolation.
* Advanced support of [Weight Sharing](./AdvancedNAS.md): enable weight sharing for NAS tuners, currently through NFS.
#### Training Service Enhancement
* [FrameworkController Training service](./FrameworkControllerMode.md): Support running experiments using FrameworkController on Kubernetes
* FrameworkController is a controller on Kubernetes that is general enough to run (distributed) jobs with various machine learning frameworks, such as TensorFlow, PyTorch, and MXNet.
* NNI provides unified and simple specification for job definition.
* MNIST example for how to use FrameworkController.
#### User Experience improvements
* Better trial logging support for NNI experiments in PAI, Kubeflow and FrameworkController mode:
* An improved logging architecture sends the stdout/stderr of trials to NNI manager via HTTP POST. NNI manager stores the trials' stdout/stderr messages in a local log file.
* Show the link to the trial log file in the WebUI.
* Support showing all key-value pairs of the final result.
## Release 0.4.1 - 12/14/2018
### Major Features
#### New tuner supports
* Support [network morphism](./HowToChooseTuner.md#NetworkMorphism) as a new tuner
#### Training Service improvements
* Migrate the [Kubeflow training service](https://github.com/Microsoft/nni/blob/master/docs/KubeflowMode.md)'s dependency from the kubectl CLI to a [Kubernetes API](https://kubernetes.io/docs/concepts/overview/kubernetes-api/) client
* [Pytorch-operator](https://github.com/kubeflow/pytorch-operator) support for the Kubeflow training service
* Improve uploading of local code files to OpenPAI HDFS
* Fix an OpenPAI integration WebUI bug: WebUI did not show the latest trial job status because the OpenPAI token had expired
#### NNICTL improvements
* Show version information in both nnictl and WebUI. You can run **nnictl -v** to show your currently installed NNI version
#### WebUI improvements
* Enable modifying the concurrency number during an experiment
* Add a feedback link to the NNI GitHub 'create issue' page
* Enable customizing the top 10 trials by metric number (largest or smallest)
* Enable downloading logs for the dispatcher & nnimanager
* Enable automatic scaling of axes for metric numbers
* Update annotation to support displaying the real choice in the search space
### New examples
* [FashionMnist](https://github.com/Microsoft/nni/tree/master/examples/trials/network_morphism), which works together with the network morphism tuner
* [Distributed MNIST example](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-distributed-pytorch) written in PyTorch
## Release 0.4 - 12/6/2018
### Major Features
* [Kubeflow Training service](./KubeflowMode.md)
  * Support tf-operator
  * [Distributed trial example](../examples/trials/mnist-distributed/dist_mnist.py) on Kubeflow
* [Grid search tuner](../src/sdk/pynni/nni/README.md#Grid)
* [Hyperband tuner](../src/sdk/pynni/nni/README.md#Hyperband)
* Support launching NNI experiments on Mac
* WebUI
  * UI support for the Hyperband tuner
  * Remove the TensorBoard button
  * Show experiment error messages
  * Show line numbers in search space and trial profile
  * Support searching for a specific trial by trial number
  * Show trial's hdfsLogPath
  * Download experiment parameters
### Others
* Asynchronous dispatcher
* Dockerfile update: add the PyTorch library
* Refactor the 'nnictl stop' process: send SIGTERM to the NNI manager process rather than calling the stop REST API.
* OpenPAI training service bug fixes
  * Support NNI Manager IP configuration (nniManagerIp) in the PAI cluster config file, to fix the issue where the user's machine has no eth0 device
  * The number of files in codeDir is now capped at 1000, to avoid users mistakenly setting a root directory as codeDir
  * Don't print the useless 'metrics is empty' log in the PAI job's stdout. Only print a useful message once new metrics are recorded, to reduce confusion when users check a PAI trial's output for debugging
  * Add a timestamp at the beginning of each log entry in the trial keeper.
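
The 1000-file cap on codeDir can be checked locally before submitting a job. The sketch below is hypothetical and is not NNI's implementation. It assumes that the cap counts files recursively under codeDir and that only directories with more than 1000 files are rejected. The names `count_files`, `check_code_dir` and `MAX_CODE_DIR_FILES` are illustrative only.

```python
import os

# Cap stated in the v0.4 release notes; the constant name is hypothetical.
MAX_CODE_DIR_FILES = 1000


def count_files(code_dir: str) -> int:
    """Count files under code_dir, recursing into subdirectories (an assumption)."""
    total = 0
    for _root, _dirs, files in os.walk(code_dir):
        total += len(files)
    return total


def check_code_dir(code_dir: str, limit: int = MAX_CODE_DIR_FILES) -> int:
    """Return the file count, or raise ValueError if it exceeds `limit`."""
    n = count_files(code_dir)
    if n > limit:
        raise ValueError(
            f"codeDir {code_dir!r} contains {n} files (limit {limit}); "
            "was a root directory set as codeDir by mistake?"
        )
    return n
```

Running `check_code_dir("path/to/trial_code")` before `nnictl create` surfaces a mistaken root-directory codeDir early. NNI enforces its own limit regardless of this check.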
## Release 0.3.0 - 11/2/2018
### NNICTL new features and updates
* Support running multiple experiments simultaneously.
  Before v0.3, NNI only supported running a single experiment at a time. Since this release, users can run multiple experiments simultaneously. Each experiment requires a unique port. The first experiment uses the default port, as in previous versions, and you can specify a unique port for each additional experiment as below:
  ```bash
  nnictl create --port 8081 --config <config file path>
  ```
* Support updating the max trial number.
  Use `nnictl update --help` to learn more, or refer to the [NNICTL Spec](https://github.com/Microsoft/nni/blob/master/docs/NNICTLDOC.md) for the full usage of NNICTL.
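
Running several experiments at once means each one needs its own port. The following is a minimal, best-effort sketch for choosing a port before calling `nnictl create --port`. It assumes the default port is 8080, which you should verify against your installed NNI version. The helpers `port_is_free`, `pick_port` and `create_command` are hypothetical. Another process could claim the port between the check and the launch, so treat the result as a hint rather than a reservation.

```python
import socket
from typing import List

# Assumption: 8080 is nnictl's default port; verify for your NNI version.
DEFAULT_PORT = 8080


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Best-effort check: True if binding host:port succeeds right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True


def pick_port(preferred: int = DEFAULT_PORT) -> int:
    """Return `preferred` if it is free, otherwise a port chosen by the OS."""
    if port_is_free(preferred):
        return preferred
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS pick any free port
        return s.getsockname()[1]


def create_command(config_path: str, port: int) -> List[str]:
    """Build the `nnictl create --port ... --config ...` command line."""
    return ["nnictl", "create", "--port", str(port), "--config", config_path]
```

For example, `subprocess.run(create_command("config.yml", pick_port()), check=True)` would launch an experiment on the chosen port.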
### API new features and updates
* <span style="color:red">**breaking change**</span>: `nni.get_parameters()` is refactored to `nni.get_next_parameter()`. Examples from prior releases cannot run on v0.3, so please clone the NNI repo to get the new examples. If you have applied NNI to your own code, please update the API accordingly.
  ```bash
  git clone -b v0.3 https://github.com/Microsoft/nni.git
  ```
* New API **nni.get_sequence_id()**.
  Each trial job is allocated a unique sequence number, which can be retrieved by the `nni.get_sequence_id()` API.
* **nni.report_final_result(result)** API supports more data types for the result parameter.
  It can be one of the following types:
  * int
  * float
  * A Python dict containing the 'default' key. The value of the 'default' key should be of type int or float. The dict can contain any other key-value pairs.
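
A trial could, for example, call `nni.report_final_result({'default': accuracy, 'loss': loss})`. The hypothetical helper below sketches a client-side check of the result types listed above. It is not part of the NNI SDK. Rejecting `bool` values, which Python treats as an `int` subclass, is an assumption of this sketch.

```python
def is_number(x) -> bool:
    """int or float, excluding bool (a subclass of int in Python)."""
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def is_valid_final_result(result) -> bool:
    """Check `result` against the types listed for nni.report_final_result in v0.3."""
    if is_number(result):
        return True
    if isinstance(result, dict):
        # A dict must carry a numeric 'default' entry; other keys are free-form.
        return "default" in result and is_number(result["default"])
    return False


print(is_valid_final_result(0.93))                             # True
print(is_valid_final_result({"default": 0.93, "loss": 0.21}))  # True
print(is_valid_final_result({"loss": 0.21}))                   # False
```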
### New tuner support
* **Batch Tuner**, which iterates over all parameter combinations, can be used to submit batch trial jobs.
### New examples
* An NNI Docker image for public use:
  ```bash
  docker pull msranni/nni:latest
  ```
* New trial example: [NNI Sklearn Example](https://github.com/Microsoft/nni/tree/master/examples/trials/sklearn)
* New competition example: [Kaggle Competition TGS Salt Example](https://github.com/Microsoft/nni/tree/master/examples/trials/kaggle-tgs-salt)
### Others
* UI refactoring: refer to the [WebUI doc](WebUI.md) for how to work with the new UI.
* Continuous Integration: NNI has switched to Azure Pipelines
* [Known Issues in release 0.3.0](https://github.com/Microsoft/nni/labels/nni030knownissues).
## Release 0.2.0 - 9/29/2018
### Major Features
* Support the [OpenPAI](https://github.com/Microsoft/pai) (aka pai) Training Service (see [here](./PAIMode.md) for instructions on how to submit an NNI job in pai mode)
  * Support training services in pai mode. NNI trials will be scheduled to run on the OpenPAI cluster
  * NNI trials' output (including logs and model files) will be copied to OpenPAI HDFS for further debugging and checking
* Support the [SMAC](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) tuner (see [here](HowToChooseTuner.md) for instructions on how to use the SMAC tuner)
  * [SMAC](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO to handle categorical parameters. The SMAC supported by NNI is a wrapper on [SMAC3](https://github.com/automl/SMAC3)
* Support NNI installation on [conda](https://conda.io/docs/index.html) and Python virtual environments
* Others
  * Update the ga squad example and related documentation
  * Small WebUI UX enhancements and bug fixes
### Known Issues
[Known Issues in release 0.2.0](https://github.com/Microsoft/nni/labels/nni020knownissues).
## Release 0.1.0 - 9/10/2018 (initial release)
Initial release of Neural Network Intelligence (NNI).
### Major Features
* Installation and Deployment
  * Support pip install and source code install
  * Support training services in local mode (including multi-GPU mode) as well as multi-machine mode
* Tuners, Assessors and Trial
  * Support AutoML algorithms including hyperopt_tpe, hyperopt_annealing, hyperopt_random, and evolution_tuner
  * Support assessor (early stop) algorithms including the medianstop algorithm
  * Provide a Python API for user-defined tuners and assessors
  * Provide a Python API for users to wrap trial code as NNI deployable code
* Experiments
  * Provide a command line toolkit, 'nnictl', for experiment management
  * Provide a WebUI for viewing experiment details and managing experiments
* Continuous Integration
  * Support CI by providing out-of-box integration with [travis-ci](https://github.com/travis-ci) on Ubuntu
* Others
  * Support simple GPU job scheduling
### Known Issues
[Known Issues in release 0.1.0](https://github.com/Microsoft/nni/labels/nni010knownissues).
We use `examples/trials/mnist-annotation` as an example here. Run `cat ~/nni/examples/trials/mnist-annotation/config_remote.yml` to see the detailed configuration file:
```yml
authorName: default
experimentName: example_mnist
trialConcurrency: 1
......