Unverified commit d48ad027, authored by SparkSnail, committed by GitHub

Merge pull request #184 from microsoft/master

merge master
parents 9352cc88 22993e5d
## **Installation on Windows**
When you use PowerShell to run a script for the first time, you need to **run PowerShell as administrator** and execute this command:
```powershell
Set-ExecutionPolicy -ExecutionPolicy Unrestricted
```
Anaconda or Miniconda is highly recommended.
* __Install NNI through pip__
* __Install NNI through source code__

  Prerequisite: `python >=3.5`, `git`, `PowerShell`.
You can install NNI as administrator or as the current user as follows:
```bash
git clone -b v0.8 https://github.com/Microsoft/nni.git
cd nni
powershell -ExecutionPolicy Bypass -file install.ps1
```
## **System requirements**
**Run an Experiment on Kubeflow**
===

Now NNI supports running experiments on [Kubeflow](https://github.com/kubeflow/kubeflow), called kubeflow mode. Before starting to use NNI kubeflow mode, you should have a Kubernetes cluster, either on-premises or [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/), and an Ubuntu machine on which [kubeconfig](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/) is set up to connect to your Kubernetes cluster. If you are not familiar with Kubernetes, [here](https://kubernetes.io/docs/tutorials/kubernetes-basics/) is a good start. In kubeflow mode, your trial program runs as a Kubeflow job in the Kubernetes cluster.
## Prerequisite for on-premises Kubernetes Service
1. A **Kubernetes** cluster using Kubernetes 1.8 or later. Follow this [guideline](https://kubernetes.io/docs/setup/) to set up Kubernetes.
2. Download, set up, and deploy **Kubeflow** to your Kubernetes cluster. Follow this [guideline](https://www.kubeflow.org/docs/started/getting-started/) to set up Kubeflow.
3. Prepare a **kubeconfig** file, which will be used by NNI to interact with your Kubernetes API server. By default, NNI manager uses $(HOME)/.kube/config as the kubeconfig file's path. You can also specify another kubeconfig file by setting the **KUBECONFIG** environment variable. Refer to this [guideline](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig) to learn more about kubeconfig.
4. If your NNI trial job needs GPU resources, you should follow this [guideline](https://github.com/NVIDIA/k8s-device-plugin) to configure the **Nvidia device plugin for Kubernetes**.
5. Prepare an **NFS server** and export a general purpose mount (we recommend mapping your NFS server path with the `root_squash` option, otherwise permission issues may arise when NNI copies files to NFS; refer to this [page](https://linux.die.net/man/5/exports) to learn what the root_squash option is), or **Azure File Storage**.
6. Install an **NFS client** on the machine where you install NNI and run nnictl to create the experiment. Run this command to install the NFSv4 client:
```
apt-get install nfs-common
```
7. Install **NNI** by following the install guide [here](QuickStart.md).
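As an illustration of step 3, the lookup order described there (the **KUBECONFIG** environment variable, falling back to `$(HOME)/.kube/config`) can be sketched as follows; `resolve_kubeconfig_path` is a hypothetical helper, not NNI's actual code:

```python
import os

def resolve_kubeconfig_path():
    """Return the kubeconfig path as described above: the KUBECONFIG
    environment variable if set, otherwise ~/.kube/config."""
    default = os.path.join(os.path.expanduser("~"), ".kube", "config")
    return os.environ.get("KUBECONFIG", default)
```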
4. Follow the [guideline](https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=portal) to create an Azure file storage account. If you use Azure Kubernetes Service, NNI needs an Azure Storage Service account to store code files and output files.
5. To access the Azure storage service, NNI needs the access key of the storage account, and NNI uses the [Azure Key Vault](https://azure.microsoft.com/en-us/services/key-vault/) service to protect your private key. Set up the Azure Key Vault service and add a secret to Key Vault to store the access key of the Azure storage account. Follow this [guideline](https://docs.microsoft.com/en-us/azure/key-vault/quick-create-cli) to store the access key.
## Design
![](../img/kubeflow_training_design.png)
The Kubeflow training service instantiates a Kubernetes REST client to interact with your K8s cluster's API server.

For each trial, we upload all the files in your local codeDir path (configured in nni_config.yml), together with NNI-generated files like parameter.cfg, into a storage volume. Right now we support two kinds of storage volumes: [NFS](https://en.wikipedia.org/wiki/Network_File_System) and [Azure file storage](https://azure.microsoft.com/en-us/services/storage/files/); you should configure the storage volume in the NNI config YAML file. After the files are prepared, the Kubeflow training service calls the K8S REST API to create Kubeflow jobs ([tf-operator](https://github.com/kubeflow/tf-operator) jobs or [pytorch-operator](https://github.com/kubeflow/pytorch-operator) jobs) in K8S, and mounts your storage volume into the job's pod. Output files of the Kubeflow job, like stdout, stderr, trial.log, or model files, are also copied back to the storage volume. NNI shows the storage volume's URL for each trial in the WebUI, allowing users to browse the log files and the job's output files.
## Supported operator
NNI only supports the tf-operator and pytorch-operator of Kubeflow; other operators are not tested.
The setting of pytorch-operator:
```
kubeflowConfig:
  operator: pytorch-operator
```
If users want to use tf-operator, they can set `ps` and `worker` in the trial config. If users want to use pytorch-operator, they can set `master` and `worker` in the trial config.
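The role-name rule above can be captured in a small lookup table; this is a hedged sketch for illustration (`valid_trial_roles` is hypothetical, not part of NNI):

```python
# Role names each supported Kubeflow operator accepts in the trial config,
# mirroring the rule stated above.
OPERATOR_ROLES = {
    "tf-operator": {"ps", "worker"},
    "pytorch-operator": {"master", "worker"},
}

def valid_trial_roles(operator, roles):
    """Check that every role in the trial config is one the operator knows."""
    expected = OPERATOR_ROLES.get(operator)
    if expected is None:
        raise ValueError(f"unsupported operator: {operator}")
    return set(roles) <= expected
```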
## Supported storage type
NNI supports NFS and Azure Storage to store the code and output files; users can set the storage type in the config file along with the corresponding settings.

The settings for NFS storage are as follows:
```
kubeflowConfig:
```
## Run an experiment
Use `examples/trials/mnist` as an example. This is a TensorFlow job that uses the tf-operator of Kubeflow. The NNI config YAML file's content is like:
```
authorName: default
experimentName: example_mnist
kubeflowConfig:
  path: {your_nfs_server_export_path}
```
Note: You should explicitly set `trainingServicePlatform: kubeflow` in the NNI config YAML file if you want to start the experiment in kubeflow mode.
If you want to run PyTorch jobs, you can configure the file in the same way, setting `operator: pytorch-operator` in `kubeflowConfig` and using the `master` and `worker` roles in the trial configuration.
Trial configuration in kubeflow mode has the following configuration keys:
* cpuNum
* gpuNum
* image
* Required key. In kubeflow mode, your trial program will be scheduled by Kubernetes to run in a [Pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/). This key is used to specify the Docker image used to create the pod in which your trial program will run.
* We already build a Docker image [msranni/nni](https://hub.docker.com/r/msranni/nni/) on [Docker Hub](https://hub.docker.com/). It contains the NNI python packages, Node modules, and JavaScript artifact files required to start an experiment, plus all of NNI's dependencies. The Dockerfile used to build this image can be found [here](https://github.com/Microsoft/nni/tree/master/deployment/docker/Dockerfile). You can either use this image directly in your config file or build your own image based on it.
* apiVersion
* Required key. The API version of your Kubeflow.
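A quick way to see how such required keys might be checked (an illustrative snippet only; the full key list is longer than the two shown here):

```python
# Only the two required keys discussed above; the real schema has more.
REQUIRED_TRIAL_KEYS = {"image", "apiVersion"}

def missing_trial_keys(trial_config):
    """Return required keys absent from a kubeflow-mode trial config dict."""
    return sorted(REQUIRED_TRIAL_KEYS - set(trial_config))
```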
Once you have filled in the NNI experiment config file and saved it (for example, as `exp_kubeflow.yml`), run the following command
```
nnictl create --config exp_kubeflow.yml
```
to start the experiment in kubeflow mode. NNI will create a Kubeflow tfjob or pytorchjob for each trial, and the job name format is something like `nni_exp_{experiment_id}_trial_{trial_id}`.
You can see the Kubeflow tfjobs created by NNI in your Kubernetes dashboard.
Notice: In kubeflow mode, NNIManager will start a REST server and listen on a port which is your NNI WebUI's port plus 1. For example, if your WebUI port is `8080`, the REST server will listen on `8081` to receive metrics from trial jobs running in Kubernetes. So you should `enable 8081` TCP port in your firewall rule to allow incoming traffic.
Once a trial job is completed, you can go to the NNI WebUI's overview page (like http://localhost:8080/oview) to check the trial's information.
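The naming and port conventions above can be sketched in Python (hypothetical helpers that mirror the stated rules; NNI's real code differs):

```python
def kubeflow_job_name(experiment_id, trial_id):
    """Build a job name following the format described above."""
    return f"nni_exp_{experiment_id}_trial_{trial_id}"

def metrics_rest_port(webui_port):
    """The NNIManager metrics REST server listens on the WebUI port plus 1."""
    return webui_port + 1
```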
## version check
NNI supports the version check feature since version 0.6; refer to [PaiMode](PaiMode.md).
To enable NNI API, make the following changes:
~~~~
RECEIVED_PARAMS = nni.get_next_parameter()
to get hyper-parameters' values assigned by tuner. `RECEIVED_PARAMS` is an object, for example:
{"conv_size": 2, "hidden_size": 124, "learning_rate": 0.0307, "dropout_rate": 0.2029}
1.3 Report NNI results
Use the API:
`nni.report_intermediate_result(accuracy)`
to send `accuracy` to assessor.
Use the API:
`nni.report_final_result(accuracy)`
to send `accuracy` to tuner.
~~~~
We have made the changes and saved them to `mnist.py`.
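The reporting pattern above can be simulated with stand-ins for the `nni` calls (a sketch only; in a real trial you would `import nni` and let the tuner supply the parameters):

```python
import json

# Stand-in for nni.get_next_parameter(): here we just parse the example
# object shown above instead of asking a tuner.
RECEIVED_PARAMS = json.loads(
    '{"conv_size": 2, "hidden_size": 124, '
    '"learning_rate": 0.0307, "dropout_rate": 0.2029}'
)

def train_one_epoch(params, epoch):
    """Hypothetical training step; returns a fake accuracy that improves per epoch."""
    return min(0.5 + 0.1 * epoch, 0.99)

intermediate = []
for epoch in range(3):
    acc = train_one_epoch(RECEIVED_PARAMS, epoch)
    intermediate.append(acc)       # real code: nni.report_intermediate_result(acc)
final_accuracy = intermediate[-1]  # real code: nni.report_final_result(final_accuracy)
```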
**NOTE**:
~~~~
accuracy - The `accuracy` could be any python object, but if you use NNI built-in tuner/assessor, `accuracy` should be a numerical variable (e.g. float, int).
assessor - The assessor will decide which trial should early stop based on the history performance of trial (intermediate result of one trial).
tuner    - The tuner will generate next parameters/architecture based on the explored history (final result of one trial).
~~~~
>Step 2 - Define SearchSpace

The hyper-parameters used in `Step 1.2 - Get predefined parameters` are defined in a `search_space.json` file like below:
```
{
    "dropout_rate":{"_type":"uniform","_value":[0.1,0.5]},
```

To run an experiment in NNI, you only need:
* Provide a YAML experiment configuration file
* (optional) Provide or choose an assessor
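For intuition, a `uniform` entry like the `dropout_rate` one above means the tuner samples a float in the given range; here is a minimal sketch (not NNI's actual sampler):

```python
import random

def sample_uniform(low, high, rng=None):
    """Sample one value for a {"_type": "uniform", "_value": [low, high]} entry."""
    rng = rng or random.Random(0)  # seeded here only to keep the sketch deterministic
    return rng.uniform(low, high)

value = sample_uniform(0.1, 0.5)
```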
**Prepare trial**:

>A set of examples can be found in ~/nni/examples after your installation; run `ls ~/nni/examples/trials` to see all the trial examples.

Let's use a simple trial example, e.g. mnist, provided by NNI. You can simply execute the following command to run the NNI mnist example:

python ~/nni/examples/trials/mnist-annotation/mnist.py
```
maxExecDuration: 3h
# empty means never stop
maxTrialNum: 100
# choice: local, remote
trainingServicePlatform: local
# choice: true, false
useAnnotation: true
tuner:
  builtinTunerName: TPE
trial:
  command: python mnist.py
  codeDir: ~/nni/examples/trials/mnist-annotation
  gpuNum: 0
```
Here *useAnnotation* is true because this trial example uses our python annotation (refer to [here](AnnotationSpec.md) for details). For the trial, we should provide *trialCommand*, which is the command to run the trial, and *trialCodeDir*, where the trial code is. The command will be executed in this directory. We should also specify how many GPUs a trial requires.
You can refer to [here](Nnictl.md) for more usage guide of the *nnictl* command line tool.
The experiment is now running. Other than *nnictl*, NNI also provides a WebUI for you to view experiment progress, control your experiment, and other appealing features.
## Using multiple local GPUs to speed up search
The following steps assume that you have 4 NVIDIA GPUs installed locally and [tensorflow with GPU support](https://www.tensorflow.org/install/gpu). The demo enables 4 concurrent trial jobs, and each trial job uses 1 GPU.

**Prepare configure file**: NNI provides a demo configuration file for the setting above; run `cat ~/nni/examples/trials/mnist-annotation/config_gpu.yml` to see it. The trialConcurrency and gpuNum are different from the basic configuration file:
```
trial:
  command: python mnist.py
  codeDir: ~/nni/examples/trials/mnist-annotation
  gpuNum: 1
```
We can run the experiment with the following command:
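The resource arithmetic behind this setup can be sketched as a toy helper (not part of NNI): with 4 GPUs and 1 GPU per trial, 4 trials can run concurrently.

```python
def max_concurrency(total_gpus, gpus_per_trial):
    """How many trial jobs can run at once given the GPU budget."""
    if gpus_per_trial <= 0:
        raise ValueError("gpus_per_trial must be positive")
    return total_gpus // gpus_per_trial
```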
Typically each trial job gets a single configuration (e.g., hyperparameters) from the tuner. However, there are cases where a trial needs to request multiple configurations, for example:
2. Some types of models have to be trained phase by phase, where the configuration of the next phase depends on the results of previous phase(s). For example, to find the best quantization for a model, the training procedure is often as follows: the auto-quantization algorithm (i.e., the tuner in NNI) chooses a bit width (e.g., 16 bits); a trial job gets this configuration, trains the model for some epochs, and reports the result (e.g., accuracy). The algorithm receives this result and decides whether to change 16 bits to 8 bits or back to 32 bits. This process is repeated a configured number of times.

The above cases can be supported by the same feature, i.e., multi-phase execution. To support these cases, a trial job should be able to request multiple configurations from the tuner. The tuner is aware of whether two configuration requests come from the same trial job or different ones. Also, in multi-phase a trial job can report multiple final results.

Note that `nni.get_next_parameter()` and `nni.report_final_result()` should be called sequentially: __call the former, then call the latter; and repeat this pattern__. If `nni.get_next_parameter()` is called multiple times consecutively, and then `nni.report_final_result()` is called once, the result is associated with the last configuration, which was retrieved by the last get_next_parameter call. There is then no result associated with the previous get_next_parameter calls, which may break some multi-phase algorithms.
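The required alternation can be illustrated with a toy simulation that stubs out the tuner round-trip (these are stand-ins, not NNI's API):

```python
# Toy stand-ins: in a real multi-phase trial these would be
# nni.get_next_parameter() and nni.report_final_result().
configs = [{"bits": 16}, {"bits": 8}, {"bits": 32}]
results = {}

def get_next_parameter(phase):
    return configs[phase]

def report_final_result(phase, value):
    # Each result is paired with the configuration fetched in the same phase.
    results[phase] = value

for phase in range(len(configs)):
    cfg = get_next_parameter(phase)      # 1) fetch a configuration
    acc = 0.9 - 0.01 * cfg["bits"] / 8   # 2) hypothetical training result
    report_final_result(phase, acc)      # 3) report before fetching again
```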
```python
net = build_graph_from_json(RCV_CONFIG)
nni.report_final_result(best_acc)
```
If you want to save and **load the best model**, the following methods are recommended.
```python
# 1. Use NNI API
## You can get the best model ID from WebUI
## or `nni/experiments/experiment_id/log/model_path/best_model.txt`
## read the json string from model file and load it with NNI API
with open("best-model.json") as json_file:
    json_of_model = json_file.read()
model = build_graph_from_json(json_of_model)

# 2. Use Framework API (Related to Framework)
## 2.1 Keras API
## Save the model with Keras API in the trial code
```

The tuner has a lot of different files, functions and classes. Here we will only explain the most important ones:
- `networkmorphism_tuner.py` is a tuner which uses network morphism techniques.
- `bayesian.py` is a Bayesian method to estimate the metric of unseen models based on the models we have already searched.
- `graph.py` is the meta graph data structure. The class Graph represents the neural architecture graph of a model.
  - Graph extracts the neural architecture graph from a model.
  - Each node in the graph is an intermediate tensor between layers.
  - Each layer is an edge in the graph.
  - Notably, multiple edges may refer to the same layer.
For other examples, you need to change the trial command `python3` into `python` in each example's YAML config file.
Make sure the C++ 14.0 compiler is installed.
>building 'simplejson._speedups' extension error: [WinError 3] The system cannot find the path specified
### Failure to run PowerShell when installing NNI from source
If you run a PowerShell script for the first time and have not set the execution policy for executing scripts, you will see the error below. Try to run PowerShell as administrator with this command first:
```powershell
Set-ExecutionPolicy -ExecutionPolicy Unrestricted
```
>...cannot be loaded because running scripts is disabled on this system.
### Trial failed with missing DLL in command line or PowerShell
This error is caused by missing LIBIFCOREMD.DLL and LIBMMD.DLL, which makes SciPy fail to install. Using Anaconda or Miniconda with Python (64-bit) can solve it.
### Trial failed on webUI
Please check the trial log file stderr for more details. If there is no such file and NNI was installed through pip, then you need to run PowerShell as administrator with this command first:
```powershell
Set-ExecutionPolicy -ExecutionPolicy Unrestricted
```
If there is a stderr file, please check it out. Two possible cases are as follows:
nnictl supports the following commands:
* Description

  You can use this command to create a new experiment, using the configuration specified in the config file.
  After this command completes successfully, the context will be set to this experiment, which means the following commands you issue are associated with this experiment, unless you explicitly change the context (not supported yet).
Debug mode will disable the version check function in Trialkeeper.
|Name, shorthand|Required|Default|Description|
|------|------|------|------|
|id|True||The id of the experiment you want to resume|
|--port, -p|False||Rest port of the experiment you want to resume|
|--debug, -d|False||set debug mode|
...@@ -170,7 +170,7 @@ Debug mode will disable version check function in Trialkeeper.
nnictl update searchspace [experiment_id] --filename examples/trials/mnist/search_space.json
```
* __nnictl update concurrency__
* Description
...@@ -197,11 +197,11 @@ Debug mode will disable version check function in Trialkeeper.
nnictl update concurrency [experiment_id] --value [concurrency_number]
```
* __nnictl update duration__
* Description
You can use this command to update an experiment's duration.
* Usage
...@@ -224,7 +224,7 @@ Debug mode will disable version check function in Trialkeeper.
nnictl update duration [experiment_id] --value [duration]
```
* __nnictl update trialnum__
* Description
You can use this command to update an experiment's maxtrialnum.
...@@ -312,7 +312,7 @@ Debug mode will disable version check function in Trialkeeper.
nnictl top
```
* Options
|Name, shorthand|Required|Default|Description|
|------|------|------ |------|
...@@ -525,12 +525,12 @@ Debug mode will disable version check function in Trialkeeper.
* __nnictl log trial__
* Description
Show trial log path.
* Usage
```bash
nnictl log trial [options]
```
...@@ -554,7 +554,7 @@ Debug mode will disable version check function in Trialkeeper.
* Description
Start the tensorboard process.
* Usage
...@@ -571,8 +571,8 @@ Debug mode will disable version check function in Trialkeeper.
* Detail
1. NNICTL currently supports the tensorboard function on local and remote platforms; other platforms will be supported later.
2. If you want to use tensorboard, you need to write your tensorboard log data to the path in the environment variable [NNI_OUTPUT_DIR].
3. In local mode, nnictl sets --logdir=[NNI_OUTPUT_DIR] directly and starts a tensorboard process.
4. In remote mode, nnictl first creates an ssh client to copy log data from the remote machine to a local temp directory, and then starts a tensorboard process on your local machine. Note that nnictl copies the log data only once when you run the command; if you want to see later tensorboard results, you should execute the nnictl tensorboard command again.
5. If there is only one trial job, you don't need to set the trial id. If there are multiple trial jobs running, you should set the trial id, or you can use [nnictl tensorboard start --trial_id all] to map --logdir to all trial log paths.
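As a minimal sketch of point 2 above, a trial can resolve its log directory from the environment variable like this. The `./nni_output` fallback is an assumption for running the sketch outside an NNI experiment, and the tensorflow call mentioned in the comment is not executed here:

```python
import os

# NNI exposes the per-trial output path via the NNI_OUTPUT_DIR environment
# variable; tensorboard log data written under it is what
# `nnictl tensorboard start` maps --logdir to.
logdir = os.environ.get("NNI_OUTPUT_DIR", "./nni_output")
os.makedirs(logdir, exist_ok=True)

# With tensorflow installed, you would now hand `logdir` to your summary
# writer, e.g. tf.summary.FileWriter(logdir, sess.graph).
print(logdir)
```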
...@@ -11,7 +11,7 @@ The figure below shows high-level architecture of NNI.
<p align="center">
<img src="https://user-images.githubusercontent.com/23273522/51816536-ed055580-2301-11e9-8ad8-605a79ee1b9a.png" alt="drawing" width="700"/>
</p>
## Key Concepts
...@@ -42,7 +42,7 @@ For each experiment, user only needs to define a search space and update a few l
<p align="center">
<img src="https://user-images.githubusercontent.com/23273522/51816627-5d13db80-2302-11e9-8f3e-627e260203d5.jpg" alt="drawing"/>
</p>
For more details about how to run an experiment, please refer to [Get Started](QuickStart.md).
...@@ -19,7 +19,7 @@ maxExecDuration: 3h
maxTrialNum: 100
# choice: local, remote, pai
trainingServicePlatform: pai
# choice: true, false
useAnnotation: true
tuner:
  builtinTunerName: TPE
...@@ -53,7 +53,7 @@ Compared with LocalMode and [RemoteMachineMode](RemoteMachineMode.md), trial con
* We have already built a docker image [msranni/nni](https://hub.docker.com/r/msranni/nni/) on [Docker Hub](https://hub.docker.com/). It contains the NNI python packages, Node modules and javascript artifact files required to start an experiment, and all of NNI's dependencies. The docker file used to build this image can be found [here](https://github.com/Microsoft/nni/tree/master/deployment/docker/Dockerfile). You can either use this image directly in your config file, or build your own image based on it.
* dataDir
* Optional key. It specifies the HDFS data directory for the trial to download data. The format should be something like hdfs://{your HDFS host}:9000/{your data directory}
* outputDir
* Optional key. It specifies the HDFS output directory for the trial. Once the trial is completed (either succeeded or failed), the trial's stdout and stderr will be copied to this directory by the NNI sdk automatically. The format should be something like hdfs://{your HDFS host}:9000/{your output directory}
* virtualCluster
* Optional key. Set the virtualCluster of OpenPAI. If omitted, the job will run on the default virtual cluster.
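Putting the optional keys above together, a pai-mode trial section might look like the sketch below. The host, paths and resource values are placeholders, not real values, and depending on your NNI version other keys (for example CPU and memory settings) may also be required:

```yaml
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
  image: msranni/nni
  # optional keys described above (placeholder host and paths):
  dataDir: hdfs://10.10.10.10:9000/username/data
  outputDir: hdfs://10.10.10.10:9000/username/output
  virtualCluster: default
```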
...@@ -64,13 +64,13 @@ Once complete to fill NNI experiment config file and save (for example, save as
```
nnictl create --config exp_pai.yml
```
to start the experiment in pai mode. NNI will create an OpenPAI job for each trial, and the job name format is something like `nni_exp_{experiment_id}_trial_{trial_id}`.
You can see the jobs created by NNI in the OpenPAI cluster's web portal, like:
![](../img/nni_pai_joblist.jpg)
Notice: In pai mode, NNIManager will start a rest server and listen on a port which is your NNI WebUI's port plus 1. For example, if your WebUI port is `8080`, the rest server will listen on `8081` to receive metrics from trial jobs running in OpenPAI. So you should enable the `8081` TCP port in your firewall rule to allow incoming traffic.
Once a trial job is completed, you can go to NNI WebUI's overview page (like http://localhost:8080/oview) to check the trial's information.
Expand a trial's information in the trial list view, and click the logPath link like:
![](../img/nni_webui_joblist.jpg)
...@@ -80,16 +80,16 @@ And you will be redirected to HDFS web portal to browse the output files of that
You can see there are three files in the output folder: stderr, stdout, and trial.log.
If you also want to save the trial's other output into HDFS, like model files, you can use the environment variable `NNI_OUTPUT_DIR` in your trial code to save your own output files, and the NNI SDK will copy all the files in `NNI_OUTPUT_DIR` from the trial's container to HDFS.
If you encounter any problems when using NNI in pai mode, please create issues on the [NNI github repo](https://github.com/Microsoft/nni).
## version check
NNI supports a version check feature since version 0.6. It is a policy to ensure that the version of NNIManager is consistent with trialKeeper, and to avoid errors caused by version incompatibility.
Check policy:
1. NNIManager before v0.6 could run any version of trialKeeper; trialKeeper supports backward compatibility.
2. Since version 0.6, the NNIManager version should be the same as the trialKeeper version. For example, if the NNIManager version is 0.6, the trialKeeper version should be 0.6 too.
3. Note that the version check only checks the first two digits of the version. For example, NNIManager v0.6.1 could use trialKeeper v0.6 or trialKeeper v0.6.2, but could not use trialKeeper v0.5.1 or trialKeeper v0.7.
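The check policy above can be sketched in a few lines of python. This is a simplified model only; the helper function is hypothetical, not part of NNI's API:

```python
def versions_compatible(manager: str, keeper: str) -> bool:
    """Sketch of the policy described above: before v0.6 any pairing is
    accepted; from v0.6 on, the first two digits must match exactly."""
    m = [int(x) for x in manager.split(".")[:2]]
    k = [int(x) for x in keeper.split(".")[:2]]
    if m < [0, 6]:   # NNIManager before v0.6 accepts any trialKeeper
        return True
    return m == k    # otherwise major.minor must be identical

# The examples from the policy text:
assert versions_compatible("0.6.1", "0.6.2")      # same first two digits
assert not versions_compatible("0.6.1", "0.5.1")  # older trialKeeper rejected
assert not versions_compatible("0.6.1", "0.7")    # newer trialKeeper rejected
```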
If you cannot run your experiment and want to know whether it is caused by the version check, you can check your webUI; there will be an error message about the version check.
![](../img/version_check.png)
...@@ -10,11 +10,7 @@ We support Linux MacOS and Windows in current stage, Ubuntu 16.04 or higher, Mac
```
#### Windows
Install nni through pip:
```bash
python -m pip install --upgrade nni
```
...@@ -130,7 +126,7 @@ useAnnotation: false
tuner:
  builtinTunerName: TPE
# The path and the running command of trial
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
...@@ -102,14 +102,14 @@
## Release 0.5.1 - 1/31/2019
### Improvements
* Making [log directory](https://github.com/Microsoft/nni/blob/v0.5.1/docs/en_US/ExperimentConfig.md) configurable
* Support [different levels of logs](https://github.com/Microsoft/nni/blob/v0.5.1/docs/en_US/ExperimentConfig.md), making it easier for debugging
### Documentation
* Reorganized documentation & New Homepage Released: https://nni.readthedocs.io/en/latest/
### Bug Fixes and Other Changes
* Fix the bug of installation in python virtualenv, and refactor the installation logic
* Fix the bug of HDFS access failure on OpenPAI mode after OpenPAI is upgraded.
* Fix the bug that sometimes in-place flushed stdout makes experiment crash
## Release 0.5.0 - 01/14/2019
...@@ -177,7 +177,7 @@
* [Kubeflow Training service](./KubeflowMode.md)
  * Support tf-operator
  * [Distributed trial example](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-distributed/dist_mnist.py) on Kubeflow
* [Grid search tuner](GridsearchTuner.md)
* [Hyperband tuner](HyperbandAdvisor.md)
* Support launch NNI experiment on MAC
* WebUI
...@@ -192,10 +192,10 @@
### Others
* Asynchronous dispatcher
* Docker file update, add pytorch library
* Refactor 'nnictl stop' process, send SIGTERM to nni manager process, rather than calling stop Rest API.
* OpenPAI training service bug fix
  * Support NNI Manager IP configuration (nniManagerIp) in the OpenPAI cluster config file, to fix the issue that the user's machine has no eth0 device
  * File number in codeDir is capped at 1000 now, to avoid users mistakenly setting the root dir as codeDir
  * Don't print the useless 'metrics is empty' log in OpenPAI job's stdout. Only print a useful message once new metrics are recorded, to reduce confusion when users check OpenPAI trial's output for debugging purposes
  * Add a timestamp at the beginning of each log entry in trial keeper.
...@@ -219,7 +219,7 @@
* <span style="color:red">**breaking change**</span>: nni.get_parameters() is refactored to nni.get_next_parameter(). All examples of prior releases cannot run on v0.3; please clone the nni repo to get new examples. If you had applied NNI to your own code, please update the API accordingly.
* New API **nni.get_sequence_id()**.
Each trial job is allocated a unique sequence number, which can be retrieved by the nni.get_sequence_id() API.
...@@ -63,4 +63,4 @@ After the code changes, use **step 3** to rebuild your codes, then the changes w
---
At last, we wish you a wonderful day.
For more contribution guidelines on making PRs or issues to NNI source code, you can refer to our [Contributing](./Contributing.md) document.
# Scikit-learn in NNI
[Scikit-learn](https://github.com/scikit-learn/scikit-learn) is a popular machine learning tool for data mining and data analysis. It supports many kinds of machine learning models like LinearRegression, LogisticRegression, DecisionTree, SVM etc. Making the use of scikit-learn more efficient is a valuable topic.
NNI supports many kinds of tuning algorithms to search for the best models and/or hyper-parameters for scikit-learn, and supports many kinds of environments like local machine, remote servers and cloud.
## 1. How to run the example
...@@ -16,7 +16,7 @@ nnictl create --config ./config.yml
### 2.1 classification
This example uses the digits dataset, which is made up of 1797 8x8 images, each a hand-written digit; the goal is to classify these images into 10 classes.
In this example, we use SVC as the model, and choose some parameters of this model, including `"C", "kernel", "degree", "gamma" and "coef0"`. For more information on these parameters, please [refer here](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).
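For illustration, a search_space.json covering those SVC parameters could look like the sketch below. The `_type`/`_value` fields follow NNI's search space format, but the concrete ranges here are assumptions, not necessarily the example's exact values:

```json
{
    "C": {"_type": "uniform", "_value": [0.1, 1.0]},
    "kernel": {"_type": "choice", "_value": ["linear", "rbf", "poly", "sigmoid"]},
    "degree": {"_type": "choice", "_value": [1, 2, 3, 4]},
    "gamma": {"_type": "uniform", "_value": [0.01, 0.1]},
    "coef0": {"_type": "uniform", "_value": [0.01, 0.1]}
}
```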
### 2.2 regression
...@@ -50,7 +50,7 @@ It is easy to use nni in your sklearn code, there are only a few steps.
```
Then you can read these values as a dict from your python code; please go on to step 2.
* __step 2__
At the beginning of your python code, you should `import nni` to ensure the package works normally.
First, use the `nni.get_next_parameter()` function to get the parameters given by nni. Then you can use these parameters to update your code.
For example, if you define your search_space.json in the following format:
...@@ -78,6 +78,6 @@ It is easy to use nni in your sklearn code, there are only a few steps.
```
Then you can use these variables to write your scikit-learn code.
* __step 3__
After you finish your training, you will get your own score of the model, like your precision, recall or MSE etc. NNI needs your score for its tuning algorithms to generate the next group of parameters, so please report the score back to NNI and start the next trial job.
You just need to use `nni.report_final_result(score)` to communicate with NNI after you run your scikit-learn code. Or if you have multiple scores during the steps of training, you can also report them back to NNI using `nni.report_intermediate_result(score)`. Note, you may skip reporting intermediate results of your job, but you must report back your final result.
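The three steps can be sketched end to end as below. Note the `nni` calls are stubbed out with a tiny stand-in class so the control flow runs without a live NNI experiment; in a real trial you would `import nni` and drop the stub, and the parameter names and the fake score are illustrative only:

```python
class nni:  # stand-in for the real `nni` package, for illustration only
    @staticmethod
    def get_next_parameter():
        # step 1/2: what the tuner might send, per a search_space.json
        return {"C": 1.0, "kernel": "linear"}

    @staticmethod
    def report_final_result(score):
        print("final result reported:", score)

# step 2: merge the tuner's parameters into local defaults
params = {"C": 0.5, "kernel": "rbf"}
params.update(nni.get_next_parameter())

# ... train e.g. an SVC(**params) and evaluate it here; we fake a score ...
score = 0.97

# step 3: report the final score so the tuner can generate new parameters
nni.report_final_result(score)
```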
...@@ -7,7 +7,7 @@ To define an NNI trial, you need to firstly define the set of parameters (i.e.,
<a name="nni-api"></a>
## NNI API
### Step 1 - Prepare a SearchSpace parameters file.
An example is shown below:
...@@ -26,14 +26,14 @@ Refer to [SearchSpaceSpec.md](./SearchSpaceSpec.md) to learn more about search s
- Import NNI
Include `import nni` in your trial code to use NNI APIs.
- Get configuration from Tuner
```python
RECEIVED_PARAMS = nni.get_next_parameter()
```
`RECEIVED_PARAMS` is an object, for example:
`{"conv_size": 2, "hidden_size": 124, "learning_rate": 0.0307, "dropout_rate": 0.2029}`.
- Report metric data periodically (optional)
...@@ -69,16 +69,16 @@ You can refer to [here](ExperimentConfig.md) for more information about how to s
An alternative to writing a trial is to use NNI's syntax for python. As simple as any annotation, NNI annotation works like comments in your code. You don't have to restructure or make any other big changes to your existing code. With a few lines of NNI annotation, you will be able to:
* annotate the variables you want to tune
* specify the range in which you want to tune the variables
* annotate which variable you want to report as an intermediate result to `assessor`
* annotate which variable you want to report as the final result (e.g. model accuracy) to `tuner`.
Again, taking MNIST as an example, it requires only 2 steps to write a trial with NNI Annotation.
### Step 1 - Update codes with annotations
The following is a tensorflow code snippet for NNI Annotation, where the highlighted four lines are annotations that help you to:
1. tune batch\_size and dropout\_rate
2. report test\_acc every 100 steps
3. at last report test\_acc as the final result.
...@@ -111,11 +111,11 @@ with tf.Session() as sess:
+ """@nni.report_final_result(test_acc)"""
```
**NOTE**:
- `@nni.variable` will take effect on its following line, which is an assignment statement whose left value must be specified by the keyword `name` in `@nni.variable`.
- `@nni.report_intermediate_result`/`@nni.report_final_result` will send the data to assessor/tuner at that line.
For more information about annotation syntax and its usage, please refer to [Annotation](AnnotationSpec.md).
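As a minimal sketch of the two rules above (the concrete choices and values are made up): the annotations are ordinary Python string literals, so this snippet also runs unchanged when NNI annotation is disabled.

```python
"""@nni.variable(nni.choice(50, 250, 500), name=batch_size)"""
batch_size = 128  # @nni.variable rewrites this assignment's right-hand side
                  # when annotation is enabled; otherwise 128 is used as-is

test_acc = 0.5
"""@nni.report_intermediate_result(test_acc)"""  # sent to the assessor here
"""@nni.report_final_result(test_acc)"""         # sent to the tuner here
```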
### Step 2 - Enable NNI Annotation
...@@ -46,7 +46,7 @@ in the scape input. Simultaneously, intermediate result inputs can limit the int
![](../img/webui-img/filter_intermediate.png)
## View trials status
Click the tab "Trials Detail" to see the status of all trials. Specifically:
Advanced Features
=====================
.. toctree::
...@@ -3,10 +3,10 @@ References
.. toctree::
   :maxdepth: 3

   Command Line <Nnictl>
   Python API <sdk_reference>
   Annotation <AnnotationSpec>
   Configuration <ExperimentConfig>
   Search Space <SearchSpaceSpec>
   TrainingService <HowToImplementTrainingService>
authorName:
experimentName:
trialConcurrency:
maxExecDuration:
maxTrialNum:
#choice: local, remote
trainingServicePlatform:
searchSpacePath:
#choice: true, false
useAnnotation:
tuner:
  #choice: TPE, Random, Anneal, Evolution
  builtinTunerName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
assessor:
  #choice: Medianstop
  builtinAssessorName:
  classArgs:
    #choice: maximize, minimize
    optimize_mode:
trial:
  command:
  codeDir:
  gpuNum:
#machineList can be empty if the platform is local
machineList:
  - ip:
    port:
    username:
    passwd:
# How to write a Trial running on NNI?
*A Trial receives the hyper-parameter/architecture configuration from the Tuner, and sends intermediate results to the Assessor and the final result to the Tuner.*
So when users want to write a Trial running on NNI, they should:
@@ -140,9 +140,9 @@ def train(args, params):
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    ...
```
**4) Send final result**

Use `nni.report_final_result` to send the final result to the Tuner. Please note line **15** in the following code.
@@ -162,7 +162,7 @@ def train(args, params):
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    nni.report_final_result(acc)
    ...
```
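The reporting flow above can be sketched end-to-end. This is a hypothetical minimal trial: the NNI calls (`nni.get_next_parameter`, `nni.report_intermediate_result`, `nni.report_final_result`) are replaced by plain stand-in functions so the control flow runs without an NNI installation, and the "training" loop and its accuracy numbers are invented for illustration. In a real trial you would `import nni` and call the real API instead.

```python
# Stand-ins for the nni API so this sketch runs standalone.
reported = {"intermediate": [], "final": None}

def get_next_parameter():
    # In a real trial, nni.get_next_parameter() returns the
    # hyper-parameters chosen by the Tuner. Values here are made up.
    return {"learning_rate": 0.01, "epochs": 3}

def report_intermediate_result(metric):
    # In a real trial this metric goes to the Assessor.
    reported["intermediate"].append(metric)

def report_final_result(metric):
    # In a real trial this metric goes to the Tuner.
    reported["final"] = metric

def main():
    params = get_next_parameter()
    acc = 0.0
    for epoch in range(params["epochs"]):
        # Pretend training improves accuracy each epoch.
        acc += params["learning_rate"] * 10
        report_intermediate_result(acc)
    report_final_result(acc)

main()
```

The shape is the same as a real trial: fetch parameters once, report a metric per epoch, report one final metric at the end.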
Here is the complete example:
@@ -3,9 +3,9 @@
#
# MIT License
#
# Permission is hereby granted, free of charge,
# to any person obtaining a copy of this software and associated
# documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
@@ -88,7 +88,7 @@ def run(lgb_train, lgb_eval, params, X_test, y_test):
    # predict
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
    # eval
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
    print('The rmse of prediction is:', rmse)
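The last two lines compute RMSE as the square root of the mean squared error. As a sanity check, the same quantity can be computed with the standard library alone; the helper name `rmse` and the sample values below are ours, not part of the example:

```python
import math

def rmse(y_true, y_pred):
    # Root of the mean squared error, equivalent to
    # mean_squared_error(y_true, y_pred) ** 0.5 in scikit-learn.
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

print('The rmse of prediction is:', rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```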