".github/git@developer.sourcefind.cn:OpenDAS/tilelang.git" did not exist on "7fb06776b0cc326718e690800f2463dc335f5111"
Unverified commit a656bba5 authored by Chi Song, committed by GitHub

Doc fix: fix typos, rewording, and correct formatting. (#697)

Change pai to OpenPAI
Reformat for translation
Fix some typos
Rewording
parent cba22723
@@ -75,7 +75,7 @@ The tool dispatches and runs trial jobs generated by tuning algorithms to search
<li><a href="docs/RemoteMachineMode.md">Remote Servers</a></li>
<li><a href="docs/PAIMode.md">OpenPAI</a></li>
<li><a href="docs/KubeflowMode.md">Kubeflow</a></li>
<li><a href="docs/FrameworkControllerMode.md">FrameworkController on K8S (AKS etc.)</a></li>
</ul>
</td>
</tr>
@@ -131,7 +131,7 @@ nnictl support commands:
* Description
You can use this command to update an experiment's duration.
* Usage
@@ -3,7 +3,7 @@
NNI (Neural Network Intelligence) is a toolkit to help users design and tune machine learning models (e.g., hyperparameters), neural network architectures, or complex system's parameters, in an efficient and automatic way. NNI has several appealing properties: easy-to-use, scalability, flexibility, and efficiency.
* **Easy-to-use**: NNI can be easily installed through python pip. Only a few lines need to be added to your code in order to use NNI's power. You can use both the command line tool and the WebUI to work with your experiments.
* **Scalability**: Tuning hyperparameters or neural architectures often demands a large amount of computation resources, while NNI is designed to fully leverage different computation resources, such as remote machines and training platforms (e.g., OpenPAI, Kubernetes). Hundreds of trials can run in parallel, depending on the capacity of your configured training platforms.
* **Flexibility**: Besides rich built-in algorithms, NNI allows users to customize various hyperparameter tuning algorithms, neural architecture search algorithms, early stopping algorithms, etc. Users can also extend NNI with more training platforms, such as virtual machines or Kubernetes services on the cloud. Moreover, NNI can connect to external environments to tune special applications/models on them.
* **Efficiency**: We are intensively working on more efficient model tuning at both the system level and the algorithm level, for example, leveraging early feedback to speed up the tuning procedure.
@@ -27,7 +27,7 @@ The figure below shows high-level architecture of NNI.
* *Assessor*: Assessor analyzes a trial's intermediate results (e.g., periodically evaluated accuracy on the test dataset) to tell whether this trial can be early stopped or not.
* *Training Platform*: It means where trials are executed. Depending on your experiment's configuration, it could be your local machine, remote servers, or a large-scale training platform (e.g., OpenPAI, Kubernetes).
Basically, an experiment runs as follows: Tuner receives search space and generates configurations. These configurations will be submitted to training platforms, such as local machine, remote machines, or training clusters. Their performances are reported back to Tuner. Then, new configurations are generated and submitted.
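For illustration, here is a self-contained toy sketch of that loop; the helper functions below are made up for this sketch and are not NNI's internal API:

```python
import random

# Toy sketch of the experiment loop described above (NOT NNI's real API).
search_space = {"learning_rate": [0.001, 0.01, 0.1], "hidden_size": [128, 256, 512]}

def generate_parameters():
    # Stand-in for the Tuner: sample one configuration from the search space.
    return {name: random.choice(values) for name, values in search_space.items()}

def run_trial(config):
    # Stand-in for a trial running on a training platform (local machine, remote server, OpenPAI, ...).
    # A real trial would train a model with `config` and return its final metric.
    return random.random()

for trial_id in range(10):
    config = generate_parameters()    # Tuner proposes a configuration
    metric = run_trial(config)        # trial executes on the training platform
    print(trial_id, config, metric)   # result is reported back so better configurations can be generated
```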
@@ -61,7 +61,7 @@ Once complete to fill NNI experiment config file and save (for example, save as
nnictl create --config exp_pai.yml
```
to start the experiment in pai mode. NNI will create an OpenPAI job for each trial, and the job name format is something like `nni_exp_{experiment_id}_trial_{trial_id}`.
You can see jobs created by NNI in the OpenPAI cluster's web portal, like:
![](./img/nni_pai_joblist.jpg)
Notice: In pai mode, NNIManager will start a REST server and listen on a port which is your NNI WebUI's port plus 1. For example, if your WebUI port is `8080`, the REST server will listen on `8081` to receive metrics from trial jobs running on OpenPAI. So you should enable TCP port `8081` in your firewall rules to allow incoming traffic.
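For example, on a machine whose firewall is managed with `ufw` (an assumption; adapt this to whatever firewall tool you actually use), allowing the port might look like this:

```bash
# Example only: allow inbound traffic on the NNI manager's REST port (WebUI port + 1).
sudo ufw allow 8081/tcp
sudo ufw status   # verify the rule was added
```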
@@ -24,15 +24,15 @@ Here is an example script to train a CNN on MNIST dataset **without NNI**:
def run_trial(params):
    # Input data
    mnist = input_data.read_data_sets(params['data_dir'], one_hot=True)
    # Build network
    mnist_network = MnistNetwork(channel_1_num=params['channel_1_num'], channel_2_num=params['channel_2_num'], conv_size=params['conv_size'], hidden_size=params['hidden_size'], pool_size=params['pool_size'], learning_rate=params['learning_rate'])
    mnist_network.build_network()
    test_acc = 0.0
    with tf.Session() as sess:
        # Train network
        mnist_network.train(sess, mnist)
        # Evaluate network
        test_acc = mnist_network.evaluate(mnist)

if __name__ == '__main__':
@@ -104,7 +104,7 @@ If you want to use NNI to automatically train your model and find the optimal hy
*Implemented code directory: [mnist.py](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist/mnist.py)*
**Step 3**: Define a `config` file in YAML, which declares the `path` to the search space and trial, and also gives `other information` such as the tuning algorithm, max trial number, and max duration.
```yaml
authorName: default
@@ -209,7 +209,7 @@ Click the tab "Trial Duration" to see the bar graph.
Below is the status of all the trials. Specifically:
* Trial detail: trial's id, duration, start time, end time, status, accuracy, and search space file.
* If you run on the OpenPAI platform, you can also see the hdfsLogPath.
* Kill: you can kill a job whose status is running.
* Support to search for a specific trial.
@@ -20,7 +20,7 @@
#### User Experience improvements
* Better trial logging support for NNI experiments in OpenPAI, Kubeflow and FrameworkController mode:
* An improved logging architecture to send stdout/stderr of trials to the NNI manager via HTTP POST. The NNI manager will store a trial's stdout/stderr messages in a local log file.
* Show the link to the trial log file on the WebUI.
* Support showing all key-value pairs of the final result.
@@ -83,9 +83,9 @@
* Docker file update, add pytorch library
* Refactor 'nnictl stop' process, send SIGTERM to the nni manager process, rather than calling the stop REST API.
* OpenPAI training service bug fix
* Support NNI Manager IP configuration (nniManagerIp) in the OpenPAI cluster config file, to fix the issue that the user’s machine has no eth0 device
* File number in codeDir is capped to 1000 now, to avoid users mistakenly filling the root dir for codeDir
* Don’t print the useless ‘metrics is empty’ log in OpenPAI job’s stdout. Only print a useful message once new metrics are recorded, to reduce confusion when users check OpenPAI trial’s output for debugging purposes
* Add timestamp at the beginning of each log entry in trial keeper.
## Release 0.3.0 - 11/2/2018
@@ -146,7 +146,7 @@
### Major Features
* Support [OpenPAI](https://github.com/Microsoft/pai) Training Platform (See [here](./PAIMode.md) for instructions about how to submit NNI jobs in pai mode)
* Support training services in pai mode. NNI trials will be scheduled to run on the OpenPAI cluster
* NNI trial's output (including logs and model file) will be copied to OpenPAI HDFS for further debugging and checking
* Support [SMAC](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) tuner (See [here](Builtin_Tuner.md) for instructions about how to use the SMAC tuner)
@@ -25,7 +25,7 @@ Also we have another version which time cost is less and performance is better.
Execute the following command to download needed files
using the downloading script:
```bash
chmod +x ./download.sh
./download.sh
```
@@ -34,14 +34,14 @@ Or Download manually
1. download "dev-v1.1.json" and "train-v1.1.json" in https://rajpurkar.github.io/SQuAD-explorer/
```bash
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
```
2. download "glove.840B.300d.txt" in https://nlp.stanford.edu/projects/glove/
```bash
wget http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip glove.840B.300d.zip
```
@@ -49,7 +49,7 @@ unzip glove.840B.300d.zip
### 2.2 Update configuration
Modify `nni/examples/trials/ga_squad/config.yml`, here is the default configuration:
```yaml
authorName: default
experimentName: example_ga_squad
trialConcurrency: 1
@@ -75,7 +75,7 @@ In the "trial" part, if you want to use GPU to perform the architecture search,
### 2.3 submit this job
```bash
nnictl create --config ~/nni/examples/trials/ga_squad/config.yml
```
@@ -84,9 +84,10 @@ nnictl create --config ~/nni/examples/trials/ga_squad/config.yml
Due to the memory limitation of upload, we only upload the source code and complete the data download and training on OpenPAI. This experiment requires sufficient memory that `memoryMB >= 32G`, and the training may last for several hours.
### 3.1 Update configuration
Modify `nni/examples/trials/ga_squad/config_pai.yml`, here is the default configuration:
```yaml
authorName: default
experimentName: example_ga_squad
trialConcurrency: 1
@@ -110,18 +111,18 @@ trial:
  gpuNum: 0
  cpuNum: 1
  memoryMB: 32869
  #The docker image to run nni job on OpenPAI
  image: msranni/nni:latest
  #The hdfs directory to store data on OpenPAI, format 'hdfs://host:port/directory'
  dataDir: hdfs://10.10.10.10:9000/username/nni
  #The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory'
  outputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
  #The username to login OpenPAI
  userName: username
  #The password to login OpenPAI
  passWord: password
  #The host of restful server of OpenPAI
  host: 10.10.10.10
```
@@ -133,13 +134,14 @@ In the "trial" part, if you want to use GPU to perform the architecture search,
### 3.2 submit this job
```bash
nnictl create --config ~/nni/examples/trials/ga_squad/config_pai.yml
```
## 4. Technical details about the trial
### 4.1 How does it work
The evolution-algorithm based architecture for question answering has two different parts, just like any other example: the trial and the tuner.
### 4.2 The trial
@@ -157,7 +159,7 @@ Among those files, `trial.py` and `graph_to_tf.py` are special.
`graph_to_tf.py` has a function named `graph_to_network`; here is its skeleton code:
```python
def graph_to_network(input1,
                     input2,
                     input1_lengths,
@@ -190,13 +192,13 @@ def graph_to_network(input1,
As we can see, this function is actually a compiler that converts the internal model DAG configuration `graph` (which will be introduced in the `Model configuration format` section) into a TensorFlow computation graph.
```python
topology = graph.is_topology()
```
performs topological sorting on the internal graph representation, and the code inside the loop:
```python
for _, topo_i in enumerate(topology):
```
@@ -206,7 +208,7 @@ performs actually conversion that maps each layer to a part in Tensorflow comput
The tuner is much simpler than the trial. They actually share the same `graph.py`. Besides, the tuner has a `customer_tuner.py`, the most important class in which is `CustomerTuner`:
```python
class CustomerTuner(Tuner):
    # ......
@@ -235,13 +237,13 @@ class CustomerTuner(Tuner):
indiv.mutation()
graph = indiv.config
temp = json.loads(graph_dumps(graph))
# ......
```
As we can see, the overloaded method `generate_parameters` implements a pretty naive mutation algorithm. The code lines:
```python
if self.population[0].result > self.population[1].result:
    self.population[0] = self.population[1]
indiv = copy.deepcopy(self.population[0])
@@ -253,7 +255,7 @@ controls the mutation process. It will always take two random individuals in the
Here is an example of the model configuration, which is passed from the tuner to the trial in the architecture search procedure.
```json
{
    "max_layer_num": 50,
    "layers": [
@@ -300,9 +302,9 @@ Here is an example of the model configuration, which is passed from the tuner to
Every model configuration will have a "layers" section, which is a JSON list of layer definitions. The definition of each layer is also a JSON object, where:
* `type` is the type of the layer. 0, 1, 2, 3, 4 correspond to attention, self-attention, RNN, input and output layers, respectively.
* `size` is the length of the output. "x", "y" correspond to document length / question length, respectively.
* `input_size` is the number of inputs the layer has.
* `input` is the indices of layers taken as input of this layer.
* `output` is the indices of layers that use this layer's output as their input.
* `is_delete` means whether the layer is still available.
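For illustration, a single layer entry following the fields above might look like this (the values are made up for this sketch):

```json
{
    "type": 2,
    "size": "x",
    "input_size": 1,
    "input": [0],
    "output": [4],
    "is_delete": false
}
```

Here `"type": 2` denotes an RNN layer that takes layer 0 as its only input and feeds its output to layer 4.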
@@ -8,7 +8,7 @@ To define a search space, users should define the name of variable, the type of
* An example of a search space definition is as follows:
```json
{
    "dropout_rate":{"_type":"uniform","_value":[0.1,0.5]},
    "conv_size":{"_type":"choice","_value":[2,3,5,7]},
@@ -19,59 +19,47 @@ To define a search space, users should define the name of variable, the type of
```
Take the first line as an example. `dropout_rate` is defined as a variable whose prior distribution is a uniform distribution over the range from `0.1` to `0.5`.
## Types
All types of sampling strategies and their parameters are listed here:
* {"_type":"choice","_value":options}
    * Which means the variable value is one of the options, which should be a list. The elements of options can themselves be [nested] stochastic expressions. In this case, the stochastic choices that only appear in some of the options become conditional parameters.
* {"_type":"randint","_value":[upper]}
    * Which means the variable value is a random integer in the range [0, upper). The semantics of this distribution is that there is no more correlation in the loss function between nearby integer values, as compared with more distant integer values. This is an appropriate distribution for describing random seeds, for example. If the loss function is probably more correlated for nearby integer values, then you should probably use one of the "quantized" continuous distributions, such as quniform, qloguniform, qnormal or qlognormal. Note that if you want to change the lower bound, you can use `quniform` for now.
* {"_type":"uniform","_value":[low, high]}
    * Which means the variable value is a value uniformly distributed between low and high.
    * When optimizing, this variable is constrained to a two-sided interval.
* {"_type":"quniform","_value":[low, high, q]}
    * Which means the variable value is a value like round(uniform(low, high) / q) * q
    * Suitable for a discrete value with respect to which the objective is still somewhat "smooth", but which should be bounded both above and below. If you want to uniformly choose an integer from a range [low, high], you can write `_value` like this: `[low, high, 1]`.
* {"_type":"loguniform","_value":[low, high]}
    * Which means the variable value is a value drawn from a range [low, high] according to a loguniform distribution like exp(uniform(log(low), log(high))), so that the logarithm of the return value is uniformly distributed.
    * When optimizing, this variable is constrained to be positive.
* {"_type":"qloguniform","_value":[low, high, q]}
    * Which means the variable value is a value like round(loguniform(low, high) / q) * q
    * Suitable for a discrete variable with respect to which the objective is "smooth" and gets smoother with the size of the value, but which should be bounded both above and below.
* {"_type":"normal","_value":[label, mu, sigma]}
    * Which means the variable value is a real value that's normally-distributed with mean mu and standard deviation sigma. When optimizing, this is an unconstrained variable.
* {"_type":"qnormal","_value":[label, mu, sigma, q]}
    * Which means the variable value is a value like round(normal(mu, sigma) / q) * q
    * Suitable for a discrete variable that probably takes a value around mu, but is fundamentally unbounded.
* {"_type":"lognormal","_value":[label, mu, sigma]}
    * Which means the variable value is a value drawn according to exp(normal(mu, sigma)) so that the logarithm of the return value is normally distributed. When optimizing, this variable is constrained to be positive.
* {"_type":"qlognormal","_value":[label, mu, sigma, q]}
    * Which means the variable value is a value like round(exp(normal(mu, sigma)) / q) * q
    * Suitable for a discrete variable with respect to which the objective is smooth and gets smoother with the size of the variable, which is bounded from one side.
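As an illustration of these formats, a search space that combines several of the types above (made-up variable names and ranges, following the value layouts listed here) could look like:

```json
{
    "optimizer": {"_type": "choice", "_value": ["sgd", "adam"]},
    "batch_size": {"_type": "quniform", "_value": [16, 128, 8]},
    "learning_rate": {"_type": "loguniform", "_value": [0.0001, 0.1]},
    "seed": {"_type": "randint", "_value": [10]}
}
```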
## Search Space Types Supported by Each Tuner
@@ -87,7 +75,7 @@ All types of sampling strategies and their parameter are listed here:
| Hyperband Advisor | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; | &#10003; |
| Metis Tuner | &#10003; | &#10003; | &#10003; | &#10003; | | | | | | |
Note that in the Grid Search Tuner, for users' convenience, the definitions of `quniform` and `qloguniform` change: here q specifies the number of values that will be sampled. Details are listed as follows:
* Type 'quniform' will receive three values [low, high, q], where [low, high] specifies a range and 'q' specifies the number of values that will be sampled evenly. Note that q should be at least 2. It will be sampled in a way that the first sampled value is 'low', and each of the following values is (high-low)/q larger than the value in front of it (see the short sketch below).
* Type 'qloguniform' behaves like 'quniform' except that it will first change the range to [log(low), log(high)], sample, and then change the sampled value back.
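As a quick illustration of this Grid-Search-specific interpretation (derived directly from the rule above, not taken from NNI's source):

```python
# quniform [low, high, q] under Grid Search: q evenly spaced values starting at low,
# each (high - low) / q larger than the previous one.
low, high, q = 0.0, 10.0, 5
values = [low + i * (high - low) / q for i in range(q)]
print(values)  # [0.0, 2.0, 4.0, 6.0, 8.0]
```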
@@ -41,7 +41,7 @@ Click the tab "Trials Detail" to see the status of the all trials. Specifically:
![](./img/webui-img/detail-local.png)
* If you run on the OpenPAI or Kubeflow platform, you can also see the hdfsLog.
![](./img/webui-img/detail-pai.png)
@@ -5,7 +5,6 @@ Gradient boosting decision tree has many popular implementations, such as [light
NNI is a great platform for tuning hyper-parameters; you could try various built-in search algorithms in NNI and run multiple trials concurrently.
## 1. Search Space in GBDT
There are many hyper-parameters in GBDT, but what kind of parameters will affect the performance or speed? Based on some practical experience, here are some suggestions (take lightgbm as an example):
@@ -13,7 +12,7 @@ There are many hyper-parameters in GBDT, but what kind of parameters will affect
* `learning_rate`. The range of `learning_rate` could be [0.001, 0.9].
* `num_leaves`. `num_leaves` is related to `max_depth`; you don't have to tune both of them.
* `bagging_freq`. `bagging_freq` could be [1, 2, 4, 8, 10].
* `num_iterations`. May be larger if underfitting.
@@ -22,7 +21,7 @@ There are many hyper-parameters in GBDT, but what kind of parameters will affect
* `bagging_fraction`. The range of `bagging_fraction` could be [0.7, 1.0].
* `feature_fraction`. The range of `feature_fraction` could be [0.6, 1.0].
* `max_bin`.
> * To avoid overfitting
@@ -37,19 +36,19 @@ There are many hyper-parameters in GBDT, but what kind of parameters will affect
* `num_leaves`.
Reference link:
[lightgbm](https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html) and [autoxgboost](https://github.com/ja-thomas/autoxgboost/blob/master/poster_2018.pdf)
## 2. Task description
Now we come back to our example "auto-gbdt", which runs with lightgbm and nni. The data includes [train data](https://github.com/Microsoft/nni/blob/master/examples/trials/auto-gbdt/data/regression.train) and [test data](https://github.com/Microsoft/nni/blob/master/examples/trials/auto-gbdt/data/regression.train).
Given the features and label in the train data, we train a GBDT regression model and use it to predict.
## 3. How to run in nni
### 3.1 Prepare your trial code
You need to prepare basic code as follows:
```python
...
def get_default_parameters():
@@ -75,7 +74,7 @@ def run(lgb_train, lgb_eval, params, X_test, y_test):
    # predict
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
    # eval
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
    print('The rmse of prediction is:', rmse)
@@ -88,9 +87,9 @@ if __name__ == '__main__':
```
### 3.2 Prepare your search space.
If you would like to tune `num_leaves`, `learning_rate`, `bagging_fraction` and `bagging_freq`, you could write a [search_space.json](https://github.com/Microsoft/nni/blob/master/examples/trials/auto-gbdt/search_space.json) as follows:
```json
{
    "num_leaves":{"_type":"choice","_value":[31, 28, 24, 20]},
    "learning_rate":{"_type":"choice","_value":[0.01, 0.05, 0.1, 0.2]},
@@ -102,6 +101,7 @@ you could write a [search_space.json](https://github.com/Microsoft/nni/blob/mast
For more supported variable types, you could refer to [here](https://github.com/Microsoft/nni/blob/master/docs/SearchSpaceSpec.md).
### 3.3 Add SDK of nni into your code.
```diff
+import nni
...
@@ -129,7 +129,7 @@ def run(lgb_train, lgb_eval, params, X_test, y_test):
    # predict
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
    # eval
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
    print('The rmse of prediction is:', rmse)
+   nni.report_final_result(rmse)
@@ -147,6 +147,7 @@ if __name__ == '__main__':
```
### 3.4 Write a config file and run it.
In the config file, you could set some settings including:
* Experiment setting: `trialConcurrency`, `maxExecDuration`, `maxTrialNum`, `trial gpuNum`, etc.
@@ -155,6 +156,7 @@ In the config file, you could set some settings including:
* Algorithm setting: select `tuner` algorithm, `tuner optimize_mode`, etc.
A config.yml is as follows:
```yaml
authorName: default
experimentName: example_auto-gbdt
@@ -180,6 +182,7 @@ trial:
```
Run this experiment with the following command:
```bash
nnictl create --config ./config.yml
```
# Scikit-learn in NNI
[Scikit-learn](https://github.com/scikit-learn/scikit-learn) is a popular machine learning tool for data mining and data analysis. It supports many kinds of machine learning models like LinearRegression, LogisticRegression, DecisionTree, SVM etc. How to make the use of scikit-learn more efficient is a valuable topic.
NNI supports many kinds of tuning algorithms to search the best models and/or hyper-parameters for scikit-learn, and supports many kinds of environments like local machine, remote servers and the cloud.
## 1. How to run the example
To start using NNI, you should install the nni package, and use the command line tool `nnictl` to start an experiment. For more information about installation and preparing the environment, please refer to [QuickStart](QuickStart.md).
After you have installed NNI, you could enter the corresponding folder and start the experiment using the following commands:
```bash
nnictl create --config ./config.yml
```
## 2. Description of the example
### 2.1 classification
This example uses the digits dataset, which is made up of 1797 8x8 images, where each image is a hand-written digit. The goal is to classify these images into 10 classes.
In this example, we use SVC as the model, and choose some parameters of this model, including `"C", "keral", "degree", "gamma" and "coef0"`. For more information about these parameters, please [refer](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).
### 2.2 regression
This example uses the Boston Housing Dataset, which consists of the prices of houses in various places in Boston, together with information such as crime (CRIM), areas of non-retail business in the town (INDUS), the age of people who own the house (AGE), etc., used to predict the house prices of Boston.
In this example, we tune different kinds of regression models including `"LinearRegression", "SVR", "KNeighborsRegressor", "DecisionTreeRegressor"` and some parameters like `"svr_kernel", "knr_weights"`. You could get more details about these models from [here](https://scikit-learn.org/stable/supervised_learning.html#supervised-learning).
## 3. How to write sklearn code using nni
It is easy to use nni in your sklearn code; there are only a few steps.
* __step 1__
Prepare a search_space.json to store your search space choices.
For example, if you want to choose different models, you may try:
```json
{
    "model_name":{"_type":"choice","_value":["LinearRegression", "SVR", "KNeighborsRegressor", "DecisionTreeRegressor"]}
}
```
If you want to choose different models and parameters, you could put them together in a search_space.json file.
```json
{
    "model_name":{"_type":"choice","_value":["LinearRegression", "SVR", "KNeighborsRegressor", "DecisionTreeRegressor"]},
    "svr_kernel": {"_type":"choice","_value":["linear", "poly", "rbf"]},
    "knr_weights": {"_type":"choice","_value":["uniform", "distance"]}
}
```
Then you could read these values as a dict in your Python code; please move on to step 2.
* __step 2__
At the beginning of your python code, you should `import nni` to ensure the package works normally.
First, you should use the `nni.get_next_parameter()` function to get the parameters given by nni. Then you could use these parameters to update your code.
For example, if you define your search_space.json in the following format:
```json
{
    "C": {"_type":"uniform","_value":[0.1, 1]},
    "keral": {"_type":"choice","_value":["linear", "rbf", "poly", "sigmoid"]},
@@ -53,8 +64,10 @@ It is easy to use nni in your sklearn code, there are only a few steps.
    "coef0 ": {"_type":"uniform","_value":[0.01, 0.1]}
}
```
You may get a parameter dict like this:
```python
params = {
    'C': 1.0,
    'keral': 'linear',
@@ -63,7 +76,8 @@ It is easy to use nni in your sklearn code, there are only a few steps.
    'coef0': 0.01
}
```
Then you could use these variables to write your scikit-learn code.
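For instance, a sketch of this step might look like the snippet below. It assumes the classification example's search space keys shown above (`"C"`, `"keral"`, `"degree"`, `"gamma"`, `"coef0"`) and is not a copy of the example's actual source:

```python
import nni
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Ask NNI for the next set of hyper-parameters chosen by the tuner.
params = nni.get_next_parameter()

# Load the digits dataset used by the classification example.
X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Map the search-space keys onto SVC's arguments ("keral" is the key used above).
model = SVC(C=params['C'], kernel=params['keral'], degree=int(params['degree']),
            gamma=params['gamma'], coef0=params['coef0'])
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
```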
* __step 3__
After you finish your training, you can get your own score of the model, like precision, recall or MSE. NNI needs your score for the tuner algorithms to generate the next group of parameters, so please report the score back to NNI and start the next trial job.
You just need to use `nni.report_final_result(score)` to communicate with NNI after you process your scikit-learn code. Or if you have multiple scores during the steps of training, you could also report them back to NNI using `nni.report_intermediate_result(score)`. Note, you may or may not report intermediate results of your job, but you must report back your final result.
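Continuing the sketch from step 2, the reporting part (a minimal illustration, not the example's exact code) would just be:

```python
# Report the final score of this trial back to NNI so the tuner can generate new parameters.
nni.report_final_result(score)
```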
@@ -114,18 +114,18 @@ trial:
  gpuNum: 0
  cpuNum: 1
  memoryMB: 32869
  #The docker image to run NNI job on OpenPAI
  image: msranni/nni:latest
  #The hdfs directory to store data on OpenPAI, format 'hdfs://host:port/directory'
  dataDir: hdfs://10.10.10.10:9000/username/nni
  #The hdfs directory to store output data generated by NNI, format 'hdfs://host:port/directory'
  outputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
  #The username to login OpenPAI
  userName: username
  #The password to login OpenPAI
  passWord: password
  #The host of restful server of OpenPAI
  host: 10.10.10.10
```