Unverified Commit 8b9bcf16 authored by Yan Ni, committed by GitHub

move docs under en_US (#744)

* move en docs under en_US

* update gitignore

* fix image link

* add doc guide

* add napoleon ext
parent bbd441b3
@@ -62,17 +62,17 @@ nnictl create --config exp_pai.yml
```
to start the experiment in pai mode. NNI will create an OpenPAI job for each trial, and the job name format is something like `nni_exp_{experiment_id}_trial_{trial_id}`.

You can see the jobs created by NNI in the OpenPAI cluster's web portal, for example:

![](../img/nni_pai_joblist.jpg)
Notice: in pai mode, NNIManager starts a REST server that listens on a port equal to your NNI WebUI's port plus 1. For example, if your WebUI port is `8080`, the REST server listens on `8081` to receive metrics from the trial jobs running on OpenPAI. So you should open TCP port `8081` in your firewall rule to allow incoming traffic.
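If you are not sure whether that extra port is reachable, a quick check like the following can help. This is only a convenience sketch, assuming Python 3 is available where the check runs; the host and port below are placeholders for your own NNI manager IP and your WebUI port plus 1.

```python
import socket

# Hypothetical values: replace with your NNI manager IP and WebUI port + 1.
NNI_MANAGER_HOST = "10.10.10.10"
REST_PORT = 8081

# Try to open a TCP connection; success means the firewall allows the traffic.
try:
    with socket.create_connection((NNI_MANAGER_HOST, REST_PORT), timeout=5):
        print(f"Port {REST_PORT} on {NNI_MANAGER_HOST} is reachable.")
except OSError as err:
    print(f"Cannot reach {NNI_MANAGER_HOST}:{REST_PORT}: {err}")
```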
Once a trial job is completed, you can go to the NNI WebUI's overview page (for example http://localhost:8080/oview) to check the trial's information.

Expand a trial's information in the trial list view and click the logPath link:

![](../img/nni_webui_joblist.jpg)

You will be redirected to the HDFS web portal to browse that trial's output files in HDFS:

![](../img/nni_trial_hdfs_output.jpg)

There are three files in the output folder: stderr, stdout, and trial.log.
......
@@ -183,28 +183,28 @@ Click the tab "Overview".
Information about this experiment is shown in the WebUI, including the experiment trial profile and search space. NNI also supports downloading this information and the parameters through the **Download** button. You can download the experiment result at any time while the experiment is running, or after it has finished.

![](../img/QuickStart1.png)

The top 10 trials are listed on the Overview page; you can browse all trials on the "Trials Detail" page.

![](../img/QuickStart2.png)

#### View trials detail page

Click the tab "Default Metric" to see the point graph of all trials. Hover over a point to see its default metric and the searched hyper-parameter values.

![](../img/QuickStart3.png)

Click the tab "Hyper Parameter" to see the parallel coordinates graph.

* You can select the percentage to see the top trials.
* Choose two axes to swap their positions.

![](../img/QuickStart4.png)

Click the tab "Trial Duration" to see the bar graph.

![](../img/QuickStart5.png)

Below is the status of all trials. Specifically:
@@ -213,11 +213,11 @@ Below is the status of the all trials. Specifically:
* Kill: you can kill a job whose status is running.
* Search for a specific trial.

![](../img/QuickStart6.png)

* Intermediate Result Graph

![](../img/QuickStart7.png)

## Related Topic

......
# Automatic Model Architecture Search for Reading Comprehension

This example shows how to use a genetic algorithm to find good model architectures for reading comprehension.

## 1. Search Space

Since attention and recurrent neural networks (RNN) have been proven effective in reading comprehension, we define the search space as follows (a sketch of applying these mutations appears after the figure below):

1. IDENTITY (Effectively means keep training).
2. INSERT-RNN-LAYER (Inserts an LSTM. Comparing the performance of GRU and LSTM in our experiment, we decided to use LSTM here.)
3. REMOVE-RNN-LAYER
4. INSERT-ATTENTION-LAYER (Inserts an attention layer.)
5. REMOVE-ATTENTION-LAYER
6. ADD-SKIP (Identity between random layers).
7. REMOVE-SKIP (Removes a random skip).

![](../../examples/trials/ga_squad/ga_squad.png)
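The following is not the tuner code used by this example; it is only an illustrative sketch, under a deliberately simplified architecture representation (a plain list of layer-type strings), of how one of the mutation operations above could be sampled and applied:

```python
import copy
import random

# Operation names mirroring the search space listed above.
MUTATIONS = [
    "IDENTITY",
    "INSERT-RNN-LAYER",
    "REMOVE-RNN-LAYER",
    "INSERT-ATTENTION-LAYER",
    "REMOVE-ATTENTION-LAYER",
    "ADD-SKIP",
    "REMOVE-SKIP",
]

def mutate(architecture):
    """Return a mutated copy of `architecture`, a plain list of layer-type strings."""
    child = copy.deepcopy(architecture)
    op = random.choice(MUTATIONS)
    if op == "IDENTITY":
        pass  # keep training the same architecture
    elif op == "INSERT-RNN-LAYER":
        child.insert(random.randrange(len(child) + 1), "LSTM")
    elif op == "REMOVE-RNN-LAYER" and "LSTM" in child:
        child.remove("LSTM")
    elif op == "INSERT-ATTENTION-LAYER":
        child.insert(random.randrange(len(child) + 1), "ATTENTION")
    elif op == "REMOVE-ATTENTION-LAYER" and "ATTENTION" in child:
        child.remove("ATTENTION")
    elif op == "ADD-SKIP":
        child.append("SKIP")  # placeholder for a skip connection between two layers
    elif op == "REMOVE-SKIP" and "SKIP" in child:
        child.remove("SKIP")
    return child

print(mutate(["LSTM", "ATTENTION"]))
```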
### New version

We also have another version with lower time cost and better performance, which will be released soon.
## 2. How to run this example locally?

### 2.1 Use the download script to fetch the data

Execute the following command to download the needed files using the download script:

```bash
chmod +x ./download.sh
./download.sh
```

Or download manually:

1. Download "dev-v1.1.json" and "train-v1.1.json" from https://rajpurkar.github.io/SQuAD-explorer/

```bash
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
```

2. Download "glove.840B.300d.txt" from https://nlp.stanford.edu/projects/glove/

```bash
wget http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip glove.840B.300d.zip
```

### 2.2 Update configuration

Modify `nni/examples/trials/ga_squad/config.yml`; here is the default configuration:
```yaml
authorName: default
experimentName: example_ga_squad
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 1
#choice: local, remote
trainingServicePlatform: local
#choice: true, false
useAnnotation: false
tuner:
  codeDir: ~/nni/examples/tuners/ga_customer_tuner
  classFileName: customer_tuner.py
  className: CustomerTuner
  classArgs:
    optimize_mode: maximize
trial:
  command: python3 trial.py
  codeDir: ~/nni/examples/trials/ga_squad
  gpuNum: 0
```
In the "trial" part, if you want to use GPU to perform the architecture search, change `gpuNum` from `0` to `1`. You need to increase the `maxTrialNum` and `maxExecDuration`, according to how long you want to wait for the search result. In the "trial" part, if you want to use GPU to perform the architecture search, change `gpuNum` from `0` to `1`. You need to increase the `maxTrialNum` and `maxExecDuration`, according to how long you want to wait for the search result.
### 2.3 Submit this job

```bash
nnictl create --config ~/nni/examples/trials/ga_squad/config.yml
```
## 3. Run this example on OpenPAI

Due to the upload size limitation, we only upload the source code; the data download and training are completed on OpenPAI. This experiment requires sufficient memory (`memoryMB >= 32G`), and the training may last for several hours.

### 3.1 Update configuration

Modify `nni/examples/trials/ga_squad/config_pai.yml`; here is the default configuration:
```yaml
authorName: default
experimentName: example_ga_squad
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: pai
#choice: true, false
useAnnotation: false
#Your nni_manager ip
nniManagerIp: 10.10.10.10
tuner:
  codeDir: https://github.com/Microsoft/nni/tree/master/examples/tuners/ga_customer_tuner
  classFileName: customer_tuner.py
  className: CustomerTuner
  classArgs:
    optimize_mode: maximize
trial:
  command: chmod +x ./download.sh && ./download.sh && python3 trial.py
  codeDir: .
  gpuNum: 0
  cpuNum: 1
  memoryMB: 32869
  #The docker image to run nni job on OpenPAI
  image: msranni/nni:latest
  #The hdfs directory to store data on OpenPAI, format 'hdfs://host:port/directory'
  dataDir: hdfs://10.10.10.10:9000/username/nni
  #The hdfs directory to store output data generated by nni, format 'hdfs://host:port/directory'
  outputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
  #The username to login OpenPAI
  userName: username
  #The password to login OpenPAI
  passWord: password
  #The host of restful server of OpenPAI
  host: 10.10.10.10
```
Please change the default values to your personal account and machine information, including `nniManagerIp`, `dataDir`, `outputDir`, `userName`, `passWord` and `host`.

In the "trial" part, if you want to use a GPU to perform the architecture search, change `gpuNum` from `0` to `1`. You may also need to increase `maxTrialNum` and `maxExecDuration`, according to how long you want to wait for the search result.

`trialConcurrency` is the number of trials running concurrently; if you set `gpuNum` to 1, it is also the number of GPUs you want to use.
### 3.2 Submit this job

```bash
nnictl create --config ~/nni/examples/trials/ga_squad/config_pai.yml
```
## 4. Technical details about the trial

### 4.1 How it works

The evolution-algorithm-based architecture search for question answering has two parts, just like the other examples: the trial and the tuner.

### 4.2 The trial

The trial has many different files, functions and classes. Here we only give most of those files a brief introduction:

* `attention.py` contains an implementation of the attention mechanism in TensorFlow.
* `data.py` contains functions for data preprocessing.
* `evaluate.py` contains the evaluation script.
* `graph.py` contains the definition of the computation graph.
* `rnn.py` contains an implementation of GRU in TensorFlow.
* `train_model.py` is a wrapper for the whole question answering model.

Among those files, `trial.py` and `graph_to_tf.py` are special.

`graph_to_tf.py` has a function named `graph_to_network`; here is its skeleton code:
```python
def graph_to_network(input1,
                     input2,
                     input1_lengths,
                     input2_lengths,
                     graph,
                     dropout_rate,
                     is_training,
                     num_heads=1,
                     rnn_units=256):
    topology = graph.is_topology()
    layers = dict()
    layers_sequence_lengths = dict()
    num_units = input1.get_shape().as_list()[-1]
    layers[0] = input1*tf.sqrt(tf.cast(num_units, tf.float32)) + \
        positional_encoding(input1, scale=False, zero_pad=False)
    layers[1] = input2*tf.sqrt(tf.cast(num_units, tf.float32))
    layers[0] = dropout(layers[0], dropout_rate, is_training)
    layers[1] = dropout(layers[1], dropout_rate, is_training)
    layers_sequence_lengths[0] = input1_lengths
    layers_sequence_lengths[1] = input2_lengths
    for _, topo_i in enumerate(topology):
        if topo_i == '|':
            continue
        if graph.layers[topo_i].graph_type == LayerType.input.value:
            # ......
        elif graph.layers[topo_i].graph_type == LayerType.attention.value:
            # ......
        # More layers to handle
```
As we can see, this function is actually a compiler that converts the internal model DAG configuration `graph` (which will be introduced in the `Model configuration format` section) into a TensorFlow computation graph.

```python
topology = graph.is_topology()
```

performs a topological sort on the internal graph representation, and the code inside the loop:

```python
for _, topo_i in enumerate(topology):
```

performs the actual conversion that maps each layer to a part of the TensorFlow computation graph.
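To make the pattern concrete, here is a small, self-contained sketch (not NNI's actual implementation) of the same two steps: compute a topological order of the layer DAG, then dispatch on each layer's type to build its piece of the network. The `layers` and `build_fns` structures are assumptions for illustration only.

```python
from collections import deque

def topological_order(edges):
    """Kahn's algorithm over an adjacency dict {node: [successor, ...]}."""
    indegree = {n: 0 for n in edges}
    for succs in edges.values():
        for s in succs:
            indegree[s] += 1
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for s in edges[node]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order

def compile_graph(layers, build_fns):
    """`layers` maps index -> {"type": ..., "input": [...], "output": [...]};
    `build_fns` maps a layer type to a function(inputs) -> built layer object."""
    edges = {idx: cfg["output"] for idx, cfg in layers.items()}
    built = {}
    for idx in topological_order(edges):
        cfg = layers[idx]
        inputs = [built[i] for i in cfg["input"]]   # already built, thanks to the ordering
        built[idx] = build_fns[cfg["type"]](inputs)
    return built
```

In the real skeleton above, that per-type dispatch is the chain of `if`/`elif` branches on `graph.layers[topo_i].graph_type`.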
### 4.3 The tuner

The tuner is much simpler than the trial. The two actually share the same `graph.py`. In addition, the tuner has a `customer_tuner.py`; the most important class in it is `CustomerTuner`:
```python
class CustomerTuner(Tuner):
    # ......

    def generate_parameters(self, parameter_id):
        """Returns a set of trial graph config, as a serializable object.
        parameter_id : int
        """
        if len(self.population) <= 0:
            logger.debug("the len of population lower than zero.")
            raise Exception('The population is empty')
        pos = -1
        for i in range(len(self.population)):
            if self.population[i].result == None:
                pos = i
                break
        if pos != -1:
            indiv = copy.deepcopy(self.population[pos])
            self.population.pop(pos)
            temp = json.loads(graph_dumps(indiv.config))
        else:
            random.shuffle(self.population)
            if self.population[0].result > self.population[1].result:
                self.population[0] = self.population[1]
            indiv = copy.deepcopy(self.population[0])
            self.population.pop(1)
            indiv.mutation()
            graph = indiv.config
            temp = json.loads(graph_dumps(graph))
    # ......
```
As we can see, the overridden method `generate_parameters` implements a fairly naive mutation algorithm. The lines:

```python
if self.population[0].result > self.population[1].result:
    self.population[0] = self.population[1]
indiv = copy.deepcopy(self.population[0])
```

control the mutation process: the tuner takes two random individuals from the population, keeping and mutating only the one with the better result.
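Viewed in isolation, that selection-and-mutation step amounts to a small tournament. The sketch below is not the actual `CustomerTuner` code; it uses a simplified `Individual` class of our own and assumes `optimize_mode: maximize`, so it keeps the individual with the larger result:

```python
import copy
import random

class Individual:
    def __init__(self, config, result=None):
        self.config = config    # the architecture description
        self.result = result    # the reported metric; None if not evaluated yet

    def mutation(self):
        # Placeholder: a real implementation would apply one of the search-space operations.
        self.config = dict(self.config, mutated=True)

def next_candidate(population):
    """Pick two random individuals, drop the worse one, and mutate a copy of the better one."""
    random.shuffle(population)
    if population[0].result < population[1].result:   # assuming 'maximize' mode
        population[0] = population[1]
    child = copy.deepcopy(population[0])
    population.pop(1)
    child.mutation()
    return child

population = [Individual({"layers": 2}, 0.71), Individual({"layers": 3}, 0.76)]
print(next_candidate(population).config)
```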
### 4.4 Model configuration format

Here is an example of the model configuration, which is passed from the tuner to the trial during the architecture search procedure.
```json
{
    "max_layer_num": 50,
    "layers": [
        {
            "input_size": 0,
            "type": 3,
            "output_size": 1,
            "input": [],
            "size": "x",
            "output": [4, 5],
            "is_delete": false
        },
        {
            "input_size": 0,
            "type": 3,
            "output_size": 1,
            "input": [],
            "size": "y",
            "output": [4, 5],
            "is_delete": false
        },
        {
            "input_size": 1,
            "type": 4,
            "output_size": 0,
            "input": [6],
            "size": "x",
            "output": [],
            "is_delete": false
        },
        {
            "input_size": 1,
            "type": 4,
            "output_size": 0,
            "input": [5],
            "size": "y",
            "output": [],
            "is_delete": false
        },
        {"Comment": "More layers will be here for actual graphs."}
    ]
}
```
Every model configuration has a "layers" section, which is a JSON list of layer definitions. Each layer definition is also a JSON object, where (an illustrative parsing sketch follows this list):

* `type` is the type of the layer. 0, 1, 2, 3, 4 correspond to attention, self-attention, RNN, input and output layers, respectively.
* `size` is the length of the output. "x" and "y" correspond to document length and question length, respectively.
* `input_size` is the number of inputs the layer has.
* `input` contains the indices of the layers taken as input of this layer.
* `output` contains the indices of the layers that use this layer's output as their input.
* `is_delete` means whether the layer is still available.
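As an illustration only (not part of the example's code), the layer fields described above can be loaded into plain Python objects; the `Layer` dataclass and `LAYER_TYPES` mapping below are assumptions based purely on the field descriptions:

```python
import json
from dataclasses import dataclass, field

# Mapping taken from the field description above.
LAYER_TYPES = {0: "attention", 1: "self-attention", 2: "rnn", 3: "input", 4: "output"}

@dataclass
class Layer:
    type: int
    size: str
    input: list = field(default_factory=list)
    output: list = field(default_factory=list)
    input_size: int = 0
    output_size: int = 0
    is_delete: bool = False

    @property
    def type_name(self):
        return LAYER_TYPES[self.type]

def load_layers(config_json):
    cfg = json.loads(config_json)
    # Skip placeholder entries such as {"Comment": ...}.
    return [Layer(**layer) for layer in cfg["layers"] if "type" in layer]

# Example: an input layer over the document ("x") feeding layers 4 and 5.
layers = load_layers('{"max_layer_num": 50, "layers": [{"input_size": 0, "type": 3, '
                     '"output_size": 1, "input": [], "size": "x", "output": [4, 5], '
                     '"is_delete": false}]}')
print(layers[0].type_name, layers[0].output)
```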
@@ -7,16 +7,16 @@ Click the tab "Overview".
* See the experiment trial profile and search space.
* Download the experiment result.

![](../img/webui-img/over1.png)

* See trials with good performance.

![](../img/webui-img/over2.png)

## View job default metric

Click the tab "Default Metric" to see the point graph of all trials. Hover over a point to see its default metric and the searched hyper-parameter values.

![](../img/accuracy.png)

## View hyper parameter
@@ -25,13 +25,13 @@ Click the tab "Hyper Parameter" to see the parallel graph.
* You can select the percentage to see the top trials.
* Choose two axes to swap their positions.

![](../img/hyperPara.png)

## View Trial Duration

Click the tab "Trial Duration" to see the bar graph.

![](../img/trial_duration.png)

## View trials status
@@ -39,15 +39,15 @@ Click the tab "Trials Detail" to see the status of the all trials. Specifically:
* Trial detail: a trial's id, duration, start time, end time, status, accuracy and search space file.

![](../img/webui-img/detail-local.png)

* If you run on the OpenPAI or Kubeflow platform, you can also see the hdfsLog.

![](../img/webui-img/detail-pai.png)

* Kill: you can kill a job whose status is running.
* Search for a specific trial.
* Intermediate Result Graph.

![](../img/intermediate.png)
@@ -8,7 +8,7 @@ Here is an experimental result of MNIST after using 'Curvefitting' Assessor in '
*Implemented code directory: config_assessor.yml <https://github.com/Microsoft/nni/blob/master/examples/trials/mnist/config_assessor.yml>*

.. image:: ../img/Assessor.png

Like Tuners, users can either use built-in Assessors or customize an Assessor on their own. Please refer to the following tutorials for details:

......
@@ -44,6 +44,7 @@ extensions = [
    'sphinx.ext.mathjax',
    'sphinx_markdown_tables',
    'sphinxarg.ext',
    'sphinx.ext.napoleon',
]

# Add any paths that contain templates here, relative to this directory.
@@ -107,7 +108,7 @@ html_theme_options = {
#
# html_sidebars = {}

html_logo = '../img/nni_logo_dark.png'

# -- Options for HTMLHelp output ---------------------------------------------
......
# GBDT in nni

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion as other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

Gradient boosting decision tree (GBDT) has many popular implementations, such as [lightgbm](https://github.com/Microsoft/LightGBM), [xgboost](https://github.com/dmlc/xgboost), and [catboost](https://github.com/catboost/catboost). GBDT is a great tool for solving traditional machine learning problems, and since it is a robust algorithm it can be used in many domains. The better the hyper-parameters, the better the performance you can achieve.

NNI is a great platform for tuning hyper-parameters; you can try its various built-in search algorithms and run multiple trials concurrently.

## 1. Search Space in GBDT

There are many hyper-parameters in GBDT, but which of them affect performance or speed? Based on practical experience, here are some suggestions (taking lightgbm as an example; an illustrative parameter dict follows the reference links below):
> * For better accuracy
>   * `learning_rate`. The range of `learning_rate` could be [0.001, 0.9].
>   * `num_leaves`. `num_leaves` is related to `max_depth`; you don't have to tune both of them.
>   * `bagging_freq`. `bagging_freq` could be [1, 2, 4, 8, 10].
>   * `num_iterations`. May be larger if underfitting.
> * For speed up
>   * `bagging_fraction`. The range of `bagging_fraction` could be [0.7, 1.0].
>   * `feature_fraction`. The range of `feature_fraction` could be [0.6, 1.0].
>   * `max_bin`.
> * To avoid overfitting
>   * `min_data_in_leaf`. This depends on your dataset.
>   * `min_sum_hessian_in_leaf`. This depends on your dataset.
>   * `lambda_l1` and `lambda_l2`.
>   * `min_gain_to_split`.
>   * `num_leaves`.
Reference links:
[lightgbm](https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html) and [autoxgboost](https://github.com/ja-thomas/autoxgboost/blob/master/poster_2018.pdf)
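For reference, a lightgbm parameter dict touching the knobs listed above might look like the following; the concrete values are illustrative starting points chosen by us, not recommendations from the lightgbm or NNI documentation:

```python
# Illustrative lightgbm parameters covering the accuracy / speed / overfitting knobs above.
params = {
    "objective": "regression",
    "boosting_type": "gbdt",
    "metric": "rmse",
    # accuracy-related
    "learning_rate": 0.05,        # within the suggested [0.001, 0.9] range
    "num_leaves": 31,
    "num_iterations": 100,
    "bagging_freq": 4,
    # speed-related
    "bagging_fraction": 0.8,      # within the suggested [0.7, 1.0] range
    "feature_fraction": 0.8,      # within the suggested [0.6, 1.0] range
    "max_bin": 255,
    # overfitting-related
    "min_data_in_leaf": 20,
    "lambda_l1": 0.0,
    "lambda_l2": 0.0,
    "min_gain_to_split": 0.0,
}
```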
## 2. Task description

Now we come back to our example "auto-gbdt", which runs with lightgbm and NNI. The data includes [train data](https://github.com/Microsoft/nni/blob/master/examples/trials/auto-gbdt/data/regression.train) and [test data](https://github.com/Microsoft/nni/blob/master/examples/trials/auto-gbdt/data/regression.train).

Given the features and labels in the train data, we train a GBDT regression model and use it to predict.
## 3. How to run in nni

### 3.1 Prepare your trial code

You need to prepare basic code like the following:
```python
...

def get_default_parameters():
    ...
    return params


def load_data(train_path='./data/regression.train', test_path='./data/regression.test'):
    '''
    Load or create dataset
    '''
    ...
    return lgb_train, lgb_eval, X_test, y_test


def run(lgb_train, lgb_eval, params, X_test, y_test):
    # train
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=20,
                    valid_sets=lgb_eval,
                    early_stopping_rounds=5)
    # predict
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
    # eval
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
    print('The rmse of prediction is:', rmse)


if __name__ == '__main__':
    lgb_train, lgb_eval, X_test, y_test = load_data()
    PARAMS = get_default_parameters()
    # train
    run(lgb_train, lgb_eval, PARAMS, X_test, y_test)
```
### 3.2 Prepare your search space

If you would like to tune `num_leaves`, `learning_rate`, `bagging_fraction` and `bagging_freq`, you could write a [search_space.json](https://github.com/Microsoft/nni/blob/master/examples/trials/auto-gbdt/search_space.json) as follows:
```json
{
    "num_leaves": {"_type": "choice", "_value": [31, 28, 24, 20]},
    "learning_rate": {"_type": "choice", "_value": [0.01, 0.05, 0.1, 0.2]},
    "bagging_fraction": {"_type": "uniform", "_value": [0.7, 1.0]},
    "bagging_freq": {"_type": "choice", "_value": [1, 2, 4, 8, 10]}
}
```
More supported variable types are described [here](SearchSpaceSpec.md).
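For the search space above, the parameters a trial receives from `nni.get_next_parameter()` (used in the next step) form a plain dictionary; a hypothetical sampled configuration could look like this:

```python
# Hypothetical example of one set of parameters NNI might sample from the search space above.
received_params = {
    "num_leaves": 24,
    "learning_rate": 0.05,
    "bagging_fraction": 0.85,   # drawn uniformly from [0.7, 1.0]
    "bagging_freq": 8,
}
```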
### 3.3 Add the NNI SDK to your code

```diff
+import nni
...

def get_default_parameters():
    ...
    return params


def load_data(train_path='./data/regression.train', test_path='./data/regression.test'):
    '''
    Load or create dataset
    '''
    ...
    return lgb_train, lgb_eval, X_test, y_test


def run(lgb_train, lgb_eval, params, X_test, y_test):
    # train
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=20,
                    valid_sets=lgb_eval,
                    early_stopping_rounds=5)
    # predict
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
    # eval
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
    print('The rmse of prediction is:', rmse)
+   nni.report_final_result(rmse)


if __name__ == '__main__':
    lgb_train, lgb_eval, X_test, y_test = load_data()
+   RECEIVED_PARAMS = nni.get_next_parameter()
    PARAMS = get_default_parameters()
+   PARAMS.update(RECEIVED_PARAMS)
    # train
    run(lgb_train, lgb_eval, PARAMS, X_test, y_test)
```
### 3.4 Write a config file and run it

In the config file, you can configure settings including:

* Experiment settings: `trialConcurrency`, `maxExecDuration`, `maxTrialNum`, `trial gpuNum`, etc.
* Platform settings: `trainingServicePlatform`, etc.
* Path settings: `searchSpacePath`, `trial codeDir`, etc.
* Algorithm settings: which `tuner` algorithm to use, `tuner optimize_mode`, etc.

An example config.yml is as follows:
```yaml
authorName: default
experimentName: example_auto-gbdt
trialConcurrency: 1
maxExecDuration: 10h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: minimize
trial:
  command: python3 main.py
  codeDir: .
  gpuNum: 0
```
Run this experiment with the following command:

```bash
nnictl create --config ./config.yml
```
\ No newline at end of file