Unverified commit 8b9bcf16, authored by Yan Ni and committed by GitHub

move docs under en_US (#744)

* move en docs under en_US

* update gitignore

* fix image link

* add doc guide

* add napoleon ext
parent bbd441b3
@@ -62,17 +62,17 @@ nnictl create --config exp_pai.yml
```
to start the experiment in pai mode. NNI will create an OpenPAI job for each trial, and the job name format is something like `nni_exp_{experiment_id}_trial_{trial_id}`.
You can see jobs created by NNI in the OpenPAI cluster's web portal, like:
-![](./img/nni_pai_joblist.jpg)
+![](../img/nni_pai_joblist.jpg)
Notice: In pai mode, NNIManager starts a REST server that listens on a port equal to your NNI WebUI port plus 1. For example, if your WebUI port is `8080`, the REST server listens on `8081` to receive metrics from trial jobs running on OpenPAI. So you should open TCP port `8081` in your firewall rules to allow incoming traffic.
Once a trial job is completed, you can go to the NNI WebUI's overview page (like http://localhost:8080/oview) to check the trial's information.
Expand a trial's information in the trial list view and click the logPath link like:
-![](./img/nni_webui_joblist.jpg)
+![](../img/nni_webui_joblist.jpg)
You will then be redirected to the HDFS web portal to browse that trial's output files in HDFS:
-![](./img/nni_trial_hdfs_output.jpg)
+![](../img/nni_trial_hdfs_output.jpg)
You can see there are three files in the output folder: stderr, stdout, and trial.log.
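The metrics mentioned above are whatever the trial code reports back through the NNI SDK. As a minimal, hedged sketch (the training loop and metric values below are placeholders, not from this document), a trial reports them like this:

```python
import nni

if __name__ == '__main__':
    params = nni.get_next_parameter()          # hyper-parameters chosen by the tuner
    accuracy = 0.0
    for epoch in range(10):
        accuracy = 0.5 + 0.04 * epoch          # placeholder metric for illustration
        nni.report_intermediate_result(accuracy)  # shown as intermediate results in the WebUI
    nni.report_final_result(accuracy)          # shown as the trial's default metric
```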
@@ -183,28 +183,28 @@ Click the tab "Overview".
Information about this experiment is shown in the WebUI, including the experiment trial profile and search space information. NNI also supports downloading this information and the parameters through the **Download** button. You can download the experiment results at any time while the experiment is running, or after it has finished.
-![](./img/QuickStart1.png)
+![](../img/QuickStart1.png)
The top 10 trials are listed on the Overview page; you can browse all trials on the "Trials Detail" page.
-![](./img/QuickStart2.png)
+![](../img/QuickStart2.png)
#### View trials detail page
Click the tab "Default Metric" to see the point graph of all trials. Hover over a point to see its default metric and search space information.
-![](./img/QuickStart3.png)
+![](../img/QuickStart3.png)
Click the tab "Hyper Parameter" to see the parallel coordinates graph.
* You can select a percentage to see only the top trials.
* Choose two axes to swap their positions.
-![](./img/QuickStart4.png)
+![](../img/QuickStart4.png)
Click the tab "Trial Duration" to see the bar graph.
-![](./img/QuickStart5.png)
+![](../img/QuickStart5.png)
Below is the status of all trials. Specifically:
@@ -213,11 +213,11 @@ Below is the status of the all trials. Specifically:
* Kill: you can kill a job whose status is running.
* Support for searching for a specific trial.
-![](./img/QuickStart6.png)
+![](../img/QuickStart6.png)
* Intermediate Result Graph
-![](./img/QuickStart7.png)
+![](../img/QuickStart7.png)
## Related Topic
@@ -7,16 +7,16 @@ Click the tab "Overview".
* See the experiment trial profile and search space information.
* Support for downloading the experiment result.
-![](./img/webui-img/over1.png)
+![](../img/webui-img/over1.png)
* See trials with good performance.
-![](./img/webui-img/over2.png)
+![](../img/webui-img/over2.png)
## View job default metric
Click the tab "Default Metric" to see the point graph of all trials. Hover over a point to see its default metric and search space information.
-![](./img/accuracy.png)
+![](../img/accuracy.png)
## View hyper parameter
@@ -25,13 +25,13 @@ Click the tab "Hyper Parameter" to see the parallel graph.
* You can select a percentage to see only the top trials.
* Choose two axes to swap their positions.
-![](./img/hyperPara.png)
+![](../img/hyperPara.png)
## View Trial Duration
Click the tab "Trial Duration" to see the bar graph.
-![](./img/trial_duration.png)
+![](../img/trial_duration.png)
## View trials status
@@ -39,15 +39,15 @@ Click the tab "Trials Detail" to see the status of the all trials. Specifically:
* Trial detail: the trial's id, duration, start time, end time, status, accuracy, and search space file.
-![](./img/webui-img/detail-local.png)
+![](../img/webui-img/detail-local.png)
* If you run on the OpenPAI or Kubeflow platform, you can also see the hdfsLog.
-![](./img/webui-img/detail-pai.png)
+![](../img/webui-img/detail-pai.png)
* Kill: you can kill a job whose status is running.
* Support for searching for a specific trial.
* Intermediate Result Graph.
-![](./img/intermediate.png)
+![](../img/intermediate.png)
@@ -8,7 +8,7 @@ Here is an experimental result of MNIST after using 'Curvefitting' Assessor in '
*Implemented code directory: config_assessor.yml <https://github.com/Microsoft/nni/blob/master/examples/trials/mnist/config_assessor.yml>*
-.. image:: ./img/Assessor.png
+.. image:: ../img/Assessor.png
Like Tuners, users can either use built-in Assessors or customize an Assessor on their own. Please refer to the following tutorials for details:
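As a rough illustration of the customization path (a hedged sketch based on NNI's customized-assessor interface; the class name and the stopping rule are hypothetical, and the exact API may differ across NNI versions):

```python
from nni.assessor import Assessor, AssessResult

class BestGapAssessor(Assessor):
    """Hypothetical assessor: stop a trial whose latest intermediate result
    falls well below the best result it has reported so far."""

    def __init__(self, tolerance=0.1):
        self.tolerance = tolerance

    def assess_trial(self, trial_job_id, trial_history):
        # trial_history holds the intermediate results this trial has reported
        if len(trial_history) < 3:
            return AssessResult.Good
        if trial_history[-1] < max(trial_history) - self.tolerance:
            return AssessResult.Bad   # ask NNI to early-stop this trial
        return AssessResult.Good
```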
@@ -44,6 +44,7 @@ extensions = [
    'sphinx.ext.mathjax',
    'sphinx_markdown_tables',
    'sphinxarg.ext',
+    'sphinx.ext.napoleon',
]
# Add any paths that contain templates here, relative to this directory.
@@ -107,7 +108,7 @@ html_theme_options = {
#
# html_sidebars = {}
-html_logo = './img/nni_logo_dark.png'
+html_logo = '../img/nni_logo_dark.png'
# -- Options for HTMLHelp output ---------------------------------------------
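The newly added `sphinx.ext.napoleon` extension lets Sphinx parse Google- and NumPy-style docstrings in the Python API docs. As a small illustration (the function below is hypothetical, not from the repo), napoleon turns a docstring like this into structured parameter and return sections:

```python
def scale_metric(value, factor=2.0):
    """Scale a reported metric (hypothetical example, for docstring style only).

    Args:
        value (float): The raw metric reported by a trial.
        factor (float): Multiplier applied to the metric.

    Returns:
        float: The scaled metric.
    """
    return value * factor
```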
# GBDT in NNI
Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion as other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.
Gradient boosting decision trees have many popular implementations, such as [lightgbm](https://github.com/Microsoft/LightGBM), [xgboost](https://github.com/dmlc/xgboost), and [catboost](https://github.com/catboost/catboost). GBDT is a great tool for solving traditional machine learning problems, and since it is a robust algorithm it can be used in many domains. The better the hyper-parameters, the better the performance you can achieve.
NNI is a great platform for tuning hyper-parameters: you can try various built-in search algorithms in NNI and run multiple trials concurrently.
## 1. Search Space in GBDT
There are many hyper-parameters in GBDT, but which of them actually affect performance or speed? Based on practical experience, here are some suggestions (taking lightgbm as an example); a parameter-dict sketch follows the reference links below.

> * For better accuracy
>   * `learning_rate`. The range of `learning_rate` could be [0.001, 0.9].
>   * `num_leaves`. `num_leaves` is related to `max_depth`; you don't have to tune both of them.
>   * `bagging_freq`. `bagging_freq` could be [1, 2, 4, 8, 10].
>   * `num_iterations`. May be larger if the model is underfitting.
> * For faster speed
>   * `bagging_fraction`. The range of `bagging_fraction` could be [0.7, 1.0].
>   * `feature_fraction`. The range of `feature_fraction` could be [0.6, 1.0].
>   * `max_bin`.
> * To avoid overfitting
>   * `min_data_in_leaf`. This depends on your dataset.
>   * `min_sum_hessian_in_leaf`. This depends on your dataset.
>   * `lambda_l1` and `lambda_l2`.
>   * `min_gain_to_split`.
>   * `num_leaves`.
Reference links:
[lightgbm](https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html) and [autoxgboost](https://github.com/ja-thomas/autoxgboost/blob/master/poster_2018.pdf)
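As a concrete illustration (the values below are assumptions for demonstration, not part of the original example), a lightgbm parameter dict that touches the knobs above might look like this:

```python
# Hypothetical starting point reflecting the suggested ranges above.
params = {
    'objective': 'regression',
    'metric': 'rmse',
    'learning_rate': 0.05,       # accuracy: roughly within [0.001, 0.9]
    'num_leaves': 31,            # accuracy: related to max_depth, tune one of them
    'bagging_freq': 4,           # accuracy: commonly 1, 2, 4, 8 or 10
    'num_iterations': 100,       # accuracy: increase if underfitting
    'bagging_fraction': 0.8,     # speed: roughly within [0.7, 1.0]
    'feature_fraction': 0.9,     # speed: roughly within [0.6, 1.0]
    'min_data_in_leaf': 20,      # overfitting: depends on your dataset
    'lambda_l1': 0.0,            # overfitting: L1 regularization
    'lambda_l2': 0.0,            # overfitting: L2 regularization
}
```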
## 2. Task description
Now we come back to our example "auto-gbdt", which runs lightgbm with NNI. The data includes [train data](https://github.com/Microsoft/nni/blob/master/examples/trials/auto-gbdt/data/regression.train) and [test data](https://github.com/Microsoft/nni/blob/master/examples/trials/auto-gbdt/data/regression.test).
Given the features and labels in the training data, we train a GBDT regression model and use it for prediction.
## 3. How to run in NNI
### 3.1 Prepare your trial code
You need to prepare basic trial code like the following:
```python
import lightgbm as lgb
from sklearn.metrics import mean_squared_error

...

def get_default_parameters():
    ...
    return params

def load_data(train_path='./data/regression.train', test_path='./data/regression.test'):
    '''
    Load or create dataset
    '''
    ...
    return lgb_train, lgb_eval, X_test, y_test

def run(lgb_train, lgb_eval, params, X_test, y_test):
    # train
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=20,
                    valid_sets=lgb_eval,
                    early_stopping_rounds=5)

    # predict
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)

    # eval
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
    print('The rmse of prediction is:', rmse)

if __name__ == '__main__':
    lgb_train, lgb_eval, X_test, y_test = load_data()
    PARAMS = get_default_parameters()

    # train
    run(lgb_train, lgb_eval, PARAMS, X_test, y_test)
```
### 3.2 Prepare your search space.
If you would like to tune `num_leaves`, `learning_rate`, `bagging_fraction`, and `bagging_freq`, you could write a [search_space.json](https://github.com/Microsoft/nni/blob/master/examples/trials/auto-gbdt/search_space.json) as follows:
```json
{
    "num_leaves": {"_type": "choice", "_value": [31, 28, 24, 20]},
    "learning_rate": {"_type": "choice", "_value": [0.01, 0.05, 0.1, 0.2]},
    "bagging_fraction": {"_type": "uniform", "_value": [0.7, 1.0]},
    "bagging_freq": {"_type": "choice", "_value": [1, 2, 4, 8, 10]}
}
```
More supported variable types are documented [here](SearchSpaceSpec.md).
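For orientation, once the experiment starts, each trial receives one sampled combination from this search space via `nni.get_next_parameter()` (see the next step). A hedged illustration of what such a sample might look like as a Python dict (the exact values depend on the tuner):

```python
# One possible sample drawn from the search space above (values are illustrative).
received_params = {
    'num_leaves': 24,            # picked from the "choice" list
    'learning_rate': 0.05,       # picked from the "choice" list
    'bagging_fraction': 0.85,    # drawn uniformly from [0.7, 1.0]
    'bagging_freq': 8,           # picked from the "choice" list
}
```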
### 3.3 Add the NNI SDK to your code.
```diff
+import nni
import lightgbm as lgb
from sklearn.metrics import mean_squared_error

...

def get_default_parameters():
    ...
    return params

def load_data(train_path='./data/regression.train', test_path='./data/regression.test'):
    '''
    Load or create dataset
    '''
    ...
    return lgb_train, lgb_eval, X_test, y_test

def run(lgb_train, lgb_eval, params, X_test, y_test):
    # train
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=20,
                    valid_sets=lgb_eval,
                    early_stopping_rounds=5)

    # predict
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)

    # eval
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
    print('The rmse of prediction is:', rmse)
+    # report the final result to NNI
+    nni.report_final_result(rmse)

if __name__ == '__main__':
    lgb_train, lgb_eval, X_test, y_test = load_data()
+    RECEIVED_PARAMS = nni.get_next_parameter()
    PARAMS = get_default_parameters()
+    PARAMS.update(RECEIVED_PARAMS)

    # train
    run(lgb_train, lgb_eval, PARAMS, X_test, y_test)
```
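For clarity, here is a sketch of what the complete trial script could look like after the diff above is applied. The data-loading details and the default parameter values are assumptions for illustration (the example data files are assumed to be tab-separated with the label in the first column); the NNI calls match the diff.

```python
import lightgbm as lgb
import nni
import pandas as pd
from sklearn.metrics import mean_squared_error


def get_default_parameters():
    # Assumed defaults for illustration; the real example may use different values.
    return {
        'objective': 'regression',
        'metric': 'rmse',
        'num_leaves': 31,
        'learning_rate': 0.05,
        'bagging_fraction': 0.9,
        'bagging_freq': 4,
        'verbose': -1,
    }


def load_data(train_path='./data/regression.train', test_path='./data/regression.test'):
    # Assumed layout: tab-separated files with the label in the first column.
    df_train = pd.read_csv(train_path, header=None, sep='\t')
    df_test = pd.read_csv(test_path, header=None, sep='\t')
    y_train, X_train = df_train[0], df_train.drop(0, axis=1)
    y_test, X_test = df_test[0], df_test.drop(0, axis=1)
    lgb_train = lgb.Dataset(X_train, y_train)
    lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
    return lgb_train, lgb_eval, X_test, y_test


def run(lgb_train, lgb_eval, params, X_test, y_test):
    gbm = lgb.train(params, lgb_train, num_boost_round=20,
                    valid_sets=lgb_eval, early_stopping_rounds=5)
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
    rmse = mean_squared_error(y_test, y_pred) ** 0.5
    print('The rmse of prediction is:', rmse)
    nni.report_final_result(rmse)   # final metric shown in the NNI WebUI


if __name__ == '__main__':
    lgb_train, lgb_eval, X_test, y_test = load_data()
    RECEIVED_PARAMS = nni.get_next_parameter()   # parameters sampled by the tuner
    PARAMS = get_default_parameters()
    PARAMS.update(RECEIVED_PARAMS)
    run(lgb_train, lgb_eval, PARAMS, X_test, y_test)
```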
### 3.4 Write a config file and run it.
In the config file, you can set options including:
* Experiment settings: `trialConcurrency`, `maxExecDuration`, `maxTrialNum`, `trial gpuNum`, etc.
* Platform settings: `trainingServicePlatform`, etc.
* Path settings: `searchSpacePath`, `trial codeDir`, etc.
* Algorithm settings: the `tuner` algorithm, `tuner optimize_mode`, etc.
An example config.yml looks as follows:
```yaml
authorName: default
experimentName: example_auto-gbdt
trialConcurrency: 1
maxExecDuration: 10h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: minimize
trial:
  command: python3 main.py
  codeDir: .
  gpuNum: 0
```
Run this experiment with the following command:
```bash
nnictl create --config ./config.yml
```