@@ -4,251 +4,259 @@ This page contains all parameters in LightGBM.
* [Python API Reference](./Python-API.md)
* [Parameters Tuning](./Parameters-tuning.md)
***Update of 04/13/2017***
Default values for the following parameters have changed:
* min_data_in_leaf = 100 => 20
* min_sum_hessian_in_leaf = 10 => 1e-3
* num_leaves = 255 => 31
* num_iterations = 100 => 10
## Parameter format
The parameter format is `key1=value1 key2=value2 ...`. Parameters can be set both in a config file and on the command line. On the command line, parameters must not have spaces before or after `=`. In a config file, each line may contain only one parameter, and you can use `#` for comments. If a parameter appears both on the command line and in the config file, LightGBM will use the one from the command line.
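As a minimal illustration (not LightGBM's actual parser), the comment and precedence rules above can be sketched in Python:

```python
def parse_params(config_lines, cli_args):
    """Collect key=value parameters; command-line values override the config file."""
    params = {}
    for line in config_lines:
        line = line.split("#", 1)[0].strip()  # '#' starts a comment
        if not line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    for arg in cli_args:
        # On the command line there are no spaces around '='.
        key, _, value = arg.partition("=")
        params[key] = value  # command line wins on conflict
    return params

config = ["num_leaves = 31", "# this is a comment", "learning_rate = 0.05"]
print(parse_params(config, ["learning_rate=0.1"]))
# {'num_leaves': '31', 'learning_rate': '0.1'}
```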
* For the best speed, set this to the number of **real CPU cores**, not the number of threads (most CPUs use [hyper-threading](https://en.wikipedia.org/wiki/Hyper-threading) to generate 2 threads per CPU core).
* For parallel learning, do not use all CPU cores, since this will cause poor network performance.
* Choose the device for tree learning; you can use GPU to achieve faster learning.
* Note: 1. It is recommended to use a smaller `max_bin` (e.g. `63`) to get better speedup. 2. For faster speed, GPU uses 32-bit floating point to sum up by default, which may affect accuracy for some tasks. You can set `gpu_use_dp=true` to enable 64-bit floating point, but it will slow down training. 3. Refer to the [Installation Guide](https://github.com/Microsoft/LightGBM/wiki/Installation-Guide#with-gpu-support) to build with GPU support.
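For example, a config file following these GPU recommendations might look like this (a sketch using the parameter names documented on this page; the file name is hypothetical):

```
# train.conf (hypothetical)
device = gpu
max_bin = 63          # smaller max_bin gives better GPU speedup
gpu_use_dp = false    # 32-bit floating point; set true for 64-bit at lower speed
```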
## Learning control parameters
* `max_depth`, default=`-1`, type=int
* Limit the max depth of the tree model. This is used to deal with overfitting when #data is small. The tree still grows leaf-wise.
* LightGBM will randomly select part of the features on each iteration if `feature_fraction` is smaller than `1.0`. For example, if set to `0.8`, it will select 80% of the features before training each tree.
* By default, LightGBM will map the data file to memory and load features from memory. This provides faster data loading, but may run out of memory when the data file is very big.
* Set this to `true` if the data file is too big to fit in memory.
* Use a number for index, e.g. `weight=0` means column_0 is the weight
* Add a prefix `name:` for a column name, e.g. `weight=name:weight`
* Note: Index starts from `0`, and the label column is not counted when the passing type is index. E.g. when the label is column_0 and the weight is column_1, the correct parameter is `weight=0`.
* Use a number for index, e.g. `query=0` means column_0 is the query id
* Add a prefix `name:` for a column name, e.g. `query=name:query_id`
* Note: Data should be grouped by query_id. Index starts from `0`, and the label column is not counted when the passing type is index. E.g. when the label is column_0 and the query_id is column_1, the correct parameter is `query=0`.
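Both notes above follow the same index convention. As a toy sketch (not LightGBM code), a parameter index such as `weight=0` or `query=0` can be mapped to the raw column in the data file by skipping the label column:

```python
def resolve_column(param_index, label_index, num_columns):
    """Map an index like weight=0 or query=0 (which skips the label
    column) to the raw column index in the data file."""
    non_label_cols = [c for c in range(num_columns) if c != label_index]
    return non_label_cols[param_index]

# label is column_0 and weight is column_1, so weight=0 points at raw column 1
print(resolve_column(0, 0, 3))  # 1
```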
* `metric`, default={`l2` for regression}, {`binary_logloss` for binary classification}, {`ndcg` for lambdarank}, type=multi-enum, options=`l1`, `l2`, `ndcg`, `auc`, `binary_logloss`, `binary_error`, ...
* OpenCL platform ID. Usually each GPU vendor exposes one OpenCL platform.
* Default value is -1, using the system-wide default platform.
* `gpu_device_id`, default=`-1`, type=int
* OpenCL device ID in the specified platform. Each GPU in the selected platform has a unique device ID.
* Default value is -1, using the default device in the selected platform.
* `gpu_use_dp`, default=`false`, type=bool
* Set to true to use double precision math on GPU (default using single precision).
## Others
...
@@ -263,7 +271,7 @@ LightGBM supports continued training with an initial score. It uses an additional file
...
```
It means the initial score of the first data row is `0.5`, the second is `-0.1`, and so on. The initial score file corresponds with the data file line by line, with one score per line. If the name of the data file is "train.txt", the initial score file should be named "train.txt.init" and placed in the same folder as the data file. LightGBM will automatically load the initial score file if it exists.
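As a small sketch of this convention (hypothetical file names; Python standard library only):

```python
import os
import tempfile

# One initial score per line, aligned with the data file line by line.
scores = [0.5, -0.1, 0.9]

data_dir = tempfile.mkdtemp()
data_path = os.path.join(data_dir, "train.txt")
init_path = data_path + ".init"  # "train.txt" -> "train.txt.init"

with open(init_path, "w") as f:
    for score in scores:
        f.write(f"{score}\n")

print(open(init_path).read().splitlines())  # ['0.5', '-0.1', '0.9']
```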
### Weight data
...
@@ -276,10 +284,10 @@ LightGBM supports weighted training. It uses an additional file to store weight data
...
```
It means the weight of the first data row is `1.0`, the second is `0.5`, and so on. The weight file corresponds with the data file line by line, with one weight per line. If the name of the data file is "train.txt", the weight file should be named "train.txt.weight" and placed in the same folder as the data file. LightGBM will automatically load the weight file if it exists.
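A small sketch of the line-by-line alignment (hypothetical file names; Python standard library only):

```python
import os
import tempfile

data_dir = tempfile.mkdtemp()
data_path = os.path.join(data_dir, "train.txt")
weight_path = data_path + ".weight"  # "train.txt" -> "train.txt.weight"

# A toy data file with three rows, and one weight per row.
with open(data_path, "w") as f:
    f.write("1 0.1 0.2\n0 0.3 0.4\n1 0.5 0.6\n")
with open(weight_path, "w") as f:
    f.write("1.0\n0.5\n0.8\n")

n_rows = sum(1 for _ in open(data_path))
weights = [float(w) for w in open(weight_path)]
assert len(weights) == n_rows  # the files must align line by line
print(weights)  # [1.0, 0.5, 0.8]
```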
Update:
You can now specify the weight column in the data file. Please refer to the parameter `weight` above.
### Query data
...
@@ -292,7 +300,6 @@ For LambdaRank learning, query information is needed for the training data. LightGBM
...
```
It means the first `27` lines of samples belong to one query and the next `18` lines belong to another, and so on. (**Note: data should be ordered by query.**) If the name of the data file is "train.txt", the query file should be named "train.txt.query" and placed in the same folder as the training data. LightGBM will load the query file automatically if it exists.
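The grouping above can be sketched as follows (a toy illustration of how group sizes partition the ordered rows):

```python
# Query file contents: one group size per line; data rows are ordered by query.
group_sizes = [27, 18, 5]

# Recover the (start, end) row range covered by each query group.
ranges, start = [], 0
for size in group_sizes:
    ranges.append((start, start + size))
    start += size

print(ranges)  # [(0, 27), (27, 45), (45, 50)]
```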
Update:
You can now specify the query/group id column in the data file. Please refer to the parameter `group` above.