News
----
08/15/2017: Optimal split for categorical features.
07/13/2017: [Gitter](https://gitter.im/Microsoft/LightGBM) is available.
06/20/2017: Python-package is on [PyPI](https://pypi.python.org/pypi/lightgbm) now.
06/09/2017: [LightGBM Slack team](https://lightgbm.slack.com) is available.
05/03/2017: LightGBM v2 stable release.
04/10/2017: LightGBM supports GPU-accelerated tree learning now. Please read our [GPU Tutorial](./docs/GPU-Tutorial.md) and [Performance Comparison](./docs/GPU-Performance.md).
01/08/2017: Released the [**R-package**](https://github.com/Microsoft/LightGBM/tree/master/R-package) beta version; please try it and provide feedback.
12/05/2016: **Categorical features as input directly** (without one-hot coding). Experiments on [Expo data](http://stat-computing.org/dataexpo/2009/) show about an 8x speed-up with the same accuracy compared with one-hot coding.
12/02/2016: Released the [**python-package**](https://github.com/Microsoft/LightGBM/tree/master/python-package) beta version; please try it and provide feedback.
* LightGBM enables missing value handling by default; you can disable it by setting `use_missing=false`.
* LightGBM uses NA (NaN) to represent missing values by default; you can change it to use zero by setting `zero_as_missing=true`.
* When `zero_as_missing=false` (default), the unrecorded values in sparse matrices (and LightSVM files) are treated as zeros.
* When `zero_as_missing=true`, NA and zeros (including unrecorded values in sparse matrices and LightSVM files) are treated as missing.
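As a sketch of how these two flags interact, the following (using plain NumPy, not LightGBM itself) marks which cells of a dense matrix would count as missing under each setting; the parameter dicts mirror the flags described above:

```python
import numpy as np

# NaN marks a missing value in a dense matrix (default: use_missing=true).
X = np.array([[1.0, np.nan],
              [0.0, 3.0]])

# Parameter combinations described above, as they would be passed to training:
default_params = {"use_missing": True, "zero_as_missing": False}
# -> NaN is missing; zeros (and unrecorded sparse entries) stay zeros
strict_params = {"use_missing": True, "zero_as_missing": True}
# -> both NaN and zeros are treated as missing
disabled_params = {"use_missing": False}
# -> missing value handling is turned off entirely

# Which cells would be "missing" under each setting, for this dense X:
missing_default = np.isnan(X)            # only the NaN cell
missing_strict = np.isnan(X) | (X == 0)  # NaN cell plus the zero cell
```

Here `missing_default` flags one cell and `missing_strict` flags two, matching the bullet points above.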
## Categorical feature support
* LightGBM can offer good accuracy when using native categorical features. Unlike simple one-hot coding, LightGBM can find the optimal split of categorical features; such an optimal split can provide much better accuracy than a one-hot coding solution.
* Use `categorical_feature` to specify the categorical features. Refer to the parameter `categorical_feature` in [Parameters](./Parameters.md).
* Categorical features need to be converted to `int` type first, and only non-negative numbers are supported. It is better to convert them into contiguous ranges.
* Use `max_cat_group` and `cat_smooth_ratio` to deal with over-fitting (when #data is small or #category is large).
* For categorical features with high cardinality (#category is large), it is better to convert them to numerical features.
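Mapping raw category values to the contiguous non-negative `int` codes recommended above can be sketched as follows (the encoder itself is illustrative, not part of LightGBM):

```python
# Map raw categories to contiguous non-negative int codes, as the
# guidance above recommends (first-seen order; any stable order works).
def encode_categories(values):
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)  # next contiguous code: 0, 1, 2, ...
        encoded.append(codes[v])
    return encoded, codes

colors = ["red", "green", "red", "blue", "green"]
encoded, mapping = encode_categories(colors)
# encoded -> [0, 1, 0, 2, 1]; pass this column to LightGBM and list it
# in `categorical_feature` so it is treated as categorical, not ordinal.
```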
## LambdaRank
* The label should be of `int` type, and a larger number represents a higher relevance (e.g. 0: bad, 1: fair, 2: good, 3: perfect).
* Use `label_gain` to set the gain (weight) of each `int` label.
* Use `max_position` to set the NDCG optimization position.
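For illustration, a common choice of `label_gain` follows the standard DCG gain `2**label - 1` (stated here as an assumption about the default; check the `label_gain` entry in [Parameters](./Parameters.md)):

```python
# DCG-style gains for int relevance labels 0..3
# (0: bad, 1: fair, 2: good, 3: perfect), using gain = 2**label - 1.
labels = [0, 1, 2, 3]
label_gain = [2 ** label - 1 for label in labels]
# label_gain -> [0, 1, 3, 7]: a "perfect" document counts 7x as much
# as a "fair" one. On the command line this would be passed as:
#   label_gain=0,1,3,7
```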
## Parameters Tuning
* Refer to [Parameters tuning](./Parameters-tuning.md).
## GPU support
* Refer to [GPU Tutorial](./GPU-Tutorial.md) and [GPU Targets](./GPU-Targets.md).
## Parallel Learning
* Refer to the [Parallel Learning Guide](https://github.com/Microsoft/LightGBM/wiki/Parallel-Learning-Guide).
This page contains all parameters in LightGBM.
***List of other Helpful Links***
* [Python API Reference](./Python-API.md)
The parameter format is `key1=value1 key2=value2 ...`. Parameters can be set both in the config file and on the command line.
  * only used in `goss`, the retain ratio of large gradient data
* `other_rate`, default=`0.1`, type=double
  * only used in `goss`, the retain ratio of small gradient data
* `max_cat_group`, default=`64`, type=int
  * used for categorical features.
  * When #category is large, finding the split point on it is prone to over-fitting. So LightGBM merges categories into `max_cat_group` groups and finds the split points on the group boundaries.
* `min_data_per_group`, default=`10`, type=int
  * minimum number of data points per categorical group.
* `max_cat_threshold`, default=`256`, type=int
  * used for categorical features. Limits the maximum number of threshold points for categorical features.
* `min_cat_smooth`, default=`5`, type=double
  * used for categorical features. Refer to the description of the parameter `cat_smooth_ratio`.
* `max_cat_smooth`, default=`100`, type=double
  * used for categorical features. Refer to the description of the parameter `cat_smooth_ratio`.
* `cat_smooth_ratio`, default=`0.01`, type=double
  * used for categorical features. This can reduce the effect of noise in categorical features, especially for categories with little data.
  * The smooth denominator is `a = min(max_cat_smooth, max(min_cat_smooth, num_data/num_category*cat_smooth_ratio))`.
  * The smooth numerator is `b = a * sum_gradient / sum_hessian`.
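Plugging illustrative numbers into the two smoothing formulas above (the statistics are made up for the example; only the formulas come from the text):

```python
# Worked example of the categorical smoothing formulas above.
min_cat_smooth = 5.0      # default
max_cat_smooth = 100.0    # default
cat_smooth_ratio = 0.01   # default

# Made-up statistics for one categorical feature:
num_data = 100000
num_category = 50
sum_gradient = 120.0
sum_hessian = 400.0

# Smooth denominator:
# a = min(max_cat_smooth, max(min_cat_smooth, num_data/num_category*cat_smooth_ratio))
a = min(max_cat_smooth,
        max(min_cat_smooth, num_data / num_category * cat_smooth_ratio))
# 100000 / 50 * 0.01 = 20.0, clamped to [5, 100] -> a = 20.0

# Smooth numerator: b = a * sum_gradient / sum_hessian
b = a * sum_gradient / sum_hessian  # 20.0 * 120.0 / 400.0 = 6.0
```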
## IO parameters
* used to specify categorical features
* Use numbers for indices, e.g. `categorical_feature=0,1,2` means column_0, column_1 and column_2 are categorical features.
* Add the prefix `name:` for column names, e.g. `categorical_feature=name:c1,c2,c3` means c1, c2 and c3 are categorical features.
* Note: Only categorical features of `int` type are supported (negative values are treated as missing values). Indices start from `0` and do not count the label column.
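The two equivalent forms above can be sketched with a tiny illustrative parser (not LightGBM code; only the `categorical_feature=...` strings follow the documented syntax):

```python
# Two equivalent ways to declare categorical features, per the rules above.
by_index = "categorical_feature=0,1,2"          # zero-based column indices
by_name = "categorical_feature=name:c1,c2,c3"   # column names with `name:` prefix

def parse_categorical_feature(setting):
    """Tiny illustrative parser for the two forms (not LightGBM's parser)."""
    value = setting.split("=", 1)[1]
    if value.startswith("name:"):
        return {"by": "name", "columns": value[len("name:"):].split(",")}
    return {"by": "index", "columns": [int(i) for i in value.split(",")]}
```

Remember that the index form does not count the label column, so `0` is the first feature column.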