LightGBM is a gradient boosting framework that uses tree based learning algorithms.
For more details, please refer to [Features](https://github.com/Microsoft/LightGBM/wiki/Features).
[Experiments](https://github.com/Microsoft/LightGBM/wiki/Experiments#comparison-experiment) on public datasets show that LightGBM can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. What's more, the [experiments](https://github.com/Microsoft/LightGBM/wiki/Experiments#parallel-experiment) show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.
News
----
02/20/2017 : Update to LightGBM v2.
01/08/2017 : Release [**R-package**](./R-package) beta version; you are welcome to try it and provide feedback.
12/05/2016 : **Categorical Features as input directly** (without one-hot coding). Experiments on [Expo data](http://stat-computing.org/dataexpo/2009/) show about an 8x speed-up with the same accuracy compared with one-hot coding (refer to [categorical log](https://github.com/guolinke/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_dataexpo_speed.log) and [one-hot log](https://github.com/guolinke/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_dataexpo_onehot_speed.log)).
For the setting details, please refer to [IO Parameters](./docs/Parameters.md#io-parameters).
12/02/2016 : Release [**python-package**](./python-package) beta version; you are welcome to try it and provide feedback.
...
...
LightGBM has been developed and used by many active community members. Your help is very valuable to make it better for everyone.
- Check out [call for contributions](https://github.com/Microsoft/LightGBM/issues?q=is%3Aissue+is%3Aopen+label%3Acall-for-contribution) to see what can be improved, or open an issue if you want something.
- Contribute to the [tests](https://github.com/Microsoft/LightGBM/tree/master/tests) to make it more reliable.
- Contribute to the [documents](https://github.com/Microsoft/LightGBM/tree/master/docs) to make it clearer for everyone.
- Contribute to the [examples](https://github.com/Microsoft/LightGBM/tree/master/examples) to share your experience with other users.
- Check out [Development Guide](./docs/development.md).
- Open an issue if you meet problems during development.
- **Solution 1**: This error should be solved in the latest version. If you still meet this error, try to remove the lightgbm.egg-info folder in your python-package and reinstall, or check [this thread on Stack Overflow](http://stackoverflow.com/questions/18085571/pip-install-error-setup-script-specifies-an-absolute-path).
- **Question 2**: I see error messages like `Cannot get/set label/weight/init_score/group/num_data/num_feature before construct dataset`, but I have already constructed the dataset with code like `train = lightgbm.Dataset(X_train, y_train)`, or error messages like `Cannot set predictor/reference/categorical feature after freed raw data, set free_raw_data=False when construct Dataset to avoid this.`.
- **Solution 2**: Because LightGBM constructs bin mappers to build trees, and the train and valid Datasets within one Booster share the same bin mappers, categorical features, feature names, etc., the Dataset objects are constructed when constructing a Booster. And if you set free_raw_data=True (the default), the raw data (with the Python data structure) will be freed. So, if you want to do any of the following (see the sketch after this list):
+ get label (or weight/init_score/group) before constructing the dataset: this is the same as getting `self.label`
+ set label (or weight/init_score/group) before constructing the dataset: this is the same as `self.label = some_label_array`
+ get num_data (or num_feature) before constructing the dataset: you can get the data with `self.data`; then, if your data is a `numpy.ndarray`, use code like `self.data.shape`
+ set predictor (or reference/categorical feature) after constructing the dataset: you should set free_raw_data=False or init a Dataset object with the same raw data
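
A minimal sketch of the last case (hypothetical random data), keeping the raw data alive so the Dataset can still be modified after construction:

```python
import numpy as np
import lightgbm as lgb

X_train = np.random.rand(500, 10)
y_train = np.random.randint(2, size=500)

# free_raw_data=False keeps the raw numpy data attached to the Dataset,
# so predictor/reference/categorical feature can still be set later
train = lgb.Dataset(X_train, label=y_train, free_raw_data=False)

# before construction, num_data/num_feature can be read from the raw data
print(train.data.shape)  # (500, 10)

# the Dataset is actually constructed when the Booster is created
booster = lgb.train({'objective': 'binary'}, train, num_boost_round=10)
```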
The parameter format is ```key1=value1 key2=value2 ...```. And parameters can be set both in the config file and on the command line.
* only used in ```dart```, true if you want to use xgboost dart mode
* ```drop_seed```, default=```4```, type=int
* only used in ```dart```, random seed to choose dropping models
* ```top_rate```, default=```0.2```, type=double
* only used in ```goss```, the retain ratio of large gradient data
* ```other_rate```, default=```0.1```, type=double
* only used in ```goss```, the retain ratio of small gradient data (see the sketch below)
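
As an illustrative sketch (hypothetical random data; parameter names taken from the list above), the ```goss``` parameters can be passed through the python package:

```python
import numpy as np
import lightgbm as lgb

# hypothetical regression data
X = np.random.rand(1000, 20)
y = np.random.rand(1000)

params = {
    'objective': 'regression',
    'boosting': 'goss',  # Gradient-based One-Side Sampling
    'top_rate': 0.2,     # retain ratio of large gradient data
    'other_rate': 0.1,   # retain ratio of small gradient data
}
booster = lgb.train(params, lgb.Dataset(X, y), num_boost_round=10)
```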
## IO parameters
...
...
* parameter for [Huber loss](https://en.wikipedia.org/wiki/Huber_loss "Huber loss - Wikipedia"). Will be used in the regression task.
* ```fair_c```, default=```1.0```, type=double
* parameter for [Fair loss](http://research.microsoft.com/en-us/um/people/zhang/INRIA/Publis/Tutorial-Estim/node24.html). Will be used in the regression task (see the sketch below).
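
For instance, a minimal sketch (hypothetical random data) of training with the Fair loss and a custom ```fair_c```:

```python
import numpy as np
import lightgbm as lgb

X = np.random.rand(300, 8)
y = np.random.rand(300)

# 'fair' selects the Fair loss regression objective; fair_c is its scale parameter
params = {'objective': 'fair', 'fair_c': 2.0}
booster = lgb.train(params, lgb.Dataset(X, y), num_boost_round=10)
```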
This document gives a basic walkthrough of the LightGBM python package.
Install
-------
* Install the library first by following the guide [here](./Installation-Guide.md).
* Install the python-package dependencies: `setuptools`, `numpy` and `scipy` are required, and `scikit-learn` is required for the sklearn interface and recommended. Run:
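```
pip install setuptools numpy scipy scikit-learn -U
```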
The label is the data of the first column, and there is no header in the file.
Update 12/5/2016:
LightGBM can use categorical features directly (without one-hot coding). The experiment on [Expo data](http://stat-computing.org/dataexpo/2009/) shows about an 8x speed-up compared with one-hot coding (refer to [categorical log](https://github.com/guolinke/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_dataexpo_speed.log) and [one-hot log](https://github.com/guolinke/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_dataexpo_onehot_speed.log)).
For the setting details, please refer to [Parameters](./Parameters.md#io-parameters).
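
A minimal sketch of this with the python package (hypothetical random data; the first column plays the role of a categorical feature encoded as non-negative integers):

```python
import numpy as np
import lightgbm as lgb

X = np.random.rand(500, 4)
X[:, 0] = np.random.randint(10, size=500)  # integer codes, no one-hot coding
y = np.random.randint(2, size=500)

# pass the categorical column indices directly
train = lgb.Dataset(X, label=y, categorical_feature=[0])
booster = lgb.train({'objective': 'binary'}, train, num_boost_round=10)
```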
...
...
For example, the following command line (assuming a config file named train.conf) will keep 'num_trees=10' and ignore the same parameter in the config file:
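```
"./lightgbm" config=train.conf num_trees=10
```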
Here is an example of using the LightGBM python package.
For the installation, check the wiki [here](https://github.com/Microsoft/LightGBM/wiki/Installation-Guide).
You also need scikit-learn, pandas, and matplotlib (only for the plot example) to run the examples, but they are not required for the package itself. You can install them with pip:
```
pip install scikit-learn pandas matplotlib -U
```
Now you can run examples in this folder, for example: