Commit eade219e authored by Qiwei Ye

merge conflict

parents f23e6083 060bd316
@@ -12,14 +12,15 @@ LightGBM is a gradient boosting framework that uses tree based learning algorith

For more details, please refer to [Features](https://github.com/Microsoft/LightGBM/wiki/Features).

[Experiments](https://github.com/Microsoft/LightGBM/wiki/Experiments#comparison-experiment) on public datasets show that LightGBM can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. What's more, the [experiments](https://github.com/Microsoft/LightGBM/wiki/Experiments#parallel-experiment) show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.

News
----
02/20/2017 : Update to LightGBM v2.
01/08/2017 : Released the [**R-package**](./R-package) beta version; you are welcome to try it and provide feedback.

12/05/2016 : **Categorical Features as input directly** (without one-hot coding). Experiments on [Expo data](http://stat-computing.org/dataexpo/2009/) show about an 8x speed-up with the same accuracy compared with one-hot coding.
For the setting details, please refer to [IO Parameters](./docs/Parameters.md#io-parameters).
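A minimal sketch of this feature from the python-package (`X_train`/`y_train` are hypothetical arrays, with column 0 holding integer-encoded categories):

```python
import lightgbm as lgb

# column 0 is used as a categorical feature directly, no one-hot coding needed
train_data = lgb.Dataset(X_train, label=y_train, categorical_feature=[0])
```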
12/02/2016 : Released the [**python-package**](./python-package) beta version; you are welcome to try it and provide feedback.
@@ -43,7 +44,7 @@ LightGBM has been developed and used by many active community members. Your help
- Check out [call for contributions](https://github.com/Microsoft/LightGBM/issues?q=is%3Aissue+is%3Aopen+label%3Acall-for-contribution) to see what can be improved, or open an issue if you want something.
- Contribute to the [tests](https://github.com/Microsoft/LightGBM/tree/master/tests) to make LightGBM more reliable.
- Contribute to the [documents](https://github.com/Microsoft/LightGBM/tree/master/docs) to make them clearer for everyone.
- Contribute to the [examples](https://github.com/Microsoft/LightGBM/tree/master/examples) to share your experience with other users.
- Check out [Development Guide](./docs/development.md).
- Open an issue if you meet problems during development.
...
# Using LightGBM via Docker
This directory contains `Dockerfile` to make it easy to build and run LightGBM via [Docker](http://www.docker.com/).
## Installing Docker
Follow the general installation instructions
[on the Docker site](https://docs.docker.com/installation/):
* [OSX](https://docs.docker.com/installation/mac/): [docker toolbox](https://www.docker.com/toolbox)
* [Ubuntu](https://docs.docker.com/installation/ubuntulinux/)
## Running the container
For Python users, build the container:
$ docker build -t lightgbm -f dockerfile-python .
After the build finishes, run the container:
$ docker run --rm -it lightgbm
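To work with files from the host inside the container, you can mount a directory with Docker's standard `-v` flag (`/path/to/data` below is a placeholder):

    $ docker run --rm -it -v /path/to/data:/data lightgbm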
FROM ubuntu:16.04
RUN apt-get update && \
apt-get install -y cmake build-essential gcc g++ git wget && \
# open-mpi
cd /usr/local/src && mkdir openmpi && cd openmpi && \
wget https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1.tar.gz && \
tar -xzf openmpi-2.0.1.tar.gz && cd openmpi-2.0.1 && \
./configure --prefix=/usr/local/openmpi && make && make install && \
export PATH="/usr/local/openmpi/bin:$PATH" && \
# lightgbm
cd /usr/local/src && mkdir lightgbm && cd lightgbm && \
git clone --recursive https://github.com/Microsoft/LightGBM && \
cd LightGBM && mkdir build && cd build && cmake -DUSE_MPI=ON .. && make && \
# python-package
# miniconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
/bin/bash Miniconda3-latest-Linux-x86_64.sh -f -b -p /opt/conda && \
export PATH="/opt/conda/bin:$PATH" && \
# lightgbm
conda install -y numpy scipy scikit-learn pandas && \
cd ../python-package && python setup.py install && \
# clean
apt-get autoremove -y && apt-get clean && \
conda clean -i -l -t -y && \
rm -rf /usr/local/src/*
ENV PATH /opt/conda/bin:$PATH
@@ -20,11 +20,11 @@ LightGBM FAQ
- **Solution 1**: this error should be fixed in the latest version. If you still encounter it, try removing the `lightgbm.egg-info` folder in your python-package directory and reinstalling, or check [this thread on Stack Overflow](http://stackoverflow.com/questions/18085571/pip-install-error-setup-script-specifies-an-absolute-path).
- **Question 2**: I see error messages like `Cannot get/set label/weight/init_score/group/num_data/num_feature before construct dataset`, even though I have already constructed a dataset with code like `train = lightgbm.Dataset(X_train, y_train)`, or error messages like `Cannot set predictor/reference/categorical feature after freed raw data, set free_raw_data=False when construct Dataset to avoid this.`.
- **Solution 2**: Because LightGBM constructs bin mappers to build trees, and the train and valid Datasets within one Booster share the same bin mappers, categorical features, feature names, etc., the Dataset objects are constructed when constructing a Booster. If you set free_raw_data=True (the default), the raw data (with the python data struct) will be freed. So, if you want to:
  + get label (or weight/init_score/group) before constructing a dataset, it is the same as getting `self.label`
  + set label (or weight/init_score/group) before constructing a dataset, it is the same as `self.label = some_label_array`
  + get num_data (or num_feature) before constructing a dataset, you can get the data with `self.data`; then, if your data is a `numpy.ndarray`, use code like `self.data.shape`
  + set predictor (or reference/categorical feature) after constructing a dataset, you should set free_raw_data=False or init a Dataset object with the same raw data (see the sketch below)
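A minimal sketch of the last point (hypothetical `X_train`/`y_train`/`X_valid`/`y_valid` arrays; the key part is `free_raw_data=False`):

```python
import lightgbm as lgb

# keep the raw python data alive so predictor/reference/categorical features
# can still be set after the Dataset objects are constructed
train = lgb.Dataset(X_train, y_train, free_raw_data=False)
valid = lgb.Dataset(X_valid, y_valid, reference=train, free_raw_data=False)
```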
@@ -26,9 +26,9 @@ LightGBM uses [leaf-wise](https://github.com/Microsoft/LightGBM/wiki/Features#op
## For better accuracy

* Use large ```max_bin``` (may be slower)
* Use small ```learning_rate``` with large ```num_iterations```
* Use large ```num_leaves``` (may cause over-fitting)
* Use bigger training data
* Try ```dart```
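Purely as an illustration (the values below are arbitrary placeholders, not tuned recommendations), these suggestions translate into a params dict like:

```python
params = {
    'max_bin': 511,           # large max_bin (may be slower)
    'learning_rate': 0.01,    # small learning_rate ...
    'num_iterations': 5000,   # ... combined with large num_iterations
    'num_leaves': 255,        # large num_leaves (may cause over-fitting)
    'boosting': 'dart',
}
```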
...
@@ -16,18 +16,20 @@ The parameter format is ```key1=value1 key2=value2 ... ``` . And parameters can
* ```task```, default=```train```, type=enum, options=```train```,```prediction```
  * ```train``` for training
  * ```prediction``` for prediction.
* ```application```, default=```regression```, type=enum, options=```regression```,```regression_l1```,```huber```,```fair```,```poisson```,```binary```,```lambdarank```,```multiclass```, alias=```objective```,```app```
  * ```regression```, regression application
  * ```regression_l2```, L2 loss, alias=```mean_squared_error```,```mse```
  * ```regression_l1```, L1 loss, alias=```mean_absolute_error```,```mae```
  * ```huber```, [Huber loss](https://en.wikipedia.org/wiki/Huber_loss "Huber loss - Wikipedia")
  * ```fair```, [Fair loss](http://research.microsoft.com/en-us/um/people/zhang/INRIA/Publis/Tutorial-Estim/node24.html)
  * ```poisson```, [Poisson regression](https://en.wikipedia.org/wiki/Poisson_regression "Poisson regression")
  * ```binary```, binary classification application
  * ```lambdarank```, lambdarank application
  * ```multiclass```, multi-class classification application, should set ```num_class``` as well
* ```boosting```, default=```gbdt```, type=enum, options=```gbdt```,```dart```,```goss```, alias=```boost```,```boosting_type```
  * ```gbdt```, traditional Gradient Boosting Decision Tree
  * ```dart```, [Dropouts meet Multiple Additive Regression Trees](https://arxiv.org/abs/1505.01866)
  * ```goss```, Gradient-based One-Side Sampling
* ```data```, default=```""```, type=string, alias=```train```,```train_data```
  * training data, LightGBM will train from this data
* ```valid```, default=```""```, type=multi-string, alias=```test```,```valid_data```,```test_data```
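For instance (a sketch only; the file names are placeholders), a binary classification run of the CLI version could combine the core parameters above as:

```
./lightgbm task=train application=binary boosting=gbdt data=train.txt valid=valid.txt
```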
@@ -94,6 +96,10 @@ The parameter format is ```key1=value1 key2=value2 ... ``` . And parameters can
  * only used in ```dart```, set to true if you want to use the xgboost dart mode
* ```drop_seed```, default=```4```, type=int
  * only used in ```dart```, random seed used to choose the dropped models.
* ```top_rate```, default=```0.2```, type=double
  * only used in ```goss```, the retain ratio of large gradient data
* ```other_rate```, default=```0.1```, type=double
  * only used in ```goss```, the retain ratio of small gradient data
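As a sketch, GOSS can be enabled in the same ```key=value``` format (the values shown are just the defaults; `train.txt` is a placeholder):

```
./lightgbm task=train data=train.txt boosting=goss top_rate=0.2 other_rate=0.1
```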
## IO parameters
@@ -173,13 +179,15 @@ The parameter format is ```key1=value1 key2=value2 ... ``` . And parameters can
  * parameter for [Huber loss](https://en.wikipedia.org/wiki/Huber_loss "Huber loss - Wikipedia"). Will be used in regression task.
* ```fair_c```, default=```1.0```, type=double
  * parameter for [Fair loss](http://research.microsoft.com/en-us/um/people/zhang/INRIA/Publis/Tutorial-Estim/node24.html). Will be used in regression task.
* ```poisson_max_delta_step```, default=```0.7```, type=double
  * parameter used to safeguard optimization in Poisson regression
* ```scale_pos_weight```, default=```1.0```, type=double
  * weight of the positive class in binary classification task
* ```is_unbalance```, default=```false```, type=bool
  * used in binary classification. Set this to ```true``` if training data are unbalanced.
* ```max_position```, default=```20```, type=int
  * used in lambdarank, will optimize NDCG at this position.
* ```label_gain```, default=```0,1,3,7,15,31,63,...```, type=multi-double
  * used in lambdarank, relevant gain for labels. For example, the gain of label ```2``` is ```3``` if using default label gains.
  * separated by ```,```
* ```num_class```, default=```1```, type=int, alias=```num_classes```
@@ -192,7 +200,9 @@ The parameter format is ```key1=value1 key2=value2 ... ``` . And parameters can
  * ```l2```, square loss, alias=```mean_squared_error```, ```mse```
  * ```huber```, [Huber loss](https://en.wikipedia.org/wiki/Huber_loss "Huber loss - Wikipedia")
  * ```fair```, [Fair loss](http://research.microsoft.com/en-us/um/people/zhang/INRIA/Publis/Tutorial-Estim/node24.html)
  * ```poisson```, [Poisson regression](https://en.wikipedia.org/wiki/Poisson_regression "Poisson regression")
  * ```ndcg```, [NDCG](https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG)
  * ```map```, [MAP](https://www.kaggle.com/wiki/MeanAveragePrecision)
  * ```auc```, [AUC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve)
  * ```binary_logloss```, [log loss](https://www.kaggle.com/wiki/LogarithmicLoss)
  * ```binary_error```. For one sample: ```0``` for correct classification, ```1``` for error classification.
@@ -203,7 +213,7 @@ The parameter format is ```key1=value1 key2=value2 ... ``` . And parameters can
  * frequency for metric output
* ```is_training_metric```, default=```false```, type=bool
  * set this to true if you need to output metric results on training data
* ```ndcg_at```, default=```1,2,3,4,5```, type=multi-int, alias=```ndcg_eval_at```,```eval_at```
  * NDCG evaluation positions, separated by ```,```
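For example (a sketch; `rank.train` is a placeholder), a lambdarank run could watch NDCG at several positions:

```
./lightgbm task=train data=rank.train application=lambdarank metric=ndcg ndcg_at=1,3,5
```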
## Network parameters
...
@@ -5,8 +5,8 @@
- [Booster](Python-API.md#booster)
* [Training API](Python-API.md#training-api)
  - [train](Python-API.md#trainparams-train_set-num_boost_round100-valid_setsnone-valid_namesnone-fobjnone-fevalnone-init_modelnone-feature_nameauto-categorical_featureauto-early_stopping_roundsnone-evals_resultnone-verbose_evaltrue-learning_ratesnone-callbacksnone)
  - [cv](Python-API.md#cvparams-train_set-num_boost_round10-data_splitternone-nfold5-stratifiedfalse-shuffletrue-metricsnone-fobjnone-fevalnone-init_modelnone-feature_nameauto-categorical_featureauto-early_stopping_roundsnone-fpreprocnone-verbose_evalnone-show_stdvtrue-seed0-callbacksnone)
* [Scikit-learn API](Python-API.md#scikit-learn-api)
  - [Common Methods](Python-API.md#common-methods)
@@ -23,6 +23,8 @@
  + [record_evaluation](Python-API.md#record_evaluationeval_result)
  + [early_stopping](Python-API.md#early_stoppingstopping_rounds-verbosetrue)
* [Plotting](Python-API.md#plotting)
The methods of each class are in alphabetical order.

----
@@ -31,7 +33,7 @@ The methods of each Class is in alphabetical order.
###Dataset

####__init__(data, label=None, max_bin=255, reference=None, weight=None, group=None, silent=False, feature_name='auto', categorical_feature='auto', params=None, free_raw_data=True)

Parameters
----------
@@ -50,12 +52,14 @@ The methods of each Class is in alphabetical order.
    Group/query size for dataset
silent : boolean, optional
    Whether to print messages during construction
feature_name : list of str, or 'auto'
    Feature names
    If 'auto' and data is pandas DataFrame, use data columns name
categorical_feature : list of str or int, or 'auto'
    Categorical features,
    type int represents index,
    type str represents feature names (need to specify feature_name as well)
    If 'auto' and data is pandas DataFrame, use pandas categorical columns
params : dict, optional
    Other parameters
free_raw_data : Bool
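A short construction sketch tying these parameters together (`df` is a hypothetical pandas DataFrame with a label column `'y'`):

```python
import lightgbm as lgb
import pandas as pd

df = pd.read_csv('train.csv')  # placeholder file
# with a pandas DataFrame, 'auto' infers feature names from the columns
# and categorical features from pandas categorical dtypes
train_data = lgb.Dataset(df.drop('y', axis=1), label=df['y'],
                         feature_name='auto', categorical_feature='auto',
                         free_raw_data=False)
```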
@@ -341,14 +345,31 @@ The methods of each Class is in alphabetical order.
    Evaluation result list.
####feature_name()
Get feature names.
Returns
-------
result : array
    Array of feature names.
####feature_importance(importance_type="split")

Get feature importances.

Parameters
----------
importance_type : str, default "split"
    How the importance is calculated: "split" or "gain"
    "split" is the number of times a feature is used in a model
    "gain" is the total gain of splits which use the feature

Returns
-------
result : array
    Array of feature importances.
####predict(data, num_iteration=-1, raw_score=False, pred_leaf=False, data_has_header=False, is_reshape=True)

@@ -445,7 +466,7 @@ The methods of each Class is in alphabetical order.

##Training API

####train(params, train_set, num_boost_round=100, valid_sets=None, valid_names=None, fobj=None, feval=None, init_model=None, feature_name='auto', categorical_feature='auto', early_stopping_rounds=None, evals_result=None, verbose_eval=True, learning_rates=None, callbacks=None)

Train with given parameters.
@@ -468,12 +489,14 @@ The methods of each Class is in alphabetical order.
    Note: should return (eval_name, eval_result, is_higher_better) or a list of these
init_model : file name of lightgbm model or 'Booster' instance
    model used for continued training
feature_name : list of str, or 'auto'
    Feature names
    If 'auto' and data is pandas DataFrame, use data columns name
categorical_feature : list of str or int, or 'auto'
    Categorical features,
    type int represents index,
    type str represents feature names (need to specify feature_name as well)
    If 'auto' and data is pandas DataFrame, use pandas categorical columns
early_stopping_rounds: int
    Activates early stopping.
    Requires at least one validation data and one metric
@@ -513,7 +536,7 @@ The methods of each Class is in alphabetical order.

booster : a trained booster model
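Putting the main arguments together (a sketch; `params`, `lgb_train` and `lgb_valid` are assumed to exist already):

```python
import lightgbm as lgb

evals_result = {}  # filled with the per-iteration metric values
bst = lgb.train(params,
                lgb_train,
                num_boost_round=100,
                valid_sets=[lgb_train, lgb_valid],
                valid_names=['train', 'valid'],
                early_stopping_rounds=10,
                evals_result=evals_result,
                verbose_eval=10)
```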
####cv(params, train_set, num_boost_round=10, data_splitter=None, nfold=5, stratified=False, shuffle=True, metrics=None, fobj=None, feval=None, init_model=None, feature_name='auto', categorical_feature='auto', early_stopping_rounds=None, fpreproc=None, verbose_eval=None, show_stdv=True, seed=0, callbacks=None)

Cross-validation with given parameters.

@@ -525,14 +548,14 @@ The methods of each Class is in alphabetical order.

    Data to be trained.
num_boost_round : int
    Number of boosting iterations.
data_splitter : an instance with split(X) method
    Instance with split(X) method, used to generate the CV folds.
nfold : int
    Number of folds in CV.
stratified : bool
    Perform stratified sampling.
shuffle: bool
    Whether to shuffle before splitting data.
metrics : str or list of str
    Evaluation metrics to be watched in CV.
fobj : function

@@ -541,11 +564,14 @@ The methods of each Class is in alphabetical order.

    Custom evaluation function.
init_model : file name of lightgbm model or 'Booster' instance
    model used for continued training
feature_name : list of str, or 'auto'
    Feature names
    If 'auto' and data is pandas DataFrame, use data columns name
categorical_feature : list of str or int, or 'auto'
    Categorical features,
    type int represents index,
    type str represents feature names (need to specify feature_name as well)
    If 'auto' and data is pandas DataFrame, use pandas categorical columns
early_stopping_rounds: int
    Activates early stopping. CV error needs to decrease at least
    every <early_stopping_rounds> round(s) to continue.
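A minimal cv sketch under the same assumptions (`params` and `lgb_train` already exist):

```python
import lightgbm as lgb

# returns a dict of per-round results, e.g. 'l2-mean' and 'l2-stdv' lists
cv_results = lgb.cv(params,
                    lgb_train,
                    num_boost_round=100,
                    nfold=5,
                    metrics='l2',
                    early_stopping_rounds=10,
                    seed=0)
```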
@@ -576,7 +602,7 @@ The methods of each Class is in alphabetical order.
###Common Methods

####__init__(boosting_type="gbdt", num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10, max_bin=255, subsample_for_bin=50000, objective="regression", min_split_gain=0, min_child_weight=5, min_child_samples=10, subsample=1, subsample_freq=1, colsample_bytree=1, reg_alpha=0, reg_lambda=0, scale_pos_weight=1, is_unbalance=False, seed=0, nthread=-1, silent=True, sigmoid=1.0, huber_delta=1.0, gaussian_eta=1.0, fair_c=1.0, poisson_max_delta_step=0.7, max_position=20, label_gain=None, drop_rate=0.1, skip_drop=0.5, max_drop=50, uniform_drop=False, xgboost_dart_mode=False)

Implementation of the Scikit-Learn API for LightGBM.
@@ -636,6 +662,8 @@ The methods of each Class is in alphabetical order.
    It is used to control the width of the Gaussian function used to approximate the hessian.
fair_c : float
    Only used in regression. Parameter for Fair loss function.
poisson_max_delta_step : float
    Parameter used to safeguard optimization in Poisson regression.
max_position : int
    Only used in lambdarank, will optimize NDCG at this position.
label_gain : list of float
@@ -693,7 +721,7 @@ The methods of each Class is in alphabetical order.
X_leaves : array_like, shape=[n_samples, n_trees]

####fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric=None, early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)

Fit the gradient boosting model.
@@ -720,16 +748,19 @@ The methods of each Class is in alphabetical order.
eval_metric : str, list of str, callable, optional
    If a str, should be a built-in evaluation metric to use.
    If callable, a custom evaluation metric, see note for more details.
    default: logloss for LGBMClassifier, l2 for LGBMRegressor, ndcg for LGBMRanker
    Can directly use 'logloss' or 'error' for LGBMClassifier.
early_stopping_rounds : int
verbose : bool
    If `verbose` and an evaluation set is used, writes the evaluation
feature_name : list of str, or 'auto'
    Feature names
    If 'auto' and data is pandas DataFrame, use data columns name
categorical_feature : list of str or int, or 'auto'
    Categorical features,
    type int represents index,
    type str represents feature names (need to specify feature_name as well)
    If 'auto' and data is pandas DataFrame, use pandas categorical columns
callbacks : list of callback functions
    List of callback functions that are applied at each iteration.
    See Callbacks in Python-API.md for more information.
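A sketch of the scikit-learn interface (hypothetical numpy arrays `X_train`, `y_train`, `X_valid`, `y_valid`):

```python
from lightgbm import LGBMClassifier

clf = LGBMClassifier(num_leaves=31, learning_rate=0.1, n_estimators=100)
clf.fit(X_train, y_train,
        eval_set=[(X_valid, y_valid)],
        eval_metric='logloss',
        early_stopping_rounds=10)
pred = clf.predict(X_valid)
```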
@@ -787,7 +818,7 @@ The methods of each Class is in alphabetical order.
Get the evaluation results.

####feature_importances_

Get normalized feature importances.
@@ -823,7 +854,7 @@ The methods of each Class is in alphabetical order.
###LGBMRanker

####fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric='ndcg', eval_at=1, early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)

Most arguments are the same as in Common Methods except:
@@ -907,3 +938,110 @@ The methods of each Class is in alphabetical order.

-------
callback : function
    The requested callback function.
##Plotting
####plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None, title='Feature importance', xlabel='Feature importance', ylabel='Features', importance_type='split', max_num_features=None, ignore_zero=True, figsize=None, grid=True, **kwargs):
Plot model feature importances.
Parameters
----------
booster : Booster or LGBMModel
    Booster or LGBMModel instance.
ax : matplotlib Axes
    Target axes instance. If None, new figure and axes will be created.
height : float
    Bar height, passed to ax.barh().
xlim : tuple of 2 elements
    Tuple passed to axes.xlim().
ylim : tuple of 2 elements
    Tuple passed to axes.ylim().
title : str
    Axes title. Pass None to disable.
xlabel : str
    X axis title label. Pass None to disable.
ylabel : str
    Y axis title label. Pass None to disable.
importance_type : str
    How the importance is calculated: "split" or "gain".
    "split" is the number of times a feature is used in a model.
    "gain" is the total gain of splits which use the feature.
max_num_features : int
    Max number of top features displayed on the plot.
    If None or smaller than 1, all features will be displayed.
ignore_zero : bool
    Ignore features with zero importance.
figsize : tuple of 2 elements
    Figure size.
grid : bool
    Whether to add a grid to the axes.
**kwargs :
    Other keywords passed to ax.barh().

Returns
-------
ax : matplotlib Axes
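Usage sketch (assuming `gbm` is a trained Booster or LGBMModel and matplotlib is installed):

```python
import matplotlib.pyplot as plt
import lightgbm as lgb

ax = lgb.plot_importance(gbm, max_num_features=10, importance_type='split')
plt.show()
```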
####plot_metric(booster, metric=None, dataset_names=None, ax=None, xlim=None, ylim=None, title='Metric during training', xlabel='Iterations', ylabel='auto', figsize=None, grid=True):
Plot one metric during training.
Parameters
----------
booster : dict or LGBMModel
    Evals_result recorded by lightgbm.train() or LGBMModel instance
metric : str or None
    The metric name to plot.
    Only one metric supported because different metrics have various scales.
    Pass None to pick the `first` one (according to dict hashcode).
dataset_names : None or list of str
    List of the dataset names to plot.
    Pass None to plot all datasets.
ax : matplotlib Axes
    Target axes instance. If None, new figure and axes will be created.
xlim : tuple of 2 elements
    Tuple passed to axes.xlim().
ylim : tuple of 2 elements
    Tuple passed to axes.ylim().
title : str
    Axes title. Pass None to disable.
xlabel : str
    X axis title label. Pass None to disable.
ylabel : str
    Y axis title label. Pass None to disable. Pass 'auto' to use `metric`.
figsize : tuple of 2 elements
    Figure size.
grid : bool
    Whether to add a grid to the axes.

Returns
-------
ax : matplotlib Axes
####plot_tree(booster, ax=None, tree_index=0, figsize=None, graph_attr=None, node_attr=None, edge_attr=None, show_info=None):
Plot specified tree.
Parameters
----------
booster : Booster or LGBMModel
    Booster or LGBMModel instance.
ax : matplotlib Axes
    Target axes instance. If None, new figure and axes will be created.
tree_index : int, default 0
    Specify the tree index of the target tree.
figsize : tuple of 2 elements
    Figure size.
graph_attr : dict
    Mapping of (attribute, value) pairs for the graph.
node_attr : dict
    Mapping of (attribute, value) pairs set for all nodes.
edge_attr : dict
    Mapping of (attribute, value) pairs set for all edges.
show_info : list
    Information to show on nodes.
    options: 'split_gain', 'internal_value', 'internal_count' or 'leaf_count'.

Returns
-------
ax : matplotlib Axes
@@ -10,6 +10,11 @@ This document gives a basic walkthrough of LightGBM python package.

Install
-------

* Install the library first, follow the wiki [here](./Installation-Guide.md).
* Install the python-package dependencies: `setuptools`, `numpy` and `scipy` are required, and `scikit-learn` is required for the sklearn interface (and recommended). Run:
```
pip install setuptools numpy scipy scikit-learn -U
```
* In the `python-package` directory, run
```
python setup.py install
@@ -73,13 +78,13 @@ LightGBM can use categorical features as input directly. It doesn't need to cove
#### Weights can be set when needed:
```python
w = np.random.rand(500, )
train_data = lgb.Dataset(data, label=label, weight=w)
```
or
```python
train_data = lgb.Dataset(data, label=label)
w = np.random.rand(500, )
train_data.set_weight(w)
```
...
@@ -18,7 +18,7 @@ Label is the data of first column, and there is no header in the file.

update 12/5/2016:

LightGBM can use categorical features directly (without one-hot coding). The experiment on [Expo data](http://stat-computing.org/dataexpo/2009/) shows about an 8x speed-up compared with one-hot coding.

For the setting details, please refer to [Parameters](./Parameters.md#io-parameters).
@@ -103,7 +103,7 @@ For example, following command line will keep 'num_trees=10' and ignore same par

## Examples

* [Binary Classification](../examples/binary_classification)
* [Regression](../examples/regression)
* [Lambdarank](../examples/lambdarank)
* [Parallel Learning](../examples/parallel_learning)
@@ -9,5 +9,6 @@ Documents

* [Parameters Tuning](./Parameters-tuning.md)
* [Python API Reference](./Python-API.md)
* [Parallel Learning Guide](https://github.com/Microsoft/LightGBM/wiki/Parallel-Learning-Guide)
* [FAQ](./FAQ.md)
* [Development Guide](./development.md)
@@ -6,10 +6,9 @@ Here is an example for LightGBM to use python package.

For the installation, check the wiki [here](https://github.com/Microsoft/LightGBM/wiki/Installation-Guide).

You also need scikit-learn, pandas and matplotlib (only for plot example) to run the examples, but they are not required for the package itself. You can install them with pip:
```
pip install scikit-learn pandas matplotlib -U
```

Now you can run examples in this folder, for example:
...
@@ -11,10 +11,10 @@ df_test = pd.read_csv('../binary_classification/binary.test', header=None, sep='

W_train = pd.read_csv('../binary_classification/binary.train.weight', header=None)[0]
W_test = pd.read_csv('../binary_classification/binary.test.weight', header=None)[0]

y_train = df_train[0].values
y_test = df_test[0].values
X_train = df_train.drop(0, axis=1).values
X_test = df_test.drop(0, axis=1).values

num_train, num_feature = X_train.shape
...
# coding: utf-8
# pylint: disable = invalid-name, C0111
import lightgbm as lgb
import pandas as pd
try:
import matplotlib.pyplot as plt
except ImportError:
raise ImportError('You need to install matplotlib for plot_example.py.')
# load or create your dataset
print('Load data...')
df_train = pd.read_csv('../regression/regression.train', header=None, sep='\t')
df_test = pd.read_csv('../regression/regression.test', header=None, sep='\t')
y_train = df_train[0].values
y_test = df_test[0].values
X_train = df_train.drop(0, axis=1).values
X_test = df_test.drop(0, axis=1).values
# create dataset for lightgbm
lgb_train = lgb.Dataset(X_train, y_train)
lgb_test = lgb.Dataset(X_test, y_test, reference=lgb_train)
# specify your configurations as a dict
params = {
'num_leaves': 5,
'metric': ('l1', 'l2'),
'verbose': 0
}
evals_result = {} # to record eval results for plotting
print('Start training...')
# train
gbm = lgb.train(params,
lgb_train,
num_boost_round=100,
valid_sets=[lgb_train, lgb_test],
feature_name=['f' + str(i + 1) for i in range(28)],
categorical_feature=[21],
evals_result=evals_result,
verbose_eval=10)
print('Plot metrics during training...')
ax = lgb.plot_metric(evals_result, metric='l1')
plt.show()
print('Plot feature importances...')
ax = lgb.plot_importance(gbm, max_num_features=10)
plt.show()
print('Plot 84th tree...') # one tree use categorical feature to split
ax = lgb.plot_tree(gbm, tree_index=83, figsize=(20, 8), show_info=['split_gain'])
plt.show()
@@ -10,10 +10,10 @@ print('Load data...')

df_train = pd.read_csv('../regression/regression.train', header=None, sep='\t')
df_test = pd.read_csv('../regression/regression.test', header=None, sep='\t')

y_train = df_train[0].values
y_test = df_test[0].values
X_train = df_train.drop(0, axis=1).values
X_test = df_test.drop(0, axis=1).values
# create dataset for lightgbm
lgb_train = lgb.Dataset(X_train, y_train)
@@ -58,7 +58,8 @@ model_json = gbm.dump_model()

with open('model.json', 'w+') as f:
    json.dump(model_json, f, indent=4)
print('Feature names:', gbm.feature_name())
print('Calculate feature importances...')
# feature importances
print('Feature importances:', list(gbm.feature_importance()))
# print('Feature importances:', list(gbm.feature_importance("gain")))
@@ -10,10 +10,10 @@ print('Load data...')

df_train = pd.read_csv('../regression/regression.train', header=None, sep='\t')
df_test = pd.read_csv('../regression/regression.test', header=None, sep='\t')

y_train = df_train[0].values
y_test = df_test[0].values
X_train = df_train.drop(0, axis=1).values
X_test = df_test.drop(0, axis=1).values
print('Start training...')
# train
@@ -34,7 +34,7 @@ print('The rmse of prediction is:', mean_squared_error(y_test, y_pred) ** 0.5)

print('Calculate feature importances...')
# feature importances
print('Feature importances:', list(gbm.feature_importances_))

# other scikit-learn modules
estimator = lgb.LGBMRegressor(num_leaves=31)
...
@@ -19,8 +19,8 @@ class Metric;

* \brief The main entrance of LightGBM. This application has two tasks:
* Train and Predict.
* Train task will train a new model
* Predict task will predict the scores of test data using an existing model,
* and save the scores to disk.
*/
class Application {
public:
@@ -41,7 +41,7 @@ private:

template<typename T>
T GlobalSyncUpByMin(T& local);

/*! \brief Load parameters from command line and config file*/
void LoadParameters(int argc, char** argv);

/*! \brief Load data, including training data and validation data*/
...
#ifndef LIGHTGBM_BIN_H_
#define LIGHTGBM_BIN_H_

#include <LightGBM/utils/common.h>
#include <LightGBM/meta.h>

#include <vector>
#include <functional>
#include <unordered_map>
#include <sstream>
#include <iomanip>   // std::setprecision, used by bin_info()
#include <limits>    // std::numeric_limits, used by bin_info()
namespace LightGBM {

@@ -14,16 +17,16 @@ enum BinType {

  CategoricalBin
};
/*! \brief Store data for one histogram bin */
struct HistogramBinEntry {
public:
  /*! \brief Sum of gradients on this bin */
  double sum_gradients = 0.0f;
  /*! \brief Sum of hessians on this bin */
  double sum_hessians = 0.0f;
  /*! \brief Number of data on this bin */
  data_size_t cnt = 0;
  /*!
  * \brief Sum up (reducers) functions for histogram bin
  */
@@ -56,13 +59,11 @@ public:

explicit BinMapper(const void* memory);
~BinMapper();
static double kSparseThreshold;
bool CheckAlign(const BinMapper& other) const {
  if (num_bin_ != other.num_bin_) {
    return false;
  }
  if (bin_type_ != other.bin_type_) {
    return false;
  }
  if (bin_type_ == BinType::NumericalBin) {
    for (int i = 0; i < num_bin_; ++i) {
      if (bin_upper_bound_[i] != other.bin_upper_bound_[i]) {
@@ -95,7 +96,7 @@ public:

* \param bin
* \return Feature value of this bin
*/
inline double BinToValue(uint32_t bin) const {
  if (bin_type_ == BinType::NumericalBin) {
    return bin_upper_bound_[bin];
  } else {
@@ -111,26 +112,25 @@ public:

* \param value
* \return bin for this feature value
*/
inline uint32_t ValueToBin(double value) const;

/*!
* \brief Get the default bin when value is 0
* \return default bin
*/
inline uint32_t GetDefaultBin() const {
  return default_bin_;
}
/*!
* \brief Construct feature value to bin mapper according to feature values
* \param values (Sampled) values of this feature. Note: this does not include zeros.
* \param total_sample_cnt total sample count, equal to values.size() + num_zeros
* \param max_bin The maximal number of bins
* \param min_data_in_bin min number of data in one bin
* \param min_split_data
* \param bin_type Type of this bin
*/
void FindBin(std::vector<double>& values, size_t total_sample_cnt, int max_bin, int min_data_in_bin, int min_split_data, BinType bin_type);
/*!
* \brief Use specific number of bin to calculate the size of this class
@@ -151,7 +151,25 @@ public:

*/
void CopyFrom(const char* buffer);
/*!
* \brief Get bin types
*/
inline BinType bin_type() const { return bin_type_; }
/*!
* \brief Get bin info
*/
inline std::string bin_info() const {
if (bin_type_ == BinType::CategoricalBin) {
return Common::Join(bin_2_categorical_, ":");
} else {
std::stringstream str_buf;
str_buf << std::setprecision(std::numeric_limits<double>::digits10 + 2);
str_buf << '[' << min_val_ << ':' << max_val_ << ']';
return str_buf.str();
}
}
private:
/*! \brief Number of bins */
int num_bin_;
@@ -167,6 +185,12 @@ private:

std::unordered_map<int, unsigned int> categorical_2_bin_;
/*! \brief Mapper from bin to categorical */
std::vector<int> bin_2_categorical_;
/*! \brief minimal feature value */
double min_val_;
/*! \brief maximum feature value */
double max_val_;
/*! \brief bin value of feature value 0 */
uint32_t default_bin_;
};

/*!
@@ -188,7 +212,7 @@ public:

(this logic was built for bagging logic)
* \param num_leaves Number of leaves on this iteration
*/
virtual void Init(const char* used_indices, data_size_t num_leaves) = 0;
/*!
* \brief Construct histogram by using this bin

@@ -206,9 +230,12 @@ public:
* \brief Split current bin, and perform re-order by leaf
* \param leaf Using which leaf's data to split
* \param right_leaf The new leaf index after performing this split
* \param is_in_leaf is_in_leaf[i] == mark means the i-th data will be on the left leaf after the split
* \param mark is_in_leaf[i] == mark means the i-th data will be on the left leaf after the split
*/
virtual void Split(int leaf, int right_leaf, const char* is_in_leaf, char mark) = 0;

virtual data_size_t NonZeroCount(int leaf) const = 0;
};
/*! \brief Iterator for one bin column */
@@ -220,6 +247,8 @@ public:

* \return Bin data
*/
virtual uint32_t Get(data_size_t idx) = 0;
virtual void Reset(data_size_t idx) = 0;
virtual ~BinIterator() = default;
};
/*!

@@ -240,12 +269,16 @@ public:
*/
virtual void Push(int tid, data_size_t idx, uint32_t value) = 0;

virtual void CopySubset(const Bin* full_bin, const data_size_t* used_indices, data_size_t num_used_indices) = 0;

/*!
* \brief Get bin iterator of this bin for a specific feature
* \param min_bin min_bin of current used feature
* \param max_bin max_bin of current used feature
* \param default_bin default bin if bin not in [min_bin, max_bin]
* \return Iterator of this bin
*/
virtual BinIterator* GetIterator(uint32_t min_bin, uint32_t max_bin, uint32_t default_bin) const = 0;
/*!
* \brief Save binary data to file

@@ -255,7 +288,8 @@ public:

/*!
* \brief Load from memory
* \param memory
* \param local_used_indices
*/
virtual void LoadFromMemory(const void* memory,
  const std::vector<data_size_t>& local_used_indices) = 0;
@@ -268,10 +302,12 @@ public:
/*! \brief Number of all data */
virtual data_size_t num_data() const = 0;
virtual void ReSize(data_size_t num_data) = 0;
/*!
* \brief Construct histogram of this feature,
* Note: We use ordered_gradients and ordered_hessians to improve the cache hit rate
* The naive solution is to use gradients[data_indices[i]] to get the gradient of data_indices[i],
which is not cache friendly, since the memory access is not contiguous.
* ordered_gradients and ordered_hessians are preprocessed and re-ordered by data_indices:
* ordered_gradients[i] is aligned with data_indices[i]'s gradient (same for ordered_hessians).
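A minimal sketch of that preprocessing, assuming a plain gather loop (float stands in for LightGBM's score_t; the real code may parallelize this):

void OrderGradients(const float* gradients, const float* hessians,
                    const LightGBM::data_size_t* data_indices,
                    LightGBM::data_size_t num_data,
                    float* ordered_gradients, float* ordered_hessians) {
  // One sequential pass: after this, the histogram loop reads
  // ordered_gradients[i] instead of gradients[data_indices[i]],
  // turning random access into contiguous access.
  for (LightGBM::data_size_t i = 0; i < num_data; ++i) {
    ordered_gradients[i] = gradients[data_indices[i]];
    ordered_hessians[i] = hessians[data_indices[i]];
  }
}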
@@ -288,17 +324,21 @@ public:
/*!
* \brief Split data according to threshold: if bin <= threshold, put into left (lte_indices), else put into right (gt_indices)
* \param min_bin min_bin of the currently used feature
* \param max_bin max_bin of the currently used feature
* \param default_bin default bin if bin is not in [min_bin, max_bin]
* \param threshold The split threshold.
* \param data_indices Used data indices; after this function is called, the indices of data less than or equal to the threshold will be stored in this object.
* \param num_data Number of used data
* \param lte_indices After this function is called, the less-than-or-equal data indices will be stored in this object.
* \param gt_indices After this function is called, the greater-than data indices will be stored in this object.
* \param bin_type Type of bin
* \return The number of data less than or equal to the threshold.
*/
virtual data_size_t Split(uint32_t min_bin, uint32_t max_bin,
uint32_t default_bin, uint32_t threshold,
data_size_t* data_indices, data_size_t num_data,
data_size_t* lte_indices, data_size_t* gt_indices, BinType bin_type) const = 0;
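To make the contract concrete, a naive sketch of the partition a dense implementation might perform; it ignores the min_bin/max_bin re-basing and default_bin handling for brevity and is not the library's actual (and more optimized) code:

LightGBM::data_size_t NaiveSplit(const uint32_t* bins,  // bin value per data point
                                 uint32_t threshold,
                                 const LightGBM::data_size_t* data_indices,
                                 LightGBM::data_size_t num_data,
                                 LightGBM::data_size_t* lte_indices,
                                 LightGBM::data_size_t* gt_indices) {
  LightGBM::data_size_t lte_count = 0, gt_count = 0;
  for (LightGBM::data_size_t i = 0; i < num_data; ++i) {
    const LightGBM::data_size_t idx = data_indices[i];
    if (bins[idx] <= threshold) {
      lte_indices[lte_count++] = idx;   // goes to the left child
    } else {
      gt_indices[gt_count++] = idx;     // goes to the right child
    }
  }
  return lte_count;  // the documented return value
}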
/*!
* \brief Create the ordered bin for this bin
@@ -315,44 +355,35 @@ public:
* \brief Create object for bin data of one feature, will call CreateDenseBin or CreateSparseBin according to "is_sparse"
* \param num_data Total number of data
* \param num_bin Number of bins
* \param sparse_rate Sparse rate of this bin (num_bin0/num_data)
* \param is_enable_sparse True if sparse features are enabled
* \param is_sparse Will be set to true if this bin is sparse
* \return The bin data object
*/
static Bin* CreateBin(data_size_t num_data, int num_bin,
double sparse_rate, bool is_enable_sparse, bool* is_sparse);
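A sketch of the dispatch such a factory typically performs; the 0.8 cut-off is an assumed illustration value, not necessarily the threshold LightGBM uses:

LightGBM::Bin* CreateBinSketch(LightGBM::data_size_t num_data, int num_bin,
                               double sparse_rate, bool is_enable_sparse,
                               bool* is_sparse) {
  const double kSparseThreshold = 0.8;  // assumed cut-off, for illustration only
  if (is_enable_sparse && sparse_rate >= kSparseThreshold) {
    *is_sparse = true;   // most values fall into the zero bin: store sparsely
    return LightGBM::Bin::CreateSparseBin(num_data, num_bin);
  }
  *is_sparse = false;
  return LightGBM::Bin::CreateDenseBin(num_data, num_bin);
}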
/*!
* \brief Create object for bin data of one feature, used for dense features
* \param num_data Total number of data
* \param num_bin Number of bins
* \return The bin data object
*/
static Bin* CreateDenseBin(data_size_t num_data, int num_bin);
/*!
* \brief Create object for bin data of one feature, used for sparse features
* \param num_data Total number of data
* \param num_bin Number of bins
* \return The bin data object
*/
static Bin* CreateSparseBin(data_size_t num_data, int num_bin);
};
inline uint32_t BinMapper::ValueToBin(double value) const {
if (bin_type_ == BinType::NumericalBin) {
// binary search to find bin
int l = 0;
int r = num_bin_ - 1;
while (l < r) {
......
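The diff truncates the loop body here. For completeness, a self-contained sketch of the standard upper-bound binary search this function performs, assuming a vector of bin upper boundaries (the actual member name and edge handling may differ):

#include <cstdint>
#include <vector>

inline uint32_t ValueToBinSketch(double value,
                                 const std::vector<double>& bin_upper_bound) {
  int l = 0;
  int r = static_cast<int>(bin_upper_bound.size()) - 1;
  while (l < r) {
    int m = (l + r) / 2;             // midpoint of the remaining range
    if (value <= bin_upper_bound[m]) {
      r = m;                          // value fits in bin m or an earlier one
    } else {
      l = m + 1;                      // value belongs to a later bin
    }
  }
  return static_cast<uint32_t>(l);    // l == r: the matching bin index
}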
@@ -17,7 +17,7 @@ class Metric;
/*!
* \brief The interface for Boosting
*/
class LIGHTGBM_EXPORT Boosting {
public:
/*! \brief virtual destructor */
virtual ~Boosting() {}
@@ -99,14 +99,14 @@ public:
/*!
* \brief Get size of prediction result at data_idx data
* \param data_idx 0: training data, 1: 1st validation data
* \return Length of the returned score
*/
virtual int64_t GetNumPredictAt(int data_idx) const = 0;
/*!
* \brief Get prediction result at data_idx data
* \param data_idx 0: training data, 1: 1st validation data
* \param result Used to store the prediction result; memory should be allocated before calling this function
* \param out_len Length of the returned score
*/
virtual void GetPredictAt(int data_idx, double* result, int64_t* out_len) = 0;
@@ -125,7 +125,7 @@ public:
virtual std::vector<double> Predict(const double* feature_values) const = 0;
/*!
* \brief Prediction for one record with leaf index
* \param feature_values Feature values of this record
* \return Predicted leaf index for this record
*/
@@ -143,14 +143,23 @@ public:
* \param num_used_model Number of models to save, -1 means save all
* \param is_finish Is training finished or not
* \param filename Filename to save to
* \return true if succeeded
*/
virtual bool SaveModelToFile(int num_iterations, const char* filename) const = 0;
/*!
* \brief Save model to string
* \param num_used_model Number of models to save, -1 means save all
* \return Non-empty string if succeeded
*/
virtual std::string SaveModelToString(int num_iterations) const = 0;
/*!
* \brief Restore from a serialized string
* \param model_str The string of the model
* \return true if succeeded
*/
virtual bool LoadModelFromString(const std::string& model_str) = 0;
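A minimal sketch of the round trip these new signatures enable; `booster` and `restored` stand for any concrete Boosting objects, and error handling is reduced to the documented return conventions:

bool RoundTrip(const LightGBM::Boosting* booster, LightGBM::Boosting* restored) {
  // -1 saves all iterations, mirroring SaveModelToFile's convention.
  std::string model_str = booster->SaveModelToString(-1);
  if (model_str.empty()) {
    return false;  // empty string signals failure, per the doc comment above
  }
  return restored->LoadModelFromString(model_str);
}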
/*!
* \brief Get max feature index of this model
@@ -158,6 +167,12 @@ public:
*/
virtual int MaxFeatureIdx() const = 0;
/*!
* \brief Get feature names of this model
* \return Feature names of this model
*/
virtual std::vector<std::string> FeatureNames() const = 0;
/*!
* \brief Get index of label column
* \return index of label column
@@ -192,7 +207,7 @@ public:
/*! \brief Disable copy */
Boosting(const Boosting&) = delete;
static bool LoadFileToBoosting(Boosting* boosting, const char* filename);
/*!
* \brief Create boosting object
......
@@ -5,6 +5,7 @@
#include <LightGBM/utils/log.h>
#include <LightGBM/meta.h>
#include <LightGBM/export.h>
#include <vector>
#include <string>
@@ -84,7 +85,7 @@ enum TaskType {
/*! \brief Config for input and output files */
struct IOConfig: public ConfigBase {
public:
int max_bin = 255;
int num_class = 1;
int data_random_seed = 1;
std::string data_filename = "";
@@ -99,10 +100,14 @@ public:
bool use_two_round_loading = false;
bool is_save_binary_file = false;
bool enable_load_from_binary_file = true;
int bin_construct_sample_cnt = 200000;
bool is_predict_leaf_index = false;
bool is_predict_raw_score = false;
int min_data_in_leaf = 100;
int min_data_in_bin = 5;
double max_conflict_rate = 0.0000f;
bool enable_bundle = true;
bool adjacent_bundle = false;
bool has_header = false;
/*! \brief Index or column name of the label, default is the first column,
* and add a prefix "name:" when using a column name */
@@ -123,7 +128,7 @@ public:
* and add a prefix "name:" when using a column name
* Note: when using an index, it doesn't count the label index */
std::string categorical_column = "";
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
};
/*! \brief Config for objective function */
@@ -133,8 +138,9 @@ public:
double sigmoid = 1.0f;
double huber_delta = 1.0f;
double fair_c = 1.0f;
// for approximate Hessian with Gaussian
double gaussian_eta = 1.0f;
double poisson_max_delta_step = 0.7f;
// for lambdarank
std::vector<double> label_gain;
// for lambdarank
@@ -145,7 +151,7 @@ public:
int num_class = 1;
// Balancing of positive and negative weights
double scale_pos_weight = 1.0f;
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
};
/*! \brief Config for metrics interface */
@@ -158,7 +164,7 @@ public:
double fair_c = 1.0f;
std::vector<double> label_gain;
std::vector<int> eval_at;
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
};
@@ -174,15 +180,15 @@ public:
int num_leaves = 127;
int feature_fraction_seed = 2;
double feature_fraction = 1.0f;
// max cache size (unit: MB) for the historical histogram; < 0 means no limit
double histogram_pool_size = -1.0f;
// max depth of the tree model.
// The tree still grows leaf-wise, but the max depth is limited to avoid over-fitting.
// The max number of leaves will be min(num_leaves, pow(2, max_depth)),
// e.g. num_leaves = 127 with max_depth = 6 caps the tree at min(127, 2^6) = 64 leaves.
// max_depth < 0 means no limit
int max_depth = -1;
int top_k = 20;
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
};
/*! \brief Config for Boosting */
@@ -205,9 +211,11 @@ public:
bool xgboost_dart_mode = false;
bool uniform_drop = false;
int drop_seed = 4;
double top_rate = 0.2f;
double other_rate = 0.1f;
std::string tree_learner_type = "serial";
TreeConfig tree_config;
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
private:
void GetTreeLearnerType(const std::unordered_map<std::string,
std::string>& params);
@@ -220,7 +228,7 @@ public:
int local_listen_port = 12400;
int time_out = 120; // in minutes
std::string machine_list_filename = "";
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
};
@@ -241,7 +249,7 @@ public:
std::vector<std::string> metric_types;
MetricConfig metric_config;
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
private:
void GetBoostingType(const std::unordered_map<std::string, std::string>& params);
@@ -271,7 +279,7 @@ inline bool ConfigBase::GetInt(
const std::string& name, int* out) {
if (params.count(name) > 0) {
if (!Common::AtoiAndCheck(params.at(name).c_str(), out)) {
Log::Fatal("Parameter %s should be of type int, got \"%s\"",
name.c_str(), params.at(name).c_str());
}
return true;
@@ -284,7 +292,7 @@ inline bool ConfigBase::GetDouble(
const std::string& name, double* out) {
if (params.count(name) > 0) {
if (!Common::AtofAndCheck(params.at(name).c_str(), out)) {
Log::Fatal("Parameter %s should be of type double, got \"%s\"",
name.c_str(), params.at(name).c_str());
}
return true;
@@ -303,7 +311,7 @@ inline bool ConfigBase::GetBool(
} else if (value == std::string("true") || value == std::string("+")) {
*out = true;
} else {
Log::Fatal("Parameter %s should be \"true\"/\"+\" or \"false\"/\"-\", got \"%s\"",
name.c_str(), params.at(name).c_str());
}
return true;
@@ -335,9 +343,12 @@ struct ParameterAlias {
{ "test_data", "valid_data" },
{ "test", "valid_data" },
{ "is_sparse", "is_enable_sparse" },
{ "enable_sparse", "is_enable_sparse" },
{ "pre_partition", "is_pre_partition" },
{ "training_metric", "is_training_metric" },
{ "train_metric", "is_training_metric" },
{ "ndcg_at", "ndcg_eval_at" },
{ "eval_at", "ndcg_eval_at" },
{ "min_data_per_leaf", "min_data_in_leaf" },
{ "min_data", "min_data_in_leaf" },
{ "min_child_samples", "min_data_in_leaf" },
......
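For context, a minimal sketch of how an alias table like the one above is typically applied before parsing; the function name and the pass-through behavior for unknown keys are illustrative assumptions:

#include <string>
#include <unordered_map>

std::unordered_map<std::string, std::string> NormalizeKeys(
    const std::unordered_map<std::string, std::string>& params,
    const std::unordered_map<std::string, std::string>& alias) {
  std::unordered_map<std::string, std::string> out;
  for (const auto& kv : params) {
    auto it = alias.find(kv.first);
    const std::string& key = (it != alias.end()) ? it->second : kv.first;
    out[key] = kv.second;  // e.g. "min_data" is rewritten to "min_data_in_leaf"
  }
  return out;
}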