Commit eade219e authored by Qiwei Ye

merge conflict

parents f23e6083 060bd316
@@ -12,14 +12,15 @@ LightGBM is a gradient boosting framework that uses tree based learning algorith

For more details, please refer to [Features](https://github.com/Microsoft/LightGBM/wiki/Features).

[Experiments](https://github.com/Microsoft/LightGBM/wiki/Experiments#comparison-experiment) on public datasets show that LightGBM can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. What's more, the [experiments](https://github.com/Microsoft/LightGBM/wiki/Experiments#parallel-experiment) show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.

News
----
02/20/2017 : Update to LightGBM v2.
01/08/2017 : Released the [**R-package**](./R-package) beta version; you are welcome to try it and provide feedback.

12/05/2016 : **Categorical Features as input directly** (without one-hot coding). Experiments on [Expo data](http://stat-computing.org/dataexpo/2009/) show about an 8x speed-up with the same accuracy compared with one-hot coding.
For the setting details, please refer to [IO Parameters](./docs/Parameters.md#io-parameters).
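A minimal sketch of this feature from the python-package (`X_train`/`y_train` are hypothetical arrays, with column 0 holding integer-encoded categories):

```python
import lightgbm as lgb

# column 0 is used as a categorical feature directly, no one-hot coding needed
train_data = lgb.Dataset(X_train, label=y_train, categorical_feature=[0])
```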
12/02/2016 : Released the [**python-package**](./python-package) beta version; you are welcome to try it and provide feedback.
@@ -43,7 +44,7 @@ LightGBM has been developed and used by many active community members. Your help
- Check out [call for contributions](https://github.com/Microsoft/LightGBM/issues?q=is%3Aissue+is%3Aopen+label%3Acall-for-contribution) to see what can be improved, or open an issue if you want something.
- Contribute to the [tests](https://github.com/Microsoft/LightGBM/tree/master/tests) to make LightGBM more reliable.
- Contribute to the [documents](https://github.com/Microsoft/LightGBM/tree/master/docs) to make them clearer for everyone.
- Contribute to the [examples](https://github.com/Microsoft/LightGBM/tree/master/examples) to share your experience with other users.
- Check out [Development Guide](./docs/development.md).
- Open an issue if you meet problems during development.
...
# Using LightGBM via Docker
This directory contains `Dockerfile` to make it easy to build and run LightGBM via [Docker](http://www.docker.com/).
## Installing Docker
Follow the general installation instructions
[on the Docker site](https://docs.docker.com/installation/):
* [OSX](https://docs.docker.com/installation/mac/): [docker toolbox](https://www.docker.com/toolbox)
* [Ubuntu](https://docs.docker.com/installation/ubuntulinux/)
## Running the container
For Python users, build the container:
$ docker build -t lightgbm -f dockerfile-python .
After the build finishes, run the container:
$ docker run --rm -it lightgbm
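To work with files from the host inside the container, you can mount a directory with Docker's standard `-v` flag (`/path/to/data` below is a placeholder):

    $ docker run --rm -it -v /path/to/data:/data lightgbm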
FROM ubuntu:16.04
RUN apt-get update && \
apt-get install -y cmake build-essential gcc g++ git wget && \
# open-mpi
cd /usr/local/src && mkdir openmpi && cd openmpi && \
wget https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.1.tar.gz && \
tar -xzf openmpi-2.0.1.tar.gz && cd openmpi-2.0.1 && \
./configure --prefix=/usr/local/openmpi && make && make install && \
export PATH="/usr/local/openmpi/bin:$PATH" && \
# lightgbm
cd /usr/local/src && mkdir lightgbm && cd lightgbm && \
git clone --recursive https://github.com/Microsoft/LightGBM && \
cd LightGBM && mkdir build && cd build && cmake -DUSE_MPI=ON .. && make && \
# python-package
# miniconda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
/bin/bash Miniconda3-latest-Linux-x86_64.sh -f -b -p /opt/conda && \
export PATH="/opt/conda/bin:$PATH" && \
# lightgbm
conda install -y numpy scipy scikit-learn pandas && \
cd ../python-package && python setup.py install && \
# clean
apt-get autoremove -y && apt-get clean && \
conda clean -i -l -t -y && \
rm -rf /usr/local/src/*
ENV PATH /opt/conda/bin:$PATH
@@ -20,11 +20,11 @@ LightGBM FAQ
- **Solution 1**: this error should be fixed in the latest version. If you still encounter it, try removing the `lightgbm.egg-info` folder in your python-package directory and reinstalling, or check [this thread on Stack Overflow](http://stackoverflow.com/questions/18085571/pip-install-error-setup-script-specifies-an-absolute-path).
- **Question 2**: I see error messages like `Cannot get/set label/weight/init_score/group/num_data/num_feature before construct dataset`, even though I have already constructed a dataset with code like `train = lightgbm.Dataset(X_train, y_train)`, or error messages like `Cannot set predictor/reference/categorical feature after freed raw data, set free_raw_data=False when construct Dataset to avoid this.`.
- **Solution 2**: Because LightGBM constructs bin mappers to build trees, and the train and valid Datasets within one Booster share the same bin mappers, categorical features, feature names, etc., the Dataset objects are constructed when constructing a Booster. If you set free_raw_data=True (the default), the raw data (with the python data struct) will be freed. So, if you want to:
  + get label (or weight/init_score/group) before constructing a dataset, it is the same as getting `self.label`
  + set label (or weight/init_score/group) before constructing a dataset, it is the same as `self.label = some_label_array`
  + get num_data (or num_feature) before constructing a dataset, you can get the data with `self.data`; then, if your data is a `numpy.ndarray`, use code like `self.data.shape`
  + set predictor (or reference/categorical feature) after constructing a dataset, you should set free_raw_data=False or init a Dataset object with the same raw data (see the sketch below)
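A minimal sketch of the last point (hypothetical `X_train`/`y_train`/`X_valid`/`y_valid` arrays; the key part is `free_raw_data=False`):

```python
import lightgbm as lgb

# keep the raw python data alive so predictor/reference/categorical features
# can still be set after the Dataset objects are constructed
train = lgb.Dataset(X_train, y_train, free_raw_data=False)
valid = lgb.Dataset(X_valid, y_valid, reference=train, free_raw_data=False)
```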
@@ -26,9 +26,9 @@ LightGBM uses [leaf-wise](https://github.com/Microsoft/LightGBM/wiki/Features#op
## For better accuracy

* Use large ```max_bin``` (may be slower)
* Use small ```learning_rate``` with large ```num_iterations```
* Use large ```num_leaves``` (may cause over-fitting)
* Use bigger training data
* Try ```dart```
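Purely as an illustration (the values below are arbitrary placeholders, not tuned recommendations), these suggestions translate into a params dict like:

```python
params = {
    'max_bin': 511,           # large max_bin (may be slower)
    'learning_rate': 0.01,    # small learning_rate ...
    'num_iterations': 5000,   # ... combined with large num_iterations
    'num_leaves': 255,        # large num_leaves (may cause over-fitting)
    'boosting': 'dart',
}
```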
...
@@ -16,18 +16,20 @@ The parameter format is ```key1=value1 key2=value2 ... ``` . And parameters can
* ```task```, default=```train```, type=enum, options=```train```,```prediction```
  * ```train``` for training
  * ```prediction``` for prediction.
* ```application```, default=```regression```, type=enum, options=```regression```,```regression_l1```,```huber```,```fair```,```poisson```,```binary```,```lambdarank```,```multiclass```, alias=```objective```,```app```
  * ```regression```, regression application
  * ```regression_l2```, L2 loss, alias=```mean_squared_error```,```mse```
  * ```regression_l1```, L1 loss, alias=```mean_absolute_error```,```mae```
  * ```huber```, [Huber loss](https://en.wikipedia.org/wiki/Huber_loss "Huber loss - Wikipedia")
  * ```fair```, [Fair loss](http://research.microsoft.com/en-us/um/people/zhang/INRIA/Publis/Tutorial-Estim/node24.html)
  * ```poisson```, [Poisson regression](https://en.wikipedia.org/wiki/Poisson_regression "Poisson regression")
  * ```binary```, binary classification application
  * ```lambdarank```, lambdarank application
  * ```multiclass```, multi-class classification application, should set ```num_class``` as well
* ```boosting```, default=```gbdt```, type=enum, options=```gbdt```,```dart```,```goss```, alias=```boost```,```boosting_type```
  * ```gbdt```, traditional Gradient Boosting Decision Tree
  * ```dart```, [Dropouts meet Multiple Additive Regression Trees](https://arxiv.org/abs/1505.01866)
  * ```goss```, Gradient-based One-Side Sampling
* ```data```, default=```""```, type=string, alias=```train```,```train_data```
  * training data, LightGBM will train from this data
* ```valid```, default=```""```, type=multi-string, alias=```test```,```valid_data```,```test_data```
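For instance (a sketch only; the file names are placeholders), a binary classification run of the CLI version could combine the core parameters above as:

```
./lightgbm task=train application=binary boosting=gbdt data=train.txt valid=valid.txt
```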
@@ -94,6 +96,10 @@ The parameter format is ```key1=value1 key2=value2 ... ``` . And parameters can
  * only used in ```dart```, set to true if you want to use the xgboost dart mode
* ```drop_seed```, default=```4```, type=int
  * only used in ```dart```, random seed used to choose the dropped models.
* ```top_rate```, default=```0.2```, type=double
  * only used in ```goss```, the retain ratio of large gradient data
* ```other_rate```, default=```0.1```, type=double
  * only used in ```goss```, the retain ratio of small gradient data
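As a sketch, GOSS can be enabled in the same ```key=value``` format (the values shown are just the defaults; `train.txt` is a placeholder):

```
./lightgbm task=train data=train.txt boosting=goss top_rate=0.2 other_rate=0.1
```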
## IO parameters
@@ -173,13 +179,15 @@ The parameter format is ```key1=value1 key2=value2 ... ``` . And parameters can
  * parameter for [Huber loss](https://en.wikipedia.org/wiki/Huber_loss "Huber loss - Wikipedia"). Will be used in regression task.
* ```fair_c```, default=```1.0```, type=double
  * parameter for [Fair loss](http://research.microsoft.com/en-us/um/people/zhang/INRIA/Publis/Tutorial-Estim/node24.html). Will be used in regression task.
* ```poisson_max_delta_step```, default=```0.7```, type=double
  * parameter used to safeguard optimization in Poisson regression
* ```scale_pos_weight```, default=```1.0```, type=double
  * weight of the positive class in binary classification task
* ```is_unbalance```, default=```false```, type=bool
  * used in binary classification. Set this to ```true``` if training data are unbalanced.
* ```max_position```, default=```20```, type=int
  * used in lambdarank, will optimize NDCG at this position.
* ```label_gain```, default=```0,1,3,7,15,31,63,...```, type=multi-double
  * used in lambdarank, relevant gain for labels. For example, the gain of label ```2``` is ```3``` if using default label gains.
  * separated by ```,```
* ```num_class```, default=```1```, type=int, alias=```num_classes```
@@ -192,7 +200,9 @@ The parameter format is ```key1=value1 key2=value2 ... ``` . And parameters can
  * ```l2```, square loss, alias=```mean_squared_error```, ```mse```
  * ```huber```, [Huber loss](https://en.wikipedia.org/wiki/Huber_loss "Huber loss - Wikipedia")
  * ```fair```, [Fair loss](http://research.microsoft.com/en-us/um/people/zhang/INRIA/Publis/Tutorial-Estim/node24.html)
  * ```poisson```, [Poisson regression](https://en.wikipedia.org/wiki/Poisson_regression "Poisson regression")
  * ```ndcg```, [NDCG](https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG)
  * ```map```, [MAP](https://www.kaggle.com/wiki/MeanAveragePrecision)
  * ```auc```, [AUC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve)
  * ```binary_logloss```, [log loss](https://www.kaggle.com/wiki/LogarithmicLoss)
  * ```binary_error```. For one sample: ```0``` for correct classification, ```1``` for error classification.
@@ -203,7 +213,7 @@ The parameter format is ```key1=value1 key2=value2 ... ``` . And parameters can
  * frequency for metric output
* ```is_training_metric```, default=```false```, type=bool
  * set this to true if you need to output metric results on training data
* ```ndcg_at```, default=```1,2,3,4,5```, type=multi-int, alias=```ndcg_eval_at```,```eval_at```
  * NDCG evaluation positions, separated by ```,```
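For example (a sketch; `rank.train` is a placeholder), a lambdarank run could watch NDCG at several positions:

```
./lightgbm task=train data=rank.train application=lambdarank metric=ndcg ndcg_at=1,3,5
```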
## Network parameters
...
@@ -5,8 +5,8 @@
- [Booster](Python-API.md#booster)
* [Training API](Python-API.md#training-api)
  - [train](Python-API.md#trainparams-train_set-num_boost_round100-valid_setsnone-valid_namesnone-fobjnone-fevalnone-init_modelnone-feature_nameauto-categorical_featureauto-early_stopping_roundsnone-evals_resultnone-verbose_evaltrue-learning_ratesnone-callbacksnone)
  - [cv](Python-API.md#cvparams-train_set-num_boost_round10-data_splitternone-nfold5-stratifiedfalse-shuffletrue-metricsnone-fobjnone-fevalnone-init_modelnone-feature_nameauto-categorical_featureauto-early_stopping_roundsnone-fpreprocnone-verbose_evalnone-show_stdvtrue-seed0-callbacksnone)
* [Scikit-learn API](Python-API.md#scikit-learn-api)
  - [Common Methods](Python-API.md#common-methods)
@@ -23,6 +23,8 @@
  + [record_evaluation](Python-API.md#record_evaluationeval_result)
  + [early_stopping](Python-API.md#early_stoppingstopping_rounds-verbosetrue)
* [Plotting](Python-API.md#plotting)
The methods of each class are in alphabetical order.

----
@@ -31,7 +33,7 @@ The methods of each Class is in alphabetical order.
###Dataset

####__init__(data, label=None, max_bin=255, reference=None, weight=None, group=None, silent=False, feature_name='auto', categorical_feature='auto', params=None, free_raw_data=True)

Parameters
----------
@@ -50,12 +52,14 @@ The methods of each Class is in alphabetical order.
    Group/query size for dataset
silent : boolean, optional
    Whether to print messages during construction
feature_name : list of str, or 'auto'
    Feature names
    If 'auto' and data is pandas DataFrame, use data columns name
categorical_feature : list of str or int, or 'auto'
    Categorical features,
    type int represents index,
    type str represents feature names (need to specify feature_name as well)
    If 'auto' and data is pandas DataFrame, use pandas categorical columns
params : dict, optional
    Other parameters
free_raw_data : Bool
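A short construction sketch tying these parameters together (`df` is a hypothetical pandas DataFrame with a label column `'y'`):

```python
import lightgbm as lgb
import pandas as pd

df = pd.read_csv('train.csv')  # placeholder file
# with a pandas DataFrame, 'auto' infers feature names from the columns
# and categorical features from pandas categorical dtypes
train_data = lgb.Dataset(df.drop('y', axis=1), label=df['y'],
                         feature_name='auto', categorical_feature='auto',
                         free_raw_data=False)
```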
@@ -341,14 +345,31 @@ The methods of each Class is in alphabetical order.
    Evaluation result list.
####feature_name()
Get feature names.
Returns
-------
result : array
    Array of feature names.
####feature_importance(importance_type="split")

Get feature importances.

Parameters
----------
importance_type : str, default "split"
    How the importance is calculated: "split" or "gain"
    "split" is the number of times a feature is used in a model
    "gain" is the total gain of splits which use the feature

Returns
-------
result : array
    Array of feature importances.
####predict(data, num_iteration=-1, raw_score=False, pred_leaf=False, data_has_header=False, is_reshape=True)

@@ -445,7 +466,7 @@ The methods of each Class is in alphabetical order.

##Training API

####train(params, train_set, num_boost_round=100, valid_sets=None, valid_names=None, fobj=None, feval=None, init_model=None, feature_name='auto', categorical_feature='auto', early_stopping_rounds=None, evals_result=None, verbose_eval=True, learning_rates=None, callbacks=None)

Train with given parameters.
@@ -468,12 +489,14 @@ The methods of each Class is in alphabetical order.
    Note: should return (eval_name, eval_result, is_higher_better) or a list of these
init_model : file name of lightgbm model or 'Booster' instance
    model used for continued training
feature_name : list of str, or 'auto'
    Feature names
    If 'auto' and data is pandas DataFrame, use data columns name
categorical_feature : list of str or int, or 'auto'
    Categorical features,
    type int represents index,
    type str represents feature names (need to specify feature_name as well)
    If 'auto' and data is pandas DataFrame, use pandas categorical columns
early_stopping_rounds: int
    Activates early stopping.
    Requires at least one validation data and one metric
@@ -513,7 +536,7 @@ The methods of each Class is in alphabetical order.

booster : a trained booster model
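Putting the main arguments together (a sketch; `params`, `lgb_train` and `lgb_valid` are assumed to exist already):

```python
import lightgbm as lgb

evals_result = {}  # filled with the per-iteration metric values
bst = lgb.train(params,
                lgb_train,
                num_boost_round=100,
                valid_sets=[lgb_train, lgb_valid],
                valid_names=['train', 'valid'],
                early_stopping_rounds=10,
                evals_result=evals_result,
                verbose_eval=10)
```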
####cv(params, train_set, num_boost_round=10, data_splitter=None, nfold=5, stratified=False, shuffle=True, metrics=None, fobj=None, feval=None, init_model=None, feature_name='auto', categorical_feature='auto', early_stopping_rounds=None, fpreproc=None, verbose_eval=None, show_stdv=True, seed=0, callbacks=None)

Cross-validation with given parameters.

@@ -525,14 +548,14 @@ The methods of each Class is in alphabetical order.

    Data to be trained.
num_boost_round : int
    Number of boosting iterations.
data_splitter : an instance with split(X) method
    Instance with split(X) method, used to generate the CV folds.
nfold : int
    Number of folds in CV.
stratified : bool
    Perform stratified sampling.
shuffle: bool
    Whether to shuffle before splitting data.
metrics : str or list of str
    Evaluation metrics to be watched in CV.
fobj : function

@@ -541,11 +564,14 @@ The methods of each Class is in alphabetical order.

    Custom evaluation function.
init_model : file name of lightgbm model or 'Booster' instance
    model used for continued training
feature_name : list of str, or 'auto'
    Feature names
    If 'auto' and data is pandas DataFrame, use data columns name
categorical_feature : list of str or int, or 'auto'
    Categorical features,
    type int represents index,
    type str represents feature names (need to specify feature_name as well)
    If 'auto' and data is pandas DataFrame, use pandas categorical columns
early_stopping_rounds: int
    Activates early stopping. CV error needs to decrease at least
    every <early_stopping_rounds> round(s) to continue.
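A minimal cv sketch under the same assumptions (`params` and `lgb_train` already exist):

```python
import lightgbm as lgb

# returns a dict of per-round results, e.g. 'l2-mean' and 'l2-stdv' lists
cv_results = lgb.cv(params,
                    lgb_train,
                    num_boost_round=100,
                    nfold=5,
                    metrics='l2',
                    early_stopping_rounds=10,
                    seed=0)
```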
@@ -576,7 +602,7 @@ The methods of each Class is in alphabetical order.
###Common Methods

####__init__(boosting_type="gbdt", num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10, max_bin=255, subsample_for_bin=50000, objective="regression", min_split_gain=0, min_child_weight=5, min_child_samples=10, subsample=1, subsample_freq=1, colsample_bytree=1, reg_alpha=0, reg_lambda=0, scale_pos_weight=1, is_unbalance=False, seed=0, nthread=-1, silent=True, sigmoid=1.0, huber_delta=1.0, gaussian_eta=1.0, fair_c=1.0, poisson_max_delta_step=0.7, max_position=20, label_gain=None, drop_rate=0.1, skip_drop=0.5, max_drop=50, uniform_drop=False, xgboost_dart_mode=False)

Implementation of the Scikit-Learn API for LightGBM.
@@ -636,6 +662,8 @@ The methods of each Class is in alphabetical order.
    It is used to control the width of the Gaussian function used to approximate the hessian.
fair_c : float
    Only used in regression. Parameter for Fair loss function.
poisson_max_delta_step : float
    Parameter used to safeguard optimization in Poisson regression.
max_position : int
    Only used in lambdarank, will optimize NDCG at this position.
label_gain : list of float
@@ -693,7 +721,7 @@ The methods of each Class is in alphabetical order.
X_leaves : array_like, shape=[n_samples, n_trees]

####fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric=None, early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)

Fit the gradient boosting model.
@@ -720,16 +748,19 @@ The methods of each Class is in alphabetical order.
eval_metric : str, list of str, callable, optional
    If a str, should be a built-in evaluation metric to use.
    If callable, a custom evaluation metric, see note for more details.
    default: logloss for LGBMClassifier, l2 for LGBMRegressor, ndcg for LGBMRanker
    Can directly use 'logloss' or 'error' for LGBMClassifier.
early_stopping_rounds : int
verbose : bool
    If `verbose` and an evaluation set is used, writes the evaluation
feature_name : list of str, or 'auto'
    Feature names
    If 'auto' and data is pandas DataFrame, use data columns name
categorical_feature : list of str or int, or 'auto'
    Categorical features,
    type int represents index,
    type str represents feature names (need to specify feature_name as well)
    If 'auto' and data is pandas DataFrame, use pandas categorical columns
callbacks : list of callback functions
    List of callback functions that are applied at each iteration.
    See Callbacks in Python-API.md for more information.
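A sketch of the scikit-learn interface (hypothetical numpy arrays `X_train`, `y_train`, `X_valid`, `y_valid`):

```python
from lightgbm import LGBMClassifier

clf = LGBMClassifier(num_leaves=31, learning_rate=0.1, n_estimators=100)
clf.fit(X_train, y_train,
        eval_set=[(X_valid, y_valid)],
        eval_metric='logloss',
        early_stopping_rounds=10)
pred = clf.predict(X_valid)
```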
@@ -787,7 +818,7 @@ The methods of each Class is in alphabetical order.
Get the evaluation results.

####feature_importances_

Get normalized feature importances.
@@ -823,7 +854,7 @@ The methods of each Class is in alphabetical order.
###LGBMRanker

####fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric='ndcg', eval_at=1, early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)

Most arguments are the same as in Common Methods except:
@@ -907,3 +938,110 @@ The methods of each Class is in alphabetical order.

-------
callback : function
    The requested callback function.
##Plotting
####plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None, title='Feature importance', xlabel='Feature importance', ylabel='Features', importance_type='split', max_num_features=None, ignore_zero=True, figsize=None, grid=True, **kwargs):
Plot model feature importances.
Parameters
----------
booster : Booster or LGBMModel
    Booster or LGBMModel instance.
ax : matplotlib Axes
    Target axes instance. If None, new figure and axes will be created.
height : float
    Bar height, passed to ax.barh().
xlim : tuple of 2 elements
    Tuple passed to axes.xlim().
ylim : tuple of 2 elements
    Tuple passed to axes.ylim().
title : str
    Axes title. Pass None to disable.
xlabel : str
    X axis title label. Pass None to disable.
ylabel : str
    Y axis title label. Pass None to disable.
importance_type : str
    How the importance is calculated: "split" or "gain".
    "split" is the number of times a feature is used in a model.
    "gain" is the total gain of splits which use the feature.
max_num_features : int
    Max number of top features displayed on the plot.
    If None or smaller than 1, all features will be displayed.
ignore_zero : bool
    Ignore features with zero importance.
figsize : tuple of 2 elements
    Figure size.
grid : bool
    Whether to add a grid to the axes.
**kwargs :
    Other keywords passed to ax.barh().

Returns
-------
ax : matplotlib Axes
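Usage sketch (assuming `gbm` is a trained Booster or LGBMModel and matplotlib is installed):

```python
import matplotlib.pyplot as plt
import lightgbm as lgb

ax = lgb.plot_importance(gbm, max_num_features=10, importance_type='split')
plt.show()
```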
####plot_metric(booster, metric=None, dataset_names=None, ax=None, xlim=None, ylim=None, title='Metric during training', xlabel='Iterations', ylabel='auto', figsize=None, grid=True):
Plot one metric during training.
Parameters
----------
booster : dict or LGBMModel
    Evals_result recorded by lightgbm.train() or LGBMModel instance
metric : str or None
    The metric name to plot.
    Only one metric supported because different metrics have various scales.
    Pass None to pick the `first` one (according to dict hashcode).
dataset_names : None or list of str
    List of the dataset names to plot.
    Pass None to plot all datasets.
ax : matplotlib Axes
    Target axes instance. If None, new figure and axes will be created.
xlim : tuple of 2 elements
    Tuple passed to axes.xlim().
ylim : tuple of 2 elements
    Tuple passed to axes.ylim().
title : str
    Axes title. Pass None to disable.
xlabel : str
    X axis title label. Pass None to disable.
ylabel : str
    Y axis title label. Pass None to disable. Pass 'auto' to use `metric`.
figsize : tuple of 2 elements
    Figure size.
grid : bool
    Whether to add a grid to the axes.

Returns
-------
ax : matplotlib Axes
####plot_tree(booster, ax=None, tree_index=0, figsize=None, graph_attr=None, node_attr=None, edge_attr=None, show_info=None):
Plot specified tree.
Parameters
----------
booster : Booster or LGBMModel
    Booster or LGBMModel instance.
ax : matplotlib Axes
    Target axes instance. If None, new figure and axes will be created.
tree_index : int, default 0
    Specify the tree index of the target tree.
figsize : tuple of 2 elements
    Figure size.
graph_attr : dict
    Mapping of (attribute, value) pairs for the graph.
node_attr : dict
    Mapping of (attribute, value) pairs set for all nodes.
edge_attr : dict
    Mapping of (attribute, value) pairs set for all edges.
show_info : list
    Information to show on nodes.
    options: 'split_gain', 'internal_value', 'internal_count' or 'leaf_count'.

Returns
-------
ax : matplotlib Axes
@@ -10,6 +10,11 @@ This document gives a basic walkthrough of LightGBM python package.

Install
-------

* Install the library first, follow the wiki [here](./Installation-Guide.md).
* Install the python-package dependencies: `setuptools`, `numpy` and `scipy` are required, and `scikit-learn` is required for the sklearn interface (and recommended). Run:
```
pip install setuptools numpy scipy scikit-learn -U
```
* In the `python-package` directory, run
```
python setup.py install
@@ -73,13 +78,13 @@ LightGBM can use categorical features as input directly. It doesn't need to cove
#### Weights can be set when needed:
```python
w = np.random.rand(500, )
train_data = lgb.Dataset(data, label=label, weight=w)
```
or
```python
train_data = lgb.Dataset(data, label=label)
w = np.random.rand(500, )
train_data.set_weight(w)
```
...
@@ -18,7 +18,7 @@ Label is the data of first column, and there is no header in the file.

update 12/5/2016:

LightGBM can use categorical features directly (without one-hot coding). The experiment on [Expo data](http://stat-computing.org/dataexpo/2009/) shows about an 8x speed-up compared with one-hot coding.

For the setting details, please refer to [Parameters](./Parameters.md#io-parameters).
@@ -103,7 +103,7 @@ For example, following command line will keep 'num_trees=10' and ignore same par

## Examples

* [Binary Classification](../examples/binary_classification)
* [Regression](../examples/regression)
* [Lambdarank](../examples/lambdarank)
* [Parallel Learning](../examples/parallel_learning)
@@ -9,5 +9,6 @@ Documents

* [Parameters Tuning](./Parameters-tuning.md)
* [Python API Reference](./Python-API.md)
* [Parallel Learning Guide](https://github.com/Microsoft/LightGBM/wiki/Parallel-Learning-Guide)
* [FAQ](./FAQ.md)
* [Development Guide](./development.md)
@@ -6,10 +6,9 @@ Here is an example for LightGBM to use python package.

For the installation, check the wiki [here](https://github.com/Microsoft/LightGBM/wiki/Installation-Guide).

You also need scikit-learn, pandas and matplotlib (only for plot example) to run the examples, but they are not required for the package itself. You can install them with pip:
```
pip install scikit-learn pandas matplotlib -U
```

Now you can run examples in this folder, for example:
...
@@ -11,10 +11,10 @@ df_test = pd.read_csv('../binary_classification/binary.test', header=None, sep='

W_train = pd.read_csv('../binary_classification/binary.train.weight', header=None)[0]
W_test = pd.read_csv('../binary_classification/binary.test.weight', header=None)[0]

y_train = df_train[0].values
y_test = df_test[0].values
X_train = df_train.drop(0, axis=1).values
X_test = df_test.drop(0, axis=1).values

num_train, num_feature = X_train.shape
...
# coding: utf-8
# pylint: disable = invalid-name, C0111
import lightgbm as lgb
import pandas as pd
try:
import matplotlib.pyplot as plt
except ImportError:
raise ImportError('You need to install matplotlib for plot_example.py.')
# load or create your dataset
print('Load data...')
df_train = pd.read_csv('../regression/regression.train', header=None, sep='\t')
df_test = pd.read_csv('../regression/regression.test', header=None, sep='\t')
y_train = df_train[0].values
y_test = df_test[0].values
X_train = df_train.drop(0, axis=1).values
X_test = df_test.drop(0, axis=1).values
# create dataset for lightgbm
lgb_train = lgb.Dataset(X_train, y_train)
lgb_test = lgb.Dataset(X_test, y_test, reference=lgb_train)
# specify your configurations as a dict
params = {
'num_leaves': 5,
'metric': ('l1', 'l2'),
'verbose': 0
}
evals_result = {} # to record eval results for plotting
print('Start training...')
# train
gbm = lgb.train(params,
lgb_train,
num_boost_round=100,
valid_sets=[lgb_train, lgb_test],
feature_name=['f' + str(i + 1) for i in range(28)],
categorical_feature=[21],
evals_result=evals_result,
verbose_eval=10)
print('Plot metrics during training...')
ax = lgb.plot_metric(evals_result, metric='l1')
plt.show()
print('Plot feature importances...')
ax = lgb.plot_importance(gbm, max_num_features=10)
plt.show()
print('Plot 84th tree...') # one tree use categorical feature to split
ax = lgb.plot_tree(gbm, tree_index=83, figsize=(20, 8), show_info=['split_gain'])
plt.show()
@@ -10,10 +10,10 @@ print('Load data...')

df_train = pd.read_csv('../regression/regression.train', header=None, sep='\t')
df_test = pd.read_csv('../regression/regression.test', header=None, sep='\t')

y_train = df_train[0].values
y_test = df_test[0].values
X_train = df_train.drop(0, axis=1).values
X_test = df_test.drop(0, axis=1).values
# create dataset for lightgbm
lgb_train = lgb.Dataset(X_train, y_train)
@@ -58,7 +58,8 @@ model_json = gbm.dump_model()

with open('model.json', 'w+') as f:
    json.dump(model_json, f, indent=4)
print('Feature names:', gbm.feature_name())
print('Calculate feature importances...')
# feature importances
print('Feature importances:', list(gbm.feature_importance()))
# print('Feature importances:', list(gbm.feature_importance("gain")))
@@ -10,10 +10,10 @@ print('Load data...')

df_train = pd.read_csv('../regression/regression.train', header=None, sep='\t')
df_test = pd.read_csv('../regression/regression.test', header=None, sep='\t')

y_train = df_train[0].values
y_test = df_test[0].values
X_train = df_train.drop(0, axis=1).values
X_test = df_test.drop(0, axis=1).values
print('Start training...')
# train
@@ -34,7 +34,7 @@ print('The rmse of prediction is:', mean_squared_error(y_test, y_pred) ** 0.5)

print('Calculate feature importances...')
# feature importances
print('Feature importances:', list(gbm.feature_importances_))

# other scikit-learn modules
estimator = lgb.LGBMRegressor(num_leaves=31)
...
@@ -19,8 +19,8 @@ class Metric;

* \brief The main entrance of LightGBM. This application has two tasks:
* Train and Predict.
* Train task will train a new model
* Predict task will predict the scores of test data using an existing model,
* and save the scores to disk.
*/
class Application {
public:
@@ -41,7 +41,7 @@ private:

template<typename T>
T GlobalSyncUpByMin(T& local);

/*! \brief Load parameters from command line and config file*/
void LoadParameters(int argc, char** argv);

/*! \brief Load data, including training data and validation data*/
...
#ifndef LIGHTGBM_BIN_H_
#define LIGHTGBM_BIN_H_

#include <LightGBM/utils/common.h>
#include <LightGBM/meta.h>

#include <vector>
#include <functional>
#include <unordered_map>
#include <sstream>
#include <iomanip>   // std::setprecision, used by bin_info()
#include <limits>    // std::numeric_limits, used by bin_info()
namespace LightGBM {

@@ -14,16 +17,16 @@ enum BinType {

  CategoricalBin
};
/*! \brief Store data for one histogram bin */
struct HistogramBinEntry {
public:
  /*! \brief Sum of gradients on this bin */
  double sum_gradients = 0.0f;
  /*! \brief Sum of hessians on this bin */
  double sum_hessians = 0.0f;
  /*! \brief Number of data on this bin */
  data_size_t cnt = 0;
  /*!
  * \brief Sum up (reducers) functions for histogram bin
  */
@@ -56,13 +59,11 @@ public:

explicit BinMapper(const void* memory);
~BinMapper();
static double kSparseThreshold;
bool CheckAlign(const BinMapper& other) const {
  if (num_bin_ != other.num_bin_) {
    return false;
  }
  if (bin_type_ != other.bin_type_) {
    return false;
  }
  if (bin_type_ == BinType::NumericalBin) {
    for (int i = 0; i < num_bin_; ++i) {
      if (bin_upper_bound_[i] != other.bin_upper_bound_[i]) {
@@ -95,7 +96,7 @@ public:

* \param bin
* \return Feature value of this bin
*/
inline double BinToValue(uint32_t bin) const {
  if (bin_type_ == BinType::NumericalBin) {
    return bin_upper_bound_[bin];
  } else {
@@ -111,26 +112,25 @@ public:

* \param value
* \return bin for this feature value
*/
inline uint32_t ValueToBin(double value) const;

/*!
* \brief Get the default bin when value is 0
* \return default bin
*/
inline uint32_t GetDefaultBin() const {
  return default_bin_;
}
/*!
* \brief Construct feature value to bin mapper according to feature values
* \param values (Sampled) values of this feature. Note: this does not include zeros.
* \param total_sample_cnt total sample count, equal to values.size() + num_zeros
* \param max_bin The maximal number of bins
* \param min_data_in_bin min number of data in one bin
* \param min_split_data
* \param bin_type Type of this bin
*/
void FindBin(std::vector<double>& values, size_t total_sample_cnt, int max_bin, int min_data_in_bin, int min_split_data, BinType bin_type);
/*!
* \brief Use specific number of bin to calculate the size of this class
@@ -151,7 +151,25 @@ public:

*/
void CopyFrom(const char* buffer);
/*!
* \brief Get bin types
*/
inline BinType bin_type() const { return bin_type_; }
/*!
* \brief Get bin info
*/
inline std::string bin_info() const {
if (bin_type_ == BinType::CategoricalBin) {
return Common::Join(bin_2_categorical_, ":");
} else {
std::stringstream str_buf;
str_buf << std::setprecision(std::numeric_limits<double>::digits10 + 2);
str_buf << '[' << min_val_ << ':' << max_val_ << ']';
return str_buf.str();
}
}
private:
/*! \brief Number of bins */
int num_bin_;
@@ -167,6 +185,12 @@ private:

std::unordered_map<int, unsigned int> categorical_2_bin_;
/*! \brief Mapper from bin to categorical */
std::vector<int> bin_2_categorical_;
/*! \brief minimal feature value */
double min_val_;
/*! \brief maximum feature value */
double max_val_;
/*! \brief bin value of feature value 0 */
uint32_t default_bin_;
};

/*!
@@ -188,7 +212,7 @@ public:

(this logic was built for bagging logic)
* \param num_leaves Number of leaves on this iteration
*/
virtual void Init(const char* used_indices, data_size_t num_leaves) = 0;
/*!
* \brief Construct histogram by using this bin

@@ -206,9 +230,12 @@ public:
* \brief Split current bin, and perform re-order by leaf
* \param leaf Using which leaf's data to split
* \param right_leaf The new leaf index after performing this split
* \param is_in_leaf is_in_leaf[i] == mark means the i-th data will be on the left leaf after the split
* \param mark is_in_leaf[i] == mark means the i-th data will be on the left leaf after the split
*/
virtual void Split(int leaf, int right_leaf, const char* is_in_leaf, char mark) = 0;

virtual data_size_t NonZeroCount(int leaf) const = 0;
};
/*! \brief Iterator for one bin column */
@@ -220,6 +247,8 @@ public:

* \return Bin data
*/
virtual uint32_t Get(data_size_t idx) = 0;
virtual void Reset(data_size_t idx) = 0;
virtual ~BinIterator() = default;
};
/*!

@@ -240,12 +269,16 @@ public:
*/
virtual void Push(int tid, data_size_t idx, uint32_t value) = 0;

virtual void CopySubset(const Bin* full_bin, const data_size_t* used_indices, data_size_t num_used_indices) = 0;

/*!
* \brief Get bin iterator of this bin for a specific feature
* \param min_bin min_bin of current used feature
* \param max_bin max_bin of current used feature
* \param default_bin default bin if bin not in [min_bin, max_bin]
* \return Iterator of this bin
*/
virtual BinIterator* GetIterator(uint32_t min_bin, uint32_t max_bin, uint32_t default_bin) const = 0;
/*!
* \brief Save binary data to file

@@ -255,7 +288,8 @@ public:

/*!
* \brief Load from memory
* \param memory
* \param local_used_indices
*/
virtual void LoadFromMemory(const void* memory,
  const std::vector<data_size_t>& local_used_indices) = 0;
@@ -268,10 +302,12 @@ public:
/*! \brief Number of all data */
virtual data_size_t num_data() const = 0;
virtual void ReSize(data_size_t num_data) = 0;
/*!
* \brief Construct histogram of this feature,
* Note: We use ordered_gradients and ordered_hessians to improve the cache hit rate
* The naive solution is to use gradients[data_indices[i]] to get the gradient of data_indices[i],
which is not cache friendly, since the memory access is not contiguous.
* ordered_gradients and ordered_hessians are preprocessed and re-ordered by data_indices:
* ordered_gradients[i] is aligned with data_indices[i]'s gradient (same for ordered_hessians).
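A minimal sketch of that preprocessing, assuming a plain gather loop (float stands in for LightGBM's score_t; the real code may parallelize this):

void OrderGradients(const float* gradients, const float* hessians,
                    const LightGBM::data_size_t* data_indices,
                    LightGBM::data_size_t num_data,
                    float* ordered_gradients, float* ordered_hessians) {
  // One sequential pass: after this, the histogram loop reads
  // ordered_gradients[i] instead of gradients[data_indices[i]],
  // turning random access into contiguous access.
  for (LightGBM::data_size_t i = 0; i < num_data; ++i) {
    ordered_gradients[i] = gradients[data_indices[i]];
    ordered_hessians[i] = hessians[data_indices[i]];
  }
}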
@@ -288,17 +324,21 @@ public:
/*!
* \brief Split data according to threshold: if bin <= threshold, put into left (lte_indices), else put into right (gt_indices)
* \param min_bin min_bin of the currently used feature
* \param max_bin max_bin of the currently used feature
* \param default_bin default bin if bin is not in [min_bin, max_bin]
* \param threshold The split threshold.
* \param data_indices Used data indices; after this function is called, the indices of data less than or equal to the threshold will be stored in this object.
* \param num_data Number of used data
* \param lte_indices After this function is called, the less-than-or-equal data indices will be stored in this object.
* \param gt_indices After this function is called, the greater-than data indices will be stored in this object.
* \param bin_type Type of bin
* \return The number of data less than or equal to the threshold.
*/
virtual data_size_t Split(uint32_t min_bin, uint32_t max_bin,
uint32_t default_bin, uint32_t threshold,
data_size_t* data_indices, data_size_t num_data,
data_size_t* lte_indices, data_size_t* gt_indices, BinType bin_type) const = 0;
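To make the contract concrete, a naive sketch of the partition a dense implementation might perform; it ignores the min_bin/max_bin re-basing and default_bin handling for brevity and is not the library's actual (and more optimized) code:

LightGBM::data_size_t NaiveSplit(const uint32_t* bins,  // bin value per data point
                                 uint32_t threshold,
                                 const LightGBM::data_size_t* data_indices,
                                 LightGBM::data_size_t num_data,
                                 LightGBM::data_size_t* lte_indices,
                                 LightGBM::data_size_t* gt_indices) {
  LightGBM::data_size_t lte_count = 0, gt_count = 0;
  for (LightGBM::data_size_t i = 0; i < num_data; ++i) {
    const LightGBM::data_size_t idx = data_indices[i];
    if (bins[idx] <= threshold) {
      lte_indices[lte_count++] = idx;   // goes to the left child
    } else {
      gt_indices[gt_count++] = idx;     // goes to the right child
    }
  }
  return lte_count;  // the documented return value
}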
/*!
* \brief Create the ordered bin for this bin
@@ -315,44 +355,35 @@ public:
* \brief Create object for bin data of one feature, will call CreateDenseBin or CreateSparseBin according to "is_sparse"
* \param num_data Total number of data
* \param num_bin Number of bins
* \param sparse_rate Sparse rate of this bin (num_bin0/num_data)
* \param is_enable_sparse True if sparse features are enabled
* \param is_sparse Will be set to true if this bin is sparse
* \return The bin data object
*/
static Bin* CreateBin(data_size_t num_data, int num_bin,
double sparse_rate, bool is_enable_sparse, bool* is_sparse);
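A sketch of the dispatch such a factory typically performs; the 0.8 cut-off is an assumed illustration value, not necessarily the threshold LightGBM uses:

LightGBM::Bin* CreateBinSketch(LightGBM::data_size_t num_data, int num_bin,
                               double sparse_rate, bool is_enable_sparse,
                               bool* is_sparse) {
  const double kSparseThreshold = 0.8;  // assumed cut-off, for illustration only
  if (is_enable_sparse && sparse_rate >= kSparseThreshold) {
    *is_sparse = true;   // most values fall into the zero bin: store sparsely
    return LightGBM::Bin::CreateSparseBin(num_data, num_bin);
  }
  *is_sparse = false;
  return LightGBM::Bin::CreateDenseBin(num_data, num_bin);
}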
/*!
* \brief Create object for bin data of one feature, used for dense features
* \param num_data Total number of data
* \param num_bin Number of bins
* \return The bin data object
*/
static Bin* CreateDenseBin(data_size_t num_data, int num_bin);
/*!
* \brief Create object for bin data of one feature, used for sparse features
* \param num_data Total number of data
* \param num_bin Number of bins
* \return The bin data object
*/
static Bin* CreateSparseBin(data_size_t num_data, int num_bin);
};
inline uint32_t BinMapper::ValueToBin(double value) const {
if (bin_type_ == BinType::NumericalBin) {
// binary search to find bin
int l = 0;
int r = num_bin_ - 1;
while (l < r) {
......
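The diff truncates the loop body here. For completeness, a self-contained sketch of the standard upper-bound binary search this function performs, assuming a vector of bin upper boundaries (the actual member name and edge handling may differ):

#include <cstdint>
#include <vector>

inline uint32_t ValueToBinSketch(double value,
                                 const std::vector<double>& bin_upper_bound) {
  int l = 0;
  int r = static_cast<int>(bin_upper_bound.size()) - 1;
  while (l < r) {
    int m = (l + r) / 2;             // midpoint of the remaining range
    if (value <= bin_upper_bound[m]) {
      r = m;                          // value fits in bin m or an earlier one
    } else {
      l = m + 1;                      // value belongs to a later bin
    }
  }
  return static_cast<uint32_t>(l);    // l == r: the matching bin index
}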
@@ -17,7 +17,7 @@ class Metric;
/*!
* \brief The interface for Boosting
*/
class LIGHTGBM_EXPORT Boosting {
public:
/*! \brief virtual destructor */
virtual ~Boosting() {}
@@ -99,14 +99,14 @@ public:
/*!
* \brief Get size of prediction result at data_idx data
* \param data_idx 0: training data, 1: 1st validation data
* \return Length of the returned score
*/
virtual int64_t GetNumPredictAt(int data_idx) const = 0;
/*!
* \brief Get prediction result at data_idx data
* \param data_idx 0: training data, 1: 1st validation data
* \param result Used to store the prediction result; memory should be allocated before calling this function
* \param out_len Length of the returned score
*/
virtual void GetPredictAt(int data_idx, double* result, int64_t* out_len) = 0;
@@ -125,7 +125,7 @@ public:
virtual std::vector<double> Predict(const double* feature_values) const = 0;
/*!
* \brief Prediction for one record with leaf index
* \param feature_values Feature values of this record
* \return Predicted leaf index for this record
*/
@@ -143,14 +143,23 @@ public:
* \param num_used_model Number of models to save, -1 means save all
* \param is_finish Is training finished or not
* \param filename Filename to save to
* \return true if succeeded
*/
virtual bool SaveModelToFile(int num_iterations, const char* filename) const = 0;
/*!
* \brief Save model to string
* \param num_used_model Number of models to save, -1 means save all
* \return Non-empty string if succeeded
*/
virtual std::string SaveModelToString(int num_iterations) const = 0;
/*!
* \brief Restore from a serialized string
* \param model_str The string of the model
* \return true if succeeded
*/
virtual bool LoadModelFromString(const std::string& model_str) = 0;
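A minimal sketch of the round trip these new signatures enable; `booster` and `restored` stand for any concrete Boosting objects, and error handling is reduced to the documented return conventions:

bool RoundTrip(const LightGBM::Boosting* booster, LightGBM::Boosting* restored) {
  // -1 saves all iterations, mirroring SaveModelToFile's convention.
  std::string model_str = booster->SaveModelToString(-1);
  if (model_str.empty()) {
    return false;  // empty string signals failure, per the doc comment above
  }
  return restored->LoadModelFromString(model_str);
}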
/*!
* \brief Get max feature index of this model
@@ -158,6 +167,12 @@ public:
*/
virtual int MaxFeatureIdx() const = 0;
/*!
* \brief Get feature names of this model
* \return Feature names of this model
*/
virtual std::vector<std::string> FeatureNames() const = 0;
/*!
* \brief Get index of label column
* \return index of label column
@@ -192,7 +207,7 @@ public:
/*! \brief Disable copy */
Boosting(const Boosting&) = delete;
static bool LoadFileToBoosting(Boosting* boosting, const char* filename);
/*!
* \brief Create boosting object
......
@@ -5,6 +5,7 @@
#include <LightGBM/utils/log.h>
#include <LightGBM/meta.h>
#include <LightGBM/export.h>
#include <vector>
#include <string>
@@ -84,7 +85,7 @@ enum TaskType {
/*! \brief Config for input and output files */
struct IOConfig: public ConfigBase {
public:
int max_bin = 255;
int num_class = 1;
int data_random_seed = 1;
std::string data_filename = "";
@@ -99,10 +100,14 @@ public:
bool use_two_round_loading = false;
bool is_save_binary_file = false;
bool enable_load_from_binary_file = true;
int bin_construct_sample_cnt = 200000;
bool is_predict_leaf_index = false;
bool is_predict_raw_score = false;
int min_data_in_leaf = 100;
int min_data_in_bin = 5;
double max_conflict_rate = 0.0000f;
bool enable_bundle = true;
bool adjacent_bundle = false;
bool has_header = false;
/*! \brief Index or column name of the label, default is the first column,
* and add a prefix "name:" when using a column name */
@@ -123,7 +128,7 @@ public:
* and add a prefix "name:" when using a column name
* Note: when using an index, it doesn't count the label index */
std::string categorical_column = "";
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
};
/*! \brief Config for objective function */
@@ -133,8 +138,9 @@ public:
double sigmoid = 1.0f;
double huber_delta = 1.0f;
double fair_c = 1.0f;
// for approximate Hessian with Gaussian
double gaussian_eta = 1.0f;
double poisson_max_delta_step = 0.7f;
// for lambdarank
std::vector<double> label_gain;
// for lambdarank
@@ -145,7 +151,7 @@ public:
int num_class = 1;
// Balancing of positive and negative weights
double scale_pos_weight = 1.0f;
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
};
/*! \brief Config for metrics interface */
@@ -158,7 +164,7 @@ public:
double fair_c = 1.0f;
std::vector<double> label_gain;
std::vector<int> eval_at;
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
};
@@ -174,15 +180,15 @@ public:
int num_leaves = 127;
int feature_fraction_seed = 2;
double feature_fraction = 1.0f;
// max cache size (unit: MB) for the historical histogram; < 0 means no limit
double histogram_pool_size = -1.0f;
// max depth of the tree model.
// The tree still grows leaf-wise, but the max depth is limited to avoid over-fitting.
// The max number of leaves will be min(num_leaves, pow(2, max_depth)),
// e.g. num_leaves = 127 with max_depth = 6 caps the tree at min(127, 2^6) = 64 leaves.
// max_depth < 0 means no limit
int max_depth = -1;
int top_k = 20;
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
};
/*! \brief Config for Boosting */
@@ -205,9 +211,11 @@ public:
bool xgboost_dart_mode = false;
bool uniform_drop = false;
int drop_seed = 4;
double top_rate = 0.2f;
double other_rate = 0.1f;
std::string tree_learner_type = "serial";
TreeConfig tree_config;
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
private:
void GetTreeLearnerType(const std::unordered_map<std::string,
std::string>& params);
@@ -220,7 +228,7 @@ public:
int local_listen_port = 12400;
int time_out = 120; // in minutes
std::string machine_list_filename = "";
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
};
@@ -241,7 +249,7 @@ public:
std::vector<std::string> metric_types;
MetricConfig metric_config;
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
private:
void GetBoostingType(const std::unordered_map<std::string, std::string>& params);
@@ -271,7 +279,7 @@ inline bool ConfigBase::GetInt(
const std::string& name, int* out) {
if (params.count(name) > 0) {
if (!Common::AtoiAndCheck(params.at(name).c_str(), out)) {
Log::Fatal("Parameter %s should be of type int, got \"%s\"",
name.c_str(), params.at(name).c_str());
}
return true;
@@ -284,7 +292,7 @@ inline bool ConfigBase::GetDouble(
const std::string& name, double* out) {
if (params.count(name) > 0) {
if (!Common::AtofAndCheck(params.at(name).c_str(), out)) {
Log::Fatal("Parameter %s should be of type double, got \"%s\"",
name.c_str(), params.at(name).c_str());
}
return true;
@@ -303,7 +311,7 @@ inline bool ConfigBase::GetBool(
} else if (value == std::string("true") || value == std::string("+")) {
*out = true;
} else {
Log::Fatal("Parameter %s should be \"true\"/\"+\" or \"false\"/\"-\", got \"%s\"",
name.c_str(), params.at(name).c_str());
}
return true;
@@ -335,9 +343,12 @@ struct ParameterAlias {
{ "test_data", "valid_data" },
{ "test", "valid_data" },
{ "is_sparse", "is_enable_sparse" },
{ "enable_sparse", "is_enable_sparse" },
{ "pre_partition", "is_pre_partition" },
{ "training_metric", "is_training_metric" },
{ "train_metric", "is_training_metric" },
{ "ndcg_at", "ndcg_eval_at" },
{ "eval_at", "ndcg_eval_at" },
{ "min_data_per_leaf", "min_data_in_leaf" },
{ "min_data", "min_data_in_leaf" },
{ "min_child_samples", "min_data_in_leaf" },
......
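For context, a minimal sketch of how an alias table like the one above is typically applied before parsing; the function name and the pass-through behavior for unknown keys are illustrative assumptions:

#include <string>
#include <unordered_map>

std::unordered_map<std::string, std::string> NormalizeKeys(
    const std::unordered_map<std::string, std::string>& params,
    const std::unordered_map<std::string, std::string>& alias) {
  std::unordered_map<std::string, std::string> out;
  for (const auto& kv : params) {
    auto it = alias.find(kv.first);
    const std::string& key = (it != alias.end()) ? it->second : kv.first;
    out[key] = kv.second;  // e.g. "min_data" is rewritten to "min_data_in_leaf"
  }
  return out;
}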