Commit 7a166fb3 authored by Nikita Titov, committed by Qiwei Ye

made parameters consistent in cpp and python code; added missed aliases to the docs (#1018)

* fixed parameters consistent

* added aliases to docs

* added missed parameter top_k

* fixed ignored subsample_for_bin parameter

* added missed aliases to Quick Start Guide
parent b3c20f7a
...@@ -96,9 +96,10 @@ Core Parameters ...@@ -96,9 +96,10 @@ Core Parameters
- support multi validation data, separate by ``,`` - support multi validation data, separate by ``,``
- ``num_iterations``, default=\ ``100``, type=int, - ``num_iterations``, default=\ ``100``, type=int,
alias=\ ``num_iteration``, ``num_tree``, ``num_trees``, ``num_round``, ``num_rounds`` alias=\ ``num_iteration``, ``num_tree``, ``num_trees``, ``num_round``, ``num_rounds``, ``num_boost_round``
- number of boosting iterations - number of boosting iterations
- **Note**: for Python/R package, **this parameter is ignored**, - **Note**: for Python/R package, **this parameter is ignored**,
use ``num_boost_round`` (Python) or ``nrounds`` (R) input arguments of ``train`` and ``cv`` methods instead use ``num_boost_round`` (Python) or ``nrounds`` (R) input arguments of ``train`` and ``cv`` methods instead
...@@ -114,7 +115,7 @@ Core Parameters ...@@ -114,7 +115,7 @@ Core Parameters
- number of leaves in one tree - number of leaves in one tree
- ``tree_learner``, default=\ ``serial``, type=enum, options=\ ``serial``, ``feature``, ``data`` - ``tree_learner``, default=\ ``serial``, type=enum, options=\ ``serial``, ``feature``, ``data``, alias=\ ``tree``
- ``serial``, single machine tree learner - ``serial``, single machine tree learner
...@@ -157,16 +158,16 @@ Learning Control Parameters ...@@ -157,16 +158,16 @@ Learning Control Parameters
- ``< 0`` means no limit - ``< 0`` means no limit
- ``min_data_in_leaf``, default=\ ``20``, type=int, alias=\ ``min_data_per_leaf`` , ``min_data`` - ``min_data_in_leaf``, default=\ ``20``, type=int, alias=\ ``min_data_per_leaf`` , ``min_data``, ``min_child_samples``
- minimal number of data in one leaf. Can be used to deal with over-fitting - minimal number of data in one leaf. Can be used to deal with over-fitting
- ``min_sum_hessian_in_leaf``, default=\ ``1e-3``, type=double, - ``min_sum_hessian_in_leaf``, default=\ ``1e-3``, type=double,
alias=\ ``min_sum_hessian_per_leaf``, ``min_sum_hessian``, ``min_hessian`` alias=\ ``min_sum_hessian_per_leaf``, ``min_sum_hessian``, ``min_hessian``, ``min_child_weight``
- minimal sum hessian in one leaf. Like ``min_data_in_leaf``, it can be used to deal with over-fitting - minimal sum hessian in one leaf. Like ``min_data_in_leaf``, it can be used to deal with over-fitting
- ``feature_fraction``, default=\ ``1.0``, type=double, ``0.0 < feature_fraction < 1.0``, alias=\ ``sub_feature`` - ``feature_fraction``, default=\ ``1.0``, type=double, ``0.0 < feature_fraction < 1.0``, alias=\ ``sub_feature``, ``colsample_bytree``
- LightGBM will randomly select part of features on each iteration if ``feature_fraction`` smaller than ``1.0``. - LightGBM will randomly select part of features on each iteration if ``feature_fraction`` smaller than ``1.0``.
For example, if set to ``0.8``, will select 80% features before training each tree For example, if set to ``0.8``, will select 80% features before training each tree
...@@ -179,7 +180,7 @@ Learning Control Parameters ...@@ -179,7 +180,7 @@ Learning Control Parameters
- random seed for ``feature_fraction`` - random seed for ``feature_fraction``
- ``bagging_fraction``, default=\ ``1.0``, type=double, ``0.0 < bagging_fraction < 1.0``, alias=\ ``sub_row`` - ``bagging_fraction``, default=\ ``1.0``, type=double, ``0.0 < bagging_fraction < 1.0``, alias=\ ``sub_row``, ``subsample``
- like ``feature_fraction``, but this will randomly select part of data without resampling - like ``feature_fraction``, but this will randomly select part of data without resampling
...@@ -189,13 +190,13 @@ Learning Control Parameters ...@@ -189,13 +190,13 @@ Learning Control Parameters
- **Note**: To enable bagging, ``bagging_freq`` should be set to a non zero value as well - **Note**: To enable bagging, ``bagging_freq`` should be set to a non zero value as well
- ``bagging_freq``, default=\ ``0``, type=int - ``bagging_freq``, default=\ ``0``, type=int, alias=\ ``subsample_freq``
- frequency for bagging, ``0`` means disable bagging. ``k`` means will perform bagging at every ``k`` iteration - frequency for bagging, ``0`` means disable bagging. ``k`` means will perform bagging at every ``k`` iteration
- **Note**: to enable bagging, ``bagging_fraction`` should be set as well - **Note**: to enable bagging, ``bagging_fraction`` should be set as well
- ``bagging_seed`` , default=\ ``3``, type=int - ``bagging_seed`` , default=\ ``3``, type=int, alias=\ ``bagging_fraction_seed``
- random seed for bagging - random seed for bagging
...@@ -203,15 +204,15 @@ Learning Control Parameters ...@@ -203,15 +204,15 @@ Learning Control Parameters
- will stop training if one metric of one validation data doesn't improve in last ``early_stopping_round`` rounds - will stop training if one metric of one validation data doesn't improve in last ``early_stopping_round`` rounds
- ``lambda_l1``, default=\ ``0``, type=double - ``lambda_l1``, default=\ ``0``, type=double, alias=\ ``reg_alpha``
- L1 regularization - L1 regularization
- ``lambda_l2``, default=\ ``0``, type=double - ``lambda_l2``, default=\ ``0``, type=double, alias=\ ``reg_lambda``
- L2 regularization - L2 regularization
- ``min_gain_to_split``, default=\ ``0``, type=double - ``min_split_gain``, default=\ ``0``, type=double, alias=\ ``min_gain_to_split``
- the minimal gain to perform split - the minimal gain to perform split
...@@ -261,9 +262,9 @@ Learning Control Parameters ...@@ -261,9 +262,9 @@ Learning Control Parameters
- ``cat_smooth``, default=\ ``10``, type=double - ``cat_smooth``, default=\ ``10``, type=double
- use for the categorical features - used for the categorical features
- this can reduce the effect of noises in categorical features, especially for categories with few data - this can reduce the effect of noises in categorical features, especially for categories with few data
- ``cat_l2``, default=\ ``10``, type=double - ``cat_l2``, default=\ ``10``, type=double
...@@ -271,7 +272,13 @@ Learning Control Parameters ...@@ -271,7 +272,13 @@ Learning Control Parameters
- ``max_cat_to_onehot``, default=\ ``4``, type=int - ``max_cat_to_onehot``, default=\ ``4``, type=int
- When number of categories of one feature smaller than or equal to ``max_cat_to_onehot``, will use one-vs-other split algorithm. - when number of categories of one feature smaller than or equal to ``max_cat_to_onehot``, one-vs-other split algorithm will be used
- ``top_k``, default=\ ``20``, type=int, alias=\ ``topk``
- used in `Voting parallel <./Parallel-Learning-Guide.rst#choose-appropriate-parallel-algorithm>`__
- set this to larger value for more accurate result, but it will slow down the training speed
IO Parameters IO Parameters
------------- -------------
...@@ -311,25 +318,25 @@ IO Parameters ...@@ -311,25 +318,25 @@ IO Parameters
- ``model_format``, default=\ ``text``, type=string - ``model_format``, default=\ ``text``, type=string
- format to save and load model. - format to save and load model
- ``text``, use text string. - if ``text``, text string will be used
- ``proto``, use protocol buffer binary format. - if ``proto``, Protocol Buffer binary format will be used
- save multiple formats by joining them with comma, like ``text,proto``, in this case, ``model_format`` will be add as suffix after ``output_model``. - you can save in multiple formats by joining them with comma, like ``text,proto``. In this case, ``model_format`` will be add as suffix after ``output_model``
- not support loading with multiple formats. - **Note**: loading with multiple formats is not supported
- Note: you need to cmake with -DUSE_PROTO=ON to use this parameter. - **Note**: to use this parameter you need to `build version with Protobuf Support <./Installation-Guide.rst#protobuf-support>`__
- ``is_pre_partition``, default=\ ``false``, type=bool - ``pre_partition``, default=\ ``false``, type=bool, alias=\ ``is_pre_partition``
- used for parallel learning (not include feature parallel) - used for parallel learning (not include feature parallel)
- ``true`` if training data are pre-partitioned, and different machines use different partitions - ``true`` if training data are pre-partitioned, and different machines use different partitions
- ``is_sparse``, default=\ ``true``, type=bool, alias=\ ``is_enable_sparse`` - ``is_sparse``, default=\ ``true``, type=bool, alias=\ ``is_enable_sparse``, ``enable_sparse``
- used to enable/disable sparse optimization. Set to ``false`` to disable sparse optimization - used to enable/disable sparse optimization. Set to ``false`` to disable sparse optimization
...@@ -429,7 +436,7 @@ IO Parameters ...@@ -429,7 +436,7 @@ IO Parameters
- set to ``true`` to estimate `SHAP values`_, which represent how each feature contributs to each prediction. - set to ``true`` to estimate `SHAP values`_, which represent how each feature contributs to each prediction.
Produces number of features + 1 values where the last value is the expected value of the model output over the training data Produces number of features + 1 values where the last value is the expected value of the model output over the training data
- ``bin_construct_sample_cnt``, default=\ ``200000``, type=int - ``bin_construct_sample_cnt``, default=\ ``200000``, type=int, alias=\ ``subsample_for_bin``
- number of data that sampled to construct histogram bins - number of data that sampled to construct histogram bins
...@@ -509,7 +516,7 @@ Objective Parameters ...@@ -509,7 +516,7 @@ Objective Parameters
- adjust initial score to the mean of labels for faster convergence - adjust initial score to the mean of labels for faster convergence
- ``is_unbalance``, default=\ ``false``, type=bool - ``is_unbalance``, default=\ ``false``, type=bool, alias=\ ``unbalanced_sets``
- used in ``binary`` classification - used in ``binary`` classification
...@@ -572,7 +579,7 @@ Metric Parameters ...@@ -572,7 +579,7 @@ Metric Parameters
- frequency for metric output - frequency for metric output
- ``is_training_metric``, default=\ ``false``, type=bool - ``train_metric``, default=\ ``false``, type=bool, alias=\ ``training_metric``, ``is_training_metric``
- set this to ``true`` if you need to output metric result of training - set this to ``true`` if you need to output metric result of training
...@@ -601,7 +608,7 @@ Following parameters are used for parallel learning, and only used for base (soc ...@@ -601,7 +608,7 @@ Following parameters are used for parallel learning, and only used for base (soc
- socket time-out in minutes - socket time-out in minutes
- ``machine_list_file``, default=\ ``""``, type=string - ``machine_list_file``, default=\ ``""``, type=string, alias=\ ``mlist``
- file that lists machines for this parallel learning application - file that lists machines for this parallel learning application
......
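The notes on ``bagging_fraction`` and ``bagging_freq`` in this hunk say that each parameter only takes effect when the other is set too. A minimal Python sketch of that rule follows. The helper `bagging_enabled` is hypothetical and not part of LightGBM. It only knows the aliases listed in this diff, and it assumes that "set" means a fraction strictly below `1.0` and a positive frequency.

```python
# Hypothetical helper, not part of LightGBM: encodes the documented rule that
# bagging needs both a bagging fraction and a non-zero bagging frequency.
# Assumption: "set" means 0.0 < fraction < 1.0 and freq > 0.
BAGGING_FRACTION_NAMES = ("bagging_fraction", "sub_row", "subsample")
BAGGING_FREQ_NAMES = ("bagging_freq", "subsample_freq")


def _first_present(params, names, default):
    """Return the value of the first name found in params, else the default."""
    for name in names:
        if name in params:
            return params[name]
    return default


def bagging_enabled(params):
    """Return True if this parameter dict would turn bagging on."""
    fraction = _first_present(params, BAGGING_FRACTION_NAMES, 1.0)  # documented default
    freq = _first_present(params, BAGGING_FREQ_NAMES, 0)  # documented default
    return 0.0 < fraction < 1.0 and freq > 0
```

For example, `bagging_enabled({"subsample": 0.8})` is false because the frequency keeps its default of `0`, while adding `"subsample_freq": 1` makes it true.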
...@@ -115,7 +115,7 @@ Some important parameters: ...@@ -115,7 +115,7 @@ Some important parameters:
- support multi validation data, separate by ``,`` - support multi validation data, separate by ``,``
- ``num_iterations``, default=\ ``100``, type=int, - ``num_iterations``, default=\ ``100``, type=int,
alias=\ ``num_iteration``, ``num_tree``, ``num_trees``, ``num_round``, ``num_rounds`` alias=\ ``num_iteration``, ``num_tree``, ``num_trees``, ``num_round``, ``num_rounds``, ``num_boost_round``
- number of boosting iterations/trees - number of boosting iterations/trees
...@@ -127,7 +127,7 @@ Some important parameters: ...@@ -127,7 +127,7 @@ Some important parameters:
- number of leaves in one tree - number of leaves in one tree
- ``tree_learner``, default=\ ``serial``, type=enum, options=\ ``serial``, ``feature``, ``data`` - ``tree_learner``, default=\ ``serial``, type=enum, options=\ ``serial``, ``feature``, ``data``, alias=\ ``tree``
- ``serial``, single machine tree learner - ``serial``, single machine tree learner
...@@ -154,12 +154,12 @@ Some important parameters: ...@@ -154,12 +154,12 @@ Some important parameters:
- ``< 0`` means no limit - ``< 0`` means no limit
- ``min_data_in_leaf``, default=\ ``20``, type=int, alias=\ ``min_data_per_leaf`` , ``min_data`` - ``min_data_in_leaf``, default=\ ``20``, type=int, alias=\ ``min_data_per_leaf`` , ``min_data``, ``min_child_samples``
- minimal number of data in one leaf. Can use this to deal with over-fitting - minimal number of data in one leaf. Can use this to deal with over-fitting
- ``min_sum_hessian_in_leaf``, default=\ ``1e-3``, type=double, - ``min_sum_hessian_in_leaf``, default=\ ``1e-3``, type=double,
alias=\ ``min_sum_hessian_per_leaf``, ``min_sum_hessian``, ``min_hessian`` alias=\ ``min_sum_hessian_per_leaf``, ``min_sum_hessian``, ``min_hessian``, ``min_child_weight``
- minimal sum hessian in one leaf. Like ``min_data_in_leaf``, can be used to deal with over-fitting - minimal sum hessian in one leaf. Like ``min_data_in_leaf``, can be used to deal with over-fitting
......
...@@ -361,8 +361,8 @@ struct ParameterAlias { ...@@ -361,8 +361,8 @@ struct ParameterAlias {
{ {
{ "config", "config_file" }, { "config", "config_file" },
{ "nthread", "num_threads" }, { "nthread", "num_threads" },
{ "num_thread", "num_threads" },
{ "random_seed", "seed" }, { "random_seed", "seed" },
{ "num_thread", "num_threads" },
{ "boosting", "boosting_type" }, { "boosting", "boosting_type" },
{ "boost", "boosting_type" }, { "boost", "boosting_type" },
{ "application", "objective" }, { "application", "objective" },
...@@ -400,6 +400,7 @@ struct ParameterAlias { ...@@ -400,6 +400,7 @@ struct ParameterAlias {
{ "num_round", "num_iterations" }, { "num_round", "num_iterations" },
{ "num_trees", "num_iterations" }, { "num_trees", "num_iterations" },
{ "num_rounds", "num_iterations" }, { "num_rounds", "num_iterations" },
{ "num_boost_round", "num_iterations" },
{ "sub_row", "bagging_fraction" }, { "sub_row", "bagging_fraction" },
{ "subsample", "bagging_fraction" }, { "subsample", "bagging_fraction" },
{ "subsample_freq", "bagging_freq" }, { "subsample_freq", "bagging_freq" },
...@@ -427,9 +428,9 @@ struct ParameterAlias { ...@@ -427,9 +428,9 @@ struct ParameterAlias {
{ "cat_column", "categorical_column" }, { "cat_column", "categorical_column" },
{ "cat_feature", "categorical_column" }, { "cat_feature", "categorical_column" },
{ "predict_raw_score", "is_predict_raw_score" }, { "predict_raw_score", "is_predict_raw_score" },
{ "predict_leaf_index", "is_predict_leaf_index" },
{ "raw_score", "is_predict_raw_score" }, { "raw_score", "is_predict_raw_score" },
{ "leaf_index", "is_predict_leaf_index" }, { "leaf_index", "is_predict_leaf_index" },
{ "predict_leaf_index", "is_predict_leaf_index" },
{ "contrib", "is_predict_contrib" }, { "contrib", "is_predict_contrib" },
{ "predict_contrib", "is_predict_contrib" }, { "predict_contrib", "is_predict_contrib" },
{ "min_split_gain", "min_gain_to_split" }, { "min_split_gain", "min_gain_to_split" },
...@@ -439,9 +440,9 @@ struct ParameterAlias { ...@@ -439,9 +440,9 @@ struct ParameterAlias {
{ "num_classes", "num_class" }, { "num_classes", "num_class" },
{ "unbalanced_sets", "is_unbalance" }, { "unbalanced_sets", "is_unbalance" },
{ "bagging_fraction_seed", "bagging_seed" }, { "bagging_fraction_seed", "bagging_seed" },
{ "num_boost_round", "num_iterations" },
{ "workers", "machines" }, { "workers", "machines" },
{ "nodes", "machines" }, { "nodes", "machines" },
{ "subsample_for_bin", "bin_construct_sample_cnt" },
}); });
const std::unordered_set<std::string> parameter_set({ const std::unordered_set<std::string> parameter_set({
"config", "config_file", "task", "device", "config", "config_file", "task", "device",
...@@ -457,7 +458,7 @@ struct ParameterAlias { ...@@ -457,7 +458,7 @@ struct ParameterAlias {
"ignore_column", "categorical_column", "is_predict_raw_score", "ignore_column", "categorical_column", "is_predict_raw_score",
"is_predict_leaf_index", "min_gain_to_split", "top_k", "is_predict_leaf_index", "min_gain_to_split", "top_k",
"lambda_l1", "lambda_l2", "num_class", "is_unbalance", "lambda_l1", "lambda_l2", "num_class", "is_unbalance",
"max_depth", "subsample_for_bin", "max_bin", "bagging_seed", "max_depth", "max_bin", "bagging_seed",
"drop_rate", "skip_drop", "max_drop", "uniform_drop", "drop_rate", "skip_drop", "max_drop", "uniform_drop",
"xgboost_dart_mode", "drop_seed", "top_rate", "other_rate", "xgboost_dart_mode", "drop_seed", "top_rate", "other_rate",
"min_data_in_bin", "data_random_seed", "bin_construct_sample_cnt", "min_data_in_bin", "data_random_seed", "bin_construct_sample_cnt",
......
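The alias table in this hunk maps each alias to one canonical parameter name. In this diff, ``subsample_for_bin`` moves out of ``parameter_set`` and into the alias table, which appears to be the fix the commit message calls "fixed ignored subsample_for_bin parameter". The Python sketch below illustrates the lookup with a subset of the entries shown above. It is not the C++ implementation; `resolve_aliases` is a hypothetical name, and letting an explicitly given canonical name win over an alias is an assumption of this sketch.

```python
# Illustrative subset of the alias table from this diff (alias -> canonical name).
PARAMETER_ALIASES = {
    "nthread": "num_threads",
    "num_thread": "num_threads",
    "num_round": "num_iterations",
    "num_trees": "num_iterations",
    "num_rounds": "num_iterations",
    "num_boost_round": "num_iterations",
    "sub_row": "bagging_fraction",
    "subsample": "bagging_fraction",
    "subsample_freq": "bagging_freq",
    "min_split_gain": "min_gain_to_split",
    "subsample_for_bin": "bin_construct_sample_cnt",
}


def resolve_aliases(params):
    """Return a copy of params with known aliases replaced by canonical names.

    Assumption of this sketch: a canonical name given explicitly takes
    precedence over any alias that maps to it.
    """
    # First pass: keep every key that is not an alias.
    resolved = {k: v for k, v in params.items() if k not in PARAMETER_ALIASES}
    # Second pass: fill in canonical names from aliases where still missing.
    for key, value in params.items():
        canonical = PARAMETER_ALIASES.get(key)
        if canonical is not None:
            resolved.setdefault(canonical, value)
    return resolved
```

With the new entry in place, `resolve_aliases({"subsample_for_bin": 50000})` yields `{"bin_construct_sample_cnt": 50000}`. Without it, the key would pass through unchanged.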
...@@ -139,8 +139,8 @@ class LGBMModel(_LGBMModelBase): ...@@ -139,8 +139,8 @@ class LGBMModel(_LGBMModelBase):
def __init__(self, boosting_type="gbdt", num_leaves=31, max_depth=-1, def __init__(self, boosting_type="gbdt", num_leaves=31, max_depth=-1,
learning_rate=0.1, n_estimators=10, max_bin=255, learning_rate=0.1, n_estimators=10, max_bin=255,
subsample_for_bin=50000, objective=None, subsample_for_bin=200000, objective=None,
min_split_gain=0., min_child_weight=5, min_child_samples=10, min_split_gain=0., min_child_weight=1e-3, min_child_samples=20,
subsample=1., subsample_freq=1, colsample_bytree=1., subsample=1., subsample_freq=1, colsample_bytree=1.,
reg_alpha=0., reg_lambda=0., random_state=0, reg_alpha=0., reg_lambda=0., random_state=0,
n_jobs=-1, silent=True, **kwargs): n_jobs=-1, silent=True, **kwargs):
...@@ -171,9 +171,9 @@ class LGBMModel(_LGBMModelBase): ...@@ -171,9 +171,9 @@ class LGBMModel(_LGBMModelBase):
default: 'binary' for LGBMClassifier, 'lambdarank' for LGBMRanker. default: 'binary' for LGBMClassifier, 'lambdarank' for LGBMRanker.
min_split_gain : float, optional (default=0.) min_split_gain : float, optional (default=0.)
Minimum loss reduction required to make a further partition on a leaf node of the tree. Minimum loss reduction required to make a further partition on a leaf node of the tree.
min_child_weight : int, optional (default=5) min_child_weight : float, optional (default=1e-3)
Minimum sum of instance weight(hessian) needed in a child(leaf). Minimum sum of instance weight(hessian) needed in a child(leaf).
min_child_samples : int, optional (default=10) min_child_samples : int, optional (default=20)
Minimum number of data need in a child(leaf). Minimum number of data need in a child(leaf).
subsample : float, optional (default=1.) subsample : float, optional (default=1.)
Subsample ratio of the training instance. Subsample ratio of the training instance.
......
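The changed defaults in this hunk bring the scikit-learn wrapper in line with the defaults documented for the core parameters. The sketch below checks that alignment with plain dictionaries copied from this diff. It does not import LightGBM, and `mismatched_defaults` is a hypothetical helper name.

```python
# Core parameter defaults as documented in this diff.
CORE_DEFAULTS = {
    "bin_construct_sample_cnt": 200000,
    "min_sum_hessian_in_leaf": 1e-3,
    "min_data_in_leaf": 20,
}

# Wrapper argument -> core parameter, per the aliases added to the docs.
WRAPPER_TO_CORE = {
    "subsample_for_bin": "bin_construct_sample_cnt",
    "min_child_weight": "min_sum_hessian_in_leaf",
    "min_child_samples": "min_data_in_leaf",
}

# Wrapper defaults before and after this commit, from the __init__ signature.
OLD_WRAPPER_DEFAULTS = {"subsample_for_bin": 50000, "min_child_weight": 5, "min_child_samples": 10}
NEW_WRAPPER_DEFAULTS = {"subsample_for_bin": 200000, "min_child_weight": 1e-3, "min_child_samples": 20}


def mismatched_defaults(wrapper_defaults):
    """Return the sorted wrapper argument names whose default differs from the core default."""
    return sorted(
        name
        for name, value in wrapper_defaults.items()
        if CORE_DEFAULTS[WRAPPER_TO_CORE[name]] != value
    )
```

Calling `mismatched_defaults(OLD_WRAPPER_DEFAULTS)` lists all three arguments, and `mismatched_defaults(NEW_WRAPPER_DEFAULTS)` returns an empty list.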