Commit e5eb8560 authored by Nikita Titov, committed by Guolin Ke

[python] [docs] fixed objective in sklearn wrapper; added missed objectives & metrics to docs (#1059)

* added missed aliases for task parameter

* fixed indents

* added missed aliases and options for tree_learner parameter

* added missed objectives to docs

* fixed typo in Poisson parameter and its description

* fixed model_format parameter description

* added missed metrics to docs

* fixed sklearn objective

* fixed set_params

* fixed docs

* added missed options to objectives

* added note about ignore_column (#1061)
parent 3d65d065
@@ -39,22 +39,22 @@ Core Parameters
 - path of config file
-- ``task``, default=\ ``train``, type=enum, options=\ ``train``, ``prediction``
+- ``task``, default=\ ``train``, type=enum, options=\ ``train``, ``predict``, ``convert_model``
-- ``train`` for training
+- ``train``, alias=\ ``training``, for training
-- ``prediction`` for prediction.
+- ``predict``, alias=\ ``prediction``, ``test``, for prediction.
-- ``convert_model`` for converting model file into if-else format, see more information in `Convert model parameters <#convert-model-parameters>`__
+- ``convert_model``, for converting model file into if-else format, see more information in `Convert model parameters <#convert-model-parameters>`__
 - ``application``, default=\ ``regression``, type=enum,
-  options=\ ``regression``, ``regression_l2``, ``regression_l1``, ``huber``, ``fair``, ``poisson``, ``quantile``, ``quantile_l2``,
+  options=\ ``regression``, ``regression_l1``, ``huber``, ``fair``, ``poisson``, ``quantile``, ``quantile_l2``,
-  ``binary``, ``lambdarank``, ``multiclass``,
+  ``binary``, ``multiclass``, ``multiclassova``, ``xentropy``, ``xentlambda``, ``lambdarank``,
   alias=\ ``objective``, ``app``
-- ``regression``, regression application
+- regression application
-- ``regression_l2``, L2 loss, alias=\ ``mean_squared_error``, ``mse``
+- ``regression_l2``, L2 loss, alias=\ ``regression``, ``mean_squared_error``, ``mse``
 - ``regression_l1``, L1 loss, alias=\ ``mean_absolute_error``, ``mae``
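The hunk above mainly adds accepted aliases: value aliases for ``task`` (``training``, ``prediction``, ``test``) and parameter-name aliases for ``application`` (``objective``, ``app``). A minimal sketch of how such alias tables can be resolved to canonical names — the dictionaries below are illustrative only, not LightGBM's actual resolution code:

```python
# Hypothetical alias tables mirroring the documented aliases above.
PARAM_ALIASES = {
    "objective": "application",
    "app": "application",
    "tree": "tree_learner",
}
TASK_ALIASES = {
    "train": "train", "training": "train",
    "predict": "predict", "prediction": "predict", "test": "predict",
}

def normalize(params):
    """Map alias keys and task-value aliases to canonical spellings."""
    out = {}
    for key, value in params.items():
        out[PARAM_ALIASES.get(key, key)] = value
    if "task" in out:
        out["task"] = TASK_ALIASES[out["task"]]
    return out
```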
@@ -68,7 +68,21 @@ Core Parameters
 - ``quantile_l2``, like the ``quantile``, but L2 loss is used instead
-- ``binary``, binary classification application
+- ``binary``, binary `log loss`_ classification application
+- multi-class classification application
+- ``multiclass``, `softmax`_ objective function, ``num_class`` should be set as well
+- ``multiclassova``, `One-vs-All`_ binary objective function, ``num_class`` should be set as well
+- cross-entropy application
+- ``xentropy``, objective function for cross-entropy (with optional linear weights), alias=\ ``cross_entropy``
+- ``xentlambda``, alternative parameterization of cross-entropy, alias=\ ``cross_entropy_lambda``
+- the label is anything in interval [0, 1]
 - ``lambdarank``, `lambdarank`_ application
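The practical difference between the newly documented ``multiclass`` and ``multiclassova`` objectives is how raw scores turn into class probabilities: softmax couples all classes into one distribution, while One-vs-All applies an independent sigmoid per class. A rough plain-Python illustration of the two mappings (not LightGBM's implementation):

```python
import math

def softmax(scores):
    # multiclass: coupled probabilities that sum to 1
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def one_vs_all(scores):
    # multiclassova: one independent binary sigmoid per class;
    # the values need not sum to 1 without renormalization
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]
```

In both cases ``num_class`` tells the booster how many per-class score columns to produce.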
@@ -76,8 +90,6 @@ Core Parameters
 - ``label_gain`` can be used to set the gain(weight) of ``int`` label
-- ``multiclass``, multi-class classification application, ``num_class`` should be set as well
 - ``boosting``, default=\ ``gbdt``, type=enum,
   options=\ ``gbdt``, ``rf``, ``dart``, ``goss``,
   alias=\ ``boost``, ``boosting_type``
@@ -120,13 +132,15 @@ Core Parameters
 - number of leaves in one tree
-- ``tree_learner``, default=\ ``serial``, type=enum, options=\ ``serial``, ``feature``, ``data``, alias=\ ``tree``
+- ``tree_learner``, default=\ ``serial``, type=enum, options=\ ``serial``, ``feature``, ``data``, ``voting``, alias=\ ``tree``
 - ``serial``, single machine tree learner
-- ``feature``, feature parallel tree learner
+- ``feature``, alias=\ ``feature_parallel``, feature parallel tree learner
-- ``data``, data parallel tree learner
+- ``data``, alias=\ ``data_parallel``, data parallel tree learner
+- ``voting``, alias=\ ``voting_parallel``, voting parallel tree learner
 - refer to `Parallel Learning Guide <./Parallel-Learning-Guide.rst>`__ to get more details
@@ -321,7 +335,7 @@ IO Parameters
 - file name of prediction result in ``prediction`` task
-- ``model_format``, default=\ ``text``, type=string
+- ``model_format``, default=\ ``text``, type=multi-enum, options=\ ``text``, ``proto``
 - format to save and load model
@@ -406,6 +420,8 @@ IO Parameters
 - add a prefix ``name:`` for column name, e.g. ``ignore_column=name:c1,c2,c3`` means c1, c2 and c3 will be ignored
+- **Note**: works only in CLI-version
 - **Note**: index starts from ``0``. And it doesn't count the label column
 - ``categorical_feature``, default=\ ``""``, type=string, alias=\ ``categorical_column``, ``cat_feature``, ``cat_column``
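The ``name:`` prefix distinguishes column names from zero-based indices in ``ignore_column`` (and ``categorical_feature``). A small sketch of the parsing rule exactly as documented above — written from the description, not taken from LightGBM's source:

```python
def parse_column_spec(spec, header):
    """Return zero-based column indices for a spec such as
    'name:c1,c2,c3' (by header name) or '0,1,2' (by index;
    per the note above, the label column is not counted)."""
    if spec.startswith("name:"):
        names = spec[len("name:"):].split(",")
        return [header.index(n) for n in names]
    return [int(i) for i in spec.split(",")]
```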
@@ -507,9 +523,9 @@ Objective Parameters
 - parameter to control the width of Gaussian function. Will be used in ``regression_l1`` and ``huber`` losses
-- ``poission_max_delta_step``, default=\ ``0.7``, type=double
+- ``poisson_max_delta_step``, default=\ ``0.7``, type=double
-- parameter used to safeguard optimization
+- parameter for `Poisson regression`_ to safeguard optimization
 - ``scale_pos_weight``, default=\ ``1.0``, type=double
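For the ``poisson`` objective the raw score models the log of the expected count, and ``poisson_max_delta_step`` inflates the second-order statistic to keep the optimization step bounded. A sketch of the standard per-sample gradient/Hessian of the Poisson negative log-likelihood with this safeguard applied — a derivation-level illustration, not code lifted from LightGBM:

```python
import math

def poisson_grad_hess(raw_score, label, max_delta_step=0.7):
    """Gradient and Hessian of the Poisson NLL w.r.t. the raw
    (log-mean) score. The Hessian is scaled by exp(max_delta_step),
    which caps the effective Newton step for large predictions."""
    grad = math.exp(raw_score) - label
    hess = math.exp(raw_score + max_delta_step)
    return grad, hess
```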
@@ -579,13 +595,18 @@ Metric Parameters
 - ``binary_logloss``, `log loss`_
-- ``binary_error``.
-  For one sample: ``0`` for correct classification, ``1`` for error classification
+- ``binary_error``, for one sample: ``0`` for correct classification, ``1`` for error classification
 - ``multi_logloss``, log loss for multi-class classification
 - ``multi_error``, error rate for multi-class classification
+- ``xentropy``, cross-entropy (with optional linear weights), alias=\ ``cross_entropy``
+- ``xentlambda``, "intensity-weighted" cross-entropy, alias=\ ``cross_entropy_lambda``
+- ``kldiv``, `Kullback-Leibler divergence`_, alias=\ ``kullback_leibler``
 - support multiple metrics, separated by ``,``
 - ``metric_freq``, default=\ ``1``, type=int
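The metrics documented above all reduce to simple per-sample formulas averaged over the dataset. Reference implementations of two of them, ``binary_error`` and ``kldiv``, written from the standard definitions rather than from LightGBM's source (the 0.5 threshold and the clipping epsilon are conventional choices, not values taken from the library):

```python
import math

def binary_error(labels, probs, threshold=0.5):
    # fraction of samples whose thresholded prediction differs from the label
    wrong = sum(1 for y, p in zip(labels, probs) if (p > threshold) != bool(y))
    return wrong / len(labels)

def kl_divergence(labels, probs, eps=1e-15):
    # mean per-sample KL divergence between the label distribution
    # and the predicted Bernoulli distribution
    def kl(y, p):
        p = min(max(p, eps), 1 - eps)
        y = min(max(y, eps), 1 - eps)
        return y * math.log(y / p) + (1 - y) * math.log((1 - y) / (1 - p))
    return sum(kl(y, p) for y, p in zip(labels, probs)) / len(labels)
```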
@@ -749,3 +770,9 @@ You can specific query/group id in data file now. Please refer to parameter ``gr
 .. _AUC: https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve
 .. _log loss: https://www.kaggle.com/wiki/LogLoss
+.. _softmax: https://en.wikipedia.org/wiki/Softmax_function
+.. _One-vs-All: https://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest
+.. _Kullback-Leibler divergence: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
@@ -163,7 +163,7 @@ class LGBMModel(_LGBMModelBase):
     objective : string, callable or None, optional (default=None)
         Specify the learning task and the corresponding learning objective or
         a custom objective function to be used (see note below).
-        default: 'binary' for LGBMClassifier, 'lambdarank' for LGBMRanker.
+        default: 'regression' for LGBMRegressor, 'binary' or 'multiclass' for LGBMClassifier, 'lambdarank' for LGBMRanker.
     min_split_gain : float, optional (default=0.)
         Minimum loss reduction required to make a further partition on a leaf node of the tree.
     min_child_weight : float, optional (default=1e-3)
@@ -264,7 +264,7 @@ class LGBMModel(_LGBMModelBase):
         self._best_score = None
         self._best_iteration = None
         self._other_params = {}
-        self._objective = None
+        self._objective = objective
         self._n_features = None
         self._classes = None
         self._n_classes = None
@@ -285,6 +285,8 @@ class LGBMModel(_LGBMModelBase):
     def set_params(self, **params):
         for key, value in params.items():
             setattr(self, key, value)
+            if hasattr(self, '_' + key):
+                setattr(self, '_' + key, value)
             self._other_params[key] = value
         return self
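The ``set_params`` fix mirrors every public parameter into its ``_``-prefixed private twin when one exists, so a value set after construction is actually the one consumed at fit time. The same pattern in isolation, using a simplified stand-in class rather than the real ``LGBMModel``:

```python
class Model:
    """Toy estimator illustrating public/private attribute mirroring."""

    def __init__(self, objective=None):
        self.objective = objective
        self._objective = objective  # private copy consumed by fit()

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
            # keep the private mirror in sync, as in the patch above
            if hasattr(self, '_' + key):
                setattr(self, '_' + key, value)
        return self
```

Without the mirroring, ``set_params(objective=...)`` would update only the public attribute and the stale ``_objective`` would silently win.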
@@ -370,8 +372,6 @@ class LGBMModel(_LGBMModelBase):
         For multi-class task, the y_pred is group by class_id first, then group by row_id.
         If you want to get i-th row y_pred in j-th class, the access way is y_pred[j * num_data + i].
         """
-        if not hasattr(self, '_objective'):
-            self._objective = self.objective
         if self._objective is None:
             if isinstance(self, LGBMRegressor):
                 self._objective = "regression"
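The removed ``hasattr`` workaround is no longer needed because ``__init__`` now stores ``objective`` directly; what remains is plain per-estimator defaulting, matching the docstring ('regression' / 'binary' or 'multiclass' / 'lambdarank'). Sketched with stand-in classes, not the real estimators:

```python
class Base:
    """Simplified stand-in for LGBMModel's objective handling."""

    default_objective = None

    def __init__(self, objective=None):
        self._objective = objective

    def resolve_objective(self):
        # fill in the subclass default only when the user passed nothing
        if self._objective is None:
            self._objective = self.default_objective
        return self._objective

class Regressor(Base):
    default_objective = "regression"

class Ranker(Base):
    default_objective = "lambdarank"
```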
@@ -633,6 +633,7 @@ class LGBMClassifier(LGBMModel, _LGBMClassifierBase):
         self._n_classes = len(self._classes)
         if self._n_classes > 2:
             # Switch to using a multiclass objective in the underlying LGBM instance
+            if self._objective != "multiclassova" and not callable(self._objective):
-            self._objective = "multiclass"
+                self._objective = "multiclass"
             if eval_metric == 'logloss' or eval_metric == 'binary_logloss':
                 eval_metric = "multi_logloss"
@@ -39,7 +39,7 @@ Metric* Metric::CreateMetric(const std::string& type, const MetricConfig& config
     return new MultiErrorMetric(config);
   } else if (type == std::string("xentropy") || type == std::string("cross_entropy")) {
     return new CrossEntropyMetric(config);
-  } else if (type == std::string("xentlambda")) {
+  } else if (type == std::string("xentlambda") || type == std::string("cross_entropy_lambda")) {
     return new CrossEntropyLambdaMetric(config);
   } else if (type == std::string("kldiv") || type == std::string("kullback_leibler")) {
     return new KullbackLeiblerDivergence(config);
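The C++ change simply teaches the metric factory a second accepted spelling, ``cross_entropy_lambda``, in its string dispatch. The same alias-aware factory shape, sketched in Python with placeholder metric classes so the dispatch logic stands alone:

```python
class CrossEntropyMetric:
    pass

class CrossEntropyLambdaMetric:
    pass

# canonical names and aliases map to the same constructor,
# mirroring the chain of string comparisons in CreateMetric
METRIC_FACTORY = {
    "xentropy": CrossEntropyMetric,
    "cross_entropy": CrossEntropyMetric,
    "xentlambda": CrossEntropyLambdaMetric,
    "cross_entropy_lambda": CrossEntropyLambdaMetric,  # alias added by the patch
}

def create_metric(name):
    try:
        return METRIC_FACTORY[name]()
    except KeyError:
        raise ValueError("unknown metric: " + name)
```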