Commit e5eb8560 authored by Nikita Titov, committed by Guolin Ke

[python] [docs] fixed objective in sklearn wrapper; added missed objectives & metrics to docs (#1059)

* added missed aliases for task parameter

* fixed indents

* added missed aliases and options for tree_learner parameter

* added missed objectives to docs

* fixed typo in Poisson parameter and its description

* fixed model_format parameter description

* added missed metrics to docs

* fixed sklearn objective

* fixed set_params

* fixed docs

* added missed options to objectives

* added note about ignore_column (#1061)
parent 3d65d065
@@ -39,22 +39,22 @@ Core Parameters
   - path of config file
-- ``task``, default=\ ``train``, type=enum, options=\ ``train``, ``prediction``
+- ``task``, default=\ ``train``, type=enum, options=\ ``train``, ``predict``, ``convert_model``
-  - ``train`` for training
+  - ``train``, alias=\ ``training``, for training
-  - ``prediction`` for prediction.
+  - ``predict``, alias=\ ``prediction``, ``test``, for prediction.
-  - ``convert_model`` for converting model file into if-else format, see more information in `Convert model parameters <#convert-model-parameters>`__
+  - ``convert_model``, for converting model file into if-else format, see more information in `Convert model parameters <#convert-model-parameters>`__
 - ``application``, default=\ ``regression``, type=enum,
-  options=\ ``regression``, ``regression_l2``, ``regression_l1``, ``huber``, ``fair``, ``poisson``, ``quantile``, ``quantile_l2``,
-  ``binary``, ``lambdarank``, ``multiclass``,
+  options=\ ``regression``, ``regression_l1``, ``huber``, ``fair``, ``poisson``, ``quantile``, ``quantile_l2``,
+  ``binary``, ``multiclass``, ``multiclassova``, ``xentropy``, ``xentlambda``, ``lambdarank``,
   alias=\ ``objective``, ``app``
-  - ``regression``, regression application
+  - regression application
-  - ``regression_l2``, L2 loss, alias=\ ``mean_squared_error``, ``mse``
+  - ``regression_l2``, L2 loss, alias=\ ``regression``, ``mean_squared_error``, ``mse``
   - ``regression_l1``, L1 loss, alias=\ ``mean_absolute_error``, ``mae``
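Read as a diff, the hunk above renames the ``prediction`` task to ``predict`` and gives every task an alias. The resulting lookup can be sketched in a few lines of Python; this is a hypothetical helper for illustration only, since LightGBM's real alias handling lives in its C++ config parser:

```python
# Hypothetical sketch of the ``task`` aliases documented in the hunk above.
TASK_ALIASES = {
    "train": "train",
    "training": "train",        # alias added in this change
    "predict": "predict",
    "prediction": "predict",    # kept as an alias of the renamed task
    "test": "predict",          # alias added in this change
    "convert_model": "convert_model",
}


def resolve_task(value):
    """Map a user-supplied task string to its canonical name."""
    try:
        return TASK_ALIASES[value]
    except KeyError:
        raise ValueError("unknown task: %s" % value)
```

With this table, a config written before the rename (``task=prediction``) still resolves to the new canonical ``predict`` task.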
@@ -68,7 +68,21 @@ Core Parameters
   - ``quantile_l2``, like the ``quantile``, but L2 loss is used instead
-  - ``binary``, binary classification application
+  - ``binary``, binary `log loss`_ classification application
+  - multi-class classification application
+  - ``multiclass``, `softmax`_ objective function, ``num_class`` should be set as well
+  - ``multiclassova``, `One-vs-All`_ binary objective function, ``num_class`` should be set as well
+  - cross-entropy application
+  - ``xentropy``, objective function for cross-entropy (with optional linear weights), alias=\ ``cross_entropy``
+  - ``xentlambda``, alternative parameterization of cross-entropy, alias=\ ``cross_entropy_lambda``
+  - the label is anything in interval [0, 1]
   - ``lambdarank``, `lambdarank`_ application
@@ -76,8 +90,6 @@ Core Parameters
   - ``label_gain`` can be used to set the gain(weight) of ``int`` label
-  - ``multiclass``, multi-class classification application, ``num_class`` should be set as well
 - ``boosting``, default=\ ``gbdt``, type=enum,
   options=\ ``gbdt``, ``rf``, ``dart``, ``goss``,
   alias=\ ``boost``, ``boosting_type``
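The objective docs above state twice that ``multiclass`` and ``multiclassova`` require ``num_class``. A minimal sketch of that constraint as a parameter check (a hypothetical helper, not LightGBM's actual validation code):

```python
def check_objective(params):
    """Hypothetical validation of the objectives documented above:
    ``multiclass`` and ``multiclassova`` require ``num_class`` to be set.

    ``params`` is a plain dict of LightGBM-style parameters.
    """
    objective = params.get("objective", "regression")  # documented default
    if objective in ("multiclass", "multiclassova") and "num_class" not in params:
        raise ValueError("%s requires num_class to be set" % objective)
    return objective
```

A call like `check_objective({"objective": "multiclass"})` would fail fast, mirroring the error LightGBM itself raises later during training.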
@@ -120,13 +132,15 @@ Core Parameters
   - number of leaves in one tree
-- ``tree_learner``, default=\ ``serial``, type=enum, options=\ ``serial``, ``feature``, ``data``, alias=\ ``tree``
+- ``tree_learner``, default=\ ``serial``, type=enum, options=\ ``serial``, ``feature``, ``data``, ``voting``, alias=\ ``tree``
   - ``serial``, single machine tree learner
-  - ``feature``, feature parallel tree learner
+  - ``feature``, alias=\ ``feature_parallel``, feature parallel tree learner
-  - ``data``, data parallel tree learner
+  - ``data``, alias=\ ``data_parallel``, data parallel tree learner
+  - ``voting``, alias=\ ``voting_parallel``, voting parallel tree learner
   - refer to `Parallel Learning Guide <./Parallel-Learning-Guide.rst>`__ to get more details
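The ``tree_learner`` options and aliases added above can be sketched as a small normalizer (hypothetical code; the real resolution happens in LightGBM's config layer):

```python
# Aliases for ``tree_learner`` as documented in the hunk above.
TREE_LEARNER_ALIASES = {
    "feature_parallel": "feature",
    "data_parallel": "data",
    "voting_parallel": "voting",
}


def resolve_tree_learner(value):
    """Normalize a ``tree_learner`` value to its canonical option name,
    rejecting anything outside the documented option set."""
    canonical = TREE_LEARNER_ALIASES.get(value, value)
    if canonical not in ("serial", "feature", "data", "voting"):
        raise ValueError("unknown tree_learner: %s" % value)
    return canonical
```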
@@ -321,7 +335,7 @@ IO Parameters
   - file name of prediction result in ``prediction`` task
-- ``model_format``, default=\ ``text``, type=string
+- ``model_format``, default=\ ``text``, type=multi-enum, options=\ ``text``, ``proto``
   - format to save and load model
@@ -406,6 +420,8 @@ IO Parameters
   - add a prefix ``name:`` for column name, e.g. ``ignore_column=name:c1,c2,c3`` means c1, c2 and c3 will be ignored
+  - **Note**: works only in CLI-version
+  - **Note**: index starts from ``0``. And it doesn't count the label column
 - ``categorical_feature``, default=\ ``""``, type=string, alias=\ ``categorical_column``, ``cat_feature``, ``cat_column``
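The two notes added above pin down how ``ignore_column`` is interpreted: a ``name:`` prefix selects columns by header name, while plain values are 0-based indices that skip the label column. A small sketch of that interpretation (hypothetical helper, written only to illustrate the indexing rule):

```python
def columns_to_ignore(header, ignore_column, label_name="label"):
    """Resolve an ``ignore_column`` value against a CSV header.

    ``name:`` prefix -> match by column name; otherwise 0-based indices
    counted over the non-label columns, per the notes above.
    """
    if ignore_column.startswith("name:"):
        return ignore_column[len("name:"):].split(",")
    features = [c for c in header if c != label_name]  # label is not counted
    return [features[int(i)] for i in ignore_column.split(",")]
```

For a file with header ``label,c1,c2,c3``, index ``0`` therefore refers to ``c1``, not to ``label``.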
@@ -507,9 +523,9 @@ Objective Parameters
   - parameter to control the width of Gaussian function. Will be used in ``regression_l1`` and ``huber`` losses
-- ``poission_max_delta_step``, default=\ ``0.7``, type=double
+- ``poisson_max_delta_step``, default=\ ``0.7``, type=double
-  - parameter used to safeguard optimization
+  - parameter for `Poisson regression`_ to safeguard optimization
 - ``scale_pos_weight``, default=\ ``1.0``, type=double
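For context on what the renamed ``poisson_max_delta_step`` safeguards: in gradient-boosted Poisson regression (the same scheme XGBoost uses for its analogous ``max_delta_step``), the constant is commonly added to the raw score inside the hessian, which inflates the second-order term and caps the size of each leaf update. A sketch under that assumption; this is not lifted from LightGBM's source:

```python
import math


def poisson_grad_hess(score, label, max_delta_step=0.7):
    """Gradient/hessian of the Poisson negative log-likelihood on a raw
    score, with ``max_delta_step`` inflating the hessian as a safeguard
    (assumed to mirror the common XGBoost-style formulation)."""
    grad = math.exp(score) - label
    hess = math.exp(score + max_delta_step)
    return grad, hess
```

Because the hessian is multiplied by ``exp(max_delta_step) > 1``, the Newton step ``-grad / hess`` shrinks, keeping optimization stable when ``exp(score)`` is far from the label.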
@@ -579,13 +595,18 @@ Metric Parameters
   - ``binary_logloss``, `log loss`_
-  - ``binary_error``.
-    For one sample: ``0`` for correct classification, ``1`` for error classification
+  - ``binary_error``, for one sample: ``0`` for correct classification, ``1`` for error classification
   - ``multi_logloss``, log loss for multi-class classification
   - ``multi_error``, error rate for multi-class classification
+  - ``xentropy``, cross-entropy (with optional linear weights), alias=\ ``cross_entropy``
+  - ``xentlambda``, "intensity-weighted" cross-entropy, alias=\ ``cross_entropy_lambda``
+  - ``kldiv``, `Kullback-Leibler divergence`_, alias=\ ``kullback_leibler``
   - support multi metrics, separated by ``,``
 - ``metric_freq``, default=\ ``1``, type=int
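The docs above say multiple metrics are passed as one comma-separated string, and this change adds aliases for the three new metrics. A sketch combining both (hypothetical parsing helper; LightGBM does this internally):

```python
# Metric aliases introduced by this change, per the doc lines above.
METRIC_ALIASES = {
    "cross_entropy": "xentropy",
    "cross_entropy_lambda": "xentlambda",
    "kullback_leibler": "kldiv",
}


def parse_metrics(metric_value):
    """Split a multi-metric string like "l2,binary_logloss" on commas
    and fold the documented aliases into canonical metric names."""
    names = [m.strip() for m in metric_value.split(",") if m.strip()]
    return [METRIC_ALIASES.get(n, n) for n in names]
```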
@@ -749,3 +770,9 @@ You can specify query/group id in data file now. Please refer to parameter ``gr
 .. _AUC: https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve
 .. _log loss: https://www.kaggle.com/wiki/LogLoss
+.. _softmax: https://en.wikipedia.org/wiki/Softmax_function
+.. _One-vs-All: https://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest
+.. _Kullback-Leibler divergence: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
@@ -163,7 +163,7 @@ class LGBMModel(_LGBMModelBase):
     objective : string, callable or None, optional (default=None)
         Specify the learning task and the corresponding learning objective or
         a custom objective function to be used (see note below).
-        default: 'binary' for LGBMClassifier, 'lambdarank' for LGBMRanker.
+        default: 'regression' for LGBMRegressor, 'binary' or 'multiclass' for LGBMClassifier, 'lambdarank' for LGBMRanker.
     min_split_gain : float, optional (default=0.)
         Minimum loss reduction required to make a further partition on a leaf node of the tree.
     min_child_weight : float, optional (default=1e-3)
@@ -264,7 +264,7 @@ class LGBMModel(_LGBMModelBase):
         self._best_score = None
         self._best_iteration = None
         self._other_params = {}
-        self._objective = None
+        self._objective = objective
         self._n_features = None
         self._classes = None
         self._n_classes = None
@@ -285,6 +285,8 @@ class LGBMModel(_LGBMModelBase):
     def set_params(self, **params):
         for key, value in params.items():
             setattr(self, key, value)
+            if hasattr(self, '_' + key):
+                setattr(self, '_' + key, value)
             self._other_params[key] = value
         return self
@@ -370,8 +372,6 @@ class LGBMModel(_LGBMModelBase):
         For multi-class task, the y_pred is group by class_id first, then group by row_id.
         If you want to get i-th row y_pred in j-th class, the access way is y_pred[j * num_data + i].
         """
-        if not hasattr(self, '_objective'):
-            self._objective = self.objective
         if self._objective is None:
             if isinstance(self, LGBMRegressor):
                 self._objective = "regression"
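The removed ``hasattr`` guard is unnecessary now that ``__init__`` stores the objective directly; what remains is the per-estimator default resolution. It can be sketched without the sklearn class hierarchy by keying on a plain string instead of ``isinstance`` checks (a hypothetical simplification of the logic above):

```python
def default_objective(estimator_kind, objective=None):
    """Sketch of the default-objective resolution above: an explicit
    objective wins; otherwise the default depends on the estimator kind
    ('regressor', 'classifier', or 'ranker')."""
    if objective is not None:
        return objective
    defaults = {
        "regressor": "regression",
        "classifier": "binary",     # may later switch to 'multiclass'
        "ranker": "lambdarank",
    }
    return defaults[estimator_kind]
```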
@@ -633,6 +633,7 @@ class LGBMClassifier(LGBMModel, _LGBMClassifierBase):
         self._n_classes = len(self._classes)
         if self._n_classes > 2:
             # Switch to using a multiclass objective in the underlying LGBM instance
-            self._objective = "multiclass"
+            if self._objective != "multiclassova" and not callable(self._objective):
+                self._objective = "multiclass"
             if eval_metric == 'logloss' or eval_metric == 'binary_logloss':
                 eval_metric = "multi_logloss"
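The classifier fix above stops the wrapper from clobbering a user-chosen ``multiclassova`` or custom callable objective when more than two classes are found. Extracted into a standalone function (a hypothetical sketch of that branch, not the wrapper itself):

```python
def resolve_classifier_objective(n_classes, objective, eval_metric=None):
    """Sketch of the multiclass switch above: with more than two classes,
    fall back to 'multiclass' unless the user chose 'multiclassova' or a
    custom callable; remap binary log loss to its multiclass variant."""
    if n_classes > 2:
        if objective != "multiclassova" and not callable(objective):
            objective = "multiclass"
        if eval_metric in ("logloss", "binary_logloss"):
            eval_metric = "multi_logloss"
    return objective, eval_metric
```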
@@ -39,7 +39,7 @@ Metric* Metric::CreateMetric(const std::string& type, const MetricConfig& config
     return new MultiErrorMetric(config);
   } else if (type == std::string("xentropy") || type == std::string("cross_entropy")) {
     return new CrossEntropyMetric(config);
-  } else if (type == std::string("xentlambda")) {
+  } else if (type == std::string("xentlambda") || type == std::string("cross_entropy_lambda")) {
     return new CrossEntropyLambdaMetric(config);
   } else if (type == std::string("kldiv") || type == std::string("kullback_leibler")) {
     return new KullbackLeiblerDivergence(config);
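The C++ hunk above wires the documented ``cross_entropy_lambda`` alias into the ``Metric::CreateMetric`` factory. The same dispatch can be mirrored in Python as a lookup from (aliased) metric name to the metric class it constructs, which makes the one-line change easy to verify at a glance:

```python
def create_metric_name(metric_type):
    """Python mirror of the ``Metric::CreateMetric`` dispatch above,
    returning the C++ class name each metric string resolves to
    (None for types this sketch does not cover)."""
    table = {
        "xentropy": "CrossEntropyMetric",
        "cross_entropy": "CrossEntropyMetric",
        "xentlambda": "CrossEntropyLambdaMetric",
        "cross_entropy_lambda": "CrossEntropyLambdaMetric",  # alias added here
        "kldiv": "KullbackLeiblerDivergence",
        "kullback_leibler": "KullbackLeiblerDivergence",
    }
    return table.get(metric_type)
```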