Unverified commit d92d8444, authored by Nikita Titov, committed by GitHub

[python][R-package][docs] fix support of XE_NDCG_MART obj in language wrappers and docs (#2726)

parent 5de42f84
@@ -209,6 +209,11 @@ lgb.check.obj <- function(params, obj) {
     , "mape"
     , "gamma"
     , "tweedie"
+    , "rank_xendcg"
+    , "xendcg"
+    , "xe_ndcg"
+    , "xe_ndcg_mart"
+    , "xendcg_mart"
   )
   # Check whether the objective is empty or not, and take it from params if needed
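The R hunk above whitelists every accepted spelling of the new objective. As a standalone sketch (in Python, for illustration only; the function name `is_xendcg_objective` is hypothetical, not LightGBM API), the check reduces to membership in a hard-coded alias set:

```python
# All aliases of the XE_NDCG_MART ranking objective, as added in this commit.
RANKING_OBJECTIVE_ALIASES = {
    "rank_xendcg", "xendcg", "xe_ndcg", "xe_ndcg_mart", "xendcg_mart",
}

def is_xendcg_objective(obj):
    """Return True if `obj` names the XE_NDCG_MART objective under any alias."""
    return obj.lower() in RANKING_OBJECTIVE_ALIASES

print(is_xendcg_objective("XE_NDCG_MART"))  # True
print(is_xendcg_objective("lambdarank"))    # False
```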
@@ -91,17 +91,13 @@ Core Parameters
   - label is anything in interval [0, 1]
-- ``lambdarank``, `lambdarank <https://papers.nips.cc/paper/2971-learning-to-rank-with-nonsmooth-cost-functions.pdf>`__ application
-  - label should be ``int`` type in lambdarank tasks, and larger number represents the higher relevance (e.g. 0:bad, 1:fair, 2:good, 3:perfect)
-  - `label_gain <#objective-parameters>`__ can be used to set the gain (weight) of ``int`` label
-  - all values in ``label`` must be smaller than number of elements in ``label_gain``
-- ``rank_xendcg``, `XE_NDCG_MART <https://arxiv.org/abs/1911.09798>`__ ranking objective function, aliases: ``xendcg``, ``xe_ndcg``, ``xe_ndcg_mart``, ``xendcg_mart``
-  - to obtain reproducible results, you should disable parallelism by setting ``num_threads`` to 1
+- ranking application
+  - ``lambdarank``, `lambdarank <https://papers.nips.cc/paper/2971-learning-to-rank-with-nonsmooth-cost-functions.pdf>`__ objective. `label_gain <#objective-parameters>`__ can be used to set the gain (weight) of ``int`` label and all values in ``label`` must be smaller than number of elements in ``label_gain``
+  - ``rank_xendcg``, `XE_NDCG_MART <https://arxiv.org/abs/1911.09798>`__ ranking objective function. To obtain reproducible results, you should disable parallelism by setting ``num_threads`` to 1, aliases: ``xendcg``, ``xe_ndcg``, ``xe_ndcg_mart``, ``xendcg_mart``
+  - label should be ``int`` type, and larger number represents the higher relevance (e.g. 0:bad, 1:fair, 2:good, 3:perfect)
 - ``boosting`` :raw-html:`<a id="boosting" title="Permalink to this parameter" href="#boosting">&#x1F517;&#xFE0E;</a>`, default = ``gbdt``, type = enum, options: ``gbdt``, ``rf``, ``dart``, ``goss``, aliases: ``boosting_type``, ``boost``
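The ``label_gain`` constraint in the documentation above can be made concrete with a small DCG sketch: each integer label indexes into the gain vector, so every label must be smaller than the vector's length. This is a standalone illustration (the `dcg` helper is hypothetical, not LightGBM code); the default gain ``2^label - 1`` mirrors LightGBM's documented default for ``label_gain``.

```python
import math

def dcg(labels, label_gain=None):
    """Discounted cumulative gain of labels in their current order."""
    if label_gain is None:
        # LightGBM's documented default gain: 0, 1, 3, 7, 15, ...
        label_gain = [2 ** i - 1 for i in range(max(labels) + 1)]
    # every label indexes into label_gain, hence the requirement that all
    # labels be smaller than the number of elements in label_gain
    return sum(label_gain[lbl] / math.log2(pos + 2)
               for pos, lbl in enumerate(labels))

print(round(dcg([3, 2, 1, 0]), 4))  # 9.3928
```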
@@ -878,10 +874,10 @@ Objective Parameters
 - ``objective_seed`` :raw-html:`<a id="objective_seed" title="Permalink to this parameter" href="#objective_seed">&#x1F517;&#xFE0E;</a>`, default = ``5``, type = int
-  - random seed for objectives
   - used only in the ``rank_xendcg`` objective
+  - random seed for objectives
 Metric Parameters
 -----------------
@@ -915,7 +911,7 @@ Metric Parameters
   - ``tweedie``, negative log-likelihood for **Tweedie** regression
-  - ``ndcg``, `NDCG <https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG>`__, aliases: ``lambdarank``
+  - ``ndcg``, `NDCG <https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG>`__, aliases: ``lambdarank``, ``rank_xendcg``, ``xendcg``, ``xe_ndcg``, ``xe_ndcg_mart``, ``xendcg_mart``
   - ``map``, `MAP <https://makarandtapaswi.wordpress.com/2012/07/02/intuition-behind-average-precision-and-map/>`__, aliases: ``mean_average_precision``
@@ -1079,7 +1075,7 @@ Also, you can include weight column in your data file. Please refer to the ``wei
 Query Data
 ~~~~~~~~~~
-For LambdaRank learning, it needs query information for training data.
+For learning to rank, it needs query information for training data.
 LightGBM uses an additional file to store query data, like the following:
 ::
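The query file described above lists per-query group sizes: the first N rows of the data belong to the first query, the next M rows to the second, and so on. A minimal pure-Python sketch of expanding those sizes into one query id per row (equivalent to the `np.repeat` call in LightGBM's cv code; the helper name `flatten_groups` is hypothetical):

```python
def flatten_groups(group_sizes):
    """Expand per-query sizes into a query id per data row.

    e.g. [27, 18] -> [0]*27 + [1]*18, meaning the first 27 rows belong to
    query 0 and the next 18 rows to query 1.
    """
    return [qid for qid, size in enumerate(group_sizes) for _ in range(size)]

flat = flatten_groups([27, 18, 67])
print(len(flat))  # 112 rows in total
```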
@@ -123,12 +123,10 @@ struct Config {
   // descl2 = ``cross_entropy``, objective function for cross-entropy (with optional linear weights), aliases: ``xentropy``
   // descl2 = ``cross_entropy_lambda``, alternative parameterization of cross-entropy, aliases: ``xentlambda``
   // descl2 = label is anything in interval [0, 1]
-  // desc = ``lambdarank``, `lambdarank <https://papers.nips.cc/paper/2971-learning-to-rank-with-nonsmooth-cost-functions.pdf>`__ application
-  // descl2 = label should be ``int`` type in lambdarank tasks, and larger number represents the higher relevance (e.g. 0:bad, 1:fair, 2:good, 3:perfect)
-  // descl2 = `label_gain <#objective-parameters>`__ can be used to set the gain (weight) of ``int`` label
-  // descl2 = all values in ``label`` must be smaller than number of elements in ``label_gain``
-  // desc = ``rank_xendcg``, `XE_NDCG_MART <https://arxiv.org/abs/1911.09798>`__ ranking objective function, aliases: ``xendcg``, ``xe_ndcg``, ``xe_ndcg_mart``, ``xendcg_mart``
-  // descl2 = to obtain reproducible results, you should disable parallelism by setting ``num_threads`` to 1
+  // desc = ranking application
+  // descl2 = ``lambdarank``, `lambdarank <https://papers.nips.cc/paper/2971-learning-to-rank-with-nonsmooth-cost-functions.pdf>`__ objective. `label_gain <#objective-parameters>`__ can be used to set the gain (weight) of ``int`` label and all values in ``label`` must be smaller than number of elements in ``label_gain``
+  // descl2 = ``rank_xendcg``, `XE_NDCG_MART <https://arxiv.org/abs/1911.09798>`__ ranking objective function. To obtain reproducible results, you should disable parallelism by setting ``num_threads`` to 1, aliases: ``xendcg``, ``xe_ndcg``, ``xe_ndcg_mart``, ``xendcg_mart``
+  // descl2 = label should be ``int`` type, and larger number represents the higher relevance (e.g. 0:bad, 1:fair, 2:good, 3:perfect)
   std::string objective = "regression";
   // [doc-only]
@@ -763,8 +761,8 @@ struct Config {
   // desc = separate by ``,``
   std::vector<double> label_gain;
-  // desc = random seed for objectives
   // desc = used only in the ``rank_xendcg`` objective
+  // desc = random seed for objectives
   int objective_seed = 5;
   #pragma endregion
@@ -789,7 +787,7 @@ struct Config {
   // descl2 = ``gamma``, negative log-likelihood for **Gamma** regression
   // descl2 = ``gamma_deviance``, residual deviance for **Gamma** regression
   // descl2 = ``tweedie``, negative log-likelihood for **Tweedie** regression
-  // descl2 = ``ndcg``, `NDCG <https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG>`__, aliases: ``lambdarank``
+  // descl2 = ``ndcg``, `NDCG <https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG>`__, aliases: ``lambdarank``, ``rank_xendcg``, ``xendcg``, ``xe_ndcg``, ``xe_ndcg_mart``, ``xendcg_mart``
   // descl2 = ``map``, `MAP <https://makarandtapaswi.wordpress.com/2012/07/02/intuition-behind-average-precision-and-map/>`__, aliases: ``mean_average_precision``
   // descl2 = ``auc``, `AUC <https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve>`__
   // descl2 = ``binary_logloss``, `log loss <https://en.wikipedia.org/wiki/Cross_entropy>`__, aliases: ``binary``
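To make the ``ndcg`` metric entry above concrete, here is a minimal NDCG@k sketch (a standalone illustration, not LightGBM's implementation; the helper name `ndcg_at_k` is hypothetical): the DCG of the predicted ordering divided by the DCG of the ideal ordering, using the gain `2^label - 1`.

```python
import math

def ndcg_at_k(labels_in_predicted_order, k):
    """NDCG@k: DCG of the predicted order normalized by the ideal DCG."""
    def dcg(labels):
        return sum((2 ** lbl - 1) / math.log2(pos + 2)
                   for pos, lbl in enumerate(labels[:k]))
    ideal = dcg(sorted(labels_in_predicted_order, reverse=True))
    return dcg(labels_in_predicted_order) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 1, 0], k=4))  # 1.0 for a perfectly ordered list
```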
@@ -314,10 +314,12 @@ def _make_n_folds(full_data, folds, nfold, params, seed, fpreproc=None, stratifi
                 flatted_group = np.zeros(num_data, dtype=np.int32)
             folds = folds.split(X=np.zeros(num_data), y=full_data.get_label(), groups=flatted_group)
     else:
-        if any(params.get(obj_alias, "") == "lambdarank" for obj_alias in _ConfigAliases.get("objective")):
+        if any(params.get(obj_alias, "") in {"lambdarank", "rank_xendcg", "xendcg",
+                                             "xe_ndcg", "xe_ndcg_mart", "xendcg_mart"}
+               for obj_alias in _ConfigAliases.get("objective")):
             if not SKLEARN_INSTALLED:
-                raise LightGBMError('Scikit-learn is required for lambdarank cv.')
-            # lambdarank task, split according to groups
+                raise LightGBMError('Scikit-learn is required for ranking cv.')
+            # ranking task, split according to groups
             group_info = np.array(full_data.get_group(), dtype=np.int32, copy=False)
             flatted_group = np.repeat(range_(len(group_info)), repeats=group_info)
             group_kfold = _LGBMGroupKFold(n_splits=nfold)
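The Python hunk routes all ranking objectives through group-aware cross-validation because rows from one query must never straddle a train/test boundary. A toy round-robin group k-fold illustrates the idea (a stand-in sketch for the scikit-learn `GroupKFold` that LightGBM actually uses; `group_kfold` here is a hypothetical helper):

```python
def group_kfold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs; `groups` gives a query id per row.

    Whole queries are assigned to folds round-robin, so no query id ever
    appears in both the train and the test side of a split.
    """
    unique = sorted(set(groups))
    for fold in range(n_splits):
        test_groups = {g for i, g in enumerate(unique) if i % n_splits == fold}
        test = [i for i, g in enumerate(groups) if g in test_groups]
        train = [i for i, g in enumerate(groups) if g not in test_groups]
        yield train, test

groups = [0, 0, 1, 1, 1, 2, 2]
for train, test in group_kfold(groups, n_splits=3):
    # no query id is shared between the train and test sides
    assert not set(groups[i] for i in train) & set(groups[i] for i in test)
```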