Unverified Commit d670a4d6 authored by José Morales, committed by GitHub

[python-package] use 2d collections for predictions, grads and hess in multiclass custom objective (#4925)

* reshape predictions, grad and hess in multiclass custom objective

* add sklearn test. move custom obj to utils. docs for numpy

* use num_model_per_iteration to get num_classes

* update docs and dask multiclass custom objective test

* move reshaping to __inner_predict. add test for feval

* add missing note. remove extra line
parent caa087bc
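The change in a nutshell: for multi-class models, a custom objective now receives preds as a [n_samples, n_classes] array and returns grad and hess in that same shape; the Booster flattens them back to the class-major 1-D layout before handing them to the C API. Below is a minimal sketch of the new calling convention — it mirrors the `sklearn_multiclass_custom_objective` helper added in this PR; the function name, dataset, and parameter values are illustrative only.

```python
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_blobs


def multiclass_objective(y_pred, dataset):
    """Softmax cross-entropy written against the new 2-D convention."""
    y_true = dataset.get_label()
    num_rows, num_class = y_pred.shape  # y_pred is now [n_samples, n_classes]
    # row-wise softmax of the raw scores
    prob = np.exp(y_pred - y_pred.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)
    grad = prob.copy()
    grad[np.arange(num_rows), y_true.astype(np.int32)] -= 1.0
    hess = num_class / (num_class - 1) * prob * (1 - prob)
    return grad, hess  # both [n_samples, n_classes]


X, y = make_blobs(n_samples=1_000, centers=3, random_state=42)
ds = lgb.Dataset(X, y)
params = {'objective': 'multiclass', 'num_class': 3}
bst = lgb.train(params, ds, num_boost_round=10, fobj=multiclass_objective)
```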
@@ -2947,22 +2947,21 @@ class Booster:
             Should accept two parameters: preds, train_data,
             and return (grad, hess).

-                preds : numpy 1-D array
+                preds : numpy 1-D array or numpy 2-D array (for multi-class task)
                     The predicted values.
                     Predicted values are returned before any transformation,
                     e.g. they are raw margin instead of probability of positive class for binary task.
                 train_data : Dataset
                     The training dataset.
-                grad : list, numpy 1-D array or pandas Series
+                grad : numpy 1-D array or numpy 2-D array (for multi-class task)
                     The value of the first order derivative (gradient) of the loss
                     with respect to the elements of preds for each sample point.
-                hess : list, numpy 1-D array or pandas Series
+                hess : numpy 1-D array or numpy 2-D array (for multi-class task)
                     The value of the second order derivative (Hessian) of the loss
                     with respect to the elements of preds for each sample point.

-            For multi-class task, the preds is group by class_id first, then group by row_id.
-            If you want to get i-th row preds in j-th class, the access way is score[j * num_data + i]
-            and you should group grad and hess in this way as well.
+            For multi-class task, preds are a [n_samples, n_classes] numpy 2-D array,
+            and grad and hess should be returned in the same format.

         Returns
         -------
@@ -3000,6 +2999,9 @@ class Booster:
         if not self.__set_objective_to_none:
             self.reset_parameter({"objective": "none"}).__set_objective_to_none = True
         grad, hess = fobj(self.__inner_predict(0), self.train_set)
+        if self.num_model_per_iteration() > 1:
+            grad = grad.ravel(order='F')
+            hess = hess.ravel(order='F')
         return self.__boost(grad, hess)

     def __boost(self, grad, hess):
@@ -3009,16 +3011,15 @@ class Booster:
         Score is returned before any transformation,
         e.g. it is raw margin instead of probability of positive class for binary task.

-        For multi-class task, the score is group by class_id first, then group by row_id.
-        If you want to get i-th row score in j-th class, the access way is score[j * num_data + i]
-        and you should group grad and hess in this way as well.
+        For multi-class task, score is a [n_samples, n_classes] numpy 2-D array,
+        and grad and hess should be provided in the same format.

         Parameters
         ----------
-        grad : list, numpy 1-D array or pandas Series
+        grad : numpy 1-D array or numpy 2-D array (for multi-class task)
             The value of the first order derivative (gradient) of the loss
             with respect to the elements of score for each sample point.
-        hess : list, numpy 1-D array or pandas Series
+        hess : numpy 1-D array or numpy 2-D array (for multi-class task)
             The value of the second order derivative (Hessian) of the loss
             with respect to the elements of score for each sample point.
@@ -3160,8 +3161,8 @@ class Booster:
                 is_higher_better : bool
                     Is eval result higher better, e.g. AUC is ``is_higher_better``.

-            For multi-class task, the preds is group by class_id first, then group by row_id.
-            If you want to get i-th row preds in j-th class, the access way is preds[j * num_data + i].
+            For multi-class task, preds are a [n_samples, n_classes] numpy 2-D array.

         Returns
         -------
@@ -3195,7 +3196,7 @@ class Booster:
             Should accept two parameters: preds, eval_data,
             and return (eval_name, eval_result, is_higher_better) or list of such tuples.

-                preds : numpy 1-D array
+                preds : numpy 1-D array or numpy 2-D array (for multi-class task)
                     The predicted values.
                     If ``fobj`` is specified, predicted values are returned before any transformation,
                     e.g. they are raw margin instead of probability of positive class for binary task in this case.
@@ -3208,8 +3209,8 @@ class Booster:
                 is_higher_better : bool
                     Is eval result higher better, e.g. AUC is ``is_higher_better``.

-            For multi-class task, the preds is group by class_id first, then group by row_id.
-            If you want to get i-th row preds in j-th class, the access way is preds[j * num_data + i].
+            For multi-class task, preds are a [n_samples, n_classes] numpy 2-D array.

         Returns
         -------
@@ -3228,7 +3229,7 @@ class Booster:
             Should accept two parameters: preds, eval_data,
             and return (eval_name, eval_result, is_higher_better) or list of such tuples.

-                preds : numpy 1-D array
+                preds : numpy 1-D array or numpy 2-D array (for multi-class task)
                     The predicted values.
                     If ``fobj`` is specified, predicted values are returned before any transformation,
                     e.g. they are raw margin instead of probability of positive class for binary task in this case.
@@ -3241,8 +3242,8 @@ class Booster:
                 is_higher_better : bool
                     Is eval result higher better, e.g. AUC is ``is_higher_better``.

-            For multi-class task, the preds is group by class_id first, then group by row_id.
-            If you want to get i-th row preds in j-th class, the access way is preds[j * num_data + i].
+            For multi-class task, preds are a [n_samples, n_classes] numpy 2-D array.

         Returns
         -------
@@ -3868,7 +3869,11 @@ class Booster:
             if tmp_out_len.value != len(self.__inner_predict_buffer[data_idx]):
                 raise ValueError(f"Wrong length of predict results for data {data_idx}")
             self.__is_predicted_cur_iter[data_idx] = True
-        return self.__inner_predict_buffer[data_idx]
+        result = self.__inner_predict_buffer[data_idx]
+        if self.__num_class > 1:
+            num_data = result.size // self.__num_class
+            result = result.reshape(num_data, self.__num_class, order='F')
+        return result

     def __get_eval_info(self):
         """Get inner evaluation count and names."""
...
@@ -59,7 +59,7 @@ class _ObjectiveFunctionWrapper:
                 y_true : numpy 1-D array of shape = [n_samples]
                     The target values.
-                y_pred : numpy 1-D array of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
+                y_pred : numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
                     The predicted values.
                     Predicted values are returned before any transformation,
                     e.g. they are raw margin instead of probability of positive class for binary task.
@@ -69,18 +69,17 @@ class _ObjectiveFunctionWrapper:
                     sum(group) = n_samples.
                     For example, if you have a 100-document dataset with ``group = [10, 20, 40, 10, 10, 10]``, that means that you have 6 groups,
                     where the first 10 records are in the first group, records 11-30 are in the second group, records 31-70 are in the third group, etc.
-                grad : list, numpy 1-D array or pandas Series of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
+                grad : numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
                     The value of the first order derivative (gradient) of the loss
                     with respect to the elements of y_pred for each sample point.
-                hess : list, numpy 1-D array or pandas Series of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
+                hess : numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
                     The value of the second order derivative (Hessian) of the loss
                     with respect to the elements of y_pred for each sample point.

         .. note::

-            For multi-class task, the y_pred is group by class_id first, then group by row_id.
-            If you want to get i-th row y_pred in j-th class, the access way is y_pred[j * num_data + i]
-            and you should group grad and hess in this way as well.
+            For multi-class task, y_pred is a [n_samples, n_classes] numpy 2-D array,
+            and grad and hess should be returned in the same format.
        """
        self.func = func
@@ -89,17 +88,17 @@ class _ObjectiveFunctionWrapper:
         Parameters
         ----------
-        preds : numpy 1-D array of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
+        preds : numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
             The predicted values.
         dataset : Dataset
             The training dataset.

         Returns
         -------
-        grad : list, numpy 1-D array or pandas Series of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
+        grad : numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
             The value of the first order derivative (gradient) of the loss
             with respect to the elements of preds for each sample point.
-        hess : list, numpy 1-D array or pandas Series of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
+        hess : numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
             The value of the second order derivative (Hessian) of the loss
             with respect to the elements of preds for each sample point.
         """
@@ -114,20 +113,13 @@ class _ObjectiveFunctionWrapper:
         """weighted for objective"""
         weight = dataset.get_weight()
         if weight is not None:
-            """only one class"""
-            if len(weight) == len(grad):
-                grad = np.multiply(grad, weight)
-                hess = np.multiply(hess, weight)
-            else:
-                num_data = len(weight)
-                num_class = len(grad) // num_data
-                if num_class * num_data != len(grad):
-                    raise ValueError("Length of grad and hess should equal to num_class * num_data")
-                for k in range(num_class):
-                    for i in range(num_data):
-                        idx = k * num_data + i
-                        grad[idx] *= weight[i]
-                        hess[idx] *= weight[i]
+            if grad.ndim == 2:  # multi-class
+                num_data = grad.shape[0]
+                if weight.size != num_data:
+                    raise ValueError("grad and hess should be of shape [n_samples, n_classes]")
+                weight = weight.reshape(num_data, 1)
+            grad *= weight
+            hess *= weight
         return grad, hess
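The vectorized weighting is equivalent to the removed double loop: reshaping `weight` to an [n_samples, 1] column lets numpy broadcasting scale every class column by the per-row weight. A quick equivalence check (random shapes and values, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
num_data, num_class = 5, 3
grad = rng.normal(size=(num_data, num_class))
weight = rng.uniform(size=num_data)

# old behaviour: scale the flat, class-major layout element by element
flat = grad.ravel(order='F')  # a copy, safe to mutate
for k in range(num_class):
    for i in range(num_data):
        flat[k * num_data + i] *= weight[i]

# new behaviour: one broadcast multiply on the 2-D layout
grad_2d = grad * weight.reshape(num_data, 1)

np.testing.assert_allclose(flat, grad_2d.ravel(order='F'))
```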
@@ -152,7 +144,7 @@ class _EvalFunctionWrapper:
                 y_true : numpy 1-D array of shape = [n_samples]
                     The target values.
-                y_pred : numpy 1-D array of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
+                y_pred : numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
                     The predicted values.
                     In case of custom ``objective``, predicted values are returned before any transformation,
                     e.g. they are raw margin instead of probability of positive class for binary task in this case.
@@ -173,8 +165,8 @@ class _EvalFunctionWrapper:
         .. note::

-            For multi-class task, the y_pred is group by class_id first, then group by row_id.
-            If you want to get i-th row y_pred in j-th class, the access way is y_pred[j * num_data + i].
+            For multi-class task, y_pred is a [n_samples, n_classes] numpy 2-D array.
        """
        self.func = func
@@ -183,7 +175,7 @@ class _EvalFunctionWrapper:
         Parameters
         ----------
-        preds : numpy 1-D array of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
+        preds : numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
             The predicted values.
         dataset : Dataset
             The training dataset.
@@ -287,7 +279,7 @@ _lgbmmodel_doc_custom_eval_note = """
     y_true : numpy 1-D array of shape = [n_samples]
         The target values.
-    y_pred : numpy 1-D array of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
+    y_pred : numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
         The predicted values.
         In case of custom ``objective``, predicted values are returned before any transformation,
         e.g. they are raw margin instead of probability of positive class for binary task in this case.
@@ -306,8 +298,8 @@ _lgbmmodel_doc_custom_eval_note = """
     is_higher_better : bool
         Is eval result higher better, e.g. AUC is ``is_higher_better``.

-    For multi-class task, the y_pred is group by class_id first, then group by row_id.
-    If you want to get i-th row y_pred in j-th class, the access way is y_pred[j * num_data + i].
+    For multi-class task, y_pred is a [n_samples, n_classes] numpy 2-D array.
"""

_lgbmmodel_doc_predict = (
@@ -464,7 +456,7 @@ class LGBMModel(_LGBMModelBase):
             y_true : numpy 1-D array of shape = [n_samples]
                 The target values.
-            y_pred : numpy 1-D array of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
+            y_pred : numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
                 The predicted values.
                 Predicted values are returned before any transformation,
                 e.g. they are raw margin instead of probability of positive class for binary task.
@@ -474,16 +466,15 @@ class LGBMModel(_LGBMModelBase):
                 sum(group) = n_samples.
                 For example, if you have a 100-document dataset with ``group = [10, 20, 40, 10, 10, 10]``, that means that you have 6 groups,
                 where the first 10 records are in the first group, records 11-30 are in the second group, records 31-70 are in the third group, etc.
-            grad : list, numpy 1-D array or pandas Series of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
+            grad : numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
                 The value of the first order derivative (gradient) of the loss
                 with respect to the elements of y_pred for each sample point.
-            hess : list, numpy 1-D array or pandas Series of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
+            hess : numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)
                 The value of the second order derivative (Hessian) of the loss
                 with respect to the elements of y_pred for each sample point.

-        For multi-class task, the y_pred is group by class_id first, then group by row_id.
-        If you want to get i-th row y_pred in j-th class, the access way is y_pred[j * num_data + i]
-        and you should group grad and hess in this way as well.
+        For multi-class task, y_pred is a [n_samples, n_classes] numpy 2-D array,
+        and grad and hess should be returned in the same format.
        """
        if not SKLEARN_INSTALLED:
            raise LightGBMError('scikit-learn is required for lightgbm.sklearn. '
...
@@ -7,13 +7,14 @@ from pathlib import Path
 import numpy as np
 import pytest
 from scipy import sparse
-from sklearn.datasets import dump_svmlight_file, load_svmlight_file
+from sklearn.datasets import dump_svmlight_file, load_svmlight_file, make_blobs
+from sklearn.metrics import log_loss
 from sklearn.model_selection import train_test_split

 import lightgbm as lgb
 from lightgbm.compat import PANDAS_INSTALLED, pd_DataFrame, pd_Series

-from .utils import load_breast_cancer
+from .utils import load_breast_cancer, sklearn_multiclass_custom_objective, softmax


 def test_basic(tmp_path):
@@ -587,7 +588,7 @@ def _bad_gradients(preds, _):


 def _good_gradients(preds, _):
-    return np.random.randn(len(preds)), np.random.rand(len(preds))
+    return np.random.randn(*preds.shape), np.random.rand(*preds.shape)


 def test_custom_objective_safety():
@@ -609,3 +610,51 @@ def test_custom_objective_safety():
     good_bst_multi.update(fobj=_good_gradients)
     with pytest.raises(ValueError, match=re.escape(f"number of models per one iteration ({nclass})")):
         bad_bst_multi.update(fobj=_bad_gradients)
+
+
+def test_multiclass_custom_objective():
+    def custom_obj(y_pred, ds):
+        y_true = ds.get_label()
+        return sklearn_multiclass_custom_objective(y_true, y_pred)
+
+    centers = [[-4, -4], [4, 4], [-4, 4]]
+    X, y = make_blobs(n_samples=1_000, centers=centers, random_state=42)
+    ds = lgb.Dataset(X, y)
+    params = {'objective': 'multiclass', 'num_class': 3, 'num_leaves': 7}
+
+    builtin_obj_bst = lgb.train(params, ds, num_boost_round=10)
+    builtin_obj_preds = builtin_obj_bst.predict(X)
+
+    custom_obj_bst = lgb.train(params, ds, num_boost_round=10, fobj=custom_obj)
+    custom_obj_preds = softmax(custom_obj_bst.predict(X))
+
+    np.testing.assert_allclose(builtin_obj_preds, custom_obj_preds, rtol=0.01)
+
+
+def test_multiclass_custom_eval():
+    def custom_eval(y_pred, ds):
+        y_true = ds.get_label()
+        return 'custom_logloss', log_loss(y_true, y_pred), False
+
+    centers = [[-4, -4], [4, 4], [-4, 4]]
+    X, y = make_blobs(n_samples=1_000, centers=centers, random_state=42)
+    X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)
+    train_ds = lgb.Dataset(X_train, y_train)
+    valid_ds = lgb.Dataset(X_valid, y_valid, reference=train_ds)
+    params = {'objective': 'multiclass', 'num_class': 3, 'num_leaves': 7}
+    eval_result = {}
+    bst = lgb.train(
+        params,
+        train_ds,
+        num_boost_round=10,
+        valid_sets=[train_ds, valid_ds],
+        valid_names=['train', 'valid'],
+        feval=custom_eval,
+        callbacks=[lgb.record_evaluation(eval_result)],
+        keep_training_booster=True,
+    )
+
+    for key, ds in zip(['train', 'valid'], [train_ds, valid_ds]):
+        np.testing.assert_allclose(eval_result[key]['multi_logloss'], eval_result[key]['custom_logloss'])
+        _, metric, value, _ = bst.eval(ds, key, feval=custom_eval)[1]  # first element is multi_logloss
+        assert metric == 'custom_logloss'
+        np.testing.assert_allclose(value, eval_result[key][metric][-1])
@@ -15,6 +15,8 @@ import pytest

 import lightgbm as lgb

+from .utils import sklearn_multiclass_custom_objective
+
 if not platform.startswith('linux'):
     pytest.skip('lightgbm.dask is currently supported in Linux environments', allow_module_level=True)
 if machine() != 'x86_64':
@@ -271,25 +273,6 @@ def _objective_logistic_regression(y_true, y_pred):
     return grad, hess


-def _objective_logloss(y_true, y_pred):
-    num_rows = len(y_true)
-    num_class = len(np.unique(y_true))
-    # operate on preds as [num_data, num_classes] matrix
-    y_pred = y_pred.reshape(-1, num_class, order='F')
-    row_wise_max = np.max(y_pred, axis=1).reshape(num_rows, 1)
-    preds = y_pred - row_wise_max
-    prob = np.exp(preds) / np.sum(np.exp(preds), axis=1).reshape(num_rows, 1)
-    grad_update = np.zeros_like(preds)
-    grad_update[np.arange(num_rows), y_true.astype(np.int32)] = -1.0
-    grad = prob + grad_update
-    factor = num_class / (num_class - 1)
-    hess = factor * prob * (1 - prob)
-    # reshape back to 1-D array, grouped by class id and then row id
-    grad = grad.T.reshape(-1)
-    hess = hess.T.reshape(-1)
-    return grad, hess
-
-
 @pytest.mark.parametrize('output', data_output)
 @pytest.mark.parametrize('task', ['binary-classification', 'multiclass-classification'])
 @pytest.mark.parametrize('boosting_type', boosting_types)
@@ -507,7 +490,7 @@ def test_classifier_custom_objective(output, task, cluster):
         })
     elif task == 'multiclass-classification':
         params.update({
-            'objective': _objective_logloss,
+            'objective': sklearn_multiclass_custom_objective,
             'num_classes': 3
         })
...
@@ -7,7 +7,7 @@ import joblib
 import numpy as np
 import pytest
 from sklearn.base import clone
-from sklearn.datasets import load_svmlight_file, make_multilabel_classification
+from sklearn.datasets import load_svmlight_file, make_blobs, make_multilabel_classification
 from sklearn.ensemble import StackingClassifier, StackingRegressor
 from sklearn.metrics import log_loss, mean_squared_error
 from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
@@ -18,7 +18,7 @@ from sklearn.utils.validation import check_is_fitted
 import lightgbm as lgb

 from .utils import (load_boston, load_breast_cancer, load_digits, load_iris, load_linnerud, make_ranking,
-                    make_synthetic_regression)
+                    make_synthetic_regression, sklearn_multiclass_custom_objective, softmax)

 decreasing_generator = itertools.count(0, -1)
@@ -1280,3 +1280,20 @@ def test_training_succeeds_when_data_is_dataframe_and_label_is_column_array(task
     preds_1d = model_1d.predict(X)
     preds_2d = model_2d.predict(X)
     np.testing.assert_array_equal(preds_1d, preds_2d)
+
+
+def test_multiclass_custom_objective():
+    centers = [[-4, -4], [4, 4], [-4, 4]]
+    X, y = make_blobs(n_samples=1_000, centers=centers, random_state=42)
+    params = {'n_estimators': 10, 'num_leaves': 7}
+
+    builtin_obj_model = lgb.LGBMClassifier(**params)
+    builtin_obj_model.fit(X, y)
+    builtin_obj_preds = builtin_obj_model.predict_proba(X)
+
+    custom_obj_model = lgb.LGBMClassifier(objective=sklearn_multiclass_custom_objective, **params)
+    custom_obj_model.fit(X, y)
+    custom_obj_preds = softmax(custom_obj_model.predict(X, raw_score=True))
+
+    np.testing.assert_allclose(builtin_obj_preds, custom_obj_preds, rtol=0.01)
+    assert not callable(builtin_obj_model.objective_)
+    assert callable(custom_obj_model.objective_)
@@ -114,3 +114,20 @@ def make_ranking(n_samples=100, n_features=20, n_informative=5, gmax=2,
 @lru_cache(maxsize=None)
 def make_synthetic_regression(n_samples=100):
     return sklearn.datasets.make_regression(n_samples, n_features=4, n_informative=2, random_state=42)
+
+
+def softmax(x):
+    row_wise_max = np.max(x, axis=1).reshape(-1, 1)
+    exp_x = np.exp(x - row_wise_max)
+    return exp_x / np.sum(exp_x, axis=1).reshape(-1, 1)
+
+
+def sklearn_multiclass_custom_objective(y_true, y_pred):
+    num_rows, num_class = y_pred.shape
+    prob = softmax(y_pred)
+    grad_update = np.zeros_like(prob)
+    grad_update[np.arange(num_rows), y_true.astype(np.int32)] = -1.0
+    grad = prob + grad_update
+    factor = num_class / (num_class - 1)
+    hess = factor * prob * (1 - prob)
+    return grad, hess
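For reference, `sklearn_multiclass_custom_objective` is plain softmax cross-entropy: `grad = prob - onehot(y_true)`, with a `num_class / (num_class - 1)` Hessian factor matching the one the removed dask helper used (and, to my understanding, LightGBM's built-in multiclass objective). A finite-difference sanity check of the gradient (illustrative, not part of the PR):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def logloss_sum(y_true, y_pred):
    p = softmax(y_pred)
    return -np.log(p[np.arange(len(y_true)), y_true]).sum()

rng = np.random.default_rng(0)
y_pred = rng.normal(size=(4, 3))
y_true = np.array([0, 2, 1, 1])

analytic = softmax(y_pred)
analytic[np.arange(4), y_true] -= 1.0  # grad[i, j] = p[i, j] - 1{y_i == j}

# numeric gradient via forward differences
eps = 1e-6
numeric = np.zeros_like(y_pred)
for i in range(4):
    for j in range(3):
        bumped = y_pred.copy()
        bumped[i, j] += eps
        numeric[i, j] = (logloss_sum(y_true, bumped) - logloss_sum(y_true, y_pred)) / eps

np.testing.assert_allclose(analytic, numeric, atol=1e-4)
```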