Commit bd7274ba authored by wxchan, committed by Guolin Ke

add callbacks to sklearn interface (#150)

parent 8c6933ec
@@ -55,9 +55,9 @@ The methods of each Class is in alphabetical order.
 Categorical features,
 type int represents index,
 type str represents feature names (need to specify feature_name as well)
-params: dict, optional
+params : dict, optional
 Other parameters
-free_raw_data: Bool
+free_raw_data : Bool
 True if need to free raw data after construct inner dataset
@@ -78,7 +78,7 @@ The methods of each Class is in alphabetical order.
 Group/query size for dataset
 silent : boolean, optional
 Whether print messages during construction
-params: dict, optional
+params : dict, optional
 Other parameters
@@ -400,7 +400,7 @@ The methods of each Class is in alphabetical order.
 ----------
 filename : str
 Filename to save
-num_iteration: int
+num_iteration : int
 Number of iteration that want to save. < 0 means save all
@@ -497,14 +497,15 @@ The methods of each Class is in alphabetical order.
 or the boosting stage found by using `early_stopping_rounds` is also printed.
 Example: with verbose_eval=4 and at least one item in evals,
 an evaluation metric is printed every 4 (instead of 1) boosting stages.
-learning_rates: list or function
+learning_rates : list or function
 List of learning rate for each boosting round
 or a customized function that calculates learning_rate
 in terms of current number of round (e.g. yields learning rate decay)
 - list l: learning_rate = l[current_round]
 - function f: learning_rate = f(current_round)
 callbacks : list of callback functions
-List of callback functions that are applied at end of each iteration.
+List of callback functions that are applied at each iteration.
+See Callbacks in Python-API.md for more information.
 Returns
 -------
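The `learning_rates` schedule documented above can be sketched without LightGBM itself. This is an illustrative decay function; the 0.95 factor and 0.1 base are arbitrary example values, not library defaults:

```python
def lr_decay(current_round):
    # Exponential decay: 0.1 * 0.95^round (example values only)
    return 0.1 * (0.95 ** current_round)

# Equivalent explicit list for 5 boosting rounds
lr_list = [lr_decay(i) for i in range(5)]

# Either form could then be passed as learning_rates:
#   lgb.train(params, train_set, num_boost_round=5, learning_rates=lr_decay)
#   lgb.train(params, train_set, num_boost_round=5, learning_rates=lr_list)
```

The function form is usually preferable, since it does not need to know `num_boost_round` in advance.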
@@ -643,13 +644,13 @@ The methods of each Class is in alphabetical order.
 y_true: array_like of shape [n_samples]
 The target values
-y_pred: array_like of shape [n_samples] or shape[n_samples* n_class]
+y_pred: array_like of shape [n_samples] or shape[n_samples * n_class]
 The predicted values
 group: array_like
 group/query data, used for ranking task
-grad: array_like of shape [n_samples] or shape[n_samples* n_class]
+grad: array_like of shape [n_samples] or shape[n_samples * n_class]
 The value of the gradient for each sample point.
-hess: array_like of shape [n_samples] or shape[n_samples* n_class]
+hess: array_like of shape [n_samples] or shape[n_samples * n_class]
 The value of the second derivative for each sample point
 for multi-class task, the y_pred is group by class_id first, then group by row_id
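The multi-class layout described above ("grouped by class_id first, then by row_id") can be illustrated with plain NumPy; the numbers here are stand-in predictions, not real model output:

```python
import numpy as np

n_samples, n_class = 3, 2
# Flattened predictions of length n_samples * n_class, grouped by class first:
# [s0_c0, s1_c0, s2_c0, s0_c1, s1_c1, s2_c1]
flat = np.arange(n_samples * n_class, dtype=float)
# Row c then holds all samples' scores for class c
per_class = flat.reshape(n_class, n_samples)
```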
@@ -703,7 +704,7 @@ The methods of each Class is in alphabetical order.
 Array of normailized feature importances
-####fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric=None, early_stopping_rounds=None, verbose=True, feature_name=None, categorical_feature=None)
+####fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric=None, early_stopping_rounds=None, verbose=True, feature_name=None, categorical_feature=None, callbacks=None)
 Fit the gradient boosting model.
@@ -721,12 +722,12 @@ The methods of each Class is in alphabetical order.
 group data of training data
 eval_set : list, optional
 A list of (X, y) tuple pairs to use as a validation set for early-stopping
-eval_sample_weight : List or Dict of array
-weight of eval data
-eval_init_score : List or Dict of array
-init score of eval data
-eval_group : List or Dict of array
-group data of eval data
+eval_sample_weight : list or dict of array
+weight of eval data; if you use dict, the index should start from 0
+eval_init_score : list or dict of array
+init score of eval data; if you use dict, the index should start from 0
+eval_group : list or dict of array
+group data of eval data; if you use dict, the index should start from 0
 eval_metric : str, list of str, callable, optional
 If a str, should be a built-in evaluation metric to use.
 If callable, a custom evaluation metric, see note for more details.
@@ -739,7 +740,10 @@ The methods of each Class is in alphabetical order.
 categorical_feature : list of str or int
 Categorical features,
 type int represents index,
-type str represents feature names (need to specify feature_name as well)
+type str represents feature names (need to specify feature_name as well).
+callbacks : list of callback functions
+List of callback functions that are applied at each iteration.
+See Callbacks in Python-API.md for more information.
 Note
 ----
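The callback contract can be sketched without training a model: a callback is just a callable invoked once per iteration with an environment object. The `SimpleNamespace` env below is a stand-in for LightGBM's internal callback environment, and `make_recording_callback` is a hypothetical helper for illustration only:

```python
from types import SimpleNamespace

def make_recording_callback(log):
    def _callback(env):
        # Record which iteration we were called at
        log.append(env.iteration)
    return _callback

# Simulate the training loop invoking the callback once per iteration
log = []
cb = make_recording_callback(log)
for it in range(3):
    cb(SimpleNamespace(iteration=it))
```

In real use the list of such callables is passed via `callbacks=[...]` to `fit()`, `train()`, or `cv()`, alongside built-ins like `lgb.reset_parameter` shown in the test change below.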
@@ -807,7 +811,7 @@ The methods of each Class is in alphabetical order.
 ###LGBMRanker
-####fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric='ndcg', eval_at=1, early_stopping_rounds=None, verbose=True, feature_name=None, categorical_feature=None)
+####fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric='ndcg', eval_at=1, early_stopping_rounds=None, verbose=True, feature_name=None, categorical_feature=None, callbacks=None)
 Most arguments are same as Common Methods except:
...
@@ -74,7 +74,8 @@ def train(params, train_set, num_boost_round=100,
 - list l: learning_rate = l[current_round]
 - function f: learning_rate = f(current_round)
 callbacks : list of callback functions
-List of callback functions that are applied at end of each iteration.
+List of callback functions that are applied at each iteration.
+See Callbacks in Python-API.md for more information.
 Returns
 -------
@@ -319,7 +320,8 @@ def cv(params, train_set, num_boost_round=10, nfold=5, stratified=False,
 seed : int
 Seed used to generate the folds (passed to numpy.random.seed).
 callbacks : list of callback functions
-List of callback functions that are applied at end of each iteration.
+List of callback functions that are applied at each iteration.
+See Callbacks in Python-API.md for more information.
 Returns
 -------
...
@@ -35,7 +35,7 @@ def _objective_function_wrapper(func):
 Expects a callable with signature ``func(y_true, y_pred)`` or ``func(y_true, y_pred, group)``:
 y_true: array_like of shape [n_samples]
 The target values
-y_pred: array_like of shape [n_samples] or shape[n_samples* n_class] (for multi-class)
+y_pred: array_like of shape [n_samples] or shape[n_samples * n_class] (for multi-class)
 The predicted values
 group: array_like
 group/query data, used for ranking task
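A custom objective matching the signature above returns per-sample gradient and hessian; least squares is a minimal example (mirroring the `objective_ls` helper in the test suite):

```python
import numpy as np

def objective_ls(y_true, y_pred):
    # L = 0.5 * (y_pred - y_true)^2, so grad = y_pred - y_true and hess = 1
    grad = y_pred - y_true
    hess = np.ones_like(y_true)
    return grad, hess

grad, hess = objective_ls(np.array([1.0, 0.0]), np.array([0.5, 0.5]))
```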
@@ -46,7 +46,7 @@ def _objective_function_wrapper(func):
 The new objective function as expected by ``lightgbm.engine.train``.
 The signature is ``new_func(preds, dataset)``:
-preds: array_like, shape [n_samples] or shape[n_samples* n_class]
+preds: array_like, shape [n_samples] or shape[n_samples * n_class]
 The predicted values
 dataset: ``dataset``
 The training set from which the labels will be extracted using
@@ -97,7 +97,7 @@ def _eval_function_wrapper(func):
 y_true: array_like of shape [n_samples]
 The target values
-y_pred: array_like of shape [n_samples] or shape[n_samples* n_class] (for multi-class)
+y_pred: array_like of shape [n_samples] or shape[n_samples * n_class] (for multi-class)
 The predicted values
 weight: array_like of shape [n_samples]
 The weight of samples
@@ -110,7 +110,7 @@ def _eval_function_wrapper(func):
 The new eval function as expected by ``lightgbm.engine.train``.
 The signature is ``new_func(preds, dataset)``:
-preds: array_like, shape [n_samples] or shape[n_samples* n_class]
+preds: array_like, shape [n_samples] or shape[n_samples * n_class]
 The predicted values
 dataset: ``dataset``
 The training set from which the labels will be extracted using
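A custom eval metric for the wrapper above takes the documented `func(y_true, y_pred, ...)` inputs and returns a `(name, value, is_higher_better)` tuple; this MAE sketch is illustrative and not part of the commit:

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Returns (metric_name, metric_value, is_higher_better)
    return 'mae', float(np.mean(np.abs(y_true - y_pred))), False

name, value, higher_better = mean_absolute_error(np.array([1.0, 2.0]),
                                                 np.array([1.5, 1.5]))
```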
@@ -209,13 +209,13 @@ class LGBMModel(LGBMModelBase):
 y_true: array_like of shape [n_samples]
 The target values
-y_pred: array_like of shape [n_samples] or shape[n_samples* n_class]
+y_pred: array_like of shape [n_samples] or shape[n_samples * n_class]
 The predicted values
 group: array_like
 group/query data, used for ranking task
-grad: array_like of shape [n_samples] or shape[n_samples* n_class]
+grad: array_like of shape [n_samples] or shape[n_samples * n_class]
 The value of the gradient for each sample point.
-hess: array_like of shape [n_samples] or shape[n_samples* n_class]
+hess: array_like of shape [n_samples] or shape[n_samples * n_class]
 The value of the second derivative for each sample point
 for multi-class task, the y_pred is group by class_id first, then group by row_id
@@ -276,7 +276,8 @@ class LGBMModel(LGBMModelBase):
 eval_init_score=None, eval_group=None,
 eval_metric=None,
 early_stopping_rounds=None, verbose=True,
-feature_name=None, categorical_feature=None):
+feature_name=None, categorical_feature=None,
+callbacks=None):
 """
 Fit the gradient boosting model
@@ -312,6 +313,9 @@ class LGBMModel(LGBMModelBase):
 Categorical features,
 type int represents index,
 type str represents feature names (need to specify feature_name as well)
+callbacks : list of callback functions
+List of callback functions that are applied at each iteration.
+See Callbacks in Python-API.md for more information.
 Note
 ----
@@ -398,7 +402,8 @@ class LGBMModel(LGBMModelBase):
 early_stopping_rounds=early_stopping_rounds,
 evals_result=evals_result, fobj=self.fobj, feval=feval,
 verbose_eval=verbose, feature_name=feature_name,
-categorical_feature=categorical_feature)
+categorical_feature=categorical_feature,
+callbacks=callbacks)
 if evals_result:
 for val in evals_result.items():
@@ -525,7 +530,8 @@ class LGBMClassifier(LGBMModel, LGBMClassifierBase):
 eval_init_score=None,
 eval_metric="binary_logloss",
 early_stopping_rounds=None, verbose=True,
-feature_name=None, categorical_feature=None):
+feature_name=None, categorical_feature=None,
+callbacks=None):
 self._le = LGBMLabelEncoder().fit(y)
 y = self._le.transform(y)
@@ -547,7 +553,8 @@ class LGBMClassifier(LGBMModel, LGBMClassifierBase):
 eval_metric=eval_metric,
 early_stopping_rounds=early_stopping_rounds,
 verbose=verbose, feature_name=feature_name,
-categorical_feature=categorical_feature)
+categorical_feature=categorical_feature,
+callbacks=callbacks)
 return self
 def predict(self, data, raw_score=False, num_iteration=0):
@@ -616,7 +623,8 @@ class LGBMRanker(LGBMModel):
 eval_init_score=None, eval_group=None,
 eval_metric='ndcg', eval_at=1,
 early_stopping_rounds=None, verbose=True,
-feature_name=None, categorical_feature=None):
+feature_name=None, categorical_feature=None,
+callbacks=None):
 """
 Most arguments like common methods except following:
@@ -633,10 +641,9 @@ class LGBMRanker(LGBMModel):
 raise ValueError("Eval_group cannot be None when eval_set is not None")
 elif len(eval_group) != len(eval_set):
 raise ValueError("Length of eval_group should equal to eval_set")
-else:
-    for inner_group in eval_group:
-        if inner_group is None:
-            raise ValueError("Should set group for all eval dataset for ranking task")
+elif (isinstance(eval_group, dict) and any(i not in eval_group or eval_group[i] is None for i in range(len(eval_group)))) \
+        or (isinstance(eval_group, list) and any(group is None for group in eval_group)):
+    raise ValueError("Should set group for all eval dataset for ranking task; if you use dict, the index should start from 0")
 if eval_at is not None:
 self.eval_at = eval_at
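The list/dict validation added in this hunk can be restated as a small predicate; `eval_group_ok` is a hypothetical helper that mirrors the commit's rule, not part of the library:

```python
def eval_group_ok(eval_group, n_eval_sets):
    """A dict must be keyed 0..n-1 with no None values;
    a list must simply contain no None entries."""
    if isinstance(eval_group, dict):
        return all(i in eval_group and eval_group[i] is not None
                   for i in range(n_eval_sets))
    return all(group is not None for group in eval_group)
```

This is why the error message notes that dict indices "should start from 0": a dict keyed `{1: ...}` for a single eval set fails the range check.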
@@ -647,5 +654,6 @@ class LGBMRanker(LGBMModel):
 eval_metric=eval_metric,
 early_stopping_rounds=early_stopping_rounds,
 verbose=verbose, feature_name=feature_name,
-categorical_feature=categorical_feature)
+categorical_feature=categorical_feature,
+callbacks=callbacks)
 return self
@@ -43,7 +43,14 @@ class TestSklearn(unittest.TestCase):
 X_train, y_train = load_svmlight_file('../../examples/lambdarank/rank.train')
 X_test, y_test = load_svmlight_file('../../examples/lambdarank/rank.test')
 q_train = np.loadtxt('../../examples/lambdarank/rank.train.query')
-lgb_model = lgb.LGBMRanker().fit(X_train, y_train, group=q_train, eval_at=[1])
+q_test = np.loadtxt('../../examples/lambdarank/rank.test.query')
+lgb_model = lgb.LGBMRanker().fit(X_train, y_train,
+                                 group=q_train,
+                                 eval_set=[(X_test, y_test)],
+                                 eval_group=[q_test],
+                                 eval_at=[1],
+                                 verbose=False,
+                                 callbacks=[lgb.reset_parameter(learning_rate=lambda x: 0.95 ** x * 0.1)])
 def test_regression_with_custom_objective(self):
 def objective_ls(y_true, y_pred):
...