Unverified Commit b5502d19 authored by Frank Fineis, committed by GitHub

[dask] add support for eval sets and custom eval functions (#4101)



* eval_set support WiP, need to add eval_sample_weight and eval_group

* add weight, group to dask eval_set support. WiP.

* dask eval_set reorg

* Update python-package/lightgbm/dask.py

move _train_part model.fit args onto separate lines
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py

move _train_part model.fit args onto separate lines, pt. 2
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

move _train_part model.fit args onto separate lines, pt. 3
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py

move dask_model.fit args onto separate lines
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

use is instead of id()
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* applying changes to eval_set PR, WiP

* dask support for eval_names, eval_metric, eval_stopping_rounds

* add evals_result checks and other eval_set attribute-related test checks. need to merge master - WiP

* fix lint errors in test_dask.py

* drop group_shape from _lgbmmodel_doc_fit.format for non-rankers, add support for eval_at for dask ranker

* add eval_at to test_dask eval_set ranker tests

* add back group_shape to LGBMModel docs, tighten tests

* drop random eval weights from early stopping, they were probably causing training to terminate too early

* add eval data templates to sklearn fit docs, add eval data docs to dask

* add n_features to _create_data, eval_set tests stop w/ desirable tree counts

* import alphabetically

* add back get_worker for eval_set error handling

* fix test_dask argmin typo

* push forgotten eval_names bugfix

* eval_stopping_rounds -> early_stopping_rounds, fix failing non-es test

* change default eval_at to tuple (1, 2, 3, 4, 5)

* re-drop get_worker

* drop early stopping support from eval_set commits, move eval_set worker check prior to client.submit

* add eval_class_weight and eval_init_score to lightgbm/dask, WiP

* clean up eval_set tests, allow user to specify fewer eval_names and class weights than eval_sets

* remove redundant backslash

* lint fixes

* fix eval_at, eval_metric duplication, let eval_at be Iterable not just Tuple

* use all data_outputs for test_eval_set tests

* undo newlines from first PR

* add custom_eval_metric test, correct issue with eval_at and metric names

* move _constant_metric outside of test

* use dataset reference names instead of __strings__

* add padding to eval_set parts so each part has the same len(eval_set)

* eval_set code clean up

* revert n_evals to be the max len(eval_set) across all parts on a worker

* fix pylint errors in _DatasetNames

* more pylint fixes

* pylinting...

* add back pytest.mark decorators mistakenly deleted during merge conflict resolution

* address code review comments

* add _pad_eval_names to handle nondeterministic evals_result_ valid set names

* change 'not evaluated' evals_result_ test criteria

* address fit eval docs issues, switch _DatasetNames to Enum

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update eval_metrics, eval_at dask fit docstrings to match sklearn; make tests reflect that l2 (rmse) and logloss are in evals_result_ by default

* address eval_set dict keys naming in docstring and training eval_set naming issue

* in test_dask, check for objective-default metric names in evals_result_, remove check for training key

* lint fixes for _pad_eval_names

* remove unnecessary line break in _pad_eval_names docstring

* use Enum.member syntax not Enum.member.name

* remove str from supported eval_at types

* add whitespace and remove Dask DataFrames mention from eval_ param docstrings in _train

* remove "of shape = [n_samples]" from group_shape docs

* add eval_at base_doc in DaskLGBMRanker.fit

* remove excess paren from eval_names docs in _train

* make requested changes to test_dask.py

* remove Optional() wrapper on eval_at

* add _lgbmmodel_doc_custom_eval_note to dask.py fit.__doc__

* fix ordering of .sklearn imports to attempt lint fix

* dask custom eval note to f-string, pt. 1
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* dask custom eval note to f-string, pt. 2
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* dask custom eval note to f-string, pt. 3
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
parent bb39bc99
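For orientation, here is a minimal sketch of the interface this PR adds. It is illustrative only: the cluster setup, data shapes, and parameter values below are not taken from the diff.

import dask.array as da
from distributed import Client, LocalCluster

import lightgbm as lgb

cluster = LocalCluster(n_workers=2)
client = Client(cluster)

# training and validation data as chunked Dask collections
X = da.random.random((1000, 10), chunks=(100, 10))
y = da.random.random((1000,), chunks=(100,))
X_valid = da.random.random((500, 10), chunks=(100, 10))
y_valid = da.random.random((500,), chunks=(100,))

model = lgb.DaskLGBMRegressor(client=client, n_estimators=10)
model.fit(
    X,
    y,
    eval_set=[(X_valid, y_valid)],  # validation data, new in this PR
    eval_names=['my_valid'],        # optional names for the eval sets
    eval_metric=['l1']              # built-in metric, by name
)

# per-iteration metric values keyed by eval set name, as in the sklearn API;
# the objective's default metric (l2 here) is reported alongside eval_metric
print(model.evals_result_['my_valid'])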
[collapsed diff not shown]
@@ -207,13 +207,13 @@ _lgbmmodel_doc_fit = (
         A list of (X, y) tuple pairs to use as validation sets.
     eval_names : list of strings or None, optional (default=None)
         Names of eval_set.
-    eval_sample_weight : list of arrays or None, optional (default=None)
+    eval_sample_weight : {eval_sample_weight_shape}
         Weights of eval data.
     eval_class_weight : list or None, optional (default=None)
         Class weights of eval data.
-    eval_init_score : list of arrays or None, optional (default=None)
+    eval_init_score : {eval_init_score_shape}
         Init score of eval data.
-    eval_group : list of arrays or None, optional (default=None)
+    eval_group : {eval_group_shape}
         Group data of eval data.
     eval_metric : string, callable, list or None, optional (default=None)
         If string, it should be a built-in evaluation metric to use.
@@ -718,7 +718,10 @@ class LGBMModel(_LGBMModelBase):
         y_shape="array-like of shape = [n_samples]",
         sample_weight_shape="array-like of shape = [n_samples] or None, optional (default=None)",
         init_score_shape="array-like of shape = [n_samples] or None, optional (default=None)",
-        group_shape="array-like or None, optional (default=None)"
+        group_shape="array-like or None, optional (default=None)",
+        eval_sample_weight_shape="list of arrays or None, optional (default=None)",
+        eval_init_score_shape="list of arrays or None, optional (default=None)",
+        eval_group_shape="list of arrays or None, optional (default=None)"
     ) + "\n\n" + _lgbmmodel_doc_custom_eval_note

     def predict(self, X, raw_score=False, start_iteration=0, num_iteration=None,
...
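The sklearn.py change above extends a shared-docstring pattern: _lgbmmodel_doc_fit is a format template, and each estimator family fills the *_shape placeholders with its own types, which is what lets the Dask estimators advertise Dask collections in their fit docs. A toy illustration of the pattern, with the template text abbreviated (hypothetical names, not the real docstring):

# hypothetical, abbreviated template in the spirit of _lgbmmodel_doc_fit
_doc_fit_template = (
    "eval_sample_weight : {eval_sample_weight_shape}\n"
    "    Weights of eval data.\n"
)

# sklearn estimators fill in list-of-array types...
sklearn_doc = _doc_fit_template.format(
    eval_sample_weight_shape="list of arrays or None, optional (default=None)"
)

# ...while Dask estimators can substitute Dask collection types.
dask_doc = _doc_fit_template.format(
    eval_sample_weight_shape="list of Dask Arrays or Dask Series or None, optional (default=None)"
)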
@@ -214,6 +214,13 @@ def _accuracy_score(dy_true, dy_pred):
     return da.average(dy_true == dy_pred).compute()


+def _constant_metric(dy_true, dy_pred):
+    metric_name = 'constant_metric'
+    value = 0.708
+    is_higher_better = False
+    return metric_name, value, is_higher_better
+
+
 def _pickle(obj, filepath, serializer):
     if serializer == 'pickle':
         with open(filepath, 'wb') as f:
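The _constant_metric helper added above follows the sklearn-API contract for custom eval functions: a callable taking (y_true, y_pred) and returning an (eval_name, eval_result, is_higher_better) tuple. A hypothetical, slightly less trivial metric obeying the same contract:

import numpy as np

def median_abs_error(y_true, y_pred):
    # custom eval functions return (name, value, is_higher_better)
    value = float(np.median(np.abs(y_true - y_pred)))
    return 'median_abs_error', value, False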
@@ -745,6 +752,231 @@ def test_ranker(output, group, boosting_type, tree_learner, cluster):
     assert tree_df.loc[node_uses_cat_col, "decision_type"].unique()[0] == '=='


+@pytest.mark.parametrize('task', tasks)
+@pytest.mark.parametrize('output', data_output)
+@pytest.mark.parametrize('eval_sizes', [[0.5, 1, 1.5], [0]])
+@pytest.mark.parametrize('eval_names_prefix', ['specified', None])
+def test_eval_set_no_early_stopping(task, output, eval_sizes, eval_names_prefix, cluster):
+    if task == 'ranking' and output == 'scipy_csr_matrix':
+        pytest.skip('LGBMRanker is not currently tested on sparse matrices')
+
+    with Client(cluster) as client:
+        # Use larger trainset to prevent premature stopping due to zero loss, causing num_trees() < n_estimators.
+        # Use small chunk_size to avoid single-worker allocation of eval data partitions.
+        n_samples = 1000
+        chunk_size = 10
+        n_eval_sets = len(eval_sizes)
+        eval_set = []
+        eval_sample_weight = []
+        eval_class_weight = None
+        eval_init_score = None
+        if eval_names_prefix:
+            eval_names = [f'{eval_names_prefix}_{i}' for i in range(len(eval_sizes))]
+        else:
+            eval_names = None
+
+        X, y, w, g, dX, dy, dw, dg = _create_data(
+            objective=task,
+            n_samples=n_samples,
+            output=output,
+            chunk_size=chunk_size
+        )
+
+        if task == 'ranking':
+            eval_metrics = ['ndcg']
+            eval_at = (5, 6)
+            eval_metric_names = [f'ndcg@{k}' for k in eval_at]
+            eval_group = []
+        else:
+            # test eval_class_weight, eval_init_score on binary-classification task.
+            # Note: the objective's default `metric` is evaluated in evals_result_ in addition to all eval_metrics.
+            if task == 'binary-classification':
+                eval_metrics = ['binary_error', 'auc']
+                eval_metric_names = ['binary_logloss', 'binary_error', 'auc']
+                eval_class_weight = []
+                eval_init_score = []
+            elif task == 'multiclass-classification':
+                eval_metrics = ['multi_error']
+                eval_metric_names = ['multi_logloss', 'multi_error']
+            elif task == 'regression':
+                eval_metrics = ['l1']
+                eval_metric_names = ['l2', 'l1']
+
+        # create eval_sets by creating new datasets or copying training data.
+        for eval_size in eval_sizes:
+            if eval_size == 1:
+                y_e = y
+                dX_e = dX
+                dy_e = dy
+                dw_e = dw
+                dg_e = dg
+            else:
+                n_eval_samples = max(chunk_size, int(n_samples * eval_size))
+                _, y_e, _, _, dX_e, dy_e, dw_e, dg_e = _create_data(
+                    objective=task,
+                    n_samples=n_eval_samples,
+                    output=output,
+                    chunk_size=chunk_size
+                )
+
+            eval_set.append((dX_e, dy_e))
+            eval_sample_weight.append(dw_e)
+            if task == 'ranking':
+                eval_group.append(dg_e)
+            if task == 'binary-classification':
+                n_neg = np.sum(y_e == 0)
+                n_pos = np.sum(y_e == 1)
+                eval_class_weight.append({0: n_neg / n_pos, 1: n_pos / n_neg})
+                init_score_value = np.log(np.mean(y_e) / (1 - np.mean(y_e)))
+                if 'dataframe' in output:
+                    d_init_score = dy_e.map_partitions(lambda x: pd.Series([init_score_value] * x.size))
+                else:
+                    d_init_score = dy_e.map_blocks(lambda x: np.repeat(init_score_value, x.size))
+                eval_init_score.append(d_init_score)
+
+        fit_trees = 50
+        params = {
+            "random_state": 42,
+            "n_estimators": fit_trees,
+            "num_leaves": 2
+        }
+        model_factory = task_to_dask_factory[task]
+        dask_model = model_factory(
+            client=client,
+            **params
+        )
+
+        fit_params = {
+            'X': dX,
+            'y': dy,
+            'eval_set': eval_set,
+            'eval_names': eval_names,
+            'eval_sample_weight': eval_sample_weight,
+            'eval_init_score': eval_init_score,
+            'eval_metric': eval_metrics,
+            'verbose': True
+        }
+        if task == 'ranking':
+            fit_params.update(
+                {'group': dg,
+                 'eval_group': eval_group,
+                 'eval_at': eval_at}
+            )
+        elif task == 'binary-classification':
+            fit_params.update({'eval_class_weight': eval_class_weight})
+
+        if eval_sizes == [0]:
+            with pytest.warns(UserWarning, match='Worker (.*) was not allocated eval_set data. Therefore evals_result_ and best_score_ data may be unreliable.'):
+                dask_model.fit(**fit_params)
+        else:
+            dask_model = dask_model.fit(**fit_params)
+
+        # total number of trees scales up for ova classifier.
+        if task == 'multiclass-classification':
+            model_trees = fit_trees * dask_model.n_classes_
+        else:
+            model_trees = fit_trees
+
+        # check that early stopping was not applied.
+        assert dask_model.booster_.num_trees() == model_trees
+        assert dask_model.best_iteration_ is None
+
+        # check that evals_result_ and best_score_ contain expected data and eval_set names.
+        evals_result = dask_model.evals_result_
+        best_scores = dask_model.best_score_
+        assert len(evals_result) == n_eval_sets
+        assert len(best_scores) == n_eval_sets
+        for eval_name in evals_result:
+            assert eval_name in dask_model.best_score_
+            if eval_names:
+                assert eval_name in eval_names
+
+            # check that each eval_name and metric exists for all eval sets, allowing for the
+            # case when a worker receives a fully-padded eval_set component which is not evaluated.
+            if evals_result[eval_name] != 'not evaluated':
+                for metric in eval_metric_names:
+                    assert metric in evals_result[eval_name]
+                    assert metric in best_scores[eval_name]
+                    assert len(evals_result[eval_name][metric]) == fit_trees
+
+
+@pytest.mark.parametrize('task', ['binary-classification', 'regression', 'ranking'])
+def test_eval_set_with_custom_eval_metric(task, cluster):
+    with Client(cluster) as client:
+        n_samples = 1000
+        n_eval_samples = int(n_samples * 0.5)
+        chunk_size = 10
+        output = 'array'
+
+        X, y, w, g, dX, dy, dw, dg = _create_data(
+            objective=task,
+            n_samples=n_samples,
+            output=output,
+            chunk_size=chunk_size
+        )
+        _, _, _, _, dX_e, dy_e, _, dg_e = _create_data(
+            objective=task,
+            n_samples=n_eval_samples,
+            output=output,
+            chunk_size=chunk_size
+        )
+
+        if task == 'ranking':
+            eval_at = (5, 6)
+            eval_metrics = ['ndcg', _constant_metric]
+            eval_metric_names = [f'ndcg@{k}' for k in eval_at] + ['constant_metric']
+        elif task == 'binary-classification':
+            eval_metrics = ['binary_error', 'auc', _constant_metric]
+            eval_metric_names = ['binary_logloss', 'binary_error', 'auc', 'constant_metric']
+        else:
+            eval_metrics = ['l1', _constant_metric]
+            eval_metric_names = ['l2', 'l1', 'constant_metric']
+
+        fit_trees = 50
+        params = {
+            "random_state": 42,
+            "n_estimators": fit_trees,
+            "num_leaves": 2
+        }
+        model_factory = task_to_dask_factory[task]
+        dask_model = model_factory(
+            client=client,
+            **params
+        )
+
+        eval_set = [(dX_e, dy_e)]
+        fit_params = {
+            'X': dX,
+            'y': dy,
+            'eval_set': eval_set,
+            'eval_metric': eval_metrics
+        }
+        if task == 'ranking':
+            fit_params.update(
+                {'group': dg,
+                 'eval_group': [dg_e],
+                 'eval_at': eval_at}
+            )
+        dask_model = dask_model.fit(**fit_params)
+
+        eval_name = 'valid_0'
+        evals_result = dask_model.evals_result_
+        assert len(evals_result) == 1
+        assert eval_name in evals_result
+        for metric in eval_metric_names:
+            assert metric in evals_result[eval_name]
+            assert len(evals_result[eval_name][metric]) == fit_trees
+        np.testing.assert_allclose(evals_result[eval_name]['constant_metric'], 0.708)
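The 'not evaluated' guard in test_eval_set_no_early_stopping above reflects how the PR distributes eval sets: eval_set parts are padded so every worker's local fit sees the same number of eval sets, so a worker can receive a fully-padded eval set it never actually scores. The commit's _pad_eval_names then fills in the missing keys so evals_result_ and best_score_ expose a deterministic set of names. A rough sketch of that idea (hypothetical code, not the PR's exact implementation):

def _pad_eval_names(model, required_names):
    # fill placeholder entries for eval sets this worker never evaluated,
    # so callers can rely on every expected eval set name being present
    for eval_name in required_names:
        if eval_name not in model.evals_result_:
            model.evals_result_[eval_name] = 'not evaluated'
        if eval_name not in model.best_score_:
            model.best_score_[eval_name] = 'not evaluated'
    return model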
 @pytest.mark.parametrize('task', tasks)
 def test_training_works_if_client_not_provided_or_set_after_construction(task, cluster):
     with Client(cluster) as client:
...