Unverified Commit b5502d19 authored by Frank Fineis, committed by GitHub

[dask] add support for eval sets and custom eval functions (#4101)



* eval_set support WiP, need to add eval_sample_weight and eval_group

* add weight, group to dask eval_set support. WiP.

* dask eval_set reorg

* Update python-package/lightgbm/dask.py

move _train_part model.fit args onto separate lines
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py

move _train_part model.fit args onto separate lines, pt. 2
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

move _train_part model.fit args onto separate lines, pt. 3
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py

move dask_model.fit args onto separate lines
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

use is instead of id()
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* applying changes to eval_set PR, WiP

* dask support for eval_names, eval_metric, eval_stopping_rounds

* add evals_result checks and other eval_set attribute-related test checks. need to merge master - WiP

* fix lint errors in test_dask.py

* drop group_shape from _lgbmmodel_doc_fit.format for non-rankers, add support for eval_at for dask ranker

* add eval_at to test_dask eval_set ranker tests

* add back group_shape to LGBMModel docs, tighten tests

* drop random eval weights from early stopping, they were probably causing training to terminate too early

* add eval data templates to sklearn fit docs, add eval data docs to dask

* add n_features to _create_data, eval_set tests stop w/ desirable tree counts

* import alphabetically

* add back get_worker for eval_set error handling

* fix test_dask argmin typo

* push forgotten eval_names bugfix

* eval_stopping_rounds -> early_stopping_rounds, fix failing non-es test

* change default eval_at to tuple (1, 2, 3, 4, 5)

* re-drop get_worker

* drop early stopping support from eval_set commits, move eval_set worker check prior to client.submit

* add eval_class_weight and eval_init_score to lightgbm/dask, WiP

* clean up eval_set tests, allow user to specify fewer eval_names and class weights than eval_sets

* remove redundant backslash

* lint fixes

* fix eval_at, eval_metric duplication, let eval_at be Iterable not just Tuple

* use all data_outputs for test_eval_set tests

* undo newlines from first PR

* add custom_eval_metric test, correct issue with eval_at and metric names

* move _constant_metric outside of test

* use dataset reference names instead of __strings__

* add padding to eval_set parts so each part has the same len(eval_set)

* eval_set code clean up

* revert n_evals to be the max len(eval_set) across all parts on a worker

* fix pylint errors in _DatasetNames

* more pylint fixes

* pylinting...

* add back pytest.mark decorators mistakenly deleted during merge conflict resolution

* address code review comments

* add _pad_eval_names to handle nondeterministic evals_result_ valid set names

* change 'not evaluated' evals_result_ test criteria

* address fit eval docs issues, switch _DatasetNames to Enum

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update eval_metrics, eval_at dask fit docstrings to match sklearn; make tests reflect that l2 (rmse) and logloss are in evals_result_ by default

* address eval_set dict keys naming in docstring and training eval_set naming issue

* in test_dask, check for objective-default metric names in evals_result_, remove check for training key

* lint fixes for _pad_eval_names

* remove unnecessary line break in _pad_eval_names docstring

* use Enum.member syntax not Enum.member.name

* remove str from supported eval_at types

* add whitespace and remove Dask DataFrames mention from eval_ param docstrings in _train

* remove "of shape = [n_samples]" from group_shape docs

* add eval_at base_doc in DaskLGBMRanker.fit

* remove excess paren from eval_names docs in _train

* make requested changes to test_dask.py

* remove Optional() wrapper on eval_at

* add _lgbmmodel_doc_custom_eval_note to dask.py fit.__doc__

* fix ordering of .sklearn imports to attempt lint fix

* dask custom eval note to f-string, pt. 1
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* dask custom eval note to f-string, pt. 2
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* dask custom eval note to f-string, pt. 3
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
parent bb39bc99
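For orientation, here is a minimal sketch of the interface this PR adds. It is illustrative only: the cluster setup, data shapes, and parameter values below are not taken from the diff.

import dask.array as da
from distributed import Client, LocalCluster

import lightgbm as lgb

cluster = LocalCluster(n_workers=2)
client = Client(cluster)

# training and validation data as chunked Dask collections
X = da.random.random((1000, 10), chunks=(100, 10))
y = da.random.random((1000,), chunks=(100,))
X_valid = da.random.random((500, 10), chunks=(100, 10))
y_valid = da.random.random((500,), chunks=(100,))

model = lgb.DaskLGBMRegressor(client=client, n_estimators=10)
model.fit(
    X,
    y,
    eval_set=[(X_valid, y_valid)],  # validation data, new in this PR
    eval_names=['my_valid'],        # optional names for the eval sets
    eval_metric=['l1']              # built-in metric, by name
)

# per-iteration metric values keyed by eval set name, as in the sklearn API;
# the objective's default metric (l2 here) is reported alongside eval_metric
print(model.evals_result_['my_valid'])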
[collapsed diff not shown]
@@ -207,13 +207,13 @@ _lgbmmodel_doc_fit = (
         A list of (X, y) tuple pairs to use as validation sets.
     eval_names : list of strings or None, optional (default=None)
         Names of eval_set.
-    eval_sample_weight : list of arrays or None, optional (default=None)
+    eval_sample_weight : {eval_sample_weight_shape}
         Weights of eval data.
     eval_class_weight : list or None, optional (default=None)
         Class weights of eval data.
-    eval_init_score : list of arrays or None, optional (default=None)
+    eval_init_score : {eval_init_score_shape}
         Init score of eval data.
-    eval_group : list of arrays or None, optional (default=None)
+    eval_group : {eval_group_shape}
         Group data of eval data.
     eval_metric : string, callable, list or None, optional (default=None)
         If string, it should be a built-in evaluation metric to use.
@@ -718,7 +718,10 @@ class LGBMModel(_LGBMModelBase):
         y_shape="array-like of shape = [n_samples]",
         sample_weight_shape="array-like of shape = [n_samples] or None, optional (default=None)",
         init_score_shape="array-like of shape = [n_samples] or None, optional (default=None)",
-        group_shape="array-like or None, optional (default=None)"
+        group_shape="array-like or None, optional (default=None)",
+        eval_sample_weight_shape="list of arrays or None, optional (default=None)",
+        eval_init_score_shape="list of arrays or None, optional (default=None)",
+        eval_group_shape="list of arrays or None, optional (default=None)"
     ) + "\n\n" + _lgbmmodel_doc_custom_eval_note

     def predict(self, X, raw_score=False, start_iteration=0, num_iteration=None,
...
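The sklearn.py change above extends a shared-docstring pattern: _lgbmmodel_doc_fit is a format template, and each estimator family fills the *_shape placeholders with its own types, which is what lets the Dask estimators advertise Dask collections in their fit docs. A toy illustration of the pattern, with the template text abbreviated (hypothetical names, not the real docstring):

# hypothetical, abbreviated template in the spirit of _lgbmmodel_doc_fit
_doc_fit_template = (
    "eval_sample_weight : {eval_sample_weight_shape}\n"
    "    Weights of eval data.\n"
)

# sklearn estimators fill in list-of-array types...
sklearn_doc = _doc_fit_template.format(
    eval_sample_weight_shape="list of arrays or None, optional (default=None)"
)

# ...while Dask estimators can substitute Dask collection types.
dask_doc = _doc_fit_template.format(
    eval_sample_weight_shape="list of Dask Arrays or Dask Series or None, optional (default=None)"
)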
@@ -214,6 +214,13 @@ def _accuracy_score(dy_true, dy_pred):
     return da.average(dy_true == dy_pred).compute()


+def _constant_metric(dy_true, dy_pred):
+    metric_name = 'constant_metric'
+    value = 0.708
+    is_higher_better = False
+    return metric_name, value, is_higher_better
+
+
 def _pickle(obj, filepath, serializer):
     if serializer == 'pickle':
         with open(filepath, 'wb') as f:
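The _constant_metric helper added above follows the sklearn-API contract for custom eval functions: a callable taking (y_true, y_pred) and returning an (eval_name, eval_result, is_higher_better) tuple. A hypothetical, slightly less trivial metric obeying the same contract:

import numpy as np

def median_abs_error(y_true, y_pred):
    # custom eval functions return (name, value, is_higher_better)
    value = float(np.median(np.abs(y_true - y_pred)))
    return 'median_abs_error', value, False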
@@ -745,6 +752,231 @@ def test_ranker(output, group, boosting_type, tree_learner, cluster):
     assert tree_df.loc[node_uses_cat_col, "decision_type"].unique()[0] == '=='


+@pytest.mark.parametrize('task', tasks)
+@pytest.mark.parametrize('output', data_output)
+@pytest.mark.parametrize('eval_sizes', [[0.5, 1, 1.5], [0]])
+@pytest.mark.parametrize('eval_names_prefix', ['specified', None])
+def test_eval_set_no_early_stopping(task, output, eval_sizes, eval_names_prefix, cluster):
+    if task == 'ranking' and output == 'scipy_csr_matrix':
+        pytest.skip('LGBMRanker is not currently tested on sparse matrices')
+
+    with Client(cluster) as client:
+        # Use larger trainset to prevent premature stopping due to zero loss, causing num_trees() < n_estimators.
+        # Use small chunk_size to avoid single-worker allocation of eval data partitions.
+        n_samples = 1000
+        chunk_size = 10
+        n_eval_sets = len(eval_sizes)
+        eval_set = []
+        eval_sample_weight = []
+        eval_class_weight = None
+        eval_init_score = None
+        if eval_names_prefix:
+            eval_names = [f'{eval_names_prefix}_{i}' for i in range(len(eval_sizes))]
+        else:
+            eval_names = None
+
+        X, y, w, g, dX, dy, dw, dg = _create_data(
+            objective=task,
+            n_samples=n_samples,
+            output=output,
+            chunk_size=chunk_size
+        )
+
+        if task == 'ranking':
+            eval_metrics = ['ndcg']
+            eval_at = (5, 6)
+            eval_metric_names = [f'ndcg@{k}' for k in eval_at]
+            eval_group = []
+        else:
+            # test eval_class_weight, eval_init_score on binary-classification task.
+            # Note: the objective's default `metric` is evaluated in evals_result_ in addition to all eval_metrics.
+            if task == 'binary-classification':
+                eval_metrics = ['binary_error', 'auc']
+                eval_metric_names = ['binary_logloss', 'binary_error', 'auc']
+                eval_class_weight = []
+                eval_init_score = []
+            elif task == 'multiclass-classification':
+                eval_metrics = ['multi_error']
+                eval_metric_names = ['multi_logloss', 'multi_error']
+            elif task == 'regression':
+                eval_metrics = ['l1']
+                eval_metric_names = ['l2', 'l1']
+
+        # create eval_sets by creating new datasets or copying training data.
+        for eval_size in eval_sizes:
+            if eval_size == 1:
+                y_e = y
+                dX_e = dX
+                dy_e = dy
+                dw_e = dw
+                dg_e = dg
+            else:
+                n_eval_samples = max(chunk_size, int(n_samples * eval_size))
+                _, y_e, _, _, dX_e, dy_e, dw_e, dg_e = _create_data(
+                    objective=task,
+                    n_samples=n_eval_samples,
+                    output=output,
+                    chunk_size=chunk_size
+                )
+
+            eval_set.append((dX_e, dy_e))
+            eval_sample_weight.append(dw_e)
+            if task == 'ranking':
+                eval_group.append(dg_e)
+            if task == 'binary-classification':
+                n_neg = np.sum(y_e == 0)
+                n_pos = np.sum(y_e == 1)
+                eval_class_weight.append({0: n_neg / n_pos, 1: n_pos / n_neg})
+                init_score_value = np.log(np.mean(y_e) / (1 - np.mean(y_e)))
+                if 'dataframe' in output:
+                    d_init_score = dy_e.map_partitions(lambda x: pd.Series([init_score_value] * x.size))
+                else:
+                    d_init_score = dy_e.map_blocks(lambda x: np.repeat(init_score_value, x.size))
+                eval_init_score.append(d_init_score)
+
+        fit_trees = 50
+        params = {
+            "random_state": 42,
+            "n_estimators": fit_trees,
+            "num_leaves": 2
+        }
+        model_factory = task_to_dask_factory[task]
+        dask_model = model_factory(
+            client=client,
+            **params
+        )
+
+        fit_params = {
+            'X': dX,
+            'y': dy,
+            'eval_set': eval_set,
+            'eval_names': eval_names,
+            'eval_sample_weight': eval_sample_weight,
+            'eval_init_score': eval_init_score,
+            'eval_metric': eval_metrics,
+            'verbose': True
+        }
+        if task == 'ranking':
+            fit_params.update(
+                {'group': dg,
+                 'eval_group': eval_group,
+                 'eval_at': eval_at}
+            )
+        elif task == 'binary-classification':
+            fit_params.update({'eval_class_weight': eval_class_weight})
+
+        if eval_sizes == [0]:
+            with pytest.warns(UserWarning, match='Worker (.*) was not allocated eval_set data. Therefore evals_result_ and best_score_ data may be unreliable.'):
+                dask_model.fit(**fit_params)
+        else:
+            dask_model = dask_model.fit(**fit_params)
+
+        # total number of trees scales up for ova classifier.
+        if task == 'multiclass-classification':
+            model_trees = fit_trees * dask_model.n_classes_
+        else:
+            model_trees = fit_trees
+
+        # check that early stopping was not applied.
+        assert dask_model.booster_.num_trees() == model_trees
+        assert dask_model.best_iteration_ is None
+
+        # check that evals_result_ and best_score_ contain expected data and eval_set names.
+        evals_result = dask_model.evals_result_
+        best_scores = dask_model.best_score_
+        assert len(evals_result) == n_eval_sets
+        assert len(best_scores) == n_eval_sets
+        for eval_name in evals_result:
+            assert eval_name in dask_model.best_score_
+            if eval_names:
+                assert eval_name in eval_names
+
+            # check that each eval_name and metric exists for all eval sets, allowing for the
+            # case when a worker receives a fully-padded eval_set component which is not evaluated.
+            if evals_result[eval_name] != 'not evaluated':
+                for metric in eval_metric_names:
+                    assert metric in evals_result[eval_name]
+                    assert metric in best_scores[eval_name]
+                    assert len(evals_result[eval_name][metric]) == fit_trees
+
+
+@pytest.mark.parametrize('task', ['binary-classification', 'regression', 'ranking'])
+def test_eval_set_with_custom_eval_metric(task, cluster):
+    with Client(cluster) as client:
+        n_samples = 1000
+        n_eval_samples = int(n_samples * 0.5)
+        chunk_size = 10
+        output = 'array'
+
+        X, y, w, g, dX, dy, dw, dg = _create_data(
+            objective=task,
+            n_samples=n_samples,
+            output=output,
+            chunk_size=chunk_size
+        )
+        _, _, _, _, dX_e, dy_e, _, dg_e = _create_data(
+            objective=task,
+            n_samples=n_eval_samples,
+            output=output,
+            chunk_size=chunk_size
+        )
+
+        if task == 'ranking':
+            eval_at = (5, 6)
+            eval_metrics = ['ndcg', _constant_metric]
+            eval_metric_names = [f'ndcg@{k}' for k in eval_at] + ['constant_metric']
+        elif task == 'binary-classification':
+            eval_metrics = ['binary_error', 'auc', _constant_metric]
+            eval_metric_names = ['binary_logloss', 'binary_error', 'auc', 'constant_metric']
+        else:
+            eval_metrics = ['l1', _constant_metric]
+            eval_metric_names = ['l2', 'l1', 'constant_metric']
+
+        fit_trees = 50
+        params = {
+            "random_state": 42,
+            "n_estimators": fit_trees,
+            "num_leaves": 2
+        }
+        model_factory = task_to_dask_factory[task]
+        dask_model = model_factory(
+            client=client,
+            **params
+        )
+
+        eval_set = [(dX_e, dy_e)]
+        fit_params = {
+            'X': dX,
+            'y': dy,
+            'eval_set': eval_set,
+            'eval_metric': eval_metrics
+        }
+        if task == 'ranking':
+            fit_params.update(
+                {'group': dg,
+                 'eval_group': [dg_e],
+                 'eval_at': eval_at}
+            )
+        dask_model = dask_model.fit(**fit_params)
+
+        eval_name = 'valid_0'
+        evals_result = dask_model.evals_result_
+        assert len(evals_result) == 1
+        assert eval_name in evals_result
+        for metric in eval_metric_names:
+            assert metric in evals_result[eval_name]
+            assert len(evals_result[eval_name][metric]) == fit_trees
+        np.testing.assert_allclose(evals_result[eval_name]['constant_metric'], 0.708)
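The 'not evaluated' guard in test_eval_set_no_early_stopping above reflects how the PR distributes eval sets: eval_set parts are padded so every worker's local fit sees the same number of eval sets, so a worker can receive a fully-padded eval set it never actually scores. The commit's _pad_eval_names then fills in the missing keys so evals_result_ and best_score_ expose a deterministic set of names. A rough sketch of that idea (hypothetical code, not the PR's exact implementation):

def _pad_eval_names(model, required_names):
    # fill placeholder entries for eval sets this worker never evaluated,
    # so callers can rely on every expected eval set name being present
    for eval_name in required_names:
        if eval_name not in model.evals_result_:
            model.evals_result_[eval_name] = 'not evaluated'
        if eval_name not in model.best_score_:
            model.best_score_[eval_name] = 'not evaluated'
    return model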
 @pytest.mark.parametrize('task', tasks)
 def test_training_works_if_client_not_provided_or_set_after_construction(task, cluster):
     with Client(cluster) as client:
...