Commits · 0a4d1908289ef8b8d5b64190d1e92bd987ccc9b0 · tianlh / LightGBM-DCU

10 Nov, 2021 2 commits

[python][sklearn] respect objective aliases (#4758) · 0a4d1908

Nikita Titov authored Nov 10, 2021

* respect objective aliases

* Update test_sklearn.py

* revert removal of blank lines

* add argument name which is being overwritten in warning message


Always respect forced splits, even when feature_fraction < 1.0 (fixes #4601) (#4725) · 33a2f9ec

tongwu-msft authored Nov 10, 2021

* issue fix #4601

* fix issue 4601 it2

* add tests for issue 4601

* fix warning

* fix warning

* add new line at end

* remove last line at end

* fix lint warning

* address comments

* address comments

* address comments

* fix address

* address comments

* revert seed

* fix recursive force split issue

* fix build error

* fix lint warning


08 Nov, 2021 1 commit
- Suppress categorical warning (fixes #3379) · b1facf50
  Zhiyuan He authored Nov 08, 2021
  
07 Nov, 2021 1 commit
- [ci][tests][python] remove assertion for `filename` attribute that is no longer true with new version of graphviz (#4778) · cebdc2a8
  Nikita Titov authored Nov 07, 2021
05 Nov, 2021 1 commit
- [python][sklearn] add `n_estimators_` and `n_iter_` post-fit attributes (#4753) · aab212a7
  Nikita Titov authored Nov 05, 2021
```
* add n_estimators_ and n_iter_ post-fit attributes

* address review comments
```
29 Oct, 2021 1 commit
- [tests] [python] add test for non-serializable callback (#4741) · 798dc1d4
  Nikita Titov authored Oct 29, 2021
  
13 Oct, 2021 1 commit
- fix behavior for default objective and metric (#4660) · d130bb19
  Nikita Titov authored Oct 13, 2021
  
07 Oct, 2021 1 commit
- [tests][python-package] refactor list_to_1d_numpy test to run without pandas installed (#4639) · 29857c8a
  José Morales authored Oct 07, 2021
```
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>
```
23 Sep, 2021 1 commit
- [python] add placeholders to titles in plotting functions (#4614) · b78175b7
  Nikita Titov authored Sep 23, 2021
  
17 Sep, 2021 1 commit

[python-package] Support 2d collections as input for `init_score` in multiclass classification task (#4150) · f1f5ba15

José Morales authored Sep 17, 2021

* initial implementation of init_score for multiclass classification

* check for 1d or 2d collection in init_score

* remove dataset import

* initial comments

* update dask test and docstrings

* update docstrings

* move logic to set_field. reshape back on get_field

* add type hints and update docstrings for dask. fix Dataset.set_field

* revert wrong docstrings and type hints

* add extra comma for consistency

* prefix private functions with underscore

add type hints to new functions

make commas consistent in dask and basic

* add missing spaces after type hint

* remove shape condition for dataframe in is_2d_collection
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>


15 Sep, 2021 1 commit

[python] rename `print_evaluation()` into `log_evaluation()` (#4604) · 54facc4d

Nikita Titov authored Sep 16, 2021

* Update __init__.py

* Update Python-API.rst

* Update engine.py

* Update test_utilities.py

* Update sklearn.py

* Update callback.py

* Update callback.py

* Update callback.py


12 Sep, 2021 1 commit
- [RFC][python] deprecate advanced args of `train()` and `cv()` functions and sklearn wrapper (#4574) · 86bda6f0
  Nikita Titov authored Sep 12, 2021
```
* deprecate advanced args of `train()` and `cv()`

* update Dask test

* improve deducing

* address review comments
```
10 Sep, 2021 1 commit
- [python] [sklearn] respect `eval_at` aliases in keyword arguments (#4599) · 79463dfb
  Nikita Titov authored Sep 10, 2021
  
09 Sep, 2021 2 commits
- [tests][dask] Use workers hostname in tests (fixes #4594) (#4595) · 5857ef5e
  José Morales authored Sep 09, 2021
```
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>
```
- [ci] skip Dask tests on QEMU builds (#4600) · 4bf9f954
  James Lamb authored Sep 09, 2021
  
01 Sep, 2021 1 commit
- add 'auto' value for `importance_type` param in plotting (#4570) · 39421265
  Nikita Titov authored Sep 01, 2021
  
23 Aug, 2021 1 commit

[python] add parameter object_hook to method dump_model (#4533) · 11d7608f

Xavier Dupré authored Aug 24, 2021



* add parameter object_hook to function dump_model (python API)

* eol

* fix syntax

* lint

* better documentation

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>


09 Aug, 2021 1 commit

[tests][dask] reduce number of collisions tests (#4501) · cfe8eb17

José Morales authored Aug 09, 2021

* reduce number of collisions tests

* measure tests execution time

* measure tests execution time in bdist task

* remove durations in bdist task


03 Aug, 2021 1 commit
- [dask] find all needed ports in each host at once (fixes #4458) (#4498) · 5fe27d59
  José Morales authored Aug 03, 2021
```
* find all needed ports in each worker at once

* lint

* better naming

* use _HostWorkers in test
```
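
The commit body above only says that all needed ports are found "at once". One common way to do that, sketched here as an assumption rather than as the code from #4498, is to bind several sockets to port 0 and keep them all open until every port number has been read, so the OS cannot hand out the same port twice. The function name `find_n_open_ports` and the loopback address are illustrative choices, not taken from the commit.

```python
import socket
from typing import List


def find_n_open_ports(n: int) -> List[int]:
    """Return n distinct ports that were free at the time of the call.

    Hypothetical helper for illustration only.
    """
    sockets = []
    try:
        for _ in range(n):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            # Port 0 asks the OS to pick any currently free port.
            s.bind(("127.0.0.1", 0))
            sockets.append(s)
        # Every socket is still bound here, so all ports are distinct.
        return [s.getsockname()[1] for s in sockets]
    finally:
        # Release the ports only after all of them have been collected.
        for s in sockets:
            s.close()


ports = find_n_open_ports(3)
```

Searching one port at a time and closing each socket before the next search could return the same port twice on one host; holding the sockets open avoids that, although another process may still take a port after it is released.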
31 Jul, 2021 1 commit
- [python][tests] refactor tests with Sequence input (#4495) · 661bde10
  Nikita Titov authored Jul 31, 2021
  
30 Jul, 2021 1 commit

[python] support Dataset.get_data for Sequence input. (#4472) · 1d21d1ad

Chen Yufei authored Jul 31, 2021



* [python] support Dataset.get_data for Sequence input.

* Tweaks according to review comments.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Add test cases.

* fix import order in test_basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>


10 Jul, 2021 1 commit
- [tests][python] added tests for early stop in prediction in ranking task (#4457) · d05f5470
  Nikita Titov authored Jul 10, 2021
  
07 Jul, 2021 2 commits

[python] allow to pass some params as pathlib.Path objects (#4440) · 90342e92
Nikita Titov authored Jul 07, 2021
```
* allow to pass some params as pathlib.Path objects

* fix lint

* improve indentation
```

[dask] Make output of feature contribution predictions for sparse matrices match those from sklearn estimators (fixes #3881) (#4378) · b09da434

James Lamb authored Jul 07, 2021

* test_classifier working

* adding tests

* docs

* tests

* revert unnecessary changes in tests

* test output type

* linting

* linting

* use from_delayed() instead

* docstring pycodestyle is happy with

* isort

* put pytest skips back

* respect sparse return type

* fix doc

* remove unnecessary dask_array_concatenate()

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update predict_proba() docstring

* remove unnecessary np.array()

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* fix assertion

* fix test use of len()

* restore np.array() in tests

* use np.asarray() instead

* use toarray()

* remove empty functions in compat
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>


06 Jul, 2021 1 commit
- [python-package] use toarray() instead of todense() in tests and examples (#4446) · e36cc9c1
  James Lamb authored Jul 06, 2021
  
05 Jul, 2021 1 commit

[python] minor refactoring of Python code (#4442) · 7eac5a63

Nikita Titov authored Jul 05, 2021

* Update test_sklearn.py

* Update test_basic.py

* Update dask.py

* Update basic.py

* Update basic.py

* Update basic.py

* Update basic.py

* Update callback.py


04 Jul, 2021 3 commits
- [tests] fix deprecation numpy warning (#4439) · 29052c5d
  Nikita Titov authored Jul 05, 2021
  
- [python-package] convert string concatenation to f-strings in test_engine.py (fixes #4136) (#4436) · 26cc160a
  James Lamb authored Jul 04, 2021
```
* [python-package] convert string concatenation to f-strings in test_engine.py (fixes #4136)

* Update tests/python_package_test/test_engine.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* revert get_workflow_status changes
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
```
- [python] migrate to pathlib in python tests (#4435) · cff80442
  Nikita Titov authored Jul 04, 2021
  
02 Jul, 2021 1 commit

[python-package] Create Dataset from multiple data files (#4089) · c359896e

Chen Yufei authored Jul 02, 2021

* [python-package] create Dataset from sampled data.

* [python-package] create Dataset from List[Sequence].

1. Use random access for data sampling
2. Support read data from multiple input files
3. Read data in batch so no need to hold all data in memory

* [python-package] example: create Dataset from multiple HDF5 file.

* fix: revert is_class implementation for seq

* fix: unwanted memory view reference for seq

* fix: seq is_class accepts sklearn matrices

* fix: requirements for example

* fix: pycode

* feat: print static code linting stage

* fix: linting: avoid shell str regex conversion

* code style: doc style

* code style: isort

* fix ci dependency: h5py on windows

* [py] remove rm files in test seq
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623

* docs(python): init_from_sample summary

https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389



* remove dataset dump sample data debugging code.

* remove typo fix.

Create separate PR for this.

* fix typo in src/c_api.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* style(linting): py3 type hint for seq

* test(basic): os.path style path handling

* Revert "feat: print static code linting stage"

This reverts commit 10bd79f7f8258bea8e61c3abb8c9c7e4456a916d.

* feat(python): sequence on validation set

* minor(python): comment

* minor(python): test option hint

* style(python): fix code linting

* style(python): add pydoc for ref_dataset

* doc(python): sequence
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* revert(python): sequence class abc

* chore(python): remove rm_files

* Remove useless static_assert.

* refactor: test_basic test for sequence.

* fix lint complaint.

* remove dataset._dump_text in sequence test.

* Fix reverting typo fix.

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Fix type hint, code and doc style.

* fix failing test_basic.

* Remove TODO about keep constant in sync with cpp.

* Install h5py only when running python-examples.

* Fix lint complaint.

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Doc fixes, remove unused params_str in __init_from_seqs.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove unnecessary conda install in windows ci script.

* Keep param as example in dataset_from_multi_hdf5.py

* Add _get_sample_count function to remove code duplication.

* Use batch_size parameter in generate_hdf.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix after applying suggestions.

* Fix test, check idx is instance of numbers.Integral.

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Expose Sequence class in Python-API doc.

* Handle Sequence object not having batch_size.

* Fix isort lint complaint.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update docstring to mention Sequence as data input.

* Remove get_one_line in test_basic.py

* Make Sequence an abstract class.

* Reduce number of tests for test_sequence.

* Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices.

* empty commit to trigger ci

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.

Also rename total_nrow to num_total_row in c_api.h for consistency.

* Doc about Sequence in docs/Python-Intro.rst.

* Fix: basic.py change LGBM_SampleIndices out_len to int32.

* Add create_valid test case with Dataset from Sequence.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Willian Zhang <willian@willian.email>
Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

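
The three numbered points in the commit message above (random access for sampling, several input files, batched reads) can be illustrated without LightGBM at all. The sketch below is a minimal pure-Python stand-in, not the library's `Sequence` class: `InMemorySequence`, `sample_rows`, and `iter_batches` are hypothetical names, and each in-memory list plays the role of one data file.

```python
from typing import Iterator, List


class InMemorySequence:
    """Hypothetical stand-in for one data file: random access plus a batch size."""

    def __init__(self, rows: List[List[float]], batch_size: int = 2) -> None:
        self._rows = rows
        self.batch_size = batch_size

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, idx):
        # Accepts an int (one row) or a slice (a batch of rows).
        return self._rows[idx]


def sample_rows(seqs: List[InMemorySequence], indices: List[int]) -> List[List[float]]:
    """Fetch rows by global index, touching only the requested rows."""
    sampled = []
    for global_idx in indices:
        offset = global_idx
        for seq in seqs:
            if offset < len(seq):
                sampled.append(seq[offset])
                break
            offset -= len(seq)  # index falls in a later source
        else:
            raise IndexError(f"row {global_idx} is out of range")
    return sampled


def iter_batches(seqs: List[InMemorySequence]) -> Iterator[List[List[float]]]:
    """Yield rows batch by batch so no source is held in memory as a whole."""
    for seq in seqs:
        for start in range(0, len(seq), seq.batch_size):
            yield seq[start:start + seq.batch_size]


file_a = InMemorySequence([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], batch_size=2)
file_b = InMemorySequence([[7.0, 8.0], [9.0, 10.0]], batch_size=2)

sampled = sample_rows([file_a, file_b], [0, 4])
batches = list(iter_batches([file_a, file_b]))
```

Here global row 4 resolves to the second row of `file_b`, and the five rows are read in three batches of at most two rows each. A real data source would replace the in-memory list with reads from disk inside `__getitem__`.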

28 Jun, 2021 1 commit

[dask] add support for eval sets and custom eval functions (#4101) · b5502d19

Frank Fineis authored Jun 27, 2021



* es WiP, need to add eval_sample_weight and eval_group

* add weight, group to dask es. WiP.

* dask es reorg

* Update python-package/lightgbm/dask.py

_train_part model.fit args to lines
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py

_train_part model.fit args to lines, pt2
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

_train_part model.fit args to lines pt3
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py

dask_model.fit args to lines
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

use is instead of id()
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* applying changes to eval_set PR WiP

* dask support for eval_names, eval_metric, eval_stopping_rounds

* add evals_result checks and other eval_set attribute-related test checks. need to merge master - WiP

* fix lint errors in test_dask.py

* drop group_shape from _lgbmmodel_doc_fit.format for non-rankers, add support for eval_at for dask ranker

* add eval_at to test_dask eval_set ranker tests

* add back group_shape to lgbmmodel docs, tighten tests

* drop random eval weights from early stopping, probably causing training to terminate too early

* add eval data templates to sklearn fit docs, add eval data docs to dask

* add n_features to _create_data, eval_set tests stop w/ desirable tree counts

* import alphabetically

* add back get_worker for eval_set error handling

* test_dask argmin typo

* push forgotten eval_names bugfix

* eval_stopping_rounds -> early_stopping_rounds, fix failing non-es test

* change default eval_at to tuple 1-5

* re-drop get_worker

* drop early stopping support from eval_set commits, move eval_set worker check prior to client.submit

* add eval_class_weight and eval_init_score to lightgbm/dask, WiP

* clean up eval_set tests, allow user to specify fewer eval_names, clswghts than eval_sets

* remove redundant backslash

* lint fixes

* fix eval_at, eval_metric duplication, let eval_at be Iterable not just Tuple

* use all data_outputs for test_eval_set tests

* undo newlines from first pr

* add custom_eval_metric test, correct issue with eval_at and metric names

* move _constant_metric outside of test

* dataset reference names instead of __strings__

* add padding to eval_set parts makes each part has same len(eval_set)

* eval set code clean up

* revert n_evals to be max len eval_set across all parts on worker

* pylint errors in _DatasetNames

* more pylint fixes

* pylinting...

* add by pytest.mark, mistakenly deleted during merge conflict resolution

* address code review comments

* add _pad_eval_names to handle nondeterministic evals_result_ valid set names

* change not evaluated evals_result_ test criteria

* address fit eval docs issues, switch _DatasetNames to Enum

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update eval_metrics, eval_at dask fit docstr to match sklearn, make tests reflect that l2 (rmse), logloss in evals_result_ by default

* address eval_set dict keys naming in docstr and training eval_set naming issue

* in test_dask check for obj-default metric names in eval_results, remove check for training key

* lint fixes for _pad_eval_names

* remove unnecessary line break in _pad_eval_names docstr

* use Enum.member syntax not Enum.member.name

* remove str from supported eval_at types

* add whitespace and remove DaskDataframes mention from eval_ param docstrs in _train

* remove "of shape = [n_samples]" from group_shape docs

* add eval_at base_doc in DaskLGBMRanker.fit

* remove excess paren from eval_names docs in _train

* make requested changes to test_dask.py

* remove Optional() wrapper on eval_at

* add _lgbmmodel_doc_custom_eval_note to dask.py fit.__doc__

* fix ordering of .sklearn imports to attempt lint fix

* dask custom eval note to f-string pt1
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* dask custom eval note to f-string pt 2
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* dask custom eval note to f-string pt 3
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>


27 Jun, 2021 2 commits
- [python] replace numpy.zeros with numpy.empty for the speedup (#4410) · 45ac271b
  Nikita Titov authored Jun 27, 2021
  
- [tests][dask] add missing compute() in Dask test (#4412) · db3915c2
  James Lamb authored Jun 27, 2021
  
26 Jun, 2021 2 commits

[dask] pass additional predict() parameters through when input is a Dask Array (#4399) · 8116d880

James Lamb authored Jun 26, 2021



* [dask] pass predict() kwargs through when input is a Dask Array

* add tests

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* add prediction early stopping params
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>


fix param aliases (#4387) · aab8fc18
Nikita Titov authored Jun 26, 2021


15 Jun, 2021 1 commit
- [tests] replace pytest.parametrize (#4377) · c738c83b
  Nikita Titov authored Jun 15, 2021
```
* replace pytest.parametrize

* add informative message for assert
```
12 Jun, 2021 1 commit
- [tests][python] fix f-string in test_dask.py (#4373) · c3b9363d
  Nikita Titov authored Jun 12, 2021
  
09 Jun, 2021 2 commits

[python] improving the syntax of the fstring in the file : tests/python_package_test/test_dask.py (#4358) · d677d6c6

sayantan sadhu authored Jun 09, 2021

* updated the old syntax with fstrings

* Updated the strings with + catenation to fstrings

* Updated the strings with + catenation to fstrings

* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>


[python-package] change to f-strings in test_plotting.py (#4359) · 9143003d
Weston King-Leatham authored Jun 08, 2021


07 Jun, 2021 1 commit
- [python-package] updated test_consistency.py to use f-strings (#4348) · bab58d0e
  sayantan sadhu authored Jun 07, 2021
  