Commits · a06899ab72e59aeb635c794962cb19a5880724df · tianlh / LightGBM-DCU

06 Jul, 2021 1 commit
- [python-package] use toarray() instead of todense() in tests and examples (#4446) · e36cc9c1
  James Lamb authored Jul 06, 2021
  
05 Jul, 2021 1 commit

[python] minor refactoring of Python code (#4442) · 7eac5a63

Nikita Titov authored Jul 05, 2021

* Update test_sklearn.py

* Update test_basic.py

* Update dask.py

* Update basic.py

* Update basic.py

* Update basic.py

* Update basic.py

* Update callback.py

04 Jul, 2021 3 commits
- [tests] fix deprecation numpy warning (#4439) · 29052c5d
  Nikita Titov authored Jul 05, 2021
  
- [python-package] convert string concatenation to f-strings in test_engine.py (fixes #4136) (#4436) · 26cc160a
  James Lamb authored Jul 04, 2021
```
* [python-package] convert string concatenation to f-strings in test_engine.py (fixes #4136)

* Update tests/python_package_test/test_engine.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* revert get_workflow_status changes
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
```
- [python] migrate to pathlib in python tests (#4435) · cff80442
  Nikita Titov authored Jul 04, 2021
  
02 Jul, 2021 1 commit

[python-package] Create Dataset from multiple data files (#4089) · c359896e

Chen Yufei authored Jul 02, 2021

* [python-package] create Dataset from sampled data.

* [python-package] create Dataset from List[Sequence].

1. Use random access for data sampling
2. Support read data from multiple input files
3. Read data in batch so no need to hold all data in memory

* [python-package] example: create Dataset from multiple HDF5 file.

* fix: revert is_class implementation for seq

* fix: unwanted memory view reference for seq

* fix: seq is_class accepts sklearn matrices

* fix: requirements for example

* fix: pycode

* feat: print static code linting stage

* fix: linting: avoid shell str regex conversion

* code style: doc style

* code style: isort

* fix ci dependency: h5py on windows

* [py] remove rm files in test seq
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623

* docs(python): init_from_sample summary

https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389



* remove dataset dump sample data debugging code.

* remove typo fix.

Create separate PR for this.

* fix typo in src/c_api.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* style(linting): py3 type hint for seq

* test(basic): os.path style path handling

* Revert "feat: print static code linting stage"

This reverts commit 10bd79f7f8258bea8e61c3abb8c9c7e4456a916d.

* feat(python): sequence on validation set

* minor(python): comment

* minor(python): test option hint

* style(python): fix code linting

* style(python): add pydoc for ref_dataset

* doc(python): sequence
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* revert(python): sequence class abc

* chore(python): remove rm_files

* Remove useless static_assert.

* refactor: test_basic test for sequence.

* fix lint complaint.

* remove dataset._dump_text in sequence test.

* Fix reverting typo fix.

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Fix type hint, code and doc style.

* fix failing test_basic.

* Remove TODO about keep constant in sync with cpp.

* Install h5py only when running python-examples.

* Fix lint complaint.

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Doc fixes, remove unused params_str in __init_from_seqs.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove unnecessary conda install in windows ci script.

* Keep param as example in dataset_from_multi_hdf5.py

* Add _get_sample_count function to remove code duplication.

* Use batch_size parameter in generate_hdf.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix after applying suggestions.

* Fix test, check idx is instance of numbers.Integral.

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Expose Sequence class in Python-API doc.

* Handle Sequence object not having batch_size.

* Fix isort lint complaint.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update docstring to mention Sequence as data input.

* Remove get_one_line in test_basic.py

* Make Sequence an abstract class.

* Reduce number of tests for test_sequence.

* Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices.

* empty commit to trigger ci

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.

Also rename total_nrow to num_total_row in c_api.h for consistency.

* Doc about Sequence in docs/Python-Intro.rst.

* Fix: basic.py change LGBM_SampleIndices out_len to int32.

* Add create_valid test case with Dataset from Sequence.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Willian Zhang <willian@willian.email>
Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

28 Jun, 2021 1 commit

[dask] add support for eval sets and custom eval functions (#4101) · b5502d19

Frank Fineis authored Jun 27, 2021



* es WiP, need to add eval_sample_weight and eval_group

* add weight, group to dask es. WiP.

* dask es reorg

* Update python-package/lightgbm/dask.py

_train_part model.fit args to lines
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py

_train_part model.fit args to lines, pt2
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

_train_part model.fit args to lines pt3
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py

dask_model.fit args to lines
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py

use is instead of id()
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* applying changes to eval_set PR WiP

* dask support for eval_names, eval_metric, eval_stopping_rounds

* add evals_result checks and other eval_set attribute-related test checks. need to merge master - WiP

* fix lint errors in test_dask.py

* drop group_shape from _lgbmmodel_doc_fit.format for non-rankers, add support for eval_at for dask ranker

* add eval_at to test_dask eval_set ranker tests

* add back group_shape to lgbmmodel docs, tighten tests

* drop random eval weights from early stopping, probably causing training to terminate too early

* add eval data templates to sklearn fit docs, add eval data docs to dask

* add n_features to _create_data, eval_set tests stop w/ desirable tree counts

* import alphabetically

* add back get_worker for eval_set error handling

* test_dask argmin typo

* push forgotten eval_names bugfix

* eval_stopping_rounds -> early_stopping_rounds, fix failing non-es test

* change default eval_at to tuple 1-5

* re-drop get_worker

* drop early stopping support from eval_set commits, move eval_set worker check prior to client.submit

* add eval_class_weight and eval_init_score to lightgbm/dask, WiP

* clean up eval_set tests, allow user to specify fewer eval_names, clswghts than eval_sets

* remove redundant backslash

* lint fixes

* fix eval_at, eval_metric duplication, let eval_at be Iterable not just Tuple

* use all data_outputs for test_eval_set tests

* undo newlines from first pr

* add custom_eval_metric test, correct issue with eval_at and metric names

* move _constant_metric outside of test

* dataset reference names instead of __strings__

* add padding to eval_set parts makes each part has same len(eval_set)

* eval set code clean up

* revert n_evals to be max len eval_set across all parts on worker

* pylint errors in _DatasetNames

* more pylint fixes

* pylinting...

* add by pytest.mark, mistakenly deleted during merge conflict resolution

* address code review comments

* add _pad_eval_names to handle nondeterministic evals_result_ valid set names

* change not evaluated evals_result_ test criteria

* address fit eval docs issues, switch _DatasetNames to Enum

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update eval_metrics, eval_at dask fit docstr to match sklearn, make tests reflect that l2 (rmse), logloss in evals_result_ by default

* address eval_set dict keys naming in docstr and training eval_set naming issue

* in test_dask check for obj-default metric names in eval_results, remove check for training key

* lint fixes for _pad_eval_names

* remove unnecessary line break in _pad_eval_names docstr

* use Enum.member syntax not Enum.member.name

* remove str from supported eval_at types

* add whitespace and remove DaskDataframes mention from eval_ param docstrs in _train

* remove "of shape = [n_samples]" from group_shape docs

* add eval_at base_doc in DaskLGBMRanker.fit

* remove excess paren from eval_names docs in _train

* make requested changes to test_dask.py

* remove Optional() wrapper on eval_at

* add _lgbmmodel_doc_custom_eval_note to dask.py fit.__doc__

* fix ordering of .sklearn imports to attempt lint fix

* dask custom eval note to f-string pt1
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* dask custom eval note to f-string pt 2
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* dask custom eval note to f-string pt 3
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

27 Jun, 2021 2 commits
- [python] replace numpy.zeros with numpy.empty for the speedup (#4410) · 45ac271b
  Nikita Titov authored Jun 27, 2021
  
- [tests][dask] add missing compute() in Dask test (#4412) · db3915c2
  James Lamb authored Jun 27, 2021
  
26 Jun, 2021 2 commits

[dask] pass additional predict() parameters through when input is a Dask Array (#4399) · 8116d880

James Lamb authored Jun 26, 2021



* [dask] pass predict() kwargs through when input is a Dask Array

* add tests

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* add prediction early stopping params
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

fix param aliases (#4387) · aab8fc18
Nikita Titov authored Jun 26, 2021

15 Jun, 2021 1 commit
- [tests] replace pytest.parametrize (#4377) · c738c83b
  Nikita Titov authored Jun 15, 2021
```
* replace pytest.parametrize

* add informative message for assert
```
12 Jun, 2021 1 commit
- [tests][python] fix f-string in test_dask.py (#4373) · c3b9363d
  Nikita Titov authored Jun 12, 2021
  
09 Jun, 2021 2 commits

[python] improving the syntax of the fstring in the file : tests/python_package_test/test_dask.py (#4358) · d677d6c6

sayantan sadhu authored Jun 09, 2021

* updated the old syntax with fstrings

* Updated the strings with + catenation to fstrings

* Updated the strings with + catenation to fstrings

* Update tests/python_package_test/test_dask.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

[python-package] change to f-strings in test_plotting.py (#4359) · 9143003d
Weston King-Leatham authored Jun 08, 2021

07 Jun, 2021 1 commit
- [python-package] updated test_consistency.py to use f-strings (#4348) · bab58d0e
  sayantan sadhu authored Jun 07, 2021
  
03 Jun, 2021 1 commit

Add linear leaf models to json output (fixes #4186) (#4329) · 1b5bec00

Belinda Trotta authored Jun 03, 2021



* Add linear leaf models to json output

* Add closing bracket

* Move test into test_engine.py and add asserts

* Update tests/python_package_test/test_engine.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_engine.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_engine.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

21 May, 2021 3 commits

[python] improving the syntax of the fstring in the file : tests/python_package_test/test_basic.py (#4312) · da3465cb
sayantan sadhu authored May 21, 2021

[dask] run Dask tests on aarch64 architecture (#3996) · a372ed50

Nikita Titov authored May 21, 2021



* run Dask tests on aarch64 architecture

* make random Dask test to fail

* Revert "make random Dask test to fail"

This reverts commit c43c98507f818994bb08b4f7d289ecad3b3449eb.

* empty commit

* empty commit

* empty commit

* empty commit
Co-authored-by: James Lamb <jaylamb20@gmail.com>

[python] handle arbitrary length feature names in Python-package (#4293) · 237ac299
Nikita Titov authored May 21, 2021
```
* handle arbitrary length feature names in Python-package

* added tests
```

04 May, 2021 1 commit

Correct spelling (#4250) · e79716e0

Andrew Ziem authored May 04, 2021



* Correct spelling

Most changes were in comments, and there were a few changes to literals for log output.

There were no changes to variable names, function names, IDs, or functionality.

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Correct spelling

Most are code comments, but one case is a literal in a logging message.

There are a few grammar fixes too.
Co-authored-by: James Lamb <jaylamb20@gmail.com>

28 Apr, 2021 1 commit
- [ci][python-package] remove unused import in tests (#4233) · 086f0785
  James Lamb authored Apr 28, 2021
  
11 Apr, 2021 1 commit

enforce interaction constraints with monotone_constraints_method = intermediate/advanced (#4043) · 9e1d7fa1

Christoph Aymanns authored Apr 11, 2021



* add test for interaction constraints and monotone constraints

* enforce interaction constraints in RecomputeBestSplitForLeaf

* code formatting

* code formatting

* move interaction constraint test to test_engine

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

05 Apr, 2021 1 commit
- [tests][dask] replace client fixture with cluster fixture (#4159) · 965b9fc9
  jmoralez authored Apr 05, 2021
```
* replace client fixture with cluster fixture

* wait on persist before rebalance
```
01 Apr, 2021 2 commits

[tests][dask] Add voting_parallel algorithm in tests (fixes #3834) (#4088) · d517ba12

jmoralez authored Apr 01, 2021

* include voting_parallel tree_learner in test_regressor, test_classifier and test_ranker

* remove test for warnings and test for error when using feature_parallel

* use real names for tree_learner in test and include test for aliases. use the error message in the test for error in feature parallel

* split all tests with rf in test_classifier

* remove task parametrization for tree_learner aliases test. smaller input data from feature_parallel error

* define task for tree_learner aliases

use dy_true mean in denominator for r2_score (#4151) · 46a20ab0
jmoralez authored Apr 01, 2021

31 Mar, 2021 1 commit

[dask] make random port search more resilient to random collisions (fixes #4057) (#4133) · 1ce4b22b

James Lamb authored Mar 31, 2021

* [dask] make random port search more resilient to random collisions

* linting

* more reliable ports check

* address review comments

* add error message

30 Mar, 2021 1 commit

[tests][dask] test all boosting types (fixes #3896) (#4119) · f879018b

jmoralez authored Mar 30, 2021

* test all boosting types

* lint

* bring scores comparison back and set y as second argument in assert_eq

27 Mar, 2021 2 commits

[ci] remove output parametrization from two Dask tests (#4123) · d32ee23a
Nikita Titov authored Mar 28, 2021
```
* Update test_dask.py

* Update test_dask.py
```

[dask] Include support for raw_score in predict (fixes #3793) (#4024) · fe1b80a5

jmoralez authored Mar 27, 2021

* include test for prediction with raw_score

* close client

* initial comments

* update data creation and include ranking task

* linting

* update _create_data

* compare unique raw_predictions with values in leaves_df

26 Mar, 2021 1 commit

[tests][dask] Create an informative categorical feature (#4113) · 8cc6eefc

jmoralez authored Mar 26, 2021

* make one categorical variable informative. increase n_samples. reduce n_features for regression

* adjust tolerances in checks

16 Mar, 2021 1 commit
- [tests][dask] simplify code in Dask tests (#4075) · 1f4a0842
  Nikita Titov authored Mar 16, 2021
```
* simplify Dask tests code

* enable CI

* disable CI
```
15 Mar, 2021 1 commit
- [dask] [ci] fix flaky network-setup test (#4071) · 39c85dd9
  James Lamb authored Mar 15, 2021
  
10 Mar, 2021 2 commits

[dask] raise more informative error for duplicates in 'machines' (fixes #4057) (#4059) · 296397df

James Lamb authored Mar 10, 2021

* [dask] raise more informative error for duplicates in 'machines'

* uncomment

* avoid test failure

* Revert "avoid test failure"

This reverts commit 9442bdf00f193a19a923dc0deb46b7822cb6f601.

[dask] include multiclass-classification task in tests (#4048) · 1d7b54d3

jmoralez authored Mar 09, 2021

* include multiclass-classification task and task_to_model_factory dicts

* define centers coordinates. flatten init_scores within each partition for multiclass-classification

* include issue comment and fix linting error

04 Mar, 2021 1 commit

[dask] Include support for init_score (#3950) · 37e98782

jmoralez authored Mar 04, 2021

* include support for init_score

* use dataframe from init_score and test difference with and without init_score in local model

* revert refactoring

* initial docs. test between distributed models with and without init_score

* remove ranker from tests

* test value for root node and change docs

* comma

* re-include parametrize

* fix incorrect merge

* use single init_score and the booster_ attribute

* use np.float64 instead of float

02 Mar, 2021 1 commit

[dask] [ci] add support for scikit-learn 0.24+ in tests (fixes #4031) (#4032) · 2a00b6ff

James Lamb authored Mar 02, 2021



* [dask] [ci] add support for scikit-learn 0.24+ in tests (fixes #4031)

* Update tests/python_package_test/test_dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* try upgrading miktexsetup

* they changed the executable name UGH

* more changes for executable name

* another path change

* changing package mirrors

* undo experiments
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

24 Feb, 2021 3 commits

[tests][dask] simplify fit calls in Dask tests (#4018) · 3ab6bbf9
Nikita Titov authored Feb 24, 2021
```
* simplify fit calls in Dask tests

* Update .vsts-ci.yml

* Update .vsts-ci.yml
```

[dask][python-package] include support for column array as label (#3943) · 5dacd603

jmoralez authored Feb 24, 2021

* include support for column array as label

* remove nested ifs

* fix linting errors

* include tests for sklearn regressors

* include docstring for numpy_1d_array_to_dtype

* include . at end of docstring

* remove pandas import and test for regression, classification and ranking

* check predictions of sklearn models as well

* test training only in dask. drop pandas series tests

* use PANDAS_INSTALLED and pd_Series

* inline imports

* use col array in fit for test_dask

* include review comments

[tests][python] Add test for single leaf in linear tree (#4015) · 86a085f7

Nikita Titov authored Feb 24, 2021

* Update test_engine.py

* Update python_package.yml

* Update python_package.yml

* Update test_engine.py

* hotfix
