Commits · 2315c0d11a0f106b8d981492a3706cc6d7ebffb4 · tianlh / LightGBM-DCU

10 Nov, 2020 1 commit

[tests][python][sklearn] make sklearn integration test compatible with 0.24 (#3533) · 2315c0d1

Guillaume Lemaitre authored Nov 10, 2020

* TST make sklearn integration test compatible with 0.24

* remove useless import

* remove outdated comment

* order import

* use parametrize_with_checks

* change the reason

* skip constructible if != 0.23

* make tests behave the same across sklearn version

* linter

* address suggestions

2315c0d1

26 Oct, 2020 1 commit

Fix add features (#2754) · 53977f36

Guolin Ke authored Oct 27, 2020



* fix subset bug

* typo

* add fixme tag

* bin mapper

* fix test

* fix add_features_from

* Update dataset.cpp

* fix merge bug

* added Python merge code

* added test for add_features

* Update dataset.cpp

* Update src/io/dataset.cpp

* continue implementing

* warn users about categorical features
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

53977f36

30 Sep, 2020 2 commits
- [python] Use ctypes for parameters of DLL functions for Dataset (#3423) · a3862f15
  Nikita Titov authored Sep 30, 2020
  
  a3862f15
- Use ctypes for parameters of DLL functions (#3419) · f60c14f1
  Belinda Trotta authored Sep 30, 2020
  
  f60c14f1
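
The idea behind the two "use ctypes for parameters" commits can be sketched as follows. This is only an illustration under stated assumptions: `as_c_args` and its parameters are hypothetical names, not functions from the LightGBM Python package, and no real DLL is called here.

```python
import ctypes


def as_c_args(num_rows, num_cols, params):
    """Wrap plain Python values in explicit ctypes objects (hypothetical helper).

    If a foreign function has no declared argtypes, ctypes passes a plain
    Python int as a C int, so explicit wrappers state the intended C type.
    """
    return (
        ctypes.c_int64(num_rows),                  # explicit 64-bit integer
        ctypes.c_int(num_cols),                    # explicit C int
        ctypes.c_char_p(params.encode("utf-8")),   # NUL-terminated byte string
    )


rows, cols, param_str = as_c_args(2**40, 28, "max_bin=255")
print(rows.value, cols.value, param_str.value)
```

Wrapping at the call site keeps the intended C type visible next to each argument instead of relying on implicit conversion.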
29 Sep, 2020 1 commit

[python] fix dangerous default for eval_at in LGBMRanker (#3377) · ecbb0e99

James Lamb authored Sep 29, 2020



* [python] fix dangerous default for eval_at in LGBMRanker

* use a tuple

* five

* Update python-package/lightgbm/sklearn.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

ecbb0e99
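
Why a list default is "dangerous" and why the commit body mentions a tuple can be shown with a minimal sketch. The function names and the default values below are illustrative assumptions, not the actual `LGBMRanker` signature.

```python
def risky_fit(eval_at=[1, 2, 3, 4, 5]):
    # The default list is created once, when the function is defined,
    # and is shared by every call that does not pass eval_at.
    return eval_at


def safer_fit(eval_at=(1, 2, 3, 4, 5)):
    # A tuple is immutable, so callers cannot alter the shared default.
    return eval_at


first = risky_fit()
first.append(10)        # silently mutates the shared default
second = risky_fit()    # a later call now sees the extra element
print(second)           # [1, 2, 3, 4, 5, 10]
print(safer_fit())      # (1, 2, 3, 4, 5)
```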

21 Sep, 2020 1 commit

Add option to build with integrated OpenCL (#3144) · 3454698e

TP Boudreau authored Sep 21, 2020



* Add specialized OpenCL/Python package build path

* Refer to upstream OpenCL repository

* Reset build job count in setup.py

* TEMPORARY: refer to OpenCL fork to ensure Linux CI builds succeed

* Remove intermediate cmake target

* Restrict OpenCL headers to documented API version

* Use command line definition to activate integrated build

* Flag reference to unofficial repo with FIXME

* TEMPORARY: update private repo tag for dependency

* Remove integrated build for non-Win32 and related cleanup

* Remove commented code

* Rename integrated OpenCL build option and other cleanups

* Small cleanups

* Update CMakeIntegratedOpenCL.cmake
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update CMakeIntegratedOpenCL.cmake
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update CMakeIntegratedOpenCL.cmake
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update CMakeIntegratedOpenCL.cmake
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update CMakeLists.txt
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update CMakeLists.txt
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update CMakeLists.txt
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update CMakeIntegratedOpenCL.cmake

Targeted download of Boost submodules.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update CMakeLists.txt
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

3454698e

20 Sep, 2020 1 commit

[GPU] Add support for CUDA-based GPU build (#3160) · f7ad9457

Chip Kerchner authored Sep 20, 2020

* Initial CUDA work

* redirect log to python console (#3090)

* redir log to python console

* fix pylint

* Apply suggestions from code review

* Update basic.py

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update c_api.h

* Apply suggestions from code review

* super-minor: better wording
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>

* re-order includes (fixes #3132) (#3133)

* Revert "re-order includes (fixes #3132) (#3133)" (#3153)

This reverts commit 656d2676

* Missing change from previous rebase

* Minor cleanup and removal of development scripts.

* Only set gpu_use_dp on by default for CUDA. Other minor change.

* Fix python lint indentation problem.

* More python lint issues.

* Big lint cleanup - more to come.

* Another large lint cleanup - more to come.

* Even more lint cleanup.

* Minor cleanup so less differences in code.

* Revert is_use_subset changes

* Another rebase from master to fix recent conflicts.

* More lint.

* Simple code cleanup - add & remove blank lines, revert unnecessary format changes, remove added dead code.

* Removed parameters added for CUDA and various bug fix.

* Yet more lint and unnecessary changes.

* Revert another change.

* Removal of unnecessary code.

* temporary appveyor.yml for building and testing

* Remove return value in ReSize

* Removal of unused variables.

* Code cleanup from reviewers suggestions.

* Removal of FIXME comments and unused defines.

* More reviewers comments cleanup.

* Fix config variables.

* Attempt to fix check-docs failure

* Update Parameters.rst for num_gpu

* Removing test appveyor.yml

* Add CUDA_RESOLVE_DEVICE_SYMBOLS to libraries to fix linking issue.

* Fixed handling of data elements less than 2K.

* More reviewers comments cleanup.

* Removal of TODO and fix printing of int64_t

* Add cuda change for CI testing and remove cuda from device_type in python.

* Missed one change from previous check-in

* Removal AdditionConfig and fix settings.

* Limit number of GPUs to one for now in CUDA.

* Update Parameters.rst for previous check-in

* Whitespace removal.

* Cleanup unused code.

* Changed uint/ushort/ulong to unsigned int/short/long to help Windows based CUDA compiler work.

* Lint change from previous check-in.

* Changes based on reviewers comments.

* More reviewer comment changes.

* Adding warning for is_sparse. Revert tmp_subset code. Only return FeatureGroupData if not is_multi_val_

* Fix so that CUDA code will compile even if you enable the SCORE_T_USE_DOUBLE define.

* Reviewer comment cleanup.

* Replace warning with Log message. Removal of some of the USE_CUDA. Fix typo and removal of pragma once.

* Remove PRINT debug for CUDA code.

* Allow use of multiple GPUs for CUDA.

* More multi-GPUs enablement for CUDA.

* More code cleanup based on reviews comments.

* Update docs with latest config changes.
Co-authored-by: Gordon Fossum <fossum@us.ibm.com>
Co-authored-by: ChipKerchner <ckerchne@linux.vnet.ibm.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

f7ad9457

15 Sep, 2020 1 commit

[python] Drop Python 3.5 support (#3395) · a3b9dae7

Nikita Titov authored Sep 15, 2020

* Update .appveyor.yml

* Update .travis.yml

* Update .vsts-ci.yml

* Update main.yml

* Update setup.py

a3b9dae7

11 Sep, 2020 2 commits
- [docs] Simplify the python installation instruction (#3378) · 0d45ebd6
  Guolin Ke authored Sep 12, 2020
  * Update Python-Intro.rst

  * Update README.rst

  0d45ebd6
- [python] remove unused variable (#3376) · 0c708d37
  James Lamb authored Sep 11, 2020
  
  0c708d37
06 Sep, 2020 1 commit

[Python] Refactor scikit-learn API to allow a list of evaluation metrics (#3254) · afc76d2c

Germán Ramírez-Espinoza authored Sep 07, 2020



* Refactors sklearn API to allow a list of evaluation metrics in the parameter eval_metric of the class (and subclasses of) LGBMModel. Also adds unit tests for this functionality

* Simplify expression to check whether the user passed one or multiple metrics to eval_metric parameter

* Simplify new tests by using custom metrics already defined in the test file

* Update docstring to reflect the fact that the parameter "feval" from the "train" and "cv" functions can also receive a list of callables

* Remove oxford comma from docstrings

Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Use named-parameters to make sure code is compatible with future versions of scikit-learn

Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove throwaway return value to make code more succinct
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Move statement to group together the code related to feval

* Avoid modifying original args as it causes errors in scikit-learn tools

For details see: https://github.com/microsoft/LightGBM/pull/2619



* Consolidate multiple eval-metrics unit-tests into one test
Co-authored-by: German I Ramirez-Espinoza <gire@home>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

afc76d2c
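
The "one or multiple metrics" check described in this commit can be sketched as below. This is a simplified illustration, not the code from `sklearn.py`; `normalize_eval_metric` and `split_metrics` are hypothetical helper names.

```python
def normalize_eval_metric(eval_metric):
    """Return eval_metric as a list, whether the caller passed one item or several."""
    if eval_metric is None:
        return []
    if isinstance(eval_metric, (list, tuple)):
        return list(eval_metric)
    return [eval_metric]  # a single metric name or a single callable


def split_metrics(eval_metric):
    """Separate built-in metric names (strings) from custom callables."""
    metrics = normalize_eval_metric(eval_metric)
    builtin = [m for m in metrics if isinstance(m, str)]
    custom = [m for m in metrics if callable(m)]
    return builtin, custom


def constant_metric(y_true, y_pred):
    # Placeholder custom metric: (name, value, is_higher_better).
    return "constant_metric", 0.0, False


builtin, custom = split_metrics(["l2", constant_metric, "l1"])
print(builtin, [f.__name__ for f in custom])
```

Building a new list, rather than editing the caller's argument in place, also matches the later bullet about not modifying the original args.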

02 Sep, 2020 2 commits

be compatible with check_is_fitted sklearn function (#3329) · ca066d49
Nikita Titov authored Sep 02, 2020

ca066d49

bump version (#3344) · 8fc80bb4

Guolin Ke authored Sep 02, 2020



* Update VERSION.txt

* Update .appveyor.yml

* remove 3.0-RC installation guide

* Apply suggestions from code review

* [R-package] bump version (#3345)
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

8fc80bb4

24 Aug, 2020 1 commit
- [doc] remove excess fit arguments from docstrings (#3330) · 9503d3f9
  Nikita Titov authored Aug 24, 2020
  
  9503d3f9
11 Aug, 2020 2 commits

simplify start_iteration param for predict in Python and some code cleanup for... · 877d58fa

Nikita Titov authored Aug 11, 2020

simplify start_iteration param for predict in Python and some code cleanup for start_iteration (#3288)

* simplify start_iteration param for predict in Python and some code cleanup for start_iteration

* revert docs changes about the prediction result shape

877d58fa

[python] try to use bundled distutils to setuptools during setup (#3294) · 97d5758f
Nikita Titov authored Aug 11, 2020

97d5758f

09 Aug, 2020 1 commit
- bump version for development (#3281) · ee8ec182
  Guolin Ke authored Aug 09, 2020
  * bump version for development

  * Update .appveyor.yml

  * Update README.rst

  ee8ec182
06 Aug, 2020 2 commits

[Python] / [R] add start_iteration to python predict interface (fix #3058) (#3272) · 82e2ff7a

shiyu1994 authored Aug 06, 2020



* [python] add start_iteration to python predict interface (#3058)

* Apply suggestions from code review

* Update lightgbm_R.h

* Apply suggestions from code review

* Apply suggestions from code review

* fix R interface

* update R documentation
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

82e2ff7a

[doc] better doc for `keep_training_booster` (#3275) · 6f54ec3d

Guolin Ke authored Aug 06, 2020



* [doc] better doc for `keep_training_booster`

* Update python-package/lightgbm/engine.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

6f54ec3d

02 Aug, 2020 1 commit

[python] add return_cvbooster flag to cv func and publish _CVBooster (#283,#2105,#1445) (#3204) · 1d59a045

momijiame authored Aug 03, 2020



* [python] add return_cvbooster flag to cv function and rename _CVBooster to make public (#283,#2105)

* [python] Reduce expected metric of unit testing

* [docs] add the CVBooster to the documentation

* [python] reflect the review comments

- Add some clarifications to the documentation
- Rename CVBooster.append to make private
- Decrease iteration rounds of testing to save CI time
- Use CVBooster as root member of lgb

* [python] add more checks in testing for cv
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* [python] add docstring for instance attributes of CVBooster
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* [python] fix docstring
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

1d59a045

15 Jul, 2020 1 commit

feature importance type in saved model file (#3220) · 87d46489

Guolin Ke authored Jul 16, 2020



* feature importance type in saved model file

* fix nullptr

* fixed formatting

* fix python/R

* Update src/c_api.cpp

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* fix c_api test

* fix swig

* minor docs improvements and added defines for importance types
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

87d46489

14 Jul, 2020 1 commit

[python][scikit-learn] Fixes a bug that prevented using multiple eval_metrics... · 7b8b5151

Germán Ramírez-Espinoza authored Jul 15, 2020


[python][scikit-learn] Fixes a bug that prevented using multiple eval_metrics in LGBMClassifier (#3222)

* Fixes a bug that prevented using multiple eval_metrics in LGBMClassifier

* Move bug-fix test to the test_metrics unit-test

* Fix test to avoid issues with existing tests

* Fix coding-style error
Co-authored-by: German I Ramirez-Espinoza <gire@home>

7b8b5151

07 Jul, 2020 1 commit
- [python] fix early_stopping_round = 0 (#3211) · aef50f86
  Guolin Ke authored Jul 08, 2020
  * Update engine.py

  * Update sklearn.py

  aef50f86
28 Jun, 2020 1 commit

adding sparse support to TreeSHAP in lightgbm (#3000) · 9f367d11

Ilya Matiach authored Jun 28, 2020

* adding sparse support to TreeSHAP in lightgbm

* updating based on comments

* updated based on comments, used fromiter instead of frombuffer

* updated based on comments

* fixed limits import order

* fix sparse feature contribs to work with more than int32 max rows

* really fixed int64 max error and build warnings

* added sparse test with >int32 max rows

* fixed python side reshape check on sparse data

* updated based on latest comments

* fixed comments

* added CSC INT32_MAX validation to test, fixed comments

9f367d11

27 Jun, 2020 1 commit

[python][scikit-learn] new stacking tests and make number of features a property (#3173) · 72849466

Alex authored Jun 28, 2020

* modify attribute and include stacking tests

* backwards compatibility

* check sklearn version

* move stacking import

* Number of input features (#3173)

* Number of input features (#3173)

* Number of input features (#3173)

* Number of input features (#3173)

Split number of features and stacking tests.

* Number of input features (#3173)

Modify test name.

* Number of input features (#3173)

Update stacking tests for review comments.

* Number of input features (#3173)

* Number of input features (#3173)

* Number of input features (#3173)

* Number of input features (#3173)

Modify classifier test.

* Number of input features (#3173)

* Number of input features (#3173)

Check score.

72849466

23 Jun, 2020 1 commit

Interaction constraints (#3126) · bca2da97

Belinda Trotta authored Jun 23, 2020

* Add interaction constraints functionality.

* Minor fixes.

* Minor fixes.

* Change lambda to function.

* Fix gpu bug, remove extra blank lines.

* Fix gpu bug.

* Fix style issues.

* Try to fix segfault on MACOS.

* Fix bug.

* Fix bug.

* Fix bugs.

* Change parameter format for R.

* Fix R style issues.

* Change string formatting code.

* Change docs to say R package not supported.

* Remove R functionality, moving to separate PR.

* Keep track of branch features in tree object.

* Only track branch features when feature interactions are enabled.

* Fix lint error.

* Update docs and simplify tests.

bca2da97
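
One common reading of interaction constraints, combined with the "keep track of branch features" bullet above, is that a split may only use features from constraint groups containing every feature already used on the branch. The sketch below illustrates that reading only; it is an assumption, not the C++ implementation from this commit, and `allowed_features` is a hypothetical name.

```python
def allowed_features(constraints, branch_features):
    """Features that may be split on next, given features already used on the branch."""
    branch = set(branch_features)
    allowed = set()
    for group in constraints:
        group_set = set(group)
        # A group stays usable only if it contains every branch feature.
        if branch <= group_set:
            allowed |= group_set
    return allowed


constraints = [[0, 1, 2], [2, 3]]   # illustrative groups of feature indices
print(allowed_features(constraints, []))      # {0, 1, 2, 3}
print(allowed_features(constraints, [0]))     # {0, 1, 2}
print(allowed_features(constraints, [0, 3]))  # set()
```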

22 Jun, 2020 1 commit

[docs][scikit-learn] removed duplicated docstrings (#3164) · fa2de89b

Nikita Titov authored Jun 23, 2020

* Revert "[ci][docs] temporarily pin Sphinx version (#3157)"

This reverts commit b3a84df5.

* removed duplicated docstrings

fa2de89b

11 Jun, 2020 1 commit
- refactor LGBM_DatasetGetFeatureNames (#3022) · f30e0bb3
  Nikita Titov authored Jun 11, 2020
  
  f30e0bb3
02 Jun, 2020 1 commit

[python][scikit-learn] add new attribute for used number of features (#3129) · a2a38b6c

Alex authored Jun 03, 2020

* update number of features attribute

Fixes issue related to https://github.com/scikit-learn/scikit-learn/issues/17353 (see SLEP010 https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep010/proposal.html).

* Update sklearn.py

* set public attribute in fit method

Reverted the `n_features` property and inserted the public attribute `n_features_in_`.

* Update documentation

* Update python-package/lightgbm/sklearn.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

a2a38b6c
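
Setting a public attribute inside `fit`, as this commit describes, might look like the sketch below. `TinyEstimator` is a hypothetical stand-in, not `LGBMModel`; only the attribute name `n_features_in_` comes from the commit message.

```python
class TinyEstimator:
    """Hypothetical estimator showing the fit-time attribute convention."""

    def fit(self, X, y):
        # By scikit-learn convention, attributes ending in "_" are set
        # during fit, so they do not exist on an unfitted estimator.
        self.n_features_in_ = len(X[0])
        return self


est = TinyEstimator()
print(hasattr(est, "n_features_in_"))   # False before fit
est.fit([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [0, 1])
print(est.n_features_in_)               # 3
```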

20 May, 2020 1 commit

redirect log to python console (#3090) · dea2391b

Guolin Ke authored May 21, 2020



* redir log to python console

* fix pylint

* Apply suggestions from code review

* Update basic.py

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update c_api.h

* Apply suggestions from code review

* Apply suggestions from code review

* super-minor: better wording
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>

dea2391b

12 May, 2020 1 commit
- removed deprecated code (#3073) · b60294b8
  Nikita Titov authored May 12, 2020
  
  b60294b8
09 May, 2020 1 commit
- [python-package] Use set for OS check (#3059) · 1b4cddfb
  James Lamb authored May 09, 2020
  
  1b4cddfb
05 May, 2020 1 commit
- [docs] updated docs about output values (#3037) · 796ba803
  Nikita Titov authored May 05, 2020
  
  796ba803
13 Apr, 2020 1 commit
- [python][ci] start to support Python 3.8 (#2713) · 6c19539e
  Nikita Titov authored Apr 14, 2020
  * start to support Python 3.8

  * update configs

  * hotfix

  6c19539e
10 Apr, 2020 2 commits

Support UTF-8 characters in feature name again (#2976) · 44a91201

OMOTO Tsukasa authored Apr 10, 2020

* Support UTF-8 characters in feature name again

This commit reverts 0d59859c.
Also see:
- https://github.com/microsoft/LightGBM/issues/2226
- https://github.com/microsoft/LightGBM/issues/2478
- https://github.com/microsoft/LightGBM/pull/2229

I reproduced the issue and, as @kidotaka's thorough survey in #2226 shows,
I conclude that the cause is not UTF-8 but "an empty string (character)".
Therefore, I revert "throw error when meet non ascii (#2229)", whose commit hash
is 0d59859c, and support UTF-8 feature names again.

* add tests

* fix check-docs tests

* update

* fix tests

* update .travis.yml

* fix tests

* update test_r_package.sh

* update test_r_package.sh

* update test_r_package.sh

* add a test for R-package

* update test_r_package.sh

* update test_r_package.sh

* update test_r_package.sh

* fix test for R-package

* update test_r_package.sh

* update test_r_package.sh

* update test_r_package.sh

* update test_r_package.sh

* update

* update

* update

* remove unneeded comments

44a91201

[python] Re-enable scikit-learn 0.22+ support (#2949) · c633c6c2

Nikita Titov authored Apr 10, 2020

* Revert "specify the last supported version of scikit-learn (#2637)"

This reverts commit d1002776.

* ban scikit-learn 0.22.0 and skip broken test

* fix updated test

* fix lint test

* Revert "fix lint test"

This reverts commit 8b4db0805fe7a9e7f7eb0be3eac231f85026d196.

c633c6c2

20 Mar, 2020 2 commits

Fix SWIG methods that return char** (#2850) · 91185c3a

Alberto Ferreira authored Mar 20, 2020



* [swig] Fix SWIG methods that return char** with StringArray.

+ [new] Add StringArray class to manage and manipulate arrays of fixed-length strings:

  This class is now used to wrap any char** parameters, manage memory and
  manipulate the strings.

  Such class is defined at swig/StringArray.hpp and wrapped in StringArray.i.

+ [API+fix] Wrap LGBM_BoosterGetFeatureNames, which previously resulted in a segfault:

  Added wrapper LGBM_BoosterGetFeatureNamesSWIG(BoosterHandle) that
  only receives the booster handle and figures how much memory to allocate
  for strings and returns a StringArray which can be easily converted to String[].

+ [API+safety] For consistency, LGBM_BoosterGetEvalNamesSWIG was wrapped as well:

  * Refactor to detect any kind of errors and removed all the parameters
    besides the BoosterHandle (much simpler API to use in Java).
  * No assumptions are made about the required string space necessary (128 before).
  * The amount of required string memory is computed internally

+ [safety] No possibility of undefined behaviour

  The two methods wrapped above now compute the necessary string storage space
  prior to allocation, as the low-level C API calls would crash the process
  irreversibly if they write more memory than is passed to them.

* Changes to C API and wrappers support char**

To support the latest SWIG changes that enable proper char**
return support that is safe, the C API was changed.

The respective wrappers in R and Python were changed too.

* Cleanup indentation in new lightgbm_R.cpp code

* Address review code-style comments.

* Update swig/StringArray.hpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/basic.py
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update src/lightgbm_R.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: alberto.ferreira <alberto.ferreira@feedzai.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

91185c3a

[python] handle RandomState object in Scikit-learn Api (#2904) · cf0a992e

Lukas Pfannschmidt authored Mar 20, 2020



* Add handling of RandomState object, which is standard for sklearn methods.

LightGBM expects an integer seed instead of an object.
If passed object is RandomState, we choose random integer based on its state to seed the underlying low level code.
While chosen random integer is only in the range between 1 and 1e10 I expect it to have enough entropy (?) to not matter in practice.

* Add RandomState object to random_state docstring.

* remove blank line

* Use property to handle setting random_state.
This enables setting cloned estimators with the set_params method in sklearn.

* Add docstring to attribute.

* Fix and simplify docstring.

* Add test case.

* Use maximal int for datatype in seed derivation.

* Replace random_state property with interfacing in fit method.
Derives int seed for C code only when fitting and keeps RandomState object as param.

* Adapt unit test to property change.

* Extended test case and docstring
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Add more equality checks (feature importance, best iteration/score).

* Add equality comparison of boosters represented by strings.
Remove useless best_iteration_ comparison (we do not use early_stopping).

* fix whitespace

* Test if two subsequent fits produce different models

* Apply suggestions from code review
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

cf0a992e
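
The seed derivation described in this commit can be sketched with the standard library alone. This is an assumption-laden illustration: `random.Random` stands in for NumPy's `RandomState`, and `derive_seed` and `INT32_MAX` are hypothetical names rather than the code in `sklearn.py`.

```python
import random

INT32_MAX = 2**31 - 1  # assumed upper bound for an int seed passed to C code


def derive_seed(random_state):
    """Turn a random_state parameter into an int seed at fit time (sketch)."""
    if random_state is None or isinstance(random_state, int):
        return random_state              # already usable as-is
    if isinstance(random_state, random.Random):
        # Draw an int from the generator; the generator object itself
        # stays untouched as the estimator's parameter.
        return random_state.randint(0, INT32_MAX)
    raise TypeError("random_state must be None, an int, or a generator object")


seed_a = derive_seed(random.Random(42))
seed_b = derive_seed(random.Random(42))
print(seed_a == seed_b)   # identically seeded generators give the same int seed
print(derive_seed(7))     # ints pass through unchanged
```

Deriving the int only when fitting keeps the original object as the stored parameter, which is what lets tools that clone estimators and call `set_params` keep working.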

16 Mar, 2020 2 commits

[python] Fix continued train by reusing the same dataset (#2906) · fc0f132f

Guolin Ke authored Mar 17, 2020



* fix

* fix return

* fix test

* fix test

* fix predictor is none

* Apply suggestions from code review

* Update basic.py

* Update basic.py

* Apply suggestions from code review
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

fc0f132f

[python] fix the bug when use different params with reference (#2907) · 399b746b

Guolin Ke authored Mar 16, 2020



* fix the bug when use different params with reference

* fix

* Update basic.py

* Apply suggestions from code review
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update basic.py

* add test

* Apply suggestions from code review

* added asserts in test
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>

399b746b