Commits · 7744757a52c349e9b6e3907043581d507142d9ea · tianlh / LightGBM-DCU

23 Sep, 2020 1 commit
- Improve performance of path smoothing (#3396) · 7744757a
  Belinda Trotta authored Sep 23, 2020
```
* Make path smoothing faster

* Fix bug

* Fix bug

* Minor style fix
```
  7744757a
21 Sep, 2020 2 commits

Pr4 advanced method monotone constraints (#3264) · 4278f222

CharlesAuguste authored Sep 21, 2020



* No need to pass the tree to all functions related to monotone constraints because the pointer is shared.

* Fix OppositeChildShouldBeUpdated numerical split optimisation.

* No need to use constraints when computing the output of the root.

* Refactor existing constraints.

* Add advanced constraints method.

* Update tests.

* Add override.

* linting.

* Add override.

* Simplify condition in LeftRightContainsRelevantInformation.

* Add virtual destructor to FeatureConstraint.

* Remove redundant blank line.

* linting of else.

* Indentation.

* Lint else.

* Replaced non-const reference by pointers.

* Forgotten reference.

* Leverage USE_MC for efficiency.

* Make constraints const again in feature_histogram.hpp.

* Update docs.

* Add "advanced" to the monotone constraints options.

* Update monotone constraints restrictions.

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove superfluous parenthesis.

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove std namespace qualifier.

* Fix unsigned_int size_t comparison.

* Set num_features as int for consistency with the rest of the codebase.

* Make sure constraints exist before recomputing them.

* Initialize previous constraints in UpdateConstraints.

* Update monotone constraints restrictions.

* Refactor UpdateConstraints loop.

* Update src/io/config.cpp
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Delete white spaces.
Co-authored-by: Charles Auguste <charles.auguste@sig.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

4278f222

fix sparse multiclass local feature contributions and add test (#3382) · eff287e9
Ilya Matiach authored Sep 21, 2020

eff287e9

20 Sep, 2020 3 commits

Auc mu weights (#3349) · 1782fcb1

Belinda Trotta authored Sep 20, 2020

* Update auc_mu metric to use data weights if provided

* Calculate class sizes and total weights in Init so we only do it once

* Fix lint error

* Empty commit to trigger CI jobs

1782fcb1
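
To illustrate what "use data weights if provided" means for a ranking metric, here is a minimal sketch of a weighted two-class AUC. It is not LightGBM's `auc_mu` implementation, which handles multiple classes. The function name and the pairwise loop are assumptions chosen for readability. The class weight totals are computed once up front, loosely mirroring the "calculate in Init so we only do it once" idea.

```python
def weighted_binary_auc(scores, labels, weights=None):
    """Weighted share of (positive, negative) pairs ranked correctly.

    Ties count as half. Illustrative only: hypothetical helper,
    not part of LightGBM.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    # Split the data by class and total the weights a single time.
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    total_pos = sum(w for _, w in pos)
    total_neg = sum(w for _, w in neg)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("both classes need positive total weight")
    correct = 0.0
    for score_pos, weight_pos in pos:
        for score_neg, weight_neg in neg:
            if score_pos > score_neg:
                correct += weight_pos * weight_neg
            elif score_pos == score_neg:
                correct += 0.5 * weight_pos * weight_neg
    return correct / (total_pos * total_neg)
```

With scores `[0.9, 0.2, 0.6, 0.4]` and labels `[1, 1, 0, 0]`, the unweighted result is 0.5. Giving the first row a weight of 3 (weights `[3, 1, 1, 1]`) raises it to 0.75, because the well-ranked positive now counts for more.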

[GPU] Add support for CUDA-based GPU build (#3160) · f7ad9457

Chip Kerchner authored Sep 20, 2020

* Initial CUDA work

* redirect log to python console (#3090)

* redir log to python console

* fix pylint

* Apply suggestions from code review

* Update basic.py

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update c_api.h

* Apply suggestions from code review

* super-minor: better wording
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>

* re-order includes (fixes #3132) (#3133)

* Revert "re-order includes (fixes #3132) (#3133)" (#3153)

This reverts commit 656d2676

* Missing change from previous rebase

* Minor cleanup and removal of development scripts.

* Only set gpu_use_dp on by default for CUDA. Other minor change.

* Fix python lint indentation problem.

* More python lint issues.

* Big lint cleanup - more to come.

* Another large lint cleanup - more to come.

* Even more lint cleanup.

* Minor cleanup so less differences in code.

* Revert is_use_subset changes

* Another rebase from master to fix recent conflicts.

* More lint.

* Simple code cleanup - add & remove blank lines, revert unnecessary format changes, remove added dead code.

* Removed parameters added for CUDA and various bug fixes.

* Yet more lint and unnecessary changes.

* Revert another change.

* Removal of unnecessary code.

* temporary appveyor.yml for building and testing

* Remove return value in ReSize

* Removal of unused variables.

* Code cleanup from reviewers suggestions.

* Removal of FIXME comments and unused defines.

* More reviewers comments cleanup.

* Fix config variables.

* Attempt to fix check-docs failure

* Update Parameters.rst for num_gpu

* Removing test appveyor.yml

* Add CUDA_RESOLVE_DEVICE_SYMBOLS to libraries to fix linking issue.

* Fixed handling of data elements less than 2K.

* More reviewers comments cleanup.

* Removal of TODO and fix printing of int64_t

* Add cuda change for CI testing and remove cuda from device_type in python.

* Missed one change from previous check-in

* Removal AdditionConfig and fix settings.

* Limit number of GPUs to one for now in CUDA.

* Update Parameters.rst for previous check-in

* Whitespace removal.

* Cleanup unused code.

* Changed uint/ushort/ulong to unsigned int/short/long to help Windows based CUDA compiler work.

* Lint change from previous check-in.

* Changes based on reviewers comments.

* More reviewer comment changes.

* Adding warning for is_sparse. Revert tmp_subset code. Only return FeatureGroupData if not is_multi_val_

* Fix so that CUDA code will compile even if you enable the SCORE_T_USE_DOUBLE define.

* Reviewer comment cleanup.

* Replace warning with Log message. Removal of some of the USE_CUDA. Fix typo and removal of pragma once.

* Remove PRINT debug for CUDA code.

* Allow use of multiple GPUs for CUDA.

* More multi-GPUs enablement for CUDA.

* More code cleanup based on review comments.

* Update docs with latest config changes.
Co-authored-by: Gordon Fossum <fossum@us.ibm.com>
Co-authored-by: ChipKerchner <ckerchne@linux.vnet.ibm.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

f7ad9457

improve subfeature_bynode (#3384) · 1fddabb5

Guolin Ke authored Sep 20, 2020

* Update serial_tree_learner.cpp

* Update src/treelearner/serial_tree_learner.cpp

* Update src/treelearner/serial_tree_learner.cpp

1fddabb5

13 Sep, 2020 1 commit
- Fix typo in ResetConfig (#3392) · dc963d9f
  Belinda Trotta authored Sep 13, 2020
  
  dc963d9f
11 Sep, 2020 1 commit
- avoid segmentation fault in ResetConfig for GBDT in prediction (fix #3317) (#3373) · fee6f4a2
  shiyu1994 authored Sep 11, 2020
  
  fee6f4a2
15 Aug, 2020 2 commits

fix typo (#3309) · edbe3683
Nikita Titov authored Aug 15, 2020
```
* fix typo

* fix typo
```
edbe3683

fix zero bin in categorical split (#3305) · 03910760

Guolin Ke authored Aug 15, 2020

* fix zero bin

* some fix

* fix bin mapping

* fix

* fix bug

* use stable sort

* fix cat forced split

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

03910760

11 Aug, 2020 1 commit

simplify start_iteration param for predict in Python and some code cleanup for start_iteration (#3288) · 877d58fa

Nikita Titov authored Aug 11, 2020

* simplify start_iteration param for predict in Python and some code cleanup for start_iteration

* revert docs changes about the prediction result shape

877d58fa

06 Aug, 2020 2 commits

[Python] / [R] add start_iteration to python predict interface (fix #3058) (#3272) · 82e2ff7a

shiyu1994 authored Aug 06, 2020



* [python] add start_iteration to python predict interface (#3058)

* Apply suggestions from code review

* Update lightgbm_R.h

* Apply suggestions from code review

* Apply suggestions from code review

* fix R interface

* update R documentation
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

82e2ff7a
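
As a rough picture of what a `start_iteration` argument does in prediction, the sketch below sums the outputs of a contiguous range of trees. The function name, the representation of trees as plain callables, and the rule that a missing or non-positive `num_iteration` means "use every remaining tree" are assumptions made for illustration. The sketch ignores details such as best-iteration handling and multiclass models.

```python
def predict_with_range(trees, row, start_iteration=0, num_iteration=None):
    """Sum tree outputs for trees[start_iteration : start_iteration + num_iteration].

    Hypothetical helper, not the LightGBM API.
    """
    if num_iteration is None or num_iteration <= 0:
        # Assumption: no limit means "use all trees from start_iteration on".
        end = len(trees)
    else:
        end = min(len(trees), start_iteration + num_iteration)
    return sum(tree(row) for tree in trees[start_iteration:end])


# Three toy "trees", each a function of the input row.
trees = [lambda r: 1.0 * r[0], lambda r: 10.0, lambda r: 100.0]
```

With these toy trees, `predict_with_range(trees, [2.0])` gives 112.0, while `predict_with_range(trees, [2.0], start_iteration=1)` skips the first tree and gives 110.0.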

fix the omp index in window · 9b263735
Guolin Ke authored Aug 06, 2020

9b263735

05 Aug, 2020 3 commits

create buffer for gradients and hessians with goss and customized objective (fixes #3243) (#3263) · 1dbe5e99

shiyu1994 authored Aug 06, 2020



* fix bug for GOSS with customized objective (fixes #3243)

* Apply suggestions from code review
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

1dbe5e99

fix multi-class objective (softmax) (#3256) · 4f28233b

Guolin Ke authored Aug 06, 2020



* Update multiclass_objective.hpp

* Apply suggestions from code review

* Update src/objective/multiclass_objective.hpp

* Apply suggestions from code review

* Update test_basic.R

* Update test_basic.R

* Update src/objective/multiclass_objective.hpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

4f28233b

Fast single row predict API v2 (#3268) · b5027de3

Alberto Ferreira authored Aug 05, 2020

* Fix bug introduced in PR #2992 for Fast predict

* Faster Fast predict API

* Add const to SingleRow Fast methods

b5027de3

29 Jul, 2020 1 commit

[TYPO] DatasetLoader::ConstructFromSampleData (#3258) · 6f339d77

Lucas David authored Jul 29, 2020



* ~ Modified name of method DatasetLoader::CostructFromSampleData to DatasetLoader::ConstructFromSampleData.
& Build passes for Debug, Debug_DLL, DLL and Release (not tested Debug_mpi and Release_mpi).

* ~ Refactored indentations.
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

6f339d77

25 Jul, 2020 2 commits
- fix bug in CEGB when reset training data or config (#3246) · 091f41b6
  Guolin Ke authored Jul 25, 2020
```
* fix

* Apply suggestions from code review
```
  091f41b6
- fix possible problem in read number of columns from libsvm file. (#3242) · e2f11b05
  Guolin Ke authored Jul 25, 2020
  
  e2f11b05
20 Jul, 2020 1 commit
- typo fix (#3239) · 9d431d12
  Guolin Ke authored Jul 21, 2020
  
  9d431d12
19 Jul, 2020 1 commit

Change locking strategy of Booster, allow for share and unique locks (#2760) · 1c35c3b9

Joan Fontanals authored Jul 19, 2020



* Add capability to get possible max and min values for a model

* Change implementation to have return value in tree.cpp, change naming to upper and lower bound, move implementation to gbdt.cpp

* Update include/LightGBM/c_api.h
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Change iteration to avoid potential overflow, add bindings to R and Python and a basic test

* Adjust test values

* Consider const correctness and multithreading protection

* Put everything possible as const

* Include shared_mutex, for now as unique_lock

* Update test values

* Put everything possible as const

* Include shared_mutex, for now as unique_lock

* Make PredictSingleRow const and share the lock with other reading threads

* Update test values

* Add test to check that model is exactly the same in all platforms

* Try to parse the model to get the expected values

* Try to parse the model to get the expected values

* Fix implementation, num_leaves can be lower than the leaf_value_ size

* Do not check for num_leaves to be smaller than actual size and get back to test with hardcoded value

* Change test order

* Add gpu_use_dp option in test

* Remove helper test method

* Remove TODO

* Add preprocessing option to compile with c++17

* Update python-package/setup.py
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Remove unwanted changes

* Move option

* Fix problems introduced by conflict fix

* Avoid switching to c++17 and use yamc mutex library to access shared lock functionality

* Add extra yamc include

* Change header order

* some lint fix

* change include order and remove some extra blank lines

* Further fix lint issues

* Update c_api.cpp

* Further fix lint issues

* Move yamc include files to a new yamc folder

* Use standard unique_lock

* Update windows/LightGBM.vcxproj
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

* Update windows/LightGBM.vcxproj.filters
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

* Update windows/LightGBM.vcxproj.filters
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update windows/LightGBM.vcxproj.filters
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update windows/LightGBM.vcxproj.filters
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix problems coming from merge conflict resolution
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: joanfontanals <jfontanals@ntent.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

1c35c3b9

16 Jul, 2020 1 commit

Upcast index to size_t in Refit (closes #3227) (#3228) · f5f27ca8

Jan Tilly authored Jul 16, 2020

In the current implementation, the index is an int32, which will segfault with large data sets and a large number of estimators.

f5f27ca8
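
The overflow described in the commit above can be reproduced in miniature. The sketch below is not the code from `Refit`: the row-times-column index expression and both function names are assumed stand-ins. `ctypes.c_int32` is used only to emulate 32-bit signed wraparound, and an ordinary Python integer plays the role of `size_t`.

```python
import ctypes


def flat_index_int32(row, num_cols, col):
    """Flat index computed as if in a 32-bit signed int (wraps on overflow)."""
    return ctypes.c_int32(row * num_cols + col).value


def flat_index_wide(row, num_cols, col):
    """Same index with an unbounded Python int, standing in for size_t."""
    return row * num_cols + col


# 3,000,000 rows times 1,000 columns exceeds the int32 maximum of 2,147,483,647.
narrow = flat_index_int32(3_000_000, 1_000, 0)
wide = flat_index_wide(3_000_000, 1_000, 0)
```

Here `narrow` is -1294967296 while `wide` is 3000000000. In C++ the negative value would be used as an array offset, which is the kind of out-of-bounds access that ends in a segfault.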

15 Jul, 2020 2 commits

Feat/optimize single prediction (#2992) · fc79b366

Alberto Ferreira authored Jul 15, 2020

* [performance] Add Fast methods to C API for SingleRow Predictions

 * Add methods to C API to make single-row predictions faster:

   - LGBM_BoosterPredictForMatSingleRowFastInit (setup)
   - LGBM_BoosterPredictForMatSingleRowFast (predict)
   - LGBM_FastConfigFree (cleanup setup outputs)

* Code style cleanup

* Fix lint errors

* [performance] Revert FastConfig improvement to pass data at init

This reduces the optimization gain by 5% / 30% on this branch, but allows it to be used by higher-level wrappers in MMLSpark, and outside it as well.

* [performance] Introduce Fast variants for SingleRow predictors.

Although this already provides performance gains by itself for any
callers, two new functions were added to Java's SWIG interfaces to
exploit that AND the GetPrimitiveArrayCritical data fetches.

* [tests/profiling] Profile Fast predict methods

Build with -DBUILD_PROFILING_TESTS=ON, copy the default
model trained on the Higgs dataset from the benchmarks repo

 https://github.com/guolinke/boosting_tree_benchmarks.git

to the LightGBM repo root, and run the lightgbm_profile_* binaries.

The single instance used is the first row from that dataset.

* Update comment on CMakeLists.

* Fix doxygen-introduced issue (#threads)

* Fix conflicts due to new RowFunctionFromCSR signature in master

* Change FastConfig ncol to int32_t.

* Removed profiling folder

* fix doxygen typo include/LightGBM/c_api.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* fix doxygen typo include/LightGBM/c_api.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* fix doxygen typo include/LightGBM/c_api.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Doxygen: change new docstrings to double back-quote
Co-authored-by: alberto.ferreira <alberto.ferreira@feedzai.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

fc79b366
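
The setup / predict / cleanup split named in the commit above follows a common pattern: do the per-configuration work once, then reuse it for every row. The sketch below shows that pattern in plain Python. It does not call the LightGBM C API, and `FastConfigSketch`, `fast_init`, and `fast_predict` are hypothetical names. Python's garbage collector takes the place of an explicit free call such as `LGBM_FastConfigFree`.

```python
class FastConfigSketch:
    """Hypothetical holder for everything that stays fixed between calls."""

    def __init__(self, model, parameter, ncol):
        # Parse the "key=value key=value" parameter string once, at setup.
        self.params = dict(item.split("=", 1) for item in parameter.split())
        self.model = model
        self.ncol = ncol


def fast_init(model, parameter, ncol):
    """Setup step: pay the configuration cost a single time."""
    return FastConfigSketch(model, parameter, ncol)


def fast_predict(config, row):
    """Predict step: only the row changes, so only the row is checked."""
    if len(row) != config.ncol:
        raise ValueError("row has the wrong number of columns")
    return config.model(row)


# Toy model: the prediction is just the sum of the features.
config = fast_init(model=sum, parameter="num_threads=1 verbose=-1", ncol=3)
results = [fast_predict(config, row) for row in ([1, 2, 3], [4, 5, 6])]
```

The parameter string is parsed once in `fast_init`, and each `fast_predict` call validates only the row length before evaluating the model.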

feature importance type in saved model file (#3220) · 87d46489

Guolin Ke authored Jul 16, 2020



* feature importance type in saved model file

* fix nullptr

* fixed formatting

* fix python/R

* Update src/c_api.cpp

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* fix c_api test

* fix swig

* minor docs improvements and added defines for importance types
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

87d46489

09 Jul, 2020 1 commit
- typo fix (#3174) · e7a2b66f
  guanqun authored Jul 09, 2020
  
  e7a2b66f
07 Jul, 2020 1 commit
- Fix integer overflow in auc_mu. (#3209) · 1e2013a3
  Belinda Trotta authored Jul 07, 2020
  
  1e2013a3
02 Jul, 2020 1 commit
- store the true split gain in tree model (#3196) · cfc5e4fe
  shiyu1994 authored Jul 02, 2020
  
  cfc5e4fe
01 Jul, 2020 1 commit
- Allow the minimal feature to 1 in column sampling (#3197) · ddf8c104
  Guolin Ke authored Jul 02, 2020
```
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
```
  ddf8c104
28 Jun, 2020 2 commits

adding sparse support to TreeSHAP in lightgbm (#3000) · 9f367d11

Ilya Matiach authored Jun 28, 2020

* adding sparse support to TreeSHAP in lightgbm

* updating based on comments

* updated based on comments, used fromiter instead of frombuffer

* updated based on comments

* fixed limits import order

* fix sparse feature contribs to work with more than int32 max rows

* really fixed int64 max error and build warnings

* added sparse test with >int32 max rows

* fixed python side reshape check on sparse data

* updated based on latest comments

* fixed comments

* added CSC INT32_MAX validation to test, fixed comments

9f367d11

Fix bug with interaction constraints (#3189) · d563aff9

Belinda Trotta authored Jun 28, 2020

* Fix bug: crashes when interaction_constraints is nonempty and not all features are used.

* Fix python lint error.

d563aff9

23 Jun, 2020 1 commit

Interaction constraints (#3126) · bca2da97

Belinda Trotta authored Jun 23, 2020

* Add interaction constraints functionality.

* Minor fixes.

* Minor fixes.

* Change lambda to function.

* Fix gpu bug, remove extra blank lines.

* Fix gpu bug.

* Fix style issues.

* Try to fix segfault on MACOS.

* Fix bug.

* Fix bug.

* Fix bugs.

* Change parameter format for R.

* Fix R style issues.

* Change string formatting code.

* Change docs to say R package not supported.

* Remove R functionality, moving to separate PR.

* Keep track of branch features in tree object.

* Only track branch features when feature interactions are enabled.

* Fix lint error.

* Update docs and simplify tests.

bca2da97
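
The "keep track of branch features" step in the commit above suggests how interaction constraints can be enforced at each node. The sketch below takes one common rule as an assumption: a node may split only on features from constraint groups that contain every feature already used on its branch. It is an illustration, not LightGBM's implementation, and `allowed_features` is a hypothetical helper.

```python
def allowed_features(constraints, branch_features, num_features):
    """Features a node may split on, given the features used on its branch."""
    if not constraints:
        # No constraints configured: every feature is allowed.
        return set(range(num_features))
    used = set(branch_features)
    allowed = set()
    for group in constraints:
        # A group stays usable only if it contains every feature used so far.
        if used <= set(group):
            allowed |= set(group)
    return allowed
```

With `constraints = [[0, 1], [1, 2, 3]]`, a branch that has already split on feature 0 may continue only with features 0 and 1, while a branch that has split on feature 1 may use any of 0, 1, 2, 3. Under this rule a feature that appears in no group is never selected once constraints are set.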

11 Jun, 2020 1 commit
- refactor LGBM_DatasetGetFeatureNames (#3022) · f30e0bb3
  Nikita Titov authored Jun 11, 2020
  
  f30e0bb3
09 Jun, 2020 1 commit
- Update tree.cpp (#3148) · 8092c9fe
  Guolin Ke authored Jun 09, 2020
  
  8092c9fe
05 Jun, 2020 1 commit
- Revert "re-order includes (fixes #3132) (#3133)" (#3153) · ac5f5e56
  Nikita Titov authored Jun 05, 2020
```
This reverts commit 656d2676.
```
  ac5f5e56
01 Jun, 2020 1 commit
- re-order includes (fixes #3132) (#3133) · 656d2676
  James Lamb authored Jun 01, 2020
  
  656d2676
26 May, 2020 1 commit

memory corruption fix for distributed data parallel version before SyncUpGlobalBestSplit (#3110) · 8ead7cc1

Ilya Matiach authored May 26, 2020

* memory corruption fix for distributed data parallel version before SyncUpGlobalBestSplit

* updated based on comments

* updated voting and feature parallel based on comments

* fixing macos failure

* rename variable

8ead7cc1

25 May, 2020 2 commits
- [R-package] move R source files into R-package, reduce duplication in build_r.R (#3087) · 4d43e96b
  James Lamb authored May 25, 2020
```
* [R-package] move R source files into R-package

* fix linting warning

* stuff
```
  4d43e96b
- fix a bug when we set the default score (#3114) · cd70bad4
  guanqun authored May 25, 2020
  
  cd70bad4
23 May, 2020 1 commit
- fix MSVC warning about control paths (fixes #3067) (#3068) · 74e5ec4b
  James Lamb authored May 23, 2020
```
* [R-package] fix MSVC warning about control paths (fixes #3067)

* linting

* simplify
```
  74e5ec4b
22 May, 2020 1 commit
- Fixed machine list parsing: s/find_first_of/find (#3108) · 7fe10fa1
  odimka authored May 22, 2020
  
  7fe10fa1