Commits · 792c930305a818db1463911c2cbee92d462eaab9 · tianlh / LightGBM-DCU

"tests/vscode:/vscode.git/clone" did not exist on "b857ee10cc9a913e6dedd15c2475765d1e923c7b"

08 Dec, 2020 1 commit

Fix model locale issue and improve model R/W performance. (#3405) · 792c9303

Alberto Ferreira authored Dec 08, 2020

* Fix LightGBM models locale sensitivity and improve R/W performance.

When Java is used, the default C++ locale is broken. This is true for
Java providers that use the C API or even Python models that require JEP.

This patch solves that issue making the model reads/writes insensitive
to such settings.
To achieve it, within the model read/write codebase:
 - C++ streams are imbued with the classic locale
 - Calls to functions that are dependent on the locale are replaced
 - The default locale is not changed!

This approach means:
 - The user's locale is never tampered with, avoiding issues such as
    https://github.com/microsoft/LightGBM/issues/2979 with the previous
    approach https://github.com/microsoft/LightGBM/pull/2891
 - Datasets can still be read according the user's locale
 - The model file has a single format independent of locale

Changes:
 - Add CommonC namespace which provides faster locale-independent versions of Common's methods
 - Model code makes conversions through CommonC
 - Cleanup unused Common methods
 - Performance improvements. Use fast libraries for locale-agnostic conversion:
   - value->string: https://github.com/fmtlib/fmt
   - string->double: https://github.com/lemire/fast_double_parser (10x
      faster double parsing according to their benchmark)

Bugfixes:
 - https://github.com/microsoft/LightGBM/issues/2500
 - https://github.com/microsoft/LightGBM/issues/2890
 - https://github.com/ninia/jep/issues/205

 (as it is related to LGBM as well)

* Align CommonC namespace

* Add new external_libs/ to python setup

* Try fast_double_parser fix #1

Testing commit e09e5aad828bcb16bea7ed0ed8322e019112fdbe

If it works it should fix more LGBM builds

* CMake: Attempt to link fmt without explicit PUBLIC tag

* Exclude external_libs from linting

* Add exernal_libs to MANIFEST.in

* Set dynamic linking option for fmt.

* linting issues

* Try to fix lint includes

* Try to pass fPIC with static fmt lib

* Try CMake P_I_C option with fmt library

* [R-package] Add CMake support for R and CRAN

* Cleanup CMakeLists

* Try fmt hack to remove stdout

* Switch to header-only mode

* Add PRIVATE argument to target_link_libraries

* use fmt in header-only mode

* Remove CMakeLists comment

* Change OpenMP to PUBLIC linking in Mac

* Update fmt submodule to 7.1.2

* Use fmt in header-only-mode

* Remove fmt from CMakeLists.txt

* Upgrade fast_double_parser to v0.2.0

* Revert "Add PRIVATE argument to target_link_libraries"

This reverts commit 3dd45dde7b92531b2530ab54522bb843c56227a7.

* Address James Lamb's comments

* Update R-package/.Rbuildignore
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Upgrade to fast_double_parser v0.3.0 - Solaris support

* Use legacy code only in Solaris

* Fix lint issues

* Fix comment

* Address StrikerRUS's comments (solaris ifdef).

* Change header guards
Co-authored-by: James Lamb <jaylamb20@gmail.com>

792c9303

21 Nov, 2020 1 commit

[R-package] Remove CLI-only objects (#3566) · 1c5930bf

James Lamb authored Nov 21, 2020



* [R-package] Remove CLI-only objects

* more guards

* more guards

* variable not string

* simplify fix

* revert build_r.R changes

* move define of global_timer
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

1c5930bf

23 Oct, 2020 1 commit
- remove std::move (#3478) · c07644d1
  Nikita Titov authored Oct 23, 2020
  
  c07644d1
18 Oct, 2020 1 commit

[ci] [R-package] Fix memory leaks found by valgrind (#3443) · 81d76113

James Lamb authored Oct 18, 2020



* fix int64 write error

* attempt

* [WIP] [ci] [R-package] Add CI job that runs valgrind tests

* update all-successful

* install

* executable

* fix redirect stuff

* Apply suggestions from code review
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

* more flags

* add mc to msvc proj

* fix memory leak in mc

* Update monotone_constraints.hpp

* Update r_package.yml

* remove R_INT64_PTR

* disable openmp

* Update gbdt_model_text.cpp

* Update gbdt_model_text.cpp

* Apply suggestions from code review

* try to free vector

* free more memories.

* Update src/boosting/gbdt_model_text.cpp

* fix using

* try the UNPROTECT(1);

* fix a const pointer

* fix Common

* reduce UNPROTECT

* remove UNPROTECT(1);

* fix null handle

* fix predictor

* use NULL after free

* fix a leaking in test

* try more fixes

* test the effect of tests

* throw exception in Fatal

* add test back

* Apply suggestions from code review

* commet some tests

* Apply suggestions from code review

* Apply suggestions from code review

* trying to comment out tests

* Update openmp_wrapper.h

* Apply suggestions from code review

* Update configure

* Update configure.ac

* trying to uncomment

* more comments

* more uncommenting

* more uncommenting

* fix comment

* more uncommenting

* uncomment fully-commented out stuff

* try uncommenting more dataset tests

* uncommenting more tests

* ok getting closer

* more uncommenting

* free dataset

* skipping a test, more uncommenting

* more skipping

* re-enable OpenMP

* allow on OpenMP thing

* move valgrind to comment-only job

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* changes from code review

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* linting

* issue comments too

* remove issue_comment
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

81d76113

30 Sep, 2020 1 commit
- disable monotone constraint in objective functions with renew_tree_output (#3380) · f8f6c513
  Guolin Ke authored Sep 30, 2020
```
* Update gbdt.cpp

* Update gbdt.cpp

* Apply suggestions from code review
```
  f8f6c513
20 Sep, 2020 1 commit

[GPU] Add support for CUDA-based GPU build (#3160) · f7ad9457

Chip Kerchner authored Sep 20, 2020

* Initial CUDA work

* redirect log to python console (#3090)

* redir log to python console

* fix pylint

* Apply suggestions from code review

* Update basic.py

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update c_api.h

* Apply suggestions from code review

* super-minor: better wording
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>

* re-order includes (fixes #3132) (#3133)

* Revert "re-order includes (fixes #3132) (#3133)" (#3153)

This reverts commit 656d2676

* Missing change from previous rebase

* Minor cleanup and removal of development scripts.

* Only set gpu_use_dp on by default for CUDA. Other minor change.

* Fix python lint indentation problem.

* More python lint issues.

* Big lint cleanup - more to come.

* Another large lint cleanup - more to come.

* Even more lint cleanup.

* Minor cleanup so less differences in code.

* Revert is_use_subset changes

* Another rebase from master to fix recent conflicts.

* More lint.

* Simple code cleanup - add & remove blank lines, revert unneccessary format changes, remove added dead code.

* Removed parameters added for CUDA and various bug fix.

* Yet more lint and unneccessary changes.

* Revert another change.

* Removal of unneccessary code.

* temporary appveyor.yml for building and testing

* Remove return value in ReSize

* Removal of unused variables.

* Code cleanup from reviewers suggestions.

* Removal of FIXME comments and unused defines.

* More reviewers comments cleanup.

* Fix config variables.

* Attempt to fix check-docs failure

* Update Paramster.rst for num_gpu

* Removing test appveyor.yml

* Add CUDA_RESOLVE_DEVICE_SYMBOLS to libraries to fix linking issue.

* Fixed handling of data elements less than 2K.

* More reviewers comments cleanup.

* Removal of TODO and fix printing of int64_t

* Add cuda change for CI testing and remove cuda from device_type in python.

* Missed one change form previous check-in

* Removal AdditionConfig and fix settings.

* Limit number of GPUs to one for now in CUDA.

* Update Parameters.rst for previous check-in

* Whitespace removal.

* Cleanup unused code.

* Changed uint/ushort/ulong to unsigned int/short/long to help Windows based CUDA compiler work.

* Lint change from previous check-in.

* Changes based on reviewers comments.

* More reviewer comment changes.

* Adding warning for is_sparse. Revert tmp_subset code. Only return FeatureGroupData if not is_multi_val_

* Fix so that CUDA code will compile even if you enable the SCORE_T_USE_DOUBLE define.

* Reviewer comment cleanup.

* Replace warning with Log message. Removal of some of the USE_CUDA. Fix typo and removal of pragma once.

* Remove PRINT debug for CUDA code.

* Allow to use of multiple GPUs for CUDA.

* More multi-GPUs enablement for CUDA.

* More code cleanup based on reviews comments.

* Update docs with latest config changes.
Co-authored-by: Gordon Fossum <fossum@us.ibm.com>
Co-authored-by: ChipKerchner <ckerchne@linux.vnet.ibm.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

f7ad9457

13 Sep, 2020 1 commit
- Fix typo in ResetConfig (#3392) · dc963d9f
  Belinda Trotta authored Sep 13, 2020
  
  dc963d9f
11 Sep, 2020 1 commit
- avoid segment fault in ResetConfig for GBDT in prediction (fix #3317) (#3373) · fee6f4a2
  shiyu1994 authored Sep 11, 2020
  
  fee6f4a2
11 Aug, 2020 1 commit

simplify start_iteration param for predict in Python and some code cleanup for... · 877d58fa

Nikita Titov authored Aug 11, 2020

simplify start_iteration param for predict in Python and some code cleanup for start_iteration (#3288)

* simplify start_iteration param for predict in Python and some code cleanup for start_iteration

* revert docs changes about the prediction result shape

877d58fa

06 Aug, 2020 2 commits

[Python] / [R] add start_iteration to python predict interface (fix #3058) (#3272) · 82e2ff7a

shiyu1994 authored Aug 06, 2020



* [python] add start_iteration to python predict interface (#3058)

* Apply suggestions from code review

* Update lightgbm_R.h

* Apply suggestions from code review

* Apply suggestions from code review

* fix R interface

* update R documentation
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

82e2ff7a

fix the omp index in window · 9b263735
Guolin Ke authored Aug 06, 2020

9b263735

05 Aug, 2020 1 commit

create buffer for gradients and hessians with goss and customized objective (fixes #3243) (#3263) · 1dbe5e99

shiyu1994 authored Aug 06, 2020



* fix bug for GOSS with customized objective (fixes #3243)

* Apply suggestions from code review
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

1dbe5e99

15 Jul, 2020 1 commit

feature importance type in saved model file (#3220) · 87d46489

Guolin Ke authored Jul 16, 2020



* feature importance type in saved model file

* fix nullptr

* fixed formatting

* fix python/R

* Update src/c_api.cpp

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* fix c_api test

* fix swig

* minor docs improvements and added defines for importance types
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

87d46489

09 Jul, 2020 1 commit
- typo fix (#3174) · e7a2b66f
  guanqun authored Jul 09, 2020
  
  e7a2b66f
28 Jun, 2020 1 commit

adding sparse support to TreeSHAP in lightgbm (#3000) · 9f367d11

Ilya Matiach authored Jun 28, 2020

* adding sparse support to TreeSHAP in lightgbm

* updating based on comments

* updated based on comments, used fromiter instead of frombuffer

* updated based on comments

* fixed limits import order

* fix sparse feature contribs to work with more than int32 max rows

* really fixed int64 max error and build warnings

* added sparse test with >int32 max rows

* fixed python side reshape check on sparse data

* updated based on latest comments

* fixed comments

* added CSC INT32_MAX validation to test, fixed comments

9f367d11

23 Jun, 2020 1 commit

Interaction constraints (#3126) · bca2da97

Belinda Trotta authored Jun 23, 2020

* Add interaction constraints functionality.

* Minor fixes.

* Minor fixes.

* Change lambda to function.

* Fix gpu bug, remove extra blank lines.

* Fix gpu bug.

* Fix style issues.

* Try to fix segfault on MACOS.

* Fix bug.

* Fix bug.

* Fix bugs.

* Change parameter format for R.

* Fix R style issues.

* Change string formatting code.

* Change docs to say R package not supported.

* Remove R functionality, moving to separate PR.

* Keep track of branch features in tree object.

* Only track branch features when feature interactions are enabled.

* Fix lint error.

* Update docs and simplify tests.

bca2da97

05 Jun, 2020 1 commit
- Revert "re-order includes (fixes #3132) (#3133)" (#3153) · ac5f5e56
  Nikita Titov authored Jun 05, 2020
```
This reverts commit 656d2676.
```
  ac5f5e56
01 Jun, 2020 1 commit
- re-order includes (fixes #3132) (#3133) · 656d2676
  James Lamb authored Jun 01, 2020
  
  656d2676
25 May, 2020 1 commit
- fix a bug when we set the default score (#3114) · cd70bad4
  guanqun authored May 25, 2020
  
  cd70bad4
23 May, 2020 1 commit
- fix MSVC warning about control paths (fixes #3067) (#3068) · 74e5ec4b
  James Lamb authored May 23, 2020
```
* [R-package] fix MSVC warning about control paths (fixes #3067)

* linting

* simplify
```
  74e5ec4b
15 May, 2020 1 commit
- fix goss with constant hessian (#3077) · 5cc38e6a
  Guolin Ke authored May 15, 2020
  
  5cc38e6a
13 May, 2020 1 commit
- fix a bug in goss · f155379c
  Guolin Ke authored May 13, 2020
  
  f155379c
11 May, 2020 1 commit
- fix goss bug (#3063) · ad7f2851
  Guolin Ke authored May 11, 2020
  
  ad7f2851
08 Apr, 2020 1 commit

fix cpp lint in json11 (#2975) · 5c201e48

Guolin Ke authored Apr 08, 2020



* indent and constructor

* fix more

* fix long
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>

5c201e48

30 Mar, 2020 1 commit
- fix inf in json model dump (#2940) · e6de39a7
  Nikita Titov authored Mar 30, 2020
  
  e6de39a7
06 Mar, 2020 1 commit
- fixed cpplint errors and VS project files (#2873) · 532722b9
  Nikita Titov authored Mar 06, 2020
  
  532722b9
05 Mar, 2020 1 commit

speed up `FindBestThresholdFromHistogram` (#2867) · 77d92b7c

Guolin Ke authored Mar 05, 2020

* speed up for const hessian

* rename template

* some refactorings

* refine

* refine

* simplify codes

* fix random in feature histogram

* code refine

* refine

* try fix

* make gcc happy

* remove timer

* rollback some changes

* more templates

* fix a bug

* reduce the cost of timer

* fix gpu

* fix bug

* fix gpu

77d92b7c

04 Mar, 2020 1 commit
- fixed cpplint issues (#2863) · d018d30a
  Nikita Titov authored Mar 04, 2020
```
* fixed cpplint errors

* fixed more cpplint errors
```
  d018d30a
02 Mar, 2020 2 commits

speed up multi-val bin subset for bagging (#2827) · d0bec9e9

Guolin Ke authored Mar 02, 2020

* speed up multi-val bin subset for bagging

* remove the duplicated codes

* code refine

* some codes refactoring

* move `is_constant_hessian` into `TrainingShareStates`

* refine

* fix bug

* fix bug when num_groups_ < 0

* fix gpu

* fix gpu bagging

* fix gpu bug

* typo

* Update src/treelearner/serial_tree_learner.h

d0bec9e9

introduced specific CHECKs (#2849) · 5a80b788
Nikita Titov authored Mar 02, 2020

5a80b788

28 Feb, 2020 1 commit
- fixed cpplint error about namespace using-directives (#2836) · 53137e25
  Nikita Titov authored Feb 28, 2020
  
  53137e25
27 Feb, 2020 1 commit
- fix index and remove unused includes (#2829) · 83012fe4
  Nikita Titov authored Feb 27, 2020
```
* removed unused includes

* fixed index
```
  83012fe4
26 Feb, 2020 1 commit

decouple bagging with num_threads (#2804) · d20ceac7

Guolin Ke authored Feb 26, 2020



* fix bagging

* fixed cpplint issues

* updated docs
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

d20ceac7

25 Feb, 2020 1 commit
- replace std::runtime_error with Log::Fatal (#2816) · 37ce3eb2
  Nikita Titov authored Feb 25, 2020
```
* replace std::runtime_error with Log::Fatal

* Update c_api.cpp
```
  37ce3eb2
24 Feb, 2020 1 commit
- fixed cpplint issues (#2809) · 224b8b98
  Nikita Titov authored Feb 24, 2020
```
* fixed cpplint issues

* fixed cpplint issues
```
  224b8b98
22 Feb, 2020 1 commit

some code refactoring (#2769) · 3e80df7e

Guolin Ke authored Feb 22, 2020

* some refines

* more omp refactoring

* format define

* fix merge bug

* some fixes

* fix some warnings

* Apply suggestions from code review

* Apply suggestions from code review

* remove dup codes

3e80df7e

20 Feb, 2020 2 commits

added feature infos to JSON dump (#2660) · c4a7ab81

Nikita Titov authored Feb 20, 2020



* added feature infos to JSON dump

* slight json schema refactor

* simpified code

* refactor feature_infos

* refactoring

* Update src/boosting/gbdt.cpp

* Update dataset.h

* Update include/LightGBM/dataset.h

* simplify

* Apply suggestions from code review

* parse string and construct JSON objs
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

c4a7ab81

Add capability to get possible max and min values for a model (#2737) · 18e7de4f

Joan Fontanals authored Feb 20, 2020



* Add capability to get possible max and min values for a model

* Change implementation to have return value in tree.cpp, change naming to upper and lower bound, move implementation to gdbt.cpp

* Update include/LightGBM/c_api.h
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Change iteration to avoid potential overflow, add bindings to R and Python and a basic test

* Adjust test values

* Consider const correctness and multithreading protection

* Update test values

* Update test values

* Add test to check that model is exactly the same in all platforms

* Try to parse the model to get the expected values

* Try to parse the model to get the expected values

* Fix implementation, num_leaves can be lower than the leaf_value_ size

* Do not check for num_leaves to be smaller than actual size and get back to test with hardcoded value

* Change test order

* Add gpu_use_dp option in test

* Remove helper test method

* Update src/c_api.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update src/io/tree.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update src/io/tree.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_basic.py
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Remoove imports
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

18e7de4f

19 Feb, 2020 1 commit

[python] [R-package] refine the parameters for Dataset (#2594) · 9f79e840

Guolin Ke authored Feb 19, 2020



* reset

* fix a bug

* fix test

* Update c_api.h

* support to no filter features by min_data

* add warning in reset config

* refine warnings for override dataset's parameter

* some cleans

* clean code

* clean code

* refine C API function doxygen comments

* refined new param description

* refined doxygen comments for R API function

* removed stuff related to int8

* break long line in warning message

* removed tests which results cannot be validated anymore

* added test for warnings about unchangeable params

* write parameter from dataset to booster

* consider free_raw_data.

* fix params

* fix bug

* implementing R

* fix typo

* filter params in R

* fix R

* not min_data

* refined tests

* fixed linting

* refine

* pilint

* add docstring

* fix docstring

* R lint

* updated description for C API function

* use param aliases in Python

* fixed typo

* fixed typo

* added more params to test

* removed debug print

* fix dataset construct place

* fix merge bug

* Update feature_histogram.hpp

* add is_sparse back

* remove unused parameters

* fix lint

* add data random seed

* update

* [R-package] centrallized Dataset parameter aliases and added tests on Dataset parameter updating (#2767)
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

9f79e840

10 Feb, 2020 1 commit

Refactoring monotone constraints (linked to #2305) (#2717) · 3670e476

CharlesAuguste authored Feb 10, 2020



* Move monotone constraints to the monotone_constraints files.

* Add checks for debug mode.

* Refactored FindBestSplitsFromHistograms.

* Add headers.

* fix

* Update data_parallel_tree_learner.cpp

* simplify ComputeBestSplitForFeature

* Fix min / max issue.

* Remove duplicated check.
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

3670e476