Commits · 5a4608467d930aa46f8ef5d7ceb6ecfb8643b220 · tianlh / LightGBM-DCU

28 Dec, 2020 1 commit

small code and docs refactoring (#3681) · 5a460846

Nikita Titov authored Dec 29, 2020

* small code and docs refactoring

* Update CMakeLists.txt

* Update .vsts-ci.yml

* Update test.sh

* continue

* continue

* revert stable sort for all-unique values

24 Dec, 2020 1 commit

Trees with linear models at leaves (#3299) · fcfd4132

Belinda Trotta authored Dec 24, 2020

* Add Eigen library.

* Working for simple test.

* Apply changes to config params.

* Handle nan data.

* Update docs.

* Add test.

* Only load raw data if boosting=gbdt_linear

* Remove unneeded code.

* Minor updates.

* Update to work with sk-learn interface.

* Update to work with chunked datasets.

* Throw error if we try to create a Booster with an already-constructed dataset having incompatible parameters.

* Save raw data in binary dataset file.

* Update docs and fix parameter checking.

* Fix dataset loading.

* Add test for regularization.

* Fix bugs when saving and loading tree.

* Add test for load/save linear model.

* Remove unneeded code.

* Fix case where not enough leaf data for linear model.

* Simplify code.

* Speed up code.

* Speed up code.

* Simplify code.

* Speed up code.

* Fix bugs.

* Working version.

* Store feature data column-wise (not fully working yet).

* Fix bugs.

* Speed up.

* Speed up.

* Remove unneeded code.

* Small speedup.

* Speed up.

* Minor updates.

* Remove unneeded code.

* Fix bug.

* Fix bug.

* Speed up.

* Speed up.

* Simplify code.

* Remove unneeded code.

* Fix bug, add more tests.

* Fix bug and add test.

* Only store numerical features

* Fix bug and speed up using templates.

* Speed up prediction.

* Fix bug with regularisation

* Visual studio files.

* Working version

* Only check nans if necessary

* Store coeff matrix as an array.

* Align cache lines

* Align cache lines

* Preallocation coefficient calculation matrices

* Small speedups

* Small speedup

* Reverse cache alignment changes

* Change to dynamic schedule

* Update docs.

* Refactor so that linear tree learner is not a separate class.

* Add refit capability.

* Speed up

* Small speedups.

* Speed up add prediction to score.

* Fix bug

* Fix bug and speed up.

* Speed up dataload.

* Speed up dataload

* Use vectors instead of pointers

* Fix bug

* Add OMP exception handling.

* Change return type of LGBM_BoosterGetLinear to bool

* Change return type of LGBM_BoosterGetLinear back to int, only parameter type needed to change

* Remove unused internal_parent_ property of tree

* Remove unused parameter to CreateTreeLearner

* Remove reference to LinearTreeLearner

* Minor style issues

* Remove unneeded check

* Reverse temporary testing change

* Fix Visual Studio project files

* Restore LightGBM.vcxproj.filters

* Speed up

* Speed up

* Simplify code

* Update docs

* Simplify code

* Initialise storage space for max num threads

* Move Eigen to include directory and delete unused files

* Remove old files.

* Fix so it compiles with mingw

* Fix gpu tree learner

* Change AddPredictionToScore back to const

* Fix python lint error

* Fix C++ lint errors

* Change eigen to a submodule

* Update comment

* Add the eigen folder

* Try to fix build issues with eigen

* Remove eigen files

* Add eigen as submodule

* Fix include paths

* Exclude eigen files from Python linter

* Ignore eigen folders for pydocstyle

* Fix C++ linting errors

* Fix docs

* Fix docs

* Exclude eigen directories from doxygen

* Update manifest to include eigen

* Update build_r to include eigen files

* Fix compiler warnings

* Store raw feature data as float

* Use float for calculating linear coefficients

* Remove eigen directory from GLOB

* Don't compile linear model code when building R package

* Fix doxygen issue

* Fix lint issue

* Fix lint issue

* Remove unneeded code

* Restore deleted lines

* Restore deleted lines

* Change return type of has_raw to bool

* Update docs

* Rename some variables and functions for readability

* Make tree_learner parameter const in AddScore

* Fix style issues

* Pass vectors as const reference when setting tree properties

* Make temporary storage of serial_tree_learner mutable so we can make the object's methods const

* Remove get_raw_size, use num_numeric_features instead

* Fix typo

* Make contains_nan_ and any_nan_ properties immutable again

* Remove data_has_nan_ property of tree

* Remove temporary test code

* Make linear_tree a dataset param

* Fix lint error

* Make LinearTreeLearner a separate class

* Fix lint errors

* Fix lint error

* Add linear_tree_learner.o

* Simulate omp_get_max_threads if openmp is not available

* Update PushOneData to also store raw data.

* Cast size to int

* Fix bug in ReshapeRaw

* Speed up code with multithreading

* Use OMP_NUM_THREADS

* Speed up with multithreading

* Update to use ArrayToString

* Fix tests

* Fix test

* Fix bug introduced in merge

* Minor updates

* Update docs

08 Dec, 2020 1 commit

Fix model locale issue and improve model R/W performance. (#3405) · 792c9303

Alberto Ferreira authored Dec 08, 2020

* Fix LightGBM models locale sensitivity and improve R/W performance.

When Java is used, the default C++ locale is broken. This is true for
Java providers that use the C API or even Python models that require JEP.

This patch solves that issue making the model reads/writes insensitive
to such settings.
To achieve it, within the model read/write codebase:
 - C++ streams are imbued with the classic locale
 - Calls to functions that are dependent on the locale are replaced
 - The default locale is not changed!

This approach means:
 - The user's locale is never tampered with, avoiding issues such as
    https://github.com/microsoft/LightGBM/issues/2979 with the previous
    approach https://github.com/microsoft/LightGBM/pull/2891
 - Datasets can still be read according to the user's locale
 - The model file has a single format independent of locale

Changes:
 - Add CommonC namespace which provides faster locale-independent versions of Common's methods
 - Model code makes conversions through CommonC
 - Cleanup unused Common methods
 - Performance improvements. Use fast libraries for locale-agnostic conversion:
   - value->string: https://github.com/fmtlib/fmt
   - string->double: https://github.com/lemire/fast_double_parser (10x
      faster double parsing according to their benchmark)

Bugfixes:
 - https://github.com/microsoft/LightGBM/issues/2500
 - https://github.com/microsoft/LightGBM/issues/2890
 - https://github.com/ninia/jep/issues/205 (as it is related to LGBM as well)

* Align CommonC namespace

* Add new external_libs/ to python setup

* Try fast_double_parser fix #1

Testing commit e09e5aad828bcb16bea7ed0ed8322e019112fdbe

If it works it should fix more LGBM builds

* CMake: Attempt to link fmt without explicit PUBLIC tag

* Exclude external_libs from linting

* Add external_libs to MANIFEST.in

* Set dynamic linking option for fmt.

* linting issues

* Try to fix lint includes

* Try to pass fPIC with static fmt lib

* Try CMake P_I_C option with fmt library

* [R-package] Add CMake support for R and CRAN

* Cleanup CMakeLists

* Try fmt hack to remove stdout

* Switch to header-only mode

* Add PRIVATE argument to target_link_libraries

* use fmt in header-only mode

* Remove CMakeLists comment

* Change OpenMP to PUBLIC linking in Mac

* Update fmt submodule to 7.1.2

* Use fmt in header-only-mode

* Remove fmt from CMakeLists.txt

* Upgrade fast_double_parser to v0.2.0

* Revert "Add PRIVATE argument to target_link_libraries"

This reverts commit 3dd45dde7b92531b2530ab54522bb843c56227a7.

* Address James Lamb's comments

* Update R-package/.Rbuildignore
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Upgrade to fast_double_parser v0.3.0 - Solaris support

* Use legacy code only in Solaris

* Fix lint issues

* Fix comment

* Address StrikerRUS's comments (solaris ifdef).

* Change header guards
Co-authored-by: James Lamb <jaylamb20@gmail.com>
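
The locale handling described in this commit (streams imbued with the classic locale, the global locale left untouched) can be illustrated with a minimal C++ sketch. This is not LightGBM's actual `CommonC` code: the `CommaDecimal` facet and all function names below are hypothetical, and the facet only simulates a process-wide locale that uses a comma as the decimal separator.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: mimics a locale whose decimal separator is ','.
struct CommaDecimal : std::numpunct<char> {
 protected:
  char do_decimal_point() const override { return ','; }
};

// Installs the comma-decimal locale globally and returns the previous one.
std::locale InstallCommaGlobalLocale() {
  return std::locale::global(
      std::locale(std::locale::classic(), new CommaDecimal));
}

// Formats with whatever the global locale is when the stream is created.
std::string ToStringGlobal(double value) {
  std::ostringstream out;
  out << value;
  return out.str();
}

// Formats with a stream imbued with the classic "C" locale, so the
// result does not depend on the global locale.
std::string ToStringClassic(double value) {
  std::ostringstream out;
  out.imbue(std::locale::classic());
  out << value;
  return out.str();
}

// Parses with a stream imbued with the classic "C" locale.
double ParseClassic(const std::string& text) {
  std::istringstream in(text);
  in.imbue(std::locale::classic());
  double value = 0.0;
  in >> value;
  return value;
}

// Shows that only the imbued streams are locale-independent, and that
// the global locale is restored afterwards rather than tampered with.
void CheckLocaleIndependence() {
  const std::locale previous = InstallCommaGlobalLocale();
  assert(ToStringGlobal(0.5) == "0,5");   // follows the global locale
  assert(ToStringClassic(0.5) == "0.5");  // fixed model-file format
  assert(ParseClassic("0.5") == 0.5);
  std::locale::global(previous);
}
```

Calling `CheckLocaleIndependence()` from a `main` function exercises both paths. The point of imbuing individual streams, rather than calling `std::locale::global` inside the library, is that the caller's locale stays as it was.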

09 Oct, 2020 1 commit

Move Tree destructor to header file (#3417) · f1aaa9b9

Lucas David authored Oct 09, 2020



~ Added 'noexcept' specifier and defaulted destructor.
Co-authored-by: Lucas DAVID <lucas@isdom.isoft.fr>
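
As a rough illustration of what a defaulted `noexcept` destructor declared in a header looks like, here is a minimal sketch. `ExampleTree` is a hypothetical stand-in and does not reproduce LightGBM's `Tree` class.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a tree class declared in a header.
class ExampleTree {
 public:
  ExampleTree() = default;

  // Defaulted in the class definition: the compiler generates the body,
  // and the explicit noexcept states that destruction never throws.
  ~ExampleTree() noexcept = default;

 private:
  std::vector<double> leaf_value_;  // released automatically on destruction
};

// Compile-time check that the destructor is indeed non-throwing.
static_assert(std::is_nothrow_destructible<ExampleTree>::value,
              "ExampleTree must be nothrow-destructible");
```

Destructors are implicitly `noexcept` in C++11 and later unless a member or base destructor may throw, so the explicit specifier mainly documents the guarantee and turns any violation into a compile error.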

28 Jun, 2020 1 commit

adding sparse support to TreeSHAP in lightgbm (#3000) · 9f367d11

Ilya Matiach authored Jun 28, 2020

* adding sparse support to TreeSHAP in lightgbm

* updating based on comments

* updated based on comments, used fromiter instead of frombuffer

* updated based on comments

* fixed limits import order

* fix sparse feature contribs to work with more than int32 max rows

* really fixed int64 max error and build warnings

* added sparse test with >int32 max rows

* fixed python side reshape check on sparse data

* updated based on latest comments

* fixed comments

* added CSC INT32_MAX validation to test, fixed comments

23 Jun, 2020 1 commit

Interaction constraints (#3126) · bca2da97

Belinda Trotta authored Jun 23, 2020

* Add interaction constraints functionality.

* Minor fixes.

* Minor fixes.

* Change lambda to function.

* Fix gpu bug, remove extra blank lines.

* Fix gpu bug.

* Fix style issues.

* Try to fix segfault on MACOS.

* Fix bug.

* Fix bug.

* Fix bugs.

* Change parameter format for R.

* Fix R style issues.

* Change string formatting code.

* Change docs to say R package not supported.

* Remove R functionality, moving to separate PR.

* Keep track of branch features in tree object.

* Only track branch features when feature interactions are enabled.

* Fix lint error.

* Update docs and simplify tests.

09 Jun, 2020 1 commit
- Update tree.cpp (#3148) · 8092c9fe
  Guolin Ke authored Jun 09, 2020
  
05 Jun, 2020 1 commit
- Revert "re-order includes (fixes #3132) (#3133)" (#3153) · ac5f5e56
  Nikita Titov authored Jun 05, 2020
```
This reverts commit 656d2676.
```
01 Jun, 2020 1 commit
- re-order includes (fixes #3132) (#3133) · 656d2676
  James Lamb authored Jun 01, 2020
  
13 Apr, 2020 1 commit

[ci] more cpp lints (#2985) · 5c0baf6f

Guolin Ke authored Apr 14, 2020



* fix

* Apply suggestions from code review
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>

04 Apr, 2020 1 commit
- fixed cpplint errors (#2971) · d84e9a2e
  Nikita Titov authored Apr 04, 2020
  
02 Apr, 2020 1 commit

Cleanup MissingType enum constants (#2931) · 51f37e9b

Alberto Ferreira authored Apr 02, 2020



* [refactor] Cleanup MissingType enum constants

* Update tree.cpp
Co-authored-by: Alberto Ferreira <alberto.ferreira@feedzai.com>

23 Mar, 2020 1 commit

Improving monotone constraints ("Fast" method; linked to #2305, #2717) (#2770) · a8c1e0a1

CharlesAuguste authored Mar 23, 2020

* Add util functions.

* Added monotone_constraints_method as a parameter.

* Add the intermediate constraining method.

* Updated tests.

* Minor fixes.

* Typo.

* Linting.

* Ran the parameter generator for the doc.

* Removed usage of the FeatureMonotone function.

* more fixes

* Fix.

* Remove duplicated code.

* Add debug checks.

* Typo.

* Bug fix.

* Disable the use of intermediate monotone constraints and feature sampling at the same time.

* Added an alias for monotone constraining method.

* Use the right variable to get the number of threads.

* Fix DEBUG checks.

* Add back check to determine if histogram is splittable.

* Added forgotten override keywords.

* Perform monotone constraint update only when necessary.

* Small refactor of FastLeafConstraints.

* Post rebase commit.

* Small refactor.

* Typo.

* Added comment and slightly improved logic of monotone constraints.

* Forgot a const.

* Vectors that are to be modified need to be pointers.

* Rename FastLeafConstraints to IntermediateLeafConstraints to match documentation.

* Remove overload of GoUpToFindLeavesToUpdate.

* Stop memory leaking.

* Fix cpplint issues.

* Fix checks.

* Fix more cpplint issues.

* Refactor config monotone constraints method.

* Typos.

* Remove useless empty lines.

* Add new line to separate includes.

* Replace unsigned ind by size_t.

* Reduce number of trials in tests to decrease CI time.

* Specify monotone constraints better in tests.

* Removed outer loop in test of monotone constraints.

* Added categorical features to the monotone constraints tests.

* Add blank line.

* Regenerate parameters automatically.

* Speed up ShouldKeepGoingLeftRight.
Co-authored-by: Charles Auguste <auguste@dubquantdev801.ire.susq.com>
Co-authored-by: guolinke <guolin.ke@outlook.com>

22 Feb, 2020 1 commit

some code refactoring (#2769) · 3e80df7e

Guolin Ke authored Feb 22, 2020

* some refines

* more omp refactoring

* format define

* fix merge bug

* some fixes

* fix some warnings

* Apply suggestions from code review

* Apply suggestions from code review

* remove dup codes

20 Feb, 2020 1 commit

Add capability to get possible max and min values for a model (#2737) · 18e7de4f

Joan Fontanals authored Feb 20, 2020



* Add capability to get possible max and min values for a model

* Change implementation to have return value in tree.cpp, change naming to upper and lower bound, move implementation to gbdt.cpp

* Update include/LightGBM/c_api.h
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Change iteration to avoid potential overflow, add bindings to R and Python and a basic test

* Adjust test values

* Consider const correctness and multithreading protection

* Update test values

* Update test values

* Add test to check that model is exactly the same in all platforms

* Try to parse the model to get the expected values

* Try to parse the model to get the expected values

* Fix implementation, num_leaves can be lower than the leaf_value_ size

* Do not check for num_leaves to be smaller than actual size and get back to test with hardcoded value

* Change test order

* Add gpu_use_dp option in test

* Remove helper test method

* Update src/c_api.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update src/io/tree.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update src/io/tree.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_basic.py
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Remove imports
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
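
One plausible reading of the upper and lower bound computation named in this commit is sketched below: each tree contributes its largest (or smallest) leaf value, and the per-tree extremes are summed. This is an assumption for illustration only; `SimpleTree` and the function names are hypothetical, the sketch covers a single-output model, and it is not taken from LightGBM's sources.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal tree: only the leaf outputs matter here.
struct SimpleTree {
  int num_leaves;                   // number of leaves actually in use
  std::vector<double> leaf_values;  // may be allocated larger than num_leaves
};

// Largest output a single tree can produce (only the used leaves count).
double TreeUpperBound(const SimpleTree& tree) {
  assert(tree.num_leaves >= 1 &&
         tree.num_leaves <= static_cast<int>(tree.leaf_values.size()));
  return *std::max_element(tree.leaf_values.begin(),
                           tree.leaf_values.begin() + tree.num_leaves);
}

// Smallest output a single tree can produce.
double TreeLowerBound(const SimpleTree& tree) {
  assert(tree.num_leaves >= 1 &&
         tree.num_leaves <= static_cast<int>(tree.leaf_values.size()));
  return *std::min_element(tree.leaf_values.begin(),
                           tree.leaf_values.begin() + tree.num_leaves);
}

// Sum of per-tree maxima: no raw prediction can exceed this value.
double ModelUpperBound(const std::vector<SimpleTree>& trees) {
  double bound = 0.0;
  for (const SimpleTree& tree : trees) bound += TreeUpperBound(tree);
  return bound;
}

// Sum of per-tree minima: no raw prediction can fall below this value.
double ModelLowerBound(const std::vector<SimpleTree>& trees) {
  double bound = 0.0;
  for (const SimpleTree& tree : trees) bound += TreeLowerBound(tree);
  return bound;
}
```

The bounds are conservative: a single input row may never reach the extreme leaf in every tree at once, so the true range of predictions can be narrower.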

14 Aug, 2019 1 commit
- fix nan in tree model (#2303) · 9558417a
  Guolin Ke authored Aug 14, 2019
```
* fix nan in tree model

* fix
```
25 Jul, 2019 1 commit
- fixed cpplint errors about spaces and indents (#2282) · 716fe4d0
  Nikita Titov authored Jul 25, 2019
  
24 Jul, 2019 1 commit

add weight in tree model output (#2269) · e1d7a7b9

Guolin Ke authored Jul 24, 2019

* add weight in tree model output

* fix bug

* updated Python plotting part to handle weights

29 Apr, 2019 1 commit
- dump_model() bug with num_leaves=1 · c45ec4f1
  Guolin Ke authored Apr 29, 2019
  
13 Apr, 2019 1 commit
- added copyright message in files (#2101) · 32ef7603
  Nikita Titov authored Apr 13, 2019
  
11 Apr, 2019 1 commit

reworked includes in source files (#2066) · 50ce01b5

Nikita Titov authored Apr 12, 2019

* added all necessary includes - fixed build/include_what_you_use error

* fixed the order of includes (build/include_order)

01 Apr, 2019 1 commit
- addressed cpplint error about C-style cast (#2064) · 2027f6b4
  Nikita Titov authored Apr 01, 2019
  
02 Feb, 2019 1 commit
- cpplint whitespaces and new lines (#1986) · 90127b52
  Nikita Titov authored Feb 02, 2019
  
20 May, 2018 1 commit

Refine config object (#1381) · dc699574

Guolin Ke authored May 20, 2018

* [WIP] refine config

* [wip] ready for the auto code generate

* auto generate config codes

* use with to open file

* fix bug

* fix pylint

* fix bug

* fix pylint

* fix bugs.

* tmp for failed test.

* fix tests.

* added nthreads alias

* added new aliases from new config.h

* fixed duplicated alias

* refactored parameter_generator.py

* added new aliases from config.h and removed remaining old names

* fix bugs & some miss alias

* added aliases

* add more descriptions.

* add comment.

11 May, 2018 1 commit
- [python] decode error description (#1362) · 899151fc
  Nikita Titov authored May 11, 2018
```
* decode error description

* added break line char in log messages
```
24 Jan, 2018 1 commit
- fix a multi-thread bug in pred_contrib · 61fb5ea2
  Guolin Ke authored Jan 24, 2018
  
12 Dec, 2017 1 commit
- change kZeroThreshold to 1e-35f · 0a7a4080
  Guolin Ke authored Dec 12, 2017
  
26 Nov, 2017 1 commit

Speed up saving and loading model (#1083) · 8a5ec366

Guolin Ke authored Nov 26, 2017

* remove protobuf

* add version number

* remove pmml script

* use float for split gain

* fix warnings

* refine the read model logic of gbdt

* fix compile error

* improve decode speed

* fix some bugs

* fix double accuracy problem

* fix bug

* multi-thread save model

* speed up save model to string

* parallel save/load model

* fix some warnings.

* fix warnings.

* fix a bug

* remove debug output

* fix doc

* fix max_bin warning in tests.

* fix max_bin warning

* fix pylint

* clean code for stringToArray

* clean code for TToString

* remove max_bin

* replace "class" with typename

15 Nov, 2017 2 commits
- fix some formats · 3d65d065
  Guolin Ke authored Nov 15, 2017
  
- Add func to handle sparse testing data (#1045) · ba5c7459
  ww authored Nov 15, 2017
```
* first commit

* fix bug

* fix by commits

* fix by commit

* add funcs to IfElse

* fix bug

* fix bug

* fix bug

* change tab to space
```
16 Sep, 2017 1 commit

Fix feature attributions for regression models and add Python bindings (#861) · 67c2bdf9

Scott Lundberg authored Sep 16, 2017

* Fix feature attributions for regression models and add Python bindings

* Address pylint issue

* Lazy fix missing tree depth info

02 Sep, 2017 1 commit
- fix tree model format (support multi-cat threshold) · ae6ff288
  Guolin Ke authored Sep 02, 2017
  
29 Aug, 2017 2 commits
- clean code for Tree. · 2c572a71
  Guolin Ke authored Aug 29, 2017
  
- clean code for Boosting. · 6d0eae0c
  Guolin Ke authored Aug 29, 2017
  
20 Aug, 2017 2 commits
- support constant tree (one-leaf tree) (#851) · cc83cd67
  Guolin Ke authored Aug 20, 2017
  
- clean code for the split of bins and leaves. · 6c4a9750
  Guolin Ke authored Aug 20, 2017
  
30 Jul, 2017 1 commit

Better missing value handle (#747) · 00cb04a2

Guolin Ke authored Jul 30, 2017

* finish the data loading part

* allow prediction.

* fix bug for decision type.

* finish split finding part

* fix bugs.

* bug fixed. add a test.

* fix pep8.

* update documents.

* fix test bugs.

* fix a format

* fix import error in python test.

* disable missing handle in categorical features.

* fix a bug.

* add more tests.

* fix pep8

* fix bugs.

* remove the missing handle code for categorical feature.

13 Jun, 2017 1 commit

[python] fix dump model with infinite threshold (#617) · f2c99ea4

wxchan authored Jun 13, 2017

* avoid threshold inf

* use __save_model_to_string for feature importance

* Revert "use __save_model_to_string for feature importance"

This reverts commit dca6a85fb3d89866eb56eb0c9ca103ada4d92b53.

06 Jun, 2017 1 commit
- fix the split functions. · 3a4608f4
  Guolin Ke authored Jun 06, 2017
  
15 May, 2017 1 commit
- Handle for missing values (#516) · e984b0d6
  Guolin Ke authored May 15, 2017
  