Commits · 0655d67cc1ee28d536a773a6c5d978d38a0e56c1 · tianlh / LightGBM-DCU

13 Nov, 2020 1 commit

Optimization of row-wise histogram construction (#3522) · 0655d67c

shiyu1994 authored Nov 13, 2020



* store without offset in multi_val_dense_bin

* fix offset bug

* add comment for offset

* add comment for bin type selection

* faster operations for offset

* keep most freq bin in histogram for multi val dense

* use original feature iterators

* consider 9 cases (3 x 3) for multi val bin construction

* fix dense bin setting

* fix bin data in multi val group

* fix offset of the first feature histogram

* use float hist buf

* avx in histogram construction

* use avx for hist construction without prefetch

* vectorize bin extraction

* use only 128 vec

* use avx2

* use vectorization for sparse row wise

* add bit size for multi val dense bin

* float with no vectorization

* change multithreading strategy to dynamic

* remove intrinsic header

* fix dense multi val col copy

* remove bit size

* use large enough block size when the bin number is large

* calc min block size by sparsity

* rescale gradients

* rollback gradients scaling

* single precision histogram buffer as an option

* add float hist buffer with thread buffer

* fix setting zero in hist data

* fix hist begin pointer in tree learners

* remove debug logs

* remove omp simd

* update Makevars of R-package

* fix feature group binary storing

* two row wise for double hist buffer

* add subfeature for two row wise

* remove useless code and fix two row wise

* refactor code

* grouping the dense feature groups can get sparse multi val bin

* clean format problems

* one thread for two blocks in sep row wise

* use ordered gradients for sep row wise

* fix grad ptr

* ordered grad with combined block for sep row wise

* fix block threading

* use the same min block size

* rollback share min block size

* remove logs

* Update src/io/dataset.cpp
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

* fix parameter description

* remove sep_row_wise

* remove check codes

* add check for empty multi val bin

* fix lint error

* rollback changes in config.h

* Apply suggestions from code review
Co-authored-by: Ubuntu <shiyu@gbdt-04.ren3kv4wanvufliwrpy4k03lsf.xx.internal.cloudapp.net>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

0655d67c

23 Sep, 2020 1 commit
- Improve performance of path smoothing (#3396) · 7744757a
  Belinda Trotta authored Sep 23, 2020
```
* Make path smoothing faster

* Fix bug

* Fix bug

* Minor style fix
```
  7744757a
20 Sep, 2020 1 commit

[GPU] Add support for CUDA-based GPU build (#3160) · f7ad9457

Chip Kerchner authored Sep 20, 2020

* Initial CUDA work

* redirect log to python console (#3090)

* redir log to python console

* fix pylint

* Apply suggestions from code review

* Update basic.py

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update c_api.h

* Apply suggestions from code review

* super-minor: better wording
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>

* re-order includes (fixes #3132) (#3133)

* Revert "re-order includes (fixes #3132) (#3133)" (#3153)

This reverts commit 656d2676

* Missing change from previous rebase

* Minor cleanup and removal of development scripts.

* Only set gpu_use_dp on by default for CUDA. Other minor change.

* Fix python lint indentation problem.

* More python lint issues.

* Big lint cleanup - more to come.

* Another large lint cleanup - more to come.

* Even more lint cleanup.

* Minor cleanup so less differences in code.

* Revert is_use_subset changes

* Another rebase from master to fix recent conflicts.

* More lint.

* Simple code cleanup - add & remove blank lines, revert unneccessary format changes, remove added dead code.

* Removed parameters added for CUDA and various bug fix.

* Yet more lint and unneccessary changes.

* Revert another change.

* Removal of unneccessary code.

* temporary appveyor.yml for building and testing

* Remove return value in ReSize

* Removal of unused variables.

* Code cleanup from reviewers suggestions.

* Removal of FIXME comments and unused defines.

* More reviewers comments cleanup.

* Fix config variables.

* Attempt to fix check-docs failure

* Update Paramster.rst for num_gpu

* Removing test appveyor.yml

* Add CUDA_RESOLVE_DEVICE_SYMBOLS to libraries to fix linking issue.

* Fixed handling of data elements less than 2K.

* More reviewers comments cleanup.

* Removal of TODO and fix printing of int64_t

* Add cuda change for CI testing and remove cuda from device_type in python.

* Missed one change form previous check-in

* Removal AdditionConfig and fix settings.

* Limit number of GPUs to one for now in CUDA.

* Update Parameters.rst for previous check-in

* Whitespace removal.

* Cleanup unused code.

* Changed uint/ushort/ulong to unsigned int/short/long to help Windows based CUDA compiler work.

* Lint change from previous check-in.

* Changes based on reviewers comments.

* More reviewer comment changes.

* Adding warning for is_sparse. Revert tmp_subset code. Only return FeatureGroupData if not is_multi_val_

* Fix so that CUDA code will compile even if you enable the SCORE_T_USE_DOUBLE define.

* Reviewer comment cleanup.

* Replace warning with Log message. Removal of some of the USE_CUDA. Fix typo and removal of pragma once.

* Remove PRINT debug for CUDA code.

* Allow to use of multiple GPUs for CUDA.

* More multi-GPUs enablement for CUDA.

* More code cleanup based on reviews comments.

* Update docs with latest config changes.
Co-authored-by: Gordon Fossum <fossum@us.ibm.com>
Co-authored-by: ChipKerchner <ckerchne@linux.vnet.ibm.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

f7ad9457

23 Jun, 2020 1 commit

Interaction constraints (#3126) · bca2da97

Belinda Trotta authored Jun 23, 2020

* Add interaction constraints functionality.

* Minor fixes.

* Minor fixes.

* Change lambda to function.

* Fix gpu bug, remove extra blank lines.

* Fix gpu bug.

* Fix style issues.

* Try to fix segfault on MACOS.

* Fix bug.

* Fix bug.

* Fix bugs.

* Change parameter format for R.

* Fix R style issues.

* Change string formatting code.

* Change docs to say R package not supported.

* Remove R functionality, moving to separate PR.

* Keep track of branch features in tree object.

* Only track branch features when feature interactions are enabled.

* Fix lint error.

* Update docs and simplify tests.

bca2da97

26 May, 2020 1 commit

memory corruption fix for distributed data parallel version before SyncUpGlobalBestSplit (#3110) · 8ead7cc1

Ilya Matiach authored May 26, 2020

* memory corruption fix for distributed data parallel version before SyncUpGlobalBestSplit

* updated based on comments

* updated voting and feature parallel based on comments

* fixing macos failure

* rename variable

8ead7cc1

05 Mar, 2020 1 commit

speed up `FindBestThresholdFromHistogram` (#2867) · 77d92b7c

Guolin Ke authored Mar 05, 2020

* speed up for const hessian

* rename template

* some refactorings

* refine

* refine

* simplify codes

* fix random in feature histogram

* code refine

* refine

* try fix

* make gcc happy

* remove timer

* rollback some changes

* more templates

* fix a bug

* reduce the cost of timer

* fix gpu

* fix bug

* fix gpu

77d92b7c

02 Mar, 2020 3 commits

fix bug in parallel learning (#2851) · da91c613
Guolin Ke authored Mar 02, 2020
```
* refix

* fix config

* avoid to rely on config
```
da91c613

reduce the overhead of OMP_NUM_THREADS in training (#2852) · 9c386db1

Guolin Ke authored Mar 02, 2020

* reduce overhead of get num_threads

* add warning

* Apply suggestions from code review

* Apply suggestions from code review

9c386db1

don't save num_thread as possible (#2839) · 0aa7bfee

Guolin Ke authored Mar 02, 2020



* don't cache `num_thread`, to avoid change outside

* rename

* update document

* Update docs/Parameters.rst

* Update include/LightGBM/config.h

* Apply suggestions from code review
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

0aa7bfee

10 Feb, 2020 1 commit

Refactoring monotone constraints (linked to #2305) (#2717) · 3670e476

CharlesAuguste authored Feb 10, 2020



* Move monotone constraints to the monotone_constraints files.

* Add checks for debug mode.

* Refactored FindBestSplitsFromHistograms.

* Add headers.

* fix

* Update data_parallel_tree_learner.cpp

* simplify ComputeBestSplitForFeature

* Fix min / max issue.

* Remove duplicated check.
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

3670e476

08 Feb, 2020 1 commit

various minor style, docs and cpplint improvements (#2747) · 1c1a2765

Nikita Titov authored Feb 09, 2020

* various minor style, docs and cpplint improvements

* fixed typo in warning

* fix recently added cpplint errors

* move note for params upper in description for consistency

1c1a2765

02 Feb, 2020 1 commit

Support both row-wise and col-wise multi-threading (#2699) · 509c2e50

Guolin Ke authored Feb 02, 2020



* commit

* fix a bug

* fix bug

* reset to track changes

* refine the auto choose logic

* sort the time stats output

* fix include

* change  multi_val_bin_sparse_threshold

* add cmake

* add _mm_malloc and _mm_free for cross platform

* fix cmake bug

* timer for split

* try to fix cmake

* fix tests

* refactor DataPartition::Split

* fix test

* typo

* formating

* Revert "formating"

This reverts commit 5b8de4f7fb9d975ee23701d276a66d40ee6d4222.

* add document

* [R-package] Added tests on use of force_col_wise and force_row_wise in training (#2719)

* naming

* fix gpu code

* Update include/LightGBM/bin.h
Co-Authored-By: James Lamb <jaylamb20@gmail.com>

* Update src/treelearner/ocl/histogram16.cl

* test: swap compilers for CI

* fix omp

* not avx2

* no aligned for feature histogram

* Revert "refactor DataPartition::Split"

This reverts commit 256e6d9641ade966a1f54da1752e998a1149b6f8.

* slightly refactor data partition

* reduce the memory cost
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

509c2e50

14 Jan, 2020 1 commit

support most frequent bin (#2689) · c7e90393

Guolin Ke authored Jan 14, 2020

* implement

* fix warning

* fix bug

* fix a bug

* remove unneed function

* fix data push bug

* fix valid data push

* fix bug for missing_type=zero

* refine split

* renames

* typo

c7e90393

22 Sep, 2019 1 commit

fix many cpp lint errors (#2426) · f1a14869

Guolin Ke authored Sep 22, 2019

* fix many cpp lint errors

* indent

* fix bug

* fix more

* fix gpu

* more fixes

f1a14869

12 Sep, 2019 1 commit
- update feature_fraction_bynode (#2381) · ad8e8ccc
  Guolin Ke authored Sep 12, 2019
```
* update

* fix a bug

* Update config.h

* Update Parameters.rst
```
  ad8e8ccc
03 Sep, 2019 1 commit

sub-features for node level (#2330) · bbbad73d

Guolin Ke authored Sep 03, 2019

* add parameter

* implement

* fix bug

* fix bug

* fix according comment

* add test

* Update test_engine.py

* Update test_engine.py

* Update test_engine.py

bbbad73d

13 Apr, 2019 1 commit
- added copyright message in files (#2101) · 32ef7603
  Nikita Titov authored Apr 13, 2019
  
  32ef7603
11 Apr, 2019 1 commit

reworked includes in source files (#2066) · 50ce01b5

Nikita Titov authored Apr 12, 2019

* added all necessary includes - fixed build/include_what_you_use error

* fixed the order of includes (build/include_order)

50ce01b5

02 Feb, 2019 1 commit
- cpplint whitespaces and new lines (#1986) · 90127b52
  Nikita Titov authored Feb 02, 2019
  
  90127b52
20 May, 2018 1 commit

Refine config object (#1381) · dc699574

Guolin Ke authored May 20, 2018

* [WIP] refine config

* [wip] ready for the auto code generate

* auto generate config codes

* use with to open file

* fix bug

* fix pylint

* fix bug

* fix pylint

* fix bugs.

* tmp for failed test.

* fix tests.

* added nthreads alias

* added new aliases from new config.h

* fixed duplicated alias

* refactored parameter_generator.py

* added new aliases from config.h and removed remaining old names

* fix bugs & some miss alias

* added aliases

* add more descriptions.

* add comment.

dc699574

11 May, 2018 1 commit

Shut up warnings (#1363) · 79d27770

Tsukasa OMOTO authored May 11, 2018

* Shut up warnings

- warning: 'void* memset(void*, int, size_t)' clearing an object of non-trivial type 'struct LightGBM::HistogramBinEntry'; use assignment or value-initialization instead [-Wclass-memaccess]
- warning: 'void* memcpy(void*, const void*, size_t)' writing to an object of type 'class std::tuple<int, double, double>' with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]

* void*

79d27770

18 Apr, 2018 1 commit
- Monotone Constraint (#1314) · e005cdb0
  Guolin Ke authored Apr 18, 2018
  
  e005cdb0
15 Dec, 2017 1 commit
- refine network interface · 51287f07
  Guolin Ke authored Dec 15, 2017
  
  51287f07
02 Sep, 2017 1 commit
- fix tree model format (support multi-cat threshold) · ae6ff288
  Guolin Ke authored Sep 02, 2017
  
  ae6ff288
20 Aug, 2017 1 commit
- slight reduce communication cost in parallel tree learner. · ecc8b8cd
  Guolin Ke authored Aug 20, 2017
  
  ecc8b8cd
30 Jun, 2017 1 commit
- clean code for tree learner. · 82e273ba
  Guolin Ke authored Jun 30, 2017
  
  82e273ba
09 Apr, 2017 1 commit

Initial GPU acceleration support for LightGBM (#368) · 0bb4a825

Huan Zhang authored Apr 09, 2017

* add dummy gpu solver code

* initial GPU code

* fix crash bug

* first working version

* use asynchronous copy

* use a better kernel for root

* parallel read histogram

* sparse features now works, but no acceleration, compute on CPU

* compute sparse feature on CPU simultaneously

* fix big bug; add gpu selection; add kernel selection

* better debugging

* clean up

* add feature scatter

* Add sparse_threshold control

* fix a bug in feature scatter

* clean up debug

* temporarily add OpenCL kernels for k=64,256

* fix up CMakeList and definition USE_GPU

* add OpenCL kernels as string literals

* Add boost.compute as a submodule

* add boost dependency into CMakeList

* fix opencl pragma

* use pinned memory for histogram

* use pinned buffer for gradients and hessians

* better debugging message

* add double precision support on GPU

* fix boost version in CMakeList

* Add a README

* reconstruct GPU initialization code for ResetTrainingData

* move data to GPU in parallel

* fix a bug during feature copy

* update gpu kernels

* update gpu code

* initial port to LightGBM v2

* speedup GPU data loading process

* Add 4-bit bin support to GPU

* re-add sparse_threshold parameter

* remove kMaxNumWorkgroups and allows an unlimited number of features

* add feature mask support for skipping unused features

* enable kernel cache

* use GPU kernels withoug feature masks when all features are used

* REAdme.

* update README

* fix typos (#349)

* change compile to gcc on Apple as default

* clean vscode related file

* refine api of constructing from sampling data.

* fix bug in the last commit.

* more efficient algorithm to sample k from n.

* fix bug in filter bin

* change to boost from average output.

* fix tests.

* only stop training when all classes are finshed in multi-class.

* limit the max tree output. change hessian in multi-class objective.

* robust tree model loading.

* fix test.

* convert the probabilities to raw score in boost_from_average of classification.

* fix the average label for binary classification.

* Add boost_from_average to docs (#354)

* don't use "ConvertToRawScore" for self-defined objective function.

* boost_from_average seems doesn't work well in binary classification. remove it.

* For a better jump link (#355)

* Update Python-API.md

* for a better jump in page

A space is needed between `#` and the headers content according to Github's markdown format [guideline](https://guides.github.com/features/mastering-markdown/)

After adding the spaces, we can jump to the exact position in page by click the link.

* fixed something mentioned by @wxchan

* Update Python-API.md

* add FitByExistingTree.

* adapt GPU tree learner for FitByExistingTree

* avoid NaN output.

* update boost.compute

* fix typos (#361)

* fix broken links (#359)

* update README

* disable GPU acceleration by default

* fix image url

* cleanup debug macro

* remove old README

* do not save sparse_threshold_ in FeatureGroup

* add details for new GPU settings

* ignore submodule when doing pep8 check

* allocate workspace for at least one thread during builing Feature4

* move sparse_threshold to class Dataset

* remove duplicated code in GPUTreeLearner::Split

* Remove duplicated code in FindBestThresholds and BeforeFindBestSplit

* do not rebuild ordered gradients and hessians for sparse features

* support feature groups in GPUTreeLearner

* Initial parallel learners with GPU support

* add option device, cleanup code

* clean up FindBestThresholds; add some omp parallel

* constant hessian optimization for GPU

* Fix GPUTreeLearner crash when there is zero feature

* use np.testing.assert_almost_equal() to compare lists of floats in tests

* travis for GPU

0bb4a825

05 Apr, 2017 1 commit
- Let TreeLearner share the same code of ConstructHistograms. · 98ffbb2b
  Guolin Ke authored Apr 05, 2017
  
  98ffbb2b
28 Mar, 2017 1 commit
- support multi-threading exceptions. · 14195876
  Guolin Ke authored Mar 28, 2017
  
  14195876
13 Mar, 2017 1 commit
- fix bugs for parallel learning. · ebc0de8b
  Guolin Ke authored Mar 13, 2017
  
  ebc0de8b
03 Mar, 2017 1 commit
- fix bug in data/voting parallel. · 1bade1e2
  Guolin Ke authored Mar 03, 2017
  
  1bade1e2
01 Mar, 2017 2 commits

Squashed commit of the following: · 9ea487b1

Guolin Ke authored Feb 22, 2017

commit 70f88c54f6820d4c3824d97b34042c617fb21635
Author: Guolin Ke <i@yumumu.me>
Date:   Wed Feb 22 19:40:10 2017 +0800

    futher reduce guided

commit f54620b807c68f5ac6cef521d787ca2750efe464
Author: Guolin Ke <i@yumumu.me>
Date:   Wed Feb 22 19:32:36 2017 +0800

    avoid to use guided

9ea487b1

update to v2. · 4f77bd28
Guolin Ke authored Feb 20, 2017

4f77bd28

25 Jan, 2017 1 commit
- clean code · 5c5dce37
  Guolin Ke authored Jan 25, 2017
  
  5c5dce37
23 Jan, 2017 1 commit
- improve bagging speed · 4948dcc5
  Guolin Ke authored Jan 23, 2017
  
  4948dcc5
22 Jan, 2017 1 commit
- use subset to speed up bagging · c8fbd42b
  Guolin Ke authored Jan 22, 2017
  
  c8fbd42b
18 Dec, 2016 1 commit
- refine reset_parameters logic · c2e94f17
  Guolin Ke authored Dec 18, 2016
  
  c2e94f17
05 Dec, 2016 2 commits
- use ".empty()" to check container · 866a2f91
  Guolin Ke authored Dec 05, 2016
  
  866a2f91
- Categorical feature support (#108) · 1466f907
  Guolin Ke authored Dec 05, 2016
```
Categorical feature support (#108)
```
  1466f907
18 Nov, 2016 1 commit

Refactor for RAII (#86) · 5442ed78

Guolin Ke authored Nov 18, 2016

* RAII for utils, application and c_api(partical)

* raii for class in include folder

* raii for application and boosting

* raii for dataset and dataset loader

* raii for dense bin and parser

* RAII refactor for almost all classes

* RAII for c_api

* clean code

* refine repeated code

* Decouple the "sigmoid" between objective and boosting.

* change std::vector<bool> back to std::vector<char> due to concurrence problem

* slight reduce some memory cost

5442ed78