Commits · 992f5056652408fa3f6b8a96e09c33f84e2ebe87 · tianlh / LightGBM-DCU

"...git@developer.sourcefind.cn:tianlh/lightgbm-dcu.git" did not exist on "c8712a9e083bfa0922f6e860f49a8a374816f43e"

01 Feb, 2023 1 commit

[CUDA] consolidate CUDA versions (#5677) · 4f47547c

James Lamb authored Jan 31, 2023



* [ci] speed up if-else, swig, and lint conda setup

* add 'source activate'

* python constraint

* start removing cuda v1

* comment out CI

* remove more references

* revert some unnecessaary changes

* revert a few more mistakes

* revert another change that ignored params

* sigh

* remove CUDATreeLearner

* fix tests, docs

* fix quoting in setup.py

* restore all CI

* Apply suggestions from code review
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* Apply suggestions from code review

* completely remove cuda_exp, update docs

---------
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

4f47547c

11 Oct, 2022 1 commit
- suppress alias warnings with verbosity<0 (fixes #4518) (#5253) · 46427128
  José Morales authored Oct 10, 2022
  
  46427128
01 May, 2022 1 commit

fix some wrong format specifiers (#5190) · 0fb09e77

Nikita Titov authored May 01, 2022

* Update dataset_loader.cpp

* Update config.cpp

* Update application.cpp

* Update linkers_socket.cpp

0fb09e77

23 Mar, 2022 1 commit

[CUDA] New CUDA version Part 1 (#4630) · 6b56a90c

shiyu1994 authored Mar 23, 2022



* new cuda framework

* add histogram construction kernel

* before removing multi-gpu

* new cuda framework

* tree learner cuda kernels

* single tree framework ready

* single tree training framework

* remove comments

* boosting with cuda

* optimize for best split find

* data split

* move boosting into cuda

* parallel synchronize best split point

* merge split data kernels

* before code refactor

* use tasks instead of features as units for split finding

* refactor cuda best split finder

* fix configuration error with small leaves in data split

* skip histogram construction of too small leaf

* skip split finding of invalid leaves

stop when no leaf to split

* support row wise with CUDA

* copy data for split by column

* copy data from host to CPU by column for data partition

* add synchronize best splits for one leaf from multiple blocks

* partition dense row data

* fix sync best split from task blocks

* add support for sparse row wise for CUDA

* remove useless code

* add l2 regression objective

* sparse multi value bin enabled for CUDA

* fix cuda ranking objective

* support for number of items <= 2048 per query

* speedup histogram construction by interleaving global memory access

* split optimization

* add cuda tree predictor

* remove comma

* refactor objective and score updater

* before use struct

* use structure for split information

* use structure for leaf splits

* return CUDASplitInfo directly after finding best split

* split with CUDATree directly

* use cuda row data in cuda histogram constructor

* clean src/treelearner/cuda

* gather shared cuda device functions

* put shared CUDA functions into header file

* change smaller leaf from <= back to < for consistent result with CPU

* add tree predictor

* remove useless cuda_tree_predictor

* predict on CUDA with pipeline

* add global sort algorithms

* add global argsort for queries with many items in ranking tasks

* remove limitation of maximum number of items per query in ranking

* add cuda metrics

* fix CUDA AUC

* remove debug code

* add regression metrics

* remove useless file

* don't use mask in shuffle reduce

* add more regression objectives

* fix cuda mape loss

add cuda xentropy loss

* use template for different versions of BitonicArgSortDevice

* add multiclass metrics

* add ndcg metric

* fix cross entropy objectives and metrics

* fix cross entropy and ndcg metrics

* add support for customized objective in CUDA

* complete multiclass ova for CUDA

* separate cuda tree learner

* use shuffle based prefix sum

* clean up cuda_algorithms.hpp

* add copy subset on CUDA

* add bagging for CUDA

* clean up code

* copy gradients from host to device

* support bagging without using subset

* add support of bagging with subset for CUDAColumnData

* add support of bagging with subset for dense CUDARowData

* refactor copy sparse subrow

* use copy subset for column subset

* add reset train data and reset config for CUDA tree learner

add deconstructors for cuda tree learner

* add USE_CUDA ifdef to cuda tree learner files

* check that dataset doesn't contain CUDA tree learner

* remove printf debug information

* use full new cuda tree learner only when using single GPU

* disable all CUDA code when using CPU version

* recover main.cpp

* add cpp files for multi value bins

* update LightGBM.vcxproj

* update LightGBM.vcxproj

fix lint errors

* fix lint errors

* fix lint errors

* update Makevars

fix lint errors

* fix the case with 0 feature and 0 bin

fix split finding for invalid leaves

create cuda column data when loaded from bin file

* fix lint errors

hide GetRowWiseData when cuda is not used

* recover default device type to cpu

* fix na_as_missing case

fix cuda feature meta information

* fix UpdateDataIndexToLeafIndexKernel

* create CUDA trees when needed in CUDADataPartition::UpdateTrainScore

* add refit by tree for cuda tree learner

* fix test_refit in test_engine.py

* create set of large bin partitions in CUDARowData

* add histogram construction for columns with a large number of bins

* add find best split for categorical features on CUDA

* add bitvectors for categorical split

* cuda data partition split for categorical features

* fix split tree with categorical feature

* fix categorical feature splits

* refactor cuda_data_partition.cu with multi-level templates

* refactor CUDABestSplitFinder by grouping task information into struct

* pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder

* fix misuse of reference

* remove useless changes

* add support for path smoothing

* virtual destructor for LightGBM::Tree

* fix overlapped cat threshold in best split infos

* reset histogram pointers in data partition and spllit finder in ResetConfig

* comment useless parameter

* fix reverse case when na is missing and default bin is zero

* fix mfb_is_na and mfb_is_zero and is_single_feature_column

* remove debug log

* fix cat_l2 when one-hot

fix gradient copy when data subset is used

* switch shared histogram size according to CUDA version

* gpu_use_dp=true when cuda test

* revert modification in config.h

* fix setting of gpu_use_dp=true in .ci/test.sh

* fix linter errors

* fix linter error

remove useless change

* recover main.cpp

* separate cuda_exp and cuda

* fix ci bash scripts

add description for cuda_exp

* add USE_CUDA_EXP flag

* switch off USE_CUDA_EXP

* revert changes in python-packages

* more careful separation for USE_CUDA_EXP

* fix CUDARowData::DivideCUDAFeatureGroups

fix set fields for cuda metadata

* revert config.h

* fix test settings for cuda experimental version

* skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version

* fix lint issue by adding a blank line

* fix lint errors by resorting imports

* fix lint errors by resorting imports

* fix lint errors by resorting imports

* merge cuda.yml and cuda_exp.yml

* update python version in cuda.yml

* remove cuda_exp.yml

* remove unrelated changes

* fix compilation warnings

fix cuda exp ci task name

* recover task

* use multi-level template in histogram construction

check split only in debug mode

* ignore NVCC related lines in parameter_generator.py

* update job name for CUDA tests

* apply review suggestions

* Update .github/workflows/cuda.yml
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update .github/workflows/cuda.yml
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update header

* remove useless TODOs

* remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062

* #include <LightGBM/utils/log.h> for USE_CUDA_EXP only

* fix include order

* fix include order

* remove extra space

* address review comments

* add warning when cuda_exp is used together with deterministic

* add comment about gpu_use_dp in .ci/test.sh

* revert changing order of included headers
Co-authored-by: Yu Shi <shiyu1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

6b56a90c

28 Oct, 2021 1 commit
- Reset OpenMP thread number if num_threads <= 0 (#4704) · 42914830
  Zhiyuan He authored Oct 29, 2021
```
* mock func for no openmp

* use omp_get_max_threads
Co-authored-by: hzy46 <email@example.com>
```
  42914830
07 May, 2021 1 commit

Precise text file parsing (#4081) · f8318088

Chen Yufei authored May 07, 2021



* New build option: USE_PRECISE_TEXT_PARSER.

Use fast_double_parser for text file parsing. For each number, fallback
to strtod in case of parse failure.

* Add benchmark for CSVParser with Atof and AtofPrecise.

* Fix lint complaint.

* Fix typo in open result error message.

* Revert "Fix lint complaint."

This reverts commit 92ab0b6bce9f17d7be9eaeb20f19d4a0a36f0387.

* Revert "Add benchmark for CSVParser with Atof and AtofPrecise."

This reverts commit 4f8639abd06c679d4382eb715a1793afd94df3d2.

* Use AtofPrecise in Common::__StringToTHelper.

* [option] precise_float_parser: precise float number parsing for text input.

* Remove USE_PRECISE_TEXT_PARSER compile option.

* test: add test for Common::AtofPrecise.

* test: remove ChunkedArrayTest with 0 length.

This triggers Log::Fatal which aborts the test program.

* fix lint, add copyright.

* Revert "test: remove ChunkedArrayTest with 0 length."

This reverts commit 346c76affe9e78b6ca2738c4a56dbb9c00f31102.

* Use LightGBM::Common::Sign

* save precise_float_parser in model file.

* Fix error checking in AtofPrecise. Add more test cases.

* Remove test case that can't pass under macOS.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

f8318088

19 Feb, 2021 1 commit
- [docs] Change some 'parallel learning' references to 'distributed learning' (#4000) · 7880b79f
  James Lamb authored Feb 19, 2021
```
* [docs] Change some 'parallel learning' references to 'distributed learning'

* found a few more

* one more reference
```
  7880b79f
03 Feb, 2021 1 commit
- Add new task type: "save_binary" (#3651) · 111d0c80
  Chen Yufei authored Feb 03, 2021
```
* Add new task type: "save_binary".

* Document for task "save_binary".
```
  111d0c80
21 Nov, 2020 1 commit

[R-package] Remove CLI-only objects (#3566) · 1c5930bf

James Lamb authored Nov 21, 2020



* [R-package] Remove CLI-only objects

* more guards

* more guards

* variable not string

* simplify fix

* revert build_r.R changes

* move define of global_timer
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

1c5930bf

20 Sep, 2020 1 commit

[GPU] Add support for CUDA-based GPU build (#3160) · f7ad9457

Chip Kerchner authored Sep 20, 2020

* Initial CUDA work

* redirect log to python console (#3090)

* redir log to python console

* fix pylint

* Apply suggestions from code review

* Update basic.py

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update c_api.h

* Apply suggestions from code review

* super-minor: better wording
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>

* re-order includes (fixes #3132) (#3133)

* Revert "re-order includes (fixes #3132) (#3133)" (#3153)

This reverts commit 656d2676

* Missing change from previous rebase

* Minor cleanup and removal of development scripts.

* Only set gpu_use_dp on by default for CUDA. Other minor change.

* Fix python lint indentation problem.

* More python lint issues.

* Big lint cleanup - more to come.

* Another large lint cleanup - more to come.

* Even more lint cleanup.

* Minor cleanup so less differences in code.

* Revert is_use_subset changes

* Another rebase from master to fix recent conflicts.

* More lint.

* Simple code cleanup - add & remove blank lines, revert unneccessary format changes, remove added dead code.

* Removed parameters added for CUDA and various bug fix.

* Yet more lint and unneccessary changes.

* Revert another change.

* Removal of unneccessary code.

* temporary appveyor.yml for building and testing

* Remove return value in ReSize

* Removal of unused variables.

* Code cleanup from reviewers suggestions.

* Removal of FIXME comments and unused defines.

* More reviewers comments cleanup.

* Fix config variables.

* Attempt to fix check-docs failure

* Update Paramster.rst for num_gpu

* Removing test appveyor.yml

* Add CUDA_RESOLVE_DEVICE_SYMBOLS to libraries to fix linking issue.

* Fixed handling of data elements less than 2K.

* More reviewers comments cleanup.

* Removal of TODO and fix printing of int64_t

* Add cuda change for CI testing and remove cuda from device_type in python.

* Missed one change form previous check-in

* Removal AdditionConfig and fix settings.

* Limit number of GPUs to one for now in CUDA.

* Update Parameters.rst for previous check-in

* Whitespace removal.

* Cleanup unused code.

* Changed uint/ushort/ulong to unsigned int/short/long to help Windows based CUDA compiler work.

* Lint change from previous check-in.

* Changes based on reviewers comments.

* More reviewer comment changes.

* Adding warning for is_sparse. Revert tmp_subset code. Only return FeatureGroupData if not is_multi_val_

* Fix so that CUDA code will compile even if you enable the SCORE_T_USE_DOUBLE define.

* Reviewer comment cleanup.

* Replace warning with Log message. Removal of some of the USE_CUDA. Fix typo and removal of pragma once.

* Remove PRINT debug for CUDA code.

* Allow to use of multiple GPUs for CUDA.

* More multi-GPUs enablement for CUDA.

* More code cleanup based on reviews comments.

* Update docs with latest config changes.
Co-authored-by: Gordon Fossum <fossum@us.ibm.com>
Co-authored-by: ChipKerchner <ckerchne@linux.vnet.ibm.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

f7ad9457

06 Aug, 2020 1 commit

[Python] / [R] add start_iteration to python predict interface (fix #3058) (#3272) · 82e2ff7a

shiyu1994 authored Aug 06, 2020



* [python] add start_iteration to python predict interface (#3058)

* Apply suggestions from code review

* Update lightgbm_R.h

* Apply suggestions from code review

* Apply suggestions from code review

* fix R interface

* update R documentation
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

82e2ff7a

15 Jul, 2020 1 commit

feature importance type in saved model file (#3220) · 87d46489

Guolin Ke authored Jul 16, 2020



* feature importance type in saved model file

* fix nullptr

* fixed formatting

* fix python/R

* Update src/c_api.cpp

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* fix c_api test

* fix swig

* minor docs improvements and added defines for importance types
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

87d46489

05 Jun, 2020 1 commit
- Revert "re-order includes (fixes #3132) (#3133)" (#3153) · ac5f5e56
  Nikita Titov authored Jun 05, 2020
```
This reverts commit 656d2676.
```
  ac5f5e56
01 Jun, 2020 1 commit
- re-order includes (fixes #3132) (#3133) · 656d2676
  James Lamb authored Jun 01, 2020
  
  656d2676
02 Mar, 2020 1 commit
- fix bug in parallel learning (#2851) · da91c613
  Guolin Ke authored Mar 02, 2020
```
* refix

* fix config

* avoid to rely on config
```
  da91c613
20 Feb, 2020 1 commit

remove init-score parameter (#2776) · 3c394c8d

Guolin Ke authored Feb 20, 2020



* remove related cpp codes

* removed more mentiones of init_score_filename params
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

3c394c8d

04 Feb, 2020 1 commit
- no omp_set_nested (#2734) · 3e9a6140
  Guolin Ke authored Feb 04, 2020
```
* no omp_set_nested

* Update openmp_wrapper.h
```
  3e9a6140
02 Feb, 2020 1 commit

Support both row-wise and col-wise multi-threading (#2699) · 509c2e50

Guolin Ke authored Feb 02, 2020



* commit

* fix a bug

* fix bug

* reset to track changes

* refine the auto choose logic

* sort the time stats output

* fix include

* change  multi_val_bin_sparse_threshold

* add cmake

* add _mm_malloc and _mm_free for cross platform

* fix cmake bug

* timer for split

* try to fix cmake

* fix tests

* refactor DataPartition::Split

* fix test

* typo

* formating

* Revert "formating"

This reverts commit 5b8de4f7fb9d975ee23701d276a66d40ee6d4222.

* add document

* [R-package] Added tests on use of force_col_wise and force_row_wise in training (#2719)

* naming

* fix gpu code

* Update include/LightGBM/bin.h
Co-Authored-By: James Lamb <jaylamb20@gmail.com>

* Update src/treelearner/ocl/histogram16.cl

* test: swap compilers for CI

* fix omp

* not avx2

* no aligned for feature histogram

* Revert "refactor DataPartition::Split"

This reverts commit 256e6d9641ade966a1f54da1752e998a1149b6f8.

* slightly refactor data partition

* reduce the memory cost
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

509c2e50

07 Jan, 2020 1 commit

option to disable the shape checking in prediction (#2669) · f6b8ecf6

Guolin Ke authored Jan 08, 2020



* implement

* better documentation

* Update include/LightGBM/config.h
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* fix document

* regenerate docs
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

f6b8ecf6

29 Nov, 2019 1 commit
- Add more debug logging to show data load progress. (#2587) · b3975555
  ashok-ponnuswami-msft authored Nov 28, 2019
  
  b3975555
22 Sep, 2019 1 commit

fix many cpp lint errors (#2426) · f1a14869

Guolin Ke authored Sep 22, 2019

* fix many cpp lint errors

* indent

* fix bug

* fix more

* fix gpu

* more fixes

f1a14869

01 Aug, 2019 1 commit
- delete the variable which is never used (#2294) · 55656f27
  奇安信CodeSafe authored Aug 01, 2019
  
  55656f27
13 Apr, 2019 1 commit
- added copyright message in files (#2101) · 32ef7603
  Nikita Titov authored Apr 13, 2019
  
  32ef7603
11 Apr, 2019 1 commit

reworked includes in source files (#2066) · 50ce01b5

Nikita Titov authored Apr 12, 2019

* added all necessary includes - fixed build/include_what_you_use error

* fixed the order of includes (build/include_order)

50ce01b5

02 Feb, 2019 1 commit
- cpplint whitespaces and new lines (#1986) · 90127b52
  Nikita Titov authored Feb 02, 2019
  
  90127b52
22 Aug, 2018 1 commit

add start_iteration in model saving (#1565) · 941068ee

Guolin Ke authored Aug 22, 2018

* add start_iteration in model saving

* fix test

* shuffle models ability

* fix bug

* update document

* refine

* Update engine.py

* Update basic.py

* fix comments

* fix comment

941068ee

20 May, 2018 1 commit

Refine config object (#1381) · dc699574

Guolin Ke authored May 20, 2018

* [WIP] refine config

* [wip] ready for the auto code generate

* auto generate config codes

* use with to open file

* fix bug

* fix pylint

* fix bug

* fix pylint

* fix bugs.

* tmp for failed test.

* fix tests.

* added nthreads alias

* added new aliases from new config.h

* fixed duplicated alias

* refactored parameter_generator.py

* added new aliases from config.h and removed remaining old names

* fix bugs & some miss alias

* added aliases

* add more descriptions.

* add comment.

dc699574

12 Jan, 2018 1 commit
- refine output message of loading model. · e2835654
  Guolin Ke authored Jan 12, 2018
  
  e2835654
22 Dec, 2017 1 commit
- fix a silly bug. · b30beb99
  Guolin Ke authored Dec 22, 2017
  
  b30beb99
19 Dec, 2017 1 commit

support refit model by new data (#1124) · 92f2a570

Guolin Ke authored Dec 19, 2017

* add code for refit tree

* add implementation.

* update documents.

* clean code

* fix a type

92f2a570

26 Nov, 2017 1 commit

Speed up saving and loading model (#1083) · 8a5ec366

Guolin Ke authored Nov 26, 2017

* remove protobuf

* add version number

* remove pmml script

* use float for split gain

* fix warnings

* refine the read model logic of gbdt

* fix compile error

* improve decode speed

* fix some bugs

* fix double accuracy problem

* fix bug

* multi-thread save model

* speed up save model to string

* parallel save/load model

* fix some warnings.

* fix warnings.

* fix a bug

* remove debug output

* fix doc

* fix max_bin warning in tests.

* fix max_bin warning

* fix pylint

* clean code for stringToArray

* clean code for TToString

* remove max_bin

* replace "class" with typename

8a5ec366

26 Oct, 2017 1 commit

fix protobuf on vs build (#1004) · 2b20569e

wxchan authored Oct 26, 2017

* [optional] support protobuf

* fix windows/LightGBM.vcxproj

* add doc

* fix doc

* fix vs support (#2)

* fix vs support

* fix cmake

2b20569e

20 Oct, 2017 1 commit
- Revert "[optional] support protobuf (#908)" (#1002) · 1ef3d43e
  Guolin Ke authored Oct 20, 2017
```
This reverts commit 53b99854.
```
  1ef3d43e
19 Oct, 2017 1 commit
- [optional] support protobuf (#908) · 53b99854
  wxchan authored Oct 19, 2017
  
  53b99854
07 Oct, 2017 1 commit

[MRG] check duplicate params (#961) · d3e88ef5

wxchan authored Oct 07, 2017

* add get params

clean codes

* check duplicate params

* Revert "add get params"

This reverts commit ec1d8dd17aa2c83dd0d5f7716c59b6a6fb94102c.

* set priority by length & check duplicate

* rename function

d3e88ef5

18 Aug, 2017 4 commits
- Explain individual predictions using SHAP value feature attributions (#825) · 36732f23
  Scott Lundberg authored Aug 13, 2017
```
* Explain individual predictions using SHAP value feature attributions

* Address code review
```
  36732f23
- support specific path of initial scores. · 3b161919
  Guolin Ke authored Aug 11, 2017
  
  3b161919
- fix #822 · 203df1b4
  Guolin Ke authored Aug 11, 2017
  
  203df1b4
- move the training into boosting. · 29900dd7
  Guolin Ke authored Aug 09, 2017
  
  29900dd7
30 May, 2017 1 commit

Support early stopping of prediction in CLI (#565) · 6d4c7b03

Guolin Ke authored May 30, 2017

* fix multi-threading.

* fix name style.

* support in CLI version.

* remove warnings.

* Not default parameters.

* fix if...else... .

* fix bug.

* fix warning.

* refine c_api.

* fix R-package.

* fix R's warning.

* fix tests.

* fix pep8 .

6d4c7b03