Commits · fffd066cb331a3573fc8565915c914ae5b6b8313 · tianlh / LightGBM-DCU

28 Dec, 2022 1 commit

Decouple Boosting Types (fixes #3128) (#4827) · fffd066c

Yifei Liu authored Dec 28, 2022



* add parameter data_sample_strategy

* abstract GOSS as a sample strategy (GOSS1), together with original GOSS (Normal Bagging has not been abstracted, so do NOT use it now)

* abstract Bagging as a subclass (BAGGING), but original Bagging members in GBDT are still kept

* fix some variables

* remove GOSS (as boost) and Bagging logic in GBDT

* rename GOSS1 to GOSS (as sample strategy)

* add warning about using GOSS as boosting_type

* a little ; bug

* remove CHECK when "gradients != nullptr"

* rename DataSampleStrategy to avoid confusion

* remove and add some comments, following convention

* fix bug about GBDT::ResetConfig (ObjectiveFunction inconsistency bet…

* add std::ignore to avoid compiler warnings (and potential fails)

* update Makevars and vcxproj

* handle constant hessian

move resize of gradient vectors out of sample strategy

* mark override for IsHessianChange

* fix lint errors

* rerun parameter_generator.py

* update config_auto.cpp

* delete redundant blank line

* update num_data_ when train_data_ is updated

set gradients and hessians when GOSS

* check bagging_freq is not zero

* reset config_ value

merge ResetBaggingConfig and ResetGOSS

* remove useless check

* add tests in test_engine.py

* remove whitespace in blank line

* remove arguments verbose_eval and evals_result

* Update tests/python_package_test/test_engine.py

reduce num_boost_round
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_engine.py

reduce num_boost_round
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_engine.py

reduce num_boost_round
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_engine.py

reduce num_boost_round
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_engine.py

reduce num_boost_round
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_engine.py

reduce num_boost_round
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update src/boosting/sample_strategy.cpp

modify warning about setting goss as `boosting_type`
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update tests/python_package_test/test_engine.py

replace load_boston() with make_regression()

remove value checks of mean_squared_error in test_sample_strategy_with_boosting()

* Update tests/python_package_test/test_engine.py

add value checks of mean_squared_error in test_sample_strategy_with_boosting()

* Modify warning about using goss as boosting type

* Update tests/python_package_test/test_engine.py

add random_state=42 for make_regression()

reduce the threshold of mean_square_error

* Update src/boosting/sample_strategy.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* remove goss from boosting types in documentation

* Update src/boosting/bagging.hpp
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update src/boosting/bagging.hpp
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update src/boosting/goss.hpp
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update src/boosting/goss.hpp
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* rename GOSS with GOSSStrategy

* update doc

* address comments

* fix table in doc

* Update include/LightGBM/config.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update documentation

* update test case

* revert useless change in test_engine.py

* add tests for evaluation results in test_sample_strategy_with_boosting

* include <string>

* change to assert_allclose in test_goss_boosting_and_strategy_equivalent

* more tolerance in result checking, due to minor difference in results of gpu versions

* change == to np.testing.assert_allclose

* fix test case

* set gpu_use_dp to true

* change --report to --report-level for rstcheck

* use gpu_use_dp=true in test_goss_boosting_and_strategy_equivalent

* revert unexpected changes of non-ascii characters

* revert unexpected changes of non-ascii characters

* remove useless changes

* allocate gradients_pointer_ and hessians_pointer when necessary

* add spaces

* remove redundant virtual

* include <LightGBM/utils/log.h> for USE_CUDA

* check for  in test_goss_boosting_and_strategy_equivalent

* check for identity in test_sample_strategy_with_boosting

* remove cuda  option in test_sample_strategy_with_boosting

* Update tests/python_package_test/test_engine.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_engine.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* ResetGradientBuffers after ResetSampleConfig

* ResetGradientBuffers after ResetSampleConfig

* ResetGradientBuffers after bagging

* remove useless code

* check objective_function_ instead of gradients

* enable rf with goss

simplify params in test cases

* remove useless changes

* allow rf with feature subsampling alone

* change position of ResetGradientBuffers

* check for dask

* add parameter types for data_sample_strategy
Co-authored-by: Guangda Liu <v-guangdaliu@microsoft.com>
Co-authored-by: Yu Shi <shiyu_k1994@qq.com>
Co-authored-by: GuangdaLiu <90019144+GuangdaLiu@users.noreply.github.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

fffd066c
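
The change above moves GOSS out of the boosting types and into the new `data_sample_strategy` parameter, and adds a warning when GOSS is still requested as a boosting type. A minimal sketch of how caller-side parameters might be migrated follows. The helper `migrate_goss_params` is hypothetical and not part of LightGBM, and mapping the legacy value to `gbdt` is an assumption drawn from the notes above (GOSS logic was removed from GBDT), not from the code itself.

```python
import warnings


def migrate_goss_params(params):
    """Return a copy of params with a legacy boosting="goss" rewritten.

    Hypothetical helper for illustration only; not a LightGBM function.
    """
    migrated = dict(params)
    # "boosting_type" is treated here as an alias of "boosting".
    for key in ("boosting", "boosting_type"):
        if migrated.get(key) == "goss":
            warnings.warn(
                f"{key}='goss' is the legacy spelling; "
                "setting data_sample_strategy='goss' instead",
                UserWarning,
            )
            migrated[key] = "gbdt"  # assumption: GOSS sampling on top of GBDT
            migrated["data_sample_strategy"] = "goss"
    return migrated


legacy = {"boosting_type": "goss", "learning_rate": 0.1}
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # keep the demo output quiet
    modern = migrate_goss_params(legacy)
print(modern)
```

Because the sampling strategy is now a separate setting, it can in principle be combined with other boosters; the notes above mention enabling `rf` with GOSS.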

27 Dec, 2022 1 commit

[CUDA] Add L2 metric for new CUDA version (#5633) · 6482b47e

shiyu1994 authored Dec 27, 2022

* add rmse metric for new cuda version

* add Init for CUDAMetricInterface

* fix lint errors

* fix rmse and add l2 metric for new cuda version

* use CUDAL2Metric

* explicit template instantiation

* write result only with the first thread

* pre allocate buffer for output converting

* fix l2 regression with cuda metric evaluation

* weighting loss in cuda metric evaluation

* mark CUDATree::AsConstantTree as override

6482b47e

29 Nov, 2022 1 commit
- Fix OpenMP thread allocation in Linux (#5551) · 4c5d0fbb
  Scott Votaw authored Nov 29, 2022
  
  4c5d0fbb
11 Oct, 2022 4 commits
- renamed cur_cat => cur_cat_idx and added some comments (#5522) · c35ecfbf
  Zhuyi Xue authored Oct 11, 2022
  
  c35ecfbf
- [python-package][R-package] load parameters from model file (fixes #2613) (#5424) · 8b720844
  José Morales authored Oct 11, 2022
  
  8b720844
- suppress alias warnings with verbosity<0 (fixes #4518) (#5253) · 46427128
  José Morales authored Oct 10, 2022
  
  46427128
- renamed tmp_num_sample_values to non_na_cnt (#5521) · c5391c97
  Zhuyi Xue authored Oct 10, 2022
  
  c5391c97
11 Sep, 2022 1 commit
- Remove redundant whitespaces (#5480) · 952458a9
  Ilya Chernov authored Sep 11, 2022
```
remove redundant whitespaces
```
  952458a9
07 Sep, 2022 1 commit

[CUDA] Add feature interaction constraint for cuda_exp (fix #4785) (#5474) · 1444a748

shiyu1994 authored Sep 07, 2022

* add feature interaction constraint for cuda_exp

* test feature interaction constraints for cuda_exp

* remove useless check

* update comment

1444a748

02 Sep, 2022 1 commit
- Rename Metadata num_classes to be more clear (#5461) · 7d1276ad
  Scott Votaw authored Sep 01, 2022
```
Rename num_classes to be more clear
```
  7d1276ad
29 Aug, 2022 1 commit

[ci][fix] Fix cuda_exp ci (#5438) · be7f3213

shiyu1994 authored Aug 29, 2022



* fix cuda_exp ci

* fix ci failures introduced by #5279

* cleanup cuda.yml

* fix test.sh

* clean up test.sh

* clean up test.sh

* skip lines by cuda_exp in test_register_logger

* Update tests/python_package_test/test_utilities.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

be7f3213

28 Aug, 2022 1 commit
- include parameters from reference dataset on subset (fixes #5402) (#5416) · 5079de4a
  José Morales authored Aug 28, 2022
```
* include parameters from reference dataset on copy

* lint

* set non-default parameters
```
  5079de4a
25 Aug, 2022 1 commit
- update tree to if-else (#5422) · 39eb041f
  José Morales authored Aug 25, 2022
```
* update tree to if-else

* add missing )

* fix case

* trigger ci
```
  39eb041f
16 Aug, 2022 1 commit

Add default definition for GetColWiseData and GetColWiseData (#5413) · 9489f878

shiyu1994 authored Aug 16, 2022

* add default definition for GetColWiseData and GetColWiseData

* fix warnings of template instantiation

* remove files in Makevars and LightGBM.vcxproj

9489f878

10 Aug, 2022 1 commit

feature: Add true streaming APIs to reduce client-side memory usage (#5299) · 0a5c5838

Scott Votaw authored Aug 10, 2022

* Extract streaming to own PR

* small merge fixes and cleanup

* linting fixes

* fix cast warning

* Fix accidental deletion during branch transfer

* responded to initial triage comments

* Added more tests to use create-from-samples APIs

* added mutex and adjusted nclasses logic

* Fix thread-safety for pushing data to sparse bins through Push APIs

* lint and doc fixes

* Small SWIG fix

* nit fix

* Responded to StrikerRUS comments

* fix breaking change after merge with master

* Extract streaming to own PR

* small merge fixes and cleanup

* Fix accidental deletion during branch transfer

* responded to initial triage comments

* Added more tests to use create-from-samples APIs

* Fix rstcheck call in ci

* remove TODOs

* Extract streaming to own PR

* small merge fixes and cleanup

* Fix accidental deletion during branch transfer

* responded to initial triage comments

* Added more tests to use create-from-samples APIs

* Small SWIG fix

* remove ci change

* responded to shiyu1994 comments

* responded to StrikerRUS comments

* Fixes from StrikerRUS comments

0a5c5838

30 Jul, 2022 1 commit

reproducible parameter alias resolution for wrappers (fixes #5304) (#5338) · 83627ff0

José Morales authored Jul 30, 2022

* dump sorted parameter aliases

* update lgb.check.wrapper_param

* update _choose_param_value to look like lgb.check.wrapper_param

* apply suggestions from review

* reduce diff

* move DumpAliases to config

* remove unnecessary check

* restore parameter check

83627ff0
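
The idea behind reproducible alias resolution, as the notes above suggest (sorted aliases, a shared lookup for the wrappers), is that when several aliases of one parameter are supplied, the winner should not depend on set or dictionary ordering. A minimal sketch under stated assumptions: the `ALIASES` table is a small illustrative subset, and `choose_param_value` is a hypothetical stand-in, not the package's own `_choose_param_value`.

```python
ALIASES = {
    # main name -> aliases (illustrative subset, not the full table)
    "num_iterations": {"nrounds", "num_boost_round", "n_estimators"},
}


def choose_param_value(main_name, params, default):
    """Resolve main_name from params, preferring the main name itself.

    Aliases are visited in sorted order, so the result is deterministic
    even when several aliases are present at once.
    """
    aliases = ALIASES.get(main_name, set())
    # Keep unrelated parameters; drop every spelling of main_name.
    resolved = {k: v for k, v in params.items()
                if k != main_name and k not in aliases}
    if main_name in params:
        resolved[main_name] = params[main_name]
        return resolved
    for alias in sorted(aliases):
        if alias in params:
            resolved[main_name] = params[alias]
            return resolved
    resolved[main_name] = default
    return resolved


print(choose_param_value("num_iterations", {"nrounds": 50, "n_estimators": 20}, 100))
```

Sorting is only one way to make the choice stable; the point of the sketch is that the order is fixed rather than incidental.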

29 Jul, 2022 1 commit

[CUDA] Initial work for boosting and evaluation with CUDA (#5279) · e0af160a

shiyu1994 authored Jul 29, 2022

* initial work for boosting and evaluation with CUDA

* fix compatibility with CPU code

* fix creating objective without USE_CUDA_EXP

* fix static analysis errors

* fix static analysis errors

e0af160a

21 Jul, 2022 1 commit

fix: Adjust LGBM_DatasetCreateFromSampledColumn to handle distributed data (#5344) · f94050a4

Scott Votaw authored Jul 21, 2022

* Adjust LGBM_DatasetCreateFromSampledColumn to handle distributed data better

* linting fix

* switch to 1 API with breaking change

* Fix Python native call

* more python test fixes

f94050a4

02 Jun, 2022 1 commit
- [c++][fix] check nullable of bin mappers in dataset_loader.cpp (fix #5221) (#5258) · fa9e4527
  shiyu1994 authored Jun 02, 2022
```
check nullable of bin mappers
```
  fa9e4527
29 May, 2022 1 commit
- Remove leftovers after the drop of Solaris support (#5248) · fb37e507
  Nikita Titov authored May 29, 2022
```
* Update tree.cpp

* Update common.h

* Update common.h
```
  fb37e507
10 May, 2022 1 commit

Fix potential overflow "Multiplication result converted to larger type" (#5189) · 6de9bafa

Nikita Titov authored May 10, 2022



* Update dataset_loader.cpp

* Update gbdt.h

* Update regression_objective.hpp

* Update linker_topo.cpp

* Update xentropy_objective.hpp

* Update regression_objective.hpp

* investigate inf test failure

* avoid overflow in regression objective

* remove `test_inf_handle` test
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

6de9bafa
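
The warning quoted in this commit title describes a product computed in a narrow integer type and only then converted to a wider one, so any overflow has already happened before the conversion. A small Python sketch of the effect, using `ctypes.c_int32` to emulate 32-bit wraparound (Python integers themselves do not overflow). The function names and the sizes are made up for illustration; in C++ signed overflow is formally undefined behaviour, and wraparound is just the commonly observed outcome.

```python
import ctypes


def mul_then_widen(a, b):
    """Emulate a 32-bit multiply whose result is widened afterwards."""
    wrapped = ctypes.c_int32(a * b).value  # product truncated to 32 bits
    return ctypes.c_int64(wrapped).value   # widening cannot restore lost bits


def widen_then_mul(a, b):
    """Emulate widening one operand first, so the multiply is 64-bit."""
    wide_a = ctypes.c_int64(a).value
    return ctypes.c_int64(wide_a * b).value


num_rows, num_cols = 100_000, 100_000  # illustrative sizes only
print(mul_then_widen(num_rows, num_cols))  # wrapped, incorrect product
print(widen_then_mul(num_rows, num_cols))  # full 64-bit product
```

The usual remedy in C++ is the second form: cast one operand to the wider type before multiplying.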

01 May, 2022 2 commits
- fix precision lost in tree's ToIfElse (#5187) · a107c907
  Lipson authored May 02, 2022
  
  a107c907
- fix some wrong format specifiers (#5190) · 0fb09e77
  Nikita Titov authored May 01, 2022
```
* Update dataset_loader.cpp

* Update config.cpp

* Update application.cpp

* Update linkers_socket.cpp
```
  0fb09e77
26 Apr, 2022 1 commit
- [CUDA] Fix integer overflow in cuda row-wise data (#5167) · d893cd1f
  shiyu1994 authored Apr 26, 2022
  
  d893cd1f
13 Apr, 2022 1 commit
- check nullable of bin_mappers in DatasetLoader::CheckCategoricalFeatureNumBin (fix #5145) (#5146) · 0a4851f5
  shiyu1994 authored Apr 13, 2022
  
  0a4851f5
30 Mar, 2022 1 commit
- [CUDA] Fix row-wise histogram construction with dense data matrix (#5103) · 417c732c
  shiyu1994 authored Mar 30, 2022
```
* fix cuda exp with dense row wise

* disable usage of multi val group in cuda exp
```
  417c732c
27 Mar, 2022 1 commit

Log warnings for number of bins of categorical features (#4448) · d163c2c1

shiyu1994 authored Mar 28, 2022

* log warnings when number of bins of categorical features exceeds the configured maximum number of bins

* log only one warning information for all categorical features

* Add #include <memory> for unique_ptr

* remove useless param description

d163c2c1

26 Mar, 2022 1 commit
- Load initial scores with binary data files in CLI version (#4807) · 17d4e007
  shiyu1994 authored Mar 27, 2022
  
  17d4e007
23 Mar, 2022 1 commit

[CUDA] New CUDA version Part 1 (#4630) · 6b56a90c

shiyu1994 authored Mar 23, 2022



* new cuda framework

* add histogram construction kernel

* before removing multi-gpu

* new cuda framework

* tree learner cuda kernels

* single tree framework ready

* single tree training framework

* remove comments

* boosting with cuda

* optimize for best split find

* data split

* move boosting into cuda

* parallel synchronize best split point

* merge split data kernels

* before code refactor

* use tasks instead of features as units for split finding

* refactor cuda best split finder

* fix configuration error with small leaves in data split

* skip histogram construction of too small leaf

* skip split finding of invalid leaves

stop when no leaf to split

* support row wise with CUDA

* copy data for split by column

* copy data from host to CPU by column for data partition

* add synchronize best splits for one leaf from multiple blocks

* partition dense row data

* fix sync best split from task blocks

* add support for sparse row wise for CUDA

* remove useless code

* add l2 regression objective

* sparse multi value bin enabled for CUDA

* fix cuda ranking objective

* support for number of items <= 2048 per query

* speedup histogram construction by interleaving global memory access

* split optimization

* add cuda tree predictor

* remove comma

* refactor objective and score updater

* before use struct

* use structure for split information

* use structure for leaf splits

* return CUDASplitInfo directly after finding best split

* split with CUDATree directly

* use cuda row data in cuda histogram constructor

* clean src/treelearner/cuda

* gather shared cuda device functions

* put shared CUDA functions into header file

* change smaller leaf from <= back to < for consistent result with CPU

* add tree predictor

* remove useless cuda_tree_predictor

* predict on CUDA with pipeline

* add global sort algorithms

* add global argsort for queries with many items in ranking tasks

* remove limitation of maximum number of items per query in ranking

* add cuda metrics

* fix CUDA AUC

* remove debug code

* add regression metrics

* remove useless file

* don't use mask in shuffle reduce

* add more regression objectives

* fix cuda mape loss

add cuda xentropy loss

* use template for different versions of BitonicArgSortDevice

* add multiclass metrics

* add ndcg metric

* fix cross entropy objectives and metrics

* fix cross entropy and ndcg metrics

* add support for customized objective in CUDA

* complete multiclass ova for CUDA

* separate cuda tree learner

* use shuffle based prefix sum

* clean up cuda_algorithms.hpp

* add copy subset on CUDA

* add bagging for CUDA

* clean up code

* copy gradients from host to device

* support bagging without using subset

* add support of bagging with subset for CUDAColumnData

* add support of bagging with subset for dense CUDARowData

* refactor copy sparse subrow

* use copy subset for column subset

* add reset train data and reset config for CUDA tree learner

add destructors for cuda tree learner

* add USE_CUDA ifdef to cuda tree learner files

* check that dataset doesn't contain CUDA tree learner

* remove printf debug information

* use full new cuda tree learner only when using single GPU

* disable all CUDA code when using CPU version

* recover main.cpp

* add cpp files for multi value bins

* update LightGBM.vcxproj

* update LightGBM.vcxproj

fix lint errors

* fix lint errors

* fix lint errors

* update Makevars

fix lint errors

* fix the case with 0 feature and 0 bin

fix split finding for invalid leaves

create cuda column data when loaded from bin file

* fix lint errors

hide GetRowWiseData when cuda is not used

* recover default device type to cpu

* fix na_as_missing case

fix cuda feature meta information

* fix UpdateDataIndexToLeafIndexKernel

* create CUDA trees when needed in CUDADataPartition::UpdateTrainScore

* add refit by tree for cuda tree learner

* fix test_refit in test_engine.py

* create set of large bin partitions in CUDARowData

* add histogram construction for columns with a large number of bins

* add find best split for categorical features on CUDA

* add bitvectors for categorical split

* cuda data partition split for categorical features

* fix split tree with categorical feature

* fix categorical feature splits

* refactor cuda_data_partition.cu with multi-level templates

* refactor CUDABestSplitFinder by grouping task information into struct

* pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder

* fix misuse of reference

* remove useless changes

* add support for path smoothing

* virtual destructor for LightGBM::Tree

* fix overlapped cat threshold in best split infos

* reset histogram pointers in data partition and split finder in ResetConfig

* comment useless parameter

* fix reverse case when na is missing and default bin is zero

* fix mfb_is_na and mfb_is_zero and is_single_feature_column

* remove debug log

* fix cat_l2 when one-hot

fix gradient copy when data subset is used

* switch shared histogram size according to CUDA version

* gpu_use_dp=true when cuda test

* revert modification in config.h

* fix setting of gpu_use_dp=true in .ci/test.sh

* fix linter errors

* fix linter error

remove useless change

* recover main.cpp

* separate cuda_exp and cuda

* fix ci bash scripts

add description for cuda_exp

* add USE_CUDA_EXP flag

* switch off USE_CUDA_EXP

* revert changes in python-packages

* more careful separation for USE_CUDA_EXP

* fix CUDARowData::DivideCUDAFeatureGroups

fix set fields for cuda metadata

* revert config.h

* fix test settings for cuda experimental version

* skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version

* fix lint issue by adding a blank line

* fix lint errors by resorting imports

* fix lint errors by resorting imports

* fix lint errors by resorting imports

* merge cuda.yml and cuda_exp.yml

* update python version in cuda.yml

* remove cuda_exp.yml

* remove unrelated changes

* fix compilation warnings

fix cuda exp ci task name

* recover task

* use multi-level template in histogram construction

check split only in debug mode

* ignore NVCC related lines in parameter_generator.py

* update job name for CUDA tests

* apply review suggestions

* Update .github/workflows/cuda.yml
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update .github/workflows/cuda.yml
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update header

* remove useless TODOs

* remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062

* #include <LightGBM/utils/log.h> for USE_CUDA_EXP only

* fix include order

* fix include order

* remove extra space

* address review comments

* add warning when cuda_exp is used together with deterministic

* add comment about gpu_use_dp in .ci/test.sh

* revert changing order of included headers
Co-authored-by: Yu Shi <shiyu1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

6b56a90c

22 Mar, 2022 1 commit
- clarify no-meaningful-features warning in Dataset construction (fixes #5081) (#5083) · b857ee10
  James Lamb authored Mar 22, 2022
```
* clarify no-meaningful-features warning in Dataset construction (fixes #5081)

* update tests
```
  b857ee10
17 Feb, 2022 1 commit
- pass train dataset parser config to valid dataset loading parser (#4985) · c61f0d2e
  chjinche authored Feb 18, 2022
  
  c61f0d2e
23 Dec, 2021 1 commit

clear memory of sample data right after BinMapper is constructed to save memory (#4890) · 2ef3cb81

xuchuanyin authored Dec 23, 2021

Sample data is useless after BinMapper is constructed, but the corresponding memory is still there before feature extraction is finished.

2ef3cb81

03 Dec, 2021 1 commit

Add C API function that returns all parameter names with their aliases (#4829) · cf38071b

Nikita Titov authored Dec 03, 2021



* add C API function that returns all param names with aliases

* add C API function that returns all param names with aliases

* add R code

* test R code

* remove debug CI

* fix R lint

* refactor

* run CI

* fix R

* fix

* revert CI checks

* revert changes in docs

* Try to make function `const`
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* add `const` in cpp file

* address review comments and sync with `master`
Co-authored-by: James Lamb <jaylamb20@gmail.com>

cf38071b

16 Nov, 2021 1 commit

Add customized parser support (#4782) · b0137deb

chjinche authored Nov 16, 2021

* add customized parser support

* fix typo of parser_config_file description

* make delimiter as parameter of JoinedLines

b0137deb

11 Nov, 2021 1 commit

Add 'nrounds' as an alias for 'num_iterations' (fixes #4743) (#4746) · 3b6ebd79

Michael Mahoney authored Nov 10, 2021

* Add 'nrounds' as an alias for 'num_iterations'

* Improve tests

* Compare against nrounds directly

* Fix whitespace lints

3b6ebd79

29 Oct, 2021 1 commit
- Remove checks for label when loading dataset from binary file because label is... · 96ecab6f
  Nikita Titov authored Oct 29, 2021
```
Remove checks for label when loading dataset from binary file because label is ignored in that case (#4737)
```
  96ecab6f
28 Oct, 2021 1 commit

Improve warning wordings (#4731) · 765ceadc

Nikita Titov authored Oct 28, 2021

* Update dataset_loader.cpp

* Update dataset_loader.cpp

* Update dataset_loader.cpp

765ceadc

27 Oct, 2021 1 commit
- Add some warnings when loading dataset from binary file (#4724) · 5fbfa00b
  Nikita Titov authored Oct 28, 2021
  
  5fbfa00b
25 Oct, 2021 1 commit
- Fix some parameter hints when loading from binary file (#4701) · dc02dcaf
  Zhiyuan He authored Oct 25, 2021
```
Co-authored-by: hzy46 <email@example.com>
```
  dc02dcaf
20 Oct, 2021 1 commit
- Fix ASAN issues with `std::function` usage (#4673) · 13ed38ca
  david-cortes authored Oct 20, 2021
```
* don't compare std::function to nullptr ref #4633

* Update dataset_loader.h
```
  13ed38ca