- 24 Jul, 2025 1 commit
Jeff Daily authored
[ROCm] add support for ROCm/HIP
* CMakeLists.txt ROCm updates; replace glob with explicit file list
* initial warpSize interop changes
* add helpers/hipify.sh script
* .gitignore generated hip source files
* more rocm updates: disable compiler warnings; move PercentileDevice __device__ template function into header; bug fixes for __host__ __define__ and __HIP__ preprocessor symbols
* more bug fixes
* warp 32 vs 64 updates
* lint fixes
* add missing device_index variable
* remove accidental inclusion of hip headers
* copyright notice compliance
* Update CMakeLists.txt
* fix lint issue
* clean up
* clean up CMakeLists.txt; use WARPSIZE
* fix shared buffer size
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Yu Shi <yushi2@microsoft.com>
-
- 07 Feb, 2025 1 commit
James Lamb authored
-
- 02 Jan, 2025 1 commit
shiyu1994 authored
* remove src/treelearner/kernels
* Update CMakeLists.txt
* clean up
-
- 15 Dec, 2024 1 commit
Nikita Titov authored
* multiple updates to append-comment.sh, static_analysis.yml, basic.py, .pre-commit-config.yaml, pyproject.toml, interactive_plot_example.ipynb, test_basic.R, rank_objective.hpp, and histogram_16_64_256.cu
* ensure alphabetical order of rules
-
- 11 Dec, 2024 1 commit
Murphy Liang authored
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
-
- 01 Dec, 2024 1 commit
Oliver Borchert authored
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
-
- 18 Oct, 2024 1 commit
dragonbra authored
* basic gpu_linear_tree_learner implementation
* corresponding config of gpu linear tree
* Update src/io/config.cpp
* workaround for gpu linear tree learner without gpu enabled
* add #endif
* add #ifdef USE_GPU
* fix lint problems
* fix compilation when USE_GPU is OFF
* add destructor
* add gpu_linear_tree_learner.cpp in make file list
* use template for linear tree learner
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
-
- 13 Oct, 2024 1 commit
Atanas Dimitrov authored
Co-authored-by: Atanas Dimitrov <nasko119@abv.bg>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
-
- 19 Mar, 2024 1 commit
James Lamb authored
-
- 23 Feb, 2024 1 commit
shiyu1994 authored
* support quantized training with categorical features on cpu
* remove white spaces
* add tests for quantized training with categorical features
* skip tests for cuda version
* fix cases when only 1 data block in row-wise quantized histogram construction with 8 inner bits
* remove useless capture
* fix compilation warnings; revert useless changes
* revert useless change
* separate functions in feature histogram into cpp file
* add feature_histogram.o in Makevars
-
- 20 Feb, 2024 1 commit
CVPaul authored
* solve 'bin size 257 cannot run on GPU' (#3339, https://github.com/microsoft/LightGBM/issues/3339#issuecomment-1665131743)
* fix typo: LeafIndex -> leaf_index
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
-
- 17 Jan, 2024 1 commit
-
-
James Lamb authored
-
- 22 Nov, 2023 1 commit
James Lamb authored
-
- 10 Oct, 2023 1 commit
James Lamb authored
-
- 09 Oct, 2023 1 commit
James Lamb authored
factor out uses of omp_get_num_threads() and omp_get_max_threads() outside of OpenMP wrapper (#6133)
-
- 08 Oct, 2023 1 commit
shiyu1994 authored
* add quantized training (first stage)
* add histogram construction functions for integer gradients
* add stochastic rounding
* update docs
* fix compilation errors by adding template instantiations
* update files for compilation
* fix compilation of gpu version
* initialize gradient discretizer before share states
* add a test case for quantized training
* add quantized training for data distributed training
* Delete origin.pred
* Delete ifelse.pred
* Delete LightGBM_model.txt
* remove useless changes
* fix lint error
* remove debug loggings
* fix mismatch of vector and allocator types
* remove changes in main.cpp
* fix bugs with uninitialized gradient discretizer
* initialize ordered gradients in gradient discretizer
* disable quantized training with gpu and cuda; fix msvc compilation errors and warnings
* fix bug in data parallel tree learner
* make quantized training test deterministic
* make quantized training in test case more accurate
* refactor test_quantized_training
* fix leaf splits initialization with quantized training
* check distributed quantized training result
* add cuda gradient discretizer
* add quantized training for CUDA version in tree learner
* remove cuda compute capabilities 6.1 and 6.2
* fix parts of gpu quantized training errors and warnings
* fix build-python.sh to install locally built version
* fix memory access bugs
* fix lint errors
* mark cuda quantized training with categorical features as unsupported
* rename cuda_utils.h to cuda_utils.hu
* enable quantized training with cuda
* fix cuda quantized training with sparse row data
* allow using global memory buffer in histogram construction with cuda quantized training
* recover build-python.sh
* enlarge allowed package size to 100M
-
- 12 Sep, 2023 1 commit
shiyu1994 authored
* fix leaf splits update after split in quantized training
* fix preparation of ordered gradients for quantized training
* remove force_row_wise in distributed test for quantized training
* Update src/treelearner/leaf_splits.hpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>
-
- 12 Jul, 2023 1 commit
shiyu1994 authored
-
- 30 Jun, 2023 1 commit
maskedcoder1337 authored
-
- 05 May, 2023 1 commit
shiyu1994 authored
* add quantized training (first stage)
* add histogram construction functions for integer gradients
* add stochastic rounding
* update docs
* fix compilation errors by adding template instantiations
* update files for compilation
* fix compilation of gpu version
* initialize gradient discretizer before share states
* add a test case for quantized training
* add quantized training for data distributed training
* Delete origin.pred
* Delete ifelse.pred
* Delete LightGBM_model.txt
* remove useless changes
* fix lint error
* remove debug loggings
* fix mismatch of vector and allocator types
* remove changes in main.cpp
* fix bugs with uninitialized gradient discretizer
* initialize ordered gradients in gradient discretizer
* disable quantized training with gpu and cuda; fix msvc compilation errors and warnings
* fix bug in data parallel tree learner
* make quantized training test deterministic
* make quantized training in test case more accurate
* refactor test_quantized_training
* fix leaf splits initialization with quantized training
* check distributed quantized training result
-
- 15 Mar, 2023 1 commit
Aleksandar Bojarov authored
Fix for DEBUG mode; fixes issue #5777
-
- 01 Feb, 2023 1 commit
James Lamb authored
* [ci] speed up if-else, swig, and lint conda setup
* add 'source activate'
* python constraint
* start removing cuda v1
* comment out CI
* remove more references
* revert some unnecessary changes
* revert a few more mistakes
* revert another change that ignored params
* sigh
* remove CUDATreeLearner
* fix tests, docs
* fix quoting in setup.py
* restore all CI
* Apply suggestions from code review
* completely remove cuda_exp, update docs
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
-
- 11 Sep, 2022 1 commit
Ilya Chernov authored
remove redundant whitespaces
-
- 07 Sep, 2022 1 commit
shiyu1994 authored
* add feature interaction constraint for cuda_exp
* test feature interaction constraints for cuda_exp
* remove useless check
* update comment
-
- 02 Sep, 2022 1 commit
shiyu1994 authored
* add huber regression for cuda_exp
* renew tree output on GPU; add test cases for regression objectives
* remove useless changes
* add white space
* fix test_regression
-
- 29 Aug, 2022 1 commit
shiyu1994 authored
* fix cuda_exp ci
* fix ci failures introduced by #5279
* clean up cuda.yml
* fix test.sh
* clean up test.sh
* skip lines by cuda_exp in test_register_logger
* Update tests/python_package_test/test_utilities.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
-
- 03 Aug, 2022 1 commit
Nikita Titov authored
* Fix potential overflow in linear trees
* simplify
Co-authored-by: James Lamb <jaylamb20@gmail.com>
-
- 29 Jul, 2022 2 commits
Belinda Trotta authored
-
shiyu1994 authored
* initial work for boosting and evaluation with CUDA
* fix compatibility with CPU code
* fix creating objective without USE_CUDA_EXP
* fix static analysis errors
* fix static analysis errors
-
- 08 Jun, 2022 1 commit
shiyu1994 authored
Clear split info buffer in cost efficient gradient boosting before every iteration (partially fixes #3679) (#5164)
* clear split info buffer in cegb_ before every iteration
* check nullability of cegb_ in serial_tree_learner.cpp
* add a test case for checking the split buffer in CEGB
* switch to Threading::For instead of raw OpenMP
* apply review suggestions
* apply review comments
* remove device cpu
-
- 26 Apr, 2022 1 commit
shiyu1994 authored
-
- 24 Apr, 2022 1 commit
James Lamb authored
-
- 30 Mar, 2022 1 commit
shiyu1994 authored
* fix cuda_exp with dense row-wise
* disable usage of multi val group in cuda_exp
-
- 27 Mar, 2022 1 commit
shiyu1994 authored
* log warnings when the number of bins of categorical features exceeds the configured maximum number of bins
* log only one warning for all categorical features
* add #include <memory> for unique_ptr
* remove useless param description
-
- 23 Mar, 2022 1 commit
shiyu1994 authored
* new cuda framework
* add histogram construction kernel
* before removing multi-gpu
* new cuda framework
* tree learner cuda kernels
* single tree framework ready
* single tree training framework
* remove comments
* boosting with cuda
* optimize for best split find
* data split
* move boosting into cuda
* parallel synchronize best split point
* merge split data kernels
* before code refactor
* use tasks instead of features as units for split finding
* refactor cuda best split finder
* fix configuration error with small leaves in data split
* skip histogram construction of too small leaf
* skip split finding of invalid leaves; stop when no leaf to split
* support row wise with CUDA
* copy data for split by column
* copy data from host to CPU by column for data partition
* add synchronize best splits for one leaf from multiple blocks
* partition dense row data
* fix sync best split from task blocks
* add support for sparse row wise for CUDA
* remove useless code
* add l2 regression objective
* sparse multi value bin enabled for CUDA
* fix cuda ranking objective
* support for number of items <= 2048 per query
* speed up histogram construction by interleaving global memory access
* split optimization
* add cuda tree predictor
* remove comma
* refactor objective and score updater
* before use struct
* use structure for split information
* use structure for leaf splits
* return CUDASplitInfo directly after finding best split
* split with CUDATree directly
* use cuda row data in cuda histogram constructor
* clean src/treelearner/cuda
* gather shared cuda device functions
* put shared CUDA functions into header file
* change smaller leaf from <= back to < for consistent result with CPU
* add tree predictor
* remove useless cuda_tree_predictor
* predict on CUDA with pipeline
* add global sort algorithms
* add global argsort for queries with many items in ranking tasks
* remove limitation of maximum number of items per query in ranking
* add cuda metrics
* fix CUDA AUC
* remove debug code
* add regression metrics
* remove useless file
* don't use mask in shuffle reduce
* add more regression objectives
* fix cuda mape loss; add cuda xentropy loss
* use template for different versions of BitonicArgSortDevice
* add multiclass metrics
* add ndcg metric
* fix cross entropy objectives and metrics
* fix cross entropy and ndcg metrics
* add support for customized objective in CUDA
* complete multiclass ova for CUDA
* separate cuda tree learner
* use shuffle based prefix sum
* clean up cuda_algorithms.hpp
* add copy subset on CUDA
* add bagging for CUDA
* clean up code
* copy gradients from host to device
* support bagging without using subset
* add support of bagging with subset for CUDAColumnData
* add support of bagging with subset for dense CUDARowData
* refactor copy sparse subrow
* use copy subset for column subset
* add reset train data and reset config for CUDA tree learner; add destructors for cuda tree learner
* add USE_CUDA ifdef to cuda tree learner files
* check that dataset doesn't contain CUDA tree learner
* remove printf debug information
* use full new cuda tree learner only when using single GPU
* disable all CUDA code when using CPU version
* recover main.cpp
* add cpp files for multi value bins
* update LightGBM.vcxproj
* update LightGBM.vcxproj; fix lint errors
* fix lint errors
* update Makevars; fix lint errors
* fix the case with 0 feature and 0 bin; fix split finding for invalid leaves; create cuda column data when loaded from bin file
* fix lint errors; hide GetRowWiseData when cuda is not used
* recover default device type to cpu
* fix na_as_missing case; fix cuda feature meta information
* fix UpdateDataIndexToLeafIndexKernel
* create CUDA trees when needed in CUDADataPartition::UpdateTrainScore
* add refit by tree for cuda tree learner
* fix test_refit in test_engine.py
* create set of large bin partitions in CUDARowData
* add histogram construction for columns with a large number of bins
* add find best split for categorical features on CUDA
* add bitvectors for categorical split
* cuda data partition split for categorical features
* fix split tree with categorical feature
* fix categorical feature splits
* refactor cuda_data_partition.cu with multi-level templates
* refactor CUDABestSplitFinder by grouping task information into struct
* pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder
* fix misuse of reference
* remove useless changes
* add support for path smoothing
* virtual destructor for LightGBM::Tree
* fix overlapped cat threshold in best split infos
* reset histogram pointers in data partition and split finder in ResetConfig
* comment useless parameter
* fix reverse case when na is missing and default bin is zero
* fix mfb_is_na and mfb_is_zero and is_single_feature_column
* remove debug log
* fix cat_l2 when one-hot; fix gradient copy when data subset is used
* switch shared histogram size according to CUDA version
* gpu_use_dp=true when cuda test
* revert modification in config.h
* fix setting of gpu_use_dp=true in .ci/test.sh
* fix linter errors
* fix linter error; remove useless change
* recover main.cpp
* separate cuda_exp and cuda
* fix ci bash scripts; add description for cuda_exp
* add USE_CUDA_EXP flag
* switch off USE_CUDA_EXP
* revert changes in python-packages
* more careful separation for USE_CUDA_EXP
* fix CUDARowData::DivideCUDAFeatureGroups; fix set fields for cuda metadata
* revert config.h
* fix test settings for cuda experimental version
* skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version
* fix lint issue by adding a blank line
* fix lint errors by resorting imports
* merge cuda.yml and cuda_exp.yml
* update python version in cuda.yml
* remove cuda_exp.yml
* remove unrelated changes
* fix compilation warnings; fix cuda exp ci task name
* recover task
* use multi-level template in histogram construction; check split only in debug mode
* ignore NVCC related lines in parameter_generator.py
* update job name for CUDA tests
* apply review suggestions
* Update .github/workflows/cuda.yml
* update header
* remove useless TODOs
* remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062
* #include <LightGBM/utils/log.h> for USE_CUDA_EXP only
* fix include order
* remove extra space
* address review comments
* add warning when cuda_exp is used together with deterministic
* add comment about gpu_use_dp in .ci/test.sh
* revert changing order of included headers
Co-authored-by: Yu Shi <shiyu1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
-
- 20 Feb, 2022 1 commit
Dzianis Dus authored
* CUDATreeLearner: free GPU memory in destructor if any allocated
* Minor changes: checking for num_gpu_feature_groups is not needed
* Trigger CI again
-
- 08 Jan, 2022 1 commit
文佳鹏 authored
-
- 10 Nov, 2021 1 commit
tongwu-msft authored
* fix issue #4601
* fix issue #4601 it2
* add tests for issue #4601
* fix warning
* fix warning
* add new line at end
* remove last line at end
* fix lint warning
* address comments
* address comments
* address comments
* fix address
* address comments
* revert seed
* fix recursive force split issue
* fix build error
* fix lint warning
-
- 23 Sep, 2021 1 commit
James Lamb authored
* fix incorrect behavior of SplitInfo == operator for splits with identical gains
* update LightSplitInfo too, and improve comment
* don't check features unnecessarily
* update LightSplitInfo too
-
- 28 Jun, 2021 1 commit
Robin Dong authored
-