- 24 Jul, 2025 1 commit
Jeff Daily authored
* [ROCm] add support for ROCm/HIP
  - CMakeLists.txt ROCm updates, also replace glob with explicit file list
  - initial warpSize interop changes
  - helpers/hipify.sh script added
  - .gitignore to ignore generated hip source files
* more rocm updates
  - disable compiler warnings
  - move PercentileDevice __device__ template function into header
  - bug fixes for __host__ __device__ and __HIP__ preprocessor symbols
* more bug fixes
* warp 32 vs 64 updates
* lint fixes
* missing device_index variable
* accidental inclusion of hip headers
* copyright notice compliance
* Update CMakeLists.txt

Co-authored-by: James Lamb <jaylamb20@gmail.com>

* fix lint issue
* clean up
* Update CMakeLists.txt

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update CMakeLists.txt

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* clean up CMakeLists.txt, use WARPSIZE
* use WARPSIZE
* fix share buffer size

---------

Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Yu Shi <yushi2@microsoft.com>
- 08 Oct, 2023 1 commit
shiyu1994 authored
* add quantized training (first stage)
* add histogram construction functions for integer gradients
* add stochastic rounding
* update docs
* fix compilation errors by adding template instantiations
* update files for compilation
* fix compilation of gpu version
* initialize gradient discretizer before share states
* add a test case for quantized training
* add quantized training for data distributed training
* Delete origin.pred
* Delete ifelse.pred
* Delete LightGBM_model.txt
* remove useless changes
* fix lint error
* remove debug loggings
* fix mismatch of vector and allocator types
* remove changes in main.cpp
* fix bugs with uninitialized gradient discretizer
* initialize ordered gradients in gradient discretizer
* disable quantized training with gpu and cuda; fix msvc compilation errors and warnings
* fix bug in data parallel tree learner
* make quantized training test deterministic
* make quantized training in test case more accurate
* refactor test_quantized_training
* fix leaf splits initialization with quantized training
* check distributed quantized training result
* add cuda gradient discretizer
* add quantized training for CUDA version in tree learner
* remove cuda computability 6.1 and 6.2
* fix parts of gpu quantized training errors and warnings
* fix build-python.sh to install locally built version
* fix memory access bugs
* fix lint errors
* mark cuda quantized training on cuda with categorical features as unsupported
* rename cuda_utils.h to cuda_utils.hu
* enable quantized training with cuda
* fix cuda quantized training with sparse row data
* allow using global memory buffer in histogram construction with cuda quantized training
* recover build-python.sh; enlarge allowed package size to 100M