1. 22 Sep, 2025 1 commit
    • Jeff Daily's avatar
      [ROCm] re-add support for ROCm builds · 61ec4f1a
      Jeff Daily authored
      Previously #6086 added ROCm support but after numerous rebases it lost
      critical changes. This PR restores the ROCm build.
      
      There are many source file changes but most were automated using the
      following:
      
      ```bash
      for f in `grep -rl '#ifdef USE_CUDA'`
      do
          sed -i 's@#ifdef USE_CUDA@#if defined(USE_CUDA) || defined(USE_ROCM)@g' $f
      done
      
      for f in `grep -rl '#endif  // USE_CUDA'`
      do
          sed -i 's@#endif  // USE_CUDA@#endif  // USE_CUDA || USE_ROCM@g' $f
      done
      ```
      61ec4f1a
  2. 08 Oct, 2023 1 commit
    • shiyu1994's avatar
      [CUDA] CUDA Quantized Training (fixes #5606) (#5933) · f901f471
      shiyu1994 authored
      * add quantized training (first stage)
      
      * add histogram construction functions for integer gradients
      
      * add stochastic rounding
      
      * update docs
      
      * fix compilation errors by adding template instantiations
      
      * update files for compilation
      
      * fix compilation of gpu version
      
      * initialize gradient discretizer before share states
      
      * add a test case for quantized training
      
      * add quantized training for data distributed training
      
      * Delete origin.pred
      
      * Delete ifelse.pred
      
      * Delete LightGBM_model.txt
      
      * remove useless changes
      
      * fix lint error
      
      * remove debug loggings
      
      * fix mismatch of vector and allocator types
      
      * remove changes in main.cpp
      
      * fix bugs with uninitialized gradient discretizer
      
      * initialize ordered gradients in gradient discretizer
      
      * disable quantized training with gpu and cuda
      
      fix msvc compilation errors and warnings
      
      * fix bug in data parallel tree learner
      
      * make quantized training test deterministic
      
      * make quantized training in test case more accurate
      
      * refactor test_quantized_training
      
      * fix leaf splits initialization with quantized training
      
      * check distributed quantized training result
      
      * add cuda gradient discretizer
      
      * add quantized training for CUDA version in tree learner
      
      * remove cuda computability 6.1 and 6.2
      
      * fix parts of gpu quantized training errors and warnings
      
      * fix build-python.sh to install locally built version
      
      * fix memory access bugs
      
      * fix lint errors
      
      * mark cuda quantized training on cuda with categorical features as unsupported
      
      * rename cuda_utils.h to cuda_utils.hu
      
      * enable quantized training with cuda
      
      * fix cuda quantized training with sparse row data
      
      * allow using global memory buffer in histogram construction with cuda quantized training
      
      * recover build-python.sh
      
      enlarge allowed package size to 100M
      f901f471