1. 13 Oct, 2025 1 commit
  2. 23 Sep, 2025 1 commit
  3. 22 Sep, 2025 1 commit
    • Jeff Daily's avatar
      [ROCm] re-add support for ROCm builds · 61ec4f1a
      Jeff Daily authored
      Previously #6086 added ROCm support but after numerous rebases it lost
      critical changes. This PR restores the ROCm build.
      
      There are many source file changes but most were automated using the
      following:
      
      ```bash
      for f in `grep -rl '#ifdef USE_CUDA'`
      do
          sed -i 's@#ifdef USE_CUDA@#if defined(USE_CUDA) || defined(USE_ROCM)@g' $f
      done
      
      for f in `grep -rl '#endif  // USE_CUDA'`
      do
          sed -i 's@#endif  // USE_CUDA@#endif  // USE_CUDA || USE_ROCM@g' $f
      done
      ```
      61ec4f1a
  4. 23 Feb, 2024 1 commit
    • shiyu1994's avatar
      [c++][fix] Support Quantized Training with Categorical Features on CPU (#6301) · 776c5c3c
      shiyu1994 authored
      * support quantized training with categorical features on cpu
      
      * remove white spaces
      
      * add tests for quantized training with categorical features
      
      * skip tests for cuda version
      
      * fix cases when only 1 data block in row-wise quantized histogram construction with 8 inner bits
      
      * remove useless capture
      
      * fix compilation warnings
      
      revert useless changes
      
      * revert useless change
      
      * separate functions in feature histogram into cpp file
      
      * add feature_histogram.o in Makevars
      776c5c3c
  5. 10 Oct, 2023 1 commit
  6. 05 May, 2023 1 commit
    • shiyu1994's avatar
      Add quantized training (CPU part) (#5800) · 17ecfab3
      shiyu1994 authored
      * add quantized training (first stage)
      
      * add histogram construction functions for integer gradients
      
      * add stochastic rounding
      
      * update docs
      
      * fix compilation errors by adding template instantiations
      
      * update files for compilation
      
      * fix compilation of gpu version
      
      * initialize gradient discretizer before share states
      
      * add a test case for quantized training
      
      * add quantized training for data distributed training
      
      * Delete origin.pred
      
      * Delete ifelse.pred
      
      * Delete LightGBM_model.txt
      
      * remove useless changes
      
      * fix lint error
      
      * remove debug loggings
      
      * fix mismatch of vector and allocator types
      
      * remove changes in main.cpp
      
      * fix bugs with uninitialized gradient discretizer
      
      * initialize ordered gradients in gradient discretizer
      
      * disable quantized training with gpu and cuda
      
      fix msvc compilation errors and warnings
      
      * fix bug in data parallel tree learner
      
      * make quantized training test deterministic
      
      * make quantized training in test case more accurate
      
      * refactor test_quantized_training
      
      * fix leaf splits initialization with quantized training
      
      * check distributed quantized training result
      17ecfab3
  7. 01 Feb, 2023 1 commit
    • James Lamb's avatar
      [CUDA] consolidate CUDA versions (#5677) · 4f47547c
      James Lamb authored
      
      
      * [ci] speed up if-else, swig, and lint conda setup
      
      * add 'source activate'
      
      * python constraint
      
      * start removing cuda v1
      
      * comment out CI
      
      * remove more references
      
      * revert some unnecessaary changes
      
      * revert a few more mistakes
      
      * revert another change that ignored params
      
      * sigh
      
      * remove CUDATreeLearner
      
      * fix tests, docs
      
      * fix quoting in setup.py
      
      * restore all CI
      
      * Apply suggestions from code review
      Co-authored-by: default avatarshiyu1994 <shiyu_k1994@qq.com>
      
      * Apply suggestions from code review
      
      * completely remove cuda_exp, update docs
      
      ---------
      Co-authored-by: default avatarshiyu1994 <shiyu_k1994@qq.com>
      4f47547c
  8. 23 Mar, 2022 1 commit
    • shiyu1994's avatar
      [CUDA] New CUDA version Part 1 (#4630) · 6b56a90c
      shiyu1994 authored
      
      
      * new cuda framework
      
      * add histogram construction kernel
      
      * before removing multi-gpu
      
      * new cuda framework
      
      * tree learner cuda kernels
      
      * single tree framework ready
      
      * single tree training framework
      
      * remove comments
      
      * boosting with cuda
      
      * optimize for best split find
      
      * data split
      
      * move boosting into cuda
      
      * parallel synchronize best split point
      
      * merge split data kernels
      
      * before code refactor
      
      * use tasks instead of features as units for split finding
      
      * refactor cuda best split finder
      
      * fix configuration error with small leaves in data split
      
      * skip histogram construction of too small leaf
      
      * skip split finding of invalid leaves
      
      stop when no leaf to split
      
      * support row wise with CUDA
      
      * copy data for split by column
      
      * copy data from host to CPU by column for data partition
      
      * add synchronize best splits for one leaf from multiple blocks
      
      * partition dense row data
      
      * fix sync best split from task blocks
      
      * add support for sparse row wise for CUDA
      
      * remove useless code
      
      * add l2 regression objective
      
      * sparse multi value bin enabled for CUDA
      
      * fix cuda ranking objective
      
      * support for number of items <= 2048 per query
      
      * speedup histogram construction by interleaving global memory access
      
      * split optimization
      
      * add cuda tree predictor
      
      * remove comma
      
      * refactor objective and score updater
      
      * before use struct
      
      * use structure for split information
      
      * use structure for leaf splits
      
      * return CUDASplitInfo directly after finding best split
      
      * split with CUDATree directly
      
      * use cuda row data in cuda histogram constructor
      
      * clean src/treelearner/cuda
      
      * gather shared cuda device functions
      
      * put shared CUDA functions into header file
      
      * change smaller leaf from <= back to < for consistent result with CPU
      
      * add tree predictor
      
      * remove useless cuda_tree_predictor
      
      * predict on CUDA with pipeline
      
      * add global sort algorithms
      
      * add global argsort for queries with many items in ranking tasks
      
      * remove limitation of maximum number of items per query in ranking
      
      * add cuda metrics
      
      * fix CUDA AUC
      
      * remove debug code
      
      * add regression metrics
      
      * remove useless file
      
      * don't use mask in shuffle reduce
      
      * add more regression objectives
      
      * fix cuda mape loss
      
      add cuda xentropy loss
      
      * use template for different versions of BitonicArgSortDevice
      
      * add multiclass metrics
      
      * add ndcg metric
      
      * fix cross entropy objectives and metrics
      
      * fix cross entropy and ndcg metrics
      
      * add support for customized objective in CUDA
      
      * complete multiclass ova for CUDA
      
      * separate cuda tree learner
      
      * use shuffle based prefix sum
      
      * clean up cuda_algorithms.hpp
      
      * add copy subset on CUDA
      
      * add bagging for CUDA
      
      * clean up code
      
      * copy gradients from host to device
      
      * support bagging without using subset
      
      * add support of bagging with subset for CUDAColumnData
      
      * add support of bagging with subset for dense CUDARowData
      
      * refactor copy sparse subrow
      
      * use copy subset for column subset
      
      * add reset train data and reset config for CUDA tree learner
      
      add deconstructors for cuda tree learner
      
      * add USE_CUDA ifdef to cuda tree learner files
      
      * check that dataset doesn't contain CUDA tree learner
      
      * remove printf debug information
      
      * use full new cuda tree learner only when using single GPU
      
      * disable all CUDA code when using CPU version
      
      * recover main.cpp
      
      * add cpp files for multi value bins
      
      * update LightGBM.vcxproj
      
      * update LightGBM.vcxproj
      
      fix lint errors
      
      * fix lint errors
      
      * fix lint errors
      
      * update Makevars
      
      fix lint errors
      
      * fix the case with 0 feature and 0 bin
      
      fix split finding for invalid leaves
      
      create cuda column data when loaded from bin file
      
      * fix lint errors
      
      hide GetRowWiseData when cuda is not used
      
      * recover default device type to cpu
      
      * fix na_as_missing case
      
      fix cuda feature meta information
      
      * fix UpdateDataIndexToLeafIndexKernel
      
      * create CUDA trees when needed in CUDADataPartition::UpdateTrainScore
      
      * add refit by tree for cuda tree learner
      
      * fix test_refit in test_engine.py
      
      * create set of large bin partitions in CUDARowData
      
      * add histogram construction for columns with a large number of bins
      
      * add find best split for categorical features on CUDA
      
      * add bitvectors for categorical split
      
      * cuda data partition split for categorical features
      
      * fix split tree with categorical feature
      
      * fix categorical feature splits
      
      * refactor cuda_data_partition.cu with multi-level templates
      
      * refactor CUDABestSplitFinder by grouping task information into struct
      
      * pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder
      
      * fix misuse of reference
      
      * remove useless changes
      
      * add support for path smoothing
      
      * virtual destructor for LightGBM::Tree
      
      * fix overlapped cat threshold in best split infos
      
      * reset histogram pointers in data partition and spllit finder in ResetConfig
      
      * comment useless parameter
      
      * fix reverse case when na is missing and default bin is zero
      
      * fix mfb_is_na and mfb_is_zero and is_single_feature_column
      
      * remove debug log
      
      * fix cat_l2 when one-hot
      
      fix gradient copy when data subset is used
      
      * switch shared histogram size according to CUDA version
      
      * gpu_use_dp=true when cuda test
      
      * revert modification in config.h
      
      * fix setting of gpu_use_dp=true in .ci/test.sh
      
      * fix linter errors
      
      * fix linter error
      
      remove useless change
      
      * recover main.cpp
      
      * separate cuda_exp and cuda
      
      * fix ci bash scripts
      
      add description for cuda_exp
      
      * add USE_CUDA_EXP flag
      
      * switch off USE_CUDA_EXP
      
      * revert changes in python-packages
      
      * more careful separation for USE_CUDA_EXP
      
      * fix CUDARowData::DivideCUDAFeatureGroups
      
      fix set fields for cuda metadata
      
      * revert config.h
      
      * fix test settings for cuda experimental version
      
      * skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version
      
      * fix lint issue by adding a blank line
      
      * fix lint errors by resorting imports
      
      * fix lint errors by resorting imports
      
      * fix lint errors by resorting imports
      
      * merge cuda.yml and cuda_exp.yml
      
      * update python version in cuda.yml
      
      * remove cuda_exp.yml
      
      * remove unrelated changes
      
      * fix compilation warnings
      
      fix cuda exp ci task name
      
      * recover task
      
      * use multi-level template in histogram construction
      
      check split only in debug mode
      
      * ignore NVCC related lines in parameter_generator.py
      
      * update job name for CUDA tests
      
      * apply review suggestions
      
      * Update .github/workflows/cuda.yml
      Co-authored-by: default avatarNikita Titov <nekit94-08@mail.ru>
      
      * Update .github/workflows/cuda.yml
      Co-authored-by: default avatarNikita Titov <nekit94-08@mail.ru>
      
      * update header
      
      * remove useless TODOs
      
      * remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062
      
      * #include <LightGBM/utils/log.h> for USE_CUDA_EXP only
      
      * fix include order
      
      * fix include order
      
      * remove extra space
      
      * address review comments
      
      * add warning when cuda_exp is used together with deterministic
      
      * add comment about gpu_use_dp in .ci/test.sh
      
      * revert changing order of included headers
      Co-authored-by: default avatarYu Shi <shiyu1994@qq.com>
      Co-authored-by: default avatarNikita Titov <nekit94-08@mail.ru>
      6b56a90c
  9. 11 Jan, 2021 1 commit
  10. 23 Nov, 2020 1 commit
  11. 14 Nov, 2020 1 commit
  12. 13 Nov, 2020 1 commit
    • shiyu1994's avatar
      Optimization of row-wise histogram construction (#3522) · 0655d67c
      shiyu1994 authored
      
      
      * store without offset in multi_val_dense_bin
      
      * fix offset bug
      
      * add comment for offset
      
      * add comment for bin type selection
      
      * faster operations for offset
      
      * keep most freq bin in histogram for multi val dense
      
      * use original feature iterators
      
      * consider 9 cases (3 x 3) for multi val bin construction
      
      * fix dense bin setting
      
      * fix bin data in multi val group
      
      * fix offset of the first feature histogram
      
      * use float hist buf
      
      * avx in histogram construction
      
      * use avx for hist construction without prefetch
      
      * vectorize bin extraction
      
      * use only 128 vec
      
      * use avx2
      
      * use vectorization for sparse row wise
      
      * add bit size for multi val dense bin
      
      * float with no vectorization
      
      * change multithreading strategy to dynamic
      
      * remove intrinsic header
      
      * fix dense multi val col copy
      
      * remove bit size
      
      * use large enough block size when the bin number is large
      
      * calc min block size by sparsity
      
      * rescale gradients
      
      * rollback gradients scaling
      
      * single precision histogram buffer as an option
      
      * add float hist buffer with thread buffer
      
      * fix setting zero in hist data
      
      * fix hist begin pointer in tree learners
      
      * remove debug logs
      
      * remove omp simd
      
      * update Makevars of R-package
      
      * fix feature group binary storing
      
      * two row wise for double hist buffer
      
      * add subfeature for two row wise
      
      * remove useless code and fix two row wise
      
      * refactor code
      
      * grouping the dense feature groups can get sparse multi val bin
      
      * clean format problems
      
      * one thread for two blocks in sep row wise
      
      * use ordered gradients for sep row wise
      
      * fix grad ptr
      
      * ordered grad with combined block for sep row wise
      
      * fix block threading
      
      * use the same min block size
      
      * rollback share min block size
      
      * remove logs
      
      * Update src/io/dataset.cpp
      Co-authored-by: default avatarGuolin Ke <guolin.ke@outlook.com>
      
      * fix parameter description
      
      * remove sep_row_wise
      
      * remove check codes
      
      * add check for empty multi val bin
      
      * fix lint error
      
      * rollback changes in config.h
      
      * Apply suggestions from code review
      Co-authored-by: default avatarUbuntu <shiyu@gbdt-04.ren3kv4wanvufliwrpy4k03lsf.xx.internal.cloudapp.net>
      Co-authored-by: default avatarGuolin Ke <guolin.ke@outlook.com>
      0655d67c