1. 24 Jul, 2025 1 commit
  2. 05 Dec, 2024 1 commit
  3. 01 Dec, 2024 1 commit
  4. 13 Oct, 2024 1 commit
  5. 02 Oct, 2024 1 commit
  6. 08 Oct, 2023 1 commit
    • [CUDA] CUDA Quantized Training (fixes #5606) (#5933) · f901f471
      shiyu1994 authored
      * add quantized training (first stage)
      
      * add histogram construction functions for integer gradients
      
      * add stochastic rounding
      
      * update docs
      
      * fix compilation errors by adding template instantiations
      
      * update files for compilation
      
      * fix compilation of gpu version
      
      * initialize gradient discretizer before share states
      
      * add a test case for quantized training
      
      * add quantized training for data distributed training
      
      * Delete origin.pred
      
      * Delete ifelse.pred
      
      * Delete LightGBM_model.txt
      
      * remove useless changes
      
      * fix lint error
      
      * remove debug loggings
      
      * fix mismatch of vector and allocator types
      
      * remove changes in main.cpp
      
      * fix bugs with uninitialized gradient discretizer
      
      * initialize ordered gradients in gradient discretizer
      
      * disable quantized training with gpu and cuda
      
      fix msvc compilation errors and warnings
      
      * fix bug in data parallel tree learner
      
      * make quantized training test deterministic
      
      * make quantized training in test case more accurate
      
      * refactor test_quantized_training
      
      * fix leaf splits initialization with quantized training
      
      * check distributed quantized training result
      
      * add cuda gradient discretizer
      
      * add quantized training for CUDA version in tree learner
      
      * remove CUDA compute capabilities 6.1 and 6.2
      
      * fix parts of gpu quantized training errors and warnings
      
      * fix build-python.sh to install locally built version
      
      * fix memory access bugs
      
      * fix lint errors
      
      * mark cuda quantized training on cuda with categorical features as unsupported
      
      * rename cuda_utils.h to cuda_utils.hu
      
      * enable quantized training with cuda
      
      * fix cuda quantized training with sparse row data
      
      * allow using global memory buffer in histogram construction with cuda quantized training
      
      * recover build-python.sh
      
      enlarge allowed package size to 100M
  7. 13 Aug, 2023 1 commit
  8. 16 Jun, 2023 1 commit
  9. 21 Mar, 2023 1 commit
  10. 01 Feb, 2023 1 commit
    • [CUDA] consolidate CUDA versions (#5677) · 4f47547c
      James Lamb authored
      
      
      * [ci] speed up if-else, swig, and lint conda setup
      
      * add 'source activate'
      
      * python constraint
      
      * start removing cuda v1
      
      * comment out CI
      
      * remove more references
      
      * revert some unnecessary changes
      
      * revert a few more mistakes
      
      * revert another change that ignored params
      
      * sigh
      
      * remove CUDATreeLearner
      
      * fix tests, docs
      
      * fix quoting in setup.py
      
      * restore all CI
      
      * Apply suggestions from code review
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
      
      * Apply suggestions from code review
      
      * completely remove cuda_exp, update docs
      
      ---------
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
  11. 28 Dec, 2022 1 commit
    • Decouple Boosting Types (fixes #3128) (#4827) · fffd066c
      Yifei Liu authored
      
      
      * add parameter data_sample_strategy
      
      * abstract GOSS as a sample strategy (GOSS1), together with original GOSS (Normal Bagging has not been abstracted, so do NOT use it now)
      
      * abstract Bagging as a subclass (BAGGING), but original Bagging members in GBDT are still kept
      
      * fix some variables
      
      * remove GOSS(as boost) and Bagging logic in GBDT
      
      * rename GOSS1 to GOSS(as sample strategy)
      
      * add warning about use GOSS as boosting_type
      
      * a little ; bug
      
      * remove CHECK when "gradients != nullptr"
      
      * rename DataSampleStrategy to avoid confusion
      
      * remove and add some comments, following convention
      
      * fix bug about GBDT::ResetConfig (ObjectiveFunction inconsistency bet…
      
      * add std::ignore to avoid compiler warnings (and potential fails)
      
      * update Makevars and vcxproj
      
      * handle constant hessian
      
      move resize of gradient vectors out of sample strategy
      
      * mark override for IsHessianChange
      
      * fix lint errors
      
      * rerun parameter_generator.py
      
      * update config_auto.cpp
      
      * delete redundant blank line
      
      * update num_data_ when train_data_ is updated
      
      set gradients and hessians when GOSS
      
      * check bagging_freq is not zero
      
      * reset config_ value
      
      merge ResetBaggingConfig and ResetGOSS
      
      * remove useless check
      
      * add tests in test_engine.py
      
      * remove whitespace in blank line
      
      * remove arguments verbose_eval and evals_result
      
      * Update tests/python_package_test/test_engine.py
      
      reduce num_boost_round
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Update src/boosting/sample_strategy.cpp
      
      modify warning about setting goss as `boosting_type`
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Update tests/python_package_test/test_engine.py
      
      replace load_boston() with make_regression()
      
      remove value checks of mean_squared_error in test_sample_strategy_with_boosting()
      
      * Update tests/python_package_test/test_engine.py
      
      add value checks of mean_squared_error in test_sample_strategy_with_boosting()
      
      * Modify warning about using goss as boosting type
      
      * Update tests/python_package_test/test_engine.py
      
      add random_state=42 for make_regression()
      
      reduce the threshold of mean_square_error
      
      * Update src/boosting/sample_strategy.cpp
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * remove goss from boosting types in documentation
      
      * Update src/boosting/bagging.hpp
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update src/boosting/bagging.hpp
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update src/boosting/goss.hpp
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update src/boosting/goss.hpp
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * rename GOSS with GOSSStrategy
      
      * update doc
      
      * address comments
      
      * fix table in doc
      
      * Update include/LightGBM/config.h
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * update documentation
      
      * update test case
      
      * revert useless change in test_engine.py
      
      * add tests for evaluation results in test_sample_strategy_with_boosting
      
      * include <string>
      
      * change to assert_allclose in test_goss_boosting_and_strategy_equivalent
      
      * more tolerance in result checking, due to minor difference in results of gpu versions
      
      * change == to np.testing.assert_allclose
      
      * fix test case
      
      * set gpu_use_dp to true
      
      * change --report to --report-level for rstcheck
      
      * use gpu_use_dp=true in test_goss_boosting_and_strategy_equivalent
      
      * revert unexpected changes of non-ascii characters
      
      * revert unexpected changes of non-ascii characters
      
      * remove useless changes
      
      * allocate gradients_pointer_ and hessians_pointer when necessary
      
      * add spaces
      
      * remove redundant virtual
      
      * include <LightGBM/utils/log.h> for USE_CUDA
      
      * check for  in test_goss_boosting_and_strategy_equivalent
      
      * check for identity in test_sample_strategy_with_boosting
      
      * remove cuda  option in test_sample_strategy_with_boosting
      
      * Update tests/python_package_test/test_engine.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update tests/python_package_test/test_engine.py
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * ResetGradientBuffers after ResetSampleConfig
      
      * ResetGradientBuffers after ResetSampleConfig
      
      * ResetGradientBuffers after bagging
      
      * remove useless code
      
      * check objective_function_ instead of gradients
      
      * enable rf with goss
      
      simplify params in test cases
      
      * remove useless changes
      
      * allow rf with feature subsampling alone
      
      * change position of ResetGradientBuffers
      
      * check for dask
      
      * add parameter types for data_sample_strategy
      Co-authored-by: Guangda Liu <v-guangdaliu@microsoft.com>
      Co-authored-by: Yu Shi <shiyu_k1994@qq.com>
      Co-authored-by: GuangdaLiu <90019144+GuangdaLiu@users.noreply.github.com>
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
  12. 27 Dec, 2022 1 commit
    • [CUDA] Add L2 metric for new CUDA version (#5633) · 6482b47e
      shiyu1994 authored
      * add rmse metric for new cuda version
      
      * add Init for CUDAMetricInterface
      
      * fix lint errors
      
      * fix rmse and add l2 metric for new cuda version
      
      * use CUDAL2Metric
      
      * explicit template instantiation
      
      * write result only with the first thread
      
      * pre allocate buffer for output converting
      
      * fix l2 regression with cuda metric evaluation
      
      * weighting loss in cuda metric evaluation
      
      * mark CUDATree::AsConstantTree as override
  13. 02 Dec, 2022 1 commit
  14. 27 Nov, 2022 1 commit
    • [CUDA] Add Poisson regression objective for cuda_exp and refactor objective... · 24af9fa5
      shiyu1994 authored
      
      [CUDA] Add Poisson regression objective for cuda_exp and refactor objective functions for cuda_exp (#5486)
      
      * add poisson regression objective for cuda_exp
      
      * enable Poisson regression for cuda_exp
      
      * refactor cuda objective functions
      
      * remove useless changes
      
      * fix linter errors
      
      * remove redundant buffer in cuda poisson regression objective
      
      * fix log of cuda_exp binary objective
      
      * fix threshold of poisson objective result
      
      * remove useless changes
      
      * fix compilation errors
      
      * add cuda quantile regression objective
      
      * remove cuda quantile regression objective
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
  15. 07 Sep, 2022 1 commit
  16. 05 Sep, 2022 2 commits
  17. 01 Sep, 2022 1 commit
  18. 31 Aug, 2022 2 commits
    • [CUDA] L2 regression objective for cuda_exp (#5452) · 9e89ee7f
      shiyu1994 authored
      * add (l2) regression objective for cuda_exp
      
      * fix lint errors
      
      * correct time tag
    • [CUDA] Add binary objective for cuda_exp (#5425) · 2b8fe8b4
      shiyu1994 authored
      * add binary objective for cuda_exp
      
      * include <string> and <vector>
      
      * exchange include ordering
      
      * fix length of score to copy in evaluation
      
      * fix EvalOneMetric
      
      * fix cuda binary objective and prediction when boosting on gpu
      
      * Add white space
      
      * fix BoostFromScore for CUDABinaryLogloss
      
      update log in test_register_logger
      
      * include <algorithm>
      
      * simplify shared memory buffer
  19. 29 Aug, 2022 1 commit
  20. 29 Jul, 2022 1 commit
  21. 23 Mar, 2022 1 commit
    • [CUDA] New CUDA version Part 1 (#4630) · 6b56a90c
      shiyu1994 authored
      
      
      * new cuda framework
      
      * add histogram construction kernel
      
      * before removing multi-gpu
      
      * new cuda framework
      
      * tree learner cuda kernels
      
      * single tree framework ready
      
      * single tree training framework
      
      * remove comments
      
      * boosting with cuda
      
      * optimize for best split find
      
      * data split
      
      * move boosting into cuda
      
      * parallel synchronize best split point
      
      * merge split data kernels
      
      * before code refactor
      
      * use tasks instead of features as units for split finding
      
      * refactor cuda best split finder
      
      * fix configuration error with small leaves in data split
      
      * skip histogram construction of too small leaf
      
      * skip split finding of invalid leaves
      
      stop when no leaf to split
      
      * support row wise with CUDA
      
      * copy data for split by column
      
      * copy data from host to CPU by column for data partition
      
      * add synchronize best splits for one leaf from multiple blocks
      
      * partition dense row data
      
      * fix sync best split from task blocks
      
      * add support for sparse row wise for CUDA
      
      * remove useless code
      
      * add l2 regression objective
      
      * sparse multi value bin enabled for CUDA
      
      * fix cuda ranking objective
      
      * support for number of items <= 2048 per query
      
      * speedup histogram construction by interleaving global memory access
      
      * split optimization
      
      * add cuda tree predictor
      
      * remove comma
      
      * refactor objective and score updater
      
      * before use struct
      
      * use structure for split information
      
      * use structure for leaf splits
      
      * return CUDASplitInfo directly after finding best split
      
      * split with CUDATree directly
      
      * use cuda row data in cuda histogram constructor
      
      * clean src/treelearner/cuda
      
      * gather shared cuda device functions
      
      * put shared CUDA functions into header file
      
      * change smaller leaf from <= back to < for consistent result with CPU
      
      * add tree predictor
      
      * remove useless cuda_tree_predictor
      
      * predict on CUDA with pipeline
      
      * add global sort algorithms
      
      * add global argsort for queries with many items in ranking tasks
      
      * remove limitation of maximum number of items per query in ranking
      
      * add cuda metrics
      
      * fix CUDA AUC
      
      * remove debug code
      
      * add regression metrics
      
      * remove useless file
      
      * don't use mask in shuffle reduce
      
      * add more regression objectives
      
      * fix cuda mape loss
      
      add cuda xentropy loss
      
      * use template for different versions of BitonicArgSortDevice
      
      * add multiclass metrics
      
      * add ndcg metric
      
      * fix cross entropy objectives and metrics
      
      * fix cross entropy and ndcg metrics
      
      * add support for customized objective in CUDA
      
      * complete multiclass ova for CUDA
      
      * separate cuda tree learner
      
      * use shuffle based prefix sum
      
      * clean up cuda_algorithms.hpp
      
      * add copy subset on CUDA
      
      * add bagging for CUDA
      
      * clean up code
      
      * copy gradients from host to device
      
      * support bagging without using subset
      
      * add support of bagging with subset for CUDAColumnData
      
      * add support of bagging with subset for dense CUDARowData
      
      * refactor copy sparse subrow
      
      * use copy subset for column subset
      
      * add reset train data and reset config for CUDA tree learner
      
      add destructors for cuda tree learner
      
      * add USE_CUDA ifdef to cuda tree learner files
      
      * check that dataset doesn't contain CUDA tree learner
      
      * remove printf debug information
      
      * use full new cuda tree learner only when using single GPU
      
      * disable all CUDA code when using CPU version
      
      * recover main.cpp
      
      * add cpp files for multi value bins
      
      * update LightGBM.vcxproj
      
      * update LightGBM.vcxproj
      
      fix lint errors
      
      * fix lint errors
      
      * fix lint errors
      
      * update Makevars
      
      fix lint errors
      
      * fix the case with 0 feature and 0 bin
      
      fix split finding for invalid leaves
      
      create cuda column data when loaded from bin file
      
      * fix lint errors
      
      hide GetRowWiseData when cuda is not used
      
      * recover default device type to cpu
      
      * fix na_as_missing case
      
      fix cuda feature meta information
      
      * fix UpdateDataIndexToLeafIndexKernel
      
      * create CUDA trees when needed in CUDADataPartition::UpdateTrainScore
      
      * add refit by tree for cuda tree learner
      
      * fix test_refit in test_engine.py
      
      * create set of large bin partitions in CUDARowData
      
      * add histogram construction for columns with a large number of bins
      
      * add find best split for categorical features on CUDA
      
      * add bitvectors for categorical split
      
      * cuda data partition split for categorical features
      
      * fix split tree with categorical feature
      
      * fix categorical feature splits
      
      * refactor cuda_data_partition.cu with multi-level templates
      
      * refactor CUDABestSplitFinder by grouping task information into struct
      
      * pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder
      
      * fix misuse of reference
      
      * remove useless changes
      
      * add support for path smoothing
      
      * virtual destructor for LightGBM::Tree
      
      * fix overlapped cat threshold in best split infos
      
      * reset histogram pointers in data partition and split finder in ResetConfig
      
      * comment useless parameter
      
      * fix reverse case when na is missing and default bin is zero
      
      * fix mfb_is_na and mfb_is_zero and is_single_feature_column
      
      * remove debug log
      
      * fix cat_l2 when one-hot
      
      fix gradient copy when data subset is used
      
      * switch shared histogram size according to CUDA version
      
      * gpu_use_dp=true when cuda test
      
      * revert modification in config.h
      
      * fix setting of gpu_use_dp=true in .ci/test.sh
      
      * fix linter errors
      
      * fix linter error
      
      remove useless change
      
      * recover main.cpp
      
      * separate cuda_exp and cuda
      
      * fix ci bash scripts
      
      add description for cuda_exp
      
      * add USE_CUDA_EXP flag
      
      * switch off USE_CUDA_EXP
      
      * revert changes in python-packages
      
      * more careful separation for USE_CUDA_EXP
      
      * fix CUDARowData::DivideCUDAFeatureGroups
      
      fix set fields for cuda metadata
      
      * revert config.h
      
      * fix test settings for cuda experimental version
      
      * skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version
      
      * fix lint issue by adding a blank line
      
      * fix lint errors by resorting imports
      
      * fix lint errors by resorting imports
      
      * fix lint errors by resorting imports
      
      * merge cuda.yml and cuda_exp.yml
      
      * update python version in cuda.yml
      
      * remove cuda_exp.yml
      
      * remove unrelated changes
      
      * fix compilation warnings
      
      fix cuda exp ci task name
      
      * recover task
      
      * use multi-level template in histogram construction
      
      check split only in debug mode
      
      * ignore NVCC related lines in parameter_generator.py
      
      * update job name for CUDA tests
      
      * apply review suggestions
      
      * Update .github/workflows/cuda.yml
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

      * Update .github/workflows/cuda.yml
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * update header
      
      * remove useless TODOs
      
      * remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062
      
      * #include <LightGBM/utils/log.h> for USE_CUDA_EXP only
      
      * fix include order
      
      * fix include order
      
      * remove extra space
      
      * address review comments
      
      * add warning when cuda_exp is used together with deterministic
      
      * add comment about gpu_use_dp in .ci/test.sh
      
      * revert changing order of included headers
      Co-authored-by: Yu Shi <shiyu1994@qq.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
  22. 11 Jan, 2021 1 commit
  23. 26 Oct, 2020 1 commit
  24. 20 Sep, 2020 1 commit
    • [GPU] Add support for CUDA-based GPU build (#3160) · f7ad9457
      Chip Kerchner authored
      
      
      * Initial CUDA work
      
      * redirect log to python console (#3090)
      
      * redir log to python console
      
      * fix pylint
      
      * Apply suggestions from code review
      
      * Update basic.py
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Update c_api.h
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      
      * super-minor: better wording
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
      
      * re-order includes (fixes #3132) (#3133)
      
      * Revert "re-order includes (fixes #3132) (#3133)" (#3153)
      
      This reverts commit 656d2676.
      
      * Missing change from previous rebase
      
      * Minor cleanup and removal of development scripts.
      
      * Only set gpu_use_dp on by default for CUDA. Other minor change.
      
      * Fix python lint indentation problem.
      
      * More python lint issues.
      
      * Big lint cleanup - more to come.
      
      * Another large lint cleanup - more to come.
      
      * Even more lint cleanup.
      
      * Minor cleanup so less differences in code.
      
      * Revert is_use_subset changes
      
      * Another rebase from master to fix recent conflicts.
      
      * More lint.
      
      * Simple code cleanup - add & remove blank lines, revert unnecessary format changes, remove added dead code.
      
      * Removed parameters added for CUDA and various bug fix.
      
      * Yet more lint and unnecessary changes.
      
      * Revert another change.
      
      * Removal of unnecessary code.
      
      * temporary appveyor.yml for building and testing
      
      * Remove return value in ReSize
      
      * Removal of unused variables.
      
      * Code cleanup from reviewers suggestions.
      
      * Removal of FIXME comments and unused defines.
      
      * More reviewers comments cleanup.
      
      * More reviewers comments cleanup.
      
      * More reviewers comments cleanup.
      
      * Fix config variables.
      
      * Attempt to fix check-docs failure
      
      * Update Parameters.rst for num_gpu
      
      * Removing test appveyor.yml
      
      * Add CUDA_RESOLVE_DEVICE_SYMBOLS to libraries to fix linking issue.
      
      * Fixed handling of data elements less than 2K.
      
      * More reviewers comments cleanup.
      
      * Removal of TODO and fix printing of int64_t
      
      * Add cuda change for CI testing and remove cuda from device_type in python.
      
      * Missed one change from previous check-in
      
      * Removal AdditionConfig and fix settings.
      
      * Limit number of GPUs to one for now in CUDA.
      
      * Update Parameters.rst for previous check-in
      
      * Whitespace removal.
      
      * Cleanup unused code.
      
      * Changed uint/ushort/ulong to unsigned int/short/long to help Windows based CUDA compiler work.
      
      * Lint change from previous check-in.
      
      * Changes based on reviewers comments.
      
      * More reviewer comment changes.
      
      * Adding warning for is_sparse. Revert tmp_subset code. Only return FeatureGroupData if not is_multi_val_
      
      * Fix so that CUDA code will compile even if you enable the SCORE_T_USE_DOUBLE define.
      
      * Reviewer comment cleanup.
      
      * Replace warning with Log message. Removal of some of the USE_CUDA. Fix typo and removal of pragma once.
      
      * Remove PRINT debug for CUDA code.
      
      * Allow use of multiple GPUs for CUDA.
      
      * More multi-GPUs enablement for CUDA.
      
      * More code cleanup based on reviews comments.
      
      * Update docs with latest config changes.
      Co-authored-by: Gordon Fossum <fossum@us.ibm.com>
      Co-authored-by: ChipKerchner <ckerchne@linux.vnet.ibm.com>
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
      Co-authored-by: James Lamb <jaylamb20@gmail.com>