  1. 01 Feb, 2023 1 commit
    • [CUDA] consolidate CUDA versions (#5677) · 4f47547c
      James Lamb authored
      
      
      * [ci] speed up if-else, swig, and lint conda setup
      
      * add 'source activate'
      
      * python constraint
      
      * start removing cuda v1
      
      * comment out CI
      
      * remove more references
      
      * revert some unnecessary changes
      
      * revert a few more mistakes
      
      * revert another change that ignored params
      
      * sigh
      
      * remove CUDATreeLearner
      
      * fix tests, docs
      
      * fix quoting in setup.py
      
      * restore all CI
      
      * Apply suggestions from code review
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
      
      * Apply suggestions from code review
      
      * completely remove cuda_exp, update docs
      
      ---------
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
  2. 27 Jan, 2023 1 commit
  3. 12 Jan, 2023 1 commit
  4. 29 Dec, 2022 1 commit
  5. 28 Dec, 2022 1 commit
  6. 02 Dec, 2022 1 commit
    • [ci] Build integrated OpenCL Linux wheels (#5252) · 38a1f582
      Jonathan Giannuzzi authored
      
      
      * Add integrated OpenCL build on Linux
      
      * Build integrated OpenCL Linux wheel in CI
      
      * Fix test_dual.py on Linux arm64
      
      * Enable integrated OpenCL Linux wheel arm64 testing in CI
      
      * Update documentation
      
      * Add comment about gpu_use_dp
      
      * add missing fi dropped in merge conflict resolution
      
      * install opencl-headers on bdist task
      
      * use new CI image for x86_64
      
      * update check_dynamic_dependencies script
      
      * use main CI image
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
  7. 27 Nov, 2022 1 commit
  8. 25 Nov, 2022 1 commit
  9. 03 Nov, 2022 1 commit
  10. 03 Aug, 2022 1 commit
  11. 28 Jul, 2022 1 commit
  12. 16 Jun, 2022 1 commit
  13. 14 Jun, 2022 1 commit
  14. 13 Jun, 2022 1 commit
  15. 23 Mar, 2022 1 commit
    • [CUDA] New CUDA version Part 1 (#4630) · 6b56a90c
      shiyu1994 authored
      
      
      * new cuda framework
      
      * add histogram construction kernel
      
      * before removing multi-gpu
      
      * new cuda framework
      
      * tree learner cuda kernels
      
      * single tree framework ready
      
      * single tree training framework
      
      * remove comments
      
      * boosting with cuda
      
      * optimize for best split find
      
      * data split
      
      * move boosting into cuda
      
      * parallel synchronize best split point
      
      * merge split data kernels
      
      * before code refactor
      
      * use tasks instead of features as units for split finding
      
      * refactor cuda best split finder
      
      * fix configuration error with small leaves in data split
      
      * skip histogram construction of too small leaf
      
      * skip split finding of invalid leaves
      
      stop when no leaf to split
      
      * support row wise with CUDA
      
      * copy data for split by column
      
      * copy data from host to CPU by column for data partition
      
      * add synchronize best splits for one leaf from multiple blocks
      
      * partition dense row data
      
      * fix sync best split from task blocks
      
      * add support for sparse row wise for CUDA
      
      * remove useless code
      
      * add l2 regression objective
      
      * sparse multi value bin enabled for CUDA
      
      * fix cuda ranking objective
      
      * support for number of items <= 2048 per query
      
      * speedup histogram construction by interleaving global memory access
      
      * split optimization
      
      * add cuda tree predictor
      
      * remove comma
      
      * refactor objective and score updater
      
      * before use struct
      
      * use structure for split information
      
      * use structure for leaf splits
      
      * return CUDASplitInfo directly after finding best split
      
      * split with CUDATree directly
      
      * use cuda row data in cuda histogram constructor
      
      * clean src/treelearner/cuda
      
      * gather shared cuda device functions
      
      * put shared CUDA functions into header file
      
      * change smaller leaf from <= back to < for consistent result with CPU
      
      * add tree predictor
      
      * remove useless cuda_tree_predictor
      
      * predict on CUDA with pipeline
      
      * add global sort algorithms
      
      * add global argsort for queries with many items in ranking tasks
      
      * remove limitation of maximum number of items per query in ranking
      
      * add cuda metrics
      
      * fix CUDA AUC
      
      * remove debug code
      
      * add regression metrics
      
      * remove useless file
      
      * don't use mask in shuffle reduce
      
      * add more regression objectives
      
      * fix cuda mape loss
      
      add cuda xentropy loss
      
      * use template for different versions of BitonicArgSortDevice
      
      * add multiclass metrics
      
      * add ndcg metric
      
      * fix cross entropy objectives and metrics
      
      * fix cross entropy and ndcg metrics
      
      * add support for customized objective in CUDA
      
      * complete multiclass ova for CUDA
      
      * separate cuda tree learner
      
      * use shuffle based prefix sum
      
      * clean up cuda_algorithms.hpp
      
      * add copy subset on CUDA
      
      * add bagging for CUDA
      
      * clean up code
      
      * copy gradients from host to device
      
      * support bagging without using subset
      
      * add support of bagging with subset for CUDAColumnData
      
      * add support of bagging with subset for dense CUDARowData
      
      * refactor copy sparse subrow
      
      * use copy subset for column subset
      
      * add reset train data and reset config for CUDA tree learner
      
      add destructors for cuda tree learner
      
      * add USE_CUDA ifdef to cuda tree learner files
      
      * check that dataset doesn't contain CUDA tree learner
      
      * remove printf debug information
      
      * use full new cuda tree learner only when using single GPU
      
      * disable all CUDA code when using CPU version
      
      * recover main.cpp
      
      * add cpp files for multi value bins
      
      * update LightGBM.vcxproj
      
      * update LightGBM.vcxproj
      
      fix lint errors
      
      * fix lint errors
      
      * fix lint errors
      
      * update Makevars
      
      fix lint errors
      
      * fix the case with 0 feature and 0 bin
      
      fix split finding for invalid leaves
      
      create cuda column data when loaded from bin file
      
      * fix lint errors
      
      hide GetRowWiseData when cuda is not used
      
      * recover default device type to cpu
      
      * fix na_as_missing case
      
      fix cuda feature meta information
      
      * fix UpdateDataIndexToLeafIndexKernel
      
      * create CUDA trees when needed in CUDADataPartition::UpdateTrainScore
      
      * add refit by tree for cuda tree learner
      
      * fix test_refit in test_engine.py
      
      * create set of large bin partitions in CUDARowData
      
      * add histogram construction for columns with a large number of bins
      
      * add find best split for categorical features on CUDA
      
      * add bitvectors for categorical split
      
      * cuda data partition split for categorical features
      
      * fix split tree with categorical feature
      
      * fix categorical feature splits
      
      * refactor cuda_data_partition.cu with multi-level templates
      
      * refactor CUDABestSplitFinder by grouping task information into struct
      
      * pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder
      
      * fix misuse of reference
      
      * remove useless changes
      
      * add support for path smoothing
      
      * virtual destructor for LightGBM::Tree
      
      * fix overlapped cat threshold in best split infos
      
      * reset histogram pointers in data partition and split finder in ResetConfig
      
      * comment useless parameter
      
      * fix reverse case when na is missing and default bin is zero
      
      * fix mfb_is_na and mfb_is_zero and is_single_feature_column
      
      * remove debug log
      
      * fix cat_l2 when one-hot
      
      fix gradient copy when data subset is used
      
      * switch shared histogram size according to CUDA version
      
      * gpu_use_dp=true when cuda test
      
      * revert modification in config.h
      
      * fix setting of gpu_use_dp=true in .ci/test.sh
      
      * fix linter errors
      
      * fix linter error
      
      remove useless change
      
      * recover main.cpp
      
      * separate cuda_exp and cuda
      
      * fix ci bash scripts
      
      add description for cuda_exp
      
      * add USE_CUDA_EXP flag
      
      * switch off USE_CUDA_EXP
      
      * revert changes in python-packages
      
      * more careful separation for USE_CUDA_EXP
      
      * fix CUDARowData::DivideCUDAFeatureGroups
      
      fix set fields for cuda metadata
      
      * revert config.h
      
      * fix test settings for cuda experimental version
      
      * skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version
      
      * fix lint issue by adding a blank line
      
      * fix lint errors by resorting imports
      
      * fix lint errors by resorting imports
      
      * fix lint errors by resorting imports
      
      * merge cuda.yml and cuda_exp.yml
      
      * update python version in cuda.yml
      
      * remove cuda_exp.yml
      
      * remove unrelated changes
      
      * fix compilation warnings
      
      fix cuda exp ci task name
      
      * recover task
      
      * use multi-level template in histogram construction
      
      check split only in debug mode
      
      * ignore NVCC related lines in parameter_generator.py
      
      * update job name for CUDA tests
      
      * apply review suggestions
      
      * Update .github/workflows/cuda.yml
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Update .github/workflows/cuda.yml
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * update header
      
      * remove useless TODOs
      
      * remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062
      
      * #include <LightGBM/utils/log.h> for USE_CUDA_EXP only
      
      * fix include order
      
      * fix include order
      
      * remove extra space
      
      * address review comments
      
      * add warning when cuda_exp is used together with deterministic
      
      * add comment about gpu_use_dp in .ci/test.sh
      
      * revert changing order of included headers
      Co-authored-by: Yu Shi <shiyu1994@qq.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
  16. 19 Feb, 2022 1 commit
  17. 11 Feb, 2022 1 commit
    • [ci] use conda-forge in Linux and macOS CI jobs (#4953) · 3500cb67
      James Lamb authored
      
      
      * [ci] use conda-forge in CI jobs (fixes #4948)
      
      * comment out more jobs
      
      * try reverting graphviz patch, running more cuda jobs
      
      * get graphviz from PyPI and try removing some patches for r-lintr
      
      * start running appveyor again
      
      * use conda-forge if using conda
      
      * fix commands
      
      * conda install graphviz
      
      * try newer openmp
      
      * pin below openmp 11.x
      
      * focus on gpu task
      
      * trying to narrow down error
      
      * maybe gcc11 is the issue
      
      * start adding other tests back
      
      * pin openmp too
      
      * maybe need to pin to gcc less than 10.x
      
      * pin libgfortran and libstdcxx as well
      
      * pin to gcc 9.3.0
      
      * move constraints up to initial environment
      
      * add all CI jobs back
      
      * try installing python-graphviz separately
      
      * try new lightgbm/vsts-agent image
      
      * fix typo
      
      * test if pinning gcc for linux gpu_source build is still necessary
      
      * ok yes, pinning gcc is necessary
      
      * test if Linux gpu_source works with Python 3.9.6
      
      * no special exception for Linux gpu_source job
      
      * pin to Python 3.9.6 in Linux gpu_source
      
      * try explicitly asking for libstdcxx-ng for every linux build
      
      * swap compilers
      
      * switch compilers back
      
      * revert accidental whitespace change
      
      * comment out CI
      
      * try Linux gpu_source with different Python versions
      
      * Revert "try Linux gpu_source with different Python versions"
      
      This reverts commit f6f63cbb9b4a9cf138f3580ae4223a8acdd0e94a.
      
      * Revert "comment out CI"
      
      This reverts commit ece191f01e3650c2f325e80ff86bfc8c485fb7bc.
      
      * remove libxml2 install, change CONDA path
      
      * avoid installing conda in rchk job
      
      * empty commit 1
      
      * empty commit 2
      
      * empty commit 3
      
      * empty commit 4
      
      * add more verbose logging around installation of python-graphviz
      
      * empty commit 1
      
      * get mamba info
      
      * get more conda info
      
      * add another mamba info call
      
      * allow for other macOS environments in GHA configuration
      
      * Revert "allow for other macOS environments in GHA configuration"
      
      This reverts commit a3c7a19926be94e3719f5ae9100fbe30e87b35da.
      
      * get more logs from mamba
      
      * get Build.ArtifactsStagingDirectory
      
      * get more logs and try to force re-installing everything
      
      * clean cache after every step
      
      * remove --update-all and make logs less verbose
      
      * remove more print statements and uncomment jobs
      
      * test if conda-clean issue fixes segfaults for gpu_source
      
      * pin python version for gpu_source
      
      * empty commit 1
      
      * use miniforge instead
      
      * empty commit 1
      
      * Apply suggestions from code review
      
      * bring workarounds back
      
      * remove duplicated graphviz system-wide installation (reverts #4095, #4097, #4238)
      
      * empty commit 1
      
      * empty commit 2
      
      * empty commit 3
      
      * empty commit 4
      
      * empty commit 5
      
      * empty commit 6
      
      * empty commit 7
      
      * empty commit 8
      
      * empty commit 9
      
      * empty commit 10
      
      * empty commit 10
      
      * empty commit 10
      
      * empty commit 10
      
      * empty commit 11
      
      * one more try
      
      * try to downgrade Python version for Linux GPU job
      
      * swap compilers
      
      * Revert "swap compilers"
      
      This reverts commit f04dc27b17920a69cbcba1254a8e109ce9791154.
      Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
  18. 18 Dec, 2021 1 commit
  19. 04 Dec, 2021 1 commit
  20. 14 Nov, 2021 2 commits
  21. 05 Nov, 2021 1 commit
  22. 30 Oct, 2021 1 commit
  23. 05 Oct, 2021 1 commit
  24. 22 Sep, 2021 1 commit
  25. 14 Aug, 2021 1 commit
  26. 10 Jul, 2021 1 commit
  27. 08 Jul, 2021 1 commit
  28. 04 Jul, 2021 1 commit
  29. 02 Jul, 2021 1 commit
    • [python-package] Create Dataset from multiple data files (#4089) · c359896e
      Chen Yufei authored
      * [python-package] create Dataset from sampled data.
      
      * [python-package] create Dataset from List[Sequence].
      
      1. Use random access for data sampling
      2. Support read data from multiple input files
      3. Read data in batch so no need to hold all data in memory
      
      * [python-package] example: create Dataset from multiple HDF5 file.
      
      * fix: revert is_class implementation for seq
      
      * fix: unwanted memory view reference for seq
      
      * fix: seq is_class accepts sklearn matrices
      
      * fix: requirements for example
      
      * fix: pycode
      
      * feat: print static code linting stage
      
      * fix: linting: avoid shell str regex conversion
      
      * code style: doc style
      
      * code style: isort
      
      * fix ci dependency: h5py on windows
      
      * [py] remove rm files in test seq
      https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623
      
      * docs(python): init_from_sample summary
      
      https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389
      
      
      
      * remove dataset dump sample data debugging code.
      
      * remove typo fix.
      
      Create separate PR for this.
      
      * fix typo in src/c_api.cpp
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * style(linting): py3 type hint for seq
      
      * test(basic): os.path style path handling
      
      * Revert "feat: print static code linting stage"
      
      This reverts commit 10bd79f7f8258bea8e61c3abb8c9c7e4456a916d.
      
      * feat(python): sequence on validation set
      
      * minor(python): comment
      
      * minor(python): test option hint
      
      * style(python): fix code linting
      
      * style(python): add pydoc for ref_dataset
      
      * doc(python): sequence
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
      
      * revert(python): sequence class abc
      
      * chore(python): remove rm_files
      
      * Remove useless static_assert.
      
      * refactor: test_basic test for sequence.
      
      * fix lint complaint.
      
      * remove dataset._dump_text in sequence test.
      
      * Fix reverting typo fix.
      
      * Apply suggestions from code review
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Fix type hint, code and doc style.
      
      * fix failing test_basic.
      
      * Remove TODO about keep constant in sync with cpp.
      
      * Install h5py only when running python-examples.
      
      * Fix lint complaint.
      
      * Apply suggestions from code review
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Doc fixes, remove unused params_str in __init_from_seqs.
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Remove unnecessary conda install in windows ci script.
      
      * Keep param as example in dataset_from_multi_hdf5.py
      
      * Add _get_sample_count function to remove code duplication.
      
      * Use batch_size parameter in generate_hdf.
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Fix after applying suggestions.
      
      * Fix test, check idx is instance of numbers.Integral.
      
      * Update python-package/lightgbm/basic.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Expose Sequence class in Python-API doc.
      
      * Handle Sequence object not having batch_size.
      
      * Fix isort lint complaint.
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Update docstring to mention Sequence as data input.
      
      * Remove get_one_line in test_basic.py
      
      * Make Sequence an abstract class.
      
      * Reduce number of tests for test_sequence.
      
      * Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices.
      
      * empty commit to trigger ci
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.
      
      Also rename total_nrow to num_total_row in c_api.h for consistency.
      
      * Doc about Sequence in docs/Python-Intro.rst.
      
      * Fix: basic.py change LGBM_SampleIndices out_len to int32.
      
      * Add create_valid test case with Dataset from Sequence.
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Apply suggestions from code review
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
      
      * Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.
      
      * Update python-package/lightgbm/basic.py
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: Willian Zhang <willian@willian.email>
      Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
  30. 26 Jun, 2021 1 commit
  31. 14 May, 2021 1 commit
  32. 29 Apr, 2021 1 commit
    • [ci] Install graphviz system-widely (#4238) · 91f72e2a
      Nikita Titov authored
      * Install graphviz from default conda channel
      
      * Update test.sh
      
      * Update setup.sh
      
      * Update test.sh
      
      * Update setup.sh
      
      * Update setup.sh
      
      * Update setup.sh
      
      * Update setup.sh
      
      * Update setup.sh
      
      * Update setup.sh
  33. 16 Apr, 2021 1 commit
  34. 28 Mar, 2021 1 commit
  35. 23 Mar, 2021 1 commit
  36. 21 Mar, 2021 1 commit
    • [SWIG] Add streaming data support + cpp tests (#3997) · 4ded1342
      Alberto Ferreira authored
      * [feature] Add ChunkedArray to SWIG
      
      * Add ChunkedArray
      * Add ChunkedArray_API_extensions.i
      * Add SWIG class wrappers
      
      * Address some review comments
      
      * Fix linting issues
      
      * Move test to tests/test_ChunkedArray_manually.cpp
      
      * Add test note
      
      * Move ChunkedArray to include/LightGBM/utils/
      
      * Declare more explicit types of ChunkedArray in the SWIG API.
      
      * Port ChunkedArray tests to googletest
      
      * Please C++ linter
      
      * Address StrikerRUS' review comments
      
      * Update SWIG doc & disable ChunkedArray<int64_t>
      
      * Use CHECK_EQ instead of assert
      
      * Change include order (linting)
      
      * Rename ChunkedArray -> chunked_array files
      
      * Change header guards
      
      * Address last comments from StrikerRUS
  37. 16 Mar, 2021 1 commit
  38. 15 Mar, 2021 1 commit
  39. 09 Mar, 2021 1 commit