Commits · 5f57d6c673ee5f4c10e408868f98762edb062204 · tianlh / LightGBM-DCU

10 Apr, 2022 1 commit

[docs] Document behaviour of the first linear estimator (#5132) · 5f57d6c6

Pablo Dávila Herrero authored Apr 10, 2022



* Document behaviour of the first linear estimator

* Properly update docs
Co-authored-by: Pablo-Davila <Pablo-Davila@users.noreply.github.com>

5f57d6c6

26 Mar, 2022 1 commit
- Load initial scores with binary data files in CLI version (#4807) · 17d4e007
  shiyu1994 authored Mar 27, 2022
  
  17d4e007
23 Mar, 2022 1 commit

[CUDA] New CUDA version Part 1 (#4630) · 6b56a90c

shiyu1994 authored Mar 23, 2022



* new cuda framework

* add histogram construction kernel

* before removing multi-gpu

* new cuda framework

* tree learner cuda kernels

* single tree framework ready

* single tree training framework

* remove comments

* boosting with cuda

* optimize for best split find

* data split

* move boosting into cuda

* parallel synchronize best split point

* merge split data kernels

* before code refactor

* use tasks instead of features as units for split finding

* refactor cuda best split finder

* fix configuration error with small leaves in data split

* skip histogram construction of too small leaf

* skip split finding of invalid leaves

stop when no leaf to split

* support row wise with CUDA

* copy data for split by column

* copy data from host to CPU by column for data partition

* add synchronize best splits for one leaf from multiple blocks

* partition dense row data

* fix sync best split from task blocks

* add support for sparse row wise for CUDA

* remove useless code

* add l2 regression objective

* sparse multi value bin enabled for CUDA

* fix cuda ranking objective

* support for number of items <= 2048 per query

* speedup histogram construction by interleaving global memory access

* split optimization

* add cuda tree predictor

* remove comma

* refactor objective and score updater

* before use struct

* use structure for split information

* use structure for leaf splits

* return CUDASplitInfo directly after finding best split

* split with CUDATree directly

* use cuda row data in cuda histogram constructor

* clean src/treelearner/cuda

* gather shared cuda device functions

* put shared CUDA functions into header file

* change smaller leaf from <= back to < for consistent result with CPU

* add tree predictor

* remove useless cuda_tree_predictor

* predict on CUDA with pipeline

* add global sort algorithms

* add global argsort for queries with many items in ranking tasks

* remove limitation of maximum number of items per query in ranking

* add cuda metrics

* fix CUDA AUC

* remove debug code

* add regression metrics

* remove useless file

* don't use mask in shuffle reduce

* add more regression objectives

* fix cuda mape loss

add cuda xentropy loss

* use template for different versions of BitonicArgSortDevice

* add multiclass metrics

* add ndcg metric

* fix cross entropy objectives and metrics

* fix cross entropy and ndcg metrics

* add support for customized objective in CUDA

* complete multiclass ova for CUDA

* separate cuda tree learner

* use shuffle based prefix sum

* clean up cuda_algorithms.hpp

* add copy subset on CUDA

* add bagging for CUDA

* clean up code

* copy gradients from host to device

* support bagging without using subset

* add support of bagging with subset for CUDAColumnData

* add support of bagging with subset for dense CUDARowData

* refactor copy sparse subrow

* use copy subset for column subset

* add reset train data and reset config for CUDA tree learner

add destructors for cuda tree learner

* add USE_CUDA ifdef to cuda tree learner files

* check that dataset doesn't contain CUDA tree learner

* remove printf debug information

* use full new cuda tree learner only when using single GPU

* disable all CUDA code when using CPU version

* recover main.cpp

* add cpp files for multi value bins

* update LightGBM.vcxproj

* update LightGBM.vcxproj

fix lint errors

* fix lint errors

* fix lint errors

* update Makevars

fix lint errors

* fix the case with 0 feature and 0 bin

fix split finding for invalid leaves

create cuda column data when loaded from bin file

* fix lint errors

hide GetRowWiseData when cuda is not used

* recover default device type to cpu

* fix na_as_missing case

fix cuda feature meta information

* fix UpdateDataIndexToLeafIndexKernel

* create CUDA trees when needed in CUDADataPartition::UpdateTrainScore

* add refit by tree for cuda tree learner

* fix test_refit in test_engine.py

* create set of large bin partitions in CUDARowData

* add histogram construction for columns with a large number of bins

* add find best split for categorical features on CUDA

* add bitvectors for categorical split

* cuda data partition split for categorical features

* fix split tree with categorical feature

* fix categorical feature splits

* refactor cuda_data_partition.cu with multi-level templates

* refactor CUDABestSplitFinder by grouping task information into struct

* pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder

* fix misuse of reference

* remove useless changes

* add support for path smoothing

* virtual destructor for LightGBM::Tree

* fix overlapped cat threshold in best split infos

* reset histogram pointers in data partition and split finder in ResetConfig

* comment useless parameter

* fix reverse case when na is missing and default bin is zero

* fix mfb_is_na and mfb_is_zero and is_single_feature_column

* remove debug log

* fix cat_l2 when one-hot

fix gradient copy when data subset is used

* switch shared histogram size according to CUDA version

* gpu_use_dp=true when cuda test

* revert modification in config.h

* fix setting of gpu_use_dp=true in .ci/test.sh

* fix linter errors

* fix linter error

remove useless change

* recover main.cpp

* separate cuda_exp and cuda

* fix ci bash scripts

add description for cuda_exp

* add USE_CUDA_EXP flag

* switch off USE_CUDA_EXP

* revert changes in python-packages

* more careful separation for USE_CUDA_EXP

* fix CUDARowData::DivideCUDAFeatureGroups

fix set fields for cuda metadata

* revert config.h

* fix test settings for cuda experimental version

* skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version

* fix lint issue by adding a blank line

* fix lint errors by resorting imports

* fix lint errors by resorting imports

* fix lint errors by resorting imports

* merge cuda.yml and cuda_exp.yml

* update python version in cuda.yml

* remove cuda_exp.yml

* remove unrelated changes

* fix compilation warnings

fix cuda exp ci task name

* recover task

* use multi-level template in histogram construction

check split only in debug mode

* ignore NVCC related lines in parameter_generator.py

* update job name for CUDA tests

* apply review suggestions

* Update .github/workflows/cuda.yml
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update .github/workflows/cuda.yml
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update header

* remove useless TODOs

* remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062

* #include <LightGBM/utils/log.h> for USE_CUDA_EXP only

* fix include order

* fix include order

* remove extra space

* address review comments

* add warning when cuda_exp is used together with deterministic

* add comment about gpu_use_dp in .ci/test.sh

* revert changing order of included headers
Co-authored-by: Yu Shi <shiyu1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

6b56a90c

23 Feb, 2022 1 commit

[Docs] Weights non-negative for train data (#5013) · 6ced58ad

Miguel Trejo Marrufo authored Feb 22, 2022

* docs: weight parameter non-negative

* docs: weights non negative only for train data

* docs: weights should be non negative for validation data

* typo in html render

* docs: brief weights non-negative description

6ced58ad

20 Feb, 2022 1 commit

[docs] clarify that categorical features will be converted to integers internally (#4959) · 820ae7e6

José Morales authored Feb 20, 2022

* clarify that categoricals will be converted to ints and not that they should be ints in the input data

* update remaining sections

* update config.h

* add suggestions

820ae7e6

14 Feb, 2022 1 commit
- document rounding behavior of floating point numbers in categorical features · 2d1caf14
  Yu Shi authored Feb 14, 2022
  
  2d1caf14
29 Nov, 2021 1 commit
- [docs] document that `pred_early_stop` can be used only in normal and raw scores prediction (#4823) · 67b4205c
  Nikita Titov authored Nov 29, 2021
  
  67b4205c
16 Nov, 2021 1 commit

Add customized parser support (#4782) · b0137deb

chjinche authored Nov 16, 2021

* add customized parser support

* fix typo of parser_config_file description

* make delimiter as parameter of JoinedLines

b0137deb

11 Nov, 2021 1 commit

Add 'nrounds' as an alias for 'num_iterations' (fixes #4743) (#4746) · 3b6ebd79

Michael Mahoney authored Nov 10, 2021

* Add 'nrounds' as an alias for 'num_iterations'

* Improve tests

* Compare against nrounds directly

* Fix whitespace lints

3b6ebd79
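
As a rough illustration of what a parameter alias such as `nrounds` for `num_iterations` (#4746) means for callers, the sketch below maps an alias onto its canonical name before the parameters are used. The alias table, the `resolve_aliases` helper, and the rule that the canonical name wins when both are given are assumptions made for this example; they are not taken from the LightGBM source.

```python
# Hypothetical alias table. Only this one pair comes from the commit title;
# the real library defines many more aliases.
ALIASES = {"nrounds": "num_iterations"}


def resolve_aliases(params):
    """Return a copy of params keyed by canonical parameter names.

    Assumed policy for this sketch: if both an alias and its canonical
    name are supplied, the canonical name's value is kept.
    """
    resolved = {}
    # First pass: copy every key that is not an alias (canonical names included).
    for key, value in params.items():
        if key not in ALIASES:
            resolved[key] = value
    # Second pass: map aliases onto canonical names that are still unset.
    for key, value in params.items():
        canonical = ALIASES.get(key)
        if canonical is not None and canonical not in resolved:
            resolved[canonical] = value
    return resolved


only_alias = resolve_aliases({"nrounds": 50, "learning_rate": 0.1})
both_given = resolve_aliases({"nrounds": 50, "num_iterations": 10})
```

Under these assumptions, `only_alias` carries `num_iterations=50`, while `both_given` keeps the explicitly supplied `num_iterations=10`.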

30 Oct, 2021 1 commit
- [docs] improve docs about `nthreads` parameter (#4756) · dac0dffe
  Nikita Titov authored Oct 31, 2021
```
* in predict(), respect params set via `set_params()` after fit()

* extract docs changes
```
  dac0dffe
25 Oct, 2021 1 commit
- Fix some parameter hints when loading from binary file (#4701) · dc02dcaf
  Zhiyuan He authored Oct 25, 2021
```
Co-authored-by: hzy46 <email@example.com>
```
  dc02dcaf
05 Oct, 2021 1 commit
- add param aliases from scikit-learn (#4637) · e95d5ab8
  Nikita Titov authored Oct 05, 2021
  
  e95d5ab8
25 Jul, 2021 1 commit
- [docs] document CLI behavior when label_column is omitted (#4485) · fdc582ea
  James Lamb authored Jul 24, 2021
  
  fdc582ea
09 Jul, 2021 1 commit
- [docs] clarify description of prediction early stopping (#4411) · 0d1d12fb
  Nikita Titov authored Jul 09, 2021
  
  0d1d12fb
26 Jun, 2021 1 commit
- fix param aliases (#4387) · aab8fc18
  Nikita Titov authored Jun 26, 2021
  
  aab8fc18
09 Jun, 2021 1 commit

[docs] document how to pass multi-value params from Python and R (fixes #4345) (#4346) · 24ac9208

James Lamb authored Jun 09, 2021



* [R-package] add docs and tests on monotone constraints (fixes #4345)

* remove tests

* move doc to top level

* slightly more specific

* Update docs/Parameters.rst
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

24ac9208

07 May, 2021 1 commit

Precise text file parsing (#4081) · f8318088

Chen Yufei authored May 07, 2021



* New build option: USE_PRECISE_TEXT_PARSER.

Use fast_double_parser for text file parsing. For each number, fall back
to strtod in case of parse failure.

* Add benchmark for CSVParser with Atof and AtofPrecise.

* Fix lint complaint.

* Fix typo in open result error message.

* Revert "Fix lint complaint."

This reverts commit 92ab0b6bce9f17d7be9eaeb20f19d4a0a36f0387.

* Revert "Add benchmark for CSVParser with Atof and AtofPrecise."

This reverts commit 4f8639abd06c679d4382eb715a1793afd94df3d2.

* Use AtofPrecise in Common::__StringToTHelper.

* [option] precise_float_parser: precise float number parsing for text input.

* Remove USE_PRECISE_TEXT_PARSER compile option.

* test: add test for Common::AtofPrecise.

* test: remove ChunkedArrayTest with 0 length.

This triggers Log::Fatal which aborts the test program.

* fix lint, add copyright.

* Revert "test: remove ChunkedArrayTest with 0 length."

This reverts commit 346c76affe9e78b6ca2738c4a56dbb9c00f31102.

* Use LightGBM::Common::Sign

* save precise_float_parser in model file.

* Fix error checking in AtofPrecise. Add more test cases.

* Remove test case that can't pass under macOS.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

f8318088
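
The strategy described in the #4081 commit message (try a fast parser first, fall back to a slower exact routine when it cannot handle the input) can be sketched in Python. This is not the fast_double_parser algorithm and not LightGBM's `AtofPrecise`. The fast path here is a simplified stand-in that only accepts plain decimals whose digits fit exactly in a double, and Python's built-in `float()` plays the role of `strtod`. The names `fast_parse` and `parse_precise` are hypothetical.

```python
import re

# Plain decimals only: optional sign, digits, optional fractional digits.
_SIMPLE_DECIMAL = re.compile(r"([+-]?)(\d+)(?:\.(\d+))?$")


def fast_parse(text):
    """Return an exactly rounded float, or None if the fast path does not apply."""
    match = _SIMPLE_DECIMAL.match(text)
    if match is None:
        return None  # exponents, inf, nan, etc. are left to the fallback
    sign, int_part, frac_part = match.group(1), match.group(2), match.group(3) or ""
    mantissa = int(int_part + frac_part)
    # Both operands must be exactly representable as doubles so that a single
    # division yields the correctly rounded result.
    if mantissa > 2 ** 53 or len(frac_part) > 22:
        return None
    value = float(mantissa) / float(10 ** len(frac_part))
    return -value if sign == "-" else value


def parse_precise(text):
    """Try the fast path; fall back to float() (the strtod stand-in) on failure."""
    value = fast_parse(text)
    return float(text) if value is None else value


simple = parse_precise("0.1")    # handled by the fast path
scientific = parse_precise("1e-3")  # fast path declines, fallback parses it
```

The design point is that the fast path may refuse an input but must never return an inexact result, so the fallback only runs for the inputs the fast path cannot guarantee.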

23 Apr, 2021 1 commit
- added aliases to params (#4205) · 8b477ba3
  Nikita Titov authored Apr 23, 2021
  
  8b477ba3
28 Mar, 2021 1 commit
- [docs] add missed CUDA device type in docs (#4130) · 9cab93a9
  Nikita Titov authored Mar 28, 2021
  
  9cab93a9
04 Mar, 2021 1 commit

[docs] update description of deterministic parameter (#4027) · 19f35772

shiyu1994 authored Mar 04, 2021



* update description of deterministic parameter to require using with force_row_wise or force_col_wise

* Update include/LightGBM/config.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update docs
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

19f35772

23 Feb, 2021 2 commits

[dask] allow tight control over ports (#3994) · 1f73f559

James Lamb authored Feb 23, 2021



* [dask] allow tight control over ports

* getting there, getting there

* fix params maybe

* fixing params

* remove unnecessary stuff

* fix tests

* fixes

* some minor changes

* fix flaky test

* linting

* more linting

* clarify parameter description

* add warning

* revert docs change

* Update python-package/lightgbm/dask.py

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* trying to fix stuff

* this is working

* update tests

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* indent
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

1f73f559

[DOCS] Update docs to note that pred_contrib is not available for linear trees (#4006) · b09c1ff7
Belinda Trotta authored Feb 24, 2021
```
* Update docs to note that pred_contrib is not available for linear trees

* Add warning in code

* Change warning to error
```
b09c1ff7

19 Feb, 2021 1 commit
- [docs] Change some 'parallel learning' references to 'distributed learning' (#4000) · 7880b79f
  James Lamb authored Feb 19, 2021
```
* [docs] Change some 'parallel learning' references to 'distributed learning'

* found a few more

* one more reference
```
  7880b79f
13 Feb, 2021 1 commit
- [docs] Update docs about linear tree and monotone constraints (#3945) · 50e061f3
  Belinda Trotta authored Feb 14, 2021
```
* Update docs about linear tree and monotone constraints

* Fix punctuation
```
  50e061f3
03 Feb, 2021 1 commit
- Add new task type: "save_binary" (#3651) · 111d0c80
  Chen Yufei authored Feb 03, 2021
```
* Add new task type: "save_binary".

* Document for task "save_binary".
```
  111d0c80
31 Jan, 2021 1 commit

[docs] document CUDA version support (#3428) · 8040ef94

Nikita Titov authored Jan 31, 2021

* document CUDA version support

* address review comments

* collapse CUDA section in the guide

* remove Clang support from CUDA docs as we have never tested it

8040ef94

28 Jan, 2021 1 commit
- fix docs for machine_list_filename param (#3863) · 8ef874bd
  Nikita Titov authored Jan 28, 2021
  
  8ef874bd
19 Jan, 2021 1 commit
- [docs] fix current RTD failures (#3787) · 78778085
  Nikita Titov authored Jan 19, 2021
```
* fix docs

* Update basic.py

* Update engine.py
```
  78778085
18 Jan, 2021 2 commits

[docs] expand documentation on 'group' for ranking task (#3772) · 0e5eb9e3

James Lamb authored Jan 18, 2021



* [python-package] expand documentation on 'group' for ranking task

* add R package

* update Query Data section

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* fix typo in group example

* regenerate parameters

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* regenerate R docs
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

0e5eb9e3

[R-package] enable use of trees with linear models at leaves (fixes #3319) (#3699) · ed651e86

James Lamb authored Jan 18, 2021

* [R-package] enable use of trees with linear models at leaves (fixes #3319)

* remove problematic pragmas

* fix tests

* try to fix build scripts

* try fixing pragma check

* more pragma checks

* ok fix pragma stuff for real

* empty commit

* regenerate documentation

* try skipping test

* uncomment CI

* add note on missing value types for R

* add tests on saving and re-loading booster

ed651e86

29 Dec, 2020 1 commit

[docs] add doc on min_data_in_leaf approximation (fixes #3634) (#3690) · 68a40c79

James Lamb authored Dec 29, 2020



* [docs] add doc on min_data_in_leaf approximation (fixes #3634)

* Fix capital letter
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

68a40c79

28 Dec, 2020 1 commit

small code and docs refactoring (#3681) · 5a460846

Nikita Titov authored Dec 29, 2020

* small code and docs refactoring

* Update CMakeLists.txt

* Update .vsts-ci.yml

* Update test.sh

* continue

* continue

* revert stable sort for all-unique values

5a460846

24 Dec, 2020 1 commit

Trees with linear models at leaves (#3299) · fcfd4132

Belinda Trotta authored Dec 24, 2020

* Add Eigen library.

* Working for simple test.

* Apply changes to config params.

* Handle nan data.

* Update docs.

* Add test.

* Only load raw data if boosting=gbdt_linear

* Remove unneeded code.

* Minor updates.

* Update to work with sk-learn interface.

* Update to work with chunked datasets.

* Throw error if we try to create a Booster with an already-constructed dataset having incompatible parameters.

* Save raw data in binary dataset file.

* Update docs and fix parameter checking.

* Fix dataset loading.

* Add test for regularization.

* Fix bugs when saving and loading tree.

* Add test for load/save linear model.

* Remove unneeded code.

* Fix case where not enough leaf data for linear model.

* Simplify code.

* Speed up code.

* Speed up code.

* Simplify code.

* Speed up code.

* Fix bugs.

* Working version.

* Store feature data column-wise (not fully working yet).

* Fix bugs.

* Speed up.

* Speed up.

* Remove unneeded code.

* Small speedup.

* Speed up.

* Minor updates.

* Remove unneeded code.

* Fix bug.

* Fix bug.

* Speed up.

* Speed up.

* Simplify code.

* Remove unneeded code.

* Fix bug, add more tests.

* Fix bug and add test.

* Only store numerical features

* Fix bug and speed up using templates.

* Speed up prediction.

* Fix bug with regularisation

* Visual studio files.

* Working version

* Only check nans if necessary

* Store coeff matrix as an array.

* Align cache lines

* Align cache lines

* Preallocate coefficient calculation matrices

* Small speedups

* Small speedup

* Reverse cache alignment changes

* Change to dynamic schedule

* Update docs.

* Refactor so that linear tree learner is not a separate class.

* Add refit capability.

* Speed up

* Small speedups.

* Speed up add prediction to score.

* Fix bug

* Fix bug and speed up.

* Speed up dataload.

* Speed up dataload

* Use vectors instead of pointers

* Fix bug

* Add OMP exception handling.

* Change return type of LGBM_BoosterGetLinear to bool

* Change return type of LGBM_BoosterGetLinear back to int, only parameter type needed to change

* Remove unused internal_parent_ property of tree

* Remove unused parameter to CreateTreeLearner

* Remove reference to LinearTreeLearner

* Minor style issues

* Remove unneeded check

* Reverse temporary testing change

* Fix Visual Studio project files

* Restore LightGBM.vcxproj.filters

* Speed up

* Speed up

* Simplify code

* Update docs

* Simplify code

* Initialise storage space for max num threads

* Move Eigen to include directory and delete unused files

* Remove old files.

* Fix so it compiles with mingw

* Fix gpu tree learner

* Change AddPredictionToScore back to const

* Fix python lint error

* Fix C++ lint errors

* Change eigen to a submodule

* Update comment

* Add the eigen folder

* Try to fix build issues with eigen

* Remove eigen files

* Add eigen as submodule

* Fix include paths

* Exclude eigen files from Python linter

* Ignore eigen folders for pydocstyle

* Fix C++ linting errors

* Fix docs

* Fix docs

* Exclude eigen directories from doxygen

* Update manifest to include eigen

* Update build_r to include eigen files

* Fix compiler warnings

* Store raw feature data as float

* Use float for calculating linear coefficients

* Remove eigen directory from GLOB

* Don't compile linear model code when building R package

* Fix doxygen issue

* Fix lint issue

* Fix lint issue

* Remove unneeded code

* Restore deleted lines

* Restore deleted lines

* Change return type of has_raw to bool

* Update docs

* Rename some variables and functions for readability

* Make tree_learner parameter const in AddScore

* Fix style issues

* Pass vectors as const reference when setting tree properties

* Make temporary storage of serial_tree_learner mutable so we can make the object's methods const

* Remove get_raw_size, use num_numeric_features instead

* Fix typo

* Make contains_nan_ and any_nan_ properties immutable again

* Remove data_has_nan_ property of tree

* Remove temporary test code

* Make linear_tree a dataset param

* Fix lint error

* Make LinearTreeLearner a separate class

* Fix lint errors

* Fix lint error

* Add linear_tree_learner.o

* Simulate omp_get_max_threads if openmp is not available

* Update PushOneData to also store raw data.

* Cast size to int

* Fix bug in ReshapeRaw

* Speed up code with multithreading

* Use OMP_NUM_THREADS

* Speed up with multithreading

* Update to use ArrayToString

* Fix tests

* Fix test

* Fix bug introduced in merge

* Minor updates

* Update docs

fcfd4132

11 Dec, 2020 1 commit

[docs] Add details on improving training speed (#3628) · b69364e9

James Lamb authored Dec 11, 2020



* [docs] Add details to docs on improving training speed

* formatting

* fix link

* fix formatting

* replace 'performance' with 'accuracy' and mention learning_rate

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* regenerate docs from config.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

b69364e9

06 Nov, 2020 1 commit

better document for bin_construct_sample_cnt (#3521) · bee732af

Guolin Ke authored Nov 06, 2020



* better document for bin_construct_sample_cnt

* add warnings
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>

bee732af

01 Nov, 2020 1 commit

Support deterministic (#3494) · c39afb9d

Guolin Ke authored Nov 01, 2020



* implement

* fix compilation

* Update config.cpp

* unify wordings
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>

c39afb9d

28 Oct, 2020 1 commit
- fix param docs (#3495) · 5cc9e671
  Nikita Titov authored Oct 28, 2020
  
  5cc9e671
27 Oct, 2020 1 commit

Add support to optimize for NDCG at a given truncation level (#3425) · ba0a1f8d

Pavel Metrikov authored Oct 27, 2020



* Add support to optimize for NDCG at a given truncation level

To optimize correctly for NDCG@_k_, exclude pairs in which both documents rank beyond the top-_k_, since swapping such a pair does not change NDCG@_k_.

* Update rank_objective.hpp

* Apply suggestions from code review
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

* Update rank_objective.hpp

remove the additional branching: get high_rank and low_rank by one "if".

* Update config.h

add description to lambdarank_truncation_level parameter

* Update Parameters.rst

* Update test_sklearn.py

update expected NDCG value for a test, as it was affected by the underlying change in the algorithm

* Update test_sklearn.py

update NDCG@3 reference value

* fix R learning-to-rank tests

* Update rank_objective.hpp

* Update include/LightGBM/config.h
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

* Update Parameters.rst
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

ba0a1f8d
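
The reasoning in the #3425 commit message, that swapping two documents which both sit beyond the top-k cannot change NDCG@k, can be checked with a small Python sketch. The gain `2^rel - 1` and the `log2` position discount are the common textbook choices and are assumptions here, as are the helper names and the sample labels. None of this is LightGBM's `rank_objective.hpp` code.

```python
import math


def dcg_at_k(relevances, k):
    """DCG over the first k positions with gain 2^rel - 1 and a log2 discount."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def swapped(items, i, j):
    """Return a copy of items with positions i and j exchanged."""
    out = list(items)
    out[i], out[j] = out[j], out[i]
    return out


ranked_labels = [3, 0, 2, 1, 0, 2]  # relevance labels in current ranked order
k = 3

base = ndcg_at_k(ranked_labels, k)
# Positions 3 and 5 are both beyond the top-3, so the top-3 is untouched.
both_beyond = ndcg_at_k(swapped(ranked_labels, 3, 5), k)
# Position 1 is inside the top-3, so this swap changes the metric.
one_inside = ndcg_at_k(swapped(ranked_labels, 1, 5), k)
```

Here `both_beyond` equals `base`, while `one_inside` differs from it. This is why pairs entirely beyond the truncation level contribute nothing when the target is NDCG@k.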

29 Sep, 2020 1 commit
- [docs] Change doc link to monotone constraints report to HAL document (#3410) · 432c8214
  CharlesAuguste authored Sep 29, 2020
  
  432c8214
23 Sep, 2020 1 commit

Average precision score (#3347) · 28704900

Belinda Trotta authored Sep 23, 2020

* Implement average precision score

* Fix lint errors

* Change name to average_precision

* Add to R-package list of metrics

* Empty commit to trigger CI jobs

* Change name to average_precision

28704900