Commits · 7e34d23c05599ce3a8a6f22cdba29e103f57d218 · tianlh / LightGBM-DCU

04 Sep, 2023 1 commit

Treat position bias via GAM in LambdaMART (#5929) · 7e34d23c

Pavel Metrikov authored Sep 04, 2023



* Update dataset.h

* Update metadata.cpp

* Update rank_objective.hpp

* Update metadata.cpp

* Update rank_objective.hpp

* Update metadata.cpp

* Update dataset.h

* Update rank_objective.hpp

* Update metadata.cpp

* Update test_engine.py

* Update test_engine.py

* Add files via upload

* Update test_engine.py

* Update test_engine.py

* Update test_engine.py

* Update test_engine.py

* Update test_engine.py

* Update _rank.train.position

* Update test_engine.py

* Update test_engine.py

* Update test_engine.py

* Update test_engine.py

* Update _rank.train.position

* Update _rank.train.position

* Update test_engine.py

* Update _rank.train.position

* Update test_engine.py

* Update test_engine.py

* Update test_engine.py

* Update test_engine.py

* Update test_engine.py

* Update the position of import statement

* Update rank_objective.hpp

* Update config.h

* Update config_auto.cpp

* Update rank_objective.hpp

* Update rank_objective.hpp

* update documentation

* remove extra blank line

* Update src/io/metadata.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update src/io/metadata.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* remove _rank.train.position

* add position in python API

* fix set_positions in basic.py

* Update Advanced-Topics.rst

* Update Advanced-Topics.rst

* Update Advanced-Topics.rst

* Update Advanced-Topics.rst

* Update Advanced-Topics.rst

* Update Advanced-Topics.rst

* Update Advanced-Topics.rst

* Update Advanced-Topics.rst

* Update Advanced-Topics.rst

* Update Advanced-Topics.rst

* Update Advanced-Topics.rst

* Update docs/Advanced-Topics.rst
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update docs/Advanced-Topics.rst
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update Advanced-Topics.rst

* Update Advanced-Topics.rst

* Update Advanced-Topics.rst

* Update Advanced-Topics.rst

* remove List from _LGBM_PositionType

* move new position parameter to the last in Dataset constructor

* add position_filename as a parameter

* Update docs/Advanced-Topics.rst
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update docs/Advanced-Topics.rst
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update Advanced-Topics.rst

* Update src/objective/rank_objective.hpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update src/io/metadata.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update metadata.cpp

* Update python-package/lightgbm/basic.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/basic.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/basic.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update python-package/lightgbm/basic.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update src/io/metadata.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* more informative fatal message

address more comments

* update documentation for more flexible position specification

* fix SetPosition

add tests for get_position and set_position

* remove position_filename

* remove useless changes

* Update python-package/lightgbm/basic.py
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* remove useless files

* move position file when position set in Dataset

* warn when positions are overwritten

* skip ranking with position test in cuda

* split test case

* remove useless import

* Update test_engine.py

* Update test_engine.py

* Update test_engine.py

* Update docs/Advanced-Topics.rst
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Update Parameters.rst

* Update rank_objective.hpp

* Update config.h

* update config_auto.cpp

* Update docs/Advanced-Topics.rst
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* fix randomness in test case for gpu

---------
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

7e34d23c

05 May, 2023 1 commit

Add quantized training (CPU part) (#5800) · 17ecfab3

shiyu1994 authored May 05, 2023

* add quantized training (first stage)

* add histogram construction functions for integer gradients

* add stochastic rounding

* update docs

* fix compilation errors by adding template instantiations

* update files for compilation

* fix compilation of gpu version

* initialize gradient discretizer before share states

* add a test case for quantized training

* add quantized training for data distributed training

* Delete origin.pred

* Delete ifelse.pred

* Delete LightGBM_model.txt

* remove useless changes

* fix lint error

* remove debug loggings

* fix mismatch of vector and allocator types

* remove changes in main.cpp

* fix bugs with uninitialized gradient discretizer

* initialize ordered gradients in gradient discretizer

* disable quantized training with gpu and cuda

fix msvc compilation errors and warnings

* fix bug in data parallel tree learner

* make quantized training test deterministic

* make quantized training in test case more accurate

* refactor test_quantized_training

* fix leaf splits initialization with quantized training

* check distributed quantized training result

17ecfab3

14 Feb, 2023 1 commit

feature: Add serialization of reference dataset (#5427) · 0f7983b6

Scott Votaw authored Feb 13, 2023

* Add serialization of reference dataset

* lint and missing file

* Fixes from reviewers

* responded to comments

* revert sdk change

0f7983b6

01 Feb, 2023 1 commit

[CUDA] consolidate CUDA versions (#5677) · 4f47547c

James Lamb authored Jan 31, 2023



* [ci] speed up if-else, swig, and lint conda setup

* add 'source activate'

* python constraint

* start removing cuda v1

* comment out CI

* remove more references

* revert some unnecessary changes

* revert a few more mistakes

* revert another change that ignored params

* sigh

* remove CUDATreeLearner

* fix tests, docs

* fix quoting in setup.py

* restore all CI

* Apply suggestions from code review
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* Apply suggestions from code review

* completely remove cuda_exp, update docs

---------
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

4f47547c

29 Nov, 2022 1 commit

Fix OpenMP thread allocation in Linux (#5551) · 4c5d0fbb

Scott Votaw authored Nov 29, 2022

4c5d0fbb

02 Sep, 2022 1 commit

Rename Metadata num_classes to be more clear (#5461) · 7d1276ad

Scott Votaw authored Sep 01, 2022

Rename num_classes to be more clear

7d1276ad

10 Aug, 2022 1 commit

feature: Add true streaming APIs to reduce client-side memory usage (#5299) · 0a5c5838

Scott Votaw authored Aug 10, 2022

* Extract streaming to own PR

* small merge fixes and cleanup

* linting fixes

* fix cast warning

* Fix accidental deletion during branch transfer

* responded to initial triage comments

* Added more tests to use create-from-samples APIs

* added mutex and adjusted nclasses logic

* Fix thread-safety for pushing data to sparse bins through Push APIs

* lint and doc fixes

* Small SWIG fix

* nit fix

* Responded to StrikerRUS comments

* fix breaking change after merge with master

* Extract streaming to own PR

* small merge fixes and cleanup

* Fix accidental deletion during branch transfer

* responded to initial triage comments

* Added more tests to use create-from-samples APIs

* Fix rstcheck call in ci

* remove TODOs

* Extract streaming to own PR

* small merge fixes and cleanup

* Fix accidental deletion during branch transfer

* responded to initial triage comments

* Added more tests to use create-from-samples APIs

* Small SWIG fix

* remove ci change

* responded to shiyu1994 comments

* responded to StrikerRUS comments

* Fixes from StrikerRUS comments

0a5c5838

26 Mar, 2022 1 commit

Load initial scores with binary data files in CLI version (#4807) · 17d4e007

shiyu1994 authored Mar 27, 2022

17d4e007

23 Mar, 2022 1 commit

[CUDA] New CUDA version Part 1 (#4630) · 6b56a90c

shiyu1994 authored Mar 23, 2022



* new cuda framework

* add histogram construction kernel

* before removing multi-gpu

* new cuda framework

* tree learner cuda kernels

* single tree framework ready

* single tree training framework

* remove comments

* boosting with cuda

* optimize for best split find

* data split

* move boosting into cuda

* parallel synchronize best split point

* merge split data kernels

* before code refactor

* use tasks instead of features as units for split finding

* refactor cuda best split finder

* fix configuration error with small leaves in data split

* skip histogram construction of too small leaf

* skip split finding of invalid leaves

stop when no leaf to split

* support row wise with CUDA

* copy data for split by column

* copy data from host to CPU by column for data partition

* add synchronize best splits for one leaf from multiple blocks

* partition dense row data

* fix sync best split from task blocks

* add support for sparse row wise for CUDA

* remove useless code

* add l2 regression objective

* sparse multi value bin enabled for CUDA

* fix cuda ranking objective

* support for number of items <= 2048 per query

* speedup histogram construction by interleaving global memory access

* split optimization

* add cuda tree predictor

* remove comma

* refactor objective and score updater

* before use struct

* use structure for split information

* use structure for leaf splits

* return CUDASplitInfo directly after finding best split

* split with CUDATree directly

* use cuda row data in cuda histogram constructor

* clean src/treelearner/cuda

* gather shared cuda device functions

* put shared CUDA functions into header file

* change smaller leaf from <= back to < for consistent result with CPU

* add tree predictor

* remove useless cuda_tree_predictor

* predict on CUDA with pipeline

* add global sort algorithms

* add global argsort for queries with many items in ranking tasks

* remove limitation of maximum number of items per query in ranking

* add cuda metrics

* fix CUDA AUC

* remove debug code

* add regression metrics

* remove useless file

* don't use mask in shuffle reduce

* add more regression objectives

* fix cuda mape loss

add cuda xentropy loss

* use template for different versions of BitonicArgSortDevice

* add multiclass metrics

* add ndcg metric

* fix cross entropy objectives and metrics

* fix cross entropy and ndcg metrics

* add support for customized objective in CUDA

* complete multiclass ova for CUDA

* separate cuda tree learner

* use shuffle based prefix sum

* clean up cuda_algorithms.hpp

* add copy subset on CUDA

* add bagging for CUDA

* clean up code

* copy gradients from host to device

* support bagging without using subset

* add support of bagging with subset for CUDAColumnData

* add support of bagging with subset for dense CUDARowData

* refactor copy sparse subrow

* use copy subset for column subset

* add reset train data and reset config for CUDA tree learner

add destructors for cuda tree learner

* add USE_CUDA ifdef to cuda tree learner files

* check that dataset doesn't contain CUDA tree learner

* remove printf debug information

* use full new cuda tree learner only when using single GPU

* disable all CUDA code when using CPU version

* recover main.cpp

* add cpp files for multi value bins

* update LightGBM.vcxproj

* update LightGBM.vcxproj

fix lint errors

* fix lint errors

* fix lint errors

* update Makevars

fix lint errors

* fix the case with 0 feature and 0 bin

fix split finding for invalid leaves

create cuda column data when loaded from bin file

* fix lint errors

hide GetRowWiseData when cuda is not used

* recover default device type to cpu

* fix na_as_missing case

fix cuda feature meta information

* fix UpdateDataIndexToLeafIndexKernel

* create CUDA trees when needed in CUDADataPartition::UpdateTrainScore

* add refit by tree for cuda tree learner

* fix test_refit in test_engine.py

* create set of large bin partitions in CUDARowData

* add histogram construction for columns with a large number of bins

* add find best split for categorical features on CUDA

* add bitvectors for categorical split

* cuda data partition split for categorical features

* fix split tree with categorical feature

* fix categorical feature splits

* refactor cuda_data_partition.cu with multi-level templates

* refactor CUDABestSplitFinder by grouping task information into struct

* pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder

* fix misuse of reference

* remove useless changes

* add support for path smoothing

* virtual destructor for LightGBM::Tree

* fix overlapped cat threshold in best split infos

* reset histogram pointers in data partition and split finder in ResetConfig

* comment useless parameter

* fix reverse case when na is missing and default bin is zero

* fix mfb_is_na and mfb_is_zero and is_single_feature_column

* remove debug log

* fix cat_l2 when one-hot

fix gradient copy when data subset is used

* switch shared histogram size according to CUDA version

* gpu_use_dp=true when cuda test

* revert modification in config.h

* fix setting of gpu_use_dp=true in .ci/test.sh

* fix linter errors

* fix linter error

remove useless change

* recover main.cpp

* separate cuda_exp and cuda

* fix ci bash scripts

add description for cuda_exp

* add USE_CUDA_EXP flag

* switch off USE_CUDA_EXP

* revert changes in python-packages

* more careful separation for USE_CUDA_EXP

* fix CUDARowData::DivideCUDAFeatureGroups

fix set fields for cuda metadata

* revert config.h

* fix test settings for cuda experimental version

* skip some tests due to unsupported features or differences in implementation details for CUDA Experimental version

* fix lint issue by adding a blank line

* fix lint errors by resorting imports

* fix lint errors by resorting imports

* fix lint errors by resorting imports

* merge cuda.yml and cuda_exp.yml

* update python version in cuda.yml

* remove cuda_exp.yml

* remove unrelated changes

* fix compilation warnings

fix cuda exp ci task name

* recover task

* use multi-level template in histogram construction

check split only in debug mode

* ignore NVCC related lines in parameter_generator.py

* update job name for CUDA tests

* apply review suggestions

* Update .github/workflows/cuda.yml
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update .github/workflows/cuda.yml
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update header

* remove useless TODOs

* remove [TODO(shiyu1994): constrain the split with min_data_in_group] and record in #5062

* #include <LightGBM/utils/log.h> for USE_CUDA_EXP only

* fix include order

* fix include order

* remove extra space

* address review comments

* add warning when cuda_exp is used together with deterministic

* add comment about gpu_use_dp in .ci/test.sh

* revert changing order of included headers
Co-authored-by: Yu Shi <shiyu1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

6b56a90c

16 Nov, 2021 1 commit

Add customized parser support (#4782) · b0137deb

chjinche authored Nov 16, 2021

* add customized parser support

* fix typo of parser_config_file description

* make delimiter as parameter of JoinedLines

b0137deb

07 May, 2021 1 commit

Precise text file parsing (#4081) · f8318088

Chen Yufei authored May 07, 2021



* New build option: USE_PRECISE_TEXT_PARSER.

Use fast_double_parser for text file parsing. For each number, fallback
to strtod in case of parse failure.

* Add benchmark for CSVParser with Atof and AtofPrecise.

* Fix lint complaint.

* Fix typo in open result error message.

* Revert "Fix lint complaint."

This reverts commit 92ab0b6bce9f17d7be9eaeb20f19d4a0a36f0387.

* Revert "Add benchmark for CSVParser with Atof and AtofPrecise."

This reverts commit 4f8639abd06c679d4382eb715a1793afd94df3d2.

* Use AtofPrecise in Common::__StringToTHelper.

* [option] precise_float_parser: precise float number parsing for text input.

* Remove USE_PRECISE_TEXT_PARSER compile option.

* test: add test for Common::AtofPrecise.

* test: remove ChunkedArrayTest with 0 length.

This triggers Log::Fatal which aborts the test program.

* fix lint, add copyright.

* Revert "test: remove ChunkedArrayTest with 0 length."

This reverts commit 346c76affe9e78b6ca2738c4a56dbb9c00f31102.

* Use LightGBM::Common::Sign

* save precise_float_parser in model file.

* Fix error checking in AtofPrecise. Add more test cases.

* Remove test case that can't pass under macOS.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

f8318088

04 May, 2021 1 commit

Correct spelling (#4250) · e79716e0

Andrew Ziem authored May 04, 2021



* Correct spelling

Most changes were in comments, and there were a few changes to literals for log output.

There were no changes to variable names, function names, IDs, or functionality.

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Correct spelling

Most are code comments, but one case is a literal in a logging message.

There are a few grammar fixes too.
Co-authored-by: James Lamb <jaylamb20@gmail.com>

e79716e0

19 Feb, 2021 1 commit

[docs] Change some 'parallel learning' references to 'distributed learning' (#4000) · 7880b79f

James Lamb authored Feb 19, 2021

* [docs] Change some 'parallel learning' references to 'distributed learning'

* found a few more

* one more reference

7880b79f

07 Jan, 2021 1 commit

Fix compiler warnings caused by implicit type conversion (fixes #3677) (#3729) · 753b0e9c

Belinda Trotta authored Jan 07, 2021

* Fix compiler warnings caused by implicit type conversion

* Fix more warnings

* Fix more warnings

753b0e9c

03 Jan, 2021 1 commit

fix warning (#3678) · 26671aa3

sisco0 authored Jan 02, 2021

Compile warnings have been fixed

26671aa3

24 Dec, 2020 1 commit

Trees with linear models at leaves (#3299) · fcfd4132

Belinda Trotta authored Dec 24, 2020

* Add Eigen library.

* Working for simple test.

* Apply changes to config params.

* Handle nan data.

* Update docs.

* Add test.

* Only load raw data if boosting=gbdt_linear

* Remove unneeded code.

* Minor updates.

* Update to work with sk-learn interface.

* Update to work with chunked datasets.

* Throw error if we try to create a Booster with an already-constructed dataset having incompatible parameters.

* Save raw data in binary dataset file.

* Update docs and fix parameter checking.

* Fix dataset loading.

* Add test for regularization.

* Fix bugs when saving and loading tree.

* Add test for load/save linear model.

* Remove unneeded code.

* Fix case where not enough leaf data for linear model.

* Simplify code.

* Speed up code.

* Speed up code.

* Simplify code.

* Speed up code.

* Fix bugs.

* Working version.

* Store feature data column-wise (not fully working yet).

* Fix bugs.

* Speed up.

* Speed up.

* Remove unneeded code.

* Small speedup.

* Speed up.

* Minor updates.

* Remove unneeded code.

* Fix bug.

* Fix bug.

* Speed up.

* Speed up.

* Simplify code.

* Remove unneeded code.

* Fix bug, add more tests.

* Fix bug and add test.

* Only store numerical features

* Fix bug and speed up using templates.

* Speed up prediction.

* Fix bug with regularisation

* Visual studio files.

* Working version

* Only check nans if necessary

* Store coeff matrix as an array.

* Align cache lines

* Align cache lines

* Preallocate coefficient calculation matrices

* Small speedups

* Small speedup

* Reverse cache alignment changes

* Change to dynamic schedule

* Update docs.

* Refactor so that linear tree learner is not a separate class.

* Add refit capability.

* Speed up

* Small speedups.

* Speed up add prediction to score.

* Fix bug

* Fix bug and speed up.

* Speed up dataload.

* Speed up dataload

* Use vectors instead of pointers

* Fix bug

* Add OMP exception handling.

* Change return type of LGBM_BoosterGetLinear to bool

* Change return type of LGBM_BoosterGetLinear back to int, only parameter type needed to change

* Remove unused internal_parent_ property of tree

* Remove unused parameter to CreateTreeLearner

* Remove reference to LinearTreeLearner

* Minor style issues

* Remove unneeded check

* Reverse temporary testing change

* Fix Visual Studio project files

* Restore LightGBM.vcxproj.filters

* Speed up

* Speed up

* Simplify code

* Update docs

* Simplify code

* Initialise storage space for max num threads

* Move Eigen to include directory and delete unused files

* Remove old files.

* Fix so it compiles with mingw

* Fix gpu tree learner

* Change AddPredictionToScore back to const

* Fix python lint error

* Fix C++ lint errors

* Change eigen to a submodule

* Update comment

* Add the eigen folder

* Try to fix build issues with eigen

* Remove eigen files

* Add eigen as submodule

* Fix include paths

* Exclude eigen files from Python linter

* Ignore eigen folders for pydocstyle

* Fix C++ linting errors

* Fix docs

* Fix docs

* Exclude eigen directories from doxygen

* Update manifest to include eigen

* Update build_r to include eigen files

* Fix compiler warnings

* Store raw feature data as float

* Use float for calculating linear coefficients

* Remove eigen directory from GLOB

* Don't compile linear model code when building R package

* Fix doxygen issue

* Fix lint issue

* Fix lint issue

* Remove unneeded code

* Restore deleted lines

* Restore deleted lines

* Change return type of has_raw to bool

* Update docs

* Rename some variables and functions for readability

* Make tree_learner parameter const in AddScore

* Fix style issues

* Pass vectors as const reference when setting tree properties

* Make temporary storage of serial_tree_learner mutable so we can make the object's methods const

* Remove get_raw_size, use num_numeric_features instead

* Fix typo

* Make contains_nan_ and any_nan_ properties immutable again

* Remove data_has_nan_ property of tree

* Remove temporary test code

* Make linear_tree a dataset param

* Fix lint error

* Make LinearTreeLearner a separate class

* Fix lint errors

* Fix lint error

* Add linear_tree_learner.o

* Simulate omp_get_max_threads if openmp is not available

* Update PushOneData to also store raw data.

* Cast size to int

* Fix bug in ReshapeRaw

* Speed up code with multithreading

* Use OMP_NUM_THREADS

* Speed up with multithreading

* Update to use ArrayToString

* Fix tests

* Fix test

* Fix bug introduced in merge

* Minor updates

* Update docs

fcfd4132

13 Nov, 2020 1 commit

Optimization of row-wise histogram construction (#3522) · 0655d67c

shiyu1994 authored Nov 13, 2020



* store without offset in multi_val_dense_bin

* fix offset bug

* add comment for offset

* add comment for bin type selection

* faster operations for offset

* keep most freq bin in histogram for multi val dense

* use original feature iterators

* consider 9 cases (3 x 3) for multi val bin construction

* fix dense bin setting

* fix bin data in multi val group

* fix offset of the first feature histogram

* use float hist buf

* avx in histogram construction

* use avx for hist construction without prefetch

* vectorize bin extraction

* use only 128 vec

* use avx2

* use vectorization for sparse row wise

* add bit size for multi val dense bin

* float with no vectorization

* change multithreading strategy to dynamic

* remove intrinsic header

* fix dense multi val col copy

* remove bit size

* use large enough block size when the bin number is large

* calc min block size by sparsity

* rescale gradients

* rollback gradients scaling

* single precision histogram buffer as an option

* add float hist buffer with thread buffer

* fix setting zero in hist data

* fix hist begin pointer in tree learners

* remove debug logs

* remove omp simd

* update Makevars of R-package

* fix feature group binary storing

* two row wise for double hist buffer

* add subfeature for two row wise

* remove useless code and fix two row wise

* refactor code

* grouping the dense feature groups can get sparse multi val bin

* clean format problems

* one thread for two blocks in sep row wise

* use ordered gradients for sep row wise

* fix grad ptr

* ordered grad with combined block for sep row wise

* fix block threading

* use the same min block size

* rollback share min block size

* remove logs

* Update src/io/dataset.cpp
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

* fix parameter description

* remove sep_row_wise

* remove check codes

* add check for empty multi val bin

* fix lint error

* rollback changes in config.h

* Apply suggestions from code review
Co-authored-by: Ubuntu <shiyu@gbdt-04.ren3kv4wanvufliwrpy4k03lsf.xx.internal.cloudapp.net>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

0655d67c

20 Sep, 2020 1 commit

[GPU] Add support for CUDA-based GPU build (#3160) · f7ad9457

Chip Kerchner authored Sep 20, 2020

* Initial CUDA work

* redirect log to python console (#3090)

* redir log to python console

* fix pylint

* Apply suggestions from code review

* Update basic.py

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update c_api.h

* Apply suggestions from code review

* super-minor: better wording
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>

* re-order includes (fixes #3132) (#3133)

* Revert "re-order includes (fixes #3132) (#3133)" (#3153)

This reverts commit 656d2676

* Missing change from previous rebase

* Minor cleanup and removal of development scripts.

* Only set gpu_use_dp on by default for CUDA. Other minor change.

* Fix python lint indentation problem.

* More python lint issues.

* Big lint cleanup - more to come.

* Another large lint cleanup - more to come.

* Even more lint cleanup.

* Minor cleanup so less differences in code.

* Revert is_use_subset changes

* Another rebase from master to fix recent conflicts.

* More lint.

* Simple code cleanup - add & remove blank lines, revert unnecessary format changes, remove added dead code.

* Removed parameters added for CUDA and various bug fixes.

* Yet more lint and unnecessary changes.

* Revert another change.

* Removal of unnecessary code.

* temporary appveyor.yml for building and testing

* Remove return value in ReSize

* Removal of unused variables.

* Code cleanup from reviewers suggestions.

* Removal of FIXME comments and unused defines.

* More reviewers comments cleanup.

* Fix config variables.

* Attempt to fix check-docs failure

* Update Parameters.rst for num_gpu

* Removing test appveyor.yml

* Add CUDA_RESOLVE_DEVICE_SYMBOLS to libraries to fix linking issue.

* Fixed handling of data elements less than 2K.

* More reviewers comments cleanup.

* Removal of TODO and fix printing of int64_t

* Add cuda change for CI testing and remove cuda from device_type in python.

* Missed one change from previous check-in

* Removal AdditionConfig and fix settings.

* Limit number of GPUs to one for now in CUDA.

* Update Parameters.rst for previous check-in

* Whitespace removal.

* Cleanup unused code.

* Changed uint/ushort/ulong to unsigned int/short/long to help Windows based CUDA compiler work.

* Lint change from previous check-in.

* Changes based on reviewers comments.

* More reviewer comment changes.

* Adding warning for is_sparse. Revert tmp_subset code. Only return FeatureGroupData if not is_multi_val_

* Fix so that CUDA code will compile even if you enable the SCORE_T_USE_DOUBLE define.

* Reviewer comment cleanup.

* Replace warning with Log message. Removal of some of the USE_CUDA. Fix typo and removal of pragma once.

* Remove PRINT debug for CUDA code.

* Allow use of multiple GPUs for CUDA.

* More multi-GPUs enablement for CUDA.

* More code cleanup based on reviews comments.

* Update docs with latest config changes.
Co-authored-by: Gordon Fossum <fossum@us.ibm.com>
Co-authored-by: ChipKerchner <ckerchne@linux.vnet.ibm.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

f7ad9457

19 Jul, 2020 1 commit

set num_threads of share_state (fix #3151) (#3238) · 58b49dd8

shiyu1994 authored Jul 19, 2020

Co-authored-by: Ubuntu <shiyu@gbdt-shiyu.ren3kv4wanvufliwrpy4k03lsf.xx.internal.cloudapp.net>

58b49dd8

05 Jun, 2020 1 commit

Revert "re-order includes (fixes #3132) (#3133)" (#3153) · ac5f5e56

Nikita Titov authored Jun 05, 2020

This reverts commit 656d2676.

ac5f5e56

01 Jun, 2020 1 commit

re-order includes (fixes #3132) (#3133) · 656d2676

James Lamb authored Jun 01, 2020

656d2676

10 Apr, 2020 1 commit

Support UTF-8 characters in feature name again (#2976) · 44a91201

OMOTO Tsukasa authored Apr 10, 2020

* Support UTF-8 characters in feature name again

This commit reverts 0d59859c.
Also see:
- https://github.com/microsoft/LightGBM/issues/2226
- https://github.com/microsoft/LightGBM/issues/2478
- https://github.com/microsoft/LightGBM/pull/2229

Having reproduced the issue, and given the great survey @kidotaka provided in #2226,
I conclude that the cause is not UTF-8 but "an empty string (character)".
Therefore, I revert "throw error when meet non ascii (#2229)", whose commit hash
is 0d59859c, and add support for UTF-8 feature names again.

* add tests

* fix check-docs tests

* update

* fix tests

* update .travis.yml

* fix tests

* update test_r_package.sh

* update test_r_package.sh

* update test_r_package.sh

* add a test for R-package

* update test_r_package.sh

* update test_r_package.sh

* update test_r_package.sh

* fix test for R-package

* update test_r_package.sh

* update test_r_package.sh

* update test_r_package.sh

* update test_r_package.sh

* update

* update

* update

* remove unneeded comments

44a91201

11 Mar, 2020 1 commit

fixed cpplint errors and disable warning only for VS (#2888) · bd10918e

Nikita Titov authored Mar 11, 2020

* fixed cpplint errors and disable warning only for VS

* wrap more pragma warning

bd10918e

08 Mar, 2020 1 commit

Speed-up "Split" and some code refactorings (#2883) · bcad692e

Guolin Ke authored Mar 08, 2020

* commit

* fix msvc

* fix format

bcad692e

03 Mar, 2020 1 commit

speed up for const hessian (#2857) · bc7d2f0c

Guolin Ke authored Mar 03, 2020

* speed up for const hessian

* rename template

* fix clang build

* template init

* add comment

bc7d2f0c

02 Mar, 2020 3 commits

reduce the overhead of OMP_NUM_THREADS in training (#2852) · 9c386db1

Guolin Ke authored Mar 02, 2020

* reduce overhead of get num_threads

* add warning

* Apply suggestions from code review

* Apply suggestions from code review

9c386db1

speed up multi-val bin subset for bagging (#2827) · d0bec9e9

Guolin Ke authored Mar 02, 2020

* speed up multi-val bin subset for bagging

* remove the duplicated codes

* code refine

* some codes refactoring

* move `is_constant_hessian` into `TrainingShareStates`

* refine

* fix bug

* fix bug when num_groups_ < 0

* fix gpu

* fix gpu bagging

* fix gpu bug

* typo

* Update src/treelearner/serial_tree_learner.h

d0bec9e9

don't save num_thread as possible (#2839) · 0aa7bfee

Guolin Ke authored Mar 02, 2020



* don't cache `num_thread`, to avoid change outside

* rename

* update document

* Update docs/Parameters.rst

* Update include/LightGBM/config.h

* Apply suggestions from code review
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

0aa7bfee

20 Feb, 2020 2 commits

added feature infos to JSON dump (#2660) · c4a7ab81

Nikita Titov authored Feb 20, 2020



* added feature infos to JSON dump

* slight json schema refactor

* simplified code

* refactor feature_infos

* refactoring

* Update src/boosting/gbdt.cpp

* Update dataset.h

* Update include/LightGBM/dataset.h

* simplify

* Apply suggestions from code review

* parse string and construct JSON objs
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

c4a7ab81

remove init-score parameter (#2776) · 3c394c8d

Guolin Ke authored Feb 20, 2020



* remove related cpp codes

* removed more mentions of init_score_filename params
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

3c394c8d

19 Feb, 2020 2 commits

fixed cpplint issues (#2771) · c315087f
Nikita Titov authored Feb 19, 2020

c315087f

[python] [R-package] refine the parameters for Dataset (#2594) · 9f79e840

Guolin Ke authored Feb 19, 2020



* reset

* fix a bug

* fix test

* Update c_api.h

* support to no filter features by min_data

* add warning in reset config

* refine warnings for override dataset's parameter

* some cleans

* clean code

* clean code

* refine C API function doxygen comments

* refined new param description

* refined doxygen comments for R API function

* removed stuff related to int8

* break long line in warning message

* removed tests which results cannot be validated anymore

* added test for warnings about unchangeable params

* write parameter from dataset to booster

* consider free_raw_data.

* fix params

* fix bug

* implementing R

* fix typo

* filter params in R

* fix R

* not min_data

* refined tests

* fixed linting

* refine

* pylint

* add docstring

* fix docstring

* R lint

* updated description for C API function

* use param aliases in Python

* fixed typo

* fixed typo

* added more params to test

* removed debug print

* fix dataset construct place

* fix merge bug

* Update feature_histogram.hpp

* add is_sparse back

* remove unused parameters

* fix lint

* add data random seed

* update

* [R-package] centralized Dataset parameter aliases and added tests on Dataset parameter updating (#2767)
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

9f79e840

17 Feb, 2020 1 commit

speed up sub-feature in row-wise parallelism (#2764) · fed09d33

Guolin Ke authored Feb 17, 2020

* commit

* refactoring

* Update src/io/bin.cpp

* Apply suggestions from code review

* bug

* code clean

* remove warning

* commit

* update parameter

fed09d33

08 Feb, 2020 1 commit

various minor style, docs and cpplint improvements (#2747) · 1c1a2765

Nikita Titov authored Feb 09, 2020

* various minor style, docs and cpplint improvements

* fixed typo in warning

* fix recently added cpplint errors

* move note for params upper in description for consistency

1c1a2765

02 Feb, 2020 1 commit

Support both row-wise and col-wise multi-threading (#2699) · 509c2e50

Guolin Ke authored Feb 02, 2020



* commit

* fix a bug

* fix bug

* reset to track changes

* refine the auto choose logic

* sort the time stats output

* fix include

* change  multi_val_bin_sparse_threshold

* add cmake

* add _mm_malloc and _mm_free for cross platform

* fix cmake bug

* timer for split

* try to fix cmake

* fix tests

* refactor DataPartition::Split

* fix test

* typo

* formating

* Revert "formating"

This reverts commit 5b8de4f7fb9d975ee23701d276a66d40ee6d4222.

* add document

* [R-package] Added tests on use of force_col_wise and force_row_wise in training (#2719)

* naming

* fix gpu code

* Update include/LightGBM/bin.h
Co-Authored-By: James Lamb <jaylamb20@gmail.com>

* Update src/treelearner/ocl/histogram16.cl

* test: swap compilers for CI

* fix omp

* not avx2

* no aligned for feature histogram

* Revert "refactor DataPartition::Split"

This reverts commit 256e6d9641ade966a1f54da1752e998a1149b6f8.

* slightly refactor data partition

* reduce the memory cost
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

509c2e50

14 Jan, 2020 1 commit

support most frequent bin (#2689) · c7e90393

Guolin Ke authored Jan 14, 2020

* implement

* fix warning

* fix bug

* fix a bug

* remove unneeded function

* fix data push bug

* fix valid data push

* fix bug for missing_type=zero

* refine split

* renames

* typo

c7e90393

20 Dec, 2019 1 commit

fix predict with header (#2643) · ae320e59

Guolin Ke authored Dec 20, 2019

* fix predict with header

* avoid duplicated feature names

ae320e59

11 Nov, 2019 1 commit

check feature names for special JSON chars (#2557) · b2da19b8
Nikita Titov authored Nov 11, 2019

b2da19b8

15 Oct, 2019 1 commit

reduce the buffer when using high dimensional data in distributed mode. (#2485) · 40e56ca7

Guolin Ke authored Oct 15, 2019

* reduce the buffer when using high dimensional data in distributed mode.

* Update dataset_loader.cpp

* refix

* typo

* fix number of bin accumulation.

* avoid overflow

* fix warning

* efficient solution.

* Update dataset.h

* fix bin count output

* fix warning

* bug in dist number of feature check

* fix possible edge case

* Update dataset.cpp

* possible bug fix

* fix

40e56ca7

07 Oct, 2019 1 commit

[docs] fixed miscellaneous typos in comments and documentation (#2496) · d7f8aa53

James Lamb authored Oct 07, 2019

* fixed miscellaneous typos in documentation

* fix typo introduced in typo-fixing PR

d7f8aa53