Commits · d018d30a97126b4a97b655aea72f47cc2e8886b8 · tianlh / LightGBM-DCU

04 Mar, 2020 1 commit
- fixed cpplint issues (#2863) · d018d30a
  Nikita Titov authored Mar 04, 2020
```
* fixed cpplint errors

* fixed more cpplint errors
```
02 Mar, 2020 1 commit
- introduced specific CHECKs (#2849) · 5a80b788
  Nikita Titov authored Mar 02, 2020
  
29 Feb, 2020 1 commit
- fix bug for multi-val-bin construction (#2841) · 8f5cd522
  Guolin Ke authored Feb 29, 2020
```
* fix

* Update multi_val_sparse_bin.hpp
```
27 Feb, 2020 2 commits
- avoid most_freq_bin to be 0 in categorical features (#2824) · e502ed01
  Guolin Ke authored Feb 27, 2020
```
* avoid most_freq_bin to be 0 in categorical features

* Apply suggestions from code review

* add tests

* update test

* Apply suggestions from code review

* Apply suggestions from code review
```
- fixed cpplint issues and updated docs (#2830) · b305a432
  Nikita Titov authored Feb 27, 2020
  
25 Feb, 2020 1 commit
- support larger entry size for multi-val bin (#2817) · 73dc1bbd
  Guolin Ke authored Feb 25, 2020
  
19 Feb, 2020 1 commit

[python] [R-package] refine the parameters for Dataset (#2594) · 9f79e840

Guolin Ke authored Feb 19, 2020



* reset

* fix a bug

* fix test

* Update c_api.h

* support not filtering features by min_data

* add warning in reset config

* refine warnings for override dataset's parameter

* some cleans

* clean code

* clean code

* refine C API function doxygen comments

* refined new param description

* refined doxygen comments for R API function

* removed stuff related to int8

* break long line in warning message

* removed tests whose results cannot be validated anymore

* added test for warnings about unchangeable params

* write parameter from dataset to booster

* consider free_raw_data.

* fix params

* fix bug

* implementing R

* fix typo

* filter params in R

* fix R

* not min_data

* refined tests

* fixed linting

* refine

* pylint

* add docstring

* fix docstring

* R lint

* updated description for C API function

* use param aliases in Python

* fixed typo

* fixed typo

* added more params to test

* removed debug print

* fix dataset construct place

* fix merge bug

* Update feature_histogram.hpp

* add is_sparse back

* remove unused parameters

* fix lint

* add data random seed

* update

* [R-package] centralized Dataset parameter aliases and added tests on Dataset parameter updating (#2767)
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: James Lamb <jaylamb20@gmail.com>


17 Feb, 2020 1 commit

speed up sub-feature in row-wise parallelism (#2764) · fed09d33

Guolin Ke authored Feb 17, 2020

* commit

* refactoring

* Update src/io/bin.cpp

* Apply suggestions from code review

* bug

* code clean

* remove warning

* commit

* update parameter


08 Feb, 2020 1 commit

various minor style, docs and cpplint improvements (#2747) · 1c1a2765

Nikita Titov authored Feb 09, 2020

* various minor style, docs and cpplint improvements

* fixed typo in warning

* fix recently added cpplint errors

* move note for params upper in description for consistency


02 Feb, 2020 1 commit

Support both row-wise and col-wise multi-threading (#2699) · 509c2e50

Guolin Ke authored Feb 02, 2020



* commit

* fix a bug

* fix bug

* reset to track changes

* refine the auto choose logic

* sort the time stats output

* fix include

* change multi_val_bin_sparse_threshold

* add cmake

* add _mm_malloc and _mm_free for cross platform

* fix cmake bug

* timer for split

* try to fix cmake

* fix tests

* refactor DataPartition::Split

* fix test

* typo

* formating

* Revert "formating"

This reverts commit 5b8de4f7fb9d975ee23701d276a66d40ee6d4222.

* add document

* [R-package] Added tests on use of force_col_wise and force_row_wise in training (#2719)

* naming

* fix gpu code

* Update include/LightGBM/bin.h
Co-Authored-By: James Lamb <jaylamb20@gmail.com>

* Update src/treelearner/ocl/histogram16.cl

* test: swap compilers for CI

* fix omp

* not avx2

* no aligned for feature histogram

* Revert "refactor DataPartition::Split"

This reverts commit 256e6d9641ade966a1f54da1752e998a1149b6f8.

* slightly refactor data partition

* reduce the memory cost
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>


14 Jan, 2020 1 commit

support most frequent bin (#2689) · c7e90393

Guolin Ke authored Jan 14, 2020

* implement

* fix warning

* fix bug

* fix a bug

* remove unneeded function

* fix data push bug

* fix valid data push

* fix bug for missing_type=zero

* refine split

* renames

* typo


15 Oct, 2019 1 commit

reduce the buffer when using high dimensional data in distributed mode. (#2485) · 40e56ca7

Guolin Ke authored Oct 15, 2019

* reduce the buffer when using high dimensional data in distributed mode.

* Update dataset_loader.cpp

* refix

* typo

* fix number of bin accumulation.

* avoid overflow

* fix warning

* efficient solution.

* Update dataset.h

* fix bin count output

* fix warning

* bug in dist number of feature check

* fix possible edge case

* Update dataset.cpp

* possible bug fix

* fix


01 Oct, 2019 1 commit
- fixed cpplint errors about spaces and newlines (#2481) · 9b61166f
  Nikita Titov authored Oct 01, 2019
  
28 Sep, 2019 1 commit

Predefined bin thresholds (#2325) · cc7a1e27

Belinda Trotta authored Sep 29, 2019

* Fix bug where small values of max_bin cause crash.

* Revert "Fix bug where small values of max_bin cause crash."

This reverts commit fe5c8e2547057c1fa5750bcddd359dd7708fab4b.

* Add functionality to force bin thresholds.

* Fix style issues.

* Use stable sort.

* Minor style and doc fixes.

* Add functionality to force bin thresholds.

* Fix style issues.

* Use stable sort.

* Minor style and doc fixes.

* Change binning behavior to be same as PR #2342.

* Add functionality to force bin thresholds.

* Fix style issues.

* Use stable sort.

* Minor style and doc fixes.

* Add functionality to force bin thresholds.

* Fix style issues.

* Use stable sort.

* Minor style and doc fixes.

* Change binning behavior to be same as PR #2342.

* Add functionality to force bin thresholds.

* Fix style issues.

* Minor style and doc fixes.

* Add functionality to force bin thresholds.

* Fix style issues.

* Minor style and doc fixes.

* Change binning behavior to be same as PR #2342.

* Add functionality to force bin thresholds.

* Fix style issues.

* Use stable sort.

* Minor style and doc fixes.

* Add functionality to force bin thresholds.

* Fix style issues.

* Use stable sort.

* Minor style and doc fixes.

* Change binning behavior to be same as PR #2342.

* Use different bin finding function for predefined bounds.

* Fix style issues.

* Minor refactoring, overload FindBinWithZeroAsOneBin.

* Fix style issues.

* Fix bug and add new test.

* Add warning when using categorical features with forced bins.

* Pass forced_upper_bounds by reference.

* Pass container types by const reference.

* Get categorical features using FeatureBinMapper.

* Fix bug for small max_bin.

* Move GetForcedBins to DatasetLoader.

* Find forced bins in dataset_loader.

* Minor fixes.


22 Sep, 2019 1 commit

fix many cpp lint errors (#2426) · f1a14869

Guolin Ke authored Sep 22, 2019

* fix many cpp lint errors

* indent

* fix bug

* fix more

* fix gpu

* more fixes


20 Aug, 2019 1 commit
- fix the bug in bin with small values (#2342) · 20f94c52
  Guolin Ke authored Aug 20, 2019
```
* fix the bug in bin with small values

* Update bin.cpp

* Update test_engine.py
```
16 Aug, 2019 1 commit

Bug fix: small values of max_bin cause program to crash (#2299) · c421f898

Belinda Trotta authored Aug 16, 2019

* Fix bug where small values of max_bin cause crash.

* Revert "Fix bug where small values of max_bin cause crash."

This reverts commit fe5c8e2547057c1fa5750bcddd359dd7708fab4b.

* Fix bug where small values of max_bin cause crash.

* Reset random seed in test, remove extra blank line.

* Minor bug fix. Remove extra blank line.

* Change old test to account for new binning behavior.


23 Jul, 2019 1 commit
- fix MissingType::Zero in categorical features. (#2275) · 86a95783
  Guolin Ke authored Jul 23, 2019
  
13 Apr, 2019 1 commit
- added copyright message in files (#2101) · 32ef7603
  Nikita Titov authored Apr 13, 2019
  
11 Apr, 2019 1 commit

reworked includes in source files (#2066) · 50ce01b5

Nikita Titov authored Apr 12, 2019

* added all necessary includes - fixed build/include_what_you_use error

* fixed the order of includes (build/include_order)


02 Feb, 2019 1 commit
- cpplint whitespaces and new lines (#1986) · 90127b52
  Nikita Titov authored Feb 02, 2019
  
20 Dec, 2018 1 commit
- fix trivial typo (#1915) · 92e95e62
  Lingyi Hu authored Dec 20, 2018
  
10 Oct, 2018 1 commit

fix ranking tasks consistency (#1739) · 496a07d1

Guolin Ke authored Oct 10, 2018

* fix ndcg consistency.

* more stable sorts

* Update gbdt_model_text.cpp

* Update dataset.cpp

* Update gbdt_model_text.cpp


11 Sep, 2018 1 commit
- Docs & Warning on sparse categorical features (#1636) · a58aca64
  dmitryikh authored Sep 11, 2018
```
* warning on categorical feature with sparse values

* [docs] categorical features note
```
16 Aug, 2018 1 commit
- fix include (#1586) · 5bee6489
  Guolin Ke authored Aug 16, 2018
```
* fix include

* reduce dependency on header file

* fix build
```
29 Jul, 2018 1 commit
- fix all negative values in cat features (#1547) · c30ace21
  Guolin Ke authored Jul 29, 2018
```
* fix all negative values in cat features

* fix a bug
```
27 Feb, 2018 1 commit

Experimental support for HDFS (#1243) · 7e186a57

ebernhardson authored Feb 26, 2018

* Read and write datasets from HDFS.
* Only enabled when cmake is run with -DUSE_HDFS:BOOL=TRUE
* Introduces VirtualFile(Reader|Writer) to abstract VFS differences


12 Dec, 2017 1 commit
- change kZeroThreshold to 1e-35f · 0a7a4080
  Guolin Ke authored Dec 12, 2017
  
09 Nov, 2017 1 commit

add init_score & test cpp and python result consistency (#1007) · bc0579c8

wxchan authored Nov 09, 2017

* add init_score & test cpp and python result consistency

* try fix common.h

* Fix tests (#3)

* update atof

* fix bug

* fix tests.

* fix bug

* fix dtypes

* fix categorical feature override

* fix protobuf on vs build (#1004)

* [optional] support protobuf

* fix windows/LightGBM.vcxproj

* add doc

* fix doc

* fix vs support (#2)

* fix vs support

* fix cmake

* fix #1012

* [python] add network config api (#1019)

* add network

* update doc

* add float tolerance in bin finder.

* fix a bug

* update tests

* add double tolerance on tree model

* fix tests

* simplify the double comparison

* fix lightsvm zero base

* move double tolerance to the bin finder.

* fix pylint

* clean test.sh

* add sklearn test

* remove underline

* clean codes

* set random_state=None

* add last line

* fix doc

* rename file

* try fix test


16 Oct, 2017 2 commits

reduce parameters in categorical split · db9ec217
Guolin Ke authored Oct 17, 2017


Refine categorical features (#993) · eadc7b9d

Guolin Ke authored Oct 16, 2017

* many fixes for categorical feature

* add l2 to categorical split.

* remove useless file

* update version

* add cat_l2

* update appveyor version

* remove file

* fix tests.

* change default cat_l2 value

* fix a bug in bin finder

* change default cat_smooth_ratio


13 Oct, 2017 1 commit
- fix #991 (#992) · ef221275
  Guolin Ke authored Oct 14, 2017
```
* refine categorical split

* a bug fix

* fix a bug
```
30 Aug, 2017 1 commit
- check edge case for bin finder. · b5e211ba
  Guolin Ke authored Aug 30, 2017
  
18 Aug, 2017 1 commit
- fix merge bugs. · c62dcf73
  Guolin Ke authored Aug 18, 2017
  
30 Jul, 2017 1 commit

Better missing value handle (#747) · 00cb04a2

Guolin Ke authored Jul 30, 2017

* finish the data loading part

* allow prediction.

* fix bug for decision type.

* finish split finding part

* fix bugs.

* bug fixed. add a test.

* fix pep8.

* update documents.

* fix test bugs.

* fix a format

* fix import error in python test.

* disable missing handle in categorical features.

* fix a bug.

* add more tests.

* fix pep8

* fix bugs.

* remove the missing handle code for categorical feature.


16 Jun, 2017 2 commits
- Add const to BinMapper::CopyTo · 1c66bfcb
  Guolin Ke authored Jun 17, 2017
  
- fix #609 · 716584f4
  Guolin Ke authored Jun 16, 2017
  
15 May, 2017 1 commit
- Handle for missing values (#516) · e984b0d6
  Guolin Ke authored May 15, 2017
  
09 Apr, 2017 1 commit

Initial GPU acceleration support for LightGBM (#368) · 0bb4a825

Huan Zhang authored Apr 09, 2017

* add dummy gpu solver code

* initial GPU code

* fix crash bug

* first working version

* use asynchronous copy

* use a better kernel for root

* parallel read histogram

* sparse features now work, but no acceleration, compute on CPU

* compute sparse feature on CPU simultaneously

* fix big bug; add gpu selection; add kernel selection

* better debugging

* clean up

* add feature scatter

* Add sparse_threshold control

* fix a bug in feature scatter

* clean up debug

* temporarily add OpenCL kernels for k=64,256

* fix up CMakeList and definition USE_GPU

* add OpenCL kernels as string literals

* Add boost.compute as a submodule

* add boost dependency into CMakeList

* fix opencl pragma

* use pinned memory for histogram

* use pinned buffer for gradients and hessians

* better debugging message

* add double precision support on GPU

* fix boost version in CMakeList

* Add a README

* reconstruct GPU initialization code for ResetTrainingData

* move data to GPU in parallel

* fix a bug during feature copy

* update gpu kernels

* update gpu code

* initial port to LightGBM v2

* speedup GPU data loading process

* Add 4-bit bin support to GPU

* re-add sparse_threshold parameter

* remove kMaxNumWorkgroups and allow an unlimited number of features

* add feature mask support for skipping unused features

* enable kernel cache

* use GPU kernels without feature masks when all features are used

* REAdme.

* update README

* fix typos (#349)

* change compile to gcc on Apple as default

* clean vscode related file

* refine api of constructing from sampling data.

* fix bug in the last commit.

* more efficient algorithm to sample k from n.

* fix bug in filter bin

* change to boost from average output.

* fix tests.

* only stop training when all classes are finished in multi-class.

* limit the max tree output. change hessian in multi-class objective.

* robust tree model loading.

* fix test.

* convert the probabilities to raw score in boost_from_average of classification.

* fix the average label for binary classification.

* Add boost_from_average to docs (#354)

* don't use "ConvertToRawScore" for self-defined objective function.

* boost_from_average doesn't seem to work well in binary classification. remove it.

* For a better jump link (#355)

* Update Python-API.md

* for a better jump in page

A space is needed between `#` and the header's content according to GitHub's markdown format [guideline](https://guides.github.com/features/mastering-markdown/)

After adding the spaces, we can jump to the exact position in the page by clicking the link.

* fixed something mentioned by @wxchan

* Update Python-API.md

* add FitByExistingTree.

* adapt GPU tree learner for FitByExistingTree

* avoid NaN output.

* update boost.compute

* fix typos (#361)

* fix broken links (#359)

* update README

* disable GPU acceleration by default

* fix image url

* cleanup debug macro

* remove old README

* do not save sparse_threshold_ in FeatureGroup

* add details for new GPU settings

* ignore submodule when doing pep8 check

* allocate workspace for at least one thread during building Feature4

* move sparse_threshold to class Dataset

* remove duplicated code in GPUTreeLearner::Split

* Remove duplicated code in FindBestThresholds and BeforeFindBestSplit

* do not rebuild ordered gradients and hessians for sparse features

* support feature groups in GPUTreeLearner

* Initial parallel learners with GPU support

* add option device, cleanup code

* clean up FindBestThresholds; add some omp parallel

* constant hessian optimization for GPU

* Fix GPUTreeLearner crash when there is zero feature

* use np.testing.assert_almost_equal() to compare lists of floats in tests

* travis for GPU


22 Mar, 2017 1 commit
- fix bug in filter bin · 1c1749db
  Guolin Ke authored Mar 22, 2017
  