1. 13 Apr, 2019 1 commit
  2. 11 Apr, 2019 1 commit
  3. 04 Apr, 2019 1 commit
    • Add Cost Effective Gradient Boosting (#2014) · 76102284
      remcob-gr authored
      * Add configuration parameters for CEGB.
      
      * Add skeleton CEGB tree learner
      
      Like the original CEGB version, this inherits from SerialTreeLearner.
      Currently, it changes nothing from the original.
      
      * Track features used in CEGB tree learner.
      
      * Pull CEGB tradeoff and coupled feature penalty from config.
      
      * Implement finding best splits for CEGB
      
      This is heavily based on the serial version, and simply adds the use of the coupled penalties.
      
      * Set proper defaults for cegb parameters.
      
      * Ensure sanity checks don't switch off CEGB.
      
      * Implement per-data-point feature penalties in CEGB.
      
      * Implement split penalty and remove unused parameters.
      
      * Merge changes from CEGB tree learner into serial tree learner
      
      * Represent features_used_in_data by a bitset, to reduce the memory overhead of CEGB, and add sanity checks for the lengths of the penalty vectors.
      
      * Fix bug where CEGB would incorrectly penalise a previously used feature
      
      The tree learner did not update the gains of previously computed leaf splits when splitting a leaf elsewhere in the tree.
      This caused it to prefer new features due to incorrectly penalising splitting on previously used features.
      
      * Document CEGB parameters and add them to the appropriate section.
      
      * Remove leftover reference to cegb tree learner.
      
      * Remove outdated diff.
      
      * Fix warnings
      
      * Fix minor issues identified by @StrikerRUS.
      
      * Add docs section on CEGB, including citation.
      
      * Fix link.
      
      * Fix CI failure.
      
      * Add some unit tests
      
      * Fix pylint issues.
      
      * Fix remaining pylint issue
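      The split-gain adjustment described in the commits above can be sketched as a toy Python analogue (the real implementation is C++, and the function name and argument layout here are invented for illustration). CEGB subtracts a prediction-cost term from the usual gain: a fixed per-split penalty, a per-data-point ("lazy") penalty, and a coupled per-feature penalty charged only the first time a feature is used anywhere in the model — which is exactly why the stale-gain bug fixed above mattered.

```python
def cegb_adjusted_gain(gain, tradeoff, split_penalty,
                       coupled_penalty, lazy_penalty_sum,
                       feature_already_used):
    """Toy sketch of a CEGB-penalised split gain (names are illustrative).

    The coupled feature penalty is charged only the first time a feature
    is used; once paid, later splits on that feature are cheaper, so
    previously computed leaf gains must be refreshed after each split.
    """
    cost = split_penalty + lazy_penalty_sum
    if not feature_already_used:
        cost += coupled_penalty
    return gain - tradeoff * cost

# A split on a brand-new feature pays the coupled penalty...
fresh = cegb_adjusted_gain(10.0, 1.0, 0.5, 2.0, 0.25, False)
# ...while a split on an already-used feature does not.
reused = cegb_adjusted_gain(10.0, 1.0, 0.5, 2.0, 0.25, True)
```

      With these illustrative numbers the reused feature keeps a higher adjusted gain, matching the behaviour the bug fix restores.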
  4. 01 Apr, 2019 1 commit
  5. 26 Mar, 2019 1 commit
  6. 25 Mar, 2019 2 commits
    • Add API method LGBM_BoosterPredictForMats (#2008) · 823fc03c
      mjmckp authored
      * Fix index out-of-range exception generated by BaggingHelper on small datasets.
      
      Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.
      
      * Update goss.hpp
      
      * Update goss.hpp
      
      * Add API method LGBM_BoosterPredictForMats, which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires the data as a contiguous array)
      
      * Fix incorrect upstream merge
      
      * Add link to LightGBM.NET
      
      * Fix indenting to 2 spaces
      
      * Dummy edit to trigger CI
      
      * Dummy edit to trigger CI
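      The difference between the two input layouts can be shown with a small plain-Python sketch (the helper name is invented): LGBM_BoosterPredictForMat expects a single contiguous row-major buffer, while LGBM_BoosterPredictForMats accepts an array of independent row pointers, so rows need not be adjacent in memory.

```python
def flatten_rows(rows):
    """Concatenate per-row buffers into the single contiguous,
    row-major buffer that a ForMat-style API expects."""
    ncol = len(rows[0])
    assert all(len(r) == ncol for r in rows), "rows must share a width"
    return [x for row in rows for x in row]

# Rows held as separate allocations (the layout ForMats consumes directly):
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# The equivalent contiguous layout (what ForMat consumes):
contiguous = flatten_rows(rows)
```

      Accepting row pointers avoids copying each row into a temporary contiguous matrix before every predict call, which is where the JNI-side speedup comes from.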
    • remove warnings · 548fc91e
      Guolin Ke authored
  7. 18 Mar, 2019 1 commit
    • Added additional APIs to better support JNI on Spark (#2032) · beeb6e0f
      Markus Cozowicz authored
      * added API changes required for JNI performance optimizations (e.g. predict is 3-4x faster)
      
      * removed commented variables
      
      * removed commented header
      
      * renamed method to make it obvious it is created for Spark
      
      * fixed comment alignment
      
      * replaced GetPrimitiveArrayCritical with GetIntArrayElements for training; fixed a deadlock on Databricks
  8. 09 Mar, 2019 1 commit
  9. 26 Feb, 2019 1 commit
    • Add ability to move features from one data set to another in memory (#2006) · 219c943d
      remcob-gr authored
      * Initial attempt to implement appending features in-memory to another data set
      
      The intent is for this to enable munging files together easily, without needing to round-trip via numpy or write multiple copies to disk.
      In turn, that enables working more efficiently with data sets that were written separately.
      
      * Implement Dataset.dump_text, and fix small bug in appending of group bin boundaries.
      
      Dumping to text enables us to compare results, without having to worry about issues like features being reordered.
      
      * Add basic tests for validation logic for add_features_from.
      
      * Remove various internal mapping items from dataset text dumps
      
      These are too sensitive to the exact feature order chosen, which is not visible to the user.
      Including them in tests appears unnecessary, as the data dumping code should provide enough coverage.
      
      * Add test that add_features_from results in identical data sets according to dump_text.
      
      * Add test that booster behaviour after using add_features_from matches that of training on the full data
      
      This checks:
      - That training after add_features_from works at all
      - That add_features_from does not cause training to misbehave
      
      * Expose feature_penalty and monotone_types/constraints via get_field
      
      These getters allow us to check that add_features_from does the right thing with these vectors.
      
      * Add tests that add_features correctly handles feature_penalty and monotone_constraints.
      
      * Ensure add_features_from properly frees the added dataset and add unit test for this
      
      Since add_features_from moves the feature group pointers from the added dataset to the dataset being added to, the added dataset is invalid after the call.
      We must ensure we do not try to access this handle.
      
      * Remove some obsolete TODOs
      
      * Tidy up DumpTextFile by using a single iterator for each feature
      
      These iterators were also passed around as raw pointers without being freed, which is now fixed.
      
      * Factor out offsetting logic in AddFeaturesFrom
      
      * Remove obsolete TODO
      
      * Remove another TODO
      
      This one is debatable: test code can be a bit messy and duplicate-heavy, and factoring it out tends to end badly.
      Leaving this for now; will revisit if adding more tests later on becomes a mess.
      
      * Add documentation for newly-added methods.
      
      * Fix whitespace issues identified by pylint.
      
      * Fix a few more whitespace issues.
      
      * Fix doc comments
      
      * Implement deep copying for feature groups.
      
      * Replace awkward std::move usage by emplace_back, and reduce vector size to num_features rather than num_total_features.
      
      * Copy feature groups in addFeaturesFrom, rather than moving them.
      
      * Fix bugs in FeatureGroup copy constructor and ensure source dataset remains usable
      
      * Add reserve to PushVector and PushOffset
      
      * Move definition of Clone into class body
      
      * Fix PR review issues
      
      * Fix for loop increment style.
      
      * Fix test failure
      
      * Some more docstring fixes.
      
      * Remove blank line
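      The final behaviour after review — copying feature groups rather than moving them, so the source dataset stays usable — can be sketched with a toy Python analogue (the function name mirrors Dataset.add_features_from, but this is not the real implementation, which operates on C++ feature-group pointers).

```python
def add_features_from(target_groups, source_groups):
    """Toy analogue of add_features_from after the review changes:
    feature groups are deep-copied into the target, leaving the
    source dataset valid after the call."""
    target_groups.extend(list(group) for group in source_groups)

a = [[0, 1, 0]]            # one feature column
b = [[1, 1, 0], [2, 0, 2]] # two feature columns to append
add_features_from(a, b)
```

      Had the groups been moved instead (the earlier design in this PR), `b` would be left empty and unsafe to use, which is why the later commits switched to copying and added the use-after-add tests.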
  10. 24 Feb, 2019 1 commit
  11. 06 Feb, 2019 1 commit
  12. 03 Feb, 2019 1 commit
  13. 02 Feb, 2019 2 commits
  14. 31 Jan, 2019 1 commit
  15. 30 Jan, 2019 2 commits
    • fix nan in eval results (#1973) · feeaf38f
      Guolin Ke authored
      * always save the score of the first round in early stopping
      
      fix #1971
      
      * avoid using std::log on non-positive numbers
      
      * remove unnecessary changes
      
      * add tests
      
      * Update test_sklearn.py
      
      * enhanced tests
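      The log guard mentioned above can be sketched as a Python analogue (the actual fix lives in the C++ code): mapping non-positive inputs to -inf keeps NaN out of the eval results.

```python
import math

def safe_log(x):
    """Return log(x), mapping non-positive inputs to -inf instead of
    raising (Python) or producing NaN (std::log on a negative input)."""
    return math.log(x) if x > 0 else float("-inf")
```

      A -inf metric still compares sanely in early stopping, whereas a NaN compares false against everything, which is why the first-round score must also always be saved.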
    • fix R's overflow (#1960) · 5c399840
      Guolin Ke authored
  16. 29 Jan, 2019 1 commit
  17. 23 Jan, 2019 1 commit
  18. 22 Jan, 2019 1 commit
  19. 18 Jan, 2019 1 commit
  20. 16 Jan, 2019 2 commits
    • Reserve vectors, to save reallocation costs. (#1949) · 24c9503f
      Shahzad Lone authored
      File: [LightGBM/src/io/dataset.cpp]
      Function: [138:FastFeatureBundling(...)]
      
      Reserving vectors where we already know the size to save on reallocation costs.
      
      Also removed a variable that was unnecessary.
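      The optimisation is the standard reserve-before-fill pattern. A rough Python analogue (illustrative only; the real change calls std::vector::reserve in C++) is to preallocate when the final length is known, instead of growing by repeated appends:

```python
def doubled(values):
    # Size is known up front: allocate once (vector::reserve analogue)
    # instead of letting repeated appends trigger reallocations.
    out = [None] * len(values)
    for i, v in enumerate(values):
        out[i] = 2 * v
    return out

result = doubled([1, 2, 3])
```

      Appending to a growing buffer periodically reallocates and copies; a single up-front allocation avoids that entirely when the count is already known.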
    • When loading a binary file, take feature penalty and monotone constraints from... · 61527856
      remcob-gr authored
      When loading a binary file, take feature penalty and monotone constraints from config if given there. (#1881)
      
      * When loading a binary file, take feature penalty from config if given there.
      
      * When loading a binary file, take feature penalty from config if given there.
      
      * Fix crash when num_features != num_total_features and feature_contri is given.
      
      * Apply the same logic to monotone_types_.
      
      * Fix indentation
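      The precedence rule this PR implements can be sketched as follows (toy Python with invented names): a per-feature vector supplied in the config overrides the one stored in the binary file; otherwise the stored value is kept.

```python
def resolve(from_binary, from_config):
    """Pick the effective per-feature vector (e.g. feature penalties or
    monotone constraints): the config value, if given, wins over the
    value loaded from the binary file."""
    return from_config if from_config is not None else from_binary

overridden = resolve(from_binary=[1.0, 1.0], from_config=[0.5, 2.0])
fallback = resolve(from_binary=[1.0, 1.0], from_config=None)
```

      The same resolution is applied to both feature_contri and monotone_types_, per the commits above.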
  21. 20 Dec, 2018 1 commit
  22. 17 Dec, 2018 1 commit
    • Fix bugs in RF (#1906) · cba82447
      Guolin Ke authored
      * fix RF's bugs
      
      * fix tests
      
      * rollback num_iterations
      
      * fix a bug and reduce memory costs
      
      * reduce memory cost
  23. 14 Dec, 2018 1 commit
  24. 10 Dec, 2018 1 commit
  25. 30 Nov, 2018 1 commit
  26. 25 Nov, 2018 2 commits
  27. 23 Nov, 2018 1 commit
  28. 06 Nov, 2018 3 commits
  29. 01 Nov, 2018 2 commits
  30. 31 Oct, 2018 1 commit
  31. 27 Oct, 2018 1 commit
  32. 26 Oct, 2018 1 commit