Commits · 32ef7603aba858b1e3dbf98e23d99a4cb2ee99b8 · tianlh / LightGBM-DCU

13 Apr, 2019 1 commit
- added copyright message in files (#2101) · 32ef7603
  Nikita Titov authored Apr 13, 2019
  
  32ef7603
11 Apr, 2019 1 commit

reworked includes in source files (#2066) · 50ce01b5

Nikita Titov authored Apr 12, 2019

* added all necessary includes - fixed build/include_what_you_use error

* fixed the order of includes (build/include_order)

50ce01b5

04 Apr, 2019 1 commit

Add Cost Effective Gradient Boosting (#2014) · 76102284

remcob-gr authored Apr 04, 2019

* Add configuration parameters for CEGB.

* Add skeleton CEGB tree learner

Like the original CEGB version, this inherits from SerialTreeLearner.
Currently, it changes nothing from the original.

* Track features used in CEGB tree learner.

* Pull CEGB tradeoff and coupled feature penalty from config.

* Implement finding best splits for CEGB

This is heavily based on the serial version; the only addition is the use of the coupled penalties.

* Set proper defaults for cegb parameters.

* Ensure sanity checks don't switch off CEGB.

* Implement per-data-point feature penalties in CEGB.

* Implement split penalty and remove unused parameters.

* Merge changes from CEGB tree learner into serial tree learner

* Represent features_used_in_data by a bitset, to reduce the memory overhead of CEGB, and add sanity checks for the lengths of the penalty vectors.

* Fix bug where CEGB would incorrectly penalise a previously used feature

The tree learner did not update the gains of previously computed leaf splits when splitting a leaf elsewhere in the tree.
This caused it to prefer new features due to incorrectly penalising splitting on previously used features.

* Document CEGB parameters and add them to the appropriate section.

* Remove leftover reference to cegb tree learner.

* Remove outdated diff.

* Fix warnings

* Fix minor issues identified by @StrikerRUS.

* Add docs section on CEGB, including citation.

* Fix link.

* Fix CI failure.

* Add some unit tests

* Fix pylint issues.

* Fix remaining pylint issue

76102284
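
The bitset item in this commit replaces a full-width flag per (feature, data point) pair with a single bit. A minimal Python sketch of that idea follows; the class name `Bitset`, the helper `flag_index`, and the feature-major index layout are assumptions for illustration and are not taken from the LightGBM source.

```python
class Bitset:
    """Fixed-size bitset backed by a bytearray (illustrative only)."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        # One byte stores eight flags, so round up to whole bytes.
        self._bytes = bytearray((n_bits + 7) // 8)

    def set(self, i):
        # Byte index is i // 8, bit position inside the byte is i % 8.
        self._bytes[i >> 3] |= 1 << (i & 7)

    def test(self, i):
        return bool(self._bytes[i >> 3] & (1 << (i & 7)))

    def nbytes(self):
        return len(self._bytes)


def flag_index(feature, row, num_data):
    # Assumed layout: all rows of feature 0, then all rows of feature 1, ...
    return feature * num_data + row


num_features, num_data = 4, 1000
used = Bitset(num_features * num_data)
used.set(flag_index(2, 17, num_data))
```

In this sketch 4,000 flags occupy 500 bytes, where a one-byte-per-flag array would need at least 4,000.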

01 Apr, 2019 1 commit
- addressed cpplint error about C-style cast (#2064) · 2027f6b4
  Nikita Titov authored Apr 01, 2019
  
  2027f6b4
26 Mar, 2019 1 commit
- fixed cpplint error about spaces and newlines (#2068) · 3c999be3
  Nikita Titov authored Mar 26, 2019
  
  3c999be3
25 Mar, 2019 1 commit

Add API method LGBM_BoosterPredictForMats (#2008) · 823fc03c

mjmckp authored Mar 25, 2019

* Fix index out-of-range exception generated by BaggingHelper on small datasets.

Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.

* Update goss.hpp

* Update goss.hpp

* Add API method LGBM_BoosterPredictForMats, which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires the data as a contiguous array)

* Fix incorrect upstream merge

* Add link to LightGBM.NET

* Fix indenting to 2 spaces

* Dummy edit to trigger CI

* Dummy edit to trigger CI

823fc03c
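
The first fix in this commit guards against indexing an empty buffer. A rough Python analogue is sketched below, under the assumption that `top_k` is derived from `cnt` and a sampling rate and clamped to at least 1. The function name `top_k_threshold`, the `top_rate` parameter, and the choice to return `None` for empty input are hypothetical and do not mirror goss.hpp.

```python
def top_k_threshold(gradients, top_rate):
    """Return the k-th largest absolute gradient, or None for empty input."""
    cnt = len(gradients)
    if cnt == 0:
        # Without this guard, tmp_gradients[top_k - 1] below would index
        # an empty list and raise IndexError.
        return None
    # Clamp to at least 1 so that top_k - 1 is a valid index.
    top_k = max(1, int(cnt * top_rate))
    tmp_gradients = sorted((abs(g) for g in gradients), reverse=True)
    return tmp_gradients[top_k - 1]
```

The sketch sorts the whole list for clarity; a partial selection would do the same job with less work on large inputs.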

18 Mar, 2019 1 commit

Added additional APIs to better support JNI on Spark (#2032) · beeb6e0f

Markus Cozowicz authored Mar 19, 2019

* added API changes required for JNI performance optimizations (e.g. predict is 3-4x faster)

* removed commented variables

* removed commented header

* renamed method to make it obvious it is created for Spark

* fixed comment alignment

* replaced GetPrimitiveArrayCritical with GetIntArrayElements for training; fixed deadlock on Databricks

beeb6e0f

26 Feb, 2019 1 commit

Add ability to move features from one data set to another in memory (#2006) · 219c943d

remcob-gr authored Feb 26, 2019

* Initial attempt to implement appending features in-memory to another data set

The intent is for this to enable munging files together easily, without needing to round-trip via numpy or write multiple copies to disk.
In turn, that enables working more efficiently with data sets that were written separately.

* Implement Dataset.dump_text, and fix small bug in appending of group bin boundaries.

Dumping to text enables us to compare results, without having to worry about issues like features being reordered.

* Add basic tests for validation logic for add_features_from.

* Remove various internal mapping items from dataset text dumps

These are too sensitive to the exact feature order chosen, which is not visible to the user.
Including them in tests appears unnecessary, as the data dumping code should provide enough coverage.

* Add test that add_features_from results in identical data sets according to dump_text.

* Add test that booster behaviour after using add_features_from matches that of training on the full data

This checks:
- That training after add_features_from works at all
- That add_features_from does not cause training to misbehave

* Expose feature_penalty and monotone_types/constraints via get_field

These getters allow us to check that add_features_from does the right thing with these vectors.

* Add tests that add_features correctly handles feature_penalty and monotone_constraints.

* Ensure add_features_from properly frees the added dataset and add unit test for this

Since add_features_from moves the feature group pointers from the added dataset to the dataset being added to, the added dataset is invalid after the call.
We must ensure we do not try to access this handle.

* Remove some obsolete TODOs

* Tidy up DumpTextFile by using a single iterator for each feature

These iterators were also passed around as raw pointers without being freed, which is now fixed.

* Factor out offsetting logic in AddFeaturesFrom

* Remove obsolete TODO

* Remove another TODO

This one is debatable: test code can be a bit messy and duplicate-heavy, and factoring it out tends to end badly.
Leaving this for now; will revisit if adding more tests later on becomes a mess.

* Add documentation for newly-added methods.

* Fix whitespace issues identified by pylint.

* Fix a few more whitespace issues.

* Fix doc comments

* Implement deep copying for feature groups.

* Replace awkward std::move usage by emplace_back, and reduce vector size to num_features rather than num_total_features.

* Copy feature groups in addFeaturesFrom, rather than moving them.

* Fix bugs in FeatureGroup copy constructor and ensure source dataset remains usable

* Add reserve to PushVector and PushOffset

* Move definition of Clone into class body

* Fix PR review issues

* Fix for loop increment style.

* Fix test failure

* Some more docstring fixes.

* Remove blank line

219c943d
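
The later items in this commit switch `add_features_from` from moving feature groups to copying them, so the source dataset stays usable. The toy Python sketch below contrasts the two ownership models. `ToyDataset` and both method names are hypothetical and stand in for the C++ feature-group handling, which is not reproduced here.

```python
import copy


class ToyDataset:
    """Hypothetical stand-in for a dataset that owns a list of feature groups."""

    def __init__(self, feature_groups):
        self.feature_groups = list(feature_groups)
        self.valid = True

    def add_features_by_move(self, other):
        # Ownership transfer: `other` gives up its groups and must not be
        # used again after the call.
        self.feature_groups.extend(other.feature_groups)
        other.feature_groups = []
        other.valid = False

    def add_features_by_copy(self, other):
        # Deep copy: `other` keeps its own groups and stays usable.
        self.feature_groups.extend(copy.deepcopy(other.feature_groups))


target = ToyDataset([[1, 2]])
source = ToyDataset([[3, 4]])
target.add_features_by_copy(source)
target.feature_groups[1][0] = 99  # does not affect `source`
```

Copying costs extra memory for the duplicated groups, but it removes the need to track which handles have been invalidated.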

24 Feb, 2019 1 commit

[docs] added notes about params usage when data is provided via path and... · f9ab5f58

Nikita Titov authored Feb 24, 2019

[docs] added notes about params usage when data is provided via path and removed unused param (#2024)

* added notes about params usage when data is provided via path

* fixed init score and valid init score params note

* fixed binary params description

f9ab5f58

06 Feb, 2019 1 commit
- fixed modifiers indent (#1997) · 462612b4
  Nikita Titov authored Feb 06, 2019
  
  462612b4
02 Feb, 2019 1 commit
- cpplint whitespaces and new lines (#1986) · 90127b52
  Nikita Titov authored Feb 02, 2019
  
  90127b52
30 Jan, 2019 2 commits

fix nan in eval results (#1973) · feeaf38f

Guolin Ke authored Jan 30, 2019

* always save the score of the first round in early stopping

fix #1971

* avoid using std::log on non-positive numbers

* remove unnecessary changes

* add tests

* Update test_sklearn.py

* enhanced tests

feeaf38f
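
One item in this commit avoids calling `std::log` on non-positive numbers, where C++ returns negative infinity for zero and NaN for negative inputs. A minimal Python sketch of one common guard, clamping the argument to a small positive floor, is shown below. The name `safe_log` and the floor `1e-15` are assumptions, not necessarily what the commit uses.

```python
import math

EPSILON = 1e-15  # assumed floor; choose a value suited to the metric


def safe_log(x, eps=EPSILON):
    """Natural log of x, with x clamped to at least eps."""
    # math.log raises ValueError for x <= 0; clamping avoids that
    # (and avoids -inf/NaN in languages that return them instead).
    return math.log(max(x, eps))
```

Clamping keeps downstream metric values finite, at the cost of treating every non-positive input as if it were `eps`.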

fix R's overflow (#1960) · 5c399840

Guolin Ke authored Jan 30, 2019

5c399840

23 Jan, 2019 1 commit

support to override some parameters in Dataset (#1876) · b37065db

Guolin Ke authored Jan 23, 2019

* add warnings for override parameters of Dataset

* fix pep8

* add feature_penalty

* refactor

* add R's code

* Update basic.py

* Update basic.py

* fix parameter bug

* Update lgb.Dataset.R

* fix a bug

b37065db

20 Dec, 2018 1 commit
- fix trivial typo (#1915) · 92e95e62
  Lingyi Hu authored Dec 20, 2018
  
  92e95e62
17 Dec, 2018 1 commit

Fix bugs in RF (#1906) · cba82447

Guolin Ke authored Dec 17, 2018

* fix RF's bugs

* fix tests

* rollback num_iterations

* fix a bug and reduce memory costs

* reduce memory cost

cba82447

25 Nov, 2018 1 commit
- fixed seed param default value and description (#1872) · 25b3de36
  Nikita Titov authored Nov 25, 2018
  
  25b3de36
22 Nov, 2018 1 commit
- added note about valid format of ignored columns (#1865) · d21a0e39
  Nikita Titov authored Nov 22, 2018
  
  d21a0e39
01 Nov, 2018 1 commit
- try to fix bug with disable openmp (#1813) · 59f10453
  Guolin Ke authored Nov 01, 2018
  
  59f10453
29 Oct, 2018 1 commit
- [docs] is_provide_training_metric param can be used only in CLI version (#1800) · 7537bbe3
  Nikita Titov authored Oct 29, 2018
  
  7537bbe3
27 Oct, 2018 1 commit

[docs] Quick fix for better understanding for forced split logic (#1784) · c8e0995b

Qiwei Ye authored Oct 27, 2018

* quick fix for better understanding

* update document for forced split

* typo fix

* made NOTE bold

* made Note bold

c8e0995b

10 Oct, 2018 2 commits
- fix ranking tasks consistency (#1739) · 496a07d1
  Guolin Ke authored Oct 10, 2018
```
* fix ndcg consistency.

* more stable sorts

* Update gbdt_model_text.cpp

* Update dataset.cpp

* Update gbdt_model_text.cpp
```
  496a07d1
- [docs] fixed some typos and grammatical errors (#1738) · ac6951d3
  Alex authored Oct 10, 2018
  
  ac6951d3
09 Oct, 2018 1 commit

average predictions for constant features (#1735) · c920e634

Guolin Ke authored Oct 09, 2018

* average predictions for constant features

* fix possible numerical issues in std::log.

* fix pylint

* fix bugs in c_api

* fix styles

* clean code for multi class

* rewrite test

* fix pylint

* skip test_constant_features

* refine test

* fix tests

* fix tests

* update FAQ

* fix test

* Update FAQ.rst

c920e634

29 Sep, 2018 1 commit
- add indices in shuffle model. (#1710) · 81e2485a
  Guolin Ke authored Sep 29, 2018
```
* add indices in shuffle model.

* fix pep

* fix bug
```
  81e2485a
16 Sep, 2018 1 commit

[doc] Update GPU-Targets.rst (#1609) · 6214657a

Huan Zhang authored Sep 16, 2018

* Update GPU-Targets.rst

Fix some inaccurate information in docs

* fix travisCI warning

* fix typos

* update config.h

6214657a

11 Sep, 2018 1 commit
- Docs & Warning on sparse categorical features (#1636) · a58aca64
  dmitryikh authored Sep 11, 2018
```
* warning on categorical feature with sparse values

* [docs] categorical features note
```
  a58aca64
06 Sep, 2018 1 commit

[python] pass params to _InnerPredictor in train and cv and verbose fix (#1628) · bd3889f7

Nikita Titov authored Sep 06, 2018

* pass params to _InnerPredictor in train and cv

* fixed verbosity param description

* treat silent param as Fatal log level

* create Dataset in refit method silently

* do not overwrite verbose param by silent argument

bd3889f7

29 Aug, 2018 1 commit
- [docs] added note about shap package (#1620) · abd73765
  Nikita Titov authored Aug 29, 2018
  
  abd73765
27 Aug, 2018 2 commits

various improvements around metric param and early_stopping_rounds param description (#1589) · cd6d0583

Nikita Titov authored Aug 27, 2018

* bring consistency and clarity into early_stopping_rounds desc, metric desc and implementation

* hotfix

* hotfix

* used NDCG as default metric for lambdarank task

* fixed missed methods at ReadTheDocs and changed default eval_metric

* left only unique metrics

* fixed comment

cd6d0583

add NumModelPerIteration and NumberOfTotalModel in C_API (#1613) · c77153a1

Nikita Titov authored Aug 27, 2018

* added NumberOfTotalModel and NumModelPerIteration to C_API and python-package

* fixed tests

* added tests for current_iteration, num_trees, num_model_per_iteration methods

* break huge line in test

* hotfix

c77153a1

25 Aug, 2018 1 commit

add support of refit-decay (#1603) · 2db6377a

Guolin Ke authored Aug 25, 2018

* add support of refit-decay

* add refit into c_api

* add test

* update document

* Update basic.py

* Update test_engine.py

* Update basic.py

* Update test_engine.py

* fix comments

* update test

* fix the comments

* Update test_engine.py

2db6377a

22 Aug, 2018 1 commit

add start_iteration in model saving (#1565) · 941068ee

Guolin Ke authored Aug 22, 2018

* add start_iteration in model saving

* fix test

* shuffle models ability

* fix bug

* update document

* refine

* Update engine.py

* Update basic.py

* fix comments

* fix comment

941068ee

21 Aug, 2018 1 commit

remove unnecessary std::move & unused capture (#1596) · 3400e389

Qiwei Ye authored Aug 21, 2018

* remove unnecessary std::move

* remove unused-lambda-capture

* remove unused-lambda-capture

* fix unused parameter

* minor fix

* invalid capture of lambda function

3400e389

17 Aug, 2018 1 commit
- fix error when using distributed quantile regression (#1593) · c78f26ed
  Ilya Matiach authored Aug 17, 2018
  
  c78f26ed
16 Aug, 2018 1 commit
- fix include (#1586) · 5bee6489
  Guolin Ke authored Aug 16, 2018
```
* fix include

* reduce dependency on header file

* fix build
```
  5bee6489
08 Aug, 2018 1 commit

[docs] negative values in category columns (#1567) · 93764fda

Nikita Titov authored Aug 08, 2018

* broadcast info about negative values in categorical features to python package

* update link to categorical_feature parameter

93764fda

06 Aug, 2018 1 commit
- [docs] updated docs of num_iterations parameter (#1515) · 82ae9b20
  Nikita Titov authored Aug 06, 2018
```
* updated docs of num_iterations parameter

* updated num_iterations param description for R
```
  82ae9b20
31 Jul, 2018 1 commit
- fix custom metric for multiclass (#1505) · 7b6f80f3
  Guolin Ke authored Jul 31, 2018
```
* fix custom metric for multiclass

* fix alias

* fix bug

* fix indent
```
  7b6f80f3
25 Jul, 2018 1 commit
- [docs] added new parameters aliases (#1537) · 00a125d5
  Nikita Titov authored Jul 25, 2018
```
* added new aliases for params

* run helper/parameter_generator.py

* removed useless test
```
  00a125d5