Commits · 207bb3ef329b38fc255ef36b1d7ca765924363ea · tianlh / LightGBM-DCU

25 Jul, 2019 2 commits

Guolin Ke authored Jul 25, 2019

* fix metric alias

* fix format

* updated docs

* simplify alias in objective function

* move the alias parsing to config.cpp

* updated docs

* fix multi-class aliases

* updated regression aliases in docs

* fixed trailing space

5d3a3ea4
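The alias commits above move name resolution into one place (config.cpp). The Python sketch below shows what canonical-name lookup for a comma-separated metric string might look like. It assumes a small alias table chosen for illustration: a subset, not LightGBM's full list and not its C++ implementation. `parse_metrics` is a hypothetical helper.

```python
# Illustrative subset of metric aliases (assumption); LightGBM's real table
# lives in its config code and may differ.
METRIC_ALIASES = {
    "l2": ["mean_squared_error", "mse", "regression_l2", "regression"],
    "rmse": ["root_mean_squared_error", "l2_root"],
    "l1": ["mean_absolute_error", "mae", "regression_l1"],
}

# Invert once so every alias, and each canonical name, maps to the canonical name.
_CANONICAL = {}
for name, aliases in METRIC_ALIASES.items():
    _CANONICAL[name] = name
    for alias in aliases:
        _CANONICAL[alias] = name


def parse_metrics(spec: str) -> list:
    """Resolve a comma-separated metric string to canonical, de-duplicated names."""
    result = []
    for raw in spec.split(","):
        token = raw.strip().lower()
        if not token:
            continue
        # Unknown names pass through unchanged in this sketch.
        canonical = _CANONICAL.get(token, token)
        if canonical not in result:
            result.append(canonical)
    return result
```

Doing the lookup once, centrally, means objectives and metrics never each need to know every alias.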

fixed cpplint errors about spaces and indents (#2282) · 716fe4d0
Nikita Titov authored Jul 25, 2019

716fe4d0

24 Jul, 2019 1 commit

add weight in tree model output (#2269) · e1d7a7b9

Guolin Ke authored Jul 24, 2019

* add weight in tree model output

* fix bug

* updated Python plotting part to handle weights

e1d7a7b9

18 Jul, 2019 1 commit
- throw error when meet non ascii (#2229) · 0d59859c
  Guolin Ke authored Jul 19, 2019
```
* throw error when meet non ascii

* check ascii for config strings.
```
  0d59859c
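The ASCII check on config strings might look roughly like the following Python sketch. The function name and the error message are hypothetical and not taken from LightGBM's C++ code.

```python
def check_ascii(key: str, value: str) -> None:
    """Raise ValueError if a config key or value contains a non-ASCII character."""
    for text in (key, value):
        # Code points above 127 fall outside 7-bit ASCII.
        if any(ord(ch) > 127 for ch in text):
            raise ValueError(f"non-ASCII character in config entry: {key}={value!r}")
```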
08 Jul, 2019 1 commit

Max bin by feature (#2190) · 291752de

Belinda Trotta authored Jul 08, 2019

* Add parameter max_bin_by_feature.

* Fix minor bug.

* Fix minor bug.

* Fix calculation of header size for writing binary file.

* Fix style issues.

* Fix python style issue.

* Fix test and python style issue.

291752de

07 Jul, 2019 1 commit
- switch name and alias of rmse metric (#2257) · bf78008b
  Nikita Titov authored Jul 07, 2019
  
  bf78008b
18 Jun, 2019 1 commit

balanced bagging (#2214) · cdba7147

Guolin Ke authored Jun 18, 2019

* add balanced bagging

* refine code

* fix format

* clarify usage only for binary application

cdba7147
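Balanced bagging samples positive and negative rows with separate fractions, which is why it only applies to binary tasks. LightGBM exposes this through parameters documented as `pos_bagging_fraction` and `neg_bagging_fraction`. The function below is a hypothetical stand-alone sketch of the idea. It assumes labels in {0, 1} and a fixed per-class sample count drawn without replacement, and LightGBM's own sampler may differ.

```python
import random


def balanced_bagging(labels, pos_fraction, neg_fraction, seed=0):
    """Return sorted row indices sampled separately from positive and negative rows."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y > 0]
    neg = [i for i, y in enumerate(labels) if y <= 0]
    # Fixed count per class (sketch assumption), drawn without replacement.
    n_pos = int(round(pos_fraction * len(pos)))
    n_neg = int(round(neg_fraction * len(neg)))
    chosen = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    return sorted(chosen)
```

With 10 positives and 90 negatives, `balanced_bagging(labels, 1.0, 0.1)` keeps all 10 positives and 9 negatives. This is the kind of rebalancing the feature targets on skewed binary data.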

28 May, 2019 1 commit
- [docs] fixed and enhanced format of C API (#2195) · b1e5a843
  Nikita Titov authored May 28, 2019
```
* fixed and enhanced format of C API

* fixed description of dataset creation functions
```
  b1e5a843
26 May, 2019 1 commit

Top k multi error (#2178) · b3db9e92

Belinda Trotta authored May 26, 2019

* Implement top-k multiclass error metric. Add new parameter top_k_threshold.

* Add test for multiclass metrics

* Make test less sensitive to avoid floating-point issues.

* Change tabs to spaces.

* Fix problem with test in Python 2. Refactor to use np.testing. Decrease number of training rounds so loss is larger and easier to compare.

* Move multiclass tests into test_engine.py

* Change parameter name from top_k_threshold to multi_error_top_k.

* Fix top-k error metric to handle case where scores are equal. Update tests and docs.

* Change name of top-k metric to multi_error@k.

* Change tabs to spaces.

* Fix formatting.

* Fix minor issues in docs.

b3db9e92
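The top-k multiclass error counts a sample as correct when its true class is among the k highest scores. The commit notes a fix for equal scores. This sketch assumes the tie rule worded in LightGBM's parameter docs for `multi_error_top_k`: a sample is correct only if at least `num_class - k` classes score strictly lower than the true class, so ties count against the prediction. `multi_error_at_k` is a hypothetical name.

```python
def multi_error_at_k(scores, labels, k):
    """Fraction of samples whose true class is not within the top k scores.

    scores: per-sample lists of class scores; labels: true class indices.
    Ties are resolved pessimistically (assumed rule, see lead-in).
    """
    errors = 0
    for row, label in zip(scores, labels):
        true_score = row[label]
        strictly_lower = sum(1 for s in row if s < true_score)
        if strictly_lower < len(row) - k:
            errors += 1
    return errors / len(labels)
```

For `scores = [[0.1, 0.5, 0.4], [0.7, 0.2, 0.1]]` and `labels = [2, 0]`, the error is 0.5 at k=1 and 0.0 at k=2. With all-equal scores and k=1, every sample counts as an error under this rule.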

25 May, 2019 1 commit

[ci][docs] refine C API docs (#2076) · 19de2be0

nabokovas authored May 26, 2019

* [ci] for autogenerating docs

* resolved comments

* resolved comments 2

* update to 36c89134

19de2be0

16 May, 2019 1 commit

first metric only in earlystopping for cli (#2172) · f01b2aca

Guolin Ke authored May 16, 2019

* first metric only in earlystopping for cli

* code clean

* added note about CLI only usage

* removed note about CLI only usage

f01b2aca
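With `first_metric_only`, early stopping watches only the first configured metric instead of every metric. This is a minimal Python sketch of that decision. It assumes lower-is-better metrics and that, without the flag, training stops as soon as any metric stalls. `find_stop_iteration` is hypothetical and is not the CLI's implementation.

```python
def find_stop_iteration(history, patience, first_metric_only):
    """Return (stop_iter, best_iter) once a checked metric stalls, else None.

    history: per-iteration lists of metric values in configured order,
    lower is better (assumption).
    """
    if not history:
        return None
    checked = [0] if first_metric_only else list(range(len(history[0])))
    best_val = {m: float("inf") for m in checked}
    best_iter = {m: -1 for m in checked}
    for it, values in enumerate(history):
        for m in checked:
            if values[m] < best_val[m]:
                best_val[m] = values[m]
                best_iter[m] = it
            elif it - best_iter[m] >= patience:
                # No improvement on metric m for `patience` rounds.
                return it, best_iter[m]
    return None
```

If the first metric keeps improving while a secondary one worsens, the sketch keeps training with the flag set and stops early without it.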

15 May, 2019 2 commits
- [doc] minor doc fix for gamma param (#2180) · 6f3fae51
  Ilya Matiach authored May 15, 2019
  
  6f3fae51
- [docs] fixing max_depth param description (#2155) · 3d8770af
  Laurae authored May 15, 2019
```
* PR #1879

* Update docs with parameter_generator.py

* Update wrapper doc for sklearn
```
  3d8770af
08 May, 2019 2 commits
- [docs] updated Microsoft GitHub URL (#2152) · 94fbe5bb
  Guolin Ke authored May 08, 2019
```
* fix travis badge

* updated GitHub Microsoft URL
```
  94fbe5bb
- fix warning · f46f8b2a
  Guolin Ke authored May 08, 2019
  
  f46f8b2a
06 May, 2019 1 commit
- fix a bug when bagging with reset_config (#2149) · 46d21476
  Guolin Ke authored May 06, 2019
```
* fix a bug when bagging with reset_config

* clean code
```
  46d21476
05 May, 2019 1 commit

[ci][docs] generate docs for C API (#2059) · cfcc020e

Nikita Titov authored May 05, 2019

* use file to install deps for docs

* added C_API docs

* use breathe without exhale

* added missed params descriptions and make Doxygen fail for warnings

* escape char hotfix

* ignore unknown directive for rstcheck

* better handle env variable

* hotfix for 'Unknown directive type' error with C_API=NO

* Update .gitignore

* fixed pylint

* use already defined constants in conf.py

* do not suppress Doxygen's output

* addressed review comments

* removed unneeded import

cfcc020e

30 Apr, 2019 1 commit
- Removed legacy code (#2137) · a295f6b0
  Nikita Titov authored Apr 30, 2019
```
* Update meta.h

* Update json11.hpp
```
  a295f6b0
28 Apr, 2019 1 commit
- fixed minor typos (#2119) · 24ad35f7
  Nikita Titov authored Apr 28, 2019
  
  24ad35f7
19 Apr, 2019 1 commit

[docs] Update doc string for pred_contrib (#2116) · 89f2021a

Scott Lundberg authored Apr 18, 2019

* Update doc string for pred_contrib

See comments at the end of #1969

* Update basic.py

* Update basic.py

* update doc strings

* update equals sign in doc string

* strip whitespace and gen rst

* strip whitespace

89f2021a

18 Apr, 2019 1 commit
- [docs] added note about the spoiled probabilities (#2113) · beb35d56
  Nikita Titov authored Apr 18, 2019
  
  beb35d56
13 Apr, 2019 2 commits
- fixed cpplint errors about spaces and newlines (#2102) · 0a4a7a86
  Nikita Titov authored Apr 13, 2019
  
  0a4a7a86
- added copyright message in files (#2101) · 32ef7603
  Nikita Titov authored Apr 13, 2019
  
  32ef7603
11 Apr, 2019 1 commit

reworked includes in source files (#2066) · 50ce01b5

Nikita Titov authored Apr 12, 2019

* added all necessary includes - fixed build/include_what_you_use error

* fixed the order of includes (build/include_order)

50ce01b5

04 Apr, 2019 1 commit

Add Cost Effective Gradient Boosting (#2014) · 76102284

remcob-gr authored Apr 04, 2019

* Add configuration parameters for CEGB.

* Add skeleton CEGB tree learner

Like the original CEGB version, this inherits from SerialTreeLearner.
Currently, it changes nothing from the original.

* Track features used in CEGB tree learner.

* Pull CEGB tradeoff and coupled feature penalty from config.

* Implement finding best splits for CEGB

This is heavily based on the serial version, but just adds using the coupled penalties.

* Set proper defaults for cegb parameters.

* Ensure sanity checks don't switch off CEGB.

* Implement per-data-point feature penalties in CEGB.

* Implement split penalty and remove unused parameters.

* Merge changes from CEGB tree learner into serial tree learner

* Represent features_used_in_data by a bitset, to reduce the memory overhead of CEGB, and add sanity checks for the lengths of the penalty vectors.

* Fix bug where CEGB would incorrectly penalise a previously used feature

The tree learner did not update the gains of previously computed leaf splits when splitting a leaf elsewhere in the tree.
This caused it to prefer new features due to incorrectly penalising splitting on previously used features.

* Document CEGB parameters and add them to the appropriate section.

* Remove leftover reference to cegb tree learner.

* Remove outdated diff.

* Fix warnings

* Fix minor issues identified by @StrikerRUS.

* Add docs section on CEGB, including citation.

* Fix link.

* Fix CI failure.

* Add some unit tests

* Fix pylint issues.

* Fix remaining pylint issue

76102284
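The CEGB commit stores `features_used_in_data` as a bitset: one bit per (feature, row) pair. This lets a per-data-point ("lazy") penalty charge only rows that have not used a feature yet. LightGBM documents that penalty as `cegb_penalty_feature_lazy`. The Python class below is a hypothetical sketch of the bookkeeping, not LightGBM's tree-learner code. With 3 features and 10 rows it needs only 4 bytes.

```python
class FeatureUsageBitset:
    """One bit per (feature, row) pair recording whether that row already used the feature."""

    def __init__(self, num_features, num_data):
        self.num_data = num_data
        self.bits = bytearray((num_features * num_data + 7) // 8)

    def _pos(self, feature, row):
        idx = feature * self.num_data + row
        return idx >> 3, 1 << (idx & 7)  # byte offset, bit mask

    def is_used(self, feature, row):
        byte, mask = self._pos(feature, row)
        return bool(self.bits[byte] & mask)

    def mark(self, feature, rows):
        for row in rows:
            byte, mask = self._pos(feature, row)
            self.bits[byte] |= mask

    def lazy_penalty(self, feature, rows, penalty_per_row):
        """Charge the penalty only for rows that have not yet used this feature."""
        unused = sum(1 for r in rows if not self.is_used(feature, r))
        return penalty_per_row * unused
```

The penalty for a split on a feature shrinks after a leaf containing those rows has already split on it. The commit's bug fix concerns exactly this: previously computed split gains must be refreshed when that happens.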

01 Apr, 2019 1 commit
- addressed cpplint error about C-style cast (#2064) · 2027f6b4
  Nikita Titov authored Apr 01, 2019
  
  2027f6b4
26 Mar, 2019 1 commit
- fixed cpplint error about spaces and newlines (#2068) · 3c999be3
  Nikita Titov authored Mar 26, 2019
  
  3c999be3
25 Mar, 2019 1 commit

Add API method LGBM_BoosterPredictForMats (#2008) · 823fc03c

mjmckp authored Mar 25, 2019

* Fix index out-of-range exception generated by BaggingHelper on small datasets.

Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.

* Update goss.hpp

* Update goss.hpp

* Add API method LGBM_BoosterPredictForMats which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires data given as a contiguous array)

* Fix incorrect upstream merge

* Add link to LightGBM.NET

* Fix indenting to 2 spaces

* Dummy edit to trigger CI

* Dummy edit to trigger CI

823fc03c
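The GOSS fix in the commit body above guards against reading `tmp_gradients[top_k - 1]` when there are no rows. The Python sketch below shows one way to write that guard. For simplicity it ranks rows by |gradient|, whereas LightGBM's goss.hpp uses its own gradient-based score. `goss_threshold` is a hypothetical name.

```python
def goss_threshold(gradients, top_rate):
    """Return the score of the top_k-th largest row, or None when there are no rows."""
    cnt = len(gradients)
    if cnt == 0:
        # Nothing to rank: indexing tmp[top_k - 1] would be out of range.
        return None
    top_k = max(1, int(cnt * top_rate))
    # Absolute gradients in descending order; read the top_k-th largest.
    tmp = sorted((abs(g) for g in gradients), reverse=True)
    return tmp[top_k - 1]
```

Clamping `top_k` to at least 1 also avoids reading index -1 on tiny non-empty inputs, which in Python would silently wrap to the last element.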

18 Mar, 2019 1 commit

Added additional APIs to better support JNI on Spark (#2032) · beeb6e0f

Markus Cozowicz authored Mar 19, 2019

* added API changes required for JNI performance optimizations (e.g. predict is 3-4x faster)

* removed commented variables

* removed commented header

* renamed method to make it obvious it is created for Spark

* fixed comment alignment

* replaced GetPrimitiveArrayCritical with GetIntArrayElements for training. Fixed deadlock on Databricks

beeb6e0f

26 Feb, 2019 1 commit

Add ability to move features from one data set to another in memory (#2006) · 219c943d

remcob-gr authored Feb 26, 2019

* Initial attempt to implement appending features in-memory to another data set

The intent is for this to enable munging files together easily, without needing to round-trip via numpy or write multiple copies to disk.
In turn, that enables working more efficiently with data sets that were written separately.

* Implement Dataset.dump_text, and fix small bug in appending of group bin boundaries.

Dumping to text enables us to compare results, without having to worry about issues like features being reordered.

* Add basic tests for validation logic for add_features_from.

* Remove various internal mapping items from dataset text dumps

These are too sensitive to the exact feature order chosen, which is not visible to the user.
Including them in tests appears unnecessary, as the data dumping code should provide enough coverage.

* Add test that add_features_from results in identical data sets according to dump_text.

* Add test that booster behaviour after using add_features_from matches that of training on the full data

This checks:
- That training after add_features_from works at all
- That add_features_from does not cause training to misbehave

* Expose feature_penalty and monotone_types/constraints via get_field

These getters allow us to check that add_features_from does the right thing with these vectors.

* Add tests that add_features correctly handles feature_penalty and monotone_constraints.

* Ensure add_features_from properly frees the added dataset and add unit test for this

Since add_features_from moves the feature group pointers from the added dataset to the dataset being added to, the added dataset is invalid after the call.
We must ensure we do not try and access this handle.

* Remove some obsolete TODOs

* Tidy up DumpTextFile by using a single iterator for each feature

These iterators were also passed around as raw pointers without being freed, which is now fixed.

* Factor out offsetting logic in AddFeaturesFrom

* Remove obsolete TODO

* Remove another TODO

This one is debatable: test code can be a bit messy and duplicate-heavy, and factoring it out tends to end badly.
Leaving this for now, will revisit if adding more tests later on becomes a mess.

* Add documentation for newly-added methods.

* Fix whitespace issues identified by pylint.

* Fix a few more whitespace issues.

* Fix doc comments

* Implement deep copying for feature groups.

* Replace awkward std::move usage by emplace_back, and reduce vector size to num_features rather than num_total_features.

* Copy feature groups in addFeaturesFrom, rather than moving them.

* Fix bugs in FeatureGroup copy constructor and ensure source dataset remains usable

* Add reserve to PushVector and PushOffset

* Move definition of Clone into class body

* Fix PR review issues

* Fix for loop increment style.

* Fix test failure

* Some more docstring fixes.

* Remove blank line

219c943d

24 Feb, 2019 1 commit

[docs] added notes about params usage when data is provided via path and removed unused param (#2024) · f9ab5f58

Nikita Titov authored Feb 24, 2019

* added notes about params usage when data is provided via path

* fixed init score and valid init score params note

* fixed binary params description

f9ab5f58

06 Feb, 2019 1 commit
- fixed modifiers indent (#1997) · 462612b4
  Nikita Titov authored Feb 06, 2019
  
  462612b4
02 Feb, 2019 1 commit
- cpplint whitespaces and new lines (#1986) · 90127b52
  Nikita Titov authored Feb 02, 2019
  
  90127b52
30 Jan, 2019 2 commits

fix nan in eval results (#1973) · feeaf38f

Guolin Ke authored Jan 30, 2019

* always save the score of the first round in early stopping

fix #1971

* avoid using std::log on non-positive numbers

* remove unnecessary changes

* add tests

* Update test_sklearn.py

* enhanced tests

feeaf38f
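The "avoid using std::log on non-positive numbers" change targets NaN or -inf values leaking into eval results. In C++, `std::log(0)` returns -inf and `std::log` of a negative number returns NaN. The Python sketch below assumes a clamp-to-epsilon approach applied to binary log loss. The epsilon value and both function names are hypothetical, not LightGBM's.

```python
import math


def safe_log(x, eps=1e-15):
    """Natural log with the input clamped to at least eps (eps is an assumed value)."""
    return math.log(max(x, eps))


def binary_logloss(labels, probs):
    """Mean binary log loss that stays finite even for predictions of exactly 0 or 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        total -= y * safe_log(p) + (1 - y) * safe_log(1 - p)
    return total / len(labels)
```

A maximally wrong prediction then produces a large but finite loss instead of propagating a non-finite value into the reported metric.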

fix R's overflow (#1960) · 5c399840
Guolin Ke authored Jan 30, 2019

5c399840

23 Jan, 2019 1 commit

support to override some parameters in Dataset (#1876) · b37065db

Guolin Ke authored Jan 23, 2019

* add warnings for override parameters of Dataset

* fix pep8

* add feature_penalty

* refactor

* add R's code

* Update basic.py

* Update basic.py

* fix parameter bug

* Update lgb.Dataset.R

* fix a bug

b37065db

20 Dec, 2018 1 commit
- fix trivial typo (#1915) · 92e95e62
  Lingyi Hu authored Dec 20, 2018
  
  92e95e62
17 Dec, 2018 1 commit

Fix bugs in RF (#1906) · cba82447

Guolin Ke authored Dec 17, 2018

* fix RF's bugs

* fix tests

* rollback num_iterations

* fix a bug and reduce memory costs

* reduce memory cost

cba82447

25 Nov, 2018 1 commit
- fixed seed param default value and description (#1872) · 25b3de36
  Nikita Titov authored Nov 25, 2018
  
  25b3de36
22 Nov, 2018 1 commit
- added note about valid format of ignored columns (#1865) · d21a0e39
  Nikita Titov authored Nov 22, 2018
  
  d21a0e39