- 16 Apr, 2019 2 commits
-
-
kenmatsu4 authored
* [python] displaying train loss during training with lgb.cv
* modifying only display behavior when disp_train_loss==True
* Add test for display train loss
* del .idea files
* Rename disp_train_loss to show_train_loss and revise comment.
* Change arg name show_train_loss -> eval_train_metric, and add a test item.
* Modifying comment of eval_train_metric.
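A minimal sketch of how the new `eval_train_metric` flag is used with `lgb.cv`; the toy data and other parameters below are illustrative, only the flag itself comes from this change.

```python
import lightgbm as lgb
import numpy as np

# Toy data, purely for illustration.
X = np.random.rand(500, 10)
y = np.random.randint(0, 2, 500)
train_set = lgb.Dataset(X, label=y)

params = {"objective": "binary", "metric": "binary_logloss", "verbosity": -1}

# eval_train_metric=True reports the metric on the training folds
# alongside the usual validation-fold metric during cross-validation.
cv_results = lgb.cv(params, train_set, num_boost_round=50, nfold=5,
                    eval_train_metric=True)
print(list(cv_results.keys()))  # contains both train and valid metric entries
```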
-
Guolin Ke authored
-
- 13 Apr, 2019 3 commits
-
-
Nikita Titov authored
-
Nikita Titov authored
-
Nikita Titov authored
-
- 11 Apr, 2019 3 commits
-
-
Nikita Titov authored
* added all necessary includes - fixed build/include_what_you_use error
* fixed the order of includes (build/include_order)
-
Nikita Titov authored
* updated HDFS guide
* updated guide
* no info about Clang
* pass paths in quotes
* Update README.rst
-
Nikita Titov authored
-
- 10 Apr, 2019 2 commits
-
-
Nikita Titov authored
* added fix for OpenMP on macOS into test script
* test: AppleClang on Travis
* use Mojave on Travis
* bash hotfix
* get back to gcc compiler on Travis macOS
-
Nikita Titov authored
* fixed Python intro
* fixed typos
* scikit-learn added support for https
-
- 09 Apr, 2019 1 commit
-
-
Nikita Titov authored
* updated boost submodule
* updated docker with new stable Clang and CMake
* switch to dev docker
* updated setup script
* updated MinGW on Appveyor
* updated Azure config to use docker for GPU task
* do not upgrade gcc - takes too long
* test: switch compilers
* switch compilers back
* get back to main docker
-
- 04 Apr, 2019 1 commit
-
-
remcob-gr authored
* Add configuration parameters for CEGB.
* Add skeleton CEGB tree learner. Like the original CEGB version, this inherits from SerialTreeLearner. Currently, it changes nothing from the original.
* Track features used in CEGB tree learner.
* Pull CEGB tradeoff and coupled feature penalty from config.
* Implement finding best splits for CEGB. This is heavily based on the serial version, but just adds using the coupled penalties.
* Set proper defaults for CEGB parameters.
* Ensure sanity checks don't switch off CEGB.
* Implement per-data-point feature penalties in CEGB.
* Implement split penalty and remove unused parameters.
* Merge changes from CEGB tree learner into serial tree learner.
* Represent features_used_in_data by a bitset, to reduce the memory overhead of CEGB, and add sanity checks for the lengths of the penalty vectors.
* Fix bug where CEGB would incorrectly penalise a previously used feature. The tree learner did not update the gains of previously computed leaf splits when splitting a leaf elsewhere in the tree. This caused it to prefer new features due to incorrectly penalising splitting on previously used features.
* Document CEGB parameters and add them to the appropriate section.
* Remove leftover reference to CEGB tree learner.
* Remove outdated diff.
* Fix warnings.
* Fix minor issues identified by @StrikerRUS.
* Add docs section on CEGB, including citation.
* Fix link.
* Fix CI failure.
* Add some unit tests.
* Fix pylint issues.
* Fix remaining pylint issue.
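As a rough illustration, the CEGB options added here can be set like any other training parameters. The parameter names below follow the documented CEGB options; the penalty values and toy data are made up.

```python
import lightgbm as lgb
import numpy as np

X = np.random.rand(1000, 5)
y = np.random.rand(1000)
train_set = lgb.Dataset(X, label=y)

params = {
    "objective": "regression",
    # Weight of the feature-cost penalties relative to the usual split gain.
    "cegb_tradeoff": 1.0,
    # Penalty charged for every split made in a tree.
    "cegb_penalty_split": 0.1,
    # Per-feature cost paid once for the whole model when a feature is first used.
    "cegb_penalty_feature_coupled": [1.0, 1.0, 5.0, 2.0, 1.0],
    # Per-feature cost paid per data point the first time the feature is used for it.
    "cegb_penalty_feature_lazy": [0.0, 0.0, 0.1, 0.0, 0.0],
}

booster = lgb.train(params, train_set, num_boost_round=10)
```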
-
- 02 Apr, 2019 1 commit
-
-
sheikheddy authored
-
- 01 Apr, 2019 1 commit
-
-
Nikita Titov authored
-
- 26 Mar, 2019 3 commits
-
-
James Lamb authored
* updated gitignore to ignore files created by local python installation
* moved sections around in gitignore
-
James Lamb authored
* Small aesthetic improvements to RTD docs
* fixed markdown table in Development-Guide
* removed unnecessary blank line in conf.py
-
Nikita Titov authored
-
- 25 Mar, 2019 3 commits
-
-
mjmckp authored
* Fix index out-of-range exception generated by BaggingHelper on small datasets. Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.
* Update goss.hpp
* Update goss.hpp
* Add API method LGBM_BoosterPredictForMats, which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires data given as a contiguous array)
* Fix incorrect upstream merge
* Add link to LightGBM.NET
* Fix indenting to 2 spaces
* Dummy edit to trigger CI
* Dummy edit to trigger CI
-
kenmatsu4 authored
* Use first_metric_only flag for early_stopping function. In order to apply early stopping with only the first metric, add a first_metric_only flag to the early_stopping function.
* update comment
* Revert "update comment". This reverts commit 1e75a1a415cc16cfbe795181e148ebfe91469be4.
* added test
* fixed docstring
* cut comment and save one line
* document new feature
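A minimal sketch of the new flag via the early stopping callback; the toy data and metric choice are illustrative, only `first_metric_only` comes from this change.

```python
import lightgbm as lgb
import numpy as np

X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, 1000)
train_set = lgb.Dataset(X[:800], label=y[:800])
valid_set = lgb.Dataset(X[800:], label=y[800:], reference=train_set)

params = {"objective": "binary", "metric": ["auc", "binary_logloss"], "verbosity": -1}

# With first_metric_only=True, only the first metric ("auc" here) is
# considered when deciding whether to stop early; the others are still logged.
booster = lgb.train(
    params,
    train_set,
    num_boost_round=200,
    valid_sets=[valid_set],
    callbacks=[lgb.early_stopping(stopping_rounds=10, first_metric_only=True)],
)
```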
-
Guolin Ke authored
-
- 22 Mar, 2019 1 commit
-
-
Nikita Titov authored
-
- 20 Mar, 2019 1 commit
-
-
Nikita Titov authored
-
- 18 Mar, 2019 2 commits
-
-
Nikita Titov authored
-
Markus Cozowicz authored
* added API changes required for JNI performance optimizations (e.g. predict is 3-4x faster)
* removed commented variables
* removed commented header
* renamed method to make it obvious it is created for Spark
* fixed comment alignment
* replaced GetPrimitiveArrayCritical with GetIntArrayElements for training; fixed deadlock on Databricks
-
- 16 Mar, 2019 1 commit
-
-
Ilya Matiach authored
* lightgbm SWIG Java wrapper changes needed to add early stopping in mmlspark
* updated based on comments
-
- 14 Mar, 2019 4 commits
-
-
Nikita Titov authored
* CI fix
* CI fix for Appveyor
* actually fix Appveyor
-
Nikita Titov authored
-
Nikita Titov authored
* disabled split value histogram for categorical features
* updated test for cat. feature
* updated docs
-
Nikita Titov authored
* updated gitignore
* updated tree index with cat feature
-
- 09 Mar, 2019 2 commits
-
-
Nikita Titov authored
* added get_split_value_histogram method
* added param for ordinary return value
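A short sketch of the new Booster method on a toy model; the exact keyword for the "ordinary return value" option is not spelled out above, so only the basic call is shown.

```python
import lightgbm as lgb
import numpy as np

X = np.random.rand(1000, 3)
y = np.random.rand(1000)
booster = lgb.train({"objective": "regression", "verbosity": -1},
                    lgb.Dataset(X, label=y), num_boost_round=20)

# Histogram of the split thresholds the trained model used for feature 0.
hist = booster.get_split_value_histogram(feature=0, bins=10)
print(hist)
```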
-
remcob-gr authored
-
- 07 Mar, 2019 2 commits
-
-
Erling Haugstad authored
-
Nikita Titov authored
* fixed number of tests in pytest
* fixed data shape and removed unused code
* refactored tests
* hotfix
* hotfix
-
- 26 Feb, 2019 1 commit
-
-
remcob-gr authored
* Initial attempt to implement appending features in-memory to another data set. The intent is for this to enable munging files together easily, without needing to round-trip via numpy or write multiple copies to disk. In turn, that enables working more efficiently with data sets that were written separately.
* Implement Dataset.dump_text, and fix small bug in appending of group bin boundaries. Dumping to text enables us to compare results, without having to worry about issues like features being reordered.
* Add basic tests for validation logic for add_features_from.
* Remove various internal mapping items from dataset text dumps. These are too sensitive to the exact feature order chosen, which is not visible to the user. Including them in tests appears unnecessary, as the data dumping code should provide enough coverage.
* Add test that add_features_from results in identical data sets according to dump_text.
* Add test that booster behaviour after using add_features_from matches that of training on the full data. This checks:
  - that training after add_features_from works at all
  - that add_features_from does not cause training to misbehave
* Expose feature_penalty and monotone_types/constraints via get_field. These getters allow us to check that add_features_from does the right thing with these vectors.
* Add tests that add_features correctly handles feature_penalty and monotone_constraints.
* Ensure add_features_from properly frees the added dataset and add unit test for this. Since add_features_from moves the feature group pointers from the added dataset to the dataset being added to, the added dataset is invalid after the call. We must ensure we do not try and access this handle.
* Remove some obsolete TODOs.
* Tidy up DumpTextFile by using a single iterator for each feature. These iterators were also passed around as raw pointers without being freed, which is now fixed.
* Factor out offsetting logic in AddFeaturesFrom.
* Remove obsolete TODO.
* Remove another TODO. This one is debatable: test code can be a bit messy and duplicate-heavy, and factoring it out tends to end badly. Leaving this for now, will revisit if adding more tests later on becomes a mess.
* Add documentation for newly-added methods.
* Fix whitespace issues identified by pylint.
* Fix a few more whitespace issues.
* Fix doc comments.
* Implement deep copying for feature groups.
* Replace awkward std::move usage by emplace_back, and reduce vector size to num_features rather than num_total_features.
* Copy feature groups in AddFeaturesFrom, rather than moving them.
* Fix bugs in FeatureGroup copy constructor and ensure source dataset remains usable.
* Add reserve to PushVector and PushOffset.
* Move definition of Clone into class body.
* Fix PR review issues.
* Fix for loop increment style.
* Fix test failure.
* Some more docstring fixes.
* Remove blank line.
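A rough sketch of the in-memory combination this enables, assuming the Python-level `Dataset.add_features_from` method; the toy data is illustrative, and both Datasets are constructed with the raw data kept around.

```python
import lightgbm as lgb
import numpy as np

y = np.random.rand(500)
d1 = lgb.Dataset(np.random.rand(500, 3), label=y, free_raw_data=False).construct()
d2 = lgb.Dataset(np.random.rand(500, 2), free_raw_data=False).construct()

# Append d2's feature groups to d1 entirely in memory, without
# round-tripping through numpy or writing anything to disk.
d1.add_features_from(d2)
print(d1.num_feature())  # 5
```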
-
- 24 Feb, 2019 1 commit
-
-
Nikita Titov authored
[docs] added notes about params usage when data is provided via path and removed unused param (#2024)
* added notes about params usage when data is provided via path
* fixed init score and valid init score params note
* fixed binary params description
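To make the note concrete: some Dataset parameters only take effect when LightGBM itself parses a file from a path. A hedged sketch, with a hypothetical file name and an illustrative parameter choice:

```python
import lightgbm as lgb

# Parameters such as header / label_column / two_round are applied while
# LightGBM parses the file itself; they have no effect when the data is
# already an in-memory numpy array or pandas DataFrame.
train_data = lgb.Dataset(
    "train.csv",  # hypothetical path
    params={
        "header": True,
        "label_column": "name=target",  # assumes a column literally named "target"
        "two_round": True,
    },
)
```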
-
- 21 Feb, 2019 1 commit
-
-
Nikita Titov authored
-
- 20 Feb, 2019 1 commit
-
-
Ilya Matiach authored
* added LightGBM SWIG wrappers for macOS and updated docs
* updated installation instructions based on comments
* updated based on comments
-
- 18 Feb, 2019 3 commits
-
-
Harry Moreno authored
-
Harry Moreno authored
* It is confusing to name the validation data `test_data`, especially as terms like train, validation, and test splits are common in ML. Change the variable name in the Python quick start.
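The renamed pattern from the quick start then reads roughly like this (file names and parameters are illustrative):

```python
import lightgbm as lgb

train_data = lgb.Dataset("train.svm.bin")
# Called validation_data rather than test_data, matching the usual
# train / validation / test split terminology.
validation_data = lgb.Dataset("validation.svm", reference=train_data)

params = {"objective": "binary", "metric": "auc"}
bst = lgb.train(params, train_data, num_boost_round=10,
                valid_sets=[validation_data])
```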
-
Nikita Titov authored
-