Commits · f3afe98bedd469b04758024c93c9ad77a43dc195 · tianlh / LightGBM-DCU

05 Dec, 2019 2 commits

[python] Allow python sklearn interface's fit() to pass init_model to train() (#2447) · f3afe98b

aaiyer authored Dec 05, 2019

* allow python sklearn interface's fit() to pass init_model to train()

* Fix whitespace issues, and change ordering of parameters to be backward
compatible

* Formatting fixes

* Recognize LGBModel objects for init_model

* simplified condition

* updated docstring

* added test

f3afe98b
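
The forwarding described in this commit can be pictured with a toy model. This is a minimal sketch, not the LightGBM code: `ToyBooster`, `ToyModel`, and this `train()` are hypothetical stand-ins, and the only behaviour taken from the commit notes is that `fit()` passes `init_model` on to `train()` and that a fitted wrapper object is accepted as well as a raw booster.

```python
class ToyBooster:
    """Hypothetical stand-in for a trained booster: a list of boosting rounds."""

    def __init__(self, rounds=None):
        self.rounds = list(rounds or [])


def train(num_rounds, init_model=None):
    """Continue from init_model (if given) and add num_rounds new rounds."""
    start = init_model.rounds if init_model is not None else []
    return ToyBooster(start + ["round"] * num_rounds)


class ToyModel:
    """Hypothetical sklearn-style wrapper around a booster."""

    def __init__(self, n_estimators=3):
        self.n_estimators = n_estimators
        self.booster_ = None

    def fit(self, init_model=None):
        # Accept either a fitted wrapper or a raw booster, then forward it.
        if isinstance(init_model, ToyModel):
            init_model = init_model.booster_
        self.booster_ = train(self.n_estimators, init_model=init_model)
        return self


first = ToyModel(n_estimators=3).fit()
second = ToyModel(n_estimators=2).fit(init_model=first)
print(len(second.booster_.rounds))  # → 5
```

Keeping `init_model` as a keyword argument with a `None` default is one way to stay backward compatible with existing positional calls to `fit()`.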

[python][R-package] warn users about untransformed values in case of custom obj (#2611) · 69c1c330
Nikita Titov authored Dec 05, 2019

69c1c330

27 Oct, 2019 2 commits
- [tests][python] refined python tests (#2483) · 1f1dc452
  Nikita Titov authored Oct 27, 2019
```
* speed up tests

* more updates

* fixed pylint

* updated tests

* Update test_sklearn.py

* test that indices are sorted internally
```
  1f1dc452
- [python] removed unused pylint directives (#2466) · 00d1e693
  Nikita Titov authored Oct 27, 2019
  
  00d1e693
21 Oct, 2019 1 commit

check sorted indices in Subset (#2510) · 465d1262

Guolin Ke authored Oct 21, 2019

* Update sparse_bin.hpp

* check sorted in c_api

* fix python package

* fix tests

* fix test

* std::is_sorted

* Update basic.py

465d1262
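
A sorted-indices check of the kind this commit mentions (`std::is_sorted` on the C++ side) might look like the following in Python. This is only a sketch: `check_sorted_indices` and `subset` are hypothetical helper names, and non-decreasing order is assumed to be the requirement because that is what `std::is_sorted` tests by default.

```python
def check_sorted_indices(indices):
    """Raise ValueError unless indices are in non-decreasing order."""
    for prev, cur in zip(indices, indices[1:]):
        if cur < prev:
            raise ValueError(
                "used indices must be sorted, got %d after %d" % (cur, prev)
            )


def subset(rows, used_indices):
    """Return the selected rows after validating the index order."""
    check_sorted_indices(used_indices)
    return [rows[i] for i in used_indices]


print(subset(["a", "b", "c", "d"], [0, 2, 3]))  # → ['a', 'c', 'd']
```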

03 Oct, 2019 1 commit

check the shape for mat, csr and csc in prediction (#2464) · dee72159

Guolin Ke authored Oct 03, 2019

* check the shape for mat, csr and csc

* guess from csr

* support file checking

* better error msg

* grammar

* clean code

* code clean

* check range for CSR

* Update test_.py

* Update test_.py

* added tests

dee72159
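
The idea of validating the input shape before prediction can be sketched for CSR data as follows. The function names are hypothetical, and the rule used here (guess the column count as the largest column index plus one, then reject inputs wider than the model) is an assumption based on the "guess from csr" and "check range for CSR" notes, not a copy of the actual check.

```python
def csr_num_columns(indices):
    """Guess the column count of a CSR matrix: largest column index + 1."""
    return max(indices) + 1 if indices else 0


def check_predict_shape(indptr, indices, num_features):
    """Validate a CSR input against the model's feature count.

    Returns the number of rows when the input is consistent.
    """
    if not indptr or indptr[0] != 0 or indptr[-1] != len(indices):
        raise ValueError("indptr is inconsistent with indices")
    if indices and min(indices) < 0:
        raise ValueError("negative column index in CSR data")
    ncol = csr_num_columns(indices)
    if ncol > num_features:
        raise ValueError(
            "input has at least %d columns, model expects %d" % (ncol, num_features)
        )
    return len(indptr) - 1


# Two rows: row 0 has entries in columns 0 and 2, row 1 in column 1.
print(check_predict_shape([0, 2, 3], [0, 2, 1], num_features=3))  # → 2
```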

28 Sep, 2019 1 commit

Predefined bin thresholds (#2325) · cc7a1e27

Belinda Trotta authored Sep 29, 2019

* Fix bug where small values of max_bin cause crash.

* Revert "Fix bug where small values of max_bin cause crash."

This reverts commit fe5c8e2547057c1fa5750bcddd359dd7708fab4b.

* Add functionality to force bin thresholds.

* Fix style issues.

* Use stable sort.

* Minor style and doc fixes.

* Change binning behavior to be same as PR #2342.

* Use different bin finding function for predefined bounds.

* Fix style issues.

* Minor refactoring, overload FindBinWithZeroAsOneBin.

* Fix style issues.

* Fix bug and add new test.

* Add warning when using categorical features with forced bins.

* Pass forced_upper_bounds by reference.

* Pass container types by const reference.

* Get categorical features using FeatureBinMapper.

* Fix bug for small max_bin.

* Move GetForcedBins to DatasetLoader.

* Find forced bins in dataset_loader.

* Minor fixes.

cc7a1e27
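
Forced bin thresholds can be illustrated with a small sketch. It is not the binning algorithm from this commit: `merge_bin_bounds` and `value_to_bin` are hypothetical helpers, and the policy shown (keep every forced upper bound, then fill the remaining bins with data-driven bounds in the order given) is an assumption made for illustration.

```python
import bisect


def merge_bin_bounds(forced_bounds, found_bounds, max_bin):
    """Keep every forced upper bound, then fill remaining bins with found bounds."""
    bounds = sorted(set(forced_bounds))
    # n upper bounds define n + 1 bins (the last bin is open-ended).
    if len(bounds) + 1 > max_bin:
        raise ValueError("more forced bounds than max_bin allows")
    for b in found_bounds:
        if len(bounds) + 1 >= max_bin:
            break
        if b not in bounds:
            bisect.insort(bounds, b)
    return bounds


def value_to_bin(value, bounds):
    """Bin i holds values <= bounds[i]; the last bin holds everything larger."""
    return bisect.bisect_left(bounds, value)


bounds = merge_bin_bounds([0.3, 0.7], [0.1, 0.5, 0.9], max_bin=4)
print(bounds)                      # → [0.1, 0.3, 0.7]
print(value_to_bin(0.3, bounds))   # → 1
print(value_to_bin(0.95, bounds))  # → 3
```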

26 Sep, 2019 1 commit
- [python] make dump_text() private (#2434) · a0a117aa
  Nikita Titov authored Sep 26, 2019
```
* make dump_text() private

* updated test
```
  a0a117aa
15 Sep, 2019 1 commit

[python] Bug fix for first_metric_only on earlystopping. (#2209) · 84754399

kenmatsu4 authored Sep 16, 2019

* Bug fix for first_metric_only if the first metric is train metric.

* Update bug fix for feval issue.

* Disable feval for first_metric_only.

* Additional test items.

* Fix wrong assertEqual settings & formatting.

* Change dataset of test.

* Fix random seed for test.

* Modify assumed test result due to different sklearn version between CI and local.

* Remove f-string

* Applying variable assumed test result for test.

* Fix flake8 error.

* Modifying in accordance with review comments.

* Modifying for pylint.

* simplified tests

* Deleting error criteria `if eval_metric is None`.

* Delete test items of classification.

* Simplifying if condition.

* Applying first_metric_only for sklearn wrapper.

* Modifying test_sklearn for conforming to python 2.x

* Fix flake8 error.

* Additional fix for sklearn and add tests.

* Bug fix and add test cases.

* some refactor

* fixed lint

* Fix duplicated metrics scores to pass the test.

* Fix the case first_metric_only not in params.

* Converting metrics aliases.

* Add comment.

* Modify comment for pylint.

* Modify comment for pydocstyle.

* Using split test set for two eval_set.

* added test case for metric aliases and length checks

* minor style fixes

* fixed rmse name and alias position

* Fix the case metric=[]

* Fix using env.model._train_data_name

* Fix wrong test condition.

* Move initial process to _init() func.

* Modify test setting for test_sklearn & training data matching on callback.py

* test_sklearn.py: fixed a wrong test case for training.

* callback.py: fixed a wrong condition in the if statement that detects the test dataset.

* Support composite name metrics.

* Remove metric check process & reduce redundant test cases.

Because #2273 fixed the order of metrics in cpp, the metric check process in callback.py is removed.

* Revised according to the matters pointed out on a review.

* increased code readability

* Fix the issue of order of validation set.

* Changing to OrderedDict from default dict for score result.

* added missed check in cv function for first_metric_only and feval co-occurrence

* keep order only for metrics but not for datasets in best_score

* move OrderedDict initialization to init phase

* fixed minor printing issues

* move first metric detection to init phase and split can be performed without checks

* split only once during callback

* removed excess code

* fixed typo in variable name and squashed ifs

* use setdefault

* hotfix

* fixed failing test

* refined tests

* refined sklearn test

* Making "feval" effective on early stopping.

* allow feval and first_metric_only for cv

* removed unused code

* added tests for feval

* fixed printing

* add note about whitespaces in feval name

* Modifying final iteration process in case valid set is training data.

84754399
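
The effect of `first_metric_only` on early stopping can be shown with a toy loop. This is a simplified sketch rather than the callback from this commit: the helper name and the history format are hypothetical, every metric is assumed to be "lower is better", and the first metric is taken to be the first key of each score dict.

```python
def early_stopping_round(history, stopping_rounds, first_metric_only=False):
    """Return the best iteration of the metric that triggers early stopping.

    history is a list of per-iteration dicts {metric_name: value}.
    Returns None if no watched metric stalls for stopping_rounds iterations.
    """
    metric_names = list(history[0])
    if first_metric_only:
        metric_names = metric_names[:1]  # watch only the first metric
    best = {name: (float("inf"), 0) for name in metric_names}
    for i, scores in enumerate(history):
        for name in metric_names:
            if scores[name] < best[name][0]:
                best[name] = (scores[name], i)
            elif i - best[name][1] >= stopping_rounds:
                return best[name][1]
    return None


history = [
    {"l2": 1.0, "l1": 1.0},
    {"l2": 0.9, "l1": 1.1},
    {"l2": 0.8, "l1": 1.2},
    {"l2": 0.7, "l1": 1.3},
]
print(early_stopping_round(history, 2))                          # → 0
print(early_stopping_round(history, 2, first_metric_only=True))  # → None
```

In the first call the stalled `l1` metric stops training; in the second only `l2` is watched, and it keeps improving.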

12 Sep, 2019 1 commit
- update feature_fraction_bynode (#2381) · ad8e8ccc
  Guolin Ke authored Sep 12, 2019
```
* update

* fix a bug

* Update config.h

* Update Parameters.rst
```
  ad8e8ccc
09 Sep, 2019 1 commit
- [python] keep consistent state for Dataset fields (#2390) · 9f6e4413
  Nikita Titov authored Sep 09, 2019
```
* keep consistent state for Dataset fields

* hotfix
```
  9f6e4413
08 Sep, 2019 1 commit

[python] Improved python tree plots (#2304) · f52be9be

CharlesAuguste authored Sep 08, 2019

* Some basic changes to the plot of the trees to make them readable.

* Squeezed the information in the nodes.

* Added colouring when a dictionary mapping the features to the constraints is passed.

* Fix spaces.

* Added data percentage as an option in the nodes.

* Squeezed the information in the leaves.

* Important information is now in bold.

* Added a legend for the color of monotone splits.

* Changed "split_gain" to "gain" and "internal_value" to "value".

* Squeezed leaves a bit more.

* Changed description in the legend.

* Revert "Squeezed leaves a bit more."

This reverts commit dd8bf14a3ba604b0dfae3b7bb1c64b6784d15e03.

* Increased the readability for the gain.

* Tidied up the legend.

* Added the data percentage in the leaves.

* Added the monotone constraints to the dumped model.

* Monotone constraints are now specified automatically when plotting trees.

* Raise an exception instead of the bug that was here before.

* Removed operators on the branches for a clearer design.

* Small cleaning of the code.

* Setting a monotone constraint on a categorical feature now returns an exception instead of doing nothing.

* Fix bug when monotone constraints are empty.

* Fix another bug when monotone constraints are empty.

* Variable name change.

* Added is / isn't on every edge of the trees.

* Fix test "tree_create_digraph".

* Add new test for plotting trees with monotone constraints.

* Typo.

* Update documentation of categorical features.

* Typo.

* Information in nodes more explicit.

* Used regular strings instead of raw strings.

* Small refactoring.

* Some cleaning.

* Added future statement.

* Changed output for consistency.

* Updated documentation.

* Added comments for colors.

* Changed text on edges for more clarity.

* Small refactoring.

* Modified text in leaves for consistency with nodes.

* Updated default values and documentation for consistency.

* Replaced CHECK with Log::Fatal for user-friendliness.

* Updated tests.

* Typo.

* Simplify imports.

* Swapped count and weight to improve readability of the leaves in the plotted trees.

* Thresholds in bold.

* Made information in nodes written in a specific order.

* Added information to clarify legend.

* Code cleaning.

f52be9be

03 Sep, 2019 2 commits
- [ci][tests] install joblib for test directly (#2374) · df26b65d
  Nikita Titov authored Sep 03, 2019
  
  df26b65d
- sub-features for node level (#2330) · bbbad73d
  Guolin Ke authored Sep 03, 2019
```
* add parameter

* implement

* fix bug

* fix bug

* fix according to comment

* add test

* Update test_engine.py

* Update test_engine.py

* Update test_engine.py
```
  bbbad73d
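
Node-level feature sub-sampling can be sketched as drawing a fresh random subset of features for every tree node. The helper below is hypothetical and only illustrates the idea; the rounding rule and the minimum of one feature are assumptions, not the behaviour of this commit.

```python
import random


def sample_features_for_node(num_features, fraction, rng):
    """Pick the feature indices a single tree node is allowed to split on."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Assumption: round to the nearest count and always keep at least one feature.
    k = max(1, int(round(num_features * fraction)))
    return sorted(rng.sample(range(num_features), k))


rng = random.Random(0)
for node_id in range(3):
    # Each node gets its own independent draw.
    print(node_id, sample_features_for_node(10, 0.3, rng))
```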
02 Sep, 2019 1 commit
- [tests] simplified test and added dumped data to gitignore (#2372) · e7c6e67a
  Nikita Titov authored Sep 02, 2019
  
  e7c6e67a
24 Aug, 2019 1 commit

normalize the lambdas in lambdamart objective (#2331) · 0dfda826

Guolin Ke authored Aug 25, 2019

* norm the lambda scores

* change default to false

* update doc

* typo

* Update Parameters.rst

* Update config.h

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update rank_objective.hpp

* Update Parameters.rst

* Update config.h

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_sklearn.py

0dfda826

20 Aug, 2019 1 commit
- fix the bug in bin with small values (#2342) · 20f94c52
  Guolin Ke authored Aug 20, 2019
```
* fix the bug in bin with small values

* Update bin.cpp

* Update test_engine.py
```
  20f94c52
17 Aug, 2019 1 commit

sigmoid_ in grad and hess for rank objective (#2322) · aee92f63

sbruch authored Aug 16, 2019

* Lambdas and hessians need to factor sigmoid_ into the computation. Additionally, the sigmoid function has an arbitrary factor of 2 in the exponent; it is not just non-standard but the gradients are not computed correctly anyway.

* Update unit test

* Also remove a heuristic that normalizes the gradient by the difference in scores.

* Also fix unit test after removing the heuristic

aee92f63
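
To see why the sigmoid parameter has to appear in both the lambdas and the hessians, consider a pairwise logistic loss with scale `sigmoid`. The sketch below is a textbook derivation under that assumption, not the code from this commit; `pairwise_loss`, `pairwise_lambda`, and the `delta` pair weight are hypothetical names.

```python
import math


def pairwise_loss(score_hi, score_lo, sigmoid=1.0, delta=1.0):
    """delta * log(1 + exp(-sigmoid * (score_hi - score_lo)))."""
    return delta * math.log1p(math.exp(-sigmoid * (score_hi - score_lo)))


def pairwise_lambda(score_hi, score_lo, sigmoid=1.0, delta=1.0):
    """Gradient and hessian of pairwise_loss with respect to score_hi."""
    p = 1.0 / (1.0 + math.exp(sigmoid * (score_hi - score_lo)))
    grad = -sigmoid * delta * p                       # one factor of sigmoid
    hess = sigmoid * sigmoid * delta * p * (1.0 - p)  # two factors of sigmoid
    return grad, hess


grad, hess = pairwise_lambda(0.3, -0.2, sigmoid=2.0)
print(grad < 0, hess > 0)  # → True True
```

The gradient carries one factor of `sigmoid` and the hessian two, so dropping the factor from either changes the Newton step.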

16 Aug, 2019 1 commit

Bug fix: small values of max_bin cause program to crash (#2299) · c421f898

Belinda Trotta authored Aug 16, 2019

* Fix bug where small values of max_bin cause crash.

* Revert "Fix bug where small values of max_bin cause crash."

This reverts commit fe5c8e2547057c1fa5750bcddd359dd7708fab4b.

* Fix bug where small values of max_bin cause crash.

* Reset random seed in test, remove extra blank line.

* Minor bug fix. Remove extra blank line.

* Change old test to account for new binning behavior.

c421f898

13 Aug, 2019 1 commit

[python] add sparsity support for new version of pandas and check Series for bad dtypes (#2318) · 8f446be7

Nikita Titov authored Aug 13, 2019

* reworked pandas dtypes mapper

* added tests

* added sparsity support for new version of pandas

* fixed tests for old pandas

* check pd.Series for bad dtypes as well

* enhanced tests

* fixed pylint

8f446be7

24 Jul, 2019 1 commit

add weight in tree model output (#2269) · e1d7a7b9

Guolin Ke authored Jul 24, 2019

* add weight in tree model output

* fix bug

* updated Python plotting part to handle weights

e1d7a7b9

12 Jul, 2019 1 commit

fix init_model with subset (#2252) · 7360cff9

Guolin Ke authored Jul 12, 2019

* fix init_model with subset

* Update basic.py

* added test

* fix predictor naming issue

* Update basic.py

* fix bug

* fix pylint

* fix comments

* Update basic.py

* Update basic.py

* updated test

* fixed bug

* fixed lint

* fix warning

* add get_data before initial prediction

* refine the warning in get_data

* refine warning

* Update basic.py

7360cff9

09 Jul, 2019 1 commit
- fix bug when using dart with init_model (#2251) · ebc831bc
  Guolin Ke authored Jul 09, 2019
```
* add test

* fix an index bug
```
  ebc831bc
08 Jul, 2019 1 commit

Max bin by feature (#2190) · 291752de

Belinda Trotta authored Jul 08, 2019

* Add parameter max_bin_by_feature.

* Fix minor bug.

* Fix minor bug.

* Fix calculation of header size for writing binary file.

* Fix style issues.

* Fix python style issue.

* Fix test and python style issue.

291752de

20 Jun, 2019 1 commit

[tests] use numpy.testing.assert_allclose (#2207) · 86269ee3

Nikita Titov authored Jun 20, 2019

* Update test.py

* Update test_consistency.py

* Update test_basic.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_engine.py

* more replacements

86269ee3

04 Jun, 2019 1 commit
- [python] fix class_weight (#2199) · b6f65783
  Nikita Titov authored Jun 04, 2019
```
* fixed class_weight

* fixed lint

* added test

* hotfix
```
  b6f65783
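
What `class_weight` means in practice can be sketched by expanding it into per-sample weights. The helper below is hypothetical; the "balanced" formula follows the scikit-learn convention `n_samples / (n_classes * count)`, and giving weight 1.0 to classes missing from a dict is an assumption of this sketch.

```python
from collections import Counter


def class_weight_to_sample_weight(labels, class_weight=None):
    """Expand a class_weight setting into one weight per sample."""
    if class_weight is None:
        return [1.0] * len(labels)
    if class_weight == "balanced":
        counts = Counter(labels)
        n_samples, n_classes = len(labels), len(counts)
        class_weight = {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}
    # Assumption: classes missing from the dict keep weight 1.0.
    return [float(class_weight.get(y, 1.0)) for y in labels]


print(class_weight_to_sample_weight([0, 0, 0, 1], {0: 1, 1: 3}))  # → [1.0, 1.0, 1.0, 3.0]
```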
27 May, 2019 1 commit

[python] fixed picklability of sklearn models with custom obj and updated... · 2459362a

Nikita Titov authored May 27, 2019

[python] fixed picklability of sklearn models with custom obj and updated docstrings for custom obj (#2191)

* refactored joblib test

* fixed picklability of sklearn models with custom obj and updated docstrings for custom obj

* pickled model should be able to predict without refitting

2459362a

26 May, 2019 1 commit

Top k multi error (#2178) · b3db9e92

Belinda Trotta authored May 26, 2019

* Implement top-k multiclass error metric. Add new parameter top_k_threshold.

* Add test for multiclass metrics

* Make test less sensitive to avoid floating-point issues.

* Change tabs to spaces.

* Fix problem with test in Python 2. Refactor to use np.testing. Decrease number of training rounds so loss is larger and easier to compare.

* Move multiclass tests into test_engine.py

* Change parameter name from top_k_threshold to multi_error_top_k.

* Fix top-k error metric to handle case where scores are equal. Update tests and docs.

* Change name of top-k metric to multi_error@k.

* Change tabs to spaces.

* Fix formatting.

* Fix minor issues in docs.

b3db9e92
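
A top-k multiclass error that is well defined when scores are tied can be sketched as follows. The function name is hypothetical, and the tie rule is an assumption chosen for illustration: a sample counts as correct only if at most `k` classes (the true class included) score at least as high as the true class.

```python
def multi_error_at_k(labels, scores, k=1):
    """Fraction of samples whose true class is not within the top k scores."""
    errors = 0
    for label, row in zip(labels, scores):
        # Classes scoring at least as high as the true class, itself included.
        num_at_least = sum(1 for s in row if s >= row[label])
        if num_at_least > k:
            errors += 1
    return errors / len(labels)


labels = [0, 1, 2]
scores = [
    [0.5, 0.3, 0.2],
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
]
print(multi_error_at_k(labels, scores, k=3))  # → 0.0
```

With this rule a tie at the boundary counts against the prediction, so the result does not depend on how equal scores happen to be ordered.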

15 May, 2019 1 commit
- [python] added ability to pass first_metric_only in params (#2175) · f91e5644
  Nikita Titov authored May 15, 2019
```
* added ability to pass first_metric_only in params

* simplified tests

* fixed test

* fixed punctuation
```
  f91e5644
08 May, 2019 1 commit
- [docs] updated Microsoft GitHub URL (#2152) · 94fbe5bb
  Guolin Ke authored May 08, 2019
```
* fix travis badge

* updated GitHub Microsoft URL
```
  94fbe5bb
01 May, 2019 1 commit

[python] added plot_split_value_histogram function (#2043) · 611cf5d4

Nikita Titov authored May 01, 2019

* added plot_split_value_histogram function

* updated init module

* added plot split value histogram example

* added plot_split_value_histogram to notebook

* added test

* fixed pylint

* updated API docs

* fixed grammar

* set y ticks to int value in more sufficient way

611cf5d4

22 Apr, 2019 1 commit
- [python] disable default pandas cat features if cat features were explicitly provided (#2121) · 4be53a5a
  Nikita Titov authored Apr 22, 2019
```
* disable default pandas cat features if cat features were explicitly provided

* added assertion for cat features
```
  4be53a5a
19 Apr, 2019 1 commit
- [python] ignore pandas ordered categorical columns by default (#2115) · d115769c
  Nikita Titov authored Apr 19, 2019
```
* ignore pandas ordered categorical columns by default

* fix tests

* fix tests

* added comments
```
  d115769c
16 Apr, 2019 1 commit

[python] add flag of displaying train loss for lgb.cv() (#2089) · ca85b679

kenmatsu4 authored Apr 16, 2019

* [python] displaying train loss during training with lgb.cv

* modifying only display running type when disp_train_loss==True

* Add test for display train loss

* del .idea files

* Rename disp_train_loss to show_train_loss and revise comment.

* Change arg name show_train_loss -> eval_train_metric, and add a test item.

* Modifying comment of eval_train_metric.

ca85b679

13 Apr, 2019 1 commit
- [python] make possibility to create Booster from string official (#2098) · 5b5b9823
  Nikita Titov authored Apr 13, 2019
  
  5b5b9823
04 Apr, 2019 1 commit

Add Cost Effective Gradient Boosting (#2014) · 76102284

remcob-gr authored Apr 04, 2019

* Add configuration parameters for CEGB.

* Add skeleton CEGB tree learner

Like the original CEGB version, this inherits from SerialTreeLearner.
Currently, it changes nothing from the original.

* Track features used in CEGB tree learner.

* Pull CEGB tradeoff and coupled feature penalty from config.

* Implement finding best splits for CEGB

This is heavily based on the serial version, but just adds using the coupled penalties.

* Set proper defaults for cegb parameters.

* Ensure sanity checks don't switch off CEGB.

* Implement per-data-point feature penalties in CEGB.

* Implement split penalty and remove unused parameters.

* Merge changes from CEGB tree learner into serial tree learner

* Represent features_used_in_data by a bitset, to reduce the memory overhead of CEGB, and add sanity checks for the lengths of the penalty vectors.

* Fix bug where CEGB would incorrectly penalise a previously used feature

The tree learner did not update the gains of previously computed leaf splits when splitting a leaf elsewhere in the tree.
This caused it to prefer new features due to incorrectly penalising splitting on previously used features.

* Document CEGB parameters and add them to the appropriate section.

* Remove leftover reference to cegb tree learner.

* Remove outdated diff.

* Fix warnings

* Fix minor issues identified by @StrikerRUS.

* Add docs section on CEGB, including citation.

* Fix link.

* Fix CI failure.

* Add some unit tests

* Fix pylint issues.

* Fix remaining pylint issue

76102284
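
The interaction between the split penalty and the coupled feature penalty can be pictured with a short sketch. It is an illustration under assumptions rather than the tree learner code: the function names are hypothetical, and the formula (a per-data-point split penalty plus a coupled penalty charged only for features not used so far, both scaled by a tradeoff) is a simplified reading of the commit notes.

```python
def penalized_gain(gain, feature, num_data_in_leaf, used_features,
                   tradeoff=1.0, penalty_split=0.0, coupled_penalty=None):
    """Split gain minus the (assumed) cost-effective penalties."""
    penalty = penalty_split * num_data_in_leaf
    # The coupled penalty is charged only the first time a feature is used.
    if coupled_penalty is not None and feature not in used_features:
        penalty += coupled_penalty[feature]
    return gain - tradeoff * penalty


def best_feature(gains, num_data_in_leaf, used_features, **penalties):
    """Pick the feature with the highest penalized gain."""
    return max(
        gains,
        key=lambda f: penalized_gain(
            gains[f], f, num_data_in_leaf, used_features, **penalties
        ),
    )


gains = {0: 10.0, 1: 9.0}
coupled = {0: 3.0, 1: 0.5}
print(best_feature(gains, 100, set(), coupled_penalty=coupled))  # → 1
print(best_feature(gains, 100, {0}, coupled_penalty=coupled))    # → 0
```

The second call shows why cached gains need refreshing after a split elsewhere in the tree: once feature 0 has been used, its coupled penalty no longer applies and it becomes the better choice.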

25 Mar, 2019 1 commit

[python] Use first_metric_only flag for early_stopping function. (#2049) · 011cc90a

kenmatsu4 authored Mar 25, 2019

* Use first_metric_only flag for early_stopping function.

To apply early stopping using only the first metric, add a first_metric_only flag to the early_stopping function.

* update comment

* Revert "update comment"

This reverts commit 1e75a1a415cc16cfbe795181e148ebfe91469be4.

* added test

* fixed docstring

* cut comment and save one line

* document new feature

011cc90a

14 Mar, 2019 1 commit
- [python] disabled split value histogram for categorical features (#2045) · ffb134cc
  Nikita Titov authored Mar 14, 2019
```
* disabled split value histogram for categorical features

* updated test for cat. feature

* updated docs
```
  ffb134cc
09 Mar, 2019 1 commit
- [python] added get_split_value_histogram method (#2041) · 8d6666e0
  Nikita Titov authored Mar 09, 2019
```
* added get_split_value_histogram method

* added param for ordinary return value
```
  8d6666e0
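
A split value histogram for one feature can be sketched by walking nested tree dicts and binning the thresholds found. This is not the method added in this commit: the helper names are hypothetical, the keys `split_feature`, `threshold`, `left_child`, and `right_child` are assumed to mirror the dumped model layout, and equal-width bins are an arbitrary choice.

```python
def collect_split_values(node, feature, out):
    """Recursively gather thresholds of splits on `feature` from a tree dict."""
    if "split_feature" not in node:
        return  # leaf node
    if node["split_feature"] == feature:
        out.append(node["threshold"])
    collect_split_values(node["left_child"], feature, out)
    collect_split_values(node["right_child"], feature, out)


def split_value_histogram(trees, feature, bins):
    """Return (counts, edges) over equal-width bins of the split thresholds."""
    values = []
    for tree in trees:
        collect_split_values(tree, feature, values)
    if not values:
        return [], []
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


leaf = {"leaf_value": 0.0}
tree = {
    "split_feature": 0, "threshold": 2.0,
    "left_child": {"split_feature": 0, "threshold": 1.0,
                   "left_child": leaf, "right_child": leaf},
    "right_child": {"split_feature": 1, "threshold": 7.0,
                    "left_child": leaf,
                    "right_child": {"split_feature": 0, "threshold": 5.0,
                                    "left_child": leaf, "right_child": leaf}},
}
print(split_value_histogram([tree], feature=0, bins=2))  # → ([2, 1], [1.0, 3.0, 5.0])
```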
07 Mar, 2019 1 commit

[tests] fixed and refactored some tests (#2035) · 8aa08c4a

Nikita Titov authored Mar 07, 2019

* fixed number of tests in pytest

* fixed data shape and removed unused code

* refactored tests

* hotfix

* hotfix

8aa08c4a