Commits · a3862f151f8fa34154095217389d8661c015f662 · tianlh / LightGBM-DCU

30 Sep, 2020 2 commits
- [python] Use ctypes for parameters of DLL functions for Dataset (#3423) · a3862f15
  Nikita Titov authored Sep 30, 2020
  
  a3862f15
- Use ctypes for parameters of DLL functions (#3419) · f60c14f1
  Belinda Trotta authored Sep 30, 2020
  
  f60c14f1
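Both commits above switch the Python package to passing explicit `ctypes` objects, rather than bare Python values, when calling into the LightGBM shared library. Below is a minimal sketch of that calling convention. No DLL is loaded here, so a Python function wrapped with `ctypes.CFUNCTYPE` stands in for a C entry point. The prototype `int get_num_data(void* handle, int* out)` and the names `get_num_data` / `num_data` are hypothetical, not LightGBM's real C API.

```python
import ctypes

# Hypothetical C prototype: int get_num_data(void* handle, int* out)
GET_NUM_DATA = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.c_void_p, ctypes.POINTER(ctypes.c_int)
)


def _fake_get_num_data(handle, out):
    # Stand-in for the C side: write the result through the out-pointer
    # and return 0 to signal success, as the C API functions do.
    out[0] = 1000
    return 0


get_num_data = GET_NUM_DATA(_fake_get_num_data)


def num_data(handle=None):
    # Pass an explicit ctypes.c_int by reference instead of a bare Python int,
    # so the callee receives exactly the C type its prototype declares.
    result = ctypes.c_int(0)
    ret = get_num_data(handle, ctypes.byref(result))
    if ret != 0:
        raise RuntimeError("C call failed")
    return result.value
```

Being explicit about types matters most for widths Python cannot infer on its own, e.g. `ctypes.c_int64` versus `ctypes.c_int32` for row counts.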
11 Sep, 2020 1 commit
- [python] remove unused variable (#3376) · 0c708d37
  James Lamb authored Sep 11, 2020
  
  0c708d37
06 Sep, 2020 1 commit

[Python] Refactor scikit-learn API to allow a list of evaluation metrics (#3254) · afc76d2c

Germán Ramírez-Espinoza authored Sep 07, 2020



* Refactors the sklearn API to allow a list of evaluation metrics in the eval_metric parameter of LGBMModel and its subclasses. Also adds unit tests for this functionality.

* Simplify expression to check whether the user passed one or multiple metrics to eval_metric parameter

* Simplify new tests by using custom metrics already defined in the test file

* Update docstring to reflect the fact that the parameter "feval" from the "train" and "cv" functions can also receive a list of callables

* Remove Oxford comma from docstrings

Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Use named parameters to make sure code is compatible with future versions of scikit-learn

Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove throwaway return value to make code more succinct
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Move statement to group together the code related to feval

* Avoid modifying original args as it causes errors in scikit-learn tools

For details see: https://github.com/microsoft/LightGBM/pull/2619



* Consolidate multiple eval-metrics unit-tests into one test
Co-authored-by: German I Ramirez-Espinoza <gire@home>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

afc76d2c
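After this change, `eval_metric` may be a single metric or a list mixing built-in metric names and custom callables, e.g. `LGBMRegressor().fit(X, y, eval_set=[(X_val, y_val)], eval_metric=["l1", my_metric])`. The sketch below shows one way to normalize such an argument; `split_eval_metric` is a hypothetical helper, not the code from the PR. It builds a new list rather than editing the caller's argument, since the commit notes that modifying the original args caused errors in scikit-learn tools.

```python
def split_eval_metric(eval_metric):
    """Split an eval_metric argument into built-in names and custom callables.

    Hypothetical helper: accepts None, a single str/callable, or a list of both.
    """
    if eval_metric is None:
        return [], []
    # Copy into a new list so the caller's argument is never mutated.
    if isinstance(eval_metric, (list, tuple)):
        metrics = list(eval_metric)
    else:
        metrics = [eval_metric]
    builtin = [m for m in metrics if isinstance(m, str)]
    custom = [m for m in metrics if callable(m)]
    return builtin, custom
```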

11 Aug, 2020 1 commit

simplify start_iteration param for predict in Python and some code cleanup for start_iteration (#3288) · 877d58fa

Nikita Titov authored Aug 11, 2020

* simplify start_iteration param for predict in Python and some code cleanup for start_iteration

* revert docs changes about the prediction result shape

877d58fa

06 Aug, 2020 1 commit

[Python] / [R] add start_iteration to python predict interface (fix #3058) (#3272) · 82e2ff7a

shiyu1994 authored Aug 06, 2020



* [python] add start_iteration to python predict interface (#3058)

* Apply suggestions from code review

* Update lightgbm_R.h

* Apply suggestions from code review

* Apply suggestions from code review

* fix R interface

* update R documentation
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>

82e2ff7a
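The two commits above let `predict` skip the first boosting iterations. A minimal sketch of the iteration window, assuming the semantics in the Python docs (a `start_iteration` <= 0 starts from the first iteration; a `num_iteration` of `None` or <= 0 uses every iteration from `start_iteration` on). It ignores `best_iteration` handling and assumes one tree per iteration. `select_iterations` and `predict_raw` are hypothetical names.

```python
def select_iterations(num_total, start_iteration=0, num_iteration=None):
    """Return the [start, end) range of boosting iterations used for prediction."""
    start = max(start_iteration, 0)  # values <= 0 start from the first iteration
    if num_iteration is None or num_iteration <= 0:
        end = num_total  # use every iteration from start onwards
    else:
        end = min(start + num_iteration, num_total)
    return start, end


def predict_raw(tree_outputs, start_iteration=0, num_iteration=None):
    # Assumes one tree per iteration (e.g. regression or binary objectives),
    # so the raw score is the sum of the selected trees' outputs.
    start, end = select_iterations(len(tree_outputs), start_iteration, num_iteration)
    return sum(tree_outputs[start:end])
```

For example, with four trees whose outputs are `[1.0, 2.0, 3.0, 4.0]`, `predict_raw(outputs, start_iteration=1, num_iteration=2)` sums only the second and third trees.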

15 Jul, 2020 1 commit

feature importance type in saved model file (#3220) · 87d46489

Guolin Ke authored Jul 16, 2020



* feature importance type in saved model file

* fix nullptr

* fixed formatting

* fix python/R

* Update src/c_api.cpp

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* fix c_api test

* fix swig

* minor docs improvements and added defines for importance types
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

87d46489

28 Jun, 2020 1 commit

adding sparse support to TreeSHAP in lightgbm (#3000) · 9f367d11

Ilya Matiach authored Jun 28, 2020

* adding sparse support to TreeSHAP in lightgbm

* updating based on comments

* updated based on comments, used fromiter instead of frombuffer

* updated based on comments

* fixed limits import order

* fix sparse feature contribs to work with more than int32 max rows

* really fixed int64 max error and build warnings

* added sparse test with >int32 max rows

* fixed python side reshape check on sparse data

* updated based on latest comments

* fixed comments

* added CSC INT32_MAX validation to test, fixed comments

9f367d11

23 Jun, 2020 1 commit

Interaction constraints (#3126) · bca2da97

Belinda Trotta authored Jun 23, 2020

* Add interaction constraints functionality.

* Minor fixes.

* Minor fixes.

* Change lambda to function.

* Fix gpu bug, remove extra blank lines.

* Fix gpu bug.

* Fix style issues.

* Try to fix segfault on macOS.

* Fix bug.

* Fix bug.

* Fix bugs.

* Change parameter format for R.

* Fix R style issues.

* Change string formatting code.

* Change docs to say R package not supported.

* Remove R functionality, moving to separate PR.

* Keep track of branch features in tree object.

* Only track branch features when feature interactions are enabled.

* Fix lint error.

* Update docs and simplify tests.

bca2da97
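The commit adds the `interaction_constraints` parameter (for example `[[0, 1, 2], [2, 3]]`) and tracks the features used along each branch. A simplified model of the rule, as a sketch rather than the C++ implementation: a feature may be split on only if some constraint set contains it together with every feature already used on the branch. How features absent from all constraints are treated is an assumption here (they are simply never allowed); `allowed_features` is a hypothetical name.

```python
def allowed_features(branch_features, constraints):
    """Features that may be split on next, given those already used on the branch.

    A feature is allowed only if some constraint set contains it together with
    every feature already on the branch. Features that appear in no constraint
    are never allowed in this sketch.
    """
    used = set(branch_features)
    allowed = set()
    for constraint in constraints:
        group = set(constraint)
        if used <= group:  # the whole branch fits inside this constraint
            allowed |= group
    return allowed
```

With `[[0, 1, 2], [2, 3]]`, a branch that has split on feature 0 can continue with 0, 1 or 2, while feature 2 alone is compatible with both groups.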

11 Jun, 2020 1 commit
- refactor LGBM_DatasetGetFeatureNames (#3022) · f30e0bb3
  Nikita Titov authored Jun 11, 2020
  
  f30e0bb3
20 May, 2020 1 commit

redirect log to python console (#3090) · dea2391b

Guolin Ke authored May 21, 2020



* redir log to python console

* fix pylint

* Apply suggestions from code review

* Update basic.py

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update c_api.h

* Apply suggestions from code review

* Apply suggestions from code review

* super-minor: better wording
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>

dea2391b

05 May, 2020 1 commit
- [docs] updated docs about output values (#3037) · 796ba803
  Nikita Titov authored May 05, 2020
  
  796ba803
10 Apr, 2020 1 commit

Support UTF-8 characters in feature name again (#2976) · 44a91201

OMOTO Tsukasa authored Apr 10, 2020

* Support UTF-8 characters in feature name again

This commit reverts 0d59859c.
Also see:
- https://github.com/microsoft/LightGBM/issues/2226
- https://github.com/microsoft/LightGBM/issues/2478
- https://github.com/microsoft/LightGBM/pull/2229

I reproduced the issue and, building on the thorough survey @kidotaka gave us in #2226,
I conclude that the cause is not UTF-8 but "an empty string (character)".
Therefore, I revert "throw error when meet non ascii (#2229)", whose commit hash
is 0d59859c, and add support for UTF-8 feature names again.

* add tests

* fix check-docs tests

* update

* fix tests

* update .travis.yml

* fix tests

* update test_r_package.sh

* update test_r_package.sh

* update test_r_package.sh

* add a test for R-package

* update test_r_package.sh

* update test_r_package.sh

* update test_r_package.sh

* fix test for R-package

* update test_r_package.sh

* update test_r_package.sh

* update test_r_package.sh

* update test_r_package.sh

* update

* update

* update

* remove unneeded comments

44a91201

20 Mar, 2020 1 commit

Fix SWIG methods that return char** (#2850) · 91185c3a

Alberto Ferreira authored Mar 20, 2020



* [swig] Fix SWIG methods that return char** with StringArray.

+ [new] Add StringArray class to manage and manipulate arrays of fixed-length strings:

  This class is now used to wrap any char** parameters, manage memory and
  manipulate the strings.

  This class is defined in swig/StringArray.hpp and wrapped in StringArray.i.

+ [API+fix] Wrap LGBM_BoosterGetFeatureNames, which previously resulted in a segfault:

  Added wrapper LGBM_BoosterGetFeatureNamesSWIG(BoosterHandle) that
  only receives the booster handle, figures out how much memory to allocate
  for the strings, and returns a StringArray which can be easily converted to String[].

+ [API+safety] For consistency, LGBM_BoosterGetEvalNamesSWIG was wrapped as well:

  * Refactored to detect any kind of error and removed all the parameters
    besides the BoosterHandle (a much simpler API to use in Java).
  * No assumptions are made about the required string space (128 before).
  * The amount of required string memory is computed internally.

+ [safety] No possibility of undefined behaviour

  The two methods wrapped above now compute the necessary string storage space
  prior to allocation, as the low-level C API calls would crash the process
  irreversibly if they wrote more memory than was passed to them.

* Changes to C API and wrappers support char**

To support the latest SWIG changes that enable safe char**
return values, the C API was changed.

The respective wrappers in R and Python were changed too.

* Cleanup indentation in new lightgbm_R.cpp code

* Address review code-style comments.

* Update swig/StringArray.hpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/basic.py
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update src/lightgbm_R.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: alberto.ferreira <alberto.ferreira@feedzai.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

91185c3a
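The core of this fix is a two-pass pattern: the C function always reports how many strings there are and how large each buffer must be, but writes only when the caller's buffers are big enough, so it can never overrun memory. The sketch below simulates that contract in pure Python with `ctypes` buffers. `fake_get_feature_names`, `get_feature_names` and the feature names are hypothetical, and the real Python wrapper may size its first attempt differently (e.g. with a reserved default buffer length).

```python
import ctypes

_NAMES = [b"age", b"income_bracket", b"zip"]  # hypothetical feature names


def fake_get_feature_names(num_buffers, buffer_len, out_len, out_buffer_len, out_strs):
    """Stand-in for a C function filling caller-allocated char* buffers.

    It always reports the required sizes, but only writes when the caller's
    buffers are large enough.
    """
    needed = max(len(name) for name in _NAMES) + 1  # +1 for the NUL terminator
    out_len.value = len(_NAMES)
    out_buffer_len.value = needed
    if num_buffers < len(_NAMES) or buffer_len < needed:
        return  # too small: report sizes, write nothing
    for buf, name in zip(out_strs, _NAMES):
        ctypes.memmove(buf, name + b"\0", len(name) + 1)


def get_feature_names():
    out_len = ctypes.c_int(0)
    out_buffer_len = ctypes.c_size_t(0)
    # First pass: no buffers, only learn how many strings and how long they are.
    fake_get_feature_names(0, 0, out_len, out_buffer_len, [])
    buffers = [ctypes.create_string_buffer(out_buffer_len.value)
               for _ in range(out_len.value)]
    # Second pass: the buffers are now large enough to be filled.
    fake_get_feature_names(len(buffers), out_buffer_len.value,
                           out_len, out_buffer_len, buffers)
    return [buf.value.decode("utf-8") for buf in buffers]
```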

16 Mar, 2020 2 commits

[python] Fix continued train by reusing the same dataset (#2906) · fc0f132f

Guolin Ke authored Mar 17, 2020



* fix

* fix return

* fix test

* fix test

* fix predictor is none

* Apply suggestions from code review

* Update basic.py

* Update basic.py

* Apply suggestions from code review
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

fc0f132f

[python] fix the bug when use different params with reference (#2907) · 399b746b

Guolin Ke authored Mar 16, 2020



* fix the bug when use different params with reference

* fix

* Update basic.py

* Apply suggestions from code review
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update basic.py

* add test

* Apply suggestions from code review

* added asserts in test
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>

399b746b

06 Mar, 2020 1 commit

[python] save all param values into model file (#2589) · ba15a16a

Nikita Titov authored Mar 06, 2020

* save all param values into model file

* revert storing predict params

* do not save params for predict and convert tasks

* fixed test: 10 is found successfully for default 100

* specify more params as no-save

ba15a16a
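With all parameter values saved into the model file, they can be read back from the model text. A minimal parsing sketch, assuming the layout of a block that opens with a `parameters:` line, closes with `end of parameters`, and holds one `[key: value]` entry per line. `parse_saved_params` is a hypothetical helper and keeps every value as a string.

```python
def parse_saved_params(model_str):
    """Extract {key: value} pairs from the parameters block of a model string."""
    params = {}
    in_block = False
    for line in model_str.splitlines():
        line = line.strip()
        if line == "parameters:":
            in_block = True
        elif line == "end of parameters":
            break
        elif in_block and line.startswith("[") and line.endswith("]"):
            # "[num_leaves: 31]" -> ("num_leaves", "31")
            key, _, value = line[1:-1].partition(": ")
            params[key] = value
    return params
```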

20 Feb, 2020 1 commit

Add capability to get possible max and min values for a model (#2737) · 18e7de4f

Joan Fontanals authored Feb 20, 2020



* Add capability to get possible max and min values for a model

* Change implementation to have return value in tree.cpp, change naming to upper and lower bound, move implementation to gbdt.cpp

* Update include/LightGBM/c_api.h
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Change iteration to avoid potential overflow, add bindings to R and Python and a basic test

* Adjust test values

* Consider const correctness and multithreading protection

* Update test values

* Update test values

* Add test to check that model is exactly the same in all platforms

* Try to parse the model to get the expected values

* Try to parse the model to get the expected values

* Fix implementation, num_leaves can be lower than the leaf_value_ size

* Do not check for num_leaves to be smaller than actual size and get back to test with hardcoded value

* Change test order

* Add gpu_use_dp option in test

* Remove helper test method

* Update src/c_api.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update src/io/tree.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update src/io/tree.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_basic.py
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Remove imports
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

18e7de4f
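Because a boosted model's raw score is the sum of exactly one leaf value per tree, summing each tree's smallest and largest leaf values bounds every possible raw prediction. In current releases the Python package exposes this as `Booster.lower_bound()` and `Booster.upper_bound()`. The sketch below shows only the arithmetic, using a hypothetical `model_bounds` and plain lists of leaf values. It iterates over the leaves actually present, in line with the commit's note that `num_leaves` can be lower than the size of `leaf_value_`. The bounds need not be attainable, since no single sample may reach the extreme leaf in every tree.

```python
def model_bounds(trees_leaf_values):
    """Return (lower, upper) bounds on the raw score of an additive tree model.

    Each tree contributes exactly one leaf value per prediction, so summing the
    per-tree minima and maxima bounds every possible raw score.
    """
    lower = sum(min(leaves) for leaves in trees_leaf_values)
    upper = sum(max(leaves) for leaves in trees_leaf_values)
    return lower, upper
```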

19 Feb, 2020 1 commit

[python] [R-package] refine the parameters for Dataset (#2594) · 9f79e840

Guolin Ke authored Feb 19, 2020



* reset

* fix a bug

* fix test

* Update c_api.h

* support not filtering features by min_data

* add warning in reset config

* refine warnings for override dataset's parameter

* some cleans

* clean code

* clean code

* refine C API function doxygen comments

* refined new param description

* refined doxygen comments for R API function

* removed stuff related to int8

* break long line in warning message

* removed tests whose results cannot be validated anymore

* added test for warnings about unchangeable params

* write parameter from dataset to booster

* consider free_raw_data.

* fix params

* fix bug

* implementing R

* fix typo

* filter params in R

* fix R

* not min_data

* refined tests

* fixed linting

* refine

* pylint

* add docstring

* fix docstring

* R lint

* updated description for C API function

* use param aliases in Python

* fixed typo

* fixed typo

* added more params to test

* removed debug print

* fix dataset construct place

* fix merge bug

* Update feature_histogram.hpp

* add is_sparse back

* remove unused parameters

* fix lint

* add data random seed

* update

* [R-package] centralized Dataset parameter aliases and added tests on Dataset parameter updating (#2767)
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

9f79e840

03 Feb, 2020 1 commit
- [python][R-package] removed duplicated code from language wrappers (#2606) · bef83598
  Nikita Titov authored Feb 03, 2020
```
* removed duplicated code from language wrappers

* removed check for resetting metric
```
  bef83598
14 Jan, 2020 2 commits
- [python] fix trees_to_dataframe and enhance test (#2690) · b161f334
  Nikita Titov authored Jan 14, 2020
```
* transfer and enhance test for trees_to_dataframe

* fixed bug in Python 2
```
  b161f334
- [python] [R-package] Use the same address when updated label/weight/query (#2662) · 82886ba6
  Guolin Ke authored Jan 14, 2020
```
* Update metadata.cpp

* add version for training set, to efficiently update label/weight/... during training.

* Update lgb.Booster.R
```
  82886ba6
10 Jan, 2020 1 commit

[python] Output model to a pandas DataFrame (#2592) · 301402c8

Patrick Ford authored Jan 10, 2020

* trees_to_df method and unit test added. PEP 8 fixes for integration.

* Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

Post-review changes

* changes from second round of reviews from striker

* third round of review. formatting and added 2 more tests

* replaced pandas dot attribute accessor with string attribute accessor

* dealt with single tree edge case and minor refactor of tests

* slight refactor for checking if tree is a single node

301402c8

29 Dec, 2019 1 commit

warning for init_score in save_binary (#2649) · 7b411bdd

Guolin Ke authored Dec 29, 2019



* warning for init_score in save_binary

fix #2639

* Update metadata.cpp

* added info into docs
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

7b411bdd

08 Dec, 2019 1 commit
- [python] Variable Typo: redictor -> predictor (#2622) · b36926d8
  duckladydinh authored Dec 08, 2019
```
I believe that this should be a typo, right?
```
  b36926d8
27 Oct, 2019 1 commit
- [python] removed unused pylint directives (#2466) · 00d1e693
  Nikita Titov authored Oct 27, 2019
  
  00d1e693
22 Oct, 2019 1 commit
- [python] handle params aliases centralized (#2489) · 5dcd4be9
  Nikita Titov authored Oct 22, 2019
```
* handle aliases centralized

* convert aliases dict to class
```
  5dcd4be9
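Handling aliases centrally means that one table maps each canonical parameter to all of its accepted spellings, and every lookup goes through it. A minimal sketch with a hypothetical `ConfigAliases` class holding an illustrative subset of LightGBM's documented aliases. The `resolve` helper returns whichever alias it finds first; if a user passes several aliases for the same parameter the choice is arbitrary here, so real code should warn about such conflicts.

```python
class ConfigAliases:
    """Hypothetical central alias table (illustrative subset of real aliases)."""

    aliases = {
        "num_iterations": {"num_iterations", "num_iteration", "n_iter",
                           "num_tree", "num_trees", "num_round", "num_rounds",
                           "num_boost_round", "n_estimators"},
        "learning_rate": {"learning_rate", "shrinkage_rate", "eta"},
    }

    @classmethod
    def get(cls, *names):
        # Union of all aliases (including the canonical name) for each given name.
        result = set()
        for name in names:
            result |= cls.aliases.get(name, {name})
        return result


def resolve(params, canonical):
    """Return the value stored under any alias of `canonical`, or None."""
    for alias in ConfigAliases.get(canonical):
        if alias in params:
            return params[alias]
    return None
```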
21 Oct, 2019 1 commit

check sorted indices in Subset (#2510) · 465d1262

Guolin Ke authored Oct 21, 2019

* Update sparse_bin.hpp

* check sorted in c_api

* fix python package

* fix tests

* fix test

* std::is_sorted

* Update basic.py

465d1262

26 Sep, 2019 3 commits

[python] avoid data copy where possible (#2383) · d064019f

Nikita Titov authored Sep 26, 2019

* avoid copy where possible

* use precise type for importance type

* removed pointless code

* simplify sparse pandas Series conversion

* more memory savings

* always force type conversion for 1-D arrays

* one more copy=False

d064019f

fixed docstrings (#2451) · a0d7313b

Nikita Titov authored Sep 26, 2019

a0d7313b

[python] make dump_text() private (#2434) · a0a117aa

Nikita Titov authored Sep 26, 2019
```
* make dump_text() private

* updated test
```
a0a117aa

15 Sep, 2019 1 commit

[python] Bug fix for first_metric_only on earlystopping. (#2209) · 84754399

kenmatsu4 authored Sep 16, 2019

* Bug fix for first_metric_only if the first metric is train metric.

* Update bug fix for feval issue.

* Disable feval for first_metric_only.

* Additional test items.

* Fix wrong assertEqual settings & formatting.

* Change dataset of test.

* Fix random seed for test.

* Modify assumed test result due to different sklearn versions between CI and local.

* Remove f-string

* Applying variable assumed test result for test.

* Fix flake8 error.

* Modifying in accordance with review comments.

* Modifying for pylint.

* simplified tests

* Deleting error criteria `if eval_metric is None`.

* Delete test items of classification.

* Simplifying if condition.

* Applying first_metric_only for sklearn wrapper.

* Modifying test_sklearn to conform to Python 2.x

* Fix flake8 error.

* Additional fix for sklearn and add tests.

* Bug fix and add test cases.

* some refactor

* fixed lint

* Fix duplicated metrics scores to pass the test.

* Fix the case first_metric_only not in params.

* Converting metrics aliases.

* Add comment.

* Modify comment for pylint.

* Modify comment for pydocstyle.

* Using split test set for two eval_set.

* added test case for metric aliases and length checks

* minor style fixes

* fixed rmse name and alias position

* Fix the case metric=[]

* Fix using env.model._train_data_name

* Fix wrong test condition.

* Move initial process to _init() func.

* Modify test setting for test_sklearn & training data matching on callback.py

* test_sklearn.py
-> A test case for training is wrong, so fixed.

* callback.py
-> A condition of if statement for detecting test dataset is wrong, so fixed.

* Support composite name metrics.

* Remove metric check process & reduce redundant test cases.

Since #2273 fixed the order of metrics in cpp, the metric check process in callback.py is removed.

* Revised according to the matters pointed out on a review.

* increased code readability

* Fix the issue of order of validation set.

* Changing to OrderedDict from defaultdict for score result.

* added missed check in cv function for first_metric_only and feval co-occurrence

* keep order only for metrics but not for datasets in best_score

* move OrderedDict initialization to init phase

* fixed minor printing issues

* move first metric detection to init phase and split can be performed without checks

* split only once during callback

* removed excess code

* fixed typo in variable name and squashed ifs

* use setdefault

* hotfix

* fixed failing test

* refined tests

* refined sklearn test

* Making "feval" effective on early stopping.

* allow feval and first_metric_only for cv

* removed unused code

* added tests for feval

* fixed printing

* add note about whitespaces in feval name

* Modifying final iteration process in case valid set is training data.

84754399
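With `first_metric_only=True`, early stopping (the `early_stopping(stopping_rounds, first_metric_only=...)` callback) watches only the first metric instead of stopping as soon as any metric stagnates. Below is a simplified sketch for a single validation set. `EarlyStopper` is a hypothetical class, and the real callback additionally skips the training set, handles several validation sets and records the best iteration for the booster.

```python
class EarlyStopper:
    """Sketch of early stopping optionally driven only by the first metric.

    update() receives one iteration's scores as an ordered list of
    (metric_name, value, higher_is_better) tuples for a validation set.
    """

    def __init__(self, stopping_rounds, first_metric_only=True):
        self.stopping_rounds = stopping_rounds
        self.first_metric_only = first_metric_only
        self.best = {}  # metric_name -> (best_value, best_iteration)

    def update(self, iteration, scores):
        """Return True when training should stop."""
        if self.first_metric_only:
            scores = scores[:1]  # ignore every metric except the first one
        for name, value, higher_is_better in scores:
            best_value, best_iter = self.best.get(name, (None, iteration))
            improved = best_value is None or (
                value > best_value if higher_is_better else value < best_value
            )
            if improved:
                self.best[name] = (value, iteration)
            elif iteration - best_iter >= self.stopping_rounds:
                return True  # no improvement for stopping_rounds iterations
        return False
```

In the test below, `l1` keeps improving, but training still stops because the first metric, `l2`, has not improved for two iterations.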

09 Sep, 2019 1 commit
- [python] keep consistent state for Dataset fields (#2390) · 9f6e4413
  Nikita Titov authored Sep 09, 2019
```
* keep consistent state for Dataset fields

* hotfix
```
  9f6e4413
08 Sep, 2019 1 commit

[python] Improved python tree plots (#2304) · f52be9be

CharlesAuguste authored Sep 08, 2019

* Some basic changes to the plot of the trees to make them readable.

* Squeezed the information in the nodes.

* Added colouring when a dictionary mapping the features to the constraints is passed.

* Fix spaces.

* Added data percentage as an option in the nodes.

* Squeezed the information in the leaves.

* Important information is now in bold.

* Added a legend for the color of monotone splits.

* Changed "split_gain" to "gain" and "internal_value" to "value".

* Squeezed leaves a bit more.

* Changed description in the legend.

* Revert "Squeezed leaves a bit more."

This reverts commit dd8bf14a3ba604b0dfae3b7bb1c64b6784d15e03.

* Increased the readability for the gain.

* Tidied up the legend.

* Added the data percentage in the leaves.

* Added the monotone constraints to the dumped model.

* Monotone constraints are now specified automatically when plotting trees.

* Raise an exception instead of the bug that was here before.

* Removed operators on the branches for a clearer design.

* Small cleaning of the code.

* Setting a monotone constraint on a categorical feature now returns an exception instead of doing nothing.

* Fix bug when monotone constraints are empty.

* Fix another bug when monotone constraints are empty.

* Variable name change.

* Added is / isn't on every edge of the trees.

* Fix test "tree_create_digraph".

* Add new test for plotting trees with monotone constraints.

* Typo.

* Update documentation of categorical features.

* Typo.

* Information in nodes more explicit.

* Used regular strings instead of raw strings.

* Small refactoring.

* Some cleaning.

* Added future statement.

* Changed output for consistency.

* Updated documentation.

* Added comments for colors.

* Changed text on edges for more clarity.

* Small refactoring.

* Modified text in leaves for consistency with nodes.

* Updated default values and documentation for consistency.

* Replaced CHECK with Log::Fatal for user-friendliness.

* Updated tests.

* Typo.

* Simplify imports.

* Swapped count and weight to improve readability of the leaves in the plotted trees.

* Thresholds in bold.

* Made information in nodes written in a specific order.

* Added information to clarify legend.

* Code cleaning.

f52be9be

07 Sep, 2019 2 commits
- [python] removed excess condition (#2385) · de1f3cb3
  Nikita Titov authored Sep 07, 2019
  
  de1f3cb3
- [python] fixed typo (#2386) · 22cd39e8
  Nikita Titov authored Sep 07, 2019
  
  22cd39e8
13 Aug, 2019 1 commit

[python] add sparsity support for new version of pandas and check Series for bad dtypes (#2318) · 8f446be7

Nikita Titov authored Aug 13, 2019

* reworked pandas dtypes mapper

* added tests

* added sparsity support for new version of pandas

* fixed tests for old pandas

* check pd.Series for bad dtypes as well

* enhanced tests

* fixed pylint

8f446be7

07 Aug, 2019 1 commit

[python] Deep copy params in _update_params of DataSet (#2310) · 5cff4e8e

Madiyar authored Aug 07, 2019

Otherwise, it would print `basic.py:762: UserWarning: categorical_feature in param dict is overridden.`, because when the params for a validation set were updated, the already-updated params of the training set were used, and those contain `'categorical_column'`.

5cff4e8e
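The bug above arises because a validation Dataset reused a params object that had already been updated for the training Dataset. Copying with `copy.deepcopy` before updating keeps the two independent, including nested values such as lists. A minimal sketch with a hypothetical `update_params` helper, not the code from `basic.py`:

```python
import copy


def update_params(base_params, new_params):
    """Return merged params without mutating base_params (hypothetical helper)."""
    # A shallow dict(base_params) would still share nested lists such as
    # categorical_feature, so later in-place edits would leak between datasets.
    merged = copy.deepcopy(base_params)
    merged.update(new_params)
    return merged
```

With a shallow copy, appending to the merged `categorical_feature` list would also change the training params. The deep copy prevents this.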

31 Jul, 2019 1 commit
- change the priority of init_score and init_model. (#2291) · 6c210209
  Guolin Ke authored Jul 31, 2019
  
  6c210209
12 Jul, 2019 1 commit

fix init_model with subset (#2252) · 7360cff9

Guolin Ke authored Jul 12, 2019

* fix init_model with subset

* Update basic.py

* added test

* fix predictor naming issue

* Update basic.py

* fix bug

* fix pylint

* fix comments

* Update basic.py

* Update basic.py

* updated test

* fixed bug

* fixed lint

* fix warning

* add get_data before initial prediction

* refine the warning in get_data

* refine warning

* Update basic.py

7360cff9