Commits · c633c6c2afc42816b9f1e3d522ec1eb02ca4e11e · tianlh / LightGBM-DCU

10 Apr, 2020 1 commit

[python] Re-enable scikit-learn 0.22+ support (#2949) · c633c6c2

Nikita Titov authored Apr 10, 2020

* Revert "specify the last supported version of scikit-learn (#2637)"

This reverts commit d1002776.

* ban scikit-learn 0.22.0 and skip broken test

* fix updated test

* fix lint test

* Revert "fix lint test"

This reverts commit 8b4db0805fe7a9e7f7eb0be3eac231f85026d196.

20 Mar, 2020 1 commit

[python] handle RandomState object in Scikit-learn Api (#2904) · cf0a992e

Lukas Pfannschmidt authored Mar 20, 2020



* Add handling of RandomState object, which is standard for sklearn methods.

LightGBM expects an integer seed instead of an object.
If the passed object is a RandomState, we draw a random integer from it to seed the underlying low-level code.
Although the drawn integer is only in the range between 1 and 1e10, I expect it to have enough entropy (?) to not matter in practice.

* Add RandomState object to random_state docstring.

* remove blank line

* Use property to handle setting random_state.
This enables setting cloned estimators with the set_params method in sklearn.

* Add docstring to attribute.

* Fix and simplify docstring.

* Add test case.

* Use maximal int for datatype in seed derivation.

* Replace random_state property with interfacing in fit method.
Derives int seed for C code only when fitting and keeps RandomState object as param.

* Adapt unit test to property change.

* Extended test case and docstring
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Add more equality checks (feature importance, best iteration/score).

* Add equality comparison of boosters represented by strings.
Remove useless best_iteration_ comparison (we do not use early_stopping).

* fix whitespace

* Test if two subsequent fits produce different models

* Apply suggestions from code review
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
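
The seed handling described in this commit can be illustrated with a minimal sketch. It is not LightGBM's code: `SketchEstimator` and `seed_used_` are hypothetical names, the standard library's `random.Random` stands in for `numpy.random.RandomState`, and the upper bound of `2**31 - 1` is an assumption based on the "maximal int for datatype" bullet. The point is only that the generator object is kept as the parameter and an integer seed is derived at fit time.

```python
import random


class SketchEstimator:
    """Hypothetical estimator used only to illustrate the idea."""

    def __init__(self, random_state=None):
        # Keep the object exactly as passed, so cloning via
        # get_params/set_params-style code sees the original value.
        self.random_state = random_state
        self.seed_used_ = None

    def _derive_seed(self):
        rs = self.random_state
        if isinstance(rs, random.Random):
            # Drawing from the generator advances its state, so two
            # consecutive fits receive different integer seeds.
            return rs.randrange(2**31 - 1)  # assumed int32-style bound
        return rs  # already an int, or None

    def fit(self):
        # The integer seed is derived only here, at fit time.
        self.seed_used_ = self._derive_seed()
        return self
```

With this shape, two estimators built from identically seeded generators derive the same seed, while refitting one estimator yields a new seed each time, which is the behaviour the "two subsequent fits produce different models" test checks for.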

26 Feb, 2020 1 commit

Code refactoring for ranking objective & Faster ndcg_xendcg (#2801) · e676af23

Guolin Ke authored Feb 26, 2020

* code refactoring

* update vcproject

* refine

* fix test

* Update tests/python_package_test/test_sklearn.py

* fix test

25 Feb, 2020 1 commit

[tests][python] fixed pandas deprecation warning in tests (#2819) · 745b54d6

Nikita Titov authored Feb 25, 2020

* fixed pandas deprecation warning in tests

* support old versions of pandas

03 Feb, 2020 1 commit

[python][tests] fixed typo (#2732) · 85889901

Nikita Titov authored Feb 03, 2020

* Update test_engine.py

* Update test_sklearn.py

02 Feb, 2020 1 commit

Support both row-wise and col-wise multi-threading (#2699) · 509c2e50

Guolin Ke authored Feb 02, 2020



* commit

* fix a bug

* fix bug

* reset to track changes

* refine the auto choose logic

* sort the time stats output

* fix include

* change multi_val_bin_sparse_threshold

* add cmake

* add _mm_malloc and _mm_free for cross platform

* fix cmake bug

* timer for split

* try to fix cmake

* fix tests

* refactor DataPartition::Split

* fix test

* typo

* formatting

* Revert "formatting"

This reverts commit 5b8de4f7fb9d975ee23701d276a66d40ee6d4222.

* add document

* [R-package] Added tests on use of force_col_wise and force_row_wise in training (#2719)

* naming

* fix gpu code

* Update include/LightGBM/bin.h
Co-Authored-By: James Lamb <jaylamb20@gmail.com>

* Update src/treelearner/ocl/histogram16.cl

* test: swap compilers for CI

* fix omp

* not avx2

* no aligned for feature histogram

* Revert "refactor DataPartition::Split"

This reverts commit 256e6d9641ade966a1f54da1752e998a1149b6f8.

* slightly refactor data partition

* reduce the memory cost
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

30 Jan, 2020 1 commit

Implementation of XE_NDCG_MART for the ranking task (#2620) · 86530988

sbruch authored Jan 29, 2020

* Implementation of XE_NDCG loss function for ranking.

* Add citation

* Check in example usage for xe_ndcg loss.

* Seed the generator when a seed is provided in the config. Add unit-tests for xe_ndcg

* Update documentation

* Fix indentation

* Address issues raised by reviewers.

* Clean up include statements.

* Fix issues raised by reviewers.

* Regenerate parameters.rst

* Add a note to explain that reproducing xe_ndcg results requires num_threads to be one.

* Introduce objective_seed and use that in rank_xendcg instead of directly using seed

* Change default value of objective_seed

09 Dec, 2019 1 commit

[python][sklearn] do not modify args in fit function and minor code cleanup (#2619) · eec60731

Nikita Titov authored Dec 09, 2019

* clean code

* clean code

* do not modify args in fit function

* added test

05 Dec, 2019 2 commits

[python] Allow python sklearn interface's fit() to pass init_model to train() (#2447) · f3afe98b

aaiyer authored Dec 05, 2019

* allow python sklearn interface's fit() to pass init_model to train()

* Fix whitespace issues, and change ordering of parameters to be backward
compatible

* Formatting fixes

* allow python sklearn interface's fit() to pass init_model to train()

* Fix whitespace issues, and change ordering of parameters to be backward
compatible

* Formatting fixes

* Recognize LGBModel objects for init_model

* simplified condition

* updated docstring

* added test

[python][R-package] warn users about untransformed values in case of custom obj (#2611) · 69c1c330
Nikita Titov authored Dec 05, 2019

27 Oct, 2019 2 commits

[tests][python] refined python tests (#2483) · 1f1dc452

Nikita Titov authored Oct 27, 2019

* speed up tests

* more updates

* fixed pylint

* updated tests

* Update test_sklearn.py

* test that indices are sorted internally

[python] removed unused pylint directives (#2466) · 00d1e693

Nikita Titov authored Oct 27, 2019

15 Sep, 2019 1 commit

[python] Bug fix for first_metric_only on earlystopping. (#2209) · 84754399

kenmatsu4 authored Sep 16, 2019

* Bug fix for first_metric_only if the first metric is train metric.

* Update bug fix for feval issue.

* Disable feval for first_metric_only.

* Additional test items.

* Fix wrong assertEqual settings & formatting.

* Change dataset of test.

* Fix random seed for test.

* Modify assumed test result due to different sklearn version between CI and local.

* Remove f-string

* Applying variable assumed test result for test.

* Fix flake8 error.

* Modifying in accordance with review comments.

* Modifying for pylint.

* simplified tests

* Deleting error criteria `if eval_metric is None`.

* Delete test items of classification.

* Simplifying if condition.

* Applying first_metric_only for sklearn wrapper.

* Modifying test_sklearn to conform to python 2.x

* Fix flake8 error.

* Additional fix for sklearn and add tests.

* Bug fix and add test cases.

* some refactor

* fixed lint

* Fix duplicated metrics scores to pass the test.

* Fix the case first_metric_only not in params.

* Converting metrics aliases.

* Add comment.

* Modify comment for pylint.

* Modify comment for pydocstyle.

* Using split test set for two eval_set.

* added test case for metric aliases and length checks

* minor style fixes

* fixed rmse name and alias position

* Fix the case metric=[]

* Fix using env.model._train_data_name

* Fix wrong test condition.

* Move initial process to _init() func.

* Modify test setting for test_sklearn & training data matching on callback.py

* test_sklearn.py
-> A test case for training is wrong, so fixed.

* callback.py
-> A condition of if statement for detecting test dataset is wrong, so fixed.

* Support composite name metrics.

* Remove metric check process & reduce redundant test cases.

Because #2273 fixed the order of metrics in the C++ code, the metric check in callback.py is no longer needed and is removed.

* Revised according to the matters pointed out on a review.

* increased code readability

* Fix the issue of order of validation set.

* Changing to OrderedDict from default dict for score result.

* added missed check in cv function for first_metric_only and feval co-occurrence

* keep order only for metrics but not for datasets in best_score

* move OrderedDict initialization to init phase

* fixed minor printing issues

* move first metric detection to init phase and split can be performed without checks

* split only once during callback

* removed excess code

* fixed typo in variable name and squashed ifs

* use setdefault

* hotfix

* fixed failing test

* refined tests

* refined sklearn test

* Making "feval" effective on early stopping.

* allow feval and first_metric_only for cv

* removed unused code

* added tests for feval

* fixed printing

* add note about whitespaces in feval name

* Modifying final iteration process in case valid set is training data.
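
The `first_metric_only` behaviour this commit works on can be sketched roughly as follows. This is an illustrative stand-in, not the code in callback.py: the function name `early_stop`, its arguments, and the assumption that every metric is lower-is-better are hypothetical. It only shows why restricting the check to the first metric changes when training stops, and why the order of metrics matters.

```python
def early_stop(history, stopping_rounds, first_metric_only=False):
    """Return (best_iteration, metric_name) when stopping triggers, else None.

    history: list with one dict per iteration, mapping metric name to score.
    All metrics are assumed to be lower-is-better in this sketch.
    """
    # Dicts keep insertion order, so the "first" metric is well defined.
    names = list(history[0])
    if first_metric_only:
        names = names[:1]

    # Best (score, iteration) seen so far for each tracked metric.
    best = {name: (float("inf"), -1) for name in names}

    for iteration, scores in enumerate(history):
        for name in names:
            if scores[name] < best[name][0]:
                best[name] = (scores[name], iteration)
            elif iteration - best[name][1] >= stopping_rounds:
                # This metric has not improved for stopping_rounds rounds.
                return best[name][1], name
    return None
```

For example, if the first metric keeps improving while a second one degrades, the default mode stops on the second metric, whereas `first_metric_only=True` lets training continue.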

03 Sep, 2019 1 commit

[ci][tests] install joblib for test directly (#2374) · df26b65d

Nikita Titov authored Sep 03, 2019

24 Aug, 2019 1 commit

normalize the lambdas in lambdamart objective (#2331) · 0dfda826

Guolin Ke authored Aug 25, 2019

* norm the lambda scores

* change default to false

* update doc

* typo

* Update Parameters.rst

* Update config.h

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update rank_objective.hpp

* Update Parameters.rst

* Update config.h

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_sklearn.py

17 Aug, 2019 1 commit

sigmoid_ in grad and hess for rank objective (#2322) · aee92f63

sbruch authored Aug 16, 2019

* Lambdas and hessians need to factor sigmoid_ into the computation. Additionally, the sigmoid function has an arbitrary factor of 2 in the exponent; it is not just non-standard but the gradients are not computed correctly anyway.

* Update unit test

* Also remove a heuristic that normalizes the gradient by the difference in scores.

* Also fix unit test after removing the heuristic
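
To make the point about the sigmoid parameter concrete, here is a minimal sketch of a pairwise logistic ranking loss with its gradient and hessian. It is not the code in rank_objective.hpp: the function names are hypothetical, and the |ΔNDCG| weighting used by LambdaMART is deliberately left out. It only shows that a sigmoid parameter σ enters the gradient once and the hessian twice, with no extra factor of 2 in the exponent.

```python
import math


def pair_loss(s_hi, s_lo, sigmoid=1.0):
    """Logistic loss for a pair where s_hi belongs to the more relevant item."""
    return math.log1p(math.exp(-sigmoid * (s_hi - s_lo)))


def pair_grad_hess(s_hi, s_lo, sigmoid=1.0):
    """First and second derivative of pair_loss with respect to s_hi."""
    # p is the probability mass assigned to the wrong ordering.
    p = 1.0 / (1.0 + math.exp(sigmoid * (s_hi - s_lo)))
    grad = -sigmoid * p                      # one factor of sigmoid
    hess = sigmoid * sigmoid * p * (1.0 - p)  # two factors of sigmoid
    return grad, hess
```

The derivatives with respect to `s_lo` are the negated gradient and the same hessian, so a full implementation would accumulate both per pair.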

13 Aug, 2019 1 commit

[python] add sparsity support for new version of pandas and check Series for bad dtypes (#2318) · 8f446be7

Nikita Titov authored Aug 13, 2019

* reworked pandas dtypes mapper

* added tests

* added sparsity support for new version of pandas

* fixed tests for old pandas

* check pd.Series for bad dtypes as well

* enhanced tests

* fixed pylint

20 Jun, 2019 1 commit

[tests] use numpy.testing.assert_allclose (#2207) · 86269ee3

Nikita Titov authored Jun 20, 2019

* Update test.py

* Update test_consistency.py

* Update test_basic.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_engine.py

* more replacements

04 Jun, 2019 1 commit

[python] fix class_weight (#2199) · b6f65783

Nikita Titov authored Jun 04, 2019

* fixed class_weight

* fixed lint

* added test

* hotfix

27 May, 2019 1 commit

[python] fixed picklability of sklearn models with custom obj and updated docstrings for custom obj (#2191) · 2459362a

Nikita Titov authored May 27, 2019

* refactored joblib test

* fixed picklability of sklearn models with custom obj and updated docstrings for custom obj

* pickled model should be able to predict without refitting

08 May, 2019 1 commit

[docs] updated Microsoft GitHub URL (#2152) · 94fbe5bb

Guolin Ke authored May 08, 2019

* fix travis badge

* updated GitHub Microsoft URL

22 Apr, 2019 1 commit

[python] disable default pandas cat features if cat features were explicitly provided (#2121) · 4be53a5a

Nikita Titov authored Apr 22, 2019

* disable default pandas cat features if cat features were explicitly provided

* added assertion for cat features

19 Apr, 2019 1 commit

[python] ignore pandas ordered categorical columns by default (#2115) · d115769c

Nikita Titov authored Apr 19, 2019

* ignore pandas ordered categorical columns by default

* fix tests

* fix tests

* added comments

02 Feb, 2019 1 commit

improved model loading routines (#1979) · 861de1c1

Nikita Titov authored Feb 02, 2019

30 Jan, 2019 1 commit

fix nan in eval results (#1973) · feeaf38f

Guolin Ke authored Jan 30, 2019

* always save the score of the first round in early stopping

fix #1971

* avoid using std::log on non-positive numbers

* remove unnecessary changes

* add tests

* Update test_sklearn.py

* enhanced tests

27 Jan, 2019 1 commit

[tests][python] added tests for metrics' behavior and fixed case for multiclass task with custom objective (#1954) · f9a1465d

Nikita Titov authored Jan 27, 2019

* added metrics test for standard interface

* simplified code

* less trees

* less trees

* use dummy custom objective and metric

* added tests for multiclass metrics aliases

* fixed bug in case of custom obj and num_class > 1

* added metric test for sklearn wrapper

20 Dec, 2018 1 commit

[python] fix creating train_set in fit (#1916) · c9bcba44

Tsukasa OMOTO authored Dec 20, 2018

* [python] fix creating train_set in fit

https://github.com/Microsoft/LightGBM/blob/cc99f0d36ae929eb02b22a072823ab7c6d3155ab/python-package/lightgbm/sklearn.py#L519
may be False even if valid_data[0] is X and valid_data[1] is y, because `check_X_y` might return copies of X and y.
https://scikit-learn.org/0.20/modules/generated/sklearn.utils.check_X_y.html

cf. https://github.com/Microsoft/LightGBM/pull/451

* use assertIn
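
The identity-check problem described above can be reproduced with a small sketch. It does not use scikit-learn or LightGBM: `validate` is a hypothetical stand-in for a validator such as `check_X_y` that may return copies, and both helper functions are illustrative names. The sketch only shows why an `is` comparison must be made against the caller's original objects rather than the validated ones.

```python
def validate(data):
    """Stand-in for a validator that returns a converted copy of its input."""
    return [float(v) for v in data]


def is_training_pair_buggy(X, y, eval_pair):
    # X and y are rebound to validated copies, so identity with the
    # caller's objects is lost and the check is always False here.
    X, y = validate(X), validate(y)
    return eval_pair[0] is X and eval_pair[1] is y


def is_training_pair_fixed(X, y, eval_pair):
    # Validated copies get separate names; the identity check still
    # compares against the objects the caller actually passed in.
    _X, _y = validate(X), validate(y)
    return eval_pair[0] is X and eval_pair[1] is y
```

With the second version, passing the training data itself as an evaluation set is recognised, so the existing training set can be reused instead of building a duplicate.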

11 Oct, 2018 1 commit

[tests] fixed codestyle, removed unused code and added several new checks (#1688) · 108e80f2

Nikita Titov authored Oct 11, 2018

* break huge lines in sklearn tests

* break huge line in plotting tests

* break huge lines in basic tests

* multiple enhancements in engine tests

* multiple enhancements in sklearn tests

* hotfixes

* break huge lines and use with statement in C API test

* make NDCG test more strict

10 Oct, 2018 1 commit

fix ranking tasks consistency (#1739) · 496a07d1

Guolin Ke authored Oct 10, 2018

* fix ndcg consistency.

* more stable sorts

* Update gbdt_model_text.cpp

* Update dataset.cpp

* Update gbdt_model_text.cpp

28 Sep, 2018 1 commit

[ci][python] fixes according to scikit-learn 0.20 release (#1707) · f53116af

Nikita Titov authored Sep 28, 2018

* fixed FutureWarning about cv default value

* fixed according to new check_estimator API

* fixed joblib warning

25 Jul, 2018 1 commit

[docs] added new parameters aliases (#1537) · 00a125d5

Nikita Titov authored Jul 25, 2018

* added new aliases for params

* run helper/parameter_generator.py

* removed useless test

11 Jul, 2018 1 commit

[python] Configure choice of `feature_importance_` in sklearn API (#1470) · dae75516

Misha Lisovyi authored Jul 11, 2018

* ignore vim temporary files

* add importance_type arg to sklearn API

* update documentation info

* remove a trailing space

* remove trailing space (again :))

* add instructions on importance choices to sklearn API

* drop mention of constructor in the feature type setting

* adding a test for different feature types

* remove trailing spaces, make shorter assert in feature importance type handling test

* fixing style issue introduced with the new test

20 Jun, 2018 1 commit

[python] tests for plot tree functions and module_INSTALLED variables (#1438) · 5fe2bdd7

Nikita Titov authored Jun 20, 2018

* removed excess import

* added tests for plotting trees in Python

* refined module_INSTALLED mechanism

* added note that create_tree_digraph is better than plot_tree

09 Jun, 2018 1 commit

[python] make tree rendering more clear (#1424) · 69a36605

Nikita Titov authored Jun 09, 2018

* fixed grammar

* fixed params description in graph plotting functions

* clarified types of attributes in their descriptions

* increased readability of graphs by adding spaces

* added precision parameter to plot tree functions

10 May, 2018 1 commit

[python][docs] reworked predict method in sklearn wrapper and docs improvements (#1351) · 41152eab

Nikita Titov authored May 10, 2018

* fixed docs

* reworked predict method of sklearn wrapper

* fixed encapsulation

* added test

* fixed consistency between docstring and params docs

* fixed verbose

* replaced predict_proba with predict in test

* fixed verbose again

* fixed fraction params descriptions

* added description of skip_drop and drop_rate constraints

* fixed subsample_freq consistency with C++ default value

* fixed nice look of params list

* made force splits json file example clickable

* fixed nice look of metrics list and added comma

* reduced warning in test about same param specified twice

* replaced pred_parameter with **kwargs in predict method

* added test for **kwargs in predict method

* fixed warnings

* fixed pylint

19 Sep, 2017 1 commit

[python] bring pandas support to the sklearn wrapper back (#904) · 0350a9a6

Nikita Titov authored Sep 19, 2017

* added test for sklearn handle categorical features

* use raw X, y in sklearn wrapper in case of pandas.DataFrame

* fixed probs

08 Sep, 2017 1 commit

[python] [setup] improving installation (#880) · 8984111f

Nikita Titov authored Sep 08, 2017

* disabled logs from compilers; fixed #874

* fixed safe clear_fplder

* added windows folder to manifest.in

* added windows folder to build

* added library path

* added compilation with MSBuild from .sln-file

* fixed unknown PlatformToolset returns exitcode 0

* hotfix

* updated Readme

* removed return

* added installation with mingw test to appveyor

* let's test appveyor with both VS 2015 and VS 2017; but MinGW isn't installed on VS 2017 image

* fixed built-in name 'file'

* simplified appveyor

* removed excess data_files

* fixed unreadable paths

* separated exceptions for cmake and mingw

* refactored silent_call

* don't create artifacts with VS 2015 and mingw

* be more precise with python versioning in Travis

* removed unnecessary if statement

* added classifiers for PyPI and python versions badge

* changed python version in travis

* added support of scikit-learn 0.18.x

* added more python versions to Travis

* added more python versions to Appveyor

* reduced number of tests in Travis

* Travis trick is not needed anymore

* attempt to fix according to https://github.com/Microsoft/LightGBM/pull/880#discussion_r137438856

05 Sep, 2017 2 commits

[python] fixed sklearn test on python 2.7 (#888) · db8b6b00

Nikita Titov authored Sep 05, 2017

* fixed sklearn test on python 2.7

* commit to show that problem has been solved

* come back to python 3.6

* removed warnings check

[python] improved sklearn interface (#870) · 015c8fff

Nikita Titov authored Sep 05, 2017

* improved sklearn interface; added sklearns' tests

* moved best_score into the if statement

* improved docstrings; simplified LGBMCheckConsistentLength

* fixed typo

* pylint

* updated example

* fixed Ranker interface

* added missed boosting_type

* fixed more comfortable autocomplete without unused objects

* removed check for None of eval_at

* fixed according to review

* fixed typo

* added description of fit return type

* dictionary->dict for short

* markdown cleanup

23 Aug, 2017 1 commit

[python] parameters renaming for sklearn naming convention (#854) · 3f0061ca

Nikita Titov authored Aug 23, 2017

* updated scikit-learn interface

* fixed better description

* updated set_params()

* removed backward compatibility

* removed excess lines

* replaced pop with setdefault

* added deprecated warnings

* added tests
