Commits · af5b40e1f653e25a5f3470e09de4d1c68dc59663 · tianlh / LightGBM-DCU

30 Dec, 2021 1 commit

[python] raise an informative error instead of segfaulting when custom... · af5b40e1

Yaqub Alwan authored Dec 30, 2021


[python] raise an informative error instead of segfaulting when custom objective produces incorrect output (#4815)

* fix for bad grads causing segfault

* adjust checking criteria to properly reflect reality of multi-class classifiers

* fix styling

* Line break before operator

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* add a note to the C-API docs

* rearrange text slightly

* add some tests to python package

* Update include/LightGBM/c_api.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* PR comments

* match argument is a regex and our expression has brackets ..

* rework tests

* isorting imports

* updating test to reflect that the python API does not take preds/labels as a fobj function
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

af5b40e1
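The commit above describes validating what a custom objective returns before it reaches native code. A minimal sketch of that kind of check follows, assuming one gradient and one hessian value per row and per class. The helper name `check_grad_hess` and the error wording are hypothetical and are not taken from `python-package/lightgbm/basic.py`.

```python
from typing import Sequence


def check_grad_hess(
    grad: Sequence[float],
    hess: Sequence[float],
    num_data: int,
    num_class: int = 1,
) -> None:
    """Raise ValueError when a custom objective returns wrongly sized output.

    Hypothetical helper: assumes num_data * num_class values are expected,
    which covers the multi-class case mentioned in the commit message.
    """
    expected = num_data * num_class
    if len(grad) != len(hess):
        raise ValueError(
            f"Lengths of gradient ({len(grad)}) and hessian ({len(hess)}) don't match"
        )
    if len(grad) != expected:
        raise ValueError(
            f"Length of gradient ({len(grad)}) doesn't match "
            f"num_data * num_class ({num_data} * {num_class} = {expected})"
        )
```

Because a message like this contains parentheses, a test that passes it to a regex-based matcher (for example the `match` argument of `pytest.raises`) would likely need `re.escape`, which is what the "match argument is a regex" bullet refers to.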

03 Dec, 2021 1 commit

Add C API function that returns all parameter names with their aliases (#4829) · cf38071b

Nikita Titov authored Dec 03, 2021



* add C API function that returns all param names with aliases

* add C API function that returns all param names with aliases

* add R code

* test R code

* remove debug CI

* fix R lint

* refactor

* run CI

* fix R

* fix

* revert CI checks

* revert changes in docs

* Try to make function `const`
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* add `const` in cpp file

* address review comments and sync with `master`
Co-authored-by: James Lamb <jaylamb20@gmail.com>

cf38071b

16 Nov, 2021 1 commit

Add customized parser support (#4782) · b0137deb

chjinche authored Nov 16, 2021

* add customized parser support

* fix typo of parser_config_file description

* make delimiter a parameter of JoinedLines

b0137deb

07 Oct, 2021 1 commit
- [tests][python-package] refactor list_to_1d_numpy test to run without pandas installed (#4639) · 29857c8a
  José Morales authored Oct 07, 2021
```
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>
```
  29857c8a
17 Sep, 2021 1 commit

[python-package] Support 2d collections as input for `init_score` in... · f1f5ba15

José Morales authored Sep 17, 2021


[python-package] Support 2d collections as input for `init_score` in multiclass classification task (#4150)

* initial implementation of init_score for multiclass classification

* check for 1d or 2d collection in init_score

* remove dataset import

* initial comments

* update dask test and docstrings

* update docstrings

* move logic to set_field. reshape back on get_field

* add type hints and update docstrings for dask. fix Dataset.set_field

* revert wrong docstrings and type hints

* add extra comma for consistency

* prefix private functions with underscore

add type hints to new functions

make commas consistent in dask and basic

* add missing spaces after type hint

* remove shape condition for dataframe in is_2d_collection
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>

f1f5ba15

31 Jul, 2021 1 commit
- [python][tests] refactor tests with Sequence input (#4495) · 661bde10
  Nikita Titov authored Jul 31, 2021
  
  661bde10
30 Jul, 2021 1 commit

[python] support Dataset.get_data for Sequence input. (#4472) · 1d21d1ad

Chen Yufei authored Jul 31, 2021



* [python] support Dataset.get_data for Sequence input.

* Tweaks according to review comments.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Add test cases.

* fix import order in test_basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

1d21d1ad

07 Jul, 2021 1 commit
- [python] allow to pass some params as pathlib.Path objects (#4440) · 90342e92
  Nikita Titov authored Jul 07, 2021
```
* allow to pass some params as pathlib.Path objects

* fix lint

* improve indentation
```
  90342e92
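Accepting `pathlib.Path` wherever a file name string was previously required usually comes down to normalizing the argument before it is handed on. A minimal sketch, assuming the receiving code only understands plain strings; `path_to_str` is a hypothetical name, not a function from the package.

```python
from pathlib import Path
from typing import Union


def path_to_str(path: Union[str, Path]) -> str:
    """Normalize a str or pathlib.Path argument into a plain string."""
    # Path() accepts both types, so callers can pass either one.
    return str(Path(path))
```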
05 Jul, 2021 1 commit

[python] minor refactoring of Python code (#4442) · 7eac5a63

Nikita Titov authored Jul 05, 2021

* Update test_sklearn.py

* Update test_basic.py

* Update dask.py

* Update basic.py

* Update basic.py

* Update basic.py

* Update basic.py

* Update callback.py

7eac5a63

04 Jul, 2021 2 commits
- [tests] fix deprecation numpy warning (#4439) · 29052c5d
  Nikita Titov authored Jul 05, 2021
  
  29052c5d
- [python] migrate to pathlib in python tests (#4435) · cff80442
  Nikita Titov authored Jul 04, 2021
  
  cff80442
02 Jul, 2021 1 commit

[python-package] Create Dataset from multiple data files (#4089) · c359896e

Chen Yufei authored Jul 02, 2021

* [python-package] create Dataset from sampled data.

* [python-package] create Dataset from List[Sequence].

1. Use random access for data sampling
2. Support read data from multiple input files
3. Read data in batch so no need to hold all data in memory

* [python-package] example: create Dataset from multiple HDF5 file.

* fix: revert is_class implementation for seq

* fix: unwanted memory view reference for seq

* fix: seq is_class accepts sklearn matrices

* fix: requirements for example

* fix: pycode

* feat: print static code linting stage

* fix: linting: avoid shell str regex conversion

* code style: doc style

* code style: isort

* fix ci dependency: h5py on windows

* [py] remove rm files in test seq
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623

* docs(python): init_from_sample summary

https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389



* remove dataset dump sample data debugging code.

* remove typo fix.

Create separate PR for this.

* fix typo in src/c_api.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* style(linting): py3 type hint for seq

* test(basic): os.path style path handling

* Revert "feat: print static code linting stage"

This reverts commit 10bd79f7f8258bea8e61c3abb8c9c7e4456a916d.

* feat(python): sequence on validation set

* minor(python): comment

* minor(python): test option hint

* style(python): fix code linting

* style(python): add pydoc for ref_dataset

* doc(python): sequence
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* revert(python): sequence class abc

* chore(python): remove rm_files

* Remove useless static_assert.

* refactor: test_basic test for sequence.

* fix lint complaint.

* remove dataset._dump_text in sequence test.

* Fix reverting typo fix.

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Fix type hint, code and doc style.

* fix failing test_basic.

* Remove TODO about keep constant in sync with cpp.

* Install h5py only when running python-examples.

* Fix lint complaint.

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Doc fixes, remove unused params_str in __init_from_seqs.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove unnecessary conda install in windows ci script.

* Keep param as example in dataset_from_multi_hdf5.py

* Add _get_sample_count function to remove code duplication.

* Use batch_size parameter in generate_hdf.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix after applying suggestions.

* Fix test, check idx is instance of numbers.Integral.

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Expose Sequence class in Python-API doc.

* Handle Sequence object not having batch_size.

* Fix isort lint complaint.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update docstring to mention Sequence as data input.

* Remove get_one_line in test_basic.py

* Make Sequence an abstract class.

* Reduce number of tests for test_sequence.

* Add c_api: LGBM_SampleCount, fix potential bug in LGBM_SampleIndices.

* empty commit to trigger ci

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.

Also rename total_nrow to num_total_row in c_api.h for consistency.

* Doc about Sequence in docs/Python-Intro.rst.

* Fix: basic.py change LGBM_SampleIndices out_len to int32.

* Add create_valid test case with Dataset from Sequence.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Willian Zhang <willian@willian.email>
Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

c359896e
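The bullets above mention random access for sampling, batched reads, multiple input files, an abstract `Sequence` class, an integer index check, and a fallback when `batch_size` is missing. The sketch below illustrates those ideas with a stand-in class. `RowSequence`, `ListRows`, `iter_batches`, `sample_rows`, and `DEFAULT_BATCH_SIZE` are all hypothetical names, and the code does not reproduce the package's actual implementation.

```python
import abc
import numbers

DEFAULT_BATCH_SIZE = 4096  # assumed fallback when a source defines no batch_size


class RowSequence(abc.ABC):
    """Hypothetical random-access row source; subclasses supply the storage."""

    @abc.abstractmethod
    def __getitem__(self, idx):
        """Return one row for an integer index, or a list of rows for a slice."""

    @abc.abstractmethod
    def __len__(self):
        """Return the total number of rows."""


class ListRows(RowSequence):
    """In-memory stand-in for one data file."""

    def __init__(self, rows, batch_size=None):
        self._rows = rows
        if batch_size is not None:
            self.batch_size = batch_size

    def __getitem__(self, idx):
        if isinstance(idx, (numbers.Integral, slice)):
            return self._rows[idx]
        raise TypeError(f"index must be an integer or slice, got {type(idx).__name__}")

    def __len__(self):
        return len(self._rows)


def iter_batches(seqs):
    """Yield rows batch by batch so the full data never sits in memory at once."""
    for seq in seqs:
        batch_size = getattr(seq, "batch_size", None) or DEFAULT_BATCH_SIZE
        for start in range(0, len(seq), batch_size):
            yield seq[start:start + batch_size]


def sample_rows(seq, indices):
    """Fetch only the sampled rows through random access."""
    return [seq[i] for i in indices]
```

Keeping the interface down to `__getitem__` and `__len__` means any storage that can answer those two calls, such as an HDF5 dataset, could in principle back a sequence.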

21 May, 2021 2 commits
- [python] improving the syntax of the fstring in the file :... · da3465cb
  sayantan sadhu authored May 21, 2021
```
[python] improving the syntax of the fstring in the file : tests/python_package_test/test_basic.py (#4312)
```
  da3465cb
- [python] handle arbitrary length feature names in Python-package (#4293) · 237ac299
  Nikita Titov authored May 21, 2021
```
* handle arbitrary length feature names in Python-package

* added tests
```
  237ac299
24 Feb, 2021 1 commit

[dask][python-package] include support for column array as label (#3943) · 5dacd603

jmoralez authored Feb 24, 2021

* include support for column array as label

* remove nested ifs

* fix linting errors

* include tests for sklearn regressors

* include docstring for numpy_1d_array_to_dtype

* include . at end of docstring

* remove pandas import and test for regression, classification and ranking

* check predictions of sklearn models as well

* test training only in dask. drop pandas series tests

* use PANDAS_INSTALLED and pd_Series

* inline imports

* use col array in fit for test_dask

* include review comments

5dacd603

16 Feb, 2021 1 commit
- [ci][python] apply isort to tests/python_package_test/test_basic.py #3958 (#3977) · 9445b2ca
  Zhuyi Xue authored Feb 15, 2021
  
  9445b2ca
26 Jan, 2021 1 commit

[python-package] respect parameter aliases for network params (#3813) · 9f70e968

James Lamb authored Jan 26, 2021



* [dask] allow parameter aliases for tree_learner and local_listen_port (fixes #3671)

* num_thread too

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* empty commit

* add _choose_param_value

* revert param order change

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* just import deepcopy

* remove machines aliases

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

9f70e968
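Respecting parameter aliases means collapsing every spelling of a parameter into its main name before the value is used. A minimal sketch of that idea follows. The commit names a helper `_choose_param_value`, but the signature, the explicit `aliases` argument, and the alias list used here are assumptions made for illustration.

```python
from copy import deepcopy
from typing import Any, Dict, Sequence


def choose_param_value(
    main_name: str,
    aliases: Sequence[str],
    params: Dict[str, Any],
    default: Any,
) -> Dict[str, Any]:
    """Return a copy of params in which only main_name is kept.

    The main name wins over aliases; aliases are checked in the order given.
    The input dictionary is left untouched.
    """
    params = deepcopy(params)
    found = [params.pop(name) for name in (main_name, *aliases) if name in params]
    params[main_name] = found[0] if found else default
    return params
```

For example, `choose_param_value("local_listen_port", ["local_port"], {"local_port": 12400}, 12400)` would return a dictionary containing only `local_listen_port`.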

23 Jan, 2021 1 commit
- [python][tests] transfer test_save_and_load_linear to test_engine (#3821) · e754f23a
  Nikita Titov authored Jan 23, 2021
  
  e754f23a
15 Jan, 2021 2 commits
- completely remove tempfile from test_basic (#3767) · f2695dab
  Nikita Titov authored Jan 15, 2021
  
  f2695dab
- [python][tests] Migrates test_basic.py to use pytest (#3764) · 9bacf03c
  Thomas J. Fan authored Jan 15, 2021
```
* TST Migrates test_basic.py to use pytest

* STY Linting

* CI Force CI to run
```
  9bacf03c
04 Jan, 2021 1 commit
- [python][tests] small Python tests cleanup (#3715) · 69798c3e
  Nikita Titov authored Jan 04, 2021
  
  69798c3e
28 Dec, 2020 1 commit

small code and docs refactoring (#3681) · 5a460846

Nikita Titov authored Dec 29, 2020

* small code and docs refactoring

* Update CMakeLists.txt

* Update .vsts-ci.yml

* Update test.sh

* continue

* continue

* revert stable sort for all-unique values

5a460846

24 Dec, 2020 1 commit

Trees with linear models at leaves (#3299) · fcfd4132

Belinda Trotta authored Dec 24, 2020

* Add Eigen library.

* Working for simple test.

* Apply changes to config params.

* Handle nan data.

* Update docs.

* Add test.

* Only load raw data if boosting=gbdt_linear

* Remove unneeded code.

* Minor updates.

* Update to work with sk-learn interface.

* Update to work with chunked datasets.

* Throw error if we try to create a Booster with an already-constructed dataset having incompatible parameters.

* Save raw data in binary dataset file.

* Update docs and fix parameter checking.

* Fix dataset loading.

* Add test for regularization.

* Fix bugs when saving and loading tree.

* Add test for load/save linear model.

* Remove unneeded code.

* Fix case where not enough leaf data for linear model.

* Simplify code.

* Speed up code.

* Speed up code.

* Simplify code.

* Speed up code.

* Fix bugs.

* Working version.

* Store feature data column-wise (not fully working yet).

* Fix bugs.

* Speed up.

* Speed up.

* Remove unneeded code.

* Small speedup.

* Speed up.

* Minor updates.

* Remove unneeded code.

* Fix bug.

* Fix bug.

* Speed up.

* Speed up.

* Simplify code.

* Remove unneeded code.

* Fix bug, add more tests.

* Fix bug and add test.

* Only store numerical features

* Fix bug and speed up using templates.

* Speed up prediction.

* Fix bug with regularisation

* Visual studio files.

* Working version

* Only check nans if necessary

* Store coeff matrix as an array.

* Align cache lines

* Align cache lines

* Preallocate coefficient calculation matrices

* Small speedups

* Small speedup

* Reverse cache alignment changes

* Change to dynamic schedule

* Update docs.

* Refactor so that linear tree learner is not a separate class.

* Add refit capability.

* Speed up

* Small speedups.

* Speed up add prediction to score.

* Fix bug

* Fix bug and speed up.

* Speed up dataload.

* Speed up dataload

* Use vectors instead of pointers

* Fix bug

* Add OMP exception handling.

* Change return type of LGBM_BoosterGetLinear to bool

* Change return type of LGBM_BoosterGetLinear back to int, only parameter type needed to change

* Remove unused internal_parent_ property of tree

* Remove unused parameter to CreateTreeLearner

* Remove reference to LinearTreeLearner

* Minor style issues

* Remove unneeded check

* Reverse temporary testing change

* Fix Visual Studio project files

* Restore LightGBM.vcxproj.filters

* Speed up

* Speed up

* Simplify code

* Update docs

* Simplify code

* Initialise storage space for max num threads

* Move Eigen to include directory and delete unused files

* Remove old files.

* Fix so it compiles with mingw

* Fix gpu tree learner

* Change AddPredictionToScore back to const

* Fix python lint error

* Fix C++ lint errors

* Change eigen to a submodule

* Update comment

* Add the eigen folder

* Try to fix build issues with eigen

* Remove eigen files

* Add eigen as submodule

* Fix include paths

* Exclude eigen files from Python linter

* Ignore eigen folders for pydocstyle

* Fix C++ linting errors

* Fix docs

* Fix docs

* Exclude eigen directories from doxygen

* Update manifest to include eigen

* Update build_r to include eigen files

* Fix compiler warnings

* Store raw feature data as float

* Use float for calculating linear coefficients

* Remove eigen directory from GLOB

* Don't compile linear model code when building R package

* Fix doxygen issue

* Fix lint issue

* Fix lint issue

* Remove unneeded code

* Restore deleted lines

* Restore deleted lines

* Change return type of has_raw to bool

* Update docs

* Rename some variables and functions for readability

* Make tree_learner parameter const in AddScore

* Fix style issues

* Pass vectors as const reference when setting tree properties

* Make temporary storage of serial_tree_learner mutable so we can make the object's methods const

* Remove get_raw_size, use num_numeric_features instead

* Fix typo

* Make contains_nan_ and any_nan_ properties immutable again

* Remove data_has_nan_ property of tree

* Remove temporary test code

* Make linear_tree a dataset param

* Fix lint error

* Make LinearTreeLearner a separate class

* Fix lint errors

* Fix lint error

* Add linear_tree_learner.o

* Simulate omp_get_max_threads if openmp is not available

* Update PushOneData to also store raw data.

* Cast size to int

* Fix bug in ReshapeRaw

* Speed up code with multithreading

* Use OMP_NUM_THREADS

* Speed up with multithreading

* Update to use ArrayToString

* Fix tests

* Fix test

* Fix bug introduced in merge

* Minor updates

* Update docs

fcfd4132

30 Oct, 2020 1 commit
- [tests][python] remove excess iterations (#3504) · 709d3728
  nabokovas authored Oct 30, 2020
  
  709d3728
29 Oct, 2020 1 commit

[tests][python] reduce unnecessary data loading in tests (#3486) · 03c4d455

James Lamb authored Oct 29, 2020



* [ci] [python] reduce unnecessary data loading in tests

* add profiling files to gitignore

* just use cache()

* default on cache size

* patch lru_cache on Python 2.7

* linting

* reduce duplicated code

* missing warnings

* fix imports

* fix lru_cache backport

* missing kwargs

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* reduce duplicated code

* cache in test_plotting
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

03c4d455
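The "just use cache()" and `lru_cache` bullets describe memoizing data loaders so that each test module does not reload the same dataset. A minimal sketch with the standard library follows; `load_toy_data` and the `CALLS` counter are hypothetical and stand in for whatever loader the tests actually wrap.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts real loads, only to make the caching visible


@lru_cache(maxsize=None)
def load_toy_data(n_rows: int = 5):
    """Pretend to load a dataset; repeated calls reuse the cached result."""
    CALLS["count"] += 1
    # Return an immutable tuple so one test cannot mutate data shared with others.
    return tuple((float(i), float(i % 2)) for i in range(n_rows))
```

Note that `lru_cache` keys on the arguments as passed, so `load_toy_data()` and `load_toy_data(5)` are cached separately even though they produce equal data.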

26 Oct, 2020 1 commit

Fix add features (#2754) · 53977f36

Guolin Ke authored Oct 27, 2020



* fix subset bug

* typo

* add fixme tag

* bin mapper

* fix test

* fix add_features_from

* Update dataset.cpp

* fix merge bug

* added Python merge code

* added test for add_features

* Update dataset.cpp

* Update src/io/dataset.cpp

* continue implementing

* warn users about categorical features
Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

53977f36

11 Aug, 2020 1 commit
- [ci][python] fix sklearn FutureWarning about positional args (#3295) · e6bf4090
  Nikita Titov authored Aug 11, 2020
  
  e6bf4090
11 Jun, 2020 1 commit
- refactor LGBM_DatasetGetFeatureNames (#3022) · f30e0bb3
  Nikita Titov authored Jun 11, 2020
  
  f30e0bb3
20 Feb, 2020 1 commit

Add capability to get possible max and min values for a model (#2737) · 18e7de4f

Joan Fontanals authored Feb 20, 2020



* Add capability to get possible max and min values for a model

* Change implementation to have return value in tree.cpp, change naming to upper and lower bound, move implementation to gbdt.cpp

* Update include/LightGBM/c_api.h
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Change iteration to avoid potential overflow, add bindings to R and Python and a basic test

* Adjust test values

* Consider const correctness and multithreading protection

* Update test values

* Update test values

* Add test to check that model is exactly the same in all platforms

* Try to parse the model to get the expected values

* Try to parse the model to get the expected values

* Fix implementation, num_leaves can be lower than the leaf_value_ size

* Do not check for num_leaves to be smaller than actual size and get back to test with hardcoded value

* Change test order

* Add gpu_use_dp option in test

* Remove helper test method

* Update src/c_api.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update src/io/tree.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update src/io/tree.cpp
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update tests/python_package_test/test_basic.py
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Remove imports
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

18e7de4f
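One way to bound a boosted model's raw output is to sum the smallest leaf value of every tree for the lower bound and the largest for the upper bound. The sketch below assumes the raw score is the sum of exactly one leaf per tree and ignores any objective transform. `prediction_bounds` is a hypothetical name, and the code is not taken from the C++ sources.

```python
from typing import List, Tuple


def prediction_bounds(trees: List[List[float]]) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the summed raw score.

    Each inner list holds the leaf values of one tree.
    """
    lower = sum(min(leaves) for leaves in trees)
    upper = sum(max(leaves) for leaves in trees)
    return lower, upper
```

These bounds can be loose: no single input is guaranteed to reach the extreme leaf in every tree at once.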

19 Feb, 2020 1 commit

[python] [R-package] refine the parameters for Dataset (#2594) · 9f79e840

Guolin Ke authored Feb 19, 2020



* reset

* fix a bug

* fix test

* Update c_api.h

* support not filtering features by min_data

* add warning in reset config

* refine warnings for override dataset's parameter

* some cleans

* clean code

* clean code

* refine C API function doxygen comments

* refined new param description

* refined doxygen comments for R API function

* removed stuff related to int8

* break long line in warning message

* removed tests whose results cannot be validated anymore

* added test for warnings about unchangeable params

* write parameter from dataset to booster

* consider free_raw_data.

* fix params

* fix bug

* implementing R

* fix typo

* filter params in R

* fix R

* not min_data

* refined tests

* fixed linting

* refine

* pylint

* add docstring

* fix docstring

* R lint

* updated description for C API function

* use param aliases in Python

* fixed typo

* fixed typo

* added more params to test

* removed debug print

* fix dataset construct place

* fix merge bug

* Update feature_histogram.hpp

* add is_sparse back

* remove unused parameters

* fix lint

* add data random seed

* update

* [R-package] centralized Dataset parameter aliases and added tests on Dataset parameter updating (#2767)
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: James Lamb <jaylamb20@gmail.com>

9f79e840

14 Jan, 2020 1 commit
- [python] fix trees_to_dataframe and enhance test (#2690) · b161f334
  Nikita Titov authored Jan 14, 2020
```
* transfer and enhance test for trees_to_dataframe

* fixed bug in Python 2
```
  b161f334
10 Jan, 2020 1 commit

[python] Output model to a pandas DataFrame (#2592) · 301402c8

Patrick Ford authored Jan 10, 2020

* trees_to_df method and unit test added. PEP 8 fixes for integration.

* Post-review changes
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* changes from second round of reviews from striker

* third round of review. formatting and added 2 more tests

* replaced pandas dot attribute accessor with string attribute accessor

* dealt with single tree edge case and minor refactor of tests

* slight refactor for checking if tree is a single node

301402c8

27 Oct, 2019 2 commits
- [tests][python] refined python tests (#2483) · 1f1dc452
  Nikita Titov authored Oct 27, 2019
```
* speed up tests

* more updates

* fixed pylint

* updated tests

* Update test_sklearn.py

* test that indices are sorted internally
```
  1f1dc452
- [python] removed unused pylint directives (#2466) · 00d1e693
  Nikita Titov authored Oct 27, 2019
  
  00d1e693
03 Oct, 2019 1 commit

check the shape for mat, csr and csc in prediction (#2464) · dee72159

Guolin Ke authored Oct 03, 2019

* check the shape for mat, csr and csc

* guess from csr

* support file checking

* better error msg

* grammar

* clean code

* code clean

* check range for CSR

* Update test_.py

* Update test_.py

* added tests

dee72159
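Checking input shape before prediction amounts to comparing the number of columns in the data with the number of features the model was trained on. A minimal sketch for dense rows and for CSR column indices follows. Both function names and the error messages are hypothetical, and the real checks may differ.

```python
from typing import List, Sequence


def check_dense_shape(rows: Sequence[Sequence[float]], num_features: int) -> None:
    """Raise ValueError if any dense row has the wrong number of columns."""
    for i, row in enumerate(rows):
        if len(row) != num_features:
            raise ValueError(
                f"Row {i} has {len(row)} features, but the model expects {num_features}"
            )


def check_csr_range(indices: List[int], num_features: int) -> None:
    """Raise ValueError if a CSR column index falls outside [0, num_features)."""
    if indices and (min(indices) < 0 or max(indices) >= num_features):
        raise ValueError(
            f"CSR column index out of range for a model with {num_features} features"
        )
```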

26 Sep, 2019 1 commit
- [python] make dump_text() private (#2434) · a0a117aa
  Nikita Titov authored Sep 26, 2019
```
* make dump_text() private

* updated test
```
  a0a117aa
09 Sep, 2019 1 commit
- [python] keep consistent state for Dataset fields (#2390) · 9f6e4413
  Nikita Titov authored Sep 09, 2019
```
* keep consistent state for Dataset fields

* hotfix
```
  9f6e4413
20 Jun, 2019 1 commit

[tests] use numpy.testing.assert_allclose (#2207) · 86269ee3

Nikita Titov authored Jun 20, 2019

* Update test.py

* Update test_consistency.py

* Update test_basic.py

* Update test_sklearn.py

* Update test_sklearn.py

* Update test_engine.py

* more replacements

86269ee3

04 Apr, 2019 1 commit

Add Cost Effective Gradient Boosting (#2014) · 76102284

remcob-gr authored Apr 04, 2019

* Add configuration parameters for CEGB.

* Add skeleton CEGB tree learner

Like the original CEGB version, this inherits from SerialTreeLearner.
Currently, it changes nothing from the original.

* Track features used in CEGB tree learner.

* Pull CEGB tradeoff and coupled feature penalty from config.

* Implement finding best splits for CEGB

This is heavily based on the serial version, but just adds using the coupled penalties.

* Set proper defaults for cegb parameters.

* Ensure sanity checks don't switch off CEGB.

* Implement per-data-point feature penalties in CEGB.

* Implement split penalty and remove unused parameters.

* Merge changes from CEGB tree learner into serial tree learner

* Represent features_used_in_data by a bitset, to reduce the memory overhead of CEGB, and add sanity checks for the lengths of the penalty vectors.

* Fix bug where CEGB would incorrectly penalise a previously used feature

The tree learner did not update the gains of previously computed leaf splits when splitting a leaf elsewhere in the tree.
This caused it to prefer new features due to incorrectly penalising splitting on previously used features.

* Document CEGB parameters and add them to the appropriate section.

* Remove leftover reference to cegb tree learner.

* Remove outdated diff.

* Fix warnings

* Fix minor issues identified by @StrikerRUS.

* Add docs section on CEGB, including citation.

* Fix link.

* Fix CI failure.

* Add some unit tests

* Fix pylint issues.

* Fix remaining pylint issue

76102284

07 Mar, 2019 1 commit

[tests] fixed and refactored some tests (#2035) · 8aa08c4a

Nikita Titov authored Mar 07, 2019

* fixed number of tests in pytest

* fixed data shape and removed unused code

* refactored tests

* hotfix

* hotfix

8aa08c4a