Commits · d670a4d6551eb27487c1d609a8459222a1b00ba1 · tianlh / LightGBM-DCU

23 Feb, 2022 1 commit

[python-package] use 2d collections for predictions, grads and hess in multiclass custom objective (#4925) · d670a4d6

José Morales authored Feb 22, 2022

* reshape predictions, grad and hess in multiclass custom objective

* add sklearn test. move custom obj to utils. docs for numpy

* use num_model_per_iteration to get num_classes

* update docs and dask multiclass custom objective test

* move reshaping to __inner_predict. add test for feval

* add missing note. remove extra line

d670a4d6
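
As a rough illustration of what #4925 describes, the sketch below computes gradients and Hessians for a softmax cross-entropy objective as 2d collections, one row per sample and one column per class. It is not LightGBM's code: the function name `multiclass_objective`, its argument order, and the use of plain nested lists are assumptions made to keep the example dependency-free (the package itself works with NumPy arrays), and the Hessian shown is the plain softmax diagonal without any scaling factor.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of raw scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multiclass_objective(preds, labels):
    """Hypothetical custom objective for softmax cross-entropy.

    preds  : list of rows, one row per sample, one column per class
    labels : list of integer class ids, one per sample
    Returns (grad, hess), each with the same 2d shape as preds.
    """
    grad, hess = [], []
    for row, label in zip(preds, labels):
        probs = softmax(row)
        # Gradient of cross-entropy w.r.t. raw scores: p_k - 1[k == label]
        grad.append([p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)])
        # Diagonal of the Hessian: p_k * (1 - p_k)
        hess.append([p * (1.0 - p) for p in probs])
    return grad, hess


preds = [[0.0, 0.0, 0.0], [2.0, 0.0, -1.0]]
labels = [1, 0]
grad, hess = multiclass_objective(preds, labels)
```

Keeping the class dimension explicit means `grad[i][k]` refers to sample `i` and class `k` directly, with no manual index arithmetic over a flattened array.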

20 Feb, 2022 1 commit

[docs] clarify that categorical features will be converted to integers internally (#4959) · 820ae7e6

José Morales authored Feb 20, 2022

* clarify that categoricals will be converted to ints and not that they should be ints in the input data

* update remaining sections

* update config.h

* add suggestions

820ae7e6

17 Feb, 2022 1 commit
- [docs] clarify that custom eval functions are not only used on training data (#5011) · 717631af
  James Lamb authored Feb 17, 2022
  
  717631af
16 Feb, 2022 2 commits
- [docs] document rounding behavior of floating point numbers in categorical features (#5009) · 057ba078
  Nikita Titov authored Feb 17, 2022
  
  057ba078
- Change docs for feval (#5002) · d31346f6
  Akshita Dixit authored Feb 16, 2022
  
  d31346f6
22 Jan, 2022 1 commit

[python-package] support customizing Dataset creation in Booster.refit() (fixes #3038) (#4894) · e6a2f716

Miguel Trejo Marrufo authored Jan 22, 2022

* feat: refit additional kwargs for dataset and predict

* test: kwargs for refit method

* fix: __init__ got multiple values for argument

* fix: pycodestyle E302 error

* refactor: dataset_params to avoid breaking change

* refactor: expose all Dataset params in refit

* feat: dataset_params updates new_params

* fix: remove unnecessary params to test

* test: parameters input are the same

* docs: address StrikeRUS changes

* test: refit test changes in train dataset

* test: set init_score and decay_rate to zero

e6a2f716

30 Dec, 2021 1 commit

[python] raise an informative error instead of segfaulting when custom objective produces incorrect output (#4815) · af5b40e1

Yaqub Alwan authored Dec 30, 2021

* fix for bad grads causing segfault

* adjust checking criteria to properly reflect reality of multi-class classifiers

* fix styling

* Line break before operator

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* add a note to the C-API docs

* rearrange text slightly

* add some tests to python package

* Update include/LightGBM/c_api.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* PR comments

* match argument is a regex and our expression has brackets ..

* rework tests

* isorting imports

* updating test to reflect that the python API does not take preds/labels as a fobj function
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

af5b40e1
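
The kind of check #4815 describes can be sketched as follows: before handing gradients and Hessians to native code, confirm that their lengths match what the booster expects, which for multiclass models is the number of rows times the number of classes. The helper name `check_objective_output` and its parameters are hypothetical and the error wording is invented for illustration; this is not the check as implemented in `basic.py`.

```python
def check_objective_output(grad, hess, num_data, num_class=1):
    """Hypothetical validation of a custom objective's flat outputs.

    Raises ValueError instead of letting mismatched buffers reach
    native code, where a wrong length could crash the process.
    """
    if len(grad) != len(hess):
        raise ValueError(
            f"Lengths of gradient ({len(grad)}) and Hessian ({len(hess)}) don't match"
        )
    expected = num_data * num_class
    if len(grad) != expected:
        raise ValueError(
            f"Expected {expected} values (num_data={num_data} * num_class={num_class}), "
            f"got {len(grad)}"
        )


# Passes: 3 rows * 2 classes = 6 values each.
check_objective_output([0.1] * 6, [0.2] * 6, num_data=3, num_class=2)

# Fails with a readable message: only one value per row was returned.
try:
    check_objective_output([0.1] * 3, [0.2] * 3, num_data=3, num_class=2)
except ValueError as err:
    message = str(err)
```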

11 Dec, 2021 1 commit
- [python] remove `verbose` argument of `model_from_string()` method of Booster class (#4877) · 80662618
  Nikita Titov authored Dec 11, 2021
  
  80662618
03 Dec, 2021 1 commit

Add C API function that returns all parameter names with their aliases (#4829) · cf38071b

Nikita Titov authored Dec 03, 2021



* add C API function that returns all param names with aliases

* add C API function that returns all param names with aliases

* add R code

* test R code

* remove debug CI

* fix R lint

* refactor

* run CI

* fix R

* fix

* revert CI checks

* revert changes in docs

* Try to make function `const`
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* add `const` in cpp file

* address review comments and sync with `master`
Co-authored-by: James Lamb <jaylamb20@gmail.com>

cf38071b

02 Dec, 2021 1 commit
- [python][sklearn] respect parameters for predictions in `init()` and `set_params()` methods (#4822) · f57ef6f4
  Nikita Titov authored Dec 02, 2021
```
* in predict(), respect params set via `set_params()` after fit()

* continue

* add test

* fix return name

* hotfix

* simplify
```
  f57ef6f4
26 Nov, 2021 1 commit
- [python][docs] fix type hints for custom functions and remove vague `array-like` wording (#4816) · 5e9b0209
  Nikita Titov authored Nov 27, 2021
```
* Update sklearn.py

* Update engine.py

* Update sklearn.py

* Update engine.py

* Update basic.py

* Update engine.py
```
  5e9b0209
23 Nov, 2021 1 commit
- [python] add type hints to `_compare_params_for_warning()` and make it reusable (#4824) · a1fdeb1f
  Nikita Titov authored Nov 23, 2021
  
  a1fdeb1f
20 Nov, 2021 1 commit

[python] Remove `silent` argument (#4800) · 2caf945f

Nikita Titov authored Nov 21, 2021

* Update test_plotting.py

* Update dask.py

* Update sklearn.py

* Update test_sklearn.py

* Update basic.py

* Update engine.py

* Update test_engine.py

* Update basic.py

* Update basic.py

* Update engine.py

2caf945f

15 Nov, 2021 1 commit

[c_api] Improve ANSI compatibility by avoiding <stdbool.h> (#4697) · bfb346c1

Drew Miller authored Nov 15, 2021

* [c_api] Improve ANSI compatibility by avoiding <stdbool.h>

* fixes in response to CI linting

* inline NOLINT instead of separate test

* moving length declaration to non-ANSI C conditional

* [c_api] Align expected return type in `basic.py` with new c_api type.

bfb346c1

12 Nov, 2021 1 commit

[python] Faster categorical column names selection (#4787) · 6cbb3586

Roman Shaptala authored Nov 12, 2021

* Faster categorical column names selection (#1)

* Faster categorical column names selection

Change slow and redundant dataframe query by select_dtypes into a dataframe.dtypes list comprehension

* Update compat with CategoricalDtype

* sort imports

* import CategoricalDtype from pandas.api.types

* add categorical import try/except

6cbb3586

11 Nov, 2021 1 commit

Add 'nrounds' as an alias for 'num_iterations' (fixes #4743) (#4746) · 3b6ebd79

Michael Mahoney authored Nov 10, 2021

* Add 'nrounds' as an alias for 'num_iterations'

* Improve tests

* Compare against nrounds directly

* Fix whitespace lints

3b6ebd79

08 Nov, 2021 1 commit
- Suppress categorical warning (fixes #3379) · b1facf50
  Zhiyuan He authored Nov 08, 2021
  
  b1facf50
07 Oct, 2021 1 commit

[python] add type hints to _safe_call (#4641) · 7fa07ee2

strobel authored Oct 07, 2021


Co-authored-by: strobel <thaddaeus.strobel@ai4bd.com>
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>

7fa07ee2

05 Oct, 2021 1 commit
- add param aliases from scikit-learn (#4637) · e95d5ab8
  Nikita Titov authored Oct 05, 2021
  
  e95d5ab8
17 Sep, 2021 1 commit

[python-package] Support 2d collections as input for `init_score` in multiclass classification task (#4150) · f1f5ba15

José Morales authored Sep 17, 2021

* initial implementation of init_score for multiclass classification

* check for 1d or 2d collection in init_score

* remove dataset import

* initial comments

* update dask test and docstrings

* update docstrings

* move logic to set_field. reshape back on get_field

* add type hints and update docstrings for dask. fix Dataset.set_field

* revert wrong docstrings and type hints

* add extra comma for consistency

* prefix private functions with underscore

add type hints to new functions

make commas consistent in dask and basic

* add missing spaces after type hint

* remove shape condition for dataframe in is_2d_collection
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>

f1f5ba15

04 Sep, 2021 1 commit
- [python] deprecate `silent` and standalone `verbose` args. Prefer global `verbose` param (#4577) · 64f15005
  Nikita Titov authored Sep 04, 2021
```
* deprecate `silent` and standalone `verbose` args. Prefer global `verbose` param

* simplify code

* Rephrase warning messages
```
  64f15005
30 Aug, 2021 1 commit
- [docs][python] Refer to functions as callable in docstrings (#4575) · 32445aba
  Nikita Titov authored Aug 30, 2021
  
  32445aba
27 Aug, 2021 2 commits
- [python] Use double type for `init_score` array when set by predictor (#4510) · 99cc4f2f
  Nikita Titov authored Aug 27, 2021
  
  99cc4f2f
- [python][docs] Refer to string type as `str` and add commas in `list of ...` types (#4557) · c6199311
  Nikita Titov authored Aug 27, 2021
```
* Refer to string type as `str` and add commas in `list of ...` types

* update `libpath.py` too
```
  c6199311
25 Aug, 2021 1 commit

[docs] Clarify the fact that predict() on a file does not support saved Datasets (fixes #4034) (#4545) · 417ba192

James Lamb authored Aug 25, 2021

* documentation changes

* add list of supported formats to error message

* add unit tests

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update per review comments

* make references consistent
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

417ba192

23 Aug, 2021 1 commit

[python] add parameter object_hook to method dump_model (#4533) · 11d7608f

Xavier Dupré authored Aug 24, 2021



* add parameter object_hook to function dump_model (python API)

* eol

* fix syntax

* lint

* better documentation

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

11d7608f

19 Aug, 2021 1 commit
- [python] add type hints to logging functions in basic.py (#4527) · c65a2e33
  James Lamb authored Aug 19, 2021
```
* [python] add type hints to logging functions in basic.py

* add hints on wrapper
```
  c65a2e33
03 Aug, 2021 1 commit

Update c_api LGBM_SampleIndices() comment. (#4490) · 1dbf4382

Chen Yufei authored Aug 04, 2021



* Update c_api LGBM_SampleIndices() comment.

rand.Sample() now returns exactly given number of samples, thus the
comment should be fixed.

* Update include/LightGBM/c_api.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

1dbf4382

31 Jul, 2021 1 commit
- [python][tests] refactor tests with Sequence input (#4495) · 661bde10
  Nikita Titov authored Jul 31, 2021
  
  661bde10
30 Jul, 2021 1 commit

[python] support Dataset.get_data for Sequence input. (#4472) · 1d21d1ad

Chen Yufei authored Jul 31, 2021



* [python] support Dataset.get_data for Sequence input.

* Tweaks according to review comments.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Add test cases.

* fix import order in test_basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

1d21d1ad

07 Jul, 2021 1 commit
- [python] allow to pass some params as pathlib.Path objects (#4440) · 90342e92
  Nikita Titov authored Jul 07, 2021
```
* allow to pass some params as pathlib.Path objects

* fix lint

* improve indentation
```
  90342e92
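
One way support for `pathlib.Path` parameters could look is a small normalization step that turns path objects into strings before they are passed on. The helper `normalize_path_param` below is a hypothetical name used only for illustration; the commit does not say which parameters are covered or how the conversion is done.

```python
from pathlib import Path


def normalize_path_param(value):
    """Return a plain string for Path inputs; leave other values unchanged.

    Hypothetical helper: a sketch of accepting both str and pathlib.Path
    for file-name style parameters.
    """
    if isinstance(value, Path):
        return str(value)
    return value


as_string = normalize_path_param(Path("model.txt"))
unchanged = normalize_path_param("model.txt")
```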
05 Jul, 2021 2 commits
- [python] minor refactoring of Python code (#4442) · 7eac5a63
  Nikita Titov authored Jul 05, 2021
```
* Update test_sklearn.py

* Update test_basic.py

* Update dask.py

* Update basic.py

* Update basic.py

* Update basic.py

* Update basic.py

* Update callback.py
```
  7eac5a63
- [docs][python] add versionadded to Sequence class in Python wrapper (#4441) · 1525cc42
  Nikita Titov authored Jul 05, 2021
  
  1525cc42
02 Jul, 2021 1 commit

[python-package] Create Dataset from multiple data files (#4089) · c359896e

Chen Yufei authored Jul 02, 2021

* [python-package] create Dataset from sampled data.

* [python-package] create Dataset from List[Sequence].

1. Use random access for data sampling
2. Support read data from multiple input files
3. Read data in batch so no need to hold all data in memory

* [python-package] example: create Dataset from multiple HDF5 file.

* fix: revert is_class implementation for seq

* fix: unwanted memory view reference for seq

* fix: seq is_class accepts sklearn matrices

* fix: requirements for example

* fix: pycode

* feat: print static code linting stage

* fix: linting: avoid shell str regex conversion

* code style: doc style

* code style: isort

* fix ci dependency: h5py on windows

* [py] remove rm files in test seq
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623

* docs(python): init_from_sample summary

https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389



* remove dataset dump sample data debugging code.

* remove typo fix.

Create separate PR for this.

* fix typo in src/c_api.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* style(linting): py3 type hint for seq

* test(basic): os.path style path handling

* Revert "feat: print static code linting stage"

This reverts commit 10bd79f7f8258bea8e61c3abb8c9c7e4456a916d.

* feat(python): sequence on validation set

* minor(python): comment

* minor(python): test option hint

* style(python): fix code linting

* style(python): add pydoc for ref_dataset

* doc(python): sequence
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* revert(python): sequence class abc

* chore(python): remove rm_files

* Remove useless static_assert.

* refactor: test_basic test for sequence.

* fix lint complaint.

* remove dataset._dump_text in sequence test.

* Fix reverting typo fix.

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Fix type hint, code and doc style.

* fix failing test_basic.

* Remove TODO about keep constant in sync with cpp.

* Install h5py only when running python-examples.

* Fix lint complaint.

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Doc fixes, remove unused params_str in __init_from_seqs.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove unnecessary conda install in windows ci script.

* Keep param as example in dataset_from_multi_hdf5.py

* Add _get_sample_count function to remove code duplication.

* Use batch_size parameter in generate_hdf.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix after applying suggestions.

* Fix test, check idx is instance of numbers.Integral.

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Expose Sequence class in Python-API doc.

* Handle Sequence object not having batch_size.

* Fix isort lint complaint.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update docstring to mention Sequence as data input.

* Remove get_one_line in test_basic.py

* Make Sequence an abstract class.

* Reduce number of tests for test_sequence.

* Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices.

* empty commit to trigger ci

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.

Also rename total_nrow to num_total_row in c_api.h for consistency.

* Doc about Sequence in docs/Python-Intro.rst.

* Fix: basic.py change LGBM_SampleIndices out_len to int32.

* Add create_valid test case with Dataset from Sequence.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Willian Zhang <willian@willian.email>
Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

c359896e
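
The three points listed in #4089 (random access for sampling, several input files, batched reads) can be pictured with a small stand-in. `ListSequence` and `iter_batches` below are hypothetical names, and the class only imitates the general shape of a random-access sequence with a `batch_size` attribute; it does not subclass or reproduce the `Sequence` class added by the commit.

```python
class ListSequence:
    """Hypothetical stand-in for one data file with random row access."""

    batch_size = 2  # rows fetched per read; kept tiny for the demo

    def __init__(self, rows):
        self._rows = rows

    def __getitem__(self, idx):
        # An int returns one row, a slice returns a list of rows.
        return self._rows[idx]

    def __len__(self):
        return len(self._rows)


def iter_batches(seqs):
    """Yield row batches across several sequences, one batch at a time,
    so the full data set never has to be held in memory at once."""
    for seq in seqs:
        for start in range(0, len(seq), seq.batch_size):
            yield seq[start:start + seq.batch_size]


file_a = ListSequence([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
file_b = ListSequence([[7.0, 8.0]])

batches = list(iter_batches([file_a, file_b]))
sampled_row = file_a[2]  # random access, as used for sampling
```

In a real setting each sequence would wrap a file reader (for example an HDF5 dataset, as in the example script the commit mentions) rather than an in-memory list.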

27 Jun, 2021 1 commit
- [python] replace numpy.zeros with numpy.empty for the speedup (#4410) · 45ac271b
  Nikita Titov authored Jun 27, 2021
  
  45ac271b
26 Jun, 2021 1 commit
- fix param aliases (#4387) · aab8fc18
  Nikita Titov authored Jun 26, 2021
  
  aab8fc18
18 Jun, 2021 1 commit
- fix: typo in `_InnerPredictor` docstring. (#4389) · 1b567bf1
  Chen Yufei authored Jun 18, 2021
  
  1b567bf1
21 May, 2021 1 commit
- [python] handle arbitrary length feature names in Python-package (#4293) · 237ac299
  Nikita Titov authored May 21, 2021
```
* handle arbitrary length feature names in Python-package

* added tests
```
  237ac299
20 May, 2021 1 commit
- improve error message for required packages (#4304) · f076ca58
  Nikita Titov authored May 20, 2021
  
  f076ca58
17 May, 2021 1 commit
- [python] Handle integer types more accurately in Python-to-C interface (#4292) · 08c38efc
  Nikita Titov authored May 17, 2021
  
  08c38efc