Commits · 0e25841d672f6776ba4bb54d3e466cd817ca9f5a · tianlh / LightGBM-DCU

03 Dec, 2021 1 commit

Add C API function that returns all parameter names with their aliases (#4829) · cf38071b

Nikita Titov authored Dec 03, 2021



* add C API function that returns all param names with aliases

* add C API function that returns all param names with aliases

* add R code

* test R code

* remove debug CI

* fix R lint

* refactor

* run CI

* fix R

* fix

* revert CI checks

* revert changes in docs

* Try to make function `const`
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* add `const` in cpp file

* address review comments and sync with `master`
Co-authored-by: James Lamb <jaylamb20@gmail.com>

cf38071b
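The new C API function hands the wrapper every parameter together with its aliases, so alias lookups no longer need a hand-maintained table in Python. The sketch below shows how a wrapper might consume such a dump. It assumes the dump is a JSON object that maps each canonical parameter to a list of its aliases. `SAMPLE_DUMP` is a hypothetical, trimmed excerpt, and the helper names are illustrative rather than part of the package.

```python
import json
from typing import Dict, Set

# Hypothetical excerpt of the alias dump; the real string comes from the new C API function.
SAMPLE_DUMP = (
    '{"num_iterations": ["num_iteration", "n_iter", "num_boost_round", "nrounds", "n_estimators"],'
    ' "learning_rate": ["shrinkage_rate", "eta"]}'
)


def parse_alias_dump(dump: str) -> Dict[str, Set[str]]:
    """Map each canonical parameter name to a set holding it and all of its aliases."""
    raw = json.loads(dump)
    return {name: {name, *aliases} for name, aliases in raw.items()}


def canonical_name(aliases: Dict[str, Set[str]], key: str) -> str:
    """Return the canonical parameter for `key`, or `key` itself if it is unknown."""
    for name, names in aliases.items():
        if key in names:
            return name
    return key


ALIASES = parse_alias_dump(SAMPLE_DUMP)
print(canonical_name(ALIASES, "nrounds"))  # num_iterations
print(canonical_name(ALIASES, "eta"))      # learning_rate
```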

02 Dec, 2021 1 commit
- [python][sklearn] respect parameters for predictions in `init()` and `set_params()` methods (#4822) · f57ef6f4
  Nikita Titov authored Dec 02, 2021
```
* in predict(), respect params set via `set_params()` after fit()

* continue

* add test

* fix return name

* hotfix

* simplify
```
  f57ef6f4
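The fix makes `predict()` read its settings when it is called, so it no longer uses a copy frozen at `fit()` time. The sketch below shows that pattern on a toy estimator. `ToyRegressor` and its `scale` parameter are hypothetical and are not LightGBM API.

```python
from typing import Any, Dict, List, Sequence


class ToyRegressor:
    """Hypothetical estimator that keeps extra keyword params in _other_params."""

    def __init__(self, **kwargs: Any) -> None:
        self._other_params: Dict[str, Any] = {}
        self.set_params(**kwargs)

    def get_params(self) -> Dict[str, Any]:
        return dict(self._other_params)

    def set_params(self, **params: Any) -> "ToyRegressor":
        self._other_params.update(params)
        return self

    def fit(self, X: Sequence[Any], y: Sequence[float]) -> "ToyRegressor":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X: Sequence[Any], **kwargs: Any) -> List[float]:
        # Read params now, not at fit() time; explicit kwargs override stored params.
        params = {**self.get_params(), **kwargs}
        scale = params.get("scale", 1.0)
        return [self.mean_ * scale for _ in X]
```

With this pattern, calling `reg.set_params(scale=0.5)` after `fit()` changes the next `predict()` result without refitting.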
26 Nov, 2021 1 commit
- [python][docs] fix type hints for custom functions and remove vague `array-like` wording (#4816) · 5e9b0209
  Nikita Titov authored Nov 27, 2021
```
* Update sklearn.py

* Update engine.py

* Update sklearn.py

* Update engine.py

* Update basic.py

* Update engine.py
```
  5e9b0209
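The scikit-learn wrapper accepts a custom objective with the signature `objective(y_true, y_pred)` that returns `grad, hess`. The sketch below writes such an objective with explicit type hints for squared error. The alias `ObjectiveFunc` is a local name for illustration and is not exported by LightGBM.

```python
from typing import Callable, Tuple

import numpy as np

# Local type alias for a scikit-learn-style custom objective: (y_true, y_pred) -> (grad, hess).
ObjectiveFunc = Callable[[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]


def squared_error_objective(y_true: np.ndarray, y_pred: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Gradient and Hessian of 0.5 * (y_pred - y_true) ** 2 with respect to y_pred."""
    grad = y_pred - y_true
    hess = np.ones_like(y_pred)
    return grad, hess


objective: ObjectiveFunc = squared_error_objective
```

A function of this shape can be passed as `LGBMRegressor(objective=squared_error_objective)`.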
23 Nov, 2021 1 commit
- [python] add type hints to `_compare_params_for_warning()` and make it reusable (#4824) · a1fdeb1f
  Nikita Titov authored Nov 23, 2021
  
  a1fdeb1f
20 Nov, 2021 1 commit

[python] Remove `silent` argument (#4800) · 2caf945f

Nikita Titov authored Nov 21, 2021

* Update test_plotting.py

* Update dask.py

* Update sklearn.py

* Update test_sklearn.py

* Update basic.py

* Update engine.py

* Update test_engine.py

* Update basic.py

* Update basic.py

* Update engine.py

2caf945f

15 Nov, 2021 1 commit

[c_api] Improve ANSI compatibility by avoiding <stdbool.h> (#4697) · bfb346c1

Drew Miller authored Nov 15, 2021

* [c_api] Improve ANSI compatibility by avoiding <stdbool.h>

* fixes in response to CI linting

* inline NOLINT instead of separate test

* moving length declaration to non-ANSI C conditional

* [c_api] Align expected return type in `basic.py` with new c_api type.

bfb346c1

12 Nov, 2021 1 commit

[python] Faster categorical column names selection (#4787) · 6cbb3586

Roman Shaptala authored Nov 12, 2021

* Faster categorical column names selection (#1)

* Faster categorical column names selection

Replace the slow and redundant `select_dtypes` DataFrame query with a list comprehension over `DataFrame.dtypes`

* Update compat with CategoricalDtype

* sort imports

* import CategoricalDtype from pandas.api.types

* add categorical import try/except

6cbb3586
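The speedup comes from reading column types straight from `DataFrame.dtypes`. The old approach asked `select_dtypes` to build a new DataFrame only so that it could read back that frame's column labels. The sketch below compares the two approaches. The function names are illustrative.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def categorical_columns_slow(df: pd.DataFrame) -> list:
    # select_dtypes returns a new DataFrame; only its column labels are used.
    return list(df.select_dtypes(include=["category"]).columns)


def categorical_columns_fast(df: pd.DataFrame) -> list:
    # Inspect the dtypes Series directly; no intermediate DataFrame is built.
    return [col for col, dtype in zip(df.columns, df.dtypes) if isinstance(dtype, CategoricalDtype)]


df = pd.DataFrame({
    "a": pd.Categorical(["x", "y", "x"]),
    "b": [1.0, 2.0, 3.0],
    "c": pd.Categorical([1, 2, 1]),
})
```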

11 Nov, 2021 1 commit

Add 'nrounds' as an alias for 'num_iterations' (fixes #4743) (#4746) · 3b6ebd79

Michael Mahoney authored Nov 10, 2021

* Add 'nrounds' as an alias for 'num_iterations'

* Improve tests

* Compare against nrounds directly

* Fix whitespace lints

3b6ebd79

08 Nov, 2021 1 commit
- Suppress categorical warning (fixes #3379) · b1facf50
  Zhiyuan He authored Nov 08, 2021
  
  b1facf50
07 Oct, 2021 1 commit

[python] add type hints to _safe_call (#4641) · 7fa07ee2

strobel authored Oct 07, 2021


Co-authored-by: strobel <thaddaeus.strobel@ai4bd.com>
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>

7fa07ee2

05 Oct, 2021 1 commit
- add param aliases from scikit-learn (#4637) · e95d5ab8
  Nikita Titov authored Oct 05, 2021
  
  e95d5ab8
17 Sep, 2021 1 commit

[python-package] Support 2d collections as input for `init_score` in... · f1f5ba15

José Morales authored Sep 17, 2021


[python-package] Support 2d collections as input for `init_score` in multiclass classification task (#4150)

* initial implementation of init_score for multiclass classification

* check for 1d or 2d collection in init_score

* remove dataset import

* initial comments

* update dask test and docstrings

* update docstrings

* move logic to set_field. reshape back on get_field

* add type hints and update docstrings for dask. fix Dataset.set_field

* revert wrong docstrings and type hints

* add extra comma for consistency

* prefix private functions with underscore

add type hints to new functions

make commas consistent in dask and basic

* add missing spaces after type hint

* remove shape condition for dataframe in is_2d_collection
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>

f1f5ba15
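The commit flattens a 2-D `init_score` in `set_field` and reshapes it back in `get_field`. The sketch below shows that round trip in NumPy. It assumes the class-major layout that LightGBM uses for multiclass scores, where the score for class `k` of row `i` sits at index `k * num_data + i`. The helper names are hypothetical.

```python
import numpy as np


def flatten_init_score(init_score) -> np.ndarray:
    """Turn a (num_data, num_class) score matrix into a 1-D class-major array."""
    arr = np.asarray(init_score, dtype=np.float64)
    if arr.ndim == 2:
        # Column-major ravel emits all rows of class 0, then class 1, and so on.
        return arr.ravel(order="F")
    return arr


def unflatten_init_score(flat: np.ndarray, num_data: int) -> np.ndarray:
    """Inverse of flatten_init_score; single-score (1-D) data is returned unchanged."""
    num_class = flat.size // num_data
    if num_class > 1:
        return flat.reshape((num_data, num_class), order="F")
    return flat
```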

04 Sep, 2021 1 commit
- [python] deprecate `silent` and standalone `verbose` args. Prefer global `verbose` param (#4577) · 64f15005
  Nikita Titov authored Sep 04, 2021
```
* deprecate `silent` and standalone `verbose` args. Prefer global `verbose` param

* simplify code

* Rephrase warning messages
```
  64f15005
30 Aug, 2021 1 commit
- [docs][python] Refer to functions as callable in docstrings (#4575) · 32445aba
  Nikita Titov authored Aug 30, 2021
  
  32445aba
27 Aug, 2021 2 commits
- [python] Use double type for `init_score` array when set by predictor (#4510) · 99cc4f2f
  Nikita Titov authored Aug 27, 2021
  
  99cc4f2f
- [python][docs] Refer to string type as `str` and add commas in `list of ...` types (#4557) · c6199311
  Nikita Titov authored Aug 27, 2021
```
* Refer to string type as `str` and add commas in `list of ...` types

* update `libpath.py` too
```
  c6199311
25 Aug, 2021 1 commit

[docs] Clarify the fact that predict() on a file does not support saved... · 417ba192

James Lamb authored Aug 25, 2021


[docs] Clarify the fact that predict() on a file does not support saved Datasets (fixes #4034) (#4545)

* documentation changes

* add list of supported formats to error message

* add unit tests

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* update per review comments

* make references consistent
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

417ba192

23 Aug, 2021 1 commit

[python] add parameter object_hook to method dump_model (#4533) · 11d7608f

Xavier Dupré authored Aug 24, 2021



* add parameter object_hook to function dump_model (python API)

* eol

* fix syntax

* lint

* better documentation

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

11d7608f
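According to the commit, `object_hook` is applied while the JSON string returned by the C API is parsed. With the standard `json` module, the hook is called on every decoded object, innermost objects first. Values can therefore be collected during parsing, without a second walk over the tree. The sketch below uses a hypothetical, heavily trimmed dump. Real `dump_model()` output contains many more fields.

```python
import json
from typing import Any, Dict, List

# Hypothetical, trimmed model dump used only for illustration.
MODEL_JSON = json.dumps({
    "tree_info": [
        {"tree_index": 0, "tree_structure": {
            "split_feature": 2,
            "left_child": {"leaf_value": 0.1},
            "right_child": {"split_feature": 0,
                            "left_child": {"leaf_value": -0.2},
                            "right_child": {"leaf_value": 0.3}}}},
    ]
})


def collect_split_features(dump: str) -> List[int]:
    seen: List[int] = []

    def hook(node: Dict[str, Any]) -> Dict[str, Any]:
        # Called once per JSON object as it is decoded; must return the object to keep.
        if "split_feature" in node:
            seen.append(node["split_feature"])
        return node

    json.loads(dump, object_hook=hook)
    return seen
```

The same kind of `hook` could be passed as `booster.dump_model(object_hook=hook)`.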

19 Aug, 2021 1 commit
- [python] add type hints to logging functions in basic.py (#4527) · c65a2e33
  James Lamb authored Aug 19, 2021
```
* [python] add type hints to logging functions in basic.py

* add hints on wrapper
```
  c65a2e33
03 Aug, 2021 1 commit

Update c_api LGBM_SampleIndices() comment. (#4490) · 1dbf4382

Chen Yufei authored Aug 04, 2021



* Update c_api LGBM_SampleIndices() comment.

rand.Sample() now returns exactly the given number of samples, so the
comment should be fixed.

* Update include/LightGBM/c_api.h
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

1dbf4382

31 Jul, 2021 1 commit
- [python][tests] refactor tests with Sequence input (#4495) · 661bde10
  Nikita Titov authored Jul 31, 2021
  
  661bde10
30 Jul, 2021 1 commit

[python] support Dataset.get_data for Sequence input. (#4472) · 1d21d1ad

Chen Yufei authored Jul 31, 2021



* [python] support Dataset.get_data for Sequence input.

* Tweaks according to review comments.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Add test cases.

* fix import order in test_basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

1d21d1ad

07 Jul, 2021 1 commit
- [python] allow to pass some params as pathlib.Path objects (#4440) · 90342e92
  Nikita Titov authored Jul 07, 2021
```
* allow to pass some params as pathlib.Path objects

* fix lint

* improve indentation
```
  90342e92
05 Jul, 2021 2 commits
- [python] minor refactoring of Python code (#4442) · 7eac5a63
  Nikita Titov authored Jul 05, 2021
```
* Update test_sklearn.py

* Update test_basic.py

* Update dask.py

* Update basic.py

* Update basic.py

* Update basic.py

* Update basic.py

* Update callback.py
```
  7eac5a63
- [docs][python] add versionadded to Sequence class in Python wrapper (#4441) · 1525cc42
  Nikita Titov authored Jul 05, 2021
  
  1525cc42
02 Jul, 2021 1 commit

[python-package] Create Dataset from multiple data files (#4089) · c359896e

Chen Yufei authored Jul 02, 2021

* [python-package] create Dataset from sampled data.

* [python-package] create Dataset from List[Sequence].

1. Use random access for data sampling
2. Support reading data from multiple input files
3. Read data in batches so that all data need not be held in memory

* [python-package] example: create Dataset from multiple HDF5 files.

* fix: revert is_class implementation for seq

* fix: unwanted memory view reference for seq

* fix: seq is_class accepts sklearn matrices

* fix: requirements for example

* fix: pycode

* feat: print static code linting stage

* fix: linting: avoid shell str regex conversion

* code style: doc style

* code style: isort

* fix ci dependency: h5py on windows

* [py] remove rm files in test seq
https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623

* docs(python): init_from_sample summary

https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389



* remove dataset dump sample data debugging code.

* remove typo fix.

Create separate PR for this.

* fix typo in src/c_api.cpp
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* style(linting): py3 type hint for seq

* test(basic): os.path style path handling

* Revert "feat: print static code linting stage"

This reverts commit 10bd79f7f8258bea8e61c3abb8c9c7e4456a916d.

* feat(python): sequence on validation set

* minor(python): comment

* minor(python): test option hint

* style(python): fix code linting

* style(python): add pydoc for ref_dataset

* doc(python): sequence
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* revert(python): sequence class abc

* chore(python): remove rm_files

* Remove useless static_assert.

* refactor: test_basic test for sequence.

* fix lint complaint.

* remove dataset._dump_text in sequence test.

* Fix reverting typo fix.

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Fix type hint, code and doc style.

* fix failing test_basic.

* Remove TODO about keep constant in sync with cpp.

* Install h5py only when running python-examples.

* Fix lint complaint.

* Apply suggestions from code review
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Doc fixes, remove unused params_str in __init_from_seqs.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove unnecessary conda install in windows ci script.

* Keep param as example in dataset_from_multi_hdf5.py

* Add _get_sample_count function to remove code duplication.

* Use batch_size parameter in generate_hdf.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix after applying suggestions.

* Fix test, check idx is instance of numbers.Integral.

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Expose Sequence class in Python-API doc.

* Handle Sequence object not having batch_size.

* Fix isort lint complaint.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Update docstring to mention Sequence as data input.

* Remove get_one_line in test_basic.py

* Make Sequence an abstract class.

* Reduce number of tests for test_sequence.

* Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices.

* empty commit to trigger ci

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.

Also rename total_nrow to num_total_row in c_api.h for consistency.

* Doc about Sequence in docs/Python-Intro.rst.

* Fix: basic.py change LGBM_SampleIndices out_len to int32.

* Add create_valid test case with Dataset from Sequence.

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>

* Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.

* Update python-package/lightgbm/basic.py
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: Willian Zhang <willian@willian.email>
Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

c359896e
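This commit adds a data source that supports random-access indexing by int or slice and `__len__`, plus an optional `batch_size`. Samples can be drawn by random access, and the full data can be read batch by batch across several sources. The sketch below illustrates that idea. `ArraySequence` is a duck-typed, in-memory stand-in; real code would subclass `lightgbm.Sequence` and read lazily, for example from HDF5. Check the exact interface against the package docs. `DEFAULT_BATCH_SIZE` and the helper names are hypothetical.

```python
import bisect
from typing import Iterator, List, Union

import numpy as np

DEFAULT_BATCH_SIZE = 4096  # hypothetical fallback when a sequence defines no batch_size


class ArraySequence:
    """Duck-typed stand-in for a lightgbm.Sequence subclass, backed by an in-memory array."""

    batch_size = 2  # rows per read; small value for the demo

    def __init__(self, data: np.ndarray) -> None:
        self._data = data

    def __getitem__(self, idx: Union[int, slice, List[int]]) -> np.ndarray:
        return self._data[idx]

    def __len__(self) -> int:
        return len(self._data)


def iter_batches(seqs: List[ArraySequence]) -> Iterator[np.ndarray]:
    """Yield row blocks from several sequences without holding all rows in memory."""
    for seq in seqs:
        step = getattr(seq, "batch_size", None) or DEFAULT_BATCH_SIZE
        for start in range(0, len(seq), step):
            yield seq[start:start + step]


def sample_rows(seqs: List[ArraySequence], global_indices: List[int]) -> np.ndarray:
    """Fetch rows by global index across all sequences using random access."""
    starts = [0]
    for seq in seqs:
        starts.append(starts[-1] + len(seq))
    rows = []
    for g in global_indices:
        k = bisect.bisect_right(starts, g) - 1  # which sequence holds global row g
        rows.append(seq_row(seqs[k], g - starts[k]))
    return np.vstack(rows)


def seq_row(seq: ArraySequence, i: int) -> np.ndarray:
    return seq[i]
```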

27 Jun, 2021 1 commit
- [python] replace numpy.zeros with numpy.empty for the speedup (#4410) · 45ac271b
  Nikita Titov authored Jun 27, 2021
  
  45ac271b
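`np.empty` returns uninitialized memory. Using it in place of `np.zeros` saves time, and stays correct, only when every element is overwritten before it is read, as with output buffers that the C library fills. The sketch below uses `_c_fill_stub`, a hypothetical stand-in for such a C call.

```python
import numpy as np


def _c_fill_stub(out: np.ndarray) -> None:
    """Hypothetical stand-in for a C API call that writes every element of `out`."""
    out[:] = np.arange(out.size, dtype=out.dtype)


def fetch_values(n: int) -> np.ndarray:
    # No zero-fill needed: _c_fill_stub overwrites all n slots before anything reads them.
    out = np.empty(n, dtype=np.float64)
    _c_fill_stub(out)
    return out
```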
26 Jun, 2021 1 commit
- fix param aliases (#4387) · aab8fc18
  Nikita Titov authored Jun 26, 2021
  
  aab8fc18
18 Jun, 2021 1 commit
- fix: typo in `_InnerPredictor` docstring. (#4389) · 1b567bf1
  Chen Yufei authored Jun 18, 2021
  
  1b567bf1
21 May, 2021 1 commit
- [python] handle arbitrary length feature names in Python-package (#4293) · 237ac299
  Nikita Titov authored May 21, 2021
```
* handle arbitrary length feature names in Python-package

* added tests
```
  237ac299
20 May, 2021 1 commit
- improve error message for required packages (#4304) · f076ca58
  Nikita Titov authored May 20, 2021
  
  f076ca58
17 May, 2021 1 commit
- [python] Handle integer types more accurately in Python-to-C interface (#4292) · 08c38efc
  Nikita Titov authored May 17, 2021
  
  08c38efc
15 May, 2021 1 commit

[python] added f-string to python-package/lightgbm/basic.py (#4143) · bffa6ca5

NovusEdge authored May 15, 2021



* added f-string

* fix missing parentheses and other string formatting

* remove extra trailing parenthesis

* one more missing parenthesis

* fix pandas categoricals

* update uses of +

* Apply suggestions from code review
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

bffa6ca5

10 May, 2021 1 commit
- [docs] clarify docs for LGBM_BoosterGetEvalNames and LGBM_BoosterGetEvalCounts... · 08d1ce4b
  James Lamb authored May 10, 2021
```
[docs] clarify docs for LGBM_BoosterGetEvalNames and LGBM_BoosterGetEvalCounts (fixes #4264) (#4270)
```
  08d1ce4b
04 May, 2021 1 commit

Correct spelling (#4250) · e79716e0

Andrew Ziem authored May 04, 2021



* Correct spelling

Most changes were in comments, and there were a few changes to literals for log output.

There were no changes to variable names, function names, IDs, or functionality.

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Correct spelling

Most are code comments, but one case is a literal in a logging message.

There are a few grammar fixes too.
Co-authored-by: James Lamb <jaylamb20@gmail.com>

e79716e0

02 May, 2021 1 commit
- [docs][python] update some docs related to custom objective (#4245) · 1a367c65
  Nikita Titov authored May 02, 2021
  
  1a367c65
15 Mar, 2021 1 commit
- [python-package] add type hints on Booster.set_network() (#4068) · dc1bc23a
  James Lamb authored Mar 15, 2021
```
* [python-package] add type hints on Booster.set_network()

* change behavior
```
  dc1bc23a
24 Feb, 2021 1 commit

[dask][python-package] include support for column array as label (#3943) · 5dacd603

jmoralez authored Feb 24, 2021

* include support for column array as label

* remove nested ifs

* fix linting errors

* include tests for sklearn regressors

* include docstring for numpy_1d_array_to_dtype

* include . at end of docstring

* remove pandas import and test for regression, classification and ranking

* check predictions of sklearn models as well

* test training only in dask. drop pandas series tests

* use PANDAS_INSTALLED and pd_Series

* inline imports

* use col array in fit for test_dask

* include review comments

5dacd603
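This commit lets a NumPy column array of shape `(n, 1)` be passed as a label. The sketch below shows the normalization step that turns such an array into a 1-D array. The helper names are hypothetical, and the `float32` default dtype is an assumption.

```python
import numpy as np


def _is_numpy_column_array(data) -> bool:
    return isinstance(data, np.ndarray) and data.ndim == 2 and data.shape[1] == 1


def label_to_1d(data, dtype=np.float32) -> np.ndarray:
    if isinstance(data, np.ndarray) and data.ndim == 1:
        return data.astype(dtype, copy=False)
    if _is_numpy_column_array(data):
        # (n, 1) column vector -> (n,)
        return data.ravel().astype(dtype, copy=False)
    raise TypeError("label must be a 1-D array or an (n, 1) column array")
```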

19 Feb, 2021 1 commit
- [docs] Change some 'parallel learning' references to 'distributed learning' (#4000) · 7880b79f
  James Lamb authored Feb 19, 2021
```
* [docs] Change some 'parallel learning' references to 'distributed learning'

* found a few more

* one more reference
```
  7880b79f
17 Feb, 2021 1 commit

Optimize array-from-ctypes in basic.py (#3927) · de8c6105

Alex Ford authored Feb 16, 2021

Approximately 80% of runtime when loading "low column count, high row
count" DataFrames into Datasets is consumed in `np.fromiter`, called
as part of the `Dataset.get_field` method.

This is a particularly pernicious hotspot because, unlike other ctypes-based
methods, it is a hot loop over a Python iterator and causes
significant GIL contention in multi-threaded applications.

Replace `np.fromiter` with a direct call to `np.ctypeslib.as_array`,
which allows a single-shot `copy` of the underlying array.

This reduces the load time of a ~35 million row categorical dataframe
with 1 column from ~5 seconds to ~1 second, and allows multi-threaded
execution.

de8c6105
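The sketch below contrasts the two conversions on a small C `double` buffer that stands in for memory owned by the C library. `np.fromiter` pulls elements one at a time through Python's iterator protocol. `np.ctypeslib.as_array` wraps the buffer as a NumPy view, and `.copy()` then copies it in a single operation. The buffer contents and function names are illustrative.

```python
import ctypes

import numpy as np


def to_numpy_fromiter(cptr, length: int) -> np.ndarray:
    # Old approach: element-by-element iteration in Python (slow, GIL-bound).
    return np.fromiter(cptr, dtype=np.float64, count=length)


def to_numpy_as_array(cptr, length: int) -> np.ndarray:
    # New approach: view the C buffer as an array, then copy it in one shot.
    return np.ctypeslib.as_array(cptr, shape=(length,)).copy()


# Small C double buffer standing in for memory owned by the C library.
buf = (ctypes.c_double * 5)(1.0, 2.0, 3.0, 4.0, 5.0)
cptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_double))

a = to_numpy_fromiter(cptr, 5)
b = to_numpy_as_array(cptr, 5)
```

The `.copy()` matters: without it, the result would alias memory that the C side may free or overwrite.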