- 03 Dec, 2021 1 commit
-
-
Nikita Titov authored
* add C API function that returns all param names with aliases
* add C API function that returns all param names with aliases
* add R code
* test R code
* remove debug CI
* fix R lint
* refactor
* run CI
* fix R
* fix
* revert CI checks
* revert changes in docs
* Try to make function `const`
  Co-authored-by: James Lamb <jaylamb20@gmail.com>
* add `const` in cpp file
* address review comments and sync with `master`

Co-authored-by: James Lamb <jaylamb20@gmail.com>
-
- 02 Dec, 2021 1 commit
-
-
Nikita Titov authored
* in predict(), respect params set via `set_params()` after fit()
* continue
* add test
* fix return name
* hotfix
* simplify
-
- 26 Nov, 2021 1 commit
-
-
Nikita Titov authored
* Update sklearn.py
* Update engine.py
* Update sklearn.py
* Update engine.py
* Update basic.py
* Update engine.py
-
- 23 Nov, 2021 1 commit
-
-
Nikita Titov authored
-
- 20 Nov, 2021 1 commit
-
-
Nikita Titov authored
* Update test_plotting.py
* Update dask.py
* Update sklearn.py
* Update test_sklearn.py
* Update basic.py
* Update engine.py
* Update test_engine.py
* Update basic.py
* Update basic.py
* Update engine.py
-
- 15 Nov, 2021 1 commit
-
-
Drew Miller authored
* [c_api] Improve ANSI compatibility by avoiding <stdbool.h>
* fixes in response to CI linting
* inline NOLINT instead of separate test
* moving length declaration to non-ANSI C conditional
* [c_api] Align expected return type in `basic.py` with new c_api type.
-
- 12 Nov, 2021 1 commit
-
-
Roman Shaptala authored
* Faster categorical column names selection (#1)
* Faster categorical column names selection
  Change slow and redundant dataframe query by select_dtypes into a dataframe.dtypes list comprehension
* Update compat with CategoricalDtype
* sort imports
* import CategoricalDtype from pandas.api.types
* add categorical import try/except
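
The change described in this commit can be illustrated with a minimal sketch. This is not the code from the commit itself: the DataFrame and the variable names below are made up for illustration, and only `select_dtypes`, `DataFrame.dtypes`, and `pandas.api.types.CategoricalDtype` come from the commit message.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical example frame with two categorical columns.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": pd.Categorical(["x", "y", "x"]),
    "c": [0.1, 0.2, 0.3],
    "d": pd.Categorical(["u", "u", "v"]),
})

# Query-style selection: builds an intermediate DataFrame just to read its column names.
cat_cols_select = list(df.select_dtypes(include=["category"]).columns)

# List comprehension over the dtypes: inspects only the dtype metadata.
cat_cols_dtypes = [
    col for col, dtype in zip(df.columns, df.dtypes)
    if isinstance(dtype, CategoricalDtype)
]
```

Both approaches should return the same column names; the second avoids materialising a sub-frame, which is presumably where the speed-up comes from.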
-
- 11 Nov, 2021 1 commit
-
-
Michael Mahoney authored
* Add 'nrounds' as an alias for 'num_iterations'
* Improve tests
* Compare against nrounds directly
* Fix whitespace lints
-
- 08 Nov, 2021 1 commit
-
-
Zhiyuan He authored
-
- 07 Oct, 2021 1 commit
-
-
strobel authored
Co-authored-by: strobel <thaddaeus.strobel@ai4bd.com>
Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>
-
- 05 Oct, 2021 1 commit
-
-
Nikita Titov authored
-
- 17 Sep, 2021 1 commit
-
-
José Morales authored
[python-package] Support 2d collections as input for `init_score` in multiclass classification task (#4150)
* initial implementation of init_score for multiclass classification
* check for 1d or 2d collection in init_score
* remove dataset import
* initial comments
* update dask test and docstrings
* update docstrings
* move logic to set_field. reshape back on get_field
* add type hints and update docstrings for dask. fix Dataset.set_field
* revert wrong docstrings and type hints
* add extra comma for consistency
* prefix private functions with underscore; add type hints to new functions; make commas consistent in dask and basic
* add missing spaces after type hint
* remove shape condition for dataframe in is_2d_collection

Co-authored-by: Nikita Titov <nekit94-12@hotmail.com>
-
- 04 Sep, 2021 1 commit
-
-
Nikita Titov authored
* deprecate `silent` and standalone `verbose` args. Prefer global `verbose` param
* simplify code
* Rephrase warning messages
-
- 30 Aug, 2021 1 commit
-
-
Nikita Titov authored
-
- 27 Aug, 2021 2 commits
-
-
Nikita Titov authored
-
Nikita Titov authored
* Refer to string type as `str` and add commas in `list of ...` types
* update `libpath.py` too
-
- 25 Aug, 2021 1 commit
-
-
James Lamb authored
[docs] Clarify the fact that predict() on a file does not support saved Datasets (fixes #4034) (#4545)
* documentation changes
* add list of supported formats to error message
* add unit tests
* Apply suggestions from code review
  Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* update per review comments
* make references consistent

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
-
- 23 Aug, 2021 1 commit
-
-
Xavier Dupré authored
* add parameter object_hook to function dump_model (python API)
* eol
* fix syntax
* lint
* better documentation
* Update python-package/lightgbm/basic.py
  Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: xavier dupré <xavier.dupre@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
-
- 19 Aug, 2021 1 commit
-
-
James Lamb authored
* [python] add type hints to logging functions in basic.py
* add hints on wrapper
-
- 03 Aug, 2021 1 commit
-
-
Chen Yufei authored
* Update c_api LGBM_SampleIndices() comment.
  rand.Sample() now returns exactly given number of samples, thus the comment should be fixed.
* Update include/LightGBM/c_api.h
  Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
-
- 31 Jul, 2021 1 commit
-
-
Nikita Titov authored
-
- 30 Jul, 2021 1 commit
-
-
Chen Yufei authored
* [python] support Dataset.get_data for Sequence input.
* Tweaks according to review comments.
* Apply suggestions from code review
  Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Add test cases.
* fix import order in test_basic.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
-
- 07 Jul, 2021 1 commit
-
-
Nikita Titov authored
* allow to pass some params as pathlib.Path objects
* fix lint
* improve indentation
-
- 05 Jul, 2021 2 commits
-
-
Nikita Titov authored
* Update test_sklearn.py
* Update test_basic.py
* Update dask.py
* Update basic.py
* Update basic.py
* Update basic.py
* Update basic.py
* Update callback.py
-
Nikita Titov authored
-
- 02 Jul, 2021 1 commit
-
-
Chen Yufei authored
* [python-package] create Dataset from sampled data.
* [python-package] create Dataset from List[Sequence].
  1. Use random access for data sampling
  2. Support read data from multiple input files
  3. Read data in batch so no need to hold all data in memory
* [python-package] example: create Dataset from multiple HDF5 file.
* fix: revert is_class implementation for seq
* fix: unwanted memory view reference for seq
* fix: seq is_class accepts sklearn matrices
* fix: requirements for example
* fix: pycode
* feat: print static code linting stage
* fix: linting: avoid shell str regex conversion
* code style: doc style
* code style: isort
* fix ci dependency: h5py on windows
* [py] remove rm files in test seq
  https://github.com/microsoft/LightGBM/pull/4089#discussion_r612929623
* docs(python): init_from_sample summary
  https://github.com/microsoft/LightGBM/pull/4089#discussion_r612903389
* remove dataset dump sample data debugging code.
* remove typo fix. Create separate PR for this.
* fix typo in src/c_api.cpp
  Co-authored-by: James Lamb <jaylamb20@gmail.com>
* style(linting): py3 type hint for seq
* test(basic): os.path style path handling
* Revert "feat: print static code linting stage"
  This reverts commit 10bd79f7f8258bea8e61c3abb8c9c7e4456a916d.
* feat(python): sequence on validation set
* minor(python): comment
* minor(python): test option hint
* style(python): fix code linting
* style(python): add pydoc for ref_dataset
* doc(python): sequence
  Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
* revert(python): sequence class abc
* chore(python): remove rm_files
* Remove useless static_assert.
* refactor: test_basic test for sequence.
* fix lint complaint.
* remove dataset._dump_text in sequence test.
* Fix reverting typo fix.
* Apply suggestions from code review
  Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Fix type hint, code and doc style.
* fix failing test_basic.
* Remove TODO about keep constant in sync with cpp.
* Install h5py only when running python-examples.
* Fix lint complaint.
* Apply suggestions from code review
  Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Doc fixes, remove unused params_str in __init_from_seqs.
* Apply suggestions from code review
  Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Remove unnecessary conda install in windows ci script.
* Keep param as example in dataset_from_multi_hdf5.py
* Add _get_sample_count function to remove code duplication.
* Use batch_size parameter in generate_hdf.
* Apply suggestions from code review
  Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Fix after applying suggestions.
* Fix test, check idx is instance of numbers.Integral.
* Update python-package/lightgbm/basic.py
  Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Expose Sequence class in Python-API doc.
* Handle Sequence object not having batch_size.
* Fix isort lint complaint.
* Apply suggestions from code review
  Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Update docstring to mention Sequence as data input.
* Remove get_one_line in test_basic.py
* Make Sequence an abstract class.
* Reduce number of tests for test_sequence.
* Add c_api: LGBM_SampleCount, fix potential bug in LGBMSampleIndices.
* empty commit to trigger ci
* Apply suggestions from code review
  Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Rename to LGBM_GetSampleCount, change LGBM_SampleIndices out_len to int32_t.
  Also rename total_nrow to num_total_row in c_api.h for consistency.
* Doc about Sequence in docs/Python-Intro.rst.
* Fix: basic.py change LGBM_SampleIndices out_len to int32.
* Add create_valid test case with Dataset from Sequence.
* Apply suggestions from code review
  Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
* Apply suggestions from code review
  Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
* Remove no longer used DEFAULT_BIN_CONSTRUCT_SAMPLE_CNT.
* Update python-package/lightgbm/basic.py
  Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: Willian Zhang <willian@willian.email>
Co-authored-by: Willian Z <Willian@Willian-Zhang.com>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: shiyu1994 <shiyu_k1994@qq.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
-
- 27 Jun, 2021 1 commit
-
-
Nikita Titov authored
-
- 26 Jun, 2021 1 commit
-
-
Nikita Titov authored
-
- 18 Jun, 2021 1 commit
-
-
Chen Yufei authored
-
- 21 May, 2021 1 commit
-
-
Nikita Titov authored
* handle arbitrary length feature names in Python-package
* added tests
-
- 20 May, 2021 1 commit
-
-
Nikita Titov authored
-
- 17 May, 2021 1 commit
-
-
Nikita Titov authored
-
- 15 May, 2021 1 commit
-
-
NovusEdge authored
* added f-string
* fix missing parentheses and other string formatting
* remove extra trailing parenthesis
* one more missing parenthesis
* fix pandas categoricals
* update uses of +
* Apply suggestions from code review
  Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: James Lamb <jaylamb20@gmail.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
-
- 10 May, 2021 1 commit
-
-
James Lamb authored
[docs] clarify docs for LGBM_BoosterGetEvalNames and LGBM_BoosterGetEvalCounts (fixes #4264) (#4270)
-
- 04 May, 2021 1 commit
-
-
Andrew Ziem authored
* Correct spelling
  Most changes were in comments, and there were a few changes to literals for log output. There were no changes to variable names, function names, IDs, or functionality.
* Clarify a phrase in a comment
  Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Clarify a phrase in a comment
  Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Clarify a phrase in a comment
  Co-authored-by: James Lamb <jaylamb20@gmail.com>
* Correct spelling
  Most are code comments, but one case is a literal in a logging message. There are a few grammar fixes too.

Co-authored-by: James Lamb <jaylamb20@gmail.com>
-
- 02 May, 2021 1 commit
-
-
Nikita Titov authored
-
- 15 Mar, 2021 1 commit
-
-
James Lamb authored
* [python-package] add type hints on Booster.set_network()
* change behavior
-
- 24 Feb, 2021 1 commit
-
-
jmoralez authored
* include support for column array as label
* remove nested ifs
* fix linting errors
* include tests for sklearn regressors
* include docstring for numpy_1d_array_to_dtype
* include . at end of docstring
* remove pandas import and test for regression, classification and ranking
* check predictions of sklearn models as well
* test training only in dask. drop pandas series tests
* use PANDAS_INSTALLED and pd_Series
* inline imports
* use col array in fit for test_dask
* include review comments
-
- 19 Feb, 2021 1 commit
-
-
James Lamb authored
* [docs] Change some 'parallel learning' references to 'distributed learning'
* found a few more
* one more reference
-
- 17 Feb, 2021 1 commit
-
-
Alex Ford authored
Approximately 80% of the runtime when loading "low column count, high row count" DataFrames into Datasets is consumed in `np.fromiter`, called as part of the `Dataset.get_field` method. This is a particularly pernicious hotspot because, unlike other ctypes-based methods, it is a hot loop over a Python iterator and causes significant GIL contention in multi-threaded applications.

Replace `np.fromiter` with a direct call to `np.ctypeslib.as_array`, which allows a single-shot `copy` of the underlying array. This reduces the load time of a ~35 million row categorical dataframe with 1 column from ~5 seconds to ~1 second, and allows multi-threaded execution.
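
The difference between the two conversions can be sketched as follows. This is an illustration rather than the code from the commit: the helper names and the small test buffer are hypothetical, and only `np.fromiter`, `np.ctypeslib.as_array`, and the final `copy` are taken from the message above.

```python
import ctypes

import numpy as np


def pointer_to_array_fromiter(ptr, length):
    """Element-by-element conversion: each value passes through the Python iterator protocol."""
    return np.fromiter(ptr, dtype=np.float32, count=length)


def pointer_to_array_as_array(ptr, length):
    """Wrap the C buffer as an array view, then copy it in a single operation."""
    return np.ctypeslib.as_array(ptr, shape=(length,)).copy()


# Hypothetical stand-in for a buffer returned by a C library.
buf = (ctypes.c_float * 5)(0.0, 1.0, 2.0, 3.0, 4.0)
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_float))

slow = pointer_to_array_fromiter(ptr, 5)
fast = pointer_to_array_as_array(ptr, 5)
```

The `.copy()` matters: without it the array would remain a view on memory owned by the C side, which may be freed or overwritten later.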
-