- 19 Feb, 2021 1 commit
-
-
James Lamb authored
* [docs] Change some 'parallel learning' references to 'distributed learning' * found a few more * one more reference
-
- 17 Feb, 2021 1 commit
-
-
Alex Ford authored
Approximately 80% of runtime when loading "low column count, high row count" DataFrames into Datasets is consumed in `np.fromiter`, called as part of the `Dataset.get_field` method. This is a particularly pernicious hotspot because, unlike other ctypes-based methods, it is a hot loop over a Python iterator and causes significant GIL contention in multi-threaded applications. Replace `np.fromiter` with a direct call to `np.ctypeslib.as_array`, which allows a single-shot `copy` of the underlying array. This reduces the load time of a ~35 million row categorical DataFrame with 1 column from ~5 seconds to ~1 second, and allows multi-threaded execution.
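The change can be illustrated with a minimal sketch, assuming a C library has returned a pointer to `length` doubles (simulated here with a ctypes array). The helper names `field_via_fromiter` and `field_via_as_array` are hypothetical, not LightGBM functions; only `np.fromiter` and `np.ctypeslib.as_array` come from the commit message.

```python
import ctypes

import numpy as np


def field_via_fromiter(ptr, length):
    # Old approach: NumPy pulls one element at a time through the Python
    # iterator protocol, holding the GIL for the whole loop.
    return np.fromiter(ptr, dtype=np.float64, count=length)


def field_via_as_array(ptr, length):
    # New approach: wrap the C buffer as a NumPy view (no per-element Python
    # work), then take one bulk copy so the result owns its own memory.
    return np.ctypeslib.as_array(ptr, shape=(length,)).copy()


# Simulate a buffer that a C library might hand back.
values = [0.5, 1.5, 2.5, 3.5]
buf = (ctypes.c_double * len(values))(*values)
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_double))

slow = field_via_fromiter(ptr, len(values))
fast = field_via_as_array(ptr, len(values))
print(np.array_equal(slow, fast))  # True
```

The `.copy()` matters: without it the NumPy array would alias memory that the C side may later free.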
-
- 16 Feb, 2021 2 commits
-
-
Nikita Titov authored
* run isort in CI linting job * workaround conda compatibility issues
-
Zhuyi Xue authored
-
- 28 Jan, 2021 1 commit
-
-
Nikita Titov authored
-
- 26 Jan, 2021 3 commits
-
-
Nikita Titov authored
-
Nikita Titov authored
* fix Dask docstrings and mimic sklearn importing way * Update .vsts-ci.yml * revert CI checks * use import aliases for Dask classes * check Dask is installed in _predict() func * fix lint issues introduced during resolving merge conflicts * Update dask.py
-
James Lamb authored
* [dask] allow parameter aliases for tree_learner and local_listen_port (fixes #3671) * num_thread too * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * empty commit * add _choose_param_value * revert param order change * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/dask.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * just import deepcopy * remove machines aliases * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
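The message mentions adding `_choose_param_value` but does not show its body. The sketch below is a hypothetical helper (`choose_param_value`) illustrating one way to collapse a parameter's aliases onto its main name; the alias list in the example is illustrative, not an authoritative list of LightGBM aliases.

```python
from typing import Any, Dict, List


def choose_param_value(
    main_name: str, aliases: List[str], params: Dict[str, Any], default: Any
) -> Dict[str, Any]:
    """Hypothetical helper: collapse one parameter's aliases onto its main name."""
    params = dict(params)  # copy so the caller's dict is left untouched
    if main_name in params:
        value = params[main_name]  # an explicit main name always wins
    else:
        # Otherwise take the first alias that was supplied, else the default.
        value = next((params[a] for a in aliases if a in params), default)
    for alias in aliases:
        params.pop(alias, None)  # drop aliases so only one spelling remains
    params[main_name] = value
    return params


user_params = {"nthread": 4, "objective": "regression"}
print(choose_param_value("num_threads", ["num_thread", "nthread"], user_params, default=-1))
# {'objective': 'regression', 'num_threads': 4}
```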
-
- 24 Jan, 2021 2 commits
-
-
Nikita Titov authored
* Update dask.py * Update basic.py * hotfix pop
-
Nikita Titov authored
* centralize Python-package logging in one place * continue * fix test name * removed unused import * enhance test * fix lint * hotfix test * workaround for GPU test * remove custom logger from Dask-package * replace one log func with flags by multiple funcs
-
- 20 Jan, 2021 1 commit
-
-
James Lamb authored
[dask] allow parameter aliases for local_listen_port, num_threads, tree_learner (fixes #3671) (#3789) * [dask] allow parameter aliases for tree_learner and local_listen_port (fixes #3671) * num_thread too * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * empty commit Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
-
- 19 Jan, 2021 1 commit
-
-
Nikita Titov authored
* fix docs * Update basic.py * Update engine.py
-
- 18 Jan, 2021 1 commit
-
-
James Lamb authored
* [python-package] expand documentation on 'group' for ranking task * add R package * update Query Data section * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * fix typo in group example * regenerate parameters * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * regenerate R docs Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
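For ranking, the `group` input gives the number of rows in each query, in the order the rows appear, so its entries sum to the number of rows. A minimal sketch of deriving it from per-row query ids, assuming rows of the same query are already contiguous; `query_ids_to_group` is a hypothetical helper.

```python
from itertools import groupby
from typing import Hashable, List, Sequence


def query_ids_to_group(query_ids: Sequence[Hashable]) -> List[int]:
    """Turn per-row query ids into per-query row counts.

    Assumes rows belonging to one query are contiguous; groupby only merges
    adjacent equal ids, so an id that reappears later starts a new group.
    """
    return [sum(1 for _ in rows) for _, rows in groupby(query_ids)]


query_ids = ["q1", "q1", "q1", "q2", "q2", "q3"]
group = query_ids_to_group(query_ids)
print(group)  # [3, 2, 1]
```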
-
- 24 Dec, 2020 1 commit
-
-
Belinda Trotta authored
* Add Eigen library. * Working for simple test. * Apply changes to config params. * Handle nan data. * Update docs. * Add test. * Only load raw data if boosting=gbdt_linear * Remove unneeded code. * Minor updates. * Update to work with sk-learn interface. * Update to work with chunked datasets. * Throw error if we try to create a Booster with an already-constructed dataset having incompatible parameters. * Save raw data in binary dataset file. * Update docs and fix parameter checking. * Fix dataset loading. * Add test for regularization. * Fix bugs when saving and loading tree. * Add test for load/save linear model. * Remove unneeded code. * Fix case where not enough leaf data for linear model. * Simplify code. * Speed up code. * Speed up code. * Simplify code. * Speed up code. * Fix bugs. * Working version. * Store feature data column-wise (not fully working yet). * Fix bugs. * Speed up. * Speed up. * Remove unneeded code. * Small speedup. * Speed up. * Minor updates. * Remove unneeded code. * Fix bug. * Fix bug. * Speed up. * Speed up. * Simplify code. * Remove unneeded code. * Fix bug, add more tests. * Fix bug and add test. * Only store numerical features * Fix bug and speed up using templates. * Speed up prediction. * Fix bug with regularisation * Visual studio files. * Working version * Only check nans if necessary * Store coeff matrix as an array. * Align cache lines * Align cache lines * Preallocation coefficient calculation matrices * Small speedups * Small speedup * Reverse cache alignment changes * Change to dynamic schedule * Update docs. * Refactor so that linear tree learner is not a separate class. * Add refit capability. * Speed up * Small speedups. * Speed up add prediction to score. * Fix bug * Fix bug and speed up. * Speed up dataload. * Speed up dataload * Use vectors instead of pointers * Fix bug * Add OMP exception handling. 
* Change return type of LGBM_BoosterGetLinear to bool * Change return type of LGBM_BoosterGetLinear back to int, only parameter type needed to change * Remove unused internal_parent_ property of tree * Remove unused parameter to CreateTreeLearner * Remove reference to LinearTreeLearner * Minor style issues * Remove unneeded check * Reverse temporary testing change * Fix Visual Studio project files * Restore LightGBM.vcxproj.filters * Speed up * Speed up * Simplify code * Update docs * Simplify code * Initialise storage space for max num threads * Move Eigen to include directory and delete unused files * Remove old files. * Fix so it compiles with mingw * Fix gpu tree learner * Change AddPredictionToScore back to const * Fix python lint error * Fix C++ lint errors * Change eigen to a submodule * Update comment * Add the eigen folder * Try to fix build issues with eigen * Remove eigen files * Add eigen as submodule * Fix include paths * Exclude eigen files from Python linter * Ignore eigen folders for pydocstyle * Fix C++ linting errors * Fix docs * Fix docs * Exclude eigen directories from doxygen * Update manifest to include eigen * Update build_r to include eigen files * Fix compiler warnings * Store raw feature data as float * Use float for calculating linear coefficients * Remove eigen directory from GLOB * Don't compile linear model code when building R package * Fix doxygen issue * Fix lint issue * Fix lint issue * Remove unneeded code * Restore deleted lines * Restore deleted lines * Change return type of has_raw to bool * Update docs * Rename some variables and functions for readability * Make tree_learner parameter const in AddScore * Fix style issues * Pass vectors as const reference when setting tree properties * Make temporary storage of serial_tree_learner mutable so we can make the object's methods const * Remove get_raw_size, use num_numeric_features instead * Fix typo * Make contains_nan_ and any_nan_ properties immutable again * Remove data_has_nan_ property of tree * Remove temporary test code * Make linear_tree a dataset param * Fix lint error * Make LinearTreeLearner a separate class * Fix lint errors * Fix lint error * Add linear_tree_learner.o * Simulate omp_get_max_threads if openmp is not available * Update PushOneData to also store raw data. * Cast size to int * Fix bug in ReshapeRaw * Speed up code with multithreading * Use OMP_NUM_THREADS * Speed up with multithreading * Update to use ArrayToString * Fix tests * Fix test * Fix bug introduced in merge * Minor updates * Update docs
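This commit adds linear trees, where each leaf holds a linear model over numerical features instead of a single constant. As a simplified illustration only, the sketch below fits ordinary ridge regression on the rows of one leaf, with an unpenalized intercept. It is a swapped-in textbook technique, not LightGBM's own leaf solver, and `fit_leaf_ridge`, `predict_leaf` and `lam` are hypothetical names.

```python
import numpy as np


def fit_leaf_ridge(X_leaf, y_leaf, lam):
    """Fit y ~ X @ coef + intercept on one leaf's rows, with an L2 penalty on coef."""
    n, d = X_leaf.shape
    # Append a column of ones so the intercept is estimated with the coefficients.
    A = np.hstack([X_leaf, np.ones((n, 1))])
    penalty = lam * np.eye(d + 1)
    penalty[-1, -1] = 0.0  # leave the intercept unpenalized
    beta = np.linalg.solve(A.T @ A + penalty, A.T @ y_leaf)
    return beta[:-1], beta[-1]


def predict_leaf(X_leaf, coef, intercept):
    return X_leaf @ coef + intercept


# Rows that landed in one leaf: y = 2 * x + 1 exactly.
X_leaf = np.array([[0.0], [1.0], [2.0], [3.0]])
y_leaf = 2.0 * X_leaf[:, 0] + 1.0

coef, intercept = fit_leaf_ridge(X_leaf, y_leaf, lam=0.0)  # recovers slope 2, intercept 1
coef_reg, _ = fit_leaf_ridge(X_leaf, y_leaf, lam=5.0)      # penalty shrinks the slope to 1
```

With one feature and an unpenalized intercept, the slope equals Sxy / (Sxx + lam); here Sxx = 5 and Sxy = 10, so `lam=5.0` halves the slope.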
-
- 15 Dec, 2020 1 commit
-
-
penolove authored
-
- 09 Dec, 2020 1 commit
-
-
Nikita Titov authored
* Update setup.py * Update .appveyor.yml * Update .travis.yml * Update .vsts-ci.yml * Update __init__.py * Update test.sh * Update test_windows.ps1 * Update advanced_example.py * Update requirements_base.txt * Update conf.py * Update conf.py * Update test_engine.py * Update utils.py * Update dockerfile-r * Update README.md * Update dockerfile.gpu * Update test_consistency.py * Update basic.py * Update compat.py * Update engine.py * Update sklearn.py * Update sklearn.py * Update callback.py * Update setup.py * Update __init__.py * Update plotting.py * Update sklearn.py * Update engine.py * Update compat.py * Update callback.py * Update basic.py * Update compat.py * Update basic.py * Update basic.py * Update compat.py * Update compat.py * Update plotting.py * Update engine.py * Update basic.py * Update sklearn.py * Update compat.py * Update engine.py * Update engine.py * Update callback.py * Update basic.py * Update basic.py * Update basic.py * Update basic.py * Update basic.py * Update sklearn.py * Update sklearn.py * Update plotting.py * Update sklearn.py * Update compat.py * Update compat.py * Update engine.py * Update plotting.py * Update sklearn.py * Update basic.py * Update basic.py * Update basic.py * Update basic.py * Update compat.py * Update compat.py * Update compat.py * Update engine.py * Update basic.py * Update compat.py * Update basic.py * Update basic.py * Update basic.py * Update compat.py * Update compat.py * Update basic.py * Update basic.py * Update .vsts-ci.yml * Update .vsts-ci.yml * Update conf.py * Revert "Update dockerfile-r" This reverts commit 4ff6ffc7e3eeda24cc6a59a3bb0c973f02d9d71c.
-
- 07 Dec, 2020 1 commit
-
-
James Lamb authored
[python][docs] more detailed docs for trees_to_dataframe(), create_tree_digraph(), plot_tree() (#3618) * [python] more detailed docs for trees_to_dataframe(), create_tree_digraph(), plot_tree() * fixing warnings * fix warnings * undo unnecessary space * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * single line, better weight descriptions * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * column names * Update python-package/lightgbm/plotting.py Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
-
- 26 Oct, 2020 1 commit
-
-
Guolin Ke authored
* fix subset bug * typo * add fixme tag * bin mapper * fix test * fix add_features_from * Update dataset.cpp * fix merge bug * added Python merge code * added test for add_features * Update dataset.cpp * Update src/io/dataset.cpp * continue implementing * warn users about categorical features Co-authored-by: StrikerRUS <nekit94-12@hotmail.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
-
- 30 Sep, 2020 2 commits
-
-
Nikita Titov authored
-
Belinda Trotta authored
-
- 11 Sep, 2020 1 commit
-
-
James Lamb authored
-
- 06 Sep, 2020 1 commit
-
-
Germán Ramírez-Espinoza authored
* Refactors sklearn API to allow a list of evaluation metrics in the parameter eval_metric of the class LGBMModel (and its subclasses). Also adds unit tests for this functionality * Simplify expression to check whether the user passed one or multiple metrics to eval_metric parameter * Simplify new tests by using custom metrics already defined in the test file * Update docstring to reflect the fact that the parameter "feval" from the "train" and "cv" functions can also receive a list of callables * Remove Oxford comma from docstrings Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Use named parameters to make sure code is compatible with future versions of scikit-learn Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Remove throwaway return value to make code more succinct Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Move statement to group together the code related to feval * Avoid modifying original args as it causes errors in scikit-learn tools For details see: https://github.com/microsoft/LightGBM/pull/2619 * Consolidate multiple eval-metrics unit-tests into one test Co-authored-by: German I Ramirez-Espinoza <gire@home> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
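Accepting either one metric or a list can be sketched as a small normalization step that separates built-in metric names (strings) from custom metric callables. `split_eval_metrics` is a hypothetical helper, and splitting strings from callables is an assumption about the design rather than code taken from the PR. It copies its input instead of modifying it, in the spirit of the "avoid modifying original args" note above.

```python
from typing import Callable, List, Optional, Sequence, Tuple, Union

Metric = Union[str, Callable]


def split_eval_metrics(
    eval_metric: Optional[Union[Metric, Sequence[Metric]]]
) -> Tuple[List[str], List[Callable]]:
    """Hypothetical helper: normalize eval_metric and split names from callables."""
    if eval_metric is None:
        return [], []
    # Wrap a single metric in a list; copy a list so the caller's object is untouched.
    metrics = list(eval_metric) if isinstance(eval_metric, (list, tuple)) else [eval_metric]
    builtin = [m for m in metrics if isinstance(m, str)]
    custom = [m for m in metrics if callable(m)]
    return builtin, custom


def constant_metric(y_true, y_pred):
    # Returns (name, value, is_higher_better), the shape LightGBM custom metrics use.
    return "constant", 0.0, False


names, funcs = split_eval_metrics(["l1", constant_metric, "l2"])
print(names)  # ['l1', 'l2']
```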
-
- 11 Aug, 2020 1 commit
-
-
Nikita Titov authored
simplify start_iteration param for predict in Python and some code cleanup for start_iteration (#3288) * simplify start_iteration param for predict in Python and some code cleanup for start_iteration * revert docs changes about the prediction result shape
-
- 06 Aug, 2020 1 commit
-
-
shiyu1994 authored
* [python] add start_iteration to python predict interface (#3058) * Apply suggestions from code review * Update lightgbm_R.h * Apply suggestions from code review * Apply suggestions from code review * fix R interface * update R documentation Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
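The effect of `start_iteration` can be sketched on a toy additive model: the raw score sums only the trees in the window that begins at `start_iteration` and spans `num_iteration` iterations. This is a simplified sketch assuming one tree per iteration and ignoring best-iteration handling; treating a missing or non-positive `num_iteration` as "all remaining iterations" is this sketch's convention, and `predict_raw` and the tree callables are hypothetical.

```python
from typing import Callable, Optional, Sequence

Tree = Callable[[Sequence[float]], float]


def predict_raw(
    trees: Sequence[Tree],
    x: Sequence[float],
    start_iteration: int = 0,
    num_iteration: Optional[int] = None,
) -> float:
    """Sum tree outputs over iterations [start_iteration, start_iteration + num_iteration)."""
    if num_iteration is None or num_iteration <= 0:
        stop = len(trees)  # no limit: use every iteration from start_iteration on
    else:
        stop = min(start_iteration + num_iteration, len(trees))
    return sum(tree(x) for tree in trees[start_iteration:stop])


# Toy "trees" that ignore their input and return a constant leaf value.
trees = [lambda x: 1.0, lambda x: 2.0, lambda x: 4.0]
print(predict_raw(trees, [0.0]))                                      # 7.0
print(predict_raw(trees, [0.0], start_iteration=1))                   # 6.0
print(predict_raw(trees, [0.0], start_iteration=1, num_iteration=1))  # 2.0
```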
-
- 15 Jul, 2020 1 commit
-
-
Guolin Ke authored
* feature importance type in saved model file * fix nullptr * fixed formatting * fix python/R * Update src/c_api.cpp * Apply suggestions from code review Co-authored-by: James Lamb <jaylamb20@gmail.com> * fix c_api test * fix swig * minor docs improvements and added defines for importance types Co-authored-by: StrikerRUS <nekit94-12@hotmail.com> Co-authored-by: James Lamb <jaylamb20@gmail.com>
-
- 28 Jun, 2020 1 commit
-
-
Ilya Matiach authored
* adding sparse support to TreeSHAP in lightgbm * updating based on comments * updated based on comments, used fromiter instead of frombuffer * updated based on comments * fixed limits import order * fix sparse feature contribs to work with more than int32 max rows * really fixed int64 max error and build warnings * added sparse test with >int32 max rows * fixed python side reshape check on sparse data * updated based on latest comments * fixed comments * added CSC INT32_MAX validation to test, fixed comments
-
- 23 Jun, 2020 1 commit
-
-
Belinda Trotta authored
* Add interaction constraints functionality. * Minor fixes. * Minor fixes. * Change lambda to function. * Fix gpu bug, remove extra blank lines. * Fix gpu bug. * Fix style issues. * Try to fix segfault on MACOS. * Fix bug. * Fix bug. * Fix bugs. * Change parameter format for R. * Fix R style issues. * Change string formatting code. * Change docs to say R package not supported. * Remove R functionality, moving to separate PR. * Keep track of branch features in tree object. * Only track branch features when feature interactions are enabled. * Fix lint error. * Update docs and simplify tests.
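A minimal sketch of the feature-selection rule behind interaction constraints, assuming constraints are lists of feature indices and that a branch may keep splitting only on features from a constraint set containing every feature already used on that branch. `allowed_features` is a hypothetical helper; in this sketch, features that appear in no constraint set are never allowed once constraints are enabled.

```python
from typing import Iterable, List, Set


def allowed_features(
    branch_features: Iterable[int], constraints: List[List[int]], num_features: int
) -> Set[int]:
    """Return the feature indices eligible for the next split on a branch."""
    used = set(branch_features)
    if not constraints:
        return set(range(num_features))  # constraints disabled: everything allowed
    allowed: Set[int] = set()
    for group in constraints:
        group_set = set(group)
        # A constraint set applies only if it covers every feature on the branch.
        if used <= group_set:
            allowed |= group_set
    return allowed


constraints = [[0, 1, 2], [2, 3]]
print(sorted(allowed_features([], constraints, 5)))   # [0, 1, 2, 3]
print(sorted(allowed_features([0], constraints, 5)))  # [0, 1, 2]
print(sorted(allowed_features([2], constraints, 5)))  # [0, 1, 2, 3]
```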
-
- 11 Jun, 2020 1 commit
-
-
Nikita Titov authored
-
- 20 May, 2020 1 commit
-
-
Guolin Ke authored
* redir log to python console * fix pylint * Apply suggestions from code review * Update basic.py * Apply suggestions from code review Co-authored-by: Nikita Titov <nekit94-08@mail.ru> * Update c_api.h * Apply suggestions from code review * Apply suggestions from code review * super-minor: better wording Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
-
- 05 May, 2020 1 commit
-
-
Nikita Titov authored
-
- 10 Apr, 2020 1 commit
-
-
OMOTO Tsukasa authored
* Support UTF-8 characters in feature name again This commit reverts 0d59859c. Also see: - https://github.com/microsoft/LightGBM/issues/2226 - https://github.com/microsoft/LightGBM/issues/2478 - https://github.com/microsoft/LightGBM/pull/2229 I reproduced the issue and, as @kidotaka showed in a great survey in #2226, I concluded that the cause is not UTF-8 but "an empty string (character)". Therefore, I revert "throw error when meet non ascii (#2229)", whose commit hash is 0d59859c, and add support for feature names as UTF-8 again. * add tests * fix check-docs tests * update * fix tests * update .travis.yml * fix tests * update test_r_package.sh * update test_r_package.sh * update test_r_package.sh * add a test for R-package * update test_r_package.sh * update test_r_package.sh * update test_r_package.sh * fix test for R-package * update test_r_package.sh * update test_r_package.sh * update test_r_package.sh * update test_r_package.sh * update * update * update * remove unneeded comments
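Passing non-ASCII feature names to a C API typically means encoding each name as UTF-8 bytes and packing them into a `char*` array. A minimal ctypes sketch, assuming the native side expects NUL-terminated UTF-8 strings; `names_to_c_array` and `c_array_to_names` are hypothetical helpers, not the package's own code.

```python
import ctypes
from typing import List


def names_to_c_array(names: List[str]) -> ctypes.Array:
    """Encode each name as UTF-8 and pack the results into a char* array."""
    encoded = [name.encode("utf-8") for name in names]
    # ctypes keeps references to the bytes objects inside the array it builds.
    return (ctypes.c_char_p * len(encoded))(*encoded)


def c_array_to_names(arr: ctypes.Array) -> List[str]:
    """Decode a char* array back to Python strings (the reverse direction)."""
    return [arr[i].decode("utf-8") for i in range(len(arr))]


names = ["température", "特徴量", "age"]
arr = names_to_c_array(names)
print(c_array_to_names(arr) == names)  # True
```

Note that byte length and character length differ for non-ASCII names, which is why any buffer sizing on the C side has to be done in bytes.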
-
- 20 Mar, 2020 1 commit
-
-
Alberto Ferreira authored
* [swig] Fix SWIG methods that return char** with StringArray. + [new] Add StringArray class to manage and manipulate arrays of fixed-length strings: This class is now used to wrap any char** parameters, manage memory and manipulate the strings. The class is defined in swig/StringArray.hpp and wrapped in StringArray.i. + [API+fix] Wrap LGBM_BoosterGetFeatureNames (it resulted in a segfault before): Added wrapper LGBM_BoosterGetFeatureNamesSWIG(BoosterHandle) that only receives the booster handle, figures out how much memory to allocate for strings and returns a StringArray which can be easily converted to String[]. + [API+safety] For consistency, LGBM_BoosterGetEvalNamesSWIG was wrapped as well: * Refactor to detect any kind of error and remove all the parameters besides the BoosterHandle (much simpler API to use in Java). * No assumptions are made about the required string space (128 before). * The amount of required string memory is computed internally. + [safety] No possibility of undefined behaviour: The two methods wrapped above now compute the necessary string storage space prior to allocation, as the low-level C API calls would crash the process irreversibly if they wrote past the end of the buffers passed to them. * Changes to C API and wrappers to support char** To support the latest SWIG changes that enable proper, safe char** return support, the C API was changed. The respective wrappers in R and Python were changed too. * Cleanup indentation in new lightgbm_R.cpp code * Address review code-style comments. * Update swig/StringArray.hpp Co-Authored-By: Nikita Titov <nekit94-08@mail.ru> * Update python-package/lightgbm/basic.py Co-Authored-By: Nikita Titov <nekit94-08@mail.ru> * Update src/lightgbm_R.cpp Co-Authored-By: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: alberto.ferreira <alberto.ferreira@feedzai.com> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
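The "compute the required storage first, then allocate" pattern from this commit can be sketched in Python with ctypes. Everything here is simulated: `native_get_names` is a hypothetical stand-in for a native getter, not LightGBM's C API or its SWIG wrappers, and its signature is an assumption chosen for illustration.

```python
import ctypes
from typing import List, Sequence, Tuple

# Simulated strings owned by the "native" side.
_NATIVE_NAMES = [b"age", b"income_per_household", b"zip"]


def native_get_names(num_bufs: int, buf_len: int, bufs: Sequence[ctypes.Array]) -> Tuple[int, int]:
    """Hypothetical C-style getter (buf_len must be at least 1).

    Copies at most buf_len - 1 bytes of each name into the caller's buffers so
    nothing is written past their end, then reports how many names exist and
    the longest name length including the NUL terminator.
    """
    for i in range(min(num_bufs, len(_NATIVE_NAMES))):
        bufs[i].value = _NATIVE_NAMES[i][: buf_len - 1]
    needed = max(len(name) for name in _NATIVE_NAMES) + 1
    return len(_NATIVE_NAMES), needed


def get_names_two_pass() -> List[str]:
    # Pass 1: query the sizes with a single 1-byte scratch buffer.
    total, needed = native_get_names(1, 1, [ctypes.create_string_buffer(1)])
    # Pass 2: allocate exactly `total` buffers of `needed` bytes and fetch again.
    bufs = [ctypes.create_string_buffer(needed) for _ in range(total)]
    native_get_names(total, needed, bufs)
    return [buf.value.decode("utf-8") for buf in bufs]


print(get_names_two_pass())  # ['age', 'income_per_household', 'zip']
```

The first call can never overflow because it only ever writes an empty string into the scratch buffer; the second call is safe because every buffer is sized from the reported maximum.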
-
- 16 Mar, 2020 2 commits
-
-
Guolin Ke authored
* fix * fix return * fix test * fix test * fix predictor is none * Apply suggestions from code review * Update basic.py * Update basic.py * Apply suggestions from code review Co-Authored-By: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
-
Guolin Ke authored
* fix the bug when using different params with reference * fix * Update basic.py * Apply suggestions from code review Co-Authored-By: Nikita Titov <nekit94-08@mail.ru> * Update basic.py * add test * Apply suggestions from code review * added asserts in test Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
-
- 06 Mar, 2020 1 commit
-
-
Nikita Titov authored
* save all param values into model file * revert storing predict params * do not save params for predict and convert tasks * fixed test: 10 is found successfully for default 100 * specify more params as no-save
-
- 20 Feb, 2020 1 commit
-
-
Joan Fontanals authored
* Add capability to get possible max and min values for a model * Change implementation to have return value in tree.cpp, change naming to upper and lower bound, move implementation to gbdt.cpp * Update include/LightGBM/c_api.h Co-Authored-By: Nikita Titov <nekit94-08@mail.ru> * Change iteration to avoid potential overflow, add bindings to R and Python and a basic test * Adjust test values * Consider const correctness and multithreading protection * Update test values * Update test values * Add test to check that model is exactly the same on all platforms * Try to parse the model to get the expected values * Try to parse the model to get the expected values * Fix implementation, num_leaves can be lower than the leaf_value_ size * Do not check for num_leaves to be smaller than actual size and get back to test with hardcoded value * Change test order * Add gpu_use_dp option in test * Remove helper test method * Update src/c_api.cpp Co-Authored-By: Nikita Titov <nekit94-08@mail.ru> * Update src/io/tree.cpp Co-Authored-By: Nikita Titov <nekit94-08@mail.ru> * Update src/io/tree.cpp Co-Authored-By: Nikita Titov <nekit94-08@mail.ru> * Update tests/python_package_test/test_basic.py Co-Authored-By: Nikita Titov <nekit94-08@mail.ru> * Remove imports Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
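Because the raw score of an additive tree ensemble is the sum of one leaf value per tree, adding up each tree's smallest and largest leaf values gives a lower and an upper bound on the raw output. A minimal sketch, assuming each tree is represented only by the list of its real leaf values (the message above notes that `num_leaves` can be smaller than the `leaf_value_` size, so unused slots must be excluded); `model_output_bounds` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def model_output_bounds(leaf_values_per_tree: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Lower and upper bounds on the raw score of an additive tree ensemble."""
    lower = sum(min(leaves) for leaves in leaf_values_per_tree)
    upper = sum(max(leaves) for leaves in leaf_values_per_tree)
    return lower, upper


trees = [[-0.5, 0.25, 1.0], [0.5, -0.25]]
print(model_output_bounds(trees))  # (-0.75, 1.5)
```

The bounds need not be tight: no single input is guaranteed to reach the extreme leaf of every tree at once.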
-
- 19 Feb, 2020 1 commit
-
-
Guolin Ke authored
* reset * fix a bug * fix test * Update c_api.h * support not filtering features by min_data * add warning in reset config * refine warnings for overriding dataset parameters * some cleans * clean code * clean code * refine C API function doxygen comments * refined new param description * refined doxygen comments for R API function * removed stuff related to int8 * break long line in warning message * removed tests whose results cannot be validated anymore * added test for warnings about unchangeable params * write parameter from dataset to booster * consider free_raw_data. * fix params * fix bug * implementing R * fix typo * filter params in R * fix R * not min_data * refined tests * fixed linting * refine * pylint * add docstring * fix docstring * R lint * updated description for C API function * use param aliases in Python * fixed typo * fixed typo * added more params to test * removed debug print * fix dataset construct place * fix merge bug * Update feature_histogram.hpp * add is_sparse back * remove unused parameters * fix lint * add data random seed * update * [R-package] centralized Dataset parameter aliases and added tests on Dataset parameter updating (#2767) Co-authored-by: Nikita Titov <nekit94-08@mail.ru> Co-authored-by: James Lamb <jaylamb20@gmail.com>
-
- 03 Feb, 2020 1 commit
-
-
Nikita Titov authored
* removed duplicated code from language wrappers * removed check for resetting metric
-
- 14 Jan, 2020 2 commits
-
-
Nikita Titov authored
* transfer and enhance test for trees_to_dataframe * fixed bug in Python 2
-
Guolin Ke authored
* Update metadata.cpp * add version for training set, to efficiently update label/weight/... during training. * Update lgb.Booster.R
-