1. 24 Dec, 2020 1 commit
    • Trees with linear models at leaves (#3299) · fcfd4132
      Belinda Trotta authored
      * Add Eigen library.
      
      * Working for simple test.
      
      * Apply changes to config params.
      
      * Handle nan data.
      
      * Update docs.
      
      * Add test.
      
      * Only load raw data if boosting=gbdt_linear
      
      * Remove unneeded code.
      
      * Minor updates.
      
      * Update to work with sk-learn interface.
      
      * Update to work with chunked datasets.
      
      * Throw error if we try to create a Booster with an already-constructed dataset having incompatible parameters.
      
      * Save raw data in binary dataset file.
      
      * Update docs and fix parameter checking.
      
      * Fix dataset loading.
      
      * Add test for regularization.
      
      * Fix bugs when saving and loading tree.
      
      * Add test for load/save linear model.
      
      * Remove unneeded code.
      
      * Fix case where not enough leaf data for linear model.
      
      * Simplify code.
      
      * Speed up code.
      
      * Speed up code.
      
      * Simplify code.
      
      * Speed up code.
      
      * Fix bugs.
      
      * Working version.
      
      * Store feature data column-wise (not fully working yet).
      
      * Fix bugs.
      
      * Speed up.
      
      * Speed up.
      
      * Remove unneeded code.
      
      * Small speedup.
      
      * Speed up.
      
      * Minor updates.
      
      * Remove unneeded code.
      
      * Fix bug.
      
      * Fix bug.
      
      * Speed up.
      
      * Speed up.
      
      * Simplify code.
      
      * Remove unneeded code.
      
      * Fix bug, add more tests.
      
      * Fix bug and add test.
      
      * Only store numerical features
      
      * Fix bug and speed up using templates.
      
      * Speed up prediction.
      
      * Fix bug with regularisation
      
      * Visual studio files.
      
      * Working version
      
      * Only check nans if necessary
      
      * Store coeff matrix as an array.
      
      * Align cache lines
      
      * Align cache lines
      
      * Preallocate coefficient calculation matrices
      
      * Small speedups
      
      * Small speedup
      
      * Reverse cache alignment changes
      
      * Change to dynamic schedule
      
      * Update docs.
      
      * Refactor so that linear tree learner is not a separate class.
      
      * Add refit capability.
      
      * Speed up
      
      * Small speedups.
      
      * Speed up add prediction to score.
      
      * Fix bug
      
      * Fix bug and speed up.
      
      * Speed up dataload.
      
      * Speed up dataload
      
      * Use vectors instead of pointers
      
      * Fix bug
      
      * Add OMP exception handling.
      
      * Change return type of LGBM_BoosterGetLinear to bool
      
      * Change return type of LGBM_BoosterGetLinear back to int, only parameter type needed to change
      
      * Remove unused internal_parent_ property of tree
      
      * Remove unused parameter to CreateTreeLearner
      
      * Remove reference to LinearTreeLearner
      
      * Minor style issues
      
      * Remove unneeded check
      
      * Reverse temporary testing change
      
      * Fix Visual Studio project files
      
      * Restore LightGBM.vcxproj.filters
      
      * Speed up
      
      * Speed up
      
      * Simplify code
      
      * Update docs
      
      * Simplify code
      
      * Initialise storage space for max num threads
      
      * Move Eigen to include directory and delete unused files
      
      * Remove old files.
      
      * Fix so it compiles with mingw
      
      * Fix gpu tree learner
      
      * Change AddPredictionToScore back to const
      
      * Fix python lint error
      
      * Fix C++ lint errors
      
      * Change eigen to a submodule
      
      * Update comment
      
      * Add the eigen folder
      
      * Try to fix build issues with eigen
      
      * Remove eigen files
      
      * Add eigen as submodule
      
      * Fix include paths
      
      * Exclude eigen files from Python linter
      
      * Ignore eigen folders for pydocstyle
      
      * Fix C++ linting errors
      
      * Fix docs
      
      * Fix docs
      
      * Exclude eigen directories from doxygen
      
      * Update manifest to include eigen
      
      * Update build_r to include eigen files
      
      * Fix compiler warnings
      
      * Store raw feature data as float
      
      * Use float for calculating linear coefficients
      
      * Remove eigen directory from GLOB
      
      * Don't compile linear model code when building R package
      
      * Fix doxygen issue
      
      * Fix lint issue
      
      * Fix lint issue
      
      * Remove unneeded code
      
      * Restore deleted lines
      
      * Restore deleted lines
      
      * Change return type of has_raw to bool
      
      * Update docs
      
      * Rename some variables and functions for readability
      
      * Make tree_learner parameter const in AddScore
      
      * Fix style issues
      
      * Pass vectors as const reference when setting tree properties
      
      * Make temporary storage of serial_tree_learner mutable so we can make the object's methods const
      
      * Remove get_raw_size, use num_numeric_features instead
      
      * Fix typo
      
      * Make contains_nan_ and any_nan_ properties immutable again
      
      * Remove data_has_nan_ property of tree
      
      * Remove temporary test code
      
      * Make linear_tree a dataset param
      
      * Fix lint error
      
      * Make LinearTreeLearner a separate class
      
      * Fix lint errors
      
      * Fix lint error
      
      * Add linear_tree_learner.o
      
      * Simulate omp_get_max_threads if openmp is not available
      
      * Update PushOneData to also store raw data.
      
      * Cast size to int
      
      * Fix bug in ReshapeRaw
      
      * Speed up code with multithreading
      
      * Use OMP_NUM_THREADS
      
      * Speed up with multithreading
      
      * Update to use ArrayToString
      
      * Fix tests
      
      * Fix test
      
      * Fix bug introduced in merge
      
      * Minor updates
      
      * Update docs
  2. 22 Dec, 2020 1 commit
    • [python] [dask] add initial dask integration (#3515) · d90a16d5
      Jan Stiborek authored
      * migrated implementation from dask/dask-lightgbm
      
      * relaxed tests
      
      * tests skipped in case that MPI is used
      
      * fixed python 2.7 import + tests disabled on windows
      
      * python < 3.6 is not supported in tests
      
      * tests enabled only for linux
      
      * tests disabled for mpi interface
      
      * dask version pinned to >= 2.0
      
      * added @jameslamb as code owner
      
      * added missing pandas dependency
      
      * code refactoring, removed code duplication - lightgbm.dask.LGBMClassifier.fit is the same as lightgbm.dask.LGBMRegressor.fit
      
      * fixed refactoring
      
      * code deduplication - fit method moved into mixin class
      
      * fixed CODEOWNERS
      
      * removed unnecessary import
      
      * skip the module execution on python < 3.6 and on platform different than linux.
      
      * removed skip for python < 3.6
      
      * review comments
      
      * removed noqa, renamed API classes, renamed local variables
  3. 15 Dec, 2020 1 commit
  4. 09 Dec, 2020 1 commit
    • [python] Drop Python 2 support (#3581) · 44a6fb7f
      Nikita Titov authored
      * Update setup.py
      
      * Update .appveyor.yml
      
      * Update .travis.yml
      
      * Update .vsts-ci.yml
      
      * Update __init__.py
      
      * Update test.sh
      
      * Update test_windows.ps1
      
      * Update advanced_example.py
      
      * Update requirements_base.txt
      
      * Update conf.py
      
      * Update conf.py
      
      * Update test_engine.py
      
      * Update utils.py
      
      * Update dockerfile-r
      
      * Update README.md
      
      * Update dockerfile.gpu
      
      * Update test_consistency.py
      
      * Update basic.py
      
      * Update compat.py
      
      * Update engine.py
      
      * Update sklearn.py
      
      * Update sklearn.py
      
      * Update callback.py
      
      * Update setup.py
      
      * Update __init__.py
      
      * Update plotting.py
      
      * Update sklearn.py
      
      * Update engine.py
      
      * Update compat.py
      
      * Update callback.py
      
      * Update basic.py
      
      * Update compat.py
      
      * Update basic.py
      
      * Update basic.py
      
      * Update compat.py
      
      * Update compat.py
      
      * Update plotting.py
      
      * Update engine.py
      
      * Update basic.py
      
      * Update sklearn.py
      
      * Update compat.py
      
      * Update engine.py
      
      * Update engine.py
      
      * Update callback.py
      
      * Update basic.py
      
      * Update basic.py
      
      * Update basic.py
      
      * Update basic.py
      
      * Update basic.py
      
      * Update sklearn.py
      
      * Update sklearn.py
      
      * Update plotting.py
      
      * Update sklearn.py
      
      * Update compat.py
      
      * Update compat.py
      
      * Update engine.py
      
      * Update plotting.py
      
      * Update sklearn.py
      
      * Update basic.py
      
      * Update basic.py
      
      * Update basic.py
      
      * Update basic.py
      
      * Update compat.py
      
      * Update compat.py
      
      * Update compat.py
      
      * Update engine.py
      
      * Update basic.py
      
      * Update compat.py
      
      * Update basic.py
      
      * Update basic.py
      
      * Update basic.py
      
      * Update compat.py
      
      * Update compat.py
      
      * Update basic.py
      
      * Update basic.py
      
      * Update .vsts-ci.yml
      
      * Update .vsts-ci.yml
      
      * Update conf.py
      
      * Revert "Update dockerfile-r"
      
      This reverts commit 4ff6ffc7e3eeda24cc6a59a3bb0c973f02d9d71c.
  5. 07 Dec, 2020 1 commit
  6. 15 Nov, 2020 1 commit
  7. 10 Nov, 2020 1 commit
  8. 26 Oct, 2020 1 commit
    • Fix add features (#2754) · 53977f36
      Guolin Ke authored
      
      
      * fix subset bug
      
      * typo
      
      * add fixme tag
      
      * bin mapper
      
      * fix test
      
      * fix add_features_from
      
      * Update dataset.cpp
      
      * fix merge bug
      
      * added Python merge code
      
      * added test for add_features
      
      * Update dataset.cpp
      
      * Update src/io/dataset.cpp
      
      * continue implementing
      
      * warn users about categorical features
      Co-authored-by: StrikerRUS <nekit94-12@hotmail.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
  9. 30 Sep, 2020 2 commits
  10. 29 Sep, 2020 1 commit
  11. 11 Sep, 2020 1 commit
  12. 06 Sep, 2020 1 commit
  13. 02 Sep, 2020 1 commit
  14. 24 Aug, 2020 1 commit
  15. 11 Aug, 2020 1 commit
  16. 06 Aug, 2020 2 commits
  17. 02 Aug, 2020 1 commit
  18. 15 Jul, 2020 1 commit
  19. 14 Jul, 2020 1 commit
  20. 07 Jul, 2020 1 commit
  21. 28 Jun, 2020 1 commit
    • adding sparse support to TreeSHAP in lightgbm (#3000) · 9f367d11
      Ilya Matiach authored
      * adding sparse support to TreeSHAP in lightgbm
      
      * updating based on comments
      
      * updated based on comments, used fromiter instead of frombuffer
      
      * updated based on comments
      
      * fixed limits import order
      
      * fix sparse feature contribs to work with more than int32 max rows
      
      * really fixed int64 max error and build warnings
      
      * added sparse test with >int32 max rows
      
      * fixed python side reshape check on sparse data
      
      * updated based on latest comments
      
      * fixed comments
      
      * added CSC INT32_MAX validation to test, fixed comments
  22. 27 Jun, 2020 1 commit
    • [python][scikit-learn] new stacking tests and make number of features a property (#3173) · 72849466
      Alex authored
      * modify attribute and include stacking tests
      
      * backwards compatibility
      
      * check sklearn version
      
      * move stacking import
      
      * Number of input features (#3173)
      
      * Number of input features (#3173)
      
      * Number of input features (#3173)
      
      * Number of input features (#3173)
      
      Split number of features and stacking tests.
      
      * Number of input features (#3173)
      
      Modify test name.
      
      * Number of input features (#3173)
      
      Update stacking tests for review comments.
      
      * Number of input features (#3173)
      
      * Number of input features (#3173)
      
      * Number of input features (#3173)
      
      * Number of input features (#3173)
      
      Modify classifier test.
      
      * Number of input features (#3173)
      
      * Number of input features (#3173)
      
      Check score.
  23. 23 Jun, 2020 1 commit
    • Interaction constraints (#3126) · bca2da97
      Belinda Trotta authored
      * Add interaction constraints functionality.
      
      * Minor fixes.
      
      * Minor fixes.
      
      * Change lambda to function.
      
      * Fix gpu bug, remove extra blank lines.
      
      * Fix gpu bug.
      
      * Fix style issues.
      
      * Try to fix segfault on macOS.
      
      * Fix bug.
      
      * Fix bug.
      
      * Fix bugs.
      
      * Change parameter format for R.
      
      * Fix R style issues.
      
      * Change string formatting code.
      
      * Change docs to say R package not supported.
      
      * Remove R functionality, moving to separate PR.
      
      * Keep track of branch features in tree object.
      
      * Only track branch features when feature interactions are enabled.
      
      * Fix lint error.
      
      * Update docs and simplify tests.
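
The bullets about tracking branch features suggest how a constraint check can work: a split is only permitted on features that share a constraint group with every feature already used on the path to the node. The sketch below illustrates that general idea only. It is not the C++ code from the PR; the function name `allowed_features` and the list-of-sets representation are assumptions made for illustration.

```python
from typing import Iterable, List, Set


def allowed_features(
    constraints: List[Set[int]],
    branch_features: Iterable[int],
    num_features: int,
) -> Set[int]:
    """Return the feature indices that may be split on next.

    constraints: groups of feature indices that are allowed to interact.
    branch_features: features already used on the path from the root.
    num_features: total number of features (used when nothing is constrained).
    """
    used = set(branch_features)
    if not constraints:
        # No constraints configured: every feature is allowed.
        return set(range(num_features))
    allowed: Set[int] = set()
    for group in constraints:
        # A group stays usable only if it contains every feature
        # that already appears on this branch.
        if used <= group:
            allowed |= group
    return allowed
```

With groups `{0, 1}` and `{1, 2, 3}`, a branch that has already split on feature 0 may only continue with features 0 and 1 under this rule.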
  24. 22 Jun, 2020 1 commit
  25. 11 Jun, 2020 1 commit
  26. 02 Jun, 2020 1 commit
  27. 20 May, 2020 1 commit
  28. 12 May, 2020 1 commit
  29. 05 May, 2020 1 commit
  30. 10 Apr, 2020 2 commits
  31. 20 Mar, 2020 2 commits
    • Fix SWIG methods that return char** (#2850) · 91185c3a
      Alberto Ferreira authored
      
      
      * [swig] Fix SWIG methods that return char** with StringArray.
      
      + [new] Add StringArray class to manage and manipulate arrays of fixed-length strings:
      
        This class is now used to wrap any char** parameters, manage memory and
        manipulate the strings.
      
        Such class is defined at swig/StringArray.hpp and wrapped in StringArray.i.
      
      + [API+fix] Wrap LGBM_BoosterGetFeatureNames, which resulted in a segfault before:
      
        Added wrapper LGBM_BoosterGetFeatureNamesSWIG(BoosterHandle) that
        only receives the booster handle and figures how much memory to allocate
        for strings and returns a StringArray which can be easily converted to String[].
      
      + [API+safety] For consistency, LGBM_BoosterGetEvalNamesSWIG was wrapped as well:
      
        * Refactor to detect any kind of errors and removed all the parameters
          besides the BoosterHandle (much simpler API to use in Java).
        * No assumptions are made about the required string space necessary (128 before).
        * The amount of required string memory is computed internally
      
      + [safety] No possibility of undefined behaviour
      
        The two methods wrapped above now compute the necessary string storage space
        prior to allocation, as the low-level C API calls would crash the process
        irreversibly if they write more memory than is passed to them.
      
      * Changes to C API and wrappers support char**
      
      To support the latest SWIG changes that enable proper char**
      return support that is safe, the C API was changed.
      
      The respective wrappers in R and Python were changed too.
      
      * Cleanup indentation in new lightgbm_R.cpp code
      
      * Address review code-style comments.
      
      * Update swig/StringArray.hpp
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Update python-package/lightgbm/basic.py
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Update src/lightgbm_R.cpp
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: alberto.ferreira <alberto.ferreira@feedzai.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
    • [python] handle RandomState object in Scikit-learn Api (#2904) · cf0a992e
      Lukas Pfannschmidt authored
      
      
      * Add handling of RandomState object, which is standard for sklearn methods.
      
      LightGBM expects an integer seed instead of an object.
      If passed object is RandomState, we choose random integer based on its state to seed the underlying low level code.
      While chosen random integer is only in the range between 1 and 1e10 I expect it to have enough entropy (?) to not matter in practice.
      
      * Add RandomState object to random_state docstring.
      
      * remove blank line
      
      * Use property to handle setting random_state.
      This enables setting cloned estimators with the set_params method in sklearn.
      
      * Add docstring to attribute.
      
      * Fix and simplify docstring.
      
      * Add test case.
      
      * Use maximal int for datatype in seed derivation.
      
      * Replace random_state property with interfacing in fit method.
      Derives int seed for C code only when fitting and keeps RandomState object as param.
      
      * Adapt unit test to property change.
      
      * Extended test case and docstring
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Add more equality checks (feature importance, best iteration/score).
      
      * Add equality comparison of boosters represented by strings.
      Remove useless best_iteration_ comparison (we do not use early_stopping).
      
      * fix whitespace
      
      * Test if two subsequent fits produce different models
      
      * Apply suggestions from code review
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
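
The seed derivation described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `derive_seed` is hypothetical, and drawing from the range `[0, INT32_MAX)` is an assumption based on the "maximal int for datatype" bullet.

```python
import numpy as np


def derive_seed(random_state):
    """Turn an int, None, or numpy RandomState into an int seed (or None)."""
    if random_state is None or isinstance(random_state, (int, np.integer)):
        # Plain integers (and None) are passed through unchanged.
        return random_state
    if isinstance(random_state, np.random.RandomState):
        # Draw one integer from the generator; this advances its state,
        # so a second call on the same object yields a new draw.
        return int(random_state.randint(np.iinfo(np.int32).max))
    raise TypeError("random_state must be None, an int, or a RandomState")
```

Deriving the integer only at fit time, as the later bullets describe, lets the estimator keep the original `RandomState` object as its parameter, which is what `set_params` and cloning in scikit-learn expect.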
  32. 16 Mar, 2020 2 commits
  33. 06 Mar, 2020 1 commit
  34. 20 Feb, 2020 1 commit
    • Add capability to get possible max and min values for a model (#2737) · 18e7de4f
      Joan Fontanals authored
      
      
      * Add capability to get possible max and min values for a model
      
      * Change implementation to have return value in tree.cpp, change naming to upper and lower bound, move implementation to gbdt.cpp
      
      * Update include/LightGBM/c_api.h
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Change iteration to avoid potential overflow, add bindings to R and Python and a basic test
      
      * Adjust test values
      
      * Consider const correctness and multithreading protection
      
      * Update test values
      
      * Update test values
      
      * Add test to check that model is exactly the same in all platforms
      
      * Try to parse the model to get the expected values
      
      * Try to parse the model to get the expected values
      
      * Fix implementation, num_leaves can be lower than the leaf_value_ size
      
      * Do not check for num_leaves to be smaller than actual size and get back to test with hardcoded value
      
      * Change test order
      
      * Add gpu_use_dp option in test
      
      * Remove helper test method
      
      * Update src/c_api.cpp
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Update src/io/tree.cpp
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Update src/io/tree.cpp
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Update tests/python_package_test/test_basic.py
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Remove imports
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
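
For an additive tree ensemble, one way to bound the raw model output is to sum the largest leaf value of every tree (upper bound) and the smallest leaf value of every tree (lower bound). The sketch below shows that idea under the assumption that each tree is represented simply as a sequence of its leaf values; the function name `prediction_bounds` is hypothetical and this is not the C++ implementation from the PR.

```python
from typing import Sequence, Tuple


def prediction_bounds(trees: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the raw score of an additive ensemble.

    trees: one sequence of leaf values per tree. Each prediction picks
    exactly one leaf per tree, so the extremes are sums of per-tree extremes.
    """
    lower = sum(min(leaves) for leaves in trees)
    upper = sum(max(leaves) for leaves in trees)
    return lower, upper
```

These bounds are not necessarily attained by any real input, since the extreme leaves of different trees may not be reachable by the same sample.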
  35. 19 Feb, 2020 1 commit
    • [python] [R-package] refine the parameters for Dataset (#2594) · 9f79e840
      Guolin Ke authored
      
      
      * reset
      
      * fix a bug
      
      * fix test
      
      * Update c_api.h
      
      * support not filtering features by min_data
      
      * add warning in reset config
      
      * refine warnings for override dataset's parameter
      
      * some cleans
      
      * clean code
      
      * clean code
      
      * refine C API function doxygen comments
      
      * refined new param description
      
      * refined doxygen comments for R API function
      
      * removed stuff related to int8
      
      * break long line in warning message
      
      * removed tests whose results cannot be validated anymore
      
      * added test for warnings about unchangeable params
      
      * write parameter from dataset to booster
      
      * consider free_raw_data.
      
      * fix params
      
      * fix bug
      
      * implementing R
      
      * fix typo
      
      * filter params in R
      
      * fix R
      
      * not min_data
      
      * refined tests
      
      * fixed linting
      
      * refine
      
      * pylint
      
      * add docstring
      
      * fix docstring
      
      * R lint
      
      * updated description for C API function
      
      * use param aliases in Python
      
      * fixed typo
      
      * fixed typo
      
      * added more params to test
      
      * removed debug print
      
      * fix dataset construct place
      
      * fix merge bug
      
      * Update feature_histogram.hpp
      
      * add is_sparse back
      
      * remove unused parameters
      
      * fix lint
      
      * add data random seed
      
      * update
      
      * [R-package] centralized Dataset parameter aliases and added tests on Dataset parameter updating (#2767)
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: James Lamb <jaylamb20@gmail.com>