1. 05 Apr, 2021 1 commit
  2. 24 Mar, 2021 1 commit
  3. 17 Mar, 2021 1 commit
    • Range check for DCG position discount lookup (#4069) · 4580393f
      ashok-ponnuswami-msft authored
      * Add check to prevent out of index lookup in the position discount table. Add debug logging to report number of queries found in the data.
      
      * Change debug logging location so that we can print the data file name as well.
      
      * Revert "Change debug logging location so that we can print the data file name as well."
      
      This reverts commit 3981b34bd6e0530f89c4733e78e6b6603bf50d48.
      
      * Add data file name to debug logging.
      
      * Move log line to a place where it is output even when query IDs are read from a separate file.
      
      * Also add the out-of-range check to rank metrics.
      
      * Perform check after number of queries is initialized.
      
      * Update
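
The range check described in this commit can be illustrated with a minimal sketch. It assumes the usual DCG discount of 1 / log2(position + 2) for 0-based positions; `DiscountTable` and `At` are hypothetical names chosen for illustration and are not LightGBM's actual classes or methods.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not LightGBM code): a precomputed table of DCG
// position discounts with a bounds-checked lookup.
class DiscountTable {
 public:
  explicit DiscountTable(std::size_t max_position) {
    discounts_.reserve(max_position);
    for (std::size_t i = 0; i < max_position; ++i) {
      // Assumed discount for 0-based position i: 1 / log2(i + 2).
      discounts_.push_back(1.0 / std::log2(static_cast<double>(i) + 2.0));
    }
  }

  // Fails loudly instead of reading past the end of the table.
  double At(std::size_t position) const {
    if (position >= discounts_.size()) {
      throw std::out_of_range(
          "position " + std::to_string(position) +
          " is outside the discount table of size " +
          std::to_string(discounts_.size()));
    }
    return discounts_[position];
  }

 private:
  std::vector<double> discounts_;
};
```

With a table built for three positions, `At(0)` returns 1.0 and `At(3)` throws rather than reading unrelated memory.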
  4. 12 Mar, 2021 1 commit
  5. 23 Feb, 2021 1 commit
  6. 21 Feb, 2021 1 commit
    • Fix evaluation of linear trees with a single leaf. (#3987) · 605c97b5
      mjmckp authored
      
      
      * Fix index out-of-range exception generated by BaggingHelper on small datasets.
      
      Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.
      
      * Update goss.hpp
      
      * Update goss.hpp
      
      * Add API method LGBM_BoosterPredictForMats, which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires data given as a contiguous array)
      
      * Fix incorrect upstream merge
      
      * Add link to LightGBM.NET
      
      * Fix indenting to 2 spaces
      
      * Dummy edit to trigger CI
      
      * Dummy edit to trigger CI
      
      * remove duplicate functions from merge
      
      * Fix evaluation of linear trees with a single leaf.
      
      Note that trees without linear models at the leaf always handle num_leaves = 1 as a special case and directly output the leaf value.  Linear trees were missing this special case handling, and hence would have the following issues:
       * Calling Tree::Predict or Tree::PredictByMap would cause an access violation exception attempting to access the first value of the empty split_feature_ array in GetLeaf.
       * PredictionFunLinear would either cause an access violation or go into an infinite loop when attempting to do the equivalent of GetLeaf.
      
      Note also that PredictionFun does not need the same changes as PredictionFunLinear, since both are only called by Tree::AddPredictionToScore, which has a special case for (!is_linear_ && num_leaves_ <= 1) that precludes calling PredictionFun.
      Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com>
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
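
The single-leaf special case described in this commit can be sketched as follows. `TinyTree` is a hypothetical, heavily simplified structure: the field names echo the commit message, but the layout, the child encoding (a negative child `c` denotes leaf `~c`), and the omission of the linear terms are assumptions made for illustration.

```cpp
#include <vector>

// Hypothetical, simplified tree (not LightGBM's Tree class).
struct TinyTree {
  int num_leaves = 1;
  std::vector<int> split_feature;   // empty when num_leaves == 1
  std::vector<double> threshold;
  std::vector<int> left_child;      // negative value c encodes leaf ~c
  std::vector<int> right_child;
  std::vector<double> leaf_value;

  // Walks from the root to a leaf. Only valid when at least one split exists,
  // because it reads split_feature[0] on entry.
  int GetLeaf(const std::vector<double>& row) const {
    int node = 0;
    while (node >= 0) {
      node = row[split_feature[node]] <= threshold[node] ? left_child[node]
                                                         : right_child[node];
    }
    return ~node;
  }

  double Predict(const std::vector<double>& row) const {
    // Special case: a tree with one leaf has no splits, so GetLeaf must not
    // be called on the empty split arrays.
    if (num_leaves <= 1) {
      return leaf_value[0];
    }
    return leaf_value[GetLeaf(row)];
  }
};
```

Without the `num_leaves <= 1` guard, `Predict` on a single-leaf tree would index the empty `split_feature` vector, which is the access violation the commit message describes.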
  7. 19 Feb, 2021 3 commits
    • Use high precision conversion from double to string in Tree::ToString() for new linear tree members (#3938) · 7f91dc66
      mjmckp authored
      
      * Fix index out-of-range exception generated by BaggingHelper on small datasets.
      
      Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.
      
      * Update goss.hpp
      
      * Update goss.hpp
      
      * Add API method LGBM_BoosterPredictForMats, which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires data given as a contiguous array)
      
      * Fix incorrect upstream merge
      
      * Add link to LightGBM.NET
      
      * Fix indenting to 2 spaces
      
      * Dummy edit to trigger CI
      
      * Dummy edit to trigger CI
      
      * remove duplicate functions from merge
      
      * In Tree::ToString() method, print double values for linear tree models with high precision, so that the tree may be accurately reproduced elsewhere (LightGBM.Net in particular)
      
      * Need to use more precise StringToArray instead of StringToArrayFast when parsing double valued arrays for linear trees, to ensure models round-trip via string or file correctly.
      Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com>
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
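
The round-trip concern behind this commit can be shown with a small sketch using only the C++ standard library. `DoubleToStringExact` and `StringToDouble` are hypothetical helpers, not the LightGBM functions named in the commit message; the sketch assumes IEEE 754 doubles and a correctly rounding `std::stod`.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: prints enough significant digits (max_digits10, which
// is 17 for IEEE 754 doubles) that parsing the text recovers the same value.
std::string DoubleToStringExact(double value) {
  std::ostringstream out;
  out << std::setprecision(std::numeric_limits<double>::max_digits10) << value;
  return out.str();
}

// Hypothetical helper: parses the text back into a double.
double StringToDouble(const std::string& text) {
  return std::stod(text);
}
```

With the default stream precision of 6 significant digits, a value such as `0.1 + 0.2` prints as `0.3` and no longer parses back to the original double, which is why a model written at low precision may not reproduce the same predictions after reloading.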
    • [docs] Change some 'parallel learning' references to 'distributed learning' (#4000) · 7880b79f
      James Lamb authored
      * [docs] Change some 'parallel learning' references to 'distributed learning'
      
      * found a few more
      
      * one more reference
  8. 17 Feb, 2021 1 commit
    • Fix for CreatePredictor function and VS2017 Debug build (#3937) · 5321fef6
      mjmckp authored
      
      
      * Fix index out-of-range exception generated by BaggingHelper on small datasets.
      
      Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.
      
      * Update goss.hpp
      
      * Update goss.hpp
      
      * Add API method LGBM_BoosterPredictForMats, which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires data given as a contiguous array)
      
      * Fix incorrect upstream merge
      
      * Add link to LightGBM.NET
      
      * Fix indenting to 2 spaces
      
      * Dummy edit to trigger CI
      
      * Dummy edit to trigger CI
      
      * remove duplicate functions from merge
      
      * Fix for CreatePredictor function: for VS2017 in Debug build, the previous version would end up giving an uninitialised prediction function that would throw access violation exceptions when invoked.
      Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com>
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
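
The BaggingHelper item in this commit message describes reading `tmp_gradients[top_k - 1]` from an empty buffer when `cnt` is zero. A minimal sketch of that kind of guard follows; `TopKThreshold` and `top_rate` are hypothetical names, and the selection logic is a simplification for illustration rather than the code in goss.hpp.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: returns the top_k-th largest absolute gradient, where
// top_k is derived from top_rate, or 0 when there is no data at all.
double TopKThreshold(const std::vector<double>& gradients, double top_rate) {
  const std::size_t cnt = gradients.size();
  if (cnt == 0) {
    return 0.0;  // guard: nothing to select, so never index an empty buffer
  }
  std::vector<double> tmp(cnt);
  for (std::size_t i = 0; i < cnt; ++i) {
    tmp[i] = std::fabs(gradients[i]);
  }
  // Keep top_k within [1, cnt] so that top_k - 1 is always a valid index.
  std::size_t top_k = static_cast<std::size_t>(top_rate * static_cast<double>(cnt));
  top_k = std::min(std::max<std::size_t>(top_k, 1), cnt);
  // Partially sort in descending order so that tmp[top_k - 1] is the threshold.
  std::nth_element(tmp.begin(), tmp.begin() + (top_k - 1), tmp.end(),
                   std::greater<double>());
  return tmp[top_k - 1];
}
```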
  9. 14 Feb, 2021 1 commit
  10. 09 Feb, 2021 1 commit
  11. 06 Feb, 2021 1 commit
  12. 03 Feb, 2021 1 commit
  13. 28 Jan, 2021 1 commit
  14. 26 Jan, 2021 1 commit
  15. 25 Jan, 2021 1 commit
  16. 23 Jan, 2021 1 commit
  17. 21 Jan, 2021 1 commit
  18. 18 Jan, 2021 2 commits
  19. 15 Jan, 2021 1 commit
  20. 11 Jan, 2021 2 commits
  21. 09 Jan, 2021 1 commit
  22. 07 Jan, 2021 3 commits
  23. 05 Jan, 2021 1 commit
  24. 03 Jan, 2021 1 commit
  25. 29 Dec, 2020 1 commit
  26. 28 Dec, 2020 1 commit
    • small code and docs refactoring (#3681) · 5a460846
      Nikita Titov authored
      * small code and docs refactoring
      
      * Update CMakeLists.txt
      
      * Update .vsts-ci.yml
      
      * Update test.sh
      
      * continue
      
      * continue
      
      * revert stable sort for all-unique values
  27. 24 Dec, 2020 1 commit
    • Trees with linear models at leaves (#3299) · fcfd4132
      Belinda Trotta authored
      * Add Eigen library.
      
      * Working for simple test.
      
      * Apply changes to config params.
      
      * Handle nan data.
      
      * Update docs.
      
      * Add test.
      
      * Only load raw data if boosting=gbdt_linear
      
      * Remove unneeded code.
      
      * Minor updates.
      
      * Update to work with sk-learn interface.
      
      * Update to work with chunked datasets.
      
      * Throw error if we try to create a Booster with an already-constructed dataset having incompatible parameters.
      
      * Save raw data in binary dataset file.
      
      * Update docs and fix parameter checking.
      
      * Fix dataset loading.
      
      * Add test for regularization.
      
      * Fix bugs when saving and loading tree.
      
      * Add test for load/save linear model.
      
      * Remove unneeded code.
      
      * Fix case where not enough leaf data for linear model.
      
      * Simplify code.
      
      * Speed up code.
      
      * Speed up code.
      
      * Simplify code.
      
      * Speed up code.
      
      * Fix bugs.
      
      * Working version.
      
      * Store feature data column-wise (not fully working yet).
      
      * Fix bugs.
      
      * Speed up.
      
      * Speed up.
      
      * Remove unneeded code.
      
      * Small speedup.
      
      * Speed up.
      
      * Minor updates.
      
      * Remove unneeded code.
      
      * Fix bug.
      
      * Fix bug.
      
      * Speed up.
      
      * Speed up.
      
      * Simplify code.
      
      * Remove unneeded code.
      
      * Fix bug, add more tests.
      
      * Fix bug and add test.
      
      * Only store numerical features
      
      * Fix bug and speed up using templates.
      
      * Speed up prediction.
      
      * Fix bug with regularisation
      
      * Visual studio files.
      
      * Working version
      
      * Only check nans if necessary
      
      * Store coeff matrix as an array.
      
      * Align cache lines
      
      * Align cache lines
      
      * Preallocation coefficient calculation matrices
      
      * Small speedups
      
      * Small speedup
      
      * Reverse cache alignment changes
      
      * Change to dynamic schedule
      
      * Update docs.
      
      * Refactor so that linear tree learner is not a separate class.
      
      * Add refit capability.
      
      * Speed up
      
      * Small speedups.
      
      * Speed up add prediction to score.
      
      * Fix bug
      
      * Fix bug and speed up.
      
      * Speed up dataload.
      
      * Speed up dataload
      
      * Use vectors instead of pointers
      
      * Fix bug
      
      * Add OMP exception handling.
      
      * Change return type of LGBM_BoosterGetLinear to bool
      
      * Change return type of LGBM_BoosterGetLinear back to int, only parameter type needed to change
      
      * Remove unused internal_parent_ property of tree
      
      * Remove unused parameter to CreateTreeLearner
      
      * Remove reference to LinearTreeLearner
      
      * Minor style issues
      
      * Remove unneeded check
      
      * Reverse temporary testing change
      
      * Fix Visual Studio project files
      
      * Restore LightGBM.vcxproj.filters
      
      * Speed up
      
      * Speed up
      
      * Simplify code
      
      * Update docs
      
      * Simplify code
      
      * Initialise storage space for max num threads
      
      * Move Eigen to include directory and delete unused files
      
      * Remove old files.
      
      * Fix so it compiles with mingw
      
      * Fix gpu tree learner
      
      * Change AddPredictionToScore back to const
      
      * Fix python lint error
      
      * Fix C++ lint errors
      
      * Change eigen to a submodule
      
      * Update comment
      
      * Add the eigen folder
      
      * Try to fix build issues with eigen
      
      * Remove eigen files
      
      * Add eigen as submodule
      
      * Fix include paths
      
      * Exclude eigen files from Python linter
      
      * Ignore eigen folders for pydocstyle
      
      * Fix C++ linting errors
      
      * Fix docs
      
      * Fix docs
      
      * Exclude eigen directories from doxygen
      
      * Update manifest to include eigen
      
      * Update build_r to include eigen files
      
      * Fix compiler warnings
      
      * Store raw feature data as float
      
      * Use float for calculating linear coefficients
      
      * Remove eigen directory from GLOB
      
      * Don't compile linear model code when building R package
      
      * Fix doxygen issue
      
      * Fix lint issue
      
      * Fix lint issue
      
      * Remove unneeded code
      
      * Restore deleted lines
      
      * Restore deleted lines
      
      * Change return type of has_raw to bool
      
      * Update docs
      
      * Rename some variables and functions for readability
      
      * Make tree_learner parameter const in AddScore
      
      * Fix style issues
      
      * Pass vectors as const reference when setting tree properties
      
      * Make temporary storage of serial_tree_learner mutable so we can make the object's methods const
      
      * Remove get_raw_size, use num_numeric_features instead
      
      * Fix typo
      
      * Make contains_nan_ and any_nan_ properties immutable again
      
      * Remove data_has_nan_ property of tree
      
      * Remove temporary test code
      
      * Make linear_tree a dataset param
      
      * Fix lint error
      
      * Make LinearTreeLearner a separate class
      
      * Fix lint errors
      
      * Fix lint error
      
      * Add linear_tree_learner.o
      
      * Simulate omp_get_max_threads if openmp is not available
      
      * Update PushOneData to also store raw data.
      
      * Cast size to int
      
      * Fix bug in ReshapeRaw
      
      * Speed up code with multithreading
      
      * Use OMP_NUM_THREADS
      
      * Speed up with multithreading
      
      * Update to use ArrayToString
      
      * Fix tests
      
      * Fix test
      
      * Fix bug introduced in merge
      
      * Minor updates
      
      * Update docs
  28. 11 Dec, 2020 1 commit
  29. 08 Dec, 2020 1 commit
    • Fix model locale issue and improve model R/W performance. (#3405) · 792c9303
      Alberto Ferreira authored
      * Fix LightGBM models locale sensitivity and improve R/W performance.
      
      When Java is used, the default C++ locale is broken. This is true for
      Java providers that use the C API or even Python models that require JEP.
      
      This patch solves that issue making the model reads/writes insensitive
      to such settings.
      To achieve it, within the model read/write codebase:
       - C++ streams are imbued with the classic locale
       - Calls to functions that are dependent on the locale are replaced
       - The default locale is not changed!
      
      This approach means:
       - The user's locale is never tampered with, avoiding issues such as
          https://github.com/microsoft/LightGBM/issues/2979 with the previous
          approach https://github.com/microsoft/LightGBM/pull/2891
       - Datasets can still be read according to the user's locale
       - The model file has a single format independent of locale
      
      Changes:
       - Add CommonC namespace which provides faster locale-independent versions of Common's methods
       - Model code makes conversions through CommonC
       - Cleanup unused Common methods
       - Performance improvements. Use fast libraries for locale-agnostic conversion:
         - value->string: https://github.com/fmtlib/fmt
         - string->double: https://github.com/lemire/fast_double_parser (10x
            faster double parsing according to their benchmark)
      
      Bugfixes:
       - https://github.com/microsoft/LightGBM/issues/2500
       - https://github.com/microsoft/LightGBM/issues/2890
       - https://github.com/ninia/jep/issues/205
      
       (as it is related to LGBM as well)
      
      * Align CommonC namespace
      
      * Add new external_libs/ to python setup
      
      * Try fast_double_parser fix #1
      
      Testing commit e09e5aad828bcb16bea7ed0ed8322e019112fdbe
      
      If it works it should fix more LGBM builds
      
      * CMake: Attempt to link fmt without explicit PUBLIC tag
      
      * Exclude external_libs from linting
      
      * Add external_libs to MANIFEST.in
      
      * Set dynamic linking option for fmt.
      
      * linting issues
      
      * Try to fix lint includes
      
      * Try to pass fPIC with static fmt lib
      
      * Try CMake P_I_C option with fmt library
      
      * [R-package] Add CMake support for R and CRAN
      
      * Cleanup CMakeLists
      
      * Try fmt hack to remove stdout
      
      * Switch to header-only mode
      
      * Add PRIVATE argument to target_link_libraries
      
      * use fmt in header-only mode
      
      * Remove CMakeLists comment
      
      * Change OpenMP to PUBLIC linking in Mac
      
      * Update fmt submodule to 7.1.2
      
      * Use fmt in header-only-mode
      
      * Remove fmt from CMakeLists.txt
      
      * Upgrade fast_double_parser to v0.2.0
      
      * Revert "Add PRIVATE argument to target_link_libraries"
      
      This reverts commit 3dd45dde7b92531b2530ab54522bb843c56227a7.
      
      * Address James Lamb's comments
      
      * Update R-package/.Rbuildignore
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Upgrade to fast_double_parser v0.3.0 - Solaris support
      
      * Use legacy code only in Solaris
      
      * Fix lint issues
      
      * Fix comment
      
      * Address StrikerRUS's comments (solaris ifdef).
      
      * Change header guards
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
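
The first technique listed in this commit message, imbuing individual C++ streams with the classic locale while leaving the global locale alone, can be sketched with the standard library. `FormatClassic` and `ParseClassic` are hypothetical helpers. The commit itself states that the real conversions go through fmt and fast_double_parser; this sketch substitutes plain standard streams to show the locale behaviour only.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats a double with '.' as the decimal separator
// regardless of the process-wide locale, which is left untouched.
std::string FormatClassic(double value) {
  std::ostringstream out;
  out.imbue(std::locale::classic());  // affects this stream only
  out << value;
  return out.str();
}

// Hypothetical helper: parses a double written in the classic ("C") format.
double ParseClassic(const std::string& text) {
  std::istringstream in(text);
  in.imbue(std::locale::classic());  // affects this stream only
  double value = 0.0;
  in >> value;
  return value;
}
```

Because only the local stream objects are imbued, code elsewhere in the process, such as dataset parsing, continues to see whatever locale the user configured.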
  30. 07 Dec, 2020 1 commit
  31. 05 Dec, 2020 1 commit
  32. 24 Nov, 2020 2 commits
  33. 23 Nov, 2020 1 commit