1. 08 Dec, 2020 1 commit
    • Fix model locale issue and improve model R/W performance. (#3405) · 792c9303
      Alberto Ferreira authored
      * Fix LightGBM models locale sensitivity and improve R/W performance.
      
      When Java is used, the default C++ locale is broken. This is true for
      Java providers that use the C API or even Python models that require JEP.
      
      This patch solves that issue by making model reads/writes insensitive
      to such settings.
      To achieve it, within the model read/write codebase:
       - C++ streams are imbued with the classic locale
       - Calls to functions that are dependent on the locale are replaced
       - The default locale is not changed!
      
      This approach means:
       - The user's locale is never tampered with, avoiding issues such as
          https://github.com/microsoft/LightGBM/issues/2979 with the previous
          approach https://github.com/microsoft/LightGBM/pull/2891
       - Datasets can still be read according to the user's locale
       - The model file has a single format independent of locale
      
      Changes:
       - Add CommonC namespace which provides faster locale-independent versions of Common's methods
       - Model code makes conversions through CommonC
       - Cleanup unused Common methods
       - Performance improvements. Use fast libraries for locale-agnostic conversion:
         - value->string: https://github.com/fmtlib/fmt
         - string->double: https://github.com/lemire/fast_double_parser (10x
            faster double parsing according to their benchmark)
      
      Bugfixes:
       - https://github.com/microsoft/LightGBM/issues/2500
       - https://github.com/microsoft/LightGBM/issues/2890
       - https://github.com/ninia/jep/issues/205 (as it is related to LGBM as well)
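
      The first technique listed above, imbuing individual streams with the classic locale instead of changing the global one, can be sketched roughly as follows. This is a minimal illustration using only standard C++ streams; the helper names `DoubleToStringClassic` and `StringToDoubleClassic` are hypothetical and are not the CommonC functions from this commit, which relies on fmt and fast_double_parser for speed.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not LightGBM's CommonC API): format a double using the
// classic "C" locale, whatever the process-wide locale happens to be.
std::string DoubleToStringClassic(double value) {
  std::ostringstream out;
  out.imbue(std::locale::classic());  // affects this stream only
  out.precision(17);                  // enough digits to round-trip a double
  out << value;
  return out.str();
}

// Hypothetical helper: parse a double written with '.' as the decimal
// separator. Returns false if the text does not start with a valid number.
bool StringToDoubleClassic(const std::string& text, double* result) {
  std::istringstream in(text);
  in.imbue(std::locale::classic());  // the global locale is left untouched
  in >> *result;
  return !in.fail();
}
```

      Because only the local stream objects are imbued, code elsewhere in the process (for example dataset parsing) keeps whatever locale the user configured.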
      
      * Align CommonC namespace
      
      * Add new external_libs/ to python setup
      
      * Try fast_double_parser fix #1
      
      Testing commit e09e5aad828bcb16bea7ed0ed8322e019112fdbe
      
      If it works it should fix more LGBM builds
      
      * CMake: Attempt to link fmt without explicit PUBLIC tag
      
      * Exclude external_libs from linting
      
      * Add external_libs to MANIFEST.in
      
      * Set dynamic linking option for fmt.
      
      * linting issues
      
      * Try to fix lint includes
      
      * Try to pass fPIC with static fmt lib
      
      * Try CMake P_I_C option with fmt library
      
      * [R-package] Add CMake support for R and CRAN
      
      * Cleanup CMakeLists
      
      * Try fmt hack to remove stdout
      
      * Switch to header-only mode
      
      * Add PRIVATE argument to target_link_libraries
      
      * use fmt in header-only mode
      
      * Remove CMakeLists comment
      
      * Change OpenMP to PUBLIC linking in Mac
      
      * Update fmt submodule to 7.1.2
      
      * Use fmt in header-only-mode
      
      * Remove fmt from CMakeLists.txt
      
      * Upgrade fast_double_parser to v0.2.0
      
      * Revert "Add PRIVATE argument to target_link_libraries"
      
      This reverts commit 3dd45dde7b92531b2530ab54522bb843c56227a7.
      
      * Address James Lamb's comments
      
      * Update R-package/.Rbuildignore
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Upgrade to fast_double_parser v0.3.0 - Solaris support
      
      * Use legacy code only in Solaris
      
      * Fix lint issues
      
      * Fix comment
      
      * Address StrikerRUS's comments (solaris ifdef).
      
      * Change header guards
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
  2. 23 Nov, 2020 1 commit
  3. 13 Nov, 2020 1 commit
    • Optimization of row-wise histogram construction (#3522) · 0655d67c
      shiyu1994 authored
      
      
      * store without offset in multi_val_dense_bin
      
      * fix offset bug
      
      * add comment for offset
      
      * add comment for bin type selection
      
      * faster operations for offset
      
      * keep most freq bin in histogram for multi val dense
      
      * use original feature iterators
      
      * consider 9 cases (3 x 3) for multi val bin construction
      
      * fix dense bin setting
      
      * fix bin data in multi val group
      
      * fix offset of the first feature histogram
      
      * use float hist buf
      
      * avx in histogram construction
      
      * use avx for hist construction without prefetch
      
      * vectorize bin extraction
      
      * use only 128 vec
      
      * use avx2
      
      * use vectorization for sparse row wise
      
      * add bit size for multi val dense bin
      
      * float with no vectorization
      
      * change multithreading strategy to dynamic
      
      * remove intrinsic header
      
      * fix dense multi val col copy
      
      * remove bit size
      
      * use large enough block size when the bin number is large
      
      * calc min block size by sparsity
      
      * rescale gradients
      
      * rollback gradients scaling
      
      * single precision histogram buffer as an option
      
      * add float hist buffer with thread buffer
      
      * fix setting zero in hist data
      
      * fix hist begin pointer in tree learners
      
      * remove debug logs
      
      * remove omp simd
      
      * update Makevars of R-package
      
      * fix feature group binary storing
      
      * two row wise for double hist buffer
      
      * add subfeature for two row wise
      
      * remove useless code and fix two row wise
      
      * refactor code
      
      * grouping the dense feature groups can get sparse multi val bin
      
      * clean format problems
      
      * one thread for two blocks in sep row wise
      
      * use ordered gradients for sep row wise
      
      * fix grad ptr
      
      * ordered grad with combined block for sep row wise
      
      * fix block threading
      
      * use the same min block size
      
      * rollback share min block size
      
      * remove logs
      
      * Update src/io/dataset.cpp
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
      
      * fix parameter description
      
      * remove sep_row_wise
      
      * remove check codes
      
      * add check for empty multi val bin
      
      * fix lint error
      
      * rollback changes in config.h
      
      * Apply suggestions from code review
      Co-authored-by: Ubuntu <shiyu@gbdt-04.ren3kv4wanvufliwrpy4k03lsf.xx.internal.cloudapp.net>
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
  4. 27 Oct, 2020 2 commits
  5. 18 Oct, 2020 1 commit
    • [ci] [R-package] Fix memory leaks found by valgrind (#3443) · 81d76113
      James Lamb authored
      
      
      * fix int64 write error
      
      * attempt
      
      * [WIP] [ci] [R-package] Add CI job that runs valgrind tests
      
      * update all-successful
      
      * install
      
      * executable
      
      * fix redirect stuff
      
      * Apply suggestions from code review
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
      
      * more flags
      
      * add mc to msvc proj
      
      * fix memory leak in mc
      
      * Update monotone_constraints.hpp
      
      * Update r_package.yml
      
      * remove R_INT64_PTR
      
      * disable openmp
      
      * Update gbdt_model_text.cpp
      
      * Update gbdt_model_text.cpp
      
      * Apply suggestions from code review
      
      * try to free vector
      
      * free more memory.
      
      * Update src/boosting/gbdt_model_text.cpp
      
      * fix using
      
      * try the UNPROTECT(1);
      
      * fix a const pointer
      
      * fix Common
      
      * reduce UNPROTECT
      
      * remove UNPROTECT(1);
      
      * fix null handle
      
      * fix predictor
      
      * use NULL after free
      
      * fix a leak in test
      
      * try more fixes
      
      * test the effect of tests
      
      * throw exception in Fatal
      
      * add test back
      
      * Apply suggestions from code review
      
      * comment some tests
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      
      * trying to comment out tests
      
      * Update openmp_wrapper.h
      
      * Apply suggestions from code review
      
      * Update configure
      
      * Update configure.ac
      
      * trying to uncomment
      
      * more comments
      
      * more uncommenting
      
      * more uncommenting
      
      * fix comment
      
      * more uncommenting
      
      * uncomment fully-commented out stuff
      
      * try uncommenting more dataset tests
      
      * uncommenting more tests
      
      * ok getting closer
      
      * more uncommenting
      
      * free dataset
      
      * skipping a test, more uncommenting
      
      * more skipping
      
      * re-enable OpenMP
      
      * allow on OpenMP thing
      
      * move valgrind to comment-only job
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * changes from code review
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * linting
      
      * issue comments too
      
      * remove issue_comment
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
  6. 30 Sep, 2020 2 commits
    • stable multi-threading sum reduction (#3385) · 692c9a5b
      Guolin Ke authored
      * Update serial_tree_learner.cpp
      
      * Update src/treelearner/serial_tree_learner.cpp
      
      * stable multi-threading reduction
      
      * Update src/treelearner/serial_tree_learner.cpp
      
      * more fixes
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      
      * Update src/boosting/gbdt.cpp
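
      The commit message does not describe how the reduction is made stable. One common way to get a thread-count-independent floating-point sum, shown here purely as an illustrative sketch and not as the code from #3385, is to split the input into fixed-size blocks, sum each block sequentially, and then combine the block results in block order. The function name `StableSum` and the block size are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: sum values so that the result does not depend on how
// many threads run. Block boundaries are fixed by kBlockSize rather than by
// the thread count, and partial sums are combined in block order.
double StableSum(const std::vector<double>& values) {
  const std::size_t kBlockSize = 1024;  // assumed block size, for illustration
  const std::size_t num_blocks = (values.size() + kBlockSize - 1) / kBlockSize;
  std::vector<double> partial(num_blocks, 0.0);

  // Each block is summed by exactly one thread, always left to right.
  // Without OpenMP enabled the pragma is ignored and the result is identical.
  #pragma omp parallel for schedule(static)
  for (long long b = 0; b < static_cast<long long>(num_blocks); ++b) {
    const std::size_t begin = static_cast<std::size_t>(b) * kBlockSize;
    const std::size_t end = std::min(begin + kBlockSize, values.size());
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) sum += values[i];
    partial[static_cast<std::size_t>(b)] = sum;
  }

  // Serial, ordered combination of the per-block results.
  double total = 0.0;
  for (double p : partial) total += p;
  return total;
}
```

      Floating-point addition is not associative, so a reduction whose grouping changes with the number of threads can give slightly different totals from run to run; fixing the grouping removes that source of variation.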
    • fix address alignment, required by cran (#3415) · f30dbe87
      Guolin Ke authored
      * fix dataset binary file alignment
      
      * many fixes
      
      * fix warnings
      
      * fix bug
      
      * Update file_io.cpp
      
      * Update file_io.cpp
      
      * simplify code
      
      * Apply suggestions from code review
      
      * general
      
      * remove unneeded alignment
      
      * Update file_io.h
      
      * int32 to byte8 alignment
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
  7. 29 Jul, 2020 2 commits
    • remove unnecessary semicolon (#3260) · 1cf13dba
      James Lamb authored
    • [R-package] make package installable with CRAN toolchain (fixes #2960) (#3188) · aa933eb4
      James Lamb authored
      
      
      * [R-package] make package installable with CRAN toolchain (fixes #2960)
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * remove GPU stuff
      
      * use wildcard to find objects to build
      
      * use -lomp
      
      * build configure before moving files
      
      * using wildcard for objects
      
      * Update .github/workflows/main.yml
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * add explicit objects back
      
      * reduce allowed R CMD check NOTEs and catch stderr from build-cran-package on Windows
      
      * fixing things
      
      * pin autoconf version
      
      * show diff
      
      * add automake back
      
      * run less checks
      
      * command was in the wrong place
      
      * fix autoconf version
      
      * change strategy for handling configure
      
      * fix Rbuildignore
      
      * fix NOTEs
      
      * fix notes about unrecognized files
      
      * fixing extra files
      
      * remove USE_R35
      
      * add OpenMP check for Mac CRAN build
      
      * run all checks
      
      * Apply suggestions from code review
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * suggestions from code review
      
      * undo indenting
      
      * remove 03 from Makevars.win.in
      
      * update language about OpenMP in configure script
      
      * checking if configure.ac check works
      
      * add autoconf back
      
      * remove testing code in configure.ac
      
      * more fixes for CI on configure script
      
      * print git diff
      
      * add VERSION.txt when checking configure
      
      * fix relative paths
      
      * remove git diff
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
  8. 19 Jul, 2020 1 commit
    • Change locking strategy of Booster, allow for share and unique locks (#2760) · 1c35c3b9
      Joan Fontanals authored
      
      
      * Add capability to get possible max and min values for a model
      
      * Change implementation to have return value in tree.cpp, change naming to upper and lower bound, move implementation to gbdt.cpp
      
      * Update include/LightGBM/c_api.h
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Change iteration to avoid potential overflow, add bindings to R and Python and a basic test
      
      * Adjust test values
      
      * Consider const correctness and multithreading protection
      
      * Put everything possible as const
      
      * Include shared_mutex, for now as unique_lock
      
      * Update test values
      
      * Put everything possible as const
      
      * Include shared_mutex, for now as unique_lock
      
      * Make PredictSingleRow const and share the lock with other reading threads
      
      * Update test values
      
      * Add test to check that model is exactly the same in all platforms
      
      * Try to parse the model to get the expected values
      
      * Try to parse the model to get the expected values
      
      * Fix implementation, num_leaves can be lower than the leaf_value_ size
      
      * Do not check for num_leaves to be smaller than actual size and get back to test with hardcoded value
      
      * Change test order
      
      * Add gpu_use_dp option in test
      
      * Remove helper test method
      
      * Remove TODO
      
      * Add preprocessing option to compile with c++17
      
      * Update python-package/setup.py
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Remove unwanted changes
      
      * Move option
      
      * Fix problems introduced by conflict fix
      
      * Avoid switching to c++17 and use yamc mutex library to access shared lock functionality
      
      * Add extra yamc include
      
      * Change header order
      
      * some lint fix
      
      * change include order and remove some extra blank lines
      
      * Further fix lint issues
      
      * Update c_api.cpp
      
      * Further fix lint issues
      
      * Move yamc include files to a new yamc folder
      
      * Use standard unique_lock
      
      * Update windows/LightGBM.vcxproj
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
      
      * Update windows/LightGBM.vcxproj.filters
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
      
      * Update windows/LightGBM.vcxproj.filters
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Update windows/LightGBM.vcxproj.filters
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Update windows/LightGBM.vcxproj.filters
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      
      * Fix problems coming from merge conflict resolution
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
      Co-authored-by: joanfontanals <jfontanals@ntent.com>
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
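
      The shared/unique locking idea named in this commit can be illustrated with a small reader-writer sketch. It assumes C++17 and uses `std::shared_mutex`; the commit itself avoided switching to C++17 and took its shared lock from the yamc library instead. The class `GuardedModel` and its members are hypothetical and only stand in for a booster-like object.

```cpp
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical stand-in for a booster: readers share the lock, writers hold
// it exclusively. Requires C++17 for std::shared_mutex.
class GuardedModel {
 public:
  // Read path: any number of threads may predict at the same time.
  double PredictSingleRow(double x) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    double sum = 0.0;
    for (double w : weights_) sum += w * x;
    return sum;
  }

  // Write path: waits until no reader or writer holds the lock.
  void AddWeight(double w) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    weights_.push_back(w);
  }

 private:
  mutable std::shared_mutex mutex_;  // mutable so const readers can lock it
  std::vector<double> weights_;
};
```

      Declaring the read path `const` and the mutex `mutable` mirrors the "const correctness and multithreading protection" steps in the list above: prediction does not modify the model, so it only needs the shared lock.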
  9. 08 Jul, 2020 1 commit
  10. 23 Jun, 2020 1 commit
    • Interaction constraints (#3126) · bca2da97
      Belinda Trotta authored
      * Add interaction constraints functionality.
      
      * Minor fixes.
      
      * Minor fixes.
      
      * Change lambda to function.
      
      * Fix gpu bug, remove extra blank lines.
      
      * Fix gpu bug.
      
      * Fix style issues.
      
      * Try to fix segfault on macOS.
      
      * Fix bug.
      
      * Fix bug.
      
      * Fix bugs.
      
      * Change parameter format for R.
      
      * Fix R style issues.
      
      * Change string formatting code.
      
      * Change docs to say R package not supported.
      
      * Remove R functionality, moving to separate PR.
      
      * Keep track of branch features in tree object.
      
      * Only track branch features when feature interactions are enabled.
      
      * Fix lint error.
      
      * Update docs and simplify tests.
  11. 05 Jun, 2020 2 commits
  12. 01 Jun, 2020 1 commit
  13. 20 May, 2020 1 commit
  14. 13 Apr, 2020 1 commit
  15. 12 Apr, 2020 1 commit
  16. 10 Apr, 2020 1 commit
  17. 08 Apr, 2020 1 commit
  18. 04 Apr, 2020 1 commit
  19. 24 Mar, 2020 1 commit
    • [R-package] Use Rprintf for logging in the R package (fixes #1440, fixes #1909) (#2901) · 0341906c
      James Lamb authored
      
      
      * [R-package] started cutting over from custom R-to-C interface to R.h
      
      * replaced LGBM_SE with SEXP
      
      * fixed error about conflicting definitions of length
      
      * got linking working
      
      * more stuff
      
      * eliminated R CMD CHECK note about printing
      
      * switched from hard-coded include dir to the one from FindLibR.cmake
      
      * cleaned up formatting in FindLibR.cmake
      
      * commented-out everything in CI that does not touch R
      
      * more changes
      
      * trying to get better logs
      
      * tried ignoring
      
      * added error message to confirm a suspicion
      
      * still trying to find R during R CMD CHECK
      
      * restore full CI
      
      * fixed comment
      
      * Update R-package/src/cmake/modules/FindLibR.cmake
      
      * changed strategy for finding LIBR_HOME on Windows
      
      * Removed 32-bit Windows stuff in FindLibR.cmake
      
      * Update R-package/src/cmake/modules/FindLibR.cmake
      
      * Update CMakeLists.txt
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Update CMakeLists.txt
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Update R-package/src/cmake/modules/FindLibR.cmake
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * removed some duplication in cmake scripts
      
      * Update R-package/src/cmake/modules/FindLibR.cmake
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Update R-package/src/cmake/modules/FindLibR.cmake
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Update R-package/src/cmake/modules/FindLibR.cmake
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Update R-package/src/cmake/modules/FindLibR.cmake
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Update R-package/src/cmake/modules/FindLibR.cmake
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * Update R-package/src/cmake/modules/FindLibR.cmake
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * added LIBR_CORE_LIBRARY back
      
      * small fixes to CMakeLists
      
      * simplified FindLibR.cmake
      
      * some fixes for windows
      
      * Apply suggestions from code review
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * allowed for directly passing LIBR_EXECUTABLE to FindLibR.cmake
      
      * reorganized FindLibR.cmake to catch more cases
      
      * clean up inconsistencies in R calls in FindLibR.cmake
      
      * Update R-package/src/cmake/modules/FindLibR.cmake
      Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
      
      * removed unnecessary log messages
      
      * removed unnecessary unset() call
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
  20. 23 Mar, 2020 1 commit
  21. 17 Mar, 2020 1 commit
  22. 11 Mar, 2020 1 commit
  23. 06 Mar, 2020 1 commit
  24. 05 Mar, 2020 1 commit
    • speed up `FindBestThresholdFromHistogram` (#2867) · 77d92b7c
      Guolin Ke authored
      * speed up for const hessian
      
      * rename template
      
      * some refactorings
      
      * refine
      
      * refine
      
      * simplify codes
      
      * fix random in feature histogram
      
      * code refine
      
      * refine
      
      * try fix
      
      * make gcc happy
      
      * remove timer
      
      * rollback some changes
      
      * more templates
      
      * fix a bug
      
      * reduce the cost of timer
      
      * fix gpu
      
      * fix bug
      
      * fix gpu
  25. 04 Mar, 2020 1 commit
  26. 02 Mar, 2020 2 commits
  27. 26 Feb, 2020 1 commit
  28. 25 Feb, 2020 1 commit
  29. 22 Feb, 2020 1 commit
    • some code refactoring (#2769) · 3e80df7e
      Guolin Ke authored
      * some refines
      
      * more omp refactoring
      
      * format define
      
      * fix merge bug
      
      * some fixes
      
      * fix some warnings
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      
      * remove dup codes
  30. 20 Feb, 2020 1 commit
    • added feature infos to JSON dump (#2660) · c4a7ab81
      Nikita Titov authored
      
      
      * added feature infos to JSON dump
      
      * slight json schema refactor
      
      * simplified code
      
      * refactor feature_infos
      
      * refactoring
      
      * Update src/boosting/gbdt.cpp
      
      * Update dataset.h
      
      * Update include/LightGBM/dataset.h
      
      * simplify
      
      * Apply suggestions from code review
      
      * parse string and construct JSON objs
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
  31. 08 Feb, 2020 1 commit
  32. 04 Feb, 2020 1 commit
  33. 02 Feb, 2020 1 commit
    • Support both row-wise and col-wise multi-threading (#2699) · 509c2e50
      Guolin Ke authored
      
      
      * commit
      
      * fix a bug
      
      * fix bug
      
      * reset to track changes
      
      * refine the auto choose logic
      
      * sort the time stats output
      
      * fix include
      
      * change multi_val_bin_sparse_threshold
      
      * add cmake
      
      * add _mm_malloc and _mm_free for cross platform
      
      * fix cmake bug
      
      * timer for split
      
      * try to fix cmake
      
      * fix tests
      
      * refactor DataPartition::Split
      
      * fix test
      
      * typo
      
      * formatting
      
      * Revert "formatting"
      
      This reverts commit 5b8de4f7fb9d975ee23701d276a66d40ee6d4222.
      
      * add document
      
      * [R-package] Added tests on use of force_col_wise and force_row_wise in training (#2719)
      
      * naming
      
      * fix gpu code
      
      * Update include/LightGBM/bin.h
      Co-Authored-By: James Lamb <jaylamb20@gmail.com>
      
      * Update src/treelearner/ocl/histogram16.cl
      
      * test: swap compilers for CI
      
      * fix omp
      
      * not avx2
      
      * no aligned for feature histogram
      
      * Revert "refactor DataPartition::Split"
      
      This reverts commit 256e6d9641ade966a1f54da1752e998a1149b6f8.
      
      * slightly refactor data partition
      
      * reduce the memory cost
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
  34. 15 Jan, 2020 1 commit
  35. 17 Dec, 2019 1 commit