1. 08 Dec, 2020 1 commit
    • Fix model locale issue and improve model R/W performance. (#3405) · 792c9303
      Alberto Ferreira authored
      * Fix LightGBM model locale sensitivity and improve R/W performance.
      
      When Java is used, the default C++ locale is broken. This is true for
      Java providers that use the C API or even Python models that require JEP.
      
      This patch solves that issue by making model reads/writes insensitive
      to such settings.
      To achieve this, within the model read/write codebase:
       - C++ streams are imbued with the classic locale
       - Calls to functions that are dependent on the locale are replaced
       - The default locale is not changed!
      
      This approach means:
       - The user's locale is never tampered with, avoiding issues such as
          https://github.com/microsoft/LightGBM/issues/2979 with the previous
          approach https://github.com/microsoft/LightGBM/pull/2891
       - Datasets can still be read according to the user's locale
       - The model file has a single format independent of locale
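
The idea above can be illustrated with a small Python analogue. This is only a sketch and not the code from the patch, which works on C++ streams: it assumes a model file is a space-separated list of doubles, and the helper names `write_values`, `read_values`, and `try_comma_locale` are hypothetical. It relies on the fact that Python's `repr()` and `float()` ignore `LC_NUMERIC`, so the serialized text stays the same even when a comma-decimal locale is active, and the caller's locale is restored afterwards.

```python
import locale


def write_values(values):
    # repr() always emits '.' as the decimal separator, whatever LC_NUMERIC is.
    return " ".join(repr(float(v)) for v in values)


def read_values(text):
    # float() parses the '.'-separated form regardless of the active locale.
    return [float(token) for token in text.split()]


def try_comma_locale():
    # Hypothetical helper: try to activate a locale that uses ',' as the
    # decimal separator. These names may not be installed, so failure is fine.
    for name in ("de_DE.UTF-8", "de_DE", "German_Germany.1252"):
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
            return name
        except locale.Error:
            continue
    return None


weights = [0.1, -2.5, 1e-3]
saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
active = try_comma_locale()
try:
    text = write_values(weights)
    restored = read_values(text)
finally:
    # Put the caller's locale back; the serializer never depends on it.
    locale.setlocale(locale.LC_NUMERIC, saved)
```

If no comma-decimal locale is installed, `active` is `None` and the round trip simply runs under the existing locale; the serialized text is the same in both cases.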
      
      Changes:
       - Add CommonC namespace which provides faster locale-independent versions of Common's methods
       - Model code makes conversions through CommonC
       - Cleanup unused Common methods
       - Performance improvements. Use fast libraries for locale-agnostic conversion:
         - value->string: https://github.com/fmtlib/fmt
         - string->double: https://github.com/lemire/fast_double_parser (10x
            faster double parsing according to their benchmark)
      
      Bugfixes:
       - https://github.com/microsoft/LightGBM/issues/2500
       - https://github.com/microsoft/LightGBM/issues/2890
       - https://github.com/ninia/jep/issues/205 (as it is related to LGBM as well)
      
      * Align CommonC namespace
      
      * Add new external_libs/ to python setup
      
      * Try fast_double_parser fix #1
      
      Testing commit e09e5aad828bcb16bea7ed0ed8322e019112fdbe
      
      If it works it should fix more LGBM builds
      
      * CMake: Attempt to link fmt without explicit PUBLIC tag
      
      * Exclude external_libs from linting
      
      * Add external_libs to MANIFEST.in
      
      * Set dynamic linking option for fmt.
      
      * linting issues
      
      * Try to fix lint includes
      
      * Try to pass fPIC with static fmt lib
      
      * Try CMake P_I_C option with fmt library
      
      * [R-package] Add CMake support for R and CRAN
      
      * Cleanup CMakeLists
      
      * Try fmt hack to remove stdout
      
      * Switch to header-only mode
      
      * Add PRIVATE argument to target_link_libraries
      
      * use fmt in header-only mode
      
      * Remove CMakeLists comment
      
      * Change OpenMP to PUBLIC linking in Mac
      
      * Update fmt submodule to 7.1.2
      
      * Use fmt in header-only-mode
      
      * Remove fmt from CMakeLists.txt
      
      * Upgrade fast_double_parser to v0.2.0
      
      * Revert "Add PRIVATE argument to target_link_libraries"
      
      This reverts commit 3dd45dde7b92531b2530ab54522bb843c56227a7.
      
      * Address James Lamb's comments
      
      * Update R-package/.Rbuildignore
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
      
      * Upgrade to fast_double_parser v0.3.0 - Solaris support
      
      * Use legacy code only in Solaris
      
      * Fix lint issues
      
      * Fix comment
      
      * Address StrikerRUS's comments (solaris ifdef).
      
      * Change header guards
      Co-authored-by: James Lamb <jaylamb20@gmail.com>
  2. 09 Apr, 2017 1 commit
    • Initial GPU acceleration support for LightGBM (#368) · 0bb4a825
      Huan Zhang authored
      * add dummy gpu solver code
      
      * initial GPU code
      
      * fix crash bug
      
      * first working version
      
      * use asynchronous copy
      
      * use a better kernel for root
      
      * parallel read histogram
      
      * sparse features now work, but without acceleration; they are computed on CPU
      
      * compute sparse feature on CPU simultaneously
      
      * fix big bug; add gpu selection; add kernel selection
      
      * better debugging
      
      * clean up
      
      * add feature scatter
      
      * Add sparse_threshold control
      
      * fix a bug in feature scatter
      
      * clean up debug
      
      * temporarily add OpenCL kernels for k=64,256
      
      * fix up CMakeList and definition USE_GPU
      
      * add OpenCL kernels as string literals
      
      * Add boost.compute as a submodule
      
      * add boost dependency into CMakeList
      
      * fix opencl pragma
      
      * use pinned memory for histogram
      
      * use pinned buffer for gradients and hessians
      
      * better debugging message
      
      * add double precision support on GPU
      
      * fix boost version in CMakeList
      
      * Add a README
      
      * reconstruct GPU initialization code for ResetTrainingData
      
      * move data to GPU in parallel
      
      * fix a bug during feature copy
      
      * update gpu kernels
      
      * update gpu code
      
      * initial port to LightGBM v2
      
      * speedup GPU data loading process
      
      * Add 4-bit bin support to GPU
      
      * re-add sparse_threshold parameter
      
      * remove kMaxNumWorkgroups and allow an unlimited number of features
      
      * add feature mask support for skipping unused features
      
      * enable kernel cache
      
      * use GPU kernels without feature masks when all features are used
      
      * REAdme.
      
      * REAdme.
      
      * update README
      
      * fix typos (#349)
      
      * change the default compiler to gcc on Apple
      
      * clean vscode related file
      
      * refine api of constructing from sampling data.
      
      * fix bug in the last commit.
      
      * more efficient algorithm to sample k from n.
      
      * fix bug in filter bin
      
      * change to boost from average output.
      
      * fix tests.
      
      * only stop training when all classes are finished in multi-class.
      
      * limit the max tree output. change hessian in multi-class objective.
      
      * robust tree model loading.
      
      * fix test.
      
      * convert the probabilities to raw score in boost_from_average of classification.
      
      * fix the average label for binary classification.
      
      * Add boost_from_average to docs (#354)
      
      * don't use "ConvertToRawScore" for self-defined objective function.
      
      * boost_from_average doesn't seem to work well in binary classification. remove it.
      
      * For a better jump link (#355)
      
      * Update Python-API.md
      
      * for a better jump in page
      
      A space is needed between `#` and the header's content according to GitHub's markdown format [guideline](https://guides.github.com/features/mastering-markdown/)
      
      After adding the spaces, we can jump to the exact position in the page by clicking the link.
      
      * fixed something mentioned by @wxchan
      
      * Update Python-API.md
      
      * add FitByExistingTree.
      
      * adapt GPU tree learner for FitByExistingTree
      
      * avoid NaN output.
      
      * update boost.compute
      
      * fix typos (#361)
      
      * fix broken links (#359)
      
      * update README
      
      * disable GPU acceleration by default
      
      * fix image url
      
      * cleanup debug macro
      
      * remove old README
      
      * do not save sparse_threshold_ in FeatureGroup
      
      * add details for new GPU settings
      
      * ignore submodule when doing pep8 check
      
      * allocate workspace for at least one thread during building Feature4
      
      * move sparse_threshold to class Dataset
      
      * remove duplicated code in GPUTreeLearner::Split
      
      * Remove duplicated code in FindBestThresholds and BeforeFindBestSplit
      
      * do not rebuild ordered gradients and hessians for sparse features
      
      * support feature groups in GPUTreeLearner
      
      * Initial parallel learners with GPU support
      
      * add option device, cleanup code
      
      * clean up FindBestThresholds; add some omp parallel
      
      * constant hessian optimization for GPU
      
      * Fix GPUTreeLearner crash when there are zero features
      
      * use np.testing.assert_almost_equal() to compare lists of floats in tests
      
      * travis for GPU