    Fix model locale issue and improve model R/W performance. (#3405) · 792c9303
    Alberto Ferreira authored
    * Fix LightGBM models locale sensitivity and improve R/W performance.
    
    When Java is used, the default C++ locale is broken. This is true for
    Java providers that use the C API or even Python models that require JEP.
    
    This patch solves that issue by making model reads/writes insensitive
    to such settings.
    To achieve it, within the model read/write codebase:
     - C++ streams are imbued with the classic locale
     - Calls to functions that are dependent on the locale are replaced
     - The default locale is not changed!
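
The stream-imbuing step can be illustrated with a minimal sketch. It is not taken from the patch: the function names and the `CommaDecimal` facet are hypothetical, and the facet only simulates a locale that uses ',' as the decimal separator so the example does not depend on which system locales are installed.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet that mimics a locale whose decimal separator is ','.
struct CommaDecimal : std::numpunct<char> {
 protected:
  char do_decimal_point() const override { return ','; }
};

// Writes a value with a stream that keeps its default locale, i.e. the
// global C++ locale in effect when the stream is constructed.
std::string WriteWithDefaultLocale(double value) {
  std::ostringstream out;
  out << value;
  return out.str();
}

// Writes a value with a stream imbued with the classic ("C") locale.
// Only this stream is affected; the global locale is left untouched.
std::string WriteWithClassicLocale(double value) {
  std::ostringstream out;
  out.imbue(std::locale::classic());
  out << value;
  return out.str();
}

// Reads a value back with a stream imbued with the classic locale, so
// '.' is always accepted as the decimal separator.
double ReadWithClassicLocale(const std::string& text) {
  std::istringstream in(text);
  in.imbue(std::locale::classic());
  double value = 0.0;
  in >> value;
  return value;
}
```

If the global locale is set to `std::locale(std::locale::classic(), new CommaDecimal)`, `WriteWithDefaultLocale(1.5)` produces a comma-separated number, while the two classic-locale helpers keep writing and reading `1.5` with a dot.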
    
    This approach means:
     - The user's locale is never tampered with, avoiding issues such as
        https://github.com/microsoft/LightGBM/issues/2979 with the previous
        approach https://github.com/microsoft/LightGBM/pull/2891
     - Datasets can still be read according to the user's locale
     - The model file has a single format independent of locale
    
    Changes:
     - Add CommonC namespace which provides faster locale-independent versions of Common's methods
     - Model code makes conversions through CommonC
     - Cleanup unused Common methods
     - Performance improvements. Use fast libraries for locale-agnostic conversion:
       - value->string: https://github.com/fmtlib/fmt
       - string->double: https://github.com/lemire/fast_double_parser (10x
          faster double parsing according to their benchmark)
    
    Bugfixes:
     - https://github.com/microsoft/LightGBM/issues/2500
     - https://github.com/microsoft/LightGBM/issues/2890
     - https://github.com/ninia/jep/issues/205
       (as it is related to LGBM as well)
    
    * Align CommonC namespace
    
    * Add new external_libs/ to python setup
    
    * Try fast_double_parser fix #1
    
    Testing commit e09e5aad828bcb16bea7ed0ed8322e019112fdbe
    
    If it works it should fix more LGBM builds
    
    * CMake: Attempt to link fmt without explicit PUBLIC tag
    
    * Exclude external_libs from linting
    
    * Add external_libs to MANIFEST.in
    
    * Set dynamic linking option for fmt.
    
    * linting issues
    
    * Try to fix lint includes
    
    * Try to pass fPIC with static fmt lib
    
    * Try CMake P_I_C option with fmt library
    
    * [R-package] Add CMake support for R and CRAN
    
    * Cleanup CMakeLists
    
    * Try fmt hack to remove stdout
    
    * Switch to header-only mode
    
    * Add PRIVATE argument to target_link_libraries
    
    * use fmt in header-only mode
    
    * Remove CMakeLists comment
    
    * Change OpenMP to PUBLIC linking in Mac
    
    * Update fmt submodule to 7.1.2
    
    * Use fmt in header-only-mode
    
    * Remove fmt from CMakeLists.txt
    
    * Upgrade fast_double_parser to v0.2.0
    
    * Revert "Add PRIVATE argument to target_link_libraries"
    
    This reverts commit 3dd45dde7b92531b2530ab54522bb843c56227a7.
    
    * Address James Lamb's comments
    
    * Update R-package/.Rbuildignore
    Co-authored-by: James Lamb <jaylamb20@gmail.com>
    
    * Upgrade to fast_double_parser v0.3.0 - Solaris support
    
    * Use legacy code only in Solaris
    
    * Fix lint issues
    
    * Fix comment
    
    * Address StrikerRUS's comments (solaris ifdef).
    
    * Change header guards
    Co-authored-by: James Lamb <jaylamb20@gmail.com>