1. 19 Feb, 2021 3 commits
    • Use high precision conversion from double to string in Tree::ToString() for... · 7f91dc66
      mjmckp authored
      
      Use high precision conversion from double to string in Tree::ToString() for new linear tree members (#3938)
      
      * Fix index out-of-range exception generated by BaggingHelper on small datasets.
      
      Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.
      
      * Update goss.hpp
      
      * Update goss.hpp
      
      * Add API method LGBM_BoosterPredictForMats, which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires the data as a contiguous array)
      
      * Fix incorrect upstream merge
      
      * Add link to LightGBM.NET
      
      * Fix indenting to 2 spaces
      
      * Dummy edit to trigger CI
      
      * Dummy edit to trigger CI
      
      * remove duplicate functions from merge
      
      * In Tree::ToString() method, print double values for linear tree models with high precision, so that the tree may be accurately reproduced elsewhere (LightGBM.Net in particular)
      
      * Need to use more precise StringToArray instead of StringToArrayFast when parsing double valued arrays for linear trees, to ensure models round-trip via string or file correctly.
      Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com>
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
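
The round-trip concern in the last two bullets can be illustrated with a minimal sketch. It is written in Python rather than the C++ of Tree::ToString(), and the helper names and the choice of `%g` as the low-precision formatter are assumptions made for illustration, not LightGBM's actual conversion routines. The only fact relied on is that 17 significant digits are enough to reproduce any IEEE 754 double exactly.

```python
def to_string_low_precision(x: float) -> str:
    # Hypothetical stand-in for a lossy conversion: "%g" keeps 6 significant digits.
    return "%g" % x


def to_string_high_precision(x: float) -> str:
    # 17 significant digits always round-trip an IEEE 754 double.
    return "%.17g" % x


# A coefficient that is not exactly representable in a few digits.
coeff = 0.1 + 0.2

# Parse each string back, as a model loader would.
low = float(to_string_low_precision(coeff))
high = float(to_string_high_precision(coeff))

print("low precision round-trips: ", low == coeff)
print("high precision round-trips:", high == coeff)
```

A linear tree stores real-valued coefficients, so a mismatch of this kind on either the writing or the parsing side would change predictions after a save and reload.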
    • [docs] Change some 'parallel learning' references to 'distributed learning' (#4000) · 7880b79f
      James Lamb authored
      * [docs] Change some 'parallel learning' references to 'distributed learning'
      
      * found a few more
      
      * one more reference
  2. 18 Feb, 2021 2 commits
  3. 17 Feb, 2021 2 commits
    • Fix for CreatePredictor function and VS2017 Debug build (#3937) · 5321fef6
      mjmckp authored
      
      
      * Fix index out-of-range exception generated by BaggingHelper on small datasets.
      
      Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.
      
      * Update goss.hpp
      
      * Update goss.hpp
      
      * Add API method LGBM_BoosterPredictForMats, which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires the data as a contiguous array)
      
      * Fix incorrect upstream merge
      
      * Add link to LightGBM.NET
      
      * Fix indenting to 2 spaces
      
      * Dummy edit to trigger CI
      
      * Dummy edit to trigger CI
      
      * remove duplicate functions from merge
      
      * Fix for CreatePredictor function: for VS2017 in Debug build, the previous version would end up giving an uninitialised prediction function that would throw access violation exceptions when invoked.
      Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com>
      Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
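
The out-of-range condition described in the first bullet of this commit message can be sketched as follows. This is a Python illustration, not the code in goss.hpp: the function name, the `top_rate` parameter, and the choice to return `None` for an empty input are all assumptions. It only shows why indexing at `top_k - 1` fails when the input count is zero, and one way to guard against it.

```python
def top_k_threshold(abs_gradients, top_rate):
    """Return the top_k-th largest value, or None when there is nothing to sample.

    Hypothetical helper: the names and the None return value are illustrative only.
    """
    cnt = len(abs_gradients)
    if cnt == 0:
        # Guard: without this early exit, the index below would be out of range.
        return None
    # Keep at least one element so that top_k - 1 is a valid index.
    top_k = max(1, int(cnt * top_rate))
    ordered = sorted(abs_gradients, reverse=True)
    return ordered[top_k - 1]


print(top_k_threshold([], 0.5))
print(top_k_threshold([0.5, 3.0, 1.0, 2.0], 0.5))
```

On small datasets a worker can receive an empty slice, which is when the unguarded version would fail.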
    • Optimize array-from-ctypes in basic.py (#3927) · de8c6105
      Alex Ford authored
      Approximately 80% of runtime when loading "low column count, high row
      count" DataFrames into Datasets is consumed in `np.fromiter`, called
      as part of the `Dataset.get_field` method.
      
      This is a particularly pernicious hotspot: unlike other ctypes-based
      methods, it is a hot loop over a Python iterator and causes
      significant GIL contention in multi-threaded applications.
      
      Replace `np.fromiter` with a direct call to `np.ctypeslib.as_array`,
      which allows a single-shot `copy` of the underlying array.
      
      This reduces the load time of a ~35 million row categorical dataframe
      with 1 column from ~5 seconds to ~1 second, and allows multi-threaded
      execution.
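
The difference between the two approaches can be sketched as below. This is a minimal illustration, not the code in basic.py: the buffer, its size, and the variable names are assumptions, and a ctypes array stands in for memory owned by the C library. Both `np.fromiter` and `np.ctypeslib.as_array` are existing NumPy functions.

```python
import ctypes

import numpy as np

n = 1000
# Stand-in for a buffer owned by a C library: a ctypes float array and a pointer to it.
buf = (ctypes.c_float * n)(*range(n))
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_float))

# Element by element: every value is pulled through a Python-level loop.
slow = np.fromiter((ptr[i] for i in range(n)), dtype=np.float32, count=n)

# Single shot: view the memory as an array, then copy it so the result
# no longer depends on the lifetime of the underlying buffer.
fast = np.ctypeslib.as_array(ptr, shape=(n,)).copy()

print(np.array_equal(slow, fast))
```

The `.copy()` matters in this sketch because `as_array` returns a view onto the pointed-to memory, which would become invalid if the buffer were freed.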
  4. 16 Feb, 2021 11 commits
  5. 15 Feb, 2021 17 commits
  6. 14 Feb, 2021 4 commits
  7. 13 Feb, 2021 1 commit