Unverified Commit 605c97b5 authored by mjmckp's avatar mjmckp Committed by GitHub
Browse files

Fix evaluation of linear trees with a single leaf. (#3987)



* Fix index out-of-range exception generated by BaggingHelper on small datasets.

Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.

* Update goss.hpp

* Update goss.hpp

* Add API method LGBM_BoosterPredictForMats which runs prediction on a data set given as an array of pointers to rows (as opposed to the existing method LGBM_BoosterPredictForMat, which requires data given as a contiguous array)

* Fix incorrect upstream merge

* Add link to LightGBM.NET

* Fix indenting to 2 spaces

* Dummy edit to trigger CI

* Dummy edit to trigger CI

* remove duplicate functions from merge

* Fix evaluation of linear trees with a single leaf.

Note that trees without linear models at the leaf always handle num_leaves = 1 as a special case and directly output the leaf value.  Linear trees were missing this special case handling, and hence would have the following issues:
 * Calling Tree::Predict or Tree::PredictByMap would cause an access violation exception attempting to access the first value of the empty split_feature_ array in GetLeaf.
 * PredictionFunLinear would either cause an access violation or go into an infinite loop when attempting to do the equivalent of GetLeaf.

Note also that PredictionFun does not need the same changes as PredictionFunLinear, since both are only called by Tree::AddPredictionToScore, which has a special case for (!is_linear_ && num_leaves_ <= 1) that precludes calling PredictionFun.
Co-authored-by: matthew-peacock <matthew.peacock@whiteoakam.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
parent b1d382ee
......@@ -575,7 +575,7 @@ inline void Tree::Split(int leaf, int feature, int real_feature,
inline double Tree::Predict(const double* feature_values) const {
if (is_linear_) {
int leaf = GetLeaf(feature_values);
int leaf = (num_leaves_ > 1) ? GetLeaf(feature_values) : 0;
double output = leaf_const_[leaf];
bool nan_found = false;
for (size_t i = 0; i < leaf_features_[leaf].size(); ++i) {
......@@ -605,7 +605,7 @@ inline double Tree::Predict(const double* feature_values) const {
inline double Tree::PredictByMap(const std::unordered_map<int, double>& feature_values) const {
if (is_linear_) {
int leaf = GetLeafByMap(feature_values);
int leaf = (num_leaves_ > 1) ? GetLeafByMap(feature_values) : 0;
double output = leaf_const_[leaf];
bool nan_found = false;
for (size_t i = 0; i < leaf_features_[leaf].size(); ++i) {
......
......@@ -120,15 +120,18 @@ int Tree::SplitCategorical(int leaf, int feature, int real_feature, const uint32
} \
for (data_size_t i = start; i < end; ++i) { \
int node = 0; \
while (node >= 0) { \
node = decision_fun(iter[(iter_idx)]->Get((data_idx)), node, \
default_bins[node], max_bins[node]); \
if (num_leaves_ > 1) { \
while (node >= 0) { \
node = decision_fun(iter[(iter_idx)]->Get((data_idx)), node, \
default_bins[node], max_bins[node]); \
} \
node = ~node; \
} \
double add_score = leaf_const_[~node]; \
double add_score = leaf_const_[node]; \
bool nan_found = false; \
const double* coeff_ptr = leaf_coeff_[~node].data(); \
const float** data_ptr = feat_ptr[~node].data(); \
for (size_t j = 0; j < leaf_features_inner_[~node].size(); ++j) { \
const double* coeff_ptr = leaf_coeff_[node].data(); \
const float** data_ptr = feat_ptr[node].data(); \
for (size_t j = 0; j < leaf_features_inner_[node].size(); ++j) { \
float feat_val = data_ptr[j][(data_idx)]; \
if (std::isnan(feat_val)) { \
nan_found = true; \
......@@ -137,7 +140,7 @@ int Tree::SplitCategorical(int leaf, int feature, int real_feature, const uint32
add_score += coeff_ptr[j] * feat_val; \
} \
if (nan_found) { \
score[(data_idx)] += leaf_value_[~node]; \
score[(data_idx)] += leaf_value_[node]; \
} else { \
score[(data_idx)] += add_score; \
} \
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or sign in to comment