[R-package] [docs] clarify shape of predictions (#5384)

* [R-package] [docs] clarify shape of predictions
* Apply suggestions from code review
* regenerate docs
* apply suggestions from code review
* fix linting error about long lines

Co-authored-by: Michael Mayer <mayermichael79@gmail.com>

Commit 212d1457, authored Jul 29, 2022 by James Lamb; committed by GitHub Jul 29, 2022

Showing with 36 additions and 14 deletions

R-package/R/lgb.Booster.R +18 -7

R-package/man/predict.lgb.Booster.Rd +18 -7

--- a/R-package/R/lgb.Booster.R
+++ b/R-package/R/lgb.Booster.R
@@ -767,9 +767,7 @@ Booster <- R6::R6Class(
 #'             \item \code{"leaf"}: will output the index of the terminal node / leaf at which each observations falls
 #'                   in each tree in the model, outputted as integers, with one column per tree.
 #'             \item \code{"contrib"}: will return the per-feature contributions for each prediction, including an
-#'                   intercept (each feature will produce one column). If there are multiple classes, each class will
-#'                   have separate feature contributions (thus the number of columns is features+1 multiplied by the
-#'                   number of classes).
+#'                   intercept (each feature will produce one column).
 #'             }
 #'
 #'             Note that, if using custom objectives, types "class" and "response" will not be available and will
@@ -790,12 +788,25 @@ Booster <- R6::R6Class(
 #'               the values in \code{params} take precedence.
 #' @param ... ignored
 #' @return For prediction types that are meant to always return one output per observation (e.g. when predicting
-#'         \code{type="response"} on a binary classification or regression objective), will return a vector with one
-#'         element per row in \code{newdata}.
+#'         \code{type="response"} or \code{type="raw"} on a binary classification or regression objective), will
+#'         return a vector with one element per row in \code{newdata}.
 #'
 #'         For prediction types that are meant to return more than one output per observation (e.g. when predicting
-#'         \code{type="response"} on a multi-class objective, or when predicting \code{type="leaf"}, regardless of
-#'         objective), will return a matrix with one row per observation in \code{newdata} and one column per output.
+#'         \code{type="response"} or \code{type="raw"} on a multi-class objective, or when predicting
+#'         \code{type="leaf"}, regardless of objective), will return a matrix with one row per observation in
+#'         \code{newdata} and one column per output.
+#'
+#'         For \code{type="leaf"} predictions, will return a matrix with one row per observation in \code{newdata}
+#'         and one column per tree. Note that for multiclass objectives, LightGBM trains one tree per class at each
+#'         boosting iteration. That means that, for example, for a multiclass model with 3 classes, the leaf
+#'         predictions for the first class can be found in columns 1, 4, 7, 10, etc.
+#'
+#'         For \code{type="contrib"}, will return a matrix of SHAP values with one row per observation in
+#'         \code{newdata} and columns corresponding to features. For regression, ranking, cross-entropy, and binary
+#'         classification objectives, this matrix contains one column per feature plus a final column containing the
+#'         Shapley base value. For multiclass objectives, this matrix will represent \code{num_classes} such matrices,
+#'         in the order "feature contributions for first class, feature contributions for second class, feature
+#'         contributions for third class, etc.".
 #'
 #' @examples
 #' \donttest{
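
The column layout described for `type="leaf"` on multiclass models can be illustrated with a small base-R sketch. It does not call LightGBM; the helper name `leaf_columns_for_class` and the simulated matrix are hypothetical, and the sketch assumes only the interleaved ordering stated in the documentation above (one tree per class at each boosting iteration).

```r
# Hypothetical helper (not part of the lightgbm package): given the total
# number of trees in the model, return the column indices of the leaf-prediction
# matrix that belong to one class, assuming trees are interleaved by class.
leaf_columns_for_class <- function(class_index, num_classes, num_trees) {
  seq(from = class_index, to = num_trees, by = num_classes)
}

# Simulated stand-in for predict(model, newdata, type = "leaf"):
# 5 observations, 3 classes, 4 boosting iterations -> 12 trees / columns.
leaf_preds <- matrix(seq_len(5 * 12), nrow = 5, ncol = 12)

# Columns holding the trees for the first class.
first_class_cols <- leaf_columns_for_class(1L, num_classes = 3L, num_trees = 12L)
first_class_leaves <- leaf_preds[, first_class_cols, drop = FALSE]
```

With 3 classes and 12 trees, `first_class_cols` is 1, 4, 7, 10, which matches the example in the documentation text.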

--- a/R-package/man/predict.lgb.Booster.Rd
+++ b/R-package/man/predict.lgb.Booster.Rd
@@ -34,9 +34,7 @@ a character representing a path to a text file (CSV, TSV, or LibSVM)}
            \item \code{"leaf"}: will output the index of the terminal node / leaf at which each observations falls
                  in each tree in the model, outputted as integers, with one column per tree.
            \item \code{"contrib"}: will return the per-feature contributions for each prediction, including an
-                  intercept (each feature will produce one column). If there are multiple classes, each class will
-                  have separate feature contributions (thus the number of columns is features+1 multiplied by the
-                  number of classes).
+                  intercept (each feature will produce one column).
            }
            Note that, if using custom objectives, types "class" and "response" will not be available and will
@@ -64,12 +62,25 @@ the values in \code{params} take precedence.}
 }
 \value{
 For prediction types that are meant to always return one output per observation (e.g. when predicting
-        \code{type="response"} on a binary classification or regression objective), will return a vector with one
-        element per row in \code{newdata}.
+        \code{type="response"} or \code{type="raw"} on a binary classification or regression objective), will
+        return a vector with one element per row in \code{newdata}.
        For prediction types that are meant to return more than one output per observation (e.g. when predicting
-        \code{type="response"} on a multi-class objective, or when predicting \code{type="leaf"}, regardless of
-        objective), will return a matrix with one row per observation in \code{newdata} and one column per output.
+        \code{type="response"} or \code{type="raw"} on a multi-class objective, or when predicting
+        \code{type="leaf"}, regardless of objective), will return a matrix with one row per observation in
+        \code{newdata} and one column per output.
+        For \code{type="leaf"} predictions, will return a matrix with one row per observation in \code{newdata}
+        and one column per tree. Note that for multiclass objectives, LightGBM trains one tree per class at each
+        boosting iteration. That means that, for example, for a multiclass model with 3 classes, the leaf
+        predictions for the first class can be found in columns 1, 4, 7, 10, etc.
+        For \code{type="contrib"}, will return a matrix of SHAP values with one row per observation in
+        \code{newdata} and columns corresponding to features. For regression, ranking, cross-entropy, and binary
+        classification objectives, this matrix contains one column per feature plus a final column containing the
+        Shapley base value. For multiclass objectives, this matrix will represent \code{num_classes} such matrices,
+        in the order "feature contributions for first class, feature contributions for second class, feature
+        contributions for third class, etc.".
 }
 \description{
 Predicted values based on class \code{lgb.Booster}
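
The multiclass layout described for `type="contrib"` (blocks of per-class contributions placed side by side) can be unpacked with a short base-R sketch. It does not call LightGBM; the helper name `split_contrib_by_class` and the simulated matrix are hypothetical, and the sketch assumes each class block has the same width (one column per feature plus a final base-value column), as the documentation text suggests.

```r
# Hypothetical helper (not part of the lightgbm package): split a wide
# contribution matrix into a list with one matrix per class, assuming the
# columns are ordered "all columns for class 1, then class 2, ...".
split_contrib_by_class <- function(contrib, num_classes) {
  stopifnot(ncol(contrib) %% num_classes == 0L)
  block_width <- ncol(contrib) %/% num_classes
  lapply(seq_len(num_classes), function(k) {
    cols <- seq.int(from = (k - 1L) * block_width + 1L, length.out = block_width)
    contrib[, cols, drop = FALSE]
  })
}

# Simulated stand-in for predict(model, newdata, type = "contrib"):
# 2 observations, 3 classes, 2 features + 1 base-value column per class.
contrib <- matrix(seq_len(2 * 9), nrow = 2, ncol = 9)
blocks <- split_contrib_by_class(contrib, num_classes = 3L)

# Under the assumed layout, the last column of each block is that class's base value.
base_values_class_2 <- blocks[[2]][, ncol(blocks[[2]])]
```

Each element of `blocks` then has the same shape as the single-output case: one row per observation and one column per feature plus the base value.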