Commit 212d1457 authored by James Lamb, committed by GitHub

[R-package] [docs] clarify shape of predictions (#5384)



* [R-package] [docs] clarify shape of predictions

* Apply suggestions from code review
Co-authored-by: Michael Mayer <mayermichael79@gmail.com>

* regenerate docs

* apply suggestions from code review

* fix linting error about long lines
Co-authored-by: Michael Mayer <mayermichael79@gmail.com>
parent 44d37184
@@ -767,9 +767,7 @@ Booster <- R6::R6Class(
 #' \item \code{"leaf"}: will output the index of the terminal node / leaf at which each observations falls
 #' in each tree in the model, outputted as integers, with one column per tree.
 #' \item \code{"contrib"}: will return the per-feature contributions for each prediction, including an
-#' intercept (each feature will produce one column). If there are multiple classes, each class will
-#' have separate feature contributions (thus the number of columns is features+1 multiplied by the
-#' number of classes).
+#' intercept (each feature will produce one column).
 #' }
 #'
 #' Note that, if using custom objectives, types "class" and "response" will not be available and will
@@ -790,12 +788,25 @@ Booster <- R6::R6Class(
 #' the values in \code{params} take precedence.
 #' @param ... ignored
 #' @return For prediction types that are meant to always return one output per observation (e.g. when predicting
-#' \code{type="response"} on a binary classification or regression objective), will return a vector with one
-#' element per row in \code{newdata}.
+#' \code{type="response"} or \code{type="raw"} on a binary classification or regression objective), will
+#' return a vector with one element per row in \code{newdata}.
 #'
 #' For prediction types that are meant to return more than one output per observation (e.g. when predicting
-#' \code{type="response"} on a multi-class objective, or when predicting \code{type="leaf"}, regardless of
-#' objective), will return a matrix with one row per observation in \code{newdata} and one column per output.
+#' \code{type="response"} or \code{type="raw"} on a multi-class objective, or when predicting
+#' \code{type="leaf"}, regardless of objective), will return a matrix with one row per observation in
+#' \code{newdata} and one column per output.
+#'
+#' For \code{type="leaf"} predictions, will return a matrix with one row per observation in \code{newdata}
+#' and one column per tree. Note that for multiclass objectives, LightGBM trains one tree per class at each
+#' boosting iteration. That means that, for example, for a multiclass model with 3 classes, the leaf
+#' predictions for the first class can be found in columns 1, 4, 7, 10, etc.
+#'
+#' For \code{type="contrib"}, will return a matrix of SHAP values with one row per observation in
+#' \code{newdata} and columns corresponding to features. For regression, ranking, cross-entropy, and binary
+#' classification objectives, this matrix contains one column per feature plus a final column containing the
+#' Shapley base value. For multiclass objectives, this matrix will represent \code{num_classes} such matrices,
+#' in the order "feature contributions for first class, feature contributions for second class, feature
+#' contributions for third class, etc.".
 #'
 #' @examples
 #' \donttest{
......
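The column layout described for \code{type="leaf"} on multiclass models can be sketched with plain index arithmetic. This is a minimal illustration in base R, not code from the commit: the sizes, the placeholder matrix, and the helper leaf_columns_for_class() are all hypothetical, and no LightGBM call is made.

```r
# Hypothetical sizes, chosen only for illustration.
num_class <- 3L
num_iterations <- 4L
num_trees <- num_class * num_iterations  # one tree per class per boosting iteration

# Stand-in for the matrix returned by predict(..., type = "leaf"):
# 5 observations, one column per tree. Values are placeholders, not real leaf indices.
leaf_preds <- matrix(
    seq_len(5L * num_trees)
    , nrow = 5L
    , ncol = num_trees
)

# Hypothetical helper: columns holding the trees for one class (1-based class index).
# Trees are interleaved by class, so the stride between columns is num_class.
leaf_columns_for_class <- function(class_idx, num_class, num_trees) {
    seq.int(from = class_idx, to = num_trees, by = num_class)
}

first_class_cols <- leaf_columns_for_class(1L, num_class, num_trees)
print(first_class_cols)  # 1 4 7 10

# Leaf predictions for the first class only.
first_class_leaves <- leaf_preds[, first_class_cols, drop = FALSE]
```

With 3 classes, the same helper called with class_idx = 2L would select columns 2, 5, 8, 11.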
@@ -34,9 +34,7 @@ a character representing a path to a text file (CSV, TSV, or LibSVM)}
 \item \code{"leaf"}: will output the index of the terminal node / leaf at which each observations falls
 in each tree in the model, outputted as integers, with one column per tree.
 \item \code{"contrib"}: will return the per-feature contributions for each prediction, including an
-intercept (each feature will produce one column). If there are multiple classes, each class will
-have separate feature contributions (thus the number of columns is features+1 multiplied by the
-number of classes).
+intercept (each feature will produce one column).
 }
 
 Note that, if using custom objectives, types "class" and "response" will not be available and will
@@ -64,12 +62,25 @@ the values in \code{params} take precedence.}
 }
 \value{
 For prediction types that are meant to always return one output per observation (e.g. when predicting
-\code{type="response"} on a binary classification or regression objective), will return a vector with one
-element per row in \code{newdata}.
+\code{type="response"} or \code{type="raw"} on a binary classification or regression objective), will
+return a vector with one element per row in \code{newdata}.
 
 For prediction types that are meant to return more than one output per observation (e.g. when predicting
-\code{type="response"} on a multi-class objective, or when predicting \code{type="leaf"}, regardless of
-objective), will return a matrix with one row per observation in \code{newdata} and one column per output.
+\code{type="response"} or \code{type="raw"} on a multi-class objective, or when predicting
+\code{type="leaf"}, regardless of objective), will return a matrix with one row per observation in
+\code{newdata} and one column per output.
+
+For \code{type="leaf"} predictions, will return a matrix with one row per observation in \code{newdata}
+and one column per tree. Note that for multiclass objectives, LightGBM trains one tree per class at each
+boosting iteration. That means that, for example, for a multiclass model with 3 classes, the leaf
+predictions for the first class can be found in columns 1, 4, 7, 10, etc.
+
+For \code{type="contrib"}, will return a matrix of SHAP values with one row per observation in
+\code{newdata} and columns corresponding to features. For regression, ranking, cross-entropy, and binary
+classification objectives, this matrix contains one column per feature plus a final column containing the
+Shapley base value. For multiclass objectives, this matrix will represent \code{num_classes} such matrices,
+in the order "feature contributions for first class, feature contributions for second class, feature
+contributions for third class, etc.".
 }
 \description{
 Predicted values based on class \code{lgb.Booster}
......
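The multiclass \code{type="contrib"} layout described in the new \value{} text can be sketched the same way. The base R snippet below is an illustration only: the sizes, the zero-filled stand-in matrix, and the helper contrib_columns_for_class() are hypothetical, and it assumes each class block is "one column per feature plus a final base value column", as the documentation text states.

```r
# Hypothetical sizes, chosen only for illustration.
num_class <- 3L
num_features <- 4L
block_width <- num_features + 1L  # feature columns plus one base value column

# Stand-in for the matrix returned by predict(..., type = "contrib") on a
# multiclass model: 2 observations, num_class blocks laid out side by side.
contrib <- matrix(0, nrow = 2L, ncol = num_class * block_width)

# Hypothetical helper: columns belonging to one class (1-based class index).
contrib_columns_for_class <- function(class_idx, num_features) {
    width <- num_features + 1L
    start <- (class_idx - 1L) * width + 1L
    seq.int(from = start, length.out = width)
}

second_class_cols <- contrib_columns_for_class(2L, num_features)
print(second_class_cols)  # 6 7 8 9 10

# Contributions for the second class; the last selected column is its base value.
second_class_contrib <- contrib[, second_class_cols, drop = FALSE]
base_value_col <- second_class_cols[block_width]
```

Under these assumptions the full matrix has (num_features + 1) * num_class columns, which matches the column count given in the wording this commit removed.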