% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lgb.Booster.R
\name{predict.lgb.Booster}
\alias{predict.lgb.Booster}
\title{Predict method for LightGBM model}
\usage{
\method{predict}{lgb.Booster}(
  object,
  newdata,
  type = "response",
  start_iteration = NULL,
  num_iteration = NULL,
  header = FALSE,
  params = list(),
  ...
)
}
\arguments{
\item{object}{Object of class \code{lgb.Booster}}

\item{newdata}{a \code{matrix} object, a \code{dgCMatrix}, a \code{dgRMatrix} object, a \code{dsparseVector} object,
               or a character representing a path to a text file (CSV, TSV, or LibSVM).

               For sparse inputs, if predictions are only going to be made for a single row, it will be faster to
               use CSR format, in which case the data may be passed as either a single-row CSR matrix (class
               \code{dgRMatrix} from package \code{Matrix}) or as a sparse numeric vector (class
               \code{dsparseVector} from package \code{Matrix}).

               If single-row predictions are going to be performed frequently, it is recommended to
               pre-configure the model object for fast single-row sparse predictions through function
               \link{lgb.configure_fast_predict}.

               \emph{Changed from 'data', in version 4.0.0}}

\item{type}{Type of prediction to output. Allowed types are:\itemize{
            \item \code{"response"}: will output the predicted score according to the objective function being
                  optimized (depending on the link function that the objective uses), after applying any necessary
                  transformations - for example, for \code{objective="binary"}, it will output class probabilities.
            \item \code{"class"}: for classification objectives, will output the class with the highest predicted
                  probability. For other objectives, will output the same as "response". Note that \code{"class"} is
                  not a supported type for \link{lgb.configure_fast_predict} (see the documentation of that function
                  for more details).
            \item \code{"raw"}: will output the non-transformed numbers (sum of predictions from boosting iterations'
                  results) from which the "response" number is produced for a given objective function - for example,
                  for \code{objective="binary"}, this corresponds to log-odds. For many objectives such as
                  "regression", since no transformation is applied, the output will be the same as for "response".
            \item \code{"leaf"}: will output the index of the terminal node / leaf at which each observation falls
                  in each tree in the model, returned as integers, with one column per tree.
            \item \code{"contrib"}: will return the per-feature contributions for each prediction, including an
                  intercept (each feature will produce one column).
            }

            Note that, if using custom objectives, types "class" and "response" will not be available and will
            default to "raw" instead.

            If the model was fit through function \link{lightgbm} and it was passed a factor as labels,
            passing the prediction type through \code{params} instead of through this argument might
            result in factor levels for classification objectives not being applied correctly to the
            resulting output.

            \emph{New in version 4.0.0}}

\item{start_iteration}{int or \code{NULL}, optional (default=\code{NULL})
Start index of the iteration to predict.
If \code{NULL} or <= 0, starts from the first iteration.}

\item{num_iteration}{int or \code{NULL}, optional (default=\code{NULL})
Limit number of iterations in the prediction.
If \code{NULL}, if the best iteration exists and start_iteration is \code{NULL} or <= 0, the
best iteration is used; otherwise, all iterations from start_iteration are used.
If <= 0, all iterations from start_iteration are used (no limits).}

\item{header}{only used when predicting from a text file; pass \code{TRUE} if the text file has a header row}

\item{params}{a list of additional named parameters. See
\href{https://lightgbm.readthedocs.io/en/latest/Parameters.html#predict-parameters}{
the "Predict Parameters" section of the documentation} for a list of parameters and
valid values. Where these conflict with the values of keyword arguments to this function,
the values in \code{params} take precedence.}

\item{...}{ignored}
}
\value{
For prediction types that are meant to always return one output per observation (e.g. when predicting
        \code{type="response"} or \code{type="raw"} on a binary classification or regression objective), will
        return a vector with one element per row in \code{newdata}.

        For prediction types that are meant to return more than one output per observation (e.g. when predicting
        \code{type="response"} or \code{type="raw"} on a multi-class objective, or when predicting
        \code{type="leaf"}, regardless of objective), will return a matrix with one row per observation in
        \code{newdata} and one column per output.

        For \code{type="leaf"} predictions, will return a matrix with one row per observation in \code{newdata}
        and one column per tree. Note that for multiclass objectives, LightGBM trains one tree per class at each
        boosting iteration. That means that, for example, for a multiclass model with 3 classes, the leaf
        predictions for the first class can be found in columns 1, 4, 7, 10, etc.

        For \code{type="contrib"}, will return a matrix of SHAP values with one row per observation in
        \code{newdata} and columns corresponding to features. For regression, ranking, cross-entropy, and binary
        classification objectives, this matrix contains one column per feature plus a final column containing the
        Shapley base value. For multiclass objectives, this matrix will represent \code{num_classes} such matrices,
        in the order "feature contributions for first class, feature contributions for second class, feature
        contributions for third class, etc.".

        If the model was fit through function \link{lightgbm} and it was passed a factor as labels, predictions
        returned from this function will retain the factor levels (either as values for \code{type="class"}, or
        as column names for \code{type="response"} and \code{type="raw"} for multi-class objectives). Note that
        passing the requested prediction type under \code{params} instead of through \code{type} might result in
        the factor levels not being present in the output.
}
\description{
Predicted values based on class \code{lgb.Booster}

             \emph{New in version 4.0.0}
}
\details{
If the model object has been configured for fast single-row predictions through
         \link{lgb.configure_fast_predict}, this function will use the prediction parameters
         that were configured for it - as such, extra prediction parameters should not be passed
         here, otherwise the configuration will be ignored and the slow route will be taken.
}
\examples{
\donttest{
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)
params <- list(
  objective = "regression"
  , metric = "l2"
  , min_data = 1L
  , learning_rate = 1.0
)
valids <- list(test = dtest)
model <- lgb.train(
  params = params
  , data = dtrain
  , nrounds = 5L
  , valids = valids
)
preds <- predict(model, test$data)
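
# Prediction types other than the default "response", as described under the
# 'type' argument - an illustrative sketch reusing the model and data above:
preds_raw <- predict(model, test$data, type = "raw")   # untransformed scores
preds_leaf <- predict(model, test$data, type = "leaf") # leaf indices, one column per tree
preds_contrib <- predict(model, test$data, type = "contrib") # per-feature contributions plus intercept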

# pass other prediction parameters
preds <- predict(
    model,
    test$data,
    params = list(
        predict_disable_shape_check = TRUE
    )
)
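
# Restrict which boosting iterations contribute to the prediction, as
# described for 'start_iteration' / 'num_iteration' - an illustrative sketch:
preds_early <- predict(model, test$data, start_iteration = 0L, num_iteration = 3L)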
}
}