% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lightgbm.R
\name{lightgbm}
\alias{lightgbm}
\title{Train a LightGBM model}
\usage{
lightgbm(
  data,
  label = NULL,
  weights = NULL,
  params = list(),
  nrounds = 100L,
  verbose = 1L,
  eval_freq = 1L,
  early_stopping_rounds = NULL,
  init_model = NULL,
  callbacks = list(),
  serializable = TRUE,
  objective = "auto",
  init_score = NULL,
  num_threads = NULL,
  colnames = NULL,
  categorical_feature = NULL,
  ...
)
}
\arguments{
\item{data}{a \code{lgb.Dataset} object, used for training. Some functions, such as \code{\link{lgb.cv}},
may allow you to pass other types of data like \code{matrix} and then separately supply
\code{label} as a keyword argument.}

\item{label}{Vector of labels, used if \code{data} is not an \code{\link{lgb.Dataset}}}

\item{weights}{Sample / observation weights for rows in the input data. If \code{NULL}, will assume that all
               observations / rows have the same importance / weight.

               \emph{Changed from 'weight', in version 4.0.0}}

\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
the "Parameters" section of the documentation} for a list of parameters and valid values.}

\item{nrounds}{number of training rounds}

\item{verbose}{verbosity for output. If <= 0 and \code{valids} has been provided, this will also disable
the printing of evaluation results during training.}

\item{eval_freq}{evaluation output frequency, only effective when verbose > 0 and \code{valids} has been provided}

\item{early_stopping_rounds}{int. Activates early stopping. When this parameter is non-null,
training will stop if the evaluation of any metric on any validation set
fails to improve for \code{early_stopping_rounds} consecutive boosting rounds.
If training stops early, the returned model will have attribute \code{best_iter}
set to the iteration number of the best iteration.}

\item{init_model}{path of model file or \code{lgb.Booster} object, will continue training from this model}

\item{callbacks}{List of callback functions that are applied at each iteration.}

\item{serializable}{whether to make the resulting objects serializable through functions such as
\code{save} or \code{saveRDS} (see section "Model serialization").}

\item{objective}{Optimization objective (e.g. \code{"regression"}, \code{"binary"}, etc.).

                 For a list of accepted objectives, see
                 \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html#objective}{
                 the "objective" item of the "Parameters" section of the documentation}.

                 If passing \code{"auto"} and \code{data} is not of type \code{lgb.Dataset}, the objective will
                 be determined according to what is passed for \code{label}:\itemize{
                 \item If passing a factor with two levels, will use objective \code{"binary"}.
                 \item If passing a factor with more than two levels, will use objective \code{"multiclass"}
                 (note that parameter \code{num_class} in this case will also be determined automatically from
                 \code{label}).
                 \item Otherwise (or if passing \code{lgb.Dataset} as input), will use objective \code{"regression"}.
                 }

                 \emph{New in version 4.0.0}}

\item{init_score}{initial score is the base prediction lightgbm will boost from

                  \emph{New in version 4.0.0}}

\item{num_threads}{Number of parallel threads to use. For best speed, this should be set to the number of
                   physical cores in the CPU - on a typical x86-64 machine, this corresponds to half the
                   maximum number of threads.

                   Be aware that using too many threads can result in speed degradation in smaller datasets
                   (see the parameters documentation for more details).

                   If passing zero, will use the default number of threads configured for OpenMP
                   (typically controlled through an environment variable \code{OMP_NUM_THREADS}).

                   If passing \code{NULL} (the default), will try to use the number of physical cores in the
                   system, but be aware that getting the number of cores detected correctly requires package
                   \code{RhpcBLASctl} to be installed.

                   This parameter gets overridden by \code{num_threads} and its aliases under \code{params}
                   if passed there.

                   \emph{New in version 4.0.0}}

\item{colnames}{Character vector of feature names. Only used if \code{data} is not an \code{\link{lgb.Dataset}}.}

\item{categorical_feature}{categorical features. This can either be a character vector of feature
names or an integer vector with the indices of the features (e.g.
\code{c(1L, 10L)} to say "the first and tenth columns").
Only used if \code{data} is not an \code{\link{lgb.Dataset}}.}

\item{...}{Additional arguments passed to \code{\link{lgb.train}}. For example
\itemize{
   \item{\code{valids}: a list of \code{lgb.Dataset} objects, used for validation}
   \item{\code{obj}: objective function, can be character or custom objective function. Examples include
              \code{regression}, \code{regression_l1}, \code{huber},
               \code{binary}, \code{lambdarank}, \code{multiclass}}
   \item{\code{eval}: evaluation function, can be (a list of) character or custom eval function}
   \item{\code{record}: Boolean, TRUE will record iteration message to \code{booster$record_evals}}
   \item{\code{reset_data}: Boolean, setting it to TRUE (not the default value) will transform the booster model
                     into a predictor model which frees up memory and the original datasets}
}}
}
\value{
a trained \code{lgb.Booster}
}
\description{
High-level R interface to train a LightGBM model. Unlike \code{\link{lgb.train}}, this function
             is focused on compatibility with other statistics and machine learning interfaces in R.
             This focus on compatibility means that this interface may experience more frequent breaking API changes
             than \code{\link{lgb.train}}.
             For efficiency-sensitive applications, or for applications where breaking API changes across
             releases are very expensive, use \code{\link{lgb.train}}.
}
\section{Early Stopping}{


         "early stopping" refers to stopping the training process if the model's performance on a given
         validation set does not improve for several consecutive iterations.

         If multiple arguments are given to \code{eval}, their order will be preserved. If you enable
         early stopping by setting \code{early_stopping_rounds} in \code{params}, by default all
         metrics will be considered for early stopping.

         If you want to only consider the first metric for early stopping, pass
         \code{first_metric_only = TRUE} in \code{params}. Note that if you also specify \code{metric}
         in \code{params}, that metric will be considered the "first" one. If you omit \code{metric},
         a default metric will be used based on your choice for the parameter \code{obj} (keyword argument)
         or \code{objective} (passed into \code{params}).
}
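% The examples below are an illustrative sketch (not generated by roxygen2): they assume the
% 'lightgbm' package is attached, use the built-in 'mtcars' data, and train only a few rounds.
% 'min_data_in_leaf' is relaxed purely because the toy dataset is tiny.
\examples{
\donttest{
# Basic usage: with a numeric label and objective = "auto" (the default),
# the objective resolves to "regression".
X <- as.matrix(mtcars[, -1L])
y <- mtcars$mpg
model <- lightgbm(
    data = X
    , label = y
    , nrounds = 10L
    , verbose = -1L
    , params = list(min_data_in_leaf = 5L)  # mtcars is tiny; relax the leaf-size default
)
preds <- predict(model, X)

# Early stopping: pass 'valids' through '...' to lgb.train().
# Training stops once no metric on the validation set improves
# for 3 consecutive boosting rounds.
dvalid <- lgb.Dataset(X[25L:32L, , drop = FALSE], label = y[25L:32L])
model_es <- lightgbm(
    data = X[1L:24L, ]
    , label = y[1L:24L]
    , nrounds = 50L
    , verbose = -1L
    , early_stopping_rounds = 3L
    , params = list(min_data_in_leaf = 5L)
    , valids = list(valid = dvalid)
)
}
}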