% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lightgbm.R
\name{lightgbm}
\alias{lightgbm}
\title{Train a LightGBM model}
\usage{
lightgbm(
  data,
  label = NULL,
  weights = NULL,
  params = list(),
  nrounds = 100L,
  verbose = 1L,
  eval_freq = 1L,
  early_stopping_rounds = NULL,
  init_model = NULL,
  callbacks = list(),
  serializable = TRUE,
  objective = "regression",
  init_score = NULL,
  num_threads = NULL,
  ...
)
}
\arguments{
\item{data}{a \code{lgb.Dataset} object, used for training. Some functions, such as \code{\link{lgb.cv}},
may allow you to pass other types of data like \code{matrix} and then separately supply
\code{label} as a keyword argument.}

\item{label}{Vector of labels, used if \code{data} is not an \code{\link{lgb.Dataset}}}

\item{weights}{Sample / observation weights for rows in the input data. If \code{NULL}, will assume that all
observations / rows have the same importance / weight.}

\item{params}{a list of parameters. See \href{https://lightgbm.readthedocs.io/en/latest/Parameters.html}{
the "Parameters" section of the documentation} for a list of parameters and valid values.}

\item{nrounds}{number of training rounds}

\item{verbose}{verbosity for output. If <= 0, printing of evaluation results during training is also disabled}

\item{eval_freq}{evaluation output frequency; only takes effect when verbose > 0}

\item{early_stopping_rounds}{int. Activates early stopping. When this parameter is non-null,
training will stop if the evaluation of any metric on any validation set
fails to improve for \code{early_stopping_rounds} consecutive boosting rounds.
If training stops early, the returned model will have attribute \code{best_iter}
set to the iteration number of the best iteration.}

\item{init_model}{path of model file or \code{lgb.Booster} object, will continue training from this model}

\item{callbacks}{List of callback functions that are applied at each iteration.}

\item{serializable}{whether to make the resulting objects serializable through functions such as
\code{save} or \code{saveRDS} (see section "Model serialization").}

\item{objective}{Optimization objective (e.g. \code{"regression"}, \code{"binary"}, etc.).
For a list of accepted objectives, see
\href{https://lightgbm.readthedocs.io/en/latest/Parameters.html#objective}{
the "objective" item of the "Parameters" section of the documentation}.}

\item{init_score}{initial score is the base prediction lightgbm will boost from}

\item{num_threads}{Number of parallel threads to use. For best speed, this should be set to the number of
physical cores in the CPU - in a typical x86-64 machine, this corresponds to half the
number of maximum threads.

Be aware that using too many threads can result in speed degradation on smaller datasets
(see the parameters documentation for more details).

If passing zero, will use the default number of threads configured for OpenMP
(typically controlled through an environment variable \code{OMP_NUM_THREADS}).

If passing \code{NULL} (the default), will try to use the number of physical cores in the
system, but be aware that getting the number of cores detected correctly requires package
\code{RhpcBLASctl} to be installed.

This parameter gets overridden by \code{num_threads} and its aliases under \code{params}
if passed there.}

\item{...}{Additional arguments passed to \code{\link{lgb.train}}. For example
\itemize{
   \item{\code{valids}: a list of \code{lgb.Dataset} objects, used for validation}
   \item{\code{obj}: objective function, can be character or custom objective function.
               Examples include \code{regression}, \code{regression_l1}, \code{huber},
               \code{binary}, \code{lambdarank}, \code{multiclass}}
   \item{\code{eval}: evaluation function, can be (a list of) character or custom eval function}
   \item{\code{record}: Boolean. If TRUE, iteration messages are recorded to
               \code{booster$record_evals}}
   \item{\code{colnames}: feature names. If not null, these overwrite the names in the dataset}
   \item{\code{categorical_feature}: categorical features. This can either be a character vector of feature
               names or an integer vector with the indices of the features (e.g. \code{c(1L, 10L)} to
               say "the first and tenth columns").}
   \item{\code{reset_data}: Boolean. Setting it to TRUE (not the default value) will transform the booster model
               into a predictor model, which frees up memory and the original datasets}
}}
}
\value{
a trained \code{lgb.Booster}
}
\description{
Simple interface for training a LightGBM model.
}
\section{Early Stopping}{


"Early stopping" refers to stopping the training process if the model's performance on a given
validation set does not improve for several consecutive iterations.

If multiple arguments are given to \code{eval}, their order will be preserved. If you enable
early stopping by setting \code{early_stopping_rounds} in \code{params}, by default all
metrics will be considered for early stopping.

To consider only the first metric for early stopping, pass
\code{first_metric_only = TRUE} in \code{params}. Note that if you also specify \code{metric}
in \code{params}, that metric will be considered the "first" one. If you omit \code{metric},
a default metric will be used based on your choice for the parameter \code{obj} (keyword argument)
or \code{objective} (passed into \code{params}).
}
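\examples{
# A minimal sketch of the early-stopping behavior described in the
# "Early Stopping" section, not a definitive recipe.
# Assumptions (not stated in this help page): the agaricus.train and
# agaricus.test datasets bundled with the lightgbm package are used as
# toy data, and the values chosen for nrounds, early_stopping_rounds
# and metric are purely illustrative.
\donttest{
library(lightgbm)

data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
train <- agaricus.train
test <- agaricus.test

# Build the training Dataset and a validation Dataset that shares its binning
dtrain <- lgb.Dataset(train$data, label = train$label)
dvalid <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)

model <- lightgbm(
  data = dtrain
  , objective = "binary"
  # metric given in params is treated as the first metric;
  # first_metric_only restricts early stopping to that metric
  , params = list(metric = "auc", first_metric_only = TRUE)
  , nrounds = 50L
  # stop if the validation metric fails to improve for 5 consecutive rounds
  , early_stopping_rounds = 5L
  # valids is forwarded to lgb.train() through "..."
  , valids = list(valid = dvalid)
  , verbose = -1L
)

# Iteration number of the best iteration
print(model$best_iter)
}
}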