% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lgb.prepare_rules.R
\name{lgb.prepare_rules}
\alias{lgb.prepare_rules}
\title{Data preparator for LightGBM datasets with rules (numeric)}
\usage{
lgb.prepare_rules(data, rules = NULL)
}
\arguments{
\item{data}{A data.frame or data.table to prepare.}

\item{rules}{A set of rules returned by a previous call to the data preparator, if any.}
}
\value{
A list with the cleaned dataset (\code{data}) and the rules (\code{rules}).
        The data must be converted to matrix format (\code{as.matrix}) before it is
        passed to \code{lgb.Dataset}.
}
\description{
Attempts to prepare a clean dataset for use in a \code{lgb.Dataset}.
             Factor and character columns are converted to numeric. In addition, the rules
             created during conversion are kept, so that other datasets can be converted
             in the same way.
}
\examples{
data(iris)

str(iris)

new_iris <- lgb.prepare_rules(data = iris) # Autoconverter
str(new_iris$data)

data(iris) # Reset the iris dataset
iris$Species[1L] <- "NEW FACTOR" # Introduce a junk factor level (becomes NA)

# Convert using the known rules
# Unknown factor levels become 0, which suits sparse datasets
newer_iris <- lgb.prepare_rules(data = iris, rules = new_iris$rules)

# Species is 0 in the first row because its level is unknown
newer_iris$data[1L, ]

newer_iris$data[1L, 5L] <- 1.0 # Put back the real initial value

# Is the newly created dataset equal to the first one? Yes.
all.equal(new_iris$data, newer_iris$data)

# Can we supply our own rules?
data(iris) # Reset the iris dataset

# Map the levels to different values
personal_rules <- list(
  Species = c(
    "setosa" = 3L
    , "versicolor" = 2L
    , "virginica" = 1L
  )
)
newest_iris <- lgb.prepare_rules(data = iris, rules = personal_rules)
str(newest_iris$data) # Species now uses the custom mapping
}