% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lgb.Dataset.R
\name{lgb.Dataset}
\alias{lgb.Dataset}
\title{Construct \code{lgb.Dataset} object}
\usage{
lgb.Dataset(
  data,
  params = list(),
  reference = NULL,
  colnames = NULL,
  categorical_feature = NULL,
  free_raw_data = TRUE,
  label = NULL,
  weight = NULL,
  group = NULL,
  init_score = NULL
)
}
\arguments{
\item{data}{a \code{matrix} object, a \code{dgCMatrix} object,
a character representing a path to a text file (CSV, TSV, or LibSVM),
or a character representing a path to a binary \code{lgb.Dataset} file}

\item{params}{a list of parameters. See
\href{https://lightgbm.readthedocs.io/en/latest/Parameters.html#dataset-parameters}{
The "Dataset Parameters" section of the documentation} for a list of parameters
and valid values.}

\item{reference}{reference dataset. When LightGBM creates a Dataset, it does some preprocessing like binning
continuous features into histograms. If you want to apply the same bin boundaries from an existing
dataset to new \code{data}, pass that existing Dataset to this argument.}

\item{colnames}{character vector of column (feature) names for \code{data}}

\item{categorical_feature}{categorical features. This can either be a character vector of feature
names or an integer vector with the indices of the features (e.g.
\code{c(1L, 10L)} to say "the first and tenth columns").}

\item{free_raw_data}{LightGBM constructs its data format, called a "Dataset", from tabular data.
By default, that Dataset object on the R side does not keep a copy of the raw data.
This reduces LightGBM's memory consumption, but it means that the Dataset object
cannot be changed after it has been constructed. If you'd prefer to be able to
change the Dataset object after construction, set \code{free_raw_data = FALSE}.}

\item{label}{vector of labels to use as the target variable}

\item{weight}{numeric vector of sample weights}

\item{group}{used for learning-to-rank tasks. An integer vector describing how to
group rows together as ordered results from the same set of candidate results
to be ranked. For example, if you have a 100-document dataset with
\code{group = c(10, 20, 40, 10, 10, 10)}, that means that you have 6 groups,
where the first 10 records are in the first group, records 11-30 are in the
second group, etc.}

\item{init_score}{initial score: the base prediction that LightGBM will boost from}
}
\value{
constructed dataset
}
\description{
Construct an \code{lgb.Dataset} object from a dense matrix, a sparse matrix,
or a local file (one that was created previously by saving an \code{lgb.Dataset}).
}
\examples{
\donttest{
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
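
# A hedged sketch of the `reference` argument: build a second Dataset that
# reuses the bin boundaries of dtrain. This assumes the agaricus.test data
# bundled with the package; the names `test` and `dtest` are illustrative.
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset(test$data, label = test$label, reference = dtrain)
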
data_file <- tempfile(fileext = ".data")
lgb.Dataset.save(dtrain, data_file)
dtrain <- lgb.Dataset(data_file)
lgb.Dataset.construct(dtrain)
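
# Illustrative sketch in base R (not a LightGBM function): how the group sizes
# from the `group` argument description map to row ranges in a 100-row
# learning-to-rank dataset. All variable names here are hypothetical.
group_sizes <- c(10L, 20L, 40L, 10L, 10L, 10L)
group_last_row <- cumsum(group_sizes)
group_first_row <- group_last_row - group_sizes + 1L
# One row per group: group 1 covers rows 1-10, group 2 covers rows 11-30, etc.
data.frame(
  group = seq_along(group_sizes),
  first_row = group_first_row,
  last_row = group_last_row
)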
}
}