Commit f2afb2cd authored by James Lamb, committed by Nikita Titov

[R-package][docs] made roxygen2 tags explicit and cleaned up documentation (#2688)



* [R-package] made roxygen2 tags explicit and cleaned up documentation

* Apply suggestions from code review
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Apply suggestions from code review
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Update R-package/man/lightgbm.Rd
Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* [R-package] moved @name to the top of roxygen blocks and removed some inaccurate information in documentation on parameters
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
parent c7ae833e
@@ -11,15 +11,13 @@ data(agaricus.test)
 }
 \description{
 This data set is originally from the Mushroom data set,
-UCI Machine Learning Repository.
-}
-\details{
-This data set includes the following fields:
+UCI Machine Learning Repository.
+This data set includes the following fields:
 
-\itemize{
-\item \code{label} the label for each record
-\item \code{data} a sparse Matrix of \code{dgCMatrix} class, with 126 columns.
-}
+\itemize{
+\item{\code{label}: the label for each record}
+\item{\code{data}: a sparse Matrix of \code{dgCMatrix} class, with 126 columns.}
+}
 }
 \references{
 https://archive.ics.uci.edu/ml/datasets/Mushroom
@@ -11,15 +11,13 @@ data(agaricus.train)
 }
 \description{
 This data set is originally from the Mushroom data set,
-UCI Machine Learning Repository.
-}
-\details{
-This data set includes the following fields:
+UCI Machine Learning Repository.
+This data set includes the following fields:
 
-\itemize{
-\item \code{label} the label for each record
-\item \code{data} a sparse Matrix of \code{dgCMatrix} class, with 126 columns.
-}
+\itemize{
+\item{\code{label}: the label for each record}
+\item{\code{data}: a sparse Matrix of \code{dgCMatrix} class, with 126 columns.}
+}
 }
 \references{
 https://archive.ics.uci.edu/ml/datasets/Mushroom
@@ -10,11 +10,10 @@ data(bank)
 }
 \description{
 This data set is originally from the Bank Marketing data set,
-UCI Machine Learning Repository.
-}
-\details{
-It contains only the following: bank.csv with 10% of the examples and 17 inputs,
-randomly selected from 3 (older version of this dataset with less inputs).
+UCI Machine Learning Repository.
+
+It contains only the following: bank.csv with 10% of the examples and 17 inputs,
+randomly selected from 3 (older version of this dataset with less inputs).
 }
 \references{
 http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
@@ -17,7 +17,7 @@ and the second one is column names}
 }
 \description{
 Only column names are supported for \code{lgb.Dataset}, thus setting of
-row names would have no effect and returned row names would be NULL.
+row names would have no effect and returned row names would be NULL.
 }
 \details{
 Generic \code{dimnames} methods are used by \code{colnames}.
@@ -20,7 +20,7 @@ getinfo(dataset, ...)
 info data
 }
 \description{
-Get information of an \code{lgb.Dataset} object
+Get one attribute of a \code{lgb.Dataset}
 }
 \details{
 The \code{name} field can be one of the following:
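The reworded description above ("Get one attribute of a \code{lgb.Dataset}") can be exercised as follows. This is a sketch, not part of the commit, and assumes the lightgbm R package (with its bundled agaricus data) is installed:

```r
library(lightgbm)

data(agaricus.train, package = "lightgbm")
train <- agaricus.train

# build the Dataset and construct it so its attributes are materialized
dtrain <- lgb.Dataset(train$data, label = train$label)
lgb.Dataset.construct(dtrain)

# getinfo() fetches exactly one attribute by name, e.g. the label vector
labels <- getinfo(dtrain, "label")
```

Per the details section, `name` here could also be `"weight"`, `"init_score"`, or `"group"` where those attributes have been set.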
@@ -37,7 +37,7 @@ constructed dataset
 }
 \description{
 Construct \code{lgb.Dataset} object from dense matrix, sparse matrix
-or local file (that was created previously by saving an \code{lgb.Dataset}).
+or local file (that was created previously by saving an \code{lgb.Dataset}).
 }
 \examples{
 library(lightgbm)
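The three inputs named in the description (dense matrix, sparse matrix, or a previously saved file) might look like this in practice; a minimal sketch assuming the lightgbm and Matrix packages are installed:

```r
library(lightgbm)
library(Matrix)

set.seed(42L)
X <- matrix(rnorm(100L * 5L), nrow = 100L)
y <- sample(c(0L, 1L), 100L, replace = TRUE)

# dense matrix input
dtrain_dense <- lgb.Dataset(X, label = y)

# sparse (dgCMatrix) input
dtrain_sparse <- lgb.Dataset(Matrix(X, sparse = TRUE), label = y)

# a Dataset previously written with lgb.Dataset.save() can be read back
# by passing the file path instead of a matrix
bin_file <- tempfile(fileext = ".bin")
lgb.Dataset.save(dtrain_dense, bin_file)
dtrain_file <- lgb.Dataset(bin_file)
```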
@@ -16,7 +16,7 @@ passed dataset
 }
 \description{
 Please note that \code{init_score} is not saved in binary file.
-If you need it, please set it again after loading Dataset.
+If you need it, please set it again after loading Dataset.
 }
 \examples{
 library(lightgbm)
@@ -24,5 +24,4 @@ data(agaricus.train, package = "lightgbm")
 train <- agaricus.train
 dtrain <- lgb.Dataset(train$data, label = train$label)
 lgb.Dataset.save(dtrain, "data.bin")
 }
-
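The caveat in this help page (that \code{init_score} is not stored in the binary file) suggests a pattern like the following. A sketch under the assumption that lightgbm is installed and that `setinfo()`/`getinfo()` behave as documented in this package version:

```r
library(lightgbm)

data(agaricus.train, package = "lightgbm")
train <- agaricus.train

dtrain <- lgb.Dataset(train$data, label = train$label)
lgb.Dataset.construct(dtrain)
setinfo(dtrain, "init_score", rep(0.0, length(train$label)))

bin_file <- tempfile(fileext = ".bin")
lgb.Dataset.save(dtrain, bin_file)

# the reloaded Dataset has no init_score, so it must be set again
dtrain2 <- lgb.Dataset(bin_file)
lgb.Dataset.construct(dtrain2)
setinfo(dtrain2, "init_score", rep(0.0, length(train$label)))
```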
@@ -9,13 +9,16 @@ lgb.Dataset.set.categorical(dataset, categorical_feature)
 \arguments{
 \item{dataset}{object of class \code{lgb.Dataset}}
 
-\item{categorical_feature}{categorical features}
+\item{categorical_feature}{categorical features. This can either be a character vector of feature
+names or an integer vector with the indices of the features (e.g.
+\code{c(1L, 10L)} to say "the first and tenth columns").}
 }
 \value{
 passed dataset
 }
 \description{
-Set categorical feature of \code{lgb.Dataset}
+Set the categorical features of an \code{lgb.Dataset} object. Use this function
+to tell LightGBM which features should be treated as categorical.
 }
 \examples{
 library(lightgbm)
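The expanded parameter description above can be demonstrated directly; a sketch assuming the lightgbm package and its bundled agaricus data:

```r
library(lightgbm)

data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)

# by position: mark the first and tenth columns as categorical
dtrain <- lgb.Dataset.set.categorical(dtrain, c(1L, 10L))

# equivalently, a character vector of feature names taken from
# colnames(train$data) could be passed instead of integer indices
```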
@@ -66,9 +66,9 @@ the \code{nfold} and \code{stratified} parameters are ignored.}
 
 \item{colnames}{feature names, if not null, will use this to overwrite the names in dataset}
 
-\item{categorical_feature}{list of str or int
-type int represents index,
-type str represents feature names}
+\item{categorical_feature}{categorical features. This can either be a character vector of feature
+names or an integer vector with the indices of the features (e.g.
+\code{c(1L, 10L)} to say "the first and tenth columns").}
 
 \item{early_stopping_rounds}{int. Activates early stopping. Requires at least one validation data
 and one metric. If there's more than one, will check all of them
@@ -82,11 +82,11 @@ into a predictor model which frees up memory and the original datasets}
 
 \item{...}{other parameters, see Parameters.rst for more information. A few key parameters:
 \itemize{
-\item{boosting}{Boosting type. \code{"gbdt"} or \code{"dart"}}
-\item{num_leaves}{number of leaves in one tree. defaults to 127}
-\item{max_depth}{Limit the max depth for tree model. This is used to deal with
+\item{\code{boosting}: Boosting type. \code{"gbdt"}, \code{"rf"}, \code{"dart"} or \code{"goss"}.}
+\item{\code{num_leaves}: Maximum number of leaves in one tree.}
+\item{\code{max_depth}: Limit the max depth for tree model. This is used to deal with
 overfit when #data is small. Tree still grow by leaf-wise.}
-\item{num_threads}{Number of threads for LightGBM. For the best speed, set this to
+\item{\code{num_threads}: Number of threads for LightGBM. For the best speed, set this to
 the number of real CPU cores, not the number of threads (most
 CPU using hyper-threading to generate 2 threads per CPU core).}
 }}
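The key parameters documented in this `lgb.cv` hunk could be passed like so; a sketch assuming the lightgbm package, with small values chosen only to keep the run fast:

```r
library(lightgbm)

data(agaricus.train, package = "lightgbm")
dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)

params <- list(
    objective = "binary"
    , boosting = "gbdt"   # one of "gbdt", "rf", "dart", "goss"
    , num_leaves = 31L    # maximum number of leaves in one tree
    , max_depth = 6L      # cap tree depth to guard against overfitting small data
    , num_threads = 2L    # ideally the number of physical CPU cores
)

cv_result <- lgb.cv(
    params = params
    , data = dtrain
    , nrounds = 5L
    , nfold = 3L
    , verbose = -1L
)
```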
@@ -14,10 +14,10 @@ lgb.importance(model, percentage = TRUE)
 \value{
 For a tree model, a \code{data.table} with the following columns:
 \itemize{
-\item \code{Feature} Feature names in the model.
-\item \code{Gain} The total gain of this feature's splits.
-\item \code{Cover} The number of observation related to this feature.
-\item \code{Frequency} The number of times a feature splited in trees.
+\item{\code{Feature}: Feature names in the model.}
+\item{\code{Gain}: The total gain of this feature's splits.}
+\item{\code{Cover}: The number of observation related to this feature.}
+\item{\code{Frequency}: The number of times a feature splited in trees.}
 }
 }
 \description{
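The return value documented above (a `data.table` with Feature, Gain, Cover and Frequency columns) can be produced as follows; a sketch assuming the lightgbm package is installed:

```r
library(lightgbm)

data(agaricus.train, package = "lightgbm")
dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)

model <- lgb.train(
    params = list(objective = "binary")
    , data = dtrain
    , nrounds = 5L
    , verbose = -1L
)

# one row per feature used by the model
imp <- lgb.importance(model, percentage = TRUE)
```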
@@ -19,8 +19,8 @@ lgb.interprete(model, data, idxset, num_iteration = NULL)
 For regression, binary classification and lambdarank model, a \code{list} of \code{data.table}
 with the following columns:
 \itemize{
-\item \code{Feature} Feature names in the model.
-\item \code{Contribution} The total contribution of this feature's splits.
+\item{\code{Feature}: Feature names in the model.}
+\item{\code{Contribution}: The total contribution of this feature's splits.}
 }
 For multiclass classification, a \code{list} of \code{data.table} with the Feature column and
 Contribution columns to each class.
@@ -15,9 +15,8 @@ lgb.load(filename = NULL, model_str = NULL)
 lgb.Booster
 }
 \description{
-Load LightGBM model from saved model file or string
-Load LightGBM takes in either a file path or model string
-If both are provided, Load will default to loading from file
+Load LightGBM takes in either a file path or model string.
+If both are provided, Load will default to loading from file
 }
 \examples{
 library(lightgbm)
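Both loading paths named in the description (file path or model string) can be shown with a round-trip; a sketch assuming the lightgbm package is installed:

```r
library(lightgbm)

data(agaricus.train, package = "lightgbm")
dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)
model <- lgb.train(
    params = list(objective = "binary")
    , data = dtrain
    , nrounds = 3L
    , verbose = -1L
)

# round-trip through a file...
model_file <- tempfile(fileext = ".txt")
lgb.save(model, model_file)
from_file <- lgb.load(filename = model_file)

# ...or through a model string; if both arguments were supplied,
# the file path would win
model_str <- paste0(readLines(model_file), collapse = "\n")
from_string <- lgb.load(model_str = model_str)
```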
@@ -18,21 +18,21 @@ A \code{data.table} with detailed information about model trees' nodes and leafs
 
 The columns of the \code{data.table} are:
 \itemize{
-\item \code{tree_index}: ID of a tree in a model (integer)
-\item \code{split_index}: ID of a node in a tree (integer)
-\item \code{split_feature}: for a node, it's a feature name (character);
-for a leaf, it simply labels it as \code{"NA"}
-\item \code{node_parent}: ID of the parent node for current node (integer)
-\item \code{leaf_index}: ID of a leaf in a tree (integer)
-\item \code{leaf_parent}: ID of the parent node for current leaf (integer)
-\item \code{split_gain}: Split gain of a node
-\item \code{threshold}: Splitting threshold value of a node
-\item \code{decision_type}: Decision type of a node
-\item \code{default_left}: Determine how to handle NA value, TRUE -> Left, FALSE -> Right
-\item \code{internal_value}: Node value
-\item \code{internal_count}: The number of observation collected by a node
-\item \code{leaf_value}: Leaf value
-\item \code{leaf_count}: The number of observation collected by a leaf
+\item{\code{tree_index}: ID of a tree in a model (integer)}
+\item{\code{split_index}: ID of a node in a tree (integer)}
+\item{\code{split_feature}: for a node, it's a feature name (character);
+for a leaf, it simply labels it as \code{"NA"}}
+\item{\code{node_parent}: ID of the parent node for current node (integer)}
+\item{\code{leaf_index}: ID of a leaf in a tree (integer)}
+\item{\code{leaf_parent}: ID of the parent node for current leaf (integer)}
+\item{\code{split_gain}: Split gain of a node}
+\item{\code{threshold}: Splitting threshold value of a node}
+\item{\code{decision_type}: Decision type of a node}
+\item{\code{default_left}: Determine how to handle NA value, TRUE -> Left, FALSE -> Right}
+\item{\code{internal_value}: Node value}
+\item{\code{internal_count}: The number of observation collected by a node}
+\item{\code{leaf_value}: Leaf value}
+\item{\code{leaf_count}: The number of observation collected by a leaf}
 }
 }
 \description{
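The per-node table documented above comes from `lgb.model.dt.tree`; a sketch assuming the lightgbm package is installed:

```r
library(lightgbm)

data(agaricus.train, package = "lightgbm")
dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)
model <- lgb.train(
    params = list(objective = "binary")
    , data = dtrain
    , nrounds = 2L
    , verbose = -1L
)

# one row per node or leaf: split rows populate split_feature/split_gain,
# leaf rows populate leaf_value/leaf_count
tree_dt <- lgb.model.dt.tree(model)
```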
@@ -15,8 +15,8 @@ The cleaned dataset. It must be converted to a matrix format (\code{as.matrix})
 }
 \description{
 Attempts to prepare a clean dataset to prepare to put in a \code{lgb.Dataset}.
-Factors and characters are converted to numeric without integers. Please use
-\code{lgb.prepare_rules} if you want to apply this transformation to other datasets.
+Factors and characters are converted to numeric without integers. Please use
+\code{\link{lgb.prepare_rules}} if you want to apply this transformation to other datasets.
 }
 \examples{
 library(lightgbm)
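The conversion this help page describes (factor and character columns become numeric, then the result is matrix-converted) might look like this; a sketch assuming the lightgbm package version of this commit, where `lgb.prepare` is still exported:

```r
library(lightgbm)

df <- data.frame(
    num = c(1.5, 2.5, 3.5, 4.5)
    , fac = factor(c("a", "b", "a", "c"))
    , chr = c("low", "high", "low", "high")
    , stringsAsFactors = FALSE
)

clean <- lgb.prepare(df)   # factor/character columns become numeric
mat <- as.matrix(clean)    # as the \value section requires
```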
@@ -15,11 +15,11 @@ The cleaned dataset. It must be converted to a matrix format (\code{as.matrix})
 }
 \description{
 Attempts to prepare a clean dataset to prepare to put in a \code{lgb.Dataset}.
-Factors and characters are converted to numeric (specifically: integer).
-Please use \code{lgb.prepare_rules2} if you want to apply this transformation to other datasets.
-This is useful if you have a specific need for integer dataset instead of numeric dataset.
-Note that there are programs which do not support integer-only input. Consider this as a half
-memory technique which is dangerous, especially for LightGBM.
+Factors and characters are converted to numeric (specifically: integer).
+Please use \code{\link{lgb.prepare_rules2}} if you want to apply this transformation to
+other datasets. This is useful if you have a specific need for integer dataset instead
+of numeric dataset. Note that there are programs which do not support integer-only
+input. Consider this as a half memory technique which is dangerous, especially for LightGBM.
 }
 \examples{
 library(lightgbm)
@@ -18,8 +18,8 @@ A list with the cleaned dataset (\code{data}) and the rules (\code{rules}).
 }
 \description{
 Attempts to prepare a clean dataset to prepare to put in a \code{lgb.Dataset}.
-Factors and characters are converted to numeric. In addition, keeps rules created
-so you can convert other datasets using this converter.
+Factors and characters are converted to numeric. In addition, keeps rules created
+so you can convert other datasets using this converter.
 }
 \examples{
 library(lightgbm)
@@ -18,11 +18,11 @@ A list with the cleaned dataset (\code{data}) and the rules (\code{rules}).
 }
 \description{
 Attempts to prepare a clean dataset to prepare to put in a \code{lgb.Dataset}.
-Factors and characters are converted to numeric (specifically: integer).
-In addition, keeps rules created so you can convert other datasets using this converter.
-This is useful if you have a specific need for integer dataset instead of numeric dataset.
-Note that there are programs which do not support integer-only input.
-Consider this as a half memory technique which is dangerous, especially for LightGBM.
+Factors and characters are converted to numeric (specifically: integer).
+In addition, keeps rules created so you can convert other datasets using this converter.
+This is useful if you have a specific need for integer dataset instead of numeric dataset.
+Note that there are programs which do not support integer-only input.
+Consider this as a half memory technique which is dangerous, especially for LightGBM.
 }
 \examples{
 library(lightgbm)
@@ -39,5 +39,4 @@ model <- lgb.train(
 , early_stopping_rounds = 5L
 )
 lgb.save(model, "model.txt")
 }
-
@@ -65,11 +65,11 @@ original datasets}
 
 \item{...}{other parameters, see Parameters.rst for more information. A few key parameters:
 \itemize{
-\item{boosting}{Boosting type. \code{"gbdt"} or \code{"dart"}}
-\item{num_leaves}{number of leaves in one tree. defaults to 127}
-\item{max_depth}{Limit the max depth for tree model. This is used to deal with
+\item{\code{boosting}: Boosting type. \code{"gbdt"}, \code{"rf"}, \code{"dart"} or \code{"goss"}.}
+\item{\code{num_leaves}: Maximum number of leaves in one tree.}
+\item{\code{max_depth}: Limit the max depth for tree model. This is used to deal with
 overfit when #data is small. Tree still grow by leaf-wise.}
-\item{num_threads}{Number of threads for LightGBM. For the best speed, set this to
+\item{\code{num_threads}: Number of threads for LightGBM. For the best speed, set this to
 the number of real CPU cores, not the number of threads (most
 CPU using hyper-threading to generate 2 threads per CPU core).}
 }}
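The same key parameters appear in `lgb.train`'s `...` docs; a sketch assuming the lightgbm package, here using the non-default `"dart"` boosting named in the updated list:

```r
library(lightgbm)

data(agaricus.train, package = "lightgbm")
dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)

model <- lgb.train(
    params = list(
        objective = "binary"
        , boosting = "dart"   # instead of the default "gbdt"
        , num_leaves = 63L
        , max_depth = -1L     # a negative value means no depth limit
        , num_threads = 2L
    )
    , data = dtrain
    , nrounds = 5L
    , verbose = -1L
)

preds <- predict(model, agaricus.train$data)
```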
@@ -21,9 +21,9 @@ environment. Defaults to \code{FALSE} which means to not remove them.}
 NULL invisibly.
 }
 \description{
-Attempts to unload LightGBM packages so you can remove objects cleanly without having to restart R.
-This is useful for instance if an object becomes stuck for no apparent reason and you do not want
-to restart R to fix the lost object.
+Attempts to unload LightGBM packages so you can remove objects cleanly without
+having to restart R. This is useful for instance if an object becomes stuck for no
+apparent reason and you do not want to restart R to fix the lost object.
 }
 \examples{
 library(lightgbm)
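A call matching this help page might look like the following; a sketch assuming the lightgbm package and the `lgb.unloader(restore, wipe, envir)` signature of this package version:

```r
library(lightgbm)

data(agaricus.train, package = "lightgbm")
dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)

# unload the package without wiping lightgbm objects from the global
# environment; per the \value section above, this returns NULL invisibly
res <- lgb.unloader(restore = TRUE, wipe = FALSE, envir = .GlobalEnv)
```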