Unverified commit 057ba078, authored by Nikita Titov, committed by GitHub

[docs] Document rounding behavior of floating point numbers in categorical features (#5009)

parent d31346f6
......@@ -25,6 +25,7 @@ Categorical Feature Support
- Categorical features must be encoded as non-negative integers (``int``) less than ``Int32.MaxValue`` (2147483647).
  It is best to use a contiguous range of integers starting from zero.
Floating point numbers in categorical features will be rounded towards 0.
- Use ``min_data_per_group``, ``cat_smooth`` to deal with over-fitting (when ``#data`` is small or ``#category`` is large).
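The rounding rule documented above ("rounded towards 0") is truncation, not rounding to the nearest integer. A minimal sketch of the documented behavior, using a hypothetical helper name (`encode_categorical` is for illustration only, not a LightGBM API):

```python
import math

# Sketch of how LightGBM treats float values in categorical features
# per the docs added in this commit: the value is rounded towards 0
# (truncated), and negative results are treated as missing values.
# `encode_categorical` is a hypothetical name, not part of LightGBM.
def encode_categorical(value: float):
    code = math.trunc(value)  # rounds towards 0: 2.9 -> 2, -0.5 -> 0
    return None if code < 0 else code  # None stands in for "missing"

print(encode_categorical(2.9))   # truncates to category 2
print(encode_categorical(-0.5))  # truncates to category 0
print(encode_categorical(-1.5))  # truncates to -1 -> negative -> missing
```

Note that a value such as ``-0.5`` truncates to ``0`` and is therefore a valid category, while ``-1.5`` truncates to ``-1`` and is treated as missing.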
......
......@@ -1159,6 +1159,7 @@ class Dataset:
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
params : dict or None, optional (default=None)
Other parameters for Dataset.
free_raw_data : bool, optional (default=True)
......@@ -3563,6 +3564,7 @@ class Booster:
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
dataset_params : dict or None, optional (default=None)
Other parameters for Dataset ``data``.
free_raw_data : bool, optional (default=True)
......
......@@ -109,6 +109,7 @@ def train(
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
keep_training_booster : bool, optional (default=False)
Whether the returned Booster will be used to keep training.
If False, the returned value will be converted into _InnerPredictor before returning.
......@@ -463,6 +464,7 @@ def cv(params, train_set, num_boost_round=100,
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
fpreproc : callable or None, optional (default=None)
Preprocessing function that takes (dtrain, dtest, params)
and returns transformed versions of those.
......
......@@ -262,6 +262,7 @@ _lgbmmodel_doc_fit = (
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
callbacks : list of callable, or None, optional (default=None)
List of callback functions that are applied at each iteration.
See Callbacks in Python API for more information.
......