Unverified Commit 057ba078 authored by Nikita Titov's avatar Nikita Titov Committed by GitHub

[docs] document rounding behavior of floating point numbers in categorical features (#5009)

parent d31346f6
@@ -25,6 +25,7 @@ Categorical Feature Support
- Categorical features must be encoded as non-negative integers (``int``) less than ``Int32.MaxValue`` (2147483647).
It is best to use a contiguous range of integers started from zero.
Floating point numbers in categorical features will be rounded towards 0.
- Use ``min_data_per_group``, ``cat_smooth`` to deal with over-fitting (when ``#data`` is small or ``#category`` is large).
...
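The rounding rule added in this change can be illustrated in plain Python: "rounded towards 0" is truncation (as with ``math.trunc``), and a value that truncates to a negative integer falls under the existing rule that negative values are treated as missing. The ``encode_category`` helper below is a hypothetical sketch for illustration, not part of the LightGBM API:

```python
import math

def encode_category(value):
    """Illustrative sketch only: mimic how a raw float in a categorical
    feature maps to a category id.

    Floating point numbers are rounded towards 0 (truncation), and
    negative results are treated as missing values.
    """
    truncated = math.trunc(value)  # 1.9 -> 1, -0.5 -> 0, -1.2 -> -1
    return truncated if truncated >= 0 else None  # negative -> missing

print(encode_category(1.9))   # 1.9 rounds towards 0 -> category 1
print(encode_category(-0.5))  # -0.5 truncates to 0 -> category 0
print(encode_category(-1.2))  # truncates to -1 -> treated as missing (None)
```

Note that rounding towards 0 differs from flooring for negative inputs: -0.5 becomes category 0 rather than the missing value -1.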
@@ -1159,6 +1159,7 @@ class Dataset:
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
params : dict or None, optional (default=None)
Other parameters for Dataset.
free_raw_data : bool, optional (default=True)
@@ -3563,6 +3564,7 @@ class Booster:
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
dataset_params : dict or None, optional (default=None)
Other parameters for Dataset ``data``.
free_raw_data : bool, optional (default=True)
...
@@ -109,6 +109,7 @@ def train(
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
keep_training_booster : bool, optional (default=False)
Whether the returned Booster will be used to keep training.
If False, the returned value will be converted into _InnerPredictor before returning.
@@ -463,6 +464,7 @@ def cv(params, train_set, num_boost_round=100,
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
fpreproc : callable or None, optional (default=None)
Preprocessing function that takes (dtrain, dtest, params)
and returns transformed versions of those.
...
@@ -262,6 +262,7 @@ _lgbmmodel_doc_fit = (
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
callbacks : list of callable, or None, optional (default=None)
List of callback functions that are applied at each iteration.
See Callbacks in Python API for more information.
...
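The repeated advice above to use consecutive integers starting from zero can be followed with a simple re-encoding pass before building a Dataset. A minimal sketch in plain Python; the ``to_contiguous_codes`` helper is an assumption for illustration, not LightGBM API:

```python
def to_contiguous_codes(values):
    """Map arbitrary category labels to consecutive ints starting at 0.

    Keeps codes small and contiguous, satisfying the non-negative-int
    requirement described in the docs above.
    """
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)  # next unused code: 0, 1, 2, ...
        encoded.append(codes[v])
    return encoded, codes

raw = ["red", "blue", "red", "green"]
encoded, mapping = to_contiguous_codes(raw)
print(encoded)  # [0, 1, 0, 2]
print(mapping)  # {'red': 0, 'blue': 1, 'green': 2}
```

In practice the same effect is commonly achieved with pandas ``category`` dtype columns, which LightGBM can consume directly.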