Commit 6ced58ad authored by Miguel Trejo Marrufo, committed by GitHub

[Docs] Weights non-negative for train data (#5013)

* docs: weight parameter non-negative

* docs: weights non negative only for train data

* docs: weights should be non negative for validation data

* typo in html render

* docs: brief weights non-negative description
parent d670a4d6
......@@ -780,6 +780,8 @@ Dataset Parameters
- **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``, e.g. when label is column\_0, and weight is column\_1, the correct parameter is ``weight=0``
+- **Note**: weights should be non-negative
- ``group_column`` :raw-html:`<a id="group_column" title="Permalink to this parameter" href="#group_column">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = int or string, aliases: ``group``, ``group_id``, ``query_column``, ``query``, ``query_id``
- used to specify the query/group id column
......@@ -1275,7 +1277,8 @@ LightGBM supports weighted training. It uses an additional file to store weight
0.8
...
-It means the weight of the first data row is ``1.0``, second is ``0.5``, and so on.
+It means the weight of the first data row is ``1.0``, second is ``0.5``, and so on. Weights should be non-negative.
The weight file corresponds with data file line by line, and has per weight per line.
And if the name of data file is ``train.txt``, the weight file should be named as ``train.txt.weight`` and placed in the same folder as the data file.
......
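As an illustration of the weight-file convention described in the hunk above, here is a minimal sketch that writes ``train.txt.weight`` next to a data file and rejects negative weights before they reach training. ``write_weight_file`` is a hypothetical helper, not part of LightGBM, and it assumes the data file has no header row and one sample per non-empty line.

```python
import math
from pathlib import Path


def write_weight_file(data_path, weights):
    """Write one weight per line to ``<data_path>.weight`` (hypothetical helper)."""
    data_path = Path(data_path)

    # Count non-empty data rows so the weight file matches the data line by line.
    # Assumption: the data file has no header row.
    with data_path.open() as f:
        n_rows = sum(1 for line in f if line.strip())

    weights = [float(w) for w in weights]
    if len(weights) != n_rows:
        raise ValueError(f"expected {n_rows} weights, got {len(weights)}")

    # Enforce the documented constraint: weights should be non-negative.
    for i, w in enumerate(weights):
        if not math.isfinite(w) or w < 0:
            raise ValueError(
                f"weight at row {i} must be finite and non-negative, got {w}"
            )

    # train.txt -> train.txt.weight, placed in the same folder as the data file.
    weight_path = data_path.with_name(data_path.name + ".weight")
    weight_path.write_text("".join(f"{w}\n" for w in weights))
    return weight_path
```

Calling ``write_weight_file("train.txt", [1.0, 0.5, 0.8])`` would produce a three-line ``train.txt.weight`` in the same folder; a negative entry raises ``ValueError`` instead of being written.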
......@@ -670,6 +670,7 @@ struct Config {
// desc = add a prefix ``name:`` for column name, e.g. ``weight=name:weight``
// desc = **Note**: works only in case of loading data directly from text file
// desc = **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``, e.g. when label is column\_0, and weight is column\_1, the correct parameter is ``weight=0``
+// desc = **Note**: weights should be non-negative
std::string weight_column = "";
// type = int or string
......
......@@ -1138,7 +1138,7 @@ class Dataset:
reference : Dataset or None, optional (default=None)
If this is Dataset for validation, training data should be used as reference.
weight : list, numpy 1-D array, pandas Series or None, optional (default=None)
-Weight for each instance.
+Weight for each instance. Weights should be non-negative.
group : list, numpy 1-D array, pandas Series or None, optional (default=None)
Group/query data.
Only used in the learning-to-rank task.
......@@ -1818,7 +1818,7 @@ class Dataset:
label : list, numpy 1-D array, pandas Series / one-column DataFrame or None, optional (default=None)
Label of the data.
weight : list, numpy 1-D array, pandas Series or None, optional (default=None)
-Weight for each instance.
+Weight for each instance. Weights should be non-negative.
group : list, numpy 1-D array, pandas Series or None, optional (default=None)
Group/query data.
Only used in the learning-to-rank task.
......@@ -2154,7 +2154,7 @@ class Dataset:
Parameters
----------
weight : list, numpy 1-D array, pandas Series or None
-Weight to be set for each data point.
+Weight to be set for each data point. Weights should be non-negative.
Returns
-------
......@@ -2269,7 +2269,7 @@ class Dataset:
Returns
-------
weight : numpy array or None
-Weight for each data point from the Dataset.
+Weight for each data point from the Dataset. Weights should be non-negative.
"""
if self.weight is None:
self.weight = self.get_field('weight')
......@@ -3543,8 +3543,7 @@ class Booster:
reference : Dataset or None, optional (default=None)
Reference for ``data``.
weight : list, numpy 1-D array, pandas Series or None, optional (default=None)
-Weight for each ``data`` instance. Weight should be non-negative values because the Hessian
-value multiplied by weight is supposed to be non-negative.
+Weight for each ``data`` instance. Weights should be non-negative.
group : list, numpy 1-D array, pandas Series or None, optional (default=None)
Group/query size for ``data``.
Only used in the learning-to-rank task.
......
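The docstring wording replaced in the ``Booster`` hunk justified the constraint by noting that the Hessian multiplied by the weight is supposed to be non-negative. The sketch below illustrates that reasoning with a simplified Newton-style leaf update of the kind used in gradient boosting. It is an assumption-laden toy, not LightGBM's actual implementation: ``newton_leaf_value`` and the ``l2`` parameter are hypothetical names, and the loss is taken to be squared error, ``0.5 * (pred - y) ** 2``, so the gradient is ``pred - y`` and the Hessian is ``1``.

```python
def newton_leaf_value(gradients, hessians, weights, l2=0.0):
    """Simplified weighted Newton step for one leaf: -sum(w*g) / (sum(w*h) + l2).

    Hypothetical helper for illustration only.
    """
    sum_g = sum(w * g for w, g in zip(weights, gradients))
    sum_h = sum(w * h for w, h in zip(weights, hessians))
    return -sum_g / (sum_h + l2)


# Squared-error loss with current predictions of 0: g = pred - y, h = 1.
targets = [2.0, 0.0]
gradients = [0.0 - y for y in targets]
hessians = [1.0, 1.0]

# Non-negative weights: the step is the weighted mean of the targets,
# so it lands between them.
ok = newton_leaf_value(gradients, hessians, weights=[1.0, 2.0])

# One negative weight: the weighted Hessian sum becomes negative and the
# step moves away from both targets.
bad = newton_leaf_value(gradients, hessians, weights=[1.0, -2.0])

print(ok, bad)
```

With non-negative weights the leaf value stays inside the range of the targets; with the negative weight the weighted Hessian sum is ``-1.0`` and the update points in the wrong direction.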
......@@ -424,7 +424,7 @@ def _train(
model_factory : lightgbm.LGBMClassifier, lightgbm.LGBMRegressor, or lightgbm.LGBMRanker class
Class of the local underlying model.
sample_weight : Dask Array or Dask Series of shape = [n_samples] or None, optional (default=None)
-Weights of training data.
+Weights of training data. Weights should be non-negative.
init_score : Dask Array or Dask Series of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task), or Dask Array or Dask DataFrame of shape = [n_samples, n_classes] (for multi-class task), or None, optional (default=None)
Init score of training data.
group : Dask Array or Dask Series or None, optional (default=None)
......@@ -441,7 +441,7 @@ def _train(
eval_names : list of str, or None, optional (default=None)
Names of eval_set.
eval_sample_weight : list of Dask Array or Dask Series, or None, optional (default=None)
-Weights for each validation set in eval_set.
+Weights for each validation set in eval_set. Weights should be non-negative.
eval_class_weight : list of dict or str, or None, optional (default=None)
Class weights, one dict or str for each validation set in eval_set.
eval_init_score : list of Dask Array, Dask Series or Dask DataFrame (for multi-class task), or None, optional (default=None)
......
......@@ -149,7 +149,7 @@ class _EvalFunctionWrapper:
In case of custom ``objective``, predicted values are returned before any transformation,
e.g. they are raw margin instead of probability of positive class for binary task in this case.
weight : numpy 1-D array of shape = [n_samples]
-The weight of samples.
+The weight of samples. Weights should be non-negative.
group : numpy 1-D array
Group/query data.
Only used in the learning-to-rank task.
......@@ -215,7 +215,7 @@ _lgbmmodel_doc_fit = (
y : {y_shape}
The target values (class labels in classification, real numbers in regression).
sample_weight : {sample_weight_shape}
-Weights of training data.
+Weights of training data. Weights should be non-negative.
init_score : {init_score_shape}
Init score of training data.
group : {group_shape}
......@@ -229,7 +229,7 @@ _lgbmmodel_doc_fit = (
eval_names : list of str, or None, optional (default=None)
Names of eval_set.
eval_sample_weight : {eval_sample_weight_shape}
-Weights of eval data.
+Weights of eval data. Weights should be non-negative.
eval_class_weight : list or None, optional (default=None)
Class weights of eval data.
eval_init_score : {eval_init_score_shape}
......@@ -284,7 +284,7 @@ _lgbmmodel_doc_custom_eval_note = """
In case of custom ``objective``, predicted values are returned before any transformation,
e.g. they are raw margin instead of probability of positive class for binary task in this case.
weight : numpy 1-D array of shape = [n_samples]
-The weight of samples.
+The weight of samples. Weights should be non-negative.
group : numpy 1-D array
Group/query data.
Only used in the learning-to-rank task.
......
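The custom evaluation hunks above document a ``weight`` argument of shape ``[n_samples]``. As a rough sketch of how such a function might use it, the example below computes a weighted mean absolute error with the ``func(y_true, y_pred, weight)`` signature and the ``(eval_name, eval_result, is_higher_better)`` return convention. The metric name ``weighted_mae`` and the explicit validation are illustrative choices, not something this commit adds.

```python
def weighted_mae(y_true, y_pred, weight):
    """Weighted mean absolute error as a custom eval function (illustrative)."""
    # Fall back to uniform weights when none were supplied.
    if weight is None:
        weight = [1.0] * len(y_true)

    # A weighted average is only meaningful with non-negative weights
    # and a positive total.
    total_weight = sum(weight)
    if any(w < 0 for w in weight) or total_weight <= 0:
        raise ValueError("weights should be non-negative with a positive sum")

    weighted_error = sum(
        w * abs(t - p) for t, p, w in zip(y_true, y_pred, weight)
    )
    # Returns (eval_name, eval_result, is_higher_better).
    return "weighted_mae", weighted_error / total_weight, False
```

The function uses only built-ins, so it accepts plain lists as well as the numpy arrays named in the docstring.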