Commit 6ced58ad authored by Miguel Trejo Marrufo, committed by GitHub

[Docs] Weights non-negative for train data (#5013)

* docs: weight parameter non-negative

* docs: weights non negative only for train data

* docs: weights should be non negative for validation data

* typo in html render

* docs: brief weights non-negative description
parent d670a4d6
@@ -780,6 +780,8 @@ Dataset Parameters
 - **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``, e.g. when label is column\_0, and weight is column\_1, the correct parameter is ``weight=0``
+- **Note**: weights should be non-negative
 - ``group_column`` :raw-html:`<a id="group_column" title="Permalink to this parameter" href="#group_column">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = int or string, aliases: ``group``, ``group_id``, ``query_column``, ``query``, ``query_id``
 - used to specify the query/group id column
@@ -1275,7 +1277,8 @@ LightGBM supports weighted training. It uses an additional file to store weight
 0.8
 ...
-It means the weight of the first data row is ``1.0``, second is ``0.5``, and so on.
+It means the weight of the first data row is ``1.0``, second is ``0.5``, and so on. Weights should be non-negative.
 The weight file corresponds with data file line by line, and has per weight per line.
 And if the name of data file is ``train.txt``, the weight file should be named as ``train.txt.weight`` and placed in the same folder as the data file.
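The weight file convention described in this hunk can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library: `write_weight_file` is a hypothetical helper, not part of LightGBM, and the only check it performs is the non-negativity constraint this commit documents.

```python
import os
import tempfile


def write_weight_file(data_path, weights):
    """Write one weight per line to ``<data_path>.weight`` (hypothetical helper)."""
    # Validate everything first so a bad value never leaves a partial file behind.
    for row, w in enumerate(weights):
        if w < 0:
            raise ValueError(f"weight for data row {row} is negative: {w!r}")
    weight_path = data_path + ".weight"
    with open(weight_path, "w") as f:
        for w in weights:
            f.write(f"{w}\n")
    return weight_path


# Usage: the weight file sits in the same folder as the data file and
# carries the data file's name plus the ``.weight`` suffix.
tmp_dir = tempfile.mkdtemp()
data_path = os.path.join(tmp_dir, "train.txt")
weight_path = write_weight_file(data_path, [1.0, 0.5, 0.8])
print(os.path.basename(weight_path))  # train.txt.weight
```

Line N of the weight file then applies to data row N of `train.txt`, matching the line-by-line correspondence the documentation describes.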
......
@@ -670,6 +670,7 @@ struct Config {
 // desc = add a prefix ``name:`` for column name, e.g. ``weight=name:weight``
 // desc = **Note**: works only in case of loading data directly from text file
 // desc = **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``, e.g. when label is column\_0, and weight is column\_1, the correct parameter is ``weight=0``
+// desc = **Note**: weights should be non-negative
 std::string weight_column = "";
 // type = int or string
......
@@ -1138,7 +1138,7 @@ class Dataset:
 reference : Dataset or None, optional (default=None)
 If this is Dataset for validation, training data should be used as reference.
 weight : list, numpy 1-D array, pandas Series or None, optional (default=None)
-Weight for each instance.
+Weight for each instance. Weights should be non-negative.
 group : list, numpy 1-D array, pandas Series or None, optional (default=None)
 Group/query data.
 Only used in the learning-to-rank task.
@@ -1818,7 +1818,7 @@ class Dataset:
 label : list, numpy 1-D array, pandas Series / one-column DataFrame or None, optional (default=None)
 Label of the data.
 weight : list, numpy 1-D array, pandas Series or None, optional (default=None)
-Weight for each instance.
+Weight for each instance. Weights should be non-negative.
 group : list, numpy 1-D array, pandas Series or None, optional (default=None)
 Group/query data.
 Only used in the learning-to-rank task.
@@ -2154,7 +2154,7 @@ class Dataset:
 Parameters
 ----------
 weight : list, numpy 1-D array, pandas Series or None
-Weight to be set for each data point.
+Weight to be set for each data point. Weights should be non-negative.
 Returns
 -------
@@ -2269,7 +2269,7 @@ class Dataset:
 Returns
 -------
 weight : numpy array or None
-Weight for each data point from the Dataset.
+Weight for each data point from the Dataset. Weights should be non-negative.
 """
 if self.weight is None:
 self.weight = self.get_field('weight')
@@ -3543,8 +3543,7 @@ class Booster:
 reference : Dataset or None, optional (default=None)
 Reference for ``data``.
 weight : list, numpy 1-D array, pandas Series or None, optional (default=None)
-Weight for each ``data`` instance. Weight should be non-negative values because the Hessian
-value multiplied by weight is supposed to be non-negative.
+Weight for each ``data`` instance. Weights should be non-negative.
 group : list, numpy 1-D array, pandas Series or None, optional (default=None)
 Group/query size for ``data``.
 Only used in the learning-to-rank task.
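The docstring text removed in this hunk gave the reason for the constraint: the Hessian multiplied by the weight is supposed to be non-negative. A small numeric sketch may make that concrete. It assumes a squared-error objective whose per-row Hessian is 1 before weighting, and the usual Newton-step leaf value `-sum(grad) / sum(hess)` with no regularization. The function names are hypothetical and this is not LightGBM's implementation.

```python
def weighted_l2_grad_hess(preds, labels, weights):
    """Per-row gradient and Hessian of 0.5 * w * (pred - label)**2 (illustrative)."""
    grads = [w * (p - y) for p, y, w in zip(preds, labels, weights)]
    hess = [w * 1.0 for w in weights]  # unweighted Hessian of squared error is 1
    return grads, hess


def newton_leaf_value(grads, hess):
    """Newton-step value for a single leaf, assuming no regularization."""
    return -sum(grads) / sum(hess)


preds = [0.0, 0.0]
labels = [1.0, 3.0]

# Non-negative weights: every weighted Hessian is >= 0 and the leaf value
# is the weighted mean of the labels.
g, h = weighted_l2_grad_hess(preds, labels, [1.0, 1.0])
ok_value = newton_leaf_value(g, h)  # 2.0

# A negative weight makes one weighted Hessian negative. Here the Hessian
# sum turns negative too, and the leaf value falls outside the label range.
g_bad, h_bad = weighted_l2_grad_hess(preds, labels, [1.0, -3.0])
bad_value = newton_leaf_value(g_bad, h_bad)  # 4.0

print(ok_value, h_bad, bad_value)
```

With labels 1.0 and 3.0, the second case yields a leaf value of 4.0, which no non-negative weighting of those labels could produce.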
......
@@ -424,7 +424,7 @@ def _train(
 model_factory : lightgbm.LGBMClassifier, lightgbm.LGBMRegressor, or lightgbm.LGBMRanker class
 Class of the local underlying model.
 sample_weight : Dask Array or Dask Series of shape = [n_samples] or None, optional (default=None)
-Weights of training data.
+Weights of training data. Weights should be non-negative.
 init_score : Dask Array or Dask Series of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task), or Dask Array or Dask DataFrame of shape = [n_samples, n_classes] (for multi-class task), or None, optional (default=None)
 Init score of training data.
 group : Dask Array or Dask Series or None, optional (default=None)
@@ -441,7 +441,7 @@ def _train(
 eval_names : list of str, or None, optional (default=None)
 Names of eval_set.
 eval_sample_weight : list of Dask Array or Dask Series, or None, optional (default=None)
-Weights for each validation set in eval_set.
+Weights for each validation set in eval_set. Weights should be non-negative.
 eval_class_weight : list of dict or str, or None, optional (default=None)
 Class weights, one dict or str for each validation set in eval_set.
 eval_init_score : list of Dask Array, Dask Series or Dask DataFrame (for multi-class task), or None, optional (default=None)
......
@@ -149,7 +149,7 @@ class _EvalFunctionWrapper:
 In case of custom ``objective``, predicted values are returned before any transformation,
 e.g. they are raw margin instead of probability of positive class for binary task in this case.
 weight : numpy 1-D array of shape = [n_samples]
-The weight of samples.
+The weight of samples. Weights should be non-negative.
 group : numpy 1-D array
 Group/query data.
 Only used in the learning-to-rank task.
@@ -215,7 +215,7 @@ _lgbmmodel_doc_fit = (
 y : {y_shape}
 The target values (class labels in classification, real numbers in regression).
 sample_weight : {sample_weight_shape}
-Weights of training data.
+Weights of training data. Weights should be non-negative.
 init_score : {init_score_shape}
 Init score of training data.
 group : {group_shape}
@@ -229,7 +229,7 @@ _lgbmmodel_doc_fit = (
 eval_names : list of str, or None, optional (default=None)
 Names of eval_set.
 eval_sample_weight : {eval_sample_weight_shape}
-Weights of eval data.
+Weights of eval data. Weights should be non-negative.
 eval_class_weight : list or None, optional (default=None)
 Class weights of eval data.
 eval_init_score : {eval_init_score_shape}
@@ -284,7 +284,7 @@ _lgbmmodel_doc_custom_eval_note = """
 In case of custom ``objective``, predicted values are returned before any transformation,
 e.g. they are raw margin instead of probability of positive class for binary task in this case.
 weight : numpy 1-D array of shape = [n_samples]
-The weight of samples.
+The weight of samples. Weights should be non-negative.
 group : numpy 1-D array
 Group/query data.
 Only used in the learning-to-rank task.
......