..  List of parameters is auto generated by LightGBM\.ci\parameter-generator.py from LightGBM\include\LightGBM\config.h file.

.. role:: raw-html(raw)
    :format: html

Parameters
==========

This page contains descriptions of all parameters in LightGBM.

**List of other helpful links**

- `Python API <./Python-API.rst>`__

- `Parameters Tuning <./Parameters-Tuning.rst>`__

Parameters Format
-----------------

Parameters are merged together in the following order (later items overwrite earlier ones):

1. LightGBM's default values
2. special files for ``weight``, ``init_score``, ``query``, and ``positions`` (see `Others <#others>`__)
3. (CLI only) configuration in a file passed like ``config=train.conf``
4. (CLI only) configuration passed via the command line
5. (Python, R) special keyword arguments to some functions (e.g. ``num_boost_round`` in ``train()``)
6. (Python, R) ``params`` function argument (including ``**kwargs`` in Python and ``...`` in R)
7. (C API) ``parameters`` or ``params`` function argument

Many parameters have "aliases", alternative names which refer to the same configuration.

Where a mix of the primary parameter name and aliases are given, the primary parameter name is always preferred to any aliases.

For example, in Python:

.. code-block:: python

   # use learning rate of 0.07, because 'learning_rate'
   # is the primary parameter name
   lgb.train(
      params={
         "learning_rate": 0.07,
         "shrinkage_rate": 0.12
      },
      train_set=dtrain
   )

Where multiple aliases are given, and the primary parameter name is not, the first alias
appearing in the lists returned by ``Config::parameter2aliases()`` in the C++ library is used.
Those lists are hard-coded in a fairly arbitrary way, so avoid relying on this behavior wherever possible.

For example, in Python:

.. code-block:: python

   # use learning rate of 0.12, because LightGBM has a hard-coded preference for
   # 'shrinkage_rate' over any other aliases, and 'learning_rate' is not provided
   lgb.train(
      params={
         "eta": 0.19,
         "shrinkage_rate": 0.12
      },
      train_set=dtrain
   )
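
The precedence rules above can be sketched in plain Python. This is only an illustration, not LightGBM's implementation: ``resolve_aliases()`` is a hypothetical helper, and the alias table is a hand-written, partial assumption based on the ``learning_rate`` aliases listed on this page.

.. code-block:: python

   # Hypothetical, partial alias table (assumption for illustration only):
   # primary parameter name -> aliases, in order of preference.
   ALIASES = {"learning_rate": ["shrinkage_rate", "eta"]}


   def resolve_aliases(params):
       """Collapse aliases into primary names using the rules described above."""
       alias_names = {alias for names in ALIASES.values() for alias in names}
       # parameters without a known alias relationship pass through unchanged
       resolved = {
           key: value
           for key, value in params.items()
           if key not in ALIASES and key not in alias_names
       }
       for primary, names in ALIASES.items():
           if primary in params:
               # the primary name always wins over any alias
               resolved[primary] = params[primary]
               continue
           for alias in names:
               if alias in params:
                   # otherwise the first alias in the hard-coded list is used
                   resolved[primary] = params[alias]
                   break
       return resolved


   print(resolve_aliases({"learning_rate": 0.07, "shrinkage_rate": 0.12}))
   print(resolve_aliases({"eta": 0.19, "shrinkage_rate": 0.12}))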

**CLI**

The parameters format is ``key1=value1 key2=value2 ...``.
Parameters can be set both in a config file and on the command line.
On the command line, parameters should not have spaces before or after ``=``.
In config files, each line can contain only one parameter. You can use ``#`` to comment.
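
A config file in this format might be read as in the sketch below. ``parse_config_text()`` is a hypothetical helper written for illustration, not part of LightGBM, and its handling of whitespace and of comments that follow a value is an assumption.

.. code-block:: python

   def parse_config_text(text):
       """Parse key=value lines, one parameter per line, ignoring '#' comments."""
       params = {}
       for line in text.splitlines():
           # drop everything after '#', then surrounding whitespace
           line = line.split("#", 1)[0].strip()
           if not line:
               continue
           key, _, value = line.partition("=")
           params[key.strip()] = value.strip()
       return params


   example = """
   # training configuration
   task=train
   objective=binary
   learning_rate=0.05  # shrinkage rate
   """
   print(parse_config_text(example))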

**Python**

Any parameters that accept multiple values should be passed as a Python list.

.. code-block:: python

   params = {
      "monotone_constraints": [-1, 0, 1]
   }


**R**

Any parameters that accept multiple values should be passed as an R list.

.. code-block:: r

   params <- list(
      monotone_constraints = c(-1, 0, 1)
   )

.. start params list

Core Parameters
---------------

-  ``config`` :raw-html:`<a id="config" title="Permalink to this parameter" href="#config">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string, aliases: ``config_file``

   -  path of config file

   -  **Note**: can be used only in CLI version

-  ``task`` :raw-html:`<a id="task" title="Permalink to this parameter" href="#task">&#x1F517;&#xFE0E;</a>`, default = ``train``, type = enum, options: ``train``, ``predict``, ``convert_model``, ``refit``, aliases: ``task_type``

   -  ``train``, for training, aliases: ``training``

   -  ``predict``, for prediction, aliases: ``prediction``, ``test``

   -  ``convert_model``, for converting model file into if-else format, see more information in `Convert Parameters <#convert-parameters>`__

   -  ``refit``, for refitting existing models with new data, aliases: ``refit_tree``

   -  ``save_binary``, load train (and validation) data then save dataset to binary file. Typical usage: ``save_binary`` first, then run multiple ``train`` tasks in parallel using the saved binary file

   -  **Note**: can be used only in CLI version; for language-specific packages you can use the corresponding functions

-  ``objective`` :raw-html:`<a id="objective" title="Permalink to this parameter" href="#objective">&#x1F517;&#xFE0E;</a>`, default = ``regression``, type = enum, options: ``regression``, ``regression_l1``, ``huber``, ``fair``, ``poisson``, ``quantile``, ``mape``, ``gamma``, ``tweedie``, ``binary``, ``multiclass``, ``multiclassova``, ``cross_entropy``, ``cross_entropy_lambda``, ``lambdarank``, ``rank_xendcg``, aliases: ``objective_type``, ``app``, ``application``, ``loss``

   -  regression application

      -  ``regression``, L2 loss, aliases: ``regression_l2``, ``l2``, ``mean_squared_error``, ``mse``, ``l2_root``, ``root_mean_squared_error``, ``rmse``

      -  ``regression_l1``, L1 loss, aliases: ``l1``, ``mean_absolute_error``, ``mae``

      -  ``huber``, `Huber loss <https://en.wikipedia.org/wiki/Huber_loss>`__

      -  ``fair``, `Fair loss <https://www.kaggle.com/c/allstate-claims-severity/discussion/24520>`__

      -  ``poisson``, `Poisson regression <https://en.wikipedia.org/wiki/Poisson_regression>`__

      -  ``quantile``, `Quantile regression <https://en.wikipedia.org/wiki/Quantile_regression>`__

      -  ``mape``, `MAPE loss <https://en.wikipedia.org/wiki/Mean_absolute_percentage_error>`__, aliases: ``mean_absolute_percentage_error``

      -  ``gamma``, Gamma regression with log-link. It might be useful, e.g., for modeling insurance claims severity, or for any target that might be `gamma-distributed <https://en.wikipedia.org/wiki/Gamma_distribution#Occurrence_and_applications>`__

      -  ``tweedie``, Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any target that might be `tweedie-distributed <https://en.wikipedia.org/wiki/Tweedie_distribution#Occurrence_and_applications>`__

   -  binary classification application

      -  ``binary``, binary `log loss <https://en.wikipedia.org/wiki/Cross_entropy>`__ classification (or logistic regression)

      -  requires labels in {0, 1}; see ``cross-entropy`` application for general probability labels in [0, 1]

   -  multi-class classification application

      -  ``multiclass``, `softmax <https://en.wikipedia.org/wiki/Softmax_function>`__ objective function, aliases: ``softmax``

      -  ``multiclassova``, `One-vs-All <https://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest>`__ binary objective function, aliases: ``multiclass_ova``, ``ova``, ``ovr``

      -  ``num_class`` should be set as well

   -  cross-entropy application

      -  ``cross_entropy``, objective function for cross-entropy (with optional linear weights), aliases: ``xentropy``

      -  ``cross_entropy_lambda``, alternative parameterization of cross-entropy, aliases: ``xentlambda``

      -  label is anything in interval [0, 1]

   -  ranking application

      -  ``lambdarank``, `lambdarank <https://proceedings.neurips.cc/paper/2006/hash/af44c4c56f385c43f2529f9b1b018f6a-Abstract.html>`__ objective. `label_gain <#label_gain>`__ can be used to set the gain (weight) of ``int`` labels, and all values in ``label`` must be smaller than the number of elements in ``label_gain``

      -  ``rank_xendcg``, `XE_NDCG_MART <https://arxiv.org/abs/1911.09798>`__ ranking objective function, aliases: ``xendcg``, ``xe_ndcg``, ``xe_ndcg_mart``, ``xendcg_mart``

      -  ``rank_xendcg`` is faster than ``lambdarank`` and achieves similar performance

      -  label should be ``int`` type, and a larger number represents higher relevance (e.g. 0:bad, 1:fair, 2:good, 3:perfect)

   -  custom objective function (gradients and hessians not computed directly by LightGBM)

      -  ``custom``

      -  must be passed through parameters explicitly in the C API

      -  **Note**: cannot be used in CLI version
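
   As a minimal sketch of two constraints mentioned above: multi-class objectives need ``num_class``, and ranking labels must index into ``label_gain``. The helper name ``labels_fit_label_gain()`` and the gain values are hypothetical, chosen only for illustration; requiring labels to be non-negative is also an assumption of this sketch.

   .. code-block:: python

      # multi-class objectives require num_class to be set as well
      multiclass_params = {"objective": "multiclass", "num_class": 3}

      # example gains for relevance labels 0..3 (illustrative values)
      ranking_params = {"objective": "lambdarank", "label_gain": [0, 1, 3, 7]}


      def labels_fit_label_gain(labels, label_gain):
          """Check that every integer label is smaller than len(label_gain)."""
          return all(0 <= label < len(label_gain) for label in labels)


      print(labels_fit_label_gain([0, 2, 3, 1], ranking_params["label_gain"]))
      print(labels_fit_label_gain([0, 4], ranking_params["label_gain"]))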

-  ``boosting`` :raw-html:`<a id="boosting" title="Permalink to this parameter" href="#boosting">&#x1F517;&#xFE0E;</a>`, default = ``gbdt``, type = enum, options: ``gbdt``, ``rf``, ``dart``, aliases: ``boosting_type``, ``boost``

   -  ``gbdt``, traditional Gradient Boosting Decision Tree, aliases: ``gbrt``

   -  ``rf``, Random Forest, aliases: ``random_forest``

   -  ``dart``, `Dropouts meet Multiple Additive Regression Trees <https://arxiv.org/abs/1505.01866>`__

      -  **Note**: internally, LightGBM uses ``gbdt`` mode for the first ``1 / learning_rate`` iterations

-  ``data_sample_strategy`` :raw-html:`<a id="data_sample_strategy" title="Permalink to this parameter" href="#data_sample_strategy">&#x1F517;&#xFE0E;</a>`, default = ``bagging``, type = enum, options: ``bagging``, ``goss``

   -  ``bagging``, Random Bagging Sampling

      -  **Note**: ``bagging`` is only effective when ``bagging_freq > 0`` and ``bagging_fraction < 1.0``

   -  ``goss``, Gradient-based One-Side Sampling

   -  *New in version 4.0.0*

-  ``data`` :raw-html:`<a id="data" title="Permalink to this parameter" href="#data">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string, aliases: ``train``, ``train_data``, ``train_data_file``, ``data_filename``

   -  path of training data, LightGBM will train from this data

   -  **Note**: can be used only in CLI version

-  ``valid`` :raw-html:`<a id="valid" title="Permalink to this parameter" href="#valid">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string, aliases: ``test``, ``valid_data``, ``valid_data_file``, ``test_data``, ``test_data_file``, ``valid_filenames``

   -  path(s) of validation/test data, LightGBM will output metrics for these data

   -  supports multiple validation data files, separated by ``,``

   -  **Note**: can be used only in CLI version

-  ``num_iterations`` :raw-html:`<a id="num_iterations" title="Permalink to this parameter" href="#num_iterations">&#x1F517;&#xFE0E;</a>`, default = ``100``, type = int, aliases: ``num_iteration``, ``n_iter``, ``num_tree``, ``num_trees``, ``num_round``, ``num_rounds``, ``nrounds``, ``num_boost_round``, ``n_estimators``, ``max_iter``, constraints: ``num_iterations >= 0``

   -  number of boosting iterations

   -  **Note**: internally, LightGBM constructs ``num_class * num_iterations`` trees for multi-class classification problems
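
   The note above amounts to simple arithmetic, sketched here with a hypothetical helper ``total_trees()``. Treating every other objective as one tree per iteration is an assumption of this sketch.

   .. code-block:: python

      def total_trees(num_iterations, num_class=1):
          """Number of trees built: one per class per boosting iteration."""
          return num_class * num_iterations


      print(total_trees(100))               # single-output objective
      print(total_trees(100, num_class=5))  # multi-class with 5 classes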

-  ``learning_rate`` :raw-html:`<a id="learning_rate" title="Permalink to this parameter" href="#learning_rate">&#x1F517;&#xFE0E;</a>`, default = ``0.1``, type = double, aliases: ``shrinkage_rate``, ``eta``, constraints: ``learning_rate > 0.0``

   -  shrinkage rate

   -  in ``dart``, it also affects the normalization weights of dropped trees

-  ``num_leaves`` :raw-html:`<a id="num_leaves" title="Permalink to this parameter" href="#num_leaves">&#x1F517;&#xFE0E;</a>`, default = ``31``, type = int, aliases: ``num_leaf``, ``max_leaves``, ``max_leaf``, ``max_leaf_nodes``, constraints: ``1 < num_leaves <= 131072``

   -  max number of leaves in one tree

-  ``tree_learner`` :raw-html:`<a id="tree_learner" title="Permalink to this parameter" href="#tree_learner">&#x1F517;&#xFE0E;</a>`, default = ``serial``, type = enum, options: ``serial``, ``feature``, ``data``, ``voting``, aliases: ``tree``, ``tree_type``, ``tree_learner_type``

   -  ``serial``, single machine tree learner

   -  ``feature``, feature parallel tree learner, aliases: ``feature_parallel``

   -  ``data``, data parallel tree learner, aliases: ``data_parallel``

   -  ``voting``, voting parallel tree learner, aliases: ``voting_parallel``

   -  refer to `Distributed Learning Guide <./Parallel-Learning-Guide.rst>`__ to get more details

-  ``num_threads`` :raw-html:`<a id="num_threads" title="Permalink to this parameter" href="#num_threads">&#x1F517;&#xFE0E;</a>`, default = ``0``, type = int, aliases: ``num_thread``, ``nthread``, ``nthreads``, ``n_jobs``

   -  used only in ``train``, ``prediction`` and ``refit`` tasks or in the corresponding functions of language-specific packages

   -  number of threads for LightGBM

   -  ``0`` means default number of threads in OpenMP

   -  for the best speed, set this to the number of **real CPU cores**, not the number of threads (most CPUs use `hyper-threading <https://en.wikipedia.org/wiki/Hyper-threading>`__ to generate 2 threads per CPU core)

   -  do not set it too large if your dataset is small (for instance, do not use 64 threads for a dataset with 10,000 rows)

   -  be aware that a task manager or any similar CPU monitoring tool might report that cores are not being fully utilized. **This is normal**

   -  for distributed learning, do not use all CPU cores because this will cause poor performance for the network communication

   -  **Note**: please **don't** change this during training, especially when running multiple jobs simultaneously by external packages, otherwise it may cause undesirable errors

-  ``device_type`` :raw-html:`<a id="device_type" title="Permalink to this parameter" href="#device_type">&#x1F517;&#xFE0E;</a>`, default = ``cpu``, type = enum, options: ``cpu``, ``gpu``, ``cuda``, aliases: ``device``

   -  device for the tree learning

   -  ``cpu`` supports all LightGBM functionality and is portable across the widest range of operating systems and hardware

   -  ``cuda`` offers faster training than ``gpu`` or ``cpu``, but only works on GPUs supporting CUDA

   -  ``gpu`` can be faster than ``cpu`` and works on a wider range of GPUs than CUDA

   -  **Note**: it is recommended to use a smaller ``max_bin`` (e.g. 63) to get a better speed-up

   -  **Note**: for faster speed, the GPU uses 32-bit floating point for summation by default, which may affect the accuracy for some tasks. You can set ``gpu_use_dp=true`` to enable 64-bit floating point, but it will slow down training

   -  **Note**: refer to `Installation Guide <./Installation-Guide.rst>`__ to build LightGBM with GPU or CUDA support

-  ``seed`` :raw-html:`<a id="seed" title="Permalink to this parameter" href="#seed">&#x1F517;&#xFE0E;</a>`, default = ``None``, type = int, aliases: ``random_seed``, ``random_state``

   -  this seed is used to generate other seeds, e.g. ``data_random_seed``, ``feature_fraction_seed``, etc.

   -  by default, this seed is unused in favor of default values of other seeds

   -  this seed has lower priority in comparison with other seeds, which means that it will be overridden, if you set other seeds explicitly

-  ``deterministic`` :raw-html:`<a id="deterministic" title="Permalink to this parameter" href="#deterministic">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  used only with ``cpu`` device type

   -  setting this to ``true`` should ensure stable results when using the same data and the same parameters (and different ``num_threads``)

   -  when you use different seeds, different LightGBM versions, binaries compiled by different compilers, or different systems, the results are expected to differ

   -  you can `raise issues <https://github.com/microsoft/LightGBM/issues>`__ in the LightGBM GitHub repo if you encounter unstable results

   -  **Note**: setting this to ``true`` may slow down the training

   -  **Note**: to avoid potential instability due to numerical issues, please set ``force_col_wise=true`` or ``force_row_wise=true`` when setting ``deterministic=true``
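
   A parameter set following the notes above might look like this sketch. The specific values (seed, thread count, choice of ``force_row_wise``) are arbitrary assumptions for illustration.

   .. code-block:: python

      params = {
          "device_type": "cpu",    # deterministic is used only with cpu
          "deterministic": True,
          "force_row_wise": True,  # alternatively, set force_col_wise instead
          "seed": 42,
          "num_threads": 4,
      }

      # exactly one of the two histogram-building modes should be forced
      forced = [params.get("force_col_wise", False), params.get("force_row_wise", False)]
      print(sum(forced) == 1)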

Learning Control Parameters
---------------------------

-  ``force_col_wise`` :raw-html:`<a id="force_col_wise" title="Permalink to this parameter" href="#force_col_wise">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  used only with ``cpu`` device type

   -  set this to ``true`` to force col-wise histogram building

   -  enabling this is recommended when:

      -  the number of columns is large, or the total number of bins is large

      -  ``num_threads`` is large, e.g. ``> 20``

      -  you want to reduce memory cost

   -  **Note**: when both ``force_col_wise`` and ``force_row_wise`` are ``false``, LightGBM will first try both and then use the faster one. To remove the overhead of testing, set the faster one to ``true`` manually

   -  **Note**: this parameter cannot be used at the same time as ``force_row_wise``; choose only one of them

-  ``force_row_wise`` :raw-html:`<a id="force_row_wise" title="Permalink to this parameter" href="#force_row_wise">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  used only with ``cpu`` device type

   -  set this to ``true`` to force row-wise histogram building

   -  enabling this is recommended when:

      -  the number of data points is large, and the total number of bins is relatively small

      -  ``num_threads`` is relatively small, e.g. ``<= 16``

      -  you want to use a small ``bagging_fraction`` or the ``goss`` sample strategy to speed up training

   -  **Note**: setting this to ``true`` will double the memory cost for the Dataset object. If you do not have enough memory, you can try setting ``force_col_wise=true``

   -  **Note**: when both ``force_col_wise`` and ``force_row_wise`` are ``false``, LightGBM will first try both and then use the faster one. To remove the overhead of testing, set the faster one to ``true`` manually

   -  **Note**: this parameter cannot be used at the same time as ``force_col_wise``; choose only one of them

-  ``histogram_pool_size`` :raw-html:`<a id="histogram_pool_size" title="Permalink to this parameter" href="#histogram_pool_size">&#x1F517;&#xFE0E;</a>`, default = ``-1.0``, type = double, aliases: ``hist_pool_size``

   -  max cache size in MB for historical histogram

   -  ``< 0`` means no limit

-  ``max_depth`` :raw-html:`<a id="max_depth" title="Permalink to this parameter" href="#max_depth">&#x1F517;&#xFE0E;</a>`, default = ``-1``, type = int

   -  limit the max depth for tree model. This is used to deal with over-fitting when ``#data`` is small. Tree still grows leaf-wise

   -  ``<= 0`` means no limit

-  ``min_data_in_leaf`` :raw-html:`<a id="min_data_in_leaf" title="Permalink to this parameter" href="#min_data_in_leaf">&#x1F517;&#xFE0E;</a>`, default = ``20``, type = int, aliases: ``min_data_per_leaf``, ``min_data``, ``min_child_samples``, ``min_samples_leaf``, constraints: ``min_data_in_leaf >= 0``

   -  minimal number of data in one leaf. Can be used to deal with over-fitting

   -  **Note**: this is an approximation based on the Hessian, so occasionally you may observe splits which produce leaf nodes that have fewer than this many observations

-  ``min_sum_hessian_in_leaf`` :raw-html:`<a id="min_sum_hessian_in_leaf" title="Permalink to this parameter" href="#min_sum_hessian_in_leaf">&#x1F517;&#xFE0E;</a>`, default = ``1e-3``, type = double, aliases: ``min_sum_hessian_per_leaf``, ``min_sum_hessian``, ``min_hessian``, ``min_child_weight``, constraints: ``min_sum_hessian_in_leaf >= 0.0``

   -  minimal sum hessian in one leaf. Like ``min_data_in_leaf``, it can be used to deal with over-fitting

-  ``bagging_fraction`` :raw-html:`<a id="bagging_fraction" title="Permalink to this parameter" href="#bagging_fraction">&#x1F517;&#xFE0E;</a>`, default = ``1.0``, type = double, aliases: ``sub_row``, ``subsample``, ``bagging``, constraints: ``0.0 < bagging_fraction <= 1.0``

   -  like ``feature_fraction``, but this will randomly select part of data without resampling

   -  can be used to speed up training

   -  can be used to deal with over-fitting

   -  **Note**: to enable bagging, ``bagging_freq`` should be set to a non-zero value as well

-  ``pos_bagging_fraction`` :raw-html:`<a id="pos_bagging_fraction" title="Permalink to this parameter" href="#pos_bagging_fraction">&#x1F517;&#xFE0E;</a>`, default = ``1.0``, type = double, aliases: ``pos_sub_row``, ``pos_subsample``, ``pos_bagging``, constraints: ``0.0 < pos_bagging_fraction <= 1.0``

   -  used only in ``binary`` application

   -  used for imbalanced binary classification problem, will randomly sample ``#pos_samples * pos_bagging_fraction`` positive samples in bagging

   -  should be used together with ``neg_bagging_fraction``

   -  set this to ``1.0`` to disable

   -  **Note**: to enable this, you need to set ``bagging_freq`` and ``neg_bagging_fraction`` as well

   -  **Note**: if both ``pos_bagging_fraction`` and ``neg_bagging_fraction`` are set to ``1.0``, balanced bagging is disabled

   -  **Note**: if balanced bagging is enabled, ``bagging_fraction`` will be ignored

-  ``neg_bagging_fraction`` :raw-html:`<a id="neg_bagging_fraction" title="Permalink to this parameter" href="#neg_bagging_fraction">&#x1F517;&#xFE0E;</a>`, default = ``1.0``, type = double, aliases: ``neg_sub_row``, ``neg_subsample``, ``neg_bagging``, constraints: ``0.0 < neg_bagging_fraction <= 1.0``

   -  used only in ``binary`` application

   -  used for imbalanced binary classification problem, will randomly sample ``#neg_samples * neg_bagging_fraction`` negative samples in bagging

   -  should be used together with ``pos_bagging_fraction``

   -  set this to ``1.0`` to disable

   -  **Note**: to enable this, you need to set ``bagging_freq`` and ``pos_bagging_fraction`` as well

   -  **Note**: if both ``pos_bagging_fraction`` and ``neg_bagging_fraction`` are set to ``1.0``, balanced bagging is disabled

   -  **Note**: if balanced bagging is enabled, ``bagging_fraction`` will be ignored
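
   The sampling described for ``pos_bagging_fraction`` and ``neg_bagging_fraction`` can be sketched as below. ``balanced_bag_sizes()`` is a hypothetical helper, and truncating the products to integers is an assumption; LightGBM's internal rounding may differ.

   .. code-block:: python

      def balanced_bag_sizes(num_pos, num_neg, pos_bagging_fraction=1.0,
                             neg_bagging_fraction=1.0, bagging_freq=0):
          """Return (positives, negatives) sampled per bag, or None if disabled."""
          both_one = pos_bagging_fraction == 1.0 and neg_bagging_fraction == 1.0
          if bagging_freq <= 0 or both_one:
              return None  # balanced bagging is disabled
          return (int(num_pos * pos_bagging_fraction),
                  int(num_neg * neg_bagging_fraction))


      # 1,000 positives and 100,000 negatives: keep all positives, 10% of negatives
      print(balanced_bag_sizes(1000, 100000, 1.0, 0.1, bagging_freq=1))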

-  ``bagging_freq`` :raw-html:`<a id="bagging_freq" title="Permalink to this parameter" href="#bagging_freq">&#x1F517;&#xFE0E;</a>`, default = ``0``, type = int, aliases: ``subsample_freq``

   -  frequency for bagging

   -  ``0`` means disable bagging; ``k`` means perform bagging every ``k`` iterations. Every ``k``-th iteration, LightGBM will randomly select ``bagging_fraction * 100%`` of the data to use for the next ``k`` iterations

   -  **Note**: bagging is only effective when ``0.0 < bagging_fraction < 1.0``
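
   The schedule described above can be sketched as follows. ``bagging_schedule()`` is a hypothetical helper; it assumes iterations are counted from 0 and that a new bag is drawn whenever the iteration number is a multiple of ``bagging_freq``.

   .. code-block:: python

      def bagging_schedule(num_iterations, bagging_freq, bagging_fraction):
          """List the iterations at which a new bag would be drawn."""
          if bagging_freq <= 0 or not 0.0 < bagging_fraction < 1.0:
              return []  # bagging is disabled or has no effect
          return [i for i in range(num_iterations) if i % bagging_freq == 0]


      # with bagging_freq=5, a new 80% sample is drawn every 5 iterations
      print(bagging_schedule(20, bagging_freq=5, bagging_fraction=0.8))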

-  ``bagging_seed`` :raw-html:`<a id="bagging_seed" title="Permalink to this parameter" href="#bagging_seed">&#x1F517;&#xFE0E;</a>`, default = ``3``, type = int, aliases: ``bagging_fraction_seed``

   -  random seed for bagging

-  ``bagging_by_query`` :raw-html:`<a id="bagging_by_query" title="Permalink to this parameter" href="#bagging_by_query">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  whether to perform bagging sampling by query

   -  *New in version 4.6.0*

-  ``feature_fraction`` :raw-html:`<a id="feature_fraction" title="Permalink to this parameter" href="#feature_fraction">&#x1F517;&#xFE0E;</a>`, default = ``1.0``, type = double, aliases: ``sub_feature``, ``colsample_bytree``, constraints: ``0.0 < feature_fraction <= 1.0``

   -  LightGBM will randomly select a subset of features on each iteration (tree) if ``feature_fraction`` is smaller than ``1.0``. For example, if you set it to ``0.8``, LightGBM will select 80% of features before training each tree

   -  can be used to speed up training

   -  can be used to deal with over-fitting

-  ``feature_fraction_bynode`` :raw-html:`<a id="feature_fraction_bynode" title="Permalink to this parameter" href="#feature_fraction_bynode">&#x1F517;&#xFE0E;</a>`, default = ``1.0``, type = double, aliases: ``sub_feature_bynode``, ``colsample_bynode``, constraints: ``0.0 < feature_fraction_bynode <= 1.0``

   -  LightGBM will randomly select a subset of features on each tree node if ``feature_fraction_bynode`` is smaller than ``1.0``. For example, if you set it to ``0.8``, LightGBM will select 80% of features at each tree node

   -  can be used to deal with over-fitting

   -  **Note**: unlike ``feature_fraction``, this cannot speed up training

   -  **Note**: if both ``feature_fraction`` and ``feature_fraction_bynode`` are smaller than ``1.0``, the final fraction of each node is ``feature_fraction * feature_fraction_bynode``
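
   The combined effect of the two parameters is a simple product, sketched below. ``features_per_node()`` is a hypothetical helper, and truncating the feature count to an integer is an assumption.

   .. code-block:: python

      def features_per_node(num_features, feature_fraction=1.0,
                            feature_fraction_bynode=1.0):
          """Approximate number of features considered at each tree node."""
          final_fraction = feature_fraction * feature_fraction_bynode
          return int(num_features * final_fraction)


      # 0.8 * 0.5 = 0.4 of 100 features
      print(features_per_node(100, feature_fraction=0.8, feature_fraction_bynode=0.5))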

-  ``feature_fraction_seed`` :raw-html:`<a id="feature_fraction_seed" title="Permalink to this parameter" href="#feature_fraction_seed">&#x1F517;&#xFE0E;</a>`, default = ``2``, type = int

   -  random seed for ``feature_fraction``

-  ``extra_trees`` :raw-html:`<a id="extra_trees" title="Permalink to this parameter" href="#extra_trees">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``extra_tree``

   -  use extremely randomized trees

   -  if set to ``true``, when evaluating node splits LightGBM will check only one randomly-chosen threshold for each feature

   -  can be used to speed up training

   -  can be used to deal with over-fitting

-  ``extra_seed`` :raw-html:`<a id="extra_seed" title="Permalink to this parameter" href="#extra_seed">&#x1F517;&#xFE0E;</a>`, default = ``6``, type = int

   -  random seed for selecting thresholds when ``extra_trees`` is true

-  ``early_stopping_round`` :raw-html:`<a id="early_stopping_round" title="Permalink to this parameter" href="#early_stopping_round">&#x1F517;&#xFE0E;</a>`, default = ``0``, type = int, aliases: ``early_stopping_rounds``, ``early_stopping``, ``n_iter_no_change``

   -  will stop training if one metric of one validation data doesn't improve in the last ``early_stopping_round`` rounds

   -  ``<= 0`` means disable

   -  can be used to speed up training

-  ``early_stopping_min_delta`` :raw-html:`<a id="early_stopping_min_delta" title="Permalink to this parameter" href="#early_stopping_min_delta">&#x1F517;&#xFE0E;</a>`, default = ``0.0``, type = double, constraints: ``early_stopping_min_delta >= 0.0``

   -  when early stopping is used (i.e. ``early_stopping_round > 0``), require the early stopping metric to improve by at least this delta to be considered an improvement

   -  *New in version 4.4.0*
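
   How ``early_stopping_round`` and ``early_stopping_min_delta`` interact can be sketched in plain Python. ``stopping_iteration()`` is a hypothetical helper, not LightGBM's implementation. It assumes a single metric on a single validation set where lower values are better, and it treats a change as an improvement only when it is strictly greater than the delta.

   .. code-block:: python

      def stopping_iteration(scores, early_stopping_round, min_delta=0.0):
          """Return the 0-based iteration at which training would stop, or None."""
          if early_stopping_round <= 0:
              return None  # early stopping is disabled
          best_score = float("inf")
          best_iteration = -1
          for iteration, score in enumerate(scores):
              if best_score - score > min_delta:
                  # improved by more than min_delta: remember the new best
                  best_score, best_iteration = score, iteration
              elif iteration - best_iteration >= early_stopping_round:
                  # no sufficient improvement in the last early_stopping_round rounds
                  return iteration
          return None


      validation_loss = [0.50, 0.45, 0.46, 0.47, 0.44]
      print(stopping_iteration(validation_loss, early_stopping_round=2))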

-  ``first_metric_only`` :raw-html:`<a id="first_metric_only" title="Permalink to this parameter" href="#first_metric_only">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  LightGBM allows you to provide multiple evaluation metrics. Set this to ``true``, if you want to use only the first metric for early stopping

-  ``max_delta_step`` :raw-html:`<a id="max_delta_step" title="Permalink to this parameter" href="#max_delta_step">&#x1F517;&#xFE0E;</a>`, default = ``0.0``, type = double, aliases: ``max_tree_output``, ``max_leaf_output``

   -  used to limit the max output of tree leaves

   -  ``<= 0`` means no constraint

   -  the final max output of leaves is ``learning_rate * max_delta_step``

-  ``lambda_l1`` :raw-html:`<a id="lambda_l1" title="Permalink to this parameter" href="#lambda_l1">&#x1F517;&#xFE0E;</a>`, default = ``0.0``, type = double, aliases: ``reg_alpha``, ``l1_regularization``, constraints: ``lambda_l1 >= 0.0``

   -  L1 regularization

-  ``lambda_l2`` :raw-html:`<a id="lambda_l2" title="Permalink to this parameter" href="#lambda_l2">&#x1F517;&#xFE0E;</a>`, default = ``0.0``, type = double, aliases: ``reg_lambda``, ``lambda``, ``l2_regularization``, constraints: ``lambda_l2 >= 0.0``

   -  L2 regularization

-  ``linear_lambda`` :raw-html:`<a id="linear_lambda" title="Permalink to this parameter" href="#linear_lambda">&#x1F517;&#xFE0E;</a>`, default = ``0.0``, type = double, constraints: ``linear_lambda >= 0.0``

   -  linear tree regularization, corresponds to the parameter ``lambda`` in Eq. 3 of `Gradient Boosting with Piece-Wise Linear Regression Trees <https://arxiv.org/abs/1802.05640>`__

-  ``min_gain_to_split`` :raw-html:`<a id="min_gain_to_split" title="Permalink to this parameter" href="#min_gain_to_split">&#x1F517;&#xFE0E;</a>`, default = ``0.0``, type = double, aliases: ``min_split_gain``, constraints: ``min_gain_to_split >= 0.0``

   -  the minimal gain to perform split

   -  can be used to speed up training

-  ``drop_rate`` :raw-html:`<a id="drop_rate" title="Permalink to this parameter" href="#drop_rate">&#x1F517;&#xFE0E;</a>`, default = ``0.1``, type = double, aliases: ``rate_drop``, constraints: ``0.0 <= drop_rate <= 1.0``

   -  used only in ``dart``

   -  dropout rate: a fraction of previous trees to drop during the dropout

-  ``max_drop`` :raw-html:`<a id="max_drop" title="Permalink to this parameter" href="#max_drop">&#x1F517;&#xFE0E;</a>`, default = ``50``, type = int

   -  used only in ``dart``

   -  max number of dropped trees during one boosting iteration

   -  ``<= 0`` means no limit

-  ``skip_drop`` :raw-html:`<a id="skip_drop" title="Permalink to this parameter" href="#skip_drop">&#x1F517;&#xFE0E;</a>`, default = ``0.5``, type = double, constraints: ``0.0 <= skip_drop <= 1.0``

   -  used only in ``dart``

   -  probability of skipping the dropout procedure during a boosting iteration

-  ``xgboost_dart_mode`` :raw-html:`<a id="xgboost_dart_mode" title="Permalink to this parameter" href="#xgboost_dart_mode">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  used only in ``dart``

   -  set this to ``true``, if you want to use XGBoost DART mode

-  ``uniform_drop`` :raw-html:`<a id="uniform_drop" title="Permalink to this parameter" href="#uniform_drop">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  used only in ``dart``

   -  set this to ``true``, if you want to use uniform drop

-  ``drop_seed`` :raw-html:`<a id="drop_seed" title="Permalink to this parameter" href="#drop_seed">&#x1F517;&#xFE0E;</a>`, default = ``4``, type = int

   -  used only in ``dart``

   -  random seed to choose dropping models

-  ``top_rate`` :raw-html:`<a id="top_rate" title="Permalink to this parameter" href="#top_rate">&#x1F517;&#xFE0E;</a>`, default = ``0.2``, type = double, constraints: ``0.0 <= top_rate <= 1.0``
541

542
   -  used only in ``goss``
543

544
   -  the retain ratio of large gradient data
545

-  ``other_rate`` :raw-html:`<a id="other_rate" title="Permalink to this parameter" href="#other_rate">&#x1F517;&#xFE0E;</a>`, default = ``0.1``, type = double, constraints: ``0.0 <= other_rate <= 1.0``

   -  used only in ``goss``

   -  the fraction of small-gradient data to retain
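
As a rough illustration of how ``top_rate`` and ``other_rate`` select rows, the sketch below keeps the rows with the largest absolute gradients and randomly samples from the remainder. The helper ``goss_sample`` is hypothetical, both rates are assumed to be fractions of the total number of rows, and the re-weighting of sampled rows that GOSS also performs is left out.

.. code-block:: python

   import random


   def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
       """Return (top_indices, other_indices) chosen in a GOSS-like manner."""
       n = len(gradients)
       # Rank rows by absolute gradient, largest first.
       order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
       n_top = int(top_rate * n)
       n_other = int(other_rate * n)
       top = order[:n_top]
       rest = order[n_top:]
       # Randomly sample the small-gradient rows from what is left.
       rng = random.Random(seed)
       other = rng.sample(rest, min(n_other, len(rest)))
       return sorted(top), sorted(other)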

-  ``min_data_per_group`` :raw-html:`<a id="min_data_per_group" title="Permalink to this parameter" href="#min_data_per_group">&#x1F517;&#xFE0E;</a>`, default = ``100``, type = int, constraints: ``min_data_per_group > 0``

   -  used for the categorical features

   -  minimal number of data per categorical group

-  ``max_cat_threshold`` :raw-html:`<a id="max_cat_threshold" title="Permalink to this parameter" href="#max_cat_threshold">&#x1F517;&#xFE0E;</a>`, default = ``32``, type = int, constraints: ``max_cat_threshold > 0``

   -  used for the categorical features

   -  limit number of split points considered for categorical features. See `the documentation on how LightGBM finds optimal splits for categorical features <./Features.rst#optimal-split-for-categorical-features>`_ for more details

   -  can be used to speed up training

-  ``cat_l2`` :raw-html:`<a id="cat_l2" title="Permalink to this parameter" href="#cat_l2">&#x1F517;&#xFE0E;</a>`, default = ``10.0``, type = double, constraints: ``cat_l2 >= 0.0``

   -  used for the categorical features

   -  L2 regularization in categorical split

-  ``cat_smooth`` :raw-html:`<a id="cat_smooth" title="Permalink to this parameter" href="#cat_smooth">&#x1F517;&#xFE0E;</a>`, default = ``10.0``, type = double, constraints: ``cat_smooth >= 0.0``

   -  used for the categorical features

   -  this can reduce the effect of noise in categorical features, especially for categories with few data

-  ``max_cat_to_onehot`` :raw-html:`<a id="max_cat_to_onehot" title="Permalink to this parameter" href="#max_cat_to_onehot">&#x1F517;&#xFE0E;</a>`, default = ``4``, type = int, constraints: ``max_cat_to_onehot > 0``

   -  used for the categorical features

   -  when the number of categories of one feature is smaller than or equal to ``max_cat_to_onehot``, the one-vs-other split algorithm will be used

-  ``top_k`` :raw-html:`<a id="top_k" title="Permalink to this parameter" href="#top_k">&#x1F517;&#xFE0E;</a>`, default = ``20``, type = int, aliases: ``topk``, constraints: ``top_k > 0``

   -  used only in ``voting`` tree learner, refer to `Voting parallel <./Parallel-Learning-Guide.rst#choose-appropriate-parallel-algorithm>`__

   -  set this to a larger value for a more accurate result, but this will slow down training

-  ``monotone_constraints`` :raw-html:`<a id="monotone_constraints" title="Permalink to this parameter" href="#monotone_constraints">&#x1F517;&#xFE0E;</a>`, default = ``None``, type = multi-int, aliases: ``mc``, ``monotone_constraint``, ``monotonic_cst``

   -  used for constraints of monotonic features

   -  ``1`` means increasing, ``-1`` means decreasing, ``0`` means no constraint

   -  you need to specify all features in order. For example, ``mc=-1,0,1`` means decreasing for the 1st feature, no constraint for the 2nd feature and increasing for the 3rd feature

-  ``monotone_constraints_method`` :raw-html:`<a id="monotone_constraints_method" title="Permalink to this parameter" href="#monotone_constraints_method">&#x1F517;&#xFE0E;</a>`, default = ``basic``, type = enum, options: ``basic``, ``intermediate``, ``advanced``, aliases: ``monotone_constraining_method``, ``mc_method``

   -  used only if ``monotone_constraints`` is set

   -  monotone constraints method

      -  ``basic``, the most basic monotone constraints method. It does not slow down the training speed at all, but over-constrains the predictions

      -  ``intermediate``, a `more advanced method <https://hal.science/hal-02862802/document>`__, which may slow down the training speed very slightly. However, this method is much less constraining than the basic method and should significantly improve the results

      -  ``advanced``, an `even more advanced method <https://hal.science/hal-02862802/document>`__, which may slow down the training speed. However, this method is even less constraining than the intermediate method and should again significantly improve the results

-  ``monotone_penalty`` :raw-html:`<a id="monotone_penalty" title="Permalink to this parameter" href="#monotone_penalty">&#x1F517;&#xFE0E;</a>`, default = ``0.0``, type = double, aliases: ``monotone_splits_penalty``, ``ms_penalty``, ``mc_penalty``, constraints: ``monotone_penalty >= 0.0``

   -  used only if ``monotone_constraints`` is set

   -  `monotone penalty <https://hal.science/hal-02862802/document>`__: a penalization parameter X forbids any monotone splits on the first X (rounded down) level(s) of the tree. The penalty applied to monotone splits on a given depth is a continuous, increasing function of the penalization parameter

   -  if ``0.0`` (the default), no penalization is applied

-  ``feature_contri`` :raw-html:`<a id="feature_contri" title="Permalink to this parameter" href="#feature_contri">&#x1F517;&#xFE0E;</a>`, default = ``None``, type = multi-double, aliases: ``feature_contrib``, ``fc``, ``fp``, ``feature_penalty``

   -  used to control each feature's split gain; ``gain[i] = max(0, feature_contri[i]) * gain[i]`` replaces the split gain of the i-th feature

   -  you need to specify all features in order
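
The gain adjustment described for ``feature_contri`` amounts to an element-wise rescaling. The sketch below applies the stated formula to a plain list of gains; the function name ``apply_feature_contri`` and the example values are hypothetical.

.. code-block:: python

   def apply_feature_contri(gains, feature_contri):
       """Scale each feature's split gain by max(0, feature_contri[i])."""
       # Negative multipliers are clipped to 0, which removes that feature's gain.
       return [max(0.0, c) * g for g, c in zip(gains, feature_contri)]


   # One multiplier per feature, given in feature order.
   scaled = apply_feature_contri([2.0, 4.0, 1.0], [1.0, 0.5, -1.0])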

-  ``forcedsplits_filename`` :raw-html:`<a id="forcedsplits_filename" title="Permalink to this parameter" href="#forcedsplits_filename">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string, aliases: ``fs``, ``forced_splits_filename``, ``forced_splits_file``, ``forced_splits``

   -  path to a ``.json`` file that specifies splits to force at the top of every decision tree before best-first learning commences

   -  ``.json`` file can be arbitrarily nested, and each split contains ``feature``, ``threshold`` fields, as well as ``left`` and ``right`` fields representing subsplits

   -  categorical splits are forced in a one-hot fashion, with ``left`` representing the split containing the feature value and ``right`` representing other values

   -  **Note**: the forced split logic will be ignored, if the split makes gain worse

   -  see `this file <https://github.com/microsoft/LightGBM/blob/master/examples/binary_classification/forced_splits.json>`__ as an example
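
A forced-splits file with the ``feature``, ``threshold``, ``left`` and ``right`` fields described above could be generated as sketched below. The feature indices and thresholds are made up for illustration; the linked example file remains the authoritative reference for the format.

.. code-block:: python

   import json

   # Hypothetical forced splits: the root splits on feature 0, then each child
   # is split again on a different feature.
   forced_splits = {
       "feature": 0,
       "threshold": 0.5,
       "left": {"feature": 1, "threshold": 10.0},
       "right": {"feature": 2, "threshold": -1.5},
   }

   # Text that would be written to the file passed as forcedsplits_filename.
   forced_splits_json = json.dumps(forced_splits, indent=2)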

-  ``refit_decay_rate`` :raw-html:`<a id="refit_decay_rate" title="Permalink to this parameter" href="#refit_decay_rate">&#x1F517;&#xFE0E;</a>`, default = ``0.9``, type = double, constraints: ``0.0 <= refit_decay_rate <= 1.0``

   -  decay rate of ``refit`` task, will use ``leaf_output = refit_decay_rate * old_leaf_output + (1.0 - refit_decay_rate) * new_leaf_output`` to refit trees

   -  used only in ``refit`` task in CLI version or as argument in ``refit`` function in language-specific package
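
The refit formula is a weighted average of the old and new leaf outputs. A minimal sketch, with a hypothetical function name and made-up leaf values:

.. code-block:: python

   def refit_leaf_output(old_leaf_output, new_leaf_output, refit_decay_rate=0.9):
       """Blend the existing leaf output with the one computed on the new data."""
       return (
           refit_decay_rate * old_leaf_output
           + (1.0 - refit_decay_rate) * new_leaf_output
       )


   # With the default rate of 0.9, the old output dominates the result.
   blended = refit_leaf_output(1.0, 2.0)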

-  ``cegb_tradeoff`` :raw-html:`<a id="cegb_tradeoff" title="Permalink to this parameter" href="#cegb_tradeoff">&#x1F517;&#xFE0E;</a>`, default = ``1.0``, type = double, constraints: ``cegb_tradeoff >= 0.0``

   -  cost-effective gradient boosting multiplier for all penalties

-  ``cegb_penalty_split`` :raw-html:`<a id="cegb_penalty_split" title="Permalink to this parameter" href="#cegb_penalty_split">&#x1F517;&#xFE0E;</a>`, default = ``0.0``, type = double, constraints: ``cegb_penalty_split >= 0.0``

   -  cost-effective gradient boosting penalty for splitting a node

-  ``cegb_penalty_feature_lazy`` :raw-html:`<a id="cegb_penalty_feature_lazy" title="Permalink to this parameter" href="#cegb_penalty_feature_lazy">&#x1F517;&#xFE0E;</a>`, default = ``0,0,...,0``, type = multi-double

   -  cost-effective gradient boosting penalty for using a feature

   -  applied per data point

-  ``cegb_penalty_feature_coupled`` :raw-html:`<a id="cegb_penalty_feature_coupled" title="Permalink to this parameter" href="#cegb_penalty_feature_coupled">&#x1F517;&#xFE0E;</a>`, default = ``0,0,...,0``, type = multi-double

   -  cost-effective gradient boosting penalty for using a feature

   -  applied once per forest

-  ``path_smooth`` :raw-html:`<a id="path_smooth" title="Permalink to this parameter" href="#path_smooth">&#x1F517;&#xFE0E;</a>`, default = ``0``, type = double, constraints: ``path_smooth >=  0.0``

   -  controls smoothing applied to tree nodes

   -  helps prevent overfitting on leaves with few samples

   -  if ``0.0`` (the default), no smoothing is applied

   -  if ``path_smooth > 0`` then ``min_data_in_leaf`` must be at least ``2``

   -  larger values give stronger regularization

      -  the weight of each node is ``w * (n / path_smooth) / (n / path_smooth + 1) + w_p / (n / path_smooth + 1)``, where ``n`` is the number of samples in the node, ``w`` is the optimal node weight to minimise the loss (approximately ``-sum_gradients / sum_hessians``), and ``w_p`` is the weight of the parent node

      -  note that the parent output ``w_p`` itself has smoothing applied, unless it is the root node, so that the smoothing effect accumulates with the tree depth
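
The smoothing formula can be checked numerically with a small sketch. The function name ``smoothed_node_weight`` is hypothetical, and returning ``w`` unchanged when ``path_smooth`` is ``0`` mirrors the "no smoothing" rule above rather than any particular implementation detail.

.. code-block:: python

   def smoothed_node_weight(w, w_p, n, path_smooth):
       """Apply the path_smooth formula to one node's weight."""
       if path_smooth <= 0.0:
           # No smoothing is applied when path_smooth is 0.
           return w
       ratio = n / path_smooth
       # Nodes with few samples (small ratio) are pulled towards the parent weight w_p.
       return w * ratio / (ratio + 1.0) + w_p / (ratio + 1.0)


   # With n == path_smooth the result is halfway between w and w_p.
   halfway = smoothed_node_weight(w=2.0, w_p=0.0, n=10, path_smooth=10.0)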

-  ``interaction_constraints`` :raw-html:`<a id="interaction_constraints" title="Permalink to this parameter" href="#interaction_constraints">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string

   -  controls which features can appear in the same branch

   -  by default interaction constraints are disabled, to enable them you can specify

      -  for CLI, lists separated by commas, e.g. ``[0,1,2],[2,3]``

      -  for Python-package, list of lists, e.g. ``[[0, 1, 2], [2, 3]]``

      -  for R-package, list of character or numeric vectors, e.g. ``list(c("var1", "var2", "var3"), c("var3", "var4"))`` or ``list(c(1L, 2L, 3L), c(3L, 4L))``. Numeric vectors should use 1-based indexing, where ``1L`` is the first feature, ``2L`` is the second feature, etc.

   -  any two features can appear in the same branch only if there exists a constraint containing both features
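
The pairwise rule above can be expressed as a small check. This is only a sketch of the stated rule, using the Python-package list-of-lists format; the helper ``can_share_branch`` is hypothetical and is not part of LightGBM.

.. code-block:: python

   from itertools import combinations


   def can_share_branch(features, interaction_constraints):
       """Return True if every pair of features is covered by a common constraint."""
       groups = [set(group) for group in interaction_constraints]
       return all(
           any(a in group and b in group for group in groups)
           for a, b in combinations(features, 2)
       )


   constraints = [[0, 1, 2], [2, 3]]
   allowed = can_share_branch([0, 1], constraints)    # both features are in [0, 1, 2]
   forbidden = can_share_branch([1, 3], constraints)  # no constraint holds both 1 and 3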

-  ``verbosity`` :raw-html:`<a id="verbosity" title="Permalink to this parameter" href="#verbosity">&#x1F517;&#xFE0E;</a>`, default = ``1``, type = int, aliases: ``verbose``

   -  controls the level of LightGBM's verbosity

   -  ``< 0``: Fatal, ``= 0``: Error (Warning), ``= 1``: Info, ``> 1``: Debug

-  ``input_model`` :raw-html:`<a id="input_model" title="Permalink to this parameter" href="#input_model">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string, aliases: ``model_input``, ``model_in``

   -  filename of input model

   -  for ``prediction`` task, this model will be applied to prediction data

   -  for ``train`` task, training will be continued from this model

   -  **Note**: can be used only in CLI version

-  ``output_model`` :raw-html:`<a id="output_model" title="Permalink to this parameter" href="#output_model">&#x1F517;&#xFE0E;</a>`, default = ``LightGBM_model.txt``, type = string, aliases: ``model_output``, ``model_out``

   -  filename of output model in training

   -  **Note**: can be used only in CLI version

-  ``saved_feature_importance_type`` :raw-html:`<a id="saved_feature_importance_type" title="Permalink to this parameter" href="#saved_feature_importance_type">&#x1F517;&#xFE0E;</a>`, default = ``0``, type = int

   -  the feature importance type in the saved model file

   -  ``0``: count-based feature importance (numbers of splits are counted); ``1``: gain-based feature importance (values of gain are counted)

   -  **Note**: can be used only in CLI version

-  ``snapshot_freq`` :raw-html:`<a id="snapshot_freq" title="Permalink to this parameter" href="#snapshot_freq">&#x1F517;&#xFE0E;</a>`, default = ``-1``, type = int, aliases: ``save_period``

   -  frequency of saving model file snapshot

   -  set this to positive value to enable this function. For example, the model file will be snapshotted at each iteration if ``snapshot_freq=1``

   -  **Note**: can be used only in CLI version

-  ``use_quantized_grad`` :raw-html:`<a id="use_quantized_grad" title="Permalink to this parameter" href="#use_quantized_grad">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  whether to use gradient quantization when training

   -  enabling this will discretize (quantize) the gradients and hessians into bins of ``num_grad_quant_bins``

   -  with quantized training, most arithmetic in the training process consists of integer operations

   -  gradient quantization can accelerate training, with little accuracy drop in most cases

   -  **Note**: works only with ``cpu`` and ``cuda`` device type

   -  *New in version 4.0.0*

-  ``num_grad_quant_bins`` :raw-html:`<a id="num_grad_quant_bins" title="Permalink to this parameter" href="#num_grad_quant_bins">&#x1F517;&#xFE0E;</a>`, default = ``4``, type = int

   -  used only if ``use_quantized_grad=true``

   -  number of bins used to quantize gradients and hessians

   -  with more bins, the quantized training will be closer to full precision training

   -  **Note**: works only with ``cpu`` and ``cuda`` device type

   -  *New in version 4.0.0*

-  ``quant_train_renew_leaf`` :raw-html:`<a id="quant_train_renew_leaf" title="Permalink to this parameter" href="#quant_train_renew_leaf">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  used only if ``use_quantized_grad=true``

   -  whether to renew the leaf values with the original gradients during quantized training

   -  renewing is very helpful for good quantized training accuracy for ranking objectives

   -  **Note**: works only with ``cpu`` and ``cuda`` device type

   -  *New in version 4.0.0*

-  ``stochastic_rounding`` :raw-html:`<a id="stochastic_rounding" title="Permalink to this parameter" href="#stochastic_rounding">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool

   -  used only if ``use_quantized_grad=true``

   -  whether to use stochastic rounding in gradient quantization

   -  **Note**: works only with ``cpu`` and ``cuda`` device type

   -  *New in version 4.0.0*
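
To give an intuition for gradient quantization with and without stochastic rounding, the sketch below maps floating-point gradients to small integers. It is an assumption-laden illustration, not LightGBM's implementation: the function ``quantize_gradients`` is hypothetical, the scale is assumed to be the largest absolute gradient divided by half the number of bins, and hessians are ignored.

.. code-block:: python

   import math
   import random


   def quantize_gradients(gradients, num_grad_quant_bins=4, stochastic_rounding=True, seed=0):
       """Return (integer_gradients, scale) for a list of float gradients."""
       max_abs = max(abs(g) for g in gradients)
       if max_abs == 0.0:
           return [0 for _ in gradients], 1.0
       # Assumed scaling: signed gradients span roughly +/- num_grad_quant_bins / 2.
       scale = max_abs / (num_grad_quant_bins / 2)
       rng = random.Random(seed)
       quantized = []
       for g in gradients:
           x = g / scale
           if stochastic_rounding:
               # Round up with probability equal to the fractional part of x.
               q = math.floor(x + rng.random())
           else:
               # Deterministic round-half-up.
               q = math.floor(x + 0.5)
           quantized.append(q)
       return quantized, scale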

IO Parameters
-------------

Dataset Parameters
~~~~~~~~~~~~~~~~~~

-  ``linear_tree`` :raw-html:`<a id="linear_tree" title="Permalink to this parameter" href="#linear_tree">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``linear_trees``

   -  fit piecewise linear gradient boosting tree

   -  tree splits are chosen in the usual way, but the model at each leaf is linear instead of constant

   -  the linear model at each leaf includes all the numerical features in that leaf's branch

   -  the first tree has constant leaf values

   -  categorical features are used for splits as normal but are not used in the linear models

   -  missing values should not be encoded as ``0``. Use ``np.nan`` for Python, ``NA`` for the CLI, and ``NA``, ``NA_real_``, or ``NA_integer_`` for R

   -  it is recommended to rescale data before training so that features have similar mean and standard deviation

   -  **Note**: works only with ``cpu``, ``gpu`` device type and ``serial`` tree learner

   -  **Note**: ``regression_l1`` objective is not supported with linear tree boosting

   -  **Note**: setting ``linear_tree=true`` significantly increases the memory use of LightGBM

   -  **Note**: if you specify ``monotone_constraints``, constraints will be enforced when choosing the split points, but not when fitting the linear models on leaves

-  ``max_bin`` :raw-html:`<a id="max_bin" title="Permalink to this parameter" href="#max_bin">&#x1F517;&#xFE0E;</a>`, default = ``255``, type = int, aliases: ``max_bins``, constraints: ``max_bin > 1``

   -  max number of bins that feature values will be bucketed in

   -  a small number of bins may reduce training accuracy but may improve generalization (i.e. help with over-fitting)

   -  LightGBM will auto compress memory according to ``max_bin``. For example, LightGBM will use ``uint8_t`` for feature value if ``max_bin=255``

-  ``max_bin_by_feature`` :raw-html:`<a id="max_bin_by_feature" title="Permalink to this parameter" href="#max_bin_by_feature">&#x1F517;&#xFE0E;</a>`, default = ``None``, type = multi-int

   -  max number of bins for each feature

   -  if not specified, will use ``max_bin`` for all features

-  ``min_data_in_bin`` :raw-html:`<a id="min_data_in_bin" title="Permalink to this parameter" href="#min_data_in_bin">&#x1F517;&#xFE0E;</a>`, default = ``3``, type = int, constraints: ``min_data_in_bin > 0``

   -  minimal number of data inside one bin

   -  use this to avoid one-data-one-bin (potential over-fitting)

-  ``bin_construct_sample_cnt`` :raw-html:`<a id="bin_construct_sample_cnt" title="Permalink to this parameter" href="#bin_construct_sample_cnt">&#x1F517;&#xFE0E;</a>`, default = ``200000``, type = int, aliases: ``subsample_for_bin``, constraints: ``bin_construct_sample_cnt > 0``

   -  number of data points sampled to construct discrete feature bins

   -  setting this to a larger value will give a better training result, but may increase data loading time

   -  set this to a larger value if data is very sparse

   -  **Note**: don't set this to small values, otherwise, you may encounter unexpected errors and poor accuracy

-  ``data_random_seed`` :raw-html:`<a id="data_random_seed" title="Permalink to this parameter" href="#data_random_seed">&#x1F517;&#xFE0E;</a>`, default = ``1``, type = int, aliases: ``data_seed``

   -  random seed for sampling data to construct histogram bins

-  ``is_enable_sparse`` :raw-html:`<a id="is_enable_sparse" title="Permalink to this parameter" href="#is_enable_sparse">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool, aliases: ``is_sparse``, ``enable_sparse``, ``sparse``

   -  used to enable/disable sparse optimization

-  ``enable_bundle`` :raw-html:`<a id="enable_bundle" title="Permalink to this parameter" href="#enable_bundle">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool, aliases: ``is_enable_bundle``, ``bundle``

   -  set this to ``false`` to disable Exclusive Feature Bundling (EFB), which is described in `LightGBM: A Highly Efficient Gradient Boosting Decision Tree <https://proceedings.neurips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html>`__

   -  **Note**: disabling this may slow down training on sparse datasets

-  ``use_missing`` :raw-html:`<a id="use_missing" title="Permalink to this parameter" href="#use_missing">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool

   -  set this to ``false`` to disable the special handling of missing values

-  ``zero_as_missing`` :raw-html:`<a id="zero_as_missing" title="Permalink to this parameter" href="#zero_as_missing">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  set this to ``true`` to treat all zeros as missing values (including the unshown values in LibSVM / sparse matrices)

   -  set this to ``false`` to use ``na`` for representing missing values

-  ``feature_pre_filter`` :raw-html:`<a id="feature_pre_filter" title="Permalink to this parameter" href="#feature_pre_filter">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool

   -  set this to ``true`` (the default) to tell LightGBM to ignore the features that are unsplittable based on ``min_data_in_leaf``

   -  as the dataset object is initialized only once and cannot be changed afterwards, you may need to set this to ``false`` when searching over values of ``min_data_in_leaf``; otherwise, unless you reconstruct the dataset object, features are first filtered by ``min_data_in_leaf``

   -  **Note**: setting this to ``false`` may slow down the training

-  ``pre_partition`` :raw-html:`<a id="pre_partition" title="Permalink to this parameter" href="#pre_partition">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``is_pre_partition``

   -  used for distributed learning (excluding the ``feature_parallel`` mode)

   -  ``true`` if training data are pre-partitioned, and different machines use different partitions

-  ``two_round`` :raw-html:`<a id="two_round" title="Permalink to this parameter" href="#two_round">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``two_round_loading``, ``use_two_round_loading``

   -  set this to ``true`` if data file is too big to fit in memory

   -  by default, LightGBM maps the data file to memory and loads features from memory. This gives faster data loading, but may cause an out-of-memory error when the data file is very big

   -  **Note**: works only in case of loading data directly from text file

-  ``header`` :raw-html:`<a id="header" title="Permalink to this parameter" href="#header">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``has_header``

   -  set this to ``true`` if input data has header

   -  **Note**: works only in case of loading data directly from text file

-  ``label_column`` :raw-html:`<a id="label_column" title="Permalink to this parameter" href="#label_column">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = int or string, aliases: ``label``

   -  used to specify the label column

   -  use number for index, e.g. ``label=0`` means column\_0 is the label

   -  add a prefix ``name:`` for column name, e.g. ``label=name:is_click``

   -  if omitted, the first column in the training data is used as the label

   -  **Note**: works only in case of loading data directly from text file

-  ``weight_column`` :raw-html:`<a id="weight_column" title="Permalink to this parameter" href="#weight_column">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = int or string, aliases: ``weight``

   -  used to specify the weight column

   -  use number for index, e.g. ``weight=0`` means column\_0 is the weight

   -  add a prefix ``name:`` for column name, e.g. ``weight=name:weight``

   -  **Note**: works only in case of loading data directly from text file

   -  **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``, e.g. when label is column\_0, and weight is column\_1, the correct parameter is ``weight=0``

   -  **Note**: weights should be non-negative
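
Because integer indices for ``weight_column`` (and the other column parameters with the same note) do not count the label column, it can help to convert a position in the file into the index the parameter expects. The helper below is a hypothetical convenience, not a LightGBM function.

.. code-block:: python

   def column_param_index(file_column, label_column):
       """Convert a 0-based file column position into a label-excluded index."""
       if file_column == label_column:
           raise ValueError("the label column cannot be used as another column")
       # Columns located after the label shift down by one position.
       return file_column - 1 if file_column > label_column else file_column


   # Label is column_0 and weight is column_1 in the file, so weight=0.
   weight_index = column_param_index(file_column=1, label_column=0)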

-  ``group_column`` :raw-html:`<a id="group_column" title="Permalink to this parameter" href="#group_column">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = int or string, aliases: ``group``, ``group_id``, ``query_column``, ``query``, ``query_id``

   -  used to specify the query/group id column

   -  use number for index, e.g. ``query=0`` means column\_0 is the query id

   -  add a prefix ``name:`` for column name, e.g. ``query=name:query_id``

   -  **Note**: works only in case of loading data directly from text file

   -  **Note**: data should be grouped by query\_id, for more information, see `Query Data <#query-data>`__

   -  **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``, e.g. when label is column\_0 and query\_id is column\_1, the correct parameter is ``query=0``

-  ``ignore_column`` :raw-html:`<a id="ignore_column" title="Permalink to this parameter" href="#ignore_column">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = multi-int or string, aliases: ``ignore_feature``, ``blacklist``

   -  used to specify columns to ignore during training

   -  use number for index, e.g. ``ignore_column=0,1,2`` means column\_0, column\_1 and column\_2 will be ignored

   -  add a prefix ``name:`` for column name, e.g. ``ignore_column=name:c1,c2,c3`` means c1, c2 and c3 will be ignored

   -  **Note**: works only in case of loading data directly from text file

   -  **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``

   -  **Note**: although the specified columns are completely ignored during training, they must still have a valid format so that LightGBM can load the file successfully

-  ``categorical_feature`` :raw-html:`<a id="categorical_feature" title="Permalink to this parameter" href="#categorical_feature">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = multi-int or string, aliases: ``cat_feature``, ``categorical_column``, ``cat_column``, ``categorical_features``

   -  used to specify categorical features

   -  use number for index, e.g. ``categorical_feature=0,1,2`` means column\_0, column\_1 and column\_2 are categorical features

   -  add a prefix ``name:`` for column name, e.g. ``categorical_feature=name:c1,c2,c3`` means c1, c2 and c3 are categorical features

   -  **Note**: all values will be cast to ``int32`` (integer codes will be extracted from pandas categoricals in the Python-package)

   -  **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``

   -  **Note**: all values should be less than ``Int32.MaxValue`` (2147483647)

   -  **Note**: using large values could be memory consuming. Tree decision rule works best when categorical features are represented by consecutive integers starting from zero

   -  **Note**: all negative values will be treated as **missing values**

   -  **Note**: the output cannot be monotonically constrained with respect to a categorical feature

   -  **Note**: floating point numbers in categorical features will be rounded towards 0
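
Following the notes above, one way to prepare a categorical column is to map its values to consecutive integers starting from zero and to encode missing entries as a negative value. The helper ``encode_categories`` is a hypothetical sketch that assumes ``None`` marks a missing entry.

.. code-block:: python

   def encode_categories(values):
       """Map category labels to codes 0, 1, 2, ... and missing values to -1."""
       categories = sorted({v for v in values if v is not None})
       mapping = {category: code for code, category in enumerate(categories)}
       # Negative values are treated as missing, so -1 marks missing entries.
       codes = [mapping[v] if v is not None else -1 for v in values]
       return codes, mapping


   codes, mapping = encode_categories(["b", "a", None, "b"])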

-  ``forcedbins_filename`` :raw-html:`<a id="forcedbins_filename" title="Permalink to this parameter" href="#forcedbins_filename">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string

   -  path to a ``.json`` file that specifies bin upper bounds for some or all features

   -  ``.json`` file should contain an array of objects, each containing the key ``feature`` (integer feature index) and the key ``bin_upper_bound`` (array of thresholds for binning)

   -  see `this file <https://github.com/microsoft/LightGBM/blob/master/examples/regression/forced_bins.json>`__ as an example
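
A forced-bins file in the format described above (an array of objects with ``feature`` and ``bin_upper_bound``) could be produced as sketched below. The feature indices and thresholds are made up for illustration.

.. code-block:: python

   import json

   # Hypothetical bin upper bounds for two features.
   forced_bins = [
       {"feature": 0, "bin_upper_bound": [0.3, 0.35, 0.4]},
       {"feature": 1, "bin_upper_bound": [-0.2, -0.15, -0.1]},
   ]

   # Text that would be written to the file passed as forcedbins_filename.
   forced_bins_json = json.dumps(forced_bins, indent=2)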

-  ``save_binary`` :raw-html:`<a id="save_binary" title="Permalink to this parameter" href="#save_binary">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``is_save_binary``, ``is_save_binary_file``

   -  if ``true``, LightGBM will save the dataset (including validation data) to a binary file. This speeds up data loading the next time

   -  **Note**: ``init_score`` is not saved in binary file

   -  **Note**: can be used only in CLI version; for language-specific packages you can use the corresponding function

-  ``precise_float_parser`` :raw-html:`<a id="precise_float_parser" title="Permalink to this parameter" href="#precise_float_parser">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  use precise floating point number parsing for text parser (e.g. CSV, TSV, LibSVM input)

   -  **Note**: setting this to ``true`` may lead to much slower text parsing

-  ``parser_config_file`` :raw-html:`<a id="parser_config_file" title="Permalink to this parameter" href="#parser_config_file">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string

   -  path to a ``.json`` file that specifies the configuration used to initialize a customized parser

   -  see `lightgbm-transform <https://github.com/microsoft/lightgbm-transform>`__ for usage examples

   -  **Note**: ``lightgbm-transform`` is not maintained by LightGBM's maintainers. Bug reports or feature requests should go to its `issues page <https://github.com/microsoft/lightgbm-transform/issues>`__

   -  *New in version 4.0.0*

Predict Parameters
~~~~~~~~~~~~~~~~~~

-  ``start_iteration_predict`` :raw-html:`<a id="start_iteration_predict" title="Permalink to this parameter" href="#start_iteration_predict">&#x1F517;&#xFE0E;</a>`, default = ``0``, type = int

   -  used only in ``prediction`` task

   -  used to specify from which iteration to start the prediction

   -  ``<= 0`` means from the first iteration

-  ``num_iteration_predict`` :raw-html:`<a id="num_iteration_predict" title="Permalink to this parameter" href="#num_iteration_predict">&#x1F517;&#xFE0E;</a>`, default = ``-1``, type = int

   -  used only in ``prediction`` task

   -  used to specify how many trained iterations will be used in prediction

   -  ``<= 0`` means no limit
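
How ``start_iteration_predict`` and ``num_iteration_predict`` combine can be sketched as a range computation. The helper ``iterations_used`` is hypothetical; it assumes 0-based iteration indices and that requests beyond the trained model are clipped to the available iterations.

.. code-block:: python

   def iterations_used(total_iterations, start_iteration_predict=0, num_iteration_predict=-1):
       """Return the range of trained iterations that take part in prediction."""
       # Values <= 0 mean "start from the first iteration".
       start = min(max(start_iteration_predict, 0), total_iterations)
       if num_iteration_predict <= 0:
           # Values <= 0 mean "no limit": use everything from start onwards.
           stop = total_iterations
       else:
           stop = min(start + num_iteration_predict, total_iterations)
       return range(start, stop)


   # Skip the first 10 iterations, then use the next 20.
   used = iterations_used(100, start_iteration_predict=10, num_iteration_predict=20)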

-  ``predict_raw_score`` :raw-html:`<a id="predict_raw_score" title="Permalink to this parameter" href="#predict_raw_score">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``is_predict_raw_score``, ``predict_rawscore``, ``raw_score``

   -  used only in ``prediction`` task

   -  set this to ``true`` to predict only the raw scores

   -  set this to ``false`` to predict transformed scores

-  ``predict_leaf_index`` :raw-html:`<a id="predict_leaf_index" title="Permalink to this parameter" href="#predict_leaf_index">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``is_predict_leaf_index``, ``leaf_index``

   -  used only in ``prediction`` task

   -  set this to ``true`` to predict with leaf index of all trees

1031
-  ``predict_contrib`` :raw-html:`<a id="predict_contrib" title="Permalink to this parameter" href="#predict_contrib">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``is_predict_contrib``, ``contrib``

   -  used only in ``prediction`` task

   -  set this to ``true`` to estimate `SHAP values <https://arxiv.org/abs/1706.06060>`__, which represent how each feature contributes to each prediction

   -  produces ``#features + 1`` values where the last value is the expected value of the model output over the training data

   -  **Note**: for further explanation of your model's predictions with SHAP values, such as SHAP interaction values, you can install the `shap package <https://github.com/shap>`__

   -  **Note**: unlike the shap package, with ``predict_contrib`` we return a matrix with an extra column, where the last column is the expected value

   -  **Note**: this feature is not implemented for linear trees

-  ``predict_disable_shape_check`` :raw-html:`<a id="predict_disable_shape_check" title="Permalink to this parameter" href="#predict_disable_shape_check">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  used only in ``prediction`` task

   -  control whether or not LightGBM raises an error when you try to predict on data with a different number of features than the training data

   -  if ``false`` (the default), a fatal error will be raised if the number of features in the dataset you predict on differs from the number seen during training

   -  if ``true``, LightGBM will attempt to predict on whatever data you provide. This is dangerous because you might get incorrect predictions, but you could use it in situations where it is difficult or expensive to generate some features and you are very confident that they were never chosen for splits in the model

   -  **Note**: be very careful setting this parameter to ``true``

-  ``pred_early_stop`` :raw-html:`<a id="pred_early_stop" title="Permalink to this parameter" href="#pred_early_stop">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  used only in ``prediction`` task

   -  used only in ``classification`` and ``ranking`` applications

   -  used only for predicting normal or raw scores

   -  if ``true``, will use early-stopping to speed up the prediction. May affect the accuracy

   -  **Note**: cannot be used with ``rf`` boosting type or custom objective function

-  ``pred_early_stop_freq`` :raw-html:`<a id="pred_early_stop_freq" title="Permalink to this parameter" href="#pred_early_stop_freq">&#x1F517;&#xFE0E;</a>`, default = ``10``, type = int

   -  used only in ``prediction`` task and if ``pred_early_stop=true``

   -  how often, in iterations, the early-stopping condition is checked during prediction

-  ``pred_early_stop_margin`` :raw-html:`<a id="pred_early_stop_margin" title="Permalink to this parameter" href="#pred_early_stop_margin">&#x1F517;&#xFE0E;</a>`, default = ``10.0``, type = double

   -  used only in ``prediction`` task and if ``pred_early_stop=true``

   -  the threshold of margin in early-stopping prediction

-  ``output_result`` :raw-html:`<a id="output_result" title="Permalink to this parameter" href="#output_result">&#x1F517;&#xFE0E;</a>`, default = ``LightGBM_predict_result.txt``, type = string, aliases: ``predict_result``, ``prediction_result``, ``predict_name``, ``prediction_name``, ``pred_name``, ``name_pred``

   -  used only in ``prediction`` task

   -  filename of prediction result

   -  **Note**: can be used only in CLI version
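The iteration window selected by ``start_iteration_predict`` and ``num_iteration_predict``, and the layout of one ``predict_contrib`` row, can be sketched in plain Python. This is only an illustration of the rules described above, not LightGBM's implementation: the helper names ``iteration_window`` and ``split_contrib_row`` are hypothetical, clamping an out-of-range start to the number of trained iterations is an assumption, and the contribution row is made-up data for a single-output model.

```python
def iteration_window(total_iterations, start_iteration=0, num_iteration=-1):
    """Return (first, last_exclusive) for the trained iterations used in prediction."""
    # "<= 0 means from the first iteration"
    first = max(start_iteration, 0)
    # Assumption: a start beyond the trained iterations selects nothing.
    first = min(first, total_iterations)
    remaining = total_iterations - first
    # "<= 0 means no limit"
    used = remaining if num_iteration <= 0 else min(num_iteration, remaining)
    return first, first + used


def split_contrib_row(row):
    """Split one predict_contrib row into (per-feature values, expected value)."""
    # The row holds #features + 1 values; the last one is the expected value.
    return row[:-1], row[-1]


window = iteration_window(total_iterations=100, start_iteration=10, num_iteration=50)
features, expected = split_contrib_row([0.25, -0.5, 0.125, 1.0])  # 3 features + 1
print(window, features, expected)
```

Keeping the expected value as the last element matches the extra column described in the ``predict_contrib`` notes, so downstream code has to slice it off before treating the row as per-feature values.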

Convert Parameters
~~~~~~~~~~~~~~~~~~

-  ``convert_model_language`` :raw-html:`<a id="convert_model_language" title="Permalink to this parameter" href="#convert_model_language">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string

   -  used only in ``convert_model`` task

   -  only ``cpp`` is currently supported; to convert a model to other languages, consider using the `m2cgen <https://github.com/BayesWitnesses/m2cgen>`__ utility

   -  if ``convert_model_language`` is set and ``task=train``, the model will also be converted

   -  **Note**: can be used only in CLI version

-  ``convert_model`` :raw-html:`<a id="convert_model" title="Permalink to this parameter" href="#convert_model">&#x1F517;&#xFE0E;</a>`, default = ``gbdt_prediction.cpp``, type = string, aliases: ``convert_model_file``

   -  used only in ``convert_model`` task

   -  output filename of converted model

   -  **Note**: can be used only in CLI version

Objective Parameters
--------------------

-  ``objective_seed`` :raw-html:`<a id="objective_seed" title="Permalink to this parameter" href="#objective_seed">&#x1F517;&#xFE0E;</a>`, default = ``5``, type = int

   -  used only in ``rank_xendcg`` objective

   -  random seed for objectives, if random process is needed

-  ``num_class`` :raw-html:`<a id="num_class" title="Permalink to this parameter" href="#num_class">&#x1F517;&#xFE0E;</a>`, default = ``1``, type = int, aliases: ``num_classes``, constraints: ``num_class > 0``

   -  used only in ``multi-class`` classification application

-  ``is_unbalance`` :raw-html:`<a id="is_unbalance" title="Permalink to this parameter" href="#is_unbalance">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``unbalance``, ``unbalanced_sets``

   -  used only in ``binary`` and ``multiclassova`` applications

   -  set this to ``true`` if training data are unbalanced

   -  **Note**: while enabling this should increase the overall performance metric of your model, it will also result in poor estimates of the individual class probabilities

   -  **Note**: this parameter cannot be used together with ``scale_pos_weight``; choose only **one** of them

-  ``scale_pos_weight`` :raw-html:`<a id="scale_pos_weight" title="Permalink to this parameter" href="#scale_pos_weight">&#x1F517;&#xFE0E;</a>`, default = ``1.0``, type = double, constraints: ``scale_pos_weight > 0.0``

   -  used only in ``binary`` and ``multiclassova`` applications

   -  weight of labels with positive class

   -  **Note**: while enabling this should increase the overall performance metric of your model, it will also result in poor estimates of the individual class probabilities

   -  **Note**: this parameter cannot be used together with ``is_unbalance``; choose only **one** of them

-  ``sigmoid`` :raw-html:`<a id="sigmoid" title="Permalink to this parameter" href="#sigmoid">&#x1F517;&#xFE0E;</a>`, default = ``1.0``, type = double, constraints: ``sigmoid > 0.0``

   -  used only in ``binary`` and ``multiclassova`` classification and in ``lambdarank`` applications

   -  parameter for the sigmoid function

-  ``boost_from_average`` :raw-html:`<a id="boost_from_average" title="Permalink to this parameter" href="#boost_from_average">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool

   -  used only in ``regression``, ``binary``, ``multiclassova`` and ``cross-entropy`` applications

   -  adjusts initial score to the mean of labels for faster convergence

-  ``reg_sqrt`` :raw-html:`<a id="reg_sqrt" title="Permalink to this parameter" href="#reg_sqrt">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  used only in ``regression`` application

   -  fits ``sqrt(label)`` instead of the original values; the prediction result is automatically converted back to ``prediction^2``

   -  might be useful in case of large-range labels

-  ``alpha`` :raw-html:`<a id="alpha" title="Permalink to this parameter" href="#alpha">&#x1F517;&#xFE0E;</a>`, default = ``0.9``, type = double, constraints: ``alpha > 0.0``

   -  used only in ``huber`` and ``quantile`` ``regression`` applications

   -  parameter for `Huber loss <https://en.wikipedia.org/wiki/Huber_loss>`__ and `Quantile regression <https://en.wikipedia.org/wiki/Quantile_regression>`__

-  ``fair_c`` :raw-html:`<a id="fair_c" title="Permalink to this parameter" href="#fair_c">&#x1F517;&#xFE0E;</a>`, default = ``1.0``, type = double, constraints: ``fair_c > 0.0``

   -  used only in ``fair`` ``regression`` application

   -  parameter for `Fair loss <https://www.kaggle.com/c/allstate-claims-severity/discussion/24520>`__

-  ``poisson_max_delta_step`` :raw-html:`<a id="poisson_max_delta_step" title="Permalink to this parameter" href="#poisson_max_delta_step">&#x1F517;&#xFE0E;</a>`, default = ``0.7``, type = double, constraints: ``poisson_max_delta_step > 0.0``

   -  used only in ``poisson`` ``regression`` application

   -  parameter for `Poisson regression <https://en.wikipedia.org/wiki/Poisson_regression>`__ to safeguard optimization

-  ``tweedie_variance_power`` :raw-html:`<a id="tweedie_variance_power" title="Permalink to this parameter" href="#tweedie_variance_power">&#x1F517;&#xFE0E;</a>`, default = ``1.5``, type = double, constraints: ``1.0 <= tweedie_variance_power < 2.0``

   -  used only in ``tweedie`` ``regression`` application

   -  used to control the variance of the tweedie distribution

   -  set this closer to ``2`` to shift towards a **Gamma** distribution

   -  set this closer to ``1`` to shift towards a **Poisson** distribution

-  ``lambdarank_truncation_level`` :raw-html:`<a id="lambdarank_truncation_level" title="Permalink to this parameter" href="#lambdarank_truncation_level">&#x1F517;&#xFE0E;</a>`, default = ``30``, type = int, constraints: ``lambdarank_truncation_level > 0``

   -  used only in ``lambdarank`` application

   -  controls the number of top results to focus on during training; refer to "truncation level" in Sec. 3 of the `LambdaMART paper <https://www.microsoft.com/en-us/research/publication/from-ranknet-to-lambdarank-to-lambdamart-an-overview/>`__

   -  this parameter is closely related to the desired cutoff ``k`` in the metric **NDCG@k** that the ranker is optimized for. The optimal setting is likely to be slightly higher than ``k`` (e.g., ``k + 3``) so that more pairs of documents are included in training, but not so high that training deviates too much from the target metric **NDCG@k**

-  ``lambdarank_norm`` :raw-html:`<a id="lambdarank_norm" title="Permalink to this parameter" href="#lambdarank_norm">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool

   -  used only in ``lambdarank`` application

   -  set this to ``true`` to normalize the lambdas for different queries, and improve the performance for unbalanced data

   -  set this to ``false`` to enforce the original lambdarank algorithm

-  ``label_gain`` :raw-html:`<a id="label_gain" title="Permalink to this parameter" href="#label_gain">&#x1F517;&#xFE0E;</a>`, default = ``0,1,3,7,15,31,63,...,2^30-1``, type = multi-double

   -  used only in ``lambdarank`` application

   -  relevant gain for labels. For example, the gain of label ``2`` is ``3`` in case of default label gains

   -  separate by ``,``

-  ``lambdarank_position_bias_regularization`` :raw-html:`<a id="lambdarank_position_bias_regularization" title="Permalink to this parameter" href="#lambdarank_position_bias_regularization">&#x1F517;&#xFE0E;</a>`, default = ``0.0``, type = double, constraints: ``lambdarank_position_bias_regularization >= 0.0``

   -  used only in ``lambdarank`` application when positional information is provided and position bias is modeled

   -  larger values reduce the inferred position bias factors

   -  *New in version 4.1.0*
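The default ``label_gain`` sequence can be reproduced with a short sketch. It assumes the elided entries in ``0,1,3,7,15,31,63,...,2^30-1`` follow the same ``2^i - 1`` pattern as the listed ones; the helper names ``default_label_gain`` and ``to_param_string`` and the custom gains are hypothetical and not part of LightGBM.

```python
def default_label_gain(num_labels=31):
    """Gain 2^i - 1 for integer label i, i.e. 0, 1, 3, 7, ..., 2^30 - 1."""
    # Assumption: every default entry follows the 2^i - 1 pattern.
    return [2 ** i - 1 for i in range(num_labels)]


def to_param_string(gains):
    """Serialize gains as a multi-value parameter, separated by ','."""
    return ",".join(str(g) for g in gains)


gains = default_label_gain()
custom = to_param_string([0, 1, 2, 4])  # hypothetical custom gains for labels 0..3
print(gains[2], gains[-1], custom)
```

The resulting string is what would be passed as the value of ``label_gain``, since the parameter expects values separated by ``,``.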

Metric Parameters
-----------------

-  ``metric`` :raw-html:`<a id="metric" title="Permalink to this parameter" href="#metric">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = multi-enum, aliases: ``metrics``, ``metric_types``

   -  metric(s) to be evaluated on the evaluation set(s)

      -  ``""`` (empty string or not specified) means that metric corresponding to specified ``objective`` will be used (this is possible only for pre-defined objective functions, otherwise no evaluation metric will be added)

      -  ``"None"`` (string, **not** a ``None`` value) means that no metric will be registered, aliases: ``na``, ``null``, ``custom``

      -  ``l1``, absolute loss, aliases: ``mean_absolute_error``, ``mae``, ``regression_l1``

      -  ``l2``, square loss, aliases: ``mean_squared_error``, ``mse``, ``regression_l2``, ``regression``

      -  ``rmse``, root mean square loss, aliases: ``root_mean_squared_error``, ``l2_root``

      -  ``quantile``, `Quantile regression <https://en.wikipedia.org/wiki/Quantile_regression>`__

      -  ``mape``, `MAPE loss <https://en.wikipedia.org/wiki/Mean_absolute_percentage_error>`__, aliases: ``mean_absolute_percentage_error``

      -  ``huber``, `Huber loss <https://en.wikipedia.org/wiki/Huber_loss>`__

      -  ``fair``, `Fair loss <https://www.kaggle.com/c/allstate-claims-severity/discussion/24520>`__

      -  ``poisson``, negative log-likelihood for `Poisson regression <https://en.wikipedia.org/wiki/Poisson_regression>`__

      -  ``gamma``, negative log-likelihood for **Gamma** regression

      -  ``gamma_deviance``, residual deviance for **Gamma** regression

      -  ``tweedie``, negative log-likelihood for **Tweedie** regression

      -  ``ndcg``, `NDCG <https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG>`__, aliases: ``lambdarank``, ``rank_xendcg``, ``xendcg``, ``xe_ndcg``, ``xe_ndcg_mart``, ``xendcg_mart``

      -  ``map``, `MAP <https://makarandtapaswi.wordpress.com/2012/07/02/intuition-behind-average-precision-and-map/>`__, aliases: ``mean_average_precision``

      -  ``auc``, `AUC <https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve>`__

      -  ``average_precision``, `average precision score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html>`__

      -  ``r2``, `R-squared <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html>`__

      -  ``binary_logloss``, `log loss <https://en.wikipedia.org/wiki/Cross_entropy>`__, aliases: ``binary``

      -  ``binary_error``, for one sample: ``0`` for correct classification, ``1`` for incorrect classification

      -  ``auc_mu``, `AUC-mu <https://proceedings.mlr.press/v97/kleiman19a.html>`__

      -  ``multi_logloss``, log loss for multi-class classification, aliases: ``multiclass``, ``softmax``, ``multiclassova``, ``multiclass_ova``, ``ova``, ``ovr``

      -  ``multi_error``, error rate for multi-class classification

      -  ``cross_entropy``, cross-entropy (with optional linear weights), aliases: ``xentropy``

      -  ``cross_entropy_lambda``, "intensity-weighted" cross-entropy, aliases: ``xentlambda``

      -  ``kullback_leibler``, `Kullback-Leibler divergence <https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence>`__, aliases: ``kldiv``

   -  support multiple metrics, separated by ``,``

-  ``metric_freq`` :raw-html:`<a id="metric_freq" title="Permalink to this parameter" href="#metric_freq">&#x1F517;&#xFE0E;</a>`, default = ``1``, type = int, aliases: ``output_freq``, constraints: ``metric_freq > 0``

   -  frequency for metric output

   -  **Note**: can be used only in CLI version

-  ``is_provide_training_metric`` :raw-html:`<a id="is_provide_training_metric" title="Permalink to this parameter" href="#is_provide_training_metric">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``training_metric``, ``is_training_metric``, ``train_metric``

   -  set this to ``true`` to output metric result over training dataset

   -  **Note**: can be used only in CLI version

-  ``eval_at`` :raw-html:`<a id="eval_at" title="Permalink to this parameter" href="#eval_at">&#x1F517;&#xFE0E;</a>`, default = ``1,2,3,4,5``, type = multi-int, aliases: ``ndcg_eval_at``, ``ndcg_at``, ``map_eval_at``, ``map_at``

   -  used only with ``ndcg`` and ``map`` metrics

   -  `NDCG <https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG>`__ and `MAP <https://makarandtapaswi.wordpress.com/2012/07/02/intuition-behind-average-precision-and-map/>`__ evaluation positions, separated by ``,``

-  ``multi_error_top_k`` :raw-html:`<a id="multi_error_top_k" title="Permalink to this parameter" href="#multi_error_top_k">&#x1F517;&#xFE0E;</a>`, default = ``1``, type = int, constraints: ``multi_error_top_k > 0``

   -  used only with ``multi_error`` metric

   -  threshold for top-k multi-error metric

   -  the error on each sample is ``0`` if the true class is among the top ``multi_error_top_k`` predictions, and ``1`` otherwise

      -  more precisely, the error on a sample is ``0`` if there are at least ``num_classes - multi_error_top_k`` predictions strictly less than the prediction on the true class

   -  when ``multi_error_top_k=1`` this is equivalent to the usual multi-error metric

-  ``auc_mu_weights`` :raw-html:`<a id="auc_mu_weights" title="Permalink to this parameter" href="#auc_mu_weights">&#x1F517;&#xFE0E;</a>`, default = ``None``, type = multi-double

   -  used only with ``auc_mu`` metric

   -  list representing flattened matrix (in row-major order) giving loss weights for classification errors

   -  list should have ``n * n`` elements, where ``n`` is the number of classes

   -  the matrix co-ordinate ``[i, j]`` should correspond to the ``i * n + j``-th element of the list

   -  if not specified, will use equal weights for all classes
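The row-major layout expected by ``auc_mu_weights`` and the precise ``multi_error_top_k`` rule can be illustrated with a small sketch. It only restates the definitions above in plain Python and is not LightGBM's implementation; the helper names ``flatten_row_major`` and ``top_k_error`` and the example weights and scores are hypothetical.

```python
def flatten_row_major(matrix):
    """Flatten an n x n weight matrix so that element [i, j] lands at index i * n + j."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("auc_mu_weights needs a square n x n matrix")
    return [value for row in matrix for value in row]


def top_k_error(scores, true_class, k=1):
    """Return 0 if at least num_classes - k scores are strictly below the true class score."""
    true_score = scores[true_class]
    strictly_lower = sum(1 for s in scores if s < true_score)
    return 0 if strictly_lower >= len(scores) - k else 1


weights = [[0, 1, 2], [3, 0, 4], [5, 6, 0]]  # made-up loss weights for 3 classes
flat = flatten_row_major(weights)
print(flat, top_k_error([0.1, 0.5, 0.4], true_class=2, k=2))
```

Because the comparison is strict, a true class that ties with another class for the top score still counts as an error when ``k=1`` under this rule.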

Network Parameters
------------------

-  ``num_machines`` :raw-html:`<a id="num_machines" title="Permalink to this parameter" href="#num_machines">&#x1F517;&#xFE0E;</a>`, default = ``1``, type = int, aliases: ``num_machine``, constraints: ``num_machines > 0``

   -  the number of machines for distributed learning application

   -  this parameter must be set in both the **socket** and **MPI** versions

-  ``local_listen_port`` :raw-html:`<a id="local_listen_port" title="Permalink to this parameter" href="#local_listen_port">&#x1F517;&#xFE0E;</a>`, default = ``12400 (random for Dask-package)``, type = int, aliases: ``local_port``, ``port``, constraints: ``local_listen_port > 0``

   -  TCP listen port for local machines

   -  **Note**: don't forget to allow this port in firewall settings before training

-  ``time_out`` :raw-html:`<a id="time_out" title="Permalink to this parameter" href="#time_out">&#x1F517;&#xFE0E;</a>`, default = ``120``, type = int, constraints: ``time_out > 0``

   -  socket time-out in minutes

-  ``machine_list_filename`` :raw-html:`<a id="machine_list_filename" title="Permalink to this parameter" href="#machine_list_filename">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string, aliases: ``machine_list_file``, ``machine_list``, ``mlist``

   -  path of file that lists machines for this distributed learning application

   -  each line contains one IP and one port for one machine. The format is ``ip port`` (space as a separator)

   -  **Note**: can be used only in CLI version

-  ``machines`` :raw-html:`<a id="machines" title="Permalink to this parameter" href="#machines">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string, aliases: ``workers``, ``nodes``

   -  list of machines in the following format: ``ip1:port1,ip2:port2``
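The two machine-list formats described above (``ip port`` lines for ``machine_list_filename`` and ``ip1:port1,ip2:port2`` for ``machines``) can be generated from the same list of workers. The sketch below is illustrative only: the helper names ``machines_param`` and ``machine_list_lines`` are hypothetical, and the addresses are placeholders.

```python
def machines_param(workers):
    """Format (ip, port) pairs as 'ip1:port1,ip2:port2' for the machines parameter."""
    return ",".join(f"{ip}:{port}" for ip, port in workers)


def machine_list_lines(workers):
    """Format (ip, port) pairs as 'ip port' lines for a machine list file."""
    return [f"{ip} {port}" for ip, port in workers]


workers = [("10.0.0.1", 12400), ("10.0.0.2", 12400)]  # placeholder addresses
params = {"num_machines": len(workers), "machines": machines_param(workers)}
print(params)
print("\n".join(machine_list_lines(workers)))
```

Deriving ``num_machines`` from the same list keeps it consistent with the number of entries in ``machines``.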

GPU Parameters
--------------

-  ``gpu_platform_id`` :raw-html:`<a id="gpu_platform_id" title="Permalink to this parameter" href="#gpu_platform_id">&#x1F517;&#xFE0E;</a>`, default = ``-1``, type = int

   -  used only with ``gpu`` device type

   -  OpenCL platform ID. Usually each GPU vendor exposes one OpenCL platform

   -  ``-1`` means the system-wide default platform

   -  **Note**: refer to `GPU Targets <./GPU-Targets.rst#query-opencl-devices-in-your-system>`__ for more details

-  ``gpu_device_id`` :raw-html:`<a id="gpu_device_id" title="Permalink to this parameter" href="#gpu_device_id">&#x1F517;&#xFE0E;</a>`, default = ``-1``, type = int

   -  OpenCL device ID in the specified platform or CUDA device ID. Each GPU in the selected platform has a unique device ID

   -  ``-1`` means the default device in the selected platform

   -  **Note**: refer to `GPU Targets <./GPU-Targets.rst#query-opencl-devices-in-your-system>`__ for more details

-  ``gpu_use_dp`` :raw-html:`<a id="gpu_use_dp" title="Permalink to this parameter" href="#gpu_use_dp">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool

   -  set this to ``true`` to use double precision math on GPU (by default single precision is used)

   -  **Note**: can be used only in OpenCL implementation (``device_type="gpu"``), in CUDA implementation only double precision is currently supported

-  ``num_gpu`` :raw-html:`<a id="num_gpu" title="Permalink to this parameter" href="#num_gpu">&#x1F517;&#xFE0E;</a>`, default = ``1``, type = int, constraints: ``num_gpu > 0``

   -  number of GPUs

   -  **Note**: can be used only in CUDA implementation (``device_type="cuda"``)

.. end params list

Others
------

Continued Training with Input Score
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

LightGBM supports continued training with initial scores.
It uses an additional file to store these initial scores, like the following:

::

    0.5
    -0.1
    0.9
    ...

It means the initial score of the first data row is ``0.5``, second is ``-0.1``, and so on.
The initial score file corresponds to the data file line by line, with one score per line.

If the data file is named ``train.txt``, the initial score file should be named ``train.txt.init`` and placed in the same folder as the data file.
In this case, LightGBM will load the initial score file automatically if it exists.

If a binary data file exists for the raw data file ``train.txt``, for example named ``train.txt.bin``, then the initial score file should be named ``train.txt.bin.init``.
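The naming convention for the initial score file can be sketched with the standard library alone. The helper names ``init_score_path`` and ``write_init_scores`` are hypothetical and the scores are the example values from above; the sketch only writes the side file and does not call LightGBM.

```python
import os
import tempfile


def init_score_path(data_path):
    """Return the side-file name expected next to the data file."""
    return data_path + ".init"


def write_init_scores(data_path, scores):
    """Write one initial score per line, in the same order as the data rows."""
    path = init_score_path(data_path)
    with open(path, "w") as f:
        for score in scores:
            f.write(f"{score}\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    written = write_init_scores(os.path.join(tmp, "train.txt"), [0.5, -0.1, 0.9])
    with open(written) as f:
        lines = f.read().split()
print(os.path.basename(written), lines)
```

The same helper gives ``train.txt.bin.init`` when it is passed the binary file name ``train.txt.bin``.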

Weight Data
~~~~~~~~~~~

LightGBM supports weighted training.
It uses an additional file to store weight data, like the following:

::

    1.0
    0.5
    0.8
    ...

It means the weight of the first data row is ``1.0``, second is ``0.5``, and so on. Weights should be non-negative.

The weight file corresponds to the data file line by line, with one weight per line.

If the data file is named ``train.txt``, the weight file should be named ``train.txt.weight`` and placed in the same folder as the data file.
In this case, LightGBM will load the weight file automatically if it exists.

You can also include a weight column in your data file.
Please refer to the ``weight_column`` `parameter <#weight_column>`__ above.
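Before writing a weight file, it can help to check the two constraints stated above: one weight per data row, and no negative weights. The following is a minimal sketch with a hypothetical helper name, ``weight_file_lines``; it produces the lines for a ``train.txt.weight`` file but does not interact with LightGBM.

```python
def weight_file_lines(weights, num_rows):
    """Validate weights and return the lines for a '<data file>.weight' side file."""
    if len(weights) != num_rows:
        raise ValueError("need exactly one weight per data row")
    if any(w < 0 for w in weights):
        raise ValueError("weights should be non-negative")
    return [str(float(w)) for w in weights]


weight_lines = weight_file_lines([1.0, 0.5, 0.8], num_rows=3)
print("\n".join(weight_lines))
```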

Query Data
~~~~~~~~~~

Learning to rank requires query information for the training data.

LightGBM uses an additional file to store query data, like the following:

::

    27
    18
    67
    ...

In wrapper libraries such as the Python and R packages, this information can also be provided as an array-like via the Dataset parameter ``group``.

::

    [27, 18, 67, ...]

For example, if you have a 112-document dataset with ``group = [27, 18, 67]``, that means that you have 3 groups, where the first 27 records are in the first group, records 28-45 are in the second group, and records 46-112 are in the third group.
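The mapping from group sizes to record ranges, and the reverse step of deriving group sizes from a per-row query id column, can be sketched as follows. The helper names ``group_ranges`` and ``group_sizes`` and the query ids are hypothetical; the second helper assumes the rows are already ordered by query.

```python
from itertools import accumulate


def group_ranges(group):
    """Turn group sizes into 1-based inclusive (first, last) record ranges."""
    ends = list(accumulate(group))
    starts = [1] + [end + 1 for end in ends[:-1]]
    return list(zip(starts, ends))


def group_sizes(query_ids):
    """Count consecutive rows per query id; rows must already be ordered by query."""
    sizes = []
    previous = object()  # sentinel that never equals a real query id
    for qid in query_ids:
        if qid != previous:
            sizes.append(0)
            previous = qid
        sizes[-1] += 1
    return sizes


ranges = group_ranges([27, 18, 67])
sizes = group_sizes(["a", "a", "b", "c", "c", "c"])
print(ranges, sizes)
```

Counting only consecutive rows is deliberate: if the same query id reappears later in unordered data, it is counted as a separate group, which is why the data should be sorted by query first.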

**Note**: data should be ordered by the query.

If the data file is named ``train.txt``, the query file should be named ``train.txt.query`` and placed in the same folder as the data file.
In this case, LightGBM will load the query file automatically if it exists.

You can also include a query/group id column in your data file.
Please refer to the ``group_column`` `parameter <#group_column>`__ above.