..  List of parameters is auto generated by LightGBM\helper\parameter_generator.py from LightGBM\include\LightGBM\config.h file.

Parameters
==========

This page contains descriptions of all parameters in LightGBM.

**List of other helpful links**

- `Python API <./Python-API.rst>`__

- `Parameters Tuning <./Parameters-Tuning.rst>`__

**External Links**

- `Laurae++ Interactive Documentation`_

Parameters Format
-----------------

The parameters format is ``key1=value1 key2=value2 ...``.
Parameters can be set both in a config file and on the command line.
On the command line, parameters should not have spaces before or after ``=``.
In a config file, each line can contain only one parameter. You can use ``#`` to comment.

If a parameter appears in both the command line and the config file, LightGBM will use the value from the command line.
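The format rules above can be illustrated with a short Python sketch. This is not LightGBM's own parser: the helper names ``parse_config_text``, ``parse_command_line`` and ``merge_params`` are hypothetical, and the tolerance for spaces around ``=`` inside a config file is an assumption made for the example.

```python
def parse_config_text(text):
    """Parse config-file text: one ``key=value`` per line, ``#`` starts a comment."""
    params = {}
    for line in text.splitlines():
        # Drop everything after '#', then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        # Assumption: spaces around '=' are tolerated in a config file.
        params[key.strip()] = value.strip()
    return params


def parse_command_line(args):
    """Parse command-line tokens of the form ``key=value`` (no spaces around ``=``)."""
    params = {}
    for token in args:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        params[key] = value
    return params


def merge_params(config_params, cli_params):
    """Values given on the command line take precedence over the config file."""
    merged = dict(config_params)
    merged.update(cli_params)
    return merged


config_text = """
# training settings
task = train
num_leaves = 31
learning_rate = 0.1  # shrinkage rate
"""
file_params = parse_config_text(config_text)
cli_params = parse_command_line(["learning_rate=0.05", "num_threads=4"])
params = merge_params(file_params, cli_params)
print(params)
```

Here ``learning_rate`` appears in both places, so the merged result keeps the command-line value.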

.. start params list

Core Parameters
---------------

-  ``config``, default = ``""``, type = string, aliases: ``config_file``

   -  path of config file

   -  **Note**: can be used only in CLI version

-  ``task``, default = ``train``, type = enum, options: ``train``, ``predict``, ``convert_model``, ``refit``, aliases: ``task_type``

   -  ``train``, for training, aliases: ``training``

   -  ``predict``, for prediction, aliases: ``prediction``, ``test``

   -  ``convert_model``, for converting model file into if-else format, see more information in `IO Parameters <#io-parameters>`__

   -  ``refit``, for refitting existing models with new data, aliases: ``refit_tree``

   -  **Note**: can be used only in CLI version

-  ``objective``, default = ``regression``, type = enum, options: ``regression``, ``regression_l1``, ``huber``, ``fair``, ``poisson``, ``quantile``, ``mape``, ``gamma``, ``tweedie``, ``binary``, ``multiclass``, ``multiclassova``, ``xentropy``, ``xentlambda``, ``lambdarank``, aliases: ``objective_type``, ``app``, ``application``

   -  regression application

      -  ``regression_l2``, L2 loss, aliases: ``regression``, ``mean_squared_error``, ``mse``, ``l2_root``, ``root_mean_squared_error``, ``rmse``

      -  ``regression_l1``, L1 loss, aliases: ``mean_absolute_error``, ``mae``

      -  ``huber``, `Huber loss <https://en.wikipedia.org/wiki/Huber_loss>`__

      -  ``fair``, `Fair loss <https://www.kaggle.com/c/allstate-claims-severity/discussion/24520>`__

      -  ``poisson``, `Poisson regression <https://en.wikipedia.org/wiki/Poisson_regression>`__

      -  ``quantile``, `Quantile regression <https://en.wikipedia.org/wiki/Quantile_regression>`__

      -  ``mape``, `MAPE loss <https://en.wikipedia.org/wiki/Mean_absolute_percentage_error>`__, aliases: ``mean_absolute_percentage_error``

      -  ``gamma``, Gamma regression with log-link. It might be useful, e.g., for modeling insurance claims severity, or for any target that might be `gamma-distributed <https://en.wikipedia.org/wiki/Gamma_distribution#Applications>`__

      -  ``tweedie``, Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any target that might be `tweedie-distributed <https://en.wikipedia.org/wiki/Tweedie_distribution#Applications>`__

   -  ``binary``, binary `log loss <https://en.wikipedia.org/wiki/Cross_entropy>`__ classification (or logistic regression). Requires labels in {0, 1}; see ``xentropy`` for general probability labels in [0, 1]

   -  multi-class classification application

      -  ``multiclass``, `softmax <https://en.wikipedia.org/wiki/Softmax_function>`__ objective function, aliases: ``softmax``

      -  ``multiclassova``, `One-vs-All <https://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest>`__ binary objective function, aliases: ``multiclass_ova``, ``ova``, ``ovr``

      -  ``num_class`` should be set as well

   -  cross-entropy application

      -  ``xentropy``, objective function for cross-entropy (with optional linear weights), aliases: ``cross_entropy``

      -  ``xentlambda``, alternative parameterization of cross-entropy, aliases: ``cross_entropy_lambda``

      -  label is any value in the interval [0, 1]

   -  ``lambdarank``, `lambdarank <https://papers.nips.cc/paper/2971-learning-to-rank-with-nonsmooth-cost-functions.pdf>`__ application

      -  label should be ``int`` type in lambdarank tasks, and a larger number represents higher relevance (e.g. 0:bad, 1:fair, 2:good, 3:perfect)

      -  `label_gain <#objective-parameters>`__ can be used to set the gain (weight) of ``int`` label

      -  all values in ``label`` must be smaller than number of elements in ``label_gain``

-  ``boosting``, default = ``gbdt``, type = enum, options: ``gbdt``, ``gbrt``, ``rf``, ``random_forest``, ``dart``, ``goss``, aliases: ``boosting_type``, ``boost``

   -  ``gbdt``, traditional Gradient Boosting Decision Tree, aliases: ``gbrt``

   -  ``rf``, Random Forest, aliases: ``random_forest``

   -  ``dart``, `Dropouts meet Multiple Additive Regression Trees <https://arxiv.org/abs/1505.01866>`__

   -  ``goss``, Gradient-based One-Side Sampling

-  ``data``, default = ``""``, type = string, aliases: ``train``, ``train_data``, ``data_filename``

   -  path of training data, LightGBM will train from this data

-  ``valid``, default = ``""``, type = string, aliases: ``test``, ``valid_data``, ``valid_data_file``, ``test_data``, ``valid_filenames``

   -  path(s) of validation/test data, LightGBM will output metrics for these data

   -  support multiple validation data, separated by ``,``

-  ``num_iterations``, default = ``100``, type = int, aliases: ``num_iteration``, ``num_tree``, ``num_trees``, ``num_round``, ``num_rounds``, ``num_boost_round``, ``n_estimators``, constraints: ``num_iterations >= 0``

   -  number of boosting iterations

   -  **Note**: for Python/R-package, **this parameter is ignored**, use ``num_boost_round`` (Python) or ``nrounds`` (R) input arguments of ``train`` and ``cv`` methods instead

   -  **Note**: internally, LightGBM constructs ``num_class * num_iterations`` trees for multi-class classification problems

-  ``learning_rate``, default = ``0.1``, type = double, aliases: ``shrinkage_rate``, constraints: ``learning_rate > 0.0``

   -  shrinkage rate

   -  in ``dart``, it also affects the normalization weights of dropped trees

-  ``num_leaves``, default = ``31``, type = int, aliases: ``num_leaf``, constraints: ``num_leaves > 1``

   -  max number of leaves in one tree

-  ``tree_learner``, default = ``serial``, type = enum, options: ``serial``, ``feature``, ``data``, ``voting``, aliases: ``tree``, ``tree_learner_type``

   -  ``serial``, single machine tree learner

   -  ``feature``, feature parallel tree learner, aliases: ``feature_parallel``

   -  ``data``, data parallel tree learner, aliases: ``data_parallel``

   -  ``voting``, voting parallel tree learner, aliases: ``voting_parallel``

   -  refer to `Parallel Learning Guide <./Parallel-Learning-Guide.rst>`__ to get more details

-  ``num_threads``, default = ``0``, type = int, aliases: ``num_thread``, ``nthread``, ``nthreads``

   -  number of threads for LightGBM

   -  ``0`` means default number of threads in OpenMP

   -  for the best speed, set this to the number of **real CPU cores**, not the number of threads (most CPUs use `hyper-threading <https://en.wikipedia.org/wiki/Hyper-threading>`__ to generate 2 threads per CPU core)

   -  do not set it too large if your dataset is small (for instance, do not use 64 threads for a dataset with 10,000 rows)

   -  be aware that a task manager or any similar CPU monitoring tool might report that cores are not being fully utilized. **This is normal**

   -  for parallel learning, do not use all CPU cores because this will cause poor performance for the network communication

-  ``device_type``, default = ``cpu``, type = enum, options: ``cpu``, ``gpu``, aliases: ``device``

   -  device for the tree learning, you can use GPU to achieve faster learning

   -  **Note**: it is recommended to use a smaller ``max_bin`` (e.g. 63) to get a better speed-up

   -  **Note**: for faster speed, GPU uses 32-bit floating point to sum up by default, so this may affect the accuracy for some tasks. You can set ``gpu_use_dp=true`` to enable 64-bit floating point, but it will slow down the training

   -  **Note**: refer to `Installation Guide <./Installation-Guide.rst#build-gpu-version>`__ to build LightGBM with GPU support

-  ``seed``, default = ``0``, type = int, aliases: ``random_seed``

   -  this seed is used to generate other seeds, e.g. ``data_random_seed``, ``feature_fraction_seed``

   -  will be overridden if you set other seeds
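The alias lists and the ``num_class * num_iterations`` note in this section can be sketched in Python as follows. The helper names ``resolve_aliases`` and ``total_trees`` are hypothetical, the alias table is only a small subset copied from the descriptions above, and rejecting a parameter that is given twice under different names is an assumption of this sketch rather than documented LightGBM behavior.

```python
# Alias table copied from the parameter descriptions above (subset only).
ALIASES = {
    "num_iterations": ["num_iteration", "num_tree", "num_trees", "num_round",
                       "num_rounds", "num_boost_round", "n_estimators"],
    "learning_rate": ["shrinkage_rate"],
    "num_leaves": ["num_leaf"],
    "num_threads": ["num_thread", "nthread", "nthreads"],
}
ALIAS_TO_NAME = {
    alias: name for name, aliases in ALIASES.items() for alias in aliases
}


def resolve_aliases(params):
    """Map alias keys to their primary parameter names."""
    resolved = {}
    for key, value in params.items():
        name = ALIAS_TO_NAME.get(key, key)
        if name in resolved:
            # Assumption: this sketch refuses duplicates instead of picking one.
            raise ValueError(f"parameter {name!r} was given more than once")
        resolved[name] = value
    return resolved


def total_trees(num_iterations, num_class=1):
    """Number of trees built internally; num_class > 1 only for multi-class."""
    return num_class * num_iterations


resolved = resolve_aliases({"n_estimators": 200, "shrinkage_rate": 0.05, "nthread": 4})
print(resolved)
print(total_trees(resolved["num_iterations"], num_class=3))
```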

Learning Control Parameters
---------------------------

-  ``max_depth``, default = ``-1``, type = int

   -  limit the max depth for tree model. This is used to deal with over-fitting when ``#data`` is small. Tree still grows leaf-wise

   -  ``< 0`` means no limit

-  ``min_data_in_leaf``, default = ``20``, type = int, aliases: ``min_data_per_leaf``, ``min_data``, ``min_child_samples``, constraints: ``min_data_in_leaf >= 0``

   -  minimal number of data in one leaf. Can be used to deal with over-fitting

-  ``min_sum_hessian_in_leaf``, default = ``1e-3``, type = double, aliases: ``min_sum_hessian_per_leaf``, ``min_sum_hessian``, ``min_hessian``, ``min_child_weight``, constraints: ``min_sum_hessian_in_leaf >= 0.0``

   -  minimal sum hessian in one leaf. Like ``min_data_in_leaf``, it can be used to deal with over-fitting

-  ``bagging_fraction``, default = ``1.0``, type = double, aliases: ``sub_row``, ``subsample``, ``bagging``, constraints: ``0.0 < bagging_fraction <= 1.0``

   -  like ``feature_fraction``, but this will randomly select part of data without resampling

   -  can be used to speed up training

   -  can be used to deal with over-fitting

   -  **Note**: to enable bagging, ``bagging_freq`` should be set to a non-zero value as well

-  ``bagging_freq``, default = ``0``, type = int, aliases: ``subsample_freq``

   -  frequency for bagging

   -  ``0`` means disable bagging; ``k`` means perform bagging at every ``k`` iterations

   -  **Note**: to enable bagging, ``bagging_fraction`` should be set to a value smaller than ``1.0`` as well

-  ``bagging_seed``, default = ``3``, type = int, aliases: ``bagging_fraction_seed``

   -  random seed for bagging

-  ``feature_fraction``, default = ``1.0``, type = double, aliases: ``sub_feature``, ``colsample_bytree``, constraints: ``0.0 < feature_fraction <= 1.0``

   -  LightGBM will randomly select part of features on each iteration if ``feature_fraction`` is smaller than ``1.0``. For example, if you set it to ``0.8``, LightGBM will select 80% of features before training each tree

   -  can be used to speed up training

   -  can be used to deal with over-fitting

-  ``feature_fraction_seed``, default = ``2``, type = int

   -  random seed for ``feature_fraction``

-  ``early_stopping_round``, default = ``0``, type = int, aliases: ``early_stopping_rounds``, ``early_stopping``

   -  will stop training if one metric of one validation data doesn't improve in the last ``early_stopping_round`` rounds

   -  ``<= 0`` means disable

-  ``max_delta_step``, default = ``0.0``, type = double, aliases: ``max_tree_output``, ``max_leaf_output``

   -  used to limit the max output of tree leaves

   -  ``<= 0`` means no constraint

   -  the final max output of leaves is ``learning_rate * max_delta_step``

-  ``lambda_l1``, default = ``0.0``, type = double, aliases: ``reg_alpha``, constraints: ``lambda_l1 >= 0.0``

   -  L1 regularization

-  ``lambda_l2``, default = ``0.0``, type = double, aliases: ``reg_lambda``, constraints: ``lambda_l2 >= 0.0``

   -  L2 regularization

-  ``min_gain_to_split``, default = ``0.0``, type = double, aliases: ``min_split_gain``, constraints: ``min_gain_to_split >= 0.0``

   -  the minimal gain to perform split

-  ``drop_rate``, default = ``0.1``, type = double, constraints: ``0.0 <= drop_rate <= 1.0``

   -  used only in ``dart``

   -  dropout rate

-  ``max_drop``, default = ``50``, type = int

   -  used only in ``dart``

   -  max number of dropped trees on one iteration

   -  ``<=0`` means no limit

-  ``skip_drop``, default = ``0.5``, type = double, constraints: ``0.0 <= skip_drop <= 1.0``

   -  used only in ``dart``

   -  probability of skipping drop

-  ``xgboost_dart_mode``, default = ``false``, type = bool

   -  used only in ``dart``

   -  set this to ``true``, if you want to use xgboost dart mode

-  ``uniform_drop``, default = ``false``, type = bool

   -  used only in ``dart``

   -  set this to ``true``, if you want to use uniform drop

-  ``drop_seed``, default = ``4``, type = int

   -  used only in ``dart``

   -  random seed to choose dropping models

-  ``top_rate``, default = ``0.2``, type = double, constraints: ``0.0 <= top_rate <= 1.0``

   -  used only in ``goss``

   -  the retain ratio of large gradient data

-  ``other_rate``, default = ``0.1``, type = double, constraints: ``0.0 <= other_rate <= 1.0``

   -  used only in ``goss``

   -  the retain ratio of small gradient data

-  ``min_data_per_group``, default = ``100``, type = int, constraints: ``min_data_per_group > 0``

   -  minimal number of data per categorical group

-  ``max_cat_threshold``, default = ``32``, type = int, constraints: ``max_cat_threshold > 0``

   -  used for the categorical features

   -  limit the max threshold points in categorical features

-  ``cat_l2``, default = ``10.0``, type = double, constraints: ``cat_l2 >= 0.0``

   -  used for the categorical features

   -  L2 regularization in categorical split

-  ``cat_smooth``, default = ``10.0``, type = double, constraints: ``cat_smooth >= 0.0``

   -  used for the categorical features

   -  this can reduce the effect of noise in categorical features, especially for categories with few data

-  ``max_cat_to_onehot``, default = ``4``, type = int, constraints: ``max_cat_to_onehot > 0``

   -  when the number of categories of one feature is smaller than or equal to ``max_cat_to_onehot``, one-vs-other split algorithm will be used

-  ``top_k``, default = ``20``, type = int, aliases: ``topk``, constraints: ``top_k > 0``

   -  used in `Voting parallel <./Parallel-Learning-Guide.rst#choose-appropriate-parallel-algorithm>`__

   -  set this to a larger value for a more accurate result, but it will slow down the training speed

-  ``monotone_constraints``, default = ``None``, type = multi-int, aliases: ``mc``, ``monotone_constraint``

   -  used for constraints of monotonic features

   -  ``1`` means increasing, ``-1`` means decreasing, ``0`` means non-constraint

   -  you need to specify all features in order. For example, ``mc=-1,0,1`` means decreasing for 1st feature, non-constraint for 2nd feature and increasing for the 3rd feature

-  ``forcedsplits_filename``, default = ``""``, type = string, aliases: ``fs``, ``forced_splits_filename``, ``forced_splits_file``, ``forced_splits``

   -  path to a ``.json`` file that specifies splits to force at the top of every decision tree before best-first learning commences

   -  ``.json`` file can be arbitrarily nested, and each split contains ``feature``, ``threshold`` fields, as well as ``left`` and ``right`` fields representing subsplits

   -  categorical splits are forced in a one-hot fashion, with ``left`` representing the split containing the feature value and ``right`` representing other values

   -  see `this file <https://github.com/Microsoft/LightGBM/tree/master/examples/binary_classification/forced_splits.json>`__ as an example
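The nested structure described for ``forcedsplits_filename`` can be sketched by building the JSON document in Python. The helper ``make_split`` is hypothetical, and the feature indices and thresholds below are made-up illustration values (``feature`` is assumed here to be a column index); only the field names ``feature``, ``threshold``, ``left`` and ``right`` come from the description above.

```python
import json


def make_split(feature, threshold, left=None, right=None):
    """Build one forced split; ``left`` and ``right`` are optional sub-splits."""
    node = {"feature": feature, "threshold": threshold}
    if left is not None:
        node["left"] = left
    if right is not None:
        node["right"] = right
    return node


# Force a root split on one feature, then the same second split on both sides.
forced_splits = make_split(
    feature=25,
    threshold=1.3,
    left=make_split(feature=26, threshold=0.85),
    right=make_split(feature=26, threshold=0.85),
)

forced_splits_json = json.dumps(forced_splits, indent=4)
print(forced_splits_json)
```

The printed text could be saved to a file and its path passed as ``forcedsplits_filename``.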

IO Parameters
-------------

-  ``verbosity``, default = ``1``, type = int, aliases: ``verbose``

   -  controls the level of LightGBM's verbosity

   -  ``< 0``: Fatal, ``= 0``: Error (Warn), ``> 0``: Info

-  ``max_bin``, default = ``255``, type = int, constraints: ``max_bin > 1``

   -  max number of bins that feature values will be bucketed in

   -  a small number of bins may reduce training accuracy but may improve generalization (deal with over-fitting)

   -  LightGBM will auto compress memory according to ``max_bin``. For example, LightGBM will use ``uint8_t`` for feature value if ``max_bin=255``

-  ``min_data_in_bin``, default = ``3``, type = int, constraints: ``min_data_in_bin > 0``

   -  minimal number of data inside one bin

   -  use this to avoid one-data-one-bin (potential over-fitting)

-  ``bin_construct_sample_cnt``, default = ``200000``, type = int, aliases: ``subsample_for_bin``, constraints: ``bin_construct_sample_cnt > 0``

   -  number of data points sampled to construct histogram bins

   -  setting this to a larger value will give a better training result, but will increase data loading time

   -  set this to a larger value if data is very sparse

-  ``histogram_pool_size``, default = ``-1.0``, type = double

   -  max cache size in MB for historical histogram

   -  ``< 0`` means no limit

-  ``data_random_seed``, default = ``1``, type = int

   -  random seed for data partition in parallel learning (excluding the ``feature_parallel`` mode)

-  ``output_model``, default = ``LightGBM_model.txt``, type = string, aliases: ``model_output``, ``model_out``

   -  filename of output model in training

-  ``snapshot_freq``, default = ``-1``, type = int

   -  frequency of saving model file snapshot

   -  set this to positive value to enable this function. For example, the model file will be snapshotted at each iteration if ``snapshot_freq=1``

-  ``input_model``, default = ``""``, type = string, aliases: ``model_input``, ``model_in``

   -  filename of input model

   -  for ``prediction`` task, this model will be applied to prediction data

   -  for ``train`` task, training will be continued from this model

   -  **Note**: can be used only in CLI version

-  ``output_result``, default = ``LightGBM_predict_result.txt``, type = string, aliases: ``predict_result``, ``prediction_result``

   -  filename of prediction result in ``prediction`` task

-  ``initscore_filename``, default = ``""``, type = string, aliases: ``init_score_filename``, ``init_score_file``, ``init_score``, ``input_init_score``

   -  path of file with training initial score

   -  if ``""``, will use ``train_data_file`` + ``.init`` (if exists)

-  ``valid_data_initscores``, default = ``""``, type = string, aliases: ``valid_data_init_scores``, ``valid_init_score_file``, ``valid_init_score``

   -  path(s) of file(s) with validation initial score(s)

   -  if ``""``, will use ``valid_data_file`` + ``.init`` (if exists)

   -  separate by ``,`` for multi-validation data

-  ``pre_partition``, default = ``false``, type = bool, aliases: ``is_pre_partition``

   -  used for parallel learning (excluding the ``feature_parallel`` mode)

   -  ``true`` if training data are pre-partitioned, and different machines use different partitions

-  ``enable_bundle``, default = ``true``, type = bool, aliases: ``is_enable_bundle``, ``bundle``

   -  set this to ``false`` to disable Exclusive Feature Bundling (EFB), which is described in `LightGBM: A Highly Efficient Gradient Boosting Decision Tree <https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree>`__

   -  **Note**: disabling this may cause slow training speed for sparse datasets

-  ``max_conflict_rate``, default = ``0.0``, type = double, constraints: ``0.0 <= max_conflict_rate < 1.0``

   -  max conflict rate for bundles in EFB

   -  set this to ``0.0`` to disallow the conflict and provide more accurate results

   -  set this to a larger value to achieve faster speed

-  ``is_enable_sparse``, default = ``true``, type = bool, aliases: ``is_sparse``, ``enable_sparse``, ``sparse``

   -  used to enable/disable sparse optimization

-  ``sparse_threshold``, default = ``0.8``, type = double, constraints: ``0.0 < sparse_threshold <= 1.0``

   -  the threshold of zero elements percentage for treating a feature as a sparse one

-  ``use_missing``, default = ``true``, type = bool

   -  set this to ``false`` to disable the special handling of missing values

-  ``zero_as_missing``, default = ``false``, type = bool

   -  set this to ``true`` to treat all zeros as missing values (including the unshown values in libsvm/sparse matrices)

   -  set this to ``false`` to use ``na`` for representing missing values

-  ``two_round``, default = ``false``, type = bool, aliases: ``two_round_loading``, ``use_two_round_loading``

   -  set this to ``true`` if data file is too big to fit in memory

   -  by default, LightGBM will map data file to memory and load features from memory. This will provide faster data loading speed, but may cause an out-of-memory error when the data file is very big

-  ``save_binary``, default = ``false``, type = bool, aliases: ``is_save_binary``, ``is_save_binary_file``

   -  if ``true``, LightGBM will save the dataset (including validation data) to a binary file. This speeds up data loading the next time

-  ``enable_load_from_binary_file``, default = ``true``, type = bool, aliases: ``load_from_binary_file``, ``binary_load``, ``load_binary``

   -  set this to ``true`` to enable autoloading from previously saved binary datasets

   -  set this to ``false`` to ignore binary datasets

-  ``header``, default = ``false``, type = bool, aliases: ``has_header``

   -  set this to ``true`` if input data has header

-  ``label_column``, default = ``""``, type = int or string, aliases: ``label``

   -  used to specify the label column

   -  use number for index, e.g. ``label=0`` means column\_0 is the label

   -  add a prefix ``name:`` for column name, e.g. ``label=name:is_click``

-  ``weight_column``, default = ``""``, type = int or string, aliases: ``weight``

   -  used to specify the weight column

   -  use number for index, e.g. ``weight=0`` means column\_0 is the weight

   -  add a prefix ``name:`` for column name, e.g. ``weight=name:weight``

   -  **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``, e.g. when label is column\_0, and weight is column\_1, the correct parameter is ``weight=0``

-  ``group_column``, default = ``""``, type = int or string, aliases: ``group``, ``group_id``, ``query_column``, ``query``, ``query_id``

   -  used to specify the query/group id column

   -  use number for index, e.g. ``query=0`` means column\_0 is the query id

   -  add a prefix ``name:`` for column name, e.g. ``query=name:query_id``

   -  **Note**: data should be grouped by query\_id

   -  **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``, e.g. when label is column\_0 and query\_id is column\_1, the correct parameter is ``query=0``

-  ``ignore_column``, default = ``""``, type = multi-int or string, aliases: ``ignore_feature``, ``blacklist``

   -  used to specify some ignoring columns in training

   -  use number for index, e.g. ``ignore_column=0,1,2`` means column\_0, column\_1 and column\_2 will be ignored

   -  add a prefix ``name:`` for column name, e.g. ``ignore_column=name:c1,c2,c3`` means c1, c2 and c3 will be ignored

   -  **Note**: works only in case of loading data directly from file

   -  **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``

-  ``categorical_feature``, default = ``""``, type = multi-int or string, aliases: ``cat_feature``, ``categorical_column``, ``cat_column``

   -  used to specify categorical features

   -  use number for index, e.g. ``categorical_feature=0,1,2`` means column\_0, column\_1 and column\_2 are categorical features

   -  add a prefix ``name:`` for column name, e.g. ``categorical_feature=name:c1,c2,c3`` means c1, c2 and c3 are categorical features

   -  **Note**: only supports categorical with ``int`` type

   -  **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``

   -  **Note**: all values should be less than ``Int32.MaxValue`` (2147483647)

   -  **Note**: the negative values will be treated as **missing values**

-  ``predict_raw_score``, default = ``false``, type = bool, aliases: ``is_predict_raw_score``, ``predict_rawscore``, ``raw_score``

   -  used only in ``prediction`` task

   -  set this to ``true`` to predict only the raw scores

   -  set this to ``false`` to predict transformed scores

-  ``predict_leaf_index``, default = ``false``, type = bool, aliases: ``is_predict_leaf_index``, ``leaf_index``

   -  used only in ``prediction`` task

   -  set this to ``true`` to predict with leaf index of all trees

-  ``predict_contrib``, default = ``false``, type = bool, aliases: ``is_predict_contrib``, ``contrib``

   -  used only in ``prediction`` task

   -  set this to ``true`` to estimate `SHAP values <https://arxiv.org/abs/1706.06060>`__, which represent how each feature contributes to each prediction

   -  produces ``#features + 1`` values where the last value is the expected value of the model output over the training data

-  ``num_iteration_predict``, default = ``-1``, type = int

   -  used only in ``prediction`` task

   -  used to specify how many trained iterations will be used in prediction

   -  ``<= 0`` means no limit

-  ``pred_early_stop``, default = ``false``, type = bool

   -  used only in ``prediction`` task

   -  if ``true``, will use early-stopping to speed up the prediction. May affect the accuracy

-  ``pred_early_stop_freq``, default = ``10``, type = int

   -  used only in ``prediction`` task

   -  the frequency of checking early-stopping prediction

-  ``pred_early_stop_margin``, default = ``10.0``, type = double

   -  used only in ``prediction`` task

   -  the threshold of margin in early-stopping prediction

-  ``convert_model_language``, default = ``""``, type = string

   -  used only in ``convert_model`` task

   -  only ``cpp`` is supported so far

   -  if ``convert_model_language`` is set and ``task=train``, the model will also be converted

-  ``convert_model``, default = ``gbdt_prediction.cpp``, type = string, aliases: ``convert_model_file``

   -  used only in ``convert_model`` task

   -  output filename of converted model
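The index convention noted for ``weight_column``, ``group_column``, ``ignore_column`` and ``categorical_feature`` (the label column is not counted) can be sketched as a small conversion helper. The function name ``to_param_index`` is hypothetical, and leaving the index of columns that come *before* the label unchanged is an assumption of this sketch; the descriptions above only give an example for a label in column\_0.

```python
def to_param_index(file_column, label_column=0):
    """Convert a 0-based column position in the data file to the integer index
    expected by parameters that do not count the label column."""
    if file_column == label_column:
        raise ValueError("the label column itself has no index in this scheme")
    # Columns after the label shift down by one because the label is skipped.
    # Assumption: columns before the label keep their position.
    if file_column > label_column:
        return file_column - 1
    return file_column


# Label in column_0, weight in column_1 -> ``weight=0`` as in the note above.
weight_param = f"weight={to_param_index(1, label_column=0)}"

# Label in column_0, categorical features in file columns 3 and 5.
cat_param = "categorical_feature=" + ",".join(
    str(to_param_index(col, label_column=0)) for col in (3, 5)
)
print(weight_param)
print(cat_param)
```

Using the ``name:`` prefix with column names avoids this index arithmetic altogether when the data file has a header.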

Objective Parameters
--------------------

-  ``num_class``, default = ``1``, type = int, aliases: ``num_classes``, constraints: ``num_class > 0``

   -  used only in ``multi-class`` classification application

-  ``is_unbalance``, default = ``false``, type = bool, aliases: ``unbalanced_sets``

   -  used only in ``binary`` application

   -  set this to ``true`` if training data are unbalanced

   -  **Note**: this parameter cannot be used at the same time with ``scale_pos_weight``, choose only **one** of them

-  ``scale_pos_weight``, default = ``1.0``, type = double, constraints: ``scale_pos_weight > 0.0``

   -  used only in ``binary`` application

   -  weight of labels with positive class

   -  **Note**: this parameter cannot be used at the same time with ``is_unbalance``, choose only **one** of them

-  ``sigmoid``, default = ``1.0``, type = double, constraints: ``sigmoid > 0.0``

   -  used only in ``binary`` and ``multiclassova`` classification and in ``lambdarank`` applications

   -  parameter for the sigmoid function

-  ``boost_from_average``, default = ``true``, type = bool

   -  used only in ``regression``, ``binary`` and ``cross-entropy`` applications

   -  adjusts initial score to the mean of labels for faster convergence

-  ``reg_sqrt``, default = ``false``, type = bool

   -  used only in ``regression`` application

   -  used to fit ``sqrt(label)`` instead of original values and prediction result will also be automatically converted to ``prediction^2``

   -  might be useful in case of large-range labels

-  ``alpha``, default = ``0.9``, type = double, constraints: ``0.0 < alpha < 1.0``

   -  used only in ``huber`` and ``quantile`` ``regression`` applications

   -  parameter for `Huber loss <https://en.wikipedia.org/wiki/Huber_loss>`__ and `Quantile regression <https://en.wikipedia.org/wiki/Quantile_regression>`__

-  ``fair_c``, default = ``1.0``, type = double, constraints: ``fair_c > 0.0``

   -  used only in ``fair`` ``regression`` application

   -  parameter for `Fair loss <https://www.kaggle.com/c/allstate-claims-severity/discussion/24520>`__

-  ``poisson_max_delta_step``, default = ``0.7``, type = double, constraints: ``poisson_max_delta_step > 0.0``

   -  used only in ``poisson`` ``regression`` application

   -  parameter for `Poisson regression <https://en.wikipedia.org/wiki/Poisson_regression>`__ to safeguard optimization

-  ``tweedie_variance_power``, default = ``1.5``, type = double, constraints: ``1.0 <= tweedie_variance_power < 2.0``

   -  used only in ``tweedie`` ``regression`` application

   -  used to control the variance of the tweedie distribution

   -  set this closer to ``2`` to shift towards a **Gamma** distribution

   -  set this closer to ``1`` to shift towards a **Poisson** distribution

684
-  ``max_position``, default = ``20``, type = int, constraints: ``max_position > 0``

   -  used only in ``lambdarank`` application

   -  optimizes `NDCG <https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG>`__ at this position

-  ``label_gain``, default = ``0,1,3,7,15,31,63,...,2^30-1``, type = multi-double

   -  used only in ``lambdarank`` application

   -  relevant gain for labels. For example, the gain of label ``2`` is ``3`` in case of default label gains

   -  separated by ``,``
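
As a minimal sketch (not LightGBM's internal code), the default ``label_gain`` sequence listed above follows the pattern ``2^i - 1``. The helper names ``default_label_gain`` and ``to_param_string`` are hypothetical and exist only for this illustration:

```python
def default_label_gain(num_labels=31):
    """Return the gains 0, 1, 3, 7, ..., 2^(num_labels - 1) - 1."""
    return [2 ** i - 1 for i in range(num_labels)]


def to_param_string(gains):
    """Join gains with ``,`` as expected by the ``label_gain`` parameter."""
    return ",".join(str(g) for g in gains)


gains = default_label_gain()
print(gains[2])                    # gain of label 2 under the default gains
print(to_param_string(gains[:5]))  # first five gains as a parameter value
```

With the defaults, label ``2`` maps to gain ``3``, and the last of the 31 entries is ``2^30-1``. A custom gain list can be formatted the same way before it is passed as ``label_gain``.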

Metric Parameters
-----------------

-  ``metric``, default = ``""``, type = multi-enum, aliases: ``metrics``, ``metric_types``

   -  metric(s) to be evaluated on the evaluation sets **in addition** to what is provided in the training arguments

      -  ``""`` (empty string or not specified) means that the metric corresponding to the specified ``objective`` will be used (this is possible only for pre-defined objective functions, otherwise no evaluation metric will be added)

      -  ``"None"`` (string, **not** a ``None`` value) means that no metric will be registered, aliases: ``na``

      -  ``l1``, absolute loss, aliases: ``mean_absolute_error``, ``mae``, ``regression_l1``

      -  ``l2``, square loss, aliases: ``mean_squared_error``, ``mse``, ``regression_l2``, ``regression``

      -  ``l2_root``, root square loss, aliases: ``root_mean_squared_error``, ``rmse``

      -  ``quantile``, `Quantile regression <https://en.wikipedia.org/wiki/Quantile_regression>`__

      -  ``mape``, `MAPE loss <https://en.wikipedia.org/wiki/Mean_absolute_percentage_error>`__, aliases: ``mean_absolute_percentage_error``

      -  ``huber``, `Huber loss <https://en.wikipedia.org/wiki/Huber_loss>`__

      -  ``fair``, `Fair loss <https://www.kaggle.com/c/allstate-claims-severity/discussion/24520>`__

      -  ``poisson``, negative log-likelihood for `Poisson regression <https://en.wikipedia.org/wiki/Poisson_regression>`__

      -  ``gamma``, negative log-likelihood for **Gamma** regression

      -  ``gamma_deviance``, residual deviance for **Gamma** regression

      -  ``tweedie``, negative log-likelihood for **Tweedie** regression

      -  ``ndcg``, `NDCG <https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG>`__

      -  ``map``, `MAP <https://makarandtapaswi.wordpress.com/2012/07/02/intuition-behind-average-precision-and-map/>`__, aliases: ``mean_average_precision``

      -  ``auc``, `AUC <https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve>`__

      -  ``binary_logloss``, `log loss <https://en.wikipedia.org/wiki/Cross_entropy>`__, aliases: ``binary``

      -  ``binary_error``, for one sample: ``0`` for correct classification, ``1`` for incorrect classification

      -  ``multi_logloss``, log loss for multi-class classification, aliases: ``multiclass``, ``softmax``, ``multiclassova``, ``multiclass_ova``, ``ova``, ``ovr``

      -  ``multi_error``, error rate for multi-class classification

      -  ``xentropy``, cross-entropy (with optional linear weights), aliases: ``cross_entropy``

      -  ``xentlambda``, "intensity-weighted" cross-entropy, aliases: ``cross_entropy_lambda``

      -  ``kldiv``, `Kullback-Leibler divergence <https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence>`__, aliases: ``kullback_leibler``

   -  supports multiple metrics, separated by ``,``

-  ``metric_freq``, default = ``1``, type = int, aliases: ``output_freq``, constraints: ``metric_freq > 0``

   -  frequency for metric output

-  ``is_provide_training_metric``, default = ``false``, type = bool, aliases: ``training_metric``, ``is_training_metric``, ``train_metric``

   -  set this to ``true`` to output metric result over training dataset

-  ``eval_at``, default = ``1,2,3,4,5``, type = multi-int, aliases: ``ndcg_eval_at``, ``ndcg_at``

   -  used only with ``ndcg`` and ``map`` metrics

   -  `NDCG <https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG>`__ and `MAP <https://makarandtapaswi.wordpress.com/2012/07/02/intuition-behind-average-precision-and-map/>`__ evaluation positions, separated by ``,``
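
To illustrate how ``eval_at`` positions and label gains interact, here is a small pure-Python sketch of NDCG at several positions for a single query. It follows the textbook definition (gain divided by ``log2(position + 1)``) and is not LightGBM's implementation; the function names, the sample labels and scores, and the handling of queries without relevant items are assumptions made for this example:

```python
import math


def dcg_at_k(labels_in_ranked_order, k, label_gain):
    """Discounted cumulative gain over the top-k ranked labels."""
    top = labels_in_ranked_order[:k]
    return sum(label_gain[label] / math.log2(pos + 2) for pos, label in enumerate(top))


def ndcg_at_k(labels, scores, k, label_gain):
    """NDCG@k for one query: DCG of the predicted order over DCG of the ideal order."""
    by_score = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ranked = [label for _, label in by_score]
    ideal = sorted(labels, key=lambda label: label_gain[label], reverse=True)
    ideal_dcg = dcg_at_k(ideal, k, label_gain)
    if ideal_dcg == 0:
        return 1.0  # convention chosen for this sketch when nothing is relevant
    return dcg_at_k(ranked, k, label_gain) / ideal_dcg


label_gain = [2 ** i - 1 for i in range(31)]  # same pattern as the default gains
labels = [2, 0, 1]                            # made-up relevance labels
scores = [0.1, 0.9, 0.5]                      # made-up model scores
eval_at = [1, 2, 3]
results = {k: ndcg_at_k(labels, scores, k, label_gain) for k in eval_at}
for k, value in results.items():
    print(f"ndcg@{k} = {value:.4f}")
```

Here the most relevant item receives the lowest score, so NDCG at position ``1`` is zero and rises as later positions are included.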

Network Parameters
------------------

-  ``num_machines``, default = ``1``, type = int, aliases: ``num_machine``, constraints: ``num_machines > 0``

   -  the number of machines for parallel learning application

   -  this parameter must be set in both **socket** and **mpi** versions

-  ``local_listen_port``, default = ``12400``, type = int, aliases: ``local_port``, ``port``, constraints: ``local_listen_port > 0``

   -  TCP listen port for local machines

   -  **Note**: don't forget to allow this port in firewall settings before training

-  ``time_out``, default = ``120``, type = int, constraints: ``time_out > 0``

   -  socket time-out in minutes

-  ``machine_list_filename``, default = ``""``, type = string, aliases: ``machine_list_file``, ``machine_list``, ``mlist``

   -  path of file that lists machines for this parallel learning application

   -  each line contains one IP and one port for one machine. The format is ``ip port`` (space as a separator)

-  ``machines``, default = ``""``, type = string, aliases: ``workers``, ``nodes``

   -  list of machines in the following format: ``ip1:port1,ip2:port2``
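
The two machine formats above carry the same information. The following sketch converts ``ip port`` lines, as found in a machine list file, into the ``ip1:port1,ip2:port2`` form used by ``machines``. The function name and the IP addresses are made up for this example:

```python
def machines_from_list_lines(lines):
    """Convert ``ip port`` lines into the ``ip1:port1,ip2:port2`` form."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        ip, port = line.split()  # exactly one IP and one port per line
        entries.append(f"{ip}:{int(port)}")
    return ",".join(entries)


machine_list = ["192.168.0.1 12400", "192.168.0.2 12400"]  # hypothetical hosts
machines = machines_from_list_lines(machine_list)
num_machines = len(machines.split(","))
print(machines)
print(num_machines)
```

The entry count derived here is the value one would expect to pass as ``num_machines``.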

GPU Parameters
--------------

-  ``gpu_platform_id``, default = ``-1``, type = int

   -  OpenCL platform ID. Usually each GPU vendor exposes one OpenCL platform

   -  ``-1`` means the system-wide default platform

-  ``gpu_device_id``, default = ``-1``, type = int

   -  OpenCL device ID in the specified platform. Each GPU in the selected platform has a unique device ID

   -  ``-1`` means the default device in the selected platform

-  ``gpu_use_dp``, default = ``false``, type = bool

   -  set this to ``true`` to use double precision math on GPU (by default single precision is used)

.. end params list

Others
------

Continued Training with Input Score
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

LightGBM supports continued training with initial scores. It uses an additional file to store these initial scores, like the following:

::

    0.5
    -0.1
    0.9
    ...

It means the initial score of the first data row is ``0.5``, the second is ``-0.1``, and so on.
The initial score file corresponds with the data file line by line, with one score per line.
If the data file is named ``train.txt``, the initial score file should be named ``train.txt.init`` and placed in the same folder as the data file.
In this case LightGBM will load the initial score file automatically if it exists.
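
A minimal sketch of producing such a file next to the data file, using only the Python standard library. The helper ``write_side_file``, the temporary directory, and the data rows are assumptions made for this example:

```python
import os
import tempfile


def write_side_file(data_path, suffix, values):
    """Write one value per line to ``<data_path><suffix>`` and return that path."""
    side_path = data_path + suffix
    with open(side_path, "w") as f:
        for value in values:
            f.write(f"{value}\n")
    return side_path


with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "train.txt")
    with open(data_path, "w") as f:
        f.write("1 0.3 0.7\n0 0.2 0.1\n1 0.9 0.4\n")  # made-up data rows
    # One initial score per data row, in the same order as the rows.
    init_path = write_side_file(data_path, ".init", [0.5, -0.1, 0.9])
    init_name = os.path.basename(init_path)
    with open(init_path) as f:
        init_lines = f.read().splitlines()

print(init_name)
print(init_lines)
```

The file name is derived from the data file name, so the two files always end up in the same folder.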

Weight Data
~~~~~~~~~~~

LightGBM supports weighted training. It uses an additional file to store weight data, like the following:

::

    1.0
    0.5
    0.8
    ...

It means the weight of the first data row is ``1.0``, the second is ``0.5``, and so on.
The weight file corresponds with the data file line by line, with one weight per line.
If the data file is named ``train.txt``, the weight file should be named ``train.txt.weight`` and placed in the same folder as the data file.
In this case LightGBM will load the weight file automatically if it exists.

Also, you can include a weight column in your data file. Please refer to the ``weight`` parameter above.
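
Because the weight file is matched to the data file line by line, a quick consistency check before training can catch a mismatch early. This is a sketch only; the function ``check_weights`` is hypothetical and not part of LightGBM:

```python
def check_weights(num_data_rows, weight_lines):
    """Parse one weight per line and verify the count matches the data rows."""
    weights = [float(line) for line in weight_lines if line.strip()]
    if len(weights) != num_data_rows:
        raise ValueError(f"expected {num_data_rows} weights, got {len(weights)}")
    return weights


# Lines as they would appear in train.txt.weight for a three-row data file.
weights = check_weights(3, ["1.0", "0.5", "0.8"])
print(weights)
```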

Query Data
~~~~~~~~~~

LambdaRank learning requires query information for the training data.
LightGBM uses an additional file to store query data, like the following:

::

    27
    18
    67
    ...

It means the first ``27`` lines (samples) belong to one query, the next ``18`` lines belong to another, and so on.

**Note**: data should be ordered by the query.

If the data file is named ``train.txt``, the query file should be named ``train.txt.query`` and placed in the same folder as the data file.
In this case LightGBM will load the query file automatically if it exists.

Also, you can include a query/group id column in your data file. Please refer to the ``group`` parameter above.
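
If each row carries a query id, the per-query counts for the query file can be derived from it. The sketch below assumes the rows are already ordered by query, as the note above requires; the function name and the query ids are made up for this example:

```python
from itertools import groupby


def query_sizes(query_ids):
    """Count consecutive rows per query; rows must be grouped by query."""
    sizes = []
    seen = set()
    for qid, group in groupby(query_ids):
        if qid in seen:
            raise ValueError(f"rows of query {qid!r} are not contiguous")
        seen.add(qid)
        sizes.append(sum(1 for _ in group))
    return sizes


query_ids = ["q1"] * 27 + ["q2"] * 18 + ["q3"] * 67  # hypothetical ids
sizes = query_sizes(query_ids)
print("\n".join(str(size) for size in sizes))  # one count per line
```

The printed lines have the same shape as the query file shown above, and the counts sum to the number of data rows.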

.. _Laurae++ Interactive Documentation: https://sites.google.com/view/lauraepp/parameters