"tests/cpp_tests/test_chunked_array.cpp" did not exist on "4ded1342ae06707ae908228fc56c9c07d48d8655"
Python-API.md 32 KB
Newer Older
wxchan's avatar
wxchan committed
1
2
##Catalog

* [Data Structure API](Python-API.md#basic-data-structure-api)
    - [Dataset](Python-API.md#dataset)
    - [Booster](Python-API.md#booster)

* [Training API](Python-API.md#training-api)
    - [train](Python-API.md#trainparams-train_set-num_boost_round100-valid_setsnone-valid_namesnone-fobjnone-fevalnone-init_modelnone-feature_nameauto-categorical_featureauto-early_stopping_roundsnone-evals_resultnone-verbose_evaltrue-learning_ratesnone-callbacksnone)
    - [cv](Python-API.md#cvparams-train_set-num_boost_round10-nfold5-stratifiedfalse-shuffletrue-metricsnone-fobjnone-fevalnone-init_modelnone-feature_nameauto-categorical_featureauto-early_stopping_roundsnone-fpreprocnone-verbose_evalnone-show_stdvtrue-seed0-callbacksnone)

* [Scikit-learn API](Python-API.md#scikit-learn-api)
    - [Common Methods](Python-API.md#common-methods)
    - [Common Attributes](Python-API.md#common-attributes)
    - [LGBMClassifier](Python-API.md#lgbmclassifier)
    - [LGBMRegressor](Python-API.md#lgbmregressor)
    - [LGBMRanker](Python-API.md#lgbmranker)

* [Callbacks](Python-API.md#callbacks)
    - [Before iteration](Python-API.md#before-iteration)
        + [reset_parameter](Python-API.md#reset_parameterkwargs)
    - [After iteration](Python-API.md#after-iteration)
        + [print_evaluation](Python-API.md#print_evaluationperiod1-show_stdvtrue)
        + [record_evaluation](Python-API.md#record_evaluationeval_result)
        + [early_stopping](Python-API.md#early_stoppingstopping_rounds-verbosetrue)
* [Plotting](Python-API.md#plotting)

The methods of each class are listed in alphabetical order.

----

##Basic Data Structure API

###Dataset

####__init__(data, label=None, max_bin=255, reference=None, weight=None, group=None, silent=False, feature_name='auto', categorical_feature='auto', params=None, free_raw_data=True)

    Parameters
    ----------
    data : str/numpy array/scipy.sparse
        Data source of Dataset.
        When data is a string, it represents the path of a txt file
    label : list or numpy 1-D array, optional
        Label of the data
    max_bin : int, optional
        Max number of discrete bins for features
    reference : Other Dataset, optional
        If this is a Dataset for validation, training data should be used as reference
    weight : list or numpy 1-D array, optional
        Weight for each instance.
    group : list or numpy 1-D array, optional
        Group/query size for dataset
    silent : boolean, optional
        Whether to print messages during construction
    feature_name : list of str, or 'auto'
        Feature names
        If 'auto' and data is a pandas DataFrame, use data column names
    categorical_feature : list of str or int, or 'auto'
        Categorical features,
        type int represents index,
        type str represents feature names (need to specify feature_name as well)
        If 'auto' and data is pandas DataFrame, use pandas categorical columns
    params : dict, optional
        Other parameters
    free_raw_data : bool
        Whether to free raw data after constructing the inner dataset

####create_valid(data, label=None, weight=None, group=None, silent=False, params=None)

    Create validation data aligned with the current Dataset.

    Parameters
    ----------
    data : str/numpy array/scipy.sparse
        Data source of _InnerDataset.
        When data is a string, it represents the path of a txt file
    label : list or numpy 1-D array, optional
        Label of the training data.
    weight : list or numpy 1-D array, optional
        Weight for each instance.
    group : list or numpy 1-D array, optional
        Group/query size for dataset
    silent : boolean, optional
        Whether to print messages during construction
    params : dict, optional
        Other parameters


####get_group()

    Get the group of the Dataset.

    Returns
    -------
    group : array


####get_init_score()

    Get the initial score of the Dataset.

    Returns
    -------
    init_score : array


####get_label()

    Get the label of the Dataset.

    Returns
    -------
    label : array


####get_weight()

    Get the weight of the Dataset.

    Returns
    -------
    weight : array


####num_data()

    Get the number of rows in the Dataset.

    Returns
    -------
    number of rows : int


####num_feature()

    Get the number of columns (features) in the Dataset.

    Returns
    -------
    number of columns : int


####save_binary(filename)

    Save Dataset to binary file.

    Parameters
    ----------
    filename : str
        Name of the output file.


####set_categorical_feature(categorical_feature)

    Set categorical features.

    Parameters
    ----------
    categorical_feature : list of str or list of int
        Name (str) or index (int) of categorical features



####set_feature_name(feature_name)

    Set feature name.

    Parameters
    ----------
    feature_name : list of str
        Feature names


####set_group(group)

    Set group size of Dataset (used for ranking).

    Parameters
    ----------
    group : numpy array or list or None
        Group size of each group


####set_init_score(init_score)

    Set init score of booster to start from.

    Parameters
    ----------
    init_score : numpy array or list or None
        Init score for booster


####set_label(label)

    Set label of Dataset.

    Parameters
    ----------
    label : numpy array or list or None
        The label information to be set into Dataset


####set_reference(reference)

    Set reference dataset.

    Parameters
    ----------
    reference : Dataset
        Will use reference as a template to construct the current Dataset


####set_weight(weight)

    Set weight of each instance.

    Parameters
    ----------
    weight : numpy array or list or None
        Weight for each data point


####subset(used_indices, params=None)

    Get subset of current dataset.

    Parameters
    ----------
    used_indices : list of int
        Used indices of this subset
    params : dict
        Other parameters


###Booster

####__init__(params=None, train_set=None, model_file=None, silent=False)

    Initialize the Booster.

    Parameters
    ----------
    params : dict
        Parameters for boosters.
    train_set : Dataset
        Training dataset
    model_file : str
        Path to the model file.
    silent : boolean, optional
        Whether to print messages during construction


####add_valid(data, name)

    Add validation data.

    Parameters
    ----------
    data : Dataset
        Validation data
    name : str
        Name of validation data


####attr(key)

    Get attribute string from the Booster.

    Parameters
    ----------
    key : str
        The key to get attribute from.

    Returns
    -------
    value : str
        The attribute value of the key; returns None if the attribute does not exist.


####current_iteration()

    Get current number of iterations.

    Returns
    -------
    result : int
        Current number of iterations

####dump_model()

    Dump model to JSON format.

    Returns
    -------
    result : dict or list
        JSON format of model


####eval(data, name, feval=None)

    Evaluate for data.

    Parameters
    ----------
    data : _InnerDataset object
        Data for evaluation
    name : str
        Name of data
    feval : function
        Custom evaluation function.
    Returns
    -------
    result : list
        Evaluation result list.


####eval_train(feval=None)

    Evaluate for training data.

    Parameters
    ----------
    feval : function
        Custom evaluation function.

    Returns
    -------
    result : list
        Evaluation result list.


####eval_valid(feval=None)

    Evaluate for validation data.

    Parameters
    ----------
    feval : function
        Custom evaluation function.

    Returns
    -------
    result : list
        Evaluation result list.


####feature_name()

    Get feature names.

    Returns
    -------
    result : array
        Array of feature names.


####feature_importance(importance_type="split")

    Get feature importances.

    Parameters
    ----------
    importance_type : str, default "split"
        How the importance is calculated: "split" or "gain"
        "split" is the number of times a feature is used in a model
        "gain" is the total gain of splits which use the feature

    Returns
    -------
    result : array
        Array of feature importances.


####predict(data, num_iteration=-1, raw_score=False, pred_leaf=False, data_has_header=False, is_reshape=True)

    Predict for data.

    Parameters
    ----------
    data : str/numpy array/scipy.sparse
        Data source for prediction
        When data is a string, it represents the path of a txt file
    num_iteration : int
        Used iteration for prediction
    raw_score : bool
        True for predict raw score
    pred_leaf : bool
        True for predict leaf index
    data_has_header : bool
        Used for txt data
    is_reshape : bool
        Reshape to (nrow, ncol) if true

    Returns
    -------
    Prediction result


####reset_parameter(params)

    Reset parameters for booster.

    Parameters
    ----------
    params : dict
        New parameters for boosters


####rollback_one_iter()

    Rollback one iteration.


####save_model(filename, num_iteration=-1)

    Save model of booster to file.

    Parameters
    ----------
    filename : str
        Filename to save
    num_iteration : int
        Number of iterations to save; < 0 means save all


####set_attr(**kwargs)

    Set the attribute of the Booster.

    Parameters
    ----------
    **kwargs
        The attributes to set. Setting a value to None deletes an attribute.


####set_train_data_name(name)

    Set training data name.

    Parameters
    ----------
    name : str
        Name of training data.

####update(train_set=None, fobj=None)

    Update for one iteration.
    Note: for multi-class tasks, the score is grouped by class_id first, then by row_id.
          To get the score of the i-th row in the j-th class, access score[j * num_data + i],
          and you should group grad and hess in this way as well.

    Parameters
    ----------
    train_set : Dataset
        Training data; None means use the last training data
    fobj : function
        Customized objective function.

    Returns
    -------
    is_finished : bool
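The class-major score layout from the note above can be illustrated with plain numpy (sizes are hypothetical):

```python
import numpy as np

num_class, num_data = 3, 4
# Scores grouped by class_id first, then by row_id
score = np.arange(num_class * num_data, dtype=float)

def score_of(score, j, i, num_data):
    """Score of the i-th row in the j-th class under the flat layout."""
    return score[j * num_data + i]

# The same layout viewed as a 2-D array, one row per class
score_2d = score.reshape(num_class, num_data)
```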


##Training API

####train(params, train_set, num_boost_round=100, valid_sets=None, valid_names=None, fobj=None, feval=None, init_model=None, feature_name='auto', categorical_feature='auto', early_stopping_rounds=None, evals_result=None, verbose_eval=True, learning_rates=None, callbacks=None)

    Train with given parameters.

    Parameters
    ----------
    params : dict
        Parameters for training.
    train_set : Dataset
        Data to be trained.
    num_boost_round: int
        Number of boosting iterations.
    valid_sets: list of Datasets
        List of data to be evaluated during training
    valid_names: list of str
        Names of valid_sets
    fobj : function
        Customized objective function.
    feval : function
        Customized evaluation function.
        Note: should return (eval_name, eval_result, is_higher_better) or a list of such tuples
    init_model : file name of lightgbm model or 'Booster' instance
        Model used for continued training
    feature_name : list of str, or 'auto'
        Feature names
        If 'auto' and data is a pandas DataFrame, use data column names
    categorical_feature : list of str or int, or 'auto'
        Categorical features,
        type int represents index,
        type str represents feature names (need to specify feature_name as well)
        If 'auto' and data is pandas DataFrame, use pandas categorical columns
    early_stopping_rounds: int
        Activates early stopping.
        Requires at least one validation data and one metric
        If there's more than one, will check all of them
        Returns the model with (best_iter + early_stopping_rounds)
        If early stopping occurs, the model will add 'best_iteration' field
    evals_result: dict or None
        This dictionary is used to store all evaluation results of all the items in valid_sets.
        Example: with a valid_sets containing [valid_set, train_set]
                 and valid_names containing ['eval', 'train']
                 and a parameter containing 'metric': 'logloss'
        Returns: {'train': {'logloss': ['0.48253', '0.35953', ...]},
                  'eval': {'logloss': ['0.480385', '0.357756', ...]}}
        Passing None means this feature is not used
    verbose_eval : bool or int
        Requires at least one item in evals.
        If `verbose_eval` is True,
            the eval metric on the valid set is printed at each boosting stage.
        If `verbose_eval` is int,
            the eval metric on the valid set is printed at every `verbose_eval` boosting stage.
        The last boosting stage
            or the boosting stage found by using `early_stopping_rounds` is also printed.
        Example: with verbose_eval=4 and at least one item in evals,
            an evaluation metric is printed every 4 (instead of 1) boosting stages.
    learning_rates : list or function
        List of learning rate for each boosting round
        or a customized function that calculates learning_rate
        in terms of current number of round (e.g. yields learning rate decay)
        - list l: learning_rate = l[current_round]
        - function f: learning_rate = f(current_round)
    callbacks : list of callback functions
        List of callback functions that are applied at each iteration.
        See Callbacks in Python-API.md for more information.

    Returns
    -------
    booster : a trained booster model


####cv(params, train_set, num_boost_round=10, nfold=5, stratified=False, shuffle=True, metrics=None, fobj=None, feval=None, init_model=None, feature_name='auto', categorical_feature='auto', early_stopping_rounds=None, fpreproc=None, verbose_eval=None, show_stdv=True, seed=0, callbacks=None)

    Cross-validation with given parameters.

    Parameters
    ----------
    params : dict
        Booster params.
    train_set : Dataset
        Data to be trained.
    num_boost_round : int
        Number of boosting iterations.
    nfold : int
        Number of folds in CV.
    stratified : bool
        Perform stratified sampling.
    shuffle: bool
        Whether to shuffle before splitting data.
    folds : a KFold or StratifiedKFold instance
        Sklearn KFolds or StratifiedKFolds.
    metrics : str or list of str
        Evaluation metrics to be watched in CV.
    fobj : function
        Custom objective function.
    feval : function
        Custom evaluation function.
    init_model : file name of lightgbm model or 'Booster' instance
        Model used for continued training
    feature_name : list of str, or 'auto'
        Feature names
        If 'auto' and data is a pandas DataFrame, use data column names
    categorical_feature : list of str or int, or 'auto'
        Categorical features,
        type int represents index,
        type str represents feature names (need to specify feature_name as well)
        If 'auto' and data is pandas DataFrame, use pandas categorical columns
    early_stopping_rounds: int
        Activates early stopping. CV error needs to decrease at least
        every <early_stopping_rounds> round(s) to continue.
        Last entry in evaluation history is the one from best iteration.
    fpreproc : function
        Preprocessing function that takes (dtrain, dtest, param)
        and returns transformed versions of those.
    verbose_eval : bool, int, or None, default None
        Whether to display the progress.
        If None, progress will be displayed when np.ndarray is returned.
        If True, progress will be displayed at boosting stage.
        If an integer is given,
            progress will be displayed at every given `verbose_eval` boosting stage.
    show_stdv : bool, default True
        Whether to display the standard deviation in progress.
        Results are not affected; the returned history always contains std.
    seed : int
        Seed used to generate the folds (passed to numpy.random.seed).
    callbacks : list of callback functions
        List of callback functions that are applied at end of each iteration.

    Returns
    -------
    evaluation history : list of str


##Scikit-learn API

###Common Methods

####__init__(boosting_type="gbdt", num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10, max_bin=255, subsample_for_bin=50000, objective="regression", min_split_gain=0, min_child_weight=5, min_child_samples=10, subsample=1, subsample_freq=1, colsample_bytree=1, reg_alpha=0, reg_lambda=0, scale_pos_weight=1, is_unbalance=False, seed=0, nthread=-1, silent=True, sigmoid=1.0, huber_delta=1.0, gaussian_eta=1.0, fair_c=1.0, poisson_max_delta_step=0.7, max_position=20, label_gain=None, drop_rate=0.1, skip_drop=0.5, max_drop=50, uniform_drop=False, xgboost_dart_mode=False)

    Implementation of the Scikit-Learn API for LightGBM.

    Parameters
    ----------
    boosting_type : str
        gbdt, traditional Gradient Boosting Decision Tree
        dart, Dropouts meet Multiple Additive Regression Trees
    num_leaves : int
        Maximum tree leaves for base learners.
    max_depth : int
        Maximum tree depth for base learners, -1 means no limit.
    learning_rate : float
        Boosting learning rate
    n_estimators : int
        Number of boosted trees to fit.
    max_bin : int
        Number of bucketed bins for feature values
    subsample_for_bin : int
        Number of samples for constructing bins.
    objective : str or callable
        Specify the learning task and the corresponding learning objective or
        a custom objective function to be used (see note below).
        default: binary for LGBMClassifier, regression for LGBMRegressor, lambdarank for LGBMRanker
    min_split_gain : float
        Minimum loss reduction required to make a further partition on a leaf node of the tree.
    min_child_weight : int
        Minimum sum of instance weight (hessian) needed in a child (leaf)
    min_child_samples : int
        Minimum number of data needed in a child (leaf)
    subsample : float
        Subsample ratio of the training instance.
    subsample_freq : int
        Frequency of subsample; <= 0 means disabled
    colsample_bytree : float
        Subsample ratio of columns when constructing each tree.
    reg_alpha : float
        L1 regularization term on weights
    reg_lambda : float
        L2 regularization term on weights
    scale_pos_weight : float
        Balancing of positive and negative weights.
    is_unbalance : bool
        Set to True if training data is unbalanced (binary classification only)
    seed : int
        Random number seed.
    nthread : int
        Number of parallel threads
    silent : boolean
        Whether to print messages while running boosting.
    sigmoid : float
        Only used in binary classification and lambdarank. Parameter for sigmoid function.
    huber_delta : float
        Only used in regression. Parameter for Huber loss function.
    gaussian_eta : float
        Only used in regression. Parameter for L1 and Huber loss function.
        It is used to control the width of Gaussian function to approximate hessian.
    fair_c : float
        Only used in regression. Parameter for Fair loss function.
    poisson_max_delta_step : float
        parameter used to safeguard optimization in Poisson regression.
    max_position : int
        Only used in lambdarank, will optimize NDCG at this position.
    label_gain : list of float
        Only used in lambdarank, relevant gain for labels.
        For example, the gain of label 2 is 3 if using default label gains.
        None (default) means use default value of CLI version: {0,1,3,7,15,31,63,...}.
    drop_rate : float
        Only used when boosting_type='dart'. Probability to select dropping trees.
    skip_drop : float
        Only used when boosting_type='dart'. Probability to skip dropping trees.
    max_drop : int
        Only used when boosting_type='dart'. Max number of dropped trees in one iteration.
    uniform_drop : bool
        Only used when boosting_type='dart'. If true, drop trees uniformly, else drop according to weights.
    xgboost_dart_mode : bool
        Only used when boosting_type='dart'. Whether to use xgboost dart mode.

    Note
    ----
    A custom objective function can be provided for the ``objective``
    parameter. In this case, it should have the signature
    ``objective(y_true, y_pred) -> grad, hess``
        or ``objective(y_true, y_pred, group) -> grad, hess``:

        y_true: array_like of shape [n_samples]
            The target values
        y_pred: array_like of shape [n_samples] or shape[n_samples * n_class]
            The predicted values
        group: array_like
            group/query data, used for ranking task
        grad: array_like of shape [n_samples] or shape[n_samples * n_class]
            The value of the gradient for each sample point.
        hess: array_like of shape [n_samples] or shape[n_samples * n_class]
            The value of the second derivative for each sample point

    For multi-class tasks, y_pred is grouped by class_id first, then by row_id.
        To get the y_pred of the i-th row in the j-th class, access y_pred[j * num_data + i],
        and you should group grad and hess in this way as well.
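Following the signature above, a custom squared-error objective could be sketched as (the helper name `l2_objective` is hypothetical):

```python
import numpy as np

def l2_objective(y_true, y_pred):
    """Custom objective for the squared error 0.5 * (y_pred - y_true)**2:
    returns the first and second derivatives with respect to y_pred."""
    grad = y_pred - y_true          # gradient per sample
    hess = np.ones_like(y_pred)     # hessian per sample (constant for L2)
    return grad, hess
```

Such a callable could then be passed as the ``objective`` parameter (e.g. to LGBMRegressor).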


####apply(X, num_iteration=0)

    Return the predicted leaf of every tree for each sample.

    Parameters
    ----------
    X : array_like, shape=[n_samples, n_features]
        Input features matrix.

    num_iteration : int
        Limit number of iterations in the prediction; defaults to 0 (use all trees).

    Returns
    -------
    X_leaves : array_like, shape=[n_samples, n_trees]


####fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric=None, early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)

    Fit the gradient boosting model.

    Parameters
    ----------
    X : array_like
        Feature matrix
    y : array_like
        Labels
    sample_weight : array_like
        weight of training data
    init_score : array_like
        init score of training data
    group : array_like
        group data of training data
    eval_set : list, optional
        A list of (X, y) tuple pairs to use as a validation set for early-stopping
    eval_sample_weight : list or dict of array
        Weights of eval data; if dict, the index should start from 0.
    eval_init_score : list or dict of array
        Init scores of eval data; if dict, the index should start from 0.
    eval_group : list or dict of array
        Group data of eval data; if dict, the index should start from 0.
    eval_metric : str, list of str, or callable, optional
        If a str, should be a built-in evaluation metric to use.
        If callable, a custom evaluation metric; see the note below for more details.
        Default: binary_error for LGBMClassifier, l2 for LGBMRegressor, ndcg for LGBMRanker.
    early_stopping_rounds : int
        Activates early stopping; training stops if the validation score does not improve
        for `early_stopping_rounds` consecutive rounds.
    verbose : bool
        If `verbose` and an evaluation set is used, writes the evaluation progress.
    feature_name : list of str, or 'auto'
        Feature names.
        If 'auto' and data is pandas DataFrame, use data column names.
    categorical_feature : list of str or int, or 'auto'
        Categorical features:
        type int represents index,
        type str represents feature names (need to specify feature_name as well).
        If 'auto' and data is pandas DataFrame, use pandas categorical columns.
    callbacks : list of callback functions
        List of callback functions that are applied at each iteration.
        See Callbacks in Python-API.md for more information.

    Note
    ----
    A custom eval function expects a callable with one of the following signatures:
        ``func(y_true, y_pred)``, ``func(y_true, y_pred, weight)``
            or ``func(y_true, y_pred, weight, group)``
        and should return (eval_name, eval_result, is_bigger_better)
            or a list of (eval_name, eval_result, is_bigger_better) tuples.

        y_true: array_like of shape [n_samples]
            The target values
        y_pred: array_like of shape [n_samples] or shape[n_samples * n_class] (for multi-class)
            The predicted values
        weight: array_like of shape [n_samples]
            The weight of samples
        group: array_like
            group/query data, used for ranking task
        eval_name: str
            name of evaluation
        eval_result: float
            eval result
        is_bigger_better: bool
            whether a higher eval result is better, e.g. AUC is bigger_better
    For multi-class task, y_pred is grouped by class_id first, then by row_id.
      If you want to get the i-th row y_pred in the j-th class, the access way is y_pred[j * num_data + i].
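For instance, a binary error rate could be written as a custom metric like this (a hypothetical sketch; the callable only has to follow the signature and return convention above):

```python
import numpy as np

def binary_error(y_true, y_pred):
    # Assumes y_pred holds probabilities for the positive class;
    # threshold at 0.5 and compare with the labels.
    labels = (np.asarray(y_pred) > 0.5).astype(int)
    err = np.mean(labels != np.asarray(y_true))
    # Return (eval_name, eval_result, is_bigger_better):
    # a lower error is better, so is_bigger_better is False.
    return 'binary_error', err, False
```

It could then be passed as `eval_metric=binary_error` in `fit()`.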


####predict(X, raw_score=False, num_iteration=0)

    Return the predicted value for each sample.

    Parameters
    ----------
    X : array_like, shape=[n_samples, n_features]
        Input features matrix.

    raw_score : bool
        Whether to predict raw scores instead of transformed values; defaults to False.

    num_iteration : int
        Limit number of iterations in the prediction; defaults to 0 (use all trees).

    Returns
    -------
    predicted_result : array_like, shape=[n_samples] or [n_samples, n_classes]


###Common Attributes

####booster_

    Get the underlying lightgbm Booster of this model.

####evals_result_

    Get the evaluation results.

####feature_importances_

    Get normalized feature importances.


###LGBMClassifier

####predict_proba(X, raw_score=False, num_iteration=0)

    Return the predicted probability for each class for each sample.

    Parameters
    ----------
    X : array_like, shape=[n_samples, n_features]
        Input features matrix.

    raw_score : bool
        Whether to predict raw scores instead of probabilities; defaults to False.

    num_iteration : int
        Limit number of iterations in the prediction; defaults to 0 (use all trees).

    Returns
    -------
    predicted_probability : array_like, shape=[n_samples, n_classes]

####classes_

    Get class label array.

####n_classes_

    Get number of classes.

###LGBMRegressor

###LGBMRanker

####fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric='ndcg', eval_at=1, early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None)

    Most arguments are the same as in the Common Methods, except:
    eval_at : int or list of int, default=1
        The evaluation positions of NDCG.
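To make the evaluation positions concrete, NDCG at a cut-off k can be sketched in plain NumPy (a simplified illustration for a single query, not LightGBM's internal implementation):

```python
import numpy as np

def ndcg_at_k(relevance_in_pred_order, k):
    """NDCG@k for one query; input is the true relevance of each
    document, sorted by the model's predicted ranking."""
    rel = np.asarray(relevance_in_pred_order, dtype=float)[:k]
    # Logarithmic position discounts: 1/log2(2), 1/log2(3), ...
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = np.sum((2.0 ** rel - 1.0) * discounts)
    # Ideal DCG: the same relevances in the best possible order.
    ideal = np.sort(np.asarray(relevance_in_pred_order, dtype=float))[::-1][:k]
    idcg = np.sum((2.0 ** ideal - 1.0) * discounts[:ideal.size])
    return dcg / idcg if idcg > 0 else 0.0
```

With `eval_at=[1, 3]`, the ranker would report this kind of metric at positions 1 and 3.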

##Callbacks

###Before iteration

####reset_parameter(**kwargs)

    Reset parameters after the first iteration.

    NOTE: the initial parameters will still take effect on the first iteration.

    Parameters
    ----------
    **kwargs : value should be list or function
        List of parameters for each boosting round,
        or a customized function that calculates the parameter in terms of
        the current round number (e.g. yields learning rate decay):
        - list l: parameter = l[current_round]
        - function f: parameter = f(current_round)

    Returns
    -------
    callback : function
        The requested callback function.
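As an example, a decaying learning-rate schedule could be written as a function of the round number (a sketch; `lr_decay` and its constants are made up for illustration):

```python
def lr_decay(current_round):
    # Hypothetical schedule: start at 0.1 and shrink by 1% each round,
    # following the "parameter = f(current_round)" convention above.
    return 0.1 * (0.99 ** current_round)
```

The callback itself would then be created with `reset_parameter(learning_rate=lr_decay)`; passing a list such as `learning_rate=[0.1, 0.05, 0.025]` works the same way, with `parameter = l[current_round]`.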

###After iteration

####print_evaluation(period=1, show_stdv=True)

    Create a callback that prints the evaluation results.
    (Same function as `verbose_eval` in lightgbm.train())

    Parameters
    ----------
    period : int
        The period to log the evaluation results.

    show_stdv : bool, optional
        Whether to show standard deviation if provided.

    Returns
    -------
    callback : function
        A callback that prints the evaluation results every `period` iterations.

####record_evaluation(eval_result)

    Create a callback that records the evaluation history into eval_result.
    (Same function as `evals_result` in lightgbm.train())

    Parameters
    ----------
    eval_result : dict
       A dictionary to store the evaluation results.

    Returns
    -------
    callback : function
        The requested callback function.
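The filled dictionary is keyed by dataset name, then by metric name, with one value per iteration. A hypothetical result after three rounds with an l2 metric might look like this (the numbers are made up for illustration):

```python
# Illustrative contents of eval_result after training (values are invented):
eval_result = {
    'valid_0': {'l2': [0.2513, 0.2034, 0.1765]},
    'valid_1': {'l2': [0.2876, 0.2390, 0.2145]},
}

# Access the l2 history of the first validation set:
history = eval_result['valid_0']['l2']
```

This is the same structure produced by the `evals_result` argument of lightgbm.train(), which makes it convenient for plotting metric curves.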

####early_stopping(stopping_rounds, verbose=True)

    Create a callback that activates early stopping.
    To activate early stopping, at least one validation dataset and one metric are required.
    If there is more than one, all of them will be checked.
    (Same function as `early_stopping_rounds` in lightgbm.train())

    Parameters
    ----------
    stopping_rounds : int
        The number of rounds without improvement after which training will be stopped.

    verbose : bool, optional
        Whether to print a message when early stopping is triggered.

    Returns
    -------
    callback : function
        The requested callback function.

##Plotting

####plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None, title='Feature importance', xlabel='Feature importance', ylabel='Features', importance_type='split', max_num_features=None, ignore_zero=True, figsize=None, grid=True, **kwargs)

    Plot model feature importances.

    Parameters
    ----------
    booster : Booster, LGBMModel or array
        Booster or LGBMModel instance, or array of feature importances.
    ax : matplotlib Axes
        Target axes instance. If None, new figure and axes will be created.
    height : float
        Bar height, passed to ax.barh().
    xlim : tuple of 2 elements
        Tuple passed to axes.xlim().
    ylim : tuple of 2 elements
        Tuple passed to axes.ylim().
    title : str
        Axes title. Pass None to disable.
    xlabel : str
        X axis title label. Pass None to disable.
    ylabel : str
        Y axis title label. Pass None to disable.
    importance_type : str
        How the importance is calculated: "split" or "gain".
        "split" is the number of times a feature is used in a model.
        "gain" is the total gain of splits which use the feature.
    max_num_features : int
        Max number of top features displayed on plot.
        If None or smaller than 1, all features will be displayed.
    ignore_zero : bool
        Ignore features with zero importance.
    figsize : tuple of 2 elements
        Figure size.
    grid : bool
        Whether to add a grid to the axes.
    **kwargs :
        Other keywords passed to ax.barh().

    Returns
    -------
    ax : matplotlib Axes

####plot_metric(booster, metric=None, dataset_names=None, ax=None, xlim=None, ylim=None, title='Metric during training', xlabel='Iterations', ylabel='auto', figsize=None, grid=True)

    Plot one metric during training.

    Parameters
    ----------
    booster : dict or LGBMModel
        Evals_result recorded by lightgbm.train() or LGBMModel instance.
    metric : str or None
        The metric name to plot.
        Only one metric is supported because different metrics have various scales.
        Pass None to pick the `first` one (according to dict hashcode).
    dataset_names : None or list of str
        List of the dataset names to plot.
        Pass None to plot all datasets.
    ax : matplotlib Axes
        Target axes instance. If None, new figure and axes will be created.
    xlim : tuple of 2 elements
        Tuple passed to axes.xlim().
    ylim : tuple of 2 elements
        Tuple passed to axes.ylim().
    title : str
        Axes title. Pass None to disable.
    xlabel : str
        X axis title label. Pass None to disable.
    ylabel : str
        Y axis title label. Pass None to disable. Pass 'auto' to use `metric`.
    figsize : tuple of 2 elements
        Figure size.
    grid : bool
        Whether to add a grid to the axes.

    Returns
    -------
    ax : matplotlib Axes

####plot_tree(booster, ax=None, tree_index=0, figsize=None, graph_attr=None, node_attr=None, edge_attr=None, show_info=None)

    Plot specified tree.

    Parameters
    ----------
    booster : Booster, LGBMModel
        Booster or LGBMModel instance.
    ax : matplotlib Axes
        Target axes instance. If None, new figure and axes will be created.
    tree_index : int, default 0
        Specify tree index of target tree.
    figsize : tuple of 2 elements
        Figure size.
    graph_attr : dict
        Mapping of (attribute, value) pairs for the graph.
    node_attr : dict
        Mapping of (attribute, value) pairs set for all nodes.
    edge_attr : dict
        Mapping of (attribute, value) pairs set for all edges.
    show_info : list
        Information shown on nodes.
        Options: 'split_gain', 'internal_value', 'internal_count' or 'leaf_count'.

    Returns
    -------
    ax : matplotlib Axes