Commit 04e1726e authored by wxchan, committed by Guolin Ke

add Catalog to python docs (#124)

* clean python docs

* clean python docs
parent de36b329
##Catalog
* [Data Structure API](Python_API.md#basic-data-structure-api)
- [Dataset](Python_API.md#dataset)
- [Booster](Python_API.md#booster)
* [Training API](Python_API.md#training-api)
- [train](Python_API.md#trainparams-train_set-num_boost_round100-valid_setsnone-valid_namesnone-fobjnone-fevalnone-init_modelnone-feature_namenone-categorical_featurenone-early_stopping_roundsnone-evals_resultnone-verbose_evaltrue-learning_ratesnone-callbacksnone)
- [cv](Python_API.md#cvparams-train_set-num_boost_round10-nfold5-stratifiedfalse-metricsnone-fobjnone-fevalnone-init_modelnone-feature_namenone-categorical_featurenone-early_stopping_roundsnone-fpreprocnone-verbose_evalnone-show_stdvtrue-seed0-callbacksnone)
* [Scikit-learn API](Python_API.md#scikit-learn-api)
- [Common Methods](Python_API.md#common-methods)
- [LGBMClassifier](Python_API.md#lgbmclassifier)
- [LGBMRegressor](Python_API.md#lgbmregressor)
- [LGBMRanker](Python_API.md#lgbmranker)
The methods of each class are listed in alphabetical order.
----
##Basic Data Structure API
###Dataset
####__init__(data, label=None, max_bin=255, reference=None, weight=None, group=None, silent=False, feature_name=None, categorical_feature=None, params=None, free_raw_data=True)
Parameters
----------
data : str/numpy array/scipy.sparse
    Data source of Dataset.
    When data type is str, it represents the path of a txt file.
label : list or numpy 1-D array, optional
@@ -14,15 +35,15 @@
    Max number of discrete bins for features
reference : Other Dataset, optional
    If this is a Dataset for validation, the training data should be used as reference
weight : list or numpy 1-D array, optional
    Weight for each instance.
group : list or numpy 1-D array, optional
    Group/query size for dataset
silent : boolean, optional
    Whether to print messages during construction
feature_name : list of str
    Feature names
categorical_feature : list of str or list of int
    Categorical features,
    type int represents index,
    type str represents feature names (need to specify feature_name as well)
@@ -39,18 +60,18 @@
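For illustration, a Dataset can be built straight from a NumPy array; the shapes, labels and feature names below are made up for this sketch and are not part of the API description:

```python
import numpy as np
import lightgbm as lgb

# Illustrative data: 500 rows, 10 features, binary labels
X = np.random.rand(500, 10)
y = np.random.randint(2, size=500)

# max_bin and feature_name are optional; categorical_feature could be passed here as well
train_data = lgb.Dataset(X, label=y, max_bin=255,
                         feature_name=['f%d' % i for i in range(10)])
```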
####create_valid(data, label=None, weight=None, group=None, silent=False, params=None)
Create validation data aligned with the current dataset.
Parameters
----------
data : str/numpy array/scipy.sparse
    Data source of _InnerDataset.
    When data type is str, it represents the path of a txt file.
label : list or numpy 1-D array, optional
    Label of the training data.
weight : list or numpy 1-D array, optional
    Weight for each instance.
group : list or numpy 1-D array, optional
    Group/query size for dataset
silent : boolean, optional
    Whether to print messages during construction
@@ -114,28 +135,28 @@
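Continuing the sketch above (the validation arrays are again made up), validation data should be created from the training Dataset so that the bin mappings stay aligned:

```python
# Validation Dataset aligned with train_data from the previous example
X_val = np.random.rand(100, 10)
y_val = np.random.randint(2, size=100)
valid_data = train_data.create_valid(X_val, label=y_val)
```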
####save_binary(filename)
Save Dataset to binary file.
Parameters
----------
filename : str
    Name of the output file.
####set_categorical_feature(categorical_feature)
Set categorical features.
Parameters
----------
categorical_feature : list of str or list of int
    Name (str) or index (int) of categorical features
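A short sketch of the two methods above, reusing the hypothetical train_data and feature names from the earlier example; the file name is also made up:

```python
# Mark two columns as categorical by name (requires feature_name to be set)
train_data.set_categorical_feature(['f0', 'f3'])

# Save the constructed Dataset to a binary file for faster reloading;
# the binary file can later be used directly as a data source
train_data.save_binary('train.bin')
reloaded = lgb.Dataset('train.bin')
```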
####set_feature_name(feature_name)
Set feature names.
Parameters
----------
@@ -159,23 +180,23 @@
Parameters
----------
init_score : numpy array or list or None
    Init score for booster
####set_label(label)
Set label of Dataset.
Parameters
----------
label : numpy array or list or None
    The label information to be set into Dataset
####set_reference(reference)
Set reference dataset.
Parameters
----------
@@ -195,7 +216,7 @@
####subset(used_indices, params=None)
Get subset of current dataset.
Parameters
----------
@@ -206,6 +227,7 @@
###Booster
####__init__(params=None, train_set=None, model_file=None, silent=False)
Initialize the Booster.
@@ -216,7 +238,7 @@
    Parameters for boosters.
train_set : Dataset
    Training dataset
model_file : str
    Path to the model file.
silent : boolean, optional
    Whether to print messages during construction
@@ -224,13 +246,13 @@
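Two illustrative ways to obtain a Booster, either from a training Dataset or from a saved model file (the parameter values and file path here are hypothetical):

```python
# From a training Dataset and a parameter dict
params = {'objective': 'binary', 'num_leaves': 31}
booster = lgb.Booster(params=params, train_set=train_data)

# ... or from a previously saved model file (hypothetical path)
# booster = lgb.Booster(model_file='model.txt')
```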
####add_valid(data, name)
Add validation data.
Parameters
----------
data : Dataset
    Validation data
name : str
    Name of validation data
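For example, reusing the hypothetical booster and valid_data from the sketches above:

```python
# Register the validation Dataset under a name so eval_valid() can report on it
booster.add_valid(valid_data, 'valid_1')
```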
@@ -251,18 +273,26 @@
####current_iteration()
Get current number of iterations.
Returns
-------
result : int
    Current number of iterations
####dump_model()
Dump model to json format.
Returns
-------
result : dict or list
    Json format of model
####eval(data, name, feval=None)
Evaluate for data.
Parameters
----------
@@ -273,13 +303,13 @@
    Custom evaluation function.
Returns
-------
result : list
    Evaluation result list.
####eval_train(feval=None)
Evaluate for training data.
Parameters
----------
@@ -294,7 +324,7 @@
####eval_valid(feval=None)
Evaluate for validation data.
Parameters
----------
@@ -303,26 +333,27 @@
Returns
-------
result : str
    Evaluation result list.
####feature_importance(importance_type="split")
Feature importances.
Returns
-------
result : array
    Array of feature importances
####predict(data, num_iteration=-1, raw_score=False, pred_leaf=False, data_has_header=False, is_reshape=True)
Predict logic.
Parameters
----------
data : str/numpy array/scipy.sparse
    Data source for prediction
    When data type is str, it represents the path of a txt file.
num_iteration : int
@@ -343,7 +374,7 @@
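A minimal prediction sketch, assuming the booster from the earlier examples and made-up test rows:

```python
# Predict on raw feature rows (not a Dataset); num_iteration=-1 uses all iterations
X_test = np.random.rand(5, 10)
pred = booster.predict(X_test, num_iteration=-1)

# A path to a txt file with the same column layout can also be used as input
# pred = booster.predict('test.txt', data_has_header=False)
```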
####reset_parameter(params)
Reset parameters for booster.
Parameters
----------
@@ -355,12 +386,12 @@
####rollback_one_iter()
Rollback one iteration.
####save_model(filename, num_iteration=-1)
Save model of booster to file.
Parameters
----------
@@ -370,7 +401,7 @@
    Number of iterations to save. < 0 means save all
####set_attr(**kwargs)
Set the attribute of the Booster.
@@ -382,12 +413,19 @@
####set_train_data_name(name)
Set training data name.
Parameters
----------
name : str
    Name of training data.
####update(train_set=None, fobj=None)
Update for one iteration.
Note: for a multi-class task, the score is grouped by class_id first, then grouped by row_id.
If you want to get the i-th row score in the j-th class, access it via score[j * num_data + i],
and you should group grad and hess in this way as well.
Parameters
----------
@@ -402,7 +440,7 @@
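The note above is easiest to see with a custom objective. The sketch below only illustrates the class-major score layout; it is not LightGBM's built-in multiclass objective, and the softmax gradient/hessian formulas (including the 2*p*(1-p) hessian factor) are assumptions of this example:

```python
import numpy as np

def softmax_objective(preds, train_set):
    """Hypothetical multi-class fobj showing the layout score[j * num_data + i]."""
    labels = train_set.get_label().astype(int)
    num_data = len(labels)
    num_class = len(preds) // num_data
    # preds is flat and class-major, so reshape to (num_class, num_data)
    scores = preds.reshape(num_class, num_data)
    prob = np.exp(scores - scores.max(axis=0))
    prob /= prob.sum(axis=0)
    grad = prob.copy()
    grad[labels, np.arange(num_data)] -= 1.0      # softmax cross-entropy gradient
    hess = 2.0 * prob * (1.0 - prob)              # assumed hessian approximation
    # grad and hess must be returned in the same class-major, flat layout
    return grad.reshape(-1), hess.reshape(-1)

# booster.update(fobj=softmax_objective)  # assumes a multi-class train_set
```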
##Training API
----
####train(params, train_set, num_boost_round=100, valid_sets=None, valid_names=None, fobj=None, feval=None, init_model=None, feature_name=None, categorical_feature=None, early_stopping_rounds=None, evals_result=None, verbose_eval=True, learning_rates=None, callbacks=None)
Train with given parameters.
@@ -417,7 +455,7 @@
    Number of boosting iterations.
valid_sets : list of Datasets
    List of data to be evaluated during training
valid_names : list of str
    Names of valid_sets
fobj : function
    Customized objective function.
@@ -428,7 +466,7 @@
    Model used for continued training
feature_name : list of str
    Feature names
categorical_feature : list of str or list of int
    Categorical features,
    type int represents index,
    type str represents feature names (need to specify feature_name as well)
@@ -490,7 +528,7 @@
    Perform stratified sampling.
folds : a KFold or StratifiedKFold instance
    Sklearn KFolds or StratifiedKFolds.
metrics : str or list of str
    Evaluation metrics to be watched in CV.
fobj : function
    Custom objective function.
@@ -526,19 +564,20 @@
Returns
-------
evaluation history : list of str
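As an end-to-end sketch of the Training API (all parameter values below are illustrative, and train_data/valid_data refer to the hypothetical Datasets built earlier):

```python
params = {'objective': 'binary', 'metric': 'binary_logloss',
          'num_leaves': 31, 'learning_rate': 0.05}

# Train with early stopping on a validation set
gbm = lgb.train(params, train_data,
                num_boost_round=100,
                valid_sets=[valid_data],
                valid_names=['valid'],
                early_stopping_rounds=10)
gbm.save_model('model.txt')

# 5-fold cross-validation with the same parameters
history = lgb.cv(params, train_data, num_boost_round=100, nfold=5,
                 early_stopping_rounds=10, verbose_eval=False)
```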
##Scikit-learn API
----
###Common Methods
####__init__(boosting_type="gbdt", num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10, max_bin=255, silent=True, objective="regression", nthread=-1, min_split_gain=0, min_child_weight=5, min_child_samples=10, subsample=1, subsample_freq=1, colsample_bytree=1, reg_alpha=0, reg_lambda=0, scale_pos_weight=1, is_unbalance=False, seed=0)
Implementation of the Scikit-Learn API for LightGBM.
Parameters
----------
boosting_type : str
    gbdt, traditional Gradient Boosting Decision Tree
    dart, Dropouts meet Multiple Additive Regression Trees
num_leaves : int
@@ -551,10 +590,10 @@
    Number of boosted trees to fit.
silent : boolean
    Whether to print messages while running boosting.
objective : str or callable
    Specify the learning task and the corresponding learning objective or
    a custom objective function to be used (see note below).
    default: binary for LGBMClassifier, regression for LGBMRegressor, lambdarank for LGBMRanker
nthread : int
    Number of parallel threads
min_split_gain : float
@@ -623,7 +662,7 @@
####booster()
Get the underlying lightgbm Booster of this model.
This will raise an exception when it's called before fit().
Returns
-------
@@ -641,16 +680,17 @@
####feature_importance()
Return the feature importances of each feature.
Returns
-------
result : array
    Array of normalized feature importances
####fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric=None, early_stopping_rounds=None, verbose=True, feature_name=None, categorical_feature=None, other_params=None)
Fit the gradient boosting model.
Parameters
----------
@@ -715,7 +755,7 @@
####get_params(deep=False)
Get parameters.
####predict(data, raw_score=False, num_iteration=0)
@@ -760,9 +800,8 @@
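A minimal scikit-learn style sketch (the data and parameter values are made up); fit and predict follow the usual sklearn estimator conventions:

```python
import numpy as np
import lightgbm as lgb

X = np.random.rand(500, 10)
y = np.random.randint(2, size=500)

clf = lgb.LGBMClassifier(num_leaves=31, learning_rate=0.1, n_estimators=100)
clf.fit(X[:400], y[:400],
        eval_set=[(X[400:], y[400:])],
        eval_metric='binary_logloss',
        early_stopping_rounds=10)
pred = clf.predict(X[400:])
```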
####fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric=None, eval_at=None, early_stopping_rounds=None, verbose=True, feature_name=None, categorical_feature=None, other_params=None)
Most arguments are the same as in Common Methods, except:
eval_at : list of int
    The evaluation positions of NDCG
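For example, a hypothetical ranking setup where group gives the number of documents per query and eval_at picks the NDCG cut-offs:

```python
import numpy as np
import lightgbm as lgb

# Illustrative ranking data: 3 queries with 10, 20 and 30 documents each
X = np.random.rand(60, 10)
y = np.random.randint(4, size=60)       # relevance labels 0-3
group = [10, 20, 30]                    # query sizes, must sum to len(X)

ranker = lgb.LGBMRanker(num_leaves=31, learning_rate=0.1, n_estimators=50)
ranker.fit(X, y, group=group,
           eval_set=[(X, y)], eval_group=[group],
           eval_at=[1, 3, 5])           # evaluate NDCG at positions 1, 3 and 5
```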
# coding: utf-8
# pylint: disable = C0103, C0111, C0301, C0321, C0330, W0621
"""Generate Python_API.md from the docstrings of the lightgbm package."""
import inspect

import lightgbm as lgb

file_api = open('Python_API.md', 'w+')


def write_func(func, leftSpace=0):
    """Write a '####' heading with the signature, then the docstring body."""
    file_api.write('####' + func.__name__ + '('
                   + ', '.join([
                       v.name + ('=' + str(v.default) if v.default != v.empty else '')
                       for _, v in inspect.signature(func).parameters.items() if v.name != 'self'
                   ])
                   + ')\n')
    if func.__doc__:
        for line in func.__doc__.splitlines():
            if line:
                # Strip the common docstring indentation (4 spaces for class methods)
                file_api.write(line[leftSpace:])
            file_api.write('\n')
    file_api.write('\n')


def write_class(class_):
    """Write a '###' heading for the class, then its public methods in alphabetical order."""
    file_api.write('###' + class_.__name__ + '\n')
    for name, members in sorted(class_.__dict__.items(), key=lambda x: x[0]):
        if name == '__init__' or not name.startswith('_'):
            write_func(members, leftSpace=4)


def write_module(name, members):
    """Write a '##' section heading and all of its classes/functions."""
    file_api.write('##' + name + '\n----\n')
    for member in members:
        if inspect.isclass(member):
            write_class(member)
        else:
            write_func(member)


write_module('Basic Data Structure API', [
    lgb.Dataset,
    lgb.Booster
])
write_module('Training API', [
    lgb.train,
    lgb.cv
])
write_module('Scikit-learn API', [
    lgb.LGBMModel,
    lgb.LGBMClassifier,
    lgb.LGBMRegressor,
    lgb.LGBMRanker
])
file_api.close()