Commit 76f66b11 authored by Guolin Ke's avatar Guolin Ke

move more documents to docs

parent c7ef8322
@@ -17,9 +17,9 @@ For more details, please refer to [Features](https://github.com/Microsoft/LightG
News
----
12/05/2016 : **Categorical Features as input directly** (without one-hot coding). Experiments on [Expo data](http://stat-computing.org/dataexpo/2009/) show about an 8x speed-up with the same accuracy compared with one-hot coding (refer to [categorical log](https://github.com/guolinke/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_dataexpo_speed.log) and [one-hot log](https://github.com/guolinke/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_dataexpo_onehot_speed.log)).
For the setting details, please refer to [IO Parameters](./docs/Parameters.md#io-parameters).
12/02/2016 : Release of the [**python-package**](./python-package) beta version; feel free to try it and report any issues or feedback.
Get Started
------------
@@ -30,7 +30,7 @@ Documents
* [**Wiki**](https://github.com/Microsoft/LightGBM/wiki)
* [**Installation Guide**](https://github.com/Microsoft/LightGBM/wiki/Installation-Guide)
* [**Quick Start**](https://github.com/Microsoft/LightGBM/wiki/Quick-Start)
* [**Examples**](./examples)
* [**Features**](https://github.com/Microsoft/LightGBM/wiki/Features)
* [**Parallel Learning Guide**](https://github.com/Microsoft/LightGBM/wiki/Parallel-Learning-Guide)
* [**Configuration**](https://github.com/Microsoft/LightGBM/wiki/Configuration)
Refer to https://github.com/Microsoft/LightGBM/wiki/Installation-Guide.
Refer to https://github.com/Microsoft/LightGBM/wiki/Parallel-Learning-Guide.
@@ -34,7 +34,7 @@ The parameter format is ```key1=value1 key2=value2 ... ``` . And parameters can
* ```serial```, single machine tree learner
* ```feature```, feature parallel tree learner
* ```data```, data parallel tree learner
* Refer to [Parallel Learning Guide](./Parallel-Learning-Guide.md) to get more details.
* ```num_threads```, default=OpenMP_default, type=int, alias=```num_thread```,```nthread```
* Number of threads for LightGBM.
* For the best speed, set this to the number of **real CPU cores**, not the number of threads (most CPUs use [hyper-threading](https://en.wikipedia.org/wiki/Hyper-threading) to generate 2 threads per CPU core).
@@ -217,7 +217,7 @@ LightGBM uses [leaf-wise](https://github.com/Microsoft/LightGBM/wiki/Features#op
* Use feature sub-sampling by setting ```feature_fraction```
* Use small ```max_bin```
* Use ```save_binary``` to speed up data loading in future learning
* Use parallel learning, refer to [parallel learning guide](./Parallel-Learning-Guide.md).
### For better accuracy
## Catalog
* [Data Structure API](Python-API.md#basic-data-structure-api)
- [Dataset](Python-API.md#dataset)
- [Booster](Python-API.md#booster)
* [Training API](Python-API.md#training-api)
- [train](Python-API.md#trainparams-train_set-num_boost_round100-valid_setsnone-valid_namesnone-fobjnone-fevalnone-init_modelnone-feature_namenone-categorical_featurenone-early_stopping_roundsnone-evals_resultnone-verbose_evaltrue-learning_ratesnone-callbacksnone)
- [cv](Python-API.md#cvparams-train_set-num_boost_round10-nfold5-stratifiedfalse-metricsnone-fobjnone-fevalnone-init_modelnone-feature_namenone-categorical_featurenone-early_stopping_roundsnone-fpreprocnone-verbose_evalnone-show_stdvtrue-seed0-callbacksnone)
* [Scikit-learn API](Python-API.md#scikit-learn-api)
- [Common Methods](Python-API.md#common-methods)
- [LGBMClassifier](Python-API.md#lgbmclassifier)
- [LGBMRegressor](Python-API.md#lgbmregressor)
- [LGBMRanker](Python-API.md#lgbmranker)
The methods of each class are listed in alphabetical order.
@@ -3,12 +3,12 @@ Python Package Introduction
This document gives a basic walkthrough of the LightGBM Python package.
***List of other Helpful Links***
* [Python Examples](../examples/python-guide/)
* [Python API Reference](./Python-API.md)
Install
-------
* Install the library first, following the [Installation Guide](./Installation-Guide.md).
* In the `python-package` directory, run
```
python setup.py install
```
@@ -95,7 +95,7 @@ However, Numpy/Array/Pandas object is memory cost. If you concern about your mem
Setting Parameters
------------------
LightGBM can use either a list of pairs or a dictionary to set [parameters](./Parameters.md). For instance:
* Booster parameters
```python
param = {'num_leaves':31, 'num_trees':100, 'objective':'binary' }
```
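Since the text above notes that parameters can also be given as a list of pairs, here is a minimal sketch of the two equivalent forms (plain Python, no LightGBM import needed; the values are the same illustrative ones as above):

```python
# Booster parameters as a list of (key, value) pairs, equivalent to the
# dictionary form; the list form is convenient when a key must appear
# more than once (e.g. several metrics).
param_pairs = [('num_leaves', 31), ('num_trees', 100), ('objective', 'binary')]
param = dict(param_pairs)
print(param['objective'])  # binary
```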
......
This is a quick start guide for the CLI version of LightGBM.
Follow the [Installation Guide](./Installation-Guide.md) to install LightGBM first.
***List of other Helpful Links***
* [Python Package quick start guide](./Python-intro.md)
## Training data format
LightGBM supports input data files in [CSV](https://en.wikipedia.org/wiki/Comma-separated_values), [TSV](https://en.wikipedia.org/wiki/Tab-separated_values) and [LibSVM](https://www.csie.ntu.edu.tw/~cjlin/libsvm/) formats.
The label is the data in the first column, and there is no header in the file.
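For example, a tiny CSV training file would look like this, with the label in the first column and no header row (all values are made up for illustration):

```
1,0.12,3.4,5.6
0,0.33,2.1,4.4
1,0.25,3.0,5.1
```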
### Categorical feature support
update 12/5/2016:
LightGBM can use categorical features directly (without one-hot coding). The experiment on [Expo data](http://stat-computing.org/dataexpo/2009/) shows about an 8x speed-up compared with one-hot coding (refer to [categorical log](https://github.com/guolinke/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_dataexpo_speed.log) and [one-hot log](https://github.com/guolinke/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_dataexpo_onehot_speed.log)).
For the setting details, please refer to [Parameters](./Parameters.md#io-parameters).
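As a hypothetical config-file sketch (the ```categorical_feature``` parameter is described in the linked IO parameters; the column indices here are purely illustrative):

```
# treat the 1st and 3rd feature columns as categorical
categorical_feature=0,2
```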
### Weight and query/group data
LightGBM also supports weighted training; it needs an additional [weight data](./Parameters.md#weight-data) file. For ranking tasks, it also needs an additional [query data](./Parameters.md#query-data) file.
update 11/3/2016:
1. Input with a header is now supported
2. The label column, weight column and query/group id column can be specified, by either index or name
3. A list of ignored columns can be specified
For the detailed usage, please refer to [Configuration](./Parameters.md#io-parameters).
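A hypothetical config-file sketch of the options above (the column names and the ```name:``` prefix are illustrative; check the linked configuration page for the exact syntax):

```
# the input file has a header row
has_header=true
# pick the label and weight columns by name
label_column=name:label
weight_column=name:weight
# ignore this column
ignore_column=name:id
```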
## Parameter quick look
The parameter format is ```key1=value1 key2=value2 ...```. Parameters can be set both in a config file and on the command line.
Some important parameters:
* ```config```, default=```""```, type=string, alias=```config_file```
* path of config file
* ```task```, default=```train```, type=enum, options=```train```,```prediction```
* ```train``` for training
* ```prediction``` for prediction.
* ```application```, default=```regression```, type=enum, options=```regression```,```binary```,```lambdarank```,```multiclass```, alias=```objective```,```app```
* ```regression```, regression application
* ```binary```, binary classification application
* ```lambdarank```, lambdarank application
* ```multiclass```, multi-class classification application, should set ```num_class``` as well
* ```boosting```, default=```gbdt```, type=enum, options=```gbdt```,```dart```, alias=```boost```,```boosting_type```
* ```gbdt```, traditional Gradient Boosting Decision Tree
* ```dart```, [Dropouts meet Multiple Additive Regression Trees](https://arxiv.org/abs/1505.01866)
* ```data```, default=```""```, type=string, alias=```train```,```train_data```
* training data, LightGBM will train from this data
* ```valid```, default=```""```, type=multi-string, alias=```test```,```valid_data```,```test_data```
* validation/test data, LightGBM will output metrics for these data
* support multi validation data, separate by ```,```
* ```num_iterations```, default=```10```, type=int, alias=```num_iteration```,```num_tree```,```num_trees```,```num_round```,```num_rounds```
* number of boosting iterations/trees
* ```learning_rate```, default=```0.1```, type=double, alias=```shrinkage_rate```
* shrinkage rate
* ```num_leaves```, default=```127```, type=int, alias=```num_leaf```
* number of leaves in one tree
* ```tree_learner```, default=```serial```, type=enum, options=```serial```,```feature```,```data```
* ```serial```, single machine tree learner
* ```feature```, feature parallel tree learner
* ```data```, data parallel tree learner
* Refer to [Parallel Learning Guide](./Parallel-Learning-Guide.md) to get more details.
* ```num_threads```, default=OpenMP_default, type=int, alias=```num_thread```,```nthread```
* Number of threads for LightGBM.
* For the best speed, set this to the number of **real CPU cores**, not the number of threads (most CPUs use [hyper-threading](https://en.wikipedia.org/wiki/Hyper-threading) to generate 2 threads per CPU core).
* For parallel learning, do not use all CPU cores, since this causes poor network performance.
* ```max_depth```, default=```-1```, type=int
* Limit the max depth of the tree model. This is used to deal with over-fitting when #data is small. The tree still grows leaf-wise.
* ```< 0``` means no limit
* ```min_data_in_leaf```, default=```100```, type=int, alias=```min_data_per_leaf``` , ```min_data```
* Minimal number of data in one leaf. Can use this to deal with over-fit.
* ```min_sum_hessian_in_leaf```, default=```10.0```, type=double, alias=```min_sum_hessian_per_leaf```, ```min_sum_hessian```, ```min_hessian```
* Minimal sum hessian in one leaf. Like ```min_data_in_leaf```, can use this to deal with over-fit.
For all parameters, please refer to [Parameters](./Parameters.md).
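Putting several of the parameters above together, a minimal training config file might look like the following sketch (file names and values are purely illustrative):

```
task=train
application=binary
data=train.tsv
valid=valid.tsv
num_iterations=100
learning_rate=0.05
num_leaves=63
tree_learner=serial
num_threads=4
```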
## Run LightGBM
For Windows:
```
lightgbm.exe config=your_config_file other_args ...
```
For Unix:
```
./lightgbm config=your_config_file other_args ...
```
Parameters can be set both in the config file and on the command line; parameters given on the command line have higher priority than those in the config file.
For example, the following command keeps ```num_trees=10``` and ignores the same parameter in the config file.
```
./lightgbm config=train.conf num_trees=10
```
## Examples
* [Binary Classification](../examples/binary_classification)
* [Regression](../examples/regression)
* [Lambdarank](../examples/lambdarank)
* [Parallel Learning](../examples/parallel_learning)
Documents
=========
* [Installation Guide](https://github.com/Microsoft/LightGBM/wiki/Installation-Guide)
* [Quick Start](./Quick-Start.md)
* [Parameters](./Parameters.md)
* [Python Quick Start](./Python-intro.md)
* [Python API Reference](./Python-API.md)
* [Parallel Learning Guide](https://github.com/Microsoft/LightGBM/wiki/Parallel-Learning-Guide)
.toctree-l4{
padding: 0.4045em 2.427em 0.4045em 3.227em !important;
}
site_name: LightGBM
theme: readthedocs
extra_css:
- css/extra.css