Commit 76f66b11 authored by Guolin Ke's avatar Guolin Ke

move more documents to docs

parent c7ef8322
@@ -17,9 +17,9 @@ For more details, please refer to [Features](https://github.com/Microsoft/LightG
News
----
12/05/2016 : **Categorical Features as input directly** (without one-hot coding). Experiments on [Expo data](http://stat-computing.org/dataexpo/2009/) show about an 8x speed-up with the same accuracy compared with one-hot coding (refer to [categorical log](https://github.com/guolinke/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_dataexpo_speed.log) and [one-hot log](https://github.com/guolinke/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_dataexpo_onehot_speed.log)).
For the setting details, please refer to [IO Parameters](./docs/Parameters.md#io-parameters).
12/02/2016 : Release of the [**python-package**](./python-package) beta version; feel free to try it and report any issues or feedback.
Get Started
------------
@@ -30,7 +30,7 @@ Documents
* [**Wiki**](https://github.com/Microsoft/LightGBM/wiki)
* [**Installation Guide**](https://github.com/Microsoft/LightGBM/wiki/Installation-Guide)
* [**Quick Start**](https://github.com/Microsoft/LightGBM/wiki/Quick-Start)
* [**Examples**](./examples)
* [**Features**](https://github.com/Microsoft/LightGBM/wiki/Features)
* [**Parallel Learning Guide**](https://github.com/Microsoft/LightGBM/wiki/Parallel-Learning-Guide)
* [**Configuration**](https://github.com/Microsoft/LightGBM/wiki/Configuration)
Refer to https://github.com/Microsoft/LightGBM/wiki/Installation-Guide.
Refer to https://github.com/Microsoft/LightGBM/wiki/Parallel-Learning-Guide.
@@ -34,7 +34,7 @@ The parameter format is ```key1=value1 key2=value2 ... ``` . And parameters can
* ```serial```, single machine tree learner
* ```feature```, feature parallel tree learner
* ```data```, data parallel tree learner
* Refer to [Parallel Learning Guide](./Parallel-Learning-Guide.md) to get more details.
* ```num_threads```, default=OpenMP_default, type=int, alias=```num_thread```,```nthread```
* Number of threads for LightGBM.
* For the best speed, set this to the number of **real CPU cores**, not the number of threads (most CPUs use [hyper-threading](https://en.wikipedia.org/wiki/Hyper-threading) to generate 2 threads per CPU core).
@@ -217,7 +217,7 @@ LightGBM uses [leaf-wise](https://github.com/Microsoft/LightGBM/wiki/Features#op
* Use feature sub-sampling by setting ```feature_fraction```
* Use small ```max_bin```
* Use ```save_binary``` to speed up data loading in future learning
* Use parallel learning, refer to [parallel learning guide](./Parallel-Learning-Guide.md).
### For better accuracy
## Catalog
* [Data Structure API](Python-API.md#basic-data-structure-api)
- [Dataset](Python-API.md#dataset)
- [Booster](Python-API.md#booster)
* [Training API](Python-API.md#training-api)
- [train](Python-API.md#trainparams-train_set-num_boost_round100-valid_setsnone-valid_namesnone-fobjnone-fevalnone-init_modelnone-feature_namenone-categorical_featurenone-early_stopping_roundsnone-evals_resultnone-verbose_evaltrue-learning_ratesnone-callbacksnone)
- [cv](Python-API.md#cvparams-train_set-num_boost_round10-nfold5-stratifiedfalse-metricsnone-fobjnone-fevalnone-init_modelnone-feature_namenone-categorical_featurenone-early_stopping_roundsnone-fpreprocnone-verbose_evalnone-show_stdvtrue-seed0-callbacksnone)
* [Scikit-learn API](Python-API.md#scikit-learn-api)
- [Common Methods](Python-API.md#common-methods)
- [LGBMClassifier](Python-API.md#lgbmclassifier)
- [LGBMRegressor](Python-API.md#lgbmregressor)
- [LGBMRanker](Python-API.md#lgbmranker)
The methods of each class are listed in alphabetical order.
@@ -3,12 +3,12 @@ Python Package Introduction
This document gives a basic walkthrough of the LightGBM Python package.
***List of other Helpful Links***
* [Python Examples](../examples/python-guide/)
* [Python API Reference](./Python-API.md)
Install
-------
* Install the library first, following the [Installation Guide](./Installation-Guide.md).
* In the `python-package` directory, run
```
python setup.py install
```
@@ -95,7 +95,7 @@ However, Numpy/Array/Pandas object is memory cost. If you concern about your mem
Setting Parameters
------------------
LightGBM can use either a list of pairs or a dictionary to set [parameters](./Parameters.md). For instance:
* Booster parameters
```python
param = {'num_leaves':31, 'num_trees':100, 'objective':'binary' }
```
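Since the text above notes that parameters can also be given as a list of pairs, here is a minimal sketch of the two equivalent forms (plain Python, no LightGBM import needed; the values are the same illustrative ones as above):

```python
# Booster parameters as a list of (key, value) pairs, equivalent to the
# dictionary form; the list form is convenient when a key must appear
# more than once (e.g. several metrics).
param_pairs = [('num_leaves', 31), ('num_trees', 100), ('objective', 'binary')]
param = dict(param_pairs)
print(param['objective'])  # binary
```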
......
This is a quick start guide for the CLI version of LightGBM.
Follow the [Installation Guide](./Installation-Guide.md) to install LightGBM first.
***List of other Helpful Links***
* [Python Package quick start guide](./Python-intro.md)
## Training data format
LightGBM supports input data files in [CSV](https://en.wikipedia.org/wiki/Comma-separated_values), [TSV](https://en.wikipedia.org/wiki/Tab-separated_values) and [LibSVM](https://www.csie.ntu.edu.tw/~cjlin/libsvm/) formats.
The label is the data in the first column, and there is no header in the file.
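For example, a tiny CSV training file would look like this, with the label in the first column and no header row (all values are made up for illustration):

```
1,0.12,3.4,5.6
0,0.33,2.1,4.4
1,0.25,3.0,5.1
```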
### Categorical feature support
update 12/5/2016:
LightGBM can use categorical features directly (without one-hot coding). The experiment on [Expo data](http://stat-computing.org/dataexpo/2009/) shows about an 8x speed-up compared with one-hot coding (refer to [categorical log](https://github.com/guolinke/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_dataexpo_speed.log) and [one-hot log](https://github.com/guolinke/boosting_tree_benchmarks/blob/master/lightgbm/lightgbm_dataexpo_onehot_speed.log)).
For the setting details, please refer to [Parameters](./Parameters.md#io-parameters).
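As a hypothetical config-file sketch (the ```categorical_feature``` parameter is described in the linked IO parameters; the column indices here are purely illustrative):

```
# treat the 1st and 3rd feature columns as categorical
categorical_feature=0,2
```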
### Weight and query/group data
LightGBM also supports weighted training; it needs an additional [weight data](./Parameters.md#weight-data) file. For ranking tasks, it also needs an additional [query data](./Parameters.md#query-data) file.
update 11/3/2016:
1. Input with a header is now supported
2. The label column, weight column and query/group id column can be specified, by either index or name
3. A list of ignored columns can be specified
For the detailed usage, please refer to [Configuration](./Parameters.md#io-parameters).
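A hypothetical config-file sketch of the options above (the column names and the ```name:``` prefix are illustrative; check the linked configuration page for the exact syntax):

```
# the input file has a header row
has_header=true
# pick the label and weight columns by name
label_column=name:label
weight_column=name:weight
# ignore this column
ignore_column=name:id
```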
## Parameter quick look
The parameter format is ```key1=value1 key2=value2 ...```. Parameters can be set both in a config file and on the command line.
Some important parameters:
* ```config```, default=```""```, type=string, alias=```config_file```
* path of config file
* ```task```, default=```train```, type=enum, options=```train```,```prediction```
* ```train``` for training
* ```prediction``` for prediction.
* ```application```, default=```regression```, type=enum, options=```regression```,```binary```,```lambdarank```,```multiclass```, alias=```objective```,```app```
* ```regression```, regression application
* ```binary```, binary classification application
* ```lambdarank```, lambdarank application
* ```multiclass```, multi-class classification application, should set ```num_class``` as well
* ```boosting```, default=```gbdt```, type=enum, options=```gbdt```,```dart```, alias=```boost```,```boosting_type```
* ```gbdt```, traditional Gradient Boosting Decision Tree
* ```dart```, [Dropouts meet Multiple Additive Regression Trees](https://arxiv.org/abs/1505.01866)
* ```data```, default=```""```, type=string, alias=```train```,```train_data```
* training data, LightGBM will train from this data
* ```valid```, default=```""```, type=multi-string, alias=```test```,```valid_data```,```test_data```
* validation/test data, LightGBM will output metrics for these data
* support multi validation data, separate by ```,```
* ```num_iterations```, default=```10```, type=int, alias=```num_iteration```,```num_tree```,```num_trees```,```num_round```,```num_rounds```
* number of boosting iterations/trees
* ```learning_rate```, default=```0.1```, type=double, alias=```shrinkage_rate```
* shrinkage rate
* ```num_leaves```, default=```127```, type=int, alias=```num_leaf```
* number of leaves in one tree
* ```tree_learner```, default=```serial```, type=enum, options=```serial```,```feature```,```data```
* ```serial```, single machine tree learner
* ```feature```, feature parallel tree learner
* ```data```, data parallel tree learner
* Refer to [Parallel Learning Guide](./Parallel-Learning-Guide.md) to get more details.
* ```num_threads```, default=OpenMP_default, type=int, alias=```num_thread```,```nthread```
* Number of threads for LightGBM.
* For the best speed, set this to the number of **real CPU cores**, not the number of threads (most CPUs use [hyper-threading](https://en.wikipedia.org/wiki/Hyper-threading) to generate 2 threads per CPU core).
* For parallel learning, do not use all CPU cores, since this causes poor network performance.
* ```max_depth```, default=```-1```, type=int
* Limit the max depth of the tree model. This is used to deal with over-fitting when #data is small. The tree still grows leaf-wise.
* ```< 0``` means no limit
* ```min_data_in_leaf```, default=```100```, type=int, alias=```min_data_per_leaf``` , ```min_data```
* Minimal number of data in one leaf. Can use this to deal with over-fit.
* ```min_sum_hessian_in_leaf```, default=```10.0```, type=double, alias=```min_sum_hessian_per_leaf```, ```min_sum_hessian```, ```min_hessian```
* Minimal sum hessian in one leaf. Like ```min_data_in_leaf```, can use this to deal with over-fit.
For all parameters, please refer to [Parameters](./Parameters.md).
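Putting several of the parameters above together, a minimal training config file might look like the following sketch (file names and values are purely illustrative):

```
task=train
application=binary
data=train.tsv
valid=valid.tsv
num_iterations=100
learning_rate=0.05
num_leaves=63
tree_learner=serial
num_threads=4
```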
## Run LightGBM
For Windows:
```
lightgbm.exe config=your_config_file other_args ...
```
For Unix:
```
./lightgbm config=your_config_file other_args ...
```
Parameters can be set both in the config file and on the command line; parameters given on the command line have higher priority than those in the config file.
For example, the following command keeps ```num_trees=10``` and ignores the same parameter in the config file.
```
./lightgbm config=train.conf num_trees=10
```
## Examples
* [Binary Classification](../examples/binary_classification)
* [Regression](../examples/regression)
* [Lambdarank](../examples/lambdarank)
* [Parallel Learning](../examples/parallel_learning)
Documents
=========
* [Installation Guide](https://github.com/Microsoft/LightGBM/wiki/Installation-Guide)
* [Quick Start](./Quick-Start.md)
* [Parameters](./Parameters.md)
* [Python Quick Start](./Python-intro.md)
* [Python API Reference](./Python-API.md)
* [Parallel Learning Guide](https://github.com/Microsoft/LightGBM/wiki/Parallel-Learning-Guide)
.toctree-l4{
padding: 0.4045em 2.427em 0.4045em 3.227em !important;
}
site_name: LightGBM
theme: readthedocs
extra_css:
- css/extra.css