[docs] documentation improvement (#976)

* fixed typos and hotfixes
* converted gcc-tips.Rmd; added ref to gcc-tips
* renamed files
* renamed Advanced-Topics
* renamed README
* renamed Parameters-Tuning
* renamed FAQ
* fixed refs to FAQ
* fixed undecodable source characters
* renamed Features
* renamed Quick-Start
* fixed undecodable source characters in Features
* renamed Python-Intro
* renamed GPU-Tutorial
* renamed GPU-Windows
* fixed markdown
* fixed undecodable source characters in GPU-Windows
* renamed Parameters
* fixed markdown
* removed recommonmark dependence
* hotfixes
* added anchors to links
* fixed 404
* fixed typos
* added more anchors
* removed sphinxcontrib-napoleon dependence
* removed outdated line in Travis config
* fixed max-width of the ReadTheDocs theme
* added horizontal align to images

Commit 4aa32967 authored Oct 12, 2017 by Nikita Titov Committed by Tsukasa OMOTO Oct 12, 2017
20 changed files
--- a/.travis/test.sh
+++ b/.travis/test.sh
@@ -28,13 +28,12 @@ cd $TRAVIS_BUILD_DIR
 if [[ ${TASK} == "check-docs" ]]; then
    cd docs
    sudo apt-get install linkchecker
-    pip install rstcheck  # html5validator
-    pip install -r requirements.txt
+    pip install rstcheck sphinx sphinx_rtd_theme  # html5validator
    rstcheck --report warning --ignore-directives=autoclass,autofunction `find . -type f -name "*.rst"` || exit -1
    make html || exit -1
    find ./_build/html/ -type f -name '*.html' -exec \
-    sed -i -e 's#\(\.\/[^.]*\.\)\(md\|rst\)#\1html#g' {} \;  # Emulate js function
-#    html5validator --root ./_build/html/ || exit -1  For future (Sphinx 1.6) usage
+    sed -i -e 's;\(\.\/[^.]*\.\)rst\([^[:space:]]*\);\1html\2;g' {} \;  # Emulate js function
+#    html5validator --root ./_build/html/ || exit -1
    linkchecker --config=.linkcheckerrc ./_build/html/*.html || exit -1
    exit 0
 fi

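Review note: the new `sed` expression in `.travis/test.sh` rewrites relative `./Name.rst` links to `./Name.html` and, unlike the old one, keeps whatever follows the extension (for example a `#anchor`). The sketch below is a Python approximation of that substitution for experimenting with inputs; the function name is made up for illustration, and `\S` is assumed to be close enough to the POSIX class `[^[:space:]]` for this purpose.

```python
import re

# Python approximation of the sed expression added to .travis/test.sh:
#   s;\(\.\/[^.]*\.\)rst\([^[:space:]]*\);\1html\2;g
# \S stands in for the POSIX class [^[:space:]] (an assumption of this sketch).
RST_LINK = re.compile(r"(\./[^.]*\.)rst(\S*)")


def rewrite_rst_links(html: str) -> str:
    """Point relative ./Name.rst links at ./Name.html, keeping any trailing #anchor."""
    return RST_LINK.sub(r"\1html\2", html)


if __name__ == "__main__":
    sample = '<a href="./Parameters.rst#objective">Parameters</a>'
    print(rewrite_rst_links(sample))
```

Note that `.md` links are no longer rewritten by this expression, and absolute GitHub URLs are left alone because they contain no `./` prefix.
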
--- a/R-package/R/lgb.cv.R
+++ b/R-package/R/lgb.cv.R
@@ -57,7 +57,7 @@ CVBooster <- R6Class(
 #'        If early stopping occurs, the model will have 'best_iter' field
 #' @param callbacks list of callback functions
 #'        List of callback functions that are applied at each iteration.
-#' @param ... other parameters, see parameters.md for more informations
+#' @param ... other parameters, see Parameters.rst for more information
 #' 
 #' @return a trained model \code{lgb.CVBooster}.
 #' 

--- a/R-package/R/lgb.train.R
+++ b/R-package/R/lgb.train.R
@@ -30,7 +30,7 @@
 #' @param reset_data Boolean, setting it to TRUE (not the default value) will transform the booster model into a predictor model which frees up memory and the original datasets
 #' @param callbacks list of callback functions
 #'        List of callback functions that are applied at each iteration.
-#' @param ... other parameters, see parameters.md for more informations
+#' @param ... other parameters, see Parameters.rst for more information
 #' 
 #' @return a trained booster model \code{lgb.Booster}.
 #' 

--- a/README.md
+++ b/README.md
@@ -19,7 +19,7 @@ LightGBM is a gradient boosting framework that uses tree based learning algorith
 - Parallel and GPU learning supported
 - Capable of handling large-scale data

-For more details, please refer to [Features](https://github.com/Microsoft/LightGBM/blob/master/docs/Features.md).
+For more details, please refer to [Features](https://github.com/Microsoft/LightGBM/blob/master/docs/Features.rst).

 [Comparison experiments](https://github.com/Microsoft/LightGBM/blob/master/docs/Experiments.rst#comparison-experiment) on public datasets show that LightGBM can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. What's more, the [parallel experiments](https://github.com/Microsoft/LightGBM/blob/master/docs/Experiments.rst#parallel-experiment) show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.

@@ -36,7 +36,7 @@ News

 05/03/2017 : LightGBM v2 stable release.

-04/10/2017 : LightGBM supports GPU-accelerated tree learning now. Please read our [GPU Tutorial](./docs/GPU-Tutorial.md) and [Performance Comparison](./docs/GPU-Performance.rst).
+04/10/2017 : LightGBM supports GPU-accelerated tree learning now. Please read our [GPU Tutorial](./docs/GPU-Tutorial.rst) and [Performance Comparison](./docs/GPU-Performance.rst).

 02/20/2017 : Update to LightGBM v2.

@@ -62,22 +62,22 @@ JPMML: https://github.com/jpmml/jpmml-lightgbm
 Get Started and Documentation
 -----------------------------

-Install by following the [guide](https://github.com/Microsoft/LightGBM/blob/master/docs/Installation-Guide.rst) for the command line program, [Python-package](https://github.com/Microsoft/LightGBM/tree/master/python-package) or [R-package](https://github.com/Microsoft/LightGBM/tree/master/R-package). Then please see the [Quick Start](https://github.com/Microsoft/LightGBM/blob/master/docs/Quick-Start.md) guide.
+Install by following the [guide](https://github.com/Microsoft/LightGBM/blob/master/docs/Installation-Guide.rst) for the command line program, [Python-package](https://github.com/Microsoft/LightGBM/tree/master/python-package) or [R-package](https://github.com/Microsoft/LightGBM/tree/master/R-package). Then please see the [Quick Start](https://github.com/Microsoft/LightGBM/blob/master/docs/Quick-Start.rst) guide.

 Our primary documentation is at https://lightgbm.readthedocs.io/ and is generated from this repository.

 Next you may want to read:

 * [**Examples**](https://github.com/Microsoft/LightGBM/tree/master/examples) showing command line usage of common tasks
-* [**Features**](https://github.com/Microsoft/LightGBM/blob/master/docs/Features.md) and algorithms supported by LightGBM
-* [**Parameters**](https://github.com/Microsoft/LightGBM/blob/master/docs/Parameters.md) is an exhaustive list of customization you can make
-* [**Parallel Learning**](https://github.com/Microsoft/LightGBM/blob/master/docs/Parallel-Learning-Guide.rst) and [**GPU Learning**](https://github.com/Microsoft/LightGBM/blob/master/docs/GPU-Tutorial.md) can speed up computation
+* [**Features**](https://github.com/Microsoft/LightGBM/blob/master/docs/Features.rst) and algorithms supported by LightGBM
+* [**Parameters**](https://github.com/Microsoft/LightGBM/blob/master/docs/Parameters.rst) is an exhaustive list of customization you can make
+* [**Parallel Learning**](https://github.com/Microsoft/LightGBM/blob/master/docs/Parallel-Learning-Guide.rst) and [**GPU Learning**](https://github.com/Microsoft/LightGBM/blob/master/docs/GPU-Tutorial.rst) can speed up computation
 * [**Laurae++ interactive documentation**](https://sites.google.com/view/lauraepp/parameters) is a detailed guide for hyperparameters

 Documentation for contributors:

 * [**How we Update readthedocs.io**](https://github.com/Microsoft/LightGBM/blob/master/docs/README.md)
-* Check out the [Development Guide](https://github.com/Microsoft/LightGBM/blob/master/docs/development.rst).
+* Check out the [Development Guide](https://github.com/Microsoft/LightGBM/blob/master/docs/Development-Guide.rst).

 Support
 -------

--- a/docs/Advanced-Topic.md
+++ b/docs/Advanced-Topic.md
-# Advanced Topics
-
-## Missing Value Handle
-
-* LightGBM enables the missing value handle by default, you can disable it by set ```use_missing=false```.
-* LightGBM uses NA (NAN) to represent the missing value by default, you can change it to use zero by set ```zero_as_missing=true```.
-* When ```zero_as_missing=false``` (default), the unshown value in sparse matrices (and LightSVM) is treated as zeros.
-* When ```zero_as_missing=true```, NA and zeros (including unshown value in sparse matrices (and LightSVM)) are treated as missing.
-
-## Categorical Feature Support
-
-* LightGBM can offer a good accuracy when using native categorical features. Not like simply one-hot coding, LightGBM can find the optimal split of categorical features. Such an optimal split can provide the much better accuracy than one-hot coding solution.
-* Use `categorical_feature` to specify the categorical features. Refer to the parameter `categorical_feature` in [Parameters](./Parameters.md).
-* Converting to `int` type is needed first, and there is support for non-negative numbers only. It is better to convert into continues ranges.
-* Use `max_cat_group`, `cat_smooth_ratio` to deal with over-fitting (when #data is small or #category is large).
-* For categorical features with high cardinality (#category is large), it is better to convert it to numerical features.
-
-## LambdaRank
-
-* The label should be `int` type, and larger numbers represent the higher relevance (e.g. 0:bad, 1:fair, 2:good, 3:perfect).
-* Use `label_gain` to set the gain(weight) of `int` label.
-* Use `max_position` to set the NDCG optimization position.
-
-## Parameters Tuning
-
-* Refer to [Parameters Tuning](./Parameters-tuning.md).
-
-## GPU Support
-
-* Refer to [GPU Tutorial](./GPU-Tutorial.md) and [GPU Targets](./GPU-Targets.rst).
-
-## Parallel Learning
-
-* Refer to [Parallel Learning Guide](./Parallel-Learning-Guide.rst).
--- a/docs/Advanced-Topics.rst
+++ b/docs/Advanced-Topics.rst
+Advanced Topics
+===============
+
+Missing Value Handle
+--------------------
+
+-  LightGBM enables missing value handling by default; you can disable it by setting ``use_missing=false``.
+
+-  LightGBM uses NA (NaN) to represent missing values by default; you can change it to use zero by setting ``zero_as_missing=true``.
+
+-  When ``zero_as_missing=false`` (default), unshown values in sparse matrices (and LightSVM) are treated as zeros.
+
+-  When ``zero_as_missing=true``, NA and zeros (including unshown values in sparse matrices (and LightSVM)) are treated as missing.
+
+Categorical Feature Support
+---------------------------
+
+-  LightGBM can offer good accuracy when using native categorical features. Unlike simple one-hot coding, LightGBM can find the optimal split of categorical features.
+   Such an optimal split can provide much better accuracy than a one-hot coding solution.
+
+-  Use ``categorical_feature`` to specify the categorical features.
+   Refer to the parameter ``categorical_feature`` in `Parameters <./Parameters.rst>`__.
+
+-  Converting to ``int`` type is needed first, and only non-negative numbers are supported.
+   It is better to convert into continuous ranges.
+
+-  Use ``max_cat_group``, ``cat_smooth_ratio`` to deal with over-fitting
+   (when ``#data`` is small or ``#category`` is large).
+
+-  For categorical features with high cardinality (``#category`` is large), it is better to convert them to numerical features.
+
+LambdaRank
+----------
+
+-  The label should be of ``int`` type, and larger numbers represent higher relevance (e.g. 0:bad, 1:fair, 2:good, 3:perfect).
+
+-  Use ``label_gain`` to set the gain (weight) of each ``int`` label.
+
+-  Use ``max_position`` to set the NDCG optimization position.
+
+Parameters Tuning
+-----------------
+
+-  Refer to `Parameters Tuning <./Parameters-Tuning.rst>`__.
+
+Parallel Learning
+-----------------
+
+-  Refer to `Parallel Learning Guide <./Parallel-Learning-Guide.rst>`__.
+
+GPU Support
+-----------
+
+-  Refer to `GPU Tutorial <./GPU-Tutorial.rst>`__ and `GPU Targets <./GPU-Targets.rst>`__.
+
+Recommendations for gcc Users (MinGW, \*nix)
+--------------------------------------------
+
+-  Refer to `gcc Tips <./gcc-Tips.rst>`__.
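Review note: two rules from the new `Advanced-Topics.rst` are easy to misread, so here is a minimal sketch of each in plain Python. Both helper names are hypothetical and neither is LightGBM code: the first maps raw category values to contiguous non-negative integers, as the categorical feature section asks; the second only restates the `use_missing` / `zero_as_missing` bullets as a truth function.

```python
import math


def encode_categories(values):
    """Map raw category values to contiguous non-negative ints (0, 1, 2, ...)."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)  # next unused code, in first-seen order
        encoded.append(codes[value])
    return encoded, codes


def treated_as_missing(value, use_missing=True, zero_as_missing=False):
    """Toy restatement of the documented missing-value rules (not LightGBM code)."""
    if not use_missing:
        return False  # missing value handling is disabled entirely
    if math.isnan(value):
        return True  # NA (NaN) counts as missing whenever handling is enabled
    return zero_as_missing and value == 0.0  # zeros count only when requested


if __name__ == "__main__":
    print(encode_categories(["red", "green", "red", "blue"]))
    print(treated_as_missing(0.0), treated_as_missing(0.0, zero_as_missing=True))
```

The encoded column would then be named in `categorical_feature`, as described in the text.
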
--- a/docs/development.rst
+++ b/docs/development.rst
@@ -4,7 +4,7 @@ Development Guide
 Algorithms
 ----------

-Refer to `Features <./Features.md>`__ to understand important algorithms used in LightGBM.
+Refer to `Features <./Features.rst>`__ to understand important algorithms used in LightGBM.

 Classes and Code Structure
 --------------------------
@@ -68,9 +68,7 @@ Code Structure
 Documents API
 ~~~~~~~~~~~~~

-LightGBM support use `doxygen <http://www.stack.nl/~dimitri/doxygen/>`__ to generate documents for classes and functions.
-
-Refer to `docs README <./README.md>`__.
+Refer to `docs README <./README.rst>`__.

 C API
 -----
@@ -85,6 +83,6 @@ See the implementations at `Python-package <https://github.com/Microsoft/LightGB
 Questions
 ---------

-Refer to `FAQ <./FAQ.md>`__.
+Refer to `FAQ <./FAQ.rst>`__.

 Also feel free to open `issues <https://github.com/Microsoft/LightGBM/issues>`__ if you met problems.
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
-LightGBM FAQ
-============
-
-### Contents
-
- [Critical](#critical)
- [LightGBM](#lightgbm)
- [R-package](#r-package)
- [Python-package](#python-package)
-
---
-
-### Critical
-
-You encountered a critical issue when using LightGBM (crash, prediction error, non sense outputs...). Who should you contact?
-
-If your issue is not critical, just post an issue in [Microsoft/LightGBM repository](https://github.com/Microsoft/LightGBM/issues).
-
-If it is a critical issue, identify first what error you have:
-
-* Do you think it is reproducible on CLI (command line interface), R, and/or Python?
-* Is it specific to a wrapper? (R or Python?)
-* Is it specific to the compiler? (gcc versions? MinGW versions?)
-* Is it specific to your Operating System? (Windows? Linux?)
-* Are you able to reproduce this issue with a simple case?
-* Are you able to (not) reproduce this issue after removing all optimization flags and compiling LightGBM in debug mode?
-
-Depending on the answers, while opening your issue, feel free to ping (just mention them with the arobase (@) symbol) appropriately so we can attempt to solve your problem faster:
-
-* [@guolinke](https://github.com/guolinke) (C++ code / R-package / Python-package)
-* [@Laurae2](https://github.com/Laurae2) (R-package)
-* [@wxchan](https://github.com/wxchan) (Python-package)
-* [@henry0312](https://github.com/henry0312) (Python-package)
-* [@StrikerRUS](https://github.com/StrikerRUS) (Python-package)
-* [@huanzhang12](https://github.com/huanzhang12) (GPU support)
-
-Remember this is a free/open community support. We may not be available 24/7 to provide support.
-
---
-
-### LightGBM
-
- **Question 1**: Where do I find more details about LightGBM parameters?
-
- **Solution 1**: Look at [Parameters](./Parameters.md) and [Laurae++/Parameters](https://sites.google.com/view/lauraepp/parameters) website.
-
---
-
- **Question 2**: On datasets with million of features, training do not start (or starts after a very long time).
-
- **Solution 2**: Use a smaller value for `bin_construct_sample_cnt` and a larger value for `min_data`.
-
---
-
- **Question 3**: When running LightGBM on a large dataset, my computer runs out of RAM.
-
- **Solution 3**: Multiple solutions: set `histogram_pool_size` parameter to the MB you want to use for LightGBM (histogram_pool_size + dataset size = approximately RAM used), lower `num_leaves` or lower `max_bin` (see [Microsoft/LightGBM#562](https://github.com/Microsoft/LightGBM/issues/562)).
-
---
-
- **Question 4**: I am using Windows. Should I use Visual Studio or MinGW for compiling LightGBM?
-
- **Solution 4**: It is recommended to [use Visual Studio](https://github.com/Microsoft/LightGBM/issues/542) as its performance is higher for LightGBM.
-
---
-
- **Question 5**: When using LightGBM GPU, I cannot reproduce results over several runs.
-
- **Solution 5**: It is a normal issue, there is nothing we/you can do about, you may try to use `gpu_use_dp = true` for reproducibility (see [Microsoft/LightGBM#560](https://github.com/Microsoft/LightGBM/pull/560#issuecomment-304561654)). You may also use CPU version.
-
---
-
- **Question 6**: Bagging is not reproducible when changing the number of threads.
-
- **Solution 6**: As LightGBM bagging is running multithreaded, its output is dependent on the number of threads used. There is [no workaround currently](https://github.com/Microsoft/LightGBM/issues/632).
-
---
-
- **Question 7**: I tried to use Random Forest mode, and LightGBM crashes!
-
- **Solution 7**: It is by design. You must use `bagging_fraction` and `feature_fraction` different from 1, along with a `bagging_freq`. See [this thread](https://github.com/Microsoft/LightGBM/issues/691) as an example.
-
---
-
- **Question 8**: CPU are not kept busy (like 10% CPU usage only) in Windows when using LightGBM on very large datasets with many core systems.
-
- **Solution 8**: Please use [Visual Studio](https://www.visualstudio.com/downloads/) as it may be [10x faster than MinGW](https://github.com/Microsoft/LightGBM/issues/749) especially for very large trees.
-
---
-
-### R-package
-
- **Question 1**: Any training command using LightGBM does not work after an error occurred during the training of a previous LightGBM model.
-
- **Solution 1**: Run `lgb.unloader(wipe = TRUE)` in the R console, and recreate the LightGBM datasets (this will wipe all LightGBM-related variables). Due to the pointers, choosing to not wipe variables will not fix the error. This is a known issue: [Microsoft/LightGBM#698](https://github.com/Microsoft/LightGBM/issues/698).
-
- **Question 2**: I used `setinfo`, tried to print my `lgb.Dataset`, and now the R console froze!
-
- **Solution 2**: Avoid printing the `lgb.Dataset` after using `setinfo`. This is a known bug: [Microsoft/LightGBM#539](https://github.com/Microsoft/LightGBM/issues/539).
-
---
-
-### Python-package
-
- **Question 1**: I see error messages like this when install from github using `python setup.py install`.
-
-    ```
-    error: Error: setup script specifies an absolute path:
-
-    /Users/Microsoft/LightGBM/python-package/lightgbm/../../lib_lightgbm.so
-
-    setup() arguments must *always* be /-separated paths relative to the
-    setup.py directory, *never* absolute paths.
-    ```
-
- **Solution 1**: this error should be solved in latest version. If you still meet this error, try to remove lightgbm.egg-info folder in your Python-package and reinstall, or check [this thread on stackoverflow](http://stackoverflow.com/questions/18085571/pip-install-error-setup-script-specifies-an-absolute-path).
-
---
-
- **Question 2**: I see error messages like 
-    ```
-    Cannot get/set label/weight/init_score/group/num_data/num_feature before construct dataset
-    ```
-    but I already construct dataset by some code like
-    ```
-    train = lightgbm.Dataset(X_train, y_train)
-    ```
-    or error messages like
-    ```
-    Cannot set predictor/reference/categorical feature after freed raw data, set free_raw_data=False when construct Dataset to avoid this.
-    ```
-
- **Solution 2**: Because LightGBM constructs bin mappers to build trees, and train and valid Datasets within one Booster share the same bin mappers, categorical features and feature names etc., the Dataset objects are constructed when construct a Booster. And if you set `free_raw_data=True` (default), the raw data (with Python data struct) will be freed. So, if you want to:
-
-  + get label(or weight/init_score/group) before construct dataset, it's same as get `self.label`
-  + set label(or weight/init_score/group) before construct dataset, it's same as `self.label=some_label_array`
-  + get num_data(or num_feature) before construct dataset, you can get data with `self.data`, then if your data is `numpy.ndarray`, use some code like `self.data.shape`
-  + set predictor(or reference/categorical feature) after construct dataset, you should set `free_raw_data=False` or init a Dataset object with the same raw data
--- a/docs/FAQ.rst
+++ b/docs/FAQ.rst
+LightGBM FAQ
+============
+
+Contents
+~~~~~~~~
+
+-  `Critical <#critical>`__
+
+-  `LightGBM <#lightgbm>`__
+
+-  `R-package <#r-package>`__
+
+-  `Python-package <#python-package>`__
+
+--------------
+
+Critical
+~~~~~~~~
+
+Have you encountered a critical issue when using LightGBM (crash, prediction error, nonsensical output...)? Who should you contact?
+
+If your issue is not critical, just post an issue in `Microsoft/LightGBM repository <https://github.com/Microsoft/LightGBM/issues>`__.
+
+If it is a critical issue, identify first what error you have:
+
+-  Do you think it is reproducible on CLI (command line interface), R, and/or Python?
+
+-  Is it specific to a wrapper? (R or Python?)
+
+-  Is it specific to the compiler? (gcc versions? MinGW versions?)
+
+-  Is it specific to your Operating System? (Windows? Linux?)
+
+-  Are you able to reproduce this issue with a simple case?
+
+-  Are you able to (not) reproduce this issue after removing all optimization flags and compiling LightGBM in debug mode?
+
+Depending on the answers, while opening your issue, feel free to ping the appropriate people (just mention them with the at (@) symbol) so we can attempt to solve your problem faster:
+
+-  `@guolinke <https://github.com/guolinke>`__ (C++ code / R-package / Python-package)
+-  `@Laurae2 <https://github.com/Laurae2>`__ (R-package)
+-  `@wxchan <https://github.com/wxchan>`__ (Python-package)
+-  `@henry0312 <https://github.com/henry0312>`__ (Python-package)
+-  `@StrikerRUS <https://github.com/StrikerRUS>`__ (Python-package)
+-  `@huanzhang12 <https://github.com/huanzhang12>`__ (GPU support)
+
+Remember that this is free, open community support. We may not be available 24/7 to provide support.
+
+--------------
+
+LightGBM
+~~~~~~~~
+
+-  **Question 1**: Where do I find more details about LightGBM parameters?
+
+-  **Solution 1**: Take a look at `Parameters <./Parameters.rst>`__ and `Laurae++/Parameters <https://sites.google.com/view/lauraepp/parameters>`__ website.
+
+--------------
+
+-  **Question 2**: On datasets with millions of features, training does not start (or starts only after a very long time).
+
+-  **Solution 2**: Use a smaller value for ``bin_construct_sample_cnt`` and a larger value for ``min_data``.
+
+--------------
+
+-  **Question 3**: When running LightGBM on a large dataset, my computer runs out of RAM.
+
+-  **Solution 3**: Multiple solutions: set the ``histogram_pool_size`` parameter to the number of MB you want to use for LightGBM (``histogram_pool_size`` + dataset size = approximate RAM used),
+   lower ``num_leaves`` or lower ``max_bin`` (see `Microsoft/LightGBM#562 <https://github.com/Microsoft/LightGBM/issues/562>`__).
+
+--------------
+
+-  **Question 4**: I am using Windows. Should I use Visual Studio or MinGW for compiling LightGBM?
+
+-  **Solution 4**: It is recommended to `use Visual Studio <https://github.com/Microsoft/LightGBM/issues/542>`__ as its performance is higher for LightGBM.
+
+--------------
+
+-  **Question 5**: When using LightGBM GPU, I cannot reproduce results over several runs.
+
+-  **Solution 5**: This is a normal issue, and there is nothing we/you can do about it;
+   you may try ``gpu_use_dp = true`` for reproducibility (see `Microsoft/LightGBM#560 <https://github.com/Microsoft/LightGBM/pull/560#issuecomment-304561654>`__).
+   You may also use the CPU version.
+
+--------------
+
+-  **Question 6**: Bagging is not reproducible when changing the number of threads.
+
+-  **Solution 6**: As LightGBM bagging is running multithreaded, its output is dependent on the number of threads used.
+   There is `no workaround currently <https://github.com/Microsoft/LightGBM/issues/632>`__.
+
+--------------
+
+-  **Question 7**: I tried to use Random Forest mode, and LightGBM crashes!
+
+-  **Solution 7**: It is by design.
+   You must use ``bagging_fraction`` and ``feature_fraction`` different from 1, along with a ``bagging_freq``.
+   See `this thread <https://github.com/Microsoft/LightGBM/issues/691>`__ as an example.
+
+--------------
+
+-  **Question 8**: CPUs are not kept busy (e.g. only 10% CPU usage) on Windows when using LightGBM on very large datasets with many-core systems.
+
+-  **Solution 8**: Please use `Visual Studio <https://www.visualstudio.com/downloads/>`__
+   as it may be `10x faster than MinGW <https://github.com/Microsoft/LightGBM/issues/749>`__ especially for very large trees.
+
+--------------
+
+R-package
+~~~~~~~~~
+
+-  **Question 1**: Any training command using LightGBM does not work after an error occurred during the training of a previous LightGBM model.
+
+-  **Solution 1**: Run ``lgb.unloader(wipe = TRUE)`` in the R console, and recreate the LightGBM datasets (this will wipe all LightGBM-related variables).
+   Due to the pointers, choosing to not wipe variables will not fix the error.
+   This is a known issue: `Microsoft/LightGBM#698 <https://github.com/Microsoft/LightGBM/issues/698>`__.
+
+--------------
+
+-  **Question 2**: I used ``setinfo``, tried to print my ``lgb.Dataset``, and now the R console froze!
+
+-  **Solution 2**: Avoid printing the ``lgb.Dataset`` after using ``setinfo``.
+   This is a known bug: `Microsoft/LightGBM#539 <https://github.com/Microsoft/LightGBM/issues/539>`__.
+
+--------------
+
+Python-package
+~~~~~~~~~~~~~~
+
+-  **Question 1**: I see error messages like this when installing from GitHub using ``python setup.py install``.
+
+   ::
+
+       error: Error: setup script specifies an absolute path:
+       /Users/Microsoft/LightGBM/python-package/lightgbm/../../lib_lightgbm.so
+       setup() arguments must *always* be /-separated paths relative to the setup.py directory, *never* absolute paths.
+
+-  **Solution 1**: This error should be solved in the latest version.
+   If you still see this error, try removing the ``lightgbm.egg-info`` folder in your Python-package and reinstalling,
+   or check `this thread on Stack Overflow <http://stackoverflow.com/questions/18085571/pip-install-error-setup-script-specifies-an-absolute-path>`__.
+
+--------------
+
+-  **Question 2**: I see error messages like
+
+   ::
+
+       Cannot get/set label/weight/init_score/group/num_data/num_feature before construct dataset
+
+   but I've already constructed a dataset with code like
+
+   ::
+
+       train = lightgbm.Dataset(X_train, y_train)
+
+   or error messages like
+
+   ::
+
+       Cannot set predictor/reference/categorical feature after freed raw data, set free_raw_data=False when construct Dataset to avoid this.
+
+-  **Solution 2**: Because LightGBM constructs bin mappers to build trees, and train and valid Datasets within one Booster share the same bin mappers,
+   categorical features, feature names, etc., the Dataset objects are constructed when a Booster is constructed.
+   If you set ``free_raw_data=True`` (default), the raw data (with Python data struct) will be freed.
+   So, if you want to:
+
+   -  get label (or weight/init\_score/group) before constructing a dataset, it's the same as getting ``self.label``
+
+   -  set label (or weight/init\_score/group) before constructing a dataset, it's the same as ``self.label=some_label_array``
+
+   -  get num\_data (or num\_feature) before constructing a dataset, you can get the data with ``self.data``,
+      then, if your data is a ``numpy.ndarray``, use code like ``self.data.shape``
+
+   -  set predictor (or reference/categorical feature) after constructing a dataset,
+      you should set ``free_raw_data=False`` or init a Dataset object with the same raw data
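Review note: the Python-package answer above hinges on two ideas, lazy construction and freeing the raw data afterwards. The toy class below is a hypothetical stand-in written only to make that interaction concrete; it does not use or reproduce LightGBM's `Dataset`, and every attribute and method body here is an assumption of the sketch.

```python
class ToyDataset:
    """Hypothetical stand-in for a lazily constructed dataset; not LightGBM's class."""

    def __init__(self, data, label, free_raw_data=True):
        self.data = data
        self.label = label  # readable and writable before construction
        self.free_raw_data = free_raw_data
        self.constructed = False

    def construct(self):
        """Build the (pretend) binned representation, then optionally drop raw data."""
        self.num_rows = len(self.data)
        self.constructed = True
        if self.free_raw_data:
            self.data = None  # raw data is gone from here on
        return self

    def set_categorical_feature(self, columns):
        """Changing categorical columns requires re-binning from the raw data."""
        if self.constructed and self.data is None:
            raise RuntimeError("raw data was freed; pass free_raw_data=False")
        self.categorical_feature = columns
        self.constructed = False  # would need to be constructed again


if __name__ == "__main__":
    kept = ToyDataset([[1, 2], [3, 4]], [0, 1], free_raw_data=False).construct()
    kept.set_categorical_feature([0])  # fine: raw data is still available
```

With the default `free_raw_data=True`, the same call after `construct()` raises, which mirrors the second error message quoted in the question.
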
--- a/docs/Features.md
+++ b/docs/Features.md
-# Features
+Features
+========

 This is a short introduction for the features and algorithms used in LightGBM.

 This page doesn't contain detailed algorithms, please refer to cited papers or source code if you are interested.

-## Optimization in Speed and Memory Usage
+Optimization in Speed and Memory Usage
+--------------------------------------

-Many boosting tools use pre-sorted based algorithms[[1, 2]](#references) (e.g. default algorithm in xgboost) for decision tree learning. It is a simple solution, but not easy to optimize.
+Many boosting tools use pre-sorted based algorithms\ `[1, 2] <#references>`__ (e.g. default algorithm in xgboost) for decision tree learning. It is a simple solution, but not easy to optimize.

-LightGBM uses the histogram based algorithms[[3, 4, 5]](#references), which bucketing continuous feature(attribute) values into discrete bins, to speed up training procedure and reduce memory usage. Following are advantages for histogram based algorithms:
+LightGBM uses histogram based algorithms\ `[3, 4, 5] <#references>`__, which bucket continuous feature (attribute) values into discrete bins, to speed up training and reduce memory usage.
+The advantages of histogram based algorithms include the following:

- **Reduce calculation cost of split gain**
-  - Pre-sorted based algorithms need ``O(#data)`` times calculation
-  - Histogram based algorithms only need to calculate ``O(#bins)`` times, and ``#bins`` is far smaller than ``#data``
-    - It still needs ``O(#data)`` times to construct histogram, which only contain sum-up operation
- **Use histogram subtraction for further speed-up**
-  - To get one leaf's histograms in a binary tree, can use the histogram subtraction of its parent and its neighbor
-  - So it only need to construct histograms for one leaf (with smaller ``#data`` than its neighbor), then can get histograms of its neighbor by histogram subtraction with small cost(``O(#bins)``)
- **Reduce memory usage**
-  - Can replace continuous values to discrete bins. If ``#bins`` is small, can use small data type, e.g. uint8_t, to store training data
-  - No need to store additional information for pre-sorting feature values
- **Reduce communication cost for parallel learning**
+-  **Reduce calculation cost of split gain**

-## Sparse Optimization
+   -  Pre-sorted based algorithms need ``O(#data)`` times calculation

- Only need ``O(2 * #non_zero_data)`` to construct histogram for sparse features
+   -  Histogram based algorithms only need to calculate ``O(#bins)`` times, and ``#bins`` is far smaller than ``#data``

-## Optimization in Accuracy
+      -  It still needs ``O(#data)`` time to construct the histogram, which only involves sum-up operations

-### Leaf-wise (Best-first) Tree Growth
+-  **Use histogram subtraction for further speed-up**
+
+   -  To get one leaf's histograms in a binary tree, subtract its neighbor's histograms from its parent's histograms
+
+   -  So histograms only need to be constructed for one leaf (the one with smaller ``#data`` than its neighbor); the neighbor's histograms can then be obtained by histogram subtraction at small cost (``O(#bins)``)
+-  **Reduce memory usage**
+
+   -  Continuous values are replaced with discrete bins. If ``#bins`` is small, a small data type, e.g. uint8\_t, can be used to store training data
+
+   -  No need to store additional information for pre-sorting feature values
+
+-  **Reduce communication cost for parallel learning**
+
+Sparse Optimization
+-------------------
+
+-  Only need ``O(2 * #non_zero_data)`` to construct histogram for sparse features
+
+Optimization in Accuracy
+------------------------
+
+Leaf-wise (Best-first) Tree Growth
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Most decision tree learning algorithms grow tree by level(depth)-wise, like the following image:

-![level_wise](./_static/images/level-wise.png)
+.. image:: ./_static/images/level-wise.png
+   :align: center

-LightGBM grows tree by leaf-wise(best-first)[[6]](#references). It will choose the leaf with max delta loss to grow. When growing same ``#leaf``, leaf-wise algorithm can reduce more loss than level-wise algorithm.
+LightGBM grows trees leaf-wise (best-first)\ `[6] <#references>`__. It chooses the leaf with the max delta loss to grow.
+When growing the same ``#leaf``, the leaf-wise algorithm can reduce more loss than the level-wise algorithm.

-Leaf-wise may cause over-fitting when ``#data`` is small. So, LightGBM can use an additional parameter ``max_depth`` to limit depth of tree and avoid over-fitting (tree still grows by leaf-wise).
+Leaf-wise growth may cause over-fitting when ``#data`` is small.
+So, LightGBM provides an additional parameter ``max_depth`` to limit the depth of the tree and avoid over-fitting (the tree still grows leaf-wise).

-![leaf_wise](./_static/images/leaf-wise.png)
+.. image:: ./_static/images/leaf-wise.png
+   :align: center
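
The difference between the two growth strategies can be illustrated with a toy example. The sketch below is an assumption-laden simplification: the tree, its node names, and the per-node delta-loss values are made up, and level-wise growth is approximated by breadth-first order under a leaf budget.

```python
import heapq
from collections import deque

# Hypothetical toy tree: node -> (delta loss gained by splitting it, children).
# Nodes with no children cannot be split.
TOY_TREE = {
    "root": (10.0, ["L", "R"]),
    "L": (6.0, ["LL", "LR"]),
    "R": (1.0, ["RL", "RR"]),
    "LL": (4.0, ["LLL", "LLR"]),
    "LR": (0.0, []), "RL": (0.0, []), "RR": (0.0, []),
    "LLL": (0.0, []), "LLR": (0.0, []),
}

def grow_leaf_wise(tree, num_leaves):
    """Always split the splittable leaf with the largest delta loss."""
    heap = [(-tree["root"][0], "root")]
    leaves, reduced = 1, 0.0
    while heap and leaves < num_leaves:
        neg_gain, node = heapq.heappop(heap)
        reduced += -neg_gain
        leaves += 1  # one leaf is replaced by two
        for child in tree[node][1]:
            if tree[child][1]:  # only splittable children are candidates
                heapq.heappush(heap, (-tree[child][0], child))
    return reduced

def grow_level_wise(tree, num_leaves):
    """Split nodes in breadth-first (depth) order until the leaf budget is used."""
    queue = deque(["root"])
    leaves, reduced = 1, 0.0
    while queue and leaves < num_leaves:
        node = queue.popleft()
        reduced += tree[node][0]
        leaves += 1
        for child in tree[node][1]:
            if tree[child][1]:
                queue.append(child)
    return reduced

leaf_wise_reduction = grow_leaf_wise(TOY_TREE, num_leaves=4)
level_wise_reduction = grow_level_wise(TOY_TREE, num_leaves=4)
print(leaf_wise_reduction, level_wise_reduction)  # prints 20.0 17.0
```

With the same budget of four leaves, the leaf-wise strategy spends its splits on the deep, high-gain branch, while the level-wise strategy spends one split on a low-gain node at depth one.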

-### Optimal Split for Categorical Features
+Optimal Split for Categorical Features
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-We often convert the categorical features into one-hot coding. However, it is not a good solution in tree learner. The reason is, for the high cardinality categorical features, it will grow the very unbalance tree, and needs to grow very deep to achieve the good accuracy.
+Categorical features are often converted into one-hot coding.
+However, this is not a good solution for a tree learner.
+The reason is that, for high-cardinality categorical features, it grows a very unbalanced tree that needs to be very deep to achieve good accuracy.

-Actually, the optimal solution is partitioning the categorical feature into 2 subsets, and there are ``2^(k-1) - 1`` possible partitions. But there is a efficient solution for regression tree[[7]](#references). It needs about ``k * log(k)`` to find the optimal partition.
+Actually, the optimal solution is to partition the categorical feature into 2 subsets, and there are ``2^(k-1) - 1`` possible partitions.
+But there is an efficient solution for regression trees\ `[7] <#references>`__. It needs about ``k * log(k)`` to find the optimal partition.

-The basic idea is reordering the categories according to the relevance of training target. More specifically, reordering the histogram (of categorical feature) according to it's accumulate values (``sum_gradient / sum_hessian``), then find the best split on the sorted histogram.
+The basic idea is to reorder the categories according to their relevance to the training target.
+More specifically, reorder the histogram (of the categorical feature) according to its accumulated values (``sum_gradient / sum_hessian``), then find the best split on the sorted histogram.
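
The reordering idea can be sketched as follows. This is a hedged illustration rather than LightGBM's code: the function name and the per-category sums are hypothetical, and the gain formula (``G_L^2 / H_L + G_R^2 / H_R - G^2 / H``, without regularization) is an assumption chosen to keep the example small.

```python
def best_categorical_split(sum_gradient, sum_hessian):
    """Find the best two-subset partition of the categories.

    sum_gradient / sum_hessian map each category to its histogram sums.
    Sorting costs O(k log k); the scan over the sorted order costs O(k).
    """
    # Reorder categories by their accumulated value sum_gradient / sum_hessian.
    order = sorted(sum_gradient, key=lambda c: sum_gradient[c] / sum_hessian[c])
    total_g = sum(sum_gradient.values())
    total_h = sum(sum_hessian.values())

    best_gain, best_left = float("-inf"), None
    left_g = left_h = 0.0
    # Scan the k - 1 split points of the sorted histogram.
    for i, category in enumerate(order[:-1]):
        left_g += sum_gradient[category]
        left_h += sum_hessian[category]
        right_g, right_h = total_g - left_g, total_h - left_h
        gain = left_g ** 2 / left_h + right_g ** 2 / right_h - total_g ** 2 / total_h
        if gain > best_gain:
            best_gain, best_left = gain, set(order[: i + 1])
    return best_gain, best_left

# Hypothetical histogram of a categorical feature with k = 4 categories.
sum_gradient = {"a": -4.0, "b": 3.0, "c": -1.0, "d": 5.0}
sum_hessian = {"a": 2.0, "b": 3.0, "c": 2.0, "d": 2.0}
gain, left_categories = best_categorical_split(sum_gradient, sum_hessian)
```

Here only three split points on the sorted order are evaluated instead of all ``2^(4-1) - 1 = 7`` partitions.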

-## Optimization in Network Communication
+Optimization in Network Communication
+-------------------------------------

-It only needs to use some collective communication algorithms, like "All reduce", "All gather" and "Reduce scatter", in parallel learning of LightGBM. LightGBM implement state-of-art algorithms[[8]](#references). These collective communication algorithms can provide much better performance than point-to-point communication.
+Parallel learning in LightGBM only needs some collective communication algorithms, like "All reduce", "All gather" and "Reduce scatter".
+LightGBM implements state-of-the-art algorithms\ `[8] <#references>`__.
+These collective communication algorithms can provide much better performance than point-to-point communication.

-## Optimization in Parallel Learning
+Optimization in Parallel Learning
+---------------------------------

 LightGBM provides following parallel learning algorithms.

-### Feature Parallel
+Feature Parallel
+~~~~~~~~~~~~~~~~

-#### Traditional Algorithm
+Traditional Algorithm
+^^^^^^^^^^^^^^^^^^^^^

 Feature parallel aims to parallel the "Find Best Split" in the decision tree. The procedure of traditional feature parallel is:

 1. Partition data vertically (different machines have different feature set)
+
 2. Workers find local best split point {feature, threshold} on local feature set
+
 3. Communicate local best splits with each other and get the best one
+
 4. Worker with best split to perform split, then send the split result of data to other workers
+
 5. Other workers split data according received data

 The shortage of traditional feature parallel:

- Has computation overhead, since it cannot speed up "split", whose time complexity is ``O(#data)``. Thus, feature parallel cannot speed up well when ``#data`` is large.
- Need communication of split result, which cost about ``O(#data / 8)`` (one bit for one data).
+-  Has computation overhead, since it cannot speed up "split", whose time complexity is ``O(#data)``.
+   Thus, feature parallel cannot speed up well when ``#data`` is large.

-#### Feature Parallel in LightGBM
+-  Needs to communicate the split result, which costs about ``O(#data / 8)`` (one bit per data point).

-Since feature parallel cannot speed up well when ``#data`` is large, we make a little change here: instead of partitioning data vertically, every worker holds the full data. Thus, LightGBM doesn't need to communicate for split result of data since every worker know how to split data. And ``#data`` won't be larger, so it is reasonable to hold full data in every machine.
+Feature Parallel in LightGBM
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Since feature parallel cannot speed up well when ``#data`` is large, we make a small change: instead of partitioning data vertically, every worker holds the full data.
+Thus, LightGBM doesn't need to communicate the split result of data since every worker knows how to split the data.
+Since ``#data`` does not grow, it is reasonable to hold the full data on every machine.

 The procedure of feature parallel in LightGBM:

-1. Workers find local best split point{feature, threshold} on local feature set
+1. Workers find local best split point {feature, threshold} on local feature set
+
 2. Communicate local best splits with each other and get the best one
+
 3. Perform best split

-However, this feature parallel algorithm still suffers from computation overhead for "split" when ``#data`` is large. So it will be better to use data parallel when ``#data`` is large.
+However, this feature parallel algorithm still suffers from computation overhead for "split" when ``#data`` is large.
+So it will be better to use data parallel when ``#data`` is large.

-### Data Parallel
+Data Parallel
+~~~~~~~~~~~~~

-#### Traditional Algorithm
+Traditional Algorithm
+^^^^^^^^^^^^^^^^^^^^^

 Data parallel aims to parallel the whole decision learning. The procedure of data parallel is:

 1. Partition data horizontally
+
 2. Workers use local data to construct local histograms
+
 3. Merge global histograms from all local histograms
+
 4. Find best split from merged global histograms, then perform splits

 The shortage of traditional data parallel:

- High communication cost. If using point-to-point communication algorithm, communication cost for one machine is about ``O(#machine * #feature * #bin)``. If using collective communication algorithm (e.g. "All Reduce"), communication cost is about ``O(2 * #feature * #bin)`` (check cost of "All Reduce" in chapter 4.5 at [[8]](#references)).
+-  High communication cost.
+   With a point-to-point communication algorithm, the communication cost for one machine is about ``O(#machine * #feature * #bin)``.
+   With a collective communication algorithm (e.g. "All Reduce"), the communication cost is about ``O(2 * #feature * #bin)`` (see the cost of "All Reduce" in chapter 4.5 of `[8] <#references>`__).

-#### Data Parallel in LightGBM
+Data Parallel in LightGBM
+^^^^^^^^^^^^^^^^^^^^^^^^^

 We reduce communication cost of data parallel in LightGBM:

-1. Instead of "Merge global histograms from all local histograms", LightGBM use "Reduce Scatter" to merge histograms of different(non-overlapping) features for different workers. Then workers find local best split on local merged histograms and sync up global best split.
-2. As aforementioned, LightGBM use histogram subtraction to speed up training. Based on this, we can communicate histograms only for one leaf, and get its neighbor's histograms by subtraction as well.
+1. Instead of "Merge global histograms from all local histograms", LightGBM uses "Reduce Scatter" to merge histograms of different (non-overlapping) features for different workers.
+   Then workers find the local best split on their locally merged histograms and sync up the global best split.
+
+2. As mentioned above, LightGBM uses histogram subtraction to speed up training.
+   Based on this, we can communicate histograms for only one leaf, and get its neighbor's histograms by subtraction as well.

 Above all, we reduce communication cost to ``O(0.5 * #feature * #bin)`` for data parallel in LightGBM.
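
The "Reduce Scatter" step can be simulated in a single process to show what each worker ends up holding. This is only a sketch under stated assumptions: the function name, the feature-to-worker assignment, and the histogram values are hypothetical, and no real network communication takes place.

```python
def reduce_scatter(local_hists, owner_of_feature):
    """Simulate "Reduce Scatter" over per-worker feature histograms.

    local_hists[w][f] is worker w's local histogram (list of bin sums) for feature f.
    owner_of_feature[f] is the worker that receives the merged histogram of feature f.
    Returns merged[w] = {f: global histogram} for the features owned by worker w only.
    """
    num_workers = len(local_hists)
    merged = [dict() for _ in range(num_workers)]
    for f, owner in enumerate(owner_of_feature):
        num_bins = len(local_hists[0][f])
        merged[owner][f] = [
            sum(local_hists[w][f][b] for w in range(num_workers))
            for b in range(num_bins)
        ]
    return merged

# Two workers, four features, three bins per feature (made-up values).
local_hists = [
    [[1, 2, 3], [0, 1, 0], [2, 2, 2], [5, 0, 1]],  # worker 0
    [[3, 2, 1], [1, 1, 1], [0, 0, 4], [1, 1, 1]],  # worker 1
]
owner_of_feature = [0, 0, 1, 1]  # non-overlapping feature sets per worker

merged = reduce_scatter(local_hists, owner_of_feature)
```

Each worker receives merged histograms for only its own half of the features, so it can search for its local best split there before the global best split is synced.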

-### Voting Parallel
+Voting Parallel
+~~~~~~~~~~~~~~~

-Voting parallel further reduce the communication cost in [Data Parallel](#data-parallel) to constant cost. It uses two stage voting to reduce the communication cost of feature histograms[[9]](#references).
+Voting parallel further reduces the communication cost in `Data Parallel <#data-parallel>`__ to constant cost.
+It uses two-stage voting to reduce the communication cost of feature histograms\ `[9] <#references>`__.

-## GPU Support
+GPU Support
+-----------

-Thanks [@huanzhang12](https://github.com/huanzhang12) for contributing this feature. Please read[[10]](#references) to get more details.
+Thanks to `@huanzhang12 <https://github.com/huanzhang12>`__ for contributing this feature. Please read `[10] <#references>`__ to get more details.

- [GPU Installation](./Installation-Guide.rst)
- [GPU Tutorial](./GPU-Tutorial.md)
+- `GPU Installation <./Installation-Guide.rst#build-gpu-version>`__

-## Applications and Metrics
+- `GPU Tutorial <./GPU-Tutorial.rst>`__
+
+Applications and Metrics
+------------------------

 Support following application:

- regression, the objective function is L2 loss
- binary classification, the objective function is logloss
- multi classification
- lambdarank, the objective function is lambdarank with NDCG
+-  regression, the objective function is L2 loss
+
+-  binary classification, the objective function is logloss
+
+-  multi classification
+
+-  lambdarank, the objective function is lambdarank with NDCG

 Support following metrics:

- L1 loss
- L2 loss
- Log loss
- Classification error rate
- AUC
- NDCG
- Multi class log loss
- Multi class error rate
-
-For more details, please refer to [Parameters](./Parameters.md).
-
-## Other Features
-
- Limit ``max_depth`` of tree while grows tree leaf-wise
- [DART](https://arxiv.org/abs/1505.01866)
- L1/L2 regularization
- Bagging
- Column(feature) sub-sample
- Continued train with input GBDT model
- Continued train with the input score file
- Weighted training
- Validation metric output during training
- Multi validation data
- Multi metrics
- Early stopping (both training and prediction)
- Prediction for leaf index
-
-For more details, please refer to [Parameters](./Parameters.md).
-
-## References
+-  L1 loss
+
+-  L2 loss
+
+-  Log loss
+
+-  Classification error rate
+
+-  AUC
+
+-  NDCG
+
+-  Multi class log loss
+
+-  Multi class error rate
+
+For more details, please refer to `Parameters <./Parameters.rst#metric-parameters>`__.
+
+Other Features
+--------------
+
+-  Limit ``max_depth`` of tree while growing the tree leaf-wise
+
+-  `DART <https://arxiv.org/abs/1505.01866>`__
+
+-  L1/L2 regularization
+
+-  Bagging
+
+-  Column(feature) sub-sample
+
+-  Continued training with an input GBDT model
+
+-  Continued training with an input score file
+
+-  Weighted training
+
+-  Validation metric output during training
+
+-  Multi validation data
+
+-  Multi metrics
+
+-  Early stopping (both training and prediction)
+
+-  Prediction for leaf index
+
+For more details, please refer to `Parameters <./Parameters.rst>`__.
+
+References
+----------

 [1] Mehta, Manish, Rakesh Agrawal, and Jorma Rissanen. "SLIQ: A fast scalable classifier for data mining." International Conference on Extending Database Technology. Springer Berlin Heidelberg, 1996.

@@ -174,10 +256,18 @@ For more details, please refer to [Parameters](./Parameters.md).

 [6] Shi, Haijian. "Best-first decision tree learning." Diss. The University of Waikato, 2007.

-[7] Walter D. Fisher. "[On Grouping for Maximum Homogeneity](http://amstat.tandfonline.com/doi/abs/10.1080/01621459.1958.10501479)." Journal of the American Statistical Association. Vol. 53, No. 284 (Dec., 1958), pp. 789-798.
+[7] Walter D. Fisher. "`On Grouping for Maximum Homogeneity`_." Journal of the American Statistical Association. Vol. 53, No. 284 (Dec., 1958), pp. 789-798.
+
+[8] Thakur, Rajeev, Rolf Rabenseifner, and William Gropp. "`Optimization of collective communication operations in MPICH`_." International Journal of High Performance Computing Applications 19.1 (2005): 49-66.
+
+[9] Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tieyan Liu. "`A Communication-Efficient Parallel Algorithm for Decision Tree`_." Advances in Neural Information Processing Systems 29 (NIPS 2016).
+
+[10] Huan Zhang, Si Si and Cho-Jui Hsieh. "`GPU Acceleration for Large-scale Tree Boosting`_." arXiv:1706.08359, 2017.
+
+.. _On Grouping for Maximum Homogeneity: http://amstat.tandfonline.com/doi/abs/10.1080/01621459.1958.10501479

-[8] Thakur, Rajeev, Rolf Rabenseifner, and William Gropp. "[Optimization of collective communication operations in MPICH](http://wwwi10.lrr.in.tum.de/~gerndt/home/Teaching/HPCSeminar/mpich_multi_coll.pdf)." International Journal of High Performance Computing Applications 19.1 (2005): 49-66.
+.. _Optimization of collective communication operations in MPICH: http://wwwi10.lrr.in.tum.de/~gerndt/home/Teaching/HPCSeminar/mpich_multi_coll.pdf

-[9] Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tieyan Liu. "[A Communication-Efficient Parallel Algorithm for Decision Tree](http://papers.nips.cc/paper/6381-a-communication-efficient-parallel-algorithm-for-decision-tree)." Advances in Neural Information Processing Systems 29 (NIPS 2016).
+.. _A Communication-Efficient Parallel Algorithm for Decision Tree: http://papers.nips.cc/paper/6381-a-communication-efficient-parallel-algorithm-for-decision-tree

-[10] Huan Zhang, Si Si and Cho-Jui Hsieh. "[GPU Acceleration for Large-scale Tree Boosting](https://arxiv.org/abs/1706.08359)." arXiv:1706.08359, 2017.
+.. _GPU Acceleration for Large-scale Tree Boosting: https://arxiv.org/abs/1706.08359
--- a/docs/GPU-Performance.rst
+++ b/docs/GPU-Performance.rst
@@ -161,7 +161,8 @@ For most datasets, using 63 bins is sufficient.

 We record the wall clock time after 500 iterations, as shown in the figure below:

-|Performance Comparison|
+.. image:: ./_static/images/gpu-performance-comparison.png
+   :align: center

 When using a GPU, it is advisable to use a bin size of 63 rather than 255, because it can speed up training significantly without noticeably affecting accuracy.
 On CPU, using a smaller bin size only marginally improves performance, sometimes even slows down training,
@@ -206,6 +207,4 @@ Huan Zhang, Si Si and Cho-Jui Hsieh. `GPU Acceleration for Large-scale Tree Boos

 .. _0bb4a82: https://github.com/Microsoft/LightGBM/commit/0bb4a82

-.. |Performance Comparison| image:: ./_static/images/gpu-performance-comparison.png
-
 .. _GPU Acceleration for Large-scale Tree Boosting: https://arxiv.org/abs/1706.08359
--- a/docs/GPU-Targets.rst
+++ b/docs/GPU-Targets.rst
 GPU Targets Table
 =================

-When using OpenCL SDKs, targeting CPU and GPU at the same time is
-sometimes possible. This is especially true for Intel OpenCL SDK and AMD
-APP SDK.
+When using OpenCL SDKs, targeting CPU and GPU at the same time is sometimes possible.
+This is especially true for Intel OpenCL SDK and AMD APP SDK.

 You can find below a table of correspondence:

@@ -22,8 +21,7 @@ Legend:
 -  \* Not usable directly.
 -  \*\* Reported as unsupported in public forums.

-AMD GPUs using Intel SDK for OpenCL is not a typo, nor AMD APP SDK
-compatibility with CPUs.
+AMD GPUs using Intel SDK for OpenCL is not a typo, and neither is AMD APP SDK compatibility with CPUs.

 --------------

@@ -36,8 +34,7 @@ We present the following scenarii:
 -  Single CPU and GPU (even with integrated graphics)
 -  Multiple CPU/GPU

-We provide test R code below, but you can use the language of your
-choice with the examples of your choices:
+We provide test R code below, but you can use the language of your choice with the examples of your choice:

 .. code:: r

@@ -73,15 +70,13 @@ Using a bad ``gpu_device_id`` is not critical, as it will fallback to:
 -  ``gpu_device_id = 0`` if using ``gpu_platform_id = 0``
 -  ``gpu_device_id = 1`` if using ``gpu_platform_id = 1``

-However, using a bad combination of ``gpu_platform_id`` and
-``gpu_device_id`` will lead to a **crash** (you will lose your entire
-session content). Beware of it.
+However, using a bad combination of ``gpu_platform_id`` and ``gpu_device_id`` will lead to a **crash** (you will lose your entire session content).
+Beware of it.

 CPU Only Architectures
 ----------------------

-When you have a single device (one CPU), OpenCL usage is
-straightforward: ``gpu_platform_id = 0``, ``gpu_device_id = 0``
+When you have a single device (one CPU), OpenCL usage is straightforward: ``gpu_platform_id = 0``, ``gpu_device_id = 0``

 This will use the CPU with OpenCL, even though it says it says GPU.

@@ -124,18 +119,15 @@ Example:
 Single CPU and GPU (even with integrated graphics)
 --------------------------------------------------

-If you have integrated graphics card (Intel HD Graphics) and a dedicated
-graphics card (AMD, NVIDIA), the dedicated graphics card will
-automatically override the integrated graphics card. The workaround is
-to disable your dedicated graphics card to be able to use your
-integrated graphics card.
+If you have integrated graphics card (Intel HD Graphics) and a dedicated graphics card (AMD, NVIDIA),
+the dedicated graphics card will automatically override the integrated graphics card.
+The workaround is to disable your dedicated graphics card to be able to use your integrated graphics card.

-When you have multiple devices (one CPU and one GPU), the order is
-usually the following:
+When you have multiple devices (one CPU and one GPU), the order is usually the following:
+
+-  GPU: ``gpu_platform_id = 0``, ``gpu_device_id = 0``,
+   sometimes it is usable using ``gpu_platform_id = 1``, ``gpu_device_id = 1`` but at your own risk!

-  GPU: ``gpu_platform_id = 0``, ``gpu_device_id = 0``, sometimes it is
-   usable using ``gpu_platform_id = 1``, ``gpu_device_id = 1`` but at
-   your own risk!
 -  CPU: ``gpu_platform_id = 0``, ``gpu_device_id = 1``

 Example of GPU (``gpu_platform_id = 0``, ``gpu_device_id = 0``):
@@ -209,8 +201,7 @@ Example of CPU (``gpu_platform_id = 0``, ``gpu_device_id = 1``):
    [LightGBM] [Info] Trained a tree with leaves=7 and max_depth=5
    [2]:    test's rmse:0

-When using a wrong ``gpu_device_id``, it will automatically fallback to
-``gpu_device_id = 0``:
+When using a wrong ``gpu_device_id``, it will automatically fall back to ``gpu_device_id = 0``:

 .. code:: r

@@ -245,8 +236,7 @@ When using a wrong ``gpu_device_id``, it will automatically fallback to
    [LightGBM] [Info] Trained a tree with leaves=7 and max_depth=5
    [2]:    test's rmse:0

-Do not ever run under the following scenario as it is known to crash
-even if it says it is using the CPU because it is NOT the case:
+Do not ever run under the following scenario as it is known to crash even if it says it is using the CPU because it is NOT the case:

 -  One CPU and one GPU
 -  ``gpu_platform_id = 1``, ``gpu_device_id = 0``
@@ -284,13 +274,12 @@ even if it says it is using the CPU because it is NOT the case:
 Multiple CPU and GPU
 --------------------

-If you have multiple devices (multiple CPUs and multiple GPUs), you will
-have to test different ``gpu_device_id`` and different
-``gpu_platform_id`` values to find out the values which suits the
-CPU/GPU you want to use. Keep in mind that using the integrated graphics
-card is not directly possible without disabling every dedicated graphics
-card.
+If you have multiple devices (multiple CPUs and multiple GPUs),
+you will have to test different ``gpu_device_id`` and different ``gpu_platform_id`` values to find the values that suit the CPU/GPU you want to use.
+Keep in mind that using the integrated graphics card is not directly possible without disabling every dedicated graphics card.

 .. _Intel SDK for OpenCL: https://software.intel.com/en-us/articles/opencl-drivers
+
 .. _AMD APP SDK: http://developer.amd.com/amd-accelerated-parallel-processing-app-sdk/
+
 .. _NVIDIA CUDA Toolkit: https://developer.nvidia.com/cuda-downloads
--- a/docs/GPU-Tutorial.md
+++ b/docs/GPU-Tutorial.md
-LightGBM GPU Tutorial
-=====================
-
-The purpose of this document is to give you a quick step-by-step tutorial on GPU training.
-
-For Windows, please see [GPU Windows Tutorial](./GPU-Windows.md).
-
-We will use the GPU instance on [Microsoft Azure cloud computing platform](https://azure.microsoft.com/) for demonstration, but you can use any machine with modern AMD or NVIDIA GPUs.
-
-GPU Setup
---------
-
-You need to launch a `NV` type instance on Azure (available in East US, North Central US, South Central US, West Europe and Southeast Asia zones) and select Ubuntu 16.04 LTS as the operating system.
-
-For testing, the smallest `NV6` type virtual machine is sufficient, which includes 1/2 M60 GPU, with 8 GB memory, 180 GB/s memory bandwidth and 4,825 GFLOPS peak computation power. Don't use the `NC` type instance as the GPUs (K80) are based on an older architecture (Kepler).
-
-First we need to install minimal NVIDIA drivers and OpenCL development environment:
-
-```
-sudo apt-get update
-sudo apt-get install --no-install-recommends nvidia-375
-sudo apt-get install --no-install-recommends nvidia-opencl-icd-375 nvidia-opencl-dev opencl-headers
-```
-
-After installing the drivers you need to restart the server.
-
-```
-sudo init 6
-```
-
-After about 30 seconds, the server should be up again.
-
-If you are using a AMD GPU, you should download and install the [AMDGPU-Pro](http://support.amd.com/en-us/download/linux) driver and also install package `ocl-icd-libopencl1` and `ocl-icd-opencl-dev`.
-
-Build LightGBM
--------------
-
-Now install necessary building tools and dependencies:
-
-```
-sudo apt-get install --no-install-recommends git cmake build-essential libboost-dev libboost-system-dev libboost-filesystem-dev
-```
-
-The NV6 GPU instance has a 320 GB ultra-fast SSD mounted at /mnt. Let's use it as our workspace (skip this if you are using your own machine):
-
-```
-sudo mkdir -p /mnt/workspace
-sudo chown $(whoami):$(whoami) /mnt/workspace
-cd /mnt/workspace
-```
-
-Now we are ready to checkout LightGBM and compile it with GPU support:
-
-```
-git clone --recursive https://github.com/Microsoft/LightGBM
-cd LightGBM
-mkdir build ; cd build
-cmake -DUSE_GPU=1 .. 
-make -j$(nproc)
-cd ..
-```
-
-You will see two binaries are generated, `lightgbm` and `lib_lightgbm.so`.
-
-If you are building on OSX, you probably need to remove macro `BOOST_COMPUTE_USE_OFFLINE_CACHE` in `src/treelearner/gpu_tree_learner.h` to avoid a known crash bug in Boost.Compute.
-
-Install Python Interface (optional)
-----------------------------------
-
-If you want to use the Python interface of LightGBM, you can install it now (along with some necessary Python-package dependencies):
-
-```
-sudo apt-get -y install python-pip
-sudo -H pip install setuptools numpy scipy scikit-learn -U
-cd python-package/
-sudo python setup.py install
-cd ..
-```
-
-You need to set an additional parameter `"device" : "gpu"` (along with your other options like `learning_rate`, `num_leaves`, etc) to use GPU in Python.
-
-You can read our [Python Guide](https://github.com/Microsoft/LightGBM/tree/master/examples/python-guide) for more information on how to use the Python interface.
-
-Dataset Preparation
-------------------
-
-Using the following commands to prepare the Higgs dataset:
-
-```
-git clone https://github.com/guolinke/boosting_tree_benchmarks.git
-cd boosting_tree_benchmarks/data
-wget "https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz"
-gunzip HIGGS.csv.gz
-python higgs2libsvm.py
-cd ../..
-ln -s boosting_tree_benchmarks/data/higgs.train
-ln -s boosting_tree_benchmarks/data/higgs.test
-```
-
-Now we create a configuration file for LightGBM by running the following commands (please copy the entire block and run it as a whole):
-
-```
-cat > lightgbm_gpu.conf <<EOF
-max_bin = 63
-num_leaves = 255
-num_iterations = 50
-learning_rate = 0.1
-tree_learner = serial
-task = train
-is_training_metric = false
-min_data_in_leaf = 1
-min_sum_hessian_in_leaf = 100
-ndcg_eval_at = 1,3,5,10
-sparse_threshold = 1.0
-device = gpu
-gpu_platform_id = 0
-gpu_device_id = 0
-EOF
-echo "num_threads=$(nproc)" >> lightgbm_gpu.conf
-```
-
-GPU is enabled in the configuration file we just created by setting `device=gpu`. It will use the first GPU installed on the system by default (`gpu_platform_id=0` and `gpu_device_id=0`).
-
-Run Your First Learning Task on GPU
-----------------------------------
-
-Now we are ready to start GPU training! First we want to verify the GPU works correctly. Run the following command to train on GPU, and take a note of the AUC after 50 iterations:
-
-```
-./lightgbm config=lightgbm_gpu.conf data=higgs.train valid=higgs.test objective=binary metric=auc
-```
-
-Now train the same dataset on CPU using the following command. You should observe a similar AUC:
-
-```
-./lightgbm config=lightgbm_gpu.conf data=higgs.train valid=higgs.test objective=binary metric=auc device=cpu
-```
-
-Now we can make a speed test on GPU without calculating AUC after each iteration.
-
-```
-./lightgbm config=lightgbm_gpu.conf data=higgs.train objective=binary metric=auc
-```
-
-Speed test on CPU:
-
-```
-./lightgbm config=lightgbm_gpu.conf data=higgs.train objective=binary metric=auc device=cpu
-```
-
-You should observe over three times speedup on this GPU.
-
-The GPU acceleration can be used on other tasks/metrics (regression, multi-class classification, ranking, etc) as well. For example, we can train the Higgs dataset on GPU as a regression task:
-
-```
-./lightgbm config=lightgbm_gpu.conf data=higgs.train objective=regression_l2 metric=l2
-```
-
-Also, you can compare the training speed with CPU:
-
-```
-./lightgbm config=lightgbm_gpu.conf data=higgs.train objective=regression_l2 metric=l2 device=cpu
-```
-
-Further Reading
---------------
-
-[GPU Tuning Guide and Performance Comparison](./GPU-Performance.rst)
-
-[GPU SDK Correspondence and Device Targeting Table](./GPU-Targets.rst)
-
-[GPU Windows Tutorial](./GPU-Windows.md)
-
-Reference
---------
-
-Please kindly cite the following article in your publications if you find the GPU acceleration useful:
-
-Huan Zhang, Si Si and Cho-Jui Hsieh. [GPU Acceleration for Large-scale Tree Boosting](https://arxiv.org/abs/1706.08359). arXiv:1706.08359, 2017.
--- a/docs/GPU-Tutorial.rst
+++ b/docs/GPU-Tutorial.rst
+LightGBM GPU Tutorial
+=====================
+
+The purpose of this document is to give you a quick step-by-step tutorial on GPU training.
+
+For Windows, please see `GPU Windows Tutorial <./GPU-Windows.rst>`__.
+
+We will use the GPU instance on `Microsoft Azure cloud computing platform`_ for demonstration,
+but you can use any machine with modern AMD or NVIDIA GPUs.
+
+GPU Setup
+---------
+
+You need to launch a ``NV`` type instance on Azure (available in East US, North Central US, South Central US, West Europe and Southeast Asia zones)
+and select Ubuntu 16.04 LTS as the operating system.
+
+For testing, the smallest ``NV6`` type virtual machine is sufficient, which includes 1/2 M60 GPU, with 8 GB memory, 180 GB/s memory bandwidth and 4,825 GFLOPS peak computation power.
+Don't use the ``NC`` type instance as the GPUs (K80) are based on an older architecture (Kepler).
+
+First we need to install minimal NVIDIA drivers and OpenCL development environment:
+
+::
+
+    sudo apt-get update
+    sudo apt-get install --no-install-recommends nvidia-375
+    sudo apt-get install --no-install-recommends nvidia-opencl-icd-375 nvidia-opencl-dev opencl-headers
+
+After installing the drivers you need to restart the server.
+
+::
+
+    sudo init 6
+
+After about 30 seconds, the server should be up again.
+
+If you are using an AMD GPU, you should download and install the `AMDGPU-Pro`_ driver and also install the packages ``ocl-icd-libopencl1`` and ``ocl-icd-opencl-dev``.
+
+Build LightGBM
+--------------
+
+Now install necessary building tools and dependencies:
+
+::
+
+    sudo apt-get install --no-install-recommends git cmake build-essential libboost-dev libboost-system-dev libboost-filesystem-dev
+
+The ``NV6`` GPU instance has a 320 GB ultra-fast SSD mounted at ``/mnt``.
+Let's use it as our workspace (skip this if you are using your own machine):
+
+::
+
+    sudo mkdir -p /mnt/workspace
+    sudo chown $(whoami):$(whoami) /mnt/workspace
+    cd /mnt/workspace
+
+Now we are ready to checkout LightGBM and compile it with GPU support:
+
+::
+
+    git clone --recursive https://github.com/Microsoft/LightGBM
+    cd LightGBM
+    mkdir build ; cd build
+    cmake -DUSE_GPU=1 .. 
+    make -j$(nproc)
+    cd ..
+
+You will see two binaries are generated, ``lightgbm`` and ``lib_lightgbm.so``.
+
+If you are building on OSX, you probably need to remove macro ``BOOST_COMPUTE_USE_OFFLINE_CACHE`` in ``src/treelearner/gpu_tree_learner.h`` to avoid a known crash bug in Boost.Compute.
+
+Install Python Interface (optional)
+-----------------------------------
+
+If you want to use the Python interface of LightGBM, you can install it now (along with some necessary Python-package dependencies):
+
+::
+
+    sudo apt-get -y install python-pip
+    sudo -H pip install setuptools numpy scipy scikit-learn -U
+    cd python-package/
+    sudo python setup.py install --precompile
+    cd ..
+
+You need to set an additional parameter ``"device" : "gpu"`` (along with your other options like ``learning_rate``, ``num_leaves``, etc) to use GPU in Python.
+
+You can read our `Python Package Examples`_ for more information on how to use the Python interface.
+
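As a rough sketch of what this looks like in Python, the parameter dictionary below sets ``"device": "gpu"`` alongside values taken from the configuration file used in this tutorial. The training call is left commented out because it assumes a GPU-enabled ``lightgbm`` build and your own data; ``X_train`` and ``y_train`` are hypothetical names.

```python
# Parameters for GPU training; values follow the tutorial's configuration file.
params = {
    "objective": "binary",
    "metric": "auc",
    "device": "gpu",        # enables GPU training
    "gpu_platform_id": 0,   # first OpenCL platform
    "gpu_device_id": 0,     # first device on that platform
    "max_bin": 63,
    "num_leaves": 255,
    "learning_rate": 0.1,
    "min_data_in_leaf": 1,
    "min_sum_hessian_in_leaf": 100,
}

# Assumption: lightgbm was built with GPU support and X_train / y_train exist.
# import lightgbm as lgb
# train_set = lgb.Dataset(X_train, label=y_train)
# booster = lgb.train(params, train_set, num_boost_round=50)
```

Switching ``"device"`` to ``"cpu"`` in the same dictionary gives a CPU run with otherwise identical settings for comparison.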
+Dataset Preparation
+-------------------
+
+Use the following commands to prepare the Higgs dataset:
+
+::
+
+    git clone https://github.com/guolinke/boosting_tree_benchmarks.git
+    cd boosting_tree_benchmarks/data
+    wget "https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz"
+    gunzip HIGGS.csv.gz
+    python higgs2libsvm.py
+    cd ../..
+    ln -s boosting_tree_benchmarks/data/higgs.train
+    ln -s boosting_tree_benchmarks/data/higgs.test
+
+Now we create a configuration file for LightGBM by running the following commands (please copy the entire block and run it as a whole):
+
+::
+
+    cat > lightgbm_gpu.conf <<EOF
+    max_bin = 63
+    num_leaves = 255
+    num_iterations = 50
+    learning_rate = 0.1
+    tree_learner = serial
+    task = train
+    is_training_metric = false
+    min_data_in_leaf = 1
+    min_sum_hessian_in_leaf = 100
+    ndcg_eval_at = 1,3,5,10
+    sparse_threshold = 1.0
+    device = gpu
+    gpu_platform_id = 0
+    gpu_device_id = 0
+    EOF
+    echo "num_threads=$(nproc)" >> lightgbm_gpu.conf
+
+GPU is enabled in the configuration file we just created by setting ``device=gpu``.
+It will use the first GPU installed on the system by default (``gpu_platform_id=0`` and ``gpu_device_id=0``).
+
+Run Your First Learning Task on GPU
+-----------------------------------
+
+Now we are ready to start GPU training!
+
+First we want to verify the GPU works correctly.
+Run the following command to train on GPU, and take a note of the AUC after 50 iterations:
+
+::
+
+    ./lightgbm config=lightgbm_gpu.conf data=higgs.train valid=higgs.test objective=binary metric=auc
+
+Now train the same dataset on CPU using the following command. You should observe a similar AUC:
+
+::
+
+    ./lightgbm config=lightgbm_gpu.conf data=higgs.train valid=higgs.test objective=binary metric=auc device=cpu
+
+Now we can run a speed test on GPU without calculating AUC after each iteration.
+
+::
+
+    ./lightgbm config=lightgbm_gpu.conf data=higgs.train objective=binary metric=auc
+
+Speed test on CPU:
+
+::
+
+    ./lightgbm config=lightgbm_gpu.conf data=higgs.train objective=binary metric=auc device=cpu
+
+You should observe a speedup of more than three times on this GPU.
+
+GPU acceleration can be used on other tasks/metrics (regression, multi-class classification, ranking, etc.) as well.
+For example, we can train the Higgs dataset on GPU as a regression task:
+
+::
+
+    ./lightgbm config=lightgbm_gpu.conf data=higgs.train objective=regression_l2 metric=l2
+
+Also, you can compare the training speed with CPU:
+
+::
+
+    ./lightgbm config=lightgbm_gpu.conf data=higgs.train objective=regression_l2 metric=l2 device=cpu
+
+Further Reading
+---------------
+
+- `GPU Tuning Guide and Performance Comparison <./GPU-Performance.rst>`__
+
+- `GPU SDK Correspondence and Device Targeting Table <./GPU-Targets.rst>`__
+
+- `GPU Windows Tutorial <./GPU-Windows.rst>`__
+
+Reference
+---------
+
+Please kindly cite the following article in your publications if you find the GPU acceleration useful:
+
+Huan Zhang, Si Si and Cho-Jui Hsieh. "`GPU Acceleration for Large-scale Tree Boosting`_." arXiv:1706.08359, 2017.
+
+.. _Microsoft Azure cloud computing platform: https://azure.microsoft.com/
+
+.. _AMDGPU-Pro: http://support.amd.com/en-us/download/linux
+
+.. _Python Package Examples: https://github.com/Microsoft/LightGBM/tree/master/examples/python-guide
+
+.. _GPU Acceleration for Large-scale Tree Boosting: https://arxiv.org/abs/1706.08359
--- a/docs/GPU-Windows.md
+++ b/docs/GPU-Windows.md
--- a/docs/GPU-Windows.rst
+++ b/docs/GPU-Windows.rst
--- a/docs/Installation-Guide.rst
+++ b/docs/Installation-Guide.rst
@@ -31,7 +31,7 @@ The exe file will be in ``LightGBM-master/windows/x64/Release`` folder.
 From Command Line
 *****************

-1. Install `Git for Windows`_, `CMake`_ (3.8 or higher) and `MSBuild`_ (MSbuild is not needed if **Visual Studio** is installed).
+1. Install `Git for Windows`_, `CMake`_ (3.8 or higher) and `MSBuild`_ (**MSBuild** is not needed if **Visual Studio** is installed).

 2. Run the following commands:

@@ -66,10 +66,12 @@ The exe and dll files will be in ``LightGBM/`` folder.

 **Note**: you may need to run the ``cmake -G "MinGW Makefiles" ..`` one more time if met ``sh.exe was found in your PATH`` error.

+You may also want to read `gcc Tips <./gcc-Tips.rst>`__.
+
 Linux
 ~~~~~

-LightGBM uses ``CMake`` to build. Run the following commands:
+LightGBM uses **CMake** to build. Run the following commands:

 .. code::

@@ -80,6 +82,8 @@ LightGBM uses ``CMake`` to build. Run the following commands:

 **Note**: glibc >= 2.14 is required.

+You may also want to read `gcc Tips <./gcc-Tips.rst>`__.
+
 OSX
 ~~~

@@ -102,6 +106,8 @@ Then install LightGBM:
  cmake ..
  make -j4

+You may also want to read `gcc Tips <./gcc-Tips.rst>`__.
+
 Docker
 ~~~~~~

@@ -129,7 +135,7 @@ With GUI

 4. Go to ``LightGBM-master/windows`` folder.

-4. Open ``LightGBM.sln`` file with Visual Studio, choose ``Release_mpi`` configuration and click ``BUILD-> Build Solution (Ctrl+Shift+B)``.
+5. Open ``LightGBM.sln`` file with Visual Studio, choose ``Release_mpi`` configuration and click ``BUILD-> Build Solution (Ctrl+Shift+B)``.

   If you have errors about **Platform Toolset**, go to ``PROJECT-> Properties-> Configuration Properties-> General`` and select the toolset installed on your machine.

@@ -140,7 +146,7 @@ From Command Line

 1. You need to install `MS MPI`_ first. Both ``msmpisdk.msi`` and ``MSMpiSetup.exe`` are needed.

-2. Install `Git for Windows`_, `CMake`_ (3.8 or higher) and `MSBuild`_ (MSbuild is not needed if **Visual Studio** is installed).
+2. Install `Git for Windows`_, `CMake`_ (3.8 or higher) and `MSBuild`_ (MSBuild is not needed if **Visual Studio** is installed).

 3. Run the following commands:

@@ -226,11 +232,11 @@ To build LightGBM GPU version, run the following commands:
 Windows
 ^^^^^^^

-If you use **MinGW**, the build procedure are similar to the build in Linux. Refer to `GPU Windows Compilation <./GPU-Windows.md>`__ to get more details.
+If you use **MinGW**, the build procedure is similar to the build on Linux. Refer to `GPU Windows Compilation <./GPU-Windows.rst>`__ for more details.

-Following procedure is for the MSVC(Microsoft Visual C++) build.
+The following procedure is for the MSVC (Microsoft Visual C++) build.

-1. Install `Git for Windows`_, `CMake`_ (3.8 or higher) and `MSBuild`_ (MSbuild is not needed if **Visual Studio** is installed).
+1. Install `Git for Windows`_, `CMake`_ (3.8 or higher) and `MSBuild`_ (MSBuild is not needed if **Visual Studio** is installed).

 2. Install **OpenCL** for Windows. The installation depends on the brand (NVIDIA, AMD, Intel) of your GPU card.


--- a/docs/Parallel-Learning-Guide.rst
+++ b/docs/Parallel-Learning-Guide.rst
@@ -3,7 +3,7 @@ Parallel Learning Guide

 This is a guide for parallel learning of LightGBM.

-Follow the `Quick Start`_ to know how to use LightGBM first.
+Follow the `Quick Start <./Quick-Start.rst>`__ to know how to use LightGBM first.

 Choose Appropriate Parallel Algorithm
 -------------------------------------
@@ -30,14 +30,14 @@ These algorithms are suited for different scenarios, which is listed in the foll
 | **#feature is large**   | Feature Parallel     | Voting Parallel      |
 +-------------------------+----------------------+----------------------+

-More details about these parallel algorithms can be found in `optimization in parallel learning`_.
+More details about these parallel algorithms can be found in `optimization in parallel learning <./Features.rst#optimization-in-parallel-learning>`__.

 Build Parallel Version
 ----------------------

 Default build version support parallel learning based on the socket.

-If you need to build parallel version with MPI support, please refer to `Installation Guide`_.
+If you need to build parallel version with MPI support, please refer to `Installation Guide <./Installation-Guide.rst#build-mpi-version>`__.

 Preparation
 -----------
@@ -64,7 +64,7 @@ Then write these IP in one file (assume ``mlist.txt``) like following:
    machine1_ip
    machine2_ip

-Note: For Windows users, need to start "smpd" to start MPI service. More details can be found `here`_.
+**Note**: Windows users need to start "smpd" to start the MPI service. More details can be found `here`_.

 Run Parallel Learning
 ---------------------
@@ -74,49 +74,53 @@ Socket Version

 1. Edit following parameters in config file:

-``tree_learner=your_parallel_algorithm``, edit ``your_parallel_algorithm`` (e.g. feature/data) here.
+   ``tree_learner=your_parallel_algorithm``, edit ``your_parallel_algorithm`` (e.g. feature/data) here.

-``num_machines=your_num_machines``, edit ``your_num_machines`` (e.g. 4) here.
+   ``num_machines=your_num_machines``, edit ``your_num_machines`` (e.g. 4) here.

-``machine_list_file=mlist.txt``, ``mlist.txt`` is created in `Preparation section <#preparation>`__.
+   ``machine_list_file=mlist.txt``, ``mlist.txt`` is created in `Preparation section <#preparation>`__.

-``local_listen_port=12345``, ``12345`` is allocated in `Preparation section <#preparation>`__.
+   ``local_listen_port=12345``, ``12345`` is allocated in `Preparation section <#preparation>`__.

 2. Copy data file, executable file, config file and ``mlist.txt`` to all machines.

 3. Run following command on all machines, you need to change ``your_config_file`` to real config file.

-For Windows: ``lightgbm.exe config=your_config_file``
+   For Windows: ``lightgbm.exe config=your_config_file``

-For Linux: ``./lightgbm config=your_config_file``
+   For Linux: ``./lightgbm config=your_config_file``

 MPI Version
 ^^^^^^^^^^^

 1. Edit following parameters in config file:

-``tree_learner=your_parallel_algorithm``, edit ``your_parallel_algorithm`` (e.g. feature/data) here.
+   ``tree_learner=your_parallel_algorithm``, edit ``your_parallel_algorithm`` (e.g. feature/data) here.

-``num_machines=your_num_machines``, edit ``your_num_machines`` (e.g. 4) here.
+   ``num_machines=your_num_machines``, edit ``your_num_machines`` (e.g. 4) here.

-2. Copy data file, executable file, config file and ``mlist.txt`` to all machines. Note: MPI needs to be run in the **same path on all machines**.
+2. Copy data file, executable file, config file and ``mlist.txt`` to all machines.
+
+   **Note**: MPI needs to be run in the **same path on all machines**.

 3. Run following command on one machine (not need to run on all machines), need to change ``your_config_file`` to real config file.

-For Windows: ``mpiexec.exe /machinefile mlist.txt lightgbm.exe config=your_config_file``
+   For Windows:
+   
+   .. code::

-For Linux: ``mpiexec --machinefile mlist.txt ./lightgbm config=your_config_file``
+       mpiexec.exe /machinefile mlist.txt lightgbm.exe config=your_config_file

-Example
-^^^^^^^
+   For Linux:

-  `A simple parallel example`_.
+   .. code::

-.. _Quick Start: ./Quick-Start.md
+       mpiexec --machinefile mlist.txt ./lightgbm config=your_config_file

-.. _optimization in parallel learning: ./Features.md
+Example
+^^^^^^^

-.. _Installation Guide: ./Installation-Guide.rst
+-  `A simple parallel example`_

 .. _here: https://blogs.technet.microsoft.com/windowshpc/2015/02/02/how-to-compile-and-run-a-simple-ms-mpi-program/


--- a/docs/Parameters-Tuning.rst
+++ b/docs/Parameters-Tuning.rst
+Parameters Tuning
+=================
+
+This page contains guidance on tuning LightGBM parameters.
+
+**List of other helpful links**
+
+-  `Parameters <./Parameters.rst>`__
+-  `Python API <./Python-API.rst>`__
+
+Tune Parameters for the Leaf-wise (Best-first) Tree
+---------------------------------------------------
+
+LightGBM uses the `leaf-wise <./Features.rst#leaf-wise-best-first-tree-growth>`__ tree growth algorithm, while many other popular tools use depth-wise tree growth.
+Compared with depth-wise growth, the leaf-wise algorithm can converge much faster.
+However, leaf-wise growth may over-fit if not used with the appropriate parameters.
+
+To get good results using a leaf-wise tree, these are some important parameters:
+
+1. ``num_leaves``. This is the main parameter to control the complexity of the tree model.
+   Theoretically, we can set ``num_leaves = 2^(max_depth)`` to obtain the same number of leaves as a depth-wise tree.
+   However, this simple conversion is not good in practice.
+   The reason is that, for the same number of leaves, a leaf-wise tree is typically much deeper than a depth-wise tree. As a result, it may over-fit.
+   Thus, when tuning ``num_leaves``, we should keep it smaller than ``2^(max_depth)``.
+   For example, when ``max_depth=7`` the depth-wise tree can get good accuracy,
+   but setting ``num_leaves`` to ``127`` may cause over-fitting, and setting it to ``70`` or ``80`` may get better accuracy than depth-wise.
+   In fact, the concept of ``depth`` can be set aside for a leaf-wise tree, since there is no exact mapping from ``leaves`` to ``depth``.
+
+2. ``min_data_in_leaf``. This is a very important parameter for preventing over-fitting in a leaf-wise tree.
+   Its value depends on the number of training samples and ``num_leaves``.
+   Setting it to a large value can avoid growing too deep a tree, but may cause under-fitting.
+   In practice, setting it to hundreds or thousands is enough for a large dataset.
+
+3. ``max_depth``. You can also use ``max_depth`` to limit the tree depth explicitly.
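+
+As an illustrative sketch (the values below are assumptions for demonstration, not recommendations from this guide), the three parameters above might be combined in a config file as follows. Here ``num_leaves`` is kept well below ``2^7 = 128``:
+
+::
+
+    max_depth = 7
+    num_leaves = 80
+    min_data_in_leaf = 100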
+
+For Faster Speed
+----------------
+
+-  Use bagging by setting ``bagging_fraction`` and ``bagging_freq``
+
+-  Use feature sub-sampling by setting ``feature_fraction``
+
+-  Use small ``max_bin``
+
+-  Use ``save_binary`` to speed up data loading in future learning
+
+-  Use parallel learning, refer to `Parallel Learning Guide <./Parallel-Learning-Guide.rst>`__
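+
+A possible config fragment combining the single-machine options above (all values are illustrative assumptions and should be tuned for your data):
+
+::
+
+    bagging_fraction = 0.8
+    bagging_freq = 5
+    feature_fraction = 0.8
+    max_bin = 63
+    save_binary = true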
+
+
+For Better Accuracy
+-------------------
+
+-  Use large ``max_bin`` (may be slower)
+
+-  Use small ``learning_rate`` with large ``num_iterations``
+
+-  Use large ``num_leaves`` (may cause over-fitting)
+
+-  Use bigger training data
+
+-  Try ``dart``
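+
+A sketch of a config fragment along these lines (values are illustrative assumptions; larger values generally increase training time):
+
+::
+
+    max_bin = 511
+    learning_rate = 0.01
+    num_iterations = 2000
+    num_leaves = 255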
+
+Deal with Over-fitting
+----------------------
+
+-  Use small ``max_bin``
+
+-  Use small ``num_leaves``
+
+-  Use ``min_data_in_leaf`` and ``min_sum_hessian_in_leaf``
+
+-  Use bagging by setting ``bagging_fraction`` and ``bagging_freq``
+
+-  Use feature sub-sampling by setting ``feature_fraction``
+
+-  Use bigger training data
+
+-  Try ``lambda_l1``, ``lambda_l2`` and ``min_gain_to_split`` for regularization
+
+-  Try ``max_depth`` to avoid growing a deep tree
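+
+As a hedged sketch, a more strongly regularized config fragment might look like this (all values are illustrative assumptions to be tuned against a validation set):
+
+::
+
+    num_leaves = 31
+    min_data_in_leaf = 100
+    min_sum_hessian_in_leaf = 10
+    bagging_fraction = 0.8
+    bagging_freq = 5
+    feature_fraction = 0.8
+    lambda_l1 = 0.1
+    lambda_l2 = 0.1
+    min_gain_to_split = 0.01
+    max_depth = 8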
--- a/docs/Parameters-tuning.md
+++ b/docs/Parameters-tuning.md
-# Parameters Tuning
-
-This is a page contains all parameters in LightGBM.
-
-***List of other Helpful Links***
-* [Parameters](./Parameters.md)
-* [Python API](./Python-API.rst)
-
-## Tune Parameters for the Leaf-wise (Best-first) Tree
-
-LightGBM uses the [leaf-wise](./Features.md) tree growth algorithm, while many other popular tools use depth-wise tree growth. Compared with depth-wise growth, the leaf-wise algorithm can convenge much faster. However, the leaf-wise growth may be over-fitting if not used with the appropriate parameters. 
-
-To get good results using a leaf-wise tree, these are some important parameters:
-
-1. ```num_leaves```. This is the main parameter to control the complexity of the tree model. Theoretically, we can set ```num_leaves = 2^(max_depth) ``` to convert from depth-wise tree. However, this simple conversion is not good in practice. The reason is, when number of leaves are the same, the leaf-wise tree is much deeper than depth-wise tree. As a result, it may be over-fitting. Thus, when trying to tune the ```num_leaves```, we should let it be smaller than ```2^(max_depth)```. For example, when the ```max_depth=6``` the depth-wise tree can get good accuracy, but setting ```num_leaves``` to ```127``` may cause over-fitting, and setting it to ```70``` or ```80``` may get better accuracy than depth-wise. Actually, the concept ```depth``` can be forgotten in leaf-wise tree, since it doesn't have a correct mapping from ```leaves``` to ```depth```. 
-
-2. ```min_data_in_leaf```. This is a very important parameter to deal with over-fitting in leaf-wise tree. Its value depends on the number of training data and ```num_leaves```. Setting it to a large value can avoid growing too deep a tree, but may cause under-fitting. In practice, setting it to hundreds or thousands is enough for a large dataset. 
-
-3. ```max_depth```. You also can use ```max_depth``` to limit the tree depth explicitly. 
-
-
-## For Faster Speed
-
-* Use bagging by setting ```bagging_fraction``` and ```bagging_freq``` 
-* Use feature sub-sampling by setting ```feature_fraction```
-* Use small ```max_bin```
-* Use ```save_binary``` to speed up data loading in future learning
-* Use parallel learning, refer to [Parallel Learning Guide](./Parallel-Learning-Guide.rst).
-
-## For Better Accuracy
-
-* Use large ```max_bin``` (may be slower)
-* Use small ```learning_rate``` with large ```num_iterations```
-* Use large ```num_leaves```(may cause over-fitting)
-* Use bigger training data
-* Try ```dart```
-
-## Deal with Over-fitting
-
-* Use small ```max_bin```
-* Use small ```num_leaves```
-* Use ```min_data_in_leaf``` and ```min_sum_hessian_in_leaf```
-* Use bagging by set ```bagging_fraction``` and ```bagging_freq``` 
-* Use feature sub-sampling by set ```feature_fraction```
-* Use bigger training data
-* Try ```lambda_l1```, ```lambda_l2``` and ```min_gain_to_split``` to regularization
-* Try ```max_depth``` to avoid growing deep tree