Commit ac6951d3 authored by Alex, committed by Guolin Ke

[docs] fixed some typos and grammatical errors (#1738)

parent 7949cf51
@@ -56,7 +56,7 @@ LightGBM
 --------------
-- **Question 2**: On datasets with million of features, training does not start (or starts after a very long time).
+- **Question 2**: On datasets with millions of features, training does not start (or starts after a very long time).
 - **Solution 2**: Use a smaller value for ``bin_construct_sample_cnt`` and a larger value for ``min_data``.
...
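The FAQ advice in the hunk above can be sketched as a parameter dictionary for the Python API. The specific numbers below are illustrative placeholders, not values recommended by the docs; sensible choices depend on your data.

```python
# Hypothetical parameters for a dataset with millions of features,
# following the FAQ advice above: sample fewer rows when constructing
# histogram bins, and require more data per leaf.
params = {
    "objective": "binary",
    "bin_construct_sample_cnt": 50000,  # smaller than the default (200000)
    "min_data": 100,                    # larger than the default (20)
}

# These would typically be passed to training, e.g.
# lightgbm.train(params, train_dataset)
print(sorted(params))
```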
@@ -93,7 +93,7 @@ Feature parallel aims to parallelize the "Find Best Split" in the decision tree.
 4. Worker with best split to perform split, then send the split result of data to other workers.
-5. Other workers split data according received data.
+5. Other workers split data according to received data.
 The shortcomings of traditional feature parallel:
...
@@ -75,7 +75,7 @@ OpenCL SDK Installation
 -----------------------
 Installing the appropriate OpenCL SDK requires you to download the correct vendor source SDK.
-You need to know on what you are going to use LightGBM!:
+You need to know what you are going to use LightGBM!:
 - For running on Intel, get `Intel SDK for OpenCL`_ (NOT RECOMMENDED)
...
@@ -485,7 +485,7 @@ IO Parameters
 - ``sparse_threshold`` :raw-html:`<a id="sparse_threshold" title="Permalink to this parameter" href="#sparse_threshold">&#x1F517;&#xFE0E;</a>`, default = ``0.8``, type = double, constraints: ``0.0 < sparse_threshold <= 1.0``
-- the threshold of zero elements precentage for treating a feature as a sparse one
+- the threshold of zero elements percentage for treating a feature as a sparse one
 - ``use_missing`` :raw-html:`<a id="use_missing" title="Permalink to this parameter" href="#use_missing">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool
@@ -493,7 +493,7 @@ IO Parameters
 - ``zero_as_missing`` :raw-html:`<a id="zero_as_missing" title="Permalink to this parameter" href="#zero_as_missing">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool
-- set this to ``true`` to treat all zero as missing values (including the unshown values in libsvm/sparse matrics)
+- set this to ``true`` to treat all zero as missing values (including the unshown values in libsvm/sparse matrices)
 - set this to ``false`` to use ``na`` for representing missing values
@@ -573,7 +573,7 @@ IO Parameters
 - **Note**: all values should be less than ``Int32.MaxValue`` (2147483647)
-- **Note**: using large values could be memory consuming. Tree decision rule works best when categorical features are presented by consecutive integers started from zero
+- **Note**: using large values could be memory consuming. Tree decision rule works best when categorical features are presented by consecutive integers starting from zero
 - **Note**: all negative values will be treated as **missing values**
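The note above recommends representing categorical features as consecutive integers starting from zero. A minimal sketch of such an encoding, with made-up category labels, could look like this (illustrative only, not LightGBM's internal code):

```python
# Map arbitrary category labels to consecutive integer codes starting
# from zero, the representation the note above recommends.
def encode_categories(values):
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)  # next unused consecutive integer
        encoded.append(codes[v])
    return encoded, codes

colors = ["red", "blue", "red", "green", "blue"]
encoded, mapping = encode_categories(colors)
print(encoded)  # [0, 1, 0, 2, 1]
```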
@@ -656,7 +656,7 @@ Objective Parameters
 - used only in ``binary`` application
-- set this to ``true`` if training data are unbalance
+- set this to ``true`` if training data are unbalanced
 - **Note**: this parameter cannot be used at the same time with ``scale_pos_weight``, choose only **one** of them
@@ -872,7 +872,7 @@ It means the initial score of the first data row is ``0.5``, second is ``-0.1``,
 The initial score file corresponds with data file line by line, and has per score per line.
 And if the name of data file is ``train.txt``, the initial score file should be named as ``train.txt.init`` and in the same folder as the data file.
-In this case LightGBM will auto load initial score file if it exists.
+In this case, LightGBM will auto load initial score file if it exists.
 Otherwise, you should specify the path to the custom named file with initial scores by the ``initscore_filename`` `parameter <#initscore_filename>`__.
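The naming convention described in the hunk above can be sketched with the standard library alone; the file contents and scores here are illustrative (they reuse the ``0.5``, ``-0.1`` example values from the docs):

```python
import tempfile
from pathlib import Path

# For a data file named ``train.txt``, the auto-loaded initial score
# file is ``train.txt.init`` in the same folder, one score per line.
with tempfile.TemporaryDirectory() as d:
    data_file = Path(d) / "train.txt"
    init_file = data_file.with_name(data_file.name + ".init")

    init_scores = [0.5, -0.1, 0.9]  # one initial score per data row
    init_file.write_text("\n".join(str(s) for s in init_scores) + "\n")

    print(init_file.name)  # train.txt.init
```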
@@ -892,7 +892,7 @@ It means the weight of the first data row is ``1.0``, second is ``0.5``, and so
 The weight file corresponds with data file line by line, and has per weight per line.
 And if the name of data file is ``train.txt``, the weight file should be named as ``train.txt.weight`` and placed in the same folder as the data file.
-In this case LightGBM will load the weight file automatically if it exists.
+In this case, LightGBM will load the weight file automatically if it exists.
 Also, you can include weight column in your data file. Please refer to the ``weight_column`` `parameter <#weight_column>`__ in above.
@@ -914,7 +914,7 @@ It means first ``27`` lines samples belong to one query and next ``18`` lines be
 **Note**: data should be ordered by the query.
 If the name of data file is ``train.txt``, the query file should be named as ``train.txt.query`` and placed in the same folder as the data file.
-In this case LightGBM will load the query file automatically if it exists.
+In this case, LightGBM will load the query file automatically if it exists.
 Also, you can include query/group id column in your data file. Please refer to the ``group_column`` `parameter <#group_column>`__ in above.
...
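The query file format described above holds one group size per line, and the sizes must add up to the number of data rows. A small sketch, reusing the ``27``/``18`` example from the docs:

```python
# A query (group) file for ranking lists one group size per line: here,
# the first 27 rows form one query and the next 18 rows form another.
group_sizes = [27, 18]
num_rows = sum(group_sizes)  # group sizes must sum to the row count

query_file_contents = "\n".join(str(g) for g in group_sizes) + "\n"
print(num_rows)  # 45
```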
@@ -112,11 +112,11 @@ or
 And you can use ``Dataset.set_init_score()`` to set initial score, and ``Dataset.set_group()`` to set group/query data for ranking tasks.
-**Memory efficent usage:**
+**Memory efficient usage:**
 The ``Dataset`` object in LightGBM is very memory-efficient, due to it only need to save discrete bins.
 However, Numpy/Array/Pandas object is memory cost.
-If you concern about your memory consumption, you can save memory according to following:
+If you concern about your memory consumption, you can save memory according to the following:
 1. Let ``free_raw_data=True`` (default is ``True``) when constructing the ``Dataset``
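The claim above, that storing discrete bins is far cheaper than keeping raw values, can be illustrated with plain Python (this is a toy demonstration of the idea, not LightGBM's actual binning code):

```python
import random

# Raw float64 values take 8 bytes each, while a bin index in [0, 255]
# fits in a single byte -- an 8x reduction before any other savings.
random.seed(0)
raw = [random.random() for _ in range(10000)]

# Quantize each value into one of 256 equal-width bins over [0, 1).
bins = bytes(min(int(v * 256), 255) for v in raw)

raw_bytes = len(raw) * 8    # 8 bytes per float64 value
binned_bytes = len(bins)    # 1 byte per bin index
print(raw_bytes // binned_bytes)  # 8
```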
@@ -204,7 +204,7 @@ Note that if you specify more than one evaluation metric, all of them will be us
 Prediction
 ----------
-A model that has been trained or loaded can perform predictions on data sets:
+A model that has been trained or loaded can perform predictions on datasets:
 .. code:: python
...
@@ -61,8 +61,8 @@ Run LightGBM
 "./lightgbm" config=your_config_file other_args ...
-Parameters can be set both in config file and command line, and the parameters in command line have higher priority than in config file.
-For example, the following command line will keep ``num_trees=10`` and ignore the same parameter in config file.
+Parameters can be set both in the config file and command line, and the parameters in command line have higher priority than in the config file.
+For example, the following command line will keep ``num_trees=10`` and ignore the same parameter in the config file.
 ::
...
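The precedence rule in the hunk above can be illustrated with a hypothetical config file; ``your_config_file`` and the values here are placeholders, not settings from the docs:

```
# your_config_file (illustrative contents)
task = train
num_trees = 100
learning_rate = 0.05
```

Running ``"./lightgbm" config=your_config_file num_trees=10`` would then train with ``num_trees=10``: the command-line value takes priority over the ``100`` in the config file.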
@@ -470,13 +470,13 @@ public:
 // check = >0.0
 // check = <=1.0
-// desc = the threshold of zero elements precentage for treating a feature as a sparse one
+// desc = the threshold of zero elements percentage for treating a feature as a sparse one
 double sparse_threshold = 0.8;
 // desc = set this to ``false`` to disable the special handle of missing value
 bool use_missing = true;
-// desc = set this to ``true`` to treat all zero as missing values (including the unshown values in libsvm/sparse matrics)
+// desc = set this to ``true`` to treat all zero as missing values (including the unshown values in libsvm/sparse matrices)
 // desc = set this to ``false`` to use ``na`` for representing missing values
 bool zero_as_missing = false;
@@ -539,7 +539,7 @@ public:
 // desc = **Note**: only supports categorical with ``int`` type
 // desc = **Note**: index starts from ``0`` and it doesn't count the label column when passing type is ``int``
 // desc = **Note**: all values should be less than ``Int32.MaxValue`` (2147483647)
-// desc = **Note**: using large values could be memory consuming. Tree decision rule works best when categorical features are presented by consecutive integers started from zero
+// desc = **Note**: using large values could be memory consuming. Tree decision rule works best when categorical features are presented by consecutive integers starting from zero
 // desc = **Note**: all negative values will be treated as **missing values**
 std::string categorical_feature = "";
@@ -601,7 +601,7 @@ public:
 // alias = unbalance, unbalanced_sets
 // desc = used only in ``binary`` application
-// desc = set this to ``true`` if training data are unbalance
+// desc = set this to ``true`` if training data are unbalanced
 // desc = **Note**: this parameter cannot be used at the same time with ``scale_pos_weight``, choose only **one** of them
 bool is_unbalance = false;
...