Unverified Commit 060c681d authored by Nikita Titov, committed by GitHub

[docs] Added subsections into param list (#2779)

* added possibility to render nested sections in params

* reorganize param sections

* reorder params
parent 222dff54
@@ -228,6 +228,12 @@ Learning Control Parameters
- **Note**: this parameter cannot be used at the same time with ``force_col_wise``, choose only one of them
- ``histogram_pool_size`` :raw-html:`<a id="histogram_pool_size" title="Permalink to this parameter" href="#histogram_pool_size">&#x1F517;&#xFE0E;</a>`, default = ``-1.0``, type = double, aliases: ``hist_pool_size``
- max cache size in MB for historical histogram
- ``< 0`` means no limit
- ``max_depth`` :raw-html:`<a id="max_depth" title="Permalink to this parameter" href="#max_depth">&#x1F517;&#xFE0E;</a>`, default = ``-1``, type = int
- limit the max depth for tree model. This is used to deal with over-fitting when ``#data`` is small. Tree still grows leaf-wise
@@ -468,14 +474,6 @@ Learning Control Parameters
- see `this file <https://github.com/microsoft/LightGBM/tree/master/examples/binary_classification/forced_splits.json>`__ as an example
- ``forcedbins_filename`` :raw-html:`<a id="forcedbins_filename" title="Permalink to this parameter" href="#forcedbins_filename">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string
- path to a ``.json`` file that specifies bin upper bounds for some or all features
- ``.json`` file should contain an array of objects, each containing the word ``feature`` (integer feature index) and ``bin_upper_bound`` (array of thresholds for binning)
- see `this file <https://github.com/microsoft/LightGBM/tree/master/examples/regression/forced_bins.json>`__ as an example
- ``refit_decay_rate`` :raw-html:`<a id="refit_decay_rate" title="Permalink to this parameter" href="#refit_decay_rate">&#x1F517;&#xFE0E;</a>`, default = ``0.9``, type = double, constraints: ``0.0 <= refit_decay_rate <= 1.0``
- decay rate of ``refit`` task, will use ``leaf_output = refit_decay_rate * old_leaf_output + (1.0 - refit_decay_rate) * new_leaf_output`` to refit trees
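The ``refit_decay_rate`` update rule quoted above can be sketched directly; this is a hypothetical standalone helper, not part of LightGBM's API:

```python
def refit_leaf_output(old_leaf_output, new_leaf_output, refit_decay_rate=0.9):
    """Blend the old and new leaf values when refitting a tree,
    per the formula in the parameter description above."""
    return refit_decay_rate * old_leaf_output + (1.0 - refit_decay_rate) * new_leaf_output

# With the default decay rate of 0.9, the refit leans heavily on the old output:
print(refit_leaf_output(1.0, 0.0))  # 0.9
```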
@@ -502,15 +500,42 @@ Learning Control Parameters
- applied once per forest
IO Parameters
-------------
- ``verbosity`` :raw-html:`<a id="verbosity" title="Permalink to this parameter" href="#verbosity">&#x1F517;&#xFE0E;</a>`, default = ``1``, type = int, aliases: ``verbose``
- controls the level of LightGBM's verbosity
- ``< 0``: Fatal, ``= 0``: Error (Warning), ``= 1``: Info, ``> 1``: Debug
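The verbosity table above maps integer ranges to log levels; a small illustrative helper (hypothetical, not LightGBM code) makes the boundaries explicit:

```python
# Hypothetical mapping mirroring the verbosity ranges documented above.
def verbosity_level(verbosity):
    if verbosity < 0:
        return "Fatal"
    if verbosity == 0:
        return "Error (Warning)"
    if verbosity == 1:
        return "Info"
    return "Debug"

print(verbosity_level(-1), verbosity_level(0), verbosity_level(1), verbosity_level(2))
```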
- ``input_model`` :raw-html:`<a id="input_model" title="Permalink to this parameter" href="#input_model">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string, aliases: ``model_input``, ``model_in``
- filename of input model
- for ``prediction`` task, this model will be applied to prediction data
- for ``train`` task, training will be continued from this model
- **Note**: can be used only in CLI version
- ``output_model`` :raw-html:`<a id="output_model" title="Permalink to this parameter" href="#output_model">&#x1F517;&#xFE0E;</a>`, default = ``LightGBM_model.txt``, type = string, aliases: ``model_output``, ``model_out``
- filename of output model in training
- **Note**: can be used only in CLI version
- ``snapshot_freq`` :raw-html:`<a id="snapshot_freq" title="Permalink to this parameter" href="#snapshot_freq">&#x1F517;&#xFE0E;</a>`, default = ``-1``, type = int, aliases: ``save_period``
- frequency of saving model file snapshot
- set this to positive value to enable this function. For example, the model file will be snapshotted at each iteration if ``snapshot_freq=1``
- **Note**: can be used only in CLI version
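Assuming ``snapshot_freq`` gates saving on the iteration counter (a sketch of the documented behavior, e.g. a snapshot at every iteration when ``snapshot_freq=1``):

```python
# Sketch: which iterations the CLI would snapshot the model at,
# assuming a snapshot is taken whenever the iteration is a multiple of snapshot_freq.
def snapshot_iterations(num_iterations, snapshot_freq):
    if snapshot_freq <= 0:  # disabled by default (snapshot_freq = -1)
        return []
    return [i for i in range(1, num_iterations + 1) if i % snapshot_freq == 0]

print(snapshot_iterations(10, 5))  # [5, 10]
print(snapshot_iterations(3, 1))   # [1, 2, 3]
```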
IO Parameters
-------------
Dataset Parameters
~~~~~~~~~~~~~~~~~~
- ``max_bin`` :raw-html:`<a id="max_bin" title="Permalink to this parameter" href="#max_bin">&#x1F517;&#xFE0E;</a>`, default = ``255``, type = int, constraints: ``max_bin > 1``
- max number of bins that feature values will be bucketed in
@@ -519,10 +544,6 @@ IO Parameters
- LightGBM will auto compress memory according to ``max_bin``. For example, LightGBM will use ``uint8_t`` for feature value if ``max_bin=255``
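The memory-compression note above amounts to choosing the smallest unsigned integer type that can hold a bin index; a sketch of that selection (illustrative only, not LightGBM internals):

```python
# Pick the narrowest unsigned storage type for bin indices, per the note above.
def bin_storage_type(max_bin):
    if max_bin <= 255:
        return "uint8_t"
    if max_bin <= 65535:
        return "uint16_t"
    return "uint32_t"

print(bin_storage_type(255), bin_storage_type(1023))  # uint8_t uint16_t
```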
- ``is_enable_sparse`` :raw-html:`<a id="is_enable_sparse" title="Permalink to this parameter" href="#is_enable_sparse">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool, aliases: ``is_sparse``, ``enable_sparse``, ``sparse``
- used to enable/disable sparse optimization
- ``max_bin_by_feature`` :raw-html:`<a id="max_bin_by_feature" title="Permalink to this parameter" href="#max_bin_by_feature">&#x1F517;&#xFE0E;</a>`, default = ``None``, type = multi-int
- max number of bins for each feature
@@ -535,14 +556,6 @@ IO Parameters
- use this to avoid one-data-one-bin (potential over-fitting)
- ``feature_pre_filter`` :raw-html:`<a id="feature_pre_filter" title="Permalink to this parameter" href="#feature_pre_filter">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool
- set this to ``true`` to pre-filter the unsplittable features by ``min_data_in_leaf``
- as dataset object is initialized only once and cannot be changed after that, you may need to set this to ``false`` when searching parameters with ``min_data_in_leaf``, otherwise features are filtered by ``min_data_in_leaf`` firstly if you don't reconstruct dataset object
- **Note**: setting this to ``false`` may slow down the training
- ``bin_construct_sample_cnt`` :raw-html:`<a id="bin_construct_sample_cnt" title="Permalink to this parameter" href="#bin_construct_sample_cnt">&#x1F517;&#xFE0E;</a>`, default = ``200000``, type = int, aliases: ``subsample_for_bin``, constraints: ``bin_construct_sample_cnt > 0``
- number of data that sampled to construct histogram bins
@@ -551,45 +564,37 @@ IO Parameters
- set this to larger value if data is very sparse
- ``histogram_pool_size`` :raw-html:`<a id="histogram_pool_size" title="Permalink to this parameter" href="#histogram_pool_size">&#x1F517;&#xFE0E;</a>`, default = ``-1.0``, type = double, aliases: ``hist_pool_size``
- max cache size in MB for historical histogram
- ``< 0`` means no limit
- ``data_random_seed`` :raw-html:`<a id="data_random_seed" title="Permalink to this parameter" href="#data_random_seed">&#x1F517;&#xFE0E;</a>`, default = ``1``, type = int, aliases: ``data_seed``
- random seed for data partition in parallel learning (excluding the ``feature_parallel`` mode)
- ``is_enable_sparse`` :raw-html:`<a id="is_enable_sparse" title="Permalink to this parameter" href="#is_enable_sparse">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool, aliases: ``is_sparse``, ``enable_sparse``, ``sparse``
- used to enable/disable sparse optimization
- ``enable_bundle`` :raw-html:`<a id="enable_bundle" title="Permalink to this parameter" href="#enable_bundle">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool, aliases: ``is_enable_bundle``, ``bundle``
- set this to ``false`` to disable Exclusive Feature Bundling (EFB), which is described in `LightGBM: A Highly Efficient Gradient Boosting Decision Tree <https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree>`__
- **Note**: disabling this may cause the slow training speed for sparse datasets
- ``use_missing`` :raw-html:`<a id="use_missing" title="Permalink to this parameter" href="#use_missing">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool
- set this to ``false`` to disable the special handle of missing value
- ``zero_as_missing`` :raw-html:`<a id="zero_as_missing" title="Permalink to this parameter" href="#zero_as_missing">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool
- set this to ``true`` to treat all zero as missing values (including the unshown values in LibSVM / sparse matrices)
- set this to ``false`` to use ``na`` for representing missing values
- ``feature_pre_filter`` :raw-html:`<a id="feature_pre_filter" title="Permalink to this parameter" href="#feature_pre_filter">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool
- set this to ``true`` to pre-filter the unsplittable features by ``min_data_in_leaf``
- as dataset object is initialized only once and cannot be changed after that, you may need to set this to ``false`` when searching parameters with ``min_data_in_leaf``, otherwise features are filtered by ``min_data_in_leaf`` firstly if you don't reconstruct dataset object
- **Note**: setting this to ``false`` may slow down the training
- ``pre_partition`` :raw-html:`<a id="pre_partition" title="Permalink to this parameter" href="#pre_partition">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``is_pre_partition``
@@ -597,22 +602,6 @@ IO Parameters
- ``true`` if training data are pre-partitioned, and different machines use different partitions
- ``enable_bundle`` :raw-html:`<a id="enable_bundle" title="Permalink to this parameter" href="#enable_bundle">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool, aliases: ``is_enable_bundle``, ``bundle``
- set this to ``false`` to disable Exclusive Feature Bundling (EFB), which is described in `LightGBM: A Highly Efficient Gradient Boosting Decision Tree <https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree>`__
- **Note**: disabling this may cause the slow training speed for sparse datasets
- ``use_missing`` :raw-html:`<a id="use_missing" title="Permalink to this parameter" href="#use_missing">&#x1F517;&#xFE0E;</a>`, default = ``true``, type = bool
- set this to ``false`` to disable the special handle of missing value
- ``zero_as_missing`` :raw-html:`<a id="zero_as_missing" title="Permalink to this parameter" href="#zero_as_missing">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool
- set this to ``true`` to treat all zero as missing values (including the unshown values in LibSVM / sparse matrices)
- set this to ``false`` to use ``na`` for representing missing values
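The ``zero_as_missing`` description above hinges on how implicit zeros in sparse input are interpreted; a sketch (illustrative only) of densifying a sparse row under each setting:

```python
import math

# Sketch of how zero_as_missing changes the meaning of unshown entries:
# entries absent from a sparse (LibSVM-style) row are implicit zeros by default,
# but become missing (NaN) when zero_as_missing is true.
def densify(sparse_row, num_features, zero_as_missing=False):
    fill = math.nan if zero_as_missing else 0.0
    dense = [fill] * num_features
    for idx, value in sparse_row.items():
        dense[idx] = value
    return dense

row = {0: 1.5, 3: 2.0}   # features 1 and 2 are unshown
print(densify(row, 4))   # [1.5, 0.0, 0.0, 2.0]
```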
- ``two_round`` :raw-html:`<a id="two_round" title="Permalink to this parameter" href="#two_round">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``two_round_loading``, ``use_two_round_loading``
- set this to ``true`` if data file is too big to fit in memory
@@ -621,14 +610,6 @@ IO Parameters
- **Note**: works only in case of loading data directly from file
- ``save_binary`` :raw-html:`<a id="save_binary" title="Permalink to this parameter" href="#save_binary">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``is_save_binary``, ``is_save_binary_file``
- if ``true``, LightGBM will save the dataset (including validation data) to a binary file. This speed ups the data loading for the next time
- **Note**: ``init_score`` is not saved in binary file
- **Note**: can be used only in CLI version; for language-specific packages you can use the correspondent function
- ``header`` :raw-html:`<a id="header" title="Permalink to this parameter" href="#header">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``has_header``
- set this to ``true`` if input data has header
@@ -705,6 +686,33 @@ IO Parameters
- **Note**: the output cannot be monotonically constrained with respect to a categorical feature
- ``forcedbins_filename`` :raw-html:`<a id="forcedbins_filename" title="Permalink to this parameter" href="#forcedbins_filename">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string
- path to a ``.json`` file that specifies bin upper bounds for some or all features
- ``.json`` file should contain an array of objects, each containing the word ``feature`` (integer feature index) and ``bin_upper_bound`` (array of thresholds for binning)
- see `this file <https://github.com/microsoft/LightGBM/tree/master/examples/regression/forced_bins.json>`__ as an example
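The expected shape of a ``forcedbins_filename`` file, per the description above, is an array of objects carrying ``feature`` and ``bin_upper_bound``; the concrete values below are modeled on the linked example and are illustrative only:

```python
import json

# Parse a forced-bins document of the documented shape:
# each object names a feature index and its explicit bin upper bounds.
forced_bins = json.loads("""
[
    {"feature": 0, "bin_upper_bound": [0.3, 0.35, 0.4]},
    {"feature": 1, "bin_upper_bound": [-0.1, -0.15, -0.2]}
]
""")
print(forced_bins[0]["bin_upper_bound"])  # [0.3, 0.35, 0.4]
```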
- ``save_binary`` :raw-html:`<a id="save_binary" title="Permalink to this parameter" href="#save_binary">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``is_save_binary``, ``is_save_binary_file``
- if ``true``, LightGBM will save the dataset (including validation data) to a binary file. This speed ups the data loading for the next time
- **Note**: ``init_score`` is not saved in binary file
- **Note**: can be used only in CLI version; for language-specific packages you can use the correspondent function
Predict Parameters
~~~~~~~~~~~~~~~~~~
- ``num_iteration_predict`` :raw-html:`<a id="num_iteration_predict" title="Permalink to this parameter" href="#num_iteration_predict">&#x1F517;&#xFE0E;</a>`, default = ``-1``, type = int
- used only in ``prediction`` task
- used to specify how many trained iterations will be used in prediction
- ``<= 0`` means no limit
- ``predict_raw_score`` :raw-html:`<a id="predict_raw_score" title="Permalink to this parameter" href="#predict_raw_score">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool, aliases: ``is_predict_raw_score``, ``predict_rawscore``, ``raw_score``
- used only in ``prediction`` task
@@ -731,13 +739,17 @@ IO Parameters
- **Note**: unlike the shap package, with ``predict_contrib`` we return a matrix with an extra column, where the last column is the expected value
- ``predict_disable_shape_check`` :raw-html:`<a id="predict_disable_shape_check" title="Permalink to this parameter" href="#predict_disable_shape_check">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool
- used only in ``prediction`` task
- control whether or not LightGBM raises an error when you try to predict on data with a different number of features than the training data
- if ``false`` (the default), a fatal error will be raised if the number of features in the dataset you predict on differs from the number seen during training
- if ``true``, LightGBM will attempt to predict on whatever data you provide. This is dangerous because you might get incorrect predictions, but you could use it in situations where it is difficult or expensive to generate some features and you are very confident that they were never chosen for splits in the model
- **Note**: be very careful setting this parameter to ``true``
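The shape check described above reduces to a simple comparison; a sketch of the decision logic (not LightGBM's actual code path, and the error text is hypothetical):

```python
# Sketch of the predict-time feature-count check documented above.
def check_num_features(n_model_features, n_data_features,
                       predict_disable_shape_check=False):
    if predict_disable_shape_check:
        return  # caller takes responsibility for feature alignment
    if n_model_features != n_data_features:
        raise ValueError(
            "data has %d features but the model was trained with %d"
            % (n_data_features, n_model_features)
        )

check_num_features(10, 10)                                    # passes silently
check_num_features(10, 8, predict_disable_shape_check=True)   # check skipped
```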
- ``pred_early_stop`` :raw-html:`<a id="pred_early_stop" title="Permalink to this parameter" href="#pred_early_stop">&#x1F517;&#xFE0E;</a>`, default = ``false``, type = bool
@@ -757,17 +769,16 @@ IO Parameters
- the threshold of margin in early-stopping prediction
- ``output_result`` :raw-html:`<a id="output_result" title="Permalink to this parameter" href="#output_result">&#x1F517;&#xFE0E;</a>`, default = ``LightGBM_predict_result.txt``, type = string, aliases: ``predict_result``, ``prediction_result``, ``predict_name``, ``prediction_name``, ``pred_name``, ``name_pred``
- used only in ``prediction`` task
- filename of prediction result
- **Note**: can be used only in CLI version

Convert Parameters
~~~~~~~~~~~~~~~~~~
- ``convert_model_language`` :raw-html:`<a id="convert_model_language" title="Permalink to this parameter" href="#convert_model_language">&#x1F517;&#xFE0E;</a>`, default = ``""``, type = string
......
@@ -24,6 +24,7 @@ def get_parameter_infos(config_hpp):
    """
    is_inparameter = False
    cur_key = None
    key_lvl = 0
    cur_info = {}
    keys = []
    member_infos = []
@@ -32,10 +33,12 @@ def get_parameter_infos(config_hpp):
            if "#pragma region Parameters" in line:
                is_inparameter = True
            elif "#pragma region" in line and "Parameters" in line:
                key_lvl += 1
                cur_key = line.split("region")[1].strip()
                keys.append((cur_key, key_lvl))
                member_infos.append([])
            elif '#pragma endregion' in line:
                key_lvl -= 1
                if cur_key is not None:
                    cur_key = None
            elif is_inparameter:
@@ -196,8 +199,10 @@ def gen_parameter_description(sections, descriptions, params_rst):
        return check[idx:], check[:idx]
    params_to_write = []
    lvl_mapper = {1: '-', 2: '~'}
    for (section_name, section_lvl), section_params in zip(sections, descriptions):
        heading_sign = lvl_mapper[section_lvl]
        params_to_write.append('{0}\n{1}'.format(section_name, heading_sign * len(section_name)))
        for param_desc in section_params:
            name = param_desc['name'][0]
            default_raw = param_desc['default'][0]
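The heading logic this patch introduces can be exercised standalone; the snippet below mirrors the new ``lvl_mapper`` lines, rendering level-1 sections with ``-`` underlines and level-2 subsections with ``~``:

```python
# Mirrors the patched generator: underline character depends on section level.
lvl_mapper = {1: '-', 2: '~'}

def render_heading(section_name, section_lvl):
    heading_sign = lvl_mapper[section_lvl]
    return '{0}\n{1}'.format(section_name, heading_sign * len(section_name))

print(render_heading('IO Parameters', 1))
print(render_heading('Dataset Parameters', 2))
```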
......
@@ -3,7 +3,8 @@
* Licensed under the MIT License. See LICENSE file in the project root for license information.
*
* \note
* desc and descl2 fields must be written in reStructuredText format;
* nested sections can be placed only at the bottom of parent's section
*/
#ifndef LIGHTGBM_CONFIG_H_
#define LIGHTGBM_CONFIG_H_
@@ -235,6 +236,11 @@ struct Config {
// desc = **Note**: this parameter cannot be used at the same time with ``force_col_wise``, choose only one of them
bool force_row_wise = false;
// alias = hist_pool_size
// desc = max cache size in MB for historical histogram
// desc = ``< 0`` means no limit
double histogram_pool_size = -1.0;
// desc = limit the max depth for tree model. This is used to deal with over-fitting when ``#data`` is small. Tree still grows leaf-wise
// desc = ``<= 0`` means no limit
int max_depth = -1;
@@ -443,11 +449,6 @@ struct Config {
// desc = see `this file <https://github.com/microsoft/LightGBM/tree/master/examples/binary_classification/forced_splits.json>`__ as an example
std::string forcedsplits_filename = "";
// desc = path to a ``.json`` file that specifies bin upper bounds for some or all features
// desc = ``.json`` file should contain an array of objects, each containing the word ``feature`` (integer feature index) and ``bin_upper_bound`` (array of thresholds for binning)
// desc = see `this file <https://github.com/microsoft/LightGBM/tree/master/examples/regression/forced_bins.json>`__ as an example
std::string forcedbins_filename = "";
// check = >=0.0
// check = <=1.0
// desc = decay rate of ``refit`` task, will use ``leaf_output = refit_decay_rate * old_leaf_output + (1.0 - refit_decay_rate) * new_leaf_output`` to refit trees
@@ -474,26 +475,41 @@ struct Config {
// desc = applied once per forest
std::vector<double> cegb_penalty_feature_coupled;
#pragma endregion
#pragma region IO Parameters
// alias = verbose
// desc = controls the level of LightGBM's verbosity
// desc = ``< 0``: Fatal, ``= 0``: Error (Warning), ``= 1``: Info, ``> 1``: Debug
int verbosity = 1;
// alias = model_input, model_in
// desc = filename of input model
// desc = for ``prediction`` task, this model will be applied to prediction data
// desc = for ``train`` task, training will be continued from this model
// desc = **Note**: can be used only in CLI version
std::string input_model = "";
// alias = model_output, model_out
// desc = filename of output model in training
// desc = **Note**: can be used only in CLI version
std::string output_model = "LightGBM_model.txt";
// alias = save_period
// desc = frequency of saving model file snapshot
// desc = set this to positive value to enable this function. For example, the model file will be snapshotted at each iteration if ``snapshot_freq=1``
// desc = **Note**: can be used only in CLI version
int snapshot_freq = -1;
#pragma endregion
#pragma region IO Parameters
#pragma region Dataset Parameters
// check = >1
// desc = max number of bins that feature values will be bucketed in
// desc = small number of bins may reduce training accuracy but may increase general power (deal with over-fitting)
// desc = LightGBM will auto compress memory according to ``max_bin``. For example, LightGBM will use ``uint8_t`` for feature value if ``max_bin=255``
int max_bin = 255;
// alias = is_sparse, enable_sparse, sparse
// desc = used to enable/disable sparse optimization
bool is_enable_sparse = true;
// type = multi-int
// default = None
// desc = max number of bins for each feature
@@ -505,11 +521,6 @@ struct Config {
// desc = use this to avoid one-data-one-bin (potential over-fitting)
int min_data_in_bin = 3;
// desc = set this to ``true`` to pre-filter the unsplittable features by ``min_data_in_leaf``
// desc = as dataset object is initialized only once and cannot be changed after that, you may need to set this to ``false`` when searching parameters with ``min_data_in_leaf``, otherwise features are filtered by ``min_data_in_leaf`` firstly if you don't reconstruct dataset object
// desc = **Note**: setting this to ``false`` may slow down the training
bool feature_pre_filter = true;
// alias = subsample_for_bin
// check = >0
// desc = number of data that sampled to construct histogram bins
@@ -517,42 +528,13 @@ struct Config {
// desc = set this to larger value if data is very sparse
int bin_construct_sample_cnt = 200000;
// alias = hist_pool_size
// desc = max cache size in MB for historical histogram
// desc = ``< 0`` means no limit
double histogram_pool_size = -1.0;
// alias = data_seed
// desc = random seed for data partition in parallel learning (excluding the ``feature_parallel`` mode)
int data_random_seed = 1;
// alias = is_sparse, enable_sparse, sparse
// desc = used to enable/disable sparse optimization
bool is_enable_sparse = true;
std::string output_model = "LightGBM_model.txt";
// alias = save_period
// desc = frequency of saving model file snapshot
// desc = set this to positive value to enable this function. For example, the model file will be snapshotted at each iteration if ``snapshot_freq=1``
// desc = **Note**: can be used only in CLI version
int snapshot_freq = -1;
// alias = model_input, model_in
// desc = filename of input model
// desc = for ``prediction`` task, this model will be applied to prediction data
// desc = for ``train`` task, training will be continued from this model
// desc = **Note**: can be used only in CLI version
std::string input_model = "";
// alias = predict_result, prediction_result, predict_name, prediction_name, pred_name, name_pred
// desc = filename of prediction result in ``prediction`` task
// desc = **Note**: can be used only in CLI version
std::string output_result = "LightGBM_predict_result.txt";
// alias = is_pre_partition
// desc = used for parallel learning (excluding the ``feature_parallel`` mode)
// desc = ``true`` if training data are pre-partitioned, and different machines use different partitions
bool pre_partition = false;
// alias = is_enable_bundle, bundle // alias = is_enable_bundle, bundle
// desc = set this to ``false`` to disable Exclusive Feature Bundling (EFB), which is described in `LightGBM: A Highly Efficient Gradient Boosting Decision Tree <https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree>`__ // desc = set this to ``false`` to disable Exclusive Feature Bundling (EFB), which is described in `LightGBM: A Highly Efficient Gradient Boosting Decision Tree <https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree>`__
@@ -566,18 +548,22 @@ struct Config {
   // desc = set this to ``false`` to use ``na`` for representing missing values
   bool zero_as_missing = false;
+  // desc = set this to ``true`` to pre-filter the unsplittable features by ``min_data_in_leaf``
+  // desc = as dataset object is initialized only once and cannot be changed after that, you may need to set this to ``false`` when searching parameters with ``min_data_in_leaf``, otherwise features are filtered by ``min_data_in_leaf`` firstly if you don't reconstruct dataset object
+  // desc = **Note**: setting this to ``false`` may slow down the training
+  bool feature_pre_filter = true;
+  // alias = is_pre_partition
+  // desc = used for parallel learning (excluding the ``feature_parallel`` mode)
+  // desc = ``true`` if training data are pre-partitioned, and different machines use different partitions
+  bool pre_partition = false;
   // alias = two_round_loading, use_two_round_loading
   // desc = set this to ``true`` if data file is too big to fit in memory
   // desc = by default, LightGBM will map data file to memory and load features from memory. This will provide faster data loading speed, but may cause run out of memory error when the data file is very big
   // desc = **Note**: works only in case of loading data directly from file
   bool two_round = false;
-  // alias = is_save_binary, is_save_binary_file
-  // desc = if ``true``, LightGBM will save the dataset (including validation data) to a binary file. This speed ups the data loading for the next time
-  // desc = **Note**: ``init_score`` is not saved in binary file
-  // desc = **Note**: can be used only in CLI version; for language-specific packages you can use the correspondent function
-  bool save_binary = false;
   // alias = has_header
   // desc = set this to ``true`` if input data has header
   // desc = **Note**: works only in case of loading data directly from file
@@ -633,6 +619,26 @@ struct Config {
   // desc = **Note**: the output cannot be monotonically constrained with respect to a categorical feature
   std::string categorical_feature = "";
+  // desc = path to a ``.json`` file that specifies bin upper bounds for some or all features
+  // desc = ``.json`` file should contain an array of objects, each containing the word ``feature`` (integer feature index) and ``bin_upper_bound`` (array of thresholds for binning)
+  // desc = see `this file <https://github.com/microsoft/LightGBM/tree/master/examples/regression/forced_bins.json>`__ as an example
+  std::string forcedbins_filename = "";
+  // alias = is_save_binary, is_save_binary_file
+  // desc = if ``true``, LightGBM will save the dataset (including validation data) to a binary file. This speed ups the data loading for the next time
+  // desc = **Note**: ``init_score`` is not saved in binary file
+  // desc = **Note**: can be used only in CLI version; for language-specific packages you can use the correspondent function
+  bool save_binary = false;
+
+  #pragma endregion
+
+  #pragma region Predict Parameters
+
+  // desc = used only in ``prediction`` task
+  // desc = used to specify how many trained iterations will be used in prediction
+  // desc = ``<= 0`` means no limit
+  int num_iteration_predict = -1;
   // alias = is_predict_raw_score, predict_rawscore, raw_score
   // desc = used only in ``prediction`` task
   // desc = set this to ``true`` to predict only the raw scores
@@ -653,9 +659,11 @@ struct Config {
   bool predict_contrib = false;
   // desc = used only in ``prediction`` task
-  // desc = used to specify how many trained iterations will be used in prediction
-  // desc = ``<= 0`` means no limit
-  int num_iteration_predict = -1;
+  // desc = control whether or not LightGBM raises an error when you try to predict on data with a different number of features than the training data
+  // desc = if ``false`` (the default), a fatal error will be raised if the number of features in the dataset you predict on differs from the number seen during training
+  // desc = if ``true``, LightGBM will attempt to predict on whatever data you provide. This is dangerous because you might get incorrect predictions, but you could use it in situations where it is difficult or expensive to generate some features and you are very confident that they were never chosen for splits in the model
+  // desc = **Note**: be very careful setting this parameter to ``true``
+  bool predict_disable_shape_check = false;
   // desc = used only in ``prediction`` task
   // desc = if ``true``, will use early-stopping to speed up the prediction. May affect the accuracy
@@ -669,12 +677,15 @@ struct Config {
   // desc = the threshold of margin in early-stopping prediction
   double pred_early_stop_margin = 10.0;
+  // alias = predict_result, prediction_result, predict_name, prediction_name, pred_name, name_pred
   // desc = used only in ``prediction`` task
-  // desc = control whether or not LightGBM raises an error when you try to predict on data with a different number of features than the training data
-  // desc = if ``false`` (the default), a fatal error will be raised if the number of features in the dataset you predict on differs from the number seen during training
-  // desc = if ``true``, LightGBM will attempt to predict on whatever data you provide. This is dangerous because you might get incorrect predictions, but you could use it in situations where it is difficult or expensive to generate some features and you are very confident that they were never chosen for splits in the model
-  // desc = **Note**: be very careful setting this parameter to ``true``
-  bool predict_disable_shape_check = false;
+  // desc = filename of prediction result
+  // desc = **Note**: can be used only in CLI version
+  std::string output_result = "LightGBM_predict_result.txt";
+
+  #pragma endregion
+
+  #pragma region Convert Parameters
+
   // desc = used only in ``convert_model`` task
   // desc = only ``cpp`` is supported yet; for conversion model to other languages consider using `m2cgen <https://github.com/BayesWitnesses/m2cgen>`__ utility
@@ -690,6 +701,8 @@ struct Config {
   #pragma endregion
+
+  #pragma endregion
   #pragma region Objective Parameters
   // check = >0
...
@@ -49,6 +49,7 @@ const std::unordered_map<std::string, std::string>& Config::alias_table() {
   {"device", "device_type"},
   {"random_seed", "seed"},
   {"random_state", "seed"},
+  {"hist_pool_size", "histogram_pool_size"},
   {"min_data_per_leaf", "min_data_in_leaf"},
   {"min_data", "min_data_in_leaf"},
   {"min_child_samples", "min_data_in_leaf"},
@@ -93,30 +94,21 @@ const std::unordered_map<std::string, std::string>& Config::alias_table() {
   {"forced_splits_file", "forcedsplits_filename"},
   {"forced_splits", "forcedsplits_filename"},
   {"verbose", "verbosity"},
-  {"is_sparse", "is_enable_sparse"},
-  {"enable_sparse", "is_enable_sparse"},
-  {"sparse", "is_enable_sparse"},
-  {"subsample_for_bin", "bin_construct_sample_cnt"},
-  {"hist_pool_size", "histogram_pool_size"},
-  {"data_seed", "data_random_seed"},
+  {"model_input", "input_model"},
+  {"model_in", "input_model"},
   {"model_output", "output_model"},
   {"model_out", "output_model"},
   {"save_period", "snapshot_freq"},
-  {"model_input", "input_model"},
-  {"model_in", "input_model"},
-  {"predict_result", "output_result"},
-  {"prediction_result", "output_result"},
-  {"predict_name", "output_result"},
-  {"prediction_name", "output_result"},
-  {"pred_name", "output_result"},
-  {"name_pred", "output_result"},
-  {"is_pre_partition", "pre_partition"},
+  {"subsample_for_bin", "bin_construct_sample_cnt"},
+  {"data_seed", "data_random_seed"},
+  {"is_sparse", "is_enable_sparse"},
+  {"enable_sparse", "is_enable_sparse"},
+  {"sparse", "is_enable_sparse"},
   {"is_enable_bundle", "enable_bundle"},
   {"bundle", "enable_bundle"},
+  {"is_pre_partition", "pre_partition"},
   {"two_round_loading", "two_round"},
   {"use_two_round_loading", "two_round"},
-  {"is_save_binary", "save_binary"},
-  {"is_save_binary_file", "save_binary"},
   {"has_header", "header"},
   {"label", "label_column"},
   {"weight", "weight_column"},
@@ -130,6 +122,8 @@ const std::unordered_map<std::string, std::string>& Config::alias_table() {
   {"cat_feature", "categorical_feature"},
   {"categorical_column", "categorical_feature"},
   {"cat_column", "categorical_feature"},
+  {"is_save_binary", "save_binary"},
+  {"is_save_binary_file", "save_binary"},
   {"is_predict_raw_score", "predict_raw_score"},
   {"predict_rawscore", "predict_raw_score"},
   {"raw_score", "predict_raw_score"},
@@ -137,6 +131,12 @@ const std::unordered_map<std::string, std::string>& Config::alias_table() {
   {"leaf_index", "predict_leaf_index"},
   {"is_predict_contrib", "predict_contrib"},
   {"contrib", "predict_contrib"},
+  {"predict_result", "output_result"},
+  {"prediction_result", "output_result"},
+  {"predict_name", "output_result"},
+  {"prediction_name", "output_result"},
+  {"pred_name", "output_result"},
+  {"name_pred", "output_result"},
   {"convert_model_file", "convert_model"},
   {"num_classes", "num_class"},
   {"unbalance", "is_unbalance"},
@@ -180,6 +180,7 @@ const std::unordered_set<std::string>& Config::parameter_set() {
   "seed",
   "force_col_wise",
   "force_row_wise",
+  "histogram_pool_size",
   "max_depth",
   "min_data_in_leaf",
   "min_sum_hessian_in_leaf",
@@ -216,45 +217,44 @@ const std::unordered_set<std::string>& Config::parameter_set() {
   "monotone_constraints",
   "feature_contri",
   "forcedsplits_filename",
-  "forcedbins_filename",
   "refit_decay_rate",
   "cegb_tradeoff",
   "cegb_penalty_split",
   "cegb_penalty_feature_lazy",
   "cegb_penalty_feature_coupled",
   "verbosity",
+  "input_model",
+  "output_model",
+  "snapshot_freq",
   "max_bin",
-  "is_enable_sparse",
   "max_bin_by_feature",
   "min_data_in_bin",
-  "feature_pre_filter",
   "bin_construct_sample_cnt",
-  "histogram_pool_size",
   "data_random_seed",
-  "output_model",
-  "snapshot_freq",
-  "input_model",
-  "output_result",
-  "pre_partition",
+  "is_enable_sparse",
   "enable_bundle",
   "use_missing",
   "zero_as_missing",
+  "feature_pre_filter",
+  "pre_partition",
   "two_round",
-  "save_binary",
   "header",
   "label_column",
   "weight_column",
   "group_column",
   "ignore_column",
   "categorical_feature",
+  "forcedbins_filename",
+  "save_binary",
+  "num_iteration_predict",
   "predict_raw_score",
   "predict_leaf_index",
   "predict_contrib",
-  "num_iteration_predict",
+  "predict_disable_shape_check",
   "pred_early_stop",
   "pred_early_stop_freq",
   "pred_early_stop_margin",
-  "predict_disable_shape_check",
+  "output_result",
   "convert_model_language",
   "convert_model",
   "num_class",
@@ -313,6 +313,8 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::string>& params) {
   GetBool(params, "force_row_wise", &force_row_wise);
+  GetDouble(params, "histogram_pool_size", &histogram_pool_size);
   GetInt(params, "max_depth", &max_depth);
   GetInt(params, "min_data_in_leaf", &min_data_in_leaf);
@@ -418,8 +420,6 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::string>& params) {
   GetString(params, "forcedsplits_filename", &forcedsplits_filename);
-  GetString(params, "forcedbins_filename", &forcedbins_filename);
   GetDouble(params, "refit_decay_rate", &refit_decay_rate);
   CHECK(refit_decay_rate >=0.0);
   CHECK(refit_decay_rate <=1.0);
@@ -440,11 +440,15 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::string>& params) {
   GetInt(params, "verbosity", &verbosity);
+  GetString(params, "input_model", &input_model);
+  GetString(params, "output_model", &output_model);
+  GetInt(params, "snapshot_freq", &snapshot_freq);
   GetInt(params, "max_bin", &max_bin);
   CHECK(max_bin >1);
-  GetBool(params, "is_enable_sparse", &is_enable_sparse);
   if (GetString(params, "max_bin_by_feature", &tmp_str)) {
     max_bin_by_feature = Common::StringToArray<int32_t>(tmp_str, ',');
   }
@@ -452,24 +456,12 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::string>& params) {
   GetInt(params, "min_data_in_bin", &min_data_in_bin);
   CHECK(min_data_in_bin >0);
-  GetBool(params, "feature_pre_filter", &feature_pre_filter);
   GetInt(params, "bin_construct_sample_cnt", &bin_construct_sample_cnt);
   CHECK(bin_construct_sample_cnt >0);
-  GetDouble(params, "histogram_pool_size", &histogram_pool_size);
   GetInt(params, "data_random_seed", &data_random_seed);
-  GetString(params, "output_model", &output_model);
-  GetInt(params, "snapshot_freq", &snapshot_freq);
-  GetString(params, "input_model", &input_model);
-  GetString(params, "output_result", &output_result);
-  GetBool(params, "pre_partition", &pre_partition);
+  GetBool(params, "is_enable_sparse", &is_enable_sparse);
   GetBool(params, "enable_bundle", &enable_bundle);
@@ -477,9 +469,11 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::string>& params) {
   GetBool(params, "zero_as_missing", &zero_as_missing);
-  GetBool(params, "two_round", &two_round);
-  GetBool(params, "save_binary", &save_binary);
+  GetBool(params, "feature_pre_filter", &feature_pre_filter);
+  GetBool(params, "pre_partition", &pre_partition);
+  GetBool(params, "two_round", &two_round);
   GetBool(params, "header", &header);
@@ -493,13 +487,19 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::string>& params) {
   GetString(params, "categorical_feature", &categorical_feature);
+  GetString(params, "forcedbins_filename", &forcedbins_filename);
+  GetBool(params, "save_binary", &save_binary);
+  GetInt(params, "num_iteration_predict", &num_iteration_predict);
   GetBool(params, "predict_raw_score", &predict_raw_score);
   GetBool(params, "predict_leaf_index", &predict_leaf_index);
   GetBool(params, "predict_contrib", &predict_contrib);
-  GetInt(params, "num_iteration_predict", &num_iteration_predict);
+  GetBool(params, "predict_disable_shape_check", &predict_disable_shape_check);
   GetBool(params, "pred_early_stop", &pred_early_stop);
@@ -507,7 +507,7 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::string>& params) {
   GetDouble(params, "pred_early_stop_margin", &pred_early_stop_margin);
-  GetBool(params, "predict_disable_shape_check", &predict_disable_shape_check);
+  GetString(params, "output_result", &output_result);
   GetString(params, "convert_model_language", &convert_model_language);
@@ -598,6 +598,7 @@ std::string Config::SaveMembersToString() const {
   str_buf << "[num_threads: " << num_threads << "]\n";
   str_buf << "[force_col_wise: " << force_col_wise << "]\n";
   str_buf << "[force_row_wise: " << force_row_wise << "]\n";
+  str_buf << "[histogram_pool_size: " << histogram_pool_size << "]\n";
   str_buf << "[max_depth: " << max_depth << "]\n";
   str_buf << "[min_data_in_leaf: " << min_data_in_leaf << "]\n";
   str_buf << "[min_sum_hessian_in_leaf: " << min_sum_hessian_in_leaf << "]\n";
@@ -634,45 +635,44 @@ std::string Config::SaveMembersToString() const {
   str_buf << "[monotone_constraints: " << Common::Join(Common::ArrayCast<int8_t, int>(monotone_constraints), ",") << "]\n";
   str_buf << "[feature_contri: " << Common::Join(feature_contri, ",") << "]\n";
   str_buf << "[forcedsplits_filename: " << forcedsplits_filename << "]\n";
-  str_buf << "[forcedbins_filename: " << forcedbins_filename << "]\n";
   str_buf << "[refit_decay_rate: " << refit_decay_rate << "]\n";
   str_buf << "[cegb_tradeoff: " << cegb_tradeoff << "]\n";
   str_buf << "[cegb_penalty_split: " << cegb_penalty_split << "]\n";
   str_buf << "[cegb_penalty_feature_lazy: " << Common::Join(cegb_penalty_feature_lazy, ",") << "]\n";
   str_buf << "[cegb_penalty_feature_coupled: " << Common::Join(cegb_penalty_feature_coupled, ",") << "]\n";
   str_buf << "[verbosity: " << verbosity << "]\n";
+  str_buf << "[input_model: " << input_model << "]\n";
+  str_buf << "[output_model: " << output_model << "]\n";
+  str_buf << "[snapshot_freq: " << snapshot_freq << "]\n";
   str_buf << "[max_bin: " << max_bin << "]\n";
-  str_buf << "[is_enable_sparse: " << is_enable_sparse << "]\n";
   str_buf << "[max_bin_by_feature: " << Common::Join(max_bin_by_feature, ",") << "]\n";
   str_buf << "[min_data_in_bin: " << min_data_in_bin << "]\n";
-  str_buf << "[feature_pre_filter: " << feature_pre_filter << "]\n";
   str_buf << "[bin_construct_sample_cnt: " << bin_construct_sample_cnt << "]\n";
-  str_buf << "[histogram_pool_size: " << histogram_pool_size << "]\n";
   str_buf << "[data_random_seed: " << data_random_seed << "]\n";
-  str_buf << "[output_model: " << output_model << "]\n";
-  str_buf << "[snapshot_freq: " << snapshot_freq << "]\n";
-  str_buf << "[input_model: " << input_model << "]\n";
-  str_buf << "[output_result: " << output_result << "]\n";
-  str_buf << "[pre_partition: " << pre_partition << "]\n";
+  str_buf << "[is_enable_sparse: " << is_enable_sparse << "]\n";
   str_buf << "[enable_bundle: " << enable_bundle << "]\n";
   str_buf << "[use_missing: " << use_missing << "]\n";
   str_buf << "[zero_as_missing: " << zero_as_missing << "]\n";
+  str_buf << "[feature_pre_filter: " << feature_pre_filter << "]\n";
+  str_buf << "[pre_partition: " << pre_partition << "]\n";
   str_buf << "[two_round: " << two_round << "]\n";
-  str_buf << "[save_binary: " << save_binary << "]\n";
   str_buf << "[header: " << header << "]\n";
   str_buf << "[label_column: " << label_column << "]\n";
   str_buf << "[weight_column: " << weight_column << "]\n";
   str_buf << "[group_column: " << group_column << "]\n";
   str_buf << "[ignore_column: " << ignore_column << "]\n";
   str_buf << "[categorical_feature: " << categorical_feature << "]\n";
+  str_buf << "[forcedbins_filename: " << forcedbins_filename << "]\n";
+  str_buf << "[save_binary: " << save_binary << "]\n";
+  str_buf << "[num_iteration_predict: " << num_iteration_predict << "]\n";
   str_buf << "[predict_raw_score: " << predict_raw_score << "]\n";
   str_buf << "[predict_leaf_index: " << predict_leaf_index << "]\n";
   str_buf << "[predict_contrib: " << predict_contrib << "]\n";
-  str_buf << "[num_iteration_predict: " << num_iteration_predict << "]\n";
+  str_buf << "[predict_disable_shape_check: " << predict_disable_shape_check << "]\n";
   str_buf << "[pred_early_stop: " << pred_early_stop << "]\n";
   str_buf << "[pred_early_stop_freq: " << pred_early_stop_freq << "]\n";
   str_buf << "[pred_early_stop_margin: " << pred_early_stop_margin << "]\n";
-  str_buf << "[predict_disable_shape_check: " << predict_disable_shape_check << "]\n";
+  str_buf << "[output_result: " << output_result << "]\n";
   str_buf << "[convert_model_language: " << convert_model_language << "]\n";
   str_buf << "[convert_model: " << convert_model << "]\n";
   str_buf << "[num_class: " << num_class << "]\n";
...