Unverified Commit e79716e0 authored by Andrew Ziem, committed by GitHub

Correct spelling (#4250)



* Correct spelling

Most changes were in comments, and there were a few changes to literals for log output.

There were no changes to variable names, function names, IDs, or functionality.

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Clarify a phrase in a comment
Co-authored-by: James Lamb <jaylamb20@gmail.com>

* Correct spelling

Most are code comments, but one case is a literal in a logging message.

There are a few grammar fixes too.
Co-authored-by: James Lamb <jaylamb20@gmail.com>
parent bb88d92e
@@ -6,7 +6,7 @@ function Check-Output {
 }
 }
-# unify environment variable for Azure devops and AppVeyor
+# unify environment variable for Azure DevOps and AppVeyor
 if (Test-Path env:APPVEYOR) {
 $env:APPVEYOR = "true"
 }
@@ -66,7 +66,7 @@ elseif ($env:TASK -eq "sdist") {
 }
 elseif ($env:TASK -eq "bdist") {
 # Import the Chocolatey profile module so that the RefreshEnv command
-# invoked below properly updates the current PowerShell session enviroment.
+# invoked below properly updates the current PowerShell session environment.
 $module = "$env:ChocolateyInstall\helpers\chocolateyProfile.psm1"
 Import-Module "$module" ; Check-Output $?
 RefreshEnv
...
@@ -26,7 +26,7 @@ LightGBM is a gradient boosting framework that uses tree based learning algorith
 For further details, please refer to [Features](https://github.com/microsoft/LightGBM/blob/master/docs/Features.rst).
-Benefitting from these advantages, LightGBM is being widely-used in many [winning solutions](https://github.com/microsoft/LightGBM/blob/master/examples/README.md#machine-learning-challenge-winning-solutions) of machine learning competitions.
+Benefiting from these advantages, LightGBM is being widely-used in many [winning solutions](https://github.com/microsoft/LightGBM/blob/master/examples/README.md#machine-learning-challenge-winning-solutions) of machine learning competitions.
 [Comparison experiments](https://github.com/microsoft/LightGBM/blob/master/docs/Experiments.rst#comparison-experiment) on public datasets show that LightGBM can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. What's more, [distributed learning experiments](https://github.com/microsoft/LightGBM/blob/master/docs/Experiments.rst#parallel-experiment) show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.
...
@@ -8,9 +8,9 @@ Missing Value Handle
 - LightGBM uses NA (NaN) to represent missing values by default. Change it to use zero by setting ``zero_as_missing=true``.
-- When ``zero_as_missing=false`` (default), the unshown values in sparse matrices (and LightSVM) are treated as zeros.
+- When ``zero_as_missing=false`` (default), the unrecorded values in sparse matrices (and LightSVM) are treated as zeros.
-- When ``zero_as_missing=true``, NA and zeros (including unshown values in sparse matrices (and LightSVM)) are treated as missing.
+- When ``zero_as_missing=true``, NA and zeros (including unrecorded values in sparse matrices (and LightSVM)) are treated as missing.
 Categorical Feature Support
 ---------------------------
...
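
The Advanced-Topics hunk above documents the `zero_as_missing` switch. A minimal sketch of flipping it through the Python package (the synthetic data and the remaining parameters are illustrative, not part of this commit):

```python
import numpy as np
import lightgbm as lgb

# illustrative toy data with many exact zeros, as in a sparse matrix
rng = np.random.default_rng(0)
X = rng.random((500, 5))
X[X < 0.5] = 0.0
y = rng.integers(0, 2, size=500)

# zero_as_missing=True: zeros (and unrecorded sparse entries) are treated as missing;
# with the default zero_as_missing=False they are treated as the value 0
params = {"objective": "binary", "zero_as_missing": True, "verbose": -1}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=10)
```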
@@ -194,7 +194,7 @@ We used a terabyte click log dataset to conduct parallel experiments. Details ar
 +--------+-----------------------+---------+---------------+----------+
 This data contains 13 integer features and 26 categorical features for 24 days of click logs.
-We statisticized the clickthrough rate (CTR) and count for these 26 categorical features from the first ten days.
+We statisticized the click-through rate (CTR) and count for these 26 categorical features from the first ten days.
 Then we used next ten days' data, after replacing the categorical features by the corresponding CTR and count, as training data.
 The processed training data have a total of 1.7 billions records and 67 features.
...
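
The Experiments hunk describes how the 26 categorical features were replaced by per-category click-through rate (CTR) and count statistics computed on the first ten days. A hedged pandas-style sketch of that encoding (column names and toy values are made up; categories unseen in the first ten days simply map to NaN here):

```python
import pandas as pd

# toy frame: one categorical column, a binary click label, and a day index
df = pd.DataFrame({
    "day":   [1, 1, 2, 2, 11, 11, 12],
    "cat_1": ["a", "b", "a", "a", "a", "b", "c"],
    "click": [1, 0, 0, 1, 1, 0, 1],
})

# per-category CTR and count from the first ten days
stats = df[df["day"] <= 10].groupby("cat_1")["click"].agg(ctr="mean", count="size")

# training data: the next ten days, with the category replaced by its statistics
train = df[df["day"] > 10].copy()
train["cat_1_ctr"] = train["cat_1"].map(stats["ctr"])
train["cat_1_count"] = train["cat_1"].map(stats["count"])
train = train.drop(columns=["cat_1"])
print(train)
```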
@@ -187,7 +187,7 @@ LightGBM supports the following applications:
 - cross-entropy, the objective function is logloss and supports training on non-binary labels
-- lambdarank, the objective function is lambdarank with NDCG
+- LambdaRank, the objective function is LambdaRank with NDCG
 LightGBM supports the following metrics:
...
 Python-package Introduction
 ===========================
-This document gives a basic walkthrough of LightGBM Python-package.
+This document gives a basic walk-through of LightGBM Python-package.
 **List of other helpful links**
...
@@ -7,11 +7,11 @@ boosting_type = gbdt
 # application type, support following application
 # regression , regression task
 # binary , binary classification task
-# lambdarank , lambdarank task
+# lambdarank , LambdaRank task
 # alias: application, app
 objective = binary
-# eval metrics, support multi metric, delimite by ',' , support following metrics
+# eval metrics, support multi metric, delimited by ',' , support following metrics
 # l1
 # l2 , default metric for regression
 # ndcg , default metric for lambdarank
@@ -20,7 +20,7 @@ objective = binary
 # binary_error
 metric = binary_logloss,auc
-# frequence for metric output
+# frequency for metric output
 metric_freq = 1
 # true if need output metric for training data, alias: tranining_metric, train_metric
@@ -30,12 +30,12 @@ is_training_metric = true
 max_bin = 255
 # training data
-# if exsting weight file, should name to "binary.train.weight"
+# if existing weight file, should name to "binary.train.weight"
 # alias: train_data, train
 data = binary.train
 # validation data, support multi validation data, separated by ','
-# if exsting weight file, should name to "binary.test.weight"
+# if existing weight file, should name to "binary.test.weight"
 # alias: valid, test, test_data,
 valid_data = binary.test
@@ -56,7 +56,7 @@ num_leaves = 63
 # alias: tree
 tree_learner = serial
-# number of threads for multi-threading. One thread will use one CPU, defalut is setted to #cpu.
+# number of threads for multi-threading. One thread will use each CPU. The default is the CPU count.
 # num_threads = 8
 # feature sub-sample, will random select 80% feature to train on each iteration
@@ -66,7 +66,7 @@ feature_fraction = 0.8
 # Support bagging (data sub-sample), will perform bagging every 5 iterations
 bagging_freq = 5
-# Bagging farction, will random select 80% data on bagging
+# Bagging fraction, will random select 80% data on bagging
 # alias: sub_row
 bagging_fraction = 0.8
@@ -74,7 +74,7 @@ bagging_fraction = 0.8
 # alias : min_data_per_leaf, min_data
 min_data_in_leaf = 50
-# minimal sum hessians for one leaf, use this to deal with over-fit
+# minimal sum Hessians for one leaf, use this to deal with over-fit
 min_sum_hessian_in_leaf = 5.0
 # save memory and faster speed for sparse feature, alias: is_sparse
...
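
Every key in the train.conf hunks above is an ordinary LightGBM training parameter, so the same configuration can be expressed through the Python API. A hedged sketch (the data files are the ones named in the config; nothing else here is taken from this commit):

```python
import lightgbm as lgb

# lgb.Dataset also accepts the text files referenced by the config ("data" / "valid_data")
train_set = lgb.Dataset("binary.train")
valid_set = lgb.Dataset("binary.test", reference=train_set)

# keys mirror the config hunks above, with the values shown there
params = {
    "objective": "binary",
    "metric": ["binary_logloss", "auc"],
    "metric_freq": 1,                # frequency for metric output
    "max_bin": 255,
    "num_leaves": 63,
    "tree_learner": "serial",
    "feature_fraction": 0.8,         # feature sub-sample per iteration
    "bagging_freq": 5,               # perform bagging every 5 iterations
    "bagging_fraction": 0.8,
    "min_data_in_leaf": 50,
    "min_sum_hessian_in_leaf": 5.0,  # minimal sum of Hessians per leaf
}

booster = lgb.train(params, train_set, valid_sets=[valid_set])
```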
@@ -7,13 +7,13 @@ boosting_type = gbdt
 # application type, support following application
 # regression , regression task
 # binary , binary classification task
-# lambdarank , lambdarank task
+# lambdarank , LambdaRank task
 # alias: application, app
 objective = binary
 linear_tree = true
-# eval metrics, support multi metric, delimite by ',' , support following metrics
+# eval metrics, support multi metric, delimited by ',' , support following metrics
 # l1
 # l2 , default metric for regression
 # ndcg , default metric for lambdarank
@@ -22,7 +22,7 @@ linear_tree = true
 # binary_error
 metric = binary_logloss,auc
-# frequence for metric output
+# frequency for metric output
 metric_freq = 1
 # true if need output metric for training data, alias: tranining_metric, train_metric
@@ -32,12 +32,12 @@ is_training_metric = true
 max_bin = 255
 # training data
-# if exsting weight file, should name to "binary.train.weight"
+# if existing weight file, should name to "binary.train.weight"
 # alias: train_data, train
 data = binary.train
 # validation data, support multi validation data, separated by ','
-# if exsting weight file, should name to "binary.test.weight"
+# if existing weight file, should name to "binary.test.weight"
 # alias: valid, test, test_data,
 valid_data = binary.test
@@ -58,7 +58,7 @@ num_leaves = 63
 # alias: tree
 tree_learner = serial
-# number of threads for multi-threading. One thread will use one CPU, defalut is setted to #cpu.
+# number of threads for multi-threading. One thread will use each CPU. The default is set to CPU count.
 # num_threads = 8
 # feature sub-sample, will random select 80% feature to train on each iteration
@@ -68,7 +68,7 @@ feature_fraction = 0.8
 # Support bagging (data sub-sample), will perform bagging every 5 iterations
 bagging_freq = 5
-# Bagging farction, will random select 80% data on bagging
+# Bagging fraction, will random select 80% data on bagging
 # alias: sub_row
 bagging_fraction = 0.8
@@ -76,7 +76,7 @@ bagging_fraction = 0.8
 # alias : min_data_per_leaf, min_data
 min_data_in_leaf = 50
-# minimal sum hessians for one leaf, use this to deal with over-fit
+# minimal sum Hessians for one leaf, use this to deal with over-fit
 min_sum_hessian_in_leaf = 5.0
 # save memory and faster speed for sparse feature, alias: is_sparse
...
 LambdaRank Example
 ==================
-Here is an example for LightGBM to run lambdarank task.
+Here is an example for LightGBM to run LambdaRank task.
 ***You must follow the [installation instructions](https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html)
 for the following commands to work. The `lightgbm` binary must be built and available at the root of this project.***
...
@@ -7,11 +7,11 @@ boosting_type = gbdt
 # application type, support following application
 # regression , regression task
 # binary , binary classification task
-# lambdarank , lambdarank task
+# lambdarank , LambdaRank task
 # alias: application, app
 objective = lambdarank
-# eval metrics, support multi metric, delimite by ',' , support following metrics
+# eval metrics, support multi metric, delimited by ',' , support following metrics
 # l1
 # l2 , default metric for regression
 # ndcg , default metric for lambdarank
@@ -23,7 +23,7 @@ metric = ndcg
 # evaluation position for ndcg metric, alias : ndcg_at
 ndcg_eval_at = 1,3,5
-# frequence for metric output
+# frequency for metric output
 metric_freq = 1
 # true if need output metric for training data, alias: tranining_metric, train_metric
@@ -33,14 +33,14 @@ is_training_metric = true
 max_bin = 255
 # training data
-# if exsting weight file, should name to "rank.train.weight"
+# if existing weight file, should name to "rank.train.weight"
-# if exsting query file, should name to "rank.train.query"
+# if existing query file, should name to "rank.train.query"
 # alias: train_data, train
 data = rank.train
 # validation data, support multi validation data, separated by ','
-# if exsting weight file, should name to "rank.test.weight"
+# if existing weight file, should name to "rank.test.weight"
-# if exsting query file, should name to "rank.test.query"
+# if existing query file, should name to "rank.test.query"
 # alias: valid, test, test_data,
 valid_data = rank.test
@@ -71,7 +71,7 @@ feature_fraction = 1.0
 # Support bagging (data sub-sample), will perform bagging every 5 iterations
 bagging_freq = 1
-# Bagging farction, will random select 80% data on bagging
+# Bagging fraction, will random select 80% data on bagging
 # alias: sub_row
 bagging_fraction = 0.9
@@ -79,7 +79,7 @@ bagging_fraction = 0.9
 # alias : min_data_per_leaf, min_data
 min_data_in_leaf = 50
-# minimal sum hessians for one leaf, use this to deal with over-fit
+# minimal sum Hessians for one leaf, use this to deal with over-fit
 min_sum_hessian_in_leaf = 5.0
 # save memory and faster speed for sparse feature, alias: is_sparse
...
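
The LambdaRank config above needs query boundaries (the rank.train.query file) in addition to labels; in the Python package the same information is passed as a group array. A hedged sketch with synthetic queries (query sizes, features, and relevance grades are made up for illustration):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)

# 30 synthetic queries of 20 documents each; labels are relevance grades 0-4
group = np.full(30, 20)
X = rng.random((group.sum(), 10))
y = rng.integers(0, 5, size=group.sum())

train_set = lgb.Dataset(X, label=y, group=group)

params = {
    "objective": "lambdarank",
    "metric": "ndcg",
    "ndcg_eval_at": [1, 3, 5],   # evaluation positions, as in the config above
    "verbose": -1,
}
ranker = lgb.train(params, train_set, num_boost_round=30)
```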
@@ -7,12 +7,12 @@ boosting_type = gbdt
 # application type, support following application
 # regression , regression task
 # binary , binary classification task
-# lambdarank , lambdarank task
+# lambdarank , LambdaRank task
 # multiclass
 # alias: application, app
 objective = multiclass
-# eval metrics, support multi metric, delimite by ',' , support following metrics
+# eval metrics, support multi metric, delimited by ',' , support following metrics
 # l1
 # l2 , default metric for regression
 # ndcg , default metric for lambdarank
@@ -35,7 +35,7 @@ auc_mu_weights = 0,1,2,3,4,5,0,6,7,8,9,10,0,11,12,13,14,15,0,16,17,18,19,20,0
 # number of class, for multiclass classification
 num_class = 5
-# frequence for metric output
+# frequency for metric output
 metric_freq = 1
 # true if need output metric for training data, alias: tranining_metric, train_metric
@@ -45,7 +45,7 @@ is_training_metric = true
 max_bin = 255
 # training data
-# if exsting weight file, should name to "regression.train.weight"
+# if existing weight file, should name to "regression.train.weight"
 # alias: train_data, train
 data = multiclass.train
...
@@ -7,7 +7,7 @@ boosting_type = gbdt
 # application type, support following application
 # regression , regression task
 # binary , binary classification task
-# lambdarank , lambdarank task
+# lambdarank , LambdaRank task
 # alias: application, app
 objective = binary
@@ -20,7 +20,7 @@ objective = binary
 # binary_error
 metric = binary_logloss,auc
-# frequence for metric output
+# frequency for metric output
 metric_freq = 1
 # true if need output metric for training data, alias: tranining_metric, train_metric
@@ -30,12 +30,12 @@ is_training_metric = true
 max_bin = 255
 # training data
-# if exsting weight file, should name to "binary.train.weight"
+# if existing weight file, should name to "binary.train.weight"
 # alias: train_data, train
 data = binary.train
 # validation data, support multi validation data, separated by ','
-# if exsting weight file, should name to "binary.test.weight"
+# if existing weight file, should name to "binary.test.weight"
 # alias: valid, test, test_data,
 valid_data = binary.test
@@ -56,7 +56,7 @@ num_leaves = 63
 # alias: tree
 tree_learner = feature
-# number of threads for multi-threading. One thread will use one CPU, defalut is setted to #cpu.
+# number of threads for multi-threading. One thread will use each CPU. The default is the CPU count.
 # num_threads = 8
 # feature sub-sample, will random select 80% feature to train on each iteration
@@ -66,7 +66,7 @@ feature_fraction = 0.8
 # Support bagging (data sub-sample), will perform bagging every 5 iterations
 bagging_freq = 5
-# Bagging farction, will random select 80% data on bagging
+# Bagging fraction, will random select 80% data on bagging
 # alias: sub_row
 bagging_fraction = 0.8
@@ -74,7 +74,7 @@ bagging_fraction = 0.8
 # alias : min_data_per_leaf, min_data
 min_data_in_leaf = 50
-# minimal sum hessians for one leaf, use this to deal with over-fit
+# minimal sum Hessians for one leaf, use this to deal with over-fit
 min_sum_hessian_in_leaf = 5.0
 # save memory and faster speed for sparse feature, alias: is_sparse
...
@@ -93,7 +93,7 @@ with open('model.pkl', 'rb') as fin:
 # can predict with any iteration when loaded in pickle way
 y_pred = pkl_bst.predict(X_test, num_iteration=7)
 # eval with loaded model
-print("The rmse of pickled model's prediction is:", mean_squared_error(y_test, y_pred) ** 0.5)
+print("The RMSE of pickled model's prediction is:", mean_squared_error(y_test, y_pred) ** 0.5)
 # continue training
 # init_model accepts:
@@ -146,7 +146,7 @@ def loglikelihood(preds, train_data):
 # f(preds: array, train_data: Dataset) -> name: string, eval_result: float, is_higher_better: bool
 # binary error
 # NOTE: when you do customized loss function, the default prediction value is margin
-# This may make built-in evalution metric calculate wrong results
+# This may make built-in evaluation metric calculate wrong results
 # For example, we are doing log likelihood loss, the prediction is score before logistic transformation
 # Keep this in mind when you use the customization
 def binary_error(preds, train_data):
@@ -170,7 +170,7 @@ print('Finished 40 - 50 rounds with self-defined objective function and eval met
 # f(preds: array, train_data: Dataset) -> name: string, eval_result: float, is_higher_better: bool
 # accuracy
 # NOTE: when you do customized loss function, the default prediction value is margin
-# This may make built-in evalution metric calculate wrong results
+# This may make built-in evaluation metric calculate wrong results
 # For example, we are doing log likelihood loss, the prediction is score before logistic transformation
 # Keep this in mind when you use the customization
 def accuracy(preds, train_data):
...
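
The NOTE lines in this hunk carry the important caveat: when a customized objective is used, the values handed to a custom evaluation function are raw margins rather than probabilities, so the callback must apply the logistic transformation itself. A hedged, stand-alone sketch of a callback in the (name, value, is_higher_better) shape these comments describe (the margins and labels are made up; this is not the repository's advanced_example.py):

```python
import numpy as np
import lightgbm as lgb

def binary_error(preds, eval_data):
    """Custom eval: with a custom objective, preds are raw margins,
    so apply the sigmoid before thresholding."""
    y_true = eval_data.get_label()
    prob = 1.0 / (1.0 + np.exp(-preds))
    return "error", float(np.mean((prob > 0.5) != y_true)), False

# exercise the callback directly with made-up margins and labels
labels = np.array([0.0, 1.0, 1.0, 0.0])
margins = np.array([-2.0, 1.5, -0.3, 0.4])          # raw scores, not probabilities
data = lgb.Dataset(np.zeros((4, 1)), label=labels)
print(binary_error(margins, data))                  # ('error', 0.5, False)
```

In training, the same function would be passed to lgb.train through its feval argument alongside the custom objective.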
@@ -7,7 +7,7 @@ boosting_type = gbdt
 # application type, support following application
 # regression , regression task
 # binary , binary classification task
-# lambdarank , lambdarank task
+# lambdarank , LambdaRank task
 # alias: application, app
 objective = rank_xendcg
@@ -23,7 +23,7 @@ metric = ndcg
 # evaluation position for ndcg metric, alias : ndcg_at
 ndcg_eval_at = 1,3,5
-# frequence for metric output
+# frequency for metric output
 metric_freq = 1
 # true if need output metric for training data, alias: tranining_metric, train_metric
@@ -33,14 +33,14 @@ is_training_metric = true
 max_bin = 255
 # training data
-# if exsting weight file, should name to "rank.train.weight"
+# if existing weight file, should name to "rank.train.weight"
-# if exsting query file, should name to "rank.train.query"
+# if existing query file, should name to "rank.train.query"
 # alias: train_data, train
 data = rank.train
 # validation data, support multi validation data, separated by ','
-# if exsting weight file, should name to "rank.test.weight"
+# if existing weight file, should name to "rank.test.weight"
-# if exsting query file, should name to "rank.test.query"
+# if existing query file, should name to "rank.test.query"
 # alias: valid, test, test_data,
 valid_data = rank.test
@@ -72,7 +72,7 @@ feature_fraction = 1.0
 # Support bagging (data sub-sample), will perform bagging every 5 iterations
 bagging_freq = 1
-# Bagging farction, will random select 80% data on bagging
+# Bagging fraction, will random select 80% data on bagging
 # alias: sub_row
 bagging_fraction = 0.9
@@ -80,7 +80,7 @@ bagging_fraction = 0.9
 # alias : min_data_per_leaf, min_data
 min_data_in_leaf = 50
-# minimal sum hessians for one leaf, use this to deal with over-fit
+# minimal sum Hessians for one leaf, use this to deal with over-fit
 min_sum_hessian_in_leaf = 5.0
 # save memory and faster speed for sparse feature, alias: is_sparse
...
@@ -27,12 +27,12 @@ namespace LightGBM {
 class DatasetLoader;
 /*!
 * \brief This class is used to store some meta(non-feature) data for training data,
-* e.g. labels, weights, initial scores, query level informations.
+* e.g. labels, weights, initial scores, query level information.
 *
 * Some details:
 * 1. Label, used for training.
 * 2. Weights, weighs of records, optional
-* 3. Query Boundaries, necessary for lambdarank.
+* 3. Query Boundaries, necessary for LambdaRank.
 * The documents of i-th query is in [ query_boundaries[i], query_boundaries[i+1] )
 * 4. Query Weights, auto calculate by weights and query_boundaries(if both of them are existed)
 * the weight for i-th query is sum(query_boundaries[i] , .., query_boundaries[i+1]) / (query_boundaries[i + 1] - query_boundaries[i+1])
@@ -45,7 +45,7 @@ class Metadata {
 */
 Metadata();
 /*!
-* \brief Initialization will load query level informations, since it is need for sampling data
+* \brief Initialization will load query level information, since it is need for sampling data
 * \param data_filename Filename of data
 */
 void Init(const char* data_filename);
@@ -611,7 +611,7 @@ class Dataset {
 // replace ' ' in feature_names with '_'
 bool spaceInFeatureName = false;
 for (auto& feature_name : feature_names_) {
-// check json
+// check JSON
 if (!Common::CheckAllowedJSON(feature_name)) {
 Log::Fatal("Do not support special JSON characters in feature name.");
 }
@@ -625,7 +625,7 @@ class Dataset {
 feature_name_set.insert(feature_name);
 }
 if (spaceInFeatureName) {
-Log::Warning("Find whitespaces in feature_names, replace with underlines");
+Log::Warning("Found whitespace in feature_names, replace with underlines");
 }
 }
...
@@ -105,14 +105,14 @@ class DCGCalculator {
 /*!
-* \brief Check the metadata for NDCG and lambdarank
+* \brief Check the metadata for NDCG and LambdaRank
 * \param metadata Metadata
 * \param num_queries Number of queries
 */
 static void CheckMetadata(const Metadata& metadata, data_size_t num_queries);
 /*!
-* \brief Check the label range for NDCG and lambdarank
+* \brief Check the label range for NDCG and LambdaRank
 * \param label Pointer of label
 * \param num_data Number of data
 */
...
@@ -128,7 +128,7 @@ class Network {
 const ReduceFunction& reducer);
 /*!
-* \brief Performing all_gather by using bruck algorithm.
+* \brief Performing all_gather by using Bruck algorithm.
 Communication times is O(log(n)), and communication cost is O(send_size * number_machine)
 * It can be used when all nodes have same input size.
 * \param input Input data
@@ -138,7 +138,7 @@ class Network {
 static void Allgather(char* input, comm_size_t send_size, char* output);
 /*!
-* \brief Performing all_gather by using bruck algorithm.
+* \brief Performing all_gather by using Bruck algorithm.
 Communication times is O(log(n)), and communication cost is O(all_size)
 * It can be used when nodes have different input size.
 * \param input Input data
...
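
The Network header above only states the complexity of the Bruck-style all-gather. As a rough, single-process simulation of that communication pattern (an illustration of the textbook algorithm, not LightGBM's C++ implementation), each of the O(log2 p) rounds doubles how many blocks every node holds, and a final rotation restores global order:

```python
def bruck_allgather(blocks):
    """Simulate a Bruck-style all-gather: blocks[i] is node i's input block;
    every node ends up with all p blocks after about log2(p) rounds."""
    p = len(blocks)
    buffers = [[b] for b in blocks]          # each node starts with its own block
    step = 1
    while step < p:                          # O(log2 p) communication rounds
        count = min(step, p - step)          # blocks exchanged in this round
        incoming = [buffers[(i + step) % p][:count] for i in range(p)]
        for i in range(p):
            buffers[i].extend(incoming[i])   # node i receives from node (i + step) % p
        step *= 2
    # node i now holds blocks i, i+1, ..., i+p-1 (mod p); rotate into global order
    return [buf[-i:] + buf[:-i] if i else buf for i, buf in enumerate(buffers)]


print(bruck_allgather(["a", "b", "c", "d", "e"]))   # five copies of ['a', ..., 'e']
```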
@@ -68,7 +68,7 @@ class ThreadExceptionHelper {
 #else
 /*
-* To be compatible with openmp, define a nothrow macro which is used by gcc
+* To be compatible with OpenMP, define a nothrow macro which is used by gcc
 * openmp, but not by clang.
 * See also https://github.com/dmlc/dmlc-core/blob/3106c1cbdcc9fc9ef3a2c1d2196a7a6f6616c13d/include/dmlc/omp.h#L14
 */
...
@@ -2799,7 +2799,7 @@ class Booster:
 eval_data : Dataset
 The evaluation dataset.
 eval_name : string
-The name of evaluation function (without whitespaces).
+The name of evaluation function (without whitespace).
 eval_result : float
 The eval result.
 is_higher_better : bool
@@ -2847,7 +2847,7 @@ class Booster:
 train_data : Dataset
 The training dataset.
 eval_name : string
-The name of evaluation function (without whitespaces).
+The name of evaluation function (without whitespace).
 eval_result : float
 The eval result.
 is_higher_better : bool
@@ -2880,7 +2880,7 @@ class Booster:
 valid_data : Dataset
 The validation dataset.
 eval_name : string
-The name of evaluation function (without whitespaces).
+The name of evaluation function (without whitespace).
 eval_result : float
 The eval result.
 is_higher_better : bool
...
@@ -452,7 +452,7 @@ def _train(
 model = results[0]
 # if network parameters were changed during training, remove them from the
-# returned moodel so that they're generated dynamically on every run based
+# returned model so that they're generated dynamically on every run based
 # on the Dask cluster you're connected to and which workers have pieces of
 # the training data
 if not listen_port_in_params:
...
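
The dask.py comment above explains why cluster-specific network parameters are stripped from the returned model: the machine list and listen ports are re-derived from whatever Dask cluster the next call runs against. A hedged usage sketch of the distributed estimators (assuming lightgbm's optional Dask dependencies are installed; the cluster and data are illustrative):

```python
import dask.array as da
from distributed import Client, LocalCluster
import lightgbm as lgb

if __name__ == "__main__":
    # a throwaway local cluster; in practice this is an existing Dask cluster
    cluster = LocalCluster(n_workers=2)
    client = Client(cluster)

    # chunked data living on the workers
    X = da.random.random((1_000, 10), chunks=(250, 10))
    y = da.random.random((1_000,), chunks=(250,))

    # machines and ports are negotiated at fit time and, per the comment above,
    # are not baked into the returned model
    model = lgb.DaskLGBMRegressor(n_estimators=20)
    model.fit(X, y)
    preds = model.predict(X).compute()

    client.close()
    cluster.close()
```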