Unverified commit 4278f222 authored by CharlesAuguste, committed by GitHub

Pr4 advanced method monotone constraints (#3264)



* No need to pass the tree to all functions related to monotone constraints because the pointer is shared.

* Fix OppositeChildShouldBeUpdated numerical split optimisation.

* No need to use constraints when computing the output of the root.

* Refactor existing constraints.

* Add advanced constraints method.

* Update tests.

* Add override.

* linting.

* Add override.

* Simplify condition in LeftRightContainsRelevantInformation.

* Add virtual destructor to FeatureConstraint.

* Remove redundant blank line.

* linting of else.

* Indentation.

* Lint else.

* Replaced non-const reference by pointers.

* Forgotten reference.

* Leverage USE_MC for efficiency.

* Make constraints const again in feature_histogram.hpp.

* Update docs.

* Add "advanced" to the monotone constraints options.

* Update monotone constraints restrictions.

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove superfluous parenthesis.

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove std namespace qualifier.

* Fix unsigned_int size_t comparison.

* Set num_features as int for consistency with the rest of the codebase.

* Make sure constraints exist before recomputing them.

* Initialize previous constraints in UpdateConstraints.

* Update monotone constraints restrictions.

* Refactor UpdateConstraints loop.

* Update src/io/config.cpp
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Delete white spaces.
Co-authored-by: Charles Auguste <charles.auguste@sig.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
parent 3454698e
......@@ -462,7 +462,7 @@ Learning Control Parameters
- you need to specify all features in order. For example, ``mc=-1,0,1`` means decreasing for 1st feature, non-constraint for 2nd feature and increasing for the 3rd feature
- ``monotone_constraints_method`` :raw-html:`<a id="monotone_constraints_method" title="Permalink to this parameter" href="#monotone_constraints_method">&#x1F517;&#xFE0E;</a>`, default = ``basic``, type = enum, options: ``basic``, ``intermediate``, aliases: ``monotone_constraining_method``, ``mc_method``
- ``monotone_constraints_method`` :raw-html:`<a id="monotone_constraints_method" title="Permalink to this parameter" href="#monotone_constraints_method">&#x1F517;&#xFE0E;</a>`, default = ``basic``, type = enum, options: ``basic``, ``intermediate``, ``advanced``, aliases: ``monotone_constraining_method``, ``mc_method``
- used only if ``monotone_constraints`` is set
......@@ -472,6 +472,8 @@ Learning Control Parameters
- ``intermediate``, a `more advanced method <https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf>`__, which may slow the library very slightly. However, this method is much less constraining than the basic method and should significantly improve the results
- ``advanced``, an `even more advanced method <https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf>`__, which may slow the library. However, this method is even less constraining than the intermediate method and should again significantly improve the results
- ``monotone_penalty`` :raw-html:`<a id="monotone_penalty" title="Permalink to this parameter" href="#monotone_penalty">&#x1F517;&#xFE0E;</a>`, default = ``0.0``, type = double, aliases: ``monotone_splits_penalty``, ``ms_penalty``, ``mc_penalty``, constraints: ``monotone_penalty >= 0.0``
- used only if ``monotone_constraints`` is set
......
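The docs hunk above describes the ``monotone_constraints`` format (``mc=-1,0,1``: one of ``-1``/``0``/``1`` per feature). As a minimal sketch of how such a spec decomposes, here is a hypothetical parser (LightGBM parses this internally; the helper name is illustrative, not part of its API):

```python
def parse_monotone_constraints(spec):
    """Parse a constraint string like "-1,0,1" into a list of ints:
    -1 = decreasing, 0 = unconstrained, 1 = increasing, one per feature."""
    values = [int(tok) for tok in spec.split(",")]
    if any(v not in (-1, 0, 1) for v in values):
        raise ValueError("each constraint must be -1, 0 or 1")
    return values
```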
......@@ -443,11 +443,12 @@ struct Config {
// type = enum
// alias = monotone_constraining_method, mc_method
// options = basic, intermediate
// options = basic, intermediate, advanced
// desc = used only if ``monotone_constraints`` is set
// desc = monotone constraints method
// descl2 = ``basic``, the most basic monotone constraints method. It does not slow the library at all, but over-constrains the predictions
// descl2 = ``intermediate``, a `more advanced method <https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf>`__, which may slow the library very slightly. However, this method is much less constraining than the basic method and should significantly improve the results
// descl2 = ``advanced``, an `even more advanced method <https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf>`__, which may slow the library. However, this method is even less constraining than the intermediate method and should again significantly improve the results
std::string monotone_constraints_method = "basic";
// alias = monotone_splits_penalty, ms_penalty, mc_penalty
......
......@@ -345,15 +345,15 @@ void Config::CheckParamConflict() {
min_data_in_leaf = 2;
Log::Warning("min_data_in_leaf has been increased to 2 because this is required when path smoothing is active.");
}
if (is_parallel && monotone_constraints_method == std::string("intermediate")) {
if (is_parallel && (monotone_constraints_method == std::string("intermediate") || monotone_constraints_method == std::string("advanced"))) {
// In distributed mode, local node doesn't have histograms on all features, cannot perform "intermediate" monotone constraints.
Log::Warning("Cannot use \"intermediate\" monotone constraints in parallel learning, auto set to \"basic\" method.");
Log::Warning("Cannot use \"intermediate\" or \"advanced\" monotone constraints in parallel learning, auto set to \"basic\" method.");
monotone_constraints_method = "basic";
}
if (feature_fraction_bynode != 1.0 && monotone_constraints_method == std::string("intermediate")) {
if (feature_fraction_bynode != 1.0 && (monotone_constraints_method == std::string("intermediate") || monotone_constraints_method == std::string("advanced"))) {
// "intermediate" monotone constraints need to recompute splits. If the features are sampled when computing the
// split initially, then the sampling needs to be recorded or done once again, which is currently not supported
Log::Warning("Cannot use \"intermediate\" monotone constraints with feature fraction different from 1, auto set monotone constraints to \"basic\" method.");
Log::Warning("Cannot use \"intermediate\" or \"advanced\" monotone constraints with feature fraction different from 1, auto set monotone constraints to \"basic\" method.");
monotone_constraints_method = "basic";
}
if (max_depth > 0 && monotone_penalty >= max_depth) {
......
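The ``CheckParamConflict`` changes above extend two existing fallbacks to the new ``advanced`` method: both ``intermediate`` and ``advanced`` revert to ``basic`` under distributed learning or per-node feature sampling. A sketch of that decision logic in Python (stand-in function, not the LightGBM API):

```python
def resolve_mc_method(method, is_parallel, feature_fraction_bynode):
    """Mirror the fallback in Config::CheckParamConflict: the methods that
    need full local histograms and reproducible splits fall back to "basic"."""
    if method in ("intermediate", "advanced"):
        # Distributed nodes lack histograms for all features, and bynode
        # feature sampling would have to be replayed when recomputing splits.
        if is_parallel or feature_fraction_bynode != 1.0:
            return "basic"
    return method
```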
......@@ -84,7 +84,7 @@ class FeatureHistogram {
void FindBestThreshold(double sum_gradient, double sum_hessian,
data_size_t num_data,
const ConstraintEntry& constraints,
const FeatureConstraint* constraints,
double parent_output,
SplitInfo* output) {
output->default_left = true;
......@@ -158,7 +158,7 @@ class FeatureHistogram {
#define TEMPLATE_PREFIX USE_RAND, USE_MC, USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING
#define LAMBDA_ARGUMENTS \
double sum_gradient, double sum_hessian, data_size_t num_data, \
const ConstraintEntry &constraints, double parent_output, SplitInfo *output
const FeatureConstraint* constraints, double parent_output, SplitInfo *output
#define BEFORE_ARGUMENTS sum_gradient, sum_hessian, parent_output, num_data, output, &rand_threshold
#define FUNC_ARGUMENTS \
sum_gradient, sum_hessian, num_data, constraints, min_gain_shift, \
......@@ -278,7 +278,7 @@ class FeatureHistogram {
void FindBestThresholdCategoricalInner(double sum_gradient,
double sum_hessian,
data_size_t num_data,
const ConstraintEntry& constraints,
const FeatureConstraint* constraints,
double parent_output,
SplitInfo* output) {
is_splittable_ = false;
......@@ -288,6 +288,9 @@ class FeatureHistogram {
double best_sum_left_gradient = 0;
double best_sum_left_hessian = 0;
double gain_shift;
if (USE_MC) {
constraints->InitCumulativeConstraints(true);
}
if (USE_SMOOTHING) {
gain_shift = GetLeafGainGivenOutput<USE_L1>(
sum_gradient, sum_hessian, meta_->config->lambda_l1, meta_->config->lambda_l2, parent_output);
......@@ -474,14 +477,14 @@ class FeatureHistogram {
output->left_output = CalculateSplittedLeafOutput<USE_MC, USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>(
best_sum_left_gradient, best_sum_left_hessian,
meta_->config->lambda_l1, l2, meta_->config->max_delta_step,
constraints, meta_->config->path_smooth, best_left_count, parent_output);
constraints->LeftToBasicConstraint(), meta_->config->path_smooth, best_left_count, parent_output);
output->left_count = best_left_count;
output->left_sum_gradient = best_sum_left_gradient;
output->left_sum_hessian = best_sum_left_hessian - kEpsilon;
output->right_output = CalculateSplittedLeafOutput<USE_MC, USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>(
sum_gradient - best_sum_left_gradient,
sum_hessian - best_sum_left_hessian, meta_->config->lambda_l1, l2,
meta_->config->max_delta_step, constraints, meta_->config->path_smooth,
meta_->config->max_delta_step, constraints->RightToBasicConstraint(), meta_->config->path_smooth,
num_data - best_left_count, parent_output);
output->right_count = num_data - best_left_count;
output->right_sum_gradient = sum_gradient - best_sum_left_gradient;
......@@ -763,7 +766,7 @@ class FeatureHistogram {
template <bool USE_MC, bool USE_L1, bool USE_MAX_OUTPUT, bool USE_SMOOTHING>
static double CalculateSplittedLeafOutput(
double sum_gradients, double sum_hessians, double l1, double l2,
double max_delta_step, const ConstraintEntry& constraints,
double max_delta_step, const BasicConstraint& constraints,
double smoothing, data_size_t num_data, double parent_output) {
double ret = CalculateSplittedLeafOutput<USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>(
sum_gradients, sum_hessians, l1, l2, max_delta_step, smoothing, num_data, parent_output);
......@@ -784,7 +787,7 @@ class FeatureHistogram {
double sum_right_gradients,
double sum_right_hessians, double l1, double l2,
double max_delta_step,
const ConstraintEntry& constraints,
const FeatureConstraint* constraints,
int8_t monotone_constraint,
double smoothing,
data_size_t left_count,
......@@ -803,11 +806,11 @@ class FeatureHistogram {
double left_output =
CalculateSplittedLeafOutput<USE_MC, USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>(
sum_left_gradients, sum_left_hessians, l1, l2, max_delta_step,
constraints, smoothing, left_count, parent_output);
constraints->LeftToBasicConstraint(), smoothing, left_count, parent_output);
double right_output =
CalculateSplittedLeafOutput<USE_MC, USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>(
sum_right_gradients, sum_right_hessians, l1, l2, max_delta_step,
constraints, smoothing, right_count, parent_output);
constraints->RightToBasicConstraint(), smoothing, right_count, parent_output);
if (((monotone_constraint > 0) && (left_output > right_output)) ||
((monotone_constraint < 0) && (left_output < right_output))) {
return 0;
......@@ -854,7 +857,7 @@ class FeatureHistogram {
bool REVERSE, bool SKIP_DEFAULT_BIN, bool NA_AS_MISSING>
void FindBestThresholdSequentially(double sum_gradient, double sum_hessian,
data_size_t num_data,
const ConstraintEntry& constraints,
const FeatureConstraint* constraints,
double min_gain_shift, SplitInfo* output,
int rand_threshold, double parent_output) {
const int8_t offset = meta_->offset;
......@@ -864,6 +867,16 @@ class FeatureHistogram {
data_size_t best_left_count = 0;
uint32_t best_threshold = static_cast<uint32_t>(meta_->num_bin);
const double cnt_factor = num_data / sum_hessian;
BasicConstraint best_right_constraints;
BasicConstraint best_left_constraints;
bool constraint_update_necessary =
USE_MC && constraints->ConstraintDifferentDependingOnThreshold();
if (USE_MC) {
constraints->InitCumulativeConstraints(REVERSE);
}
if (REVERSE) {
double sum_right_gradient = 0.0f;
double sum_right_hessian = kEpsilon;
......@@ -910,6 +923,11 @@ class FeatureHistogram {
continue;
}
}
if (USE_MC && constraint_update_necessary) {
constraints->Update(t + offset);
}
// current split gain
double current_gain = GetSplitGains<USE_MC, USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>(
sum_left_gradient, sum_left_hessian, sum_right_gradient,
......@@ -932,6 +950,10 @@ class FeatureHistogram {
// left is <= threshold, right is > threshold. so this is t-1
best_threshold = static_cast<uint32_t>(t - 1 + offset);
best_gain = current_gain;
if (USE_MC) {
best_right_constraints = constraints->RightToBasicConstraint();
best_left_constraints = constraints->LeftToBasicConstraint();
}
}
}
} else {
......@@ -1016,6 +1038,10 @@ class FeatureHistogram {
best_sum_left_hessian = sum_left_hessian;
best_threshold = static_cast<uint32_t>(t + offset);
best_gain = current_gain;
if (USE_MC) {
best_right_constraints = constraints->RightToBasicConstraint();
best_left_constraints = constraints->LeftToBasicConstraint();
}
}
}
}
......@@ -1027,7 +1053,7 @@ class FeatureHistogram {
CalculateSplittedLeafOutput<USE_MC, USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>(
best_sum_left_gradient, best_sum_left_hessian,
meta_->config->lambda_l1, meta_->config->lambda_l2,
meta_->config->max_delta_step, constraints, meta_->config->path_smooth,
meta_->config->max_delta_step, best_left_constraints, meta_->config->path_smooth,
best_left_count, parent_output);
output->left_count = best_left_count;
output->left_sum_gradient = best_sum_left_gradient;
......@@ -1037,7 +1063,7 @@ class FeatureHistogram {
sum_gradient - best_sum_left_gradient,
sum_hessian - best_sum_left_hessian, meta_->config->lambda_l1,
meta_->config->lambda_l2, meta_->config->max_delta_step,
constraints, meta_->config->path_smooth, num_data - best_left_count,
best_right_constraints, meta_->config->path_smooth, num_data - best_left_count,
parent_output);
output->right_count = num_data - best_left_count;
output->right_sum_gradient = sum_gradient - best_sum_left_gradient;
......@@ -1053,7 +1079,7 @@ class FeatureHistogram {
hist_t* data_;
bool is_splittable_ = true;
std::function<void(double, double, data_size_t, const ConstraintEntry&,
std::function<void(double, double, data_size_t, const FeatureConstraint*,
double, SplitInfo*)>
find_best_threshold_fun_;
};
......
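The key pattern in the ``FindBestThresholdSequentially`` changes: when ``constraints->ConstraintDifferentDependingOnThreshold()`` holds, the constraints are updated per candidate threshold, and the left/right ``BasicConstraint`` snapshots are saved at the moment a new best gain is found, then used to clamp the final leaf outputs. A simplified sketch of that scan (``gains`` and ``constraint_at`` are hypothetical stand-ins for the histogram loop and the updated constraint state):

```python
def scan_thresholds(gains, constraint_at):
    """Track the best gain and snapshot the constraint state at that
    threshold, rather than applying one global constraint afterwards."""
    best_gain, best_t, best_constraints = float("-inf"), None, None
    for t, gain in enumerate(gains):
        # In the real code, constraints->Update(t + offset) runs here when
        # the constraint depends on the threshold.
        if gain > best_gain:
            best_gain, best_t, best_constraints = gain, t, constraint_at(t)
    return best_t, best_constraints
```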
......@@ -46,7 +46,7 @@ void SerialTreeLearner::Init(const Dataset* train_data, bool is_constant_hessian
// push split information for all leaves
best_split_per_leaf_.resize(config_->num_leaves);
constraints_.reset(LeafConstraintsBase::Create(config_, config_->num_leaves));
constraints_.reset(LeafConstraintsBase::Create(config_, config_->num_leaves, train_data_->num_features()));
// initialize splits for leaf
smaller_leaf_splits_.reset(new LeafSplits(train_data_->num_data()));
......@@ -146,7 +146,7 @@ void SerialTreeLearner::ResetConfig(const Config* config) {
}
cegb_->Init();
}
constraints_.reset(LeafConstraintsBase::Create(config_, config_->num_leaves));
constraints_.reset(LeafConstraintsBase::Create(config_, config_->num_leaves, train_data_->num_features()));
}
Tree* SerialTreeLearner::Train(const score_t* gradients, const score_t *hessians) {
......@@ -561,7 +561,7 @@ void SerialTreeLearner::SplitInner(Tree* tree, int best_leaf, int* left_leaf,
auto next_leaf_id = tree->NextLeafId();
// update before tree split
constraints_->BeforeSplit(tree, best_leaf, next_leaf_id,
constraints_->BeforeSplit(best_leaf, next_leaf_id,
best_split_info.monotone_type);
bool is_numerical_split =
......@@ -657,7 +657,7 @@ void SerialTreeLearner::SplitInner(Tree* tree, int best_leaf, int* left_leaf,
best_split_info.left_output);
}
auto leaves_need_update = constraints_->Update(
tree, is_numerical_split, *left_leaf, *right_leaf,
is_numerical_split, *left_leaf, *right_leaf,
best_split_info.monotone_type, best_split_info.right_output,
best_split_info.left_output, inner_feature_index, best_split_info,
best_split_per_leaf_);
......@@ -711,20 +711,29 @@ void SerialTreeLearner::ComputeBestSplitForFeature(
FeatureHistogram* histogram_array_, int feature_index, int real_fidx,
bool is_feature_used, int num_data, const LeafSplits* leaf_splits,
SplitInfo* best_split) {
bool is_feature_numerical = train_data_->FeatureBinMapper(feature_index)
->bin_type() == BinType::NumericalBin;
if (is_feature_numerical & !config_->monotone_constraints.empty()) {
constraints_->RecomputeConstraintsIfNeeded(
constraints_.get(), feature_index, ~(leaf_splits->leaf_index()),
train_data_->FeatureNumBin(feature_index));
}
SplitInfo new_split;
double parent_output;
if (leaf_splits->leaf_index() == 0) {
// for root leaf the "parent" output is its own output because we don't apply any smoothing to the root
parent_output = FeatureHistogram::CalculateSplittedLeafOutput<true, true, true, false>(
parent_output = FeatureHistogram::CalculateSplittedLeafOutput<false, true, true, false>(
leaf_splits->sum_gradients(), leaf_splits->sum_hessians(), config_->lambda_l1,
config_->lambda_l2, config_->max_delta_step, constraints_->Get(leaf_splits->leaf_index()),
config_->lambda_l2, config_->max_delta_step, BasicConstraint(),
config_->path_smooth, static_cast<data_size_t>(num_data), 0);
} else {
parent_output = leaf_splits->weight();
}
histogram_array_[feature_index].FindBestThreshold(
leaf_splits->sum_gradients(), leaf_splits->sum_hessians(), num_data,
constraints_->Get(leaf_splits->leaf_index()), parent_output, &new_split);
constraints_->GetFeatureConstraint(leaf_splits->leaf_index(), feature_index), parent_output, &new_split);
new_split.feature = real_fidx;
if (cegb_ != nullptr) {
new_split.gain -=
......
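In the ``ComputeBestSplitForFeature`` hunk, the root's "parent" output now uses the template argument ``<false, ...>`` with a default ``BasicConstraint()``, i.e. no monotone clamping at the root. A sketch of what ``CalculateSplittedLeafOutput`` does with and without a constraint (the Newton-step form ``-G/(H + λ₂)`` is the unconstrained case; the tuple-based ``constraint`` argument is an illustrative simplification of ``BasicConstraint``):

```python
def splitted_leaf_output(sum_grad, sum_hess, l2, constraint=None):
    """Unconstrained leaf output is the Newton step -G/(H + l2);
    with a constraint, the output is clamped into [min, max]."""
    out = -sum_grad / (sum_hess + l2)
    if constraint is not None:
        lo, hi = constraint
        out = min(max(out, lo), hi)
    return out
```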
......@@ -1247,7 +1247,7 @@ class TestEngine(unittest.TestCase):
for test_with_categorical_variable in [True, False]:
trainset = self.generate_trainset_for_monotone_constraints_tests(test_with_categorical_variable)
for monotone_constraints_method in ["basic", "intermediate"]:
for monotone_constraints_method in ["basic", "intermediate", "advanced"]:
params = {
'min_data': 20,
'num_leaves': 20,
......@@ -1281,7 +1281,7 @@ class TestEngine(unittest.TestCase):
monotone_constraints = [1, -1, 0]
penalization_parameter = 2.0
trainset = self.generate_trainset_for_monotone_constraints_tests(x3_to_category=False)
for monotone_constraints_method in ["basic", "intermediate"]:
for monotone_constraints_method in ["basic", "intermediate", "advanced"]:
params = {
'max_depth': max_depth,
'monotone_constraints': monotone_constraints,
......@@ -1320,7 +1320,7 @@ class TestEngine(unittest.TestCase):
unconstrained_model_predictions = unconstrained_model.\
predict(x3_negatively_correlated_with_y.reshape(-1, 1))
for monotone_constraints_method in ["basic", "intermediate"]:
for monotone_constraints_method in ["basic", "intermediate", "advanced"]:
params_constrained_model["monotone_constraints_method"] = monotone_constraints_method
# The penalization is so high that the first 2 features should not be used here
constrained_model = lgb.train(params_constrained_model, trainset_constrained_model, 10)
......
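The test hunks above only widen the method loop to include ``"advanced"``; the actual monotonicity assertion lives elsewhere in the suite. Its essence is a pairwise check on predictions sorted by the constrained feature, which can be sketched as (stand-in helper, not the suite's exact function):

```python
def respects_constraint(preds, sign):
    """sign = +1 requires non-decreasing predictions,
    sign = -1 non-increasing, sign = 0 imposes nothing."""
    pairs = list(zip(preds, preds[1:]))
    if sign > 0:
        return all(b >= a for a, b in pairs)
    if sign < 0:
        return all(b <= a for a, b in pairs)
    return True
```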