Unverified commit 4278f222, authored by CharlesAuguste, committed by GitHub

Pr4 advanced method monotone constraints (#3264)



* No need to pass the tree to all functions related to monotone constraints because the pointer is shared.

* Fix OppositeChildShouldBeUpdated numerical split optimisation.

* No need to use constraints when computing the output of the root.

* Refactor existing constraints.

* Add advanced constraints method.

* Update tests.

* Add override.

* Linting.

* Add override.

* Simplify condition in LeftRightContainsRelevantInformation.

* Add virtual destructor to FeatureConstraint.

* Remove redundant blank line.

* Linting of else.

* Indentation.

* Lint else.

* Replaced non-const reference by pointers.

* Forgotten reference.

* Leverage USE_MC for efficiency.

* Make constraints const again in feature_histogram.hpp.

* Update docs.

* Add "advanced" to the monotone constraints options.

* Update monotone constraints restrictions.

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove superfluous parenthesis.

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Fix loop iterator.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Remove std namespace qualifier.

* Fix unsigned_int size_t comparison.

* Set num_features as int for consistency with the rest of the codebase.

* Make sure constraints exist before recomputing them.

* Initialize previous constraints in UpdateConstraints.

* Update monotone constraints restrictions.

* Refactor UpdateConstraints loop.

* Update src/io/config.cpp
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* Delete white spaces.
Co-authored-by: Charles Auguste <charles.auguste@sig.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
parent 3454698e
--- a/docs/Parameters.rst
+++ b/docs/Parameters.rst
@@ -462,7 +462,7 @@ Learning Control Parameters
    - you need to specify all features in order. For example, ``mc=-1,0,1`` means decreasing for 1st feature, non-constraint for 2nd feature and increasing for the 3rd feature

-- ``monotone_constraints_method`` :raw-html:`<a id="monotone_constraints_method" title="Permalink to this parameter" href="#monotone_constraints_method">&#x1F517;&#xFE0E;</a>`, default = ``basic``, type = enum, options: ``basic``, ``intermediate``, aliases: ``monotone_constraining_method``, ``mc_method``
+- ``monotone_constraints_method`` :raw-html:`<a id="monotone_constraints_method" title="Permalink to this parameter" href="#monotone_constraints_method">&#x1F517;&#xFE0E;</a>`, default = ``basic``, type = enum, options: ``basic``, ``intermediate``, ``advanced``, aliases: ``monotone_constraining_method``, ``mc_method``

    - used only if ``monotone_constraints`` is set
@@ -472,6 +472,8 @@ Learning Control Parameters
    - ``intermediate``, a `more advanced method <https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf>`__, which may slow the library very slightly. However, this method is much less constraining than the basic method and should significantly improve the results

+   - ``advanced``, an `even more advanced method <https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf>`__, which may slow the library. However, this method is even less constraining than the intermediate method and should again significantly improve the results
+
 - ``monotone_penalty`` :raw-html:`<a id="monotone_penalty" title="Permalink to this parameter" href="#monotone_penalty">&#x1F517;&#xFE0E;</a>`, default = ``0.0``, type = double, aliases: ``monotone_splits_penalty``, ``ms_penalty``, ``mc_penalty``, constraints: ``monotone_penalty >= 0.0``

    - used only if ``monotone_constraints`` is set
...
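To see the new option in use, here is a minimal Python sketch built only from the parameters documented above; the dataset and all numbers are invented for illustration.

import lightgbm as lgb
import numpy as np

rng = np.random.RandomState(42)
X = rng.uniform(size=(1000, 3))
# y rises with the 1st feature, falls with the 2nd, ignores the 3rd
y = 5 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=0.1, size=1000)

params = {
    'objective': 'regression',
    'monotone_constraints': [1, -1, 0],         # per-feature directions, as in ``mc=1,-1,0``
    'monotone_constraints_method': 'advanced',  # the new option added by this PR
    'verbose': -1,
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=50)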
--- a/include/LightGBM/config.h
+++ b/include/LightGBM/config.h
@@ -443,11 +443,12 @@ struct Config {
   // type = enum
   // alias = monotone_constraining_method, mc_method
-  // options = basic, intermediate
+  // options = basic, intermediate, advanced
   // desc = used only if ``monotone_constraints`` is set
   // desc = monotone constraints method
   // descl2 = ``basic``, the most basic monotone constraints method. It does not slow the library at all, but over-constrains the predictions
   // descl2 = ``intermediate``, a `more advanced method <https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf>`__, which may slow the library very slightly. However, this method is much less constraining than the basic method and should significantly improve the results
+  // descl2 = ``advanced``, an `even more advanced method <https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf>`__, which may slow the library. However, this method is even less constraining than the intermediate method and should again significantly improve the results
   std::string monotone_constraints_method = "basic";
   // alias = monotone_splits_penalty, ms_penalty, mc_penalty
...
--- a/src/io/config.cpp
+++ b/src/io/config.cpp
@@ -345,15 +345,15 @@ void Config::CheckParamConflict() {
     min_data_in_leaf = 2;
     Log::Warning("min_data_in_leaf has been increased to 2 because this is required when path smoothing is active.");
   }
-  if (is_parallel && monotone_constraints_method == std::string("intermediate")) {
+  if (is_parallel && (monotone_constraints_method == std::string("intermediate") || monotone_constraints_method == std::string("advanced"))) {
     // In distributed mode, local node doesn't have histograms on all features, cannot perform "intermediate" monotone constraints.
-    Log::Warning("Cannot use \"intermediate\" monotone constraints in parallel learning, auto set to \"basic\" method.");
+    Log::Warning("Cannot use \"intermediate\" or \"advanced\" monotone constraints in parallel learning, auto set to \"basic\" method.");
     monotone_constraints_method = "basic";
   }
-  if (feature_fraction_bynode != 1.0 && monotone_constraints_method == std::string("intermediate")) {
+  if (feature_fraction_bynode != 1.0 && (monotone_constraints_method == std::string("intermediate") || monotone_constraints_method == std::string("advanced"))) {
     // "intermediate" monotone constraints need to recompute splits. If the features are sampled when computing the
     // split initially, then the sampling needs to be recorded or done once again, which is currently not supported
-    Log::Warning("Cannot use \"intermediate\" monotone constraints with feature fraction different from 1, auto set monotone constraints to \"basic\" method.");
+    Log::Warning("Cannot use \"intermediate\" or \"advanced\" monotone constraints with feature fraction different from 1, auto set monotone constraints to \"basic\" method.");
     monotone_constraints_method = "basic";
   }
   if (max_depth > 0 && monotone_penalty >= max_depth) {
...
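These conflict checks fall back silently apart from a warning. A hedged sketch of a configuration that would trigger the fallback, continuing the Python snippet above (the specific values are hypothetical):

# "advanced" combined with feature_fraction_bynode != 1.0 (or with
# parallel learning) is rejected by Config::CheckParamConflict(), which
# logs the warning shown in the diff and resets the method to "basic".
params = {
    'objective': 'regression',
    'monotone_constraints': [1, -1, 0],
    'monotone_constraints_method': 'advanced',
    'feature_fraction_bynode': 0.5,  # conflicts with intermediate/advanced
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=10)
# LightGBM warns: Cannot use "intermediate" or "advanced" monotone
# constraints with feature fraction different from 1, auto set monotone
# constraints to "basic" method.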
--- a/src/treelearner/feature_histogram.hpp
+++ b/src/treelearner/feature_histogram.hpp
@@ -84,7 +84,7 @@ class FeatureHistogram {
   void FindBestThreshold(double sum_gradient, double sum_hessian,
                          data_size_t num_data,
-                         const ConstraintEntry& constraints,
+                         const FeatureConstraint* constraints,
                          double parent_output,
                          SplitInfo* output) {
     output->default_left = true;
@@ -158,7 +158,7 @@ class FeatureHistogram {
 #define TEMPLATE_PREFIX USE_RAND, USE_MC, USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING
 #define LAMBDA_ARGUMENTS \
   double sum_gradient, double sum_hessian, data_size_t num_data, \
-      const ConstraintEntry &constraints, double parent_output, SplitInfo *output
+      const FeatureConstraint* constraints, double parent_output, SplitInfo *output
 #define BEFORE_ARGUMENTS sum_gradient, sum_hessian, parent_output, num_data, output, &rand_threshold
 #define FUNC_ARGUMENTS \
   sum_gradient, sum_hessian, num_data, constraints, min_gain_shift, \
@@ -278,7 +278,7 @@ class FeatureHistogram {
   void FindBestThresholdCategoricalInner(double sum_gradient,
                                          double sum_hessian,
                                          data_size_t num_data,
-                                         const ConstraintEntry& constraints,
+                                         const FeatureConstraint* constraints,
                                          double parent_output,
                                          SplitInfo* output) {
     is_splittable_ = false;
@@ -288,6 +288,9 @@ class FeatureHistogram {
     double best_sum_left_gradient = 0;
     double best_sum_left_hessian = 0;
     double gain_shift;
+    if (USE_MC) {
+      constraints->InitCumulativeConstraints(true);
+    }
     if (USE_SMOOTHING) {
       gain_shift = GetLeafGainGivenOutput<USE_L1>(
           sum_gradient, sum_hessian, meta_->config->lambda_l1, meta_->config->lambda_l2, parent_output);
@@ -474,14 +477,14 @@ class FeatureHistogram {
       output->left_output = CalculateSplittedLeafOutput<USE_MC, USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>(
           best_sum_left_gradient, best_sum_left_hessian,
           meta_->config->lambda_l1, l2, meta_->config->max_delta_step,
-          constraints, meta_->config->path_smooth, best_left_count, parent_output);
+          constraints->LeftToBasicConstraint(), meta_->config->path_smooth, best_left_count, parent_output);
       output->left_count = best_left_count;
       output->left_sum_gradient = best_sum_left_gradient;
       output->left_sum_hessian = best_sum_left_hessian - kEpsilon;
       output->right_output = CalculateSplittedLeafOutput<USE_MC, USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>(
           sum_gradient - best_sum_left_gradient,
           sum_hessian - best_sum_left_hessian, meta_->config->lambda_l1, l2,
-          meta_->config->max_delta_step, constraints, meta_->config->path_smooth,
+          meta_->config->max_delta_step, constraints->RightToBasicConstraint(), meta_->config->path_smooth,
           num_data - best_left_count, parent_output);
       output->right_count = num_data - best_left_count;
       output->right_sum_gradient = sum_gradient - best_sum_left_gradient;
@@ -763,7 +766,7 @@ class FeatureHistogram {
   template <bool USE_MC, bool USE_L1, bool USE_MAX_OUTPUT, bool USE_SMOOTHING>
   static double CalculateSplittedLeafOutput(
       double sum_gradients, double sum_hessians, double l1, double l2,
-      double max_delta_step, const ConstraintEntry& constraints,
+      double max_delta_step, const BasicConstraint& constraints,
       double smoothing, data_size_t num_data, double parent_output) {
     double ret = CalculateSplittedLeafOutput<USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>(
         sum_gradients, sum_hessians, l1, l2, max_delta_step, smoothing, num_data, parent_output);
@@ -784,7 +787,7 @@ class FeatureHistogram {
       double sum_right_gradients,
       double sum_right_hessians, double l1, double l2,
       double max_delta_step,
-      const ConstraintEntry& constraints,
+      const FeatureConstraint* constraints,
       int8_t monotone_constraint,
       double smoothing,
       data_size_t left_count,
@@ -803,11 +806,11 @@ class FeatureHistogram {
     double left_output =
         CalculateSplittedLeafOutput<USE_MC, USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>(
             sum_left_gradients, sum_left_hessians, l1, l2, max_delta_step,
-            constraints, smoothing, left_count, parent_output);
+            constraints->LeftToBasicConstraint(), smoothing, left_count, parent_output);
     double right_output =
         CalculateSplittedLeafOutput<USE_MC, USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>(
             sum_right_gradients, sum_right_hessians, l1, l2, max_delta_step,
-            constraints, smoothing, right_count, parent_output);
+            constraints->RightToBasicConstraint(), smoothing, right_count, parent_output);
     if (((monotone_constraint > 0) && (left_output > right_output)) ||
         ((monotone_constraint < 0) && (left_output < right_output))) {
       return 0;
@@ -854,7 +857,7 @@ class FeatureHistogram {
             bool REVERSE, bool SKIP_DEFAULT_BIN, bool NA_AS_MISSING>
   void FindBestThresholdSequentially(double sum_gradient, double sum_hessian,
                                      data_size_t num_data,
-                                     const ConstraintEntry& constraints,
+                                     const FeatureConstraint* constraints,
                                      double min_gain_shift, SplitInfo* output,
                                      int rand_threshold, double parent_output) {
     const int8_t offset = meta_->offset;
@@ -864,6 +867,16 @@ class FeatureHistogram {
     data_size_t best_left_count = 0;
     uint32_t best_threshold = static_cast<uint32_t>(meta_->num_bin);
     const double cnt_factor = num_data / sum_hessian;
+
+    BasicConstraint best_right_constraints;
+    BasicConstraint best_left_constraints;
+    bool constraint_update_necessary =
+        USE_MC && constraints->ConstraintDifferentDependingOnThreshold();
+
+    if (USE_MC) {
+      constraints->InitCumulativeConstraints(REVERSE);
+    }
+
     if (REVERSE) {
       double sum_right_gradient = 0.0f;
       double sum_right_hessian = kEpsilon;
@@ -910,6 +923,11 @@ class FeatureHistogram {
             continue;
           }
         }
+
+        if (USE_MC && constraint_update_necessary) {
+          constraints->Update(t + offset);
+        }
+
         // current split gain
         double current_gain = GetSplitGains<USE_MC, USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>(
             sum_left_gradient, sum_left_hessian, sum_right_gradient,
@@ -932,6 +950,10 @@ class FeatureHistogram {
           // left is <= threshold, right is > threshold. so this is t-1
           best_threshold = static_cast<uint32_t>(t - 1 + offset);
           best_gain = current_gain;
+          if (USE_MC) {
+            best_right_constraints = constraints->RightToBasicConstraint();
+            best_left_constraints = constraints->LeftToBasicConstraint();
+          }
         }
       }
     } else {
@@ -1016,6 +1038,10 @@ class FeatureHistogram {
           best_sum_left_hessian = sum_left_hessian;
           best_threshold = static_cast<uint32_t>(t + offset);
           best_gain = current_gain;
+          if (USE_MC) {
+            best_right_constraints = constraints->RightToBasicConstraint();
+            best_left_constraints = constraints->LeftToBasicConstraint();
+          }
         }
       }
     }
@@ -1027,7 +1053,7 @@ class FeatureHistogram {
           CalculateSplittedLeafOutput<USE_MC, USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>(
               best_sum_left_gradient, best_sum_left_hessian,
               meta_->config->lambda_l1, meta_->config->lambda_l2,
-              meta_->config->max_delta_step, constraints, meta_->config->path_smooth,
+              meta_->config->max_delta_step, best_left_constraints, meta_->config->path_smooth,
              best_left_count, parent_output);
       output->left_count = best_left_count;
       output->left_sum_gradient = best_sum_left_gradient;
@@ -1037,7 +1063,7 @@ class FeatureHistogram {
           sum_gradient - best_sum_left_gradient,
           sum_hessian - best_sum_left_hessian, meta_->config->lambda_l1,
           meta_->config->lambda_l2, meta_->config->max_delta_step,
-          constraints, meta_->config->path_smooth, num_data - best_left_count,
+          best_right_constraints, meta_->config->path_smooth, num_data - best_left_count,
          parent_output);
       output->right_count = num_data - best_left_count;
       output->right_sum_gradient = sum_gradient - best_sum_left_gradient;
@@ -1053,7 +1079,7 @@ class FeatureHistogram {
   hist_t* data_;
   bool is_splittable_ = true;
-  std::function<void(double, double, data_size_t, const ConstraintEntry&,
+  std::function<void(double, double, data_size_t, const FeatureConstraint*,
                      double, SplitInfo*)>
       find_best_threshold_fun_;
 };
...
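The core change in FindBestThresholdSequentially is that, with the advanced method, the bounds on the two children can depend on which threshold is being considered, so they are re-derived per candidate threshold and the bounds active at the best split are saved. A minimal Python sketch of that control flow (the method names mirror the C++ above; the sketch itself is not LightGBM code):

def find_best_threshold(thresholds, gain_at, constraints):
    """gain_at(t, bounds) returns the (constraint-clipped) split gain at
    threshold t; ``constraints`` mimics the cumulative FeatureConstraint
    interface used in the diff."""
    best = {'threshold': None, 'gain': float('-inf'),
            'left': None, 'right': None}
    update_needed = constraints.constraint_different_depending_on_threshold()
    constraints.init_cumulative_constraints()
    for t in thresholds:
        if update_needed:
            constraints.update(t)  # like constraints->Update(t + offset)
        bounds = (constraints.left_to_basic_constraint(),
                  constraints.right_to_basic_constraint())
        gain = gain_at(t, bounds)
        if gain > best['gain']:
            # like saving best_left_constraints / best_right_constraints
            best.update(threshold=t, gain=gain,
                        left=bounds[0], right=bounds[1])
    # the saved bounds are later used to clip the two leaf outputs,
    # mirroring CalculateSplittedLeafOutput(..., best_left_constraints, ...)
    return best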
[A further file diff in this commit is collapsed and not shown here.]
--- a/src/treelearner/serial_tree_learner.cpp
+++ b/src/treelearner/serial_tree_learner.cpp
@@ -46,7 +46,7 @@ void SerialTreeLearner::Init(const Dataset* train_data, bool is_constant_hessian
   // push split information for all leaves
   best_split_per_leaf_.resize(config_->num_leaves);
-  constraints_.reset(LeafConstraintsBase::Create(config_, config_->num_leaves));
+  constraints_.reset(LeafConstraintsBase::Create(config_, config_->num_leaves, train_data_->num_features()));
   // initialize splits for leaf
   smaller_leaf_splits_.reset(new LeafSplits(train_data_->num_data()));
@@ -146,7 +146,7 @@ void SerialTreeLearner::ResetConfig(const Config* config) {
     }
     cegb_->Init();
   }
-  constraints_.reset(LeafConstraintsBase::Create(config_, config_->num_leaves));
+  constraints_.reset(LeafConstraintsBase::Create(config_, config_->num_leaves, train_data_->num_features()));
 }

 Tree* SerialTreeLearner::Train(const score_t* gradients, const score_t *hessians) {
@@ -561,7 +561,7 @@ void SerialTreeLearner::SplitInner(Tree* tree, int best_leaf, int* left_leaf,
   auto next_leaf_id = tree->NextLeafId();

   // update before tree split
-  constraints_->BeforeSplit(tree, best_leaf, next_leaf_id,
+  constraints_->BeforeSplit(best_leaf, next_leaf_id,
                             best_split_info.monotone_type);

   bool is_numerical_split =
@@ -657,7 +657,7 @@ void SerialTreeLearner::SplitInner(Tree* tree, int best_leaf, int* left_leaf,
                          best_split_info.left_output);
   }
   auto leaves_need_update = constraints_->Update(
-      tree, is_numerical_split, *left_leaf, *right_leaf,
+      is_numerical_split, *left_leaf, *right_leaf,
       best_split_info.monotone_type, best_split_info.right_output,
       best_split_info.left_output, inner_feature_index, best_split_info,
       best_split_per_leaf_);
@@ -711,20 +711,29 @@ void SerialTreeLearner::ComputeBestSplitForFeature(
     FeatureHistogram* histogram_array_, int feature_index, int real_fidx,
     bool is_feature_used, int num_data, const LeafSplits* leaf_splits,
     SplitInfo* best_split) {
+  bool is_feature_numerical = train_data_->FeatureBinMapper(feature_index)
+                                  ->bin_type() == BinType::NumericalBin;
+  if (is_feature_numerical & !config_->monotone_constraints.empty()) {
+    constraints_->RecomputeConstraintsIfNeeded(
+        constraints_.get(), feature_index, ~(leaf_splits->leaf_index()),
+        train_data_->FeatureNumBin(feature_index));
+  }
+
   SplitInfo new_split;
   double parent_output;
   if (leaf_splits->leaf_index() == 0) {
     // for root leaf the "parent" output is its own output because we don't apply any smoothing to the root
-    parent_output = FeatureHistogram::CalculateSplittedLeafOutput<true, true, true, false>(
+    parent_output = FeatureHistogram::CalculateSplittedLeafOutput<false, true, true, false>(
         leaf_splits->sum_gradients(), leaf_splits->sum_hessians(), config_->lambda_l1,
-        config_->lambda_l2, config_->max_delta_step, constraints_->Get(leaf_splits->leaf_index()),
+        config_->lambda_l2, config_->max_delta_step, BasicConstraint(),
         config_->path_smooth, static_cast<data_size_t>(num_data), 0);
   } else {
     parent_output = leaf_splits->weight();
   }
   histogram_array_[feature_index].FindBestThreshold(
       leaf_splits->sum_gradients(), leaf_splits->sum_hessians(), num_data,
-      constraints_->Get(leaf_splits->leaf_index()), parent_output, &new_split);
+      constraints_->GetFeatureConstraint(leaf_splits->leaf_index(), feature_index), parent_output, &new_split);
   new_split.feature = real_fidx;
   if (cegb_ != nullptr) {
     new_split.gain -=
...
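The tree-learner side now asks the constraint manager for a per-(leaf, feature) entry via GetFeatureConstraint, and for numerical features under monotone constraints it first gives the manager a chance to recompute stale entries. A toy Python rendering of that lookup structure (the class and method names here are hypothetical illustrations, not the C++ API):

class LeafConstraints:
    """Toy model of why Create() now also takes num_features: the advanced
    method keeps one constraint entry per (leaf, feature) pair."""

    def __init__(self, num_leaves, num_features):
        self.entries = [[None for _ in range(num_features)]
                        for _ in range(num_leaves)]
        self.stale = set()  # leaves whose constraints must be recomputed

    def recompute_constraints_if_needed(self, leaf, feature):
        # analogous to constraints_->RecomputeConstraintsIfNeeded(...)
        if leaf in self.stale:
            self.entries[leaf][feature] = self._recompute(leaf, feature)

    def get_feature_constraint(self, leaf, feature):
        # analogous to constraints_->GetFeatureConstraint(leaf, feature)
        return self.entries[leaf][feature]

    def _recompute(self, leaf, feature):
        return None  # placeholder; the real logic lives in the collapsed diff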
--- a/tests/python_package_test/test_engine.py
+++ b/tests/python_package_test/test_engine.py
@@ -1247,7 +1247,7 @@ class TestEngine(unittest.TestCase):
         for test_with_categorical_variable in [True, False]:
             trainset = self.generate_trainset_for_monotone_constraints_tests(test_with_categorical_variable)
-            for monotone_constraints_method in ["basic", "intermediate"]:
+            for monotone_constraints_method in ["basic", "intermediate", "advanced"]:
                 params = {
                     'min_data': 20,
                     'num_leaves': 20,
@@ -1281,7 +1281,7 @@ class TestEngine(unittest.TestCase):
         monotone_constraints = [1, -1, 0]
         penalization_parameter = 2.0
         trainset = self.generate_trainset_for_monotone_constraints_tests(x3_to_category=False)
-        for monotone_constraints_method in ["basic", "intermediate"]:
+        for monotone_constraints_method in ["basic", "intermediate", "advanced"]:
             params = {
                 'max_depth': max_depth,
                 'monotone_constraints': monotone_constraints,
@@ -1320,7 +1320,7 @@ class TestEngine(unittest.TestCase):
         unconstrained_model_predictions = unconstrained_model.\
             predict(x3_negatively_correlated_with_y.reshape(-1, 1))
-        for monotone_constraints_method in ["basic", "intermediate"]:
+        for monotone_constraints_method in ["basic", "intermediate", "advanced"]:
             params_constrained_model["monotone_constraints_method"] = monotone_constraints_method
             # The penalization is so high that the first 2 features should not be used here
             constrained_model = lgb.train(params_constrained_model, trainset_constrained_model, 10)
...
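A condensed, stand-alone version of what these parameterized tests exercise (the data generation here is invented; the test helpers in the diff are not public API):

import lightgbm as lgb
import numpy as np

rng = np.random.RandomState(0)
X = rng.uniform(size=(2000, 3))
y = 5 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=0.1, size=2000)

grid = np.linspace(0.01, 0.99, 100)
for method in ["basic", "intermediate", "advanced"]:
    params = {'objective': 'regression',
              'monotone_constraints': [1, -1, 0],
              'monotone_constraints_method': method,
              'verbose': -1}
    model = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=30)
    # vary only the increasing-constrained feature, hold the others fixed
    probe = np.tile(X.mean(axis=0), (grid.size, 1))
    probe[:, 0] = grid
    preds = model.predict(probe)
    assert np.all(np.diff(preds) >= -1e-12), method  # non-decreasing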