    Add Cost Effective Gradient Boosting (#2014) · 76102284
    remcob-gr authored
    * Add configuration parameters for CEGB.
    
    * Add skeleton CEGB tree learner
    
    Like the original CEGB version, this inherits from SerialTreeLearner.
    For now, it behaves identically to the original.
    
    * Track features used in CEGB tree learner.
    
    * Pull CEGB tradeoff and coupled feature penalty from config.
    
    * Implement finding best splits for CEGB
    
    This is heavily based on the serial version; the only addition is applying the coupled penalties.
    
    * Set proper defaults for cegb parameters.
    
    * Ensure sanity checks don't switch off CEGB.
    
    * Implement per-data-point feature penalties in CEGB.
    
    * Implement split penalty and remove unused parameters.
    
    * Merge changes from CEGB tree learner into serial tree learner
    
    * Represent features_used_in_data as a bitset to reduce the memory overhead of CEGB, and add sanity checks for the lengths of the penalty vectors.
    
    * Fix bug where CEGB would incorrectly penalise a previously used feature
    
    The tree learner did not update the gains of previously computed leaf splits when a leaf elsewhere in the tree was split.
    As a result, it kept penalising splits on previously used features and therefore wrongly preferred new features.
    
    * Document CEGB parameters and add them to the appropriate section.
    
    * Remove leftover reference to cegb tree learner.
    
    * Remove outdated diff.
    
    * Fix warnings.
    
    * Fix minor issues identified by @StrikerRUS.
    
    * Add docs section on CEGB, including citation.
    
    * Fix link.
    
    * Fix CI failure.
    
    * Add some unit tests.
    
    * Fix pylint issues.
    
    * Fix remaining pylint issue.
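
For readers unfamiliar with CEGB, the penalties listed above can be illustrated with a minimal sketch. This is not the code from this commit: `CegbConfig`, `FeatureUsage` and `PenalizedGain` are hypothetical names, and the formula is an assumption reconstructed from the descriptions above (a tradeoff factor scaling a per-split penalty, a coupled per-feature penalty, and a per-data-point feature penalty, with per-data-point usage stored in a bitset).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical settings holder; names are illustrative only.
struct CegbConfig {
  double tradeoff = 1.0;                // scales every penalty
  double penalty_split = 0.0;           // charged per data point in the leaf being split
  std::vector<double> penalty_coupled;  // per feature, charged the first time the model uses it
  std::vector<double> penalty_lazy;     // per feature, charged per data point that has not seen it
};

// Tracks feature usage. Per-data-point usage is packed into a bitset
// (one bit per feature/row pair) to keep the memory overhead low.
class FeatureUsage {
 public:
  FeatureUsage(std::size_t num_features, std::size_t num_data)
      : num_data_(num_data),
        used_in_split_(num_features, false),
        used_in_data_((num_features * num_data + 31) / 32, 0u) {}

  bool UsedInSplit(std::size_t feature) const { return used_in_split_[feature]; }

  bool UsedInData(std::size_t feature, std::size_t row) const {
    const std::size_t bit = feature * num_data_ + row;
    return ((used_in_data_[bit / 32] >> (bit % 32)) & 1u) != 0u;
  }

  // Record that `feature` was used to split a leaf containing `rows`.
  void MarkUsed(std::size_t feature, const std::vector<std::size_t>& rows) {
    used_in_split_[feature] = true;
    for (std::size_t row : rows) {
      const std::size_t bit = feature * num_data_ + row;
      used_in_data_[bit / 32] |= (1u << (bit % 32));
    }
  }

 private:
  std::size_t num_data_;
  std::vector<bool> used_in_split_;
  std::vector<std::uint32_t> used_in_data_;
};

// Sanity check: a penalty vector is either empty (disabled) or has one entry per feature.
inline void CheckPenaltyLengths(const CegbConfig& cfg, std::size_t num_features) {
  assert(cfg.penalty_coupled.empty() || cfg.penalty_coupled.size() == num_features);
  assert(cfg.penalty_lazy.empty() || cfg.penalty_lazy.size() == num_features);
}

// Gain of splitting a leaf on `feature`, minus the CEGB penalties (assumed formula).
inline double PenalizedGain(double raw_gain, std::size_t feature,
                            const std::vector<std::size_t>& leaf_rows,
                            const CegbConfig& cfg, const FeatureUsage& usage) {
  double penalty = cfg.penalty_split * static_cast<double>(leaf_rows.size());
  if (!cfg.penalty_coupled.empty() && !usage.UsedInSplit(feature)) {
    penalty += cfg.penalty_coupled[feature];
  }
  if (!cfg.penalty_lazy.empty()) {
    for (std::size_t row : leaf_rows) {
      if (!usage.UsedInData(feature, row)) penalty += cfg.penalty_lazy[feature];
    }
  }
  return raw_gain - cfg.tradeoff * penalty;
}
```

In this sketch, the value returned by `PenalizedGain` for a feature changes after `MarkUsed` is called for that feature. Any gains cached for other leaves therefore have to be refreshed after each split, which is the kind of stale-gain problem described in the bug-fix entry above.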
config_auto.cpp 21.4 KB