    [longbench] fix metric calculation (#2983) · 147e9d61
    Baber Abbasi authored
    * use all answers
    
    * use middle truncation
    
    * maybe fix classification score
    
    * strip classification preds
    
    * [vllm] remove stop tokens post-hoc
    
    * strip all preds
    
    * pacify pre-commit
    
    * start on truncation utility
    
    * add to readme
    
    * add a footgun doc
    
    * fix newline in yaml templates
    
    * do not strip code_sim preds!
    
    * fix pre-commit config
    
    * fix instruction warning
    
    * add note to longbench readme
utils.py 30.4 KB