    Add metabench task to LM Evaluation Harness (#2357) · 62b4364d
    Kozzy Voudouris authored
    
    
    * Add metabench (Kipnis et al. 2024)
    
    * Update metabench tasks for full replication of original benchmarks, using publicly available datasets
    
    * Remove unnecessary import
    
    * Add permute versions of each task, where the answer orders are randomly shuffled.
    
    * Add metabench group for easier evaluations
    
    * Fix mmlu counts after removing duplicate
    
    * Add secondary datasets
    
    * Fix f-string error
    
    * Fix f-string error for permute processing
    
    * Add original hash to outputs for easy matching to original results
    
    * Add line break at end of utils files
    
    * Remove extra line from winogrande
    
    * Reformat for linters
    
    * fix multiple input test
    
    * appease pre-commit
    
    * Add metabench to tasks README
    
    * fix multiple input `test_doc_to_text`
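The "permute" variants mentioned above shuffle the order of the answer options for each item. A minimal sketch of that idea follows. It is not the code from `process_docs.py`: the function name `permute_choices`, its parameters, and the seeding scheme are hypothetical and chosen for illustration only. The key point is that the index of the correct answer has to be remapped whenever the options are reordered.

```python
import random


def permute_choices(choices, answer_idx, seed=0):
    """Shuffle answer options and return (shuffled_choices, new_answer_idx).

    Hypothetical helper for illustration; names and seeding are assumptions,
    not taken from the metabench task code.
    """
    rng = random.Random(seed)          # local RNG so the shuffle is reproducible
    order = list(range(len(choices)))  # order[k] = original index now at position k
    rng.shuffle(order)
    shuffled = [choices[i] for i in order]
    new_answer_idx = order.index(answer_idx)  # where the correct option ended up
    return shuffled, new_answer_idx


if __name__ == "__main__":
    options = ["Paris", "Rome", "Berlin", "Madrid"]
    shuffled, gold = permute_choices(options, answer_idx=0, seed=1234)
    print(shuffled, gold)
```

Using a per-call `random.Random(seed)` rather than the global generator keeps the permutation stable across runs, which matters if permuted results are to be compared between evaluations.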
    
    ---------
    Co-authored-by: Baber <baber@hey.com>
process_docs.py 7.09 KB