Kozzy Voudouris authored
* Add metabench (Kipnis et al. 2024)
* Update metabench tasks for full replication of original benchmarks, using publicly available datasets
* Remove unnecessary import
* Add permute versions of each task, where the answer orders are randomly shuffled.
* Add metabench group for easier evaluations
* Fix mmlu counts after removing duplicate
* Add secondary datasets
* Fix f-string error
* Fix f-string error for permute processing
* Add original hash to outputs for easy matching to original results
* Add line break at end of utils files
* Remove extra line from winogrande
* Reformat for linters
* fix multiple input test
* appease pre-commit
* Add metabench to tasks README
* fix multiple input `test_doc_to_text`

---------

Co-authored-by: Baber <baber@hey.com>
62b4364d