- 05 Feb, 2025 10 commits
- 04 Feb, 2025 1 commit
Baber authored
- 29 Jan, 2025 3 commits
Baber authored
Irina Proskurina authored
* Add Histoires Morales task * Histoires Morales task: fix mixed line endings * Histoires Morales task: fix mixed line endings * Remove tag for a single task * Add some MT for Histoires Morales
Baber Abbasi authored
* remove group from task configs * add tags * update readme
- 28 Jan, 2025 5 commits
Baber Abbasi authored
* nit * update pre-commit
Seungwoo Ryu authored
Co-authored-by: Baber <baber@hey.com>
Baber Abbasi authored
* feat: drop Python 3.8 support * feat: drop Python 3.8 tests * pre-commit * handle chat_template for multiple input
Nicky Pochinkov authored
* add TransformerLens example Many people use TransformerLens to do interpretability and interventions on models, and then need to test the model. Here is a simple script that allows one to pass in the TransformerLens model and run evaluations on it. * Ran pre-commit checks
Irina Proskurina authored
* Add moral stories task * Add moral stories task * Create README.md * Update README.md * Update line endings in moral_stories files
- 24 Jan, 2025 1 commit
Minho Ryu authored
* separate category * set version 0.0 * apply precommit
- 21 Jan, 2025 3 commits
Jan Kaniecki authored
* Update vllm_vlms.py * pre-commit --------- Co-authored-by: Baber <baber@hey.com>
Minho Ryu authored
Minho Ryu authored
- 20 Jan, 2025 6 commits
Ramiro R. C. authored
* fixed mmlu generative response extraction * updated file version | added args to exact_match * fix * fix * pre-commit * fix groups --------- Co-authored-by: Baber <baber@hey.com>
nike00811 authored
Gyouk Chu authored
* Update KorMedMCQA: ver 2.0 * Fix pre-commit formatting issues * Update KorMedMCQA v2.0 * pre-commit
Minho Ryu authored
Boda Sadallah authored
* point to the original ArabicMMLU dataset * create the new subtasks files * fix bug when the context field is empty
Minho Ryu authored
* add hrm8k benchmark for both Korean and English * apply precommit * revise tasks to make models not answer directly; use zeroshot_cot if possible * add README * Add hrm8k on the task-list --------- Co-authored-by: Baber <baber@hey.com>
- 19 Jan, 2025 1 commit
Baber Abbasi authored
* update pre-commit
- 17 Jan, 2025 1 commit
Baber Abbasi authored
* switch arg
- 16 Jan, 2025 4 commits
- 15 Jan, 2025 5 commits
Baber authored
# Conflicts: # lm_eval/models/openai_completions.py
Baber Abbasi authored
* add assistant prefix * add arc_challenge from llama * nit * nit * nit * add assistant prefix * add mmlu_llama * nit * nit * Revert "nit" This reverts commit 6a97f8356237305e375212b966b30e8de59dd4bc. * fix regex bug * add assistant_prefix to vllm * add `Question:` * add mmlu_pro * add fewshot assistant_prefix * use `assistant_prefill` * typehints * nits * nits * add to docs * add readme
Shivansh Pachnanda authored
* Add MLQA * add mlqa_common_yaml * add 49 tests of mlqa family * update tasks/README.md --------- * fix: mlqa ast error * nit: removed .yaml ext from template_yaml * nit changes: minor modifications generate_tasks.py * deleted lm_eval/tasks/mlqa/mlqa_common_yaml.yaml * tests updated * nit
Hojin Lee authored
* add mbpp * fix some bugs * add README for mbpp * update README * nits --------- Co-authored-by: Hojin Lee <19949034+hjlee1371@users.noreply.github.com> Co-authored-by: Baber <baber@hey.com>
Hojin Lee authored
* add custom filter * fix type casting of references * add humaneval * fix a bug in humaneval * add greedy version of humaneval * update tasks README * test humaneval * return multiple metrics * nit * add confirmation to run code tasks * nit * nit --------- Co-authored-by: Hojin Lee <19949034+hjlee1371@users.noreply.github.com> Co-authored-by: Baber <baber@hey.com>