- 08 Oct, 2025 3 commits
- 03 Oct, 2025 1 commit
-
-
Stella Biderman authored
-
- 02 Oct, 2025 3 commits
-
-
Baber Abbasi authored
* update pre-commit * unpin datasets
-
Vineeth authored
-
Janna authored
-
- 22 Sep, 2025 1 commit
-
-
priverabsc authored
* Add eqbench tasks in Spanish and Catalan * Incremented catalan_bench and spanish_bench versions. Added 'multilingual' folder inside 'eq_bench' and moved the eqbench_ca and eqbench_es .yaml to that folder. Updated the tasks README with eqbench_es and eqbench_ca, expliciting inside each description both the Hugging Face link and the translation method. * Fixed tasks table. * remove test_task.sh and results folder * Add utils.py to multilingual folder
-
- 21 Sep, 2025 6 commits
-
-
its-alpesh authored
* Add humaneval_infilling task * pacify pre-commit --------- Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
-
Janna authored
* register aime * lint --------- Co-authored-by: Baber <baber@hey.com>
-
Janna authored
* create babilong tasks * lint * add clarification * fix typo * add babilong description
-
Luis Cosio authored
* Added benchmark * Added more testing * Added task definition for mmlu_redux and mmlu_redux_spanish * Add MMLU Redux English and Spanish tasks with YAML fixes and READMEs * Add remaining MMLU Redux YAMLs and updated tasks README * Add MMLU Redux English and Spanish tasks with YAML fixes and READMEs * Add MMLU Redux changes from pr-2705 * Resolve pre-commit hook and pytest overlapping group issues by adding mmlu_redux_spanish task entries and unique subgroup names * Enhance retry logic to prevent 429 error when using Hugging Face API for tests, apply pre-commit fixes * Revert python test changes and comments one task group to avoid Hugging Face rate limit and task failure --------- Co-authored-by: CT-6282 <ricardo.godric@hotmail.com>
-
kaixuanliu authored
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
-
Timur Aysin authored
* fix: set 'do_sample=False' and use double quotes in 'doc_to_text' * feat: update versions and README for longbench * pacify pre-commit --------- Co-authored-by: Baber <baber@hey.com>
-
- 12 Sep, 2025 1 commit
-
-
fxmarty-amd authored
-
- 08 Sep, 2025 3 commits
-
-
Slim Frikha authored
* feat(vllm_causallms): make collator ignore seed when splitting batch into chunks * fix(collator): revert PR changes * fix(vllm-causallm): update collator call with groupby None * feat(sglang-causallms): make generation accept a list of sampling params --------- Co-authored-by: Baber <baber@hey.com>
-
James A. Michaelov authored
* add icelandic_winogrande * fix spacing for final words in sentence
-
Lucia Quirke authored
-
- 02 Sep, 2025 4 commits
-
-
Valle Ruiz-Fernández authored
* Add EsBBQ and CaBBQ tasks * Linter fixes * add esbbq and cabbq to task list --------- Co-authored-by: Júlia Falcão <juliafsfalcao@hotmail.com>
-
James A. Michaelov authored
-
James A. Michaelov authored
-
James A. Michaelov authored
* run linter * add acc_norm
-
- 27 Aug, 2025 3 commits
-
-
Gül Sena A authored
* Fix codex-glue/code2text group issue * Added README * pacify pre-commit --------- Co-authored-by: Baber <baber@hey.com>
-
Baber Abbasi authored
-
Slim Frikha authored
-
- 26 Aug, 2025 1 commit
-
-
Janna authored
* add AIME tasks * standardize the repeats * fix task naming * aime25 only has test set * edit readme * add utils * standardize * fix case sensitivity * repeat once * lint * more linting * lint huggingface.py
-
- 25 Aug, 2025 4 commits
-
-
Weihao XUAN authored
* update MMLU_ProX * update MMLU_ProX * cleanup code by pre-commit
-
Nikita Savelyev authored
* Add support for OVModelForSeq2SeqLM * Add test
-
William Held authored
* Anthropic Discrim Eval * Mixed Effects Regression * Actually wire it all upo * Operator Name Doesn't Exist on Github * Update lm_eval/tasks/discrim_eval/discrim_eval_implicit.yaml Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com> * Update discrim_eval_implicit.yaml * Update discrim_eval_explicit.yaml * pacify pre-commit --------- Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com> Co-authored-by: Baber <baber@hey.com>
-
Geun, Lim authored
* feat: Add CLIcK task * Fix formatting issues * Add Click Task Description * fix: lint * fix
-
- 23 Aug, 2025 1 commit
-
-
Baber Abbasi authored
* update math_verify * remove normalization * use full solution in `parse` * update version
-
- 22 Aug, 2025 1 commit
-
-
Patrick Haller authored
Co-authored-by: Patrick Haller <phmaker@Patricks-MacBook-Pro.local>
-
- 21 Aug, 2025 8 commits
-
-
James A. Michaelov authored
* add lm_syneval * edit readme * update task readme * formatting fixes * run linting * add descriptions and examples * clean readme formatting
-
James A. Michaelov authored
* add turblimp * update general task readme * add normalized accuracy
-
James A. Michaelov authored
* add blimp_nl * add template yaml file
-
James A. Michaelov authored
* add zhoblimp files * correct group name * fix group * add normalized accuracy
-
FranValero97 authored
-
Kurt Yang authored
Adding support for OpenAI GPT-5 model; Models only support hardcoded tempeature=1 and stop=None (#3247)
-
Anri Lombard authored
-
Jafar Isbarov authored
-