- 21 Mar, 2025 2 commits
-
-
Alexandre Marques authored
-
heli-qi authored
* update mmlu_prox configs
* update tasks/README
* correct hyphen to underline in task/README
* update pre-commit codes
-
- 20 Mar, 2025 6 commits
-
-
Alexandre Marques authored
* Update generation_kwargs in default template to include additional end tokens
* Update filter_list in MMLU Pro configuration to use strict_match
* Update _default_template_yaml
-
Baber Abbasi authored
-
Baber Abbasi authored
-
Yifei Zhang authored
-
Kiersten Stokes authored
* Add markdown linter to pre-commit hooks
* Reformat existing markdown (excluding lm_eval/tasks/*.md)
-
Alexandre Marques authored
* Update continuation template YAML for MMLU task with new generation and filtering options
* Refactor filter_list structure in continuation template YAML for improved readability
* Add 'take_first' function to filter_list in continuation template YAML
* Update filter_list in continuation template YAML to use 'strict_match' and modify filtering functions
* Add 'do_sample' option to generation_kwargs in MMLU template YAML
-
- 19 Mar, 2025 2 commits
-
-
Stella Biderman authored
-
Kiersten Stokes authored
-
- 18 Mar, 2025 8 commits
-
-
Jaedong Hwang authored
-
Surya Kasturi authored
* Allow writing config to wandb
* set defaults
* Update help
* Update help
-
Baber Abbasi authored
* add changelog to readme template
* add readme
* add to task list
-
Baber Abbasi authored
* add min_pixels, max_pixels
* fix
-
Baber Abbasi authored
support for longcontext (and other synthetic tasks)
* add ruler
* add longbench
* pass `metadata` to TaskConfig
-
Jonas Golde authored
* add MastermindEval benchmark
* fill out checklist
-
Santiago Galiano Segura authored
* Add cocoteros_va dataset
* Fix format in cocoteros_va.yml
* Undo newline added
* Execute pre-commit to fix format errors
* Update catalan_bench.yaml version and add Changelog section into Readme.md
-
Baber Abbasi authored
* add __version__
* add version consistency check to publish action
-
- 17 Mar, 2025 3 commits
-
-
Kiersten Stokes authored
* Add support for token-based auth for watsonx models
* Fix lint
* Move dotenv import to inner scope
* Improve readability of _verify_credentials
-
Angelika Romanou authored
* Add INCLUDE tasks
* pacify pre-commit
---------
Co-authored-by: Baber <baber@hey.com>
-
Avelina9X authored
* Update openllm.yaml to use train fewshot split for arc
-
- 16 Mar, 2025 1 commit
-
-
Baber Abbasi authored
-
- 14 Mar, 2025 4 commits
-
-
Oskar van der Wal authored
* Implementation of Winogender
* Minor fixes README.md
* Add winogender
* Clean winogender utils.py
* Change dataset to one containing All subsets
* Flesh out README for BBQ task
* Add missing tasks for BBQ
* Add simple cooccurrence bias task
* Fix wrong mask for ambiguated context+rename metrics
* Made generate_until evaluation (following PALM paper) default. Also moved separate config files per category to separate metrics using custom function. Created config file for multiple_choice way of evaluating BBQ.
* Add missing version metadata
* Add missing version metadata for bbq multiple choice
* Fix metrics and address edge cases
* Made BBQ multiple choice the default version
* Added settings following winogrande
* Add num_fewshot to simple_cooccurrence_bias
* Fixes for bbq (multiple choice)
* Fix wrong dataset
* CrowS-Pairs: make it easier to use another dataset by removing dataset_name from the subsets.
* Use simplest prompt possible without description
* Merge
* BBQ: Fix np.NaN related bug
* BBQ: Fix wrong aggregation method for disamb accuracy
* BBQ: Make it possible to only evaluate on (dis)ambiguous subset (needed for few shot eval)
* BBQ: fix showing one target in case of few-shot evals
* BBQ: Fix few-shot example for bbq_generate
* BBQ: simplify subtasks
* BBQ: Minimize number of UNK variations to reduce inference time
* BBQ: Add extra UNK keywords for the generate task
* Add a generate_until version of simple_cooccurrence_bias
* Change system/description prompt to include few-shot examples
* Group agg rework
* Run pre-commit
* add tasks to readme table
* remove trailing space from simple_cooccurrence_bias_gen.yaml `doc_to_text`
* fix
* fix
* fix version
---------
Co-authored-by: Baber <baber@hey.com>
-
achervyakov authored
* Added audio-modality pipeline for qwen2-audio model
* Beauty imports
* fix apply_chat_template args
* update default audio placeholders list
* add demo task - common_voice subset
* add audiolm_qwen libs to pyproject.toml
* pre-commit beautify
---------
Co-authored-by: Alexandra Rak <rakalexandra@mail.ru>
-
daniel-salib authored
-
Baber Abbasi authored
-
- 12 Mar, 2025 1 commit
-
-
Zeyuan Allen-Zhu authored
minor bug fix, lm_eval.setup_logging -> setup_logging
-
- 11 Mar, 2025 7 commits
-
-
Baber Abbasi authored
* add instruct humaneval
* nit
* add to readme
* nit
-
Surya Kasturi authored
* Capture gen_kwargs from CLI in squad_completion
* Update lm_eval/tasks/squad_completion/task.py
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
* Update lm_eval/tasks/squad_completion/task.py
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
* pre-commit
---------
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Co-authored-by: Baber <baber@hey.com>
-
PabloAgustin authored
* New healthcare benchmark: careqa
* LAUNCH_MN5_ACC <python main.py --config config/mn5.yml --models Llama-3.2-1B-Instruct --tasks careqa_open --num_fewshot 0>
* Add fixes, READMES, and remove task_list.txt
* pre-commit passed, add formatting updates; add nanmean agg_metric
* Fix import error.
* Wrapped imports in try excepts
* Wrapped imports in try excepts; also metrics to catch bert_score import error
* Try except to catch ImportErrors as well
* use np.nan
* pre-commit
---------
Co-authored-by: PabloAgustin <pablo.martin@bsc.es>
Co-authored-by: Baber <baber@hey.com>
-
Kajetan Dymkiewicz authored
* fix for mc2 calculation
* increment versions and changelog
---------
Co-authored-by: Baber <baber@hey.com>
-
Yotam Perlitz authored
* Filter new leaderboard_math_hard dataset to "Level 5" only
* align to linters
Signed-off-by: Yotam Perlitz <y.perlitz@ibm.com>
---------
Signed-off-by: Yotam Perlitz <y.perlitz@ibm.com>
-
Giulio Lovisotto authored
-
Baber Abbasi authored
-
- 10 Mar, 2025 1 commit
-
-
Rui Vieira authored
-
- 06 Mar, 2025 1 commit
-
-
Baber Abbasi authored
-
- 05 Mar, 2025 3 commits
-
-
Baber Abbasi authored
* bug fix
* add warning for instruct models
* nit
-
Yongkeun Hwang authored
-
Baber Abbasi authored
-
- 04 Mar, 2025 1 commit
-
-
Baber Abbasi authored
-