- 21 Jul, 2025 1 commit
-
-
Baber authored
-
- 14 Jul, 2025 1 commit
-
-
Ankit Gola authored
-
- 04 Jul, 2025 1 commit
-
-
Neel Gupta authored
* [FIX] Initial code to disable multi-proc for stderr
* add docs; align no-mp bootstrap with mp
---------
Co-authored-by: Baber <baber@hey.com>
-
- 30 Jun, 2025 2 commits
- 11 Mar, 2025 1 commit
-
-
PabloAgustin authored
* New healthcare benchmark: careqa
* LAUNCH_MN5_ACC <python main.py --config config/mn5.yml --models Llama-3.2-1B-Instruct --tasks careqa_open --num_fewshot 0>
* Add fixes, READMES, and remove task_list.txt
* pre-commit passed, add formatting updates; add nanmean agg_metric
* Fix import error.
* Wrapped imports in try excepts
* Wrapped imports in try excepts; also metrics to catch bert_score import error
* Try except to catch ImportErrors as well
* use np.nan
* pre-commit
---------
Co-authored-by: PabloAgustin <pablo.martin@bsc.es>
Co-authored-by: Baber <baber@hey.com>
-
- 21 Feb, 2025 1 commit
-
-
Lintang Sutawika authored
* changed source of eval_logger
* allow eval_logger to be set from args
* removed verbosity arg from non-main methods
* fix logging
* pre-commit
* set verbosity in eval logger
* replace utils.eval_logger
* fix logging in main
* add logging to docs
* add logging message
* nit
* add logging to docs
* refactor setup_logging to utils
---------
Co-authored-by: Baber <baber@hey.com>
-
- 19 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* update pre-commit
-
- 01 Aug, 2024 1 commit
-
-
Nathan Weinberg authored
* refactor: move scipy and sklearn module imports to func imports
  Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
* refactor: consolidate weighted_f1_score func into lm_eval utils
  Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
* lint: allow for utils file to have unused imports
  This allows shared functions to be defined only once while the YAML function importing continues to work.
  Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
---------
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
-
- 15 Jul, 2024 1 commit
-
-
Lintang Sutawika authored
-
- 12 Jul, 2024 1 commit
-
-
Jess authored
* add afrixnli to task
* add chat completion
* remove chat completion -untested
* afrimmlu added
* afrimmlu folder update
* afrimmlu folder update
* updated prompt
* remove print
* add afrimgsm -direct
* add squad metric
* fix bash script
* remove direct util, update common yaml
* remove print
* add few shot. metric fixes
* fix direct path, add bash script for gpt models
* added translate test
* update afrixnli tasks
* update afrixnli tasks
* update metrics for afrixnli
* prompt translations fix
* prompt translations fix
* filter and metric fix -mgsm
* remove squad metric
* remove squad metric
* add f1 score to mgsm
* add f1 score to mgsm
* update native-direct with lin
* change f1 function
* add lin to utils
* add utils
* remove test limit
* remove test configs
* add swahili to mmlu
* change eng to ewe in ewe yaml mmlu
* add squad metric to mgsm, remove whitespace filter
* added translate test
* added afrixnli_translate
* fix exact match ValueError
* fix exact match ValueError
* restructure mmlu folder
* spacing
* remove afrimmlu_translate folder
* add utility
* format task name, clean ups
* modified mgsm
* update on afrimgsm
* update on afrimgsm
* removed utils
* other mgsm varieties
* other mgsm varieties
* adding translate direct
* Update translate_direct_yaml
* add manual xnli prompt, add multichoice for openai models, and adapt multichoice metric for openai model
* edit for open models
* Update translate_direct_yaml
* add verbalizer for xnli
* change xnli from multiple choice to generate
* add manual accuracy scores
* revert xnli to multiple choice
* change afrimgsm utils
* revert xnli to multiple_choice
* cleanups and readmes
* remove openai fixes and unused regex
* pr review changes
* revert metrics.py, task.py and extraction.py to main version
---------
Co-authored-by: Israel Abebe Azime <azime@cg.uni-saarland.de>
Co-authored-by: Israel Abebe Azime <se.israel.abebe@gmail.com>
-
- 01 Jul, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 24 May, 2024 2 commits
-
-
Hailey Schoelkopf authored
* add handling for bootstrap_iters=0 case
* add more detail to docstring
* run precommit
-
Lintang Sutawika authored
`gold_one_hot` must match the dimensions of the predictions so that it still works when `--limit` is used and the labels present in gold do not cover every class index.
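To illustrate the point of this commit message, here is a minimal pure-Python sketch of a multi-class Brier score in which the one-hot width is taken from the predictions rather than from the largest label seen. The function name `brier_score_sketch` and its signature are hypothetical and are not the harness's actual implementation.

```python
def brier_score_sketch(gold, predictions):
    """Mean multi-class Brier score (illustrative sketch only).

    gold: list of integer class labels, one per sample.
    predictions: list of per-class probability lists, one per sample.
    """
    # Take the class count from the predictions, not from max(gold) + 1,
    # so a subset of samples that misses some classes still lines up.
    num_classes = len(predictions[0])
    total = 0.0
    for label, probs in zip(gold, predictions):
        one_hot = [1.0 if i == label else 0.0 for i in range(num_classes)]
        total += sum((p - t) ** 2 for p, t in zip(probs, one_hot))
    return total / len(predictions)


# With a small limit, every gold label may happen to be class 0,
# while each prediction still carries three class probabilities.
score = brier_score_sketch([0, 0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

Sizing the one-hot vector from the largest gold label would give width 1 in this example, which cannot be subtracted from the three-wide predictions.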
-
- 26 Feb, 2024 1 commit
-
-
Lintang Sutawika authored
* add brier_score
* process brier_score
* brier score is working for N-sized class
* fixed brier score
* add TED to BigBench and Brier score to MMLU
* format
* Update metrics.py
* Update task.py
* Update generate_until_template_yaml
* Delete lm_eval/tasks/bigbench/aux_metric.py
* Update generate_until_template_yaml
* Update _default_template_yaml
* Update _generate_configs.py
* Update _generate_configs.py
* Update _generate_configs.py
* fix (format?)
* format?
* format, once more
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 20 Feb, 2024 1 commit
-
-
Baber Abbasi authored
* add key lookup for same contexts
* nit
* appease pre-commit
* nit
* use `expand` (in-place view) rather than `repeat`
* try mixed grouping
* add docs.
* nit
* nit
* nits
* fix tests
* Move greedy_tokens calculation out of cache loop
* nit
* nits
* add test
* nits
* fix name conflict
* fix name conflict
* chunk tensor
* move Collator
* nits/docstring
* fixup
* fixup
* group contexts only for decoders
* pre-commit
* fix `generate_until` test
* fix `generate_until` test
* Update lm_eval/models/huggingface.py
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* add docs
* nit
* add docs
* add docs
* add 'logits_cache' arg
* bugfix
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 13 Feb, 2024 1 commit
-
-
Hailey Schoelkopf authored
* fix weight_by_size condition
* add tests, update stderr formula slightly
* apply pre-commit
-
- 06 Feb, 2024 1 commit
-
-
Hailey Schoelkopf authored
* update formula for stderr aggregation
* hack: see what happens when using stderr_for_metric bootstrapping on a group
* undo bootstrap_for_stderr test
* factor out variance-aggregation formulas into api.metrics
* fix failing tests
* remove stray print
* update comment
* further detail in comment
* add back initialize_tasks() call
* fix format
-
- 31 Jan, 2024 1 commit
-
-
Baber Abbasi authored
* add bypass metric
* fixed `bypass` metric.
* add task attributes if predict_only
* add `predict_only` checks
* add docs
* added `overide_metric`, `override_config` to `Task`
* nits
* nit
* changed --predict_only to generations; nits
* nits
* nits
* change gen_kwargs warning
* add note about `--predict_only` in README.md
* added `predict_only`
* move table to bottom
* nit
* change null aggregation to bypass (conflict)
* bugfix; default `temp=0.0`
* typo
-
- 20 Dec, 2023 1 commit
-
-
Baber Abbasi authored
* add ruff and isort. remove black and flake8
* remove unnecessary dependencies
* remove dependency from table
* change order
* ran ruff
* check 3.9
* exclude evaluator
* update CI workflow
* use ruff config in pyproject.toml
* test
* add isort rules to ruff
* sort imports
* import `make_table`
* try stages for no-commit-to-branch
* turn on mypy for pre-commit
* test
* test
* test
* change no-commit-to-branch to default
* nits
* fixed dependency
-
- 02 Nov, 2023 2 commits
-
-
lintangsutawika authored
-
lintangsutawika authored
-
- 19 Oct, 2023 3 commits
-
-
haileyschoelkopf authored
-
lintangsutawika authored
-
lintangsutawika authored
-
- 18 Oct, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 25 Aug, 2023 1 commit
-
-
Ethan Smith authored
This adds a bunch of simple annotations suggested by https://github.com/JelleZijlstra/autotyping.
-
- 14 Aug, 2023 5 commits
- 12 Aug, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 11 Aug, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 03 Aug, 2023 1 commit
-
-
Aflah authored
-
- 02 Aug, 2023 4 commits
- 06 Jul, 2023 1 commit
-
-
haileyschoelkopf authored
-