- 25 Sep, 2025 24 commits
-
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
- 23 Jul, 2025 1 commit
Baber Abbasi authored
* Fix: pin datasets < 4.0
* fix
* update type hints in HF
* fix hellaswag path
- 14 Jul, 2025 1 commit
Ankit Gola authored
- 05 Jul, 2025 1 commit
Baber Abbasi authored
- 04 Jul, 2025 1 commit
Neel Gupta authored
* [FIX] Initial code to disable multi-proc for stderr
* add docs; align no-mp bootstrap with mp
---------
Co-authored-by: Baber <baber@hey.com>
- 25 Jun, 2025 1 commit
Kiersten Stokes authored
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
- 03 Jun, 2025 1 commit
Baber Abbasi authored
* fix: bug in acc_mutual_info slicing; add `target_delimiter` to uncond choices
* add tests
- 21 May, 2025 1 commit
Baber Abbasi authored
This reverts commit 4dbd5ec9
- 19 May, 2025 1 commit
Baber Abbasi authored
* add `sglang-generate`
* nit
* nit
* nit
* pacify pre-commit
- 15 May, 2025 1 commit
Tingchen Fu authored
- 16 Apr, 2025 1 commit
Baber Abbasi authored
* add warning for default `until`
* fix stop tokens; add vcsum
* bugfix: fix doc_to_target to string
* fix lsht, trec
* add task to readme
* add debugging logs for multiple input/output
- 07 Apr, 2025 1 commit
Felipe Maia Polo authored
Add `--samples` Argument for Fine-Grained Task Evaluation in `lm-evaluation-harness`. This feature is the first step towards efficient multi-prompt evaluation with PromptEval [1,2] (#2520)
* added option --examples
* specifying examples in dictionary
* run pre-commit - fix arg type
  Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
* fixing bug when examples==None
* fixing bug when examples==None
* limit or examples must be None in simple_evaluate.py and in evaluator.py
* run pre-commit (fix formatting)
  Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
* merge main and run pre-commit (fix formatting)
  Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
* Update __main__.py: undefined "limit" and "examples"
* update branch, fix conflicts, run pre-commit
* nits
* nits
* change 'examples' to 'samples'
---------
Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
Co-authored-by: mirianfrsilva <mirianfrsilva@ibm.com>
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
Co-authored-by: Baber <baber@hey.com>
- 18 Mar, 2025 1 commit
Baber Abbasi authored
support for long-context (and other synthetic) tasks
* add ruler
* add longbench
* pass `metadata` to TaskConfig
- 14 Mar, 2025 1 commit
achervyakov authored
* Added audio-modality pipeline for qwen2-audio model
* Beautify imports
* fix apply_chat_template args
* update default audio placeholders list
* add demo task - common_voice subset
* add audiolm_qwen libs to pyproject.toml
* pre-commit beautify
---------
Co-authored-by: Alexandra Rak <rakalexandra@mail.ru>
- 11 Mar, 2025 1 commit
PabloAgustin authored
* New healthcare benchmark: careqa
* LAUNCH_MN5_ACC <python main.py --config config/mn5.yml --models Llama-3.2-1B-Instruct --tasks careqa_open --num_fewshot 0>
* Add fixes, READMEs, and remove task_list.txt
* pre-commit passed, add formatting updates; add nanmean agg_metric
* Fix import error.
* Wrapped imports in try excepts
* Wrapped imports in try excepts; also metrics to catch bert_score import error
* Try except to catch ImportErrors as well
* use np.nan
* pre-commit
---------
Co-authored-by: PabloAgustin <pablo.martin@bsc.es>
Co-authored-by: Baber <baber@hey.com>
- 04 Mar, 2025 1 commit
Baber Abbasi authored
- 21 Feb, 2025 1 commit
Lintang Sutawika authored
* changed source of eval_logger
* allow eval_logger to be set from args
* removed verbosity arg from non-main methods
* fix logging
* pre-commit
* set verbosity in eval logger
* replace utils.eval_logger
* fix logging in main
* add logging to docs
* add logging message
* nit
* add logging to docs
* refactor setup_logging to utils
---------
Co-authored-by: Baber <baber@hey.com>