- 10 Jul, 2025 1 commit
-
-
Baber authored
-
- 08 Jul, 2025 2 commits
- 07 Jul, 2025 3 commits
- 05 Jul, 2025 1 commit
-
-
Baber Abbasi authored
-
- 04 Jul, 2025 4 commits
-
-
Neel Gupta authored
* [FIX] Initial code to disable multi-proc for stderr
* add docs; align no-mp bootstrap with mp
---------
Co-authored-by: Baber <baber@hey.com>
-
Baber authored
-
Baber authored
-
Baber authored
-
- 03 Jul, 2025 1 commit
-
-
Baber authored
-
- 01 Jul, 2025 1 commit
-
-
Baber authored
-
- 30 Jun, 2025 7 commits
- 25 Jun, 2025 1 commit
-
-
Kiersten Stokes authored
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
-
- 03 Jun, 2025 1 commit
-
-
Baber Abbasi authored
* fix: bug in acc_mutual_info slicing; add `target_delimiter` to uncond choices
* add tests
-
- 21 May, 2025 1 commit
-
-
Baber Abbasi authored
This reverts commit 4dbd5ec9
-
- 19 May, 2025 1 commit
-
-
Baber Abbasi authored
* add `sglang-generate`
* nit
* nit
* nit
* pacify pre-commit
-
- 15 May, 2025 1 commit
-
-
Tingchen Fu authored
-
- 16 Apr, 2025 1 commit
-
-
Baber Abbasi authored
* add warning for default until
* fix stop tokens; add vcsum
* bugfix: fix doc_to_target to string
* fix lsht, trec
* add task to readme
* add debugging logs for multiple input/output
-
- 07 Apr, 2025 1 commit
-
-
Felipe Maia Polo authored
Add `--samples` Argument for Fine-Grained Task Evaluation in `lm-evaluation-harness`. This feature is the first step towards efficient multi-prompt evaluation with PromptEval [1,2] (#2520)
* added option --examples
* specifying examples in dictionary
* run pre-commit - fix arg type
Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
* fixing bug when examples==None
* fixing bug when examples==None
* limit or examples must be None in simple_evaluate.py and in evaluator.py
* run pre-commit (fix formatting)
Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
* merge main and run pre-commit (fix formatting)
Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
* Update __main__.py undefined "limit" and "examples"
* update branch, fix conflicts, run pre-commit
* nits
* nits
* change 'examples' to 'samples'
---------
Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
Co-authored-by: mirianfrsilva <mirianfrsilva@ibm.com>
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
Co-authored-by: Baber <baber@hey.com>
-
- 18 Mar, 2025 1 commit
-
-
Baber Abbasi authored
support for long context (and other synthetic tasks)
* add ruler
* add longbench
* pass `metadata` to TaskConfig
-
- 14 Mar, 2025 1 commit
-
-
achervyakov authored
* Added audio-modality pipeline for qwen2-audio model
* Beautify imports
* fix apply_chat_template args
* update default audio placeholders list
* add demo task - common_voice subset
* add audiolm_qwen libs to pyproject.toml
* pre-commit beautify
---------
Co-authored-by: Alexandra Rak <rakalexandra@mail.ru>
-
- 11 Mar, 2025 1 commit
-
-
PabloAgustin authored
* New healthcare benchmark: careqa
* LAUNCH_MN5_ACC <python main.py --config config/mn5.yml --models Llama-3.2-1B-Instruct --tasks careqa_open --num_fewshot 0>
* Add fixes, READMEs, and remove task_list.txt
* pre-commit passed, add formatting updates; add nanmean agg_metric
* Fix import error.
* Wrapped imports in try excepts
* Wrapped imports in try excepts; also metrics to catch bert_score import error
* Try except to catch ImportErrors as well
* use np.nan
* pre-commit
---------
Co-authored-by: PabloAgustin <pablo.martin@bsc.es>
Co-authored-by: Baber <baber@hey.com>
-
- 04 Mar, 2025 1 commit
-
-
Baber Abbasi authored
-
- 21 Feb, 2025 1 commit
-
-
Lintang Sutawika authored
* changed source of eval_logger
* allow eval_logger to be set from args
* removed verbosity arg from non-main methods
* fix logging
* pre-commit
* set verbosity in eval logger
* replace utils.eval_logger
* fix logging in main
* add logging to docs
* add logging message
* nit
* add logging to docs
* refactor setup_logging to utils
---------
Co-authored-by: Baber <baber@hey.com>
-
- 14 Feb, 2025 2 commits
-
-
Baber Abbasi authored
* set target delimiter to empty string
* nit
* add warning
-
Kiersten Stokes authored
-
- 06 Feb, 2025 1 commit
-
-
Baber Abbasi authored
-
- 28 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* feat: drop Python 3.8 support
* feat: drop Python 3.8 tests
* pre-commit
* handle chat_template for multiple input
-
- 19 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* update pre-commit
-
- 17 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* switch arg
-
- 15 Jan, 2025 2 commits
-
-
Baber Abbasi authored
* add assistant prefix
* add arc_challenge from llama
* nit
* nit
* nit
* add assistant prefix
* add mmlu_llama
* nit
* nit
* Revert "nit" This reverts commit 6a97f8356237305e375212b966b30e8de59dd4bc.
* fix regex bug
* add assistant_prefix to vllm
* add `Question:`
* add mmlu_pro
* add fewshot assistant_prefix
* use `assistant_prefill`
* typehints
* nits
* nits
* add to docs
* add readme
-
Hojin Lee authored
* add custom filter
* fix type casting of references
* add humaneval
* fix a bug in humaneval
* add greedy version of humaneval
* update tasks README
* test humaneval
* return multiple metrics
* nit
* add confirmation to run code tasks
* nit
* nit
---------
Co-authored-by: Hojin Lee <19949034+hjlee1371@users.noreply.github.com>
Co-authored-by: Baber <baber@hey.com>
-