Commits · 0087929ec4e6cdd445888d553ecfb63828765d85 · gaoqiong / lm-evaluation-harness

23 Jul, 2025 2 commits
- move test one doc to method · 0087929e
  Baber authored Jul 23, 2025
  
  0087929e
- overload Task methods if callable in yaml dict · f40e9b58
  Baber authored Jul 23, 2025
  
  f40e9b58
22 Jul, 2025 4 commits
- types · 87445e95
  Baber authored Jul 23, 2025
  
  87445e95
- type hints · f264f2e2
  Baber authored Jul 22, 2025
  
  f264f2e2
- make multiple_input explicit · 230352ce
  Baber authored Jul 22, 2025
  
  230352ce
- nit · 00048838
  Baber authored Jul 22, 2025
  
  00048838
21 Jul, 2025 4 commits
- feat: implement check_gold_index_error utility and refactor process_results... · 55be51ea
  Baber authored Jul 22, 2025
```
feat: implement check_gold_index_error utility and refactor process_results for improved error handling. remove generate_until multiple-choice
```
  55be51ea
- nit · 9f345f33
  Baber authored Jul 21, 2025
  
  9f345f33
- nit · e3fee7ea
  Baber authored Jul 21, 2025
  
  e3fee7ea
- move multi_target to `exact_match` · 8f924e1c
  Baber authored Jul 21, 2025
  
  8f924e1c
19 Jul, 2025 1 commit
- type hints · 4facd5c8
  Baber authored Jul 19, 2025
  
  4facd5c8
18 Jul, 2025 1 commit
- remove prompt-source for now · b6f38ac8
  Baber authored Jul 18, 2025
  
  b6f38ac8
10 Jul, 2025 2 commits
- fix tests · fbd34827
  Baber authored Jul 10, 2025
  
  fbd34827
- fixup · 66736bc1
  Baber authored Jul 10, 2025
  
  66736bc1
08 Jul, 2025 2 commits
- refactor: improve dataset and metric handling in TaskConfig · fedaf262
  Baber authored Jul 08, 2025
  
  fedaf262
- refactor: update type hints and improve filter ensemble construction · 863ff340
  Baber authored Jul 08, 2025
  
  863ff340
07 Jul, 2025 1 commit
- nit · 5efa7937
  Baber authored Jul 08, 2025
  
  5efa7937
04 Jul, 2025 2 commits
- refactor configs to files · fb63ac0f
  Baber authored Jul 04, 2025
  
  fb63ac0f
- nit · b0aca59b
  Baber authored Jul 04, 2025
  
  b0aca59b
03 Jul, 2025 1 commit
- type hints · db5dff9c
  Baber authored Jul 03, 2025
  
  db5dff9c
01 Jul, 2025 1 commit
- add docs · 49bfaf68
  Baber authored Jul 01, 2025
  
  49bfaf68
30 Jun, 2025 6 commits
- add temlplateconfigs · 15d07121
  Baber authored Jul 01, 2025
  
  15d07121
- update type hints · 9b192374
  Baber authored Jun 30, 2025
  
  9b192374
- add `sample_metric` and `is_elementwise` to MetricConfig · cb8dfe63
  Baber authored Jun 30, 2025
  
  cb8dfe63
- add FewshotConfig · 108674ed
  Baber authored Jun 30, 2025
  
  108674ed
- nit · c5aa5cf0
  Baber authored Jun 30, 2025
  
  c5aa5cf0
- add MetricConfig · 1b5c6f88
  Baber authored Jun 30, 2025
  
  1b5c6f88
25 Jun, 2025 1 commit
- Ensure backwards compatibility in fewshot_context by using kwargs (#3079) · 532909c0
  Kiersten Stokes authored Jun 25, 2025
```
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
```
  532909c0
03 Jun, 2025 1 commit
- [Fix] acc_mutual_info metric calculation bug (#3035) · 3f792954
  Baber Abbasi authored Jun 03, 2025
```
* fix: bug in acc_mutual_info slicing; add `target_delimiter` to uncond choices

* add tests
```
  3f792954
21 May, 2025 1 commit
- Revert "feat: add question suffix (#2876)" (#3007) · 29ea6832
  Baber Abbasi authored May 21, 2025
```
This reverts commit 4dbd5ec9
```
  29ea6832
19 May, 2025 1 commit
- [SGLANG] Add the SGLANG generate API (#2997) · 53c65300
  Baber Abbasi authored May 19, 2025
```
* add `sglang-generate`

* nit

* nit

* nit

* pacify pre-commit
```
  53c65300
15 May, 2025 1 commit
- feat: add question suffix (#2876) · 4dbd5ec9
  Tingchen Fu authored May 15, 2025
  
  4dbd5ec9
16 Apr, 2025 1 commit
  Baber Abbasi authored Apr 17, 2025

  * add warning in for default until
  * fix stop tokens; add vcsum
  * bugfix:fix doc_to_target to string
  * fix lsht, trec
  * add task to readme
  * add debugging logs for multiple input/output

  930d8378

07 Apr, 2025 1 commit
- Add `--samples` Argument for Fine-Grained Task Evaluation in... · d693dcd2
  Felipe Maia Polo authored Apr 07, 2025

  Add `--samples` Argument for Fine-Grained Task Evaluation in `lm-evaluation-harness`. This feature is the first step towards efficient multi-prompt evaluation with PromptEval [1,2] (#2520)

  * added option --examples
  * specifying examples in dictionary
  * run pre-commit - fix arg type

  Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com

  * fixing bug when examples==None
  * fixing bug when examples==None
  * limit or examples must be None in simple_evaluate.py and in evaluator.py
  * run pre-commit (fix formatting)

  Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com

  * merge main and run pre-commit (fix formatting)

  Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com

  * Update __main__.py

    undefined "limit" and "examples"

  * update branch, fix conflicts, run pre-commit
  * nits
  * nits
  * change 'examples' to 'samples'

  ---------

  Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com
  Co-authored-by: mirianfrsilva <mirianfrsilva@ibm.com>
  Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
  Co-authored-by: Baber <baber@hey.com>

  d693dcd2

18 Mar, 2025 1 commit
- Add loncxt tasks (#2629) · 80a10075
  Baber Abbasi authored Mar 18, 2025

  suport for longcontext (and other synthetic tasks)
  * add ruler
  * add longbench
  * pass `metadata` to TaskConfig

  80a10075

14 Mar, 2025 1 commit
- add audio modality (qwen2 audio only) (#2689) · 62552d2c
  achervyakov authored Mar 14, 2025

  * Added audio-modality pipeline for qwen2-audio model
  * Beauty imports
  * fix apply_chat_template args
  * update default audio placeholders list
  * add demo task - common_voice subset
  * add audiolm_qwen libs to pyproject.toml
  * pre-commit beautify

  ---------
  Co-authored-by: Alexandra Rak <rakalexandra@mail.ru>

  62552d2c

04 Mar, 2025 1 commit
- add debug log (#2757) · 74332955
  Baber Abbasi authored Mar 04, 2025
  
  74332955
21 Feb, 2025 1 commit
- Logging (#2203) · 1ba35e62
  Lintang Sutawika authored Feb 20, 2025

  * changed source of eval_logger
  * allow eval_logger to be set from args
  * removed verbosity arg from non-main methods
  * fix logging
  * pre-commit
  * set verbosity in eval logger
  * replace utils.eval_logger
  * fix logging in main
  * add logging to docs
  * add logging message
  * nit
  * add logging to docs
  * refactor setup_logging to utils

  ---------
  Co-authored-by: Baber <baber@hey.com>

  1ba35e62

14 Feb, 2025 1 commit
- Update remaining references to assistant_prefill to gen_prefix (#2683) · ef6f5243
  Kiersten Stokes authored Feb 14, 2025
  
  ef6f5243
06 Feb, 2025 1 commit
- fix early return for multuple dict (#2673) · 144a1e58
  Baber Abbasi authored Feb 06, 2025
  
  144a1e58