Commits · eec9de3eb3bc7298302fa042d5d5b82a5b169d52 · gaoqiong / lm-evaluation-harness

25 Jul, 2025 5 commits
- add TaskRegistry · eec9de3e
  Baber authored Jul 25, 2025
  
  eec9de3e
- refactor yaml loading · c8f9991f
  Baber authored Jul 25, 2025
  
  c8f9991f
- refactor: improve type hints and simplify YAML loading functions · d77596da
  Baber authored Jul 25, 2025
  
  d77596da
- refactor: improve type hints and simplify YAML loading functions · 15e930af
  Baber authored Jul 25, 2025
  
  15e930af
- fix · 4a0a8bd8
  Baber authored Jul 25, 2025
  
  4a0a8bd8
14 Jul, 2025 2 commits
- refactor: remove root_dir in iglob · c40a012a
  Baber authored Jul 14, 2025
  
  c40a012a
- refactor: update type hints for 3.9 · bfaf24cd
  Baber authored Jul 14, 2025
  
  bfaf24cd
12 Jul, 2025 4 commits
- refactor: replace ConfigurableGroup with GroupConfig · 0aca6958
  Baber authored Jul 13, 2025
  
  0aca6958
- refactor: simplify docstrings and improve task name matching logic · 7fcfb4ac
  Baber authored Jul 12, 2025
  
  7fcfb4ac
- add docs · 5e632643
  Baber authored Jul 12, 2025
  
  5e632643
- nit · 68b3cddc
  Baber authored Jul 12, 2025
  
  68b3cddc
11 Jul, 2025 7 commits
- nit · 495ea3a0
  Baber authored Jul 12, 2025
  
  495ea3a0
- nit · 3c969207
  Baber authored Jul 12, 2025
  
  3c969207
- fix circular · 6fc2ac49
  Baber authored Jul 12, 2025
  
  6fc2ac49
- refactor: migrate utils functions to lm_eval.tasks and update references · a9c16905
  Baber authored Jul 12, 2025
  
  a9c16905
- refactor: add type hints · 85f61d85
  Baber authored Jul 11, 2025
  
  85f61d85
- refactor: use Path · 5454e95d
  Baber authored Jul 11, 2025
  
  5454e95d
- refactor: enhance task loading by including yaml_path parameter · 0a184f46
  Baber authored Jul 11, 2025
  
  0a184f46
10 Jul, 2025 3 commits
- refactor: preserve original task name during config inclusion · 15c01f4d
  Baber authored Jul 11, 2025
  
  15c01f4d
- refactor: simplify task and config validation methods · acc634fa
  Baber authored Jul 11, 2025
  
  acc634fa
- warning for "chat" pretrained; disable buggy evalita configs (#3127) · f3a0b554
  Baber Abbasi authored Jul 10, 2025
```
* check for chat for warning

* add test

* remove yaml extension from some evalita configs

* move unitxt to own test script

* fix CI test
```
  f3a0b554
03 Jul, 2025 2 commits

- Humaneval - fix regression (#3102) · 8c1016cb
  Baber Abbasi authored Jul 03, 2025
```
* use double quotes
```
  8c1016cb

- Truthfulqa multi harness (#3062) · e0dc33ae
  Blanca Calvo authored Jul 03, 2025

  * truthfulqa-multi task
  * truthfulqa-multi with chat few-shot
  * few shot chat implementation
  * changed until so it outputs lists
  * changed dataset location
  * added MT task
  * Create README.md
  * do not include MT
  * changes for PR
  * tag change
  * removed yaml extension
  * adding task to the table
  * fix task configs
  * add import exception

  ---------
  Co-authored-by: Baber <baber@hey.com>

  e0dc33ae

30 Jun, 2025 1 commit

- FixBug: Align the Humaneval with official results for Llama-3.1-70B-Instruct (#3092) · a7ca0435
  jinze authored Jul 01, 2025

  * Fix: Align the Humaneval dataset with official results

    Details:
    (1) Modified "doc_to_text" and "gen_prefix" in "humaneval_instruct.yaml" to match the prompt in "meta-llama/Llama-3.1-70B-Instruct-evals".
    (2) Changed r.rfind("```") to r.find("```") so that it locates the first "```" rather than the last one.

    Results: the official results are partially reproduced. LLaMA3.1-8B-Instruct scores 66.5 (official: 72.6) and LLaMA3.1-70B-Instruct scores 80.5 (official: 80.5).

    Ref: PR#2650
  * add changelog and version
  * add changelog

  a7ca0435
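The `rfind` to `find` change described in the entry above can be illustrated with a minimal sketch. The helper names and the sample response are hypothetical and are not taken from the harness; the sketch only assumes that the prompt already opened a code fence, so the model's completion is code followed by a closing fence and possibly more text.

```python
FENCE = "`" * 3  # three backticks, built at runtime so this example stays fence-safe


def truncate_at_first_fence(response: str) -> str:
    """Keep everything before the FIRST fence (str.find)."""
    idx = response.find(FENCE)
    return response if idx == -1 else response[:idx]


def truncate_at_last_fence(response: str) -> str:
    """Keep everything before the LAST fence (str.rfind)."""
    idx = response.rfind(FENCE)
    return response if idx == -1 else response[:idx]


# Hypothetical completion: the solution, a closing fence, then an extra example block.
response = (
    "    return a + b\n"
    + FENCE
    + "\nExample:\n"
    + FENCE
    + "python\nprint(add(1, 2))\n"
    + FENCE
)

first = truncate_at_first_fence(response)  # only the solution body
last = truncate_at_last_fence(response)    # still contains fences and prose
```

With `rfind`, any trailing explanation or second code block leaks into the extracted solution, which is why cutting at the first fence is the safer choice under this assumption.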

25 Jun, 2025 1 commit
- Ensure backwards compatibility in fewshot_context by using kwargs (#3079) · 532909c0
  Kiersten Stokes authored Jun 25, 2025
```
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
```
  532909c0
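The title of #3079 names the approach but not the code, so the following is only a sketch of the general technique, not the harness's implementation. It assumes a hypothetical base `Task` whose `fewshot_context` has gained an optional `gen_prefix` parameter, a hypothetical `LegacyTask` whose override predates that parameter, and a hypothetical caller `build_context` that passes the optional argument by keyword.

```python
class Task:
    """Hypothetical base class; not the harness's real Task."""

    def fewshot_context(self, doc, num_fewshot, gen_prefix=None):
        context = f"{num_fewshot}-shot: {doc}"
        return context + (gen_prefix or "")


class LegacyTask(Task):
    """Hypothetical subclass whose override predates `gen_prefix`."""

    def fewshot_context(self, doc, num_fewshot, **kwargs):
        # Unknown keyword arguments are accepted and ignored,
        # so newer callers do not raise TypeError.
        return f"legacy {num_fewshot}-shot: {doc}"


def build_context(task, doc, num_fewshot):
    # Passing the optional argument by keyword keeps this call valid
    # for both the new signature and the older override.
    return task.fewshot_context(doc, num_fewshot, gen_prefix=" Answer:")


new_style = build_context(Task(), "Q", 2)
old_style = build_context(LegacyTask(), "Q", 2)
```

Had the caller passed `gen_prefix` positionally, the older override would fail with a `TypeError`; keyword passing plus `**kwargs` lets both generations of subclasses coexist.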
20 Jun, 2025 1 commit

- llama3 task: update README.md (#3074) · 68c3a811
  Anna Fontana authored Jun 20, 2025

  "arc_chalenge_chat" doesn't exist: I think it should be "arc_challenge_chat", but this task is not implemented here (see arc task folder).

  68c3a811

19 Jun, 2025 2 commits
- Update instructions.py (#3060) · 37357004
  Maxim Evtush authored Jun 19, 2025
  
  37357004
- Update README.md (#3070) · 5a15058e
  Anna Fontana authored Jun 19, 2025
```
Wrong task name: mmlu_generation doesn't exist -> mmlu_generative is the correct one
```
  5a15058e
16 Jun, 2025 2 commits
- fix longbench citation (#3061) · 9fbe48c2
  Baber Abbasi authored Jun 16, 2025
```
* fix longbench citation
```
  9fbe48c2
- Fix Typo in README and Comment in utils_mcq.py (#3057) · e20ef72e
  fuder.eth authored Jun 16, 2025
```
* Update README.md

* Update utils_mcq.py
```
  e20ef72e
12 Jun, 2025 1 commit
- Fallback to super impl in fewshot_context for Unitxt tasks (#3023) · d09e03dd
  Kiersten Stokes authored Jun 12, 2025
```
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
```
  d09e03dd
08 Jun, 2025 1 commit

- [longbench] fix metric calculation (#2983) · 147e9d61
  Baber Abbasi authored Jun 08, 2025

  * use all answers
  * use middle truncation
  * maybe fix classification score
  * strip classification preds
  * [vllm] remove stop tokens post-hoc
  * strip all preds
  * pacify pre-commit
  * start on truncation utility
  * add to readme
  * add a footgun doc
  * fix newline in yaml templates
  * do not strip code_sim preds!
  * fix pre-commit config
  * fix instruction warning
  * add note to longbench readme

  147e9d61
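The "use middle truncation" item in the entry above refers to shortening an over-long prompt by dropping tokens from its middle while keeping the start and the end. The commit body does not show the implementation, so this is a minimal sketch under that reading; `truncate_middle` and its parameters are hypothetical names, and tokens are represented as a plain list.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def truncate_middle(tokens: Sequence[T], max_length: int) -> List[T]:
    """Keep the first and last parts of `tokens`, dropping the middle.

    Hypothetical helper: the head gets max_length // 2 tokens and the
    tail gets the remainder, so the result has exactly max_length items
    whenever truncation is needed.
    """
    if max_length <= 0:
        return []
    if len(tokens) <= max_length:
        return list(tokens)
    head = max_length // 2
    tail = max_length - head  # at least 1 because max_length >= 1
    return list(tokens[:head]) + list(tokens[len(tokens) - tail:])


even = truncate_middle(list(range(10)), 4)
odd = truncate_middle(list(range(10)), 5)
short = truncate_middle([1, 2, 3], 8)
```

Keeping both ends is a common choice for long-context prompts, since instructions tend to sit at the start and the question at the end.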

03 Jun, 2025 2 commits

- remove prints (#3041) · 9f152e0b
  Baber Abbasi authored Jun 03, 2025
  
  9f152e0b

- add Mbpp instruct (#2995) · 60e85da5
  Baber Abbasi authored Jun 03, 2025

  * feat: add mbpp_instruct
  * fix: update generation_kwargs to use an empty until list
  * fix: correct predictions formatting in pass_at_1 function
  * fix: improve code block extraction by checking first without opening backticks
  * fix mbpp `pass_at_1`

  60e85da5
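The code-block extraction item in the entry above suggests trying the "no opening backticks" case first, which arises when the prompt itself already opened the fence. The sketch below illustrates that idea only; `extract_code` is a hypothetical name, not the function from the harness, and it assumes that a completion not starting with a fence is a continuation of an already-open block.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime so this example stays fence-safe


def extract_code(text: str) -> str:
    """Return the code portion of a model completion (hypothetical helper)."""
    # Case 1: no opening fence. The completion is code, optionally
    # followed by a closing fence and trailing prose.
    if not text.lstrip().startswith(FENCE):
        return text.split(FENCE, 1)[0].strip("\n")
    # Case 2: a complete fenced block, with an optional language tag.
    match = re.search(FENCE + r"[a-zA-Z]*\n(.*?)" + FENCE, text, re.DOTALL)
    return match.group(1).rstrip("\n") if match else ""


continued = extract_code("def f():\n    return 1\n" + FENCE + "\nDone.")
fenced = extract_code(FENCE + "python\ndef f():\n    return 1\n" + FENCE)
bare = extract_code("x = 1")
```

A completion that starts with prose and only later opens a fence would be misread by this sketch, so a production version would need an additional check for that case.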

26 May, 2025 1 commit
- add arab_culture task (#3006) · 8bc4afff
  Boda Sadallah authored May 26, 2025
```
* add arab_culture tasks

* add target_delimeter and remove debugging code
```
  8bc4afff
21 May, 2025 1 commit
- add kbl 2025 (#3000) · 8be417a8
  Hongseok Oh authored May 21, 2025
  
  8be417a8
19 May, 2025 2 commits

- [SGLANG] Add the SGLANG generate API (#2997) · 53c65300
  Baber Abbasi authored May 19, 2025
```
* add `sglang-generate`

* nit

* nit

* nit

* pacify pre-commit
```
  53c65300

- Adding ACPBench Hard tasks (#2980) · 0daf28fd
  Harsha authored May 19, 2025

  * adding ACPBench_hard
  * adding Clingo
  * changing tarski to tarski[clingo]
  * denoting the main variants in each paper

  0daf28fd

15 May, 2025 2 commits
- fix formatting (#2759) · 0126f6d1
  Baber Abbasi authored May 15, 2025
  
  0126f6d1
- Update utils.py (#2870) · 2bde99e4
  tawsif authored May 15, 2025
  
  2bde99e4