Commits · bfaf24cdcc77ed5d225899b5ddc136da5331ecc6 · gaoqiong / lm-evaluation-harness

14 Jul, 2025 1 commit
- refactor: update type hints for 3.9 · bfaf24cd
  Baber authored Jul 14, 2025
  
  bfaf24cd
13 Jul, 2025 1 commit
- add: create new YAML configurations for task and group setups · 45c11c31
  Baber authored Jul 14, 2025
  
  45c11c31
12 Jul, 2025 4 commits
- refactor: replace ConfigurableGroup with GroupConfig · 0aca6958
  Baber authored Jul 13, 2025
  
  0aca6958
- refactor: simplify docstrings and improve task name matching logic · 7fcfb4ac
  Baber authored Jul 12, 2025
  
  7fcfb4ac
- add docs · 5e632643
  Baber authored Jul 12, 2025
  
  5e632643
- nit · 68b3cddc
  Baber authored Jul 12, 2025
  
  68b3cddc
11 Jul, 2025 8 commits
- nit · 495ea3a0
  Baber authored Jul 12, 2025
  
  495ea3a0
- nit · 3c969207
  Baber authored Jul 12, 2025
  
  3c969207
- fix circular · 6fc2ac49
  Baber authored Jul 12, 2025
  
  6fc2ac49
- refactor: migrate utils functions to lm_eval.tasks and update references · a9c16905
  Baber authored Jul 12, 2025
  
  a9c16905
- refactor: add apply_template function and improve lazy initialization · e11fa05d
  Baber authored Jul 11, 2025
  
  e11fa05d
- refactor: add type hints · 85f61d85
  Baber authored Jul 11, 2025
  
  85f61d85
- refactor: use Path · 5454e95d
  Baber authored Jul 11, 2025
  
  5454e95d
- refactor: enhance task loading by including yaml_path parameter · 0a184f46
  Baber authored Jul 11, 2025
  
  0a184f46
10 Jul, 2025 4 commits
- refactor: preserve original task name during config inclusion · 15c01f4d
  Baber authored Jul 11, 2025
  
  15c01f4d
- refactor: simplify task and config validation methods · acc634fa
  Baber authored Jul 11, 2025
  
  acc634fa
- fix: remove warning (#3128) · fcddf195
  Baber Abbasi authored Jul 10, 2025
  
  fcddf195
- warning for "chat" pretrained; disable buggy evalita configs (#3127) · f3a0b554
  Baber Abbasi authored Jul 10, 2025
```
* check for chat for warning

* add test

* remove yaml extension from some evalita configs

* move unitxt to own test script

* fix CI test
```
  f3a0b554
06 Jul, 2025 3 commits
- check pil dep (#3114) · ab3acc73
  Baber Abbasi authored Jul 06, 2025
  
  ab3acc73
- Neuralmagic (#3113) · 89654090
  Baber Abbasi authored Jul 06, 2025
```
* remove sparse-ml
```
  89654090
- delete neuralmagic models (#3112) · f93001db
  Baber Abbasi authored Jul 06, 2025
  
  f93001db
05 Jul, 2025 4 commits
- add image hashing and `LMEVAL_HASHMM` envar (#2973) · e69ca5ed
  achervyakov authored Jul 05, 2025
```
* add image hashing

* remove unused params description

* use `LMEVAL_HASHMM` (default '1') to save raw images

---------
Co-authored-by: Baber <baber@hey.com>
```
  e69ca5ed
- Fixed #3005: Processes both formats of model_args: string and dictionary (#3097) · 0e96cd18
  Debjyoti Ray authored Jul 05, 2025
```
* git push --force
correctly processes both formats of model_args: string and dictionary

* extract to function for better testing

* nit

---------
Co-authored-by: Baber <baber@hey.com>
```
  0e96cd18
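As a rough illustration of what accepting both `model_args` formats can look like, here is a minimal sketch. The helper name `normalize_model_args` and the `key=value,key=value` string layout are assumptions made for this example; the function actually extracted in this commit is not shown in the log and may differ.

```python
from typing import Any, Dict, Optional, Union


def normalize_model_args(
    model_args: Optional[Union[str, Dict[str, Any]]],
) -> Dict[str, Any]:
    """Hypothetical helper: return model_args as a dict for either input format."""
    if model_args is None:
        return {}
    if isinstance(model_args, dict):
        # Already a dictionary: return a shallow copy so the caller's dict is untouched.
        return dict(model_args)

    parsed: Dict[str, Any] = {}
    for item in model_args.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and empty segments
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected 'key=value', got {item!r}")
        parsed[key.strip()] = value.strip()  # values stay strings in this sketch
    return parsed


print(normalize_model_args("pretrained=gpt2,dtype=float32"))
print(normalize_model_args({"pretrained": "gpt2", "dtype": "float32"}))
```

Both calls produce the same dictionary, which is the behavior the commit title describes.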
- delete unneeded files (#3108) · 6e91fdcd
  Baber Abbasi authored Jul 05, 2025
```
* delete unneeded files
```
  6e91fdcd
- remove all; reformat table (#3107) · 28001d29
  Baber Abbasi authored Jul 05, 2025
  
  28001d29
04 Jul, 2025 1 commit

- [FIX] Initial code to disable multi-proc for stderr (#3106) · 71d0289d
  Neel Gupta authored Jul 04, 2025

* [FIX] Initial code to disable multi-proc for stderr

* add docs; align no-mp bootstrap with mp

---------
Co-authored-by: Baber <baber@hey.com>

71d0289d

03 Jul, 2025 4 commits

- Bugfix/hf tokenizer gguf override (#3098) · ff41a856
  Ankush authored Jul 03, 2025

* fix(hf-gguf): skip gguf_file if external tokenizer is provided

* docs(readme): add instructions for evaluating GGUF models with Hugging Face backend

ff41a856

- Humaneval - fix regression (#3102) · 8c1016cb
  Baber Abbasi authored Jul 03, 2025
```
* use double quotes
```
8c1016cb

- Fix: Reduce CLI loading time from 2.2s to 0.05s (#3099) · 944d32b4
  Alex Stachowiak authored Jul 03, 2025

* Lazy-load submodules to reduce import time

* pacify pre-commit

---------
Co-authored-by: Baber <baber@hey.com>

944d32b4
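One common way to lazy-load submodules in Python is a module-level `__getattr__` hook (PEP 562), which defers an import until the attribute is first accessed. The sketch below illustrates that technique only; it is not taken from this commit. The names `make_lazy_package`, `demo`, and `json_tools` are made up for the example.

```python
import importlib
import types


def make_lazy_package(name, lazy_map):
    """Build a module whose attributes are imported on first access.

    `lazy_map` maps an attribute name to the dotted path of the module to import.
    """
    pkg = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails (PEP 562).
        if attr in lazy_map:
            module = importlib.import_module(lazy_map[attr])
            setattr(pkg, attr, module)  # cache so later lookups skip this hook
            return module
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    pkg.__getattr__ = __getattr__
    return pkg


demo = make_lazy_package("demo", {"json_tools": "json"})
print("json_tools" in vars(demo))     # checked before the first access
print(demo.json_tools.dumps([1, 2]))  # first access triggers the import
print("json_tools" in vars(demo))     # checked again after the first access
```

In a real package the same `__getattr__` function would sit directly in `__init__.py`, so that importing the package does not pull in heavy submodules until they are used.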

- Truthfulqa multi harness (#3062) · e0dc33ae
  Blanca Calvo authored Jul 03, 2025

* truthfulqa-multi task

* truthfulqa-multi with chat few-shot

* few shot chat implementation

* changed until so it outputs lists

* changed dataset location

* added MT task

* Create README.md

* do not include MT

* changes for PR

* tag change

* removed yaml extension

* adding task to the table

* fix task configs

* add import exception

---------
Co-authored-by: Baber <baber@hey.com>

e0dc33ae

30 Jun, 2025 2 commits

- FixBug: Align the Humaneval with official results for Llama-3.1-70B-Instruct (#3092) · a7ca0435
  jinze authored Jul 01, 2025

* Fix: Align the Humaneval dataset with official results

Details: (1) Modified "doc_to_text" and "gen_prefix" in "humaneval_instruct.yaml" to match the prompt in "meta-llama/Llama-3.1-70B-Instruct-evals".

(2) Changed r.rfind("```") to r.find("```") so that it locates the first "```", not the last one.

Results: partially reproduced the official results. LLaMA3.1-8B-Instruct scores 66.5 (official: 72.6), and LLaMA3.1-70B-Instruct scores 80.5 (official: 80.5).

Ref: PR#2650

* add changelog and version

* add changelog

a7ca0435
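The difference between `str.find` and `str.rfind` matters when a model response contains more than one code fence. The sketch below uses a made-up response string and two hypothetical helpers to show which part of the text survives in each case; it is not the harness's actual filtering code.

```python
FENCE = "`" * 3  # three backticks, built this way so the example stays fence-safe

# Made-up model response: the code ends at the first fence, then the model keeps talking.
response = (
    "def add(a, b):\n"
    "    return a + b\n"
    f"{FENCE}\n"
    "Explanation: adds two numbers.\n"
    f"{FENCE}python\n"
    "print(add(1, 2))\n"
    f"{FENCE}"
)


def cut_at_first_fence(text: str) -> str:
    """Keep everything before the first fence (str.find)."""
    idx = text.find(FENCE)
    return text if idx == -1 else text[:idx]


def cut_at_last_fence(text: str) -> str:
    """Keep everything before the last fence (str.rfind)."""
    idx = text.rfind(FENCE)
    return text if idx == -1 else text[:idx]


print(repr(cut_at_first_fence(response)))  # only the function body remains
print(repr(cut_at_last_fence(response)))   # explanation and extra fences leak in
```

Cutting at the first fence keeps only the generated function, whereas cutting at the last fence also keeps the trailing explanation and would no longer be valid Python.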

- [HF] fix quantization config (#3039) · fea4d11d
  Baber Abbasi authored Jun 30, 2025

* Try fixing issue 3026, which is caused by the quantization_config argument introduced in commit 758c5ed8.
The argument is a dict, but for a GPTQ-quantized model this conflicts with the Hugging Face interface, which expects a QuantizationConfigMixin.
The current solution removes the quantization_config argument in HFLM._create_model() in lm_eval/models/huggingface.py.
Further modification is required to restore the functionality provided by the previous commit.

* wrap quantization_config in AutoQuantizationConfig

* handle quantization config not dict

* wrap quantization_config in AutoQuantizationConfig if dict

---------
Co-authored-by: shanhx2000 <hs359@duke.edu>

fea4d11d

25 Jun, 2025 3 commits
- feat / fix: Properly make use of `subfolder` from HF models (#3072) · 6b3f3f7e
  Younes B authored Jun 25, 2025
```
* add subfolder

* lint

* change it to empty string

* fix typehints

---------
Co-authored-by: Baber <baber@hey.com>
```
  6b3f3f7e
- remove system message if `TemplateError` (#3076) · 0f63d4f5
  Baber Abbasi authored Jun 25, 2025
  
  0f63d4f5
- Ensure backwards compatibility in fewshot_context by using kwargs (#3079) · 532909c0
  Kiersten Stokes authored Jun 25, 2025
```
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
```
  532909c0
23 Jun, 2025 1 commit

- Fix Anthropic API compatibility issues in chat completions (#3054) · 8bc46207
  NourFahmy authored Jun 23, 2025

* Fix Anthropic API compatibility issues in chat completions

Solves two compatibility issues between the LM Eval Harness and Anthropic's API:

1) The type field issue: Anthropic's Messages API doesn't accept the type field, which other APIs might expect and which was previously included
2) The stop sequences issue: Anthropic requires stop sequences to contain non-whitespace characters

Tested with the most recent models from Anthropic (claude-sonnet-4-0, claude-opus-4-0); this resolved my local API errors

* pacify pre-commit

* add type

---------
Co-authored-by: Baber <baber@hey.com>

8bc46207
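The stop-sequence constraint described in this commit can be handled by dropping whitespace-only entries before a request is built. The helper below is a hypothetical sketch of that idea, with an assumed name (`clean_stop_sequences`); it is not the code from the pull request.

```python
from typing import List, Optional


def clean_stop_sequences(stop: Optional[List[str]]) -> List[str]:
    """Hypothetical helper: drop stop sequences that are empty or whitespace-only."""
    return [s for s in (stop or []) if s.strip()]


# "\n\n" and " " contain only whitespace, so they are filtered out.
print(clean_stop_sequences(["\n\n", "Question:", " ", ""]))
```

Sequences that mix whitespace with other characters, such as "\nQuestion:", are kept unchanged because they still contain non-whitespace characters.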

20 Jun, 2025 1 commit

- llama3 task: update README.md (#3074) · 68c3a811
  Anna Fontana authored Jun 20, 2025

"arc_chalenge_chat" doesn't exist: I think it should be "arc_challenge_chat", but this task is not implemented here (see arc task folder).

68c3a811

19 Jun, 2025 3 commits
- bump version to `0.4.9` (#3073) · 45274951
  Baber Abbasi authored Jun 19, 2025
  
  45274951
- Update instructions.py (#3060) · 37357004
  Maxim Evtush authored Jun 19, 2025
  
  37357004
- Update README.md (#3070) · 5a15058e
  Anna Fontana authored Jun 19, 2025
```
Wrong task name: mmlu_generation doesn't exist -> mmlu_generative is the correct one
```
  5a15058e