  11. 30 Jun, 2025 2 commits
    •
      FixBug: Align the Humaneval with official results for Llama-3.1-70B-Instruct (#3092) · a7ca0435
      jinze authored
      * Fix: Align the Humaneval dataset with official results
      
      Details: (1) Modified "doc_to_text" and "gen_prefix" in the "humaneval_instruct.yaml" file so they match the prompt in "meta-llama/Llama-3.1-70B-Instruct-evals".
      
      (2) Changed r.rfind("```") to r.find("```") so it locates the first "```" rather than the last one.
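This one-line change matters when a generation contains more than one fence: cutting at the first closing ``` keeps only the code, whereas rfind would cut at the last fence and swallow any trailing commentary. A minimal sketch of the idea (the function name and surrounding plumbing are illustrative, not the harness's actual code):

```python
def extract_code(r: str) -> str:
    # r is the model generation after the opening ``` line has been stripped.
    # Take everything before the FIRST closing fence (the fix); using
    # rfind here (the old behavior) would instead cut at the LAST fence
    # and include any prose the model emitted after its code block.
    end = r.find("```")
    return r[:end] if end != -1 else r

# A generation with trailing commentary that itself contains a fence:
gen = "def add(a, b):\n    return a + b\n```\nExplanation:\n```text```"
code = extract_code(gen)  # keeps only the code before the first fence
```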
      
      Results: Partially reproduced the official numbers: Llama-3.1-8B-Instruct scores 66.5 (official: 72.6), while Llama-3.1-70B-Instruct scores 80.5, matching the official 80.5.
      
      Ref: PR#2650
      
      * add changelog and version
      
      * add changelog
    •
      [HF] fix quantization config (#3039) · fea4d11d
      Baber Abbasi authored
      * Try fixing issue 3026, which is caused by the quantization_config argument introduced in commit 758c5ed8.
      
      The argument is a dict, but for a GPTQ-quantized model this conflicts with the Hugging Face interface, which expects a QuantizationConfigMixin.
      The current solution removes the quantization_config argument in HFLM._create_model() in lm_eval/models/huggingface.py.
      Further modification is required to restore the functionality provided by the previous commit.
      
      * wrap quantization_config in AutoQuantizationConfig
      
      * handle quantization config not dict
      
      * wrap quantization_config in AutoQuantizationConfig if dict
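The bullets above converge on one pattern: wrap the dict form in a config object and pass anything else through. A hedged sketch of that logic (normalize_quantization_config is an illustrative name, not the actual patch; AutoQuantizationConfig.from_dict is the transformers entry point the commit message refers to):

```python
def normalize_quantization_config(quantization_config):
    """Return a quantization config Hugging Face will accept.

    Plain dicts (e.g. read from a model's config.json) are wrapped into
    the appropriate QuantizationConfigMixin subclass; anything else
    (an existing config object, or None) is passed through unchanged.
    """
    if isinstance(quantization_config, dict):
        # Imported lazily so the sketch still runs without transformers
        # installed when no dict is supplied.
        from transformers.quantizers.auto import AutoQuantizationConfig
        return AutoQuantizationConfig.from_dict(quantization_config)
    return quantization_config
```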
      
      ---------
      Co-authored-by: shanhx2000 <hs359@duke.edu>