- 10 Jul, 2025 3 commits
-
Baber authored
-
Baber Abbasi authored
-
Baber Abbasi authored
* check for chat for warning
* add test
* remove yaml extension from some evalita configs
* move unitxt to own test script
* fix CI test
-
- 06 Jul, 2025 3 commits
-
Baber Abbasi authored
-
Baber Abbasi authored
* remove sparse-ml
-
Baber Abbasi authored
-
- 05 Jul, 2025 4 commits
-
achervyakov authored
* add image hashing
* remove unused params description
* use `LMEVAL_HASHMM` (default '1') to save raw images
---------
Co-authored-by: Baber <baber@hey.com>
-
Debjyoti Ray authored
* correctly process both formats of model_args: string and dictionary
* extract to function for better test
* nit
---------
Co-authored-by: Baber <baber@hey.com>
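The commit above makes the CLI accept `model_args` either as a comma-separated string or as a dictionary. A minimal sketch of such a normalizer, assuming a hypothetical helper name (the harness's actual function may differ):

```python
from typing import Optional, Union


def normalize_model_args(model_args: Union[str, dict, None]) -> dict:
    """Accept model_args as 'key=value,key2=value2' or as a dict.

    Hypothetical sketch of the string/dict handling described in the
    commit, not the harness's real implementation.
    """
    if model_args is None:
        return {}
    if isinstance(model_args, dict):
        # already a dict: return a copy so callers can mutate safely
        return dict(model_args)
    result = {}
    for pair in model_args.split(","):
        if not pair:
            continue  # tolerate trailing commas
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result
```

Extracting this into a standalone function, as the commit does, makes both input formats trivial to unit-test.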
-
Baber Abbasi authored
* delete unneeded files
-
Baber Abbasi authored
-
- 04 Jul, 2025 6 commits
-
Neel Gupta authored
* [FIX] Initial code to disable multi-proc for stderr
* add docs; align no-mp bootstrap with mp
---------
Co-authored-by: Baber <baber@hey.com>
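The no-multiprocessing path above computes the same bootstrap standard error in a single process. A minimal sketch of that computation, with an assumed function name and defaults (not the harness's API):

```python
import random
import statistics


def bootstrap_stderr(values, iters=1000, seed=1234):
    """Single-process bootstrap estimate of the standard error of the mean.

    Illustrative sketch of what a no-multiprocessing bootstrap computes;
    the name, iteration count, and seeding are assumptions.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    n = len(values)
    # resample with replacement and record each resample's mean
    means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(iters)]
    # the spread of the resampled means approximates the stderr of the mean
    return statistics.stdev(means)
```

Seeding a private `random.Random` instance (rather than the global state) is one way to keep the single-process result aligned with a deterministic multiprocessing implementation.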
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
- 03 Jul, 2025 18 commits
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
# Conflicts:
#   lm_eval/__main__.py
-
Baber authored
-
Ankush authored
* fix(hf-gguf): skip gguf_file if external tokenizer is provided
* docs(readme): add instructions for evaluating GGUF models with Hugging Face backend
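The fix above stops forwarding `gguf_file` when the user supplies an external tokenizer, so the GGUF-embedded tokenizer is only used as a fallback. A hedged sketch of that conditional (names are illustrative, not the harness's code):

```python
def build_tokenizer_kwargs(gguf_file=None, tokenizer=None):
    """Forward gguf_file only when no external tokenizer is provided.

    Sketch of the condition described in the commit; the real code
    builds kwargs for transformers' AutoTokenizer.from_pretrained.
    """
    kwargs = {}
    if gguf_file is not None and tokenizer is None:
        # no external tokenizer: let the GGUF file supply one
        kwargs["gguf_file"] = gguf_file
    return kwargs
```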
-
Baber Abbasi authored
* use double quotes
-
Alex Stachowiak authored
* Lazy-load submodules to reduce import time
* pacify pre-commit
---------
Co-authored-by: Baber <baber@hey.com>
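Lazy-loading, as in the commit above, defers a submodule's import until it is first used, so `import lm_eval` stays cheap. One generic way to sketch the technique with a proxy object (the harness may instead use a module-level `__getattr__`, PEP 562):

```python
import importlib


class LazyModule:
    """Proxy that defers the real import until first attribute access.

    Minimal sketch of lazy submodule loading; the harness's actual
    mechanism may differ.
    """

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing imported yet

    def __getattr__(self, attr):
        # only reached when the attribute isn't on the proxy itself
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)
```

For example, `LazyModule("json")` costs nothing at construction; the real `json` module is imported only when an attribute such as `dumps` is first touched.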
-
Baber authored
-
Blanca Calvo authored
* truthfulqa-multi task
* truthfulqa-multi with chat few-shot
* few shot chat implementation
* changed until so it outputs lists
* changed dataset location
* added MT task
* Create README.md
* do not include MT
* changes for PR
* tag change
* removed yaml extension
* adding task to the table
* fix task configs
* add import exception
---------
Co-authored-by: Baber <baber@hey.com>
-
- 30 Jun, 2025 2 commits
-
jinze authored
* Fix: Align the Humaneval dataset with official results.
  Details: (1) modified "doc_to_text" and "gen_prefix" in "humaneval_instruct.yaml" to match the prompt in "meta-llama/Llama-3.1-70B-Instruct-evals"; (2) changed r.rfind("```") to r.find("```") so it locates the first "```", not the last one.
  Results: partially reproduced the official results: LLaMA3.1-8B-Instruct scores 66.5 (official: 72.6) and LLaMA3.1-70B-Instruct scores 80.5 (official: 80.5). Ref: PR#2650
* add changelog and version
* add changelog
-
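The rfind-to-find change above matters when a model response contains prose or multiple fenced blocks: `rfind` locates the *last* fence and swallows everything in between. A hedged sketch of the extraction (illustrative names, not the harness's exact filter):

```python
FENCE = "`" * 3  # the Markdown code-fence marker


def extract_first_code_block(response: str) -> str:
    """Return the code inside the first fenced block of a model response.

    Sketch of the find-vs-rfind fix described in the commit; the real
    Humaneval postprocessing may differ in its edge-case handling.
    """
    start = response.find(FENCE)
    if start == -1:
        return response  # no fence at all: return the raw response
    # skip the opening fence line (which may carry a language tag)
    start = response.find("\n", start)
    if start == -1:
        return ""
    end = response.find(FENCE, start)  # first closing fence, not the last
    if end == -1:
        return response[start + 1 :]
    return response[start + 1 : end]
```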
Baber Abbasi authored
* Try fixing issue 3026, which is caused by the quantization_config argument introduced in commit 758c5ed8. The argument is a Dict, but for a GPTQ-quantized model this conflicts with the Hugging Face interface, which expects a QuantizationConfigMixin. The initial solution removes the quantization_config argument in HFLM._create_model() of lm_eval/models/huggingface.py; further modification is required to restore the functionality provided by the previous commit.
* wrap quantization_config in AutoQuantizationConfig
* handle quantization config that is not a dict
* wrap quantization_config in AutoQuantizationConfig if dict
---------
Co-authored-by: shanhx2000 <hs359@duke.edu>
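The final fix above coerces a dict-valued `quantization_config` into the config object the Hugging Face interface expects, and passes anything else through unchanged. A generic sketch of that coercion, with the factory injected so the example stays self-contained (in the harness it would be something like `AutoQuantizationConfig.from_dict` from transformers):

```python
def coerce_quantization_config(quantization_config, from_dict):
    """Wrap a plain-dict quantization_config via the given factory;
    pass through anything already a config object (or None).

    Sketch of the commit's 'wrap in AutoQuantizationConfig if dict'
    logic; from_dict stands in for the transformers factory.
    """
    if isinstance(quantization_config, dict):
        # only dicts need wrapping into a QuantizationConfigMixin-style object
        return from_dict(quantization_config)
    return quantization_config
```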
-
- 25 Jun, 2025 3 commits
-
Younes B authored
* add subfolder
* lint
* change it to empty string
* fix typehints
---------
Co-authored-by: Baber <baber@hey.com>
-
Baber Abbasi authored
-
Kiersten Stokes authored
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
-
- 23 Jun, 2025 1 commit
-
Baber authored
-