Commits · dba2ee3eb44505b99be5dd9746eebdbd10015264 · gaoqiong / lm-evaluation-harness

08 Oct, 2025 3 commits
- add support for`GeminiOpenAI` API · dba2ee3e
  Baber authored Oct 08, 2025
  
  dba2ee3e
- add `parse_generations` to OpenAIChatCompletion · a7362d8b
  Baber authored Oct 08, 2025
  
  a7362d8b
- add `max_thinking_tokens` for anthropic · 3e28eed1
  Baber authored Oct 08, 2025
  
  3e28eed1
02 Oct, 2025 1 commit
- fix: sp, req order (#3303) · a1404f06
  Vineeth authored Oct 02, 2025
  
  a1404f06
21 Sep, 2025 1 commit
- add xpu support HFLM (#3211) · 368275f3
  kaixuanliu authored Sep 21, 2025
```
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
```
  368275f3
12 Sep, 2025 1 commit
- add quote to type hints (#3292) · 0c134ee9
  fxmarty-amd authored Sep 12, 2025
  
  0c134ee9
08 Sep, 2025 2 commits

Ignore seed when splitting batch in chunks with groupby (#3047) · 44398478

Slim Frikha authored Sep 09, 2025



* feat(vllm_causallms): make collator ignore seed when splitting batch into chunks

* fix(collator): revert PR changes

* fix(vllm-causallm): update collator call with groupby None

* feat(sglang-causallms): make generation accept a list of sampling params

---------
Co-authored-by: Baber <baber@hey.com>

44398478

Add support for steering specific attention heads (#3279) · a46180bf
Lucia Quirke authored Sep 08, 2025

a46180bf

27 Aug, 2025 1 commit
- pacify pre-commit (#3268) · 3a9bcc3f
  Baber Abbasi authored Aug 27, 2025
  
  3a9bcc3f
26 Aug, 2025 1 commit

Support for AIME dataset (#3248) · 5ac7cdf8

Janna authored Aug 26, 2025

* add AIME tasks

* standardize the repeats

* fix task naming

* aime25 only has test set

* edit readme

* add utils

* standardize

* fix case sensitivity

* repeat once

* lint

* more linting

* lint huggingface.py

5ac7cdf8

25 Aug, 2025 1 commit
- Add support for OpenVINO text2text generation models (#3101) · 05b37f20
  Nikita Savelyev authored Aug 25, 2025
```
* Add support for OVModelForSeq2SeqLM

* Add test
```
  05b37f20
21 Aug, 2025 2 commits
- Adding support for OpenAI GPT-5 model; Models only support hardcoded... · 30885632
  Kurt Yang authored Aug 21, 2025
```
Adding support for OpenAI GPT-5 model; Models only support hardcoded tempeature=1 and stop=None (#3247)
```
  30885632
- Fix `add_bos_token` not updated for Gemma tokenizer (#3206) · 206b7722
  Cyrus Leung authored Aug 21, 2025
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  206b7722
13 Aug, 2025 1 commit
- Adding support for evaluating with fine-tuned Gemma3 (#3234) · 3bc7cc8a
  Xinhe Shi authored Aug 14, 2025
  
  3bc7cc8a
02 Aug, 2025 1 commit

Update vLLM compatibility (#3024) · bc811365

Cyrus Leung authored Aug 03, 2025



* Update vLLM compatibility
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

* add TokensPrompt to all generate calls

---------
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Baber <baber@hey.com>

bc811365

24 Jul, 2025 2 commits
- vllm: remove device (#3181) · 4f8195f1
  Baber Abbasi authored Jul 24, 2025
  
  4f8195f1
- fix vllm test issue that call pop() from None (#3182) · 5f5f35e5
  weiliang authored Jul 24, 2025
  
  5f5f35e5
23 Jul, 2025 3 commits

Remove "device" from vllm_causallms.py (#3176) · 8c6fde08

Michael Goin authored Jul 23, 2025

Device has been a deprecated arg for a few releases of vLLM and is now removed in 0.10.0 https://github.com/vllm-project/vllm/pull/21349

8c6fde08

Pin datasets < 4.0.0 (#3172) · 904bba12

Baber Abbasi authored Jul 23, 2025

* Fix: pin datasets < 4.0

* fix

* update type hints in HF

* fix hellaswag path

904bba12

Added `chat_template_args` to pass additional kwargs to tokenizer.apply_chat_template (#3164) · 2eea3f50

Avelina Asada Hadji-Kyriacou authored Jul 23, 2025



* added support for additional chat template arguments

* use `enable_thinking`

* add wrap logging function

* add `chat_template_args` back to HF

---------
Co-authored-by: Baber <baber@hey.com>

2eea3f50

18 Jul, 2025 2 commits

Custom request headers | trust_remote_code param fix (#3069) · 56def33d

Ramiro R. C. authored Jul 18, 2025



* added headers and custom model name | fixed bug with trust_remote_code param

* linting

* removed custom model name | changed headers override

* add `header` to base TemplateAPI

* nit

---------
Co-authored-by: Baber <baber@hey.com>

56def33d

fix request hanging when request api (#3090) · e6ea0315

mans authored Jul 18, 2025



* fix request hanging when request api

* pre commit

---------
Co-authored-by: qinyidao <qinyidao@moonshot.cn>

e6ea0315

16 Jul, 2025 1 commit

truncate thinking tags in generations (#3145) · 51ede33c

Baber Abbasi authored Jul 17, 2025

* feat: add postprocessing for generated text to strip stop sequences and thinking tokens

* nit

* fix: trim leading whitespace after stripping thinking tokens from generation

* feat: add think_end_token to model_args

* nit

* nit

* nit

* add to readme

* nit

51ede33c

15 Jul, 2025 1 commit
- fix: vllm lora (#3132) · 3102a8e4
  MaYongQing authored Jul 15, 2025
  
  3102a8e4
14 Jul, 2025 1 commit
- Added mixed_precision_dtype arg (#3138) · 31895e5b
  Avelina Asada Hadji-Kyriacou authored Jul 14, 2025
  
  31895e5b
06 Jul, 2025 1 commit
- delete neuralmagic models (#3112) · f93001db
  Baber Abbasi authored Jul 06, 2025
  
  f93001db
03 Jul, 2025 1 commit

Bugfix/hf tokenizer gguf override (#3098) · ff41a856

Ankush authored Jul 03, 2025

* fix(hf-gguf): skip gguf_file if external tokenizer is provided

* docs(readme): add instructions for evaluating GGUF models with Hugging Face backend

ff41a856

30 Jun, 2025 1 commit

[HF] fix quantization config (#3039) · fea4d11d

Baber Abbasi authored Jun 30, 2025

* Try fixing issue 3026 which is caused by the quantization_config argument introduced in Commit 758c5ed8

.
The argument is in Dict type, but for a GPTQ quantized model, it has a conflict with the huggingface interface which expects QuantizationConfigMixin type.
Current solution is removing quantization_config argument in HFLM._create_model() of lm_eval/models/huggingface.py.
Require further modification to restore the functionality provided by the previous commit.

* wrap quantization_config in AutoQuantizationConfig

* handle quantization config not dict

* wrap quantization_config in AutoQuantizationConfig if dict

---------
Co-authored-by: shanhx2000 <hs359@duke.edu>

fea4d11d

25 Jun, 2025 2 commits
- feat / fix: Properly make use of `subfolder` from HF models (#3072) · 6b3f3f7e
  Younes B authored Jun 25, 2025
```
* add subfolder

* lint

* change it to empty string

* fix typehints

---------
Co-authored-by: Baber <baber@hey.com>
```
  6b3f3f7e
- remove system message if `TemplateError` (#3076) · 0f63d4f5
  Baber Abbasi authored Jun 25, 2025
  
  0f63d4f5
23 Jun, 2025 1 commit

Fix Anthropic API compatibility issues in chat completions (#3054) · 8bc46207

NourFahmy authored Jun 23, 2025



* Fix Anthropic API compatibility issues in chat completions

solves two important compatibility issues between the LM Eval Harness and Anthropic's API:

1) The type field issue - Anthropic's Messages API doesn't accept the type field that other APIs might expect, that was previously included
2) The stop sequences issue - Anthropic requires stop sequences to contain non-whitespace characters

tested with most recent models from anthopic; claude-sonnet-4-0, claude-opus-4-0, resolved my local api errors

* pacufy pre-commit

* add type

---------
Co-authored-by: Baber <baber@hey.com>

8bc46207

08 Jun, 2025 1 commit

[longbench] fix metric calculation (#2983) · 147e9d61

Baber Abbasi authored Jun 08, 2025

* use all answers

* use middle truncation

* maybe fix classification score

* strip classification preds

* [vllm] remove stop tokens post-hoc

* strip all preds

* pacify pre-commit

* start on truncation utility

* add to readme

* add a footgun doc

* fix newline in yaml templates

* do not strip code_sim preds!

* fix pre-commit config

* fix instruction warning

* add not to longbench readme

147e9d61

03 Jun, 2025 1 commit
- fix: fix vllm issue with DP>1 (#3025) · d57e3d65
  Younes B authored Jun 03, 2025
  
  d57e3d65
02 Jun, 2025 1 commit
- Enable text-only evals for VLM models (#2999) · 82a99365
  Yury Sulsky authored Jun 02, 2025
  
  82a99365
26 May, 2025 1 commit

[vllm] data parallel for V1 (#3011) · 5a481f43

Baber Abbasi authored May 26, 2025

* add data_parallel for V1

* use Process instead of Queue

* ray used if V0 DP

* better error handling

* fix truncation warning comparison

5a481f43

23 May, 2025 2 commits

Fix error due in Collating queries with different continuation lengths (fixes #2984) (#2987) · 7aaceeec

Ameya Godbole authored May 22, 2025



* FIX error due to grouping queries with different continuation length

Make Collator choose query with the longest continuation as the
candidate for generation

* use max for key selection

* added comments explaining variable cont length (identical ctx+cont[:-1])

---------
Co-authored-by: Baber <baber@hey.com>

7aaceeec

[Fix] Update `resolve_hf_chat_template` arguments (#2992) · 357d4eaa
fxmarty-amd authored May 23, 2025
```
* fix arguments

* pacify pre-commit

---------
Co-authored-by: Baber <baber@hey.com>
```
357d4eaa

21 May, 2025 3 commits

Adding resize images support (#2958) · 143a7fe0

achervyakov authored May 21, 2025



* first version of image resizing

* fixed bug

* clean up `resize_image`

---------
Co-authored-by: Artem Safin <artemsafin67@gmail.com>
Co-authored-by: Baber <baber@hey.com>

143a7fe0

use images with api models (#2981) · 2cfdd0a2
Baber Abbasi authored May 21, 2025
```
* use images with apis

* pacify pre-commit
```
2cfdd0a2
Log tokenized request warning only once (#3002) · 07e5348c
Rob Geada authored May 21, 2025
```
* Log tokenized request warning only once

* Fix logging for concurrent usecase as well
```
07e5348c