1. 16 Oct, 2025 1 commit
  2. 15 Oct, 2025 9 commits
  3. 14 Oct, 2025 2 commits
  4. 02 Oct, 2025 1 commit
  5. 21 Sep, 2025 1 commit
  6. 12 Sep, 2025 1 commit
  7. 08 Sep, 2025 2 commits
  8. 27 Aug, 2025 1 commit
  9. 26 Aug, 2025 1 commit
    • Support for AIME dataset (#3248) · 5ac7cdf8
      Janna authored
      * add AIME tasks
      
      * standardize the repeats
      
      * fix task naming
      
      * aime25 only has test set
      
      * edit readme
      
      * add utils
      
      * standardize
      
      * fix case sensitivity
      
      * repeat once
      
      * lint
      
      * more linting
      
      * lint huggingface.py
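The "standardize the repeats" and "repeat once" bullets refer to sampling each problem multiple times. A minimal sketch of what repeat-aware scoring can look like; the function name and scoring rule here are illustrative assumptions, not the harness's actual code:

```python
def mean_accuracy_over_repeats(samples, answer):
    """Hypothetical scorer: fraction of repeated generations whose final
    answer matches the reference, compared case-insensitively (the commit
    also mentions a case-sensitivity fix)."""
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if s.strip().lower() == answer.strip().lower())
    return hits / len(samples)
```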
  10. 25 Aug, 2025 1 commit
  11. 21 Aug, 2025 2 commits
  12. 13 Aug, 2025 1 commit
  13. 02 Aug, 2025 1 commit
  14. 24 Jul, 2025 2 commits
  15. 23 Jul, 2025 3 commits
  16. 18 Jul, 2025 2 commits
  17. 16 Jul, 2025 1 commit
    • truncate thinking tags in generations (#3145) · 51ede33c
      Baber Abbasi authored
      * feat: add postprocessing for generated text to strip stop sequences and thinking tokens
      
      * nit
      
      * fix: trim leading whitespace after stripping thinking tokens from generation
      
      * feat: add think_end_token to model_args
      
      * nit
      
      * nit
      
      * nit
      
      * add to readme
      
      * nit
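The postprocessing this commit describes, stripping everything through the thinking end token and then trimming leading whitespace, can be sketched as follows. The function name and the default token value are assumptions for illustration; the actual token is configurable via `think_end_token` in `model_args`:

```python
def strip_thinking(text: str, think_end: str = "</think>") -> str:
    """Drop everything up to and including the last occurrence of the
    thinking end token, then trim the leading whitespace left behind."""
    idx = text.rfind(think_end)
    if idx != -1:
        text = text[idx + len(think_end):]
    return text.lstrip()
```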
  18. 15 Jul, 2025 1 commit
  19. 14 Jul, 2025 1 commit
  20. 06 Jul, 2025 1 commit
  21. 03 Jul, 2025 1 commit
    • Bugfix/hf tokenizer gguf override (#3098) · ff41a856
      Ankush authored
      * fix(hf-gguf): skip gguf_file if external tokenizer is provided
      
      * docs(readme): add instructions for evaluating GGUF models with Hugging Face backend
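The shape of the fix, skipping `gguf_file` when an external tokenizer is supplied, can be sketched with a hypothetical helper (the function and parameter names are illustrative, not the harness's actual API):

```python
def resolve_gguf_file(gguf_file, tokenizer_name):
    """Hypothetical helper mirroring the fix: when an external Hugging Face
    tokenizer is provided, do not forward gguf_file to the tokenizer loader;
    the external tokenizer takes precedence."""
    if tokenizer_name is not None:
        return None  # skip gguf_file; load the named tokenizer instead
    return gguf_file  # no override: extract tokenizer data from the GGUF
```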
  22. 30 Jun, 2025 1 commit
    • [HF] fix quantization config (#3039) · fea4d11d
      Baber Abbasi authored
      * Try fixing issue 3026, which is caused by the quantization_config argument introduced in commit 758c5ed8.
      The argument is a dict, but for a GPTQ-quantized model it conflicts with the Hugging Face interface, which expects a QuantizationConfigMixin.
      The current solution removes the quantization_config argument in HFLM._create_model() in lm_eval/models/huggingface.py.
      Further modification is required to restore the functionality provided by the previous commit.
      
      * wrap quantization_config in AutoQuantizationConfig
      
      * handle quantization config not dict
      
      * wrap quantization_config in AutoQuantizationConfig if dict
      
      ---------
      Co-authored-by: shanhx2000 <hs359@duke.edu>
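The final fix, "wrap quantization_config in AutoQuantizationConfig if dict", amounts to normalizing the argument type before handing it to the model loader. A runnable sketch of that pattern; in the real code the wrapper is transformers' `AutoQuantizationConfig`, while `QuantConfigStub` below is a stand-in so the sketch runs without transformers installed:

```python
class QuantConfigStub:
    """Stand-in for a QuantizationConfigMixin-style class (assumption:
    the real wrapper exposes a from_dict-style constructor)."""
    @classmethod
    def from_dict(cls, d):
        obj = cls()
        obj.__dict__.update(d)
        return obj

def normalize_quantization_config(cfg, wrapper=QuantConfigStub):
    """Wrap a plain-dict quantization config; pass config objects through."""
    if isinstance(cfg, dict):
        return wrapper.from_dict(cfg)
    return cfg  # already a config object; leave untouched
```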
  23. 25 Jun, 2025 2 commits
  24. 23 Jun, 2025 1 commit
    • Fix Anthropic API compatibility issues in chat completions (#3054) · 8bc46207
      NourFahmy authored
      
      
      * Fix Anthropic API compatibility issues in chat completions
      
      Solves two important compatibility issues between the LM Eval Harness and Anthropic's API:
      
      1) The type field: Anthropic's Messages API doesn't accept the type field that other APIs might expect, and that was previously included.
      2) Stop sequences: Anthropic requires stop sequences to contain non-whitespace characters.
      
      Tested with the most recent Anthropic models (claude-sonnet-4-0, claude-opus-4-0); this resolved my local API errors.
      
      * pacify pre-commit
      
      * add type
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
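The two fixes described above, dropping the rejected `type` field and filtering whitespace-only stop sequences, can be sketched together. The function name and return shape are illustrative assumptions, not the harness's actual interface:

```python
def sanitize_anthropic_request(messages, stop):
    """Hypothetical pre-send cleanup for Anthropic's Messages API:
    1) strip the 'type' key from each message dict, and
    2) keep only stop sequences containing non-whitespace characters."""
    cleaned = [{k: v for k, v in m.items() if k != "type"} for m in messages]
    stops = [s for s in (stop or []) if s.strip()]
    return cleaned, stops
```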