Commits · 84d02f77ced6843dfe7fde525315cb1089d32c19 · gaoqiong / lm-evaluation-harness

06 Jul, 2025 1 commit
- delete neuralmagic models (#3112) · f93001db
  Baber Abbasi authored Jul 06, 2025
  
  f93001db
03 Jul, 2025 1 commit

Bugfix/hf tokenizer gguf override (#3098) · ff41a856

Ankush authored Jul 03, 2025

* fix(hf-gguf): skip gguf_file if external tokenizer is provided

* docs(readme): add instructions for evaluating GGUF models with Hugging Face backend

ff41a856

30 Jun, 2025 1 commit

[HF] fix quantization config (#3039) · fea4d11d

Baber Abbasi authored Jun 30, 2025

* Try fixing issue 3026, which is caused by the quantization_config argument introduced in commit 758c5ed8.

The argument is a dict, but for a GPTQ-quantized model this conflicts with the Hugging Face interface, which expects a QuantizationConfigMixin instance.
The current solution removes the quantization_config argument in HFLM._create_model() of lm_eval/models/huggingface.py.
Further modification is required to restore the functionality provided by the previous commit.

* wrap quantization_config in AutoQuantizationConfig

* handle quantization config not dict

* wrap quantization_config in AutoQuantizationConfig if dict

---------
Co-authored-by: shanhx2000 <hs359@duke.edu>

fea4d11d
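
The dict-versus-object handling described in the bullets of #3039 can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the code from the patch: `normalize_quantization_config` and `FakeQuantConfig` are hypothetical names, and the `from_dict` callable stands in for whatever wrapper the harness actually uses (the commit message names AutoQuantizationConfig).

```python
from typing import Any, Callable, Optional


class FakeQuantConfig:
    """Hypothetical stand-in for a QuantizationConfigMixin-style object."""

    def __init__(self, **fields: Any) -> None:
        self.fields = fields

    @classmethod
    def from_dict(cls, config: dict) -> "FakeQuantConfig":
        return cls(**config)


def normalize_quantization_config(
    config: Optional[Any],
    from_dict: Callable[[dict], Any] = FakeQuantConfig.from_dict,
) -> Optional[Any]:
    """Wrap plain dicts; pass None and ready-made config objects through."""
    if config is None:
        return None  # quantization_config is optional
    if isinstance(config, dict):
        return from_dict(config)  # dict -> config object
    return config  # already a config object; leave it untouched
```

Passing the wrapper in as a parameter keeps the sketch runnable without any third-party library; in a real setup the default would presumably be replaced by the library's own dict-to-config constructor.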

25 Jun, 2025 2 commits
- feat / fix: Properly make use of `subfolder` from HF models (#3072) · 6b3f3f7e
  Younes B authored Jun 25, 2025
```
* add subfolder

* lint

* change it to empty string

* fix typehints

---------
Co-authored-by: Baber <baber@hey.com>
```
  6b3f3f7e
- remove system message if `TemplateError` (#3076) · 0f63d4f5
  Baber Abbasi authored Jun 25, 2025
  
  0f63d4f5
23 Jun, 2025 1 commit

Fix Anthropic API compatibility issues in chat completions (#3054) · 8bc46207

NourFahmy authored Jun 23, 2025



* Fix Anthropic API compatibility issues in chat completions

Solves two compatibility issues between the LM Eval Harness and Anthropic's API:

1) The type field issue: Anthropic's Messages API doesn't accept the type field, which other APIs might expect and which was previously included
2) The stop sequences issue: Anthropic requires stop sequences to contain non-whitespace characters

Tested with the most recent models from Anthropic (claude-sonnet-4-0, claude-opus-4-0); this resolved my local API errors.

* pacify pre-commit

* add type

---------
Co-authored-by: Baber <baber@hey.com>

8bc46207
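
The two compatibility issues listed in #3054 can be illustrated with a small sketch. It assumes chat messages are plain dicts and stop sequences are a list of strings; `drop_type_field` and `usable_stop_sequences` are hypothetical helper names, and how the actual patch handles these cases (including the later "add type" bullet) is not shown in this log.

```python
from typing import Dict, List


def drop_type_field(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of the messages without a top-level "type" key."""
    return [{k: v for k, v in m.items() if k != "type"} for m in messages]


def usable_stop_sequences(stops: List[str]) -> List[str]:
    """Keep only stop sequences that contain a non-whitespace character."""
    return [s for s in stops if s.strip()]
```

For example, a stop list such as `["\n\n", "Question:"]` would be reduced to `["Question:"]`, since the first entry consists only of whitespace.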

08 Jun, 2025 1 commit

[longbench] fix metric calculation (#2983) · 147e9d61

Baber Abbasi authored Jun 08, 2025

* use all answers

* use middle truncation

* maybe fix classification score

* strip classification preds

* [vllm] remove stop tokens post-hoc

* strip all preds

* pacify pre-commit

* start on truncation utility

* add to readme

* add a footgun doc

* fix newline in yaml templates

* do not strip code_sim preds!

* fix pre-commit config

* fix instruction warning

* add note to longbench readme

147e9d61
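
The "use middle truncation" bullet in #2983 gives no detail, so the following is only one plausible reading: when a token sequence exceeds a length budget, keep its head and tail and drop the middle. `truncate_middle` is a hypothetical name, and the even head/tail split is an assumption rather than something the commit states.

```python
from typing import List


def truncate_middle(tokens: List[int], max_len: int) -> List[int]:
    """Keep the head and tail of a sequence, dropping the middle if too long."""
    if max_len <= 0:
        return []
    if len(tokens) <= max_len:
        return list(tokens)  # short enough; return an unchanged copy
    head = max_len // 2          # tokens kept from the start
    tail = max_len - head        # tokens kept from the end (gets the odd one)
    return tokens[:head] + tokens[len(tokens) - tail:]
```

Compared with cutting only from the left or the right, this keeps both the opening of a long prompt and the text closest to the answer position.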

03 Jun, 2025 1 commit
- fix: fix vllm issue with DP>1 (#3025) · d57e3d65
  Younes B authored Jun 03, 2025
  
  d57e3d65
02 Jun, 2025 1 commit
- Enable text-only evals for VLM models (#2999) · 82a99365
  Yury Sulsky authored Jun 02, 2025
  
  82a99365
26 May, 2025 1 commit

[vllm] data parallel for V1 (#3011) · 5a481f43

Baber Abbasi authored May 26, 2025

* add data_parallel for V1

* use Process instead of Queue

* ray used if V0 DP

* better error handling

* fix truncation warning comparison

5a481f43

23 May, 2025 2 commits

Fix error due to collating queries with different continuation lengths (fixes #2984) (#2987) · 7aaceeec

Ameya Godbole authored May 22, 2025



* FIX error due to grouping queries with different continuation length

Make Collator choose query with the longest continuation as the
candidate for generation

* use max for key selection

* added comments explaining variable cont length (identical ctx+cont[:-1])

---------
Co-authored-by: Baber <baber@hey.com>

7aaceeec
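
The selection rule described in #2987 can be sketched as below. The sketch assumes each request is a (context tokens, continuation tokens) pair and that requests are grouped by the key ctx + cont[:-1], as the commit comment mentions; `group_key` and `pick_representatives` are hypothetical names, not the harness's Collator API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Request = Tuple[List[int], List[int]]  # (context tokens, continuation tokens)


def group_key(req: Request) -> Tuple[int, ...]:
    """Requests with identical ctx + cont[:-1] share one model input."""
    ctx, cont = req
    return tuple(ctx + cont[:-1])


def pick_representatives(requests: List[Request]) -> Dict[Tuple[int, ...], Request]:
    """For each group, choose the request with the longest continuation."""
    groups = defaultdict(list)
    for req in requests:
        groups[group_key(req)].append(req)
    return {key: max(reqs, key=lambda r: len(r[1])) for key, reqs in groups.items()}
```

Two requests can split the same token sequence at different points and still share a key; picking the one with the longest continuation means the representative covers every continuation position needed by the others in its group.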

[Fix] Update `resolve_hf_chat_template` arguments (#2992) · 357d4eaa
fxmarty-amd authored May 23, 2025
```
* fix arguments

* pacify pre-commit

---------
Co-authored-by: Baber <baber@hey.com>
```
357d4eaa

21 May, 2025 3 commits

Adding resize images support (#2958) · 143a7fe0

achervyakov authored May 21, 2025



* first version of image resizing

* fixed bug

* clean up `resize_image`

---------
Co-authored-by: Artem Safin <artemsafin67@gmail.com>
Co-authored-by: Baber <baber@hey.com>

143a7fe0

use images with api models (#2981) · 2cfdd0a2
Baber Abbasi authored May 21, 2025
```
* use images with apis

* pacify pre-commit
```
2cfdd0a2
Log tokenized request warning only once (#3002) · 07e5348c
Rob Geada authored May 21, 2025
```
* Log tokenized request warning only once

* Fix logging for concurrent usecase as well
```
07e5348c

19 May, 2025 1 commit
- [SGLANG] Add the SGLANG generate API (#2997) · 53c65300
  Baber Abbasi authored May 19, 2025
```
* add `sglang-generate`

* nit

* nit

* nit

* pacify pre-commit
```
  53c65300
15 May, 2025 1 commit
- Add device arg to model_args passed to LLM object in VLLM model class (#2879) · 96966f53
  Filippo Momentè authored May 15, 2025
```
* fix: pass device arg in model_args in vllm_causallms

* casting device arg to str in vLLM model args
```
  96966f53
10 May, 2025 1 commit
- fix: type error while checking context length (#2972) · 1c03af33
  Sungjae Lee authored May 10, 2025
  
  1c03af33
09 May, 2025 1 commit
- add warning on truncation (#2962) · 2f03271d
  Baber Abbasi authored May 09, 2025
  
  2f03271d
06 May, 2025 1 commit
- Add support for enable_thinking argument in vllm model, set default to False (#2947) · ab618f01
  Alexandre Marques authored May 06, 2025
  
  ab618f01
18 Apr, 2025 1 commit

Added softmax_dtype argument to HFLM to coerce log_softmax computations (#2921) · e4a7b69f

Avelina9X authored Apr 18, 2025



* Added softmax_dtype argument to coerce log_softmax computations

* move softmax_dtype

---------
Co-authored-by: Baber <baber@hey.com>

e4a7b69f

16 Apr, 2025 2 commits
- init pixels before tokenizer creation (#2911) · 82fe48ec
  achervyakov authored Apr 16, 2025
  
  82fe48ec
- fix resolve_hf_chat_template version (#2917) · 38ba7dce
  Baber Abbasi authored Apr 16, 2025
```
* fix resolve_hf_chat_template version

* pre-commit
```
  38ba7dce
15 Apr, 2025 1 commit

Add support for quantization_config (#2842) · 758c5ed8

Jerry Zhang authored Apr 14, 2025

* Add support for quantization_config

Summary:
Previously quantization_config is ignored, so torchao quantized models are not supported,
this PR adds that.

Test Plan:
lm_eval --model hf --model_args pretrained=jerryzh168/gemma3-int4wo --tasks hellaswag --device cuda:0 --batch_size 8

Reviewers:

Subscribers:

Tasks:

Tags:

* quantization_config is optional

758c5ed8

14 Apr, 2025 1 commit

Extend support for chat template in vLLM (#2902) · 2a41c02e

Alexandre Marques authored Apr 14, 2025

* Add support for chat templates defined outside of tokenizer_config.json, as supported by vLLM

* Update template name to avoid conflict with other variable

2a41c02e

04 Apr, 2025 1 commit
- Update authentication methods, add support for deployment_id for IBM watsonx_ai (#2877) · 1da9e4e8
  Nikodem Szwast authored Apr 04, 2025
```
* update authentication methods, add support for deployment_id

* run pre-commit on changed file
```
  1da9e4e8
20 Mar, 2025 2 commits
- [VLLM, SGLANG] default temp=0.0 (#2819) · c6b9aeeb
  Baber Abbasi authored Mar 20, 2025
  
  c6b9aeeb
- Configure the pad tokens for Qwen when using vLLM (#2810) · 61b63da7
  Yifei Zhang authored Mar 20, 2025
  
  61b63da7
18 Mar, 2025 1 commit
- [hf-multimodal] pass kwargs to self.processor (#2667) · 1e2428a2
  Baber Abbasi authored Mar 18, 2025
```
* add min_pixels, max_pixels

* fix
```
  1e2428a2
17 Mar, 2025 1 commit

Add support for token-based auth for watsonx models (#2796) · 78d57e0f

Kiersten Stokes authored Mar 17, 2025

* Add support for token-based auth for watsonx models

* Fix lint

* Move dotenv import to inner scope

* Improve readability of _verify_credentials

78d57e0f

14 Mar, 2025 2 commits

add audio modality (qwen2 audio only) (#2689) · 62552d2c

achervyakov authored Mar 14, 2025



* Added audio-modality pipeline for qwen2-audio model

* Beautify imports

* fix apply_chat_template args

* update default audio placeholders list

* add demo task - common_voice subset

* add audiolm_qwen libs to pyproject.toml

* pre-commit beautify

---------
Co-authored-by: Alexandra Rak <rakalexandra@mail.ru>

62552d2c

use verify_certificate flag in batch requests (#2785) · 3b7dbef9
daniel-salib authored Mar 14, 2025

3b7dbef9

11 Mar, 2025 1 commit
- initialize tokenizer with bos_token (#2781) · 07bd7e23
  Baber Abbasi authored Mar 11, 2025
  
  07bd7e23
04 Mar, 2025 1 commit

Enable steering HF models (#2749) · d35008f1

Lucia Quirke authored Mar 04, 2025



* Enable steering HF models
Co-authored-by: Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu>

* increase HF download timeout

* Update readme; improve steering vector device handling

* Update latest news

* remove HF timeout increase

* fix tests

* ignore sae lens test

* fix accidental force push

---------
Co-authored-by: Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu>

d35008f1

27 Feb, 2025 1 commit
- fix vllm data parallel (#2746) · a87fe425
  Baber Abbasi authored Feb 27, 2025
```
* remove ray.remote resources

* remove kobtest tag (registered as group)
```
  a87fe425
25 Feb, 2025 1 commit

Support SGLang as Potential Backend for Evaluation (#2703) · 29971faa

Jinwei authored Feb 25, 2025



* initial components to support sglang

* init of class SGLangLM

* draft for generate_until of SGLang model

* mock loglikelihood

* initial loglikelihood_tokens

* todo: fix bug of sglang engine init

* implement generation tasks and test

* support output type loglikelihood and loglikelihood_rolling (#1)

* .

* loglikelihood_rolling

* /

* support dp_size>1

* typo

* add tests and clean code

* skip tests of sglang for now

* fix OOM error of sglang pytest

* finish test for sglang

* add sglang to readme

* fix OOM of tests and clean SGLang model

* update readme

* clean pyproject and add tests for evaluator

* add accuracy tests and it passed locally

* add notes for test

* Update README.md

update readme

* pre-commit

---------
Co-authored-by: Xiaotong Jiang <xiaotong.jiang@databricks.com>
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Co-authored-by: Baber <baber@hey.com>

29971faa

24 Feb, 2025 1 commit
- add o3-mini support (#2697) · 01849b40
  Jocelyn authored Feb 25, 2025
```
* add o3-mini support

* fix linter tests
```
  01849b40
21 Feb, 2025 1 commit

Logging (#2203) · 1ba35e62

Lintang Sutawika authored Feb 20, 2025



* changed source of eval_logger

* allow eval_logger to be set from args

* removed verbosity arg from non-main methods

* fix logging

* pre-commit

* set verbosity in eval logger

* replace utils.eval_logger

* fix logging in main

* add logging to docs

* add logging message

* nit

* add logging to docs

* refactor setup_logging to utils

---------
Co-authored-by: Baber <baber@hey.com>

1ba35e62

17 Feb, 2025 1 commit
- fix vllm (#2708) · 52df63b7
  Baber Abbasi authored Feb 17, 2025
```
* fix vllm

* fix data_parallel

* copy to multimodal
```
  52df63b7
12 Feb, 2025 1 commit
- change ensure_ascii to False for JsonChatStr (#2691) · 96f5e58f
  achervyakov authored Feb 13, 2025
  
  96f5e58f