- 21 May, 2025 3 commits
-
-
achervyakov authored
* first version of image resizing
* fixed bug
* clean up `resize_image`
Co-authored-by: Artem Safin <artemsafin67@gmail.com>
Co-authored-by: Baber <baber@hey.com>
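The `resize_image` helper referenced above is not shown here; as a rough illustration of the kind of logic such a commit typically adds, the sketch below computes an aspect-ratio-preserving target size. The function name `fit_within` and its behavior are assumptions, not the PR's code.

```python
# Hypothetical sketch: shrink an image's dimensions so its longer side
# fits within a budget, preserving aspect ratio. Not the harness's code.
def fit_within(width: int, height: int, max_side: int) -> tuple:
    """Scale (width, height) down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return (width, height)  # already small enough; never upscale
    scale = max_side / longest
    return (max(1, round(width * scale)), max(1, round(height * scale)))
```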
-
Baber Abbasi authored
* use images with apis
* pacify pre-commit
-
Rob Geada authored
* Log tokenized request warning only once
* Fix logging for concurrent use case as well
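One common way to emit a warning only once per process, which this commit's "only once" behavior may resemble, is to deduplicate on the message. This is a generic sketch under that assumption, not the code from the PR; note that `lru_cache` is not a strict concurrency guarantee, only best-effort deduplication.

```python
import logging
from functools import lru_cache

logger = logging.getLogger(__name__)

@lru_cache(maxsize=None)
def warn_once(message: str) -> None:
    # lru_cache memoizes by message, so repeated calls with the same text
    # (including from parallel request paths) log at most once per process.
    logger.warning(message)
```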
-
- 19 May, 2025 1 commit
-
-
Baber Abbasi authored
* add `sglang-generate`
* nit
* nit
* nit
* pacify pre-commit
-
- 15 May, 2025 1 commit
-
-
Filippo Momentè authored
* fix: pass device arg in model_ar in vllm_causallms
* casting device arg to str in vLLM model args
-
- 10 May, 2025 1 commit
-
-
Sungjae Lee authored
-
- 09 May, 2025 1 commit
-
-
Baber Abbasi authored
-
- 06 May, 2025 1 commit
-
-
Alexandre Marques authored
-
- 18 Apr, 2025 1 commit
-
-
Avelina9X authored
* Added softmax_dtype argument to coerce log_softmax computations
* move softmax_dtype
Co-authored-by: Baber <baber@hey.com>
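The motivation for a `softmax_dtype` argument is that log-softmax over half-precision logits loses precision, so the reduction is commonly coerced to a wider dtype first. The sketch below illustrates that idea with NumPy; the function body is illustrative and is not the PR's implementation.

```python
import numpy as np

def log_softmax(logits: np.ndarray, softmax_dtype=np.float32) -> np.ndarray:
    # Coerce to the requested dtype before reducing, so float16/bfloat16
    # logits do not underflow or lose precision in the exp/sum step.
    x = logits.astype(softmax_dtype)
    x = x - x.max(axis=-1, keepdims=True)  # stabilize against overflow
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
```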
-
- 16 Apr, 2025 2 commits
-
-
achervyakov authored
-
Baber Abbasi authored
* fix resolve_hf_chat_template version
* pre-commit
-
- 15 Apr, 2025 1 commit
-
-
Jerry Zhang authored
* Add support for quantization_config
  Summary: previously quantization_config was ignored, so torchao-quantized models were not supported; this PR adds that support.
  Test Plan: lm_eval --model hf --model_args pretrained=jerryzh168/gemma3-int4wo --tasks hellaswag --device cuda:0 --batch_size 8
* quantization_config is optional
-
- 14 Apr, 2025 1 commit
-
-
Alexandre Marques authored
* Add support for chat templates defined outside of tokenizer_config.json, as supported by vLLM
* Update template name to avoid a conflict with another variable
-
- 04 Apr, 2025 1 commit
-
-
Nikodem Szwast authored
* update authentication methods, add support for deployment_id
* run pre-commit on changed file
-
- 20 Mar, 2025 2 commits
-
-
Baber Abbasi authored
-
Yifei Zhang authored
-
- 18 Mar, 2025 1 commit
-
-
Baber Abbasi authored
* add min_pixels, max_pixels
* fix
-
- 17 Mar, 2025 1 commit
-
-
Kiersten Stokes authored
* Add support for token-based auth for watsonx models
* Fix lint
* Move dotenv import to inner scope
* Improve readability of _verify_credentials
-
- 14 Mar, 2025 2 commits
-
-
achervyakov authored
* Added audio-modality pipeline for qwen2-audio model
* Beauty imports
* fix apply_chat_template args
* update default audio placeholders list
* add demo task - common_voice subset
* add audiolm_qwen libs to pyproject.toml
* pre-commit beautify
Co-authored-by: Alexandra Rak <rakalexandra@mail.ru>
-
daniel-salib authored
-
- 11 Mar, 2025 1 commit
-
-
Baber Abbasi authored
-
- 04 Mar, 2025 1 commit
-
-
Lucia Quirke authored
* Enable steering HF models
* increase HF download timeout
* Update readme; improve steering vector device handling
* Update latest news
* remove HF timeout increase
* fix tests
* ignore sae lens test
* fix accidental force push
Co-authored-by: Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu>
-
- 27 Feb, 2025 1 commit
-
-
Baber Abbasi authored
* remove ray.remote resources
* remove kobtest tag (registered as group)
-
- 25 Feb, 2025 1 commit
-
-
Jinwei authored
* initial components to support sglang
* init of class SGLangLM
* draft for generate_until of SGLang model
* mock loglikelihood
* initial loglikelihood_tokens
* todo: fix bug of sglang engine init
* implement generation tasks and test
* support output type loglikelihood and loglikelihood_rolling (#1)
* .
* loglikelihood_rolling
* /
* support dp_size>1
* typo
* add tests and clean code
* skip tests of sglang for now
* fix OOM error of sglang pytest
* finish test for sglang
* add sglang to readme
* fix OOM of tests and clean SGLang model
* update readme
* clean pyproject and add tests for evaluator
* add accuracy tests and it passed locally
* add notes for test
* Update README.md
* pre-commit
Co-authored-by: Xiaotong Jiang <xiaotong.jiang@databricks.com>
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Co-authored-by: Baber <baber@hey.com>
-
- 24 Feb, 2025 1 commit
-
-
Jocelyn authored
* add o3-mini support
* fix linter tests
-
- 21 Feb, 2025 1 commit
-
-
Lintang Sutawika authored
* changed source of eval_logger
* allow eval_logger to be set from args
* removed verbosity arg from non-main methods
* fix logging
* pre-commit
* set verbosity in eval logger
* replace utils.eval_logger
* fix logging in main
* add logging to docs
* add logging message
* nit
* add logging to docs
* refactor setup_logging to utils
Co-authored-by: Baber <baber@hey.com>
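A `setup_logging`-style helper refactored into a utils module, as described above, might look like the minimal sketch below. The exact signature and logger name here are guesses, not the harness's actual API.

```python
import logging

def setup_logging(verbosity: str = "INFO") -> logging.Logger:
    """Configure root logging once and return the project logger."""
    logging.basicConfig(
        format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
        level=getattr(logging, verbosity.upper()),
        force=True,  # reconfigure even if handlers were already attached
    )
    return logging.getLogger("lm-eval")
```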
-
- 17 Feb, 2025 1 commit
-
-
Baber Abbasi authored
* fix vllm
* fix data_parallel
* copy to multimodal
-
- 12 Feb, 2025 1 commit
-
-
achervyakov authored
-
- 07 Feb, 2025 1 commit
-
-
Baber Abbasi authored
-
- 21 Jan, 2025 1 commit
-
-
Jan Kaniecki authored
* Update vllm_vlms.py
* pre-commit
Co-authored-by: Baber <baber@hey.com>
-
- 19 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* update pre-commit
-
- 15 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* add assistant prefix
* add arc_challenge from llama
* nit
* nit
* nit
* add assistant prefix
* add mmlu_llama
* nit
* nit
* Revert "nit" (reverts commit 6a97f8356237305e375212b966b30e8de59dd4bc)
* fix regex bug
* add assistant_prefix to vllm
* add `Question:`
* add mmlu_pro
* add fewshot assistant_prefill
* use `assistant_prefill`
* typehints
* nits
* nits
* add to docs
* add readme
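The `assistant_prefill` idea is to seed the assistant turn with a fixed prefix so the model continues from it rather than starting a fresh answer. The sketch below only mimics that concept; `build_prompt` and its template are hypothetical, not the harness's implementation.

```python
# Hypothetical illustration of assistant prefill: append a fixed prefix
# after the answer cue so generation continues from it.
def build_prompt(question: str, assistant_prefill: str = "") -> str:
    prompt = f"Question: {question}\nAnswer:"
    if assistant_prefill:
        prompt += " " + assistant_prefill
    return prompt
```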
-
- 07 Jan, 2025 1 commit
-
-
CL-ModelCloud authored
* hf support load gguf file
* code review
* code review
* code clean up
* note about use_fast compat with gguf
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
-
- 25 Dec, 2024 1 commit
-
-
Wang, Yi authored
* fix exact_match low if batch_size > 1
* add sorting to logprobs
* nit
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Baber <baber@hey.com>
-
- 19 Dec, 2024 1 commit
-
-
Baber Abbasi authored
* add warning for truncation
-
- 16 Dec, 2024 1 commit
-
-
Baber Abbasi authored
* batch all rolling token windows
* nit
* copy to vllm
* fix max_length for `get_rolling_token_windows`
* bugfix
* bugfix
* add type hints
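Rolling token windows let a model score a sequence longer than its context by splitting it into chunks, each carried with a little left context. The sketch below is a simplified illustration of that idea only; it is not `get_rolling_token_windows` from the harness, whose signature and context handling differ.

```python
# Illustrative sketch (assumed behavior): split a token list into
# (context, continuation) pairs whose continuations cover every token,
# giving each chunk one token of left context.
def rolling_windows(tokens, max_len):
    windows = []
    start = 0
    while start < len(tokens):
        chunk = tokens[start:start + max_len]
        context = tokens[max(0, start - 1):start]  # 1 token of left context
        windows.append((context, chunk))
        start += max_len
    return windows
```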
-
- 13 Dec, 2024 1 commit
-
-
Yao Matrix authored
* initial support for optimum-intel ipex model; LM model as first step
* format
* pass dtype
* update README
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
-
- 09 Dec, 2024 2 commits
-
-
Maanu Grover authored
* update import
* run formatting
Signed-off-by: Maanu Grover <maanug@nvidia.com>
-
Baber Abbasi authored
* left truncate for generate_until
* pre-commit
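Left truncation for `generate_until` means that when a prompt exceeds the model's context budget, tokens are dropped from the left so the most recent context and the generation budget survive. A minimal sketch under that assumption (not the harness's code):

```python
# Assumed behavior: keep the rightmost prompt tokens that fit after
# reserving room for max_gen generated tokens.
def left_truncate(tokens: list, max_ctx: int, max_gen: int) -> list:
    budget = max(1, max_ctx - max_gen)  # always keep at least one token
    return tokens[-budget:] if len(tokens) > budget else tokens
```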
-
- 04 Dec, 2024 1 commit
-
-
Slawomir Strehlke authored
* Handle pipeline_parallel parameter
* Add description of pipeline parallelism with OV models
-