Commits · 84d02f77ced6843dfe7fde525315cb1089d32c19 · gaoqiong / lm-evaluation-harness

06 Jul, 2025 1 commit
- delete neuralmagic models (#3112) · f93001db
  Baber Abbasi authored Jul 06, 2025
  
05 Jul, 2025 2 commits
- delete unneeded files (#3108) · 6e91fdcd
  Baber Abbasi authored Jul 05, 2025
```
* delete unneeded files
```
- remove all; reformat table (#3107) · 28001d29
  Baber Abbasi authored Jul 05, 2025
  
03 Jul, 2025 1 commit

Bugfix/hf tokenizer gguf override (#3098) · ff41a856

Ankush authored Jul 03, 2025

* fix(hf-gguf): skip gguf_file if external tokenizer is provided

* docs(readme): add instructions for evaluating GGUF models with Hugging Face backend

06 May, 2025 1 commit

Change citation name (#2956) · a96085f1

Stella Biderman authored May 06, 2025

This hasn't been a library for few shot language model evaluation in quite a while. Let's update the citation to use "the Language Model Evaluation Harness" as the title.

01 Apr, 2025 1 commit
- Update supported models (#2866) · 773dcd7f
  Daniel Holanda authored Apr 01, 2025
  
20 Mar, 2025 1 commit

Add Markdown linter (#2818) · 7158f4f4

Kiersten Stokes authored Mar 19, 2025

* Add markdown linter to pre-commit hooks

* Reformat existing markdown (excluding lm_eval/tasks/*.md)

19 Mar, 2025 1 commit
- Clean up README and pyproject.toml (#2814) · ce9ba47e
  Kiersten Stokes authored Mar 19, 2025
  
10 Mar, 2025 1 commit
- docs: Fix typos in README.md (#2778) · ebb498e4
  Rui Vieira authored Mar 10, 2025
  
04 Mar, 2025 1 commit

Enable steering HF models (#2749) · d35008f1

Lucia Quirke authored Mar 04, 2025



* Enable steering HF models
Co-authored-by: Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu>

* increase HF download timeout

* Update readme; improve steering vector device handling

* Update latest news

* remove HF timeout increase

* fix tests

* ignore sae lens test

* fix accidental force push

---------
Co-authored-by: Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu>

03 Mar, 2025 1 commit

[Readme change for SGLang] fix error in readme and add OOM solutions for sglang (#2738) · 529f4805

Jinwei authored Mar 02, 2025



* initial components to support sglang

* init of class SGLangLM

* draft for generate_until of SGLang model

* mock loglikelihood

* initial loglikelihood_tokens

* todo: fix bug of sglang engine init

* implement generation tasks and test

* support output type loglikelihood and loglikelihood_rolling (#1)

* .

* loglikelihood_rolling

* /

* support dp_size>1

* typo

* add tests and clean code

* skip tests of sglang for now

* fix OOM error of sglang pytest

* finish test for sglang

* add sglang to readme

* fix OOM of tests and clean SGLang model

* update readme

* clean pyproject and add tests for evaluator

* add accuracy tests and it passed locally

* add notes for test

* Update README.md

update readme

* pre-commit

* add OOM guideline for sglang and fix readme error

* fix typo

* fix typo

* add readme

---------
Co-authored-by: Xiaotong Jiang <xiaotong.jiang@databricks.com>
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Co-authored-by: Baber <baber@hey.com>

25 Feb, 2025 1 commit

Support SGLang as Potential Backend for Evaluation (#2703) · 29971faa

Jinwei authored Feb 25, 2025



* initial components to support sglang

* init of class SGLangLM

* draft for generate_until of SGLang model

* mock loglikelihood

* initial loglikelihood_tokens

* todo: fix bug of sglang engine init

* implement generation tasks and test

* support output type loglikelihood and loglikelihood_rolling (#1)

* .

* loglikelihood_rolling

* /

* support dp_size>1

* typo

* add tests and clean code

* skip tests of sglang for now

* fix OOM error of sglang pytest

* finish test for sglang

* add sglang to readme

* fix OOM of tests and clean SGLang model

* update readme

* clean pyproject and add tests for evaluator

* add accuracy tests and it passed locally

* add notes for test

* Update README.md

update readme

* pre-commit

---------
Co-authored-by: Xiaotong Jiang <xiaotong.jiang@databricks.com>
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Co-authored-by: Baber <baber@hey.com>

14 Feb, 2025 1 commit
- Update README.md (#2694) · 157d8c3c
  Irina Proskurina authored Feb 14, 2025
  
13 Dec, 2024 1 commit

add optimum-intel ipex model (#2566) · 919470a1

Yao Matrix authored Dec 14, 2024



* initial support for optimum-intel ipex model. LM model as first step

* format
Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* pass dtype
Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* update README
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------
Signed-off-by: Yao Matrix <matrix.yao@intel.com>

09 Dec, 2024 1 commit
- [API] left truncate for generate_until (#2554) · 2d11f2e5
  Baber Abbasi authored Dec 09, 2024
```
* left truncate for generate_until

* pre-commit
```
05 Dec, 2024 1 commit
- Update README.md (#2546) · bcb4cbf4
  fzyzcjy authored Dec 05, 2024
  
04 Dec, 2024 2 commits
- Support pipeline parallel with OpenVINO models (#2349) · 1f9bc88f
  Slawomir Strehlke authored Dec 04, 2024
```
* Handle pipeline_parallel parameter

* Add description of pipeline parallelism with OV models
```
- Update README.md (#2534) · 4a12959f
  Baber Abbasi authored Dec 04, 2024
```
* Update README.md

add caching tip to readme

* Update README.md

add api link
```
01 Dec, 2024 1 commit

Update Unitxt task to use locally installed unitxt and not download Unitxt... · 1170ef9e

Yoav Katz authored Dec 01, 2024


Update Unitxt task to use locally installed unitxt and not download Unitxt code from Huggingface (#2514)

* Moved to require unitxt installation and not download unitxt from HF hub.

This has performance benefits and simplifies the code.
Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Updated watsonx documentation

* Updated installation instructions

* Removed redundant comman

* Allowed unitxt tasks to generate chat APIs

Modified WatsonXI model to support chat apis

* Removed print

* Run precommit formatting

---------
Signed-off-by: Yoav Katz <katz@il.ibm.com>

31 Oct, 2024 1 commit

Add GPTQModel support for evaluating GPTQ models (#2217) · 4f8e479e

Qubitium-ModelCloud authored Nov 01, 2024



* support gptqmodel

* code opt

* add gptqmodel option

* Update huggingface.py

* Update pyproject.toml

* gptqmodel version upgraded to 1.0.6

* GPTQModel version upgraded to 1.0.8

* Update pyproject.toml

* fix ruff-format error

* add gptqmodel test

* Update gptqmodel test model

* skip cuda

* python3.8 compatible

* Update README.md

* Update README.md

---------
Co-authored-by: CL-ModelCloud <cl@modelcloud.ai>

17 Sep, 2024 1 commit

Update README.md (#2297) · a5e0adcb

SYusupov authored Sep 17, 2024

* Update README.md

I encountered Git buffer size limits when trying to download the full commit history of the repository, such as:
```
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
error: 5815 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
```

The installation is therefore faster, and free of these errors, when only the latest version of the repository is downloaded.

* Fix linting issue

13 Sep, 2024 1 commit

Multimodal prototyping (#2243) · fb963f0f

Lintang Sutawika authored Sep 13, 2024



* add WIP hf vlm class

* add doc_to_image

* add mmmu tasks

* fix merge conflicts

* add lintang's changes to hf_vlms.py

* fix doc_to_image

* added yaml_path for config-loading

* revert

* add line to process str type v

* update

* modeling cleanup

* add aggregation for mmmu

* rewrite MMMU processing code based on only MMMU authors' repo (doc_to_image still WIP)

* implemented doc_to_image

* update doc_to_image to accept list of features

* update functions

* readd image processed

* update args process

* bugfix for repeated images fed to model

* push WIP loglikelihood code

* commit most recent code (generative ; qwen2-vl testing)

* preliminary image_token_id handling

* small mmmu update: some qs have >4 mcqa options

* push updated modeling code

* use processor.apply_chat_template

* add mathvista draft

* nit

* nit

* ensure no footguns in text<>multimodal LM<>task incompatibility

* add notification to readme regarding launch of prototype!

* fix compatibility check

* reorganize mmmu configs

* chat_template=None

* add interleave chat_template

* add condition

* add max_images; interleave=true

* nit

* testmini_mcq

* nit

* pass image string; convert img

* add vllm

* add init

* vlm add multi attr

* fixup

* pass max images to vllm model init

* nit

* encoding to device

* fix HFMultimodalLM.chat_template ?

* add mmmu readme

* remove erroneous prints

* use HFMultimodalLM.chat_template ; restore tasks/__init__.py

* add docstring for replace_placeholders in utils

* fix `replace_placeholders`; set image_string=None

* fix typo

* cleanup + fix merge conflicts

* update MMMU readme

* del mathvista

* add some sample scores

* Update README.md

* add log msg for image_string value

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: Baber Abbasi <baber@eleuther.ai>
Co-authored-by: Baber <baber@hey.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

15 Aug, 2024 1 commit
- Update citation in README.md (#2083) · cbdc3539
  Anton Polishko authored Aug 15, 2024
```
Bumped citation to the v0.4.3
```
05 Aug, 2024 2 commits

Update README.md (#2186) · cddce0a1
Hailey Schoelkopf authored Aug 05, 2024

Dp and mp support (#2056) · 0ce7734d

Nathan Habib authored Aug 05, 2024

* batch commit

* Revert "batch commit"

This reverts commit d859d1ca.

* batch commit

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* linting

* add doc

* Update lm_eval/models/huggingface.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/huggingface.py

* linter

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* style

* remove prepare

* fix

* style

* last check

* Update lm_eval/models/huggingface.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

---------
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: clementine@huggingface.co <clementine@huggingface.co>

29 Jul, 2024 1 commit

bugfix and docs for API (#2139) · b70af4f5

Baber Abbasi authored Jul 29, 2024



* encoding bugfix

* encoding bugfix

* overload loglikelihood rather than loglikelihood_tokens

* add custom tokenizer

* add docs

* Update API_guide.md

fix link; add note

* Update API_guide.md

typo

* pre-commit

* add link in readme

* nit

* nit

* nit

* Update API_guide.md

nits

* Update API_guide.md

* Update API_guide.md

* Update API_guide.md

* Update API_guide.md

* Update README.md

* Update docs/API_guide.md

* Update docs/API_guide.md

* Update API_guide.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

22 Jul, 2024 1 commit

Refactor API models (#2008) · 42dc2448

Baber Abbasi authored Jul 23, 2024



* refactor pad_token handling to fn

* fix docs

* add pad_token_handling to vllm

* start on API superclass

* don't detokenize the returned logits

* streamline vllm tokenizer

* add type hint

* pre-commit

* seems to be in working order

* add model to init

* refactor api models

* nit

* cleanup

* add pbar

* fix type hints

* change optional dependencies

* json encode chat template

* add type hints

* deal with different prompt input requirements

* nits

* fix

* cache inside async

* fix

* fix

* nits

* nits

* nits

* nit

* fixup

* fixup

* nit

* add dummy retry

* add dummy retry

* handle imports; skip failing test

* add type hint

* add tests

* add dependency to tests

* add package names to exception

* nit

* docs; type hints

* handle api key

* nit

* tokenizer bug

* fix tokenizer

* nit

* nit

* add better error messages

* nit

* remove decorator

* CI: install api dep

* revert evaluator.py

* consolidate

* consolidate

* nits

* nit

* fix typealias

* nit

* nit

* nit

* Update lm_eval/models/api_models.py

typo
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/openai_completions.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/anthropic_llms.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/api_models.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fix typo

* add news section

* add info for API

* pre-commit

* typo

* fix bug: unpack loglikelihood requests

* fix bug: shared gen_kwargs mutated

* nit: handle copy properly

* Update README.md

* Update README.md

* Update README.md

* Update api_models.py

* Update README.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

08 Jul, 2024 1 commit

Easier unitxt tasks loading and removal of unitxt library dependency (#1933) · ad80f555

Elron Bandel authored Jul 08, 2024



* Updated unitxt loading
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

* Revert change to general Readme
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

* Adjust fda, squadv2, squad_completion and swde to accept config in the constructor
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

* Fix scrolls
Signed-off-by: elronbandel <elron.bandel@ibm.com>

* Update documentation
Signed-off-by: elronbandel <elron.bandel@ibm.com>

* Enforce backward compatibility
Signed-off-by: elronbandel <elron.bandel@ibm.com>

* Format unitxt class
Signed-off-by: elronbandel <elron.bandel@ibm.com>

---------
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
Signed-off-by: elronbandel <elron.bandel@ibm.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

03 Jul, 2024 1 commit

Adds Open LLM Leaderboard Tasks (#2047) · 3c8db1bb

Nathan Habib authored Jul 03, 2024



* adds leaderboard tasks

* Delete lm_eval/tasks/leaderboard/leaderboard_chat_template.yaml

* add readme

* Delete lm_eval/tasks/leaderboard/mmlu_pro/mmlu_pro_chat_template.yaml

* modify readme

* fix bbh task

* fix bbh salient task

* modify the readme

* Delete lm_eval/tasks/leaderboard/ifeval/README.md

* Delete lm_eval/tasks/leaderboard/math/README.md

* add leaderboard to the tasks repertory

* add announcement about new leaderboard tasks

* linting

* Update README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* installs ifeval dependency in new_task github workflow

---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

03 Jun, 2024 1 commit

Complete task list from pr 1727 (#1901) · 3e500e9d

anthony-dipofi authored Jun 03, 2024



* added tasks and task family descriptors

* continue work on task list w/ links; slightly reorganize README

* Apply suggestions from code review

* Rename file so that it'll preview in Github when viewing lm_eval/tasks folder

* Update new_task_guide.md

* Update README.md

* run linter

* Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs

* fix typo

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* apply format

---------
Co-authored-by: Harish Vadaparty <harishvadaparty@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

31 May, 2024 1 commit

Add dataset card when pushing to HF hub (#1898) · f4f59251

KonradSzafer authored May 31, 2024



* dataset card initial

* few fixes

* adds groups for math, mmlu, gpqa

* added summary args

* moved sanitize_list to utils

* readme update

* recreate metadata moved

* multiple model support

* results latest split fix

* readme update and small refactor

* fix grouping

* add comments

* added pathlib

* corrected pathlib approach

* check whether to create a metadata card

* convert posix paths to str

* default hf org from token

* hf token value error

* Add logs after successful upload

* logging updates

* dataset card example in the readme

---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Alina Lozovskaia <alinailozovskaya@gmail.com>

07 May, 2024 2 commits

Initial integration of the Unitxt to LM eval harness (#1615) · 885f48d6

Yoav Katz authored May 08, 2024

* Initial support for Unitxt datasets in LM Eval Harness

See https://github.com/IBM/unitxt



The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file.

The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'.

* Added dataset loading check to generate_yaml

Improved error messages.

* Speed up generate_yaml

Added printouts and improved error message

* Added output printout

* Simplified integration of unitxt datasets

Store all the common yaml configuration in a yaml include shared by all datasets of the same task.

* Post code review comments - part 1

1. Made sure include files don't end with 'yaml' so they won't be marked as tasks
2. Added more datasets and tasks (NER, GEC)
3. Added README

* Post code review comments - part 2

1. Added a unitxt install option in pyproject.toml:
pip install 'lm_eval[unitxt]'
2. Added a check that unitxt is installed, printing a clear error message if not

* Committed missing pyproject change

* Added documentation on adding datasets

* More doc changes

* add unitxt extra to readme

* run precommit

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

link to the example output on the hub (#1798) · 20be169b
KonradSzafer authored May 07, 2024

05 May, 2024 1 commit
- Fix README: change `----hf_hub_log_args` to `--hf_hub_log_args` (#1776) · 297966f7
  Muhammad Bin Usman authored May 06, 2024
```
fix `----hf_hub_log_args` to `--hf_hub_log_args`
```
03 May, 2024 1 commit

evaluation tracker implementation (#1766) · 59cf408a

KonradSzafer authored May 03, 2024

* evaluation tracker implementation

* OVModelForCausalLM test fix

* typo fix

* moved methods args

* multiple args in one flag

* loggers moved to dedicated dir

* improved filename sanitization

25 Apr, 2024 1 commit
- reference `--tasks list` in README (#1726) · 80a056bb
  Brian Vaughan authored Apr 25, 2024
```
https://github.com/EleutherAI/lm-evaluation-harness/issues/1698
```
16 Apr, 2024 2 commits

Add `neuralmagic` models for `sparseml` and `deepsparse` (#1674) · 8b326be7

Michael Goin authored Apr 16, 2024



* Add neuralmagic models for SparseML and DeepSparse

* Update to latest and add test

* Format

* Fix list to List

* Format

* Add deepsparse/sparseml to automated testing

* Update pyproject.toml

* Update pyproject.toml

* Update README

* Fixes for dtype and device

* Format

* Fix test

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Address review comments!

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

Add delta weights model loading (#1712) · 12a165d1

KonradSzafer authored Apr 16, 2024

* added delta weights

* removed debug

* readme update

* better error handling

* autogptq warn

* warn update

* peft and delta error, explicitly deleting _model_delta

* linter fix

08 Apr, 2024 1 commit
- Update README.md (#1680) · 7852985b
  Hailey Schoelkopf authored Apr 08, 2024
  
05 Apr, 2024 1 commit

Anthropic Chat API (#1594) · 27924d77

Seungwoo Ryu authored Apr 06, 2024



* claude3

* supply for anthropic claude3

* supply for anthropic claude3

* anthropic config changes

* add callback options on anthropic

* line passed

* claude3 tiny change

* help anthropic installation

* mention sysprompt / being careful with format in readme

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
