Commits · 2b56339e7b8d4301a2ade091aae38c4cbcef8d9f · gaoqiong / lm-evaluation-harness

15 Jan, 2025 1 commit

Baber Abbasi authored Jan 15, 2025

* add assistant prefix

* add arc_challenge from llama

* nit

* nit

* nit

* add assistant prefix

* add mmlu_llama

* nit

* nit

* Revert "nit"

This reverts commit 6a97f8356237305e375212b966b30e8de59dd4bc.

* fix regex bug

* add assistant_prefix to vllm

* add `Question:`

* add mmlu_pro

* add fewshot assistant_prefix

* use `assistant_prefill`

* typehints

* nits

* nits

* add to docs

* add readme

703fbffd

07 Jan, 2025 1 commit

Fix gguf loading via Transformers (#2596) · 16cfe464

CL-ModelCloud authored Jan 07, 2025



* hf support load gguf file

* code review

* code review

* code clean up

* note about use_fast compat with gguf

---------
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

16cfe464

25 Dec, 2024 1 commit

fix extra_match low if batch_size > 1 (#2595) · 59f9ad4b

Wang, Yi authored Dec 25, 2024



* fix extra_match low if batch_size > 1
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* add sorting to logprobs

* nit

---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Baber <baber@hey.com>

59f9ad4b

19 Dec, 2024 1 commit

add warning for truncation (#2585) · 6ccd520f

Baber Abbasi authored Dec 19, 2024

* add warning for truncation

6ccd520f

16 Dec, 2024 1 commit

batch `loglikelihood_rolling` across requests (#2559) · 0bfb0220

Baber Abbasi authored Dec 16, 2024

* batch all rolling token windows

* nit

* copy to vllm

* fix max_length for `get_rolling_token_windows`

* bugfix

* bugfix

* add type hints

0bfb0220

13 Dec, 2024 1 commit

add optimum-intel ipex model (#2566) · 919470a1

Yao Matrix authored Dec 14, 2024



* initial support for optimum-intel ipex model. LM model as first step

* format
Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* pass dtype
Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* update README
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------
Signed-off-by: Yao Matrix <matrix.yao@intel.com>

919470a1

09 Dec, 2024 2 commits

Update Lightning import (#2549) · 0b994433

Maanu Grover authored Dec 09, 2024



* update import
Signed-off-by: Maanu Grover <maanug@nvidia.com>

* run formatting

---------
Signed-off-by: Maanu Grover <maanug@nvidia.com>

0b994433

[API] left truncate for generate_until (#2554) · 2d11f2e5

Baber Abbasi authored Dec 09, 2024

* left truncate for generate_until

* pre-commit

2d11f2e5

04 Dec, 2024 1 commit

Support pipeline parallel with OpenVINO models (#2349) · 1f9bc88f

Slawomir Strehlke authored Dec 04, 2024

* Handle pipeline_parallel parameter

* Add description of pipeline parallelism with OV models

1f9bc88f

03 Dec, 2024 1 commit

avoid timeout errors with high concurrency in api_model (#2307) · 9632b343

Trawinski, Dariusz authored Dec 03, 2024



* avoid timeout errors with high concurrency in api_model

* style

* add timeout

* add docs

---------
Co-authored-by: Baber <baber@hey.com>

9632b343

01 Dec, 2024 1 commit

Update Unitxt task to use locally installed unitxt and not download Unitxt... · 1170ef9e

Yoav Katz authored Dec 01, 2024


Update Unitxt task to  use locally installed unitxt and not download Unitxt code from Huggingface (#2514)

* Moved to require unitxt installation and not download unitxt from HF hub.

This has performance benefits and simplifies the code.
Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Updated watsonx documentation

* Updated installation instructions

* Removed redundant comman

* Allowed unitxt tasks to generate chat APIs

Modified WatsonXI model to support chat apis

* Removed print

* Run precommit formatting

---------
Signed-off-by: Yoav Katz <katz@il.ibm.com>

1170ef9e

30 Nov, 2024 1 commit

make utility function to handle `until` (#2518) · 0230356c

Baber Abbasi authored Nov 30, 2024

* make utility function to handle `until`

* fix text

0230356c

22 Nov, 2024 1 commit

parse tokenizer_backend=None properly (#2509) · 9d36354e

Baber Abbasi authored Nov 22, 2024

9d36354e

18 Nov, 2024 1 commit

Add mamba hf to `mamba_ssm` (#2496) · 0f5dc265

Baber Abbasi authored Nov 18, 2024

* add hf mamba to mamba_lm

* fix _model_generate for hf

0f5dc265

16 Nov, 2024 1 commit

update pre-commit hooks and git actions (#2497) · badf273a

Baber Abbasi authored Nov 16, 2024

* pre-commit update

* update github actions

* make logging less verbose

* fix artifacts

badf273a

15 Nov, 2024 2 commits

Fix revision parameter to vllm get_tokenizer (#2492) · e20e1ddc

Oyvind Tafjord authored Nov 15, 2024

e20e1ddc

IBM watsonx_llm fixes & refactor (#2464) · 4259a6d4

Nikodem Szwast authored Nov 15, 2024

* refactor code, fix config path bug

* update types to be from typing lib

* add pre-commit formatting

* specify version of ibm_watsonx_ai package

* adjust get_watsonx_credentials() function, add minor refactor to address PR review comments

* change missing installation hint from ibm_watsonx_ai to lm_eval[ibm_watsonx_ai]

4259a6d4

11 Nov, 2024 2 commits

change warning to debug (#2481) · 6b628d9a

Baber Abbasi authored Nov 11, 2024

6b628d9a

Fix chat template; fix leaderboard math (#2475) · 77c811ea

Baber Abbasi authored Nov 11, 2024

* batch commit

* Revert "batch commit"

This reverts commit d859d1ca.

* batch commit

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* Chat template fix (#7)

* cleanup

* cleanup

* cleanup

* linting

* fix tests

* add ifeval install to new_task CI

* Revert "add ifeval install to new_task CI"

This reverts commit 1d19449bb7fbfa05d51e7cd20950475eae533bf1.

* adds leaderboard tasks (#1)

* adds leaderboard tasks

* Delete lm_eval/tasks/leaderboard/leaderboard_chat_template.yaml

* add readme

* Delete lm_eval/tasks/leaderboard/mmlu_pro/mmlu_pro_chat_template.yaml

* modify readme

* fix bbh task

* fix bbh salient task

* modify the readme

* Delete lm_eval/tasks/leaderboard/ifeval/README.md

* Delete lm_eval/tasks/leaderboard/math/README.md

* add leaderboard to the tasks repertory

* add announcement about new leaderboard tasks

* linting

* Update README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* installs ifeval dependency in new_task github workflow

---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fix math parser

* fix math parser

* fix version

* add warning about chat template

---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.co>
Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: Nathan Habib <nathan.habib19@gmail.com>

77c811ea

09 Nov, 2024 1 commit

OpenAI ChatCompletions: switch `max_tokens` (#2443) · 060e8761

Baber Abbasi authored Nov 09, 2024

* switch `max_tokens` for `max_completion_tokens`. OpenAI ChatCompletions

* remove stop, temp=1 for o1

* add chat assertion

* HF_DATASETS_TRUST_REMOTE_CODE = True for task tests

* move warning

060e8761

07 Nov, 2024 1 commit

pass device_map other than auto for parallelize (#2457) · 4155ec7f

Baber Abbasi authored Nov 07, 2024

* pass device_map other than auto for parallelize

4155ec7f

06 Nov, 2024 1 commit

Fix 'loglikelihood' typos in the api models file (#2459) · bf2abb41

Rob Geada authored Nov 06, 2024

bf2abb41

31 Oct, 2024 1 commit

Add GPTQModel support for evaluating GPTQ models (#2217) · 4f8e479e

Qubitium-ModelCloud authored Nov 01, 2024



* support gptqmodel

* code opt

* add gptqmodel option

* Update huggingface.py

* Update pyproject.toml

* gptqmodel version upgraded to 1.0.6

* GPTQModel version upgraded to 1.0.8

* Update pyproject.toml

* fix ruff-format error

* add gptqmodel test

* Update gptqmodel test model

* skip cuda

* python3.8 compatible

* Update README.md

* Update README.md

---------
Co-authored-by: CL-ModelCloud <cl@modelcloud.ai>

4f8e479e

30 Oct, 2024 2 commits

Add verify_certificate argument to local-completion (#2440) · 57272b63

Samuel Monson authored Oct 30, 2024

57272b63

Fix lora requests when dp with vllm (#2433) · 838a3e03

Chris Kerwell Gresla authored Oct 30, 2024

* fix: use lora_request for data parallel vllm evals

* fix(docs): include type hint

* chore: lint, et pre-commit al

---------
Co-authored-by: Chris Kerwell Gresla <chris@wafer.systems>

838a3e03

25 Oct, 2024 1 commit

Fix package extras for watsonx support (#2426) · 7882043b

Kiersten Stokes authored Oct 25, 2024



* Update pyproject.toml with watsonx package extra
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>

* Remove unused function
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>

---------
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>

7882043b

23 Oct, 2024 1 commit

Support for IBM watsonx_llm (#2397) · 1185e89a

Nikodem Szwast authored Oct 23, 2024



* add support for IBM watsonx_llm

* add ibm_watsonx_ai package to optional-dependencies

* move global scope imports to inner scope

* change cache to lru_cache

* fix circular import

* use 3.8 typing

* use 3.8 typing

---------
Co-authored-by: Baber <baber@hey.com>

1185e89a

22 Oct, 2024 1 commit

[Fix] Replace generic exception classes with a more specific ones (#1989) · d4ae9635

Leonid Sinev authored Oct 22, 2024

* Replace generic exception classes with a more specific ones

* rerun pre-commit to pass linter tests

* Revert "rerun pre-commit to pass linter tests"

This reverts commit 67f88ccf144469853217704520e613196042d859.

* reduce repetitions in errors or so

* Replace generic exception class with a more specific one

d4ae9635

08 Oct, 2024 3 commits

Fix Llava-1.5-hf ; Update to version 0.4.5 (#2388) · 2576a8cb

Hailey Schoelkopf authored Oct 08, 2024

2576a8cb

max_images are passed on to vllms `limit_mm_per_prompt` (#2387) · 1ed1f9ed

Baber Abbasi authored Oct 09, 2024

* max_images are passed on to vllms `limit_mm_per_prompt`

* replace max image placeholders in string

* handle chat_template error

* move `fewshot_random_seed` to global

1ed1f9ed

HF: switch conditional checks to `self.backend` from `AUTO_MODEL_CLASS` (#2353) · ab2c46c3

Baber Abbasi authored Oct 09, 2024



* switch conditional checks to `self.backend`

* nit

* nit

* commit feedback

* fix test; update precommit hooks

* add escape hatch for custom self.AUTO_MODEL_CLASS

* add escape hatch for custom self.AUTO_MODEL_CLASS

* fix

* move assertion

* add logging messages

* update AUTO_MODEL_CLASS behavior in _get_backend

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

ab2c46c3

07 Oct, 2024 1 commit

[API] tokenizer: add trust-remote-code (#2372) · 4cec66e4

Baber Abbasi authored Oct 07, 2024



* tokenizer: trust-remote-code

* pre-commit

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

4cec66e4

26 Sep, 2024 1 commit

openai: better error messages; fix greedy matching (#2327) · 1bc6c933

Baber Abbasi authored Sep 27, 2024



* better error message; fix greedy matching

* Update lm_eval/models/openai_completions.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/openai_completions.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* pre-commit

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

1bc6c933

24 Sep, 2024 1 commit

Fixed dummy model (#2339) · d7734d19

Amine Elhattami authored Sep 24, 2024

d7734d19

18 Sep, 2024 1 commit

Update neuron backend (#2314) · 9a092f37

David Corvoysier authored Sep 18, 2024

* feat(neuron): align with latest optimum-neuron

* feat(neuron): support pre-exported neuron models

* fix(neuron): correctly use max_length

* fix(neuron): adapt loglikelihood

The evaluation of log likelihood was not working for neuron models
using continuous batching, such as all cached neuron LLama models.

* refactor(neuron): remove dead code

9a092f37

13 Sep, 2024 1 commit

Multimodal prototyping (#2243) · fb963f0f

Lintang Sutawika authored Sep 13, 2024



* add WIP hf vlm class

* add doc_to_image

* add mmmu tasks

* fix merge conflicts

* add lintang's changes to hf_vlms.py

* fix doc_to_image

* added yaml_path for config-loading

* revert

* add line to process str type v

* update

* modeling cleanup

* add aggregation for mmmu

* rewrite MMMU processing code based on only MMMU authors' repo (doc_to_image still WIP)

* implemented doc_to_image

* update doc_to_image to accept list of features

* update functions

* readd image processed

* update args process

* bugfix for repeated images fed to model

* push WIP loglikelihood code

* commit most recent code (generative ; qwen2-vl testing)

* preliminary image_token_id handling

* small mmmu update: some qs have >4 mcqa options

* push updated modeling code

* use processor.apply_chat_template

* add mathvista draft

* nit

* nit

* ensure no footguns in text<>multimodal LM<>task incompatibility

* add notification to readme regarding launch of prototype!

* fix compatibility check

* reorganize mmmu configs

* chat_template=None

* add interleave chat_template

* add condition

* add max_images; interleave=true

* nit

* testmini_mcq

* nit

* pass image string; convert img

* add vllm

* add init

* vlm add multi attr

* fixup

* pass max images to vllm model init

* nit

* encoding to device

* fix HFMultimodalLM.chat_template ?

* add mmmu readme

* remove erroneous prints

* use HFMultimodalLM.chat_template ; restore tasks/__init__.py

* add docstring for replace_placeholders in utils

* fix `replace_placeholders`; set image_string=None

* fix typo

* cleanup + fix merge conflicts

* update MMMU readme

* del mathvista

* add some sample scores

* Update README.md

* add log msg for image_string value

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: Baber Abbasi <baber@eleuther.ai>
Co-authored-by: Baber <baber@hey.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

fb963f0f

04 Sep, 2024 1 commit

Chat Template fix (cont. #2235) (#2269) · 7a1614eb

Baber Abbasi authored Sep 04, 2024



* default chat template method fix

* move chat_template to TemplateLM

* remove hotfix

* handle openai `chat_template`

* Update lm_eval/api/model.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* add 'max_tokens' to gen_kwargs

* pre-commit

---------
Co-authored-by: KonradSzafer <szafer.konrad@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

7a1614eb

30 Aug, 2024 2 commits

hotfix #2262 (#2264) · 928e8bb6

Baber Abbasi authored Aug 30, 2024

* max_length - 1 (generation always >= 1)

* vllm: fix rolling prefix_token

* nit: add comment

* fixup! max_length should be handled for logliklihoods

* Revert "fixup! max_length should be handled for logliklihoods"

This reverts commit 432d1a3b754c117c3a54ea2fe792ab3a1bd09ed3.

928e8bb6

API: fix maxlen; vllm: prefix_token_id bug (#2262) · b31f92e8

Baber Abbasi authored Aug 30, 2024

* max_length - 1 (generation always >= 1)

* vllm: fix rolling prefix_token

* nit: add comment

* fixup! max_length should be handled for logliklihoods

b31f92e8

28 Aug, 2024 1 commit

Fix `loglikelihood_rolling` caching ( #1821 ) (#2187) · 8138fd52

Hailey Schoelkopf authored Aug 28, 2024



* fix revision type

* allow for None-input loglikelihood reqs to be cached

* handle no remaining cache items

* pre-commit

* change cache_hook.add_partial(loglikelihood_rolling...) convention

---------
Co-authored-by: Baber Abbasi <baber@eleuther.ai>

8138fd52