Commits · ade1cc4e481c65f8be44b8f4c6fc2c4942e6bd33 · gaoqiong / lm-evaluation-harness

01 Nov, 2024 1 commit
- Add missing task links (#2449) · ade1cc4e
  Sypherd authored Nov 01, 2024
  
31 Oct, 2024 1 commit

Add GPTQModel support for evaluating GPTQ models (#2217) · 4f8e479e

Qubitium-ModelCloud authored Nov 01, 2024



* support gptqmodel

* code opt

* add gptqmodel option

* Update huggingface.py

* Update pyproject.toml

* gptqmodel version upgraded to 1.0.6

* GPTQModel version upgraded to 1.0.8

* Update pyproject.toml

* fix ruff-format error

* add gptqmodel test

* Update gptqmodel test model

* skip cuda

* python3.8 compatible

* Update README.md

* Update README.md

---------
Co-authored-by: CL-ModelCloud <cl@modelcloud.ai>

30 Oct, 2024 3 commits
- Add verify_certificate argument to local-completion (#2440) · 57272b63
  Samuel Monson authored Oct 30, 2024
  
- Add xquad task (#2435) · b40a20ae
  zxcvuser authored Oct 30, 2024
```
* Add xquad task

* Update general README

* Run pre-commit
```
- Fix lora requests when dp with vllm (#2433) · 838a3e03
  Chris Kerwell Gresla authored Oct 30, 2024
```
* fix: use lora_request for data parallel vllm evals

* fix(docs): include type hint

* chore: lint, et pre-commit al

---------
Co-authored-by: Chris Kerwell Gresla <chris@wafer.systems>
```
25 Oct, 2024 1 commit

Fix package extras for watsonx support (#2426) · 7882043b

Kiersten Stokes authored Oct 25, 2024



* Update pyproject.toml with watsonx package extra
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>

* Remove unused function
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>

---------
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>

23 Oct, 2024 1 commit

Support for IBM watsonx_llm (#2397) · 1185e89a

Nikodem Szwast authored Oct 23, 2024



* add support for IBM watsonx_llm

* add ibm_watsonx_ai package to optional-dependencies

* move global scope imports to inner scope

* change cache to lru_cache

* fix circular import

* use 3.8 typing

* use 3.8 typing

---------
Co-authored-by: Baber <baber@hey.com>

22 Oct, 2024 2 commits

[Fix] Replace generic exception classes with more specific ones (#1989) · d4ae9635

Leonid Sinev authored Oct 22, 2024

* Replace generic exception classes with more specific ones

* rerun pre-commit to pass linter tests

* Revert "rerun pre-commit to pass linter tests"

This reverts commit 67f88ccf144469853217704520e613196042d859.

* reduce repetitions in errors or so

* Replace generic exception class with a more specific one

Update prompt (#2421) · 389347ee

Iker García-Ferrero authored Oct 22, 2024

Update prompt according to: 
https://github.com/ikergarcia1996/NoticIA/blob/main/prompts.py

20 Oct, 2024 1 commit
- fix storycloze datanames (#2409) · 9b052fdc
  Yuxian Gu authored Oct 20, 2024
  
17 Oct, 2024 2 commits
- Fix: Turkish MMLU Regex Pattern (#2393) · c1d8795d
  Arda authored Oct 17, 2024
```
* Fix Regex Pattern for CoT experiments
```
- group to tag for minerva_math (#2404) · 624017b7
  Ranger authored Oct 17, 2024
```
I found this bug by comparing the code of hendrycks_math and minerva_math.
```
16 Oct, 2024 1 commit

Add new tasks to spanish_bench and fix duplicates (#2390) · 7ecee2bc

zxcvuser authored Oct 17, 2024



* added tasks to spanish_bench

* fixed capitalization in escola and run pre-commit

* Update _flores_common_yaml

* Update _flores_common_yaml

* Update direct_yaml

* Update cot_yaml

* Update cot_yaml

* Update _flores_common_yaml

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

14 Oct, 2024 1 commit

Add Unitxt Multimodality Support (#2364) · 7785577c

Elron Bandel authored Oct 14, 2024



* Add Unitxt Multimodality Support
Signed-off-by: elronbandel <elronbandel@gmail.com>

* Update
Signed-off-by: elronbandel <elronbandel@gmail.com>

* Fix formatting
Signed-off-by: elronbandel <elronbandel@gmail.com>

---------
Signed-off-by: elronbandel <elronbandel@gmail.com>

08 Oct, 2024 4 commits

Bump version to v0.4.5 (#2389) · 0845b588
Hailey Schoelkopf authored Oct 08, 2024

Fix Llava-1.5-hf ; Update to version 0.4.5 (#2388) · 2576a8cb
Hailey Schoelkopf authored Oct 08, 2024

max_images are passed on to vllms `limit_mm_per_prompt` (#2387) · 1ed1f9ed

Baber Abbasi authored Oct 09, 2024

* max_images are passed on to vllms `limit_mm_per_prompt`

* replace max image placeholders in string

* handle chat_template error

* move `fewshot_random_seed` to global

HF: switch conditional checks to `self.backend` from `AUTO_MODEL_CLASS` (#2353) · ab2c46c3

Baber Abbasi authored Oct 09, 2024



* switch conditional checks to `self.backend`

* nit

* nit

* commit feedback

* fix test; update precommit hooks

* add escape hatch for custom self.AUTO_MODEL_CLASS

* add escape hatch for custom self.AUTO_MODEL_CLASS

* fix

* move assertion

* add logging messages

* update AUTO_MODEL_CLASS behavior in _get_backend

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

07 Oct, 2024 5 commits

[API] tokenizer: add trust-remote-code (#2372) · 4cec66e4

Baber Abbasi authored Oct 07, 2024



* tokenizer: trust-remote-code

* pre-commit

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

Fix float limit override (#2325) · aa457edc

Chenjie Luo authored Oct 07, 2024

* Fix float limit override

See: https://github.com/EleutherAI/lm-evaluation-harness/issues/2324

The float limit is overridden by the previous int limit if multiple tasks are triggered together.

This PR fixes the issue.

* Update evaluator.py

* Update evaluator.py
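
The override described above can be sketched in a few lines. This is a minimal illustration only, not the code from `evaluator.py`: the helper names `sample_size`, `resolve_buggy`, and `resolve_fixed` are hypothetical, and it assumes that a limit below 1.0 is read as a fraction of a task's documents while any other value is read as an absolute count.

```python
import math
from typing import Dict, Optional, Union

Limit = Optional[Union[int, float]]


def sample_size(num_docs: int, limit: Limit) -> Optional[int]:
    """Resolve a limit for one task (hypothetical helper).

    Assumption: values below 1.0 are a fraction of the task's documents,
    anything else is an absolute number of documents.
    """
    if limit is None:
        return None
    return int(math.ceil(num_docs * limit)) if limit < 1.0 else int(limit)


def resolve_buggy(task_sizes: Dict[str, int], limit: Limit) -> Dict[str, Optional[int]]:
    resolved = {}
    for name, num_docs in task_sizes.items():
        # Bug: the shared `limit` is overwritten with the first task's int,
        # so every later task sees that int instead of the original float.
        limit = sample_size(num_docs, limit)
        resolved[name] = limit
    return resolved


def resolve_fixed(task_sizes: Dict[str, int], limit: Limit) -> Dict[str, Optional[int]]:
    # Leave the caller's limit untouched and compute a fresh per-task value.
    return {name: sample_size(num_docs, limit) for name, num_docs in task_sizes.items()}


sizes = {"task_a": 1000, "task_b": 200}
print(resolve_buggy(sizes, 0.5))  # {'task_a': 500, 'task_b': 500}
print(resolve_fixed(sizes, 0.5))  # {'task_a': 500, 'task_b': 100}
```

In the buggy variant the second task inherits the first task's resolved count (500) instead of half of its own 200 documents; keeping the original limit in a separate variable avoids that.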

LingOly - Fixing scoring bugs for smaller models (#2376) · fe3040f1
am-bean authored Oct 07, 2024
```
* Fixing scoring bugs for smaller models

* Catching another error type in parsing
```
Solution for CSAT-QA tasks evaluation (#2385) · 8f619361
kyujinHan authored Oct 07, 2024

Hotfix! (#2383) · bfdcdbe0
Baber Abbasi authored Oct 07, 2024
```
* bugfix

* pre-commit
```

04 Oct, 2024 3 commits

fix tests (#2380) · 5e0bc289
Baber Abbasi authored Oct 04, 2024

Add new benchmark: Catalan bench (#2154) · cb069004

zxcvuser authored Oct 04, 2024



* Add catalan_bench

* added flores_ca.yaml

* Updated some task groupings and readme

* Fix create_yamls_flores_ca.py

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

Add new benchmark: Basque bench (#2153) · c887796d

zxcvuser authored Oct 04, 2024



* Add basque_bench

* Add flores_eu group

* Update _flores_common_yaml

* Run linters, updated flores, mgsm, copa, and readme

* Apply suggestions from code review
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>

---------
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

03 Oct, 2024 2 commits

Add new benchmark: Galician bench (#2155) · 0e763862

zxcvuser authored Oct 03, 2024

* Add galician_bench

* Update xnli_gl path

* Add flores_gl group

* Update _flores_common_yaml

* Updated some task groupings and readme

Add new benchmark: Spanish bench (#2157) · ea17b98e

zxcvuser authored Oct 03, 2024

* Add spanish_bench

* Add flores_es group

* Update _flores_common_yaml

* Delete lm_eval/tasks/spanish_bench/escola.yaml

* Delete escola from spanish_bench.yaml

* Delete escola from README.md

* pre-commit run --all-files

* Updated some task groupings and readme

30 Sep, 2024 2 commits
- Fix missing key in custom task loading. (#2304) · 15ffb0da
  Giulio Lovisotto authored Sep 30, 2024
  
- Add new benchmark: Portuguese bench (#2156) · caa7c409
  zxcvuser authored Sep 30, 2024
```
* Add portuguese_bench

* Add flores_pt group

* Update _flores_common_yaml

* Run linters and update flores and readme
```
28 Sep, 2024 1 commit

fix some bugs of mmlu (#2299) · 5a48ca27

eyuansu62 authored Sep 28, 2024



* fix some bugs of mmlu

* Fix end of file newline issue

---------
Co-authored-by: eyuansu62 <772468951@qq.com>

26 Sep, 2024 9 commits

openai: better error messages; fix greedy matching (#2327) · 1bc6c933

Baber Abbasi authored Sep 27, 2024



* better error message; fix greedy matching

* Update lm_eval/models/openai_completions.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/openai_completions.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* pre-commit

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

add mmlu readme (#2282) · 00f5537a
Baber Abbasi authored Sep 27, 2024

Added TurkishMMLU to LM Evaluation Harness (#2283) · deb43287

Arda authored Sep 26, 2024



* Added TurkishMMLU to LM Evaluation Harness

* Fixed COT name

* Fixed COT name

* Updated Readme

* Fixed Test issues

* Completed Scan for changed tasks

* Updated Readme

* Update README.md

* fixup task naming casing + ensure yaml template stubs aren't registered

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

mmlu-pro: add newlines to task descriptions (not leaderboard) (#2334) · 558d0d71

Baber Abbasi authored Sep 27, 2024



* add newlines to task descriptions; increment versions

* fix task tests (with groups)

* Apply suggestions from code review

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

change glianorex to test split (#2332) · 7d242381

Baber Abbasi authored Sep 26, 2024

* change glianorex to test set

* nit

* fix test; doc_to_target can be str for multiple_choice

* nit

change group to tags in task `eus_exams` task configs (#2320) · af92448e
Baber Abbasi authored Sep 26, 2024

Treat tags in python tasks the same as yaml tasks (#2288) · b2bf7bc4

Giulio Lovisotto authored Sep 26, 2024

* Treat python tasks same as yaml tasks.

* Add tests.

* Re-add fixture decorators.

* Fix typing specification error for Python 3.9.

fix writeout script (#2350) · 72d619ff
Baber Abbasi authored Sep 26, 2024

load metric with `evaluate` (#2351) · f378f306
Baber Abbasi authored Sep 26, 2024
