Commits · v0.4.6 · gaoqiong / lm-evaluation-harness

22 Nov, 2024 1 commit
- parse tokenizer_backend=None properly (#2509) · 9d36354e
  Baber Abbasi authored Nov 22, 2024
  
20 Nov, 2024 1 commit
- Nits (#2500) · 867413f8
  Baber Abbasi authored Nov 20, 2024
```
* fix test task

* don't call lm.chat_template each time
```
18 Nov, 2024 3 commits

Add metabench task to LM Evaluation Harness (#2357) · 62b4364d

Kozzy Voudouris authored Nov 18, 2024



* Add metabench (Kipnis et al. 2024)

* Update metabench tasks for full replication of original benchmarks, using publicly available datasets

* Remove unnecessary import

* Add permute versions of each task, where the answer orders are randomly shuffled.

* Add metabench group for easier evaluations

* Fix mmlu counts after removing duplicate

* Add secondary datasets

* Fix f-string error

* Fix f-string error for permute processing

* Add original hash to outputs for easy matching to original results

* Add line break at end of utils files

* Remove extra line from winogrande

* Reformat for linters

* fix multiple input test

* appease pre-commit

* Add metabench to tasks README

* fix multiple input `test_doc_to_text`

---------
Co-authored-by: Baber <baber@hey.com>


remove duplicate `arc_ca` (#2499) · 8222ad0a
Baber Abbasi authored Nov 18, 2024

Add mamba hf to `mamba_ssm` (#2496) · 0f5dc265
Baber Abbasi authored Nov 18, 2024
```
* add hf mamba to mamba_lm

* fix _model_generate for hf
```

16 Nov, 2024 2 commits

kbl-v0.1.1 (#2493) · cbc31eb8

Wonseok Hwang authored Nov 17, 2024

* release kbl-v0.1

* fix linting

* remove rag tasks as doc_to_text functions cause trouble

* remove remaining rag tasks

* remove unnecessary repeat in yaml files and rag dataset in hf-hub

* remove unnecessary newline; introduce cfg files in lbox/kbl in hf

* Make task yaml files consistent to hf-datasets-config

* Make task yaml files consistent to hf-datasets-config

* Remove trailing empty space in doc-to-text

* Remove unnecessary yaml file

* Fix task naming error

* trailing space removed


update pre-commit hooks and git actions (#2497) · badf273a
Baber Abbasi authored Nov 16, 2024
```
* pre-commit update

* update github actions

* make logging less verbose

* fix artifacts
```

15 Nov, 2024 2 commits

Fix revision parameter to vllm get_tokenizer (#2492) · e20e1ddc
Oyvind Tafjord authored Nov 15, 2024


IBM watsonx_llm fixes & refactor (#2464) · 4259a6d4

Nikodem Szwast authored Nov 15, 2024

* refactor code, fix config path bug

* update types to be from typing lib

* add pre-commit formatting

* specify version of ibm_watsonx_ai package

* adjust get_watsonx_credentials() function, add minor refactor to address PR review comments

* change missing installation hint from ibm_watsonx_ai to lm_eval[ibm_watsonx_ai]


12 Nov, 2024 1 commit
- wandb logger fix, added pre-commit (#2484) · 67db63a5
  Alex Titterton authored Nov 12, 2024
  
11 Nov, 2024 2 commits

change warning to debug (#2481) · 6b628d9a
Baber Abbasi authored Nov 11, 2024


Fix chat template; fix leaderboard math (#2475) · 77c811ea

Baber Abbasi authored Nov 11, 2024

* batch commit

* Revert "batch commit"

This reverts commit d859d1ca.

* batch commit

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* Chat template fix (#7)

* cleanup

* cleanup

* cleanup

* linting

* fix tests

* add ifeval install to new_task CI

* Revert "add ifeval install to new_task CI"

This reverts commit 1d19449bb7fbfa05d51e7cd20950475eae533bf1.

* adds leaderboard tasks (#1)

* adds leaderboard tasks

* Delete lm_eval/tasks/leaderboard/leaderboard_chat_template.yaml

* add readme

* Delete lm_eval/tasks/leaderboard/mmlu_pro/mmlu_pro_chat_template.yaml

* modify readme

* fix bbh task

* fix bbh salient task

* modify the readme

* Delete lm_eval/tasks/leaderboard/ifeval/README.md

* Delete lm_eval/tasks/leaderboard/math/README.md

* add leaderboard to the tasks repertory

* add announcement about new leaderboard tasks

* linting

* Update README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* installs ifeval dependency in new_task github workflow

---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fix math parser

* fix math parser

* fix version

* add warning about chat template

---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.co>
Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: Nathan Habib <nathan.habib19@gmail.com>


09 Nov, 2024 2 commits

Ifeval: Download `punkt_tab` on rank 0 (#2267) · bd80a6c0
Baber Abbasi authored Nov 09, 2024
```
* download nltk `punkt_tab` on LOCAL_RANK=0

* remove print

* remove `time`

* nit
```

OpenAI ChatCompletions: switch `max_tokens` (#2443) · 060e8761

Baber Abbasi authored Nov 09, 2024

* switch `max_tokens` for `max_completion_tokens`. OpenAI ChatCompletions

* remove stop, temp=1 for o1

* add chat assertion

* HF_DATASETS_TRUST_REMOTE_CODE = True for task tests

* move warning


07 Nov, 2024 3 commits
- pass device_map other than auto for parallelize (#2457) · 4155ec7f
  Baber Abbasi authored Nov 07, 2024
```
* pass device_map other than auto for parallelize
```
- typo (#2465) · 901053f6
  Baber Abbasi authored Nov 07, 2024
  
- use global filter (#2461) · cd18cb3b
  Baber Abbasi authored Nov 07, 2024
  
06 Nov, 2024 1 commit
- Fix 'loglikelihood' typos in the api models file (#2459) · bf2abb41
  Rob Geada authored Nov 06, 2024
  
05 Nov, 2024 3 commits

Add Japanese Leaderboard (#2439) · 26f607f5

mtkachenko authored Nov 05, 2024

* add jaqket_v2 and jcommonsenseqa

* remove comments

* remove num_beams as it is incompatible with vllm

* add jnli + refactor

* rename jnla -> jnli

* add jsquad + replace colon chars with the Japanese unicode

* ignore whitespaces in generation tasks

* add marc_ja

* add xwinograd + simplify other yamls

* add mgsm and xlsum

* refactor xlsum

* add ja_leaderboard tag

* edit README.md

* update README.md

* add credit + minor changes

* run ruff format

* address review comments + add group

* remove aggregate_metric_list

* remove tags

* update tasks/README.md


Modify label errors in catcola and paws-x (#2434) · fb2e4b59

zxcvuser authored Nov 05, 2024



* Modify label errors in catcola and paws

* Update version to 1.0 in pawsx_template_yaml

* add changelog

---------
Co-authored-by: Baber <baber@hey.com>


Add real process_docs example (#2456) · 0b8358ec
Sypherd authored Nov 05, 2024


04 Nov, 2024 1 commit
- Update CODEOWNERS (#2453) · c0745fec
  Hailey Schoelkopf authored Nov 04, 2024
  
01 Nov, 2024 1 commit
- Add missing task links (#2449) · ade1cc4e
  Sypherd authored Nov 01, 2024
  
31 Oct, 2024 1 commit

Add GPTQModel support for evaluating GPTQ models (#2217) · 4f8e479e

Qubitium-ModelCloud authored Nov 01, 2024



* support gptqmodel

* code opt

* add gptqmodel option

* Update huggingface.py

* Update pyproject.toml

* gptqmodel version upgraded to 1.0.6

* GPTQModel version upgraded to 1.0.8

* Update pyproject.toml

* fix ruff-format error

* add gptqmodel test

* Update gptqmodel test model

* skip cuda

* python3.8 compatible

* Update README.md

* Update README.md

---------
Co-authored-by: CL-ModelCloud <cl@modelcloud.ai>


30 Oct, 2024 3 commits
- Add verify_certificate argument to local-completion (#2440) · 57272b63
  Samuel Monson authored Oct 30, 2024
  
- Add xquad task (#2435) · b40a20ae
  zxcvuser authored Oct 30, 2024
```
* Add xquad task

* Update general README

* Run pre-commit
```
- Fix lora requests when dp with vllm (#2433) · 838a3e03
  Chris Kerwell Gresla authored Oct 30, 2024
```
* fix: use lora_request for data parallel vllm evals

* fix(docs): include type hint

* chore: lint, et pre-commit al

---------
Co-authored-by: Chris Kerwell Gresla <chris@wafer.systems>
```
25 Oct, 2024 1 commit

Fix package extras for watsonx support (#2426) · 7882043b

Kiersten Stokes authored Oct 25, 2024



* Update pyproject.toml with watsonx package extra
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>

* Remove unused function
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>

---------
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>


23 Oct, 2024 1 commit

Support for IBM watsonx_llm (#2397) · 1185e89a

Nikodem Szwast authored Oct 23, 2024



* add support for IBM watsonx_llm

* add ibm_watsonx_ai package to optional-dependencies

* move global scope imports to inner scope

* change cache to lru_cache

* fix circular import

* use 3.8 typing

* use 3.8 typing

---------
Co-authored-by: Baber <baber@hey.com>


22 Oct, 2024 2 commits

[Fix] Replace generic exception classes with more specific ones (#1989) · d4ae9635

Leonid Sinev authored Oct 22, 2024

* Replace generic exception classes with more specific ones

* rerun pre-commit to pass linter tests

* Revert "rerun pre-commit to pass linter tests"

This reverts commit 67f88ccf144469853217704520e613196042d859.

* reduce repetitions in errors or so

* Replace generic exception class with a more specific one


Update prompt (#2421) · 389347ee

Iker García-Ferrero authored Oct 22, 2024

Update prompt according to: 
https://github.com/ikergarcia1996/NoticIA/blob/main/prompts.py


20 Oct, 2024 1 commit
- fix storycloze datanames (#2409) · 9b052fdc
  Yuxian Gu authored Oct 20, 2024
  
17 Oct, 2024 2 commits
- Fix: Turkish MMLU Regex Pattern (#2393) · c1d8795d
  Arda authored Oct 17, 2024
```
* Fix Regex Pattern for CoT experiments

```
- group to tag for minerva_math (#2404) · 624017b7
  Ranger authored Oct 17, 2024
```
I found this bug by comparing the code of hendrycks_math and minerva_math.
```
16 Oct, 2024 1 commit

Add new tasks to spanish_bench and fix duplicates (#2390) · 7ecee2bc

zxcvuser authored Oct 17, 2024



* added tasks to spanish_bench

* fixed capitalization in escola and run pre-commit

* Update _flores_common_yaml

* Update _flores_common_yaml

* Update direct_yaml

* Update cot_yaml

* Update cot_yaml

* Update _flores_common_yaml

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>


14 Oct, 2024 1 commit

Add Unitxt Multimodality Support (#2364) · 7785577c

Elron Bandel authored Oct 14, 2024



* Add Unitxt Multimodality Support
Signed-off-by: elronbandel <elronbandel@gmail.com>

* Update
Signed-off-by: elronbandel <elronbandel@gmail.com>

* Fix formatting
Signed-off-by: elronbandel <elronbandel@gmail.com>

---------
Signed-off-by: elronbandel <elronbandel@gmail.com>


08 Oct, 2024 4 commits

Bump version to v0.4.5 (#2389) · 0845b588
Hailey Schoelkopf authored Oct 08, 2024

Fix Llava-1.5-hf ; Update to version 0.4.5 (#2388) · 2576a8cb
Hailey Schoelkopf authored Oct 08, 2024


max_images are passed on to vllm's `limit_mm_per_prompt` (#2387) · 1ed1f9ed

Baber Abbasi authored Oct 09, 2024

* max_images are passed on to vllm's `limit_mm_per_prompt`

* replace max image placeholders in string

* handle chat_template error

* move `fewshot_random_seed` to global


HF: switch conditional checks to `self.backend` from `AUTO_MODEL_CLASS` (#2353) · ab2c46c3

Baber Abbasi authored Oct 09, 2024



* switch conditional checks to `self.backend`

* nit

* nit

* commit feedback

* fix test; update precommit hooks

* add escape hatch for custom self.AUTO_MODEL_CLASS

* add escape hatch for custom self.AUTO_MODEL_CLASS

* fix

* move assertion

* add logging messages

* update AUTO_MODEL_CLASS behavior in _get_backend

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
