Commits · cbc31eb85edae8881d16b888f4f2a7c868e0c008 · gaoqiong / lm-evaluation-harness

16 Nov, 2024 1 commit

Wonseok Hwang authored Nov 17, 2024

* release kbl-v0.1

* fix linting

* remove rag tasks as doc_to_text functions cause trouble

* remove remaining rag tasks

* remove unnecessary repeat in yaml files and rag dataset in hf-hub

* remove unnecessary newline; introduce cfg files in lbox/kbl in hf

* Make task yaml files consistent with hf-datasets-config

* Make task yaml files consistent with hf-datasets-config

* Remove trailing empty space in doc-to-text

* Remove unnecessary yaml file

* Fix task naming error

* trailing space removed

cbc31eb8

05 Nov, 2024 1 commit

Add Japanese Leaderboard (#2439) · 26f607f5

mtkachenko authored Nov 05, 2024

* add jaqket_v2 and jcommonsenseqa

* remove comments

* remove num_beams as it is incompatible with vllm

* add jnli + refactor

* rename jnla -> jnli

* add jsquad + replace colon chars with the Japanese unicode

* ignore whitespaces in generation tasks

* add marc_ja

* add xwinograd + simplify other yamls

* add mgsm and xlsum

* refactor xlsum

* add ja_leaderboard tag

* edit README.md

* update README.md

* add credit + minor changes

* run ruff format

* address review comments + add group

* remove aggregate_metric_list

* remove tags

* update tasks/README.md

26f607f5

01 Nov, 2024 1 commit

Add missing task links (#2449) · ade1cc4e

Sypherd authored Nov 01, 2024

ade1cc4e

30 Oct, 2024 1 commit

Add xquad task (#2435) · b40a20ae

zxcvuser authored Oct 30, 2024

* Add xquad task

* Update general README

* Run pre-commit

b40a20ae

04 Oct, 2024 2 commits

Add new benchmark: Catalan bench (#2154) · cb069004

zxcvuser authored Oct 04, 2024

* Add catalan_bench

* added flores_ca.yaml

* Updated some task groupings and readme

* Fix create_yamls_flores_ca.py

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

cb069004

Add new benchmark: Basque bench (#2153) · c887796d

zxcvuser authored Oct 04, 2024

* Add basque_bench

* Add flores_eu group

* Update _flores_common_yaml

* Run linters, updated flores, mgsm, copa, and readme

* Apply suggestions from code review
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>

---------
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

c887796d

03 Oct, 2024 2 commits

Add new benchmark: Galician bench (#2155) · 0e763862

zxcvuser authored Oct 03, 2024

* Add galician_bench

* Update xnli_gl path

* Add flores_gl group

* Update _flores_common_yaml

* Updated some task groupings and readme

---------

0e763862

Add new benchmark: Spanish bench (#2157) · ea17b98e

zxcvuser authored Oct 03, 2024

* Add spanish_bench

* Add flores_es group

* Update _flores_common_yaml

* Delete lm_eval/tasks/spanish_bench/escola.yaml

* Delete escola from spanish_bench.yaml

* Delete escola from README.md

* pre-commit run --all-files

* Updated some task groupings and readme

---------

ea17b98e

30 Sep, 2024 1 commit

Add new benchmark: Portuguese bench (#2156) · caa7c409

zxcvuser authored Sep 30, 2024

* Add portuguese_bench

* Add flores_pt group

* Update _flores_common_yaml

* Run linters and update flores and readme

caa7c409

26 Sep, 2024 1 commit

Added TurkishMMLU to LM Evaluation Harness (#2283) · deb43287

Arda authored Sep 26, 2024

* Added TurkishMMLU to LM Evaluation Harness

* Fixed COT name

* Fixed COT name

* Updated Readme

* Fixed Test issues

* Completed Scan for changed tasks

* Updated Readme

* Update README.md

* fixup task naming casing + ensure yaml template stubs aren't registered

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

deb43287

10 Sep, 2024 1 commit

Add Open Arabic LLM Leaderboard Benchmarks (Full and Light Version) (#2232) · decc533d

Malikeh Ehghaghi authored Sep 10, 2024

* arabic leaderboard yaml file is added

* arabic toxigen is implemented

* Dataset library is imported

* arabic sciq is added

* util file of arabic toxigen is updated

* arabic race is added

* arabic piqa is implemented

* arabic open qa is added

* arabic copa is implemented

* arabic boolq is added

* arabic arc easy is added

* arabic arc challenge is added

* arabic exams benchmark is implemented

* arabic hellaswag is added

* arabic leaderboard yaml file metrics are updated

* arabic mmlu benchmarks are added

* arabic mmlu group yaml file is updated

* alghafa benchmarks are added

* acva benchmarks are added

* acva utils.py is updated

* light version of arabic leaderboard benchmarks are added

* bugs fixed

* bug fixed

* bug fixed

* bug fixed

* bug fixed

* bug fixed

* library import bug is fixed

* doc to target updated

* bash file is deleted

* results folder is deleted

* leaderboard groups are added

* full arabic leaderboard groups are added, plus some bug fixes to the light version

* Create README.md

README.md for arabic_leaderboard_complete

* Create README.md

README.md for arabic_leaderboard_light

* Delete lm_eval/tasks/arabic_leaderboard directory

* Update README.md

* Update README.md

adding the Arabic leaderboards to the library

* Update README.md

10% of the training set

* Update README.md

10% of the training set

* revert .gitignore to prev version

* Update lm_eval/tasks/README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* updated main README.md

* Update lm_eval/tasks/README.md

* specify machine translated benchmarks (complete)

* specify machine translated benchmarks (light version)

* add alghafa to the related task names (complete and light)

* add 'acva' to the related task names (complete and light)

* add 'arabic_leaderboard' to all the groups (complete and light)

* all of the dataset, not a random sample

* added more accurate details to the readme file

* added mt_mmlu from okapi

* Update lm_eval/tasks/README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/tasks/README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* updated mt_mmlu readme

* renaming 'alghafa' full and light

* renaming 'arabic_mmlu' light and full

* renaming 'acva' full and light

* update readme and standardize dir/file names

* running pre-commit

---------
Co-authored-by: shahrzads <sayehban@ualberta.ca>
Co-authored-by: shahrzads <56282669+shahrzads@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

decc533d

23 Aug, 2024 1 commit

Fix typos in multiple places (#2244) · fa837646

LSinev authored Aug 23, 2024

ACLUE BibTeX typo reported to the ACL Anthology and fixed here, since the title in the PDF is correct.

fa837646

19 Aug, 2024 1 commit

Lingoly README update (#2228) · f81b62bf

am-bean authored Aug 19, 2024

* Setting up lingoly task

* Testing yaml changes to debug

* Adding pre-commit hooks

* Functional LingOly benchmark

* Renaming files and adding grouping

* Extending group aggregations to allow custom functions. Setting up custom lingoly aggregation using difference in scores.

* Adding LingOly to the README file

f81b62bf

05 Aug, 2024 1 commit

add okapi machine translated notice. (#2168) · 54c9a979

Amir Hossein Kargaran authored Aug 05, 2024

54c9a979

04 Aug, 2024 2 commits

Update README.md (#2125) · 05e6505b

zhabuye authored Aug 04, 2024

05e6505b

fix typo. (#2169) · 836eba52

Amir Hossein Kargaran authored Aug 04, 2024

836eba52

14 Jul, 2024 1 commit

Added MedConceptsQA Benchmark (#2010) · 2b26690f

Ben Shoham Ofir authored Jul 14, 2024

* Added MedConceptsQA Benchmark

* pre-commit factor

* update group name

* update in naming

* changed name

* Changed mcqa to med_concepts_qa prefix

* Added med_concepts_qa to README.md

* Changed config files according to the new format

* Updated README

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>

2b26690f

12 Jul, 2024 1 commit

Add new dataset MMLU-SR tasks (#2032) · d5f39bf8

SuperCat authored Jul 12, 2024

* add mmlusr tasks

* renamed all tasks names in mmlusr

* edit format and readme

* added mmlu_sr

* mmlu_sr -> mmlusr

* update

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>

d5f39bf8

03 Jul, 2024 2 commits

#1442 inverse scaling tasks implementation (#1589) · d855d0ba

Hanwool Albert Lee authored Jul 03, 2024

* initial_implementation (tests still pending)

* minor fix

* revised task name and implemented new task

* minor fixes

* new tasks implement

* minor fix

* added 'prompt injection' task

* delete prompt injection task (will be implemented in the next PR)

* trust remote code

* Update lm_eval/tasks/inverse_scaling/README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* added readme

* Update lm_eval/tasks/README.md

* Update lm_eval/tasks/inverse_scaling/_inverse_scaling_mc_yaml

* Update lm_eval/tasks/inverse_scaling/README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/tasks/inverse_scaling/_inverse_scaling_mc_yaml
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update README.md

* precommit?

* run precommit on readme

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

d855d0ba

Adds Open LLM Leaderboard Tasks (#2047) · 3c8db1bb

Nathan Habib authored Jul 03, 2024

* adds leaderboard tasks

* Delete lm_eval/tasks/leaderboard/leaderboard_chat_template.yaml

* add readme

* Delete lm_eval/tasks/leaderboard/mmlu_pro/mmlu_pro_chat_template.yaml

* modify readme

* fix bbh task

* fix bbh salient task

* modify the readme

* Delete lm_eval/tasks/leaderboard/ifeval/README.md

* Delete lm_eval/tasks/leaderboard/math/README.md

* add leaderboard to the tasks repertory

* add announcement about new leaderboard tasks

* linting

* Update README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* installs ifeval dependency in new_task github workflow

---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

3c8db1bb

25 Jun, 2024 1 commit

Added CommonsenseQA task (#1721) · b62b9bd0

Brendan Murphy authored Jun 25, 2024

* Initial configuration

* Using the validation set for the test set, because the test set on HF doesn't have labels

* Probably just makes more sense to have validation be validation

* fix format ; add docs to tasks/README.md

* fix format

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

b62b9bd0

20 Jun, 2024 1 commit

Add BertaQA dataset tasks (#1964) · 6f7b4a05

Julen Etxaniz authored Jun 20, 2024

* add bertaqa tasks

* rename basquetrivia-->bertaqa ; make template stub not .yaml

* add bertaqa entry to lm_eval/tasks/README.md

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

6f7b4a05

19 Jun, 2024 2 commits

Added ArabicMMLU (#1987) · a08bc3c8

Yazeed Alnumay authored Jun 19, 2024

* Added ArabicMMLU

* Rename `ammlu` to `arabicmmlu`

a08bc3c8

[New Task] Add Paloma benchmark (#1928) · f257d38b

Zafir Stojanovski authored Jun 19, 2024

* init paloma benchmark

* pre-process in utils function

* add `task_alias`

* updated task aliases

* Update paloma_dolma-v1_5.yaml

* Update paloma_twitterAAE_HELM_fixed.yaml

* Update paloma_dolma_100_programing_languages.yaml

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>

f257d38b

11 Jun, 2024 1 commit

Remove AMMLU Due to Translation (#1948) · d0f6e011

Hailey Schoelkopf authored Jun 11, 2024

* Update README.md

* Delete lm_eval/tasks/ammlu directory

d0f6e011

06 Jun, 2024 1 commit

Add new Lambada translations (#1897) · b9d96b50

Zafir Stojanovski authored Jun 06, 2024

* added tasks and task family descriptors

* configs for the new lambada translations

* continue work on task list w/ links; slightly reorganize README

* Apply suggestions from code review

* Rename file so that it'll preview in Github when viewing lm_eval/tasks folder

* Update new_task_guide.md

* Update README.md

* run linter

* Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs

* fix typo

* update `lm_eval/tasks/README.md` with task description

---------
Co-authored-by: Harish Vadaparty <harishvadaparty@gmail.com>
Co-authored-by: anthony <anthonydipofi@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

b9d96b50

03 Jun, 2024 1 commit

Complete task list from pr 1727 (#1901) · 3e500e9d

anthony-dipofi authored Jun 03, 2024

* added tasks and task family descriptors

* continue work on task list w/ links; slightly reorganize README

* Apply suggestions from code review

* Rename file so that it'll preview in Github when viewing lm_eval/tasks folder

* Update new_task_guide.md

* Update README.md

* run linter

* Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs

* fix typo

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* apply format

---------
Co-authored-by: Harish Vadaparty <harishvadaparty@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

3e500e9d

18 Mar, 2024 1 commit

Cleanup for v0.4.2 release (#1573) · 5627e819

Hailey Schoelkopf authored Mar 18, 2024

* Update interface.md

* fix: make caching reqs always work with accelerate launch

* remove stale task migration checklist

* remove deprecation warnings

* make informative TypeErrors for get_task_dict

* bump version metadata

* fix num_fewshot printing bug

* add fewshot value to cache key

5627e819

25 Sep, 2023 1 commit

add belebele · c5ebdd0f

ManuelFay authored Sep 25, 2023

c5ebdd0f

07 Sep, 2023 1 commit

checked mutual and qasper · e0e0746d

lintangsutawika authored Sep 07, 2023

e0e0746d

06 Sep, 2023 2 commits

edit readme · 78522c94

lintangsutawika authored Sep 06, 2023

78522c94

Update README.md · 00c4ffff

Lintang Sutawika authored Sep 06, 2023

Crossed WMT off.

00c4ffff

30 Aug, 2023 2 commits

checked coqa on readme · 8287fe7c

lintangsutawika authored Aug 30, 2023

8287fe7c

running process for drop · 79aa53b1

lintangsutawika authored Aug 30, 2023

79aa53b1

26 Aug, 2023 2 commits

add asdiv task · 69a08222

lintangsutawika authored Aug 26, 2023

69a08222

add to readme · 04f5697d

lintangsutawika authored Aug 26, 2023

04f5697d

15 Aug, 2023 4 commits

update readme docs · da8af971

lintangsutawika authored Aug 15, 2023

da8af971

edit readme · 0436b5d6

lintangsutawika authored Aug 15, 2023

0436b5d6

Update README.md · 4b0ab122

Lintang Sutawika authored Aug 15, 2023

4b0ab122

Update README.md · 55749b9b

Lintang Sutawika authored Aug 15, 2023

55749b9b