Commits · 46e8c8e6e57abbbbeaff8f07dcc2ec2e9ea5e02f · gaoqiong / lm-evaluation-harness

21 Jun, 2024 2 commits
- add more explicit aggregation groups · 46e8c8e6
  haileyschoelkopf authored Jun 21, 2024
  
  46e8c8e6
- add groups for agieval, aexams, aclue · a382359c
  haileyschoelkopf authored Jun 21, 2024
  
  a382359c
10 Jun, 2024 3 commits
- update format · 83c070d4
  lintangsutawika authored Jun 10, 2024
  
  83c070d4
- update aggregate_metric arg · 5b64fb58
  lintangsutawika authored Jun 10, 2024
  
  5b64fb58
- update mmlu/ · ac1a1cef
  lintangsutawika authored Jun 10, 2024
  
  ac1a1cef
07 Jun, 2024 2 commits
- update all mmlu variants · 9a30374c
  lintangsutawika authored Jun 07, 2024
  
  9a30374c
- formatting · e6b1581f
  lintangsutawika authored Jun 07, 2024
  
  e6b1581f
06 Jun, 2024 1 commit
- corner case for single tag being called · 4140dd99
  lintangsutawika authored Jun 06, 2024
  
  4140dd99
03 Jun, 2024 1 commit

Complete task list from pr 1727 (#1901) · 3e500e9d

anthony-dipofi authored Jun 03, 2024



* added tasks and task family descriptors

* continue work on task list w/ links; slightly reorganize README

* Apply suggestions from code review

* Rename file so that it'll preview in Github when viewing lm_eval/tasks folder

* Update new_task_guide.md

* Update README.md

* run linter

* Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs

* fix typo

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* apply format

---------
Co-authored-by: Harish Vadaparty <harishvadaparty@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

3e500e9d

31 May, 2024 1 commit

Making hardcoded few shots compatible with the chat template mechanism (#1895) · 4902aaaf

Clémentine Fourrier authored May 31, 2024



* init test 1

* fix

* this format seems to be working - need to update all other tasks with the new format

* bbh with few shot format

* fix fewshot bbh

* add mmlu flan cot

* samples of cot

* kmmlu

* fix gsm8k

* update keys for mmlu

* minerva math

* bbh

* fix

* fix samples

* small fixes to templates

* last prompt format change

* fixing prompt

* fixed minerva math format

* rm accidental commited file

* added doc for few shot samples

* Update lm_eval/loggers/evaluation_tracker.py

* Update lm_eval/loggers/evaluation_tracker.py

* Update docs/new_task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* added check in sampler per code review

* added the system from a function, plus an example in minerva math

* style

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fix unit tests 1

* forcing use of test split

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

4902aaaf

24 May, 2024 2 commits

Bigbench fix (#1686) · 78a215e0

Lintang Sutawika authored May 25, 2024



* edit process multiple-choice

* split template yaml

* remove

* modified multiple_choice tasks

* udpate

* Update multiple_choice_template_b_yaml

* Update multiple_choice_template_a_yaml

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

78a215e0

add mmlu tasks from pile-t5 (#1710) · f2ea37e3

Lintang Sutawika authored May 25, 2024



* add mmlu tasks from pile-t5

* Update _mmlu_flan_cot_fewshot_template_yaml

* Update _mmlu_flan_cot_zeroshot_template_yaml

* Update _mmlu_flan_generative_template_yaml

* Update _mmlu_flan_loglikelihood_template_yaml

* Update _default_template_yaml

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

f2ea37e3

22 May, 2024 1 commit
- Update polemo2_out.yaml (#1871) · 70e1de09
  zhabuye authored May 22, 2024
  
  70e1de09
21 May, 2024 1 commit
- fixed incorrect check for task type (replace `~` with `not`) (#1865) · 00b7a61c
  Zafir Stojanovski authored May 21, 2024
  
  00b7a61c
17 May, 2024 1 commit
- fix config passing issues · 3be0916c
  lintangsutawika authored May 17, 2024
  
  3be0916c
16 May, 2024 2 commits
- update mmlu · 104292ff
  lintangsutawika authored May 16, 2024
  
  104292ff
- remove tag check · 923f3e81
  lintangsutawika authored May 16, 2024
  
  923f3e81
15 May, 2024 1 commit
- add _process_group_config · 6d1753dc
  lintangsutawika authored May 15, 2024
  
  6d1753dc
13 May, 2024 1 commit

Adding tinyBenchmarks datasets (#1545) · fe9fef4e

Lucas Weber authored May 13, 2024



* Add tinyBenchmarks

* Add acknowledgements

* Add ordering of outputs for data-parallel

* Run pre-commit

* Add few_shot specifications

* Add tinyBenchmarks post-processing

* add conditional import ; fix task names

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

fe9fef4e

11 May, 2024 8 commits
- reformate · aef7028c
  lintangsutawika authored May 11, 2024
  
  aef7028c
- Update truthfulqa_mc2.yaml · 79ce346d
  Lintang Sutawika authored May 11, 2024
  
  79ce346d
- new group config parameter `tag_to_task` · 41e64b2e
  lintangsutawika authored May 11, 2024
  
  41e64b2e
- new mmlu config · 1960eb9a
  lintangsutawika authored May 11, 2024
  
  1960eb9a
- revert mmlu tasks · 0ffdee02
  lintangsutawika authored May 11, 2024
  
  0ffdee02
- revert truthfulqa · 39246084
  lintangsutawika authored May 11, 2024
  
  39246084
- add task_id for python tasks as well · 1ef5b0bf
  lintangsutawika authored May 11, 2024
  
  1ef5b0bf
- add task_id for python tasks as well · 78c3f7d3
  lintangsutawika authored May 11, 2024
  
  78c3f7d3
10 May, 2024 6 commits
- reformat · 0905615f
  lintangsutawika authored May 10, 2024
  
  0905615f
- convert all groups to configurable groups · e0986475
  lintangsutawika authored May 10, 2024
  
  e0986475
- fix bug · 5a3a9573
  lintangsutawika authored May 10, 2024
  
  5a3a9573
- fix bug · 5c289f97
  lintangsutawika authored May 10, 2024
  
  5c289f97
- update · 719fa9b1
  lintangsutawika authored May 10, 2024
  
  719fa9b1
- update loading of task group and newly added tags · f36fb47f
  lintangsutawika authored May 10, 2024
  
  f36fb47f
09 May, 2024 1 commit

Copal task (#1803) · 1980a13c

Edd authored May 10, 2024

* add copal

* change name to copal id for clarity and the task name

* remove `copal_id...` to yaml to make it work

* checkmark on README

* change group name to `copal_id`

1980a13c

08 May, 2024 4 commits
- replace group to tag · 2f2322b9
  lintangsutawika authored May 08, 2024
  
  2f2322b9
- add task for mmlu evaluation in arc multiple choice format (#1745) · 9097ad3e
  jonabur authored May 08, 2024
```
* add mmlu arc style evaluation

* rename arc_style to continuation

---------
Co-authored-by: Jonathan Burdge <jburdge@mahti-login11.mahti.csc.fi>
Co-authored-by: Jonathan Burdge <jburdge@mahti-login12.mahti.csc.fi>
```
  9097ad3e
- update truthfulqa · 9f06432c
  lintangsutawika authored May 08, 2024
  
  9f06432c
- update additional condition when loading a group in a group yaml · 44d70398
  lintangsutawika authored May 08, 2024
  
  44d70398
07 May, 2024 2 commits

Initial integration of the Unitxt to LM eval harness (#1615) · 885f48d6

Yoav Katz authored May 08, 2024

* Initial support for Unitxt datasets in LM Eval Harness

See  https://github.com/IBM/unitxt

The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file.

The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'.

* Added dataset loading check to generate_yaml

Improved error messages.

* Speed up generate_yaml

Added printouts and improved error message

* Added output printout

* Simplified integration of unitxt datasets

Store all the common yaml configuration in a yaml include shared by all datasets of the same task.

* Post code review comments - part 1

1. Made sure include files don't end wth 'yaml' so they won't be marked as tasks
2. Added more datasets and tasks (NER, GEC)
3. Added README

* Post code review comments - part 2

1. Added install unitxt install option in pyproject.toml:
pip install 'lm_eval[unit...

885f48d6

reversed task list · 56373978
lintangsutawika authored May 07, 2024

56373978