Commits · 30fbcfc91c89024e176f4751f5e0c8ade3bc494a · gaoqiong / lm-evaluation-harness

02 Jul, 2024 1 commit
- make KMMLU a tag for now · 30fbcfc9
  haileyschoelkopf authored Jul 02, 2024
25 Jun, 2024 6 commits
- some docs updates · 5251525d
  haileyschoelkopf authored Jun 25, 2024
- add more error msgs, agg_metric -> agg_metric_list · 93c17c57
  haileyschoelkopf authored Jun 25, 2024
- add more explicit group configs · 09dd7f6c
  haileyschoelkopf authored Jun 25, 2024
- add more explicit group configs · 8fdcbc13
  haileyschoelkopf authored Jun 25, 2024
- add many explicit group configs · 51519e40
  haileyschoelkopf authored Jun 25, 2024
- add many explicit group configs · 44a602ab
  haileyschoelkopf authored Jun 25, 2024
21 Jun, 2024 6 commits
- more groupings · c9801daf
  haileyschoelkopf authored Jun 21, 2024
- add more groupings · c8693599
  haileyschoelkopf authored Jun 21, 2024
- add more groupings / tags distinctions · c171fa30
  haileyschoelkopf authored Jun 21, 2024
- add more explicit aggregation groups · 46e8c8e6
  haileyschoelkopf authored Jun 21, 2024
- add groups for agieval, aexams, aclue · a382359c
  haileyschoelkopf authored Jun 21, 2024
- some docs fixes · f48d87ec
  haileyschoelkopf authored Jun 21, 2024
10 Jun, 2024 15 commits
- update format · 5b527a71
  lintangsutawika authored Jun 10, 2024
- update format · 83c070d4
  lintangsutawika authored Jun 10, 2024
- update aggregate_metric arg · 5b64fb58
  lintangsutawika authored Jun 10, 2024
- change how aggregate_metric is loaded · 9fa3b3f4
  lintangsutawika authored Jun 10, 2024
- change how aggregate_metric is loaded · 80d0f412
  lintangsutawika authored Jun 10, 2024
- Merge branch 'group-agg-rework' of... · 0f095f79
  lintangsutawika authored Jun 10, 2024
```
Merge branch 'group-agg-rework' of https://github.com/EleutherAI/lm-evaluation-harness into group-agg-rework
```
- removed " " in make_table · e184c501
  lintangsutawika authored Jun 10, 2024
- update mmlu/ · ac1a1cef
  lintangsutawika authored Jun 10, 2024
- Update docs/task_guide.md · 9e940f3d
  Lintang Sutawika authored Jun 10, 2024
```
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
- Merge branch 'group-agg-rework' of... · b0028491
  lintangsutawika authored Jun 10, 2024
```
Merge branch 'group-agg-rework' of https://github.com/EleutherAI/lm-evaluation-harness into group-agg-rework
```
- remove version for metadata · 1848d664
  lintangsutawika authored Jun 10, 2024
- Update docs/task_guide.md · 26578d24
  Lintang Sutawika authored Jun 10, 2024
```
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
- Merge branch 'group-agg-rework' of... · 016615fa
  lintangsutawika authored Jun 10, 2024
```
Merge branch 'group-agg-rework' of https://github.com/EleutherAI/lm-evaluation-harness into group-agg-rework
```
- remove group_alias · e8f49184
  lintangsutawika authored Jun 10, 2024
- Update docs/task_guide.md · ed1f7574
  Lintang Sutawika authored Jun 10, 2024
```
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
07 Jun, 2024 3 commits
- update all mmlu variants · 9a30374c
  lintangsutawika authored Jun 07, 2024
- formatting · e6b1581f
  lintangsutawika authored Jun 07, 2024
- fix indentation · 5032ebaf
  lintangsutawika authored Jun 07, 2024
06 Jun, 2024 2 commits
- corner case for single tag being called · 4140dd99
  lintangsutawika authored Jun 06, 2024
- add documentation · be8b547b
  lintangsutawika authored Jun 06, 2024
04 Jun, 2024 2 commits
- format fix · 4eeb8715
  lintangsutawika authored Jun 04, 2024
- resolved merge conflict from latest version · 3e1301bb
  lintangsutawika authored Jun 04, 2024
03 Jun, 2024 3 commits

KonradSzafer authored Jun 03, 2024

* initial chat template

* tokenizer attribute check

* variable rename

* interface update

* system instruction

* system inst default update

* fewshot as multiturn

* typing update

* indent update

* added comments

* Adding a fewshot in a more readable way

* linting

* Moved apply chat template to LM

* multiturn alternation fix

* cache key update

* apply chat template method fix

* add system prompt hash to cache_key

* tokenizer name property for cache_key

* property name fix

* linting backward compatibility fix

* docs and errors update

* add documentation on adding chat template compatibility to model_guide

* fewshot as multiturn check fix

* saving system inst and chat template in results

* eval tracker update

* docs update

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

070d31df

Complete task list from pr 1727 (#1901) · 3e500e9d

anthony-dipofi authored Jun 03, 2024

* added tasks and task family descriptors

* continue work on task list w/ links; slightly reorganize README

* Apply suggestions from code review

* Rename file so that it'll preview in Github when viewing lm_eval/tasks folder

* Update new_task_guide.md

* Update README.md

* run linter

* Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs

* fix typo

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* apply format

---------
Co-authored-by: Harish Vadaparty <harishvadaparty@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

Fix fewshot seed only set when overriding num_fewshot (#1914) · 42d5c4bf
LSinev authored Jun 03, 2024
```
Fix #1906
```

31 May, 2024 2 commits

Try to make existing tests run little bit faster (#1905) · 1060b68d
LSinev authored May 31, 2024

Making hardcoded few shots compatible with the chat template mechanism (#1895) · 4902aaaf

Clémentine Fourrier authored May 31, 2024

* init test 1

* fix

* this format seems to be working - need to update all other tasks with the new format

* bbh with few shot format

* fix fewshot bbh

* add mmlu flan cot

* samples of cot

* kmmlu

* fix gsm8k

* update keys for mmlu

* minerva math

* bbh

* fix

* fix samples

* small fixes to templates

* last prompt format change

* fixing prompt

* fixed minerva math format

* rm accidental committed file

* added doc for few shot samples

* Update lm_eval/loggers/evaluation_tracker.py

* Update lm_eval/loggers/evaluation_tracker.py

* Update docs/new_task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* added check in sampler per code review

* added the system from a function, plus an example in minerva math

* style

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fix unit tests 1

* forcing use of test split

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
