Commits · mmlu-pro-changes · gaoqiong / lm-evaluation-harness

05 Aug, 2024 6 commits
- format · 458342e2
  lintangsutawika authored Aug 05, 2024
  
  458342e2
- pre-commit · b8122d98
  lintangsutawika authored Aug 05, 2024
  
  b8122d98
- pre-commit · 01b129bb
  lintangsutawika authored Aug 05, 2024
  
  01b129bb
- add process for each subtask · 89de5103
  lintangsutawika authored Aug 05, 2024
  
  89de5103
- add process for each subtask · 9bee4b4f
  lintangsutawika authored Aug 05, 2024
  
  9bee4b4f
- add custom fewshot doc_to_text, target, and choice · 578f5d48
  lintangsutawika authored Aug 05, 2024
  
  578f5d48
27 Jun, 2024 3 commits
- updated · 42d194f8
  lintangsutawika authored Jun 27, 2024
  
  42d194f8
- moved files out, and removed unused versions · 1f6a6ebc
  lintangsutawika authored Jun 27, 2024
  
  1f6a6ebc
- update files · 5be2bb10
  lintangsutawika authored Jun 27, 2024
  
  5be2bb10
26 Jun, 2024 1 commit
- removed · bfbda3b3
  lintangsutawika authored Jun 26, 2024
  
  bfbda3b3
24 Jun, 2024 5 commits
- Merge branch 'mmlu-pro' of github.com:ysjprojects/lm-evaluation-harness into mmlu-pro · 067f681a
  Yu Shi Jie authored Jun 24, 2024
```
sync with remote
```
  067f681a
- added README.md for mmlu_pro · e91c1182
  Yu Shi Jie authored Jun 24, 2024
  
  e91c1182
- Merge branch 'EleutherAI:main' into mmlu-pro · bb5b46d7
  Yu Shi Jie authored Jun 24, 2024
  
  bb5b46d7
- mmlu-pro: added continuation and flan_cot_zeroshot · 1e4e058c
  Yu Shi Jie authored Jun 24, 2024
  
  1e4e058c
- updated mmlu-pro to take on 3 splits: test, val, dev · 23bd3449
  Yu Shi Jie authored Jun 24, 2024
  
  23bd3449
20 Jun, 2024 1 commit
- Add BertaQA dataset tasks (#1964) · 6f7b4a05
  Julen Etxaniz authored Jun 20, 2024
  
  * add bertaqa tasks
  * rename basquetrivia-->bertaqa ; make template stub not .yaml
  * add bertaqa entry to lm_eval/tasks/README.md
  
  ---------
  Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
  
  6f7b4a05
19 Jun, 2024 5 commits
- Fix Datasets `--trust_remote_code` (#1998) · d14b36e8
  Hailey Schoelkopf authored Jun 19, 2024
  
  d14b36e8
- Added ArabicMMLU (#1987) · a08bc3c8
  Yazeed Alnumay authored Jun 19, 2024
```
* Added ArabicMMLU

* Rename `ammlu` to `arabicmmlu`
```
  a08bc3c8
- Log `fewshot_as_multiturn` in results files (#1995) · 78a54e14
  Hailey Schoelkopf authored Jun 19, 2024
  
  * log fewshot_as_multiturn in general tracker args
  * Update evaluator.py
  
  ---------
  Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
  
  78a54e14
- Fix Paloma Template yaml (#1993) · ead2964e
  Hailey Schoelkopf authored Jun 19, 2024
  
  * init paloma benchmark
  * pre-process in utils function
  * add `task_alias`
  * updated task aliases
  * Update paloma_dolma-v1_5.yaml
  * Update paloma_twitterAAE_HELM_fixed.yaml
  * Update paloma_dolma_100_programing_languages.yaml
  * update on names
  * fix paloma template issue
  
  ---------
  Co-authored-by: Zafir Stojanovski <zaf.stojano@gmail.com>
  Co-authored-by: Zafir Stojanovski <zafir.stojanovski@icloud.com>
  Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
  
  ead2964e
- [New Task] Add Paloma benchmark (#1928) · f257d38b
  Zafir Stojanovski authored Jun 19, 2024
  
  * init paloma benchmark
  * pre-process in utils function
  * add `task_alias`
  * updated task aliases
  * Update paloma_dolma-v1_5.yaml
  * Update paloma_twitterAAE_HELM_fixed.yaml
  * Update paloma_dolma_100_programing_languages.yaml
  
  ---------
  Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
  
  f257d38b
18 Jun, 2024 2 commits
- Fix self assignment in neuron_optimum.py (#1990) · bdb78d22
  LSinev authored Jun 18, 2024
  
  bdb78d22
- add trust_remote_code for piqa (#1983) · 72bb6241
  Wang, Chang authored Jun 19, 2024
```
Signed-off-by: changwangss <chang1.wang@intel.com>
```
  72bb6241
14 Jun, 2024 1 commit
- added cot fewshot for mmlu-pro · 772d6f6f
  Yu Shi Jie authored Jun 14, 2024
  
  772d6f6f
13 Jun, 2024 7 commits
- fix: add directory filter to os.walk to ignore 'ipynb_checkpoints' (#1956) · 568af943
  johnwee1 authored Jun 13, 2024
```
* fix: add filter to os.walk to ignore 'ipynb_checkpoints'

* Update __init__.py

* Update __init__.py

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
```
  568af943
- make write_out.py explicitly error if no splits match (#1796) · ed72238f
  Hailey Schoelkopf authored Jun 13, 2024
```
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
```
  ed72238f
- Merge branch 'mmlu-pro' of github.com:ysjprojects/lm-evaluation-harness into mmlu-pro · fbeaa2c1
  Yu Shi Jie authored Jun 13, 2024
```
Resolve conflict.
```
  fbeaa2c1
- Fix `--gen_kwargs` and VLLM (`temperature` not respected) (#1800) · 5c7cba23
  Hailey Schoelkopf authored Jun 13, 2024
```
* Update vllm_causallms.py

* adjust

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
```
  5c7cba23
- added generative mmlu-pro · 91b2eec6
  Yu Shi Jie authored Jun 13, 2024
  
  91b2eec6
- `samples` is newline delimited (#1930) · 3850e21a
  Baber Abbasi authored Jun 13, 2024
```
* `samples` is newline delimited

* updated git and pre-commit

* appease pre-commit

* nit

* Revert back for now

* Revert for now

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
```
  3850e21a
- initialized mmlu_pro task · 6c5edc99
  Yu Shi Jie authored Jun 13, 2024
  
  6c5edc99
12 Jun, 2024 2 commits
- Fix self.max_tokens in anthropic_llms.py (#1848) · 793469e0
  Nikita Lozhnikov authored Jun 12, 2024
```
Fix bug where `self.max_tokens` was not set
```
  793469e0
- Update interface.md (#1955) · 6f434934
  Sadra Barikbin authored Jun 12, 2024
  
  6f434934
11 Jun, 2024 4 commits
- add hacky add_bos_token forcing for Gemma to VLLM too (#1857) · b3e4c49a
  Hailey Schoelkopf authored Jun 11, 2024
  
  b3e4c49a
- add include_defaults kwarg to taskmanager, add tests for include_path (#1856) · 4bb77e82
  Hailey Schoelkopf authored Jun 11, 2024
  
  4bb77e82
- Remove AMMLU Due to Translation (#1948) · d0f6e011
  Hailey Schoelkopf authored Jun 11, 2024
```
* Update README.md

* Delete lm_eval/tasks/ammlu directory
```
  d0f6e011
- Results filenames handling fix (#1926) · 69952581
  KonradSzafer authored Jun 11, 2024
```
* results filenames handling moved to utils

* zeno results handling fix

* tasks_for_model backward compatibility

* results files logic moved to tasks_for_model

* moved sanitize_model_name to utils
```
  69952581
10 Jun, 2024 1 commit
- Add the Arabic version with refactor to Arabic pica to be in alghafa folder (#1940) · 305fb636
  khalil authored Jun 10, 2024
  
  305fb636
09 Jun, 2024 1 commit
- Update __main__.py (#1939) · bea1a859
  Sadra Barikbin authored Jun 09, 2024
  
  bea1a859
07 Jun, 2024 1 commit
- Test output table layout consistency (#1916) · 40f5458f
  Zafir Stojanovski authored Jun 07, 2024
  
  * sort metrics in output table
  * update docstring in `consolidate_results`
  * add tests for verifying consistency of table output
  * update tests to account for floating point inconsistencies
  * updated tests based on `pythia-14m`
  
  40f5458f