Commits · afrimmlu-edit · gaoqiong / lm-evaluation-harness

08 Aug, 2024 6 commits
- add group yaml · 7f49429a
  lintangsutawika authored Aug 08, 2024
  
  7f49429a
- edit groups settings · 789b6525
  lintangsutawika authored Aug 08, 2024
  
  789b6525
- precommit format · 573d1401
  lintangsutawika authored Aug 08, 2024
  
  573d1401
- precommit format · 1cee9303
  lintangsutawika authored Aug 08, 2024
  
  1cee9303
- fix included yaml · bfb8844d
  lintangsutawika authored Aug 08, 2024
  
  bfb8844d
- resolved merge conflict · e58b8182
  lintangsutawika authored Aug 08, 2024
  
  e58b8182
07 Aug, 2024 1 commit
- gsm_plus minor fix (#2191) · 0571eeb1
  Yu Shi Jie authored Aug 07, 2024
```
* fixed gsm

* GSM-Plus: remove dataset_name line
```
  0571eeb1
05 Aug, 2024 8 commits

Update README.md (#2186) · cddce0a1
Hailey Schoelkopf authored Aug 05, 2024

cddce0a1
fix revision type (#2184) · 7ff13e9e
Hailey Schoelkopf authored Aug 05, 2024

7ff13e9e

Yu Shi Jie authored Aug 06, 2024



* added gsm_plus

* formatted dataset to have train-test-splits

* README.md for gsm-plus

* Update README.md

* GSM-Plus: added gsm_plus_mini

* GSM-Plus: attribution to original dataset

* Update README.md

* Update README.md

* Update README.md

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>

d8506db0

Mmlu Pro (#1961) · 69d56f45

Yu Shi Jie authored Aug 06, 2024



* initialized mmlu_pro task

* added generative mmlu-pro

* added cot fewshot for mmlu-pro

* Initial commit

* updated mmlu-pro to take on 3 splits: test, val, dev

* mmlu-pro: added continuation and flan_cot_zeroshot

* added README.md for mmlu_pro

* removed

* update files

* moved files out, and removed unused versions

* updated

* mmlu_pro:

-changed task 'other' to 'miscellaneous'
there is already a group named 'other'
task and group with the same alias (e.g. mmlu_pro_other_generative) throws an error

-fixed yaml backslash escape for fewshot cot

* changed choices -> options in yaml config to fit dataset schema

* ONLY FOR DEFAULT: fixed yaml file to use variable number of choices

* mmlu-pro: fixed doc_to_text/choice/target configs for all variants

* mmlu-pro: minor fixes

* mmlu-pro/default: aligned with mmlu updates

* mmlu-pro: update yaml content in line with mmlu

* mmlu-pro: fixed mislabelling of task (math->chemistry)

* mmlu-pro: fixed yaml formatting

* add custom fewshot doc_to_text, target, and choice

* add process for each subtask

* add process for each subtask

* pre-commit

* pre-commit

* format

* resolved left out merge

* deleted folders + updated readme

* Update evaluator.py

* Update evaluator.py

---------
Co-authored-by: Yu Shi Jie <shijie@tensorplex.ai>
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
Co-authored-by: root <root@455bdd73-01.cloud.together.ai>
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>

69d56f45
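
The name clash described in this commit (a task and a group sharing one name) can be illustrated with a small pre-registration check. This is only a sketch: `find_name_collisions` and the sample names are hypothetical and are not taken from the harness's own registration code.

```python
def find_name_collisions(task_names, group_names):
    """Return the names used by both a task and a group, sorted."""
    return sorted(set(task_names) & set(group_names))


# Hypothetical names, for illustration only.
tasks = ["mmlu_pro_math", "mmlu_pro_other"]
groups = ["mmlu_pro", "mmlu_pro_other"]

collisions = find_name_collisions(tasks, groups)
if collisions:
    print(f"rename these tasks or groups: {collisions}")
```

Renaming one side of each reported pair, as the commit does with 'other' to 'miscellaneous', clears the overlap.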

remove incorrectly inherited group names (#2181) · c2168869
Hailey Schoelkopf authored Aug 05, 2024

c2168869
add okapi machine translated notice. (#2168) · 54c9a979
Amir Hossein Kargaran authored Aug 05, 2024

54c9a979
[hotfix] API: messages were created twice (#2174) · 8cffa29b
Baber Abbasi authored Aug 05, 2024

8cffa29b

Dp and mp support (#2056) · 0ce7734d

Nathan Habib authored Aug 05, 2024

* batch commit

* Revert "batch commit"

This reverts commit d859d1ca.

* batch commit

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* linting

* add doc

* Update lm_eval/models/huggingface.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/huggingface.py

* linter

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* style

* remove prepare

* fix

* style

* last check

* Update lm_eval/models/huggingface.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

---------
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: clementine@huggingface.co <clementine@huggingface.co>

0ce7734d

04 Aug, 2024 2 commits
- Update README.md (#2125) · 05e6505b
  zhabuye authored Aug 04, 2024
  
  05e6505b
- fix typo. (#2169) · 836eba52
  Amir Hossein Kargaran authored Aug 04, 2024
  
  836eba52
01 Aug, 2024 3 commits

Update lm-eval-overview.ipynb (#2118) · 7ad7c5b9
Hailey Schoelkopf authored Aug 01, 2024

7ad7c5b9

refactor: limit usage of `scipy` and `sklearn` dependencies (#2097) · 7f15cce4

Nathan Weinberg authored Aug 01, 2024



* refactor: move scipy and sklearn module imports to func imports
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>

* refactor: consolidate weighted_f1_score func into lm_eval utils
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>

* lint: allow for utils file to have unused imports

this allows for shared functions to be defined only
once while allowing for the YAML function importing
to continue working
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>

---------
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>

7f15cce4
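
Moving a heavy import from module level into the function that needs it can be sketched as follows. This illustrates the pattern only and is not the code from the commit; the `(gold, prediction)` item layout is an assumption. `sklearn.metrics.f1_score` is imported when the function is first called, so importing the surrounding module does not require scikit-learn.

```python
def weighted_f1_score(items):
    """Weighted F1 over an iterable of (gold, prediction) pairs."""
    # Deferred import: scikit-learn is loaded on first call,
    # not when the enclosing module is imported.
    from sklearn.metrics import f1_score

    golds, preds = zip(*items)
    return f1_score(golds, preds, average="weighted")
```

The trade-off is that a missing dependency surfaces at call time rather than at import time.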

[Bugfix] add temperature=0 to logprobs and seed args to API models (#2149) · 63e76e89
Baber Abbasi authored Aug 01, 2024
```
* add temperature for log probs

* add seed

* nit

* add new args to test

* added warning for api chat models
```
63e76e89
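
The two request settings named in this fix can be sketched as a payload builder. Everything here is an assumption for illustration: `build_logprob_payload` is a hypothetical helper, the default seed is arbitrary, and the field names follow an OpenAI-style completions request rather than the harness's actual code.

```python
def build_logprob_payload(prompt, model, seed=1234):
    """Build a request body for scoring a prompt (hypothetical helper)."""
    return {
        "model": model,
        "prompt": prompt,
        "temperature": 0,  # no sampling randomness while scoring
        "max_tokens": 1,   # keep generation minimal; the prompt is what gets scored
        "logprobs": 1,     # ask for per-token log-probabilities
        "echo": True,      # return the prompt tokens along with their logprobs
        "seed": seed,      # fixed seed, for backends that honor it
    }
```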

29 Jul, 2024 1 commit

bugfix and docs for API (#2139) · b70af4f5

Baber Abbasi authored Jul 29, 2024



* encoding bugfix

* encoding bugfix

* overload loglikelihood rather than loglikelihood_tokens

* add custom tokenizer

* add docs

* Update API_guide.md

fix link; add note

* Update API_guide.md

typo

* pre-commit

* add link in readme

* nit

* nit

* nit

* Update API_guide.md

nits

* Update API_guide.md

* Update API_guide.md

* Update API_guide.md

* Update API_guide.md

* Update README.md

* Update docs/API_guide.md

* Update docs/API_guide.md

* Update API_guide.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

b70af4f5

22 Jul, 2024 1 commit

Refactor API models (#2008) · 42dc2448

Baber Abbasi authored Jul 23, 2024

* refactor pad_token handling to fn

* fix docs

* add pad_token_handling to vllm

* start on API superclass

* don't detokenize the returned logits

* streamline vllm tokenizer

* add type hint

* pre-commit

* seems to be in working order

* add model to init

* refactor api models

* nit

* cleanup

* add pbar

* fix type hints

* change optional dependencies

* json encode chat template

* add type hints

* deal with different prompt input requirements

* nits

* fix

* cache inside async

* fix

* fix

* nits

* nits

* nits

* nit

* fixup

* fixup

* nit

* add dummy retry

* add dummy retry

* handle imports; skip failing test

* add type hint

* add tests

* add dependency to tests

* add package names to exception

* nit

* docs; type hints

* handle api key

* nit

* tokenizer bug

* fix tokenizer

* nit

* nit

* add better error messages

* nit

* remove decorator

* CI...

42dc2448

21 Jul, 2024 1 commit
- fix caching module (hotfix for now) (#2124) · 4a62757d
  Hailey Schoelkopf authored Jul 21, 2024
  
  4a62757d
20 Jul, 2024 1 commit
- docs: update truthfulqa tasks (#2119) · feff1b55
  Jennifer Cwagenberg authored Jul 19, 2024
  
  feff1b55
18 Jul, 2024 2 commits
- fix: broken discord link in CONTRIBUTING.md (#2114) · 8f8e7f6e
  Nathan Weinberg authored Jul 18, 2024
```
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
```
  8f8e7f6e
- [python] fix haerae tasks (#2112) · 9d4a04a0
  Jungwhan Kim authored Jul 18, 2024
  
  9d4a04a0
17 Jul, 2024 1 commit
- Fixed colon in Belebele _default_template_yaml (#2111) · 69502c06
  jab13x authored Jul 17, 2024
  
  69502c06
15 Jul, 2024 6 commits
- docs: align local test command to match CI (#2100) · 1adab703
  Nathan Weinberg authored Jul 15, 2024
```
Also add 'test_logs/' to .gitignore
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
```
  1adab703
- formatting (#2104) · 56a4e794
  Lintang Sutawika authored Jul 15, 2024
  
  56a4e794
- make recurrent_gemma model types included in the force-BOS case (#2105) · 9884ad6e
  Hailey Schoelkopf authored Jul 15, 2024
  
  9884ad6e
- add yamls · d213a533
  lintangsutawika authored Jul 15, 2024
  
  d213a533
- update tasks · ffc9e6a0
  lintangsutawika authored Jul 15, 2024
  
  ffc9e6a0
- modified format · 86f3bb3d
  lintangsutawika authored Jul 15, 2024
  
  86f3bb3d
14 Jul, 2024 1 commit

Added MedConceptsQA Benchmark (#2010) · 2b26690f

Ben Shoham Ofir authored Jul 14, 2024



* Added MedConceptsQA Benchmark

* pre-commit factor

* update group name

* update in naming

* changed name

* Changed mcqa to med_concepts_qa prefix

* Added med_concepts_qa to README.md

* Changed config files according the new format

* Updated README

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>

2b26690f

13 Jul, 2024 1 commit
- docs: remove trailing sentence from contribution doc (#2098) · a7a2923f
  Nathan Weinberg authored Jul 13, 2024
```
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
```
  a7a2923f
12 Jul, 2024 5 commits

adjust naming · 37a6dbe2
lintangsutawika authored Jul 12, 2024

37a6dbe2
renamed directory · 75e3f993
lintangsutawika authored Jul 12, 2024

75e3f993
added Irokobench tasks · 1220dd44
lintangsutawika authored Jul 12, 2024

1220dd44
formatting · a4d16d36
lintangsutawika authored Jul 12, 2024

a4d16d36

Irokobench: Benchmark Dataset for African languages (#2042) · 383bbd54

Jess authored Jul 12, 2024



* add afrixnli to task

* add chat completion

* remove chat completion -untested

* afrimmlu added

* afrimmlu folder update

* afrimmlu folder update

* updated prompt

* remove print

* add afrimgsm -direct

* add squad metric

* fix bash script

* remove direct util, update common yaml

* remove print

* add few show. metric fixes

* fix direct path, add bash script for gpt models

* added translate test

* update afrixnli tasks

* update afrixnli tasks

* update metrics for afrixnli

* prompt translations fix

* prompt translations fix

* filter and metric fix -mgsm

* remove squad metric

* remove squad metric

* add f1 score to mgsm

* add f1 score to mgsm

* update native-direct with lin

* change f1 function

* add lin to utils

* add utils

* remove test limit

* remove test configs

* add swahili to mmlu

* change eng to ewe in ewe yaml mmlu

* add squad metric to mgsm, remove whitespace filter

* added translate test

* added afrixnli_translate

* fix exact match valueError

* fix exact match valueError

* restructure mmlu folder

* spacing

* remove afrimmlu_translate folder

* add utility

* format task name, clean ups

* modified mgsm

* update on afrimgsm

* update on afrimgsm

* removed utils

* other mgsm varieties

* other mgsm varieties

* adding translate direct

* Update translate_direct_yaml

* add manual xnli prompt, add multichoice for openai models, and adapt multichoice metric for openai model

* edit for open models

* Update translate_direct_yaml

* add verbalizer for xnli

* change xnli from multiple choice to generate

* add manual accuracy scores

* revert xnli to multiple choice

* change afrimgsm utils

* revert xnli to multiple_choice

* cleanups and readmes

* remove openai fixes and unused regex

* pr review changes

* revert metrics.py, task.py and extraction.py to main version

---------
Co-authored-by: Israel Abebe Azime <azime@cg.uni-saarland.de>
Co-authored-by: Israel Abebe Azime <se.israel.abebe@gmail.com>

383bbd54