Commits · multimodal-evals · gaoqiong / lm-evaluation-harness

01 Jul, 2024 2 commits
- fixes · 593998e8
  haileyschoelkopf authored Jul 01, 2024
- add warning re: _get_backend behavior · d5195462
  haileyschoelkopf authored Jul 01, 2024
28 Jun, 2024 4 commits
- revert to orig. requirements · dffce65e
  haileyschoelkopf authored Jun 28, 2024
- remove llava model · ad92d498
  haileyschoelkopf authored Jun 28, 2024
- add up-to-date code (bs>1 seems to work w/ Llava-1.6-mistral: 0.34 MMMU · 55f8321d
  haileyschoelkopf authored Jun 28, 2024
- WIP code · 6e3b2ea1
  haileyschoelkopf authored Jun 28, 2024
17 Jun, 2024 3 commits
- More Format Changes · 7d7a3a1c
  Ashvin Nihalani authored Jun 17, 2024
- Ruff Linter Checks · 2dc436fa
  Ashvin Nihalani authored Jun 17, 2024
- Adding LLaVa support · 1dda496f
  Ashvin Nihalani authored Mar 13, 2024
```
Updating APIs for MM support

Adding MLLM dependencies

Rebase off mainline
```
13 Jun, 2024 4 commits
- fix: add directory filter to os.walk to ignore 'ipynb_checkpoints' (#1956) · 568af943
  johnwee1 authored Jun 13, 2024
```
* fix: add filter to os.walk to ignore 'ipynb_checkpoints

* Update __init__.py

* Update __init__.py

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
```
- make write_out.py explicitly error if no splits match (#1796) · ed72238f
  Hailey Schoelkopf authored Jun 13, 2024
```
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
```
- Fix `--gen_kwargs` and VLLM (`temperature` not respected) (#1800) · 5c7cba23
  Hailey Schoelkopf authored Jun 13, 2024
```
* Update vllm_causallms.py

* adjust

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
```
- `samples` is newline delimited (#1930) · 3850e21a
  Baber Abbasi authored Jun 13, 2024
```
* `samples` is newline delimited

* updated git and pre-commit

* appease pre-commit

* nit

* Revert back for now

* Revert for now

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
```
12 Jun, 2024 2 commits
- Fix self.max_tokens in anthropic_llms.py (#1848) · 793469e0
  Nikita Lozhnikov authored Jun 12, 2024
```
Fix bug where `self.max_tokens` was not set
```
- Update interface.md (#1955) · 6f434934
  Sadra Barikbin authored Jun 12, 2024
11 Jun, 2024 4 commits
- add hacky add_bos_token forcing for Gemma to VLLM too (#1857) · b3e4c49a
  Hailey Schoelkopf authored Jun 11, 2024
- add include_defaults kwarg to taskmanager, add tests for include_path (#1856) · 4bb77e82
  Hailey Schoelkopf authored Jun 11, 2024
- Remove AMMLU Due to Translation (#1948) · d0f6e011
  Hailey Schoelkopf authored Jun 11, 2024
```
* Update README.md

* Delete lm_eval/tasks/ammlu directory
```
- Results filenames handling fix (#1926) · 69952581
  KonradSzafer authored Jun 11, 2024
```
* results filenames handling moved to utils

* zeno results handling fix

* tasks_for_model backward compatibility

* results files logic moved to tasks_for_model

* moved sanitize_model_name to utils
```
10 Jun, 2024 1 commit
- Add the Arabic version with refactor to Arabic pica to be in alghafa folder (#1940) · 305fb636
  khalil authored Jun 10, 2024
09 Jun, 2024 1 commit
- Update __main__.py (#1939) · bea1a859
  Sadra Barikbin authored Jun 09, 2024
07 Jun, 2024 4 commits
- Test output table layout consistency (#1916) · 40f5458f
  Zafir Stojanovski authored Jun 07, 2024
```
* sort metrics in output table

* update docstring in `consolidate_results`

* add tests for verifying consistency of table output

* update tests to account for floating point inconsistencies

* updated tests based on `pythia-14m`
```
- Update basque-glue (#1913) · 59418aac
  zhabuye authored Jun 07, 2024
```
* Update README.md

* Update bec.yaml

* Update bhtc.yaml

* Update coref.yaml

* Update qnli.yaml

* Update vaxx.yaml

* Update wic.yaml
```
- Update siqa.yaml (#1909) · 3f0ef80b
  Hailey Schoelkopf authored Jun 07, 2024
- Add The Arabic version of the PICA benchmark (#1917) · 923852b0
  khalil authored Jun 07, 2024
06 Jun, 2024 3 commits
- Implement NoticIA (#1912) · f2843b2f
  Iker García-Ferrero authored Jun 06, 2024
```
* Noticia

* test

* Final testes implementation

* Fixes

* Fix linters
```
- Add new Lambada translations (#1897) · b9d96b50
  Zafir Stojanovski authored Jun 06, 2024
```
* added tasks and task family descriptors

* configs for the new lambada translations

* continue work on task list w/ links; slightly reorganize README

* Apply suggestions from code review

* Rename file so that it'll preview in Github when viewing lm_eval/tasks folder

* Update new_task_guide.md

* Update README.md

* run linter

* Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs

* fix typo

* update `lm_eval/tasks/README.md` with task description

---------
Co-authored-by: Harish Vadaparty <harishvadaparty@gmail.com>
Co-authored-by: anthony <anthonydipofi@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
```
- [add] fld logical formula task (#1931) · 33eef48f
  MorishT authored Jun 06, 2024
05 Jun, 2024 2 commits
- Modify pre-commit hook to check merge conflicts accidentally committed not at... · e39df01c
  LSinev authored Jun 05, 2024
```
Modify pre-commit hook to check merge conflicts accidentally committed not at current merge commit (#1927)
```
- Multiple Choice Questions and Large Languages Models: A Case Study with... · 7257aa2e
  Maxime authored Jun 05, 2024
```
Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data (#1867)

* glianorex tasks

* Create README.md

* Update README.md

* Update README.md

* fix formatting

* fix internal formatting
```
03 Jun, 2024 3 commits
- Add chat template (#1873) · 070d31df
  KonradSzafer authored Jun 03, 2024
```
* initial chat template

* tokenizer attribute check

* variable rename

* interface update

* system instruction

* system inst default update

* fewshot as multiturn

* typing update

* indent update

* added comments

* Adding a fewshot in a more readable way

* linting

* Moved apply chat template to LM

* multiturn alternation fix

* cache key update

* apply chat template method fix

* add system prompt hash to cache_key

* tokenizer name property for cache_key

* property name fix

* linting backward compatibility fix

* docs and errors update

* add documentation on adding chat template compatibility to model_guide

* fewshot as multiturn check fix

* saving system inst and chat template in results

* eval tracker update

* docs update

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
- Complete task list from pr 1727 (#1901) · 3e500e9d
  anthony-dipofi authored Jun 03, 2024
```
* added tasks and task family descriptors

* continue work on task list w/ links; slightly reorganize README

* Apply suggestions from code review

* Rename file so that it'll preview in Github when viewing lm_eval/tasks folder

* Update new_task_guide.md

* Update README.md

* run linter

* Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs

* fix typo

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* apply format

---------
Co-authored-by: Harish Vadaparty <harishvadaparty@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
```
- Fix fewshot seed only set when overriding num_fewshot (#1914) · 42d5c4bf
  LSinev authored Jun 03, 2024
```
Fix #1906
```
31 May, 2024 3 commits
- Try to make existing tests run little bit faster (#1905) · 1060b68d
  LSinev authored May 31, 2024
- Making hardcoded few shots compatible with the chat template mechanism (#1895) · 4902aaaf
  Clémentine Fourrier authored May 31, 2024
```
* init test 1

* fix

* this format seems to be working - need to update all other tasks with the new format

* bbh with few shot format

* fix fewshot bbh

* add mmlu flan cot

* samples of cot

* kmmlu

* fix gsm8k

* update keys for mmlu

* minerva math

* bbh

* fix

* fix samples

* small fixes to templates

* last prompt format change

* fixing prompt

* fixed minerva math format

* rm accidental commited file

* added doc for few shot samples

* Update lm_eval/loggers/evaluation_tracker.py

* Update lm_eval/loggers/evaluation_tracker.py

* Update docs/new_task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* added check in sampler per code review

* added the system from a function, plus an example in minerva math

* style

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fix unit tests 1

* forcing use of test split

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
- Add dataset card when pushing to HF hub (#1898) · f4f59251
  KonradSzafer authored May 31, 2024
```
* dataset card initial

* few fixes

* adds groups for math, mmlu, gpqa

* added summary agrs

* moved sanitize_list to utils

* readme update

* recreate metadata moved

* multiple model support

* results latest split fix

* readme update and small refactor

* fix grouping

* add comments

* added pathlib

* corrected pathlib approach

* check whether to create a metadata card

* convert posix paths to str

* default hf org from token

* hf token value error

* Add logs after successful upload

* logging updates

* dataset card example in the readme

---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Alina Lozovskaia <alinailozovskaya@gmail.com>
```
30 May, 2024 2 commits
- `higher_is_better` tickers in output table (#1893) · 14221c84
  Zafir Stojanovski authored May 30, 2024
```
* Higher is better tickers in output table

* add extra check for `higher_is_better` not being None already

* Update lm_eval/evaluator.py

* fixup format I messed up

* add comment (and retrigger tests)

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
```
- [HFLM]Add support for Ascend NPU (#1886) · 8f716817
  Huazhong Ji authored May 31, 2024
```
* [HFLM]Add support for Ascend NPU
Co-authored-by: jiaqiw09 <jiaqiw960714@gmail.com>
Co-authored-by: zhabuye <2947436155@qq.com>

* bump accelerate dependency version to 0.26.0 for NPU compat.

---------
Co-authored-by: jiaqiw09 <jiaqiw960714@gmail.com>
Co-authored-by: zhabuye <2947436155@qq.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
28 May, 2024 1 commit
- Updated vllm imports in vllm_causallms.py (#1890) · b4cd85d4
  Michael Goin authored May 28, 2024
```
* Reorder vllm imports in vllm_causallms.py

* Update vllm_causallms.py
```
26 May, 2024 1 commit
- Rename `lm_eval.logging -> lm_eval.loggers` (#1858) · 0ff6ab99
  Hailey Schoelkopf authored May 26, 2024
```
* rename lm_eval.logging module

* fix evaluation tracker args
```