Commits · 566acef53ffd5cd909a15dcd5e15cb07fb8fc51d · gaoqiong / lm-evaluation-harness

01 Jul, 2024 12 commits
- update multimodal args · 566acef5
  lintangsutawika authored Jul 01, 2024
- update · df137b46
  lintangsutawika authored Jul 01, 2024
- remove lines · ae25104c
  lintangsutawika authored Jul 01, 2024
- removed lines · 86e7637b
  lintangsutawika authored Jul 01, 2024
- removed lines · cf040ac5
  lintangsutawika authored Jul 01, 2024
- working draft · facb2f89
  lintangsutawika authored Jul 01, 2024
- add task yamls · 2b87299e
  lintangsutawika authored Jul 01, 2024
- remove input_type · 8bff2285
  lintangsutawika authored Jul 01, 2024
- modify name of installation name · 407339c3
  lintangsutawika authored Jul 01, 2024
- request_list edit · 82cd972c
  lintangsutawika authored Jul 01, 2024
- edit arguments · 1794975e
  lintangsutawika authored Jul 01, 2024
- rework doc_to_visual · 8a8c2982
  lintangsutawika authored Jul 01, 2024
17 Jun, 2024 3 commits
- More Format Changes · 7d7a3a1c
  Ashvin Nihalani authored Jun 17, 2024
- Ruff Linter Checks · 2dc436fa
  Ashvin Nihalani authored Jun 17, 2024
- Adding LLaVa support · 1dda496f
  Ashvin Nihalani authored Mar 13, 2024
```
Updating APIs for MM support

Adding MLLM dependencies

Rebase off mainline
```
13 Jun, 2024 4 commits
- fix: add directory filter to os.walk to ignore 'ipynb_checkpoints' (#1956) · 568af943
  johnwee1 authored Jun 13, 2024
```
* fix: add filter to os.walk to ignore 'ipynb_checkpoints

* Update __init__.py

* Update __init__.py

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
```
- make write_out.py explicitly error if no splits match (#1796) · ed72238f
  Hailey Schoelkopf authored Jun 13, 2024
```
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
```
- Fix `--gen_kwargs` and VLLM (`temperature` not respected) (#1800) · 5c7cba23
  Hailey Schoelkopf authored Jun 13, 2024
```
* Update vllm_causallms.py

* adjust

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
```
- `samples` is newline delimited (#1930) · 3850e21a
  Baber Abbasi authored Jun 13, 2024
```
* `samples` is newline delimited

* updated git and pre-commit

* appease pre-commit

* nit

* Revert back for now

* Revert for now

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
```
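Commit 568af943 above filters Jupyter checkpoint directories out of an `os.walk` traversal. A minimal sketch of that general technique follows; the helper `iter_task_yamls` and the `IGNORED_DIRS` set are hypothetical names, this is not the harness's actual code, and it assumes the directory on disk is Jupyter's `.ipynb_checkpoints` (with a leading dot).

```python
import os
from typing import Iterator

# Hypothetical ignore list: Jupyter autosaves copies of files into ".ipynb_checkpoints".
IGNORED_DIRS = {".ipynb_checkpoints"}


def iter_task_yamls(task_dir: str) -> Iterator[str]:
    """Yield every .yaml path under task_dir, skipping IGNORED_DIRS (hypothetical helper)."""
    for root, dirs, files in os.walk(task_dir):
        # os.walk is top-down by default, so reassigning dirs in place
        # (slice assignment) stops it from descending into pruned directories.
        dirs[:] = [d for d in dirs if d not in IGNORED_DIRS]
        for name in files:
            if name.endswith(".yaml"):
                yield os.path.join(root, name)
```

The slice assignment `dirs[:] = ...` is what does the pruning: rebinding with `dirs = ...` would leave the list held by `os.walk` unchanged, so checkpoint copies of task YAMLs would still be picked up.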

12 Jun, 2024 2 commits
- Fix self.max_tokens in anthropic_llms.py (#1848) · 793469e0
  Nikita Lozhnikov authored Jun 12, 2024
```
Fix bug where `self.max_tokens` was not set
```
- Update interface.md (#1955) · 6f434934
  Sadra Barikbin authored Jun 12, 2024
11 Jun, 2024 4 commits
- add hacky add_bos_token forcing for Gemma to VLLM too (#1857) · b3e4c49a
  Hailey Schoelkopf authored Jun 11, 2024
- add include_defaults kwarg to taskmanager, add tests for include_path (#1856) · 4bb77e82
  Hailey Schoelkopf authored Jun 11, 2024
- Remove AMMLU Due to Translation (#1948) · d0f6e011
  Hailey Schoelkopf authored Jun 11, 2024
```
* Update README.md

* Delete lm_eval/tasks/ammlu directory
```
- Results filenames handling fix (#1926) · 69952581
  KonradSzafer authored Jun 11, 2024
```
* results filenames handling moved to utils

* zeno results handling fix

* tasks_for_model backward compatibility

* results files logic moved to tasks_for_model

* moved sanitize_model_name to utils
```
10 Jun, 2024 1 commit
- Add the Arabic version with refactor to Arabic pica to be in alghafa folder (#1940) · 305fb636
  khalil authored Jun 10, 2024
09 Jun, 2024 1 commit
- Update __main__.py (#1939) · bea1a859
  Sadra Barikbin authored Jun 09, 2024
07 Jun, 2024 4 commits
- Test output table layout consistency (#1916) · 40f5458f
  Zafir Stojanovski authored Jun 07, 2024
```
* sort metrics in output table

* update docstring in `consolidate_results`

* add tests for verifying consistency of table output

* update tests to account for floating point inconsistencies

* updated tests based on `pythia-14m`
```
- Update basque-glue (#1913) · 59418aac
  zhabuye authored Jun 07, 2024
```
* Update README.md

* Update bec.yaml

* Update bhtc.yaml

* Update coref.yaml

* Update qnli.yaml

* Update vaxx.yaml

* Update wic.yaml
```
- Update siqa.yaml (#1909) · 3f0ef80b
  Hailey Schoelkopf authored Jun 07, 2024
- Add The Arabic version of the PICA benchmark (#1917) · 923852b0
  khalil authored Jun 07, 2024
06 Jun, 2024 3 commits
- Implement NoticIA (#1912) · f2843b2f
  Iker García-Ferrero authored Jun 06, 2024
```
* Noticia

* test

* Final testes implementation

* Fixes

* Fix linters
```
- Add new Lambada translations (#1897) · b9d96b50
  Zafir Stojanovski authored Jun 06, 2024
```
* added tasks and task family descriptors

* configs for the new lambada translations

* continue work on task list w/ links; slightly reorganize README

* Apply suggestions from code review

* Rename file so that it'll preview in Github when viewing lm_eval/tasks folder

* Update new_task_guide.md

* Update README.md

* run linter

* Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs

* fix typo

* update `lm_eval/tasks/README.md` with task description

---------
Co-authored-by: Harish Vadaparty <harishvadaparty@gmail.com>
Co-authored-by: anthony <anthonydipofi@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
```
- [add] fld logical formula task (#1931) · 33eef48f
  MorishT authored Jun 06, 2024
05 Jun, 2024 2 commits
- Modify pre-commit hook to check merge conflicts accidentally committed not at current merge commit (#1927) · e39df01c
  LSinev authored Jun 05, 2024
- Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data (#1867) · 7257aa2e
  Maxime authored Jun 05, 2024
```
* glianorex tasks

* Create README.md

* Update README.md

* Update README.md

* fix formatting

* fix internal formatting
```
03 Jun, 2024 3 commits
- Add chat template (#1873) · 070d31df
  KonradSzafer authored Jun 03, 2024
```
* initial chat template

* tokenizer attribute check

* variable rename

* interface update

* system instruction

* system inst default update

* fewshot as multiturn

* typing update

* indent update

* added comments

* Adding a fewshot in a more readable way

* linting

* Moved apply chat template to LM

* multiturn alternation fix

* cache key update

* apply chat template method fix

* add system prompt hash to cache_key

* tokenizer name property for cache_key

* property name fix

* linting backward compatibility fix

* docs and errors update

* add documentation on adding chat template compatibility to model_guide

* fewshot as multiturn check fix

* saving system inst and chat template in results

* eval tracker update

* docs update

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
- Complete task list from pr 1727 (#1901) · 3e500e9d
  anthony-dipofi authored Jun 03, 2024
```
* added tasks and task family descriptors

* continue work on task list w/ links; slightly reorganize README

* Apply suggestions from code review

* Rename file so that it'll preview in Github when viewing lm_eval/tasks folder

* Update new_task_guide.md

* Update README.md

* run linter

* Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs

* fix typo

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* apply format

---------
Co-authored-by: Harish Vadaparty <harishvadaparty@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
```
- Fix fewshot seed only set when overriding num_fewshot (#1914) · 42d5c4bf
  LSinev authored Jun 03, 2024
```
Fix #1906
```
31 May, 2024 1 commit
- Try to make existing tests run little bit faster (#1905) · 1060b68d
  LSinev authored May 31, 2024