Commits · 5b0b8a56dcaf3df0c1d3787d2d9ca8a491241b66 · gaoqiong / lm-evaluation-harness

26 Jan, 2024 1 commit
- Refix issue regarding stderr (#1357) · 5b0b8a56
  thnkinbtfly authored Jan 26, 2024
  
  5b0b8a56
25 Jan, 2024 2 commits

- Add FAQ on `lm_eval.tasks.initialize_tasks()` to README (#1330) · 52f48e8c
  Hailey Schoelkopf authored Jan 25, 2024
```
* Update README.md

* [!Tip]
```
  52f48e8c
- `Filter` docs not offset by `doc_id` (#1349) · a0f1cacd
  Baber Abbasi authored Jan 26, 2024

* get `doc` from instance

* accelerate bugfix: get ground doc from instance

* convert filter to `process_result`

* get docs from instances in `FilterEnsemble`

* rename

* nit

* better looping

* fix typehint

  a0f1cacd

24 Jan, 2024 2 commits
- update links to task_guide.md (#1348) · 34cded30
  Hailey Schoelkopf authored Jan 24, 2024
  
  34cded30
- modified default gen_kwargs to work better with CLI; changed prompt_logprobs=1 (#1345) · 38c8d02f
  Baber Abbasi authored Jan 24, 2024
  
  38c8d02f
23 Jan, 2024 4 commits

- manage default (greedy) gen_kwargs in vllm (#1341) · 081deb8b
  Baber Abbasi authored Jan 24, 2024

* manage default (greedy) gen_kwargs in vllm better

* mirror HF `do_sample`

* just need to set temp=0 for greedy

  081deb8b
- Don't use `get_task_dict()` in task registration / initialization (#1331) · 969b48bf
  Hailey Schoelkopf authored Jan 23, 2024

* don't use get_task_dict() as a helper, it will download the dataset!

* pre-commit

* Update README.md

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>

  969b48bf
- Update migrated HF dataset paths (#1332) · 45a8f709
  Hailey Schoelkopf authored Jan 22, 2024

* Update arc_easy.yaml

* Update flan_cot.yaml

* update HF dataset path

* Update freeform.yaml

* Update flan_cot.yaml

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>

  45a8f709
- skip "benchmarks" in changed_tasks (#1336) · 6e65ef38
  Baber Abbasi authored Jan 23, 2024
  
  6e65ef38

22 Jan, 2024 5 commits
- fix a trailing whitespace that breaks a lint job (#1335) · 84357a46
  Brian Vaughan authored Jan 22, 2024
  
  84357a46
- fallback to classname when LM doesn't have config (#1334) · 607d7da5
  Brian Vaughan authored Jan 22, 2024
  
  607d7da5
- Add `local-completions` support using OpenAI interface (#1277) · 5c25dd55
  Michael Goin authored Jan 22, 2024
```
* Add `local-completions` support using OpenAI interface

* Refactor oa_completion

* Address tokenizer comments and change request chunks to batch size

* Add warning message for tiktoken backend

* fix formatting

* fix whitespace

* Update README.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
  5c25dd55
- Fix Issue regarding stderr (#1327) · 076372ee
  Lintang Sutawika authored Jan 23, 2024
```
* add fix for deciding if stderr is N/A or not

* process N/A
```
  076372ee
- don't pass extra kwargs to mamba any more (#1328) · 181ccf43
  Hailey Schoelkopf authored Jan 22, 2024
  
  181ccf43
19 Jan, 2024 1 commit
- Update polemo2_in.yaml (#1318) · 4477d572
  Lintang Sutawika authored Jan 20, 2024
  
  4477d572
18 Jan, 2024 7 commits
- Update task_guide.md (#1316) · b93c3bcb
  kwrobel.eth authored Jan 18, 2024
  
  b93c3bcb
- Fix group register (#1315) · 72ea626e
  Lintang Sutawika authored Jan 19, 2024
```
* tuple should be considered as well

* set option to keep callable as callable
```
  72ea626e
- Update pyproject.toml (#1314) · b4c6bdb7
  Hailey Schoelkopf authored Jan 18, 2024
  
  b4c6bdb7
- Fix polemo2_in.yaml config name (#1313) · b8cbc425
  Quentin Lhoest authored Jan 18, 2024
  
  b8cbc425
- Update pyproject.toml (#1312) · 6f60a924
  Hailey Schoelkopf authored Jan 18, 2024
  
  6f60a924
- Update task_guide.md (#1306) · 92cc10ef
  Danielle Pintz authored Jan 18, 2024
  
  92cc10ef
- Update nq_open.yaml (#1305) · 10488d0d
  Hannibal046 authored Jan 18, 2024
```
* Update nq_open.yaml

change regex

* Bump NQ version

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
  10488d0d
16 Jan, 2024 2 commits
- Update README.md with custom integration doc (#1298) · ada4a31d
  Mark Saroufim authored Jan 16, 2024
```
* Update README.md

* punctuation

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
  ada4a31d
- Update nq_open.yaml (#1289) · 032e879b
  Hailey Schoelkopf authored Jan 15, 2024
  
  032e879b
15 Jan, 2024 7 commits
- Update CITATION.bib (#1285) · 588a493c
  Hailey Schoelkopf authored Jan 15, 2024
```
Bumping CITATION.bib to match re-adding the citation in readme. 

cc @StellaAthena
```
  588a493c
- Re-add citation · 39a465ca
  Stella Biderman authored Jan 15, 2024
```
It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in.
```
  39a465ca
- Rework documentation for explaining local dataset (#1284) · b074ccb6
  Lintang Sutawika authored Jan 15, 2024
```
* rework documentation for explaining local dataset

* fix typo

* Update new_task_guide.md
```
  b074ccb6
- Fix data-parallel evaluation with quantized models (#1270) · ef665088
  Hailey Schoelkopf authored Jan 15, 2024
```
* add WIP device_map overrides

* update handling outside of accelerate launcher

* change .to(device) log to debug level

* run linter
```
  ef665088
- Allow parameter edits for registered tasks when listed in a benchmark (#1273) · 03e7df51
  Lintang Sutawika authored Jan 15, 2024
```
* benchmark yamls allow minor edits of already registered tasks

* add documentation

* removed print
```
  03e7df51
- Make `parallelize=True` vs. `accelerate launch` distinction clearer in docs (#1261) · 39e7b264
  Hailey Schoelkopf authored Jan 15, 2024
```
* Make parallelize=True distinction clearer in documentation.

* run linter
```
  39e7b264
- fix whitespace in target + prompt for CoT gsm8k (#1275) · ace4393e
  Hailey Schoelkopf authored Jan 15, 2024
  
  ace4393e
12 Jan, 2024 3 commits
- apply process_docs() to fewshot_split too (#1276) · 89618bf8
  Hailey Schoelkopf authored Jan 12, 2024
  
  89618bf8
- add Kobest (#1263) · 653217a7
  jp authored Jan 12, 2024
```
* Add: kobest config file

* Add: kobest utils

* Add: README

* Update utils.py
```
  653217a7
- update versioning logging (#1271) · 75dc2b87
  Hailey Schoelkopf authored Jan 11, 2024
  
  75dc2b87
11 Jan, 2024 3 commits
- Update README.md · eed2d3a6
  Stella Biderman authored Jan 11, 2024
  
  eed2d3a6
- Fix bug in multi-token Stop Sequences (#1268) · ff739414
  Hailey Schoelkopf authored Jan 11, 2024
```
* fix incorrect lookback protections

* bump generate_until task versions
```
  ff739414
- MultiMedQA (#1198) · 818c056b
  Tanishq Abraham authored Jan 10, 2024
```
* multimedqa

* Update medqa.yaml

* move to benchmarks folder

* add README.md

---------
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
```
  818c056b
10 Jan, 2024 3 commits
- Call "exact_match" once for each multiple-target sample (#1266) · 692e0f83
  Baber Abbasi authored Jan 10, 2024
```
* Refine scoring logic for multiple_target "exact_match" metric

* skip old tests from master

* skip old tests from master

* delete tests from master
```
  692e0f83
- fixed belebele (#1267) · 9b0b15b1
  James A. Michaelov authored Jan 10, 2024
  
  9b0b15b1
- specify utf-8 encoding to save samples to file. (#1265) · 7264a2e0
  Baber Abbasi authored Jan 10, 2024
  
  7264a2e0