Commits · 032e879bf5ff39c08ae0db1f622a5b382a42eaa2 · gaoqiong / lm-evaluation-harness

16 Jan, 2024 1 commit
- Update nq_open.yaml (#1289) · 032e879b
  Hailey Schoelkopf authored Jan 15, 2024
  
  032e879b
15 Jan, 2024 7 commits
- Update CITATION.bib (#1285) · 588a493c
  Hailey Schoelkopf authored Jan 15, 2024
```
Bumping CITATION.bib to match re-adding the citation in readme. 

cc @StellaAthena
```
  588a493c
- Re-add citation · 39a465ca
  Stella Biderman authored Jan 15, 2024
```
It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in.
```
  39a465ca
- Rework documentation for explaining local dataset (#1284) · b074ccb6
  Lintang Sutawika authored Jan 15, 2024
```
* rework documentation for explaining local dataset

* fix typo

* Update new_task_guide.md
```
  b074ccb6
- Fix data-parallel evaluation with quantized models (#1270) · ef665088
  Hailey Schoelkopf authored Jan 15, 2024
```
* add WIP device_map overrides

* update handling outside of accelerate launcher

* change .to(device) log to debug level

* run linter
```
  ef665088
- Allow parameter edits for registered tasks when listed in a benchmark (#1273) · 03e7df51
  Lintang Sutawika authored Jan 15, 2024
```
* benchmark yamls allow minor edits of already registered tasks

* add documentation

* removed print
```
  03e7df51
- Make `parallelize=True` vs. `accelerate launch` distinction clearer in docs (#1261) · 39e7b264
  Hailey Schoelkopf authored Jan 15, 2024
```
* Make parallelize=True distinction clearer in documentation.

* run linter
```
  39e7b264
- fix whitespace in target + prompt for CoT gsm8k (#1275) · ace4393e
  Hailey Schoelkopf authored Jan 15, 2024
  
  ace4393e
12 Jan, 2024 3 commits
- apply process_docs() to fewshot_split too (#1276) · 89618bf8
  Hailey Schoelkopf authored Jan 12, 2024
  
  89618bf8
- add Kobest (#1263) · 653217a7
  jp authored Jan 12, 2024
```
* Add: kobest config file

* Add: kobest utils

* Add: README

* Update utils.py
```
  653217a7
- update versioning logging (#1271) · 75dc2b87
  Hailey Schoelkopf authored Jan 11, 2024
  
  75dc2b87
11 Jan, 2024 3 commits
- Update README.md · eed2d3a6
  Stella Biderman authored Jan 11, 2024
  
  eed2d3a6
- Fix bug in multi-token Stop Sequences (#1268) · ff739414
  Hailey Schoelkopf authored Jan 11, 2024
```
* fix incorrect lookback protections

* bump generate_until task versions
```
  ff739414
- MultiMedQA (#1198) · 818c056b
  Tanishq Abraham authored Jan 10, 2024
```
* multimedqa

* Update medqa.yaml

* move to benchmarks folder

* add README.md

---------
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
```
  818c056b
10 Jan, 2024 3 commits
- Call "exact_match" once for each multiple-target sample (#1266) · 692e0f83
  Baber Abbasi authored Jan 10, 2024
```
* Refine scoring logic for multiple_target "exact_match" metric

* skip old tests from master

* skip old tests from master

* delete tests from master
```
  692e0f83
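One way to read the scoring change in #1266: when a sample lists several acceptable targets, the metric runs once for that sample, and the sample counts as correct if the prediction equals any of the targets. The sketch below shows that idea in plain Python. `exact_match_any` and `score_samples` are hypothetical names chosen for illustration, not functions from the harness, and the real implementation may differ.

```python
from typing import Iterable, List, Tuple


def exact_match_any(prediction: str, targets: Iterable[str]) -> float:
    """Return 1.0 if the prediction equals any acceptable target, else 0.0."""
    return 1.0 if any(prediction == t for t in targets) else 0.0


def score_samples(samples: List[Tuple[str, List[str]]]) -> float:
    """Average one score per sample, however many targets each sample has."""
    if not samples:
        return 0.0
    return sum(exact_match_any(p, ts) for p, ts in samples) / len(samples)


# Two samples: the first matches one of its two targets, the second matches none.
samples = [("Paris", ["Paris", "paris"]), ("4", ["four"])]
mean_score = score_samples(samples)
```

Each sample contributes exactly one score to the average, so a sample with many targets is not weighted more heavily than a sample with one.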
- fixed belebele (#1267) · 9b0b15b1
  James A. Michaelov authored Jan 10, 2024
  
  9b0b15b1
- specify utf-8 encoding to save samples to file. (#1265) · 7264a2e0
  Baber Abbasi authored Jan 10, 2024
  
  7264a2e0
08 Jan, 2024 2 commits

- · ecb1df28
  Stella Biderman authored Jan 08, 2024

  Over a dozen papers have used the updated citation block, but Google Scholar has noticed none of them. Since it does understand this citation, I think we should use it going forward until we have a way to ensure the newer citations are actually logged.

  ecb1df28
- fixed fewshot loading for multiple input tasks (#1255) · cf6a8321
  Lintang Sutawika authored Jan 08, 2024
  
  cf6a8321

05 Jan, 2024 2 commits

- Do not escape ascii in logging outputs (#1246) · 28ec7fa9
  Sam Passaglia authored Jan 05, 2024

  * do not ensure ascii

  * Update __main__.py

  ---------
  Co-authored-by: Lintang Sutawika <lintang@sutawika.com>

  28ec7fa9
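For context on the "do not ensure ascii" change: Python's `json.dumps` escapes non-ASCII characters by default (`ensure_ascii=True`), which makes logged non-English samples hard to read. The standard-library sketch below shows the difference and writes the result with an explicit UTF-8 encoding. The sample record and file name are made up for illustration and do not come from the harness.

```python
import json
import os
import tempfile

record = {"doc": "日本語のテキスト", "resp": "café"}

# Default behaviour: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(record)
# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(record, ensure_ascii=False)

# An explicit encoding keeps the output independent of the platform default.
path = os.path.join(tempfile.mkdtemp(), "samples.jsonl")
with open(path, "w", encoding="utf-8") as f:
    f.write(readable + "\n")

with open(path, encoding="utf-8") as f:
    round_tripped = json.loads(f.readline())
```

Both forms decode to the same data; only the readability of the serialized text differs.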

- Add multilingual HellaSwag task (#1228) · 28bb45fb
  JorgeDeCorte authored Jan 05, 2024

  * add hellaswag_nl

  * add other languages and update readme to hellaswag

  * refactor as new task

  * update readme

  * add endline to yaml files and readme.md

  * add group, change folder location and update yaml file

  * rename default hellaswag yaml file

  * fix whitespace error in some labels

  * downgrade log level of whitespace checking

  ---------
  Co-authored-by: JorgeDeCorte <jorge.decorte@ravago.be>
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

  28bb45fb

04 Jan, 2024 2 commits

- Remove self.dataset_path post_init process (#1243) · e7c03d0c
  Lintang Sutawika authored Jan 04, 2024
```
* Remove self.dataset_path post_init process

* Update task.py

* Update task.py
```
  e7c03d0c

- vllm: handle max_length better and substitute Collator (#1241) · eca6926b
  Baber Abbasi authored Jan 04, 2024

  * copies max_length from huggingface

  * handle max_length properly

  * get tokens from inputs

  * substitute Collator for Reorderer

  * `batch=auto` if using data_parallel

  * nit

  * cleanup

  * update code comments

  * `ray.shutdown()` after calling method if data_parallel_size > 1

  ---------
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

  eca6926b

02 Jan, 2024 3 commits
- Update openai_completions.py (#1238) · 25a15379
  Stella Biderman authored Jan 02, 2024
  
  25a15379
- batch_schedular bug in Collator (#1229) · 4d10ad56
  Baber Abbasi authored Jan 02, 2024
```
* auto-batch requires len of iter

* handle case when batch_size="auto:N"
```
  4d10ad56
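The second bullet in #1229 refers to batch sizes given as `"auto:N"`. A minimal sketch of separating such a value into its parts might look like the following. `parse_batch_size` is a hypothetical helper, not the harness's own code, and treating a bare `"auto"` as `N = 1` is an assumption made only for this example.

```python
from typing import Tuple, Union


def parse_batch_size(value: Union[int, str]) -> Tuple[Union[int, str], int]:
    """Split a batch-size setting into (size, n).

    "auto:N" -> ("auto", N); "auto" -> ("auto", 1) (assumed default);
    a plain integer or numeric string -> (int, 1).
    """
    text = str(value)
    if text.startswith("auto"):
        _, _, suffix = text.partition(":")
        return "auto", int(suffix) if suffix else 1
    return int(text), 1


parsed_auto_n = parse_batch_size("auto:4")
parsed_auto = parse_batch_size("auto")
parsed_fixed = parse_batch_size(8)
```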
- Update README.md (#1230) · a12ef445
  Pasquale Minervini authored Jan 02, 2024
  
  a12ef445
30 Dec, 2023 1 commit
- Update README.md (#1195) · 1229862a
  Anjor Kanekar authored Dec 30, 2023
  
  1229862a
29 Dec, 2023 1 commit

- Don't silence errors when loading tasks (#1148) · 34b563b1
  Paul McCann authored Dec 30, 2023

  * Add example failing task

  This task includes an invalid import. This will cause an exception and
  the task will not be loaded. But this just results in a DEBUG level log
  message, so in normal usage you'll see no error, and will be told the
  task doesn't exist.

  Here's an example command line to run the task:

      python -m lm_eval --model hf --model_args pretrained=rinna/japanese-gpt-1b --tasks fail

  This task is based on a Japanese Winograd task, but that's not
  important, and was just used due to familiarity.

  * Do not ignore errors when loading tasks

  * Change how task errors are logged

  This makes the proposed changes from PR discussion.

  1. Exceptions not related to missing modules/imports are logged as
     warnings.

  2. module/import related exceptions are still logged at debug level, but
     if any of them happen there is a warning about it with instructions
     on how to show logs.

  * Remove intentionally failing task

  ---------
  Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

  34b563b1
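The logging policy described in this commit message can be sketched roughly as follows. This is an illustrative stand-in, not the harness's code: `load_tasks`, the `loaders` mapping, the logger name, and the three example loaders are all hypothetical.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("task_loading_sketch")


def load_tasks(loaders: Dict[str, Callable[[], object]]) -> List[str]:
    """Run each loader and return the names of the tasks that loaded."""
    loaded: List[str] = []
    import_failures = 0
    for name, loader in loaders.items():
        try:
            loader()
            loaded.append(name)
        except ImportError as err:  # also covers ModuleNotFoundError
            # Missing optional dependencies stay at DEBUG level.
            import_failures += 1
            logger.debug("Skipping task %r: %s", name, err)
        except Exception as err:
            # Anything else is surfaced as a warning.
            logger.warning("Failed to load task %r: %s", name, err)
    if import_failures:
        # One summary warning tells the user how to see the details.
        logger.warning(
            "%d task(s) skipped due to import errors; "
            "enable DEBUG logging to see details.",
            import_failures,
        )
    return loaded


def ok_task() -> None:
    return None


def bad_import_task() -> None:
    import not_a_real_module_xyz  # noqa: F401  (intentionally missing)


def bad_config_task() -> None:
    raise ValueError("invalid task config")


loaded_names = load_tasks(
    {"ok": ok_task, "bad_import": bad_import_task, "bad_config": bad_config_task}
)
```

With this split, a broken task produces a visible warning, while tasks that only lack an optional dependency do not flood the default log output.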

28 Dec, 2023 1 commit
- add length of strings and answer options to metadata (#1222) · 46c79664
  Alex Bäuerle authored Dec 28, 2023
  
  46c79664
27 Dec, 2023 2 commits
- nits + fix siqa (#1216) · 6a1c19ed
  Baber Abbasi authored Dec 27, 2023
```
* fix group

* siqa: default.yml -> default.yaml

* max_gen_toks -> self.max_gen_toks

* add ids to task tests

* fix siqa

* fix gen_kwargs for openai-chat
```
  6a1c19ed
- fix unbounded local variable (#1218) · f2853995
  Jaewoo Yang authored Dec 27, 2023
  
  f2853995
25 Dec, 2023 1 commit
- pin vllm at < 0.2.6 (#1212) · af74a93d
  Hailey Schoelkopf authored Dec 25, 2023
  
  af74a93d
24 Dec, 2023 2 commits
- fix: incorrect argument order in `utils.divide` doc (#1208) · e4970d81
  Yuliang Li authored Dec 24, 2023
  
  e4970d81
- Add remove_whitespace to FLD benchmark (#1206) · 8ffbe58a
  MorishT authored Dec 24, 2023
```
* Add remove_whitespace to FLD benchmark

* bump task version

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
  8ffbe58a
23 Dec, 2023 2 commits

- Consolidate batching (#1197) · 9fb2ebab
  Baber Abbasi authored Dec 23, 2023

  * refactor dataloader

  * cleanup + add docs

  * change arg

  * renamed Collator and added testing

  * parametrized test for Collator

  * appease pre-commit

  * added edge case batch 0 (no batching)

  * fix typos

  ---------
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

  9fb2ebab
- Fix documentation in API table (#1203) · b12bb1d4
  Hailey Schoelkopf authored Dec 23, 2023
  
  b12bb1d4

22 Dec, 2023 4 commits

- Fixes https://github.com/EleutherAI/lm-evaluation-harness/issues/437 (#1180) · 8286b1d9
  Anjor Kanekar authored Dec 22, 2023
  
  8286b1d9
- Upstream Mamba Support (`mamba_ssm`) (#1110) · 5503b274
  Hailey Schoelkopf authored Dec 22, 2023

  * modularize HFLM code

  * pass through extra kwargs to AutoModel.from_pretrained call

  * remove explicit model_kwargs

  * rename gptq -> autogptq

  * fix tokenizer pad token errors

  * ensure model always respects device_map and autogptq's selected devices

  * add a _get_config helper fn

  * add mambaLMWrapper

  * add mamba extra

  * add mamba extra

  * fix conditional import

  * Fix botched merge commit

  * Remove beginning-of-file comment for consistency

  * Add docstring for mambaLM re: supported kwargs

  * Alphabetize extras

  * Update extras table

  * appease precommit

  * run precommit on mamba_lm

  5503b274
- Update minerva_math_algebra.yaml (#1189) · b69ca72e
  Hailey Schoelkopf authored Dec 22, 2023
  
  b69ca72e
- Refer in README to main branch (#1200) · 25cefbc1
  Bram Vanroy authored Dec 22, 2023
  
  25cefbc1