Commits · eed2d3a6e40b60cf9df0bf52d3d6f572a3dc5fb8 · gaoqiong / lm-evaluation-harness

11 Jan, 2024 3 commits
- Update README.md · eed2d3a6
  Stella Biderman authored Jan 11, 2024
  
  eed2d3a6
- Fix bug in multi-token Stop Sequences (#1268) · ff739414
  Hailey Schoelkopf authored Jan 11, 2024
```
* fix incorrect lookback protections

* bump generate_until task versions
```
  ff739414
- MultiMedQA (#1198) · 818c056b
  Tanishq Abraham authored Jan 10, 2024
```
* multimedqa

* Update medqa.yaml

* move to benchmarks folder

* add README.md

---------
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
```
  818c056b
10 Jan, 2024 3 commits
- Call "exact_match" once for each multiple-target sample (#1266) · 692e0f83
  Baber Abbasi authored Jan 10, 2024
```
* Refine scoring logic for multiple_target "exact_match" metric

* skip old tests from master

* skip old tests from master

* delete tests from master
```
  692e0f83
- fixed belebele (#1267) · 9b0b15b1
  James A. Michaelov authored Jan 10, 2024
  
  9b0b15b1
- specify utf-8 encoding to save samples to file. (#1265) · 7264a2e0
  Baber Abbasi authored Jan 10, 2024
  
  7264a2e0
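
One way to read the `exact_match` change above (#1266) is that a sample with several acceptable targets is scored once per sample, not once per target. The sketch below illustrates that idea with plain string comparison only; `exact_match_multi_target` is a hypothetical helper, not the harness's own function.

```python
from typing import Sequence


def exact_match_multi_target(prediction: str, targets: Sequence[str]) -> float:
    """Return 1.0 if the prediction equals any accepted target, else 0.0.

    Hypothetical helper: each sample contributes exactly one score,
    regardless of how many targets it lists.
    """
    normalized = prediction.strip()
    return float(any(normalized == target.strip() for target in targets))


# Two samples: the first has two acceptable targets, the second has one.
samples = [
    ("Paris", ["Paris", "paris, France"]),
    ("Lyon", ["Paris"]),
]
scores = [exact_match_multi_target(pred, targets) for pred, targets in samples]
mean_score = sum(scores) / len(scores)
```

Because each sample yields a single score, the mean is taken over samples, not over (sample, target) pairs.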
08 Jan, 2024 2 commits

Stella Biderman authored Jan 08, 2024

Over a dozen papers have used the updated citation block, but Google Scholar has noticed none of them. Since it does understand this citation, I think we should use it going forward until we have a way to ensure the newer citations are actually logged.

ecb1df28

fixed fewshot loading for multiple input tasks (#1255) · cf6a8321
Lintang Sutawika authored Jan 08, 2024

cf6a8321

05 Jan, 2024 2 commits

Do not escape ascii in logging outputs (#1246) · 28ec7fa9

Sam Passaglia authored Jan 05, 2024



* do not ensure ascii

* Update __main__.py

---------
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>

28ec7fa9

Add multilingual HellaSwag task (#1228) · 28bb45fb

JorgeDeCorte authored Jan 05, 2024



* add hellaswag_nl

* add other languages and update readme to hellaswag

* refactor as new task

* update readme

* add endline to yaml files and readme.md

* add group, change folder location and update yaml file

* rename default hellaswag yaml file

* fix whitespace error in some labels

* downgrade log level of whitespace checking

---------
Co-authored-by: JorgeDeCorte <jorge.decorte@ravago.be>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

28bb45fb
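
The "do not ensure ascii" change (#1246) and the earlier utf-8 encoding change (#1265) both concern non-ASCII text in saved output. The sketch below uses only the Python standard library to show the effect; the sample record and file name are made up for illustration and are not taken from the harness.

```python
import json
import os
import tempfile

# A made-up sample record containing non-ASCII text.
sample = {"doc_id": 0, "resp": "こんにちは"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(sample)

# With ensure_ascii=False the characters are kept as-is, which is
# easier to read in logs and saved sample files.
readable = json.dumps(sample, ensure_ascii=False)

# Writing the readable form needs an explicit encoding, because the
# platform default encoding is not always UTF-8.
path = os.path.join(tempfile.mkdtemp(), "samples.jsonl")
with open(path, "w", encoding="utf-8") as f:
    f.write(readable + "\n")

with open(path, encoding="utf-8") as f:
    round_tripped = json.loads(f.readline())
```

Both forms parse back to the same object; only the on-disk readability differs.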

04 Jan, 2024 2 commits

Remove self.dataset_path post_init process (#1243) · e7c03d0c
Lintang Sutawika authored Jan 04, 2024
```
* Remove self.dataset_path post_init process

* Update task.py

* Update task.py
```
e7c03d0c

vllm: handle max_length better and substitute Collator (#1241) · eca6926b

Baber Abbasi authored Jan 04, 2024



* copies max_length from huggingface

* handle max_length properly

* get tokens from inputs

* substitute Collator for Reorderer

* `batch=auto` if using data_parallel

* nit

* cleanup

* update code comments

* `ray.shutdown()` after calling method if data_parallel_size > 1

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

eca6926b

02 Jan, 2024 3 commits
- Update openai_completions.py (#1238) · 25a15379
  Stella Biderman authored Jan 02, 2024
  
  25a15379
- batch_scheduler bug in Collator (#1229) · 4d10ad56
  Baber Abbasi authored Jan 02, 2024
```
* auto-batch requires len of iter

* handle case when batch_size="auto:N"
```
  4d10ad56
- Update README.md (#1230) · a12ef445
  Pasquale Minervini authored Jan 02, 2024
  
  a12ef445
30 Dec, 2023 1 commit
- Update README.md (#1195) · 1229862a
  Anjor Kanekar authored Dec 30, 2023
  
  1229862a
29 Dec, 2023 1 commit

Don't silence errors when loading tasks (#1148) · 34b563b1

Paul McCann authored Dec 30, 2023



* Add example failing task

This task includes an invalid import. This will cause an exception and
the task will not be loaded. But this just results in a DEBUG level log
message, so in normal usage you'll see no error, and will be told the
task doesn't exist.

Here's an example command line to run the task:

    python -m lm_eval --model hf --model_args pretrained=rinna/japanese-gpt-1b --tasks fail

This task is based on a Japanese Winograd task, but that's not
important, and was just used due to familiarity.

* Do not ignore errors when loading tasks

* Change how task errors are logged

This makes the proposed changes from PR discussion.

1. Exceptions not related to missing modules/imports are logged as
   warnings.

2. module/import related exceptions are still logged at debug level, but
   if any of them happen there is a warning about it with instructions
   on how to show logs.

* Remove intentionally failing task

---------
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

34b563b1
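
The logging policy described in #1148 can be sketched as follows. This is a minimal illustration, not the harness's loader: `load_task_modules` and the logger name are hypothetical, and it assumes each task maps to an importable module.

```python
import importlib
import logging

logger = logging.getLogger("task_loader_sketch")  # hypothetical logger name


def load_task_modules(module_names):
    """Import each module, applying the two-tier logging policy.

    - Import-related failures are logged at DEBUG level and collected.
    - Any other exception is logged as a WARNING right away.
    - If any import failed, one summary WARNING points to the debug logs.
    """
    loaded, import_failures = {}, []
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError as err:  # also covers ModuleNotFoundError
            logger.debug("Could not import %s: %s", name, err)
            import_failures.append(name)
        except Exception as err:
            logger.warning("Unexpected error while loading %s: %s", name, err)
    if import_failures:
        logger.warning(
            "%d task module(s) failed to import; enable DEBUG-level logging for details.",
            len(import_failures),
        )
    return loaded, import_failures


loaded, failed = load_task_modules(["json", "module_that_does_not_exist_xyz"])
```

With this policy a broken import no longer looks like a missing task: the summary warning tells the user that something failed and where to look.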

28 Dec, 2023 1 commit
- add length of strings and answer options to metadata (#1222) · 46c79664
  Alex Bäuerle authored Dec 28, 2023
  
  46c79664
27 Dec, 2023 2 commits
- nits + fix siqa (#1216) · 6a1c19ed
  Baber Abbasi authored Dec 27, 2023
```
* fix group

* siqa: default.yml -> default.yaml

* max_gen_toks -> self.max_gen_toks

* add ids to task tests

* fix siqa

* fix gen_kwargs for openai-chat
```
  6a1c19ed
- fix unbound local variable (#1218) · f2853995
  Jaewoo Yang authored Dec 27, 2023
  
  f2853995
25 Dec, 2023 1 commit
- pin vllm at < 0.2.6 (#1212) · af74a93d
  Hailey Schoelkopf authored Dec 25, 2023
  
  af74a93d
24 Dec, 2023 2 commits
- fix: incorrect argument order in `utils.divide` doc (#1208) · e4970d81
  Yuliang Li authored Dec 24, 2023
  
  e4970d81
- Add remove_whitespace to FLD benchmark (#1206) · 8ffbe58a
  MorishT authored Dec 24, 2023
```
* Add remove_whitespace to FLD benchmark

* bump task version

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
  8ffbe58a
23 Dec, 2023 2 commits

Consolidate batching (#1197) · 9fb2ebab

Baber Abbasi authored Dec 23, 2023



* refactor dataloader

* cleanup + add docs

* change arg

* renamed Collator and added testing

* parametrized test for Collator

* appease pre-commit

* added edge case batch 0 (no batching)

* fix typos

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

9fb2ebab

Fix documentation in API table (#1203) · b12bb1d4
Hailey Schoelkopf authored Dec 23, 2023

b12bb1d4

22 Dec, 2023 5 commits

Fixes https://github.com/EleutherAI/lm-evaluation-harness/issues/437 (#1180) · 8286b1d9
Anjor Kanekar authored Dec 22, 2023

8286b1d9

Upstream Mamba Support (`mamba_ssm`) (#1110) · 5503b274

Hailey Schoelkopf authored Dec 22, 2023

* modularize HFLM code

* pass through extra kwargs to AutoModel.from_pretrained call

* remove explicit model_kwargs

* rename gptq -> autogptq

* fix tokenizer pad token errors

* ensure model always respects device_map and autogptq's selected devices

* add a _get_config helper fn

* add mambaLMWrapper

* add mamba extra

* add mamba extra

* fix conditional import

* Fix botched merge commit

* Remove beginning-of-file comment for consistency

* Add docstring for mambaLM re: supported kwargs

* Alphabetize extras

* Update extras table

* appease precommit

* run precommit on mamba_lm

5503b274

Update minerva_math_algebra.yaml (#1189) · b69ca72e
Hailey Schoelkopf authored Dec 22, 2023

b69ca72e

Refer in README to main branch (#1200) · 25cefbc1
Bram Vanroy authored Dec 22, 2023

25cefbc1

Generic decorator for handling rate limit errors (#1109) · 046ea6e2

Zach Schillaci authored Dec 21, 2023



* Add retry error handler

* fixup! Add retry error handler

* Move to utils.py

* Run isort on utils.py

* Catch multiple exceptions

* Update LMs with exception handler

* Fixes to anthropic retry handler

* fix callback kwarg

* Update textsynth.py

* fix python 3.8 incompatibility

* fix indenterror I introduced

* placate linter?

* Update on_exception_callback kwarg name

* fixup! Merge branch 'main' into add-retry-error-handler

* fixup! fixup! Merge branch 'main' into add-retry-error-handler

* Merge conflicts are fun

* Run pre-commit

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

046ea6e2
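
A generic retry decorator of the kind #1109 describes might look like the sketch below. Only the `on_exception_callback` keyword name comes from the commit notes above; the decorator name, the other parameters, and `FakeRateLimitError` are assumptions made for illustration and may differ from what was merged.

```python
import time
from functools import wraps


def retry_on_exceptions(
    exceptions,
    max_retries=3,
    backoff_time=0.0,
    backoff_multiplier=2.0,
    on_exception_callback=None,
):
    """Retry the wrapped function when one of `exceptions` is raised.

    Hypothetical signature: waits `backoff_time` seconds before the first
    retry and multiplies the wait by `backoff_multiplier` after each one.
    """

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            sleep_time = backoff_time
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except tuple(exceptions) as err:
                    attempt += 1
                    if on_exception_callback is not None:
                        on_exception_callback(err, sleep_time)
                    if attempt > max_retries:
                        raise  # give up after the allowed number of retries
                    time.sleep(sleep_time)
                    sleep_time *= backoff_multiplier

        return wrapper

    return decorator


class FakeRateLimitError(Exception):
    """Stand-in for an API client's rate-limit exception."""


calls = {"count": 0}
seen = []


@retry_on_exceptions(
    [FakeRateLimitError],
    max_retries=3,
    on_exception_callback=lambda err, wait: seen.append(type(err).__name__),
)
def flaky_request():
    # Fails twice, then succeeds, to exercise the retry path.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimitError("rate limited")
    return "ok"


result = flaky_request()
```

Taking a list of exception types lets one decorator serve several API clients, each passing in its own rate-limit errors.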

21 Dec, 2023 8 commits
- disable `mypy` (#1193) · 09493fd2
  Baber Abbasi authored Dec 22, 2023
  
  09493fd2
- Update README.md (#1181) · 9267354e
  Anjor Kanekar authored Dec 21, 2023
```
* Update README.md

Add a note about running on apple arm gpus

* Update README.md

* Update README.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
  9267354e
- Remove tokenizer for openai chat completions (#1191) · da0a5e36
  Anjor Kanekar authored Dec 21, 2023
```
* remove tokenizer for openai chat completions

* reordering function

* linter

* remove tiktoken import
```
  da0a5e36
- update Zeno example and reference in README (#1190) · 84790e99
  Alex Bäuerle authored Dec 21, 2023
  
  84790e99
- Correctly Print Task Versioning (#1173) · 9cd79897
  Hailey Schoelkopf authored Dec 21, 2023
```
* change version field formatting in metadata

* mention versioning in new task guide

* add instructions for changelog

* run linters
```
  9cd79897
- Add tokenizer backend (#1186) · a0cfe3f6
  Anjor Kanekar authored Dec 21, 2023
```
* separate local flag

* tokenizer_backend

* import order
```
  a0cfe3f6
- Update README.md (#1183) · 2b0b6fd8
  Anjor Kanekar authored Dec 21, 2023
  
  2b0b6fd8
- Update README.md (#1184) · e548d94d
  Anjor Kanekar authored Dec 21, 2023
  
  e548d94d
20 Dec, 2023 2 commits

Implementing local OpenAI API-style chat completions on any given inference server (#1174) · fcfc0c60

Vicki Boykis authored Dec 20, 2023

* LocalChatCompletionsLM add

* clean up completions class

* clean up completions class

* update tokens

* README

* fix constructor

* eos token

* folding local-chat-completions into OpenAIChatCompletions

* refactoring to include gen_kwargs as passable option

* add todo on chat completion kwarg validation

* Ruff and README fix

* generalize to **kwargs

* remove unnecessary kwargs

* README and remove kwargs

* README

fcfc0c60

Error in --num_fewshot option for K-MMLU Evaluation Harness (#1178) · 12f2c5ea
GUIJIN SON authored Dec 21, 2023
```
* update kmmlu default formatting

* Update _default_kmmlu_yaml

* Delete lm_eval/tasks/kmmlu/utils.py
```
12f2c5ea