Commits · 34b563b18b6073b7cd4e040896c3363d321b641a · gaoqiong / lm-evaluation-harness

29 Dec, 2023 1 commit

Don't silence errors when loading tasks (#1148) · 34b563b1

Paul McCann authored Dec 30, 2023



* Add example failing task

This task includes an invalid import, which raises an exception and
prevents the task from loading. However, the failure is only logged at
DEBUG level, so in normal usage you see no error and are simply told
that the task doesn't exist.

Here's an example command line to run the task:

    python -m lm_eval --model hf --model_args pretrained=rinna/japanese-gpt-1b --tasks fail

This task is based on a Japanese Winograd task; that choice is
incidental and was made only out of familiarity.

* Do not ignore errors when loading tasks

* Change how task errors are logged

This makes the changes proposed in the PR discussion.

1. Exceptions not related to missing modules/imports are logged as
   warnings.

2. Module/import-related exceptions are still logged at debug level, but
   if any occur, a warning is emitted with instructions on how to show
   the logs.

* Remove intentionally failing task

---------
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

34b563b1
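
The logging policy described in #1148 can be illustrated with a minimal sketch. This is not the harness's actual loader: `load_tasks`, the `loaders` mapping, and the logger name are hypothetical and exist only to show the two log levels. Import-related failures go to DEBUG and are counted, other failures go to WARNING, and a summary warning is emitted if any import failure occurred.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("task_loader_sketch")


def load_tasks(loaders):
    """Run each loader and return the names that loaded successfully.

    `loaders` is a hypothetical mapping of task name -> zero-argument
    callable that raises if the task cannot be loaded.
    """
    loaded = []
    import_failures = 0
    for name, loader in loaders.items():
        try:
            loader()
            loaded.append(name)
        except ImportError as exc:
            # ModuleNotFoundError is a subclass of ImportError, so both
            # missing-module and failed-import cases land here.
            import_failures += 1
            logger.debug("Could not import task %r: %s", name, exc)
        except Exception as exc:
            # Anything else is likely a real bug in the task definition,
            # so it is surfaced at WARNING level.
            logger.warning("Failed to load task %r: %s", name, exc)
    if import_failures:
        # One visible hint that points the user at the DEBUG logs.
        logger.warning(
            "%d task(s) were skipped because of import errors; "
            "enable DEBUG logging to see the details.",
            import_failures,
        )
    return loaded
```

With this shape, a task that fails for a non-import reason is visible at the default log level, while optional-dependency failures stay quiet apart from the one summary line.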

28 Dec, 2023 1 commit
- add length of strings and answer options to metadata (#1222) · 46c79664
  Alex Bäuerle authored Dec 28, 2023
  
  46c79664
27 Dec, 2023 2 commits
- nits + fix siqa (#1216) · 6a1c19ed
  Baber Abbasi authored Dec 27, 2023
```
* fix group

* siqa: default.yml -> default.yaml

* max_gen_toks -> self.max_gen_toks

* add ids to task tests

* fix siqa

* fix gen_kwargs for openai-chat
```
  6a1c19ed
- fix unbounded local variable (#1218) · f2853995
  Jaewoo Yang authored Dec 27, 2023
  
  f2853995
25 Dec, 2023 1 commit
- pin vllm at < 0.2.6 (#1212) · af74a93d
  Hailey Schoelkopf authored Dec 25, 2023
  
  af74a93d
24 Dec, 2023 2 commits
- fix: incorrect argument order in `utils.divide` doc (#1208) · e4970d81
  Yuliang Li authored Dec 24, 2023
  
  e4970d81
- Add remove_whitespace to FLD benchmark (#1206) · 8ffbe58a
  MorishT authored Dec 24, 2023
```
* Add remove_whitespace to FLD benchmark

* bump task version

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
  8ffbe58a
23 Dec, 2023 2 commits

Consolidate batching (#1197) · 9fb2ebab

Baber Abbasi authored Dec 23, 2023



* refactor dataloader

* cleanup + add docs

* change arg

* renamed Collator and added testing

* parametrized test for Collator

* appease pre-commit

* added edge case batch 0 (no batching)

* fix typos

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

9fb2ebab
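
As a rough illustration of the "batch 0 (no batching)" edge case mentioned in #1197, the sketch below assumes that a batch size of 0 means "return everything as one batch". That reading is an assumption, and `chunks` here is a hypothetical helper, not the harness's `Collator`.

```python
def chunks(items, n=0):
    """Yield lists of up to `n` items from `items`.

    Assumption for this sketch: n == 0 disables batching, so all items
    are yielded together as a single list.
    """
    batch = []
    for item in items:
        batch.append(item)
        # Only split when a positive batch size has been reached.
        if n > 0 and len(batch) == n:
            yield batch
            batch = []
    if batch:
        # Emit the final partial batch (or the whole input when n == 0).
        yield batch
```

For example, `list(chunks(range(5), 2))` yields three batches, the last one holding the single leftover item, while `list(chunks(range(5), 0))` yields one batch containing all five items.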

Fix documentation in API table (#1203) · b12bb1d4
Hailey Schoelkopf authored Dec 23, 2023

b12bb1d4

22 Dec, 2023 5 commits

Fixes https://github.com/EleutherAI/lm-evaluation-harness/issues/437 (#1180) · 8286b1d9
Anjor Kanekar authored Dec 22, 2023

8286b1d9

Upstream Mamba Support (`mamba_ssm`) (#1110) · 5503b274

Hailey Schoelkopf authored Dec 22, 2023

* modularize HFLM code

* pass through extra kwargs to AutoModel.from_pretrained call

* remove explicit model_kwargs

* rename gptq -> autogptq

* fix tokenizer pad token errors

* ensure model always respects device_map and autogptq's selected devices

* add a _get_config helper fn

* add mambaLMWrapper

* add mamba extra

* add mamba extra

* fix conditional import

* Fix botched merge commit

* Remove beginning-of-file comment for consistency

* Add docstring for mambaLM re: supported kwargs

* Alphabetize extras

* Update extras table

* appease precommit

* run precommit on mamba_lm

5503b274

Update minerva_math_algebra.yaml (#1189) · b69ca72e
Hailey Schoelkopf authored Dec 22, 2023

b69ca72e
Refer in README to main branch (#1200) · 25cefbc1
Bram Vanroy authored Dec 22, 2023

25cefbc1

Generic decorator for handling rate limit errors (#1109) · 046ea6e2

Zach Schillaci authored Dec 21, 2023



* Add retry error handler

* fixup! Add retry error handler

* Move to utils.py

* Run isort on utils.py

* Catch multiple exceptions

* Update LMs with exception handler

* Fixes to anthropic retry handler

* fix callback kwarg

* Update textsynth.py

* fix python 3.8 incompatibility

* fix indenterror I introduced

* placate linter?

* Update on_exception_callback kwarg name

* fixup! Merge branch 'main' into add-retry-error-handler

* fixup! fixup! Merge branch 'main' into add-retry-error-handler

* Merge conflicts are fun

* Run pre-commit

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

046ea6e2
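
A generic retry decorator of the kind #1109 describes might look like the sketch below. The function name, the parameters other than `on_exception_callback` (which the commit message mentions), and the exponential backoff policy are assumptions made for illustration; the version in the repository's `utils.py` may differ.

```python
import time
from functools import wraps


def retry_on_exceptions(
    on_exceptions,
    max_retries=3,
    backoff_time=1.0,
    backoff_multiplier=2.0,
    on_exception_callback=None,
    sleep=time.sleep,
):
    """Retry the wrapped function when one of `on_exceptions` is raised.

    All names here are hypothetical. `sleep` is injectable so the wait
    can be stubbed out in tests.
    """
    caught = tuple(on_exceptions)  # `except` needs a class or a tuple

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            delay = backoff_time
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except caught as exc:
                    attempt += 1
                    if attempt > max_retries:
                        # Retries exhausted: let the caller see the error.
                        raise
                    if on_exception_callback is not None:
                        on_exception_callback(exc, delay)
                    sleep(delay)
                    delay *= backoff_multiplier

        return wrapper

    return decorator
```

Passing a list of exception types covers the "catch multiple exceptions" case, and the callback gives each model backend a place to log its own rate-limit message before the wait.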

21 Dec, 2023 8 commits
- disable `mypy` (#1193) · 09493fd2
  Baber Abbasi authored Dec 22, 2023
  
  09493fd2
- Update README.md (#1181) · 9267354e
  Anjor Kanekar authored Dec 21, 2023
```
* Update README.md

Add a note about running on Apple ARM GPUs

* Update README.md

* Update README.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
  9267354e
- Remove tokenizer for openai chat completions (#1191) · da0a5e36
  Anjor Kanekar authored Dec 21, 2023
```
* remove tokenizer for openai chat completions

* reordering function

* linter

* remove tiktoken import
```
  da0a5e36
- update Zeno example and reference in README (#1190) · 84790e99
  Alex Bäuerle authored Dec 21, 2023
  
  84790e99
- Correctly Print Task Versioning (#1173) · 9cd79897
  Hailey Schoelkopf authored Dec 21, 2023
```
* change version field formatting in metadata

* mention versioning in new task guide

* add instructions for changelog

* run linters
```
  9cd79897
- Add tokenizer backend (#1186) · a0cfe3f6
  Anjor Kanekar authored Dec 21, 2023
```
* separate local flag

* tokenizer_backend

* import order
```
  a0cfe3f6
- Update README.md (#1183) · 2b0b6fd8
  Anjor Kanekar authored Dec 21, 2023
  
  2b0b6fd8
- Update README.md (#1184) · e548d94d
  Anjor Kanekar authored Dec 21, 2023
  
  e548d94d
20 Dec, 2023 4 commits

Implementing local OpenAI API-style chat completions on any given inference server (#1174) · fcfc0c60

Vicki Boykis authored Dec 20, 2023

* LocalChatCompletionsLM add

* clean up completions class

* clean up completions class

* update tokens

* README

* fix constructor

* eos token

* folding local-chat-completions into OpenAIChatCompletions

* refactoring to include gen_kwargs as passable option

* add todo on chat completion kwarg validation

* Ruff and README fix

* generalize to **kwargs

* remove unnecessary kwargs

* README and remove kwargs

* README

fcfc0c60

Error in --num_fewshot option for K-MMLU Evaluation Harness (#1178) · 12f2c5ea
GUIJIN SON authored Dec 21, 2023
```
* update kmmlu default formatting

* Update _default_kmmlu_yaml

* Delete lm_eval/tasks/kmmlu/utils.py
```
12f2c5ea

Switch Linting to `ruff` (#1166) · 65b8761d

Baber Abbasi authored Dec 20, 2023

* add ruff and isort. remove black and flake8

* remove unnecessary dependencies

* remove dependency from table

* change order

* ran ruff

* check 3.9

* exclude evaluator

* update CI workflow

* use ruff config in pyproject.toml

* test

* add isort rules to ruff

* sort imports

* import `make_table`

* try stages for no-commit-to-branch

* turn on mypy for pre-commit

* test

* test

* test

* change no-commit-to-branch to default

* nits

* fixed dependency

65b8761d

feat: add option to upload results to Zeno (#990) · 21d4ae98

Alex Bäuerle authored Dec 20, 2023



* feat: add option to upload results to Zeno

* config-based upload supporting different task types and metrics

* upload tasks as individual projects

* wording

* readme

* add example notebook

* Update documentation for Zeno integration

* Make zeno deps an extra

* Update README.md

* Document extra deps installation

* Update zeno_visualize.py

* fix: balance parens

* fix typo

* fix merge commit I botched

* Update zeno_visualize.py

* Update logger warning stmt

* fix whitespace

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

21d4ae98

19 Dec, 2023 5 commits

Fix Column Naming and Dataset Naming Conventions in K-MMLU Evaluation (#1171) · 9e03d9d0

seungduk.kim.2304 authored Dec 20, 2023



* Correct column names and dataset names

* Remove kmmlu_general_physics.yaml and kmmlu_korean_language.yaml

* Update _default_kmmlu_yaml

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

9e03d9d0

self.device in huggingface.py line 210 treated as torch.device but might be a string (#1172) · 78545d42

Pasquale Minervini authored Dec 19, 2023



* self.device in huggingface.py line 210

In huggingface.py line 210, self.device is a str and does not have a "type" attribute

* Update huggingface.py

This handles both cases: `self.device` being a `torch.device` and `self.device` being a string

* Update huggingface.py

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

78545d42
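
The problem fixed in #1172 is that a device may arrive either as a plain string such as `"cuda:0"` or as an object with a `.type` attribute (as `torch.device` has). One way to handle both is sketched below; `device_type` is a hypothetical helper written without a `torch` dependency, not the code from `huggingface.py`.

```python
def device_type(device):
    """Return the device kind ("cuda", "cpu", "mps", ...).

    Accepts either a string like "cuda:0" or any object exposing a
    `.type` attribute, which is how torch.device reports its kind.
    """
    if isinstance(device, str):
        # Drop an optional index suffix: "cuda:0" -> "cuda".
        return device.split(":", 1)[0]
    return device.type
```

Normalizing at the point of use keeps the rest of the code free of `isinstance` checks, whichever form the caller supplied.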

generalize qwen pad token fix (#1146) · 3f0a3611
Hailey Schoelkopf authored Dec 19, 2023

3f0a3611

Simplify evaluator (#1126) · 42730d90

Lintang Sutawika authored Dec 19, 2023

* save progress

* fixed issue with table only showing 1 group

* store aliases directly in results_agg

* removed unused parts

42730d90

Add docs on adding a multiple choice metric (#1147) · 8e87eff4
Paul McCann authored Dec 19, 2023
```
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
```
8e87eff4

18 Dec, 2023 5 commits
- Update CODEOWNERS · 13fbfef7
  Stella Biderman authored Dec 18, 2023
  
  13fbfef7
- Remove GooseAI docs and change no-commit-to-branch precommit hook (#1154) · c2fad099
  Vicki Boykis authored Dec 18, 2023
```
* remove gooseAI

* Modify preconfig to specify commit branch

* precommit

* remove openai alias for completions
```
  c2fad099
- bugfix (#1150) · 6f9630c8
  Baber Abbasi authored Dec 18, 2023
  
  6f9630c8
- Add shorthand flags (#1149) · 3ab5a262
  Baber Abbasi authored Dec 18, 2023
  
  3ab5a262
- set `--gen_kwargs` arg to None (#1145) · 08fcf1fe
  Baber Abbasi authored Dec 18, 2023
```
* set `--gen_kwargs` to None + add help to CLI

* add logging metavar

* fix verbosity help messages

* Reorder severity levels.
```
  08fcf1fe
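
The `--gen_kwargs` change in #1145 (defaulting the argument to `None` and adding help text) can be sketched with `argparse` as follows. Only the flag name comes from the commit; the help wording, the `KEY=VALUE` metavar, and the `parse_kwargs_string` helper are assumptions for illustration and do not reproduce the harness's CLI.

```python
import argparse


def build_parser():
    """Build a tiny parser with a single optional --gen_kwargs flag."""
    parser = argparse.ArgumentParser(prog="sketch")
    parser.add_argument(
        "--gen_kwargs",
        default=None,  # None means "no generation overrides were given"
        metavar="KEY=VALUE",
        help="Comma-separated generation arguments, e.g. temperature=0,top_p=0.9",
    )
    return parser


def parse_kwargs_string(value):
    """Turn "a=1,b=2" into {"a": "1", "b": "2"}; None or "" gives {}."""
    if not value:
        return {}
    result = {}
    for pair in value.split(","):
        key, _, val = pair.partition("=")
        result[key.strip()] = val.strip()
    return result
```

Using `None` rather than an empty string as the default lets downstream code distinguish "flag not passed" from "flag passed with no content".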
17 Dec, 2023 1 commit

[WIP] Add IFEval / Instruction-Following Eval (#1087) · aa61f940

Wis Kojohnjaratkul authored Dec 17, 2023

* Add IFEval task

* Check and download nltk punkt if not already downloaded

* Update gen_max_toks to 2048 to support "900 words+" instructions

* Resolve pre-commit linting issues

* Reduce max_gen_toks to 1280 to conserve token usage

* Add warning message in `process_results` call for non chat-finetuned models

aa61f940

16 Dec, 2023 2 commits
- openai nits (#1139) · 8f5b2295
  Baber Abbasi authored Dec 16, 2023
```
* fixed syntactic nits

* fix temperature and seed

* fix logprobs

* fixup merge
```
  8f5b2295
- `use_tqdm=False` if batch_size != auto (#1144) · f7c67f0e
  Baber Abbasi authored Dec 16, 2023
  
  f7c67f0e
15 Dec, 2023 1 commit
- Enabling OpenAI completions via gooseai (#1141) · bd0f2414
  Vicki Boykis authored Dec 15, 2023
```
* enabling OpenAI completions via gooseai

* openai-completions and pin openai
```
  bd0f2414