Commits · cda25fef4e1df2f4bc2dab3ec6668ae9f5bf7296 · gaoqiong / lm-evaluation-harness

02 Jan, 2024 8 commits
- revert · dfb41835
  lintangsutawika authored Jan 02, 2024
  
- revert · 2054c2e6
  lintangsutawika authored Jan 02, 2024
  
- adjust to be backwards compatible · 2a573a19
  lintangsutawika authored Jan 02, 2024
  
- adjusted aggregation config · 703e0d55
  lintangsutawika authored Jan 02, 2024
  
- readd aggregation · 787b23f6
  lintangsutawika authored Jan 02, 2024
  
- readded support for aggregation · aaf64aab
  lintangsutawika authored Jan 02, 2024
  
- batch_schedular bug in Collator (#1229) · 4d10ad56
  Baber Abbasi authored Jan 02, 2024
```
* auto-batch requires len of iter

* handle case when batch_size="auto:N"
```
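The `batch_size="auto:N"` case named in this commit can be illustrated with a small parser. This is only a sketch: `parse_batch_size` is a hypothetical helper, not the harness's own code, and it assumes the suffix after `auto:` is an integer (taken here to be a count of how often the automatic batch size is re-detected).

```python
from typing import Optional, Tuple, Union


def parse_batch_size(value: Union[int, str]) -> Tuple[Optional[int], bool, int]:
    """Hypothetical helper: split a batch-size setting into its parts.

    Returns (fixed_size, is_auto, n), where n is the integer after
    "auto:" and 0 when no suffix is given.
    """
    if isinstance(value, int):
        return value, False, 0
    text = value.strip()
    if text == "auto":
        return None, True, 0
    if text.startswith("auto:"):
        # Assumption: everything after the first colon is an integer.
        return None, True, int(text.split(":", 1)[1])
    return int(text), False, 0


fixed, is_auto, n = parse_batch_size("auto:4")
```

Returning the three parts separately keeps the "auto" decision distinct from the numeric suffix, so a caller can handle plain `auto` and `auto:N` on the same code path.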
- Update README.md (#1230) · a12ef445
  Pasquale Minervini authored Jan 02, 2024
  
29 Dec, 2023 2 commits

Don't silence errors when loading tasks (#1148) · 34b563b1

Paul McCann authored Dec 30, 2023



* Add example failing task

This task includes an invalid import. This will cause an exception and
the task will not be loaded. But this just results in a DEBUG level log
message, so in normal usage you'll see no error, and will be told the
task doesn't exist.

Here's an example command line to run the task:

    python -m lm_eval --model hf --model_args pretrained=rinna/japanese-gpt-1b --tasks fail

This task is based on a Japanese Winograd task, but that's not
important, and was just used due to familiarity.

* Do not ignore errors when loading tasks

* Change how task errors are logged

This makes the proposed changes from PR discussion.

1. Exceptions not related to missing modules/imports are logged as
   warnings.

2. Module/import-related exceptions are still logged at debug level, but
   if any of them occur, a warning is emitted with instructions
   on how to show the logs.

* Remove intentionally failing task

---------
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
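
The logging policy described in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the harness's loader: `load_task_modules` and the logger name are hypothetical, and plain module imports stand in for whatever task-loading step may raise.

```python
import importlib
import logging

logger = logging.getLogger("task_loader_sketch")  # hypothetical logger name


def load_task_modules(module_names):
    """Import each module; never silence a failure completely."""
    loaded = {}
    import_failures = 0
    for name in module_names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError as exc:
            # Missing modules/imports stay at DEBUG level...
            import_failures += 1
            logger.debug("Could not import %s: %s", name, exc)
        except Exception as exc:
            # ...while any other exception is surfaced as a warning.
            logger.warning("Unexpected error while loading %s: %s", name, exc)
    if import_failures:
        # One summary warning tells the user how to see the details.
        logger.warning(
            "%d module(s) failed to import; enable DEBUG logging for details.",
            import_failures,
        )
    return loaded


loaded = load_task_modules(["json", "surely_missing_module_for_this_sketch"])
```

The summary warning is what keeps a broken task from looking like a task that simply does not exist.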


list tuple for string based multigpu collection · 439dca55
lintangsutawika authored Dec 29, 2023


28 Dec, 2023 9 commits
- process hf evaluate metrics · 99ce4eff
  lintangsutawika authored Dec 28, 2023
  
- revert to just load metric_fn · 150f11f6
  lintangsutawika authored Dec 28, 2023
  
- use HFEvaluateAdaptor for hf metrics · 6a336b15
  lintangsutawika authored Dec 28, 2023
  
- kwargs are added to metric_fn through partial at the beginning · 20c10dfe
  lintangsutawika authored Dec 28, 2023
  
- remove aggregation · e5b245cc
  lintangsutawika authored Dec 28, 2023
  
- removed passthrough fn · 039832e5
  lintangsutawika authored Dec 28, 2023
  
- simplify registry · 3888193d
  lintangsutawika authored Dec 28, 2023
  
- aggregation to compute_metric · 9d6bc929
  lintangsutawika authored Dec 28, 2023
  
- aggregation to compute_metric · 4d49dd03
  lintangsutawika authored Dec 28, 2023
  
27 Dec, 2023 3 commits
- update · c6a91582
  lintangsutawika authored Dec 27, 2023
  
- nits + fix siqa (#1216) · 6a1c19ed
  Baber Abbasi authored Dec 27, 2023
```
* fix group

* siqa: default.yml -> default.yaml

* max_gen_toks -> self.max_gen_toks

* add ids to task tests

* fix siqa

* fix gen_kwargs for openai-chat
```
- fix unbound local variable (#1218) · f2853995
  Jaewoo Yang authored Dec 27, 2023
  
24 Dec, 2023 2 commits
- fix: incorrect argument order in `utils.divide` doc (#1208) · e4970d81
  Yuliang Li authored Dec 24, 2023
  
- Add remove_whitespace to FLD benchmark (#1206) · 8ffbe58a
  MorishT authored Dec 24, 2023
```
* Add remove_whitespace to FLD benchmark

* bump task version

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
23 Dec, 2023 1 commit

Consolidate batching (#1197) · 9fb2ebab

Baber Abbasi authored Dec 23, 2023



* refactor dataloader

* cleanup + add docs

* change arg

* renamed Collator and added testing

* parametrized test for Collator

* appease pre-commit

* added edge case batch 0 (no batching)

* fix typos

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>


22 Dec, 2023 3 commits

Fixes https://github.com/EleutherAI/lm-evaluation-harness/issues/437 (#1180) · 8286b1d9
Anjor Kanekar authored Dec 22, 2023


Upstream Mamba Support (`mamba_ssm`) (#1110) · 5503b274

Hailey Schoelkopf authored Dec 22, 2023

* modularize HFLM code

* pass through extra kwargs to AutoModel.from_pretrained call

* remove explicit model_kwargs

* rename gptq -> autogptq

* fix tokenizer pad token errors

* ensure model always respects device_map and autogptq's selected devices

* add a _get_config helper fn

* add mambaLMWrapper

* add mamba extra

* add mamba extra

* fix conditional import

* Fix botched merge commit

* Remove beginning-of-file comment for consistency

* Add docstring for mambaLM re: supported kwargs

* Alphabetize extras

* Update extras table

* appease precommit

* run precommit on mamba_lm


Generic decorator for handling rate limit errors (#1109) · 046ea6e2

Zach Schillaci authored Dec 21, 2023



* Add retry error handler

* fixup! Add retry error handler

* Move to utils.py

* Run isort on utils.py

* Catch multiple exceptions

* Update LMs with exception handler

* Fixes to anthropic retry handler

* fix callback kwarg

* Update textsynth.py

* fix python 3.8 incompatibility

* fix indenterror I introduced

* placate linter?

* Update on_exception_callback kwarg name

* fixup! Merge branch 'main' into add-retry-error-handler

* fixup! fixup! Merge branch 'main' into add-retry-error-handler

* Merge conflicts are fun

* Run pre-commit

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
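
A generic retry decorator of the kind this commit describes might look like the sketch below. Only the `on_exception_callback` keyword name comes from the commit messages above; the decorator name, the other parameters, the callback signature, and the exponential backoff are assumptions made for illustration.

```python
import functools
import time
from typing import Callable, Optional, Tuple, Type


def retry_on_exceptions(
    exceptions: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    backoff_time: float = 1.0,
    backoff_multiplier: float = 2.0,
    on_exception_callback: Optional[Callable[[BaseException, float], None]] = None,
):
    """Hypothetical decorator: retry `func` when one of `exceptions` is raised."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            sleep_time = backoff_time
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    attempt += 1
                    if attempt > max_retries:
                        raise  # out of retries: propagate the last error
                    if on_exception_callback is not None:
                        on_exception_callback(exc, sleep_time)
                    time.sleep(sleep_time)
                    sleep_time *= backoff_multiplier

        return wrapper

    return decorator


calls = []
seen = []


@retry_on_exceptions(
    (ValueError,),
    max_retries=3,
    backoff_time=0.0,  # no real waiting in this demo
    on_exception_callback=lambda exc, wait: seen.append(type(exc)),
)
def flaky():
    """Fails twice, then succeeds, to exercise the retry path."""
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("simulated rate limit")
    return "ok"


result = flaky()
```

Accepting a tuple of exception types lets one decorator cover several API clients, each of which raises its own rate-limit error class.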


21 Dec, 2023 3 commits

Remove tokenizer for openai chat completions (#1191) · da0a5e36

Anjor Kanekar authored Dec 21, 2023

* remove tokenizer for openai chat completions

* reordering function

* linter

* remove tiktoken import


Correctly Print Task Versioning (#1173) · 9cd79897

Hailey Schoelkopf authored Dec 21, 2023

* change version field formatting in metadata

* mention versioning in new task guide

* add instructions for changelog

* run linters


Add tokenizer backend (#1186) · a0cfe3f6
Anjor Kanekar authored Dec 21, 2023
```
* separate local flag

* tokenizer_backend

* import order
```

20 Dec, 2023 3 commits

Implementing local OpenAI API-style chat completions on any given inference server (#1174) · fcfc0c60

Vicki Boykis authored Dec 20, 2023

* LocalChatCompletionsLM add

* clean up completions class

* clean up completions class

* update tokens

* README

* fix constructor

* eos token

* folding local-chat-completions into OpenAIChatCompletions

* refactoring to include gen_kwargs as passable option

* add todo on chat completion kwarg validation

* Ruff and README fix

* generalize to **kwargs

* remove unnecessary kwargs

* README and remove kwargs

* README


Error in --num_fewshot option for K-MMLU Evaluation Harness (#1178) · 12f2c5ea
GUIJIN SON authored Dec 21, 2023
```
* update kmmlu default formatting

* Update _default_kmmlu_yaml

* Delete lm_eval/tasks/kmmlu/utils.py
```

Switch Linting to `ruff` (#1166) · 65b8761d

Baber Abbasi authored Dec 20, 2023

* add ruff and isort. remove black and flake8

* remove unnecessary dependencies

* remove dependency from table

* change order

* ran ruff

* check 3.9

* exclude evaluator

* update CI workflow

* use ruff config in pyproject.toml

* test

* add isort rules to ruff

* sort imports

* import `make_table`

* try stages for no-commit-to-branch

* turn on mypy for pre-commit

* test

* test

* test

* change no-commit-to-branch to default

* nits

* fixed dependency


19 Dec, 2023 6 commits
- Fix Column Naming and Dataset Naming Conventions in K-MMLU Evaluation (#1171) · 9e03d9d0
  seungduk.kim.2304 authored Dec 20, 2023
```
* Correct column names and dataset names

* Remove kmmlu_general_physics.yaml and kmmlu_korean_language.yaml

* Update _default_kmmlu_yaml

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
- self.device in huggingface.py line 210 treated as torch.device but might be a string (#1172) · 78545d42
  Pasquale Minervini authored Dec 19, 2023
```
* self.device in huggingface.py line 210

In huggingface.py line 210, self.device may be a str, which does not have a "type" attribute

* Update huggingface.py

This handles both cases: `self.device` being a `torch.device` or a string

* Update huggingface.py

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
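The fix described in this commit, accepting either a `torch.device` or a plain string, can be sketched without importing torch. `device_type` is a hypothetical helper, not the code in `huggingface.py`; it assumes a device object exposes its kind as a string `type` attribute and that string devices look like `"cuda:0"`.

```python
from typing import Any


def device_type(device: Any) -> str:
    """Return the device kind ("cuda", "cpu", ...) for an object or a string."""
    kind = getattr(device, "type", None)
    if isinstance(kind, str):
        # Device-like object that already carries its kind.
        return kind
    # Plain string such as "cuda:0": keep the part before the index.
    return str(device).split(":", 1)[0]


class FakeDevice:
    """Stand-in for a device object, used only in this sketch."""

    type = "mps"


kinds = [device_type("cuda:0"), device_type("cpu"), device_type(FakeDevice())]
```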
- generalize qwen pad token fix (#1146) · 3f0a3611
  Hailey Schoelkopf authored Dec 19, 2023
  
- changed how metrics are calculated · 6117c507
  lintangsutawika authored Dec 19, 2023
  
- loglikelihood and loglikelihood rolling modified · 028f04c7
  lintangsutawika authored Dec 19, 2023
  
- change how metrics are registered · 1d262a59
  lintangsutawika authored Dec 19, 2023
  