Commits · 0256c68258d9977ee7a7491e863d2320e946986b · gaoqiong / lm-evaluation-harness

11 Feb, 2024 2 commits

Add multilingual ARC task (#1419) · 0256c682
Uanu authored Feb 11, 2024

0256c682

Baber Abbasi authored Feb 11, 2024

* un-exclude `evaluate.py` from linting

* readability

* readability

* add task name to build info message

* fix link

* nit

* add functions for var and mean pooling

* add functions for var and mean pooling

* metadata compatibility with task

* rename `override_config` to `set_config` and move to `Task`

* add unit test

* nit

* nit

* bugfix

* nit

* nit

* nit

* add docstrings

* fix metadata-fewshot

* revert metric refactor

* nit

* type checking

* type hints

* type hints

* move `override_metric` to `Task`

* change metadata

* change name

* pre-commit

* rename

* remove

* remove

* `override_metric` backwards compatible with `Task`

* type hints

* use generic

* type hint

1ff84897

10 Feb, 2024 2 commits

Fix watchdog timeout (#1404) · 1e6825da
Jeevan authored Feb 10, 2024
```
* Fix watchdog timeout

* Pre-commit fix

* Timedelta
```
1e6825da

Fixes https://github.com/EleutherAI/lm-evaluation-harness/issues/1416 (#1418) · 921eab86

Pasquale Minervini authored Feb 10, 2024

* Fixes https://github.com/EleutherAI/lm-evaluation-harness/issues/1416

Sets `do_sample = False` if `temperature == 0.0` and `do_sample = None`

* Update huggingface.py

* Update huggingface.py

making linter happy

921eab86
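
The rule described in this commit body can be sketched as below. This is a minimal illustration, not the code from `huggingface.py`; the helper name `normalize_gen_kwargs` and the use of a plain dict are assumptions made for the example.

```python
from typing import Any, Dict


def normalize_gen_kwargs(gen_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of gen_kwargs with do_sample filled in for greedy decoding.

    Hypothetical helper: the name and signature are not taken from the harness.
    """
    kwargs = dict(gen_kwargs)
    # Only fill in do_sample when the caller left it unset (absent or None),
    # so an explicit do_sample=True is never overridden.
    if kwargs.get("temperature") == 0.0 and kwargs.get("do_sample") is None:
        kwargs["do_sample"] = False
    return kwargs
```

An explicit `do_sample` value passes through untouched, and kwargs with a non-zero temperature are returned unchanged.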

09 Feb, 2024 1 commit
- use reversed task hierarchy for print (#1414) · ab4dba8f
  Hailey Schoelkopf authored Feb 09, 2024
  
  ab4dba8f
07 Feb, 2024 1 commit
- `batch_size` with `auto` defaults to 1 if `No executable batch size found` (#1405) · 4c17c55c
  Pasquale Minervini authored Feb 07, 2024
```
Fixes https://github.com/EleutherAI/lm-evaluation-harness/issues/1323
```
  4c17c55c
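
A rough sketch of the fallback named in the commit title follows. It assumes the failed search surfaces as a `RuntimeError` whose message contains `No executable batch size found`; the function `detect_batch_size` and its `probe` argument are hypothetical and only stand in for whatever the harness does internally.

```python
from typing import Callable


def detect_batch_size(probe: Callable[[], int], fallback: int = 1) -> int:
    """Run an automatic batch-size search and fall back to 1 if it finds nothing.

    `probe` is a hypothetical stand-in for the automatic search routine.
    """
    try:
        return probe()
    except RuntimeError as err:
        # Assumption: the search signals failure with this message.
        if "No executable batch size found" in str(err):
            return fallback
        # Any other runtime error is unrelated and should still propagate.
        raise
```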
06 Feb, 2024 4 commits

adding hf_transfer (#1400) · 756eeb6f

Michael Feil authored Feb 06, 2024



* add hf_transfer

* update dependencies

* Delete stale `[linting]` extra

* Update README.md with extras table

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

756eeb6f

Use Pooled rather than Combined Variance for calculating stderr of task groupings (#1390) · 94cc1850

Hailey Schoelkopf authored Feb 06, 2024

* update formula for stderr aggregation

* hack: see what happens when using stderr_for_metric bootstrapping on a group

* undo bootstrap_for_stderr test

* factor out variance-aggregation formulas into api.metrics

* fix failing tests

* remove stray print

* update comment

* further detail in comment

* add back initialize_tasks() call

* fix format

94cc1850
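
For readers unfamiliar with the term, the textbook pooled variance weights each subtask's sample variance by its degrees of freedom. The sketch below uses that textbook formula; whether it matches the formulas factored into `api.metrics` term for term is not confirmed by this log, and the function names are chosen for the example.

```python
import math
from typing import Sequence


def pooled_variance(variances: Sequence[float], sizes: Sequence[int]) -> float:
    """Textbook pooled sample variance: sum((n_i - 1) * s_i^2) / (sum(n_i) - k)."""
    if len(variances) != len(sizes) or not sizes:
        raise ValueError("variances and sizes must be non-empty and equal in length")
    dof = sum(sizes) - len(sizes)
    if dof <= 0:
        raise ValueError("need more than one sample per group on average")
    return sum((n - 1) * v for v, n in zip(variances, sizes)) / dof


def pooled_stderr(stderrs: Sequence[float], sizes: Sequence[int]) -> float:
    """Standard error of a group mean built from per-subtask standard errors."""
    # Recover each subtask's sample variance from its stderr: s^2 = stderr^2 * n.
    variances = [se ** 2 * n for se, n in zip(stderrs, sizes)]
    # Standard error of the mean over all pooled samples.
    return math.sqrt(pooled_variance(variances, sizes) / sum(sizes))
```

Unlike a combined variance, this estimate ignores differences between subtask means, so it reflects only within-subtask spread.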

Fix confusing `write_out.py` instructions in README (#1371) · df01adf6
Hailey Schoelkopf authored Feb 06, 2024

df01adf6

Update README.md (#1398) · 4d7d2f64

Michael Chen authored Feb 05, 2024

Add instructions for non-MacOS users on how to compile janitor_util.cpp so that janitor.py can use it.

4d7d2f64

05 Feb, 2024 1 commit

Support for Inf2 optimum class [WIP] (#1364) · d17dcea0

Michael Feil authored Feb 05, 2024

* initial commit

* remove overwrite bs

* adding neuronx dependencies

* Update README.md

* update neuronx

d17dcea0

02 Feb, 2024 2 commits
- fix on --task list (#1387) · 74119471
  Lintang Sutawika authored Feb 02, 2024
  
  74119471
- Fix for https://github.com/EleutherAI/lm-evaluation-harness/issues/1383 (#1384) · 9a902155
  Pasquale Minervini authored Feb 02, 2024
```
Fixes https://github.com/EleutherAI/lm-evaluation-harness/issues/1383

If this is okay, it will need to be propagated to SCROLLS
```
  9a902155
01 Feb, 2024 4 commits

Faster Task and Group Loading, Allow Recursive Groups (#1321) · d714fc95

Lintang Sutawika authored Feb 01, 2024

* add trust_remote_code as default

* task for testing recursive

* changed source of ALL_TASKS

* tasks should only accept TaskObjects

* initialize_tasks returns list of tasks and groups

* remove trust_remote_code for now

* moved constructor process to inside load_yaml_config

* more comprehensive way to index tasks and groups

* pre-commit format

* add exit after error

* adjust how task objects are called

* no need to use get_task_dict

* load_task_or_group works but only for tasks

* pre-commit format

* half working for nested groups

* changed variable names

* allow groups and tasks to work

* temp save

* indexing and loading are part of a task_manager object

* adapted initialize_tasks

* iron out bugs

* fixed typo

* fixed typo

* simplified code

* further tidy up

* remove lines for testing

* removed test lines

* removed unused code

* remove unused import

* fixed bu...

d714fc95

Enable override of printed `n-shot` in table (#1379) · 17191063
Hailey Schoelkopf authored Feb 01, 2024
```
* allow tasks to specify printed fewshot val

* fix to belebele

* update metadata field's documentation
```
17191063
Hf: minor edge cases (#1380) · 994bdb3f
Baber Abbasi authored Feb 01, 2024
```
* edge cases where variable might not be assigned.

* type hint
```
994bdb3f

Expand docs, update CITATION.bib (#1227) · f5408b6b

Hailey Schoelkopf authored Feb 01, 2024



* Update CITATION.bib

* Create CONTRIBUTING.md

* add disclaimer re: multi node

* flesh out some sections more

* Flesh out contributor guide

* revert CITATION.bib

* appease pre-commit

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>

f5408b6b

31 Jan, 2024 5 commits

add bypass metric (#1156) · f8203de1

Baber Abbasi authored Feb 01, 2024

* add bypass metric

* fixed `bypass` metric.

* add task attributes if predict_only

* add `predict_only` checks

* add docs

* added `override_metric`, `override_config` to `Task`

* nits

* nit

* changed --predict_only to generations; nits

* nits

* nits

* change gen_kwargs warning

* add note about `--predict_only` in README.md

* added `predict_only`

* move table to bottom

* nit

* change null aggregation to bypass (conflict)

* bugfix; default `temp=0.0`

* typo

f8203de1

Add support for RWKV models with World tokenizer (#1374) · 084b7050

Eugene Cheah authored Jan 31, 2024



* Add support for RWKV models with World tokenizer

The RWKV line of models with the World tokenizer does not allow the padding token to be configured; its value is preset to 0.

However, a value of 0 fails all the "if set" checks and would cause the tokenizer to crash.

A tokenizer class name check was added in addition to the model type check, because some RWKV models use the neox tokenizers.

* Update huggingface.py

Genericized so that this supports any RWKVWorld tokenizer, and added a fallback in case the HF implementation name changes.

* Comply with formatting guidelines

* fix format

---------
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

084b7050
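
The pitfall behind this commit, a padding token id of 0 failing an "if set" check, can be shown in isolation. The two functions below are hypothetical and illustrate only the truthiness problem; the commit itself reports adding a tokenizer class name check, which is not reproduced here.

```python
from typing import Optional


def pad_token_is_set_buggy(pad_token_id: Optional[int]) -> bool:
    """Truthiness check: wrongly treats the valid id 0 as 'not set'."""
    return bool(pad_token_id)


def pad_token_is_set(pad_token_id: Optional[int]) -> bool:
    """Identity check: only None counts as 'not set', so id 0 is accepted."""
    return pad_token_id is not None
```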

Make dependencies compatible with PyPI (#1378) · a0a2fec8
Hailey Schoelkopf authored Jan 31, 2024
```
* make deps not point to github urls

* formatting

* try making PyPI only run on tag pushes
```
a0a2fec8
Publish to pypi (#1194) · 0da0dcba
Anjor Kanekar authored Jan 31, 2024
```
* publish to pypi

* lint

* Update publish.yml

* minor
```
0da0dcba

Fix unintuitive `--gen_kwargs` behavior (#1329) · bd7d265a

Hailey Schoelkopf authored Jan 31, 2024

* don't override do_sample if no value for it is passed

* Update gen_kwargs override condition

* Update huggingface.py

* Update huggingface.py

* run linters

* silence an erroneous warning

bd7d265a

30 Jan, 2024 1 commit
- delay filter init; remove `*args` (#1369) · 1554066c
  Baber Abbasi authored Jan 30, 2024
```
* delay filter init; remove `*args`

* bugfix

* optimize

* type hint
```
  1554066c
29 Jan, 2024 1 commit
- serialize callable functions in config (#1367) · 7fc43656
  Baber Abbasi authored Jan 29, 2024
  
  7fc43656
28 Jan, 2024 1 commit

Apply some best practices and guideline recommendations to code (#1363) · 488759d2

LSinev authored Jan 28, 2024

* raise Exception, not a string

Additional info https://peps.python.org/pep-0352/#exception-hierarchy-changes
https://docs.python.org/3.8/tutorial/errors.html#raising-exceptions

* Apply PEP8 recommendation to prefer isinstance

"Object type comparisons should always use isinstance() instead of comparing types directly"
https://peps.python.org/pep-0008/

* Remove dangerous default mutable values in arguments

https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/dangerous-default-value.html

* Format logging messages with fstring (not with format)

Additional info
https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/logging-format-interpolation.html
There are also discussions about the speed of formatting while logging or some unintended code executions
https://github.com/pylint-dev/pylint/issues/2395
https://stackoverflow.com/a/54368109
but at least one format (fstring one) will be u...

488759d2
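
The first four recommendations in this commit can be seen together in one small function. This is an illustrative sketch with a made-up `add_task` helper, not code from the harness.

```python
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


def add_task(name: object, tasks: Optional[List[str]] = None) -> List[str]:
    """Append a task name to a list, creating a fresh list when none is given."""
    # Avoid a mutable default such as `tasks=[]`, which would be shared
    # across calls; use None and build the list inside the function.
    if tasks is None:
        tasks = []
    # Prefer isinstance() over comparing type(name) == str.
    if not isinstance(name, str):
        # Raise an exception instance, never a bare string.
        raise TypeError(f"task name must be a str, got {type(name).__name__}")
    tasks.append(name)
    # f-string formatting in the log call, the style this commit settled on.
    logger.info(f"registered task {name}")
    return tasks
```

Calling `add_task` twice without a list returns two independent lists, which is the behavior the mutable-default warning protects.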

26 Jan, 2024 2 commits

Add causalLM OpenVino models (#1290) · 97a67d27

NoushNabi authored Jan 26, 2024



* added intel optimum

* added intel optimum in readme

* modified intel optimum

* modified intel optimum

* modified intel optimum

* modified install optimum

* modified path of IR file

* added openvino_device

* added openvino_device2

* changed optimum-causal to openvino-causal

* Update README.md

* Update README.md

* remove `lm_eval.base` import

* update openvino-causal -> openvino ; pass device through super().__init__()

* Update README.md

* Add optimum to tests dependencies

* apply pre-commit

* fix so tests pass

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

97a67d27

Refix issue regarding stderr (#1357) · 5b0b8a56
thnkinbtfly authored Jan 26, 2024

5b0b8a56

25 Jan, 2024 2 commits

Add FAQ on `lm_eval.tasks.initialize_tasks()` to README (#1330) · 52f48e8c
Hailey Schoelkopf authored Jan 25, 2024
```
* Update README.md

* [!Tip]
```
52f48e8c

`Filter` docs not offset by `doc_id` (#1349) · a0f1cacd

Baber Abbasi authored Jan 26, 2024

* get `doc` from instance

* accelerate bugfix: get ground doc from instance

* convert filter to `process_result`

* get docs from instances in `FilterEnsemble`

* rename

* nit

* better looping

* fix typehint

a0f1cacd

24 Jan, 2024 2 commits
- update links to task_guide.md (#1348) · 34cded30
  Hailey Schoelkopf authored Jan 24, 2024
  
  34cded30
- modified default gen_kwargs to work better with CLI; changed prompt_logprobs=1 (#1345) · 38c8d02f
  Baber Abbasi authored Jan 24, 2024
  
  38c8d02f
23 Jan, 2024 4 commits

manage default (greedy) gen_kwargs in vllm (#1341) · 081deb8b

Baber Abbasi authored Jan 24, 2024

* manage default (greedy) gen_kwargs in vllm better

* mirror HF `do_sample`

* just need to set temp=0 for greedy

081deb8b

Don't use `get_task_dict()` in task registration / initialization (#1331) · 969b48bf

Hailey Schoelkopf authored Jan 23, 2024



* don't use get_task_dict() as a helper, it will download the dataset!

* pre-commit

* Update README.md

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>

969b48bf

Update migrated HF dataset paths (#1332) · 45a8f709

Hailey Schoelkopf authored Jan 22, 2024



* Update arc_easy.yaml

* Update flan_cot.yaml

* update HF dataset path

* Update freeform.yaml

* Update flan_cot.yaml

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>

45a8f709

skip "benchmarks" in changed_tasks (#1336) · 6e65ef38
Baber Abbasi authored Jan 23, 2024

6e65ef38

22 Jan, 2024 5 commits
- fix a trailing whitespace that breaks a lint job (#1335) · 84357a46
  Brian Vaughan authored Jan 22, 2024
  
  84357a46
- fallback to classname when LM doesn't have config (#1334) · 607d7da5
  Brian Vaughan authored Jan 22, 2024
  
  607d7da5
- Add `local-completions` support using OpenAI interface (#1277) · 5c25dd55
  Michael Goin authored Jan 22, 2024
```
* Add `local-completions` support using OpenAI interface

* Refactor oa_completion

* Address tokenizer comments and change request chunks to batch size

* Add warning message for tiktoken backend

* fix formatting

* fix whitespace

* Update README.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
```
  5c25dd55
- Fix Issue regarding stderr (#1327) · 076372ee
  Lintang Sutawika authored Jan 23, 2024
```
* add fix for deciding if stderr is N/A or not

* process N/A
```
  076372ee
- don't pass extra kwargs to mamba any more (#1328) · 181ccf43
  Hailey Schoelkopf authored Jan 22, 2024
  
  181ccf43