Commits · da2119697bd3b16e707f71d0886aed2cf5b0cd40 · gaoqiong / lm-evaluation-harness

03 Jun, 2024 1 commit

Complete task list from pr 1727 (#1901) · 3e500e9d

anthony-dipofi authored Jun 03, 2024



* added tasks and task family descriptors

* continue work on task list w/ links; slightly reorganize README

* Apply suggestions from code review

* Rename file so that it'll preview in GitHub when viewing lm_eval/tasks folder

* Update new_task_guide.md

* Update README.md

* run linter

* Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs

* fix typo

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* apply format

---------
Co-authored-by: Harish Vadaparty <harishvadaparty@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

3e500e9d

31 May, 2024 1 commit

Add dataset card when pushing to HF hub (#1898) · f4f59251

KonradSzafer authored May 31, 2024



* dataset card initial

* few fixes

* adds groups for math, mmlu, gpqa

* added summary args

* moved sanitize_list to utils

* readme update

* recreate metadata moved

* multiple model support

* results latest split fix

* readme update and small refactor

* fix grouping

* add comments

* added pathlib

* corrected pathlib approach

* check whether to create a metadata card

* convert posix paths to str

* default hf org from token

* hf token value error

* Add logs after successful upload

* logging updates

* dataset card example in the readme

---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Alina Lozovskaia <alinailozovskaya@gmail.com>

f4f59251

07 May, 2024 2 commits

Initial integration of the Unitxt to LM eval harness (#1615) · 885f48d6

Yoav Katz authored May 08, 2024

* Initial support for Unitxt datasets in LM Eval Harness

See https://github.com/IBM/unitxt



The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file.

The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'.

* Added dataset loading check to generate_yaml

Improved error messages.

* Speed up generate_yaml

Added printouts and improved error message

* Added output printout

* Simplified integration of unitxt datasets

Store all the common yaml configuration in a yaml include shared by all datasets of the same task.

* Post code review comments - part 1

1. Made sure include files don't end with 'yaml' so they won't be marked as tasks
2. Added more datasets and tasks (NER, GEC)
3. Added README

* Post code review comments - part 2

1. Added unitxt install option in pyproject.toml:
pip install 'lm_eval[unitxt]'
2. Added a check that unitxt is installed and print a clear error message if not

* Committed missing pyproject change

* Added documentation on adding datasets

* More doc changes

* add unitxt extra to readme

* run precommit

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

885f48d6
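
The installation check described in "Post code review comments - part 2" of the Unitxt commit above (detect a missing optional package and print a clear error message) could be sketched roughly as follows. This is an illustrative sketch only, not the code from that commit: the helper name `require_optional_package` and the error wording are assumptions; only the `pip install 'lm_eval[unitxt]'` command comes from the commit message.

```python
import importlib.util


def require_optional_package(package: str, extra: str) -> None:
    """Raise a clear error if an optional dependency is missing.

    Hypothetical helper for illustration; not taken from the harness itself.
    """
    # find_spec returns None when a top-level package cannot be located.
    if importlib.util.find_spec(package) is None:
        raise ModuleNotFoundError(
            f"The '{package}' package is required for this task but is not "
            f"installed. Install it with: pip install 'lm_eval[{extra}]'"
        )


# A package that is always available (stdlib) passes silently.
require_optional_package("json", extra="unitxt")
```

Calling the helper at task-loading time, rather than at import time, keeps the base install usable for users who never touch the optional tasks.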

link to the example output on the hub (#1798) · 20be169b

KonradSzafer authored May 07, 2024

20be169b

05 May, 2024 1 commit

Fix README: change `----hf_hub_log_args` to `--hf_hub_log_args` (#1776) · 297966f7

Muhammad Bin Usman authored May 06, 2024

fix `----hf_hub_log_args` to `--hf_hub_log_args`

297966f7

03 May, 2024 1 commit

evaluation tracker implementation (#1766) · 59cf408a

KonradSzafer authored May 03, 2024

* evaluation tracker implementation

* OVModelForCausalLM test fix

* typo fix

* moved methods args

* multiple args in one flag

* loggers moved to dedicated dir

* improved filename sanitization

59cf408a

25 Apr, 2024 1 commit

reference `--tasks list` in README (#1726) · 80a056bb

Brian Vaughan authored Apr 25, 2024

https://github.com/EleutherAI/lm-evaluation-harness/issues/1698

80a056bb

16 Apr, 2024 2 commits

Add `neuralmagic` models for `sparseml` and `deepsparse` (#1674) · 8b326be7

Michael Goin authored Apr 16, 2024



* Add neuralmagic models for SparseML and DeepSparse

* Update to latest and add test

* Format

* Fix list to List

* Format

* Add deepsparse/sparseml to automated testing

* Update pyproject.toml

* Update pyproject.toml

* Update README

* Fixes for dtype and device

* Format

* Fix test

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Address review comments!

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

8b326be7

Add delta weights model loading (#1712) · 12a165d1

KonradSzafer authored Apr 16, 2024

* added delta weights

* removed debug

* readme update

* better error handling

* autogptq warn

* warn update

* peft and delta error, explicitly deleting _model_delta

* linter fix

12a165d1

08 Apr, 2024 1 commit

Update README.md (#1680) · 7852985b

Hailey Schoelkopf authored Apr 08, 2024

7852985b

05 Apr, 2024 1 commit

Anthropic Chat API (#1594) · 27924d77

Seungwoo Ryu authored Apr 06, 2024



* claude3

* supply for anthropic claude3

* supply for anthropic claude3

* anthropic config changes

* add callback options on anthropic

* line passed

* claude3 tiny change

* help anthropic installation

* mention sysprompt / being careful with format in readme

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

27924d77

26 Mar, 2024 1 commit

Integration of NeMo models into LM Evaluation Harness library (#1598) · e9d429e1

Sergio Perez authored Mar 26, 2024

* Integration of NeMo models into LM Evaluation Harness library

* rename nemo model as nemo_lm

* move nemo section in readme after hf section

* use self.eot_token_id in get_until()

* improve progress bar showing loglikelihood requests

* data replication or tensor/pipeline replication working fine within one node

* run pre-commit on modified files

* check whether dependencies are installed

* clarify usage of torchrun in README

e9d429e1

25 Mar, 2024 1 commit

Add vLLM FAQs to README (#1625) (#1633) · a97fde23

Hailey Schoelkopf authored Mar 25, 2024

a97fde23

15 Mar, 2024 1 commit

Fix README section on vllm integration (#1579) · 7d9922c8

Eitan Turok authored Mar 15, 2024

* Link to vllm integration

* add pip install .[vllm] cmd

7d9922c8

01 Mar, 2024 1 commit

modify `WandbLogger` to accept arbitrary kwargs (#1491) · ae79b121

Baber Abbasi authored Mar 01, 2024

* make `WandbLogger` init args optional

* nit

* nit

* nit

* move import warning to `WandbLogger`

* nit

* update docs

* nit

ae79b121

22 Feb, 2024 1 commit

feat: Add Weights and Biases support (#1339) · 2683fbbb

Ayush Thakur authored Feb 23, 2024



* add wandb as extra dependency

* wandb metrics logging

* refactor

* log samples as tables

* fix linter

* refactor: put in a class

* change dir

* add panels

* log eval as table

* improve tables logging

* improve reports logging

* precommit run

* ruff check

* handle importing reports api gracefully

* ruff

* compare results

* minor pre-commit fixes

* build comparison report

* ruff check

* log results as artifacts

* remove comparison script

* update dependency

* type annotate and docstring

* add example

* update readme

* fix typo

* teardown

* handle outside wandb run

* gracefully fail reports creation

* precommit checks

* add report url to summary

* use wandb printer for better url stdout

* fix ruff

* handle N/A and groups

* fix eval table

* remove unused var

* update wandb version req + disable reports stdout

* remove reports feature to TODO

* add label to multi-choice question data

* log model predictions

* lints

* loglikelihood_rolling

* log eval result for groups

* log tables by group for better handling

* precommit

* choices column for multi-choice

* graciously fail wandb

* remove reports feature

* track system metrics + total eval time + stdout

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>

2683fbbb

06 Feb, 2024 2 commits

adding hf_transfer (#1400) · 756eeb6f

Michael Feil authored Feb 06, 2024



* add hf_transfer

* update dependencies

* Delete stale `[linting]` extra

* Update README.md with extras table

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

756eeb6f

Fix confusing `write_out.py` instructions in README (#1371) · df01adf6

Hailey Schoelkopf authored Feb 06, 2024

df01adf6

05 Feb, 2024 1 commit

Support for Inf2 optimum class [WIP] (#1364) · d17dcea0

Michael Feil authored Feb 05, 2024

* initial commit

* remove overwrite bs

* adding neuronx dependencies

* Update README.md

* update neuronx

d17dcea0

01 Feb, 2024 1 commit

Expand docs, update CITATION.bib (#1227) · f5408b6b

Hailey Schoelkopf authored Feb 01, 2024



* Update CITATION.bib

* Create CONTRIBUTING.md

* add disclaimer re: multi node

* flesh out some sections more

* Flesh out contributor guide

* revert CITATION.bib

* appease pre-commit

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>

f5408b6b

31 Jan, 2024 1 commit

add bypass metric (#1156) · f8203de1

Baber Abbasi authored Feb 01, 2024

* add bypass metric

* fixed `bypass` metric.

* add task attributes if predict_only

* add `predict_only` checks

* add docs

* added `override_metric`, `override_config` to `Task`

* nits

* nit

* changed --predict_only to generations; nits

* nits

* nits

* change gen_kwargs warning

* add note about `--predict_only` in README.md

* added `predict_only`

* move table to bottom

* nit

* change null aggregation to bypass (conflict)

* bugfix; default `temp=0.0`

* typo

f8203de1

26 Jan, 2024 1 commit

Add causalLM OpenVino models (#1290) · 97a67d27

NoushNabi authored Jan 26, 2024



* added intel optimum

* added intel optimum in readme

* modified intel optimum

* modified intel optimum

* modified intel optimum

* modified install optimum

* modified path of IR file

* added openvino_device

* added openvino_device2

* changed optimum-causal to openvino-causal

* Update README.md

* Update README.md

* remove `lm_eval.base` import

* update openvino-causal -> openvino ; pass device through super().__init__()

* Update README.md

* Add optimum to tests dependencies

* apply pre-commit

* fix so tests pass

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

97a67d27

25 Jan, 2024 1 commit

Add FAQ on `lm_eval.tasks.initialize_tasks()` to README (#1330) · 52f48e8c

Hailey Schoelkopf authored Jan 25, 2024

* Update README.md

* [!Tip]

52f48e8c

23 Jan, 2024 1 commit

Don't use `get_task_dict()` in task registration / initialization (#1331) · 969b48bf

Hailey Schoelkopf authored Jan 23, 2024



* don't use get_task_dict() as a helper, it will download the dataset!

* pre-commit

* Update README.md

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>

969b48bf

22 Jan, 2024 2 commits

fix a trailing whitespace that breaks a lint job (#1335) · 84357a46

Brian Vaughan authored Jan 22, 2024

84357a46

Add `local-completions` support using OpenAI interface (#1277) · 5c25dd55

Michael Goin authored Jan 22, 2024



* Add `local-completions` support using OpenAI interface

* Refactor oa_completion

* Address tokenizer comments and change request chunks to batch size

* Add warning message for tiktoken backend

* fix formatting

* fix whitespace

* Update README.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

5c25dd55

16 Jan, 2024 1 commit

Update README.md with custom integration doc (#1298) · ada4a31d

Mark Saroufim authored Jan 16, 2024



* Update README.md

* punctuation

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

ada4a31d

15 Jan, 2024 2 commits

Re-add citation · 39a465ca

Stella Biderman authored Jan 15, 2024

It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in.

39a465ca

Make `parallelize=True` vs. `accelerate launch` distinction clearer in docs (#1261) · 39e7b264

Hailey Schoelkopf authored Jan 15, 2024

* Make parallelize=True distinction clearer in documentation.

* run linter

39e7b264

11 Jan, 2024 1 commit

Update README.md · eed2d3a6

Stella Biderman authored Jan 11, 2024

eed2d3a6

08 Jan, 2024 1 commit

Revert citation (#1257) · ecb1df28

Stella Biderman authored Jan 08, 2024

Over a dozen papers have used the updated citation block, but Google Scholar has noticed none of them. Since it does understand this citation, I think we should use it going forward until we have a way to ensure the newer citations are actually logged.

ecb1df28

30 Dec, 2023 1 commit

Update README.md (#1195) · 1229862a

Anjor Kanekar authored Dec 30, 2023

1229862a

23 Dec, 2023 1 commit

Fix documentation in API table (#1203) · b12bb1d4

Hailey Schoelkopf authored Dec 23, 2023

b12bb1d4

22 Dec, 2023 2 commits

Upstream Mamba Support (`mamba_ssm`) (#1110) · 5503b274

Hailey Schoelkopf authored Dec 22, 2023

* modularize HFLM code

* pass through extra kwargs to AutoModel.from_pretrained call

* remove explicit model_kwargs

* rename gptq -> autogptq

* fix tokenizer pad token errors

* ensure model always respects device_map and autogptq's selected devices

* add a _get_config helper fn

* add mambaLMWrapper

* add mamba extra

* add mamba extra

* fix conditional import

* Fix botched merge commit

* Remove beginning-of-file comment for consistency

* Add docstring for mambaLM re: supported kwargs

* Alphabetize extras

* Update extras table

* appease precommit

* run precommit on mamba_lm

5503b274

Refer in README to main branch (#1200) · 25cefbc1

Bram Vanroy authored Dec 22, 2023

25cefbc1

21 Dec, 2023 3 commits

Update README.md (#1181) · 9267354e

Anjor Kanekar authored Dec 21, 2023

* Update README.md

Add a note about running on apple arm gpus

* Update README.md

* Update README.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

9267354e

update Zeno example and reference in README (#1190) · 84790e99

Alex Bäuerle authored Dec 21, 2023

84790e99

Update README.md (#1184) · e548d94d

Anjor Kanekar authored Dec 21, 2023

e548d94d

20 Dec, 2023 2 commits

Implementing local OpenAI API-style chat completions on any given inference server (#1174) · fcfc0c60

Vicki Boykis authored Dec 20, 2023

* LocalChatCompletionsLM add

* clean up completions class

* clean up completions class

* update tokens

* README

* fix constructor

* eos token

* folding local-chat-completions into OpenAIChatCompletions

* refactoring to include gen_kwargs as passable option

* add todo on chat completion kwarg validation

* Ruff and README fix

* generalize to **kwargs

* remove unnecessary kwargs

* README and remove kwargs

* README

fcfc0c60

Switch Linting to `ruff` (#1166) · 65b8761d

Baber Abbasi authored Dec 20, 2023

* add ruff and isort. remove black and flake8

* remove unnecessary dependencies

* remove dependency from table

* change order

* ran ruff

* check 3.9

* exclude evaluator

* update CI workflow

* use ruff config in pyproject.toml

* test

* add isort rules to ruff

* sort imports

* import `make_table`

* try stages for no-commit-to-branch

* turn on mypy for pre-commit

* test

* test

* test

* change no-commit-to-branch to default

* nits

* fixed dependency

65b8761d