Commits · 4eecbabb78c96941d011aecc44adcddc8a672736 · gaoqiong / lm-evaluation-harness

13 Sep, 2024 1 commit

Multimodal prototyping (#2243) · fb963f0f

Lintang Sutawika authored Sep 13, 2024



* add WIP hf vlm class

* add doc_to_image

* add mmmu tasks

* fix merge conflicts

* add lintang's changes to hf_vlms.py

* fix doc_to_image

* added yaml_path for config-loading

* revert

* add line to process str type v

* update

* modeling cleanup

* add aggregation for mmmu

* rewrite MMMU processing code based on only MMMU authors' repo (doc_to_image still WIP)

* implemented doc_to_image

* update doc_to_image to accept list of features

* update functions

* readd image processed

* update args process

* bugfix for repeated images fed to model

* push WIP loglikelihood code

* commit most recent code (generative ; qwen2-vl testing)

* preliminary image_token_id handling

* small mmmu update: some qs have >4 mcqa options

* push updated modeling code

* use processor.apply_chat_template

* add mathvista draft

* nit

* nit

* ensure no footguns in text<>multimodal LM<>task incompatibility

* add notification to readme regarding launch of prototype!

* fix compatibility check

* reorganize mmmu configs

* chat_template=None

* add interleave chat_template

* add condition

* add max_images; interleave=true

* nit

* testmini_mcq

* nit

* pass image string; convert img

* add vllm

* add init

* vlm add multi attr

* fixup

* pass max images to vllm model init

* nit

* encoding to device

* fix HFMultimodalLM.chat_template ?

* add mmmu readme

* remove erroneous prints

* use HFMultimodalLM.chat_template ; restore tasks/__init__.py

* add docstring for replace_placeholders in utils

* fix `replace_placeholders`; set image_string=None

* fix typo

* cleanup + fix merge conflicts

* update MMMU readme

* del mathvista

* add some sample scores

* Update README.md

* add log msg for image_string value

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: Baber Abbasi <baber@eleuther.ai>
Co-authored-by: Baber <baber@hey.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

fb963f0f

15 Aug, 2024 1 commit
- Update citation in README.md (#2083) · cbdc3539
  Anton Polishko authored Aug 15, 2024
```
Bumped citation to the v0.4.3
```
  cbdc3539
05 Aug, 2024 2 commits

Update README.md (#2186) · cddce0a1
Hailey Schoelkopf authored Aug 05, 2024

cddce0a1

Dp and mp support (#2056) · 0ce7734d

Nathan Habib authored Aug 05, 2024

* batch commit

* :Revert "batch commit"

This reverts commit d859d1ca

.

* batch commit

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* linting

* add doc

* Update lm_eval/models/huggingface.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/huggingface.py

* linter

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* style

* remove prepare

* fix

* style

* last check

* Update lm_eval/models/huggingface.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

---------
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: clementine@huggingface.co <clementine@huggingface.co>

0ce7734d

29 Jul, 2024 1 commit

bugfix and docs for API (#2139) · b70af4f5

Baber Abbasi authored Jul 29, 2024



* encoding bugfix

* encoding bugfix

* overload logliklehood rather than loglikehood_tokens

* add custom tokenizer

* add docs

* Update API_guide.md

fix link; add note

* Update API_guide.md

typo

* pre-commit

* add link in readme

* nit

* nit

* nit

* Update API_guide.md

nits

* Update API_guide.md

* Update API_guide.md

* Update API_guide.md

* Update API_guide.md

* Update README.md

* Update docs/API_guide.md

* Update docs/API_guide.md

* Update API_guide.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

b70af4f5

22 Jul, 2024 1 commit

Refactor API models (#2008) · 42dc2448

Baber Abbasi authored Jul 23, 2024



* refactor pad_token handling to fn

* fix docs

* add pad_token_handling to vllm

* start on API superclass

* don't detokenize the returned logits

* streamline vllm tokenizer

* add type hint

* pre-commit

* seems to be in working order

* add model to init

* refactor api models

* nit

* cleanup

* add pbar

* fix type hints

* change optional dependencies

* json encode chat template

* add type hints

* deal with different prompt input requiremnts

* nits

* fix

* cache inside async

* fix

* fix

* nits

* nits

* nits

* nit

* fixup

* fixup

* nit

* add dummy retry

* add dummy retry

* handle imports; skip failing test

* add type hint

* add tests

* add dependency to tests

* add package names to exception

* nit

* docs; type hints

* handle api key

* nit

* tokenizer bug

* fix tokenizer

* nit

* nit

* add better error messages

* nit

* remove decorator

* CI: install api dep

* revert evaluator.py

* consolidate

* consolidate

* nits

* nit

* fix typealias

* nit

* nit

* nit

* Update lm_eval/models/api_models.py

typo
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/openai_completions.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/anthropic_llms.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/api_models.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fix typo

* add news section

* add info for API

* pre-commit

* typo

* fix bug: unpack logliklehood requests

* fix bug: shared gen_kwargs mutated

* nit: handle copy properly

* Update README.md

* Update README.md

* Update README.md

* Update api_models.py

* Update README.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

42dc2448

08 Jul, 2024 1 commit

Easier unitxt tasks loading and removal of unitxt library dependancy (#1933) · ad80f555

Elron Bandel authored Jul 08, 2024



* Updated unitxt loading
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

* Revert change to general Readme
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

* Adjust fda,squadv2,squad_completion and swde to work accept config in the constructor
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

* Fix scrolls
Signed-off-by: elronbandel <elron.bandel@ibm.com>

* Update documentation
Signed-off-by: elronbandel <elron.bandel@ibm.com>

* Enforce backward compatability
Signed-off-by: elronbandel <elron.bandel@ibm.com>

* Format unitxt class
Signed-off-by: elronbandel <elron.bandel@ibm.com>

---------
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
Signed-off-by: elronbandel <elron.bandel@ibm.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

ad80f555

03 Jul, 2024 1 commit

Adds Open LLM Leaderboard Taks (#2047) · 3c8db1bb

Nathan Habib authored Jul 03, 2024



* adds leaderboard tasks

* Delete lm_eval/tasks/leaderboard/leaderboard_chat_template.yaml

* add readme

* Delete lm_eval/tasks/leaderboard/mmlu_pro/mmlu_pro_chat_template.yaml

* modify readme

* fix bbh task

* fix bbh salient task

* modify the readme

* Delete lm_eval/tasks/leaderboard/ifeval/README.md

* Delete lm_eval/tasks/leaderboard/math/README.md

* add leaderboard to the tasks repertory

* add anouncment about new leaderbaord tasks

* linting

* Update README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* installs ifeval dependency in new_task github workflow

---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

3c8db1bb

03 Jun, 2024 1 commit

Complete task list from pr 1727 (#1901) · 3e500e9d

anthony-dipofi authored Jun 03, 2024



* added tasks and task family descriptors

* continue work on task list w/ links; slightly reorganize README

* Apply suggestions from code review

* Rename file so that it'll preview in Github when viewing lm_eval/tasks folder

* Update new_task_guide.md

* Update README.md

* run linter

* Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs

* fix typo

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* apply format

---------
Co-authored-by: Harish Vadaparty <harishvadaparty@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

3e500e9d

31 May, 2024 1 commit

Add dataset card when pushing to HF hub (#1898) · f4f59251

KonradSzafer authored May 31, 2024



* dataset card initial

* few fixes

* adds groups for math, mmlu, gpqa

* added summary agrs

* moved sanitize_list to utils

* readme update

* recreate metadata moved

* multiple model support

* results latest split fix

* readme update and small refactor

* fix grouping

* add comments

* added pathlib

* corrected pathlib approach

* check whether to create a metadata card

* convert posix paths to str

* default hf org from token

* hf token value error

* Add logs after successful upload

* logging updates

* dataset card example in the readme

---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Alina Lozovskaia <alinailozovskaya@gmail.com>

f4f59251

07 May, 2024 2 commits

Initial integration of the Unitxt to LM eval harness (#1615) · 885f48d6

Yoav Katz authored May 08, 2024

* Initial support for Unitxt datasets in LM Eval Harness

See  https://github.com/IBM/unitxt



The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file.

The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'.

* Added dataset loading check to generate_yaml

Improved error messages.

* Speed up generate_yaml

Added printouts and improved error message

* Added output printout

* Simplified integration of unitxt datasets

Store all the common yaml configuration in a yaml include shared by all datasets of the same task.

* Post code review comments - part 1

1. Made sure include files don't end wth 'yaml' so they won't be marked as tasks
2. Added more datasets and tasks (NER, GEC)
3. Added README

* Post code review comments - part 2

1. Added install unitxt install option in pyproject.toml:
pip install 'lm_eval[unitxt]'
2. Added a check that unitxt is installed and print a clear error message if not

* Commited missing pyproject change

* Added documentation on adding datasets

* More doc changes

* add unitxt extra to readme

* run precommit

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

885f48d6

link to the example output on the hub (#1798) · 20be169b
KonradSzafer authored May 07, 2024

20be169b

05 May, 2024 1 commit
- Fix README: change`----hf_hub_log_args` to `--hf_hub_log_args` (#1776) · 297966f7
  Muhammad Bin Usman authored May 06, 2024
```
fix `----hf_hub_log_args` to `--hf_hub_log_args`
```
  297966f7
03 May, 2024 1 commit

evaluation tracker implementation (#1766) · 59cf408a

KonradSzafer authored May 03, 2024

* evaluation tracker implementation

* OVModelForCausalLM test fix

* typo fix

* moved methods args

* multiple args in one flag

* loggers moved to dedicated dir

* improved filename sanitization

59cf408a

25 Apr, 2024 1 commit
- reference `--tasks list` in README (#1726) · 80a056bb
  Brian Vaughan authored Apr 25, 2024
```
https://github.com/EleutherAI/lm-evaluation-harness/issues/1698
```
  80a056bb
16 Apr, 2024 2 commits

Add `neuralmagic` models for `sparseml` and `deepsparse` (#1674) · 8b326be7

Michael Goin authored Apr 16, 2024



* Add neuralmagic models for SparseML and DeepSparse

* Update to latest and add test

* Format

* Fix list to List

* Format

* Add deepsparse/sparseml to automated testing

* Update pyproject.toml

* Update pyproject.toml

* Update README

* Fixes for dtype and device

* Format

* Fix test

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Address review comments!

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

8b326be7

Add delta weights model loading (#1712) · 12a165d1

KonradSzafer authored Apr 16, 2024

* added delta weights

* removed debug

* readme update

* better error handling

* autogptq warn

* warn update

* peft and delta error, explicitly deleting _model_delta

* linter fix

12a165d1

08 Apr, 2024 1 commit
- Update README.md (#1680) · 7852985b
  Hailey Schoelkopf authored Apr 08, 2024
  
  7852985b
05 Apr, 2024 1 commit

Anthropic Chat API (#1594) · 27924d77

Seungwoo Ryu authored Apr 06, 2024



* claude3

* supply for anthropic claude3

* supply for anthropic claude3

* anthropic config changes

* add callback options on anthropic

* line passed

* claude3 tiny change

* help anthropic installation

* mention sysprompt / being careful with format in readme

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

27924d77

26 Mar, 2024 1 commit

Integration of NeMo models into LM Evaluation Harness library (#1598) · e9d429e1

Sergio Perez authored Mar 26, 2024

* Integration of NeMo models into LM Evaluation Harness library

* rename nemo model as nemo_lm

* move nemo section in readme after hf section

* use self.eot_token_id in get_until()

* improve progress bar showing loglikelihood requests

* data replication or tensor/pipeline replication working fine within one node

* run pre-commit on modified files

* check whether dependencies are installed

* clarify usage of torchrun in README

e9d429e1

25 Mar, 2024 1 commit
- Add vLLM FAQs to README (#1625) (#1633) · a97fde23
  Hailey Schoelkopf authored Mar 25, 2024
  
  a97fde23
15 Mar, 2024 1 commit
- Fix README section on vllm integration (#1579) · 7d9922c8
  Eitan Turok authored Mar 15, 2024
```
* Link to vllm integration

* add pip install .[vllm] cmd
```
  7d9922c8
01 Mar, 2024 1 commit

modify `WandbLogger` to accept arbitrary kwargs (#1491) · ae79b121

Baber Abbasi authored Mar 01, 2024

* make `WandbLogger` init args optional

* nit

* nit

* nit

* move import warning to `WandbLogger`

* nit

* update docs

* nit

ae79b121

22 Feb, 2024 1 commit

feat: Add Weights and Biases support (#1339) · 2683fbbb

Ayush Thakur authored Feb 23, 2024



* add wandb as extra dependency

* wandb metrics logging

* refactor

* log samples as tables

* fix linter

* refactor: put in a class

* change dir

* add panels

* log eval as table

* improve tables logging

* improve reports logging

* precommit run

* ruff check

* handle importing reports api gracefully

* ruff

* compare results

* minor pre-commit fixes

* build comparison report

* ruff check

* log results as artifacts

* remove comparison script

* update dependency

* type annotate and docstring

* add example

* update readme

* fix typo

* teardown

* handle outside wandb run

* gracefully fail reports creation

* precommit checks

* add report url to summary

* use wandb  printer for better url stdout

* fix ruff

* handle N/A and groups

* fix eval table

* remove unused var

* update wandb version req + disable reports stdout

* remove reports feature to TODO

* add label to multi-choice question data

* log model predictions

* lints

* loglikelihood_rolling

* log eval result for groups

* log tables by group for better handling

* precommit

* choices column for multi-choice

* graciously fail wandb

* remove reports feature

* track system metrics + total eval time + stdout

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>

2683fbbb

06 Feb, 2024 2 commits

adding hf_transfer (#1400) · 756eeb6f

Michael Feil authored Feb 06, 2024



* add hf_transfer

* update dependencies

* Delete stale `[linting]` extra

* Update README.md with extras table

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

756eeb6f

Fix confusing `write_out.py` instructions in README (#1371) · df01adf6
Hailey Schoelkopf authored Feb 06, 2024

df01adf6

05 Feb, 2024 1 commit

Support for Inf2 optimum class [WIP] (#1364) · d17dcea0

Michael Feil authored Feb 05, 2024

* initial commit

* remove overwrite bs

* adding neuronx dependencies

* Update README.md

* update neuronx

d17dcea0

01 Feb, 2024 1 commit

Expand docs, update CITATION.bib (#1227) · f5408b6b

Hailey Schoelkopf authored Feb 01, 2024



* Update CITATION.bib

* Create CONTRIBUTING.md

* add disclaimer re: multi node

* flesh out some sections more

* Flesh out contributor guide

* revert CITATION.bib

* appease pre-commit

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>

f5408b6b

31 Jan, 2024 1 commit

add bypass metric (#1156) · f8203de1

Baber Abbasi authored Feb 01, 2024

* add bypass metric

* fixed `bypass` metric.

* add task attributes if predict_only

* add `predict_only` checks

* add docs

* added `overide_metric`, `override_config` to `Task`

* nits

* nit

* changed --predict_only to generations; nits

* nits

* nits

* change gen_kwargs warning

* add note about `--predict_only` in README.md

* added `predict_only`

* move table to bottom

* nit

* change null aggregation to bypass (conflict)

* bugfix; default `temp=0.0`

* typo

f8203de1

26 Jan, 2024 1 commit

Add causalLM OpenVino models (#1290) · 97a67d27

NoushNabi authored Jan 26, 2024



* added intel optimum

* added intel optimum in readme

* modified intel optimum

* modified intel optimum

* modified intel optimum

* modified install optimum

* modified path of IR file

* added openvino_device

* added openvino_device2

* changed optimum-causal to openvino-causal

* Update README.md

* Update README.md

* remove `lm_eval.base` import

* update openvino-causal -> openvino ; pass device through super().__init__()

* Update README.md

* Add optimum to tests dependencies

* apply pre-commit

* fix so tests pass

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

97a67d27

25 Jan, 2024 1 commit
- Add FAQ on `lm_eval.tasks.initialize_tasks()` to README (#1330) · 52f48e8c
  Hailey Schoelkopf authored Jan 25, 2024
```
* Update README.md

* [!Tip]
```
  52f48e8c
23 Jan, 2024 1 commit

Don't use `get_task_dict()` in task registration / initialization (#1331) · 969b48bf

Hailey Schoelkopf authored Jan 23, 2024



* don't use get_task_dict() as a helper, it will download the dataset!

* pre-commit

* Update README.md

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>

969b48bf

22 Jan, 2024 2 commits

fix a trailing whitespace that breaks a lint job (#1335) · 84357a46
Brian Vaughan authored Jan 22, 2024

84357a46

Add `local-completions` support using OpenAI interface (#1277) · 5c25dd55

Michael Goin authored Jan 22, 2024



* Add `local-completions` support using OpenAI interface

* Refactor oa_completion

* Address tokenizer comments and change request chunks to batch size

* Add warning message for tiktoken backend

* fix formatting

* fix whitespace

* Update README.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

5c25dd55

16 Jan, 2024 1 commit

Update README.md with custom integration doc (#1298) · ada4a31d

Mark Saroufim authored Jan 16, 2024



* Update README.md

* punctuation

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

ada4a31d

15 Jan, 2024 2 commits

Re-add citation · 39a465ca

Stella Biderman authored Jan 15, 2024

It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in.

39a465ca

Make `parallelize=True` vs. `accelerate launch` distinction clearer in docs (#1261) · 39e7b264
Hailey Schoelkopf authored Jan 15, 2024
```
* Make parallelize=True distinction clearer in documentation.

* run linter
```
39e7b264

11 Jan, 2024 1 commit
- Update README.md · eed2d3a6
  Stella Biderman authored Jan 11, 2024
  
  eed2d3a6
08 Jan, 2024 1 commit

Revert citation (#1257) · ecb1df28

Stella Biderman authored Jan 08, 2024

Over a dozen papers have used the updated citation block, but Google Scholar has noticed none of them. Since it does understand this citation, I think we should use it going forward until we have a way to ensure the newer citations are actually logged.

ecb1df28

30 Dec, 2023 1 commit
- Update README.md (#1195) · 1229862a
  Anjor Kanekar authored Dec 30, 2023
  
  1229862a