Commits · 4eecbabb78c96941d011aecc44adcddc8a672736 · gaoqiong / lm-evaluation-harness

13 Sep, 2024 1 commit

Multimodal prototyping (#2243) · fb963f0f

Lintang Sutawika authored Sep 13, 2024



* add WIP hf vlm class

* add doc_to_image

* add mmmu tasks

* fix merge conflicts

* add lintang's changes to hf_vlms.py

* fix doc_to_image

* added yaml_path for config-loading

* revert

* add line to process str type v

* update

* modeling cleanup

* add aggregation for mmmu

* rewrite MMMU processing code based on only MMMU authors' repo (doc_to_image still WIP)

* implemented doc_to_image

* update doc_to_image to accept list of features

* update functions

* readd image processed

* update args process

* bugfix for repeated images fed to model

* push WIP loglikelihood code

* commit most recent code (generative ; qwen2-vl testing)

* preliminary image_token_id handling

* small mmmu update: some qs have >4 mcqa options

* push updated modeling code

* use processor.apply_chat_template

* add mathvista draft

* nit

* nit

* ensure no footguns in text<>multimodal LM<>task incompatibility

* add notification to readme regarding launch of prototype!

* fix compatibility check

* reorganize mmmu configs

* chat_template=None

* add interleave chat_template

* add condition

* add max_images; interleave=true

* nit

* testmini_mcq

* nit

* pass image string; convert img

* add vllm

* add init

* vlm add multi attr

* fixup

* pass max images to vllm model init

* nit

* encoding to device

* fix HFMultimodalLM.chat_template ?

* add mmmu readme

* remove erroneous prints

* use HFMultimodalLM.chat_template ; restore tasks/__init__.py

* add docstring for replace_placeholders in utils

* fix `replace_placeholders`; set image_string=None

* fix typo

* cleanup + fix merge conflicts

* update MMMU readme

* del mathvista

* add some sample scores

* Update README.md

* add log msg for image_string value

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: Baber Abbasi <baber@eleuther.ai>
Co-authored-by: Baber <baber@hey.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

fb963f0f

04 Sep, 2024 1 commit

Chat Template fix (cont. #2235) (#2269) · 7a1614eb

Baber Abbasi authored Sep 04, 2024



* default chat template method fix

* move chat_template to TemplateLM

* remove hotfix

* handle openai `chat_template`

* Update lm_eval/api/model.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* add 'max_tokens' to gen_kwargs

* pre-commit

---------
Co-authored-by: KonradSzafer <szafer.konrad@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

7a1614eb

30 Aug, 2024 2 commits

hotfix #2262 (#2264) · 928e8bb6

Baber Abbasi authored Aug 30, 2024

* max_length - 1 (generation always >= 1)

* vllm: fix rolling prefix_token

* nit: add comment

* fixup! max_length should be handled for logliklihoods

* Revert "fixup! max_length should be handled for logliklihoods"

This reverts commit 432d1a3b754c117c3a54ea2fe792ab3a1bd09ed3.

928e8bb6

API: fix maxlen; vllm: prefix_token_id bug (#2262) · b31f92e8

Baber Abbasi authored Aug 30, 2024

* max_length - 1 (generation always >= 1)

* vllm: fix rolling prefix_token

* nit: add comment

* fixup! max_length should be handled for logliklihoods

b31f92e8

28 Aug, 2024 1 commit

Fix `loglikelihood_rolling` caching ( #1821 ) (#2187) · 8138fd52

Hailey Schoelkopf authored Aug 28, 2024



* fix revision type

* allow for None-input loglikelihood reqs to be cached

* handle no remaining cache items

* pre-commit

* change cache_hook.add_partial(loglikelihood_rolling...) convention

---------
Co-authored-by: Baber Abbasi <baber@eleuther.ai>

8138fd52

22 Aug, 2024 1 commit
- Fix logging when resizing embedding layer in peft mode (#2239) · e9287fce
  Wessel Poelman authored Aug 22, 2024
  
  e9287fce
20 Aug, 2024 1 commit

Add multiple chat template (#2129) · 3740a5d2

KonradSzafer authored Aug 20, 2024



* multiple chat template support

* help doc update

* add transformers link to docstring

* model args update

* comment update

* statement simplification

* simplified chat_template property

* docs update

* removed template arg from HFLM class

* interface doc update

* model guide update

* interface doc update

* reuse apply_chat_template variable

* model guide refactor

* interface doc update

* removed old definition

* last nits

* last nits

* last nits

* better wording

* last nits

* Remove unnecessary Optional

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* return variable rename

---------
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

3740a5d2

05 Aug, 2024 3 commits

fix revision type (#2184) · 7ff13e9e
Hailey Schoelkopf authored Aug 05, 2024

7ff13e9e
[hotfix] API: messages were created twice (#2174) · 8cffa29b
Baber Abbasi authored Aug 05, 2024

8cffa29b

Dp and mp support (#2056) · 0ce7734d

Nathan Habib authored Aug 05, 2024

* batch commit

* Revert "batch commit"

This reverts commit d859d1ca.

* batch commit

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* linting

* add doc

* Update lm_eval/models/huggingface.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/huggingface.py

* linter

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* style

* remove prepare

* fix

* style

* last check

* Update lm_eval/models/huggingface.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

---------
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: clementine@huggingface.co <clementine@huggingface.co>

0ce7734d

01 Aug, 2024 1 commit
- [Bugfix] add temperature=0 to logprobs and seed args to API models (#2149) · 63e76e89
  Baber Abbasi authored Aug 01, 2024
```
* add temperature for log probs

* add seed

* nit

* add new args to test

* added warning for api chat models
```
  63e76e89
29 Jul, 2024 1 commit

bugfix and docs for API (#2139) · b70af4f5

Baber Abbasi authored Jul 29, 2024



* encoding bugfix

* encoding bugfix

* overload loglikelihood rather than loglikelihood_tokens

* add custom tokenizer

* add docs

* Update API_guide.md

fix link; add note

* Update API_guide.md

typo

* pre-commit

* add link in readme

* nit

* nit

* nit

* Update API_guide.md

nits

* Update API_guide.md

* Update API_guide.md

* Update API_guide.md

* Update API_guide.md

* Update README.md

* Update docs/API_guide.md

* Update docs/API_guide.md

* Update API_guide.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

b70af4f5

22 Jul, 2024 1 commit

Refactor API models (#2008) · 42dc2448

Baber Abbasi authored Jul 23, 2024



* refactor pad_token handling to fn

* fix docs

* add pad_token_handling to vllm

* start on API superclass

* don't detokenize the returned logits

* streamline vllm tokenizer

* add type hint

* pre-commit

* seems to be in working order

* add model to init

* refactor api models

* nit

* cleanup

* add pbar

* fix type hints

* change optional dependencies

* json encode chat template

* add type hints

* deal with different prompt input requirements

* nits

* fix

* cache inside async

* fix

* fix

* nits

* nits

* nits

* nit

* fixup

* fixup

* nit

* add dummy retry

* add dummy retry

* handle imports; skip failing test

* add type hint

* add tests

* add dependency to tests

* add package names to exception

* nit

* docs; type hints

* handle api key

* nit

* tokenizer bug

* fix tokenizer

* nit

* nit

* add better error messages

* nit

* remove decorator

* CI: install api dep

* revert evaluator.py

* consolidate

* consolidate

* nits

* nit

* fix typealias

* nit

* nit

* nit

* Update lm_eval/models/api_models.py

typo
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/openai_completions.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/anthropic_llms.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/models/api_models.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fix typo

* add news section

* add info for API

* pre-commit

* typo

* fix bug: unpack loglikelihood requests

* fix bug: shared gen_kwargs mutated

* nit: handle copy properly

* Update README.md

* Update README.md

* Update README.md

* Update api_models.py

* Update README.md

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

42dc2448

15 Jul, 2024 1 commit
- make recurrent_gemma model types included in the force-BOS case (#2105) · 9884ad6e
  Hailey Schoelkopf authored Jul 15, 2024
  
  9884ad6e
02 Jul, 2024 1 commit
- update gemma-2 default BOS behavior (#2049) · 67a990e7
  Hailey Schoelkopf authored Jul 01, 2024
  
  67a990e7
28 Jun, 2024 1 commit

Add chat template to `vllm` (#2034) · cc2d3463

Baber Abbasi authored Jun 28, 2024



* add chat template

* refactor token padding

* nit

* nit

* check on failing test

* check transformers version

* remove transformers pin

* add ids to test

* nit

* fixup

* fix bos bug

* nit

* fixup! fix bos bug

* increase tolerance for table test

* don't detokenize vllm logprobs

* Update lm_eval/models/utils.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* pre-commit run --all-files

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

cc2d3463

18 Jun, 2024 1 commit
- Fix self assignment in neuron_optimum.py (#1990) · bdb78d22
  LSinev authored Jun 18, 2024
  
  bdb78d22
13 Jun, 2024 2 commits

Fix `--gen_kwargs` and VLLM (`temperature` not respected) (#1800) · 5c7cba23
Hailey Schoelkopf authored Jun 13, 2024
```
* Update vllm_causallms.py

* adjust

---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
```
5c7cba23

`samples` is newline delimited (#1930) · 3850e21a

Baber Abbasi authored Jun 13, 2024



* `samples` is newline delimited

* updated git and pre-commit

* appease pre-commit

* nit

* Revert back for now

* Revert for now

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>

3850e21a

12 Jun, 2024 1 commit
- Fix self.max_tokens in anthropic_llms.py (#1848) · 793469e0
  Nikita Lozhnikov authored Jun 12, 2024
```
Fix bug where `self.max_tokens` was not set
```
  793469e0
11 Jun, 2024 1 commit
- add hacky add_bos_token forcing for Gemma to VLLM too (#1857) · b3e4c49a
  Hailey Schoelkopf authored Jun 11, 2024
  
  b3e4c49a
03 Jun, 2024 1 commit

Add chat template (#1873) · 070d31df

KonradSzafer authored Jun 03, 2024



* initial chat template

* tokenizer attribute check

* variable rename

* interface update

* system instruction

* system inst default update

* fewshot as multiturn

* typing update

* indent update

* added comments

* Adding a fewshot in a more readable way

* linting

* Moved apply chat template to LM

* multiturn alternation fix

* cache key update

* apply chat template method fix

* add system prompt hash to cache_key

* tokenizer name property for cache_key

* property name fix

* linting backward compatibility fix

* docs and errors update

* add documentation on adding chat template compatibility to model_guide

* fewshot as multiturn check fix

* saving system inst and chat template in results

* eval tracker update

* docs update

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

070d31df

30 May, 2024 1 commit

[HFLM]Add support for Ascend NPU (#1886) · 8f716817

Huazhong Ji authored May 31, 2024



* [HFLM]Add support for Ascend NPU
Co-authored-by: jiaqiw09 <jiaqiw960714@gmail.com>
Co-authored-by: zhabuye <2947436155@qq.com>

* bump accelerate dependency version to 0.26.0 for NPU compat.

---------
Co-authored-by: jiaqiw09 <jiaqiw960714@gmail.com>
Co-authored-by: zhabuye <2947436155@qq.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

8f716817

28 May, 2024 1 commit
- Updated vllm imports in vllm_causallms.py (#1890) · b4cd85d4
  Michael Goin authored May 28, 2024
```
* Reorder vllm imports in vllm_causallms.py

* Update vllm_causallms.py
```
  b4cd85d4
24 May, 2024 1 commit
- [HFLM]Use Accelerate's API to reduce hard-coded CUDA code (#1880) · c4c15917
  Huazhong Ji authored May 24, 2024
  
  c4c15917
23 May, 2024 1 commit
- Unpin vllm in dependencies (#1874) · 5711ab87
  Edward Gan authored May 23, 2024
  
  5711ab87
19 May, 2024 1 commit

Fix: support PEFT/LoRA with added tokens (#1828) · 86319a9b

Nick Doiron authored May 19, 2024



* resize model embeddings

* resize only

* tokenizer help

* load tokenizer before model

* add comment and run precommit lint

* Add log message
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

86319a9b

07 May, 2024 2 commits
- Logging Updates (Alphabetize table printouts, fix eval tracker bug) (#1774) (#1791) · d4a913c4
  Hailey Schoelkopf authored May 07, 2024
```
* fix auto-batch size bug for seq2seq models

* alphabetize task + group tables ; fix eval tracker bug

* fix eval tracker bug
```
  d4a913c4
- Fix Caching Tests ; Remove `pretrained=gpt2` default (#1775) · 7fe2b93c
  Hailey Schoelkopf authored May 07, 2024
  
  7fe2b93c
05 May, 2024 2 commits
- Fix bug in setting until kwarg in openai completions (#1784) · 30c060d2
  ciaranby authored May 05, 2024
  
  30c060d2
- remove echo parameter in OpenAI completions API (#1779) · c34986da
  kwrobel.eth authored May 05, 2024
```
* remove echo parameter in OpenAI completions API

* remove context length parameter doc string
```
  c34986da
03 May, 2024 1 commit

evaluation tracker implementation (#1766) · 59cf408a

KonradSzafer authored May 03, 2024

* evaluation tracker implementation

* OVModelForCausalLM test fix

* typo fix

* moved methods args

* multiple args in one flag

* loggers moved to dedicated dir

* improved filename sanitization

59cf408a

02 May, 2024 2 commits
- Add option to set OpenVINO config (#1730) · e6394715
  Helena Kloosterman authored May 02, 2024
```
* Add option to set OpenVINO config

* Use utils.eval_logger for logging
```
  e6394715
- vllm lora support (#1756) · 83fd78a2
  bcicc authored May 02, 2024
```
* vllm lora support

* remove print

* version check, rename lora kwarg
```
  83fd78a2
18 Apr, 2024 1 commit
- fix error when appending eot_token_id for generate_until tasks (#1699) · dc5eba86
  Sergio Perez authored Apr 18, 2024
  
  dc5eba86
16 Apr, 2024 2 commits

Add `neuralmagic` models for `sparseml` and `deepsparse` (#1674) · 8b326be7

Michael Goin authored Apr 16, 2024



* Add neuralmagic models for SparseML and DeepSparse

* Update to latest and add test

* Format

* Fix list to List

* Format

* Add deepsparse/sparseml to automated testing

* Update pyproject.toml

* Update pyproject.toml

* Update README

* Fixes for dtype and device

* Format

* Fix test

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Address review comments!

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

8b326be7

Add delta weights model loading (#1712) · 12a165d1

KonradSzafer authored Apr 16, 2024

* added delta weights

* removed debug

* readme update

* better error handling

* autogptq warn

* warn update

* peft and delta error, explicitly deleting _model_delta

* linter fix

12a165d1

05 Apr, 2024 1 commit

Anthropic Chat API (#1594) · 27924d77

Seungwoo Ryu authored Apr 06, 2024



* claude3

* supply for anthropic claude3

* supply for anthropic claude3

* anthropic config changes

* add callback options on anthropic

* line passed

* claude3 tiny change

* help anthropic installation

* mention sysprompt / being careful with format in readme

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

27924d77

01 Apr, 2024 1 commit

Fix CLI --batch_size arg for openai-completions/local-completions (#1656) · 9516087b

Michael Goin authored Apr 01, 2024

The OpenAI interface supports batch size as an argument to the completions API, but specifying it on the CLI (e.g. `lm_eval --model openai-completions --batch_size 16 ...`) does not work, because the value is never converted from str to int.

This is confirmed by my usage and the stack trace from running `OPENAI_API_KEY=dummy lm_eval --model local-completions --tasks gsm8k --batch_size 16 --model_args model=nm-testing/zephyr-beta-7b-gptq-g128,tokenizer_backend=huggingface,base_url=http://localhost:8000/v1`:
```
Traceback (most recent call last):
  File "/home/michael/venv/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/home/michael/code/lm-evaluation-harness/lm_eval/__main__.py", line 341, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/home/michael/code/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/michael/code/lm-evaluation-harness/lm_eval/evaluator.py", line 251, in simple_evaluate
    results = evaluate(
  File "/home/michael/code/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/michael/code/lm-evaluation-harness/lm_eval/evaluator.py", line 390, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
  File "/home/michael/code/lm-evaluation-harness/lm_eval/models/openai_completions.py", line 263, in generate_until
    list(sameuntil_chunks(re_ord.get_reordered(), self.batch_size)),
  File "/home/michael/code/lm-evaluation-harness/lm_eval/models/openai_completions.py", line 251, in sameuntil_chunks
    if len(ret) >= size or x[1] != lastuntil:
TypeError: '>=' not supported between instances of 'int' and 'str'
```
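
The failure in the traceback can be reproduced in isolation. The sketch below is not the harness code: `chunk_by_size` is a simplified, hypothetical stand-in for `sameuntil_chunks`, and `coerce_batch_size` is an assumed helper name used only to show the str->int conversion described above.

```python
def chunk_by_size(items, size):
    """Yield lists of at most `size` items.

    Hypothetical, simplified stand-in for `sameuntil_chunks`; it keeps only
    the `len(ret) >= size` comparison that appears in the traceback.
    """
    ret = []
    for x in items:
        if len(ret) >= size:  # raises TypeError when `size` is a str
            yield ret
            ret = []
        ret.append(x)
    if ret:
        yield ret


def coerce_batch_size(batch_size):
    """Convert a CLI-supplied batch size (str or int) to int (assumed helper)."""
    return int(batch_size)


# A value taken straight from the command line arrives as a string.
cli_value = "16"

error_message = None
try:
    list(chunk_by_size(range(40), cli_value))
except TypeError as err:
    error_message = str(err)

# After conversion the same call chunks the requests as intended.
chunks = list(chunk_by_size(range(40), coerce_batch_size(cli_value)))
```

Converting once, where the model object stores its batch size, keeps every later comparison against `len(...)` working on integers.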

9516087b

27 Mar, 2024 1 commit
- Fix conditional import for Nemo LM class (#1641) · 0dffdbb4
  Hailey Schoelkopf authored Mar 27, 2024
  
  0dffdbb4