Commits · v0.4.7 · gaoqiong / lm-evaluation-harness

17 Dec, 2024 1 commit
- increment version (#2574) · 4c26a9c1
  Baber Abbasi authored Dec 17, 2024
```
forgot to increment 0.4.6!
```
16 Dec, 2024 3 commits

fix `DeprecationWarning: invalid escape sequence '\s'` for whitespace filter (#2560) · 8d2f64c1

Baber Abbasi authored Dec 16, 2024

* fix `DeprecationWarning: invalid escape sequence '\s'`

* add type hints

* Revert "add type hints"

This reverts commit 15d8abc626a84e97f8c238ddfbf9e243d6f6eb5c.
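
Some background on the warning this commit fixes: Python does not recognize `\s` as an escape in an ordinary string literal, so writing `"\s"` makes the compiler emit `DeprecationWarning: invalid escape sequence '\s'`. Python 3.12 upgraded this to a `SyntaxWarning`. The usual fix is a raw string, which passes the backslash through to the regex engine unchanged. Below is a minimal sketch of a whitespace filter written that way. `collapse_whitespace` is a hypothetical name and does not come from the harness's filter code.

```python
import re

# Raw string: the regex engine, not the string-literal parser, interprets "\s".
WHITESPACE = re.compile(r"\s+")


def collapse_whitespace(text: str) -> str:
    """Hypothetical helper: collapse runs of whitespace to one space and trim."""
    return WHITESPACE.sub(" ", text).strip()


print(collapse_whitespace("  The answer\tis\n 42  "))  # The answer is 42
```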

batch `loglikelihood_rolling` across requests (#2559) · 0bfb0220

Baber Abbasi authored Dec 16, 2024

* batch all rolling token windows

* nit

* copy to vllm

* fix max_length for `get_rolling_token_windows`

* bugfix

* bugfix

* add type hints
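
The title says rolling windows are batched across requests, not within each request separately. The sketch below shows that flatten, score, and regroup idea. It assumes a hypothetical `score_batch` callable that returns one log-likelihood per window. It is not the harness's implementation, and it does not cover how `get_rolling_token_windows` builds the windows.

```python
from typing import Callable, List, Sequence, Tuple

Window = Tuple[int, ...]  # hypothetical: one window of token ids


def batched_rolling_loglikelihood(
    windows_per_request: Sequence[List[Window]],
    score_batch: Callable[[List[Window]], List[float]],
    batch_size: int,
) -> List[float]:
    # Flatten every request's windows, remembering which request each came from.
    flat: List[Tuple[int, Window]] = [
        (req_idx, w)
        for req_idx, windows in enumerate(windows_per_request)
        for w in windows
    ]
    totals = [0.0] * len(windows_per_request)
    # Score fixed-size batches; a single batch may mix windows from several requests.
    for start in range(0, len(flat), batch_size):
        chunk = flat[start : start + batch_size]
        scores = score_batch([w for _, w in chunk])
        # Regroup: a request's rolling log-likelihood is the sum over its windows.
        for (req_idx, _), s in zip(chunk, scores):
            totals[req_idx] += s
    return totals


def fake_scorer(batch: List[Window]) -> List[float]:
    # Stand-in scorer: pretend each token contributes -1.0 log-likelihood.
    return [-float(len(w)) for w in batch]


print(batched_rolling_loglikelihood([[(1, 2), (3,)], [(4, 5, 6)]], fake_scorer, batch_size=2))
# [-3.0, -3.0]
```

Because batches are filled from the flattened list, a request with only one or two windows no longer leaves a batch mostly empty.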

Adding new subtask to SCORE tasks: non greedy robustness (#2558) · 976d8a0b

Rima Shahbazyan authored Dec 16, 2024

* score readme added

* generate until task's "until" parameter's default value fixed.

* score mmlu-pro and agieval added

* changed macro accuracy to micro for agieval

* Always E removed from agi eval

* redundancies removed

* MATH added

* minor cosmetic changes for math

* Licenses added Readme updated

* changes for flake8 + license header on math

* Score added to readme and precommit was run.

* Score added to readme and precommit was run.

* Import error fixed

* math task bugfix
postprocess minor fix

* CR for math added

* math CR

* math task bugfix
postprocess minor fix

CR for math added

* Math cr fixed

* mmlu_pro non_greedy task added

* non greedy summarizer added

* Non greedy for all score tasks

* Bugfixes for non-greedy

* fixing the until argument

* undoing the change to "until" arguments default behaviour

* minor fix in summarizer

* log naming changes for better readability

* math subtasks naming fix

* agieval subtask naming fix

* logging added for debugging

* path issue fixed

* minor fix

* path fix

* path fix

* non_greedy_math minor fix

* final changes

* changed readme for non-greedy
added Nvidia header
added example script for non_greedy
changed prompts to match those of trt runs

* non greedy summarizer bugfix

* non_greedy summarizer fixed

14 Dec, 2024 1 commit
- add warning to readme (#2568) · 8de772f9
  Baber Abbasi authored Dec 14, 2024
```
* make warning prominent

* make warning prominent
```
13 Dec, 2024 1 commit

add optimum-intel ipex model (#2566) · 919470a1

Yao Matrix authored Dec 14, 2024

* initial support for optimum-intel ipex model. LM model as first step

* format
Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* pass dtype
Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* update README
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------
Signed-off-by: Yao Matrix <matrix.yao@intel.com>

09 Dec, 2024 2 commits

Update Lightning import (#2549) · 0b994433

Maanu Grover authored Dec 09, 2024

* update import
Signed-off-by: Maanu Grover <maanug@nvidia.com>

* run formatting

---------
Signed-off-by: Maanu Grover <maanug@nvidia.com>

[API] left truncate for generate_until (#2554) · 2d11f2e5
Baber Abbasi authored Dec 09, 2024
```
* left truncate for generate_until

* pre-commit
```
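
Left truncation keeps the end of the prompt, and for generation that is usually the part nearest to where the model continues. The sketch below assumes a token-list context and uses the hypothetical parameter names `max_length` and `max_gen_toks`. The API model's actual code may differ.

```python
from typing import List


def left_truncate(context_tokens: List[int], max_length: int, max_gen_toks: int) -> List[int]:
    """Hypothetical helper: keep only the last prompt tokens so that the
    prompt plus the generation budget fits within max_length."""
    budget = max_length - max_gen_toks
    if budget <= 0:
        raise ValueError("max_gen_toks must be smaller than max_length")
    # Negative slicing drops tokens from the left (the start of the prompt).
    return context_tokens[-budget:]


print(left_truncate(list(range(10)), max_length=8, max_gen_toks=3))  # [5, 6, 7, 8, 9]
```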

05 Dec, 2024 1 commit
- Update README.md (#2546) · bcb4cbf4
  fzyzcjy authored Dec 05, 2024
04 Dec, 2024 3 commits
- Support pipeline parallel with OpenVINO models (#2349) · 1f9bc88f
  Slawomir Strehlke authored Dec 04, 2024
```
* Handle pipeline_parallel parameter

* Add description of pipeline parallelism with OV models
```
- add better testing when `doc_to_text` ends in whitespace and `target_delimiter` is whitespace (#2535) · 6824d39d
  Baber Abbasi authored Dec 04, 2024
- Update README.md (#2534) · 4a12959f
  Baber Abbasi authored Dec 04, 2024
```
* Update README.md

add caching tip to readme

* Update README.md

add api link
```
03 Dec, 2024 2 commits
- avoid timeout errors with high concurrency in api_model (#2307) · 9632b343
  Trawinski, Dariusz authored Dec 03, 2024
```
* avoid timeout errors with high concurrency in api_model

* style

* add timeout

* add docs

---------
Co-authored-by: Baber <baber@hey.com>
```
- add Basque translation of PIQA (piqa_eu) to BasqueBench (#2531) · f49b0377
  Naiara Perez authored Dec 03, 2024
01 Dec, 2024 1 commit

Update Unitxt task to use locally installed unitxt and not download Unitxt code from Huggingface (#2514) · 1170ef9e

Yoav Katz authored Dec 01, 2024

* Moved to require unitxt installation and not download unitxt from HF hub.

This has performance benefits and simplifies the code.
Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Updated watsonx documentation

* Updated installation instructions

* Removed redundant comman

* Allowed unitxt tasks to generate chat APIs

Modified WatsonXI model to support chat apis

* Removed print

* Run precommit formatting

---------
Signed-off-by: Yoav Katz <katz@il.ibm.com>

30 Nov, 2024 1 commit
- make utility function to handle `until` (#2518) · 0230356c
  Baber Abbasi authored Nov 30, 2024
```
* make utility function to handle `until`

* fix text
```
29 Nov, 2024 1 commit
- skip casting if predict_only (#2524) · 9169899b
  Baber Abbasi authored Nov 29, 2024
28 Nov, 2024 1 commit

Filters bugfix; add `metrics` and `filter` to logged sample (#2517) · 5680a2e6

Baber Abbasi authored Nov 28, 2024

* allow !function filters

* bugfix

* nit

* add `filter` to logged samples

* add `filter` and `metric` to logged samples for identification

* convert `metric` to `metrics`: list

26 Nov, 2024 1 commit

Score tasks (#2452) · 0ef7548d

Rima Shahbazyan authored Nov 26, 2024

* score readme added

* generate until task's "until" parameter's default value fixed.

* score mmlu-pro and agieval added

* changed macro accuracy to micro for agieval

* Always E removed from agi eval

* redundancies removed

* MATH added

* minor cosmetic changes for math

* Licenses added Readme updated

* changes for flake8 + license header on math

* Score added to readme and precommit was run.

* Score added to readme and precommit was run.

* Import error fixed

* math task bugfix
postprocess minor fix

* CR for math added

* math CR

* math task bugfix
postprocess minor fix

CR for math added

* Math cr fixed

* reverting the default "until" parameter change and adjusting score task configs

22 Nov, 2024 1 commit
- parse tokenizer_backend=None properly (#2509) · 9d36354e
  Baber Abbasi authored Nov 22, 2024
20 Nov, 2024 1 commit
- Nits (#2500) · 867413f8
  Baber Abbasi authored Nov 20, 2024
```
* fix test task

* don't call lm.chat_template each time
```
18 Nov, 2024 3 commits

Add metabench task to LM Evaluation Harness (#2357) · 62b4364d

Kozzy Voudouris authored Nov 18, 2024

* Add metabench (Kipnis et al. 2024)

* Update metabench tasks for full replication of original benchmarks, using publicly available datasets

* Remove unnecessary import

* Add permute versions of each task, where the answer orders are randomly shuffled.

* Add metabench group for easier evaluations

* Fix mmlu counts after removing duplicate

* Add secondary datasets

* Fix f-string error

* Fix f-string error for permute processing

* Add original hash to outputs for easy matching to original results

* Add line break at end of utils files

* Remove extra line from winogrande

* Reformat for linters

* fix multiple input test

* appease pre-commit

* Add metabench to tasks README

* fix multiple input `test_doc_to_text`

---------
Co-authored-by: Baber <baber@hey.com>

remove duplicate `arc_ca` (#2499) · 8222ad0a
Baber Abbasi authored Nov 18, 2024

Add mamba hf to `mamba_ssm` (#2496) · 0f5dc265
Baber Abbasi authored Nov 18, 2024
```
* add hf mamba to mamba_lm

* fix _model_generate for hf
```

16 Nov, 2024 2 commits

kbl-v0.1.1 (#2493) · cbc31eb8

Wonseok Hwang authored Nov 17, 2024

* release kbl-v0.1

* fix linting

* remove rag tasks as doc_to_text functions cause trouble

* remove remaining rag tasks

* remove unnecessary repeat in yaml files and rag dataset in hf-hub

* remove unnecessary newline; introduce cfg files in lbox/kbl in hf

* Make task yaml files consistent to hf-datasets-config

* Make task yaml files consistent to hf-datasets-config

* Remove trailing empty space in doc-to-text

* Remove unnecessary yaml file

* Fix task naming error

* trailing space removed

update pre-commit hooks and git actions (#2497) · badf273a
Baber Abbasi authored Nov 16, 2024
```
* pre-commit update

* update github actions

* make logging less verbose

* fix artifacts
```

15 Nov, 2024 2 commits

Fix revision parameter to vllm get_tokenizer (#2492) · e20e1ddc
Oyvind Tafjord authored Nov 15, 2024

IBM watsonx_llm fixes & refactor (#2464) · 4259a6d4

Nikodem Szwast authored Nov 15, 2024

* refactor code, fix config path bug

* update types to be from typing lib

* add pre-commit formatting

* specify version of ibm_watsonx_ai package

* adjust get_watsonx_credentials() function, add minor refactor to address PR review comments

* change missing installation hint from ibm_watsonx_ai to lm_eval[ibm_watsonx_ai]

12 Nov, 2024 1 commit
- wandb logger fix, added pre-commit (#2484) · 67db63a5
  Alex Titterton authored Nov 12, 2024
11 Nov, 2024 2 commits

change warning to debug (#2481) · 6b628d9a
Baber Abbasi authored Nov 11, 2024

Fix chat template; fix leaderboard math (#2475) · 77c811ea

Baber Abbasi authored Nov 11, 2024

* batch commit

* Revert "batch commit"

This reverts commit d859d1ca.

* batch commit

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* checkout from main

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* Chat template fix (#7)

* cleanup

* cleanup

* cleanup

* linting

* fix tests

* add ifeval install to new_task CI

* Revert "add ifeval install to new_task CI"

This reverts commit 1d19449bb7fbfa05d51e7cd20950475eae533bf1.

* adds leaderboard tasks (#1)

* adds leaderboard tasks

* Delete lm_eval/tasks/leaderboard/leaderboard_chat_template.yaml

* add readme

* Delete lm_eval/tasks/leaderboard/mmlu_pro/mmlu_pro_chat_template.yaml

* modify readme

* fix bbh task

* fix bbh salient task

* modify the readme

* Delete lm_eval/tasks/leaderboard/ifeval/README.md

* Delete lm_eval/tasks/leaderboard/math/README.md

* add leaderboard to the tasks repertory

* add announcement about new leaderboard tasks

* linting

* Update README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* installs ifeval dependency in new_task github workflow

---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fix math parser

* fix math parser

* fix version

* add warning about chat template

---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.co>
Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: Nathan Habib <nathan.habib19@gmail.com>

09 Nov, 2024 2 commits

Ifeval: Download `punkt_tab` on rank 0 (#2267) · bd80a6c0
Baber Abbasi authored Nov 09, 2024
```
* download nltk `punkt_tab` on LOCAL_RANK=0

* remove print

* remove `time`

* nit
```
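
Downloading only on `LOCAL_RANK=0` means one process per node fetches the resource, instead of every worker racing to write the same files. The sketch below shows that guard in a minimal form. `run_on_local_rank_zero` is a hypothetical helper and is not the harness's ifeval code. It also assumes the launcher (for example `torchrun`) sets `LOCAL_RANK`, and it treats a missing variable as rank 0.

```python
import os
from typing import Callable


def run_on_local_rank_zero(download: Callable[[], None]) -> bool:
    """Hypothetical helper: call `download` only in the process whose
    LOCAL_RANK is 0. Returns True if the download was triggered here."""
    # Launchers such as torchrun set LOCAL_RANK for each worker; a plain
    # single-process run has no LOCAL_RANK, which is treated as rank 0.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    if local_rank == 0:
        download()
        return True
    return False
```

In practice the callable might be `lambda: nltk.download("punkt_tab")`. This sketch does not make the other ranks wait for the download to finish; a real multi-process setup may also need a barrier before those ranks load the data.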

OpenAI ChatCompletions: switch `max_tokens` (#2443) · 060e8761

Baber Abbasi authored Nov 09, 2024

* switch `max_tokens` to `max_completion_tokens` for OpenAI ChatCompletions

* remove stop, temp=1 for o1

* add chat assertion

* HF_DATASETS_TRUST_REMOTE_CODE = True for task tests

* move warning
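
In the Chat Completions API, `max_completion_tokens` is the request field that caps generated tokens. OpenAI deprecated `max_tokens` in its favor, and the o1 models reject `max_tokens`. The sketch below builds a request body that follows the bullets above. `build_chat_payload`, the `max_gen_toks` argument name, and the `startswith("o1")` model check are all assumptions made for illustration. None of them is taken from the harness's code.

```python
from typing import Any, Dict, List, Optional


def build_chat_payload(
    model: str,
    messages: List[Dict[str, str]],
    max_gen_toks: int,
    temperature: float = 0.0,
    stop: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper that assembles a Chat Completions request body."""
    payload: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        # The generation cap goes under max_completion_tokens, not max_tokens.
        "max_completion_tokens": max_gen_toks,
    }
    if model.startswith("o1"):  # assumed way of detecting o1 models
        # Per the commit notes for o1: no stop sequences, temperature 1.
        payload["temperature"] = 1
    else:
        payload["temperature"] = temperature
        if stop:
            payload["stop"] = stop
    return payload
```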

07 Nov, 2024 3 commits
- pass device_map other than auto for parallelize (#2457) · 4155ec7f
  Baber Abbasi authored Nov 07, 2024
```
* pass device_map other than auto for parallelize
```
- typo (#2465) · 901053f6
  Baber Abbasi authored Nov 07, 2024
- use global filter (#2461) · cd18cb3b
  Baber Abbasi authored Nov 07, 2024
06 Nov, 2024 1 commit
- Fix 'loglikelihood' typos in the api models file (#2459) · bf2abb41
  Rob Geada authored Nov 06, 2024
05 Nov, 2024 3 commits

Add Japanese Leaderboard (#2439) · 26f607f5

mtkachenko authored Nov 05, 2024

* add jaqket_v2 and jcommonsenseqa

* remove comments

* remove num_beams as it is incompatible with vllm

* add jnli + refactor

* rename jnla -> jnli

* add jsquad + replace colon chars with the Japanese unicode

* ignore whitespaces in generation tasks

* add marc_ja

* add xwinograd + simplify other yamls

* add mgsm and xlsum

* refactor xlsum

* add ja_leaderboard tag

* edit README.md

* update README.md

* add credit + minor changes

* run ruff format

* address review comments + add group

* remove aggregate_metric_list

* remove tags

* update tasks/README.md

Modify label errors in catcola and paws-x (#2434) · fb2e4b59

zxcvuser authored Nov 05, 2024

* Modify label errors in catcola and paws

* Update version to 1.0 in pawsx_template_yaml

* add changelog

---------
Co-authored-by: Baber <baber@hey.com>

Add real process_docs example (#2456) · 0b8358ec
Sypherd authored Nov 05, 2024