Commits · 748eb47e502c4962a06d37df6448816ebdcbc4e8 · gaoqiong / lm-evaluation-harness

21 Jan, 2025 1 commit
- add llama_math · 748eb47e
  Baber authored Jan 21, 2025
  
02 Jan, 2025 10 commits
- nit · da92dc8c
  Baber authored Jan 02, 2025
  
- nit · 523243d2
  Baber authored Jan 02, 2025
  
- nit · 06db8b9b
  Baber authored Jan 02, 2025
  
- nit · 8adf999d
  Baber authored Jan 02, 2025
  
- use Chinese colon · 0c50bfaf
  Baber authored Jan 02, 2025
  
- nit · 40027bca
  Baber authored Jan 02, 2025
  
- add mgsm · 2de3ffdf
  Baber authored Jan 02, 2025
  
- add mgsm · 6884c5a0
  Baber authored Jan 02, 2025
  
- add mgsm · b0108cf8
  Baber authored Jan 02, 2025
  
- add all the different translations of `Answer` to regex · 668603fc
  Baber authored Jan 02, 2025
  
10 Dec, 2024 5 commits
- use `allenai/ai2_arc` · 6a72f627
  Baber authored Dec 10, 2024
  
- test arc_challenge · 6d0c60d7
  Baber authored Dec 10, 2024
  
- nit · 31631407
  Baber authored Dec 10, 2024
  
- add arc_challenge · 29ac037d
  Baber authored Dec 10, 2024
  
- add mgsm · f44f2c5e
  Baber authored Dec 10, 2024
  
09 Dec, 2024 2 commits

Update Lightning import (#2549) · 0b994433

Maanu Grover authored Dec 09, 2024



* update import
Signed-off-by: Maanu Grover <maanug@nvidia.com>

* run formatting

---------
Signed-off-by: Maanu Grover <maanug@nvidia.com>


[API] left truncate for generate_until (#2554) · 2d11f2e5
Baber Abbasi authored Dec 09, 2024
```
* left truncate for generate_until

* pre-commit
```

05 Dec, 2024 1 commit
- Update README.md (#2546) · bcb4cbf4
  fzyzcjy authored Dec 05, 2024
  
04 Dec, 2024 3 commits
- Support pipeline parallel with OpenVINO models (#2349) · 1f9bc88f
  Slawomir Strehlke authored Dec 04, 2024
```
* Handle pipeline_parallel parameter

* Add description of pipeline parallelism with OV models
```
- add better testing when doc_to_text ends in whitespace and target_delimiter is also whitespace (#2535) · 6824d39d
  Baber Abbasi authored Dec 04, 2024
  
- Update README.md (#2534) · 4a12959f
  Baber Abbasi authored Dec 04, 2024
```
* Update README.md

add caching tip to readme

* Update README.md

add api link
```
03 Dec, 2024 2 commits
- avoid timeout errors with high concurrency in api_model (#2307) · 9632b343
  Trawinski, Dariusz authored Dec 03, 2024
```
* avoid timeout errors with high concurrency in api_model

* style

* add timeout

* add docs

---------
Co-authored-by: Baber <baber@hey.com>
```
- add Basque translation of PIQA (piqa_eu) to BasqueBench (#2531) · f49b0377
  Naiara Perez authored Dec 03, 2024
  
01 Dec, 2024 1 commit

Update Unitxt task to use locally installed unitxt and not download Unitxt... · 1170ef9e

Yoav Katz authored Dec 01, 2024


Update Unitxt task to use locally installed unitxt and not download Unitxt code from Huggingface (#2514)

* Moved to require unitxt installation and not download unitxt from HF hub.

This has performance benefits and simplifies the code.
Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Updated watsonx documentation

* Updated installation instructions

* Removed redundant comman

* Allowed unitxt tasks to generate chat APIs

Modified WatsonXI model to support chat APIs

* Removed print

* Run precommit formatting

---------
Signed-off-by: Yoav Katz <katz@il.ibm.com>


30 Nov, 2024 1 commit
- make utility function to handle `until` (#2518) · 0230356c
  Baber Abbasi authored Nov 30, 2024
```
* make utility function to handle `until`

* fix text
```
29 Nov, 2024 1 commit
- skip casting if predict_only (#2524) · 9169899b
  Baber Abbasi authored Nov 29, 2024
  
28 Nov, 2024 1 commit

Filters bugfix; add `metrics` and `filter` to logged sample (#2517) · 5680a2e6

Baber Abbasi authored Nov 28, 2024

* allow !function filters

* bugfix

* nit

* add `filter` to logged samples

* add `filter` and `metric` to logged samples for identification

* convert `metric` to `metrics`: list


26 Nov, 2024 1 commit

Score tasks (#2452) · 0ef7548d

Rima Shahbazyan authored Nov 26, 2024

* score readme added

* generate until task's "until" parameter's default value fixed.

* score mmlu-pro and agieval added

* changed macro accuracy to micro for agieval

* Always E removed from agi eval

* redundancies removed

* MATH added

* minor cosmetic changes for math

* Licenses added, Readme updated

* changes for flake8 + license header on math

* Score added to readme and precommit was run.

* Score added to readme and precommit was run.

* Import error fixed

* math task bugfix
postprocess minor fix

* CR for math added

* math CR

* math task bugfix
postprocess minor fix

CR for math added

* Math cr fixed

* reverting the default "until" parameter change and adjusting score task configs


22 Nov, 2024 1 commit
- parse tokenizer_backend=None properly (#2509) · 9d36354e
  Baber Abbasi authored Nov 22, 2024
  
20 Nov, 2024 1 commit
- Nits (#2500) · 867413f8
  Baber Abbasi authored Nov 20, 2024
```
* fix test task

* don't call lm.chat_template each time
```
18 Nov, 2024 3 commits

Add metabench task to LM Evaluation Harness (#2357) · 62b4364d

Kozzy Voudouris authored Nov 18, 2024



* Add metabench (Kipnis et al. 2024)

* Update metabench tasks for full replication of original benchmarks, using publicly available datasets

* Remove unnecessary import

* Add permute versions of each task, where the answer orders are randomly shuffled.

* Add metabench group for easier evaluations

* Fix mmlu counts after removing duplicate

* Add secondary datasets

* Fix f-string error

* Fix f-string error for permute processing

* Add original hash to outputs for easy matching to original results

* Add line break at end of utils files

* Remove extra line from winogrande

* Reformat for linters

* fix multiple input test

* appease pre-commit

* Add metabench to tasks README

* fix multiple input `test_doc_to_text`

---------
Co-authored-by: Baber <baber@hey.com>


remove duplicate `arc_ca` (#2499) · 8222ad0a
Baber Abbasi authored Nov 18, 2024

Add mamba hf to `mamba_ssm` (#2496) · 0f5dc265
Baber Abbasi authored Nov 18, 2024
```
* add hf mamba to mamba_lm

* fix _model_generate for hf
```

16 Nov, 2024 2 commits

kbl-v0.1.1 (#2493) · cbc31eb8

Wonseok Hwang authored Nov 17, 2024

* release kbl-v0.1

* fix linting

* remove rag tasks as doc_to_text functions cause trouble

* remove remaining rag tasks

* remove unnecessary repeat in yaml files and rag dataset in hf-hub

* remove unnecessary newline; introduce cfg files in lbox/kbl in hf

* Make task yaml files consistent with hf-datasets-config

* Make task yaml files consistent with hf-datasets-config

* Remove trailing empty space in doc-to-text

* Remove unnecessary yaml file

* Fix task naming error

* trailing space removed


update pre-commit hooks and git actions (#2497) · badf273a
Baber Abbasi authored Nov 16, 2024
```
* pre-commit update

* update github actions

* make logging less verbose

* fix artifacts
```

15 Nov, 2024 2 commits

Fix revision parameter to vllm get_tokenizer (#2492) · e20e1ddc
Oyvind Tafjord authored Nov 15, 2024


IBM watsonx_llm fixes & refactor (#2464) · 4259a6d4

Nikodem Szwast authored Nov 15, 2024

* refactor code, fix config path bug

* update types to be from typing lib

* add pre-commit formatting

* specify version of ibm_watsonx_ai package

* adjust get_watsonx_credentials() function, add minor refactor to address PR review comments

* change missing installation hint from ibm_watsonx_ai to lm_eval[ibm_watsonx_ai]


12 Nov, 2024 1 commit
- wandb logger fix, added pre-commit (#2484) · 67db63a5
  Alex Titterton authored Nov 12, 2024
  
11 Nov, 2024 1 commit
- change warning to debug (#2481) · 6b628d9a
  Baber Abbasi authored Nov 11, 2024
  