Commits · e4a8386ee60a046a5bcac2a5c31beafef55faf61 · gaoqiong / lm-evaluation-harness

19 Dec, 2024 10 commits
- fix template · e4a8386e
  Baber authored Dec 19, 2024
  
  e4a8386e
- fix metric · 5c1f194b
  Baber authored Dec 19, 2024
  
  5c1f194b
- nit · 6c7c413d
  Baber authored Dec 19, 2024
  
  6c7c413d
- nit · 1f27bd87
  Baber authored Dec 19, 2024
  
  1f27bd87
- fix tokenizer · bd34e852
  Baber authored Dec 19, 2024
  
  bd34e852
- fix aggregation · 7399dae1
  Baber authored Dec 19, 2024
  
  7399dae1
- fix · 19d54607
  Baber authored Dec 19, 2024
  
  19d54607
- handle nltk punkt_tab · 0351bb6b
  Baber authored Dec 19, 2024
  
  0351bb6b
- add ruler reqs · 07429c86
  Baber authored Dec 19, 2024
  
  07429c86
- add niah · c3aaad59
  Baber authored Dec 19, 2024
  
  c3aaad59
18 Dec, 2024 5 commits
- fix do_sample · d684b9eb
  Baber authored Dec 18, 2024
  
  d684b9eb
- fix do_sample · adbfcce1
  Baber authored Dec 18, 2024
  
  adbfcce1
- fix metrics · cf8a257c
  Baber authored Dec 18, 2024
  
  cf8a257c
- add longbench · 76e517d1
  Baber authored Dec 18, 2024
  
  76e517d1
- add hotpotqa_e · a8601618
  Baber authored Dec 18, 2024
  
  a8601618
17 Dec, 2024 2 commits
- drop python 3.8 support (#2575) · 8558b8d4
  Baber Abbasi authored Dec 17, 2024
```
* feat: drop Python 3.8 support

* feat: drop Python 3.8 tests

* pre-commit
```
  8558b8d4
- increment version (#2574) · 4c26a9c1
  Baber Abbasi authored Dec 17, 2024
```
forgot to increment 0.4.6!
```
  4c26a9c1
16 Dec, 2024 3 commits
- fix `DeprecationWarning: invalid escape sequence '\s'` for whitespace filter (#2560) · 8d2f64c1
  Baber Abbasi authored Dec 16, 2024
```
* fix `DeprecationWarning: invalid escape sequence '\s'`

* add type hints

* Revert "add type hints"

This reverts commit 15d8abc626a84e97f8c238ddfbf9e243d6f6eb5c.
```
8d2f64c1
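In Python, `'\s'` in an ordinary string literal is not a recognized escape. The interpreter keeps the backslash but emits a warning (`DeprecationWarning`, upgraded to `SyntaxWarning` in Python 3.12). The usual fix is a raw string. The sketch below shows the pattern with a hypothetical `collapse_whitespace` helper; it is an illustration, not the harness's actual whitespace filter.

```python
import re

# r"\s+" is a raw string: the backslash reaches the regex engine untouched,
# so Python emits no invalid-escape warning when compiling this module.
WHITESPACE_RUN = re.compile(r"\s+")


def collapse_whitespace(text: str) -> str:
    """Hypothetical helper: replace each whitespace run with one space, then trim."""
    return WHITESPACE_RUN.sub(" ", text).strip()


print(collapse_whitespace("  answer:\t42 \n"))  # prints answer: 42
```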
- batch `loglikelihood_rolling` across requests (#2559) · 0bfb0220
  Baber Abbasi authored Dec 16, 2024
```
* batch all rolling token windows

* nit

* copy to vllm

* fix max_length for `get_rolling_token_windows`

* bugfix

* bugfix

* add type hints
```
0bfb0220
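`loglikelihood_rolling` scores a whole document by splitting it into windows that fit the model's context. Batching "across requests" means flattening the windows of every request into one list, scoring that list in fixed-size batches, and summing the per-window scores back into their originating request. A minimal sketch of that idea follows. `rolling_windows`, `score_batch`, and the exact window layout are illustrative assumptions, not the harness's `get_rolling_token_windows` or its model API.

```python
from typing import Callable, List, Sequence, Tuple

# (input tokens, target tokens): the model scores the last len(target) positions.
Window = Tuple[List[int], List[int]]


def rolling_windows(tokens: Sequence[int], prefix_token: int, max_len: int) -> List[Window]:
    """Hypothetical: cover every token exactly once with targets of at most max_len tokens."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    shifted = [prefix_token, *tokens]  # shifted[i] is the input that predicts tokens[i]
    windows: List[Window] = []
    predicted = 0
    while predicted < len(tokens):
        end = min(predicted + max_len, len(tokens))
        # Inputs line up with tokens[context_start:end]; positions before `predicted`
        # were already scored and serve only as extra left context.
        context_start = max(0, end - max_len)
        windows.append((shifted[context_start:end], list(tokens[predicted:end])))
        predicted = end
    return windows


def loglikelihood_rolling_batched(
    requests: Sequence[Sequence[int]],
    score_batch: Callable[[List[Window]], List[float]],  # hypothetical model call
    prefix_token: int,
    max_len: int,
    batch_size: int,
) -> List[float]:
    # Flatten the windows of *all* requests, remembering which request each came from.
    flat: List[Tuple[int, Window]] = [
        (req_idx, window)
        for req_idx, tokens in enumerate(requests)
        for window in rolling_windows(tokens, prefix_token, max_len)
    ]
    totals = [0.0] * len(requests)
    for start in range(0, len(flat), batch_size):
        chunk = flat[start : start + batch_size]
        scores = score_batch([window for _, window in chunk])
        # Sum window log-likelihoods back into their originating request.
        for (req_idx, _), score in zip(chunk, scores):
            totals[req_idx] += score
    return totals


def fake_score_batch(windows: List[Window]) -> List[float]:
    # Stand-in model: every target token contributes -1.0 log-likelihood.
    return [-1.0 * len(target) for _, target in windows]


print(loglikelihood_rolling_batched([[1, 2, 3, 4, 5], [6, 7]], fake_score_batch, 0, 2, 3))
# prints [-5.0, -2.0]
```

Because windows from different requests share a batch, short documents no longer leave most of a batch empty; the per-request sums are unchanged by how the windows are grouped.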
- Adding new subtask to SCORE tasks: non greedy robustness (#2558) · 976d8a0b
  Rima Shahbazyan authored Dec 16, 2024
```
* score readme added

* generate until task's "until" parameter's default value fixed.

* score mmlu-pro and agieval added

* changed macro accuracy to micro for agieval

* Always E removed from agi eval

* redundancies removed

* MATH added

* minor cosmetic changes for math

* Licenses added Readme updated

* changes for flake8 + license header on math

* Score added to readme and precommit was run.

* Score added to readme and precommit was run.

* Import error fixed

* math task bugfix
postprocess minor fix

* CR for math added

* math CR

* math task bugfix
postprocess minor fix

CR for math added

* Math cr fixed

* mmlu_pro non_greedy task added

* non greedy summarizer added

* Non greedy for all score tasks

* Bugfixes for non-greedy

* fixing the until argument

* undoing the change to "until" arguments default behaviour

* minor fix in summarizer

* log naming changes for better readability

* math subtasks naming fix

* agieval subtask naming fix

* logging added for debugging

* path issue fixed

* minor fix

* path fix

* path fix

* non_greedy_math minor fix

* final changes

* changed readme for non-greedy
added Nvidia header
added example script for non_greedy
changed prompts to match that for trt runs

* non greedy summarizer bugfix

* non_greedy summarizer fixed
```
976d8a0b
14 Dec, 2024 1 commit
- add warning to readme (#2568) · 8de772f9
  Baber Abbasi authored Dec 14, 2024
```
* make warning prominent

* make warning prominent
```
  8de772f9
13 Dec, 2024 1 commit
- add optimum-intel ipex model (#2566) · 919470a1
  Yao Matrix authored Dec 14, 2024
```
* initial support for optimum-intel ipex model. LM model as first step

* format
Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* pass dtype
Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* update README
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
```
919470a1
09 Dec, 2024 2 commits
- Update Lightning import (#2549) · 0b994433
  Maanu Grover authored Dec 09, 2024
```
* update import
Signed-off-by: Maanu Grover <maanug@nvidia.com>

* run formatting

---------
Signed-off-by: Maanu Grover <maanug@nvidia.com>
```
0b994433
- [API] left truncate for generate_until (#2554) · 2d11f2e5
  Baber Abbasi authored Dec 09, 2024
```
* left truncate for generate_until

* pre-commit
```
2d11f2e5
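Left truncation drops tokens from the start of an over-long prompt, keeping the end (typically the question and instruction immediately before the answer) while reserving room for the tokens to be generated. A minimal sketch, assuming a hypothetical `left_truncate` helper and a prompt budget of `max_length - max_gen_toks`; the API model's actual parameters and budget rule may differ.

```python
from typing import List, Sequence


def left_truncate(tokens: Sequence[int], max_length: int, max_gen_toks: int) -> List[int]:
    """Hypothetical: drop tokens from the *start* so prompt + generation fits in max_length."""
    budget = max_length - max_gen_toks
    if budget <= 0:
        raise ValueError("max_gen_toks must be smaller than max_length")
    # A negative slice start keeps the last `budget` tokens (or all of them if fewer).
    return list(tokens[-budget:])


print(left_truncate(list(range(10)), max_length=8, max_gen_toks=3))  # prints [5, 6, 7, 8, 9]
```

Right truncation would instead keep the few-shot preamble and cut off the actual question, which is why generation prompts are usually trimmed from the left.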
05 Dec, 2024 1 commit
- Update README.md (#2546) · bcb4cbf4
  fzyzcjy authored Dec 05, 2024
  
  bcb4cbf4
04 Dec, 2024 3 commits
- Support pipeline parallel with OpenVINO models (#2349) · 1f9bc88f
  Slawomir Strehlke authored Dec 04, 2024
```
* Handle pipeline_parallel parameter

* Add description of pipeline parallelism with OV models
```
  1f9bc88f
- add better testing when both doc_to_text ends in and target_delimiter are whitespaces (#2535) · 6824d39d
  Baber Abbasi authored Dec 04, 2024
  
  6824d39d
- Update README.md (#2534) · 4a12959f
  Baber Abbasi authored Dec 04, 2024
```
* Update README.md

add caching tip to readme

* Update README.md

add api link
```
  4a12959f
03 Dec, 2024 2 commits
- avoid timeout errors with high concurrency in api_model (#2307) · 9632b343
  Trawinski, Dariusz authored Dec 03, 2024
```
* avoid timeout errors with high concurrency in api_model

* style

* add timeout

* add docs

---------
Co-authored-by: Baber <baber@hey.com>
```
  9632b343
- add Basque translation of PIQA (piqa_eu) to BasqueBench (#2531) · f49b0377
  Naiara Perez authored Dec 03, 2024
  
  f49b0377
01 Dec, 2024 1 commit
- Update Unitxt task to use locally installed unitxt and not download Unitxt code from Huggingface (#2514) · 1170ef9e
  Yoav Katz authored Dec 01, 2024
```
* Moved to require unitxt installation and not download unitxt from HF hub.

This has performance benefits and simplifies the code.
Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Updated watsonx documentation

* Updated installation instructions

* Removed redundant comman

* Allowed unitxt tasks to generate chat APIs

Modified WatsonXI model to support chat apis

* Removed print

* Run precommit formatting

---------
Signed-off-by: Yoav Katz <katz@il.ibm.com>
```
1170ef9e
30 Nov, 2024 1 commit
- make utility function to handle `until` (#2518) · 0230356c
  Baber Abbasi authored Nov 30, 2024
```
* make utility function to handle `until`

* fix text
```
  0230356c
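`until` holds the stop sequences for a `generate_until` request and may arrive as a single string, a list, or nothing at all; handling it in one utility avoids repeating the same coercion in every model backend. A minimal sketch, assuming hypothetical `normalize_until` and `truncate_at_stop` helpers and assuming the end-of-sequence string is appended as an extra stop sequence; the harness's actual utility may differ.

```python
from typing import List, Optional, Sequence, Union


def normalize_until(until: Union[str, Sequence[str], None], eos: Optional[str]) -> List[str]:
    """Hypothetical: coerce `until` to a list of stop strings that includes `eos`."""
    if until is None:
        stops: List[str] = []
    elif isinstance(until, str):
        stops = [until]
    elif isinstance(until, (list, tuple)):
        stops = list(until)
    else:
        raise TypeError(f"`until` must be a str or a list of str, got {type(until).__name__}")
    if eos is not None and eos not in stops:
        stops.append(eos)
    return stops


def truncate_at_stop(text: str, stops: Sequence[str]) -> str:
    """Cut generated text at the earliest occurrence of any non-empty stop sequence."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if stop and idx != -1:
            cut = min(cut, idx)
    return text[:cut]


stops = normalize_until("\n\n", eos="</s>")
print(stops)  # prints ['\n\n', '</s>']
print(truncate_at_stop("42\n\nQuestion: next", stops))  # prints 42
```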
29 Nov, 2024 1 commit
- skip casting if predict_only (#2524) · 9169899b
  Baber Abbasi authored Nov 29, 2024
  
  9169899b
28 Nov, 2024 1 commit
- Filters bugfix; add `metrics` and `filter` to logged sample (#2517) · 5680a2e6
  Baber Abbasi authored Nov 28, 2024
```
* allow !function filters

* bugfix

* nit

* add `filter` to logged samples

* add `filter` and `metric` to logged samples for identification

* convert `metric` to `metrics`: list
```
5680a2e6
26 Nov, 2024 1 commit
- Score tasks (#2452) · 0ef7548d
  Rima Shahbazyan authored Nov 26, 2024
```
* score readme added

* generate until task's "until" parameter's default value fixed.

* score mmlu-pro and agieval added

* changed macro accuracy to micro for agieval

* Always E removed from agi eval

* redundancies removed

* MATH added

* minor cosmetic changes for math

* Licenses added Readme updated

* changes for flake8 + license header on math

* Score added to readme and precommit was run.

* Score added to readme and precommit was run.

* Import error fixed

* math task bugfix
postprocess minor fix

* CR for math added

* math CR

* math task bugfix
postprocess minor fix

CR for math added

* Math cr fixed

* reverting the default "until" parameter change and adjusting score task configs
```
0ef7548d
22 Nov, 2024 1 commit
- parse tokenizer_backend=None properly (#2509) · 9d36354e
  Baber Abbasi authored Nov 22, 2024
  
  9d36354e
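Model arguments arrive on the command line as one comma-separated `key=value` string (for example via `--model_args`), so every value starts out as text. Unless the parser maps the literal `None` to Python's `None`, a backend receives the truthy string `"None"` instead. A minimal sketch of such a parser, assuming hypothetical `parse_value` and `parse_model_args` helpers; the harness has its own parsing utilities.

```python
from typing import Any, Dict


def parse_value(raw: str) -> Any:
    """Hypothetical: turn a CLI string into None, bool, int, float, or str."""
    lowered = raw.lower()
    if lowered == "none":
        return None
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def parse_model_args(args: str) -> Dict[str, Any]:
    """Parse 'k1=v1,k2=v2' into a dict, converting each value with parse_value."""
    parsed: Dict[str, Any] = {}
    for pair in filter(None, (p.strip() for p in args.split(","))):
        key, _, value = pair.partition("=")
        parsed[key.strip()] = parse_value(value.strip())
    return parsed


print(parse_model_args("pretrained=gpt2,tokenizer_backend=None,batch_size=8"))
# prints {'pretrained': 'gpt2', 'tokenizer_backend': None, 'batch_size': 8}
```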
20 Nov, 2024 1 commit
- Nits (#2500) · 867413f8
  Baber Abbasi authored Nov 20, 2024
```
* fix test task

* dont call lm.chat_template each time
```
  867413f8
18 Nov, 2024 3 commits
- Add metabench task to LM Evaluation Harness (#2357) · 62b4364d
  Kozzy Voudouris authored Nov 18, 2024
```
* Add metabench (Kipnis et al. 2024)

* Update metabench tasks for full replication of original benchmarks, using publicly available datasets

* Remove unnecessary import

* Add permute versions of each task, where the answer orders are randomly shuffled.

* Add metabench group for easier evaluations

* Fix mmlu counts after removing duplicate

* Add secondary datasets

* Fix f-string error

* Fix f-string error for permute processing

* Add original hash to outputs for easy matching to original results

* Add line break at end of utils files

* Remove extra line from winogrande

* Reformat for linters

* fix multiple input test

* appease pre-commit

* Add metabench to tasks README

* fix multiple input `test_doc_to_text`

---------
Co-authored-by: Baber <baber@hey.com>
```
62b4364d
- remove duplicate `arc_ca` (#2499) · 8222ad0a
  Baber Abbasi authored Nov 18, 2024
8222ad0a
- Add mamba hf to `mamba_ssm` (#2496) · 0f5dc265
  Baber Abbasi authored Nov 18, 2024
```
* add hf mamba to mamba_lm

* fix _model_generate for hf
```
0f5dc265