Commits · bcee8f2ed6f7f79a62083b2bac37737ee0cd7583 · gaoqiong / lm-evaluation-harness

10 May, 2024 7 commits
- fix direct path, add bash script for gpt models · bcee8f2e
  JessicaOjo authored May 10, 2024
  
- add few shot. metric fixes · c4f634c6
  JessicaOjo authored May 10, 2024
  
- remove print · ae6e5cbd
  JessicaOjo authored May 10, 2024
  
- remove direct util, update common yaml · 3eab4b9a
  JessicaOjo authored May 10, 2024
  
- fix bash script · 21fb0db7
  JessicaOjo authored May 10, 2024
  
- add squad metric · 6ca56eac
  JessicaOjo authored May 10, 2024
  
- add afrimgsm -direct · 187ab735
  JessicaOjo authored May 10, 2024
  
09 May, 2024 3 commits
- Merge pull request #2 from JessicaOjo/afrimmlu · 816832f8
  Jess authored May 09, 2024
```
Afrimmlu direct and few shot
```
- remove print · 901ad392
  Israel Abebe Azime authored May 09, 2024
  
- updated prompt · 343880ab
  Israel Abebe Azime authored May 09, 2024
  
08 May, 2024 4 commits
- afrimmlu folder update · f64b943d
  Israel Abebe Azime authored May 08, 2024
  
- afrimmlu folder update · 64490d95
  Israel Abebe Azime authored May 08, 2024
  
- afrimmlu added · 138eb569
  Israel Abebe Azime authored May 08, 2024
  
- Merge branch 'EleutherAI:main' into main · eea16d36
  Jess authored May 08, 2024
  
07 May, 2024 9 commits

Initial integration of the Unitxt to LM eval harness (#1615) · 885f48d6

Yoav Katz authored May 08, 2024

* Initial support for Unitxt datasets in LM Eval Harness

See https://github.com/IBM/unitxt

The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file.

The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'.

* Added dataset loading check to generate_yaml

Improved error messages.

* Speed up generate_yaml

Added printouts and improved error message

* Added output printout

* Simplified integration of unitxt datasets

Store all the common yaml configuration in a yaml include shared by all datasets of the same task.

* Post code review comments - part 1

1. Made sure include files don't end with 'yaml' so they won't be marked as tasks
2. Added more datasets and tasks (NER, GEC)
3. Added README

* Post code review comments - part 2

1. Added unitxt install option in pyproject.toml:
pip install 'lm_eval[unit...


Logging Updates (Alphabetize table printouts, fix eval tracker bug) (#1774) (#1791) · d4a913c4
Hailey Schoelkopf authored May 07, 2024
```
* fix auto-batch size bug for seq2seq models

* alphabetize task + group tables ; fix eval tracker bug

* fix eval tracker bug
```
Re-add Hendrycks MATH (no sympy checking, no Minerva hardcoded prompt) variant (#1793) · d42a3e44
Hailey Schoelkopf authored May 07, 2024
```
* add Hendrycks MATH (no sympy checking) variant

* add readmes for MATH tasks
```
link to the example output on the hub (#1798) · 20be169b
KonradSzafer authored May 07, 2024

Merge pull request #1 from JessicaOjo/afrixnli · 72f5f4b1
Jess authored May 07, 2024
```
Afrixnli task
```
remove chat completion -untested · ee1f296e
JessicaOjo authored May 07, 2024

add chat completion · b432d0e9
JessicaOjo authored May 07, 2024

add afrixnli to task · a27ea4bd
JessicaOjo authored May 07, 2024

Fix Caching Tests ; Remove `pretrained=gpt2` default (#1775) · 7fe2b93c
Hailey Schoelkopf authored May 07, 2024


06 May, 2024 2 commits

Update `--tasks list` option in interface documentation (#1792) · 66cf07ef
aditya thomas authored May 07, 2024


Provide ability for custom sampler for ConfigurableTask (#1616) · ae72cebc

LSinev authored May 06, 2024

* Added fewshot sampling seeds to evaluator.simple_evaluate signature

Provides a way to control the seed of fewshot sampling;
may help with #1591

* Added ability for custom sampler for ConfigurableTask

May be set in config like
```
fewshot_config:
  sampler: !function utils.MyFewshotSampler
```

* explicitly set fewshot random generator seed for HFLM generate_until_task test

* add backward compatibility for three args seed setup

* save seeds info to logs/reports

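As a rough illustration of what the `utils.MyFewshotSampler` named in the config above might look like, here is a minimal standalone sketch. The interface the harness actually expects is not shown in this log, so the constructor arguments (`docs`, `rnd`) and the `sample` method are assumptions, and the class deliberately does not import or subclass anything from the harness.

```python
import random


class MyFewshotSampler:
    """Hypothetical sampler: draws few-shot examples with a seeded RNG.

    The constructor arguments and the `sample` method are assumptions made
    for illustration; they are not taken from the harness source.
    """

    def __init__(self, docs, rnd=None):
        self.docs = list(docs)
        # Fall back to a fixed seed so repeated runs pick the same examples.
        self.rnd = rnd if rnd is not None else random.Random(1234)

    def sample(self, n):
        # Draw n distinct documents without replacement.
        return self.rnd.sample(self.docs, n)


# Toy documents standing in for a task's few-shot pool.
docs = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(10)]

# Two samplers built with the same seed select the same few-shot examples.
first = MyFewshotSampler(docs, rnd=random.Random(42)).sample(3)
second = MyFewshotSampler(docs, rnd=random.Random(42)).sample(3)
print(first == second)
```

Passing the random generator in, rather than seeding the global `random` module, keeps the few-shot choice reproducible without affecting other code that draws random numbers.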

05 May, 2024 4 commits
- Fix bug in setting until kwarg in openai completions (#1784) · 30c060d2
  ciaranby authored May 05, 2024
  
- Fix README: change `----hf_hub_log_args` to `--hf_hub_log_args` (#1776) · 297966f7
  Muhammad Bin Usman authored May 06, 2024
```
fix `----hf_hub_log_args` to `--hf_hub_log_args`
```
- remove echo parameter in OpenAI completions API (#1779) · c34986da
  kwrobel.eth authored May 05, 2024
```
* remove echo parameter in OpenAI completions API

* remove context length parameter doc string
```
- limit fix (#1785) · cee785e0
  KonradSzafer authored May 05, 2024
  
03 May, 2024 2 commits

eval tracker args fix (#1777) · 18f4eb57
KonradSzafer authored May 03, 2024


evaluation tracker implementation (#1766) · 59cf408a

KonradSzafer authored May 03, 2024

* evaluation tracker implementation

* OVModelForCausalLM test fix

* typo fix

* moved methods args

* multiple args in one flag

* loggers moved to dedicated dir

* improved filename sanitization


02 May, 2024 2 commits
- Add option to set OpenVINO config (#1730) · e6394715
  Helena Kloosterman authored May 02, 2024
```
* Add option to set OpenVINO config

* Use utils.eval_logger for logging
```
- vllm lora support (#1756) · 83fd78a2
  bcicc authored May 02, 2024
```
* vllm lora support

* remove print

* version check, rename lora kwarg
```
01 May, 2024 4 commits

upload new tasks (#1728) · caaf9ab6

Simran Arora authored May 01, 2024

* upload new tasks

* add readmes

* run linters

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>


Fix m_arc choices (#1760) · f27c4050

Zehan Li authored May 02, 2024

* Update utils.py

This is a 4-choice task; option_e is null for all but 3 samples

* Fix options

Adaptive choices

* add option e

* bump multilingual arc version

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>


Pile 10k new task (#1758) · b898bdaa
Gabriel Mukobi authored May 01, 2024
```
* Add Pile-10k readme

* Add Pile-10k task configuration file
```
remove duplicated `num_fewshot: 0` (#1769) · 552eeae7
Chujie Zheng authored May 01, 2024


26 Apr, 2024 2 commits
- Add filter registry decorator (#1750) · f64e72f5
  Nikita Lozhnikov authored Apr 26, 2024
```
* Add register_filter decorator

* Add register_filter docs
```
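For readers unfamiliar with the pattern, a registry decorator of the kind this commit names can be sketched in a few lines. This is a generic standalone illustration, not the harness's code: `FILTER_REGISTRY`, `get_filter`, `LowercaseFilter`, and the `apply` signature are all hypothetical names chosen for the example.

```python
FILTER_REGISTRY = {}  # maps a filter name to the class registered under it


def register_filter(name):
    """Return a class decorator that records the class under `name`."""

    def decorate(cls):
        if name in FILTER_REGISTRY:
            raise ValueError(f"filter {name!r} is already registered")
        FILTER_REGISTRY[name] = cls
        return cls  # hand the class back unchanged

    return decorate


def get_filter(name):
    """Look up a registered filter class by name."""
    return FILTER_REGISTRY[name]


@register_filter("lowercase")
class LowercaseFilter:
    """Hypothetical filter that lowercases every model response."""

    def apply(self, resps):
        # resps is assumed to be a list of response lists, one per document.
        return [[r.lower() for r in group] for group in resps]


filter_cls = get_filter("lowercase")
result = filter_cls().apply([["Yes", "NO"], ["Maybe"]])
print(result)
```

Registering by name lets a config file refer to a filter with a plain string, while the decorator keeps the registration next to the class definition.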
- Support individual scrolls datasets (#1740) · 9b49556a
  giorgossideris authored Apr 26, 2024
```
* Support individual scrolls datasets

* Add qmsum context

* Fix formatting
```
25 Apr, 2024 1 commit
- Fix Parameter Propagation for Tasks that have `include` (#1749) · 0bafcef0
  Lintang Sutawika authored Apr 26, 2024
```
* Update task.py

* Update __init__.py
```