1. 19 May, 2024 1 commit
  2. 14 May, 2024 1 commit
  3. 13 May, 2024 2 commits
  4. 09 May, 2024 1 commit
    • Copal task (#1803) · 1980a13c
      Edd authored
      * add copal
      
* rename to `copal_id` for clarity (task name included)
      
* remove `copal_id...` from yaml to make it work
      
      * checkmark on README
      
      * change group name to `copal_id`
  5. 08 May, 2024 2 commits
  6. 07 May, 2024 5 commits
  7. 06 May, 2024 2 commits
  8. 05 May, 2024 4 commits
  9. 03 May, 2024 2 commits
  10. 02 May, 2024 2 commits
  11. 01 May, 2024 4 commits
  12. 26 Apr, 2024 2 commits
  13. 25 Apr, 2024 3 commits
  14. 18 Apr, 2024 2 commits
  15. 16 Apr, 2024 2 commits
  16. 08 Apr, 2024 1 commit
  17. 07 Apr, 2024 1 commit
  18. 05 Apr, 2024 2 commits
    • Anthropic Chat API (#1594) · 27924d77
      Seungwoo Ryu authored
      
      
      * claude3
      
* support for anthropic claude3

* support for anthropic claude3
      
      * anthropic config changes
      
      * add callback options on anthropic
      
* lint passed
      
      * claude3 tiny change
      
      * help anthropic installation
      
      * mention sysprompt / being careful with format in readme
      
      ---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
    • TMMLU+ implementation (#1394) · 9ae96cdf
      ZoneTwelve authored
      
      
      * implementation of TMMLU+
      
      * implemented: TMMLU+
      
**TMMLU+: Large-scale Traditional Chinese Massive Multitask Language Understanding**
      
      - 4 categories
          - STEM
          - Social Science
          - Humanities
          - Other
      
      The TMMLU+ dataset, encompassing over 67 subjects and 20160 tasks, is six times larger and more balanced than its predecessor, TMMLU, and includes benchmark results from both closed-source and 20 open-weight Chinese large language models with 1.8B to 72B parameters. However, Traditional Chinese variants continue to underperform compared to major Simplified Chinese models.
      
      ```markdown
      Total number of tasks in the 'test' sets: 20160
      Total number of tasks in the 'validation' sets: 2247
      Total number of tasks in the 'train' sets: 335
      ```
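As a quick sanity check, the split sizes quoted above can be turned into proportions. This is just arithmetic on the numbers in the commit message, not code from the harness itself:

```python
# Split sizes quoted in the commit message above (TMMLU+ totals).
splits = {"test": 20160, "validation": 2247, "train": 335}
total = sum(splits.values())

for name, count in splits.items():
    # Print each split's share of the overall dataset.
    print(f"{name}: {count} items ({count / total:.1%})")
```

The test split dominates (roughly 88.6% of all items), which fits the later fix of drawing few-shot examples from the small training set rather than the test set.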
      
      * Remove print from __init__.py
      
My mistake: I forgot to remove the debug print from the code.
      
      * update: move TMMLU+ config generation program into default
      
* fix: use the training set for few-shot examples
      
      * update: README for TMMLU+
      
* update: small changes to the TMMLU+ README file
      
* pre-commit run-through
      
      * Add README for TMMLU+ dataset
      
      * run precommit
      
      * trigger precommit again
      
      * trigger precommit again
      
      * isort is fussy
      
      * isort is fussy
      
      * format, again
      
      * oops
      
      * oops
      
      ---------
Co-authored-by: lintang <lintang@eleuther.ai>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
  19. 04 Apr, 2024 1 commit