Commits · 899a544abcb24fb45f4f04573feff6b4f39ebc60 · gaoqiong / lm-evaluation-harness

30 May, 2024 1 commit
- multiturn alternation fix · 899a544a
  Konrad authored May 30, 2024
  
  899a544a
29 May, 2024 1 commit
- Moved apply chat template to LM · c3706651
  Konrad authored May 29, 2024
  
  c3706651
22 May, 2024 4 commits
- linting · 1162e34e
  Konrad authored May 22, 2024
  
  1162e34e
- Adding a fewshot in a more readable way · 691e0c0d
  Konrad authored May 22, 2024
  
  691e0c0d
- Merge branch 'main' into chat_template · 9bd948df
  Konrad authored May 22, 2024
  
  9bd948df
- Update polemo2_out.yaml (#1871) · 70e1de09
  zhabuye authored May 22, 2024
  
  70e1de09
21 May, 2024 2 commits
- fixed docs typos (#1863) · cb22e502
  Zafir Stojanovski authored May 21, 2024
  
  cb22e502
- fixed incorrect check for task type (replace `~` with `not`) (#1865) · 00b7a61c
  Zafir Stojanovski authored May 21, 2024
  
  00b7a61c
19 May, 2024 1 commit

Fix: support PEFT/LoRA with added tokens (#1828) · 86319a9b

Nick Doiron authored May 19, 2024



* resize model embeddings

* resize only

* tokenizer help

* load tokenizer before model

* add comment and run precommit lint

* Add log message
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

86319a9b

14 May, 2024 6 commits
- added comments · 8a0ce59d
  Konrad authored May 14, 2024
  
  8a0ce59d
- indent update · d01032d2
  Konrad authored May 14, 2024
  
  d01032d2
- typing update · a4bc4846
  Konrad authored May 14, 2024
  
  a4bc4846
- fewshot as multiturn · 921c4d62
  Konrad authored May 14, 2024
  
  921c4d62
- system inst default update · 3369f887
  Konrad authored May 14, 2024
  
  3369f887
- Fix links in README guiding to another branch (#1838) · a9eaaf46
  LSinev authored May 14, 2024
  
  a9eaaf46
13 May, 2024 2 commits

interface doc update (#1807) · b24ac4b8
KonradSzafer authored May 13, 2024

b24ac4b8

Adding tinyBenchmarks datasets (#1545) · fe9fef4e

Lucas Weber authored May 13, 2024



* Add tinyBenchmarks

* Add acknowledgements

* Add ordering of outputs for data-parallel

* Run pre-commit

* Add few_shot specifications

* Add tinyBenchmarks post-processing

* add conditional import ; fix task names

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

fe9fef4e

12 May, 2024 1 commit
- system instruction · 9dfb58a3
  Konrad authored May 12, 2024
  
  9dfb58a3
09 May, 2024 1 commit

Copal task (#1803) · 1980a13c

Edd authored May 10, 2024

* add copal

* change name to copal id for clarity and the task name

* remove `copal_id...` to yaml to make it work

* checkmark on README

* change group name to `copal_id`

1980a13c

08 May, 2024 6 commits
- Update flag `--hf_hub_log_args` in interface documentation (#1806) · d32ce5cf
  aditya thomas authored May 08, 2024
```
* update interface documentation with flag --hf_hub_logs_arg

* update interface documentation with flag --hf_hub_logs_arg 2
```
  d32ce5cf
- add task for mmlu evaluation in arc multiple choice format (#1745) · 9097ad3e
  jonabur authored May 08, 2024
```
* add mmlu arc style evaluation

* rename arc_style to continuation

---------
Co-authored-by: Jonathan Burdge <jburdge@mahti-login11.mahti.csc.fi>
Co-authored-by: Jonathan Burdge <jburdge@mahti-login12.mahti.csc.fi>
```
  9097ad3e
- interface update · cd9e4540
  Konrad authored May 08, 2024
  
  cd9e4540
- variable rename · 4b790fa7
  Konrad authored May 08, 2024
  
  4b790fa7
- tokenizer attribute check · f4902e06
  Konrad authored May 08, 2024
  
  f4902e06
- initial chat template · 62df55d1
  Konrad authored May 08, 2024
  
  62df55d1
07 May, 2024 5 commits

Initial integration of the Unitxt to LM eval harness (#1615) · 885f48d6

Yoav Katz authored May 08, 2024

* Initial support for Unitxt datasets in LM Eval Harness

See  https://github.com/IBM/unitxt



The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file.

The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'.

* Added dataset loading check to generate_yaml

Improved error messages.

* Speed up generate_yaml

Added printouts and improved error message

* Added output printout

* Simplified integration of unitxt datasets

Store all the common yaml configuration in a yaml include shared by all datasets of the same task.

* Post code review comments - part 1

1. Made sure include files don't end wth 'yaml' so they won't be marked as tasks
2. Added more datasets and tasks (NER, GEC)
3. Added README

* Post code review comments - part 2

1. Added install unitxt install option in pyproject.toml:
pip install 'lm_eval[unitxt]'
2. Added a check that unitxt is installed and print a clear error message if not

* Commited missing pyproject change

* Added documentation on adding datasets

* More doc changes

* add unitxt extra to readme

* run precommit

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

885f48d6

Logging Updates (Alphabetize table printouts, fix eval tracker bug) (#1774) (#1791) · d4a913c4
Hailey Schoelkopf authored May 07, 2024
```
* fix auto-batch size bug for seq2seq models

* alphabetize task + group tables ; fix eval tracker bug

* fix eval tracker bug
```
d4a913c4
Re-add Hendrycks MATH (no sympy checking, no Minerva hardcoded prompt) variant (#1793) · d42a3e44
Hailey Schoelkopf authored May 07, 2024
```
* add Hendrycks MATH (no sympy checking) variant

* add readmes for MATH tasks
```
d42a3e44
link to the example output on the hub (#1798) · 20be169b
KonradSzafer authored May 07, 2024

20be169b
Fix Caching Tests ; Remove `pretrained=gpt2` default (#1775) · 7fe2b93c
Hailey Schoelkopf authored May 07, 2024

7fe2b93c

06 May, 2024 2 commits

Update `--tasks list` option in interface documentation (#1792) · 66cf07ef
aditya thomas authored May 07, 2024

66cf07ef

Provide ability for custom sampler for ConfigurableTask (#1616) · ae72cebc

LSinev authored May 06, 2024

* Added fewshot sampling seeds to evaluator.simple_evaluate signature

Way to control seed of fewshot sampling
may help with #1591

* Added ability for custom sampler for ConfigurableTask

May be set in config like
```
fewshot_config:
  sampler: !function utils.MyFewshotSampler
```

* explicitly set fewshot random generator seed for HFLM generate_until_task test

* add backward compatibility for three args seed setup

* save seeds info to logs/reports

ae72cebc

05 May, 2024 4 commits
- Fix bug in setting until kwarg in openai completions (#1784) · 30c060d2
  ciaranby authored May 05, 2024
  
  30c060d2
- Fix README: change`----hf_hub_log_args` to `--hf_hub_log_args` (#1776) · 297966f7
  Muhammad Bin Usman authored May 06, 2024
```
fix `----hf_hub_log_args` to `--hf_hub_log_args`
```
  297966f7
- remove echo parameter in OpenAI completions API (#1779) · c34986da
  kwrobel.eth authored May 05, 2024
```
* remove echo parameter in OpenAI completions API

* remove context length parameter doc string
```
  c34986da
- limit fix (#1785) · cee785e0
  KonradSzafer authored May 05, 2024
  
  cee785e0
03 May, 2024 2 commits

eval tracker args fix (#1777) · 18f4eb57
KonradSzafer authored May 03, 2024

18f4eb57

evaluation tracker implementation (#1766) · 59cf408a

KonradSzafer authored May 03, 2024

* evaluation tracker implementation

* OVModelForCausalLM test fix

* typo fix

* moved methods args

* multiple args in one flag

* loggers moved to dedicated dir

* improved filename sanitization

59cf408a

02 May, 2024 2 commits
- Add option to set OpenVINO config (#1730) · e6394715
  Helena Kloosterman authored May 02, 2024
```
* Add option to set OpenVINO config

* Use utils.eval_logger for logging
```
  e6394715
- vllm lora support (#1756) · 83fd78a2
  bcicc authored May 02, 2024
```
* vllm lora support

* remove print

* version check, rename lora kwarg
```
  83fd78a2