Commits · inverse-scaling-tasks · gaoqiong / lm-evaluation-harness

30 May, 2024 1 commit
- remove target_delimiter · d7457ad3
  haileyschoelkopf authored May 30, 2024
  
  d7457ad3
29 May, 2024 1 commit
- Merge branch 'main' into inverse-scaling-tasks · 60c9c170
  haileyschoelkopf authored May 29, 2024
  
  60c9c170
28 May, 2024 1 commit
- Updated vllm imports in vllm_causallms.py (#1890) · b4cd85d4
  Michael Goin authored May 28, 2024
```
* Reorder vllm imports in vllm_causallms.py

* Update vllm_causallms.py
```
  b4cd85d4
26 May, 2024 1 commit
- Rename `lm_eval.logging -> lm_eval.loggers` (#1858) · 0ff6ab99
  Hailey Schoelkopf authored May 26, 2024
```
* rename lm_eval.logging module

* fix evaluation tracker args
```
  0ff6ab99
24 May, 2024 7 commits

Lintang Sutawika authored May 25, 2024



* edit process multiple-choice

* split template yaml

* remove

* modified multiple_choice tasks

* update

* Update multiple_choice_template_b_yaml

* Update multiple_choice_template_a_yaml

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

78a215e0

add mmlu tasks from pile-t5 (#1710) · f2ea37e3

Lintang Sutawika authored May 25, 2024



* add mmlu tasks from pile-t5

* Update _mmlu_flan_cot_fewshot_template_yaml

* Update _mmlu_flan_cot_zeroshot_template_yaml

* Update _mmlu_flan_generative_template_yaml

* Update _mmlu_flan_loglikelihood_template_yaml

* Update _default_template_yaml

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

f2ea37e3

Fix for bootstrap_iters = 0 case (#1715) (#1789) · b043b050
Hailey Schoelkopf authored May 24, 2024
```
* add handling for bootstrap_iters=0 case

* add more detail to docstring

* run precommit
```
b043b050
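
The commit message above does not say how the `bootstrap_iters=0` case is handled. A minimal sketch of one plausible approach follows, assuming that zero iterations means "skip the standard-error estimate". The function name `bootstrap_stderr` and its signature are hypothetical and are not taken from the harness.

```python
import random
import statistics


def bootstrap_stderr(values, bootstrap_iters, seed=0):
    """Estimate the standard error of the mean by bootstrap resampling.

    Hypothetical helper for illustration only. Returns None when
    bootstrap_iters is 0 (assumed meaning: skip the estimate).
    """
    if bootstrap_iters <= 0:
        # Nothing to resample: report "no estimate" instead of failing.
        return None

    rnd = random.Random(seed)
    means = []
    for _ in range(bootstrap_iters):
        # Resample with replacement, keeping the original sample size.
        resample = rnd.choices(values, k=len(values))
        means.append(statistics.fmean(resample))

    # statistics.stdev needs at least two data points.
    return statistics.stdev(means) if len(means) > 1 else 0.0
```

Returning `None` instead of `0.0` lets a caller tell "not computed" apart from "no variance".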

Fix Brier Score (#1847) · 7d747ea9

Lintang Sutawika authored May 25, 2024

`gold_one_hot` needs to match the dimensions of the predictions so that it still works when `--limit` is used and the indices present in gold do not cover all possible gold indices.

7d747ea9
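
The dimension issue described in the Brier score fix can be sketched in plain Python. This is an illustrative re-implementation, not the harness's code: the function name `brier_score` and the list-based inputs are assumptions. The one-hot vector takes its width from each prediction row, not from the largest label seen.

```python
def brier_score(gold, predictions):
    """Mean multi-class Brier score.

    gold: list of class indices.
    predictions: list of per-class probability lists.
    Illustrative sketch only; names and input types are assumptions.
    """
    total = 0.0
    for label, probs in zip(gold, predictions):
        # The one-hot width follows the prediction row, so a subset of
        # examples that never contains the highest class still works.
        one_hot = [1.0 if i == label else 0.0 for i in range(len(probs))]
        total += sum((p - t) ** 2 for p, t in zip(probs, one_hot))
    return total / len(predictions)
```

If the one-hot width were derived from `max(gold) + 1`, a limited run whose labels are only 0 and 1 would produce two-column targets against three-column predictions.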

Fix `batch_size=auto` for HF Seq2Seq models (#1765) (#1790) · 5f3a6621
Hailey Schoelkopf authored May 24, 2024
```
* fix auto-batch size bug for seq2seq models

* run linter
```
5f3a6621
[HFLM]Use Accelerate's API to reduce hard-coded CUDA code (#1880) · c4c15917
Huazhong Ji authored May 24, 2024

c4c15917
Fix outdated links to the latest links in `docs` (#1876) · b5afe229
DongGeon Lee authored May 24, 2024

b5afe229

23 May, 2024 1 commit
- Unpin vllm in dependencies (#1874) · 5711ab87
  Edward Gan authored May 23, 2024
  
  5711ab87
22 May, 2024 1 commit
- Update polemo2_out.yaml (#1871) · 70e1de09
  zhabuye authored May 22, 2024
  
  70e1de09
21 May, 2024 2 commits
- fixed docs typos (#1863) · cb22e502
  Zafir Stojanovski authored May 21, 2024
  
  cb22e502
- fixed incorrect check for task type (replace `~` with `not`) (#1865) · 00b7a61c
  Zafir Stojanovski authored May 21, 2024
  
  00b7a61c
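
Why replacing `~` with `not` matters for a boolean check can be shown in a short sketch. The function names and the `"multiple_choice"` comparison are hypothetical stand-ins for the task-type check mentioned in the commit title. The warning filter is there because newer Python versions may emit a deprecation warning for `~` applied to a bool.

```python
import warnings


def buggy_is_other_type(output_type):
    """Hypothetical check using `~`, which is bitwise inversion."""
    with warnings.catch_warnings():
        # Newer Python versions may warn about `~` on a bool.
        warnings.simplefilter("ignore")
        # ~True == -2 and ~False == -1; both are truthy integers.
        return bool(~(output_type == "multiple_choice"))


def fixed_is_other_type(output_type):
    """Same check using logical negation."""
    return not (output_type == "multiple_choice")
```

With `~`, the check is truthy for every input, so a branch guarded by it always runs. `not` gives the intended boolean.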
19 May, 2024 1 commit

Fix: support PEFT/LoRA with added tokens (#1828) · 86319a9b

Nick Doiron authored May 19, 2024



* resize model embeddings

* resize only

* tokenizer help

* load tokenizer before model

* add comment and run precommit lint

* Add log message
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

86319a9b

14 May, 2024 1 commit
- Fix links in README guiding to another branch (#1838) · a9eaaf46
  LSinev authored May 14, 2024
  
  a9eaaf46
13 May, 2024 2 commits

interface doc update (#1807) · b24ac4b8
KonradSzafer authored May 13, 2024

b24ac4b8

Adding tinyBenchmarks datasets (#1545) · fe9fef4e

Lucas Weber authored May 13, 2024



* Add tinyBenchmarks

* Add acknowledgements

* Add ordering of outputs for data-parallel

* Run pre-commit

* Add few_shot specifications

* Add tinyBenchmarks post-processing

* add conditional import ; fix task names

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

fe9fef4e

09 May, 2024 1 commit

Copal task (#1803) · 1980a13c

Edd authored May 10, 2024

* add copal

* change name to copal id for clarity and the task name

* remove `copal_id...` to yaml to make it work

* checkmark on README

* change group name to `copal_id`

1980a13c

08 May, 2024 2 commits

Update flag `--hf_hub_log_args` in interface documentation (#1806) · d32ce5cf

aditya thomas authored May 08, 2024

* update interface documentation with flag --hf_hub_log_args

* update interface documentation with flag --hf_hub_log_args 2

d32ce5cf

add task for mmlu evaluation in arc multiple choice format (#1745) · 9097ad3e

jonabur authored May 08, 2024



* add mmlu arc style evaluation

* rename arc_style to continuation

---------
Co-authored-by: Jonathan Burdge <jburdge@mahti-login11.mahti.csc.fi>
Co-authored-by: Jonathan Burdge <jburdge@mahti-login12.mahti.csc.fi>

9097ad3e

07 May, 2024 5 commits

Initial integration of the Unitxt to LM eval harness (#1615) · 885f48d6

Yoav Katz authored May 08, 2024

* Initial support for Unitxt datasets in LM Eval Harness

See  https://github.com/IBM/unitxt

The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file.

The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'.

* Added dataset loading check to generate_yaml

Improved error messages.

* Speed up generate_yaml

Added printouts and improved error message

* Added output printout

* Simplified integration of unitxt datasets

Store all the common yaml configuration in a yaml include shared by all datasets of the same task.

* Post code review comments - part 1

1. Made sure include files don't end with 'yaml' so they won't be marked as tasks
2. Added more datasets and tasks (NER, GEC)
3. Added README

* Post code review comments - part 2

1. Added unitxt install option in pyproject.toml:
pip install 'lm_eval[unit...

885f48d6

Logging Updates (Alphabetize table printouts, fix eval tracker bug) (#1774) (#1791) · d4a913c4
Hailey Schoelkopf authored May 07, 2024
```
* fix auto-batch size bug for seq2seq models

* alphabetize task + group tables ; fix eval tracker bug

* fix eval tracker bug
```
d4a913c4
Re-add Hendrycks MATH (no sympy checking, no Minerva hardcoded prompt) variant (#1793) · d42a3e44
Hailey Schoelkopf authored May 07, 2024
```
* add Hendrycks MATH (no sympy checking) variant

* add readmes for MATH tasks
```
d42a3e44
link to the example output on the hub (#1798) · 20be169b
KonradSzafer authored May 07, 2024

20be169b
Fix Caching Tests ; Remove `pretrained=gpt2` default (#1775) · 7fe2b93c
Hailey Schoelkopf authored May 07, 2024

7fe2b93c

06 May, 2024 2 commits

Update `--tasks list` option in interface documentation (#1792) · 66cf07ef
aditya thomas authored May 07, 2024

66cf07ef

Provide ability for custom sampler for ConfigurableTask (#1616) · ae72cebc

LSinev authored May 06, 2024

* Added fewshot sampling seeds to evaluator.simple_evaluate signature

Provides a way to control the seed of fewshot sampling,
which may help with #1591

* Added ability for custom sampler for ConfigurableTask

The sampler may be set in the task config, for example:
```
fewshot_config:
  sampler: !function utils.MyFewshotSampler
```

* explicitly set fewshot random generator seed for HFLM generate_until_task test

* add backward compatibility for three args seed setup

* save seeds info to logs/reports

ae72cebc
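
A seeded few-shot sampler of the kind this commit makes configurable might look like the sketch below. The class is self-contained and does not import the harness. The name `MyFewshotSampler` is borrowed from the config snippet in the commit message, while the constructor arguments and the `sample` method are assumptions for illustration, not the interface `ConfigurableTask` requires.

```python
import random


class MyFewshotSampler:
    """Hypothetical sampler that draws few-shot examples reproducibly."""

    def __init__(self, docs, seed=1234):
        # Copy the documents so sampling never mutates the caller's data.
        self.docs = list(docs)
        # A private generator keeps results independent of global state.
        self.rnd = random.Random(seed)

    def sample(self, n):
        """Return n distinct documents chosen with the seeded generator."""
        return self.rnd.sample(self.docs, n)
```

Two samplers built with the same seed return the same examples in the same order, which is the property a fixed few-shot seed is meant to provide.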

05 May, 2024 4 commits
- Fix bug in setting until kwarg in openai completions (#1784) · 30c060d2
  ciaranby authored May 05, 2024
  
  30c060d2
- Fix README: change `----hf_hub_log_args` to `--hf_hub_log_args` (#1776) · 297966f7
  Muhammad Bin Usman authored May 06, 2024
```
fix `----hf_hub_log_args` to `--hf_hub_log_args`
```
  297966f7
- remove echo parameter in OpenAI completions API (#1779) · c34986da
  kwrobel.eth authored May 05, 2024
```
* remove echo parameter in OpenAI completions API

* remove context length parameter doc string
```
  c34986da
- limit fix (#1785) · cee785e0
  KonradSzafer authored May 05, 2024
  
  cee785e0
03 May, 2024 2 commits

eval tracker args fix (#1777) · 18f4eb57
KonradSzafer authored May 03, 2024

18f4eb57

evaluation tracker implementation (#1766) · 59cf408a

KonradSzafer authored May 03, 2024

* evaluation tracker implementation

* OVModelForCausalLM test fix

* typo fix

* moved methods args

* multiple args in one flag

* loggers moved to dedicated dir

* improved filename sanitization

59cf408a

02 May, 2024 2 commits
- Add option to set OpenVINO config (#1730) · e6394715
  Helena Kloosterman authored May 02, 2024
```
* Add option to set OpenVINO config

* Use utils.eval_logger for logging
```
  e6394715
- vllm lora support (#1756) · 83fd78a2
  bcicc authored May 02, 2024
```
* vllm lora support

* remove print

* version check, rename lora kwarg
```
  83fd78a2
01 May, 2024 3 commits

upload new tasks (#1728) · caaf9ab6

Simran Arora authored May 01, 2024



* upload new tasks

* add readmes

* run linters

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>

caaf9ab6

Fix m_arc choices (#1760) · f27c4050

Zehan Li authored May 02, 2024



* Update utils.py

This is a 4-choice task; option_e is null for all but 3 samples

* Fix options

Adaptive choices

* add option e

* bump multilingual arc version

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

f27c4050

Pile 10k new task (#1758) · b898bdaa
Gabriel Mukobi authored May 01, 2024
```
* Add Pile-10k readme

* Add Pile-10k task configuration file
```
b898bdaa