- 03 Jul, 2024 17 commits
-
lintangsutawika authored
Merge branch 'group-agg-rework' of https://github.com/EleutherAI/lm-evaluation-harness into group-agg-rework
-
lintangsutawika authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
Merge branch 'group-agg-rework' of https://github.com/EleutherAI/lm-evaluation-harness into group-agg-rework
-
haileyschoelkopf authored
-
lintangsutawika authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
lintangsutawika authored
-
Lintang Sutawika authored
-
lintangsutawika authored
-
Hanwool Albert Lee authored
* initial implementation (testing still to be done)
* minor fix
* revised task name and implemented new task
* minor fixes
* implemented new tasks
* minor fix
* added 'prompt injection' task
* deleted prompt injection task (will be implemented in the next PR)
* trust remote code
* Update lm_eval/tasks/inverse_scaling/README.md
* added readme
* Update lm_eval/tasks/README.md
* Update lm_eval/tasks/inverse_scaling/_inverse_scaling_mc_yaml
* Update lm_eval/tasks/inverse_scaling/README.md
* Update lm_eval/tasks/inverse_scaling/_inverse_scaling_mc_yaml
* Update README.md
* precommit?
* run precommit on readme

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
-
Nathan Habib authored
* adds leaderboard tasks
* Delete lm_eval/tasks/leaderboard/leaderboard_chat_template.yaml
* add readme
* Delete lm_eval/tasks/leaderboard/mmlu_pro/mmlu_pro_chat_template.yaml
* modify readme
* fix bbh task
* fix bbh salient task
* modify the readme
* Delete lm_eval/tasks/leaderboard/ifeval/README.md
* Delete lm_eval/tasks/leaderboard/math/README.md
* add leaderboard to the tasks directory
* add announcement about new leaderboard tasks
* linting
* Update README.md
* installs ifeval dependency in new_task github workflow

Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Hailey Schoelkopf authored
-
- 02 Jul, 2024 11 commits
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
Hailey Schoelkopf authored
-
- 01 Jul, 2024 4 commits
-
Nathan Habib authored
* batch commit
* Revert "batch commit" (reverts commit d859d1ca)
* batch commit
* checkout from main (×5)
* cleanup (×6)
-
Ogundepo Odunayo authored
-
Hailey Schoelkopf authored
-
Hailey Schoelkopf authored
-
- 29 Jun, 2024 1 commit
-
Hailey Schoelkopf authored
-
- 28 Jun, 2024 3 commits
-
Baber Abbasi authored
* add chat template
* refactor token padding
* nit
* nit
* check on failing test
* check transformers version
* remove transformers pin
* add ids to test
* nit
* fixup
* fix bos bug
* nit
* fixup! fix bos bug
* increase tolerance for table test
* don't detokenize vllm logprobs
* Update lm_eval/models/utils.py
* pre-commit run --all-files

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Baber Abbasi authored
-
Steven Basart authored
Bug:
```
$ python -m scripts.write_out --task scrolls_quality --output_base_path ~/workspace/
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/lm-evaluation-harness/scripts/write_out.py", line 92, in <module>
    main()
  File "/lm-evaluation-harness/scripts/write_out.py", line 51, in main
    task_dict = tasks.get_task_dict(task_names, task_manager)
  File "/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 423, in get_task_dict
    task_name_from_string_dict = task_manager.load_task_or_group(
  File "/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 271, in load_task_or_group
    collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
  File "/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 162, in _load_individual_task_or_group
    return load_task(task_config, task=name_or_config, group=parent_name)
  File "/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 148, in load_task
    task_object = config["class"]()
  File "/lm-evaluation-harness/lm_eval/tasks/scrolls/task.py", line 120, in __init__
    super().__init__()
  File "/lm-evaluation-harness/lm_eval/api/task.py", line 703, in __init__
    self._config = TaskConfig(**config)
TypeError: lm_eval.api.task.TaskConfig() argument after ** must be a mapping, not NoneType
```
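The traceback boils down to unpacking `None` with `**`. A minimal standalone reproduction of that failure mode (the function name here is an illustrative stand-in, not the harness's actual `TaskConfig`):

```python
def task_config(**kwargs):
    """Stand-in for a TaskConfig-style constructor that takes keyword args."""
    return kwargs

config = None  # what the scrolls task ended up passing

try:
    task_config(**config)  # ** requires a mapping, so None raises TypeError
except TypeError as exc:
    print(type(exc).__name__)  # TypeError

# Guarding with an empty dict is one way to avoid the crash:
print(task_config(**(config or {})))  # {}
```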
-
- 27 Jun, 2024 1 commit
-
lintangsutawika authored
-
- 26 Jun, 2024 1 commit
-
Hailey Schoelkopf authored
* make MMLU trust remote code to fix tests
* remove trust remote code
-
- 25 Jun, 2024 2 commits
-
johnwee1 authored
* Update interface.md: remove link to an outdated commit of evaluator.py
* switch to relative referencing?
* Update interface.md

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Hailey Schoelkopf authored
* separate out optimum/neuralmagic tests to separate job
* fix vllm tests
* fix bug in --trust_remote_code
* use datasets.config instead intentionally
* fix remote code issue?
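The "use datasets.config instead" note presumably refers to toggling a library-wide flag once rather than threading `trust_remote_code` through every call site. A hypothetical sketch of that pattern; `_Config` and this `load_dataset` are invented stand-ins, not the real `datasets` API:

```python
class _Config:
    """Hypothetical module-level settings object, consulted by every loader."""
    trust_remote_code = False

config = _Config()

def load_dataset(name):
    # Loaders read the shared flag instead of taking a per-call argument,
    # so flipping it once changes behavior everywhere.
    return {"name": name, "trust_remote_code": config.trust_remote_code}

config.trust_remote_code = True
print(load_dataset("mmlu"))  # {'name': 'mmlu', 'trust_remote_code': True}
```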
-