Commits · 9a902155bc83bfd19231f2e08e2470adb6700a23 · gaoqiong / lm-evaluation-harness

02 Feb, 2024 1 commit
- Fix for https://github.com/EleutherAI/lm-evaluation-harness/issues/1383 (#1384) · 9a902155
  Pasquale Minervini authored Feb 02, 2024
```
Fixes https://github.com/EleutherAI/lm-evaluation-harness/issues/1383

If this is okay, it will need to be propagated to SCROLLS
```
01 Feb, 2024 1 commit
- Faster Task and Group Loading, Allow Recursive Groups (#1321) · d714fc95
  Lintang Sutawika authored Feb 01, 2024

* add trust_remote_code as default

* task for testing recursive

* changed source of ALL_TASKS

* tasks should only accept TaskObjects

* initialize_tasks returns list of tasks and groups

* remove trust_remote_code for now

* moved constructor process to inside load_yaml_config

* more comprehensive way to index tasks and groups

* pre-commit format

* add exit after error

* adjust how task objects are called

* no need to use get_task_dict

* load_task_or_group works but only for tasks

* pre-commit format

* half working for nested groups

* changed variable names

* allow groups and tasks to work

* temp save

* indexing and loading are part of a task_manager object

* adapted initialize_tasks

* iron out bugs

* fixed typo

* fixed typo

* simplified code

* further tidy up

* remove lines for testing

* removed test lines

* removed unused code

* remove unused import

* fixed bug

* removed comments

* group in a list of group can accept parameter changes like `num_fewshot`

* check if config is task update

* add GroupConfig object

* edit test yaml

* remove args

* testing returning to python task list

* add weight_by_size config

* describe weight_by_size in docs

* fix weight by size potential error

* can load individual custom python class task

* moved import_function into the config loading file

* remove print lines

* add squadv2 yaml

* temporary scroll implementation

* revert back to use load_yaml_config but with modes

* fix group being loaded with a None

* reformat

* can load unregistered tasks from a group

* update scrolls

* edit scrolls multiplechoice task

* adjust class initialization

* fix initialization

* changed how to identify group and python tasks, fix logger

* allow loading "include" that is nested in a group config

* reworked flan benchmark

* allow duplicate task in the same group to co-exist

* process group_alias

* removed group_alias

* allow parameters set in group_config to apply to all tasks in tasklist

* add function, but comment for now

* reworked processing dict-base config

* fixed how configs in group are processed

* update to allow root group to have its alias used

* remove unused classes

* remove unused classes

* revert some parts to original

* forgot to change one variable

* adapt the new process to use get_task_dict

* fix for singular group call

* fix variable names

* add TaskManager into the evaluator

* format

* changed how dict tasks are loaded

* add docs

* Update docs/new_task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update evaluator.py

* Update evaluator.py

* remove groupconfig for now

* changed _config to config

* update interface.md to explain TaskManager

* added property functions

* adjusted logger

* update write_out.py

* updated tests

* added documentation and some modifications

* added docstring documentation

* precommit format

* updated task loading for tests

* updates tests

* changed arg order for load_yaml_config

* update to handle scrolls and edit log message

* remove unused lines

* return a list of task classes and not a dict

* Update __init__.py

* Delete lm_eval/tasks/benchmarks/test.yaml

* Update task.py

* Update lm_eval/utils.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/utils.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update utils.py

* re-added old functions with new log message

* Update docs/new_task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update new_task_guide.md

* added info regarding `get_task_dict` and documentation

* add get_config for Task

* pre-commit formatting

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
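
The #1321 message describes recursive groups, group-level parameters that apply to every task in a group's list, per-entry overrides such as `num_fewshot`, and a `weight_by_size` option. A minimal Python sketch of how such resolution could work follows. The `GROUPS` registry, the `expand` and `aggregate` helpers, the task names, and the precedence order (inherited settings, then the group's own settings, then overrides attached where the group or task is listed) are illustrative assumptions, not the harness's `TaskManager` implementation.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical registry: group name -> config. "task" lists task names,
# nested group names, or dicts that pair an entry with parameter overrides.
GROUPS: Dict[str, Dict[str, Any]] = {
    "suite": {
        "task": ["squadv2", "mini", {"task": "qasper", "num_fewshot": 0}],
        "num_fewshot": 5,
    },
    "mini": {"task": ["arc_easy", "arc_challenge"]},
}


def expand(
    entry: Any,
    inherited: Optional[Dict[str, Any]] = None,
    stack: Tuple[str, ...] = (),
) -> List[Tuple[str, Dict[str, Any]]]:
    """Flatten a task or (possibly nested) group into (task_name, params) pairs."""
    overrides: Dict[str, Any] = {}
    if isinstance(entry, dict):
        # Per-entry overrides, e.g. {"task": "mini", "num_fewshot": 2}.
        overrides = {k: v for k, v in entry.items() if k != "task"}
        entry = entry["task"]
    params = dict(inherited or {})
    if entry not in GROUPS:
        # Leaf: a plain task gets everything inherited plus its own overrides.
        params.update(overrides)
        return [(entry, params)]
    if entry in stack:
        raise ValueError("cyclic group: " + " -> ".join(stack + (entry,)))
    config = GROUPS[entry]
    # Group-level settings apply to every task in the group's list...
    params.update({k: v for k, v in config.items() if k != "task"})
    # ...unless the parent listed this group with its own overrides.
    params.update(overrides)
    resolved: List[Tuple[str, Dict[str, Any]]] = []
    for child in config["task"]:
        resolved.extend(expand(child, params, stack + (entry,)))
    return resolved


def aggregate(scores: Dict[str, float], sizes: Dict[str, int], weight_by_size: bool) -> float:
    """Combine subtask scores; weight_by_size=True weights each by its sample count."""
    if weight_by_size:
        total = sum(sizes[name] for name in scores)
        return sum(scores[name] * sizes[name] for name in scores) / total
    return sum(scores.values()) / len(scores)
```

Here `expand("suite")` yields `squadv2`, `arc_easy` and `arc_challenge` with `num_fewshot=5`, and `qasper` with `num_fewshot=0`. Tracking the chain of groups currently being expanded lets a self-referencing group fail loudly instead of recursing forever.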

11 Jan, 2024 1 commit
- Fix bug in multi-token Stop Sequences (#1268) · ff739414
  Hailey Schoelkopf authored Jan 11, 2024
```
* fix incorrect lookback protections

* bump generate_until task versions
```
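
The #1268 message does not describe the lookback bug itself. As a rough illustration of why a lookback window matters: a stop sequence can be split across several generated tokens, so a check that decodes only the newest token can miss it, while a window that reaches back into the prompt can match text the model never generated. The sketch below uses string "tokens" joined by concatenation in place of a real tokenizer; `StopSequenceChecker`, `stop_token_len`, and the default `margin` of 2 are hypothetical.

```python
from typing import List


class StopSequenceChecker:
    """Toy stopping check for a stop string that may be split across tokens.

    Tokens are plain strings and "decoding" is concatenation, standing in for
    a real tokenizer. Only the last `lookback` generated tokens are decoded.
    """

    def __init__(self, stop: str, stop_token_len: int, prompt_len: int, margin: int = 2) -> None:
        self.stop = stop
        # The stop string tokenized on its own may use fewer tokens than the
        # same text inside a generation, so a small margin widens the window.
        # max(1, ...) avoids generated[-0:], which would select every token.
        self.lookback = max(1, stop_token_len + margin)
        # Tokens before this index belong to the prompt and must never trigger a stop.
        self.prompt_len = prompt_len

    def should_stop(self, tokens: List[str]) -> bool:
        generated = tokens[self.prompt_len:]
        tail = "".join(generated[-self.lookback:])
        return self.stop in tail
```

With a stop string `"\n\n"` that tokenizes to one token on its own, a window of one token misses the generation `["A", "\n", "\n"]`, while the default margin catches it; slicing off the prompt keeps a trailing `"\n"` in the prompt from combining with the first generated `"\n"`.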
20 Dec, 2023 1 commit
- Switch Linting to `ruff` (#1166) · 65b8761d
  Baber Abbasi authored Dec 20, 2023


* add ruff and isort. remove black and flake8

* remove unnecessary dependencies

* remove dependency from table

* change order

* ran ruff

* check 3.9

* exclude evaluator

* update CI workflow

* use ruff config in pyproject.toml

* test

* add isort rules to ruff

* sort imports

* import `make_table`

* try stages for no-commit-to-branch

* turn on mypy for pre-commit

* test

* test

* test

* change no-commit-to-branch to default

* nits

* fixed dependency

28 Nov, 2023 1 commit
- add versions · 0d03a9f3
  lintangsutawika authored Nov 28, 2023
  
17 Nov, 2023 1 commit
- precommit · f40b7d0e
  lintangsutawika authored Nov 17, 2023
  
15 Nov, 2023 1 commit
- update squad file · c9a3fd3f
  lintangsutawika authored Nov 15, 2023
  
09 Nov, 2023 2 commits
- reformat · 9a64e642
  lintangsutawika authored Nov 09, 2023
  
- readded task descriptions · a3520619
  lintangsutawika authored Nov 09, 2023
  
08 Nov, 2023 1 commit
- removed yaml version of squad · 9817e7c2
  lintangsutawika authored Nov 08, 2023
  
07 Nov, 2023 2 commits
- moved squad.py and format changes · e1d5c849
  lintangsutawika authored Nov 07, 2023
  
- add squad from master · 8bf55a20
  lintangsutawika authored Nov 07, 2023
  
11 May, 2023 1 commit
- Add multilingual datasets (XCOPA, XStoryCloze, XWinograd, PAWS-X, XNLI, MGSM) (#426) · d1451679
  Julen Etxaniz authored May 11, 2023

* add xcopa dataset

* add xstory_cloze dataset and run pre-commit

* fix xcopa validation and test sets

* add xwinograd dataset

* add pawsx task

* add xnli task

* update task table with recently added tasks

* remove unused metrics from paws-x

* add mgsm task and fix gsm8k

* fix gsm8k until

* update task table

28 Apr, 2023 1 commit
- Add non-programmatic BIG-bench-hard tasks (#406) · 602abceb
  yurodiviy authored Apr 28, 2023

* Support bigbench-hard json tasks using multiple_choice_grade

* Add support for greedy decoding in bigbench tasks

* move bigbench_resources to datasets

* rectify changes to rf.greedy_until w upstream

* make path to resource import reflect new location

---------
Co-authored-by: haileyschoelkopf <hailey.schoelkopf@yale.edu>

19 Apr, 2023 1 commit
- in-place replace main with lm-eval2, keeping old git history · d2a9b759
  haileyschoelkopf authored Apr 19, 2023
  
24 Jun, 2022 1 commit
- fix key access in squad evaluation metrics · 1a153185
  Konstantin Schulz authored Jun 24, 2022
  
03 May, 2022 2 commits
- Revert `tests/testdata` changes and address flake8 issues · 8c997e53
  jon-tow authored May 03, 2022
  
- add pre-commit · 121b7096
  Fabrizio Milo authored May 02, 2022
  
18 Mar, 2022 1 commit
- Refactor `Task` download · 7c9da714
  Jonathan Tow authored Mar 11, 2022
  
28 Feb, 2022 2 commits
- Make citations module-level constants · 3f13d15f
  Jonathan Tow authored Feb 28, 2022
  
- Add citations and descriptions to all tasks · a1aceacd
  Jonathan Tow authored Feb 28, 2022
  
05 Feb, 2022 1 commit
- Added decontamination to remaining evals · dae7b868
  Quentin Gregory Anthony authored Feb 05, 2022
  
30 Oct, 2021 1 commit
- Replace the `fewshot_description` API with a `description_dict` based interface · 8ac99269
  Jonathan Tow authored Oct 30, 2021
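
Commit 8ac99269 replaces `fewshot_description` with a `description_dict`. A hedged sketch of the idea, assuming the dictionary maps task names to an instruction string placed before the few-shot examples; `build_context` and the `"\n\n"` separator are hypothetical choices, not the harness's exact prompt assembly.

```python
from typing import Dict, List, Optional


def build_context(
    task_name: str,
    fewshot_examples: List[str],
    query: str,
    description_dict: Optional[Dict[str, str]] = None,
) -> str:
    """Assemble a prompt: optional per-task description, few-shot examples, then the query."""
    # Tasks without an entry simply get no description.
    description = (description_dict or {}).get(task_name, "")
    parts = ([description] if description else []) + fewshot_examples + [query]
    return "\n\n".join(parts)
```

Keying descriptions by task name lets one run configure different instructions per task without changing any task's code.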
  
09 Aug, 2021 1 commit
- fix squad breakage due to HF change · 198ca732
  Leo Gao authored Aug 09, 2021
  
05 Jun, 2021 1 commit
- Add task versioning · 105fa974
  Leo Gao authored Jun 04, 2021
  
28 Mar, 2021 2 commits
- Rename task · 8de85534
  Leo Gao authored Mar 28, 2021
  
- squad: fix aggregation · cbc5c9c8
  Leo Gao authored Mar 28, 2021
  
27 Mar, 2021 1 commit
- Remove unused imports and format imports · d5d19219
  Jonathan Tow authored Mar 26, 2021
  
24 Mar, 2021 2 commits
- Fixed calling of loglikelihood within SQuAD task · 14dd29c4
  Charles Foster authored Mar 23, 2021
  
- SQuAD fixed to use loglikelihood API to calculate the probability of an unanswerable question. · 5be42b4d
  Charles Foster authored Mar 23, 2021
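
Commit 5be42b4d scores unanswerable questions with the loglikelihood API. One way to turn such a score into what SQuAD v2 scoring needs is sketched below, assuming the model is scored on an "unanswerable" continuation of the prompt. `squad_v2_prediction` is a hypothetical helper; its keys follow the prediction format of the Hugging Face `squad_v2` metric.

```python
import math
from typing import Dict, Union


def squad_v2_prediction(
    doc_id: str, generated: str, unanswerable_logprob: float
) -> Dict[str, Union[str, float]]:
    """Build one prediction record for SQuAD v2-style scoring.

    Assumes `unanswerable_logprob` is the log-likelihood the model assigns to
    an "unanswerable" continuation of the prompt; exp() turns it into a probability.
    """
    return {
        "id": doc_id,
        "prediction_text": generated.strip(),  # greedy answer without surrounding whitespace
        "no_answer_probability": math.exp(unanswerable_logprob),
    }
```

Keeping the no-answer probability separate from the generated text lets the metric decide, per threshold, whether a prediction counts as "no answer".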
  
13 Feb, 2021 1 commit
- Fixes SQuAD v2 metric computation. · 232c9ab6
  Charles Foster authored Feb 13, 2021
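
The log does not say what was wrong in the SQuAD v2 metric computation. For reference, here is a sketch of the answer normalization, exact match, and token-overlap F1 used by the official SQuAD v2.0 evaluation script; the function names are illustrative and this is not the harness's code.

```python
import re
import string
from collections import Counter
from typing import Callable, List

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    s = "".join(ch for ch in s.lower() if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1(prediction: str, gold: str) -> float:
    pred_toks = normalize_answer(prediction).split()
    gold_toks = normalize_answer(gold).split()
    if not pred_toks or not gold_toks:
        # SQuAD v2 convention: an empty answer only matches another empty answer.
        return float(pred_toks == gold_toks)
    num_same = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def best_score(prediction: str, golds: List[str], metric: Callable[[str, str], float]) -> float:
    """Take the best score over all references; unanswerable questions use [""]."""
    return max(metric(prediction, g) for g in (golds or [""]))
```

Taking the maximum over references matters because SQuAD questions often list several acceptable answer spans.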
  
11 Feb, 2021 2 commits
- Fixes to make greedy_until work · 432bd44c
  Leo Gao authored Feb 10, 2021
```
# Conflicts:
#	lm_eval/models/gpt2.py
#	lm_eval/tasks/squad.py
```
- Fixes to make greedy_until work · 7b649ded
  Leo Gao authored Feb 10, 2021
  
10 Feb, 2021 3 commits
- Removed unnecessary import. · eb4c8407
  Charles Foster authored Feb 09, 2021
  
- Passes tests, except for NotImplementedError for request type greedy_until. · bba6e0e9
  Charles Foster authored Feb 09, 2021
  
- Skeleton of SQuADv2. Not yet tested. · f48b119d
  Charles Foster authored Feb 09, 2021
  
01 Feb, 2021 1 commit
- Fix linting problems · fe4a1efd
  Leo Gao authored Feb 01, 2021
  
22 Jan, 2021 2 commits
- Remove rewrite reminder comment from everything except SuperGLUE · f4120e59
  Leo Gao authored Jan 21, 2021
  
- Clean up code, remove some footguns · e31b4b31
  Leo Gao authored Jan 21, 2021
  
13 Jan, 2021 1 commit
- Move `doc_to_text` target code into `doc_to_target` · d77241eb
  Jonathan Tow authored Jan 13, 2021
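
After d77241eb, the prompt and the gold answer come from separate methods. A minimal sketch of that split for a SQuAD-style document; the class name, the prompt layout, and the `"unanswerable"` fallback for questions without answers are assumptions for illustration.

```python
from typing import Any, Dict


class SquadLikeTask:
    """Hypothetical minimal task: prompt and gold continuation live in separate methods."""

    def doc_to_text(self, doc: Dict[str, Any]) -> str:
        # Everything the model is conditioned on; no answer text here.
        return (
            f"Title: {doc['title']}\n\n"
            f"Background: {doc['context']}\n\n"
            f"Question: {doc['question']}\n\n"
            "Answer:"
        )

    def doc_to_target(self, doc: Dict[str, Any]) -> str:
        # The gold continuation; unanswerable SQuAD v2 questions have no answer texts.
        answers = doc["answers"]["text"]
        return " " + (answers[0] if answers else "unanswerable")
```

Keeping the target separate lets the same `doc_to_text` output serve as a scoring context on its own and, concatenated with `doc_to_target`, as a few-shot example.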
  