- 10 Jul, 2025 1 commit
-
-
Baber Abbasi authored
* check for chat for warning * add test * remove yaml extension from some evalita configs * move unitxt to own test script * fix CI test
-
- 16 Apr, 2025 1 commit
-
-
Baber Abbasi authored
* switch MMLU to cais/mmlu * switch back to tj-actions/changed-files * cache HF folder
-
- 17 Mar, 2025 1 commit
-
-
Angelika Romanou authored
* Add INCLUDE tasks * pacify pre-commit --------- Co-authored-by: Baber <baber@hey.com>
-
- 04 Mar, 2025 1 commit
-
-
Kiersten Stokes authored
* Add a test for a custom unitxt task * Update task.py to bring in line with breaking change in v1.17.2 * Fix lint
-
- 19 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* update pre-commit
-
- 04 Dec, 2024 1 commit
-
-
Baber Abbasi authored
-
- 20 Nov, 2024 1 commit
-
-
Baber Abbasi authored
* fix test task * dont call lm.chat_template each time
-
- 18 Nov, 2024 1 commit
-
-
Kozzy Voudouris authored
* Add metabench (Kipnis et al. 2024) * Update metabench tasks for full replication of original benchmarks, using publicly available datasets * Remove unnecessary import * Add permute versions of each task, where the answer orders are randomly shuffled. * Add metabench group for easier evaluations * Fix mmlu counts after removing duplicate * Add secondary datasets * Fix f-string error * Fix f-string error for permute processing * Add original hash to outputs for easy matching to original results * Add line break at end of utils files * Remove extra line from winogrande * Reformat for linters * fix multiple input test * appease pre-commit * Add metabench to tasks README * fix multiple input `test_doc_to_text` --------- Co-authored-by: Baber <baber@hey.com>
-
- 09 Nov, 2024 1 commit
-
-
Baber Abbasi authored
* switch `max_tokens` for `max_completion_tokens`. OpenAI ChatCompletions * remove stop, temp=1 for o1 * add chat assertion * HF_DATASETS_TRUST_REMOTE_CODE = True for task tests * move warning
-
- 26 Sep, 2024 2 commits
-
-
Baber Abbasi authored
* add newlines to task descriptions; increment versions * fix task tests (with groups) * Apply suggestions from code review --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Baber Abbasi authored
* change glianorex to test set * nit * fix test; doc_to_target can be str for multiple_choice * nit
-
- 31 May, 2024 1 commit
-
-
LSinev authored
-
- 04 Mar, 2024 1 commit
-
-
Vicki Boykis authored
-
- 01 Feb, 2024 1 commit
-
-
Lintang Sutawika authored
* add trust_remote_code as default * task for testing recursive * changed source of ALL_TASKS * tasks should only accept TaskObjects * initialize_tasks returns list of tasks and groups * remove trust_remote_code for now * moved constructor process to inside load_yaml_config * more comprehensive way to index tasks and groups * pre-commit format * add exit after error * adjust how task objects are called * no need to use get_task_dict * load_task_or_group works but only for tasks * pre-commit format * half working for nested groups * changed variable names * allow groups and tasks to work * temp save * indexing and loading are part of a task_manager object * adapted initialize_tasks * iron out bugs * fixed typo * fixed typo * simplified code * further tidy up * remove lines for testing * removed test lines * removed unused code * remove unused import * fixed bug * removed comments * group in a list of group can accept parameter changes like `num_fewshot` * check if config is task update * add GroupConfig object * edit test yaml * remove args * testing returning to python task list * add weight_by_size config * describe weight_by_size in docs * fix weight by size potential error * can load individual custom python class task * moved import_function into the config loading file * remove print lines * add squadv2 yaml * temporary scroll implementation * revert back to use load_yaml_config but with modes * fix group being loaded with a None * reformat * can load unregistered tasks from a group * update scrolls * edit scrolls multiplechoice task * adjust class initialization * fix initialization * changed how to identify group and python tasks, fix logger * allow loading "include" that is nested in a group config * reworked flan benchmark * allow duplicate task in the same group to co-exist * process group_alias * removed group_alias * allow parameters set in group_config to apply to all tasks in tasklist * add function, but comment for now * reworked processing dict-base config * fixed how configs in group are processed * update to allow root group to have its alias used * remove unused classes * remove unused classes * revert some parts to original * forgot to change one variable * adapt the new process to use get_task_dict * fix for singular group call * fix variable names * add TaskManager into the evaluator * format * changed how dict tasks are loaded * add docs * Update docs/new_task_guide.md Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update evaluator.py * Update evaluator.py * remove groupconfig for now * changed _config to config * update interface.md to explain TaskManager * added property functions * adjusted logger * update write_out.py * updated tests * added documentation and some modifications * added docstring documentation * precommit format * updated task loading for tests * updates tests * changed arg order for load_yaml_config * update to handle scrolls and edit log message * remove unused lines * return a list of task classes and not a dict * Update __init__.py * Delete lm_eval/tasks/benchmarks/test.yaml * Update task.py * Update lm_eval/utils.py Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/utils.py Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update utils.py * re-added old functions with new log message * Update docs/new_task_guide.md Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update new_task_guide.md * added infor regarding `get_task_dict` and documentation * add get_config for Task * pre-commit formatting --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 27 Dec, 2023 1 commit
-
-
Baber Abbasi authored
* fix group * siqa: default.yml -> default.yaml * max_gen_toks -> self.max_gen_toks * add ids to task tests * fix siqa * fix gen_kwargs for openai-chat
-
- 20 Dec, 2023 1 commit
-
-
Baber Abbasi authored
* add ruff and isort. remove black and flake8 * remove unnecessary dependencies * remove dependency from table * change order * ran ruff * check 3.9 * exclude evaluator * update CI workflow * use ruff config in pyproject.toml * test * add isort rules to ruff * sort imports * import `make_table` * try stages for no-commit-to-branch * turn on mypy for pre-commit * test * test * test * change no-commit-to-branch to default * nits * fixed dependency
-
- 17 Nov, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 05 Sep, 2023 1 commit
-
-
baberabb authored
-
- 02 Aug, 2023 1 commit
-
-
baberabb authored
-
- 22 Jul, 2023 1 commit
-
-
baberabb authored
-
- 21 Jul, 2023 4 commits
- 17 Jul, 2023 4 commits
- 14 Jul, 2023 2 commits
- 21 Jun, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 02 Dec, 2022 1 commit
-
-
jon-tow authored
-
- 03 May, 2022 1 commit
-
-
Fabrizio Milo authored
-
- 03 Jan, 2022 1 commit
-
-
Leo Gao authored
-
- 24 Nov, 2021 1 commit
-
-
Jason Phang authored
-
- 28 Aug, 2021 1 commit
-
-
Leo Gao authored
-
- 14 Jun, 2021 1 commit
-
-
Leo Gao authored
-
- 12 Jun, 2021 2 commits
- 05 Jun, 2021 1 commit
-
-
Leo Gao authored
-