- 21 Feb, 2025 1 commit
-
-
Lintang Sutawika authored
* changed source of eval_logger * allow eval_logger to be set from args * removed verbosity arg from non-main methods * fix logging * pre-commit * set verbosity in eval logger * replace utils.eval_logger * fix logging in main * add logging to docs * add logging message * nit * add logging to docs * refactor setup_logging to utils --------- Co-authored-by:Baber <baber@hey.com>
-
- 07 Feb, 2025 1 commit
-
-
omahs authored
* fix typo * fix typos * fix typos
-
- 14 Dec, 2024 1 commit
-
-
Baber Abbasi authored
* make warning prominent * make warning prominent
-
- 05 Nov, 2024 1 commit
-
-
Sypherd authored
-
- 08 Jul, 2024 2 commits
-
-
Elron Bandel authored
* Updated unitxt loading Signed-off-by:
Elron Bandel <elron.bandel@ibm.com> * Revert change to general Readme Signed-off-by:
Elron Bandel <elron.bandel@ibm.com> * Adjust fda,squadv2,squad_completion and swde to work accept config in the constructor Signed-off-by:
Elron Bandel <elron.bandel@ibm.com> * Fix scrolls Signed-off-by:
elronbandel <elron.bandel@ibm.com> * Update documentation Signed-off-by:
elronbandel <elron.bandel@ibm.com> * Enforce backward compatability Signed-off-by:
elronbandel <elron.bandel@ibm.com> * Format unitxt class Signed-off-by:
elronbandel <elron.bandel@ibm.com> --------- Signed-off-by:
Elron Bandel <elron.bandel@ibm.com> Signed-off-by:
elronbandel <elron.bandel@ibm.com> Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai>
-
Lintang Sutawika authored
* add greoup_config arg * add a group config that allows disabling table for group score and group aggregate in general * fixed size configuration * adjust config * add group config * adjust mmlu to use group_config * fixed args input in aggregate_subtask_metrics * fixed issues related to printing alias of group and updated yaml * update all mmlu variants to include group_config * edit format * modify mmlu tasks * adjust group to also be a configurable group * add configurable group * simplify get_task_list * adjust group scoring with using ConfigurableGroup * adjust args * update mmlu * update mmlu * update to work with new group and task configuration * readd group_agg * readd files * move prepare_print_tasks to evaluator_utils * sort set to False by default, fix predict_only arg * add version for groups * reversed task list * update additional condition when loading a group in a group yaml * update truthfulqa * add description regarding tags replacing group * replace group to tag * fixed conditional statement * remove warning * update loading of task group and newly added tags * reformat with pre-commit * fixed info log * update * fix bug * fix bug * use task id to differentiate tasks * convert all groups to configurable groups * use task_id * reformat * add task_id for python tasks as well * add task_id for python tasks as well * add task_id for python tasks as well * revert truthfulqa * revert mmlu tasks * new mmlu config * new group config parameter `tag_to_task` * Update truthfulqa_mc2.yaml * reformate * add _process_group_config * adjust task_id * add get_subtask_list function to get proper subtask list * group config to_dict update * remove tag check * update mmlu * fix config passing issues * add test yaml * format fix * add documentation * corner case for single tag being called * fix indentation * formatting * update all mmlu variants * Update docs/task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * remove group_alias * Update docs/task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * remove version for metadata * Update docs/task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * update mmlu/ * removed " " in make_table * change how aggregate_metric is loaded * change how aggregate_metric is loaded * update aggregate_metric arg * update format * update format * some docs fixes * add groups for agieval, aexams, aclue * add more explicit aggregation groups * add more groupings / tags distinctions * add more groupings * more groupings * add many explicit group configs * add many explicit group configs * add more explicit group configs * add more explicit group configs * add more error msgs, agg_metric -> agg_metric_list * some docs updates * update task_id to be updateable and uses group:task format * make KMMLU a tag for now * update docs * don't duplicate task names * fix merge conflicts? * giving this a try * clean up diff * switch mmlu variants over to using * don't use to-be-deprecated group: config field in overview notebook * Python tasks which subclass ConfigurableTask now run * update mmlu * pre-commit format * fixed sorting for multi-level printing * move group api to separate file * fix bbh aggregation filter usage * track api/group.py * adjust group and tags loading * make explicit group configs for leaderboard and other newer tasks * fix arabicmmlu * update * change arabicmmlu template name??? * update group alias * fix printing bugs * check table printing is correct ; update tests * use mmlu_stem to have a group included in print tests --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai>
-
- 03 Jun, 2024 1 commit
-
-
anthony-dipofi authored
* added tasks and task family descriptors * continue work on task list w/ links; slightly reorganize README * Apply suggestions from code review * Rename file so that it'll preview in Github when viewing lm_eval/tasks folder * Update new_task_guide.md * Update README.md * run linter * Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs * fix typo * Apply suggestions from code review Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * apply format --------- Co-authored-by:
Harish Vadaparty <harishvadaparty@gmail.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai>
-
- 31 May, 2024 1 commit
-
-
Clémentine Fourrier authored
* init test 1 * fix * this format seems to be working - need to update all other tasks with the new format * bbh with few shot format * fix fewshot bbh * add mmlu flan cot * samples of cot * kmmlu * fix gsm8k * update keys for mmlu * minerva math * bbh * fix * fix samples * small fixes to templates * last prompt format change * fixing prompt * fixed minerva math format * rm accidental commited file * added doc for few shot samples * Update lm_eval/loggers/evaluation_tracker.py * Update lm_eval/loggers/evaluation_tracker.py * Update docs/new_task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * added check in sampler per code review * added the system from a function, plus an example in minerva math * style * Apply suggestions from code review Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * fix unit tests 1 * forcing use of test split --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 24 May, 2024 1 commit
-
-
DongGeon Lee authored
-
- 21 May, 2024 1 commit
-
-
Zafir Stojanovski authored
-
- 01 Feb, 2024 1 commit
-
-
Lintang Sutawika authored
* add trust_remote_code as default * task for testing recursive * changed source of ALL_TASKS * tasks should only accept TaskObjects * initialize_tasks returns list of tasks and groups * remove trust_remote_code for now * moved constructor process to inside load_yaml_config * more comprehensive way to index tasks and groups * pre-commit format * add exit after error * adjust how task objects are called * no need to use get_task_dict * load_task_or_group works but only for tasks * pre-commit format * half working for nested groups * changed variable names * allow groups and tasks to work * temp save * indexing and loading are part of a task_manager object * adapted initialize_tasks * iron out bugs * fixed typo * fixed typo * simplified code * further tidy up * remove lines for testing * removed test lines * removed unused code * remove unused import * fixed bug * removed comments * group in a list of group can accept parameter changes like `num_fewshot` * add trust_remote_code as default * task for testing recursive * changed source of ALL_TASKS * tasks should only accept TaskObjects * initialize_tasks returns list of tasks and groups * remove trust_remote_code for now * moved constructor process to inside load_yaml_config * more comprehensive way to index tasks and groups * pre-commit format * add exit after error * adjust how task objects are called * no need to use get_task_dict * load_task_or_group works but only for tasks * pre-commit format * half working for nested groups * changed variable names * allow groups and tasks to work * temp save * indexing and loading are part of a task_manager object * adapted initialize_tasks * iron out bugs * fixed typo * fixed typo * simplified code * further tidy up * remove lines for testing * removed test lines * removed unused code * remove unused import * fixed bug * removed comments * group in a list of group can accept parameter changes like `num_fewshot` * check if config is task update * add GroupConfig object * edit test yaml * remove args * testing returning to python task list * add weight_by_size config * describe weight_by_size in docs * fix weight by size potential error * can load individual custom python class task * moved import_function into the config loading file * remove print lines * add squadv2 yaml * temporary scroll implementation * revert back to use load_yaml_config but with modes * fix group being loaded with a None * reformat * can load unregistered tasks from a group * update scrolls * edit scrolls multiplechoice task * adjust class initialization * fix initialization * changed how to identify group and python tasks, fix logger * allow loading "include" that is nested in a group config * reworked flan benchmark * allow duplicate task in the same group to co-exist * process group_alias * removed group_alias * allow parameters set in group_config to apply to all tasks in tasklist * add function, but comment for now * reworked processing dict-base config * fixed how configs in group are processed * update to allow root group to have its alias used * remove unused classes * remove unused classes * revert some parts to original * forgot to change one variable * adapt the new process to use get_task_dict * fix for singular group call * fix variable names * add TaskManager into the evaluator * format * changed how dict tasks are loaded * add docs * Update docs/new_task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update evaluator.py * Update evaluator.py * remove groupconfig for now * changed _config to config * update interface.md to explain TaskManager * added property functions * adjusted logger * update write_out.py * updated tests * added documentation and some modifications * added docstring documentation * precommit format * updated task loading for tests * updates tests * changed arg order for load_yaml_config * update to handle scrolls and edit log message * remove unused lines * return a list of task classes and not a dict * Update __init__.py * Delete lm_eval/tasks/benchmarks/test.yaml * Update task.py * Update lm_eval/utils.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/utils.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update utils.py * re-added old functions with new log message * Update docs/new_task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update new_task_guide.md * added infor regarding `get_task_dict` and documentation * add get_config for Task * pre-commit formatting --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 24 Jan, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 15 Jan, 2024 1 commit
-
-
Lintang Sutawika authored
* rewor documentation for explaining local dataset * fix typo * Update new_task_guide.md
-
- 22 Dec, 2023 1 commit
-
-
Bram Vanroy authored
-
- 21 Dec, 2023 1 commit
-
-
Hailey Schoelkopf authored
* change version field formatting in metadata * mention versioning in new task guide * add instructions for changelog * run linters
-
- 14 Dec, 2023 1 commit
-
-
Lintang Sutawika authored
* Additional process for doc_to_choice * doc_to_choice can also parse a string
-
- 04 Dec, 2023 1 commit
-
-
Hailey Schoelkopf authored
-
- 27 Nov, 2023 2 commits
-
-
lintangsutawika authored
-
lintangsutawika authored
-
- 09 Nov, 2023 2 commits
-
-
lintangsutawika authored
-
lintangsutawika authored
-
- 06 Nov, 2023 1 commit
-
-
lintangsutawika authored
-
- 02 Nov, 2023 1 commit
-
-
lintangsutawika authored
-
- 01 Nov, 2023 2 commits
-
-
Hailey Schoelkopf authored
-
haileyschoelkopf authored
-
- 04 Oct, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 22 Sep, 2023 1 commit
-
-
Chris authored
-
- 25 Aug, 2023 1 commit
-
-
Lintang Sutawika authored
-
- 16 Aug, 2023 1 commit
-
-
lintangsutawika authored
-
- 08 Aug, 2023 2 commits
-
-
lintangsutawika authored
-
lintangsutawika authored
-
- 07 Aug, 2023 1 commit
-
-
lintangsutawika authored
-
- 18 Jul, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 20 Jun, 2023 1 commit
-
-
lintangsutawika authored
-
- 16 Jun, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 15 Jun, 2023 1 commit
-
-
lintangsutawika authored
-
- 12 Jun, 2023 4 commits
-
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-