Commit 517aadc4 authored by Lintang Sutawika, committed by GitHub

Group agg rework (#1741)



* add group_config arg

* add a group config that allows disabling table for group score and group aggregate in general

* fixed size configuration

* adjust config

* add group config

* adjust mmlu to use group_config

* fixed args input in aggregate_subtask_metrics

* fixed issues related to printing alias of group and updated yaml

* update all mmlu variants to include group_config

* edit format

* modify mmlu tasks

* adjust group to also be a configurable group

* add configurable group

* simplify get_task_list

* adjust group scoring to use ConfigurableGroup

* adjust args

* update mmlu

* update mmlu

* update to work with new group and task configuration

* readd group_agg

* readd files

* move prepare_print_tasks to evaluator_utils

* sort set to False by default, fix predict_only arg

* add version for groups

* reversed task list

* update additional condition when loading a group in a group yaml

* update truthfulqa

* add description regarding tags replacing group

* replace group with tag

* fixed conditional statement

* remove warning

* update loading of task group and newly added tags

* reformat with pre-commit

* fixed info log

* update

* fix bug

* fix bug

* use task id to differentiate tasks

* convert all groups to configurable groups

* use task_id

* reformat

* add task_id for python tasks as well

* add task_id for python tasks as well

* add task_id for python tasks as well

* revert truthfulqa

* revert mmlu tasks

* new mmlu config

* new group config parameter `tag_to_task`

* Update truthfulqa_mc2.yaml

* reformat

* add _process_group_config

* adjust task_id

* add get_subtask_list function to get proper subtask list

* group config to_dict update

* remove tag check

* update mmlu

* fix config passing issues

* add test yaml

* format fix

* add documentation

* corner case for single tag being called

* fix indentation

* formatting

* update all mmlu variants

* Update docs/task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* remove group_alias

* Update docs/task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* remove version for metadata

* Update docs/task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* update mmlu/

* removed " " in make_table

* change how aggregate_metric is loaded

* change how aggregate_metric is loaded

* update aggregate_metric arg

* update format

* update format

* some docs fixes

* add groups for agieval, aexams, aclue

* add more explicit aggregation groups

* add more groupings / tags distinctions

* add more groupings

* more groupings

* add many explicit group configs

* add many explicit group configs

* add more explicit group configs

* add more explicit group configs

* add more error msgs, agg_metric -> agg_metric_list

* some docs updates

* make task_id updatable and use group:task format

* make KMMLU a tag for now

* update docs

* don't duplicate task names

* fix merge conflicts?

* giving this a try

* clean up diff

* switch mmlu variants over to using

* don't use to-be-deprecated group: config field in overview notebook

* Python tasks which subclass ConfigurableTask now run

* update mmlu

* pre-commit format

* fixed sorting for multi-level printing

* move group api to separate file

* fix bbh aggregation filter usage

* track api/group.py

* adjust group and tags loading

* make explicit group configs for leaderboard and other newer tasks

* fix arabicmmlu

* update

* change arabicmmlu template name???

* update group alias

* fix printing bugs

* check table printing is correct ; update tests

* use mmlu_stem to have a group included in print tests

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
parent 5a7ed3ee
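
Several of the squashed commits above touch group-level score aggregation (`aggregate_subtask_metrics`, the size setting, `agg_metric_list`). As a rough illustration only, the sketch below contrasts a size-weighted mean with an unweighted mean over subtask scores. The function name `aggregate_mean`, its parameters, and the sample numbers are hypothetical and are not taken from the harness; the actual implementation in this commit may differ.

```python
from typing import Sequence


def aggregate_mean(
    scores: Sequence[float],
    sizes: Sequence[int],
    weight_by_size: bool = True,
) -> float:
    """Hypothetical helper: combine per-subtask scores into one group score.

    With weight_by_size=True each subtask counts in proportion to its number
    of samples; otherwise every subtask counts equally.
    """
    if len(scores) != len(sizes):
        raise ValueError("scores and sizes must have the same length")
    if not scores:
        raise ValueError("at least one subtask score is required")
    if not weight_by_size:
        return sum(scores) / len(scores)
    total = sum(sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    return sum(score * size for score, size in zip(scores, sizes)) / total


# Made-up numbers: two subtasks with 300 and 100 samples.
scores = [0.5, 1.0]
sizes = [300, 100]
weighted = aggregate_mean(scores, sizes)
unweighted = aggregate_mean(scores, sizes, weight_by_size=False)
```

The two results diverge whenever subtasks differ in size, which is one reason a group config might expose the choice explicitly.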
"dataset_name": "high_school_european_history" "dataset_name": "high_school_european_history"
"description": "The following are multiple choice questions (with answers) about high\ "description": "The following are multiple choice questions (with answers) about high\
\ school european history.\n\n" \ school european history.\n\n"
"group": "mmlu_flan_n_shot_generative_humanities" "tag": "mmlu_flan_n_shot_generative_humanities"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_high_school_european_history" "task": "mmlu_flan_n_shot_generative_high_school_european_history"
"dataset_name": "high_school_geography" "dataset_name": "high_school_geography"
"description": "The following are multiple choice questions (with answers) about high\ "description": "The following are multiple choice questions (with answers) about high\
\ school geography.\n\n" \ school geography.\n\n"
"group": "mmlu_flan_n_shot_generative_social_sciences" "tag": "mmlu_flan_n_shot_generative_social_sciences"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_high_school_geography" "task": "mmlu_flan_n_shot_generative_high_school_geography"
"dataset_name": "high_school_government_and_politics" "dataset_name": "high_school_government_and_politics"
"description": "The following are multiple choice questions (with answers) about high\ "description": "The following are multiple choice questions (with answers) about high\
\ school government and politics.\n\n" \ school government and politics.\n\n"
"group": "mmlu_flan_n_shot_generative_social_sciences" "tag": "mmlu_flan_n_shot_generative_social_sciences"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_high_school_government_and_politics" "task": "mmlu_flan_n_shot_generative_high_school_government_and_politics"
"dataset_name": "high_school_macroeconomics" "dataset_name": "high_school_macroeconomics"
"description": "The following are multiple choice questions (with answers) about high\ "description": "The following are multiple choice questions (with answers) about high\
\ school macroeconomics.\n\n" \ school macroeconomics.\n\n"
"group": "mmlu_flan_n_shot_generative_social_sciences" "tag": "mmlu_flan_n_shot_generative_social_sciences"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_high_school_macroeconomics" "task": "mmlu_flan_n_shot_generative_high_school_macroeconomics"
"dataset_name": "high_school_mathematics" "dataset_name": "high_school_mathematics"
"description": "The following are multiple choice questions (with answers) about high\ "description": "The following are multiple choice questions (with answers) about high\
\ school mathematics.\n\n" \ school mathematics.\n\n"
"group": "mmlu_flan_n_shot_generative_stem" "tag": "mmlu_flan_n_shot_generative_stem"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_high_school_mathematics" "task": "mmlu_flan_n_shot_generative_high_school_mathematics"
"dataset_name": "high_school_microeconomics" "dataset_name": "high_school_microeconomics"
"description": "The following are multiple choice questions (with answers) about high\ "description": "The following are multiple choice questions (with answers) about high\
\ school microeconomics.\n\n" \ school microeconomics.\n\n"
"group": "mmlu_flan_n_shot_generative_social_sciences" "tag": "mmlu_flan_n_shot_generative_social_sciences"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_high_school_microeconomics" "task": "mmlu_flan_n_shot_generative_high_school_microeconomics"
"dataset_name": "high_school_physics" "dataset_name": "high_school_physics"
"description": "The following are multiple choice questions (with answers) about high\ "description": "The following are multiple choice questions (with answers) about high\
\ school physics.\n\n" \ school physics.\n\n"
"group": "mmlu_flan_n_shot_generative_stem" "tag": "mmlu_flan_n_shot_generative_stem"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_high_school_physics" "task": "mmlu_flan_n_shot_generative_high_school_physics"
"dataset_name": "high_school_psychology" "dataset_name": "high_school_psychology"
"description": "The following are multiple choice questions (with answers) about high\ "description": "The following are multiple choice questions (with answers) about high\
\ school psychology.\n\n" \ school psychology.\n\n"
"group": "mmlu_flan_n_shot_generative_social_sciences" "tag": "mmlu_flan_n_shot_generative_social_sciences"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_high_school_psychology" "task": "mmlu_flan_n_shot_generative_high_school_psychology"
"dataset_name": "high_school_statistics" "dataset_name": "high_school_statistics"
"description": "The following are multiple choice questions (with answers) about high\ "description": "The following are multiple choice questions (with answers) about high\
\ school statistics.\n\n" \ school statistics.\n\n"
"group": "mmlu_flan_n_shot_generative_stem" "tag": "mmlu_flan_n_shot_generative_stem"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_high_school_statistics" "task": "mmlu_flan_n_shot_generative_high_school_statistics"
"dataset_name": "high_school_us_history" "dataset_name": "high_school_us_history"
"description": "The following are multiple choice questions (with answers) about high\ "description": "The following are multiple choice questions (with answers) about high\
\ school us history.\n\n" \ school us history.\n\n"
"group": "mmlu_flan_n_shot_generative_humanities" "tag": "mmlu_flan_n_shot_generative_humanities"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_high_school_us_history" "task": "mmlu_flan_n_shot_generative_high_school_us_history"
"dataset_name": "high_school_world_history" "dataset_name": "high_school_world_history"
"description": "The following are multiple choice questions (with answers) about high\ "description": "The following are multiple choice questions (with answers) about high\
\ school world history.\n\n" \ school world history.\n\n"
"group": "mmlu_flan_n_shot_generative_humanities" "tag": "mmlu_flan_n_shot_generative_humanities"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_high_school_world_history" "task": "mmlu_flan_n_shot_generative_high_school_world_history"
"dataset_name": "human_aging" "dataset_name": "human_aging"
"description": "The following are multiple choice questions (with answers) about human\ "description": "The following are multiple choice questions (with answers) about human\
\ aging.\n\n" \ aging.\n\n"
"group": "mmlu_flan_n_shot_generative_other" "tag": "mmlu_flan_n_shot_generative_other"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_human_aging" "task": "mmlu_flan_n_shot_generative_human_aging"
"dataset_name": "human_sexuality" "dataset_name": "human_sexuality"
"description": "The following are multiple choice questions (with answers) about human\ "description": "The following are multiple choice questions (with answers) about human\
\ sexuality.\n\n" \ sexuality.\n\n"
"group": "mmlu_flan_n_shot_generative_social_sciences" "tag": "mmlu_flan_n_shot_generative_social_sciences"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_human_sexuality" "task": "mmlu_flan_n_shot_generative_human_sexuality"
"dataset_name": "international_law" "dataset_name": "international_law"
"description": "The following are multiple choice questions (with answers) about international\ "description": "The following are multiple choice questions (with answers) about international\
\ law.\n\n" \ law.\n\n"
"group": "mmlu_flan_n_shot_generative_humanities" "tag": "mmlu_flan_n_shot_generative_humanities"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_international_law" "task": "mmlu_flan_n_shot_generative_international_law"
"dataset_name": "jurisprudence" "dataset_name": "jurisprudence"
"description": "The following are multiple choice questions (with answers) about jurisprudence.\n\ "description": "The following are multiple choice questions (with answers) about jurisprudence.\n\
\n" \n"
"group": "mmlu_flan_n_shot_generative_humanities" "tag": "mmlu_flan_n_shot_generative_humanities"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_jurisprudence" "task": "mmlu_flan_n_shot_generative_jurisprudence"
"dataset_name": "logical_fallacies" "dataset_name": "logical_fallacies"
"description": "The following are multiple choice questions (with answers) about logical\ "description": "The following are multiple choice questions (with answers) about logical\
\ fallacies.\n\n" \ fallacies.\n\n"
"group": "mmlu_flan_n_shot_generative_humanities" "tag": "mmlu_flan_n_shot_generative_humanities"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_logical_fallacies" "task": "mmlu_flan_n_shot_generative_logical_fallacies"
"dataset_name": "machine_learning" "dataset_name": "machine_learning"
"description": "The following are multiple choice questions (with answers) about machine\ "description": "The following are multiple choice questions (with answers) about machine\
\ learning.\n\n" \ learning.\n\n"
"group": "mmlu_flan_n_shot_generative_stem" "tag": "mmlu_flan_n_shot_generative_stem"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_machine_learning" "task": "mmlu_flan_n_shot_generative_machine_learning"
"dataset_name": "management" "dataset_name": "management"
"description": "The following are multiple choice questions (with answers) about management.\n\ "description": "The following are multiple choice questions (with answers) about management.\n\
\n" \n"
"group": "mmlu_flan_n_shot_generative_other" "tag": "mmlu_flan_n_shot_generative_other"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_management" "task": "mmlu_flan_n_shot_generative_management"
"dataset_name": "marketing" "dataset_name": "marketing"
"description": "The following are multiple choice questions (with answers) about marketing.\n\ "description": "The following are multiple choice questions (with answers) about marketing.\n\
\n" \n"
"group": "mmlu_flan_n_shot_generative_other" "tag": "mmlu_flan_n_shot_generative_other"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_marketing" "task": "mmlu_flan_n_shot_generative_marketing"
"dataset_name": "medical_genetics" "dataset_name": "medical_genetics"
"description": "The following are multiple choice questions (with answers) about medical\ "description": "The following are multiple choice questions (with answers) about medical\
\ genetics.\n\n" \ genetics.\n\n"
"group": "mmlu_flan_n_shot_generative_other" "tag": "mmlu_flan_n_shot_generative_other"
"include": "_mmlu_flan_generative_template_yaml" "include": "_mmlu_flan_generative_template_yaml"
"task": "mmlu_flan_n_shot_generative_medical_genetics" "task": "mmlu_flan_n_shot_generative_medical_genetics"