    Group agg rework (#1741) · 517aadc4
    Lintang Sutawika authored
    
    
    * add group_config arg
    
    * add a group config that allows disabling the table for group scores and group aggregation in general
    
    * fixed size configuration
    
    * adjust config
    
    * add group config
    
    * adjust mmlu to use group_config
    
    * fixed args input in aggregate_subtask_metrics
    
    * fixed issues related to printing alias of group and updated yaml
    
    * update all mmlu variants to include group_config
    
    * edit format
    
    * modify mmlu tasks
    
    * adjust group to also be a configurable group
    
    * add configurable group
    
    * simplify get_task_list
    
    * adjust group scoring to use ConfigurableGroup
    
    * adjust args
    
    * update mmlu
    
    * update mmlu
    
    * update to work with new group and task configuration
    
    * re-add group_agg
    
    * re-add files
    
    * move prepare_print_tasks to evaluator_utils
    
    * sort set to False by default, fix predict_only arg
    
    * add version for groups
    
    * reversed task list
    
    * update additional condition when loading a group in a group yaml
    
    * update truthfulqa
    
    * add description regarding tags replacing group
    
    * replace group with tag
    
    * fixed conditional statement
    
    * remove warning
    
    * update loading of task group and newly added tags
    
    * reformat with pre-commit
    
    * fixed info log
    
    * update
    
    * fix bug
    
    * fix bug
    
    * use task id to differentiate tasks
    
    * convert all groups to configurable groups
    
    * use task_id
    
    * reformat
    
    * add task_id for python tasks as well
    
    * add task_id for python tasks as well
    
    * add task_id for python tasks as well
    
    * revert truthfulqa
    
    * revert mmlu tasks
    
    * new mmlu config
    
    * new group config parameter `tag_to_task`
    
    * Update truthfulqa_mc2.yaml
    
    * reformat
    
    * add _process_group_config
    
    * adjust task_id
    
    * add get_subtask_list function to get proper subtask list
    
    * group config to_dict update
    
    * remove tag check
    
    * update mmlu
    
    * fix config passing issues
    
    * add test yaml
    
    * format fix
    
    * add documentation
    
    * corner case for single tag being called
    
    * fix indentation
    
    * formatting
    
    * update all mmlu variants
    
    * Update docs/task_guide.md
    Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
    
    * remove group_alias
    
    * Update docs/task_guide.md
    Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
    
    * remove version for metadata
    
    * Update docs/task_guide.md
    Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
    
    * update mmlu/
    
    * removed " " in make_table
    
    * change how aggregate_metric is loaded
    
    * change how aggregate_metric is loaded
    
    * update aggregate_metric arg
    
    * update format
    
    * update format
    
    * some docs fixes
    
    * add groups for agieval, aexams, aclue
    
    * add more explicit aggregation groups
    
    * add more groupings / tags distinctions
    
    * add more groupings
    
    * more groupings
    
    * add many explicit group configs
    
    * add many explicit group configs
    
    * add more explicit group configs
    
    * add more explicit group configs
    
    * add more error msgs, agg_metric -> agg_metric_list
    
    * some docs updates
    
    * update task_id to be updatable and use the group:task format
    
    * make KMMLU a tag for now
    
    * update docs
    
    * don't duplicate task names
    
    * fix merge conflicts?
    
    * giving this a try
    
    * clean up diff
    
    * switch mmlu variants over to using
    
    * don't use to-be-deprecated group: config field in overview notebook
    
    * Python tasks which subclass ConfigurableTask now run
    
    * update mmlu
    
    * pre-commit format
    
    * fixed sorting for multi-level printing
    
    * move group api to separate file
    
    * fix bbh aggregation filter usage
    
    * track api/group.py
    
    * adjust group and tags loading
    
    * make explicit group configs for leaderboard and other newer tasks
    
    * fix arabicmmlu
    
    * update
    
    * change arabicmmlu template name???
    
    * update group alias
    
    * fix printing bugs
    
    * check table printing is correct; update tests
    
    * use mmlu_stem to have a group included in print tests
    
    ---------
    Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
    Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>