- 05 Aug, 2024 1 commit
-
-
Yu Shi Jie authored
* added gsm_plus
* formatted dataset to have train-test-splits
* README.md for gsm-plus
* Update README.md
* GSM-Plus: added gsm_plus_mini
* GSM-Plus: attribution to original dataset
* Update README.md
* Update README.md
* Update README.md

Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
-
- 08 Jul, 2024 1 commit
-
-
Lintang Sutawika authored
* add group_config arg
* add a group config that allows disabling table for group score and group aggregate in general
* fixed size configuration
* adjust config
* add group config
* adjust mmlu to use group_config
* fixed args input in aggregate_subtask_metrics
* fixed issues related to printing alias of group and updated yaml
* update all mmlu variants to include group_config
* edit format
* modify mmlu tasks
* adjust group to also be a configurable group
* add configurable group
* simplify get_task_list
* adjust group scoring with using ConfigurableGroup
* adjust args
* update mmlu
* update mmlu
* update to work with new group and task configuration
* readd group_agg
* readd files
* move prepare_print_tasks to evaluator_utils
* sort set to False by default, fix predict_only arg
* add version for groups
* reversed task list
* update additional condition when loading a group in a group yaml
* update truthfulqa
* add description regarding tags replacing group
* replace group to tag
* fixed conditional statement
* remove warning
* update loading of task group and newly added tags
* reformat with pre-commit
* fixed info log
* update
* fix bug
* fix bug
* use task id to differentiate tasks
* convert all groups to configurable groups
* use task_id
* reformat
* add task_id for python tasks as well
* add task_id for python tasks as well
* add task_id for python tasks as well
* revert truthfulqa
* revert mmlu tasks
* new mmlu config
* new group config parameter `tag_to_task`
* Update truthfulqa_mc2.yaml
* reformat
* add _process_group_config
* adjust task_id
* add get_subtask_list function to get proper subtask list
* group config to_dict update
* remove tag check
* update mmlu
* fix config passing issues
* add test yaml
* format fix
* add documentation
* corner case for single tag being called
* fix indentation
* formatting
* update all mmlu variants
* Update docs/task_guide.md Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* remove group_alias
* Update docs/task_guide.md Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* remove version for metadata
* Update docs/task_guide.md Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* update mmlu/
* removed " " in make_table
* change how aggregate_metric is loaded
* change how aggregate_metric is loaded
* update aggregate_metric arg
* update format
* update format
* some docs fixes
* add groups for agieval, aexams, aclue
* add more explicit aggregation groups
* add more groupings / tags distinctions
* add more groupings
* more groupings
* add many explicit group configs
* add many explicit group configs
* add more explicit group configs
* add more explicit group configs
* add more error msgs, agg_metric -> agg_metric_list
* some docs updates
* update task_id to be updateable and uses group:task format
* make KMMLU a tag for now
* update docs
* don't duplicate task names
* fix merge conflicts?
* giving this a try
* clean up diff
* switch mmlu variants over to using
* don't use to-be-deprecated group: config field in overview notebook
* Python tasks which subclass ConfigurableTask now run
* update mmlu
* pre-commit format
* fixed sorting for multi-level printing
* move group api to separate file
* fix bbh aggregation filter usage
* track api/group.py
* adjust group and tags loading
* make explicit group configs for leaderboard and other newer tasks
* fix arabicmmlu
* update
* change arabicmmlu template name???
* update group alias
* fix printing bugs
* check table printing is correct; update tests
* use mmlu_stem to have a group included in print tests

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
-
- 19 Feb, 2024 1 commit
-
-
thnkinbtfly authored
* update bbh, gsm8k, mmlu parsing logic and prompts
* remove the formatting prompt (bbh) + minor update (mmlu)
* update bbh, gsm8k, mmlu zeroshot, revert fewshots
* update bbh, gsm8k, mmlu version, forward changes to gsm8k-cot
* remove take_last, update to use docs parameters
* add newline
* ruff formatting
* Update pyproject.toml
* fix format

Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 31 Jan, 2024 1 commit
-
-
Hailey Schoelkopf authored
* don't override do_sample if no value for it is passed
* Update gen_kwargs override condition
* Update huggingface.py
* Update huggingface.py
* run linters
* silence an erroneous warning
-
- 15 Jan, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 11 Jan, 2024 1 commit
-
-
Hailey Schoelkopf authored
* fix incorrect lookback protections
* bump generate_until task versions
-
- 21 Dec, 2023 1 commit
-
-
Hailey Schoelkopf authored
* change version field formatting in metadata
* mention versioning in new task guide
* add instructions for changelog
* run linters
-
- 28 Nov, 2023 2 commits
-
-
lintangsutawika authored
-
lintangsutawika authored
-
- 03 Nov, 2023 2 commits
-
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
- 17 Oct, 2023 1 commit
-
-
lintangsutawika authored
-
- 14 Aug, 2023 1 commit
-
-
lintangsutawika authored
-
- 11 Jul, 2023 1 commit
-
-
lintangsutawika authored
-
- 06 Jul, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 12 Jun, 2023 1 commit
-
-
Hailey Schoelkopf authored
* add wip gsm8k yaml
* cleanup tasks dir
* push gsm8k yaml changes
* rename gpt2.py
* add updated gsm8k, triviaqa baseline
* add new cot yaml
* allow for multiple filter pipelines, new filter types
* updated gsm8k + sampling gen configs
* cleanup self-consistency yaml
* push outline for advanced docs
* push docs checklist
* switch to inheritance for many tasks
* acc_norm and acc_mutual_info fixed
* fix missing newline in error msg
* remove many .py tasks
* updated GSM8k
* added more doc
* Update advanced_task_guide.md Added list of parameters
* Update advanced_task_guide.md
* Added details on listing metrics
* Update advanced_task_guide.md
* Added more explanation
* modify current default filter name
* add new tags to tasks
* remove a lingering print()
* add rest of param docs, cleanup deprecated fields
* push docs update
* move ALL_TASKS definition location
* confirm write_out.py works if no description dict passed

Co-authored-by: lintangsutawika <lintang@sutawika.com>
-
- 08 Jun, 2023 2 commits
-
-
lintangsutawika authored
-
Hailey Schoelkopf authored
* add wip gsm8k yaml
* cleanup tasks dir
* push gsm8k yaml changes
* rename gpt2.py
* add updated gsm8k, triviaqa baseline
* add new cot yaml
* allow for multiple filter pipelines, new filter types
* updated gsm8k + sampling gen configs
* cleanup self-consistency yaml
-