- 05 Aug, 2024 6 commits
-
-
Yu Shi Jie authored
* added gsm_plus * formatted dataset to have train-test-splits * README.md for gsm-plus * Update README.md * GSM-Plus: added gsm_plus_mini * GSM-Plus: attribution to original dataset * Update README.md * Update README.md * Update README.md --------- Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
-
Yu Shi Jie authored
* initialized mmlu_pro task * added generative mmlu-pro * added cot fewshot for mmlu-pro * Initial commit * updated mmlu-pro to take on 3 splits: test, val, dev * mmlu-pro: added continuation and flan_cot_zeroshot * added README.md for mmlu_pro * removed * update files * moved files out, and removed unused versions * updated * mmlu_pro: -changed task 'other' to 'miscellaneous' there is already a group named 'other' task and group with the same alias (e.g. mmlu_pro_other_generative) throws an error -fixed yaml backslash escape for fewshot cot * changed choices -> options in yaml config to fit dataset schema * ONLY FOR DEFAULT: fixed yaml file to use variable number of choices * mmlu-pro: fixed doc_to_text/choice/target configs for all variants * mmlu-pro: minor fixes * mmlu-pro/default: aligned with mmlu updates * mmlu-pro: update yaml content in line with mmlu * mmlu-pro: fixed mislabelling of task (math->chemistry) * mmlu-pro: fixed yaml formatting * add custom fewshot doc_to_text, target, and choice * add process for each subtask * add process for each subtask * pre-commit * pre-commit * format * resolved left out merge * deleted folders + updated readme * Update evaluator.py * Update evaluator.py --------- Co-authored-by:
Yu Shi Jie <shijie@tensorplex.ai> Co-authored-by:
lintangsutawika <lintang@eleuther.ai> Co-authored-by:
root <root@455bdd73-01.cloud.together.ai> Co-authored-by:
Lintang Sutawika <lintang@sutawika.com>
-
Hailey Schoelkopf authored
-
Amir Hossein Kargaran authored
-
Baber Abbasi authored
-
Nathan Habib authored
* batch commit * Revert "batch commit" This reverts commit d859d1ca. * batch commit * checkout from main * checkout from main * checkout from main * checkout from main * checkout from main * cleanup * cleanup * cleanup * cleanup * cleanup * cleanup * cleanup * cleanup * linting * add doc * Update lm_eval/models/huggingface.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update README.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/models/huggingface.py * linter * Apply suggestions from code review Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * style * remove prepare * fix * style * last check * Update lm_eval/models/huggingface.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> --------- Co-authored-by:
Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by:
clementine@huggingface.co <clementine@huggingface.co>
-
- 04 Aug, 2024 2 commits
-
-
zhabuye authored
-
Amir Hossein Kargaran authored
-
- 01 Aug, 2024 3 commits
-
-
Hailey Schoelkopf authored
-
Nathan Weinberg authored
* refactor: move scipy and sklearn module imports to func imports Signed-off-by:
Nathan Weinberg <nweinber@redhat.com> * refactor: consolidate weighted_f1_score func into lm_eval utils Signed-off-by:
Nathan Weinberg <nweinber@redhat.com> * lint: allow for utils file to have unused imports this allows for shared functions to be defined only once while allowing for the YAML function importing to continue working Signed-off-by:
Nathan Weinberg <nweinber@redhat.com> --------- Signed-off-by:
Nathan Weinberg <nweinber@redhat.com>
-
Baber Abbasi authored
* add temperature for log probs * add seed * nit * add new args to test * added warning for api chat models
-
- 29 Jul, 2024 1 commit
-
-
Baber Abbasi authored
* encoding bugfix * encoding bugfix * overload loglikelihood rather than loglikelihood_tokens * add custom tokenizer * add docs * Update API_guide.md fix link; add note * Update API_guide.md typo * pre-commit * add link in readme * nit * nit * nit * Update API_guide.md nits * Update API_guide.md * Update API_guide.md * Update API_guide.md * Update API_guide.md * Update README.md * Update docs/API_guide.md * Update docs/API_guide.md * Update API_guide.md --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 22 Jul, 2024 1 commit
-
-
Baber Abbasi authored
* refactor pad_token handling to fn * fix docs * add pad_token_handling to vllm * start on API superclass * don't detokenize the returned logits * streamline vllm tokenizer * add type hint * pre-commit * seems to be in working order * add model to init * refactor api models * nit * cleanup * add pbar * fix type hints * change optional dependencies * json encode chat template * add type hints * deal with different prompt input requirements * nits * fix * cache inside async * fix * fix * nits * nits * nits * nit * fixup * fixup * nit * add dummy retry * add dummy retry * handle imports; skip failing test * add type hint * add tests * add dependency to tests * add package names to exception * nit * docs; type hints * handle api key * nit * tokenizer bug * fix tokenizer * nit * nit * add better error messages * nit * remove decorator * CI: install api dep * revert evaluator.py * consolidate * consolidate * nits * nit * fix typealias * nit * nit * nit * Update lm_eval/models/api_models.py typo Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/models/openai_completions.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/models/anthropic_llms.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/models/api_models.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * fix typo * add news section * add info for API * pre-commit * typo * fix bug: unpack loglikelihood requests * fix bug: shared gen_kwargs mutated * nit: handle copy properly * Update README.md * Update README.md * Update README.md * Update api_models.py * Update README.md --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 21 Jul, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 20 Jul, 2024 1 commit
-
-
Jennifer Cwagenberg authored
-
- 18 Jul, 2024 2 commits
-
-
Nathan Weinberg authored
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
-
Jungwhan Kim authored
-
- 17 Jul, 2024 1 commit
-
-
jab13x authored
-
- 15 Jul, 2024 3 commits
-
-
Nathan Weinberg authored
Also add 'test_logs/' to .gitignore Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
-
Lintang Sutawika authored
-
Hailey Schoelkopf authored
-
- 14 Jul, 2024 1 commit
-
-
Ben Shoham Ofir authored
* Added MedConceptsQA Benchmark * pre-commit factor * update group name * update in naming * changed name * Changed mcqa to med_concepts_qa prefix * Added med_concepts_qa to README.md * Changed config files according to the new format * Updated README --------- Co-authored-by: lintangsutawika <lintang@eleuther.ai>
-
- 13 Jul, 2024 1 commit
-
-
Nathan Weinberg authored
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
-
- 12 Jul, 2024 4 commits
-
-
Jess authored
* add afrixnli to task * add chat completion * remove chat completion -untested * afrimmlu added * afrimmlu folder update * afrimmlu folder update * updated prompt * remove print * add afrimgsm -direct * add squad metric * fix bash script * remove direct util, update common yaml * remove print * add few shot. metric fixes * fix direct path, add bash script for gpt models * added translate test * update afrixnli tasks * update afrixnli tasks * update metrics for afrixnli * prompt translations fix * prompt translations fix * filter and metric fix -mgsm * remove squad metric * remove squad metric * add f1 score to mgsm * add f1 score to mgsm * update native-direct with lin * change f1 function * add lin to utils * add utils * remove test limit * remove test configs * add swahili to mmlu * change eng to ewe in ewe yaml mmlu * add squad metric to mgsm, remove whitespace filter * added translate test * added afrixnli_translate * fix exact match ValueError * fix exact match ValueError * restructure mmlu folder * spacing * remove afrimmlu_translate folder * add utility * format task name, clean ups * modified mgsm * update on afrimgsm * update on afrimgsm * removed utils * other mgsm varieties * other mgsm varieties * adding translate direct * Update translate_direct_yaml * add manual xnli prompt, add multichoice for openai models, and adapt multichoice metric for openai model * edit for open models * Update translate_direct_yaml * add verbalizer for xnli * change xnli from multiple choice to generate * add manual accuracy scores * revert xnli to multiple choice * change afrimgsm utils * revert xnli to multiple_choice * cleanups and readmes * remove openai fixes and unused regex * pr review changes * revert metrics.py, task.py and extraction.py to main version --------- Co-authored-by:
Israel Abebe Azime <azime@cg.uni-saarland.de> Co-authored-by:
Israel Abebe Azime <se.israel.abebe@gmail.com>
-
SuperCat authored
* add mmlusr tasks * renamed all tasks names in mmlusr * edit format and readme * added mmlu_sr * mmlu_sr -> mmlusr * update --------- Co-authored-by: lintangsutawika <lintang@eleuther.ai>
-
Wonung Kim authored
-
Hailey Schoelkopf authored
-
- 11 Jul, 2024 1 commit
-
-
anthony-dipofi authored
* add and ; move task list newline logic to new TaskManager.list_all_tasks() method * format table list into markdown table; add config location column * add Output Type column * add logic for printing table of tags separately * merge with main and fix conflicts ; update docstrings --------- Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
-
- 10 Jul, 2024 2 commits
-
-
meg authored
-
Lintang Sutawika authored
Group configs with no aggregation print an empty space as the score in the results table. Example:

```
|    Tasks     |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|--------------|-------|------|-----:|--------|---|-----:|---|-----:|
|group         |    N/A|      |      |        |   |      |   |      |
| - task 0     |Yaml   |none  |     0|acc     |↑  |0.4000|±  |0.0910|
| - task 1     |Yaml   |none  |     0|acc     |↑  |0.3333|±  |0.0875|
| - task 2     |Yaml   |none  |     0|acc     |↑  |0.2667|±  |0.0821|
| - task 3     |Yaml   |none  |     0|acc     |↑  |0.3333|±  |0.0875|
```

So the `v` variable in `make_table` needs to check whether the value is a float or a string.
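The float-or-string check described above can be sketched as follows. This is a minimal illustration, not the actual `make_table` code: the helper name `format_cell` and its `precision` parameter are hypothetical, and the four-decimal format is assumed from the example table.

```python
def format_cell(v, precision=4):
    """Format one table cell (hypothetical helper, for illustration only)."""
    # Floats are rendered with fixed precision, matching the "0.4000"
    # style seen in the example table.
    if isinstance(v, float):
        return f"{v:.{precision}f}"
    # Anything else, such as the blank string used for a group without
    # aggregation, is passed through unchanged as text.
    return str(v)


print(format_cell(0.4))        # a task score
print(repr(format_cell(" ")))  # a group row with no aggregated score
```

Branching on the type avoids applying a float format specifier to a string, which would raise a `ValueError` for the group row.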
-
- 09 Jul, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 08 Jul, 2024 6 commits
-
-
Pankaj Mathur authored
leaderboard README.md missing mmlu-pro group and task
-
Nathan Habib authored
* batch commit * Revert "batch commit" This reverts commit d859d1ca. * batch commit * checkout from main * checkout from main * checkout from main * checkout from main * checkout from main * cleanup * cleanup * cleanup * cleanup * cleanup * cleanup eval results * cleanup * add check for gated repo * fix jsonline issue * fix * add try catch when gating the details repo * add doc * adds back hub_repo_name * readds hub repo name
-
Elron Bandel authored
* Updated unitxt loading Signed-off-by:
Elron Bandel <elron.bandel@ibm.com> * Revert change to general Readme Signed-off-by:
Elron Bandel <elron.bandel@ibm.com> * Adjust fda, squadv2, squad_completion and swde to accept config in the constructor Signed-off-by:
Elron Bandel <elron.bandel@ibm.com> * Fix scrolls Signed-off-by:
elronbandel <elron.bandel@ibm.com> * Update documentation Signed-off-by:
elronbandel <elron.bandel@ibm.com> * Enforce backward compatibility Signed-off-by:
elronbandel <elron.bandel@ibm.com> * Format unitxt class Signed-off-by:
elronbandel <elron.bandel@ibm.com> --------- Signed-off-by:
Elron Bandel <elron.bandel@ibm.com> Signed-off-by:
elronbandel <elron.bandel@ibm.com> Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai>
-
Hailey Schoelkopf authored
-
Lintang Sutawika authored
* add group_config arg * add a group config that allows disabling table for group score and group aggregate in general * fixed size configuration * adjust config * add group config * adjust mmlu to use group_config * fixed args input in aggregate_subtask_metrics * fixed issues related to printing alias of group and updated yaml * update all mmlu variants to include group_config * edit format * modify mmlu tasks * adjust group to also be a configurable group * add configurable group * simplify get_task_list * adjust group scoring with using ConfigurableGroup * adjust args * update mmlu * update mmlu * update to work with new group and task configuration * readd group_agg * readd files * move prepare_print_tasks to evaluator_utils * sort set to False by default, fix predict_only arg * add version for groups * reversed task list * update additional condition when loading a group in a group yaml * update truthfulqa * add description regarding tags replacing group * replace group to tag * fixed conditional statement * remove warning * update loading of task group and newly added tags * reformat with pre-commit * fixed info log * update * fix bug * fix bug * use task id to differentiate tasks * convert all groups to configurable groups * use task_id * reformat * add task_id for python tasks as well * add task_id for python tasks as well * add task_id for python tasks as well * revert truthfulqa * revert mmlu tasks * new mmlu config * new group config parameter `tag_to_task` * Update truthfulqa_mc2.yaml * reformat * add _process_group_config * adjust task_id * add get_subtask_list function to get proper subtask list * group config to_dict update * remove tag check * update mmlu * fix config passing issues * add test yaml * format fix * add documentation * corner case for single tag being called * fix indentation * formatting * update all mmlu variants * Update docs/task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * remove group_alias * Update docs/task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * remove version for metadata * Update docs/task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * update mmlu/ * removed " " in make_table * change how aggregate_metric is loaded * change how aggregate_metric is loaded * update aggregate_metric arg * update format * update format * some docs fixes * add groups for agieval, aexams, aclue * add more explicit aggregation groups * add more groupings / tags distinctions * add more groupings * more groupings * add many explicit group configs * add many explicit group configs * add more explicit group configs * add more explicit group configs * add more error msgs, agg_metric -> agg_metric_list * some docs updates * update task_id to be updateable and uses group:task format * make KMMLU a tag for now * update docs * don't duplicate task names * fix merge conflicts? * giving this a try * clean up diff * switch mmlu variants over to using * don't use to-be-deprecated group: config field in overview notebook * Python tasks which subclass ConfigurableTask now run * update mmlu * pre-commit format * fixed sorting for multi-level printing * move group api to separate file * fix bbh aggregation filter usage * track api/group.py * adjust group and tags loading * make explicit group configs for leaderboard and other newer tasks * fix arabicmmlu * update * change arabicmmlu template name??? * update group alias * fix printing bugs * check table printing is correct ; update tests * use mmlu_stem to have a group included in print tests --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai>
-
Choyunhui authored
Co-authored-by: yhjo <yhjo@suresofttech.com>
-
- 03 Jul, 2024 3 commits
-
-
Hanwool Albert Lee authored
* initial_implementation (test has to be proceeded) * minor fix * revised task name and implemented new task * minor fixes * new tasks implement * minor fix * added 'prompt injection' task * delete prompt injection task (will be implemented at next PR) * trust remote code * Update lm_eval/tasks/inverse_scaling/README.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * added readme * Update lm_eval/tasks/README.md * Update lm_eval/tasks/inverse_scaling/_inverse_scaling_mc_yaml * Update lm_eval/tasks/inverse_scaling/README.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/tasks/inverse_scaling/_inverse_scaling_mc_yaml Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update README.md * precommit? * run precommit on readme --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai>
-
Nathan Habib authored
* adds leaderboard tasks * Delete lm_eval/tasks/leaderboard/leaderboard_chat_template.yaml * add readme * Delete lm_eval/tasks/leaderboard/mmlu_pro/mmlu_pro_chat_template.yaml * modify readme * fix bbh task * fix bbh salient task * modify the readme * Delete lm_eval/tasks/leaderboard/ifeval/README.md * Delete lm_eval/tasks/leaderboard/math/README.md * add leaderboard to the tasks repertory * add announcement about new leaderboard tasks * linting * Update README.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * installs ifeval dependency in new_task github workflow --------- Co-authored-by:
Nathan Habib <nathan.habib@huggingface.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Hailey Schoelkopf authored
-