- 23 Jul, 2025 1 commit
-
-
Baber authored
-
- 21 Jul, 2025 3 commits
- 19 Jul, 2025 3 commits
-
-
Baber Abbasi authored
-
James A. Michaelov authored
* add multiblimp * run linter
-
Avelina Asada Hadji-Kyriacou authored
* Update default.yaml
-
- 18 Jul, 2025 1 commit
-
-
Idan Tene authored
* Update utils.py
-
- 16 Jul, 2025 1 commit
-
-
philipdoldo authored
* Removed the "Let's think step by step." text from the start of the target entry in each sample, so that the phrase is not repeated twice in the few-shot prompts and the behavior matches the original bbh repository. This applied to 26 of the 27 subtasks; the only exception is boolean_expressions.yaml. In boolean_expressions.yaml, in my opinion, there is an error: the prompt does not include the 'Remember that (i) ...' text after the final 'A: Let's think step by step.'. Models like EleutherAI/gpt-neo-125m seem to always begin their answers with this string anyway (copying the few-shot prompts), but I think it should have been part of the prompt, much like 'A: Let's think step by step.' is included in the prompt for all of the cot tasks. The original bbh repo has the same issue, so I think it is fine to keep it this way for consistency, but I wanted to point it out.
* feat: remove extra space from answers; add changelog
---------
Co-authored-by: Baber <baber@hey.com>
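The cleanup described in this commit can be illustrated with a minimal sketch. It assumes each few-shot sample stores its answer as a plain string and that the prompt template already ends with the chain-of-thought phrase; `strip_cot_prefix` and `build_fewshot_answer` are hypothetical names for illustration, not functions from the repository.

```python
COT_PREFIX = "Let's think step by step."


def strip_cot_prefix(target: str) -> str:
    """Drop a leading chain-of-thought phrase (and nearby whitespace) from a target."""
    stripped = target.lstrip()
    if stripped.startswith(COT_PREFIX):
        return stripped[len(COT_PREFIX):].lstrip()
    # Targets without the phrase are returned unchanged.
    return target


def build_fewshot_answer(target: str) -> str:
    """Hypothetical template: the prompt itself already supplies the phrase."""
    return f"A: {COT_PREFIX} {target}"


raw_target = "Let's think step by step.\nWe evaluate the expression. So the answer is True."
clean_target = strip_cot_prefix(raw_target)

# With the raw target the phrase appears twice; with the cleaned target, once.
print(build_fewshot_answer(raw_target).count(COT_PREFIX))
print(build_fewshot_answer(clean_target).count(COT_PREFIX))
```

Stripping at the data level rather than in the template keeps the prompt format identical across subtasks.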
-
- 14 Jul, 2025 1 commit
-
-
Atou Houdaifa authored
* add egy mmlu hellaswag * add egymmlu egyhellaswag to tasks readme * fix egymmlu config generation * fix _generate_configs formating
-
- 10 Jul, 2025 1 commit
-
-
Baber Abbasi authored
* check for chat for warning * add test * remove yaml extension from some evalita configs * move unitxt to own test script * fix CI test
-
- 03 Jul, 2025 2 commits
-
-
Baber Abbasi authored
* use double quotes
-
Blanca Calvo authored
* truthfulqa-multi task * truthfulqa-multi with chat few-shot * few shot chat implementation * changed until so it outputs lists * changed dataset location * added MT task * Create README.md * do not include MT * changes for PR * tag change * removed yaml extension * adding task to the table * fix task configs * add import exception --------- Co-authored-by: Baber <baber@hey.com>
-
- 30 Jun, 2025 1 commit
-
-
jinze authored
* Fix: align the HumanEval dataset with official results. Details: (1) Modified "doc_to_text" and "gen_prefix" in "humaneval_instruct.yaml" to match the prompt in "meta-llama/Llama-3.1-70B-Instruct-evals". (2) Changed r.rfind("```") to r.find("```") so that it locates the first "```" rather than the last. Results: partially reproduced the official results. LLaMA3.1-8B-Instruct scores 66.5 (official: 72.6) and LLaMA3.1-70B-Instruct scores 80.5 (official: 80.5). Ref: PR#2650
* add changelog and version
* add changelog
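The difference between the two string searches mentioned in point (2) can be shown with a small sketch. It assumes the model response contains the completed code followed by a closing fence and then further fenced text; the helper names and the sample response are hypothetical and are not taken from the task's actual utilities.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def cut_at_first_fence(r: str) -> str:
    """Keep everything before the first fence (str.find)."""
    idx = r.find(FENCE)
    return r[:idx] if idx != -1 else r


def cut_at_last_fence(r: str) -> str:
    """Keep everything before the last fence (str.rfind)."""
    idx = r.rfind(FENCE)
    return r[:idx] if idx != -1 else r


# Hypothetical response: code, a closing fence, then an extra fenced example.
response = (
    "def add(a, b):\n    return a + b\n"
    + FENCE
    + "\nExample usage:\n"
    + FENCE
    + "python\nprint(add(1, 2))\n"
    + FENCE
)

print(repr(cut_at_first_fence(response)))
print(repr(cut_at_last_fence(response)))
```

With `find`, only the code before the first fence survives; with `rfind`, the trailing explanation and the second fenced snippet are kept as well, which would likely break execution of the extracted code.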
-
- 25 Jun, 2025 1 commit
-
-
Kiersten Stokes authored
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
-
- 20 Jun, 2025 1 commit
-
-
Anna Fontana authored
"arc_chalenge_chat" doesn't exist: I think it should be "arc_challenge_chat", but this task is not implemented here (see arc task folder).
-
- 19 Jun, 2025 2 commits
-
-
Maxim Evtush authored
-
Anna Fontana authored
Wrong task name: mmlu_generation doesn't exist; mmlu_generative is the correct one
-
- 16 Jun, 2025 2 commits
-
-
Baber Abbasi authored
* fix longbench citation
-
fuder.eth authored
* Update README.md * Update utils_mcq.py
-
- 12 Jun, 2025 1 commit
-
-
Kiersten Stokes authored
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
-
- 08 Jun, 2025 1 commit
-
-
Baber Abbasi authored
* use all answers * use middle truncation * maybe fix classification score * strip classification preds * [vllm] remove stop tokens post-hoc * strip all preds * pacify pre-commit * start on truncation utility * add to readme * add a footgun doc * fix newline in yaml templates * do not strip code_sim preds! * fix pre-commit config * fix instruction warning * add note to longbench readme
-
- 03 Jun, 2025 2 commits
-
-
Baber Abbasi authored
-
Baber Abbasi authored
* feat: add mbpp_instruct * fix: update generation_kwargs to use an empty until list * fix: correct predictions formatting in pass_at_1 function * fix: improve code block extraction by checking first without opening backticks * fix mbpp `pass_at_1`
-
- 26 May, 2025 1 commit
-
-
Boda Sadallah authored
* add arab_culture tasks * add target_delimiter and remove debugging code
-
- 21 May, 2025 1 commit
-
-
Hongseok Oh authored
-
- 19 May, 2025 2 commits
-
-
Baber Abbasi authored
* add `sglang-generate` * nit * nit * nit * pacify pre-commit
-
Harsha authored
* adding ACPBench_hard * adding Clingo * changing tarski to tarski[clingo] * denoting the main variants in each paper
-
- 15 May, 2025 4 commits
-
-
Baber Abbasi authored
-
tawsif authored
-
Yufeng Xu authored
* added c4 dataset (working) * fixed bugs in c4 * fixed loading bugs in c4 dataset; using partial loading * cleaned the code * added version number for c4 * removed irrelevant files
-
Jess authored
* add afrixnli to task * add chat completion * remove chat completion -untested * afrimmlu added * afrimmlu folder update * afrimmlu folder update * updated prompt * remove print * add afrimgsm -direct * add squad metric * fix bash script * remove direct util, update common yaml * remove print * add few show. metric fixes * fix direct path, add bash script for gpt models * added transate test * update afrixnli tasks * update afrixnli tasks * update metrics for afrixnli * prompt translations fix * prompt translations fix * filter and metric fix -mgsm * remove squad metric * remove squad metric * add f1 score to mgsm * add f1 score to mgsm * update native-direct with lin * change f1 function * add lin to utils * add utils * remove test limit * remove test configs * add swahili to mmlu * change eng to ewe in ewe yaml mmlu * add squad metric to mgsm, remove whitespace filter * added translate test * added afrixnli_translate * fix exact match valueError * fix exact match valueError * restructure mmlu folder * spacing * remove afrimmlu_translate folder * add utility * format task name, clean ups * modefied mgsm * update on afrimgsm * update on afrimgsm * removed utils * other mgsm varieties * other mgsm varieties * adding trasnslate direct * Update translate_direct_yaml * add manual xnli prompt, add multichoice for openai models, and adapt multichoice metric for openai model * edit for open models * Update translate_direct_yaml * add verbalizer for xnli * change xnli from multiple choice to generate * add manual accuracy scores * revert xnli to multiple choice * change afrimgsm utils * revert xnli to multiple_choice * cleanups and readmes * remove openai fixes and unused regex * pr review changes * revert metrics.py, task.py and extraction.py to main version * add afrisenti * utilities * pulled from main * add afrixnli * add afrimmlu * update afrixnli prompts * mising senti language * fix afrisenti prompt 2 * fix afrisenti prompts * fix afrisenti prompts * configure task 
grouping * add multiple prompts to afrixnli for irokobench * add multiple prompts to afrimmlu for irokobench * Update afrixnli_yaml * fixes and moves * fixes and moves * afrimmlu multiple prompts configs * remove validation set from afrimmlu * remove eng from afrimmlu translate test * correct dataset path * multiple prompts for mgsm * file restructure * afribench grouping * repo restructuring * repo restructuring * update exact match to hugging face exact match and add new mgsm language * remove decontamination * update generation kwargs * update generation kwargs for all mgsm prompts * remove lang * update generation kwargs for afrimgsm translatetest * add afrimgsm cot for direct and translate * remove eng from translate-cot * add masakhaPOS tasks * remove changes from task script * add masakhanews tasks * add uhura arc easy * add afriqa and belebele files * add tags for easier run. add naija rc * add new metrics and transformation scripts * fix afriqa swa fewshot split * add naijarc * add afrobench lite tasks * update afrobench * update afrobench * remove unverified files to avoid bugs * remove files not needed * add afrobench tasks * add afrobench tasks * change to version 1 * change to version 1 * update afrobench * update afrobench * restore metric to original script * update readme instructions * add individual dataset readmes * add link to collections * correct run script * align with main * align with main * align with main * align with main * align with main * align with main * align with main * align with main * failed run fixes * failed run fixes * add afrimgsm cot * Apply precommit fixes * update mafand dataset name * pull request fixes * remove afrihate due to availability --------- Co-authored-by:
Israel Abebe Azime <azime@cg.uni-saarland.de> Co-authored-by:
Israel Abebe Azime <se.israel.abebe@gmail.com> Co-authored-by:
David Adelani <davlanade@gmail.com> Co-authored-by:
theyorubayesian <akin.o.oladipo@gmail.com>
-
- 13 May, 2025 2 commits
-
-
Yoonsoo Kim authored
* mmlu pro generation_kwargs until Q: -> Question: * pacify pre-commit * change stop token --------- Co-authored-by: Baber <baber@hey.com>
-
Kiersten Stokes authored
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
-
- 06 May, 2025 2 commits
-
-
Anna Fontana authored
* Fix import error for eval_logger in score utils * pacify pre-commit --------- Co-authored-by: Baber <baber@hey.com>
-
Vladislav Mikhailov authored
* added noreval * added a checklist for noreval * run pre-commit * changed imports and added short noreval description * fixed norsumm path * refactored multi-folder tasks * refactored multi-folder tasks
-
- 29 Apr, 2025 1 commit
-
-
Baber Abbasi authored
-
- 16 Apr, 2025 3 commits
-
-
Baber Abbasi authored
* add warning for default until * fix stop tokens; add vcsum * bugfix: fix doc_to_target to string * fix lsht, trec * add task to readme * add debugging logs for multiple input/output
-
Baber Abbasi authored
* switch MMLU to cais/mmlu * switch back to tj-actions/changed-files * cache HF folder
-
Eldar Kurtic authored
-