    Irokobench: Benchmark Dataset for African languages (#2042) · 383bbd54
    Jess authored
    
    
    * add afrixnli to task
    
    * add chat completion
    
    * remove chat completion -untested
    
    * afrimmlu added
    
    * afrimmlu folder update
    
    * afrimmlu folder update
    
    * updated prompt
    
    * remove print
    
    * add afrimgsm -direct
    
    * add squad metric
    
    * fix bash script
    
    * remove direct util, update common yaml
    
    * remove print
    
    * add few shot, metric fixes
    
    * fix direct path, add bash script for gpt models
    
    * added translate test
    
    * update afrixnli tasks
    
    * update afrixnli tasks
    
    * update metrics for afrixnli
    
    * prompt translations fix
    
    * prompt translations fix
    
    * filter and metric fix -mgsm
    
    * remove squad metric
    
    * remove squad metric
    
    * add f1 score to mgsm
    
    * add f1 score to mgsm
    
    * update native-direct with lin
    
    * change f1 function
    
    * add lin to utils
    
    * add utils
    
    * remove test limit
    
    * remove test configs
    
    * add swahili to mmlu
    
    * change eng to ewe in ewe yaml mmlu
    
    * add squad metric to mgsm, remove whitespace filter
    
    * added translate test
    
    * added afrixnli_translate
    
    * fix exact match valueError
    
    * fix exact match valueError
    
    * restructure mmlu folder
    
    * spacing
    
    * remove afrimmlu_translate folder
    
    * add utility
    
    * format task name, clean ups
    
    * modified mgsm
    
    * update on afrimgsm
    
    * update on afrimgsm
    
    * removed utils
    
    * other mgsm varieties
    
    * other mgsm varieties
    
    * adding translate direct
    
    * Update translate_direct_yaml
    
    * add manual xnli prompt, add multichoice for openai models, and adapt multichoice metric for openai model
    
    * edit for open models
    
    * Update translate_direct_yaml
    
    * add verbalizer for xnli
    
    * change xnli from multiple choice to generate
    
    * add manual accuracy scores
    
    * revert xnli to multiple choice
    
    * change afrimgsm utils
    
    * revert xnli to multiple_choice
    
    * cleanups and readmes
    
    * remove openai fixes and unused regex
    
    * pr review changes
    
    * revert metrics.py, task.py and extraction.py to main version
    
    ---------
    Co-authored-by: Israel Abebe Azime <azime@cg.uni-saarland.de>
    Co-authored-by: Israel Abebe Azime <se.israel.abebe@gmail.com>
task.py 64.2 KB