Commit 18297993 authored by Jess, committed by GitHub

AfroBench: How Good are Large Language Models on African Languages? (#2825)



* add afrixnli to task

* add chat completion

* remove chat completion -untested

* afrimmlu added

* afrimmlu folder update

* afrimmlu folder update

* updated prompt

* remove print

* add afrimgsm -direct

* add squad metric

* fix bash script

* remove direct util, update common yaml

* remove print

* add few shot. metric fixes

* fix direct path, add bash script for gpt models

* added translate test

* update afrixnli tasks

* update afrixnli tasks

* update metrics for afrixnli

* prompt translations fix

* prompt translations fix

* filter and metric fix -mgsm

* remove squad metric

* remove squad metric

* add f1 score to mgsm

* add f1 score to mgsm

* update native-direct with lin

* change f1 function

* add lin to utils

* add utils

* remove test limit

* remove test configs

* add swahili to mmlu

* change eng to ewe in ewe yaml mmlu

* add squad metric to mgsm, remove whitespace filter

* added translate test

* added afrixnli_translate

* fix exact match valueError

* fix exact match valueError

* restructure mmlu folder

* spacing

* remove afrimmlu_translate folder

* add utility

* format task name, clean ups

* modified mgsm

* update on afrimgsm

* update on afrimgsm

* removed utils

* other mgsm varieties

* other mgsm varieties

* adding translate direct

* Update translate_direct_yaml

* add manual xnli prompt, add multichoice for openai models, and adapt multichoice metric for openai model

* edit for open models

* Update translate_direct_yaml

* add verbalizer for xnli

* change xnli from multiple choice to generate

* add manual accuracy scores

* revert xnli to multiple choice

* change afrimgsm utils

* revert xnli to multiple_choice

* cleanups and readmes

* remove openai fixes and unused regex

* pr review changes

* revert metrics.py, task.py and extraction.py to main version

* add afrisenti

* utilities

* pulled from main

* add afrixnli

* add afrimmlu

* update afrixnli prompts

* missing senti language

* fix afrisenti prompt 2

* fix afrisenti prompts

* fix afrisenti prompts

* configure task grouping

* add multiple prompts to afrixnli for irokobench

* add multiple prompts to afrimmlu for irokobench

* Update afrixnli_yaml

* fixes and moves

* fixes and moves

* afrimmlu multiple prompts configs

* remove validation set from afrimmlu

* remove eng from afrimmlu translate test

* correct dataset path

* multiple prompts for mgsm

* file restructure

* afribench grouping

* repo restructuring

* repo restructuring

* update exact match to hugging face exact match and add new mgsm language

* remove decontamination

* update generation kwargs

* update generation kwargs for all mgsm prompts

* remove lang

* update generation kwargs for afrimgsm translatetest

* add afrimgsm cot for direct and translate

* remove eng from translate-cot

* add masakhaPOS tasks

* remove changes from task script

* add masakhanews tasks

* add uhura arc easy

* add afriqa and belebele files

* add tags for easier run. add naija rc

* add new metrics and transformation scripts

* fix afriqa swa fewshot split

* add naijarc

* add afrobench lite tasks

* update afrobench

* update afrobench

* remove unverified files to avoid bugs

* remove files not needed

* add afrobench tasks

* add afrobench tasks

* change to version 1

* change to version 1

* update afrobench

* update afrobench

* restore metric to original script

* update readme instructions

* add individual dataset readmes

* add link to collections

* correct run script

* align with main

* align with main

* align with main

* align with main

* align with main

* align with main

* align with main

* align with main

* failed run fixes

* failed run fixes

* add afrimgsm cot

* Apply precommit fixes

* update mafand dataset name

* pull request fixes

* remove afrihate due to availability

---------
Co-authored-by: Israel Abebe Azime <azime@cg.uni-saarland.de>
Co-authored-by: Israel Abebe Azime <se.israel.abebe@gmail.com>
Co-authored-by: David Adelani <davlanade@gmail.com>
Co-authored-by: theyorubayesian <akin.o.oladipo@gmail.com>
parent cf51e699
# Generated by utils.py
dataset_name: wol
doc_to_text: "Given the following premise and hypothesis in Wolof, identify if the\
\ premise entails, contradicts, or is neutral towards the hypothesis. Please respond\
\ with exact 'entailment', 'contradiction', or 'neutral'. \n\nPremise: {{premise}}\
\ \nHypothesis: {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_wol_prompt_3
# Generated by utils.py
dataset_name: xho
doc_to_text: "Given the following premise and hypothesis in isiXhosa, identify if\
\ the premise entails, contradicts, or is neutral towards the hypothesis. Please\
\ respond with exact 'entailment', 'contradiction', or 'neutral'. \n\nPremise: {{premise}}\
\ \nHypothesis: {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_xho_prompt_3
tag: afrixnli_tt_tasks
dataset_path: masakhane/afrixnli-translate-test
dataset_name: null
output_type: multiple_choice
test_split: test
fewshot_split: test
doc_to_target: !function utils.doc_to_target
doc_to_choice:
  - "entailment"
  - "neutral"
  - "contradiction"
should_decontaminate: true
doc_to_decontamination_query: premise
metric_list:
  - metric: f1
    aggregation: !function utils.weighted_f1_score
    average: weighted
    higher_is_better: True
    ignore_case: true
    ignore_punctuation: true
  - metric: acc
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
metadata:
  version: 1.0
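
The `f1` metric in this config is aggregated through `utils.weighted_f1_score`, whose body is not part of this diff. As a rough illustration only, a support-weighted F1 over (gold, prediction) pairs could be computed as follows. The function name `weighted_f1` and the pair-based input format are assumptions made for this sketch, not the harness's implementation.

```python
from collections import Counter


def weighted_f1(items):
    """Support-weighted mean of per-class F1 over (gold, pred) pairs.

    Hypothetical helper for illustration; the input format is an assumption.
    """
    items = list(items)  # allow generators; we iterate more than once
    golds, preds = zip(*items)
    support = Counter(golds)  # number of gold examples per class
    total = len(golds)

    score = 0.0
    for label, n_gold in support.items():
        tp = sum(1 for g, p in items if g == label and p == label)
        n_pred = sum(1 for p in preds if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_gold
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its share of gold labels.
        score += f1 * n_gold / total
    return score


# Labels follow the mapping used in this task:
# 0 = entailment, 1 = neutral, 2 = contradiction.
pairs = [(0, 0), (1, 1), (2, 1), (2, 2)]
print(weighted_f1(pairs))
```

Classes that appear only among the predictions have zero support and therefore add nothing to the weighted sum, which is why the loop runs over gold labels alone.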
# Generated by utils.py
dataset_name: yor
doc_to_text: "Given the following premise and hypothesis in Yoruba, identify if the\
\ premise entails, contradicts, or is neutral towards the hypothesis. Please respond\
\ with exact 'entailment', 'contradiction', or 'neutral'. \n\nPremise: {{premise}}\
\ \nHypothesis: {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_yor_prompt_3
# Generated by utils.py
dataset_name: zul
doc_to_text: "Given the following premise and hypothesis in Zulu, identify if the\
\ premise entails, contradicts, or is neutral towards the hypothesis. Please respond\
\ with exact 'entailment', 'contradiction', or 'neutral'. \n\nPremise: {{premise}}\
\ \nHypothesis: {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_zul_prompt_3
# Imported so the YAML config can reference it as `utils.weighted_f1_score`.
from lm_eval.utils import weighted_f1_score


def doc_to_text(doc):
    # Build the English NLI prompt from the premise/hypothesis pair.
    output = """You are an NLP assistant whose purpose is to solve Natural Language Inference (NLI) problems
Please identify whether the premise entails or contradicts the hypothesis in the following premise
and hypothesis. The answer should be exact entailment, contradiction, or neutral.
Premise: {premise}
Hypothesis: {hypothesis}
Is it entailment, contradiction, or neutral?"""
    text = output.format(premise=doc["premise"], hypothesis=doc["hypothesis"])
    return text


def doc_to_target(doc):
    # Map the integer label to its string form (same order as doc_to_choice).
    replacements = {0: "entailment", 1: "neutral", 2: "contradiction"}
    return replacements[doc["label"]]
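
The label mapping in `doc_to_target` uses the same order as the `doc_to_choice` list in the shared YAML config, so the returned string can be matched back to a choice index. A minimal sketch of that consistency check follows. The sample document is made up for illustration, and looking the string up in the choice list is an assumption about how a consumer might use it.

```python
# Choice order copied from doc_to_choice in the shared config.
CHOICES = ["entailment", "neutral", "contradiction"]


def doc_to_target(doc):
    replacements = {0: "entailment", 1: "neutral", 2: "contradiction"}
    return replacements[doc["label"]]


# Hypothetical document; the field names follow the prompt templates.
sample = {
    "premise": "The children are playing outside.",
    "hypothesis": "Nobody is outdoors.",
    "label": 2,
}

target = doc_to_target(sample)
gold_index = CHOICES.index(target)
print(target, gold_index)

# Every integer label should map back to the same position in CHOICES.
for label in range(3):
    assert CHOICES.index(doc_to_target({"label": label})) == label
```

If the two orderings ever diverged, the gold index would silently point at the wrong choice, so a check like this is cheap insurance when editing either file.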
# Generated by utils.py
dataset_name: amh
doc_to_text: "You are an expert in Natural Language Inference (NLI) specializing in\
\ the Amharic language.\nAnalyze the premise and hypothesis given in Amharic, and\
\ determine the relationship between them.\n Respond with one of the following options:\
\ 'entailment', 'contradiction', or 'neutral'. \n\nPremise: {{premise}} \nHypothesis:\
\ {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_amh_prompt_4
# Generated by utils.py
dataset_name: ewe
doc_to_text: "You are an expert in Natural Language Inference (NLI) specializing in\
\ the Ewe language.\nAnalyze the premise and hypothesis given in Ewe, and determine\
\ the relationship between them.\n Respond with one of the following options: 'entailment',\
\ 'contradiction', or 'neutral'. \n\nPremise: {{premise}} \nHypothesis: {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_ewe_prompt_4
# Generated by utils.py
dataset_name: fra
doc_to_text: "You are an expert in Natural Language Inference (NLI) specializing in\
\ the French language.\nAnalyze the premise and hypothesis given in French, and\
\ determine the relationship between them.\n Respond with one of the following options:\
\ 'entailment', 'contradiction', or 'neutral'. \n\nPremise: {{premise}} \nHypothesis:\
\ {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_fra_prompt_4
# Generated by utils.py
dataset_name: hau
doc_to_text: "You are an expert in Natural Language Inference (NLI) specializing in\
\ the Hausa language.\nAnalyze the premise and hypothesis given in Hausa, and determine\
\ the relationship between them.\n Respond with one of the following options: 'entailment',\
\ 'contradiction', or 'neutral'. \n\nPremise: {{premise}} \nHypothesis: {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_hau_prompt_4
# Generated by utils.py
dataset_name: ibo
doc_to_text: "You are an expert in Natural Language Inference (NLI) specializing in\
\ the Igbo language.\nAnalyze the premise and hypothesis given in Igbo, and determine\
\ the relationship between them.\n Respond with one of the following options: 'entailment',\
\ 'contradiction', or 'neutral'. \n\nPremise: {{premise}} \nHypothesis: {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_ibo_prompt_4
# Generated by utils.py
dataset_name: kin
doc_to_text: "You are an expert in Natural Language Inference (NLI) specializing in\
\ the Kinyarwanda language.\nAnalyze the premise and hypothesis given in Kinyarwanda,\
\ and determine the relationship between them.\n Respond with one of the following\
\ options: 'entailment', 'contradiction', or 'neutral'. \n\nPremise: {{premise}}\
\ \nHypothesis: {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_kin_prompt_4
# Generated by utils.py
dataset_name: lin
doc_to_text: "You are an expert in Natural Language Inference (NLI) specializing in\
\ the Lingala language.\nAnalyze the premise and hypothesis given in Lingala, and\
\ determine the relationship between them.\n Respond with one of the following options:\
\ 'entailment', 'contradiction', or 'neutral'. \n\nPremise: {{premise}} \nHypothesis:\
\ {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_lin_prompt_4
# Generated by utils.py
dataset_name: lug
doc_to_text: "You are an expert in Natural Language Inference (NLI) specializing in\
\ the Luganda language.\nAnalyze the premise and hypothesis given in Luganda, and\
\ determine the relationship between them.\n Respond with one of the following options:\
\ 'entailment', 'contradiction', or 'neutral'. \n\nPremise: {{premise}} \nHypothesis:\
\ {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_lug_prompt_4
# Generated by utils.py
dataset_name: orm
doc_to_text: "You are an expert in Natural Language Inference (NLI) specializing in\
\ the Oromo language.\nAnalyze the premise and hypothesis given in Oromo, and determine\
\ the relationship between them.\n Respond with one of the following options: 'entailment',\
\ 'contradiction', or 'neutral'. \n\nPremise: {{premise}} \nHypothesis: {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_orm_prompt_4
# Generated by utils.py
dataset_name: sna
doc_to_text: "You are an expert in Natural Language Inference (NLI) specializing in\
\ the chiShona language.\nAnalyze the premise and hypothesis given in chiShona,\
\ and determine the relationship between them.\n Respond with one of the following\
\ options: 'entailment', 'contradiction', or 'neutral'. \n\nPremise: {{premise}}\
\ \nHypothesis: {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_sna_prompt_4
# Generated by utils.py
dataset_name: sot
doc_to_text: "You are an expert in Natural Language Inference (NLI) specializing in\
\ the Sesotho language.\nAnalyze the premise and hypothesis given in Sesotho, and\
\ determine the relationship between them.\n Respond with one of the following options:\
\ 'entailment', 'contradiction', or 'neutral'. \n\nPremise: {{premise}} \nHypothesis:\
\ {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_sot_prompt_4
# Generated by utils.py
dataset_name: swa
doc_to_text: "You are an expert in Natural Language Inference (NLI) specializing in\
\ the Swahili language.\nAnalyze the premise and hypothesis given in Swahili, and\
\ determine the relationship between them.\n Respond with one of the following options:\
\ 'entailment', 'contradiction', or 'neutral'. \n\nPremise: {{premise}} \nHypothesis:\
\ {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_swa_prompt_4
# Generated by utils.py
dataset_name: twi
doc_to_text: "You are an expert in Natural Language Inference (NLI) specializing in\
\ the Twi language.\nAnalyze the premise and hypothesis given in Twi, and determine\
\ the relationship between them.\n Respond with one of the following options: 'entailment',\
\ 'contradiction', or 'neutral'. \n\nPremise: {{premise}} \nHypothesis: {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_twi_prompt_4
# Generated by utils.py
dataset_name: wol
doc_to_text: "You are an expert in Natural Language Inference (NLI) specializing in\
\ the Wolof language.\nAnalyze the premise and hypothesis given in Wolof, and determine\
\ the relationship between them.\n Respond with one of the following options: 'entailment',\
\ 'contradiction', or 'neutral'. \n\nPremise: {{premise}} \nHypothesis: {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_wol_prompt_4
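
The per-language files above are marked `# Generated by utils.py`, but the generator itself is not shown in this part of the diff. The following is only a sketch of how such files could be produced from a language table and a prompt template. The names `LANGUAGES`, `PROMPT_4` and `render_config` are hypothetical, and `json.dumps` is used here as a dependency-free way to emit a double-quoted scalar, so the line folding will differ from the generated files.

```python
import json

# Subset of the language codes and names seen in the configs above.
LANGUAGES = {"amh": "Amharic", "ewe": "Ewe", "wol": "Wolof"}

# Quadruple braces survive str.format as the Jinja-style {{premise}} fields.
PROMPT_4 = (
    "You are an expert in Natural Language Inference (NLI) specializing in the "
    "{language} language.\nAnalyze the premise and hypothesis given in {language}, "
    "and determine the relationship between them.\n Respond with one of the "
    "following options: 'entailment', 'contradiction', or 'neutral'. \n\n"
    "Premise: {{{{premise}}}} \nHypothesis: {{{{hypothesis}}}}"
)


def render_config(code, language, prompt_id=4):
    """Return the text of one per-language task config (illustrative only)."""
    doc_to_text = PROMPT_4.format(language=language)
    lines = [
        "# Generated by utils.py",
        f"dataset_name: {code}",
        # A JSON string literal is also a valid YAML double-quoted scalar.
        f"doc_to_text: {json.dumps(doc_to_text)}",
        "include: afrixnli_translate_yaml",
        f"task: afrixnli_translate_{code}_prompt_{prompt_id}",
    ]
    return "\n".join(lines) + "\n"


for code, language in LANGUAGES.items():
    print(render_config(code, language))
```

Generating the files from one template keeps the prompt wording identical across languages, with only `dataset_name`, the language name and the task suffix varying.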