Unverified Commit 18297993 authored by Jess, committed by GitHub

AfroBench: How Good are Large Language Models on African Languages? (#2825)



* add afrixnli to task

* add chat completion

* remove chat completion -untested

* afrimmlu added

* afrimmlu folder update

* afrimmlu folder update

* updated prompt

* remove print

* add afrimgsm -direct

* add squad metric

* fix bash script

* remove direct util, update common yaml

* remove print

* add few shot, metric fixes

* fix direct path, add bash script for gpt models

* added translate test

* update afrixnli tasks

* update afrixnli tasks

* update metrics for afrixnli

* prompt translations fix

* prompt translations fix

* filter and metric fix -mgsm

* remove squad metric

* remove squad metric

* add f1 score to mgsm

* add f1 score to mgsm

* update native-direct with lin

* change f1 function

* add lin to utils

* add utils

* remove test limit

* remove test configs

* add swahili to mmlu

* change eng to ewe in ewe yaml mmlu

* add squad metric to mgsm, remove whitespace filter

* added translate test

* added afrixnli_translate

* fix exact match valueError

* fix exact match valueError

* restructure mmlu folder

* spacing

* remove afrimmlu_translate folder

* add utility

* format task name, clean ups

* modified mgsm

* update on afrimgsm

* update on afrimgsm

* removed utils

* other mgsm varieties

* other mgsm varieties

* adding translate direct

* Update translate_direct_yaml

* add manual xnli prompt, add multichoice for openai models, and adapt multichoice metric for openai model

* edit for open models

* Update translate_direct_yaml

* add verbalizer for xnli

* change xnli from multiple choice to generate

* add manual accuracy scores

* revert xnli to multiple choice

* change afrimgsm utils

* revert xnli to multiple_choice

* cleanups and readmes

* remove openai fixes and unused regex

* pr review changes

* revert metrics.py, task.py and extraction.py to main version

* add afrisenti

* utilities

* pulled from main

* add afrixnli

* add afrimmlu

* update afrixnli prompts

* missing senti language

* fix afrisenti prompt 2

* fix afrisenti prompts

* fix afrisenti prompts

* configure task grouping

* add multiple prompts to afrixnli for irokobench

* add multiple prompts to afrimmlu for irokobench

* Update afrixnli_yaml

* fixes and moves

* fixes and moves

* afrimmlu multiple prompts configs

* remove validation set from afrimmlu

* remove eng from afrimmlu translate test

* correct dataset path

* multiple prompts for mgsm

* file restructure

* afribench grouping

* repo restructuring

* repo restructuring

* update exact match to hugging face exact match and add new mgsm language

* remove decontamination

* update generation kwargs

* update generation kwargs for all mgsm prompts

* remove lang

* update generation kwargs for afrimgsm translatetest

* add afrimgsm cot for direct and translate

* remove eng from translate-cot

* add masakhaPOS tasks

* remove changes from task script

* add masakhanews tasks

* add uhura arc easy

* add afriqa and belebele files

* add tags for easier run. add naija rc

* add new metrics and transformation scripts

* fix afriqa swa fewshot split

* add naijarc

* add afrobench lite tasks

* update afrobench

* update afrobench

* remove unverified files to avoid bugs

* remove files not needed

* add afrobench tasks

* add afrobench tasks

* change to version 1

* change to version 1

* update afrobench

* update afrobench

* restore metric to original script

* update readme instructions

* add individual dataset readmes

* add link to collections

* correct run script

* align with main

* align with main

* align with main

* align with main

* align with main

* align with main

* align with main

* align with main

* failed run fixes

* failed run fixes

* add afrimgsm cot

* Apply precommit fixes

* update mafand dataset name

* pull request fixes

* remove afrihate due to availability

---------
Co-authored-by: Israel Abebe Azime <azime@cg.uni-saarland.de>
Co-authored-by: Israel Abebe Azime <se.israel.abebe@gmail.com>
Co-authored-by: David Adelani <davlanade@gmail.com>
Co-authored-by: theyorubayesian <akin.o.oladipo@gmail.com>
parent cf51e699
tag: afrixnli_tt_tasks
dataset_path: masakhane/afrixnli-translate-test
dataset_name: null
output_type: multiple_choice
test_split: test
fewshot_split: test
doc_to_target: !function utils.doc_to_target
doc_to_choice:
  - "true"
  - "inconclusive"
  - "false"
should_decontaminate: true
doc_to_decontamination_query: premise
metric_list:
  - metric: f1
    aggregation: !function utils.weighted_f1_score
    average: weighted
    higher_is_better: True
    ignore_case: true
    ignore_punctuation: true
  - metric: acc
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
metadata:
  version: 1.0
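The `!function utils.weighted_f1_score` aggregation above computes a support-weighted F1 over the three verbalized labels. As a rough, pure-Python sketch of what that aggregation does (the toy labels below are illustrative, not benchmark data; the benchmark's actual implementation lives in its `utils.py`):

```python
from collections import Counter


def weighted_f1(golds, preds):
    """Support-weighted F1: per-class F1, averaged by gold-class frequency."""
    support = Counter(golds)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for g, p in zip(golds, preds) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(golds, preds) if g != cls and p == cls)
        fn = sum(1 for g, p in zip(golds, preds) if g == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += n * f1  # weight each class F1 by its gold support
    return total / len(golds)


golds = ["true", "true", "false"]
preds = ["true", "false", "false"]
print(round(weighted_f1(golds, preds), 4))  # both classes score F1 = 2/3 here
```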
# Generated by utils.py
dataset_name: yor
doc_to_text: "Based on the given statement, is the following claim 'true', 'false',\
  \ or 'inconclusive'. \nStatement: {{premise}} \nClaim: {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_yor_prompt_5
# Generated by utils.py
dataset_name: zul
doc_to_text: "Based on the given statement, is the following claim 'true', 'false',\
  \ or 'inconclusive'. \nStatement: {{premise}} \nClaim: {{hypothesis}}"
include: afrixnli_translate_yaml
task: afrixnli_translate_zul_prompt_5
from lm_eval.utils import weighted_f1_score


def doc_to_target(doc):
    replacements = {0: "true", 1: "false", 2: "inconclusive"}
    return replacements[doc["label"]]
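A quick sanity check of the verbalizer, mirroring the mapping above (the sample doc is hypothetical; real docs come from the dataset):

```python
def doc_to_target(doc):
    # Integer labels 0/1/2 are verbalized to the choice strings in the config.
    replacements = {0: "true", 1: "false", 2: "inconclusive"}
    return replacements[doc["label"]]


sample = {"premise": "...", "hypothesis": "...", "label": 2}  # hypothetical doc
print(doc_to_target(sample))  # → inconclusive
```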
# AfroBench
### Paper
Title: `AfroBench: How Good are Large Language Models on African Languages?`
Paper Link: https://arxiv.org/abs/2311.07978
## Abstract
> Large-scale multilingual evaluations, such as MEGA, often include only a handful of African languages due to the scarcity of high-quality evaluation data and the limited discoverability of existing African datasets. This lack of representation hinders comprehensive LLM evaluation across a diverse range of languages and tasks. To address these challenges, we introduce AfroBench -- a multi-task benchmark for evaluating the performance of LLMs across 64 African languages, 15 tasks and 22 datasets. AfroBench consists of nine natural language understanding datasets, six text generation datasets, six knowledge and question answering tasks, and one mathematical reasoning task. We present results comparing the performance of prompting LLMs to fine-tuned baselines based on BERT and T5-style models. Our results suggest large gaps in performance between high-resource languages, such as English, and African languages across most tasks; but performance also varies based on the availability of monolingual data resources. Our findings confirm that performance on African languages continues to remain a hurdle for current LLMs, underscoring the need for additional efforts to close this gap.
HomePage: https://mcgill-nlp.github.io/AfroBench/
### Groups and Tasks
#### Groups
* `afrobench`: Runs all tasks, datasets and prompts in this benchmark
* `afrobench_lite`: Runs the lite version of the benchmark, which includes afrimgsm, afrimmlu, afrixnli, sib, intent, adr and flores

Dataset-specific groups list all prompts for a single dataset, allowing users to review or edit them:
* `adr` `afrihate` `afrisenti` `belebele` `african_flores` `injongointent` `mafand` `masakhaner` `masakhapos` `naijarc` `nollysenti` `african_ntrex` `openai_mmlu` `salt` `sib` `uhura` `xlsum`
#### Task Tags
* `adr_tasks`: all datasets in this benchmark relating to Automatic Diacritics Restoration task
* `afrihate_tasks`: all datasets in this benchmark relating to Hate Speech detection task
* `afrimgsm_tasks`: all datasets in this benchmark relating to Mathematical reasoning task
* `afrixnli_tasks`: all datasets in this benchmark relating to Natural Language Inference task
* `afrobench_xqa_tasks`: all datasets in this benchmark relating to Crosslingual QA (XQA) task
* `afrobench_sentiment_tasks`: all datasets in this benchmark relating to Sentiment Classification task
* `afrobench_MT_tasks`: all datasets in this benchmark relating to Machine Translation task
* `afrobench_TC_tasks`: all datasets in this benchmark relating to Topic Classification task
* `afrobench_mmlu_tasks`: all datasets in this benchmark relating to MMLU task
* `injongointent_tasks`: all datasets in this benchmark relating to Intent Detection task
* `masakhaner_tasks`: all datasets in this benchmark relating to Named Entity Recognition (NER) task
* `masakhapos_tasks`: all datasets in this benchmark relating to Part of Speech Tagging (POS) task
* `RC_tasks`: all datasets in this benchmark relating to Reading Comprehension task
* `uhura_arc_easy_tasks`: all datasets in this benchmark relating to Arc-Easy (XQA) task
* `xlsum_tasks`: all datasets in this benchmark relating to Summarization task
We've included sample run scripts for easier integration with the benchmark: [sample run scripts](./sample_run_scripts)

For a better understanding of the run interface, see [interface.md](../../../docs/interface.md)

All datasets used in this benchmark are available in this [Hugging Face collection](https://huggingface.co/collections/masakhane/afrobench-67dbf553ebf5701c2207f883)
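A minimal run of the lite benchmark with the harness CLI might look like the following sketch (the model name and arguments are illustrative placeholders, not recommendations):

```shell
# Evaluate a Hugging Face model on the AfroBench-lite group.
# "google/gemma-2-9b-it" is a placeholder; substitute any model you want to test.
lm_eval --model hf \
  --model_args pretrained=google/gemma-2-9b-it \
  --tasks afrobench_lite \
  --batch_size 8 \
  --output_path results/afrobench_lite
```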
### Citation
```
@misc{ojo2025afrobenchgoodlargelanguage,
title={AfroBench: How Good are Large Language Models on African Languages?},
author={Jessica Ojo and Odunayo Ogundepo and Akintunde Oladipo and Kelechi Ogueji and Jimmy Lin and Pontus Stenetorp and David Ifeoluwa Adelani},
year={2025},
eprint={2311.07978},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2311.07978},
}
```
Please cite the datasets used. Citations for individual datasets are included in their respective repository README files within this benchmark.
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
# Automatic Diacritics Restoration (ADR)
Automatic Diacritics Restoration (ADR) is the task of restoring diacritical marks in text where they have been omitted or removed.
This process is essential for languages where diacritics alter pronunciation, meaning, or grammatical structure.
ADR requires the model to have a deep understanding of linguistic context, syntax, and semantics to accurately predict and reinsert the appropriate diacritics.
As part of this benchmark project, we use the MAFAND dataset to curate a dataset specifically for ADR. We focus on five languages: Fon, Ghomala, Igbo, Wolof, and Yoruba.
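To make the task concrete: ADR asks the model to invert the kind of lossy stripping sketched below, a minimal illustration using Python's `unicodedata` decomposition (the Yoruba sample phrase is illustrative only, not taken from the dataset):

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    # NFD decomposes accented characters into base letter + combining marks;
    # dropping the combining marks yields the undiacritized input an ADR
    # model must restore to its original form.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Ọjọ́ dára"))  # → Ojo dara
```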
group: adr
task:
  - adr_prompt_1
  - adr_prompt_2
  - adr_prompt_3
  - adr_prompt_4
  - adr_prompt_5
aggregate_metric_list:
  - metric: acc
    aggregation: mean
    weight_by_size: true
metadata:
  version: 1
import argparse
import os

import yaml


class FunctionTag:
    def __init__(self, value):
        self.value = value


def prompt_func(mode, lang):
    prompt_map = {
        "prompt_1": "Please restore the missing diacritics in the following sentence: {{text}}. Return output sentence only",
        "prompt_2": "Given a sentence without diacritics, add the appropriate diacritics to make it grammatically "
        "and semantically correct. \nSentence: {{text}}. Return output sentence only",
        "prompt_3": f"This text is in {lang}. Restore all diacritical marks to their proper places in the "
        "following sentence: {{text}}. Return output sentence only",
        "prompt_4": f"You are a linguist specializing in diacritical marks for {lang}. "
        f"Add the appropriate diacritics to this {lang} sentence: "
        "{{text}}. Return output sentence only",
        "prompt_5": f"You are a linguist specializing in diacritical marks for {lang}. Diacritics are essential for "
        f"proper pronunciation and meaning in {lang}. You are tasked with converting {lang} sentences "
        "without diacritics into their correctly accented forms. Here's the input: {{text}}. "
        "Return output sentence only",
    }
    return prompt_map[mode]


def gen_lang_yamls(output_dir: str, overwrite: bool, mode: str) -> None:
    """
    Generate a yaml file for each language.

    :param output_dir: The directory to output the files to.
    :param overwrite: Whether to overwrite files if they already exist.
    :param mode: Which prompt variant (prompt_1 ... prompt_5) to generate.
    """
    err = []
    languages = {
        "fon": "Fon",
        "bbj": "Ghomala",
        "ibo": "Igbo",
        "wol": "Wolof",
        "yor": "Yoruba",
    }
    for lang in languages:
        try:
            file_name = f"afridiacritics_{lang}.yaml"
            task_name = f"afridiacritics_{lang}_{mode}"
            yaml_template = "afridiacritics_yaml"
            yaml_details = {
                "include": yaml_template,
                "task": task_name,
                "dataset_name": lang,
                "doc_to_text": prompt_func(mode, languages[lang]),
            }
            os.makedirs(f"{output_dir}/{mode}", exist_ok=True)
            # "x" mode raises FileExistsError instead of silently overwriting.
            with open(
                f"{output_dir}/{mode}/{file_name}",
                "w" if overwrite else "x",
                encoding="utf8",
            ) as f:
                f.write("# Generated by utils.py\n")
                yaml.dump(
                    yaml_details,
                    f,
                    allow_unicode=True,
                )
        except FileExistsError:
            err.append(file_name)

    if len(err) > 0:
        raise FileExistsError(
            "Files were not created because they already exist (use --overwrite flag):"
            f" {', '.join(err)}"
        )


def main() -> None:
    """Parse CLI args and generate language-specific yaml files."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--overwrite",
        default=False,  # store_true with default=True made the flag a no-op
        action="store_true",
        help="Overwrite files if they already exist",
    )
    parser.add_argument(
        "--output-dir",
        default="./",
        help="Directory to write yaml files to",
    )
    parser.add_argument(
        "--mode",
        default="prompt_1",
        choices=["prompt_1", "prompt_2", "prompt_3", "prompt_4", "prompt_5"],
        help="Prompt number",
    )
    args = parser.parse_args()

    gen_lang_yamls(output_dir=args.output_dir, overwrite=args.overwrite, mode=args.mode)


if __name__ == "__main__":
    main()
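For orientation, the generator emits one config per (language, prompt) pair; its naming scheme can be sketched as follows (this is a reconstruction for illustration, not code from the repository):

```python
# Task names follow afridiacritics_<lang>_<prompt_N>: five languages crossed
# with five prompt variants yields 25 generated task configs in total.
languages = ["fon", "bbj", "ibo", "wol", "yor"]
modes = [f"prompt_{i}" for i in range(1, 6)]
task_names = [f"afridiacritics_{lang}_{mode}" for mode in modes for lang in languages]
print(len(task_names))  # → 25
```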
# Generated by utils.py
dataset_name: bbj
doc_to_text: 'Please restore the missing diacritics in the following sentence: {{text}}.
  Return output sentence only'
include: afridiacritics_yaml
task: afridiacritics_bbj_prompt_1
# Generated by utils.py
dataset_name: fon
doc_to_text: 'Please restore the missing diacritics in the following sentence: {{text}}.
  Return output sentence only'
include: afridiacritics_yaml
task: afridiacritics_fon_prompt_1
# Generated by utils.py
dataset_name: ibo
doc_to_text: 'Please restore the missing diacritics in the following sentence: {{text}}.
  Return output sentence only'
include: afridiacritics_yaml
task: afridiacritics_ibo_prompt_1
# Generated by utils.py
dataset_name: wol
doc_to_text: 'Please restore the missing diacritics in the following sentence: {{text}}.
  Return output sentence only'
include: afridiacritics_yaml
task: afridiacritics_wol_prompt_1
tag:
  - adr_tasks
  - adr_prompt_1
dataset_path: masakhane/diacritics-restoration
dataset_kwargs: {trust_remote_code: True}
doc_to_target: target
output_type: generate_until
fewshot_split: dev
test_split: test
training_split: train
metric_list:
  - metric: bleu
    aggregation: bleu
    higher_is_better: true
  - metric: chrf
    aggregation: chrf
    higher_is_better: true
generation_kwargs:
  do_sample: false
  until:
    - '<eos>'
    - </s>
    - <|im_end|>
metadata:
  version: 1.0
# Generated by utils.py
dataset_name: yor
doc_to_text: 'Please restore the missing diacritics in the following sentence: {{text}}.
  Return output sentence only'
include: afridiacritics_yaml
task: afridiacritics_yor_prompt_1
# Generated by utils.py
dataset_name: bbj
doc_to_text: "Given a sentence without diacritics, add the appropriate diacritics\
  \ to make it grammatically and semantically correct. \nSentence: {{text}}. Return\
  \ output sentence only"
include: afridiacritics_yaml
task: afridiacritics_bbj_prompt_2
# Generated by utils.py
dataset_name: fon
doc_to_text: "Given a sentence without diacritics, add the appropriate diacritics\
  \ to make it grammatically and semantically correct. \nSentence: {{text}}. Return\
  \ output sentence only"
include: afridiacritics_yaml
task: afridiacritics_fon_prompt_2
# Generated by utils.py
dataset_name: ibo
doc_to_text: "Given a sentence without diacritics, add the appropriate diacritics\
  \ to make it grammatically and semantically correct. \nSentence: {{text}}. Return\
  \ output sentence only"
include: afridiacritics_yaml
task: afridiacritics_ibo_prompt_2
# Generated by utils.py
dataset_name: wol
doc_to_text: "Given a sentence without diacritics, add the appropriate diacritics\
  \ to make it grammatically and semantically correct. \nSentence: {{text}}. Return\
  \ output sentence only"
include: afridiacritics_yaml
task: afridiacritics_wol_prompt_2
tag:
  - adr_tasks
  - adr_prompt_2
dataset_path: masakhane/diacritics-restoration
dataset_kwargs: {trust_remote_code: True}
doc_to_target: target
output_type: generate_until
fewshot_split: dev
test_split: test
training_split: train
metric_list:
  - metric: bleu
    aggregation: bleu
    higher_is_better: true
  - metric: chrf
    aggregation: chrf
    higher_is_better: true
generation_kwargs:
  do_sample: false
  until:
    - '<eos>'
    - </s>
    - <|im_end|>
metadata:
  version: 1.0
# Generated by utils.py
dataset_name: yor
doc_to_text: "Given a sentence without diacritics, add the appropriate diacritics\
  \ to make it grammatically and semantically correct. \nSentence: {{text}}. Return\
  \ output sentence only"
include: afridiacritics_yaml
task: afridiacritics_yor_prompt_2