Add new benchmark: Spanish bench (#2157)

* Add spanish_bench
* Add flores_es group
* Update _flores_common_yaml
* Delete lm_eval/tasks/spanish_bench/escola.yaml
* Delete escola from spanish_bench.yaml
* Delete escola from README.md
* pre-commit run --all-files
* Updated some task groupings and readme

Unverified Commit ea17b98e authored Oct 03, 2024 by zxcvuser Committed by GitHub Oct 03, 2024
20 changed files
--- a/lm_eval/tasks/README.md
+++ b/lm_eval/tasks/README.md
@@ -86,6 +86,7 @@
 | [pile_10k](pile_10k/README.md) | The first 10K elements of The Pile, useful for debugging models trained on it. | English |
 | [piqa](piqa/README.md) | Physical Interaction Question Answering tasks to test physical commonsense reasoning. | English |
 | [polemo2](polemo2/README.md) | Sentiment analysis and emotion detection tasks based on Polish language data. | Polish |
+| [portuguese_bench](portuguese_bench/README.md) | Collection of tasks in European Portuguese encompassing various evaluation areas. | Portuguese |
 | [prost](prost/README.md) | Tasks requiring understanding of professional standards and ethics in various domains. | English |
 | [pubmedqa](pubmedqa/README.md) | Question answering tasks based on PubMed research articles for biomedical understanding. | English |
 | [qa4mre](qa4mre/README.md) | Question Answering for Machine Reading Evaluation, assessing comprehension and reasoning. | English |
@@ -95,6 +96,7 @@
 | [sciq](sciq/README.md) | Science Question Answering tasks to assess understanding of scientific concepts. | English |
 | [scrolls](scrolls/README.md) | Tasks that involve long-form reading comprehension across various domains. | English |
 | [siqa](siqa/README.md) | Social Interaction Question Answering to evaluate common sense and social reasoning.  | English |
+| [spanish_bench](spanish_bench/README.md) | Collection of tasks in Spanish encompassing various evaluation areas. | Spanish |
 | [squad_completion](squad_completion/README.md) | A variant of the SQuAD question answering task designed for zero-shot evaluation of small LMs. | English |
 | [squadv2](squadv2/README.md) | Stanford Question Answering Dataset version 2, a reading comprehension benchmark. | English |
 | [storycloze](storycloze/README.md) | Tasks to predict story endings, focusing on narrative logic and coherence. | English |
@@ -121,4 +123,4 @@
 | [xnli_eu](xnli_eu/README.md) | Cross-lingual Natural Language Inference tasks in Basque. | Basque |
 | [xstorycloze](xstorycloze/README.md) | Cross-lingual narrative understanding tasks to predict story endings in multiple languages. | Russian, Simplified Chinese, Spanish, Arabic, Hindi, Indonesian, Telugu, Swahili, Basque, Burmese |
 | [xwinograd](xwinograd/README.md) | Cross-lingual Winograd schema tasks for coreference resolution in multiple languages. | English, French, Japanese, Portuguese, Russian, Chinese |
-| [portuguese_bench](portuguese_bench/README.md) | Collection of tasks in European Portuguese encompassing various evaluation areas. | Portuguese |
+
--- a/lm_eval/tasks/spanish_bench/README.md
+++ b/lm_eval/tasks/spanish_bench/README.md
+# SpanishBench
+
+### Paper
+
+SpanishBench is a benchmark for evaluating language models on Spanish tasks. That is, it evaluates the ability of a language model to understand and generate Spanish text. SpanishBench combines pre-existing, open datasets. All the details of SpanishBench will be published in a paper soon.
+
+The datasets included in SpanishBench are:
+
+| Task          | Category       | Paper title          | Homepage  |
+|:-------------:|:-----:|:-------------:|:-----:|
+| Belebele_es | Reading Comprehension | [The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants](https://arxiv.org/abs/2308.16884) | https://huggingface.co/datasets/facebook/belebele |
+| FLORES_es | Translation | [The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation](https://arxiv.org/abs/2106.03193) | https://huggingface.co/datasets/facebook/flores |
+| MGSM_es | Math | [Language Models are Multilingual Chain-of-Thought Reasoners](https://arxiv.org/abs/2210.03057) | https://huggingface.co/datasets/juletxara/mgsm |
+| PAWS-X_es | Paraphrasing | [PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification](https://aclanthology.org/D19-1382/) | https://huggingface.co/datasets/google-research-datasets/paws-x |
+| WNLI-es | Natural Language Inference | No paper. | https://huggingface.co/datasets/PlanTL-GOB-ES/wnli-es |
+| XL-Sum_es | Summarization | [XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages](https://aclanthology.org/2021.findings-acl.413/) | https://huggingface.co/datasets/csebuetnlp/xlsum |
+| XNLI_es | Natural Language Inference | [XNLI: Evaluating Cross-lingual Sentence Representations](https://aclanthology.org/D18-1269/) | https://huggingface.co/datasets/facebook/xnli |
+| XQuAD_es | Question Answering | [On the Cross-lingual Transferability of Monolingual Representations](https://aclanthology.org/2020.acl-main.421/) | https://huggingface.co/datasets/google/xquad |
+| XStoryCloze_es | Commonsense Reasoning | [Few-shot Learning with Multilingual Generative Language Models](https://aclanthology.org/2022.emnlp-main.616/) | https://huggingface.co/datasets/juletxara/xstory_cloze |
+
+
+### Citation
+Paper for SpanishBench coming soon.
+
+### Groups and Tasks
+
+#### Groups
+
+- `spanish_bench`: All tasks included in SpanishBench.
+- `flores_es`: All FLORES translation tasks from or to Spanish.
+
+#### Tags
+- `phrases_es`: Two Phrases_va tasks for language adaptation between Spanish and Valencian.
+
+#### Tasks
+
+The following tasks evaluate models on the SpanishBench datasets using various scoring methods.
+  - `belebele_spa_Latn`
+  - `flores_es`
+  - `flores_es-ca`
+  - `flores_es-de`
+  - `flores_es-en`
+  - `flores_es-eu`
+  - `flores_es-fr`
+  - `flores_es-gl`
+  - `flores_es-it`
+  - `flores_es-pt`
+  - `flores_ca-es`
+  - `flores_de-es`
+  - `flores_en-es`
+  - `flores_eu-es`
+  - `flores_fr-es`
+  - `flores_gl-es`
+  - `flores_it-es`
+  - `flores_pt-es`
+  - `mgsm_direct_es_v2` (`v2` is due to an existing open issue in the original task)
+  - `paws_es`
+  - `phrases_es`
+  - `wnli_es`
+  - `xlsum_es`
+  - `xnli_es`
+  - `xquad_es`
+  - `xstorycloze_es`
+
+Some of these tasks are taken from benchmarks already available in LM Evaluation Harness. These are:
+- `belebele_spa_Latn`: Belebele Spanish
+- `mgsm_direct_es`: MGSM Spanish (included here as `mgsm_direct_es_v2`, which fixes an existing open issue in the original task)
+- `paws_es`: PAWS-X Spanish
+- `xnli_es`: XNLI Spanish
+- `xstorycloze_es`: XStoryCloze Spanish
+
+### Checklist
+
+* [x] Is the task an existing benchmark in the literature?
+  * [ ] Have you referenced the original paper that introduced the task?
+  * [ ] If yes, does the original paper provide a reference implementation?
+    * [ ] Yes, original implementation contributed by author of the benchmark
+
+If other tasks on this dataset are already supported:
+* [ ] Is the "Main" variant of this task clearly denoted?
+* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
+* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
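
The FLORES task names listed in the README above follow a regular pattern: one task per translation direction between Spanish and each of eight other languages. A minimal sketch of that naming scheme, assuming a hypothetical helper `flores_es_task_names` (not part of this commit) and the two-letter codes that appear in the task list:

```python
# Hypothetical helper that reproduces the task names listed in the README.
MAIN = "es"
OTHERS = ["ca", "de", "en", "eu", "fr", "gl", "it", "pt"]


def flores_es_task_names(main: str = MAIN, others: list[str] = OTHERS) -> list[str]:
    """Return one task name per translation direction involving `main`."""
    outgoing = [f"flores_{main}-{other}" for other in others]  # Spanish -> X
    incoming = [f"flores_{other}-{main}" for other in others]  # X -> Spanish
    return outgoing + incoming


names = flores_es_task_names()
print(len(names))
print(names[0], names[-1])
```

Eight partner languages in two directions give the sixteen `flores_*` tasks grouped under `flores_es`.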
--- a/lm_eval/tasks/spanish_bench/flores_es/_flores_common_yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/_flores_common_yaml
+group: flores
+dataset_path: facebook/flores
+dataset_name: all
+output_type: generate_until
+#! The test split of flores is not publicly available! (See paper section 6.1)
+#! We are using `dev` and `devtest` splits, but they're mapped to train/validation/test in `data/flores/flores.py`.
+training_split: dev
+validation_split: dev
+test_split: devtest
+fewshot_split: dev
+target_delimiter: ''
+generation_kwargs:
+  until:
+    - "\n"
+metric_list:
+  - metric: bleu
+    aggregation: bleu
+    higher_is_better: true
+  - metric: ter
+    aggregation: ter
+    higher_is_better: false
+  - metric: chrf
+    aggregation: chrf
+    higher_is_better: true
+metadata:
+  version: 1.0
+dataset_kwargs:
+  trust_remote_code: true
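
The `generation_kwargs.until` entry in the config above stops generation at the first newline, so only the first generated line is scored as the translation. A rough sketch of that effect, assuming a hypothetical `truncate_at_stop` function; this is an illustration of the idea, not the harness's own implementation:

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Made-up model output: a translation followed by an unwanted continuation.
raw = " El gato duerme.\nEnglish sentence: The dog barks."
print(repr(truncate_at_stop(raw, ["\n"])))
```

Everything after the first `"\n"` is discarded, which keeps a model from being scored on text it produced after the translation.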
--- a/lm_eval/tasks/spanish_bench/flores_es/create_yamls_flores_es.py
+++ b/lm_eval/tasks/spanish_bench/flores_es/create_yamls_flores_es.py
+"""
+Script to generate task YAMLs for the FLORES-200 dataset.
+Based on `tasks/translation/utils.py`.
+"""
+
+import argparse
+import itertools
+
+import yaml
+from langcodes import Language
+
+
+# utils
+flatten = lambda l: list(itertools.chain(*l))
+
+# constants
+_LANGUAGES = [
+    "ace_Arab",
+    "bam_Latn",
+    "dzo_Tibt",
+    "hin_Deva",
+    "khm_Khmr",
+    "mag_Deva",
+    "pap_Latn",
+    "sot_Latn",
+    "tur_Latn",
+    "ace_Latn",
+    "ban_Latn",
+    "ell_Grek",
+    "hne_Deva",
+    "kik_Latn",
+    "mai_Deva",
+    "pbt_Arab",
+    "spa_Latn",
+    "twi_Latn",
+    "acm_Arab",
+    "bel_Cyrl",
+    "eng_Latn",
+    "hrv_Latn",
+    "kin_Latn",
+    "mal_Mlym",
+    "pes_Arab",
+    "srd_Latn",
+    "tzm_Tfng",
+    "acq_Arab",
+    "bem_Latn",
+    "epo_Latn",
+    "hun_Latn",
+    "kir_Cyrl",
+    "mar_Deva",
+    "plt_Latn",
+    "srp_Cyrl",
+    "uig_Arab",
+    "aeb_Arab",
+    "ben_Beng",
+    "est_Latn",
+    "hye_Armn",
+    "kmb_Latn",
+    "min_Arab",
+    "pol_Latn",
+    "ssw_Latn",
+    "ukr_Cyrl",
+    "afr_Latn",
+    "bho_Deva",
+    "eus_Latn",
+    "ibo_Latn",
+    "kmr_Latn",
+    "min_Latn",
+    "por_Latn",
+    "sun_Latn",
+    "umb_Latn",
+    "ajp_Arab",
+    "bjn_Arab",
+    "ewe_Latn",
+    "ilo_Latn",
+    "knc_Arab",
+    "mkd_Cyrl",
+    "prs_Arab",
+    "swe_Latn",
+    "urd_Arab",
+    "aka_Latn",
+    "bjn_Latn",
+    "fao_Latn",
+    "ind_Latn",
+    "knc_Latn",
+    "mlt_Latn",
+    "quy_Latn",
+    "swh_Latn",
+    "uzn_Latn",
+    "als_Latn",
+    "bod_Tibt",
+    "fij_Latn",
+    "isl_Latn",
+    "kon_Latn",
+    "mni_Beng",
+    "ron_Latn",
+    "szl_Latn",
+    "vec_Latn",
+    "amh_Ethi",
+    "bos_Latn",
+    "fin_Latn",
+    "ita_Latn",
+    "kor_Hang",
+    "mos_Latn",
+    "run_Latn",
+    "tam_Taml",
+    "vie_Latn",
+    "apc_Arab",
+    "bug_Latn",
+    "fon_Latn",
+    "jav_Latn",
+    "lao_Laoo",
+    "mri_Latn",
+    "rus_Cyrl",
+    "taq_Latn",
+    "war_Latn",
+    "arb_Arab",
+    "bul_Cyrl",
+    "fra_Latn",
+    "jpn_Jpan",
+    "lij_Latn",
+    "mya_Mymr",
+    "sag_Latn",
+    "taq_Tfng",
+    "wol_Latn",
+    "arb_Latn",
+    "cat_Latn",
+    "fur_Latn",
+    "kab_Latn",
+    "lim_Latn",
+    "nld_Latn",
+    "san_Deva",
+    "tat_Cyrl",
+    "xho_Latn",
+    "ars_Arab",
+    "ceb_Latn",
+    "fuv_Latn",
+    "kac_Latn",
+    "lin_Latn",
+    "nno_Latn",
+    "sat_Olck",
+    "tel_Telu",
+    "ydd_Hebr",
+    "ary_Arab",
+    "ces_Latn",
+    "gaz_Latn",
+    "kam_Latn",
+    "lit_Latn",
+    "nob_Latn",
+    "scn_Latn",
+    "tgk_Cyrl",
+    "yor_Latn",
+    "arz_Arab",
+    "cjk_Latn",
+    "gla_Latn",
+    "kan_Knda",
+    "lmo_Latn",
+    "npi_Deva",
+    "shn_Mymr",
+    "tgl_Latn",
+    "yue_Hant",
+    "asm_Beng",
+    "ckb_Arab",
+    "gle_Latn",
+    "kas_Arab",
+    "ltg_Latn",
+    "nso_Latn",
+    "sin_Sinh",
+    "tha_Thai",
+    "zho_Hans",
+    "ast_Latn",
+    "crh_Latn",
+    "glg_Latn",
+    "kas_Deva",
+    "ltz_Latn",
+    "nus_Latn",
+    "slk_Latn",
+    "tir_Ethi",
+    "zho_Hant",
+    "awa_Deva",
+    "cym_Latn",
+    "grn_Latn",
+    "kat_Geor",
+    "lua_Latn",
+    "nya_Latn",
+    "slv_Latn",
+    "tpi_Latn",
+    "zsm_Latn",
+    "ayr_Latn",
+    "dan_Latn",
+    "guj_Gujr",
+    "kaz_Cyrl",
+    "lug_Latn",
+    "oci_Latn",
+    "smo_Latn",
+    "tsn_Latn",
+    "zul_Latn",
+    "azb_Arab",
+    "deu_Latn",
+    "hat_Latn",
+    "kbp_Latn",
+    "luo_Latn",
+    "ory_Orya",
+    "sna_Latn",
+    "tso_Latn",
+    "azj_Latn",
+    "dik_Latn",
+    "hau_Latn",
+    "kea_Latn",
+    "lus_Latn",
+    "pag_Latn",
+    "snd_Arab",
+    "tuk_Latn",
+    "bak_Cyrl",
+    "dyu_Latn",
+    "heb_Hebr",
+    "khk_Cyrl",
+    "lvs_Latn",
+    "pan_Guru",
+    "som_Latn",
+    "tum_Latn",
+]
+LANGUAGE_PAIRS = [
+    (a, b) for idx, a in enumerate(_LANGUAGES) for b in _LANGUAGES[idx + 1 :]
+]
+
+LANGUAGES_OF_INTEREST = [
+    "cat_Latn",
+    "spa_Latn",
+    "eng_Latn",
+    "glg_Latn",
+    "eus_Latn",
+    "ita_Latn",
+    "deu_Latn",
+    "por_Latn",
+    "fra_Latn",
+]
+MAIN_LANG = "spa_Latn"
+LANGUAGE_PAIRS = [
+    (a, b)
+    for (a, b) in LANGUAGE_PAIRS
+    if a in LANGUAGES_OF_INTEREST and b in LANGUAGES_OF_INTEREST and MAIN_LANG in (a, b)
+]
+
+# auxiliary functions
+
+code_to_language_name = lambda code: Language.make(
+    language=Language.get(code)["language"]
+).display_name()
+code_to_short_name = lambda code: Language.get(code)["language"]
+jinja_var = (
+    lambda s: "{{" + s + "}}"
+)  # wrapper to avoid having to escape { } in format strings
+
+
+def doc_to_text(src: str, tgt: str) -> str:
+    src_name, tgt_name = map(code_to_language_name, [src, tgt])
+
+    return f"""\
+{src_name} sentence: {jinja_var('sentence_' + src)}
+{tgt_name} sentence:"""
+
+
+def doc_to_target(tgt: str) -> str:
+    return f"{jinja_var('sentence_' + tgt)}"
+
+
+# main function
+
+
+def gen_lang_yamls(output_dir: str, overwrite: bool) -> None:
+    """
+    Generate a YAML file for each translation direction.
+    """
+
+    err = []
+    for src, tgt in LANGUAGE_PAIRS:
+        # do both translation directions for each lang pair
+        for src, tgt in [(src, tgt), (tgt, src)]:
+            lang_pair_name = f"{code_to_short_name(src)}-{code_to_short_name(tgt)}"
+            yaml_file_name = f"flores_{lang_pair_name}.yaml"
+
+            try:
+                with open(
+                    f"{output_dir}/{yaml_file_name}",
+                    "w" if overwrite else "x",
+                    encoding="utf-8",
+                ) as outfile:
+                    print(f"Creating {yaml_file_name}...")
+                    outfile.write("# File generated by `create-yamls.py`\n")
+                    yaml.dump(
+                        {
+                            #                            "group": "flores_es",
+                            "include": "_flores_common_yaml",
+                            "task": f"flores_{lang_pair_name}",
+                            "doc_to_text": doc_to_text(src, tgt),
+                            "doc_to_target": doc_to_target(tgt),
+                        },
+                        outfile,
+                        sort_keys=False,
+                    )
+
+            except FileExistsError:
+                err.append(yaml_file_name)
+
+    if len(err) > 0:
+        raise FileExistsError(
+            "Files were not created because they already exist:"
+            f" {', '.join(err)}"
+            "\nUse flag --overwrite to overwrite them."
+        )
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--overwrite",
+        default=False,
+        action="store_true",
+        help="Overwrite files if they already exist",
+    )
+    parser.add_argument(
+        "--output-dir", default=".", help="Directory to write yaml files to"
+    )
+    args = parser.parse_args()
+
+    gen_lang_yamls(output_dir=args.output_dir, overwrite=args.overwrite)
+
+
+if __name__ == "__main__":
+    main()
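
The pair selection in the script above can be checked in isolation. The sketch below mirrors the filtering logic on a shortened language list (the extra `tur_Latn` entry stands in for the many FLORES languages that are filtered out) and uses `itertools.combinations`, which yields the same unordered pairs as the nested comprehension in the script:

```python
from itertools import combinations

# Shortened stand-in for the full _LANGUAGES list in the script.
LANGUAGES = [
    "cat_Latn", "deu_Latn", "eng_Latn", "eus_Latn", "fra_Latn",
    "glg_Latn", "ita_Latn", "por_Latn", "spa_Latn", "tur_Latn",
]
LANGUAGES_OF_INTEREST = {
    "cat_Latn", "spa_Latn", "eng_Latn", "glg_Latn", "eus_Latn",
    "ita_Latn", "deu_Latn", "por_Latn", "fra_Latn",
}
MAIN_LANG = "spa_Latn"

# Keep unordered pairs where both languages are of interest and one is Spanish.
unordered = [
    (a, b)
    for a, b in combinations(LANGUAGES, 2)
    if a in LANGUAGES_OF_INTEREST and b in LANGUAGES_OF_INTEREST and MAIN_LANG in (a, b)
]

# Expand each pair into both translation directions, as gen_lang_yamls does.
directed = [pair for a, b in unordered for pair in ((a, b), (b, a))]

print(len(unordered), len(directed))
```

Each of the eight unordered pairs produces two YAML files, which matches the sixteen `flores_*-*.yaml` files added in this commit.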
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_ca-es.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_ca-es.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_ca-es
+doc_to_text: 'Catalan sentence: {{sentence_cat_Latn}}
+
+  Spanish sentence:'
+doc_to_target: '{{sentence_spa_Latn}}'
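
In the generated YAML above, the blank line inside the single-quoted `doc_to_text` scalar folds to a single newline when parsed, so the prompt has two lines. The sketch below shows how such a template might be filled from a dataset row. It assumes a hypothetical `render` function based on plain regex substitution (the harness itself uses Jinja templating), and the example sentences are placeholders rather than FLORES data:

```python
import re


def render(template: str, doc: dict) -> str:
    """Replace each {{field}} placeholder with the matching value from `doc`."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(doc[match.group(1)]),
        template,
    )


# Templates as they read after YAML parsing.
doc_to_text = "Catalan sentence: {{sentence_cat_Latn}}\nSpanish sentence:"
doc_to_target = "{{sentence_spa_Latn}}"

# Placeholder row, not taken from the dataset.
doc = {"sentence_cat_Latn": "Bon dia.", "sentence_spa_Latn": "Buenos días."}

prompt = render(doc_to_text, doc)
target = render(doc_to_target, doc)
print(prompt)
print(target)
```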
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_de-es.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_de-es.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_de-es
+doc_to_text: 'German sentence: {{sentence_deu_Latn}}
+
+  Spanish sentence:'
+doc_to_target: '{{sentence_spa_Latn}}'
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_en-es.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_en-es.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_en-es
+doc_to_text: 'English sentence: {{sentence_eng_Latn}}
+
+  Spanish sentence:'
+doc_to_target: '{{sentence_spa_Latn}}'
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_es-ca.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_es-ca.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_es-ca
+doc_to_text: 'Spanish sentence: {{sentence_spa_Latn}}
+
+  Catalan sentence:'
+doc_to_target: '{{sentence_cat_Latn}}'
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_es-de.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_es-de.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_es-de
+doc_to_text: 'Spanish sentence: {{sentence_spa_Latn}}
+
+  German sentence:'
+doc_to_target: '{{sentence_deu_Latn}}'
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_es-en.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_es-en.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_es-en
+doc_to_text: 'Spanish sentence: {{sentence_spa_Latn}}
+
+  English sentence:'
+doc_to_target: '{{sentence_eng_Latn}}'
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_es-eu.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_es-eu.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_es-eu
+doc_to_text: 'Spanish sentence: {{sentence_spa_Latn}}
+
+  Basque sentence:'
+doc_to_target: '{{sentence_eus_Latn}}'
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_es-fr.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_es-fr.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_es-fr
+doc_to_text: 'Spanish sentence: {{sentence_spa_Latn}}
+
+  French sentence:'
+doc_to_target: '{{sentence_fra_Latn}}'
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_es-gl.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_es-gl.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_es-gl
+doc_to_text: 'Spanish sentence: {{sentence_spa_Latn}}
+
+  Galician sentence:'
+doc_to_target: '{{sentence_glg_Latn}}'
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_es-it.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_es-it.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_es-it
+doc_to_text: 'Spanish sentence: {{sentence_spa_Latn}}
+
+  Italian sentence:'
+doc_to_target: '{{sentence_ita_Latn}}'
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_es-pt.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_es-pt.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_es-pt
+doc_to_text: 'Spanish sentence: {{sentence_spa_Latn}}
+
+  Portuguese sentence:'
+doc_to_target: '{{sentence_por_Latn}}'
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_es.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_es.yaml
+group: flores_es
+task:
+  - flores_es-en
+  - flores_en-es
+  - flores_es-eu
+  - flores_eu-es
+  - flores_es-pt
+  - flores_pt-es
+  - flores_es-it
+  - flores_it-es
+  - flores_es-fr
+  - flores_fr-es
+  - flores_es-ca
+  - flores_ca-es
+  - flores_es-gl
+  - flores_gl-es
+  - flores_es-de
+  - flores_de-es
+aggregate_metric_list:
+  - metric: bleu
+    aggregation: mean
+    weight_by_size: false
+metadata:
+  version: 1.0
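
The group config above aggregates BLEU with `aggregation: mean` and `weight_by_size: false`, which reads as a plain average over the sixteen subtasks regardless of how many examples each one has. A small sketch of the difference, assuming a hypothetical `aggregate_mean` function and made-up scores and sizes (not real results, and not the harness's own code):

```python
def aggregate_mean(scores: list[float], sizes: list[int], weight_by_size: bool) -> float:
    """Average per-task scores, optionally weighting each task by its size."""
    if weight_by_size:
        total = sum(sizes)
        return sum(score * size for score, size in zip(scores, sizes)) / total
    return sum(scores) / len(scores)


# Illustrative values only.
scores = [30.0, 20.0]
sizes = [100, 300]

unweighted = aggregate_mean(scores, sizes, weight_by_size=False)
weighted = aggregate_mean(scores, sizes, weight_by_size=True)
print(unweighted, weighted)
```

With unweighted averaging every translation direction counts equally toward the group score.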
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_eu-es.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_eu-es.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_eu-es
+doc_to_text: 'Basque sentence: {{sentence_eus_Latn}}
+
+  Spanish sentence:'
+doc_to_target: '{{sentence_spa_Latn}}'
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_fr-es.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_fr-es.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_fr-es
+doc_to_text: 'French sentence: {{sentence_fra_Latn}}
+
+  Spanish sentence:'
+doc_to_target: '{{sentence_spa_Latn}}'
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_gl-es.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_gl-es.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_gl-es
+doc_to_text: 'Galician sentence: {{sentence_glg_Latn}}
+
+  Spanish sentence:'
+doc_to_target: '{{sentence_spa_Latn}}'
--- a/lm_eval/tasks/spanish_bench/flores_es/flores_it-es.yaml
+++ b/lm_eval/tasks/spanish_bench/flores_es/flores_it-es.yaml
+# File generated by `create-yamls.py`
+include: _flores_common_yaml
+task: flores_it-es
+doc_to_text: 'Italian sentence: {{sentence_ita_Latn}}
+
+  Spanish sentence:'
+doc_to_target: '{{sentence_spa_Latn}}'