add cocoteros_es dataset (#2721)

Co-authored-by: Robiert Sepulveda Torres <rsepulveda911112@gmail.com>

Unverified commit 2b2fa97b, authored Feb 25, 2025 by Santiago Galiano Segura, committed by GitHub Feb 25, 2025
3 changed files
--- a/lm_eval/tasks/spanish_bench/README.md
+++ b/lm_eval/tasks/spanish_bench/README.md
@@ -15,6 +15,7 @@ The datasets included in SpanishBench that have been made public in previous pub
 | Task          | Category       | Paper title          | Homepage  |
 |:-------------:|:-----:|:-------------:|:-----:|
 | Belebele_es | Reading Comprehension | [The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants](https://arxiv.org/abs/2308.16884) | https://huggingface.co/datasets/facebook/belebele |
+| Cocoteros_es | Commonsense Reasoning | [COCOTEROS: A Spanish Corpus with Contextual Knowledge for Natural Language Generation](https://besaya.infor.uva.es/sepln24/paper04.pdf) | https://huggingface.co/datasets/gplsi/cocoteros |
 | EsCoLA | Linguistic Acceptability | [EsCoLA: Spanish Corpus of Linguistic Acceptability](https://aclanthology.org/2024.lrec-main.554/) | https://huggingface.co/datasets/nbel/EsCoLA |
 | FLORES_es | Translation | [The FLORES-101  Evaluation Benchmark for Low-Resource and Multilingual Machine Translation](https://arxiv.org/abs/2106.03193) | https://huggingface.co/datasets/facebook/flores |
 | MGSM_es | Math | [Language Models are Multilingual Chain-of-Thought Reasoners](https://arxiv.org/abs/2210.03057) | https://huggingface.co/datasets/juletxara/mgsm |
@@ -77,6 +78,7 @@ The datasets included in SpanishBench that have been made public in previous pub
 The following tasks evaluate tasks on SpanishBench dataset using various scoring methods.
  - `belebele_spa_Latn`
+  - `cocoteros_es`
  - `copa_es`
  - `escola`
  - `flores_es`

--- a/lm_eval/tasks/spanish_bench/cocoteros_es.yaml
+++ b/lm_eval/tasks/spanish_bench/cocoteros_es.yaml
+task: cocoteros_es
+dataset_path: gplsi/cocoteros
+dataset_name: null
+output_type: generate_until
+doc_to_text: "Genera una frase corta con estas palabras: {{keywords}}. El contexto es: {{context}} \n\nRespuesta:"
+doc_to_target: "{{text}}"
+training_split: train
+test_split: test
+target_delimiter: ' '
+generation_kwargs:
+  max_gen_toks: 40
+  until:
+    - "\n"
+metric_list:
+  - metric: bleu
+    aggregation: bleu
+    higher_is_better: true
+  - metric: !function utils.rouge1
+    aggregation: !function utils.rouge1_agg
+    higher_is_better: true
+metadata:
+  version: 1.0
--- a/lm_eval/tasks/spanish_bench/spanish_bench.yaml
+++ b/lm_eval/tasks/spanish_bench/spanish_bench.yaml
@@ -13,5 +13,6 @@ task:
  - mgsm_direct_es_spanish_bench
  - flores_es
  - phrases_es
+  - cocoteros_es
 metadata:
  version: 1.0
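
For reviewers who want to see what the new `cocoteros_es.yaml` config asks of a model, the following is a minimal, standard-library-only sketch. It fills the `doc_to_text` template from a record, truncates a generation at the `until` stop sequence, and scores it with a simple unigram-overlap F1. Several things here are assumptions rather than the harness's own code: the `render` helper is a regex stand-in for the Jinja2 rendering the harness performs, `unigram_f1` is an illustrative approximation and not the repository's `utils.rouge1`, and the sample record and generation are invented (only the field names `keywords`, `context`, and `text` come from the config).

```python
import re
from collections import Counter

# Prompt template copied from the doc_to_text field of cocoteros_es.yaml.
DOC_TO_TEXT = (
    "Genera una frase corta con estas palabras: {{keywords}}. "
    "El contexto es: {{context}} \n\nRespuesta:"
)


def render(template: str, doc: dict) -> str:
    """Fill {{field}} placeholders from doc (hypothetical stand-in for Jinja2)."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(doc[m.group(1)]), template)


def truncate_at_stop(generation: str, until: list[str]) -> str:
    """Cut the generation at the earliest stop sequence, if any is present."""
    cut = len(generation)
    for stop in until:
        idx = generation.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return generation[:cut]


def unigram_f1(prediction: str, reference: str) -> float:
    """F1 over lowercased whitespace tokens (illustrative, not utils.rouge1)."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical record: field names match the template, values are invented.
doc = {
    "keywords": "perro, parque, correr",
    "context": "Una tarde de domingo en la ciudad.",
    "text": "El perro corre por el parque.",
}

prompt = render(DOC_TO_TEXT, doc)

# Invented model output; everything after the first newline is discarded,
# mirroring `until: ["\n"]` in generation_kwargs.
raw_generation = " El perro quiere correr por el parque.\nOtra frase"
prediction = truncate_at_stop(raw_generation, ["\n"]).strip()

score = unigram_f1(prediction, doc["text"])
print(prompt)
print(prediction)
print(round(score, 3))
```

Because generation stops at the first newline and `max_gen_toks` is 40, the task effectively scores a single short sentence against the reference in `text`.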