Add cocoteros_va dataset (#2787)

* Add cocoteros_va dataset
* Fix format in cocoteros_va.yml
* Undo newline added
* Execute pre-commit to fix format errors
* Update catalan_bench.yaml version and add Changelog section into Readme.md

Commit 65ef2573 (unverified), authored Mar 18, 2025 by Santiago Galiano Segura, committed by GitHub Mar 18, 2025
3 changed files
--- a/lm_eval/tasks/catalan_bench/README.md
+++ b/lm_eval/tasks/catalan_bench/README.md
@@ -23,6 +23,8 @@ The datasets included in CatalanBench that have been made public in previous pub
 | caBREU | Summarization | [Building a Data Infrastructure for a Mid-Resource Language: The Case of Catalan](https://aclanthology.org/2024.lrec-main.231/) | https://huggingface.co/datasets/projecte-aina/caBreu |
 | CatalanQA | Question Answering | [Building a Data Infrastructure for a Mid-Resource Language: The Case of Catalan](https://aclanthology.org/2024.lrec-main.231/) | https://huggingface.co/datasets/projecte-aina/catalanqa |
 | CatCoLA | Linguistic Acceptability | CatCoLA: Catalan Corpus of Linguistic Acceptability | https://huggingface.co/datasets/nbel/CatCoLA |
+| Cocoteros_va | Commonsense Reasoning | COCOTEROS_VA: Valencian translation of the COCOTEROS Spanish dataset | https://huggingface.co/datasets/gplsi/cocoteros_va |
+ | EsCoLA | Linguistic Acceptability | [EsCoLA: Spanish Corpus of Linguistic Acceptability](https://aclanthology.org/2024.lrec-main.554/) |
 | COPA-ca | Commonsense Reasoning | [Building a Data Infrastructure for a Mid-Resource Language: The Case of Catalan](https://aclanthology.org/2024.lrec-main.231/) | https://huggingface.co/datasets/projecte-aina/COPA-ca |
 | CoQCat | Question Answering | [Building a Data Infrastructure for a Mid-Resource Language: The Case of Catalan](https://aclanthology.org/2024.lrec-main.231/) | https://huggingface.co/datasets/projecte-aina/CoQCat |
 | FLORES_ca | Translation | [The FLORES-101  Evaluation Benchmark for Low-Resource and Multilingual Machine Translation](https://arxiv.org/abs/2106.03193) | https://huggingface.co/datasets/facebook/flores |
@@ -91,6 +93,7 @@ The following tasks evaluate tasks on CatalanBench dataset using various scoring
  - `cabreu`
  - `catalanqa`
  - `catcola`
+  - `cocoteros_va`
  - `copa_ca`
  - `coqcat`
  - `flores_ca`
@@ -141,3 +144,7 @@ If other tasks on this dataset are already supported:
 * [ ] Is the "Main" variant of this task clearly denoted?
 * [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
 * [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
+
+
+### Changelog
+version 2.0: (2025-Mar-18) add [`cococteros_va`](./cocoteros_va.yaml) task. 
\ No newline at end of file
--- a/lm_eval/tasks/catalan_bench/catalan_bench.yaml
+++ b/lm_eval/tasks/catalan_bench/catalan_bench.yaml
@@ -21,5 +21,6 @@ task:
    - cabreu
    - mgsm_direct_ca
    - phrases_va
+    - cocoteros_va
 metadata:
-  version: 1.0
+  version: 2.0
--- a/lm_eval/tasks/catalan_bench/cocoteros_va.yaml
+++ b/lm_eval/tasks/catalan_bench/cocoteros_va.yaml
+task: cocoteros_va
+dataset_path: gplsi/cocoteros_va
+dataset_name: null
+output_type: generate_until
+doc_to_text: "Genera una frase curta amb estes paraules: {{keywords}}. El context és: {{context}} \n\nResposta:"
+doc_to_target: "{{text}}"
+training_split: null
+validation_split: null
+test_split: test
+fewshot_split: test
+target_delimiter: ' '
+generation_kwargs:
+  max_gen_toks: 40
+  until:
+    - "\n"
+metric_list:
+  - metric: bleu
+    aggregation: bleu
+    higher_is_better: true
+  - metric: !function utils.rouge1
+    aggregation: !function utils.rouge1_agg
+    higher_is_better: true
+metadata:
+  version: 1.0
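
The new `cocoteros_va.yaml` builds each prompt from the `keywords` and `context` fields, uses `text` as the reference, and stops generation at the first newline. The harness renders these templates with Jinja2. The sketch below is only an illustration of that flow, assuming a regex stand-in for Jinja2 and a hypothetical record: the field names come from the config, while the values, the helper names `render` and `truncate_at_stop`, and the sample generation are invented here.

```python
import re

# Template, delimiter, and stop sequence copied from cocoteros_va.yaml.
DOC_TO_TEXT = (
    "Genera una frase curta amb estes paraules: {{keywords}}. "
    "El context és: {{context}} \n\nResposta:"
)
DOC_TO_TARGET = "{{text}}"
TARGET_DELIMITER = " "
UNTIL = ["\n"]


def render(template: str, doc: dict) -> str:
    """Replace each {{field}} placeholder with the matching value from doc.

    Hypothetical stand-in for Jinja2 rendering; handles plain field names only.
    """
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(doc[m.group(1)]), template)


def truncate_at_stop(generation: str, until: list[str]) -> str:
    """Cut the generation at the earliest stop sequence, if one occurs."""
    cut = len(generation)
    for stop in until:
        idx = generation.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return generation[:cut]


# Hypothetical record: field names match the config, values are invented.
doc = {
    "keywords": "gat, teulada, nit",
    "context": "Un carrer tranquil del poble.",
    "text": "El gat puja a la teulada de nit.",
}

prompt = render(DOC_TO_TEXT, doc)
reference = render(DOC_TO_TARGET, doc)

# A few-shot example joins prompt and reference with the target delimiter.
few_shot_example = prompt + TARGET_DELIMITER + reference

# Invented model output; everything from the first newline onward is dropped.
raw_generation = " El gat dorm a la teulada.\nResposta: una altra frase"
prediction = truncate_at_stop(raw_generation, UNTIL)

print(prompt)
print(repr(prediction))
```

With `until` set to a single newline and `max_gen_toks: 40`, each prediction is at most one short line, which is what the BLEU and ROUGE-1 metrics then compare against the `text` reference.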