add xnli_va dataset to catalan_bench (#3194)

Commit 51d8a192, authored Aug 21, 2025 by FranValero97; committed by GitHub Aug 21, 2025
3 changed files
--- a/lm_eval/tasks/catalan_bench/README.md
+++ b/lm_eval/tasks/catalan_bench/README.md
@@ -33,6 +33,7 @@ The datasets included in CatalanBench that have been made public in previous pub
 | VeritasQA_ca | Truthfulness | VeritasQA: A Truthfulness Benchmark Aimed at Multilingual Transferability | TBA |
 | WNLI-ca | Natural Language Inference | [Building a Data Infrastructure for a Mid-Resource Language: The Case of Catalan](https://aclanthology.org/2024.lrec-main.231/) | https://huggingface.co/datasets/projecte-aina/wnli-ca |
 | XNLI-ca | Natural Language Inference | [Building a Data Infrastructure for a Mid-Resource Language: The Case of Catalan](https://aclanthology.org/2024.lrec-main.231/) | https://huggingface.co/datasets/projecte-aina/xnli-ca |
+| XNLI-va | Natural Language Inference | Building a Data Infrastructure for a Mid-Resource Language: The Case of Valencian | https://huggingface.co/datasets/gplsi/xnli_va |
 | XQuAD-ca | Question Answering | [Building a Data Infrastructure for a Mid-Resource Language: The Case of Catalan](https://aclanthology.org/2024.lrec-main.231/) | https://huggingface.co/datasets/projecte-aina/xquad-ca |


@@ -126,6 +127,7 @@ The following tasks evaluate tasks on CatalanBench dataset using various scoring
  - `veritasqa_mc2_ca`
  - `wnli_ca`
  - `xnli_ca`
+  - `xnli_va`
  - `xquad_ca`
  - `xstorycloze_ca`

@@ -148,3 +150,4 @@ If other tasks on this dataset are already supported:

 ### Changelog
 version 2.0: (2025-Mar-18) add [`cococteros_va`](./cocoteros_va.yaml) task.
+version 2.1: (2025-Jul-30) add [`xnli_va`](./xnli_va.yaml) task.
--- a/lm_eval/tasks/catalan_bench/catalan_bench.yaml
+++ b/lm_eval/tasks/catalan_bench/catalan_bench.yaml
@@ -22,5 +22,6 @@ task:
    - mgsm_direct_ca
    - phrases_va
    - cocoteros_va
+    - xnli_va
 metadata:
-  version: 2.0
+  version: 2.1
--- a/lm_eval/tasks/catalan_bench/xnli_va.yaml
+++ b/lm_eval/tasks/catalan_bench/xnli_va.yaml
+task: xnli_va
+dataset_path: gplsi/xnli_va
+dataset_name: null
+include: ../xnli/xnli_common_yaml
+output_type: multiple_choice
+doc_to_choice: '{{[premise+", correcte? Sí, "+hypothesis,premise+", correcte? A més,
+  "+hypothesis,premise+", correcte? No, "+hypothesis]}}'
+doc_to_text: ''
+target_delimiter: ''
+process_docs: !function utils.process_doc_nli
+training_split: null
+validation_split: null
+test_split: test
+doc_to_target: label
+metric_list:
+  - metric: acc
+    aggregation: mean
+    higher_is_better: true
+metadata:
+  version: 1.0
+dataset_kwargs:
+  trust_remote_code: true
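
Reviewer note: the `doc_to_choice` template in `xnli_va.yaml` builds three candidate continuations per document, and `doc_to_target: label` selects one of them by index. The sketch below mimics that expansion in plain Python so the resulting strings are easy to inspect. It is an illustration only, not the harness's own code: `build_choices` is a hypothetical helper, the example sentence is made up, the preprocessing done by `utils.process_doc_nli` is not reproduced, and the label order (0 = entailment, 1 = neutral, 2 = contradiction) is assumed from the usual XNLI convention.

```python
# Hypothetical helper mirroring the Jinja expression in doc_to_choice:
#   premise + ", correcte? <connector>, " + hypothesis
# for the three connectors "Sí", "A més" and "No".
def build_choices(premise: str, hypothesis: str) -> list[str]:
    connectors = ["Sí", "A més", "No"]  # assumed order: entailment, neutral, contradiction
    return [f"{premise}, correcte? {c}, {hypothesis}" for c in connectors]


# Made-up document; real rows come from the gplsi/xnli_va test split.
doc = {
    "premise": "El gat dorm al sofà",
    "hypothesis": "l'animal està descansant.",
    "label": 0,
}

choices = build_choices(doc["premise"], doc["hypothesis"])
gold = choices[doc["label"]]  # doc_to_target: label indexes into the choices

for i, choice in enumerate(choices):
    marker = "*" if i == doc["label"] else " "
    print(f"{marker} [{i}] {choice}")
```

Because `doc_to_text` and `target_delimiter` are both empty strings, each choice is scored as a whole sentence rather than as a continuation of a separate prompt, and `acc` counts how often the highest-scoring choice matches `label`.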