Unverified commit c2c8e238 authored by Julen Etxaniz, committed by GitHub

Add Latxa paper evaluation tasks for Basque (#1654)

* add basqueglue

* add eus_exams

* add eus_proficiency

* add eus_reading

* add eus_trivia

* run pre-commit
# Generated by utils.py
dataset_name: eu_opegasteizkoudala
include: eus_exams_eu
task: eus_exams_eu_opegasteizkoudala
# Generated by utils.py
dataset_name: eu_opeosakiadmineu
include: eus_exams_eu
task: eus_exams_eu_opeosakiadmineu
# Generated by utils.py
dataset_name: eu_opeosakiauxenfeu
include: eus_exams_eu
task: eus_exams_eu_opeosakiauxenfeu
# Generated by utils.py
dataset_name: eu_opeosakiauxeu
include: eus_exams_eu
task: eus_exams_eu_opeosakiauxeu
# Generated by utils.py
dataset_name: eu_opeosakiceladoreu
include: eus_exams_eu
task: eus_exams_eu_opeosakiceladoreu
# Generated by utils.py
dataset_name: eu_opeosakienfeu
include: eus_exams_eu
task: eus_exams_eu_opeosakienfeu
# Generated by utils.py
dataset_name: eu_opeosakioperarioeu
include: eus_exams_eu
task: eus_exams_eu_opeosakioperarioeu
# Generated by utils.py
dataset_name: eu_opeosakitecnicoeu
include: eus_exams_eu
task: eus_exams_eu_opeosakitecnicoeu
# Generated by utils.py
dataset_name: eu_opeosakivarioseu
include: eus_exams_eu
task: eus_exams_eu_opeosakivarioseu
# Generated by utils.py
dataset_name: eu_osakidetza1e
include: eus_exams_eu
task: eus_exams_eu_osakidetza1e
# Generated by utils.py
dataset_name: eu_osakidetza2e
include: eus_exams_eu
task: eus_exams_eu_osakidetza2e
# Generated by utils.py
dataset_name: eu_osakidetza3e
include: eus_exams_eu
task: eus_exams_eu_osakidetza3e
# Generated by utils.py
dataset_name: eu_osakidetza5e
include: eus_exams_eu
task: eus_exams_eu_osakidetza5e
# Generated by utils.py
dataset_name: eu_osakidetza6e
include: eus_exams_eu
task: eus_exams_eu_osakidetza6e
# Generated by utils.py
dataset_name: eu_osakidetza7e
include: eus_exams_eu
task: eus_exams_eu_osakidetza7e
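Each per-exam config above carries a `# Generated by utils.py` header. A hypothetical sketch of what such a generator loop could look like is below; the subset list and output paths are assumptions for illustration, not the actual script:

```python
# Hypothetical sketch of a per-subset config generator; the real generator is
# only referenced by the "# Generated by utils.py" headers above. Subset names
# and output paths here are assumptions.
from pathlib import Path

SUBSETS = ["eu_opegasteizkoudala", "eu_osakidetza7e"]  # one entry per exam subset

for name in SUBSETS:
    Path(f"eus_exams_{name}.yaml").write_text(
        "# Generated by utils.py\n"
        f"dataset_name: {name}\n"
        "include: eus_exams_eu\n"
        f"task: eus_exams_{name}\n"
    )
```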
import datasets


def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:
    """Filter out examples with no answer."""

    def valid_example(example: dict) -> bool:
        """Check if an example is valid."""
        # The answer must be a valid index into the four candidates.
        if example["answer"] not in [0, 1, 2, 3]:
            return False
        # Discard rows where every candidate answer is empty.
        if example["candidates"] == ["", "", "", ""]:
            return False
        return True

    return dataset.filter(valid_example)
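A quick way to sanity-check the filter is to run it over a small in-memory dataset; the rows below are made up for illustration:

```python
import datasets

# Made-up rows: the second has an out-of-range answer index and the third has
# all-empty candidates, so only the first survives process_docs.
toy = datasets.Dataset.from_list([
    {"question": "1 + 1 = ?", "candidates": ["1", "2", "3", "4"], "answer": 1},
    {"question": "?", "candidates": ["a", "b", "c", "d"], "answer": 5},
    {"question": "?", "candidates": ["", "", "", ""], "answer": 0},
])

print(len(process_docs(toy)))  # -> 1
```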
# EusProficiency
### Paper
Title: Latxa: An Open Language Model and Evaluation Suite for Basque
Abstract: https://arxiv.org/abs/2403.20266
EusProficiency comprises 5,169 exercises on different topics from past EGA exams, the official C1-level certificate of proficiency in Basque. We collected the atarikoa exercises from EGA exams administered between 1998 and 2008. Atarikoa is the first qualifying test of EGA, which measures different aspects of language competency, such as reading comprehension, grammar, vocabulary, spelling, and writing. Each test generally has 85 multiple-choice questions, with 4 choices and a single correct answer.
Homepage: https://github.com/hitz-zentroa/latxa
### Citation
```
@misc{etxaniz2024latxa,
title={Latxa: An Open Language Model and Evaluation Suite for Basque},
author={Julen Etxaniz and Oscar Sainz and Naiara Perez and Itziar Aldabe and German Rigau and Eneko Agirre and Aitor Ormazabal and Mikel Artetxe and Aitor Soroa},
year={2024},
eprint={2403.20266},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
### Groups and Tasks
#### Groups
There are no groups.
#### Tasks
* `eus_proficiency`: EusProficiency comprises 5,169 exercises on different topics from past EGA exams, the official C1-level certificate of proficiency in Basque.
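Once this task is available in the harness, a typical zero-shot run looks like the following; the model checkpoint is only an example (`HiTZ/latxa-7b-v1` is one of the models evaluated in the paper):

```
lm_eval --model hf \
    --model_args pretrained=HiTZ/latxa-7b-v1 \
    --tasks eus_proficiency \
    --batch_size 8
```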
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
dataset_path: HiTZ/EusProficiency
dataset_name: default
task: eus_proficiency
doc_to_text: "Galdera: {{question}}\nA: {{candidates[0]}}\nB: {{candidates[1]}}\nC: {{candidates[2]}}\nD: {{candidates[3]}}\nErantzuna:"
doc_to_choice: ["A", "B", "C", "D"]
validation_split: null
test_split: test
fewshot_split: test
output_type: multiple_choice
doc_to_target: answer
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 0.0
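For reference, here is what the `doc_to_text` template above renders for a hypothetical document; the gold target is the letter in `doc_to_choice` at index `answer`:

```python
# Hypothetical document, invented to illustrate the prompt format.
doc = {
    "question": "Zein da forma zuzena?",
    "candidates": ["zuen", "zuten", "zituen", "zituzten"],
    "answer": 2,  # index into doc_to_choice, so the gold letter is "C"
}

prompt = (
    f"Galdera: {doc['question']}\n"
    + "\n".join(f"{letter}: {cand}" for letter, cand in zip("ABCD", doc["candidates"]))
    + "\nErantzuna:"
)
print(prompt)
```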
# EusReading
### Paper
Title: Latxa: An Open Language Model and Evaluation Suite for Basque
Abstract: https://arxiv.org/abs/2403.20266
EusReading consists of 352 reading comprehension exercises (irakurmena) sourced from the set of past EGA exams from 1998 to 2008. Each test generally has 10 multiple-choice questions, with 4 choices and a single correct answer. These exercises are more challenging than Belebele due to the complexity and length of the input texts, which makes EusReading useful for measuring the long-context understanding of models.
Homepage: https://github.com/hitz-zentroa/latxa
### Citation
```
@misc{etxaniz2024latxa,
title={Latxa: An Open Language Model and Evaluation Suite for Basque},
author={Julen Etxaniz and Oscar Sainz and Naiara Perez and Itziar Aldabe and German Rigau and Eneko Agirre and Aitor Ormazabal and Mikel Artetxe and Aitor Soroa},
year={2024},
eprint={2403.20266},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
### Groups and Tasks
#### Groups
There are no groups.
#### Tasks
* `eus_reading`: EusReading consists of 352 reading comprehension exercises (irakurmena) sourced from the set of past EGA exams from 1998 to 2008.
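Note that the config below sets `fewshot_split: test`, so in-context examples for few-shot runs are drawn from the test split itself. A few-shot invocation might look like this (shot count and model checkpoint are illustrative):

```
lm_eval --model hf \
    --model_args pretrained=HiTZ/latxa-7b-v1 \
    --tasks eus_reading \
    --num_fewshot 5 \
    --batch_size 4
```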
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
dataset_path: HiTZ/EusReading
dataset_name: default
task: eus_reading
doc_to_text: !function utils.doc_to_text_context
doc_to_choice: !function utils.doc_to_choice
validation_split: null
test_split: test
fewshot_split: test
output_type: multiple_choice
doc_to_target: answer
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 0.0
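The `!function utils.doc_to_text_context` and `!function utils.doc_to_choice` hooks point at helpers in the task's `utils.py`, which is not fully reproduced in this excerpt. A hypothetical sketch of what they could look like, assuming each document exposes `context`, `question`, and `candidates` fields:

```python
# Hypothetical sketches only; the real helpers live in the task's utils.py,
# and the field names used here are assumptions.
def doc_to_text_context(doc: dict) -> str:
    """Render the passage, then the question and lettered choices."""
    choices = "\n".join(
        f"{letter}: {cand}"
        for letter, cand in zip("ABCD", doc["candidates"])
    )
    return f"{doc['context']}\n\nGaldera: {doc['question']}\n{choices}\nErantzuna:"


def doc_to_choice(doc: dict) -> list:
    """One answer letter per candidate, since questions can vary in choice count."""
    return list("ABCD")[: len(doc["candidates"])]
```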