"src/vscode:/vscode.git/clone" did not exist on "e6110f68569c7b620306e678c3a3d9eee1a293e2"
Commit 2f403fa0 authored by Naiara Perez, committed by GitHub

add Basque translation of ARC and PAWS to BasqueBench (#2732)



* add Basque translation of ARC and PAWS to BasqueBench

* pre-commit

---------
Co-authored-by: Baber <baber@hey.com>
parent 01849b40
@@ -5,14 +5,16 @@
BasqueBench is a benchmark for evaluating language models on Basque tasks. That is, it evaluates the ability of a language model to understand and generate Basque text. BasqueBench offers a combination of pre-existing, open datasets and datasets developed exclusively for this benchmark. All the details of BasqueBench will be published in a paper soon.

The new evaluation datasets included in BasqueBench are:

| Task | Category | Homepage |
|:--------:|:--------------------------:|:---------------------------------------------:|
| ARC_eu | Question Answering | https://huggingface.co/datasets/HiTZ/ARC-eu |
| MGSM_eu | Math | https://huggingface.co/datasets/HiTZ/MGSM-eu |
| PAWS_eu | Paraphrasing | https://huggingface.co/datasets/HiTZ/PAWS-eu |
| PIQA_eu | Question Answering | https://huggingface.co/datasets/HiTZ/PIQA-eu |
| WNLI_eu | Natural Language Inference | https://huggingface.co/datasets/HiTZ/WNLI-eu |
| XCOPA_eu | Commonsense Reasoning | https://huggingface.co/datasets/HiTZ/XCOPA-eu |

The datasets included in BasqueBench that have been made public in previous publications are:

| Task | Category | Paper title | Homepage |
|:-------------:|:-----:|:-------------:|:-----:|

@@ -73,6 +75,8 @@ The datasets included in BasqueBench that have been made public in previous publications are:

#### Tasks

The following tasks evaluate on the BasqueBench datasets using various scoring methods.
- `arc_eu_challenge`
- `arc_eu_easy`
- `belebele_eus_Latn`
- `eus_exams_eu`
- `eus_proficiency`
@@ -97,6 +101,7 @@ The following tasks evaluate on the BasqueBench datasets using various scoring
- `flores_pt-eu`
- `mgsm_direct_eu`
- `mgsm_native_cot_eu`
- `paws_eu`
- `piqa_eu`
- `qnlieu`
- `wnli_eu`
......
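For orientation, a minimal sketch of how the newly added tasks might be run through the harness's Python API. This is illustrative and not part of the commit; it assumes a recent lm-evaluation-harness where `lm_eval.simple_evaluate` is exported, and `EleutherAI/pythia-70m` is only a placeholder model.

import lm_eval  # pip install lm-eval

# Evaluate the new Basque tasks (or pass tasks=["basque_bench"] for the whole group)
# on a small placeholder model; swap in the model you actually care about.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-70m",
    tasks=["arc_eu_easy", "arc_eu_challenge", "paws_eu"],
    num_fewshot=0,
)
print(results["results"])  # per-task metrics (acc, acc_norm, ...)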
include: arc_eu_easy.yaml
task: arc_eu_challenge
dataset_name: ARC-Challenge

task: arc_eu_easy
dataset_path: HiTZ/ARC-eu
dataset_name: ARC-Easy
output_type: multiple_choice
training_split: null
validation_split: validation
test_split: test
doc_to_text: "Galdera: {{question}}\nErantzuna:"
doc_to_target: "{{choices.label.index(answerKey)}}"
doc_to_choice: "{{choices.text}}"
should_decontaminate: true
doc_to_decontamination_query: "Galdera: {{question}}\nErantzuna:"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
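To make the template fields concrete, here is a small sketch (not from the commit) of what `doc_to_text`, `doc_to_choice` and `doc_to_target` yield for a hypothetical ARC-style item; the question and choices are invented, only the schema (`question`, `choices.text`, `choices.label`, `answerKey`) follows the standard ARC format.

# Hypothetical ARC-eu style item (invented content, standard ARC schema).
doc = {
    "question": "Zein da uraren formula kimikoa?",
    "choices": {"text": ["H2O", "CO2", "NaCl", "O2"], "label": ["A", "B", "C", "D"]},
    "answerKey": "A",
}

# doc_to_text: the prompt presented to the model.
prompt = f"Galdera: {doc['question']}\nErantzuna:"

# doc_to_choice: the candidate continuations the harness scores.
choices = doc["choices"]["text"]

# doc_to_target: index of the gold answer within the choices.
target = doc["choices"]["label"].index(doc["answerKey"])

print(prompt)
print(choices[target])  # -> H2O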
group: basque_bench
task:
  - arc_eu_challenge
  - arc_eu_easy
  - belebele_eus_Latn
  - xstorycloze_eu
  - flores_eu
@@ -14,6 +16,7 @@ task:
  - xcopa_eu
  - mgsm_direct_eu
  - mgsm_native_cot_eu
  - paws_eu
  - piqa_eu
metadata:
  version: 1.0
task: paws_eu
dataset_path: HiTZ/PAWS-eu
dataset_name: null
output_type: multiple_choice
test_split: test
process_docs: !function utils.paws_process_docs
doc_to_text: ''
doc_to_target: label
doc_to_choice: '{{[sentence1+", ezta? Ez, "+sentence2, sentence1+", ezta? Bai, "+sentence2]}}'
target_delimiter: ''
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
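The `doc_to_choice` template above turns each sentence pair into two candidate continuations, one asserting the sentences are not paraphrases ("Ez") and one asserting they are ("Bai"); the gold choice is the integer `label`. A rough Python equivalent, using an invented sentence pair:

# Invented PAWS-eu style example (schema: sentence1, sentence2, label).
doc = {
    "sentence1": "Etxea 1990ean eraiki zen",        # final punctuation already stripped
    "sentence2": "etxea 1990. urtean eraiki zuten",  # already lowercased by process_docs
    "label": 1,  # 1 = paraphrase, 0 = not a paraphrase
}

# Two candidates in the same order as the YAML template: index 0 -> "Ez", index 1 -> "Bai".
choices = [
    doc["sentence1"] + ", ezta? Ez, " + doc["sentence2"],
    doc["sentence1"] + ", ezta? Bai, " + doc["sentence2"],
]

print(choices[doc["label"]])  # Etxea 1990ean eraiki zen, ezta? Bai, etxea 1990. urtean eraiki zuten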
from functools import partial
# ~~~~~~~~~~~ XCOPA ~~~~~~~~~~~ #
xcopa_connectors = {"cause": " Izan ere,", "effect": " Beraz,"}
@@ -18,4 +15,28 @@ def xcopa_doc_to_choice(doc):
    return [convert_choice(doc["choice1"]), convert_choice(doc["choice2"])]
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #


# ~~~~~~~~~~~ PAWS-X ~~~~~~~~~~~ #
def paws_process_docs(dataset):
    empty_docs = []

    def _process_doc(doc):
        if doc["sentence1"] not in [None, ""] and doc["sentence2"] not in [None, ""]:
            # Remove final punctuation mark in the first sentence
            if doc["sentence1"].endswith((".", ",", ";")):
                doc["sentence1"] = doc["sentence1"][:-1]
            # Start the second sentence in lowercase (to be used after "Yes, ...")
            doc["sentence2"] = lowercase_first_letter(doc["sentence2"])
            return doc
        else:
            empty_docs.append(doc)
            return doc

    def lowercase_first_letter(text):
        return text[0].lower() + text[1:]

    return dataset.filter(
        lambda doc: doc["sentence1"] not in [None, ""]
        and doc["sentence2"] not in [None, ""]
    ).map(_process_doc)
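Outside the harness, the preprocessing above can be exercised directly; a minimal sketch, assuming the HiTZ/PAWS-eu test split exposes `sentence1`/`sentence2` fields as the task config and the function above imply:

from datasets import load_dataset

# Load the Basque PAWS split and apply the same filtering/normalisation
# the harness performs via process_docs.
raw = load_dataset("HiTZ/PAWS-eu", split="test")
processed = paws_process_docs(raw)

example = processed[0]
print(example["sentence1"])  # no trailing ".", "," or ";"
print(example["sentence2"])  # first letter lowercased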