Unverified commit b40a20ae authored by zxcvuser, committed by GitHub

Add xquad task (#2435)

* Add xquad task

* Update general README

* Run pre-commit
parent 838a3e03
@@ -124,5 +124,6 @@
| [xcopa](xcopa/README.md) | Cross-lingual Choice of Plausible Alternatives, testing reasoning in multiple languages. | Estonian, Haitian, Indonesian, Italian, Quechua, Swahili, Tamil, Thai, Turkish, Vietnamese, Chinese |
| [xnli](xnli/README.md) | Cross-Lingual Natural Language Inference to test understanding across different languages. | Arabic, Bulgarian, German, Greek, English, Spanish, French, Hindi, Russian, Swahili, Thai, Turkish, Urdu, Vietnamese, Chinese |
| [xnli_eu](xnli_eu/README.md) | Cross-lingual Natural Language Inference tasks in Basque. | Basque |
| [xquad](xquad/README.md) | Cross-lingual Question Answering Dataset in multiple languages. | Arabic, German, Greek, English, Spanish, Hindi, Romanian, Russian, Thai, Turkish, Vietnamese, Chinese |
| [xstorycloze](xstorycloze/README.md) | Cross-lingual narrative understanding tasks to predict story endings in multiple languages. | Russian, Simplified Chinese, Spanish, Arabic, Hindi, Indonesian, Telugu, Swahili, Basque, Burmese |
| [xwinograd](xwinograd/README.md) | Cross-lingual Winograd schema tasks for coreference resolution in multiple languages. | English, French, Japanese, Portuguese, Russian, Chinese |
@@ -35,15 +35,6 @@ def process_doc_nli(dataset):
    return dataset.map(process_fn)
def process_results_qa(doc, results):
    preds = results[0]
    reference = doc["answers"]["text"][0]
    f1_sum = squad_metrics.compute_f1(reference, preds)
    exact_match = squad_metrics.compute_exact(reference, preds)
    return {"f1": f1_sum, "exact_match": exact_match}
def process_xlsum(dataset):
    def _process_doc(doc):
        # Remove double spaces
...
# XQuAD
### Paper
Title: `On the Cross-lingual Transferability of Monolingual Representations`
Abstract: https://aclanthology.org/2020.acl-main.421.pdf
XQuAD (Cross-lingual Question Answering Dataset) is a benchmark dataset for evaluating cross-lingual question answering performance. The dataset consists of a subset of 240 paragraphs and 1190 question-answer pairs from the development set of SQuAD v1.1 (Rajpurkar et al., 2016) together with their professional translations into ten languages: Spanish, German, Greek, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, and Hindi. Consequently, the dataset is entirely parallel across 11 languages.
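Each XQuAD record follows the SQuAD v1.1 schema: a `context` paragraph, a `question`, and an `answers` dict with parallel `text` and `answer_start` lists. A minimal sketch of pulling out the first gold answer, the same field the task's `doc_to_target` template selects (the sample record below is made up for illustration, not taken from the dataset):

```python
# Illustrative SQuAD-style record; field names follow the SQuAD v1.1 schema,
# the values are invented for demonstration.
doc = {
    "context": "XQuAD pairs 240 paragraphs with 1190 questions.",
    "question": "How many question-answer pairs does XQuAD contain?",
    "answers": {"text": ["1190"], "answer_start": [32]},
}

def first_gold_answer(doc: dict) -> str:
    """Mirror of the `{{answers["text"][0]}}` target template."""
    return doc["answers"]["text"][0]

print(first_gold_answer(doc))  # -> 1190
```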
Homepage: https://github.com/deepmind/xquad
### Citation
```
@article{Artetxe:etal:2019,
author = {Mikel Artetxe and Sebastian Ruder and Dani Yogatama},
title = {On the cross-lingual transferability of monolingual representations},
journal = {CoRR},
volume = {abs/1910.11856},
year = {2019},
archivePrefix = {arXiv},
eprint = {1910.11856}
}
```
### Groups and Tasks
#### Groups
* `xquad`: All available languages.
#### Tasks
Perform extractive question answering for each language's subset of XQuAD.
* `xquad_ar`: Arabic
* `xquad_de`: German
* `xquad_el`: Greek
* `xquad_en`: English
* `xquad_es`: Spanish
* `xquad_hi`: Hindi
* `xquad_ro`: Romanian
* `xquad_ru`: Russian
* `xquad_th`: Thai
* `xquad_tr`: Turkish
* `xquad_vi`: Vietnamese
* `xquad_zh`: Chinese
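The task names above are purely mechanical: `xquad_` plus the ISO 639-1 code of each of the twelve languages. A small sketch (a hypothetical helper, not part of the harness) that reproduces the list:

```python
# The twelve XQuAD language codes listed above (ISO 639-1).
XQUAD_LANGS = ["ar", "de", "el", "en", "es", "hi", "ro", "ru", "th", "tr", "vi", "zh"]

def xquad_task_names(langs=XQUAD_LANGS):
    """Build the harness task name for each language subset."""
    return [f"xquad_{lang}" for lang in langs]

print(xquad_task_names()[:3])  # -> ['xquad_ar', 'xquad_de', 'xquad_el']
```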
### Checklist
For adding novel benchmarks/datasets to the library:
* [x] Is the task an existing benchmark in the literature?
* [x] Have you referenced the original paper that introduced the task?
* [x] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
import transformers.data.metrics.squad_metrics as squad_metrics
def process_results_qa(doc, results):
    preds = results[0]
    reference = doc["answers"]["text"][0]
    f1_sum = squad_metrics.compute_f1(reference, preds)
    exact_match = squad_metrics.compute_exact(reference, preds)
    return {"f1": f1_sum, "exact_match": exact_match}
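`compute_f1` and `compute_exact` come from `transformers.data.metrics.squad_metrics` and apply the standard SQuAD v1.1 scoring: both strings are normalized (lowercased, punctuation and English articles stripped, whitespace collapsed), then compared by exact match and by token-overlap F1. A simplified stdlib-only re-implementation, for intuition only — the task itself uses the `transformers` versions:

```python
import re
import string
from collections import Counter

def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace (SQuAD-style)."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def compute_exact(gold: str, pred: str) -> int:
    """1 if the normalized strings match exactly, else 0."""
    return int(normalize_answer(gold) == normalize_answer(pred))

def compute_f1(gold: str, pred: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    gold_toks = normalize_answer(gold).split()
    pred_toks = normalize_answer(pred).split()
    common = Counter(gold_toks) & Counter(pred_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(compute_exact("the cat", "Cat."))            # -> 1
print(round(compute_f1("cat", "the cat sat"), 4))  # -> 0.6667
```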
include: xquad_common_yaml
task: xquad_ar
dataset_name: xquad.ar
doc_to_text: "سياق: {{context}}\n\nسؤال: {{question}}\n\nإجابة:"
# This file will be included in the generated language-specific task configs.
# It doesn't have a yaml file extension as it is not meant to be imported directly
# by the harness.
tag: xquad
task: null
dataset_path: xquad
dataset_name: null
output_type: generate_until
validation_split: validation
doc_to_text: null
doc_to_target: '{{answers["text"][0]}}'
process_results: !function utils.process_results_qa
target_delimiter: ' '
generation_kwargs:
  until:
    - "\n"
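Each per-language file only supplies the fields the shared config leaves as `null` (plus its prompt template), so conceptually the merge is a shallow dict update of overrides over defaults. A toy sketch under that assumption — the harness's actual config loader is more involved:

```python
# Shared defaults, mirroring xquad_common_yaml; per-task fields left unset (None).
COMMON = {
    "tag": "xquad",
    "task": None,
    "dataset_path": "xquad",
    "dataset_name": None,
    "output_type": "generate_until",
    "validation_split": "validation",
    "doc_to_text": None,
}

def build_task_config(overrides: dict) -> dict:
    """Shallow-merge a per-language override file over the common defaults."""
    return {**COMMON, **overrides}

es = build_task_config({
    "task": "xquad_es",
    "dataset_name": "xquad.es",
    "doc_to_text": "Contexto: {{context}}\n\nPregunta: {{question}}\n\nRespuesta:",
})
print(es["dataset_name"])  # -> xquad.es
```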
include: xquad_common_yaml
task: xquad_de
dataset_name: xquad.de
doc_to_text: "Kontext: {{context}}\n\nFrage: {{question}}\n\nAntwort:"
include: xquad_common_yaml
task: xquad_el
dataset_name: xquad.el
doc_to_text: "Συμφραζόμενα: {{context}}\n\nΕρώτηση: {{question}}\n\nΑπάντηση:"
include: xquad_common_yaml
task: xquad_en
dataset_name: xquad.en
doc_to_text: "Context: {{context}}\n\nQuestion: {{question}}\n\nAnswer:"
include: xquad_common_yaml
task: xquad_es
dataset_name: xquad.es
doc_to_text: "Contexto: {{context}}\n\nPregunta: {{question}}\n\nRespuesta:"
include: xquad_common_yaml
task: xquad_hi
dataset_name: xquad.hi
doc_to_text: "प्रसंग: {{context}}\n\nसवाल: {{question}}\n\nउत्तर:"
include: xquad_common_yaml
task: xquad_ro
dataset_name: xquad.ro
doc_to_text: "Context: {{context}}\n\nÎntrebare: {{question}}\n\nRăspuns:"
include: xquad_common_yaml
task: xquad_ru
dataset_name: xquad.ru
doc_to_text: "Контекст: {{context}}\n\nВопрос: {{question}}\n\nОтвет:"
include: xquad_common_yaml
task: xquad_th
dataset_name: xquad.th
doc_to_text: "บริบท: {{context}}\n\nคำถาม: {{question}}\n\nคำตอบ:"
include: xquad_common_yaml
task: xquad_tr
dataset_name: xquad.tr
doc_to_text: "Bağlam: {{context}}\n\nSoru: {{question}}\n\nCevap:"
include: xquad_common_yaml
task: xquad_vi
dataset_name: xquad.vi
doc_to_text: "Bối cảnh: {{context}}\n\nCâu hỏi: {{question}}\n\nTrả lời:"
include: xquad_common_yaml
task: xquad_zh
dataset_name: xquad.zh
doc_to_text: "语境: {{context}}\n\n问题: {{question}}\n\n回答:"
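The `doc_to_text` templates above are Jinja2 expressions; at evaluation time the harness substitutes each document's `context` and `question` fields into the template to build the prompt. A minimal stand-in using plain string replacement, which is enough for these flat `{{field}}` templates (the harness itself uses Jinja2):

```python
def render_prompt(template: str, doc: dict) -> str:
    """Naive {{field}} substitution; no Jinja2 filters or logic."""
    out = template
    for key, value in doc.items():
        out = out.replace("{{" + key + "}}", value)
    return out

template = "Context: {{context}}\n\nQuestion: {{question}}\n\nAnswer:"
doc = {
    "context": "XQuAD covers twelve languages.",
    "question": "How many languages does XQuAD cover?",
}
print(render_prompt(template, doc))
```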