Commit af4b012e authored by lintangsutawika

add mutual

parent 57c29709
# MuTual
### Paper
Title: `MuTual: A Dataset for Multi-Turn Dialogue Reasoning`
Abstract: https://www.aclweb.org/anthology/2020.acl-main.130/
MuTual is a retrieval-based dataset for multi-turn dialogue reasoning, which is
modified from Chinese high school English listening comprehension test data.
Homepage: https://github.com/Nealcly/MuTual
### Citation
```
@inproceedings{mutual,
title = "MuTual: A Dataset for Multi-Turn Dialogue Reasoning",
author = "Cui, Leyang and Wu, Yu and Liu, Shujie and Zhang, Yue and Zhou, Ming",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
year = "2020",
publisher = "Association for Computational Linguistics",
}
```
### Groups and Tasks
#### Groups
* Not part of a group yet.
#### Tasks
* `mutual`
* `mutual_plus`
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
  * [ ] Have you referenced the original paper that introduced the task?
  * [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
include: mutual.yaml
task: mutual_plus
dataset_name: mutual_plus
task: mutual
dataset_path: "EleutherAI/mutual"
dataset_name: mutual
output_type: multiple_choice
training_split: train
validation_split: validation
doc_to_text: "{{article}}"
doc_to_target: "{{['A', 'B', 'C', 'D'].index(answers)}}"
doc_to_choice: "{{options}}"
process_docs: !function utils.process_docs
process_results: !function utils.process_results
should_decontaminate: true
doc_to_decontamination_query: "{{article}}"
metric_list:
  - metric: r@1
    aggregation: mean
    higher_is_better: true
  - metric: r@2
    aggregation: mean
    higher_is_better: true
  - metric: mrr
    aggregation: mean
    higher_is_better: true
import numpy as np


def process_docs(dataset):
    def _detokenize(text):
        text = text.replace(" '", "'")
        text = text.replace(" \n", "\n")
        text = text.replace("\n ", "\n")
        text = text.replace(" n't", "n't")
        text = text.replace("`` ", '"')
        text = text.replace("''", '"')
        # punctuation
        text = text.replace(" :", ":")
        text = text.replace(" ;", ";")
        text = text.replace(" !", "!")
        text = text.replace(" ?", "?")
        text = text.replace(" ,", ",")
        text = text.replace(" .", ".")
        return text

    def _process(doc):
        return {
            "article": _detokenize(doc["article"]),
            "options": [_detokenize(option) for option in doc["options"]],
        }

    return dataset.map(_process)


def process_results(doc, results):
    gold = ["A", "B", "C", "D"].index(doc["answers"])
    r4_1 = np.argmax(results) == gold  # r4_1 = accuracy
    ranks = sorted(results, reverse=True)
    r4_2 = (ranks.index(results[gold]) == 1) + r4_1
    mrr = 1.0 / (ranks.index(results[gold]) + 1)  # `+ 1` for index offset
    return {"r@1": r4_1, "r@2": r4_2, "mrr": mrr}
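To make the three metrics returned by `process_results` concrete, the sketch below recomputes them for a single example in plain Python. It is an illustration, not part of the commit: the helper name `mutual_metrics` and the toy scores are hypothetical, and it assumes `results` can be treated as a flat list of four per-option scores (one for each of A to D) with no ties. Under that assumption, `r@1` is 1 when the gold option ranks first, `r@2` is 1 when it ranks first or second, and `mrr` is the reciprocal of its 1-based rank.

```python
def mutual_metrics(scores, gold_letter):
    """Recompute r@1, r@2 and MRR for one example (hypothetical helper).

    `scores` is assumed to hold one score per option, in A-D order,
    where a higher score means the option is preferred.
    """
    gold = ["A", "B", "C", "D"].index(gold_letter)
    ranks = sorted(scores, reverse=True)
    gold_rank = ranks.index(scores[gold])  # 0-based rank of the gold option
    return {
        "r@1": float(gold_rank == 0),   # gold option ranked first
        "r@2": float(gold_rank <= 1),   # gold option ranked first or second
        "mrr": 1.0 / (gold_rank + 1),   # reciprocal of the 1-based rank
    }


# Toy scores for options A, B, C, D (made up for illustration).
toy_scores = [-4.2, -1.3, -2.7, -6.0]
print(mutual_metrics(toy_scores, "C"))
# → {'r@1': 0.0, 'r@2': 1.0, 'mrr': 0.5}
```

Here option C has the second-highest score, so it misses `r@1`, counts toward `r@2`, and contributes a reciprocal rank of 0.5. Averaging these per-example values over the split corresponds to the `aggregation: mean` setting in the task config.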