Commit 767c58b9 authored by lintangsutawika

Merge branch 'big-refactor' into update_docs

parents 3bfbddc4 759da8d5
......@@ -25,13 +25,20 @@ Homepage: https://github.com/hendrycks/ethics
}
```
### Groups and Tasks
#### Groups
* `hendrycks_ethics`
#### Tasks
* `ethics_cm`
* `ethics_deontology`
* `ethics_justice`
* `ethics_utilitarianism`
* (MISSING) `ethics_utilitarianism_original`
* `ethics_virtue`
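For orientation, group membership is declared inside each task's YAML config, as the LAMBADA config fragments later in this commit show. The sketch below is illustrative only: the dataset path, config name, and tag entries are assumptions, not the actual contents of the repository's ethics configs.
```yaml
# Illustrative sketch of how one ethics task could declare membership in the
# `hendrycks_ethics` group; all field values are assumed, not taken from the
# actual config files in this commit.
group:
- hendrycks_ethics
- multiple_choice
task: ethics_cm
dataset_path: hendrycks/ethics
dataset_name: commonsense
```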
### Checklist
......
# LAMBADA
### Paper
Title: `The LAMBADA dataset: Word prediction requiring a broad discourse context`
Abstract: https://arxiv.org/pdf/1606.06031.pdf
LAMBADA is a dataset to evaluate the capabilities of computational models for text
understanding by means of a word prediction task. LAMBADA is a collection of narrative
......@@ -14,6 +15,18 @@ in the broader discourse.
Homepage: https://zenodo.org/record/2630551#.X4Xzn5NKjUI
### Groups and Tasks
#### Groups
- `lambada`
#### Tasks
- `lambada_openai`
- `lambada_standard`
### Citation
@misc{
......
group:
- lambada
- loglikelihood
- perplexity
task: lambada_openai
dataset_path: EleutherAI/lambada_openai
dataset_name: default
......
group:
- lambada
- loglikelihood
- perplexity
task: lambada_standard
dataset_path: lambada
dataset_name: null
......
# LAMBADA Cloze
### Paper
Title: `The LAMBADA dataset: Word prediction requiring a broad discourse context`
Abstract: https://arxiv.org/abs/1606.06031
Cloze-style LAMBADA dataset.
LAMBADA is a dataset to evaluate the capabilities of computational models for text
understanding by means of a word prediction task. LAMBADA is a collection of narrative
passages sharing the characteristic that human subjects are able to guess their last
word if they are exposed to the whole passage, but not if they only see the last
sentence preceding the target word. To succeed on LAMBADA, computational models
cannot simply rely on local context, but must be able to keep track of information
in the broader discourse.
Homepage: https://zenodo.org/record/2630551#.X4Xzn5NKjUI
### Citation
```
@misc{
author={Paperno, Denis and Kruszewski, Germán and Lazaridou, Angeliki and Pham, Quan Ngoc and Bernardi, Raffaella and Pezzelle, Sandro and Baroni, Marco and Boleda, Gemma and Fernández, Raquel},
title={The LAMBADA dataset},
DOI={10.5281/zenodo.2630551},
publisher={Zenodo},
year={2016},
month={Aug}
}
```
### Groups and Tasks
#### Groups
* `lambada_cloze`
#### Tasks
* `lambada_openai_cloze_yaml`
* `lambada_standard_cloze_yaml`
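The cloze variants reuse the same underlying data as the standard LAMBADA tasks but present the passage as a fill-in-the-blank prompt. A minimal sketch of what such a config could look like is shown below; the `group`, `task`, `dataset_path`, and `dataset_name` values mirror the fragment later in this commit, while the `doc_to_text`/`doc_to_target` templates are assumptions about the prompt format and may not match the actual file.
```yaml
# Hypothetical sketch of a cloze-style LAMBADA config; the prompt templates
# below are assumed for illustration and may differ from the real file.
group:
- lambada_cloze
- loglikelihood
task: lambada_openai_cloze_yaml
dataset_path: EleutherAI/lambada_openai
dataset_name: default
# Show the passage with its final word blanked out ...
doc_to_text: "{{text.split(' ')[:-1] | join(' ')}} ____. ->"
# ... and score the held-out last word as the target.
doc_to_target: "{{' ' + text.split(' ')[-1]}}"
```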
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
group:
- lambada_cloze
- loglikelihood
task: lambada_openai_cloze_yaml
dataset_path: EleutherAI/lambada_openai
dataset_name: default
......
group:
- lambada_cloze
- loglikelihood
task: lambada_standard_cloze_yaml
dataset_path: lambada
dataset_name: null
......
......@@ -25,7 +25,13 @@ Homepage: https://zenodo.org/record/2630551#.X4Xzn5NKjUI
month={Aug}
}
### Groups and Tasks
#### Groups
* `lambada_multilingual`: Evaluates all `lambada_mt_X` tasks
#### Tasks
* `lambada_mt_{en, fr, de, it, es}`: Machine-translated versions of OpenAI's LAMBADA variant (see the config sketch below).
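As the config fragments below illustrate, the translated variants do not repeat the full configuration: each one uses `include` to inherit the English base config and overrides only the task name and dataset config. A compressed sketch of that pattern (values mirror the German fragment shown below):
```yaml
# A translated variant inherits everything from the English base config via
# `include` and overrides only what differs (task name and dataset config).
include: lambada_mt_en.yaml
group:
- lambada_multilingual
- loglikelihood
- perplexity
task: lambada_openai_mt_de
dataset_name: de
```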
......
include: lambada_mt_en.yaml
group:
- lambada_multilingual
- loglikelihood
- perplexity
task: lambada_openai_mt_de
dataset_name: de
group:
- lambada_multilingual
- loglikelihood
- perplexity
task: lambada_openai_mt_en
dataset_path: EleutherAI/lambada_openai
dataset_name: en
......
include: lambada_mt_en.yaml
group:
- lambada_multilingual
- loglikelihood
- perplexity
task: lambada_openai_mt_es
dataset_name: es
include: lambada_mt_en.yaml
group:
- lambada_multilingual
- loglikelihood
- perplexity
task: lambada_openai_mt_fr
dataset_name: fr
include: lambada_mt_en.yaml
group:
- lambada_multilingual
- loglikelihood
- perplexity
task: lambada_openai_mt_it
dataset_name: it
# LogiQA
### Paper
Title: `LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning`
Abstract: https://arxiv.org/abs/2007.08124
LogiQA is a dataset for testing human logical reasoning. It consists of 8,678 QA
instances, covering multiple types of deductive reasoning. Results show that
state-of-the-art neural models perform far worse than the human ceiling. The dataset
can also serve as a benchmark for reinvestigating logical AI under the deep learning
NLP setting.
Homepage: https://github.com/lgw863/LogiQA-dataset
### Citation
```
@misc{liu2020logiqa,
title={LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning},
author={Jian Liu and Leyang Cui and Hanmeng Liu and Dandan Huang and Yile Wang and Yue Zhang},
year={2020},
eprint={2007.08124},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
### Groups and Tasks
#### Groups
* Not part of a group yet
#### Tasks
* `logiqa`
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
group:
- multiple_choice
task: logiqa
dataset_path: EleutherAI/logiqa
dataset_name: logiqa
......
......@@ -25,15 +25,19 @@ Homepage: https://github.com/csitfun/LogiQA2.0
doi={10.1109/TASLP.2023.3293046}}
```
### Groups and Tasks
#### Groups
* Not part of a group yet
#### Tasks
* `logiqa2_zh`: The original dataset in Chinese.
* `logiqa2_NLI`: The NLI version of the dataset, converted from the MRC version.
* `logieval`: Prompt-based evaluation; https://github.com/csitfun/LogiEval
Note: these subtasks have not yet been verified.
### Checklist
......
group:
- greedy_until
task: logieval
dataset_path: baber/logiqa2
dataset_name: logieval
......
group:
- multiple_choice
task: logiqa2
dataset_path: baber/logiqa2
dataset_name: logiqa2
......
......@@ -25,7 +25,13 @@ Homepage: https://math-qa.github.io/math-QA/
}
```
### Groups and Tasks
#### Groups
* `math_word_problems`
#### Tasks
* `mathqa`: The MathQA dataset, treated as a multiple-choice task in which the answer choices are not given in context (see the config sketch below).
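The corresponding task config is collapsed in this diff. Purely as an illustration of the pattern used by the other configs in this commit, a hypothetical `mathqa` config might look like the sketch below; the dataset path and tag entries are assumptions, not the actual file contents.
```yaml
# Hypothetical sketch only; the real mathqa config in the repository may differ.
group:
- math_word_problems
- multiple_choice
task: mathqa
dataset_path: math_qa
dataset_name: null
```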
......