Commit 7634a6ec, authored by Lintang Sutawika, committed by GitHub

Merge pull request #763 from EleutherAI/blimp

[Refactor] Add Blimp
parents 4907defd b22ccf92
# BLiMP
### Paper
Title: `BLiMP: The Benchmark of Linguistic Minimal Pairs for English`
Abstract: `https://arxiv.org/abs/1912.00582`
BLiMP is a challenge set for evaluating what language models (LMs) know about
major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each
containing 1000 minimal pairs isolating specific contrasts in syntax, morphology,
or semantics. The data is automatically generated according to expert-crafted
grammars.
Homepage: https://github.com/alexwarstadt/blimp
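The paper's abstract (quoted in the citation below) describes the evaluation as checking whether a model assigns a higher probability to the acceptable sentence in each minimal pair. A minimal sketch of that scoring rule follows. The function names, the toy scorer, and the example pairs are all assumptions made for illustration: the pairs are invented rather than taken from BLiMP, and a real run would replace `toy_logprob` with sentence log-probabilities from an actual language model.

```python
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    sentence_logprob: Callable[[str], float],
) -> float:
    """Return the fraction of (acceptable, unacceptable) pairs in which
    the acceptable sentence receives the strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one minimal pair is required")
    correct = sum(
        1 for good, bad in pairs if sentence_logprob(good) > sentence_logprob(bad)
    )
    return correct / len(pairs)


# Hypothetical stand-in for a language model: it only knows two bad bigrams.
BAD_BIGRAMS = {("cats", "sleeps"), ("dog", "bark")}


def toy_logprob(sentence: str) -> float:
    """Toy score: shorter is better, and each known bad bigram costs 5."""
    words = sentence.lower().rstrip(".").split()
    penalty = sum(1 for bigram in zip(words, words[1:]) if bigram in BAD_BIGRAMS)
    return -float(len(words)) - 5.0 * penalty


# Invented example pairs (acceptable sentence first); not BLiMP data.
EXAMPLE_PAIRS = [
    ("The cats sleep.", "The cats sleeps."),
    ("The dog barks.", "The dog bark."),
    ("Many birds sing.", "Many birds sings."),
]

if __name__ == "__main__":
    print(minimal_pair_accuracy(EXAMPLE_PAIRS, toy_logprob))
```

The toy scorer cannot tell the third pair apart, so it gets two of the three pairs right. The strict inequality means a tie is counted as a miss, which is one possible convention rather than something the source specifies.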
### Citation
```
@article{warstadt2019blimp,
author = {Warstadt, Alex and Parrish, Alicia and Liu, Haokun and Mohananey, Anhad and Peng, Wei and Wang, Sheng-Fu and Bowman, Samuel R.},
title = {BLiMP: The Benchmark of Linguistic Minimal Pairs for English},
journal = {Transactions of the Association for Computational Linguistics},
volume = {8},
number = {},
pages = {377-392},
year = {2020},
doi = {10.1162/tacl\_a\_00321},
URL = {https://doi.org/10.1162/tacl_a_00321},
eprint = {https://doi.org/10.1162/tacl_a_00321},
abstract = { We introduce The Benchmark of Linguistic Minimal Pairs (BLiMP), a challenge set for evaluating the linguistic knowledge of language models (LMs) on major grammatical phenomena in English. BLiMP consists of 67 individual datasets, each containing 1,000 minimal pairs—that is, pairs of minimally different sentences that contrast in grammatical acceptability and isolate specific phenomenon in syntax, morphology, or semantics. We generate the data according to linguist-crafted grammar templates, and human aggregate agreement with the labels is 96.4\%. We evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs by observing whether they assign a higher probability to the acceptable sentence in each minimal pair. We find that state-of-the-art models identify morphological contrasts related to agreement reliably, but they struggle with some subtle semantic and syntactic phenomena, such as negative polarity items and extraction islands. }
}
```
### Subtasks
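Each BLiMP sub-dataset is defined as its own task, named `blimp_<dataset_name>`. For example:
* `blimp_adjunct_island`: minimal pairs from the `adjunct_island` sub-dataset
* `blimp_anaphor_gender_agreement`: minimal pairs from the `anaphor_gender_agreement` sub-dataset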
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
  * [ ] Have you referenced the original paper that introduced the task?
  * [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
# Generated by utils.py
dataset_name: adjunct_island
include: template_yaml
task: blimp_adjunct_island
# Generated by utils.py
dataset_name: anaphor_gender_agreement
include: template_yaml
task: blimp_anaphor_gender_agreement
# Generated by utils.py
dataset_name: anaphor_number_agreement
include: template_yaml
task: blimp_anaphor_number_agreement
# Generated by utils.py
dataset_name: animate_subject_passive
include: template_yaml
task: blimp_animate_subject_passive
# Generated by utils.py
dataset_name: animate_subject_trans
include: template_yaml
task: blimp_animate_subject_trans
# Generated by utils.py
dataset_name: causative
include: template_yaml
task: blimp_causative
# Generated by utils.py
dataset_name: complex_NP_island
include: template_yaml
task: blimp_complex_NP_island
# Generated by utils.py
dataset_name: coordinate_structure_constraint_complex_left_branch
include: template_yaml
task: blimp_coordinate_structure_constraint_complex_left_branch
# Generated by utils.py
dataset_name: coordinate_structure_constraint_object_extraction
include: template_yaml
task: blimp_coordinate_structure_constraint_object_extraction
# Generated by utils.py
dataset_name: determiner_noun_agreement_1
include: template_yaml
task: blimp_determiner_noun_agreement_1
# Generated by utils.py
dataset_name: determiner_noun_agreement_2
include: template_yaml
task: blimp_determiner_noun_agreement_2
# Generated by utils.py
dataset_name: determiner_noun_agreement_irregular_1
include: template_yaml
task: blimp_determiner_noun_agreement_irregular_1
# Generated by utils.py
dataset_name: determiner_noun_agreement_irregular_2
include: template_yaml
task: blimp_determiner_noun_agreement_irregular_2
# Generated by utils.py
dataset_name: determiner_noun_agreement_with_adj_2
include: template_yaml
task: blimp_determiner_noun_agreement_with_adj_2
# Generated by utils.py
dataset_name: determiner_noun_agreement_with_adj_irregular_1
include: template_yaml
task: blimp_determiner_noun_agreement_with_adj_irregular_1
# Generated by utils.py
dataset_name: determiner_noun_agreement_with_adj_irregular_2
include: template_yaml
task: blimp_determiner_noun_agreement_with_adj_irregular_2
# Generated by utils.py
dataset_name: determiner_noun_agreement_with_adjective_1
include: template_yaml
task: blimp_determiner_noun_agreement_with_adjective_1
# Generated by utils.py
dataset_name: distractor_agreement_relational_noun
include: template_yaml
task: blimp_distractor_agreement_relational_noun
# Generated by utils.py
dataset_name: distractor_agreement_relative_clause
include: template_yaml
task: blimp_distractor_agreement_relative_clause
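The per-task configs above all follow one pattern: a `# Generated by utils.py` header, the `dataset_name`, an `include` of `template_yaml`, and a `task` name of the form `blimp_<dataset_name>`. The commit's `utils.py` is not shown here, so the following is only a sketch of how such files could be produced. The function names and the `<dataset_name>.yaml` output file naming are assumptions, not the actual script.

```python
from pathlib import Path
from typing import Iterable, List


def render_task_config(dataset_name: str) -> str:
    """Render one config in the same four-line shape as the files above."""
    return (
        "# Generated by utils.py\n"
        f"dataset_name: {dataset_name}\n"
        "include: template_yaml\n"
        f"task: blimp_{dataset_name}\n"
    )


def write_task_configs(dataset_names: Iterable[str], out_dir: Path) -> List[Path]:
    """Write one YAML file per sub-dataset and return the written paths.

    The `<dataset_name>.yaml` file name is a hypothetical choice.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in dataset_names:
        path = out_dir / f"{name}.yaml"
        path.write_text(render_task_config(name), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    # Two of the sub-dataset names that appear in the configs above.
    for p in write_task_configs(["adjunct_island", "causative"], Path("blimp_configs")):
        print(p)
```

Generating the files from a single list of sub-dataset names keeps the 67 configs consistent with one another, since the only thing that varies between them is the name.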