Commit 5add46aa authored by hepj

Add Megatron project

parent deb8370c
Pipeline #2199 failed with stages
in 0 seconds
"""
A utility script that pushes all Bigbench subtasks from their form in the `bigbench` HF dataset
into `{org name}/bigbench`.
Prior to running, log into HF Hub for the target HF hub org via `huggingface-cli login`.
Requires bigbench to be installed:
`pip install "bigbench @ https://storage.googleapis.com/public_research_data/bigbench/bigbench-0.0.1.tar.gz"`
The script is included so that the bigbench dependency can be avoided.
"""
import bigbench.api.util as bb_utils
import datasets
from tqdm import tqdm

all_task_names = bb_utils.get_all_json_task_names()

num_shots = [0]

for shots in num_shots:
    for task_name in tqdm(all_task_names):
        try:
            print(f"Loading '{task_name}' with num_shots={shots}...")
            task_ds = datasets.load_dataset("bigbench", name=task_name, num_shots=shots)

            print(f"Pushing '{task_name}' with num_shots={shots}...")
            task_ds.push_to_hub("hails/bigbench", task_name + "_zero_shot")

            del task_ds
        except Exception as e:
            raise e
# BLiMP
### Paper
Title: `BLiMP: The Benchmark of Linguistic Minimal Pairs for English`
Abstract: `https://arxiv.org/abs/1912.00582`
BLiMP is a challenge set for evaluating what language models (LMs) know about
major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each
containing 1000 minimal pairs isolating specific contrasts in syntax, morphology,
or semantics. The data is automatically generated according to expert-crafted
grammars.
Homepage: https://github.com/alexwarstadt/blimp
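
The scoring idea behind BLiMP (a model gets credit for a pair when it assigns higher probability to the acceptable sentence) can be sketched as follows. This is a minimal illustration, not the harness's implementation: `toy_logprob` is a hypothetical stand-in for a real LM's sentence log-probability, and the two example pairs are invented for the sketch rather than taken from BLiMP.

```python
import math
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    logprob: Callable[[str], float],
) -> float:
    """Fraction of (good, bad) pairs where the good sentence scores higher."""
    pairs = list(pairs)
    correct = sum(1 for good, bad in pairs if logprob(good) > logprob(bad))
    return correct / len(pairs)


# Hypothetical scorer: an add-one-smoothed unigram model over made-up counts.
_COUNTS = {"the": 4, "cat": 2, "cats": 1, "sleeps": 2, "sleep": 1}
_TOTAL = sum(_COUNTS.values())
_VOCAB = len(_COUNTS) + 1  # one extra slot for unseen words


def toy_logprob(sentence: str) -> float:
    """Sum of smoothed unigram log-probabilities of the sentence's words."""
    return sum(
        math.log((_COUNTS.get(word, 0) + 1) / (_TOTAL + _VOCAB))
        for word in sentence.lower().split()
    )


# Invented pairs: (acceptable sentence, unacceptable sentence).
pairs = [
    ("The cat sleeps", "The cat sleep"),
    ("The cats sleep", "The cats sleeps"),
]
accuracy = minimal_pair_accuracy(pairs, toy_logprob)
print(f"accuracy = {accuracy}")
```

A unigram scorer ignores word order and agreement, so it only gets a pair right when the acceptable sentence happens to contain the more frequent words. Swapping in a real model's log-likelihood for `toy_logprob` leaves the accuracy computation unchanged.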
### Citation
```
@article{warstadt2019blimp,
author = {Warstadt, Alex and Parrish, Alicia and Liu, Haokun and Mohananey, Anhad and Peng, Wei and Wang, Sheng-Fu and Bowman, Samuel R.},
title = {BLiMP: The Benchmark of Linguistic Minimal Pairs for English},
journal = {Transactions of the Association for Computational Linguistics},
volume = {8},
number = {},
pages = {377-392},
year = {2020},
doi = {10.1162/tacl\_a\_00321},
URL = {https://doi.org/10.1162/tacl_a_00321},
eprint = {https://doi.org/10.1162/tacl_a_00321},
abstract = { We introduce The Benchmark of Linguistic Minimal Pairs (BLiMP),1 a challenge set for evaluating the linguistic knowledge of language models (LMs) on major grammatical phenomena in English. BLiMP consists of 67 individual datasets, each containing 1,000 minimal pairs—that is, pairs of minimally different sentences that contrast in grammatical acceptability and isolate specific phenomenon in syntax, morphology, or semantics. We generate the data according to linguist-crafted grammar templates, and human aggregate agreement with the labels is 96.4\%. We evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs by observing whether they assign a higher probability to the acceptable sentence in each minimal pair. We find that state-of-the-art models identify morphological contrasts related to agreement reliably, but they struggle with some subtle semantic and syntactic phenomena, such as negative polarity items and extraction islands. }
}
```
### Subtasks
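Each BLiMP sub-dataset is exposed as its own task named `blimp_<dataset_name>`, and all of these tasks belong to the `blimp` group. For example:
* `blimp_adjunct_island`: minimal pairs from the `adjunct_island` sub-dataset.
* `blimp_anaphor_gender_agreement`: minimal pairs from the `anaphor_gender_agreement` sub-dataset.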
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
    * [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
group: blimp
dataset_path: blimp
output_type: multiple_choice
validation_split: train
doc_to_text: ""
doc_to_target: 0
doc_to_choice: "{{[sentence_good, sentence_bad]}}"
num_fewshot: 0
should_decontaminate: true
doc_to_decontamination_query: "{{sentence_good}} {{sentence_bad}}"
metric_list:
  - metric: acc
metadata:
  version: 1.0
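
To make the template's fields concrete, the sketch below shows how one BLiMP document could be turned into a multiple-choice instance under these settings. It is a plain-Python approximation written for illustration, on the assumption that the harness picks the highest-scoring choice for `acc`. It is not the harness's own code, which renders the Jinja templates itself. `build_instance`, `is_correct`, and the sample document are hypothetical.

```python
from typing import Dict, List


def build_instance(doc: Dict[str, str]) -> Dict[str, object]:
    """Mirror the template fields for a single document."""
    good, bad = doc["sentence_good"], doc["sentence_bad"]
    return {
        "context": "",                          # doc_to_text: ""
        "choices": [good, bad],                 # doc_to_choice
        "target": 0,                            # doc_to_target: 0 (the good sentence)
        "decontamination_query": f"{good} {bad}",  # doc_to_decontamination_query
    }


def is_correct(instance: Dict[str, object], choice_logprobs: List[float]) -> bool:
    """Accuracy credit: the highest-scoring choice must be the target index."""
    predicted = max(range(len(choice_logprobs)), key=choice_logprobs.__getitem__)
    return predicted == instance["target"]


# Invented sample document with the two fields the template reads.
doc = {"sentence_good": "The cat sleeps.", "sentence_bad": "The cat sleep."}
instance = build_instance(doc)
print(instance["choices"])
print(is_correct(instance, [-10.0, -12.5]))
```

Because the context is empty, each choice is scored as a whole sentence, and the instance counts as correct only when the first choice (`sentence_good`) receives the higher score.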
# Generated by utils.py
dataset_name: adjunct_island
include: _template_yaml
task: blimp_adjunct_island
# Generated by utils.py
dataset_name: anaphor_gender_agreement
include: _template_yaml
task: blimp_anaphor_gender_agreement
# Generated by utils.py
dataset_name: anaphor_number_agreement
include: _template_yaml
task: blimp_anaphor_number_agreement
# Generated by utils.py
dataset_name: animate_subject_passive
include: _template_yaml
task: blimp_animate_subject_passive
# Generated by utils.py
dataset_name: animate_subject_trans
include: _template_yaml
task: blimp_animate_subject_trans
# Generated by utils.py
dataset_name: causative
include: _template_yaml
task: blimp_causative
# Generated by utils.py
dataset_name: complex_NP_island
include: _template_yaml
task: blimp_complex_NP_island
# Generated by utils.py
dataset_name: coordinate_structure_constraint_complex_left_branch
include: _template_yaml
task: blimp_coordinate_structure_constraint_complex_left_branch
# Generated by utils.py
dataset_name: coordinate_structure_constraint_object_extraction
include: _template_yaml
task: blimp_coordinate_structure_constraint_object_extraction
# Generated by utils.py
dataset_name: determiner_noun_agreement_1
include: _template_yaml
task: blimp_determiner_noun_agreement_1
# Generated by utils.py
dataset_name: determiner_noun_agreement_2
include: _template_yaml
task: blimp_determiner_noun_agreement_2
# Generated by utils.py
dataset_name: determiner_noun_agreement_irregular_1
include: _template_yaml
task: blimp_determiner_noun_agreement_irregular_1
# Generated by utils.py
dataset_name: determiner_noun_agreement_irregular_2
include: _template_yaml
task: blimp_determiner_noun_agreement_irregular_2
# Generated by utils.py
dataset_name: determiner_noun_agreement_with_adj_2
include: _template_yaml
task: blimp_determiner_noun_agreement_with_adj_2
# Generated by utils.py
dataset_name: determiner_noun_agreement_with_adj_irregular_1
include: _template_yaml
task: blimp_determiner_noun_agreement_with_adj_irregular_1
# Generated by utils.py
dataset_name: determiner_noun_agreement_with_adj_irregular_2
include: _template_yaml
task: blimp_determiner_noun_agreement_with_adj_irregular_2
# Generated by utils.py
dataset_name: determiner_noun_agreement_with_adjective_1
include: _template_yaml
task: blimp_determiner_noun_agreement_with_adjective_1
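
The per-subtask files above all follow one pattern, which makes them easy to generate. The following is a minimal sketch of such a generator. It is not the `utils.py` named in the header comments (that file is not shown here); `render_task_yaml`, `write_task_files`, and the `<dataset_name>.yaml` file naming are assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def render_task_yaml(dataset_name: str) -> str:
    """Return the YAML text for one subtask, matching the files shown above."""
    return (
        "# Generated by utils.py\n"
        f"dataset_name: {dataset_name}\n"
        "include: _template_yaml\n"
        f"task: blimp_{dataset_name}\n"
    )


def write_task_files(dataset_names: Iterable[str], out_dir: str) -> List[Path]:
    """Write one YAML file per subtask into out_dir and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in dataset_names:
        path = out / f"{name}.yaml"  # assumed naming, not confirmed by the source
        path.write_text(render_task_yaml(name), encoding="utf-8")
        paths.append(path)
    return paths


# Usage: write two of the subtasks listed above into a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    written = write_task_files(["adjunct_island", "causative"], tmp)
    first_file_text = written[0].read_text(encoding="utf-8")
    print(first_file_text)
```

Keeping the shared settings in `_template_yaml` and generating only the three varying lines means a change to the metric or prompt format needs to be made in one place.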