Add Megatron project

Commit 5add46aa authored Jan 09, 2025 by hepj
20 changed files
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/bigbench/push_bigbench_dataset.py
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/bigbench/push_bigbench_dataset.py
+"""
+A utility script that pushes every BIG-bench JSON subtask from the `bigbench` HF dataset
+to `{org name}/bigbench` on the HF Hub.
+
+Before running, log in to the HF Hub for the target org via `huggingface-cli login`.
+
+Requires the `bigbench` package:
+`pip install "bigbench @ https://storage.googleapis.com/public_research_data/bigbench/bigbench-0.0.1.tar.gz"`
+The script is included so that the harness itself can avoid the bigbench dependency.
+"""
+import bigbench.api.util as bb_utils
+import datasets
+from tqdm import tqdm
+
+
+all_task_names = bb_utils.get_all_json_task_names()
+
+num_shots = [0]
+
+for shots in num_shots:
+    for task_name in tqdm(all_task_names):
+        try:
+            print(f"Loading '{task_name}' with num_shots={shots}...")
+            task_ds = datasets.load_dataset("bigbench", name=task_name, num_shots=shots)
+
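+            print(f"Pushing '{task_name}' with num_shots={shots}...")
+            # NOTE: the target repo and the "_zero_shot" suffix are hard-coded,
+            # so config names are only accurate while num_shots == [0].
+            task_ds.push_to_hub("hails/bigbench", task_name + "_zero_shot")
+
+            del task_ds
+        except Exception:
+            # Re-raise unchanged so the first failing task stops the run.
+            raise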
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/README.md
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/README.md
+# BLiMP
+
+### Paper
+
+Title: `BLiMP: The Benchmark of Linguistic Minimal Pairs for English`
+Abstract: `https://arxiv.org/abs/1912.00582`
+
+BLiMP is a challenge set for evaluating what language models (LMs) know about
+major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each
+containing 1000 minimal pairs isolating specific contrasts in syntax, morphology,
+or semantics. The data is automatically generated according to expert-crafted
+grammars.
+
+Homepage: https://github.com/alexwarstadt/blimp
+
+
+### Citation
+
+```
+@article{warstadt2019blimp,
+    author = {Warstadt, Alex and Parrish, Alicia and Liu, Haokun and Mohananey, Anhad and Peng, Wei and Wang, Sheng-Fu and Bowman, Samuel R.},
+    title = {BLiMP: The Benchmark of Linguistic Minimal Pairs for English},
+    journal = {Transactions of the Association for Computational Linguistics},
+    volume = {8},
+    number = {},
+    pages = {377-392},
+    year = {2020},
+    doi = {10.1162/tacl\_a\_00321},
+    URL = {https://doi.org/10.1162/tacl_a_00321},
+    eprint = {https://doi.org/10.1162/tacl_a_00321},
+    abstract = { We introduce The Benchmark of Linguistic Minimal Pairs (BLiMP), a challenge set for evaluating the linguistic knowledge of language models (LMs) on major grammatical phenomena in English. BLiMP consists of 67 individual datasets, each containing 1,000 minimal pairs—that is, pairs of minimally different sentences that contrast in grammatical acceptability and isolate specific phenomenon in syntax, morphology, or semantics. We generate the data according to linguist-crafted grammar templates, and human aggregate agreement with the labels is 96.4\%. We evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs by observing whether they assign a higher probability to the acceptable sentence in each minimal pair. We find that state-of-the-art models identify morphological contrasts related to agreement reliably, but they struggle with some subtle semantic and syntactic phenomena, such as negative polarity items and extraction islands. }
+}
+```
+
+### Subtasks
+
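+Each `<dataset_name>.yaml` file in this folder defines one task named `blimp_<dataset_name>`. All of them belong to the `blimp` group and share the settings in `_template_yaml`. For example:
+* `blimp_adjunct_island`: minimal pairs from the `adjunct_island` sub-dataset
+* `blimp_anaphor_gender_agreement`: minimal pairs from the `anaphor_gender_agreement` sub-dataset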
+
+### Checklist
+
+For adding novel benchmarks/datasets to the library:
+* [ ] Is the task an existing benchmark in the literature?
+  * [ ] Have you referenced the original paper that introduced the task?
+  * [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
+
+
+If other tasks on this dataset are already supported:
+* [ ] Is the "Main" variant of this task clearly denoted?
+* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
+* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/_template_yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/_template_yaml
+group: blimp
+dataset_path: blimp
+output_type: multiple_choice
+validation_split: train
+doc_to_text: ""
+doc_to_target: 0
+doc_to_choice: "{{[sentence_good, sentence_bad]}}"
+num_fewshot: 0
+should_decontaminate: true
+doc_to_decontamination_query: "{{sentence_good}} {{sentence_bad}}"
+metric_list:
+  - metric: acc
+metadata:
+  version: 1.0
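Note on the template above: with an empty `doc_to_text`, each entry of `doc_to_choice` is scored as a whole sentence, and `doc_to_target: 0` marks the first choice (`sentence_good`) as the correct one. The following is a minimal sketch of the accuracy this implies, not the harness's own code. `blimp_accuracy`, `toy_score` and the two example pairs are hypothetical and exist only for illustration; a real run would use the model's log-likelihood instead of `toy_score`.

```python
from typing import Callable, Dict, List


def blimp_accuracy(
    docs: List[Dict[str, str]],
    score: Callable[[str], float],  # hypothetical: sentence -> log-likelihood
) -> float:
    """Fraction of pairs where the first choice (sentence_good) scores highest."""
    if not docs:
        raise ValueError("docs must not be empty")
    correct = 0
    for doc in docs:
        # Mirrors doc_to_choice: "{{[sentence_good, sentence_bad]}}"
        choices = [doc["sentence_good"], doc["sentence_bad"]]
        scores = [score(choice) for choice in choices]
        predicted = max(range(len(choices)), key=scores.__getitem__)
        # Mirrors doc_to_target: 0 (the grammatical sentence is the gold choice)
        correct += int(predicted == 0)
    return correct / len(docs)


# Toy stand-in scorer (an assumption, not a language model): shorter sentences
# score higher, and one known-bad word form is penalised.
def toy_score(sentence: str) -> float:
    penalty = 5.0 if "themself" in sentence else 0.0
    return -float(len(sentence.split())) - penalty


# Made-up pairs in the BLiMP field layout, not taken from the dataset.
docs = [
    {"sentence_good": "The cats hid themselves.", "sentence_bad": "The cats hid themself."},
    {"sentence_good": "A dog barks loudly.", "sentence_bad": "A dogs barks."},
]
print(blimp_accuracy(docs, toy_score))
```

The toy scorer gets the first pair right and the second wrong, which shows that the metric depends only on which sentence of each pair scores higher.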
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/adjunct_island.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/adjunct_island.yaml
+# Generated by utils.py
+dataset_name: adjunct_island
+include: _template_yaml
+task: blimp_adjunct_island
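Files like the one above follow a fixed four-line pattern. The `utils.py` named in the header comment is not part of this diff, so the following is only a sketch of how such files could be produced. `render_task_yaml` is a hypothetical helper, not the harness's actual generator.

```python
def render_task_yaml(dataset_name: str) -> str:
    """Return the text of one per-subtask config in the pattern seen in this diff."""
    lines = [
        "# Generated by utils.py",
        f"dataset_name: {dataset_name}",
        "include: _template_yaml",
        f"task: blimp_{dataset_name}",
    ]
    return "\n".join(lines) + "\n"


# Names taken from files in this diff; the full list of sub-datasets is longer.
for name in ["adjunct_island", "causative"]:
    print(render_task_yaml(name), end="")
```

Keeping the shared settings in `_template_yaml` and only the two varying fields in each file means a change to the metric or prompt format touches a single file.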
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/anaphor_gender_agreement.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/anaphor_gender_agreement.yaml
+# Generated by utils.py
+dataset_name: anaphor_gender_agreement
+include: _template_yaml
+task: blimp_anaphor_gender_agreement
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/anaphor_number_agreement.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/anaphor_number_agreement.yaml
+# Generated by utils.py
+dataset_name: anaphor_number_agreement
+include: _template_yaml
+task: blimp_anaphor_number_agreement
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/animate_subject_passive.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/animate_subject_passive.yaml
+# Generated by utils.py
+dataset_name: animate_subject_passive
+include: _template_yaml
+task: blimp_animate_subject_passive
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/animate_subject_trans.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/animate_subject_trans.yaml
+# Generated by utils.py
+dataset_name: animate_subject_trans
+include: _template_yaml
+task: blimp_animate_subject_trans
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/causative.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/causative.yaml
+# Generated by utils.py
+dataset_name: causative
+include: _template_yaml
+task: blimp_causative
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/complex_NP_island.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/complex_NP_island.yaml
+# Generated by utils.py
+dataset_name: complex_NP_island
+include: _template_yaml
+task: blimp_complex_NP_island
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/coordinate_structure_constraint_complex_left_branch.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/coordinate_structure_constraint_complex_left_branch.yaml
+# Generated by utils.py
+dataset_name: coordinate_structure_constraint_complex_left_branch
+include: _template_yaml
+task: blimp_coordinate_structure_constraint_complex_left_branch
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/coordinate_structure_constraint_object_extraction.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/coordinate_structure_constraint_object_extraction.yaml
+# Generated by utils.py
+dataset_name: coordinate_structure_constraint_object_extraction
+include: _template_yaml
+task: blimp_coordinate_structure_constraint_object_extraction
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_1.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_1.yaml
+# Generated by utils.py
+dataset_name: determiner_noun_agreement_1
+include: _template_yaml
+task: blimp_determiner_noun_agreement_1
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_2.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_2.yaml
+# Generated by utils.py
+dataset_name: determiner_noun_agreement_2
+include: _template_yaml
+task: blimp_determiner_noun_agreement_2
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_irregular_1.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_irregular_1.yaml
+# Generated by utils.py
+dataset_name: determiner_noun_agreement_irregular_1
+include: _template_yaml
+task: blimp_determiner_noun_agreement_irregular_1
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_irregular_2.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_irregular_2.yaml
+# Generated by utils.py
+dataset_name: determiner_noun_agreement_irregular_2
+include: _template_yaml
+task: blimp_determiner_noun_agreement_irregular_2
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_with_adj_2.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_with_adj_2.yaml
+# Generated by utils.py
+dataset_name: determiner_noun_agreement_with_adj_2
+include: _template_yaml
+task: blimp_determiner_noun_agreement_with_adj_2
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_with_adj_irregular_1.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_with_adj_irregular_1.yaml
+# Generated by utils.py
+dataset_name: determiner_noun_agreement_with_adj_irregular_1
+include: _template_yaml
+task: blimp_determiner_noun_agreement_with_adj_irregular_1
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_with_adj_irregular_2.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_with_adj_irregular_2.yaml
+# Generated by utils.py
+dataset_name: determiner_noun_agreement_with_adj_irregular_2
+include: _template_yaml
+task: blimp_determiner_noun_agreement_with_adj_irregular_2
--- a/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_with_adjective_1.yaml
+++ b/LM-Evaluation-Harness-240310/lm_eval/tasks/blimp/determiner_noun_agreement_with_adjective_1.yaml
+# Generated by utils.py
+dataset_name: determiner_noun_agreement_with_adjective_1
+include: _template_yaml
+task: blimp_determiner_noun_agreement_with_adjective_1