Commit 5f48dfc2 authored by Igor Ostrovsky

Add BLiMP

parent df5d7cf0
......@@ -90,191 +90,256 @@ To implement a new task in eval harness, see [this guide](https://github.com/Ele
### Full Task List
| Task Name |Train|Val|Test|Val/Test Docs| Metrics |
|---------------------------------------------------------|-----|---|----|------------:|------------------------------------------------------------------------------|
|cola |✓ |✓ | | 1043|mcc |
|mnli |✓ |✓ | | 9815|acc |
|mnli_mismatched |✓ |✓ | | 9832|acc |
|mrpc |✓ |✓ | | 408|acc, f1 |
|rte |✓ |✓ | | 277|acc |
|qnli |✓ |✓ | | 5463|acc |
|qqp |✓ |✓ | | 40430|acc, f1 |
|sst |✓ |✓ | | 872|acc |
|wnli |✓ |✓ | | 71|acc |
|boolq |✓ |✓ | | 3270|acc |
|cb |✓ |✓ | | 56|acc, f1 |
|copa |✓ |✓ | | 100|acc |
|multirc |✓ |✓ | | 4848|acc |
|record |✓ |✓ | | 10000|f1, em |
|wic |✓ |✓ | | 638|acc |
|wsc |✓ |✓ | | 104|acc |
|coqa |✓ |✓ | | 500|f1, em |
|drop |✓ |✓ | | 9536|em, f1 |
|lambada | |✓ | | 5153|ppl, acc |
|lambada_cloze | |✓ | | 5153|ppl, acc |
|wikitext | |✓ |✓ | 62|word_perplexity, byte_perplexity, bits_per_byte |
|piqa |✓ |✓ | | 1838|acc, acc_norm |
|prost | | |✓ | 18736|acc, acc_norm |
|pubmedqa | | |✓ | 1000|acc |
|sciq |✓ |✓ |✓ | 1000|acc, acc_norm |
|qa4mre_2011 | | |✓ | 120|acc, acc_norm |
|qa4mre_2012 | | |✓ | 160|acc, acc_norm |
|qa4mre_2013 | | |✓ | 284|acc, acc_norm |
|triviaqa |✓ |✓ | | 11313|acc |
|arc_easy |✓ |✓ |✓ | 2376|acc, acc_norm |
|arc_challenge |✓ |✓ |✓ | 1172|acc, acc_norm |
|logiqa |✓ |✓ |✓ | 651|acc, acc_norm |
|hellaswag |✓ |✓ | | 10042|acc, acc_norm |
|openbookqa |✓ |✓ |✓ | 500|acc, acc_norm |
|squad2 |✓ |✓ | | 11873|exact, f1, HasAns_exact, HasAns_f1, NoAns_exact, NoAns_f1, best_exact, best_f1|
|race |✓ |✓ |✓ | 1045|acc |
|headqa |✓ |✓ |✓ | 2742|acc, acc_norm |
|mathqa |✓ |✓ |✓ | 2985|acc, acc_norm |
|webqs |✓ | |✓ | 2032|acc |
|wsc273 | | |✓ | 273|acc |
|winogrande |✓ |✓ | | 1267|acc |
|anli_r1 |✓ |✓ |✓ | 1000|acc |
|anli_r2 |✓ |✓ |✓ | 1000|acc |
|anli_r3 |✓ |✓ |✓ | 1200|acc |
|ethics_cm |✓ | |✓ | 3885|acc |
|ethics_deontology |✓ | |✓ | 3596|acc, em |
|ethics_justice |✓ | |✓ | 2704|acc, em |
|ethics_utilitarianism_original | | |✓ | 4808|acc |
|ethics_utilitarianism |✓ | |✓ | 4808|acc |
|ethics_virtue |✓ | |✓ | 4975|acc, em |
|math_algebra |✓ | |✓ | 1187|acc |
|math_counting_and_prob |✓ | |✓ | 474|acc |
|math_geometry |✓ | |✓ | 479|acc |
|math_intermediate_algebra |✓ | |✓ | 903|acc |
|math_num_theory |✓ | |✓ | 540|acc |
|math_prealgebra |✓ | |✓ | 871|acc |
|math_precalc |✓ | |✓ | 546|acc |
|arithmetic_2da | |✓ | | 2000|acc |
|arithmetic_2ds | |✓ | | 2000|acc |
|arithmetic_3da | |✓ | | 2000|acc |
|arithmetic_3ds | |✓ | | 2000|acc |
|arithmetic_4da | |✓ | | 2000|acc |
|arithmetic_4ds | |✓ | | 2000|acc |
|arithmetic_5da | |✓ | | 2000|acc |
|arithmetic_5ds | |✓ | | 2000|acc |
|arithmetic_2dm | |✓ | | 2000|acc |
|arithmetic_1dc | |✓ | | 2000|acc |
|hendrycksTest-abstract_algebra |✓ |✓ |✓ | 100|acc, acc_norm |
|hendrycksTest-anatomy |✓ |✓ |✓ | 135|acc, acc_norm |
|hendrycksTest-astronomy |✓ |✓ |✓ | 152|acc, acc_norm |
|hendrycksTest-business_ethics |✓ |✓ |✓ | 100|acc, acc_norm |
|hendrycksTest-clinical_knowledge |✓ |✓ |✓ | 265|acc, acc_norm |
|hendrycksTest-college_biology |✓ |✓ |✓ | 144|acc, acc_norm |
|hendrycksTest-college_chemistry |✓ |✓ |✓ | 100|acc, acc_norm |
|hendrycksTest-college_computer_science |✓ |✓ |✓ | 100|acc, acc_norm |
|hendrycksTest-college_mathematics |✓ |✓ |✓ | 100|acc, acc_norm |
|hendrycksTest-college_medicine |✓ |✓ |✓ | 173|acc, acc_norm |
|hendrycksTest-college_physics |✓ |✓ |✓ | 102|acc, acc_norm |
|hendrycksTest-computer_security |✓ |✓ |✓ | 100|acc, acc_norm |
|hendrycksTest-conceptual_physics |✓ |✓ |✓ | 235|acc, acc_norm |
|hendrycksTest-econometrics |✓ |✓ |✓ | 114|acc, acc_norm |
|hendrycksTest-electrical_engineering |✓ |✓ |✓ | 145|acc, acc_norm |
|hendrycksTest-elementary_mathematics |✓ |✓ |✓ | 378|acc, acc_norm |
|hendrycksTest-formal_logic |✓ |✓ |✓ | 126|acc, acc_norm |
|hendrycksTest-global_facts |✓ |✓ |✓ | 100|acc, acc_norm |
|hendrycksTest-high_school_biology |✓ |✓ |✓ | 310|acc, acc_norm |
|hendrycksTest-high_school_chemistry |✓ |✓ |✓ | 203|acc, acc_norm |
|hendrycksTest-high_school_computer_science |✓ |✓ |✓ | 100|acc, acc_norm |
|hendrycksTest-high_school_european_history |✓ |✓ |✓ | 165|acc, acc_norm |
|hendrycksTest-high_school_geography |✓ |✓ |✓ | 198|acc, acc_norm |
|hendrycksTest-high_school_government_and_politics |✓ |✓ |✓ | 193|acc, acc_norm |
|hendrycksTest-high_school_macroeconomics |✓ |✓ |✓ | 390|acc, acc_norm |
|hendrycksTest-high_school_mathematics |✓ |✓ |✓ | 270|acc, acc_norm |
|hendrycksTest-high_school_microeconomics |✓ |✓ |✓ | 238|acc, acc_norm |
|hendrycksTest-high_school_physics |✓ |✓ |✓ | 151|acc, acc_norm |
|hendrycksTest-high_school_psychology |✓ |✓ |✓ | 545|acc, acc_norm |
|hendrycksTest-high_school_statistics |✓ |✓ |✓ | 216|acc, acc_norm |
|hendrycksTest-high_school_us_history |✓ |✓ |✓ | 204|acc, acc_norm |
|hendrycksTest-high_school_world_history |✓ |✓ |✓ | 237|acc, acc_norm |
|hendrycksTest-human_aging |✓ |✓ |✓ | 223|acc, acc_norm |
|hendrycksTest-human_sexuality |✓ |✓ |✓ | 131|acc, acc_norm |
|hendrycksTest-international_law |✓ |✓ |✓ | 121|acc, acc_norm |
|hendrycksTest-jurisprudence |✓ |✓ |✓ | 108|acc, acc_norm |
|hendrycksTest-logical_fallacies |✓ |✓ |✓ | 163|acc, acc_norm |
|hendrycksTest-machine_learning |✓ |✓ |✓ | 112|acc, acc_norm |
|hendrycksTest-management |✓ |✓ |✓ | 103|acc, acc_norm |
|hendrycksTest-marketing |✓ |✓ |✓ | 234|acc, acc_norm |
|hendrycksTest-medical_genetics |✓ |✓ |✓ | 100|acc, acc_norm |
|hendrycksTest-miscellaneous |✓ |✓ |✓ | 783|acc, acc_norm |
|hendrycksTest-moral_disputes |✓ |✓ |✓ | 346|acc, acc_norm |
|hendrycksTest-moral_scenarios |✓ |✓ |✓ | 895|acc, acc_norm |
|hendrycksTest-nutrition |✓ |✓ |✓ | 306|acc, acc_norm |
|hendrycksTest-philosophy |✓ |✓ |✓ | 311|acc, acc_norm |
|hendrycksTest-prehistory |✓ |✓ |✓ | 324|acc, acc_norm |
|hendrycksTest-professional_accounting |✓ |✓ |✓ | 282|acc, acc_norm |
|hendrycksTest-professional_law |✓ |✓ |✓ | 1534|acc, acc_norm |
|hendrycksTest-professional_medicine |✓ |✓ |✓ | 272|acc, acc_norm |
|hendrycksTest-professional_psychology |✓ |✓ |✓ | 612|acc, acc_norm |
|hendrycksTest-public_relations |✓ |✓ |✓ | 110|acc, acc_norm |
|hendrycksTest-security_studies |✓ |✓ |✓ | 245|acc, acc_norm |
|hendrycksTest-sociology |✓ |✓ |✓ | 201|acc, acc_norm |
|hendrycksTest-us_foreign_policy |✓ |✓ |✓ | 100|acc, acc_norm |
|hendrycksTest-virology |✓ |✓ |✓ | 166|acc, acc_norm |
|hendrycksTest-world_religions |✓ |✓ |✓ | 171|acc, acc_norm |
|wmt14-en-fr | | |✓ | 3003|bleu, chrf, ter |
|wmt14-fr-en | | |✓ | 3003|bleu, chrf, ter |
|wmt16-en-ro | | |✓ | 1999|bleu, chrf, ter |
|wmt16-ro-en | | |✓ | 1999|bleu, chrf, ter |
|wmt16-de-en | | |✓ | 2999|bleu, chrf, ter |
|wmt16-en-de | | |✓ | 2999|bleu, chrf, ter |
|wmt20-cs-en | | |✓ | 664|bleu, chrf, ter |
|wmt20-de-en | | |✓ | 785|bleu, chrf, ter |
|wmt20-de-fr | | |✓ | 1619|bleu, chrf, ter |
|wmt20-en-cs | | |✓ | 1418|bleu, chrf, ter |
|wmt20-en-de | | |✓ | 1418|bleu, chrf, ter |
|wmt20-en-iu | | |✓ | 2971|bleu, chrf, ter |
|wmt20-en-ja | | |✓ | 1000|bleu, chrf, ter |
|wmt20-en-km | | |✓ | 2320|bleu, chrf, ter |
|wmt20-en-pl | | |✓ | 1000|bleu, chrf, ter |
|wmt20-en-ps | | |✓ | 2719|bleu, chrf, ter |
|wmt20-en-ru | | |✓ | 2002|bleu, chrf, ter |
|wmt20-en-ta | | |✓ | 1000|bleu, chrf, ter |
|wmt20-en-zh | | |✓ | 1418|bleu, chrf, ter |
|wmt20-fr-de | | |✓ | 1619|bleu, chrf, ter |
|wmt20-iu-en | | |✓ | 2971|bleu, chrf, ter |
|wmt20-ja-en | | |✓ | 993|bleu, chrf, ter |
|wmt20-km-en | | |✓ | 2320|bleu, chrf, ter |
|wmt20-pl-en | | |✓ | 1001|bleu, chrf, ter |
|wmt20-ps-en | | |✓ | 2719|bleu, chrf, ter |
|wmt20-ru-en | | |✓ | 991|bleu, chrf, ter |
|wmt20-ta-en | | |✓ | 997|bleu, chrf, ter |
|wmt20-zh-en | | |✓ | 2000|bleu, chrf, ter |
|iwslt17-en-ar | | |✓ | 1460|bleu, chrf, ter |
|iwslt17-ar-en | | |✓ | 1460|bleu, chrf, ter |
|anagrams1 | |✓ | | 10000|acc |
|anagrams2 | |✓ | | 10000|acc |
|cycle_letters | |✓ | | 10000|acc |
|random_insertion | |✓ | | 10000|acc |
|reversed_words | |✓ | | 10000|acc |
|pile_arxiv | |✓ |✓ | 2407|word_perplexity, byte_perplexity, bits_per_byte |
|pile_books3 | |✓ |✓ | 269|word_perplexity, byte_perplexity, bits_per_byte |
|pile_bookcorpus2 | |✓ |✓ | 28|word_perplexity, byte_perplexity, bits_per_byte |
|pile_dm-mathematics | |✓ |✓ | 1922|word_perplexity, byte_perplexity, bits_per_byte |
|pile_enron | |✓ |✓ | 1010|word_perplexity, byte_perplexity, bits_per_byte |
|pile_europarl | |✓ |✓ | 157|word_perplexity, byte_perplexity, bits_per_byte |
|pile_freelaw | |✓ |✓ | 5101|word_perplexity, byte_perplexity, bits_per_byte |
|pile_github | |✓ |✓ | 18195|word_perplexity, byte_perplexity, bits_per_byte |
|pile_gutenberg | |✓ |✓ | 80|word_perplexity, byte_perplexity, bits_per_byte |
|pile_hackernews | |✓ |✓ | 1632|word_perplexity, byte_perplexity, bits_per_byte |
|pile_nih-exporter | |✓ |✓ | 1884|word_perplexity, byte_perplexity, bits_per_byte |
|pile_opensubtitles | |✓ |✓ | 642|word_perplexity, byte_perplexity, bits_per_byte |
|pile_openwebtext2 | |✓ |✓ | 32925|word_perplexity, byte_perplexity, bits_per_byte |
|pile_philpapers | |✓ |✓ | 68|word_perplexity, byte_perplexity, bits_per_byte |
|pile_pile-cc | |✓ |✓ | 52790|word_perplexity, byte_perplexity, bits_per_byte |
|pile_pubmed-abstracts | |✓ |✓ | 29895|word_perplexity, byte_perplexity, bits_per_byte |
|pile_pubmed-central | |✓ |✓ | 5911|word_perplexity, byte_perplexity, bits_per_byte |
|pile_stackexchange | |✓ |✓ | 30378|word_perplexity, byte_perplexity, bits_per_byte |
|pile_uspto | |✓ |✓ | 11415|word_perplexity, byte_perplexity, bits_per_byte |
|pile_ubuntu-irc | |✓ |✓ | 22|word_perplexity, byte_perplexity, bits_per_byte |
|pile_wikipedia | |✓ |✓ | 17511|word_perplexity, byte_perplexity, bits_per_byte |
|pile_youtubesubtitles | |✓ |✓ | 342|word_perplexity, byte_perplexity, bits_per_byte |
|blimp_adjunct_island | |✓ | | 1000|acc
|blimp_anaphor_gender_agreement | |✓ | | 1000|acc
|blimp_anaphor_number_agreement | |✓ | | 1000|acc
|blimp_animate_subject_passive | |✓ | | 1000|acc
|blimp_animate_subject_trans | |✓ | | 1000|acc
|blimp_causative | |✓ | | 1000|acc
|blimp_complex_NP_island | |✓ | | 1000|acc
|blimp_coordinate_structure_constraint_complex_left_branch| |✓ | | 1000|acc
|blimp_coordinate_structure_constraint_object_extraction | |✓ | | 1000|acc
|blimp_determiner_noun_agreement_1 | |✓ | | 1000|acc
|blimp_determiner_noun_agreement_2 | |✓ | | 1000|acc
|blimp_determiner_noun_agreement_irregular_1 | |✓ | | 1000|acc
|blimp_determiner_noun_agreement_irregular_2 | |✓ | | 1000|acc
|blimp_determiner_noun_agreement_with_adj_2 | |✓ | | 1000|acc
|blimp_determiner_noun_agreement_with_adj_irregular_1 | |✓ | | 1000|acc
|blimp_determiner_noun_agreement_with_adj_irregular_2 | |✓ | | 1000|acc
|blimp_determiner_noun_agreement_with_adjective_1 | |✓ | | 1000|acc
|blimp_distractor_agreement_relational_noun | |✓ | | 1000|acc
|blimp_distractor_agreement_relative_clause | |✓ | | 1000|acc
|blimp_drop_argument | |✓ | | 1000|acc
|blimp_ellipsis_n_bar_1 | |✓ | | 1000|acc
|blimp_ellipsis_n_bar_2 | |✓ | | 1000|acc
|blimp_existential_there_object_raising | |✓ | | 1000|acc
|blimp_existential_there_quantifiers_1 | |✓ | | 1000|acc
|blimp_existential_there_quantifiers_2 | |✓ | | 1000|acc
|blimp_existential_there_subject_raising | |✓ | | 1000|acc
|blimp_expletive_it_object_raising | |✓ | | 1000|acc
|blimp_inchoative | |✓ | | 1000|acc
|blimp_intransitive | |✓ | | 1000|acc
|blimp_irregular_past_participle_adjectives | |✓ | | 1000|acc
|blimp_irregular_past_participle_verbs | |✓ | | 1000|acc
|blimp_irregular_plural_subject_verb_agreement_1 | |✓ | | 1000|acc
|blimp_irregular_plural_subject_verb_agreement_2 | |✓ | | 1000|acc
|blimp_left_branch_island_echo_question | |✓ | | 1000|acc
|blimp_left_branch_island_simple_question | |✓ | | 1000|acc
|blimp_matrix_question_npi_licensor_present | |✓ | | 1000|acc
|blimp_npi_present_1 | |✓ | | 1000|acc
|blimp_npi_present_2 | |✓ | | 1000|acc
|blimp_only_npi_licensor_present | |✓ | | 1000|acc
|blimp_only_npi_scope | |✓ | | 1000|acc
|blimp_passive_1 | |✓ | | 1000|acc
|blimp_passive_2 | |✓ | | 1000|acc
|blimp_principle_A_c_command | |✓ | | 1000|acc
|blimp_principle_A_case_1 | |✓ | | 1000|acc
|blimp_principle_A_case_2 | |✓ | | 1000|acc
|blimp_principle_A_domain_1 | |✓ | | 1000|acc
|blimp_principle_A_domain_2 | |✓ | | 1000|acc
|blimp_principle_A_domain_3 | |✓ | | 1000|acc
|blimp_principle_A_reconstruction | |✓ | | 1000|acc
|blimp_regular_plural_subject_verb_agreement_1 | |✓ | | 1000|acc
|blimp_regular_plural_subject_verb_agreement_2 | |✓ | | 1000|acc
|blimp_sentential_negation_npi_licensor_present | |✓ | | 1000|acc
|blimp_sentential_negation_npi_scope | |✓ | | 1000|acc
|blimp_sentential_subject_island | |✓ | | 1000|acc
|blimp_superlative_quantifiers_1 | |✓ | | 1000|acc
|blimp_superlative_quantifiers_2 | |✓ | | 1000|acc
|blimp_tough_vs_raising_1 | |✓ | | 1000|acc
|blimp_tough_vs_raising_2 | |✓ | | 1000|acc
|blimp_transitive | |✓ | | 1000|acc
|blimp_wh_island | |✓ | | 1000|acc
|blimp_wh_questions_object_gap | |✓ | | 1000|acc
|blimp_wh_questions_subject_gap | |✓ | | 1000|acc
|blimp_wh_questions_subject_gap_long_distance | |✓ | | 1000|acc
|blimp_wh_vs_that_no_gap | |✓ | | 1000|acc
|blimp_wh_vs_that_no_gap_long_distance | |✓ | | 1000|acc
|blimp_wh_vs_that_with_gap | |✓ | | 1000|acc
|blimp_wh_vs_that_with_gap_long_distance | |✓ | | 1000|acc
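
Each `blimp_*` row reports `acc` over 1000 minimal pairs. In the `BlimpTask.process_results` method added by this commit, a pair counts as correct only when the log-likelihood of `sentence_good` is strictly greater than that of `sentence_bad`, and the per-pair scores are averaged. A minimal standalone sketch of that rule follows; the function names and the log-likelihood values are made up for illustration and are not part of the harness.

```python
from statistics import mean


def score_pair(ll_good: float, ll_bad: float) -> float:
    """Return 1.0 if the good sentence is strictly more likely, else 0.0."""
    return 1.0 if ll_good > ll_bad else 0.0


def blimp_accuracy(pairs) -> float:
    """Average the per-pair scores; `pairs` yields (ll_good, ll_bad) tuples."""
    return mean(score_pair(good, bad) for good, bad in pairs)


# Hypothetical log-likelihoods for three minimal pairs (illustrative only).
example_pairs = [(-12.3, -15.9), (-20.1, -19.4), (-8.0, -8.0)]
accuracy = blimp_accuracy(example_pairs)
```

Under this rule a tie is scored as a miss, so only the first of the three example pairs counts as correct.
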
## Usage
......
......@@ -44,6 +44,7 @@ from . import wikitext
from . import lambada_multilingual
from . import mutual
from . import truthfulqa
from . import blimp
########################################
# Translation tasks
......@@ -217,6 +218,74 @@ TASK_REGISTRY = {
    "pile_wikipedia": pile.PileWikipedia,
    "pile_youtubesubtitles": pile.PileYoutubeSubtitles,
    # BLiMP
    "blimp_adjunct_island": blimp.BlimpAdjunctIsland,
    "blimp_anaphor_gender_agreement": blimp.BlimpAnaphorGenderAgreement,
    "blimp_anaphor_number_agreement": blimp.BlimpAnaphorNumberAgreement,
    "blimp_animate_subject_passive": blimp.BlimpAnimateSubjectPassive,
    "blimp_animate_subject_trans": blimp.BlimpAnimateSubjectTrans,
    "blimp_causative": blimp.BlimpCausative,
    "blimp_complex_NP_island": blimp.BlimpComplex_NPIsland,
    "blimp_coordinate_structure_constraint_complex_left_branch": blimp.BlimpCoordinateStructureConstraintComplexLeftBranch,
    "blimp_coordinate_structure_constraint_object_extraction": blimp.BlimpCoordinateStructureConstraintObjectExtraction,
    "blimp_determiner_noun_agreement_1": blimp.BlimpDeterminerNounAgreement_1,
    "blimp_determiner_noun_agreement_2": blimp.BlimpDeterminerNounAgreement_2,
    "blimp_determiner_noun_agreement_irregular_1": blimp.BlimpDeterminerNounAgreementIrregular_1,
    "blimp_determiner_noun_agreement_irregular_2": blimp.BlimpDeterminerNounAgreementIrregular_2,
    "blimp_determiner_noun_agreement_with_adj_2": blimp.BlimpDeterminerNounAgreementWithAdj_2,
    "blimp_determiner_noun_agreement_with_adj_irregular_1": blimp.BlimpDeterminerNounAgreementWithAdjIrregular_1,
    "blimp_determiner_noun_agreement_with_adj_irregular_2": blimp.BlimpDeterminerNounAgreementWithAdjIrregular_2,
    "blimp_determiner_noun_agreement_with_adjective_1": blimp.BlimpDeterminerNounAgreementWithAdjective_1,
    "blimp_distractor_agreement_relational_noun": blimp.BlimpDistractorAgreementRelationalNoun,
    "blimp_distractor_agreement_relative_clause": blimp.BlimpDistractorAgreementRelativeClause,
    "blimp_drop_argument": blimp.BlimpDropArgument,
    "blimp_ellipsis_n_bar_1": blimp.BlimpEllipsisNBar_1,
    "blimp_ellipsis_n_bar_2": blimp.BlimpEllipsisNBar_2,
    "blimp_existential_there_object_raising": blimp.BlimpExistentialThereObjectRaising,
    "blimp_existential_there_quantifiers_1": blimp.BlimpExistentialThereQuantifiers_1,
    "blimp_existential_there_quantifiers_2": blimp.BlimpExistentialThereQuantifiers_2,
    "blimp_existential_there_subject_raising": blimp.BlimpExistentialThereSubjectRaising,
    "blimp_expletive_it_object_raising": blimp.BlimpExpletiveItObjectRaising,
    "blimp_inchoative": blimp.BlimpInchoative,
    "blimp_intransitive": blimp.BlimpIntransitive,
    "blimp_irregular_past_participle_adjectives": blimp.BlimpIrregularPastParticipleAdjectives,
    "blimp_irregular_past_participle_verbs": blimp.BlimpIrregularPastParticipleVerbs,
    "blimp_irregular_plural_subject_verb_agreement_1": blimp.BlimpIrregularPluralSubjectVerbAgreement_1,
    "blimp_irregular_plural_subject_verb_agreement_2": blimp.BlimpIrregularPluralSubjectVerbAgreement_2,
    "blimp_left_branch_island_echo_question": blimp.BlimpLeftBranchIslandEchoQuestion,
    "blimp_left_branch_island_simple_question": blimp.BlimpLeftBranchIslandSimpleQuestion,
    "blimp_matrix_question_npi_licensor_present": blimp.BlimpMatrixQuestionNpiLicensorPresent,
    "blimp_npi_present_1": blimp.BlimpNpiPresent_1,
    "blimp_npi_present_2": blimp.BlimpNpiPresent_2,
    "blimp_only_npi_licensor_present": blimp.BlimpOnlyNpiLicensorPresent,
    "blimp_only_npi_scope": blimp.BlimpOnlyNpiScope,
    "blimp_passive_1": blimp.BlimpPassive_1,
    "blimp_passive_2": blimp.BlimpPassive_2,
    "blimp_principle_A_c_command": blimp.BlimpPrinciple_ACCommand,
    "blimp_principle_A_case_1": blimp.BlimpPrinciple_ACase_1,
    "blimp_principle_A_case_2": blimp.BlimpPrinciple_ACase_2,
    "blimp_principle_A_domain_1": blimp.BlimpPrinciple_ADomain_1,
    "blimp_principle_A_domain_2": blimp.BlimpPrinciple_ADomain_2,
    "blimp_principle_A_domain_3": blimp.BlimpPrinciple_ADomain_3,
    "blimp_principle_A_reconstruction": blimp.BlimpPrinciple_AReconstruction,
    "blimp_regular_plural_subject_verb_agreement_1": blimp.BlimpRegularPluralSubjectVerbAgreement_1,
    "blimp_regular_plural_subject_verb_agreement_2": blimp.BlimpRegularPluralSubjectVerbAgreement_2,
    "blimp_sentential_negation_npi_licensor_present": blimp.BlimpSententialNegationNpiLicensorPresent,
    "blimp_sentential_negation_npi_scope": blimp.BlimpSententialNegationNpiScope,
    "blimp_sentential_subject_island": blimp.BlimpSententialSubjectIsland,
    "blimp_superlative_quantifiers_1": blimp.BlimpSuperlativeQuantifiers_1,
    "blimp_superlative_quantifiers_2": blimp.BlimpSuperlativeQuantifiers_2,
    "blimp_tough_vs_raising_1": blimp.BlimpToughVsRaising_1,
    "blimp_tough_vs_raising_2": blimp.BlimpToughVsRaising_2,
    "blimp_transitive": blimp.BlimpTransitive,
    "blimp_wh_island": blimp.BlimpWhIsland,
    "blimp_wh_questions_object_gap": blimp.BlimpWhQuestionsObjectGap,
    "blimp_wh_questions_subject_gap": blimp.BlimpWhQuestionsSubjectGap,
    "blimp_wh_questions_subject_gap_long_distance": blimp.BlimpWhQuestionsSubjectGapLongDistance,
    "blimp_wh_vs_that_no_gap": blimp.BlimpWhVsThatNoGap,
    "blimp_wh_vs_that_no_gap_long_distance": blimp.BlimpWhVsThatNoGapLongDistance,
    "blimp_wh_vs_that_with_gap": blimp.BlimpWhVsThatWithGap,
    "blimp_wh_vs_that_with_gap_long_distance": blimp.BlimpWhVsThatWithGapLongDistance,
}
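
The class names in the registry entries above look mechanically derived from the dataset names: lowercase words are capitalized and joined, while words that are not lowercase (`NP`, `A`, numeric suffixes such as `1`) keep their spelling and gain a leading underscore. The commit does not state this rule, so the sketch below is an inference from the listed entries, and `blimp_class_name` is a hypothetical helper, not a function in the harness.

```python
def blimp_class_name(dataset_name: str) -> str:
    """Guess the task class name for a BLiMP dataset name (inferred rule)."""
    parts = []
    for word in dataset_name.split("_"):
        if word.islower():
            # Ordinary lowercase word: "island" -> "Island".
            parts.append(word.capitalize())
        else:
            # Uppercase or numeric token: "NP" -> "_NP", "1" -> "_1".
            parts.append("_" + word)
    return "Blimp" + "".join(parts)


# Registry keys are the dataset name with a "blimp_" prefix.
name = "principle_A_c_command"
registry_key = "blimp_" + name
class_name = blimp_class_name(name)
```

A helper like this could be used to check that every registry key points at the class with the matching `DATASET_NAME`.
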
......
"""
BLiMP: A Benchmark of Linguistic Minimal Pairs for English
https://arxiv.org/abs/1912.00582

@article{warstadt2019blimp,
    title={BLiMP: A Benchmark of Linguistic Minimal Pairs for English},
    author={Warstadt, Alex and Parrish, Alicia and Liu, Haokun and Mohananey, Anhad and Peng, Wei and Wang, Sheng-Fu and Bowman, Samuel R},
    journal={arXiv preprint arXiv:1912.00582},
    year={2019}
}
"""

from lm_eval.base import rf
from lm_eval.metrics import mean
from .common import HFTask


class BlimpTask(HFTask):
    VERSION = 0
    DATASET_PATH = "blimp"

    def download(self):
        super().download()

        # The HF dataset only contains a "train" split, but the harness expects a
        # "validation" split. Use the "train" split for validation, on the assumption
        # that the model wasn't actually trained on this data.
        self.data["validation"] = self.data["train"]
        del self.data["train"]

    def fewshot_context(self, doc, num_fewshot, provide_description, rnd):
        # BLiMP is zero-shot only: sentences are scored without any prompt.
        assert num_fewshot == 0
        assert not provide_description
        return ""

    def doc_to_text(self, doc):
        # This method is invoked by tests only.
        return ""

    def doc_to_target(self, doc):
        # This method is invoked by tests only.
        return ""

    def construct_requests(self, doc, ctx):
        assert not ctx

        # Calculate the loglikelihood for the good and the bad sentence.
        # Note that loglikelihood translates the "" prefix to the "<|endoftext|>" token.
        return [
            rf.loglikelihood("", doc["sentence_good"]),
            rf.loglikelihood("", doc["sentence_bad"]),
        ]

    def process_results(self, doc, results):
        likelihood1, likelihood2 = results

        # The model got this case right iff the good sentence scored higher than the bad sentence.
        acc = 1.0 if likelihood1 > likelihood2 else 0.0

        return {
            "acc": acc,
        }

    def higher_is_better(self):
        return {
            "acc": True,
        }

    def aggregation(self):
        return {
            "acc": mean,
        }

class BlimpAdjunctIsland(BlimpTask):
    DATASET_NAME = "adjunct_island"

class BlimpAnaphorGenderAgreement(BlimpTask):
    DATASET_NAME = "anaphor_gender_agreement"

class BlimpAnaphorNumberAgreement(BlimpTask):
    DATASET_NAME = "anaphor_number_agreement"

class BlimpAnimateSubjectPassive(BlimpTask):
    DATASET_NAME = "animate_subject_passive"

class BlimpAnimateSubjectTrans(BlimpTask):
    DATASET_NAME = "animate_subject_trans"

class BlimpCausative(BlimpTask):
    DATASET_NAME = "causative"

class BlimpComplex_NPIsland(BlimpTask):
    DATASET_NAME = "complex_NP_island"

class BlimpCoordinateStructureConstraintComplexLeftBranch(BlimpTask):
    DATASET_NAME = "coordinate_structure_constraint_complex_left_branch"

class BlimpCoordinateStructureConstraintObjectExtraction(BlimpTask):
    DATASET_NAME = "coordinate_structure_constraint_object_extraction"

class BlimpDeterminerNounAgreement_1(BlimpTask):
    DATASET_NAME = "determiner_noun_agreement_1"

class BlimpDeterminerNounAgreement_2(BlimpTask):
    DATASET_NAME = "determiner_noun_agreement_2"

class BlimpDeterminerNounAgreementIrregular_1(BlimpTask):
    DATASET_NAME = "determiner_noun_agreement_irregular_1"

class BlimpDeterminerNounAgreementIrregular_2(BlimpTask):
    DATASET_NAME = "determiner_noun_agreement_irregular_2"

class BlimpDeterminerNounAgreementWithAdj_2(BlimpTask):
    DATASET_NAME = "determiner_noun_agreement_with_adj_2"

class BlimpDeterminerNounAgreementWithAdjIrregular_1(BlimpTask):
    DATASET_NAME = "determiner_noun_agreement_with_adj_irregular_1"

class BlimpDeterminerNounAgreementWithAdjIrregular_2(BlimpTask):
    DATASET_NAME = "determiner_noun_agreement_with_adj_irregular_2"

class BlimpDeterminerNounAgreementWithAdjective_1(BlimpTask):
    DATASET_NAME = "determiner_noun_agreement_with_adjective_1"

class BlimpDistractorAgreementRelationalNoun(BlimpTask):
    DATASET_NAME = "distractor_agreement_relational_noun"

class BlimpDistractorAgreementRelativeClause(BlimpTask):
    DATASET_NAME = "distractor_agreement_relative_clause"

class BlimpDropArgument(BlimpTask):
    DATASET_NAME = "drop_argument"

class BlimpEllipsisNBar_1(BlimpTask):
    DATASET_NAME = "ellipsis_n_bar_1"

class BlimpEllipsisNBar_2(BlimpTask):
    DATASET_NAME = "ellipsis_n_bar_2"

class BlimpExistentialThereObjectRaising(BlimpTask):
    DATASET_NAME = "existential_there_object_raising"

class BlimpExistentialThereQuantifiers_1(BlimpTask):
    DATASET_NAME = "existential_there_quantifiers_1"

class BlimpExistentialThereQuantifiers_2(BlimpTask):
    DATASET_NAME = "existential_there_quantifiers_2"

class BlimpExistentialThereSubjectRaising(BlimpTask):
    DATASET_NAME = "existential_there_subject_raising"

class BlimpExpletiveItObjectRaising(BlimpTask):
    DATASET_NAME = "expletive_it_object_raising"

class BlimpInchoative(BlimpTask):
    DATASET_NAME = "inchoative"

class BlimpIntransitive(BlimpTask):
    DATASET_NAME = "intransitive"

class BlimpIrregularPastParticipleAdjectives(BlimpTask):
    DATASET_NAME = "irregular_past_participle_adjectives"

class BlimpIrregularPastParticipleVerbs(BlimpTask):
    DATASET_NAME = "irregular_past_participle_verbs"

class BlimpIrregularPluralSubjectVerbAgreement_1(BlimpTask):
    DATASET_NAME = "irregular_plural_subject_verb_agreement_1"

class BlimpIrregularPluralSubjectVerbAgreement_2(BlimpTask):
    DATASET_NAME = "irregular_plural_subject_verb_agreement_2"

class BlimpLeftBranchIslandEchoQuestion(BlimpTask):
    DATASET_NAME = "left_branch_island_echo_question"

class BlimpLeftBranchIslandSimpleQuestion(BlimpTask):
    DATASET_NAME = "left_branch_island_simple_question"

class BlimpMatrixQuestionNpiLicensorPresent(BlimpTask):
    DATASET_NAME = "matrix_question_npi_licensor_present"

class BlimpNpiPresent_1(BlimpTask):
    DATASET_NAME = "npi_present_1"

class BlimpNpiPresent_2(BlimpTask):
    DATASET_NAME = "npi_present_2"

class BlimpOnlyNpiLicensorPresent(BlimpTask):
    DATASET_NAME = "only_npi_licensor_present"

class BlimpOnlyNpiScope(BlimpTask):
    DATASET_NAME = "only_npi_scope"

class BlimpPassive_1(BlimpTask):
    DATASET_NAME = "passive_1"
class BlimpPassive_2(BlimpTask):
DATASET_NAME = "passive_2"
class BlimpPrinciple_ACCommand(BlimpTask):
DATASET_NAME = "principle_A_c_command"
class BlimpPrinciple_ACase_1(BlimpTask):
DATASET_NAME = "principle_A_case_1"
class BlimpPrinciple_ACase_2(BlimpTask):
DATASET_NAME = "principle_A_case_2"
class BlimpPrinciple_ADomain_1(BlimpTask):
DATASET_NAME = "principle_A_domain_1"
class BlimpPrinciple_ADomain_2(BlimpTask):
DATASET_NAME = "principle_A_domain_2"
class BlimpPrinciple_ADomain_3(BlimpTask):
DATASET_NAME = "principle_A_domain_3"
class BlimpPrinciple_AReconstruction(BlimpTask):
DATASET_NAME = "principle_A_reconstruction"
class BlimpRegularPluralSubjectVerbAgreement_1(BlimpTask):
DATASET_NAME = "regular_plural_subject_verb_agreement_1"
class BlimpRegularPluralSubjectVerbAgreement_2(BlimpTask):
DATASET_NAME = "regular_plural_subject_verb_agreement_2"
class BlimpSententialNegationNpiLicensorPresent(BlimpTask):
DATASET_NAME = "sentential_negation_npi_licensor_present"
class BlimpSententialNegationNpiScope(BlimpTask):
DATASET_NAME = "sentential_negation_npi_scope"
class BlimpSententialSubjectIsland(BlimpTask):
DATASET_NAME = "sentential_subject_island"
class BlimpSuperlativeQuantifiers_1(BlimpTask):
DATASET_NAME = "superlative_quantifiers_1"
class BlimpSuperlativeQuantifiers_2(BlimpTask):
DATASET_NAME = "superlative_quantifiers_2"
class BlimpToughVsRaising_1(BlimpTask):
DATASET_NAME = "tough_vs_raising_1"
class BlimpToughVsRaising_2(BlimpTask):
DATASET_NAME = "tough_vs_raising_2"
class BlimpTransitive(BlimpTask):
DATASET_NAME = "transitive"
class BlimpWhIsland(BlimpTask):
DATASET_NAME = "wh_island"
class BlimpWhQuestionsObjectGap(BlimpTask):
DATASET_NAME = "wh_questions_object_gap"
class BlimpWhQuestionsSubjectGap(BlimpTask):
DATASET_NAME = "wh_questions_subject_gap"
class BlimpWhQuestionsSubjectGapLongDistance(BlimpTask):
DATASET_NAME = "wh_questions_subject_gap_long_distance"
class BlimpWhVsThatNoGap(BlimpTask):
DATASET_NAME = "wh_vs_that_no_gap"
class BlimpWhVsThatNoGapLongDistance(BlimpTask):
DATASET_NAME = "wh_vs_that_no_gap_long_distance"
class BlimpWhVsThatWithGap(BlimpTask):
DATASET_NAME = "wh_vs_that_with_gap"
class BlimpWhVsThatWithGapLongDistance(BlimpTask):
DATASET_NAME = "wh_vs_that_with_gap_long_distance"
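
Each subclass above only sets `DATASET_NAME`, and the test fixtures in this commit refer to tasks by names such as `blimp_adjunct_island`. A minimal sketch of how such names could be derived from the subclasses follows. The stand-in `BlimpTask` base class and the `build_blimp_registry` helper are hypothetical and exist only for illustration; the diff does not show how the harness actually registers these tasks.

```python
class BlimpTask:
    """Stand-in for the harness base class (assumption for this sketch)."""

    DATASET_NAME = None


class BlimpAdjunctIsland(BlimpTask):
    DATASET_NAME = "adjunct_island"


class BlimpCausative(BlimpTask):
    DATASET_NAME = "causative"


def build_blimp_registry(base=BlimpTask):
    """Hypothetical helper: map 'blimp_<DATASET_NAME>' to each direct subclass."""
    return {
        f"blimp_{cls.DATASET_NAME}": cls
        for cls in base.__subclasses__()
    }


registry = build_blimp_registry()
print(sorted(registry))
```

Deriving the names from the classes would avoid maintaining 67 dictionary entries by hand, at the cost of making registration implicit.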
976a5cac4bdb724632eebd4cb9e522203ce3da8d5525288a597c86e80469f3f2
\ No newline at end of file
{"results": {"blimp_adjunct_island": {"acc": 0.485, "acc_stderr": 0.0158121796418149}}, "versions": {"blimp_adjunct_island": 0}}
\ No newline at end of file
2d8964e56a17661502ecf3f09c0befba63915360ddf2145b0bd845816950515d
\ No newline at end of file
{"results": {"blimp_anaphor_gender_agreement": {"acc": 0.485, "acc_stderr": 0.0158121796418149}}, "versions": {"blimp_anaphor_gender_agreement": 0}}
\ No newline at end of file
0bdad31c974ba064e1f1ba931841ec2ba7461e8b0ca54ea5f79f08b6bae0bab5
\ No newline at end of file
{"results": {"blimp_anaphor_number_agreement": {"acc": 0.485, "acc_stderr": 0.0158121796418149}}, "versions": {"blimp_anaphor_number_agreement": 0}}
\ No newline at end of file
064c38fcd072b8bd12f54ea4f8e41599ed4e11dc386e93b77e1fc07967d1f960
\ No newline at end of file
{"results": {"blimp_animate_subject_passive": {"acc": 0.485, "acc_stderr": 0.0158121796418149}}, "versions": {"blimp_animate_subject_passive": 0}}
\ No newline at end of file
2a84231e7b79f517427e57e2099c88fed3d60a7efab4ef9506e263b4091d5cfa
\ No newline at end of file
{"results": {"blimp_animate_subject_trans": {"acc": 0.485, "acc_stderr": 0.0158121796418149}}, "versions": {"blimp_animate_subject_trans": 0}}
\ No newline at end of file
3d67ad025185dbb0808ebd7f508edcb5750c18fc3c01ad91f20fda80780c916c
\ No newline at end of file
{"results": {"blimp_causative": {"acc": 0.485, "acc_stderr": 0.0158121796418149}}, "versions": {"blimp_causative": 0}}
\ No newline at end of file
f46cfcc7e43050a235fd2a6b989cabbfbcce76786df74db9f0d4a9cd1caa1628
\ No newline at end of file
{"results": {"blimp_complex_NP_island": {"acc": 0.485, "acc_stderr": 0.0158121796418149}}, "versions": {"blimp_complex_NP_island": 0}}
\ No newline at end of file
7e1cc5b9f71abfbe56c4bdf343a1e5632785b66a986b8e904a41ed8f45a2c33e
\ No newline at end of file
{"results": {"blimp_coordinate_structure_constraint_complex_left_branch": {"acc": 0.485, "acc_stderr": 0.0158121796418149}}, "versions": {"blimp_coordinate_structure_constraint_complex_left_branch": 0}}
\ No newline at end of file
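
The `acc_stderr` value in the fixtures above is numerically consistent with the sample standard error of the mean for 1000 binary outcomes, 485 of them correct. The sketch below checks that arithmetic. The item count of 1000 and the use of the `n - 1` (sample) variance are inferences from the numbers, not something this diff states, and `mean_stderr` is a name chosen for this illustration.

```python
import math


def mean_stderr(values):
    """Standard error of the mean, using the sample (n - 1) variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n)


# Assumed setup: 1000 items, 485 scored as correct (1.0) and 515 as incorrect (0.0).
outcomes = [1.0] * 485 + [0.0] * 515
acc = sum(outcomes) / len(outcomes)
stderr = mean_stderr(outcomes)
print(acc, stderr)
```

Because every fixture shown reports the same `acc` and `acc_stderr`, the values look like the output of a shared placeholder model rather than per-task measurements; that reading is also an inference.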