Commit 2cc968ba authored by Leo Gao, committed by GitHub

Merge pull request #288 from EleutherAI/leogao2-patch-1

Update README.md
parents c5131e8a 71647775
@@ -91,7 +91,7 @@ To implement a new task in eval harness, see [this guide](./docs/task_guide.md).
 ### Full Task List
 | Task Name |Train|Val|Test|Val/Test Docs| Metrics |
-|---------------------------------------------------------|-----|---|----|------------:|------------------------------------------------------------------------------|
+|---------------------------------------------------------|-----|---|----|------------:|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 |cola |✓ |✓ | | 1043|mcc |
 |mnli |✓ |✓ | | 9815|acc |
 |mnli_mismatched |✓ |✓ | | 9832|acc |
@@ -112,9 +112,15 @@ To implement a new task in eval harness, see [this guide](./docs/task_guide.md).
 |drop |✓ |✓ | | 9536|em, f1 |
 |lambada | |✓ | | 5153|ppl, acc |
 |lambada_cloze | |✓ | | 5153|ppl, acc |
+|lambada_mt_en | |✓ | | 5153|ppl, acc |
+|lambada_mt_fr | |✓ | | 5153|ppl, acc |
+|lambada_mt_de | |✓ | | 5153|ppl, acc |
+|lambada_mt_it | |✓ | | 5153|ppl, acc |
+|lambada_mt_es | |✓ | | 5153|ppl, acc |
 |wikitext | |✓ |✓ | 62|word_perplexity, byte_perplexity, bits_per_byte |
 |piqa |✓ |✓ | | 1838|acc, acc_norm |
 |prost | | |✓ | 18736|acc, acc_norm |
+|mc_taco | |✓ |✓ | 9442|f1, em |
 |pubmedqa | | |✓ | 1000|acc |
 |sciq |✓ |✓ |✓ | 1000|acc, acc_norm |
 |qa4mre_2011 | | |✓ | 120|acc, acc_norm |
@@ -126,11 +132,12 @@ To implement a new task in eval harness, see [this guide](./docs/task_guide.md).
 |logiqa |✓ |✓ |✓ | 651|acc, acc_norm |
 |hellaswag |✓ |✓ | | 10042|acc, acc_norm |
 |openbookqa |✓ |✓ |✓ | 500|acc, acc_norm |
-|squad2 |✓ |✓ | | 11873|exact, f1, HasAns_exact, HasAns_f1, NoAns_exact, NoAns_f1, best_exact, best_f1|
+|squad2 |✓ |✓ | | 11873|exact, f1, HasAns_exact, HasAns_f1, NoAns_exact, NoAns_f1, best_exact, best_f1 |
 |race |✓ |✓ |✓ | 1045|acc |
-|mathqa |✓ |✓ |✓ | 2985|acc, acc_norm |
+|headqa |✓ |✓ |✓ | 2742|acc, acc_norm |
 |headqa_es |✓ |✓ |✓ | 2742|acc, acc_norm |
 |headqa_en |✓ |✓ |✓ | 2742|acc, acc_norm |
+|mathqa |✓ |✓ |✓ | 2985|acc, acc_norm |
 |webqs |✓ | |✓ | 2032|acc |
 |wsc273 | | |✓ | 273|acc |
 |winogrande |✓ |✓ | | 1267|acc |
@@ -143,6 +150,10 @@ To implement a new task in eval harness, see [this guide](./docs/task_guide.md).
 |ethics_utilitarianism_original | | |✓ | 4808|acc |
 |ethics_utilitarianism |✓ | |✓ | 4808|acc |
 |ethics_virtue |✓ | |✓ | 4975|acc, em |
+|truthfulqa_mc | |✓ | | 817|mc1, mc2 |
+|truthfulqa_gen | |✓ | | 817|bleurt_max, bleurt_acc, bleurt_diff, bleu_max, bleu_acc, bleu_diff, rouge1_max, rouge1_acc, rouge1_diff, rouge2_max, rouge2_acc, rouge2_diff, rougeL_max, rougeL_acc, rougeL_diff|
+|mutual |✓ |✓ | | 886|r@1, r@2, mrr |
+|mutual_plus |✓ |✓ | | 886|r@1, r@2, mrr |
 |math_algebra |✓ | |✓ | 1187|acc |
 |math_counting_and_prob |✓ | |✓ | 474|acc |
 |math_geometry |✓ | |✓ | 479|acc |
@@ -150,6 +161,7 @@ To implement a new task in eval harness, see [this guide](./docs/task_guide.md).
 |math_num_theory |✓ | |✓ | 540|acc |
 |math_prealgebra |✓ | |✓ | 871|acc |
 |math_precalc |✓ | |✓ | 546|acc |
+|math_asdiv | |✓ | | 2305|acc |
 |arithmetic_2da | |✓ | | 2000|acc |
 |arithmetic_2ds | |✓ | | 2000|acc |
 |arithmetic_3da | |✓ | | 2000|acc |
@@ -273,74 +285,74 @@ To implement a new task in eval harness, see [this guide](./docs/task_guide.md).
 |pile_uspto | |✓ |✓ | 11415|word_perplexity, byte_perplexity, bits_per_byte |
 |pile_ubuntu-irc | |✓ |✓ | 22|word_perplexity, byte_perplexity, bits_per_byte |
 |pile_wikipedia | |✓ |✓ | 17511|word_perplexity, byte_perplexity, bits_per_byte |
-|pile_youtubesubtitles | |✓ | | 1000|acc
-|blimp_adjunct_island | |✓ | | 1000|acc
-|blimp_anaphor_gender_agreement | |✓ | | 1000|acc
-|blimp_anaphor_number_agreement | |✓ | | 1000|acc
-|blimp_animate_subject_passive | |✓ | | 1000|acc
-|blimp_animate_subject_trans | |✓ | | 1000|acc
-|blimp_causative | |✓ | | 1000|acc
-|blimp_complex_NP_island | |✓ | | 1000|acc
-|blimp_coordinate_structure_constraint_complex_left_branch| |✓ | | 1000|acc
-|blimp_coordinate_structure_constraint_object_extraction | |✓ | | 1000|acc
-|blimp_determiner_noun_agreement_1 | |✓ | | 1000|acc
-|blimp_determiner_noun_agreement_2 | |✓ | | 1000|acc
-|blimp_determiner_noun_agreement_irregular_1 | |✓ | | 1000|acc
-|blimp_determiner_noun_agreement_irregular_2 | |✓ | | 1000|acc
-|blimp_determiner_noun_agreement_with_adj_2 | |✓ | | 1000|acc
-|blimp_determiner_noun_agreement_with_adj_irregular_1 | |✓ | | 1000|acc
-|blimp_determiner_noun_agreement_with_adj_irregular_2 | |✓ | | 1000|acc
-|blimp_determiner_noun_agreement_with_adjective_1 | |✓ | | 1000|acc
-|blimp_distractor_agreement_relational_noun | |✓ | | 1000|acc
-|blimp_distractor_agreement_relative_clause | |✓ | | 1000|acc
-|blimp_drop_argument | |✓ | | 1000|acc
-|blimp_ellipsis_n_bar_1 | |✓ | | 1000|acc
-|blimp_ellipsis_n_bar_2 | |✓ | | 1000|acc
-|blimp_existential_there_object_raising | |✓ | | 1000|acc
-|blimp_existential_there_quantifiers_1 | |✓ | | 1000|acc
-|blimp_existential_there_quantifiers_2 | |✓ | | 1000|acc
-|blimp_existential_there_subject_raising | |✓ | | 1000|acc
-|blimp_expletive_it_object_raising | |✓ | | 1000|acc
-|blimp_inchoative | |✓ | | 1000|acc
-|blimp_intransitive | |✓ | | 1000|acc
-|blimp_irregular_past_participle_adjectives | |✓ | | 1000|acc
-|blimp_irregular_past_participle_verbs | |✓ | | 1000|acc
-|blimp_irregular_plural_subject_verb_agreement_1 | |✓ | | 1000|acc
-|blimp_irregular_plural_subject_verb_agreement_2 | |✓ | | 1000|acc
-|blimp_left_branch_island_echo_question | |✓ | | 1000|acc
-|blimp_left_branch_island_simple_question | |✓ | | 1000|acc
-|blimp_matrix_question_npi_licensor_present | |✓ | | 1000|acc
-|blimp_npi_present_1 | |✓ | | 1000|acc
-|blimp_npi_present_2 | |✓ | | 1000|acc
-|blimp_only_npi_licensor_present | |✓ | | 1000|acc
-|blimp_only_npi_scope | |✓ | | 1000|acc
-|blimp_passive_1 | |✓ | | 1000|acc
-|blimp_passive_2 | |✓ | | 1000|acc
-|blimp_principle_A_c_command | |✓ | | 1000|acc
-|blimp_principle_A_case_1 | |✓ | | 1000|acc
-|blimp_principle_A_case_2 | |✓ | | 1000|acc
-|blimp_principle_A_domain_1 | |✓ | | 1000|acc
-|blimp_principle_A_domain_2 | |✓ | | 1000|acc
-|blimp_principle_A_domain_3 | |✓ | | 1000|acc
-|blimp_principle_A_reconstruction | |✓ | | 1000|acc
-|blimp_regular_plural_subject_verb_agreement_1 | |✓ | | 1000|acc
-|blimp_regular_plural_subject_verb_agreement_2 | |✓ | | 1000|acc
-|blimp_sentential_negation_npi_licensor_present | |✓ | | 1000|acc
-|blimp_sentential_negation_npi_scope | |✓ | | 1000|acc
-|blimp_sentential_subject_island | |✓ | | 1000|acc
-|blimp_superlative_quantifiers_1 | |✓ | | 1000|acc
-|blimp_superlative_quantifiers_2 | |✓ | | 1000|acc
-|blimp_tough_vs_raising_1 | |✓ | | 1000|acc
-|blimp_tough_vs_raising_2 | |✓ | | 1000|acc
-|blimp_transitive | |✓ | | 1000|acc
-|blimp_wh_island | |✓ | | 1000|acc
-|blimp_wh_questions_object_gap | |✓ | | 1000|acc
-|blimp_wh_questions_subject_gap | |✓ | | 1000|acc
-|blimp_wh_questions_subject_gap_long_distance | |✓ | | 1000|acc
-|blimp_wh_vs_that_no_gap | |✓ | | 1000|acc
-|blimp_wh_vs_that_no_gap_long_distance | |✓ | | 1000|acc
-|blimp_wh_vs_that_with_gap | |✓ | | 1000|acc
-|blimp_wh_vs_that_with_gap_long_distance | |✓ | | 1000|acc
+|pile_youtubesubtitles | |✓ |✓ | 342|word_perplexity, byte_perplexity, bits_per_byte |
+|blimp_adjunct_island | |✓ | | 1000|acc |
+|blimp_anaphor_gender_agreement | |✓ | | 1000|acc |
+|blimp_anaphor_number_agreement | |✓ | | 1000|acc |
+|blimp_animate_subject_passive | |✓ | | 1000|acc |
+|blimp_animate_subject_trans | |✓ | | 1000|acc |
+|blimp_causative | |✓ | | 1000|acc |
+|blimp_complex_NP_island | |✓ | | 1000|acc |
+|blimp_coordinate_structure_constraint_complex_left_branch| |✓ | | 1000|acc |
+|blimp_coordinate_structure_constraint_object_extraction | |✓ | | 1000|acc |
+|blimp_determiner_noun_agreement_1 | |✓ | | 1000|acc |
+|blimp_determiner_noun_agreement_2 | |✓ | | 1000|acc |
+|blimp_determiner_noun_agreement_irregular_1 | |✓ | | 1000|acc |
+|blimp_determiner_noun_agreement_irregular_2 | |✓ | | 1000|acc |
+|blimp_determiner_noun_agreement_with_adj_2 | |✓ | | 1000|acc |
+|blimp_determiner_noun_agreement_with_adj_irregular_1 | |✓ | | 1000|acc |
+|blimp_determiner_noun_agreement_with_adj_irregular_2 | |✓ | | 1000|acc |
+|blimp_determiner_noun_agreement_with_adjective_1 | |✓ | | 1000|acc |
+|blimp_distractor_agreement_relational_noun | |✓ | | 1000|acc |
+|blimp_distractor_agreement_relative_clause | |✓ | | 1000|acc |
+|blimp_drop_argument | |✓ | | 1000|acc |
+|blimp_ellipsis_n_bar_1 | |✓ | | 1000|acc |
+|blimp_ellipsis_n_bar_2 | |✓ | | 1000|acc |
+|blimp_existential_there_object_raising | |✓ | | 1000|acc |
+|blimp_existential_there_quantifiers_1 | |✓ | | 1000|acc |
+|blimp_existential_there_quantifiers_2 | |✓ | | 1000|acc |
+|blimp_existential_there_subject_raising | |✓ | | 1000|acc |
+|blimp_expletive_it_object_raising | |✓ | | 1000|acc |
+|blimp_inchoative | |✓ | | 1000|acc |
+|blimp_intransitive | |✓ | | 1000|acc |
+|blimp_irregular_past_participle_adjectives | |✓ | | 1000|acc |
+|blimp_irregular_past_participle_verbs | |✓ | | 1000|acc |
+|blimp_irregular_plural_subject_verb_agreement_1 | |✓ | | 1000|acc |
+|blimp_irregular_plural_subject_verb_agreement_2 | |✓ | | 1000|acc |
+|blimp_left_branch_island_echo_question | |✓ | | 1000|acc |
+|blimp_left_branch_island_simple_question | |✓ | | 1000|acc |
+|blimp_matrix_question_npi_licensor_present | |✓ | | 1000|acc |
+|blimp_npi_present_1 | |✓ | | 1000|acc |
+|blimp_npi_present_2 | |✓ | | 1000|acc |
+|blimp_only_npi_licensor_present | |✓ | | 1000|acc |
+|blimp_only_npi_scope | |✓ | | 1000|acc |
+|blimp_passive_1 | |✓ | | 1000|acc |
+|blimp_passive_2 | |✓ | | 1000|acc |
+|blimp_principle_A_c_command | |✓ | | 1000|acc |
+|blimp_principle_A_case_1 | |✓ | | 1000|acc |
+|blimp_principle_A_case_2 | |✓ | | 1000|acc |
+|blimp_principle_A_domain_1 | |✓ | | 1000|acc |
+|blimp_principle_A_domain_2 | |✓ | | 1000|acc |
+|blimp_principle_A_domain_3 | |✓ | | 1000|acc |
+|blimp_principle_A_reconstruction | |✓ | | 1000|acc |
+|blimp_regular_plural_subject_verb_agreement_1 | |✓ | | 1000|acc |
+|blimp_regular_plural_subject_verb_agreement_2 | |✓ | | 1000|acc |
+|blimp_sentential_negation_npi_licensor_present | |✓ | | 1000|acc |
+|blimp_sentential_negation_npi_scope | |✓ | | 1000|acc |
+|blimp_sentential_subject_island | |✓ | | 1000|acc |
+|blimp_superlative_quantifiers_1 | |✓ | | 1000|acc |
+|blimp_superlative_quantifiers_2 | |✓ | | 1000|acc |
+|blimp_tough_vs_raising_1 | |✓ | | 1000|acc |
+|blimp_tough_vs_raising_2 | |✓ | | 1000|acc |
+|blimp_transitive | |✓ | | 1000|acc |
+|blimp_wh_island | |✓ | | 1000|acc |
+|blimp_wh_questions_object_gap | |✓ | | 1000|acc |
+|blimp_wh_questions_subject_gap | |✓ | | 1000|acc |
+|blimp_wh_questions_subject_gap_long_distance | |✓ | | 1000|acc |
+|blimp_wh_vs_that_no_gap | |✓ | | 1000|acc |
+|blimp_wh_vs_that_no_gap_long_distance | |✓ | | 1000|acc |
+|blimp_wh_vs_that_with_gap | |✓ | | 1000|acc |
+|blimp_wh_vs_that_with_gap_long_distance | |✓ | | 1000|acc |
 ## Usage