Commit e4db76cb authored by haileyschoelkopf

Merge branch 'main' into multimodal-prototyping

parents 6cc6e9cd ad80f555

 "dataset_name": "security_studies"
 "description": "The following are multiple choice questions (with answers) about security\
   \ studies.\n\n"
-"group": "mmlu_social_sciences"
-"group_alias": "social_sciences"
+"tag": "mmlu_social_sciences_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_security_studies"
 "task_alias": "security_studies"

 "dataset_name": "sociology"
 "description": "The following are multiple choice questions (with answers) about sociology.\n\
   \n"
-"group": "mmlu_social_sciences"
-"group_alias": "social_sciences"
+"tag": "mmlu_social_sciences_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_sociology"
 "task_alias": "sociology"

 "dataset_name": "us_foreign_policy"
 "description": "The following are multiple choice questions (with answers) about us\
   \ foreign policy.\n\n"
-"group": "mmlu_social_sciences"
-"group_alias": "social_sciences"
+"tag": "mmlu_social_sciences_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_us_foreign_policy"
 "task_alias": "us_foreign_policy"

 "dataset_name": "virology"
 "description": "The following are multiple choice questions (with answers) about virology.\n\
   \n"
-"group": "mmlu_other"
-"group_alias": "other"
+"tag": "mmlu_other_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_virology"
 "task_alias": "virology"

 "dataset_name": "world_religions"
 "description": "The following are multiple choice questions (with answers) about world\
   \ religions.\n\n"
-"group": "mmlu_humanities"
-"group_alias": "humanities"
+"tag": "mmlu_humanities_tasks"
 "include": "_default_template_yaml"
 "task": "mmlu_world_religions"
 "task_alias": "world_religions"
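
The per-subject config changes above all follow one mechanical pattern: the `group` and `group_alias` keys are removed and a `tag` built from the old group name is added (for example `mmlu_other` becomes `mmlu_other_tasks`). The sketch below reproduces that rewrite on an already-parsed config dict. `migrate_group_to_tag` and its `suffix` parameter are hypothetical names for illustration, not part of lm-evaluation-harness, and the `_tasks` suffix is assumed only from the five files shown here.

```python
from typing import Any, Dict


def migrate_group_to_tag(config: Dict[str, Any], suffix: str = "_tasks") -> Dict[str, Any]:
    """Hypothetical helper: swap a subtask's `group`/`group_alias` for a `tag`.

    Mirrors the diff above, e.g. `group: mmlu_other` -> `tag: mmlu_other_tasks`.
    """
    migrated = dict(config)  # shallow copy so the caller's dict is left untouched
    group = migrated.pop("group", None)
    migrated.pop("group_alias", None)  # dropped, as in the diff above
    if group is not None:
        migrated["tag"] = group + suffix
    return migrated


# Usage on the old virology config from the diff.
old_virology = {
    "dataset_name": "virology",
    "group": "mmlu_other",
    "group_alias": "other",
    "include": "_default_template_yaml",
    "task": "mmlu_virology",
    "task_alias": "virology",
}
new_virology = migrate_group_to_tag(old_virology)
```

For the `flan_cot_fewshot` files elsewhere in this diff, the new `tag` keeps the old group name unchanged, which corresponds to calling the sketch with `suffix=""`.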

 group: mmlu_flan_cot_fewshot
+group_alias: mmlu (flan style, fewshot cot)
 task:
-  - mmlu_flan_cot_fewshot_stem
-  - mmlu_flan_cot_fewshot_other
-  - mmlu_flan_cot_fewshot_social_sciences
-  - mmlu_flan_cot_fewshot_humanities
+  - group: stem
+    task:
+      - mmlu_flan_cot_fewshot_stem
+    aggregate_metric_list:
+      - metric: acc
+        weight_by_size: True
+  - group: other
+    task:
+      - mmlu_flan_cot_fewshot_other
+    aggregate_metric_list:
+      - metric: acc
+        weight_by_size: True
+  - group: social sciences
+    task:
+      - mmlu_flan_cot_fewshot_social_sciences
+    aggregate_metric_list:
+      - metric: acc
+        weight_by_size: True
+  - group: humanities
+    task:
+      - mmlu_flan_cot_fewshot_humanities
+    aggregate_metric_list:
+      - metric: acc
+        weight_by_size: True
+aggregate_metric_list:
+  - metric: acc
+    weight_by_size: True
+metadata:
+  version: 1
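
The new group config requests `acc` aggregated with `weight_by_size: True`, both for each category group and for the top-level `mmlu_flan_cot_fewshot` group. A minimal sketch of what that setting is assumed to mean: a mean weighted by each subtask's document count, so the group score equals the accuracy over all pooled questions, versus a plain mean over subtasks when the flag is off. `aggregate_accuracy` is a hypothetical name, not the harness's own function.

```python
from typing import Sequence


def aggregate_accuracy(accs: Sequence[float], sizes: Sequence[int], weight_by_size: bool = True) -> float:
    """Combine per-subtask accuracies into one group score (illustrative sketch).

    weight_by_size=True: each subtask counts in proportion to its number of documents.
    weight_by_size=False: every subtask counts equally.
    """
    if not accs or len(accs) != len(sizes):
        raise ValueError("need at least one subtask and exactly one size per accuracy")
    if weight_by_size:
        return sum(acc * n for acc, n in zip(accs, sizes)) / sum(sizes)
    return sum(accs) / len(accs)


# Two hypothetical subtasks with 100 and 300 questions.
weighted = aggregate_accuracy([0.5, 1.0], [100, 300])
unweighted = aggregate_accuracy([0.5, 1.0], [100, 300], weight_by_size=False)

# Nested groups: if a group's size is the total of its subtasks' sizes,
# aggregating groups-of-groups gives the same result as pooling every subtask.
stem = aggregate_accuracy([0.5, 1.0], [100, 300])
other = aggregate_accuracy([0.8], [100])
nested = aggregate_accuracy([stem, other], [400, 100])
flat = aggregate_accuracy([0.5, 1.0, 0.8], [100, 300, 100])
```

Here `weighted` is 0.875 while `unweighted` is 0.75, which is why the choice matters when MMLU subjects differ widely in size.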

@@ -54,6 +54,6 @@ fewshot_config:
 not have any roots. For c = 2 the polynomial x^2 + 2 has two roots at x = 1
 and x = 2. Hence Z_3[x]/(x^2 + c) is a field if and only if c = 1. The answer
 is (B).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_abstract_algebra

@@ -70,6 +70,6 @@ fewshot_config:
 \ origin of the hyoid bone are the second and the third pharyngeal arches\u2014\
 this information is covered in the last option (D). Therefore, we conclude that\
 \ (D) must be the correct answer. The answer is (D).\n\n"
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_anatomy

@@ -65,6 +65,6 @@ fewshot_config:
 because it explains that the surface is red due to the rusted materials on the
 surface and the red color comes from the rust. So the correct option is (A).
 The answer is (A).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_astronomy

@@ -70,6 +70,6 @@ fewshot_config:
 \ moral arguments relating to: negative *externalities*, the *power* that corporations\
 \ possess and the *mutual independence* of business and society. The answer\
 \ is (D).\n\n"
-group: mmlu_flan_cot_fewshot_other
+tag: mmlu_flan_cot_fewshot_other
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_business_ethics

@@ -43,6 +43,6 @@ fewshot_config:
 target: 'Let''s think step by step. We refer to Wikipedia articles on clinical
 knowledge for help. The energy for muscular contraction is provided by ATP (adenosine
 triphosphate), which is the powerhouse of the cell. The answer is (A).'
-group: mmlu_flan_cot_fewshot_other
+tag: mmlu_flan_cot_fewshot_other
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_clinical_knowledge

@@ -70,6 +70,6 @@ fewshot_config:
 that have different origins, which is not the case for the human and bird forearms,
 which rules out (D). Humans and birds do belong to the same clade - a group
 of organisms composed of a common ancestor. The answer is (C).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_college_biology

@@ -44,6 +44,6 @@ fewshot_config:
 \ into 2 lines. This will be further split into 4 lines by the interaction with\
 \ three equivalent 1H nuclei. The total number of lines is therefore $2 \\cdot\
 \ 4 = 8$. The answer is (E).\n\n"
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_college_chemistry

@@ -175,6 +175,6 @@ fewshot_config:
 (1000 nanoseconds / cache miss) * (1 cache miss / 50 instructions) * (50 instructions
 / 27000 nanoseconds) = 1000 * (1/50) * (50/27000) = 1000/27000 = 1/27. The answer
 is (B).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_college_computer_science

@@ -68,6 +68,6 @@ fewshot_config:
 \ Then, for all $t \\in \\mathbb{R}$, we have $(s(t))-2=K e^{-t / 25}$, and\
 \ so $s(t)=2+K e^{-t / 25}$. Then $3=s(0)=2+K e^{0}=2+K$, so $K=1$. Then $s(100)=2+K\
 \ e^{-100 / 25}=2+1 \\cdot e^{-4}=2+e^{-4}$. The answer is (D).\n\n"
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_college_mathematics

@@ -63,6 +63,6 @@ fewshot_config:
 for help. Glucose (also known as the blood sugar) is the main sugar found in
 the human body. It is transported into the muscle cell via diffusion through
 protein transporters called GLUT4. The answer is (A).'
-group: mmlu_flan_cot_fewshot_other
+tag: mmlu_flan_cot_fewshot_other
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_college_medicine

@@ -56,6 +56,6 @@ fewshot_config:
 of the gas container is constant, no work will be done (since work is pressure
 times change in volume). So, at constant volume, all of the heat goes into the
 internal energy. The answer is (B).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_college_physics

@@ -45,6 +45,6 @@ fewshot_config:
 of the TLS heartbeat extension. The vulnerability was classified as a buffer
 over-read, a situation where more data can be read than should be allowed. The
 answer is (C).'
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_computer_security

@@ -44,6 +44,6 @@ fewshot_config:
 \ orthogonal to the wind is the same as it would be in the absence of the wind.\
 \ The total speed, which is these two components added in quadrature, is thus\
 \ greater than the speed in still air. The answer is (B).\n\n"
-group: mmlu_flan_cot_fewshot_stem
+tag: mmlu_flan_cot_fewshot_stem
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_conceptual_physics

@@ -82,6 +82,6 @@ fewshot_config:
 target: 'Let''s think step by step. We refer to Wikipedia articles on econometrics
 for help. This is a formal logic problem about stationally process. For a stationary
 autoregressive process, shocks will eventually die away. The answer is (A).'
-group: mmlu_flan_cot_fewshot_social_sciences
+tag: mmlu_flan_cot_fewshot_social_sciences
 include: _mmlu_flan_cot_fewshot_template_yaml
 task: mmlu_flan_cot_fewshot_econometrics