add mc taco

Commit cfbc5a8a authored Aug 15, 2023 by lintangsutawika

Showing with 49 additions and 0 deletions

lm_eval/tasks/mc_taco/README.md +36 -0
lm_eval/tasks/mc_taco/default.yaml +13 -0

--- a/lm_eval/tasks/mc_taco/README.md
+++ b/lm_eval/tasks/mc_taco/README.md
+# Task-name
+
+### Paper
+
+Title: `paper title goes here`
+Abstract: `link to paper PDF or arXiv abstract goes here`
+
+`Short description of paper / benchmark goes here:`
+
+Homepage: `homepage to the benchmark's website goes here, if applicable`
+
+
+### Citation
+
+```
+BibTeX-formatted citation goes here
+```
+
+### Subtasks
+
+List or describe tasks defined in this folder, and their names here:
+* `task_name`: `1-sentence description of what this particular task does`
+* `task_name2`: .....
+
+### Checklist
+
+For adding novel benchmarks/datasets to the library:
+* [ ] Is the task an existing benchmark in the literature?
+  * [ ] Have you referenced the original paper that introduced the task?
+  * [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
+
+
+If other tasks on this dataset are already supported:
+* [ ] Is the "Main" variant of this task clearly denoted?
+* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
+* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
--- a/lm_eval/tasks/mc_taco/default.yaml
+++ b/lm_eval/tasks/mc_taco/default.yaml
+task: mc_taco
+dataset_path: mc_taco
+output_type: multiple_choice
+validation_split: validation
+test_split: test
+doc_to_text: "{{sentence}}\nQuestion: {{question}}\nAnswer: {{answer}}\nPlausible:"
+doc_to_target: label
+doc_to_choice: ["no", "yes"]
+should_decontaminate: true
+doc_to_decontamination_query: "{{question}} {{sentence}}"
+metric_list:
+  - metric: acc
+  - metric: f1
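
To make the config above easier to review, here is a minimal Python sketch of what the fields appear to describe: the prompt built from `doc_to_text`, the target string selected by `label` from `doc_to_choice`, the decontamination query, and the two listed metrics. It is not the harness's implementation. The template is approximated with `str.format` rather than Jinja2, the helper names (`render_prompt`, `target_text`, `decontamination_query`, `accuracy`, `f1_positive`) are hypothetical, the example document is invented for illustration, and treating `f1` as binary F1 on the "yes" class is an assumption that this diff does not confirm.

```python
# Sketch only: approximates the YAML fields in plain Python.
# All helper names are hypothetical and not part of lm_eval.

# Plain-Python stand-in for the Jinja2 template in `doc_to_text`.
PROMPT_TEMPLATE = "{sentence}\nQuestion: {question}\nAnswer: {answer}\nPlausible:"

# Mirrors `doc_to_choice`; `label` is used as an index into this list.
CHOICES = ["no", "yes"]


def render_prompt(doc):
    """Build the prompt text from one document."""
    return PROMPT_TEMPLATE.format(
        sentence=doc["sentence"],
        question=doc["question"],
        answer=doc["answer"],
    )


def target_text(doc):
    """Map the integer label to its choice string."""
    return CHOICES[doc["label"]]


def decontamination_query(doc):
    """Mirror `doc_to_decontamination_query`: question, then sentence."""
    return f"{doc['question']} {doc['sentence']}"


def accuracy(golds, preds):
    """Fraction of predictions that equal the gold label."""
    return sum(g == p for g, p in zip(golds, preds)) / len(golds)


def f1_positive(golds, preds, positive=1):
    """Binary F1 on the positive class (assumed here to be "yes")."""
    tp = sum(g == positive and p == positive for g, p in zip(golds, preds))
    fp = sum(g != positive and p == positive for g, p in zip(golds, preds))
    fn = sum(g == positive and p != positive for g, p in zip(golds, preds))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented example document; not taken from the dataset.
    doc = {
        "sentence": "She went for a walk after dinner.",
        "question": "How long did the walk last?",
        "answer": "30 minutes",
        "label": 1,
    }
    print(render_prompt(doc))
    print("target:", target_text(doc))
    print("decontamination query:", decontamination_query(doc))
```

Because the candidate answer is part of the prompt and the model only chooses between "no" and "yes", each (question, answer) pair is scored as an independent binary judgment.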