edit truthfulqa

Commit 4c935519 authored Aug 14, 2023 by lintangsutawika

Showing 2 changed files with 22 additions and 5 deletions

lm_eval/tasks/truthfulqa/README.md +22 -3

lm_eval/tasks/truthfulqa/truthfulqa_mc1.yaml +0 -2

--- a/lm_eval/tasks/truthfulqa/README.md
+++ b/lm_eval/tasks/truthfulqa/README.md
@@ -27,8 +27,27 @@ Homepage: `https://github.com/sylinrl/TruthfulQA`
 }
 ```
-### Subtasks
+### Groups and Tasks
+#### Groups
+* Not part of a group yet.
+#### Tasks
 * `truthfulqa_mc1`: `Multiple-choice, single answer`
-* `truthfulqa_mc2`: `Multiple-choice, multiple answers`
+* (MISSING)`truthfulqa_mc2`: `Multiple-choice, multiple answers`
-* `truthfulqa_gen`: `Answer generation`
+* (MISSING)`truthfulqa_gen`: `Answer generation`
+### Checklist
+For adding novel benchmarks/datasets to the library:
+* [ ] Is the task an existing benchmark in the literature?
+  * [ ] Have you referenced the original paper that introduced the task?
+  * [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
+If other tasks on this dataset are already supported:
+* [ ] Is the "Main" variant of this task clearly denoted?
+* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
+* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
--- a/lm_eval/tasks/truthfulqa/truthfulqa_mc1.yaml
+++ b/lm_eval/tasks/truthfulqa/truthfulqa_mc1.yaml
-group:
-  - multiple_choice
 task: truthfulqa_mc1
 dataset_path: truthful_qa
 dataset_name: multiple_choice
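
The README above distinguishes `truthfulqa_mc1` (multiple-choice, single answer) from `truthfulqa_mc2` (multiple-choice, multiple answers). As a rough illustration of how the two variants differ when scoring, here is a minimal sketch. It assumes each answer choice has already been assigned a log-likelihood by the model. The function names `mc1_score` and `mc2_score` are hypothetical and are not taken from this commit or from the harness; the sketch follows the metric descriptions commonly given for TruthfulQA and may differ in detail from the actual implementation.

```python
import math


def mc1_score(loglikelihoods, correct_index):
    """Single-answer variant (hypothetical helper).

    Returns 1.0 if the choice with the highest log-likelihood is the one
    correct answer, otherwise 0.0.
    """
    best = max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])
    return 1.0 if best == correct_index else 0.0


def mc2_score(loglikelihoods, labels):
    """Multiple-answer variant (hypothetical helper).

    Returns the share of total probability mass, normalized over all listed
    choices, that falls on choices labelled true (label == 1).
    """
    probs = [math.exp(ll) for ll in loglikelihoods]
    true_mass = sum(p for p, label in zip(probs, labels) if label == 1)
    return true_mass / sum(probs)


# Illustrative values only, not real model outputs.
lls = [math.log(0.2), math.log(0.3), math.log(0.5)]
print(mc1_score(lls, correct_index=2))
print(mc2_score(lls, labels=[1, 0, 1]))
```

The single-answer score is all-or-nothing per question, while the multiple-answer score gives partial credit in proportion to the probability placed on true answers.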