edit openbookqa

Commit 7714e30b authored Aug 14, 2023 by lintangsutawika

Showing 2 changed files with 49 additions and 2 deletions

lm_eval/tasks/openbookqa/README.md +49 -0

lm_eval/tasks/openbookqa/openbookqa.yaml +0 -2

--- a/lm_eval/tasks/openbookqa/README.md
+++ b/lm_eval/tasks/openbookqa/README.md
+# Task-name
+
+### Paper
+
+Title: `Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering`
+
+Abstract: https://arxiv.org/abs/1809.02789
+
+OpenBookQA is a question-answering dataset modeled after open book exams for
+assessing human understanding of a subject. It consists of 5,957 multiple-choice
+elementary-level science questions (4,957 train, 500 dev, 500 test), which probe
+the understanding of a small “book” of 1,326 core science facts and the application
+of these facts to novel situations. For training, the dataset includes a mapping
+from each question to the core science fact it was designed to probe. Answering
+OpenBookQA questions requires additional broad common knowledge, not contained
+in the book. The questions, by design, are answered incorrectly by both a retrieval-
+based algorithm and a word co-occurrence algorithm.
+
+Homepage: https://allenai.org/data/open-book-qa
+
+
+### Citation
+
+```
+BibTeX-formatted citation goes here
+```
+
+### Groups and Tasks
+
+#### Groups
+
+* Not part of a group yet
+
+#### Tasks
+
+* `openbookqa`
+
+### Checklist
+
+For adding novel benchmarks/datasets to the library:
+* [ ] Is the task an existing benchmark in the literature?
+  * [ ] Have you referenced the original paper that introduced the task?
+  * [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
+
+
+If other tasks on this dataset are already supported:
+* [ ] Is the "Main" variant of this task clearly denoted?
+* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
+* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
--- a/lm_eval/tasks/openbookqa/openbookqa.yaml
+++ b/lm_eval/tasks/openbookqa/openbookqa.yaml
-group:
-  - multiple_choice
 task: openbookqa
 dataset_path: openbookqa
 dataset_name: main
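
For readers unfamiliar with the task config, the sketch below shows one plausible reading of the `dataset_path` and `dataset_name` fields kept in the YAML: they are assumed to map onto the first two arguments of Hugging Face `datasets.load_dataset`. That mapping is an assumption about how the harness consumes the config and is not shown in this diff. `TASK_CONFIG`, `README_SPLIT_SIZES`, `load_args`, and `load_openbookqa` are hypothetical names introduced only for illustration.

```python
# Hypothetical illustration, not part of lm_eval.
# Fields copied from the unchanged lines of openbookqa.yaml in this diff.
TASK_CONFIG = {
    "task": "openbookqa",
    "dataset_path": "openbookqa",
    "dataset_name": "main",
}

# Split sizes as stated in the README text added by this commit.
README_SPLIT_SIZES = {"train": 4957, "dev": 500, "test": 500}


def load_args(config):
    """Return (path, name) in the order load_dataset would take them."""
    return config["dataset_path"], config["dataset_name"]


def load_openbookqa(config=TASK_CONFIG):
    """Load the dataset named by the config.

    Assumes the `datasets` package is installed, network access is
    available, and the Hub still resolves this dataset path.
    """
    from datasets import load_dataset  # imported lazily on purpose

    path, name = load_args(config)
    return load_dataset(path, name)
```

Calling `load_openbookqa()` and printing `len(rows)` for each split is one way to compare the downloaded data against the split sizes quoted in the README. The Hub may name the development split differently from the README's "dev".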