lm-evaluation-harness · commit 73c80915 (unverified)

Authored Oct 17, 2023 by Hailey Schoelkopf; committed by GitHub, Oct 17, 2023

Merge pull request #923 from EleutherAI/fix_squadv2

[Refactor] Squadv2 updates

Parents: a056eded, a7ba3d76
Showing 3 changed files with 37 additions and 13 deletions:
* lm_eval/tasks/squadv2/README.md (+28, -9)
* lm_eval/tasks/squadv2/_template_yaml (+8, -0)
* lm_eval/tasks/squadv2/no_ans.yaml (+1, -4)
lm_eval/tasks/squadv2/README.md
@@ -2,25 +2,44 @@
 ### Paper

-Title: `paper title goes here`
-Abstract: `link to paper PDF or arXiv abstract goes here`
+Title: `Know What You Don’t Know: Unanswerable Questions for SQuAD`
+
+Abstract: https://arxiv.org/abs/1806.03822

-`Short description of paper / benchmark goes here:`
+Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset,
+consisting of questions posed by crowdworkers on a set of Wikipedia articles,
+where the answer to every question is a segment of text, or span, from the
+corresponding reading passage, or the question might be unanswerable.
+SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable
+questions written adversarially by crowdworkers to look similar to answerable ones.
+To do well on SQuAD2.0, systems must not only answer questions when possible, but
+also determine when no answer is supported by the paragraph and abstain from answering.

-Homepage: `homepage to the benchmark's website goes here, if applicable`
+Homepage: https://rajpurkar.github.io/SQuAD-explorer/

 ### Citation

 ```
-BibTeX-formatted citation goes here
+@misc{rajpurkar2018know,
+    title={Know What You Don't Know: Unanswerable Questions for SQuAD},
+    author={Pranav Rajpurkar and Robin Jia and Percy Liang},
+    year={2018},
+    eprint={1806.03822},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
 ```

-### Subtasks
+### Groups and Tasks

-List or describe tasks defined in this folder, and their names here:
-* `task_name`: `1-sentence description of what this particular task does`
-* `task_name2`: .....
+#### Groups
+
+* `squadv2_complete`: Runs both `squadv2` and `squadv2_noans_loglikelihood`
+
+#### Tasks
+
+* `squadv2`: `Default squadv2 task`
+* `squadv2_noans_loglikelihood`: `Additional task to acquire the probability of the model predicting there is no answer`

 ### Checklist
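The tasks and the group documented above can be run through the harness's evaluator. A minimal sketch using the Python API, assuming the refactored harness's `simple_evaluate` entry point and its `hf` model type; the model named here is only a placeholder:

```python
from lm_eval import evaluator

# Evaluate both SQuADv2 tasks via the `squadv2_complete` group.
# The pretrained model is an arbitrary placeholder choice.
results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["squadv2_complete"],
)

# Per-task metrics are keyed by task name in the results dict.
print(results["results"])
```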
lm_eval/tasks/squadv2/_template_yaml (new file, mode 100644)
dataset_path: squad_v2
training_split: train
validation_split: validation
doc_to_text: "Title: {{title}}\n\nBackground: {{context}}\n\nQuestion: {{question}}\n\n Answer:"
doc_to_target: "{% if answers.text| length > 0 %}{{answers.text}}{% else %}{{['']}}{% endif %}"
target_delimiter: ""
should_decontaminate: true
doc_to_decontamination_query: context
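The `doc_to_target` template above returns the list of gold answer spans, falling back to a list containing a single empty string when a question is unanswerable. A small sketch of how that Jinja2 expression evaluates, rendered directly with the `jinja2` package (the harness's own rendering pipeline may wrap this differently):

```python
from jinja2 import Template

# The doc_to_target expression from _template_yaml above.
TARGET_TEMPLATE = Template(
    "{% if answers.text | length > 0 %}{{answers.text}}{% else %}{{['']}}{% endif %}"
)

# Answerable question: the gold answer spans render as a list.
answerable = {"text": ["Denver Broncos", "Broncos"]}
print(TARGET_TEMPLATE.render(answers=answerable))    # ['Denver Broncos', 'Broncos']

# Unanswerable question: SQuAD2.0 leaves answers.text empty, so the
# else branch yields a list holding one empty string.
unanswerable = {"text": []}
print(TARGET_TEMPLATE.render(answers=unanswerable))  # ['']
```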
lm_eval/tasks/squadv2/no_ans.yaml

@@ -1,9 +1,6 @@
-include: default.yaml
+include: _template_yaml
 task: squadv2_noans_loglikelihood
-dataset_path: squad_v2
 output_type: loglikelihood
-training_split: train
-validation_split: validation
 doc_to_target: " unanswerable"
 metric_list:
   - metric: perplexity
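After this change, `no_ans.yaml` inherits the shared dataset settings from `_template_yaml` instead of duplicating them. A sketch of the effective task configuration once the include is resolved, assuming the usual semantics where the including file's keys override the template's (written out as a plain Python dict purely for illustration; the harness performs this merge internally):

```python
# Effective squadv2_noans_loglikelihood config after resolving
# `include: _template_yaml` (illustrative reconstruction).
effective_config = {
    # Inherited from _template_yaml:
    "dataset_path": "squad_v2",
    "training_split": "train",
    "validation_split": "validation",
    "doc_to_text": "Title: {{title}}\n\nBackground: {{context}}\n\n"
                   "Question: {{question}}\n\n Answer:",
    "target_delimiter": "",
    "should_decontaminate": True,
    "doc_to_decontamination_query": "context",
    # Set (or overridden) by no_ans.yaml:
    "task": "squadv2_noans_loglikelihood",
    "output_type": "loglikelihood",
    "doc_to_target": " unanswerable",
    "metric_list": [{"metric": "perplexity"}],
}
```

With `output_type: loglikelihood`, the rendered `doc_to_text` prompt serves as the context and the fixed string " unanswerable" as the continuation, so the task scores how probable the model finds the no-answer response.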