lm-evaluation-harness · commit 73c80915 (unverified)

Authored Oct 17, 2023 by Hailey Schoelkopf; committed by GitHub, Oct 17, 2023

Merge pull request #923 from EleutherAI/fix_squadv2

[Refactor] Squadv2 updates

Parents: a056eded, a7ba3d76
Showing 3 changed files with 37 additions and 13 deletions:
* lm_eval/tasks/squadv2/README.md (+28, -9)
* lm_eval/tasks/squadv2/_template_yaml (+8, -0)
* lm_eval/tasks/squadv2/no_ans.yaml (+1, -4)
lm_eval/tasks/squadv2/README.md
@@ -2,25 +2,44 @@
 ### Paper

-Title: `paper title goes here`
-Abstract: `link to paper PDF or arXiv abstract goes here`
+Title: `Know What You Don’t Know: Unanswerable Questions for SQuAD`
+
+Abstract: https://arxiv.org/abs/1806.03822

-`Short description of paper / benchmark goes here:`
+Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset,
+consisting of questions posed by crowdworkers on a set of Wikipedia articles,
+where the answer to every question is a segment of text, or span, from the
+corresponding reading passage, or the question might be unanswerable.
+SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable
+questions written adversarially by crowdworkers to look similar to answerable ones.
+To do well on SQuAD2.0, systems must not only answer questions when possible, but
+also determine when no answer is supported by the paragraph and abstain from answering.

-Homepage: `homepage to the benchmark's website goes here, if applicable`
+Homepage: https://rajpurkar.github.io/SQuAD-explorer/

 ### Citation

 ```
-BibTeX-formatted citation goes here
+@misc{rajpurkar2018know,
+    title={Know What You Don't Know: Unanswerable Questions for SQuAD},
+    author={Pranav Rajpurkar and Robin Jia and Percy Liang},
+    year={2018},
+    eprint={1806.03822},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
 ```

-### Subtasks
+### Groups and Tasks

-List or describe tasks defined in this folder, and their names here:
-* `task_name`: `1-sentence description of what this particular task does`
-* `task_name2`: .....
+#### Groups
+
+* `squadv2_complete`: Runs both `squadv2` and `squadv2_noans_loglikelihood`
+
+#### Tasks
+
+* `squadv2`: `Default squadv2 task`
+* `squadv2_noans_loglikelihood`: `Additional task to acquire the probability of the model predicting there is no answer`

 ### Checklist
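The tasks and the group documented above can be run through the harness's evaluator. A minimal sketch using the Python API, assuming the refactored harness's `simple_evaluate` entry point and its `hf` model type; the model named here is only a placeholder:

```python
from lm_eval import evaluator

# Evaluate both SQuADv2 tasks via the `squadv2_complete` group.
# The pretrained model is an arbitrary placeholder choice.
results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["squadv2_complete"],
)

# Per-task metrics are keyed by task name in the results dict.
print(results["results"])
```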
lm_eval/tasks/squadv2/_template_yaml (new file, mode 100644)
dataset_path: squad_v2
training_split: train
validation_split: validation
doc_to_text: "Title: {{title}}\n\nBackground: {{context}}\n\nQuestion: {{question}}\n\n Answer:"
doc_to_target: "{% if answers.text| length > 0 %}{{answers.text}}{% else %}{{['']}}{% endif %}"
target_delimiter: ""
should_decontaminate: true
doc_to_decontamination_query: context
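The `doc_to_target` template above returns the list of gold answer spans, falling back to a list containing a single empty string when a question is unanswerable. A small sketch of how that Jinja2 expression evaluates, rendered directly with the `jinja2` package (the harness's own rendering pipeline may wrap this differently):

```python
from jinja2 import Template

# The doc_to_target expression from _template_yaml above.
TARGET_TEMPLATE = Template(
    "{% if answers.text | length > 0 %}{{answers.text}}{% else %}{{['']}}{% endif %}"
)

# Answerable question: the gold answer spans render as a list.
answerable = {"text": ["Denver Broncos", "Broncos"]}
print(TARGET_TEMPLATE.render(answers=answerable))    # ['Denver Broncos', 'Broncos']

# Unanswerable question: SQuAD2.0 leaves answers.text empty, so the
# else branch yields a list holding one empty string.
unanswerable = {"text": []}
print(TARGET_TEMPLATE.render(answers=unanswerable))  # ['']
```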
lm_eval/tasks/squadv2/no_ans.yaml

@@ -1,9 +1,6 @@
-include: default.yaml
+include: _template_yaml
 task: squadv2_noans_loglikelihood
-dataset_path: squad_v2
 output_type: loglikelihood
-training_split: train
-validation_split: validation
 doc_to_target: " unanswerable"
 metric_list:
   - metric: perplexity
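After this change, `no_ans.yaml` inherits the shared dataset settings from `_template_yaml` instead of duplicating them. A sketch of the effective task configuration once the include is resolved, assuming the usual semantics where the including file's keys override the template's (written out as a plain Python dict purely for illustration; the harness performs this merge internally):

```python
# Effective squadv2_noans_loglikelihood config after resolving
# `include: _template_yaml` (illustrative reconstruction).
effective_config = {
    # Inherited from _template_yaml:
    "dataset_path": "squad_v2",
    "training_split": "train",
    "validation_split": "validation",
    "doc_to_text": "Title: {{title}}\n\nBackground: {{context}}\n\n"
                   "Question: {{question}}\n\n Answer:",
    "target_delimiter": "",
    "should_decontaminate": True,
    "doc_to_decontamination_query": "context",
    # Set (or overridden) by no_ans.yaml:
    "task": "squadv2_noans_loglikelihood",
    "output_type": "loglikelihood",
    "doc_to_target": " unanswerable",
    "metric_list": [{"metric": "perplexity"}],
}
```

With `output_type: loglikelihood`, the rendered `doc_to_text` prompt serves as the context and the fixed string " unanswerable" as the continuation, so the task scores how probable the model finds the no-answer response.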