gaoqiong / lm-evaluation-harness · Commits

Commit dd59b0ef · authored Oct 06, 2021 by Jonathan Tow · parent 17c47812

Adhere to spacing convention

Showing 1 changed file with 6 additions and 6 deletions:
lm_eval/tasks/truthfulqa.py (+6 / -6)
@@ -4,9 +4,9 @@ https://arxiv.org/pdf/2109.07958.pdf
 TODO: Add support for the automatic metrics, 'GPT-judge' and 'GPT-info', which
 predict human evaluation of truth and informativeness (respectively) through
 a fine-tuned GPT-3 model. NOTE: This requires access keys to the corresponding
 OpenAI Completion engines (which the authors obviously do not expose). They do
 provide the data used to fine-tune GPT-3 into `GPT-judge` and `GPT-info`, see
 https://github.com/sylinrl/TruthfulQA#Fine-tuning-GPT-3-for-evaluation. Maybe
 we could try this?
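The metrics in this TODO work by asking a fine-tuned GPT-3 "judge" whether a generated answer is truthful (or informative). A rough sketch of what such a call could look like with the legacy OpenAI Completion API is below; the engine name is a placeholder, and the "Q:/A:/True:" prompt format is assumed from the fine-tuning recipe in the linked TruthfulQA repository, not something this file implements.

    import openai

    def gpt_judge_truthful(question, answer, engine="curie:ft-your-org:gpt-judge"):
        # Placeholder engine: a real run needs a Completion engine fine-tuned on
        # the released GPT-judge data ("Q: ...\nA: ...\nTrue:" -> " yes" / " no").
        prompt = "Q: " + question + "\nA: " + answer + "\nTrue:"
        resp = openai.Completion.create(
            engine=engine,
            prompt=prompt,
            max_tokens=1,
            temperature=0.0,
        )
        return resp["choices"][0]["text"].strip().lower() == "yes"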
@@ -80,10 +80,10 @@ class TruthfulQAMultipleChoice(Task):
         raise NotImplementedError()

     def doc_to_text(self, doc):
-        return QA_PROMPT + "\n\nQ: " + doc['question'] + "\nA: "
+        return QA_PROMPT + "\n\nQ: " + doc['question'] + "\nA:"

     def doc_to_target(self, doc):
-        return ""
+        return " "

     def fewshot_context(self, doc, num_fewshot, provide_description, rnd):
         assert num_fewshot == 0, "TruthfulQA is intended only for the zero-shot setting."
@@ -198,7 +198,7 @@ class TruthfulQAGeneration(Task):
             correct_answers.append("I have no comment.")
         incorrect_answers = self._split_multi_answer(doc['Incorrect Answers'])
         doc = {
-            'question': doc['Question'],
+            'question': doc['Question'].strip(),
             'correct_answers': correct_answers,
             'incorrect_answers': incorrect_answers
         }
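For context on the unchanged lines here, _split_multi_answer is the helper defined elsewhere in this file that turns the dataset's semicolon-separated answer fields into a list of answer strings. A rough, assumed sketch of that kind of helper (not the file's actual definition):

    def split_multi_answer(ans, sep=";"):
        # Turn "A; B; C" into ["A.", "B.", "C."]: split on the separator,
        # strip whitespace, drop empty pieces, and end each answer with a period.
        answers = []
        for a in ans.strip().split(sep):
            a = a.strip()
            if a:
                answers.append(a if a.endswith(".") else a + ".")
        return answers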
@@ -211,7 +211,7 @@ class TruthfulQAGeneration(Task):
         return QA_PROMPT + "\n\nQ: " + doc['question']

     def doc_to_target(self, doc):
-        return ""
+        return " "

     def fewshot_context(self, doc, num_fewshot, provide_description, rnd):
         assert num_fewshot == 0, "TruthfulQA is intended only for the zero-shot setting."
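Both task classes keep the same guard in fewshot_context, so TruthfulQA can only be evaluated zero-shot. An illustrative call, where task stands for an already-instantiated TruthfulQAMultipleChoice and doc for one of its documents:

    import random

    rnd = random.Random(42)
    ctx = task.fewshot_context(doc, num_fewshot=0, provide_description=False, rnd=rnd)   # fine
    task.fewshot_context(doc, num_fewshot=1, provide_description=False, rnd=rnd)         # raises AssertionError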