gaoqiong / lm-evaluation-harness — Commits

Commit da211969 (unverified)
Authored Jun 28, 2024 by Jess; committed by GitHub, Jun 28, 2024

Merge branch 'EleutherAI:main' into main

Parents: 1b97e487, 801322e0
Changes: 654 files in total. Showing 20 changed files with 71 additions and 19 deletions (+71 −19).
lm_eval/tasks/bertaqa/bertaqa_en_mt_latxa-70b-v1.1.yaml                       +4 −0
lm_eval/tasks/bertaqa/bertaqa_en_mt_latxa-70b-v1.yaml                         +4 −0
lm_eval/tasks/bertaqa/bertaqa_en_mt_latxa-7b-v1.1.yaml                        +4 −0
lm_eval/tasks/bertaqa/bertaqa_en_mt_latxa-7b-v1.yaml                          +4 −0
lm_eval/tasks/bertaqa/bertaqa_en_mt_llama-2-13b.yaml                          +4 −0
lm_eval/tasks/bertaqa/bertaqa_en_mt_llama-2-70b.yaml                          +4 −0
lm_eval/tasks/bertaqa/bertaqa_en_mt_llama-2-7b.yaml                           +4 −0
lm_eval/tasks/bertaqa/bertaqa_en_mt_madlad.yaml                               +4 −0
lm_eval/tasks/bertaqa/bertaqa_en_mt_nllb.yaml                                 +4 −0
lm_eval/tasks/bertaqa/bertaqa_eu.yaml                                         +4 −0
lm_eval/tasks/bigbench/generate_tasks.py                                      +25 −1
lm_eval/tasks/bigbench/multiple_choice/abstract_narrative_understanding.yaml  +1 −1
lm_eval/tasks/bigbench/multiple_choice/anachronisms.yaml                      +1 −1
lm_eval/tasks/bigbench/multiple_choice/analogical_similarity.yaml             +1 −1
lm_eval/tasks/bigbench/multiple_choice/analytic_entailment.yaml               +1 −1
lm_eval/tasks/bigbench/multiple_choice/arithmetic.yaml                        +1 −1
lm_eval/tasks/bigbench/multiple_choice/ascii_word_recognition.yaml            +0 −4
lm_eval/tasks/bigbench/multiple_choice/authorship_verification.yaml           +1 −1
lm_eval/tasks/bigbench/multiple_choice/auto_categorization.yaml               +0 −4
lm_eval/tasks/bigbench/multiple_choice/auto_debugging.yaml                    +0 −4
lm_eval/tasks/bertaqa/bertaqa_en_mt_latxa-70b-v1.1.yaml (new file, 0 → 100644)

task: bertaqa_en_mt_latxa-70b-v1.1
include: _bertaqa_template
dataset_name: en_mt_latxa-70b-v1.1
doc_to_text: "Question: {{question}}\nA: {{candidates[0]}}\nB: {{candidates[1]}}\nC: {{candidates[2]}}\nAnswer:"

lm_eval/tasks/bertaqa/bertaqa_en_mt_latxa-70b-v1.yaml (new file, 0 → 100644)

task: bertaqa_en_mt_latxa-70b-v1
include: _bertaqa_template
dataset_name: en_mt_latxa-70b-v1
doc_to_text: "Question: {{question}}\nA: {{candidates[0]}}\nB: {{candidates[1]}}\nC: {{candidates[2]}}\nAnswer:"

lm_eval/tasks/bertaqa/bertaqa_en_mt_latxa-7b-v1.1.yaml (new file, 0 → 100644)

task: bertaqa_en_mt_latxa-7b-v1.1
include: _bertaqa_template
dataset_name: en_mt_latxa-7b-v1.1
doc_to_text: "Question: {{question}}\nA: {{candidates[0]}}\nB: {{candidates[1]}}\nC: {{candidates[2]}}\nAnswer:"

lm_eval/tasks/bertaqa/bertaqa_en_mt_latxa-7b-v1.yaml (new file, 0 → 100644)

task: bertaqa_en_mt_latxa-7b-v1
include: _bertaqa_template
dataset_name: en_mt_latxa-7b-v1
doc_to_text: "Question: {{question}}\nA: {{candidates[0]}}\nB: {{candidates[1]}}\nC: {{candidates[2]}}\nAnswer:"

lm_eval/tasks/bertaqa/bertaqa_en_mt_llama-2-13b.yaml (new file, 0 → 100644)

task: bertaqa_en_mt_llama-2-13b
include: _bertaqa_template
dataset_name: en_mt_llama-2-13b
doc_to_text: "Question: {{question}}\nA: {{candidates[0]}}\nB: {{candidates[1]}}\nC: {{candidates[2]}}\nAnswer:"

lm_eval/tasks/bertaqa/bertaqa_en_mt_llama-2-70b.yaml (new file, 0 → 100644)

task: bertaqa_en_mt_llama-2-70b
include: _bertaqa_template
dataset_name: en_mt_llama-2-70b
doc_to_text: "Question: {{question}}\nA: {{candidates[0]}}\nB: {{candidates[1]}}\nC: {{candidates[2]}}\nAnswer:"

lm_eval/tasks/bertaqa/bertaqa_en_mt_llama-2-7b.yaml (new file, 0 → 100644)

task: bertaqa_en_mt_llama-2-7b
include: _bertaqa_template
dataset_name: en_mt_llama-2-7b
doc_to_text: "Question: {{question}}\nA: {{candidates[0]}}\nB: {{candidates[1]}}\nC: {{candidates[2]}}\nAnswer:"

lm_eval/tasks/bertaqa/bertaqa_en_mt_madlad.yaml (new file, 0 → 100644)

task: bertaqa_en_mt_madlad
include: _bertaqa_template
dataset_name: en_mt_madlad
doc_to_text: "Question: {{question}}\nA: {{candidates[0]}}\nB: {{candidates[1]}}\nC: {{candidates[2]}}\nAnswer:"

lm_eval/tasks/bertaqa/bertaqa_en_mt_nllb.yaml (new file, 0 → 100644)

task: bertaqa_en_mt_nllb
include: _bertaqa_template
dataset_name: en_mt_nllb
doc_to_text: "Question: {{question}}\nA: {{candidates[0]}}\nB: {{candidates[1]}}\nC: {{candidates[2]}}\nAnswer:"

lm_eval/tasks/bertaqa/bertaqa_eu.yaml (new file, 0 → 100644)

task: bertaqa_eu
include: _bertaqa_template
dataset_name: eu
doc_to_text: "Galdera: {{question}}\nA: {{candidates[0]}}\nB: {{candidates[1]}}\nC: {{candidates[2]}}\nErantzuna:"
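The ten `en_mt` configs above are identical except for the model suffix in `task` and `dataset_name`. A minimal sketch of generating them programmatically, in the same spirit as the repository's `generate_tasks.py`; the suffix list comes from the diff above, while `render_config` is a hypothetical helper, not part of the PR:

```python
# Sketch: generate the ten near-identical BertaQA en_mt task configs.
# The suffixes are taken from the changed-file list above; the doc_to_text
# string mirrors the one shown in the diff (double-backslash keeps the
# literal "\n" escapes that YAML double-quoting expects).

SUFFIXES = [
    "latxa-70b-v1.1", "latxa-70b-v1", "latxa-7b-v1.1", "latxa-7b-v1",
    "llama-2-13b", "llama-2-70b", "llama-2-7b", "madlad", "nllb",
]

DOC_TO_TEXT = (
    "Question: {{question}}\\nA: {{candidates[0]}}\\n"
    "B: {{candidates[1]}}\\nC: {{candidates[2]}}\\nAnswer:"
)


def render_config(suffix: str) -> str:
    """Return the YAML text for one en_mt task variant (hypothetical helper)."""
    return (
        f"task: bertaqa_en_mt_{suffix}\n"
        f"include: _bertaqa_template\n"
        f"dataset_name: en_mt_{suffix}\n"
        f'doc_to_text: "{DOC_TO_TEXT}"\n'
    )
```

Writing each rendered string to `lm_eval/tasks/bertaqa/bertaqa_en_mt_<suffix>.yaml` would reproduce the files added in this commit.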
lm_eval/tasks/bigbench/generate_tasks.py

 import os

 import datasets
 import yaml
...
@@ -173,6 +174,11 @@ all_subtasks = [
     "word_unscrambling",
 ]

+skip_tasks = [
+    "simple_arithmetic_json_multiple_choice",
+    "simple_arithmetic_multiple_targets_json",
+]
+

 def main() -> None:
     for path, task_type in zip(
...
@@ -183,11 +189,29 @@ def main() -> None:
     for task in all_subtasks:
         file_name = f"{task}.yaml"
         try:
+            template_file = task_type
+            if path == "multiple_choice":
+                print(f"Checking {task} for multiple choices")
+                if task in skip_tasks:
+                    continue
+                data = datasets.load_dataset("hails/bigbench", task + "_zero_shot")
+                multiple_choice_targets = data["default"][0]["multiple_choice_targets"]
+                if len(multiple_choice_targets) == 0:
+                    continue
+                else:
+                    template_file = "multiple_choice_template_b_yaml"
+                    if set(data["default"][0]["targets"]) < set(multiple_choice_targets):
+                        template_file = "multiple_choice_template_a_yaml"
             with open(f"{path}/{file_name}", "w", encoding="utf-8") as f:
                 f.write("# Generated by utils.py\n")
                 yaml.dump(
                     {
-                        "include": f"../{task_type}",
+                        "include": f"../{template_file}",
                         "task": "bigbench_"
                         + task
                         + "_{}".format(task_type.split("_template_yaml")[0]),
...
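The branch added above chooses between two multiple-choice templates: template "a" when the gold `targets` form a proper subset of `multiple_choice_targets`, template "b" otherwise, skipping tasks with no choices at all. A self-contained sketch of that selection rule (the function name `pick_template` is illustrative, not from the PR):

```python
# Sketch of the template-selection rule introduced in generate_tasks.py.
# "<" on sets is Python's proper-subset test: every gold target must appear
# among the choices, and the choices must contain at least one extra option.

def pick_template(targets, multiple_choice_targets):
    """Return the template filename the generator would choose, or None to skip."""
    if len(multiple_choice_targets) == 0:
        return None  # no choices recorded; the generator skips this task
    if set(targets) < set(multiple_choice_targets):
        return "multiple_choice_template_a_yaml"
    return "multiple_choice_template_b_yaml"
```

For example, a task whose gold answer is "yes" with options ["yes", "no"] gets template "a", while a task whose targets already cover every option falls through to template "b".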
lm_eval/tasks/bigbench/multiple_choice/abstract_narrative_understanding.yaml

 # Generated by utils.py
 dataset_name: abstract_narrative_understanding_zero_shot
-include: ../multiple_choice_template_yaml
+include: ../multiple_choice_template_a_yaml
 task: bigbench_abstract_narrative_understanding_multiple_choice

lm_eval/tasks/bigbench/multiple_choice/anachronisms.yaml

 # Generated by utils.py
 dataset_name: anachronisms_zero_shot
-include: ../multiple_choice_template_yaml
+include: ../multiple_choice_template_a_yaml
 task: bigbench_anachronisms_multiple_choice

lm_eval/tasks/bigbench/multiple_choice/analogical_similarity.yaml

 # Generated by utils.py
 dataset_name: analogical_similarity_zero_shot
-include: ../multiple_choice_template_yaml
+include: ../multiple_choice_template_a_yaml
 task: bigbench_analogical_similarity_multiple_choice

lm_eval/tasks/bigbench/multiple_choice/analytic_entailment.yaml

 # Generated by utils.py
 dataset_name: analytic_entailment_zero_shot
-include: ../multiple_choice_template_yaml
+include: ../multiple_choice_template_a_yaml
 task: bigbench_analytic_entailment_multiple_choice

lm_eval/tasks/bigbench/multiple_choice/arithmetic.yaml

 # Generated by utils.py
 dataset_name: arithmetic_zero_shot
-include: ../multiple_choice_template_yaml
+include: ../multiple_choice_template_a_yaml
 task: bigbench_arithmetic_multiple_choice

lm_eval/tasks/bigbench/multiple_choice/ascii_word_recognition.yaml (deleted, 100644 → 0)

-# Generated by utils.py
-dataset_name: ascii_word_recognition_zero_shot
-include: ../multiple_choice_template_yaml
-task: bigbench_ascii_word_recognition_multiple_choice

lm_eval/tasks/bigbench/multiple_choice/authorship_verification.yaml

 # Generated by utils.py
 dataset_name: authorship_verification_zero_shot
-include: ../multiple_choice_template_yaml
+include: ../multiple_choice_template_a_yaml
 task: bigbench_authorship_verification_multiple_choice

lm_eval/tasks/bigbench/multiple_choice/auto_categorization.yaml (deleted, 100644 → 0)

-# Generated by utils.py
-dataset_name: auto_categorization_zero_shot
-include: ../multiple_choice_template_yaml
-task: bigbench_auto_categorization_multiple_choice

lm_eval/tasks/bigbench/multiple_choice/auto_debugging.yaml (deleted, 100644 → 0)

-# Generated by utils.py
-dataset_name: auto_debugging_zero_shot
-include: ../multiple_choice_template_yaml
-task: bigbench_auto_debugging_multiple_choice