gaoqiong / lm-evaluation-harness / Commits
"ts/webui/src/components/common/OpenRow.tsx" did not exist on "dd04c73ab2b03d901330e8516b5f625970adef8c"
Commit cda25fef (unverified)
Authored Jan 02, 2024 by Lintang Sutawika; committed by GitHub on Jan 02, 2024

    Merge branch 'main' into standardize_metrics

Parents: dfb41835, 4d10ad56
Changes: 249 changed files in the full merge commit. This page (1 of 13) shows 20 changed files, with 24 additions and 22 deletions (+24, -22).
Files changed on this page:

  lm_eval/tasks/coqa/default.yaml                        +1  -1
  lm_eval/tasks/coqa/utils.py                            +1  -3
  lm_eval/tasks/crows_pairs/crows_pairs_english.yaml     +1  -1
  lm_eval/tasks/csatqa/_default_csatqa_yaml              +1  -1
  lm_eval/tasks/csatqa/_generate_configs.py              +0  -2
  lm_eval/tasks/drop/default.yaml                        +1  -1
  lm_eval/tasks/drop/utils.py                            +0  -1
  lm_eval/tasks/fld/fld_default.yaml                     +7  -0
  lm_eval/tasks/glue/cola/default.yaml                   +1  -1
  lm_eval/tasks/glue/mnli/default.yaml                   +1  -1
  lm_eval/tasks/glue/mrpc/default.yaml                   +1  -1
  lm_eval/tasks/glue/qnli/default.yaml                   +1  -1
  lm_eval/tasks/glue/qqp/default.yaml                    +1  -1
  lm_eval/tasks/glue/rte/default.yaml                    +1  -1
  lm_eval/tasks/glue/sst2/default.yaml                   +1  -1
  lm_eval/tasks/glue/wnli/default.yaml                   +1  -1
  lm_eval/tasks/gsm8k/gsm8k-cot-self-consistency.yaml    +1  -1
  lm_eval/tasks/gsm8k/gsm8k-cot.yaml                     +1  -1
  lm_eval/tasks/gsm8k/gsm8k.yaml                         +1  -1
  lm_eval/tasks/headqa/headqa_en.yaml                    +1  -1
lm_eval/tasks/coqa/default.yaml

@@ -19,4 +19,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 2.0
+  version: 2.0
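The same one-line change under metadata: recurs in nearly every YAML file on this page. As a quick aside on why the two spellings differ (this is plain YAML semantics, not harness-specific behaviour): the old dash form parses into a list containing a single dict, while the new form parses into a dict, so a lookup like config["metadata"]["version"] only works with the latter.

    import yaml

    old_form = "metadata:\n  - version: 2.0\n"
    new_form = "metadata:\n  version: 2.0\n"

    # Dash form: {'metadata': [{'version': 2.0}]} (a list holding one dict)
    print(yaml.safe_load(old_form))

    # Dict form: {'metadata': {'version': 2.0}}
    print(yaml.safe_load(new_form))

    # Direct key access only works after the change:
    print(yaml.safe_load(new_form)["metadata"]["version"])  # 2.0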
lm_eval/tasks/coqa/utils.py

@@ -7,7 +7,7 @@ def doc_to_text(doc):
     # Given a passage p, the conversation history {q1, a1, . . . qi−1, ai−1}
     # and a question qi, the task is to predict the answer ai
     doc_text = doc["story"] + "\n\n"
-    for (q, a) in zip_longest(
+    for q, a in zip_longest(
         doc["questions"]["input_text"], doc["answers"]["input_text"][:-1]
     ):  # omit target answer ai
         question = f"Q: {q}\n\n"

@@ -17,7 +17,6 @@ def doc_to_text(doc):
 def doc_to_target(doc):
     turn_id = len(doc["questions"]["input_text"])
     # Returns unique answers and valid alternatives (Some questions in CoQA have multiple valid answers).
     answers = []

@@ -71,7 +70,6 @@ def compute_scores(gold_list, pred):
 def process_results(doc, results):
     gold_list = doc_to_target(doc)
     pred = results[0].strip().split("\n")[0]
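For context on the loop touched above: it pairs each stored question with its preceding answer while dropping the final answer, which is the prediction target. A small standalone sketch of that pairing with toy data (the dictionary keys mirror those visible in the diff; everything else is illustrative):

    from itertools import zip_longest

    # Toy CoQA-style document; the last answer is the one the model must predict.
    doc = {
        "story": "Once upon a time there was a princess who lived in a castle.",
        "questions": {"input_text": ["Who lived there?", "Where did she live?"]},
        "answers": {"input_text": ["A princess", "In a castle"]},
    }

    doc_text = doc["story"] + "\n\n"
    # Slicing the answers with [:-1] makes zip_longest pad the last pair with None,
    # so the final question is included in the prompt without its answer.
    for q, a in zip_longest(
        doc["questions"]["input_text"], doc["answers"]["input_text"][:-1]
    ):
        doc_text += f"Q: {q}\n\n"
        doc_text += f"A: {a}\n\n" if a is not None else "A:"

    print(doc_text)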
lm_eval/tasks/crows_pairs/crows_pairs_english.yaml

@@ -20,4 +20,4 @@ metric_list:
     aggregation: mean
     higher_is_better: false
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/csatqa/_default_csatqa_yaml

@@ -14,4 +14,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 0.0
+  version: 0.0
lm_eval/tasks/csatqa/_generate_configs.py

@@ -21,7 +21,6 @@ def parse_args():
 if __name__ == "__main__":
     args = parse_args()
     # get filename of base_yaml so we can `"include": ` it in our other YAMLs.

@@ -30,7 +29,6 @@ if __name__ == "__main__":
         base_yaml = yaml.full_load(f)
     for name in tqdm(SUBSETS):
         yaml_dict = {
             "include": base_yaml_name,
             "task": f"csatqa_{args.task_prefix}_{name}"
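For orientation, the script edited here loads a base YAML and emits one small per-subset config that includes it. A minimal sketch of that generation pattern with placeholder subset names and output paths (the real script takes these from its own SUBSETS constant and CLI arguments):

    import yaml

    # Placeholder values for illustration only.
    SUBSETS = ["WR", "GR", "RCS"]
    base_yaml_name = "_default_csatqa_yaml"
    task_prefix = ""

    for name in SUBSETS:
        yaml_dict = {
            "include": base_yaml_name,  # reuse shared settings from the base config
            "task": f"csatqa_{task_prefix}_{name}" if task_prefix else f"csatqa_{name.lower()}",
        }
        with open(f"csatqa_{name.lower()}.yaml", "w") as f:
            yaml.dump(yaml_dict, f, default_flow_style=False)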
lm_eval/tasks/drop/default.yaml

@@ -21,4 +21,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 2.0
+  version: 2.0
lm_eval/tasks/drop/utils.py

@@ -62,7 +62,6 @@ def parse_answer(answer):
 def process_results(doc, results):
     preds, golds = results, doc["answers"]
     max_em = 0
     max_f1 = 0
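The context lines above initialise max_em and max_f1, which suggests each prediction is scored against every gold answer and the best score is kept. A toy sketch of that max-over-golds idea for exact match only (not the harness's DROP metric, which normalises answers far more carefully):

    def exact_match(pred: str, gold: str) -> int:
        # Deliberately naive normalisation, for illustration only.
        return int(pred.strip().lower() == gold.strip().lower())

    def best_em(pred: str, golds: list) -> int:
        max_em = 0
        for gold in golds:
            max_em = max(max_em, exact_match(pred, gold))
        return max_em

    print(best_em("Four ", ["4", "four"]))  # 1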
lm_eval/tasks/fld/fld_default.yaml

@@ -12,3 +12,10 @@ metric_list:
   - metric: exact_match
     aggregation: mean
     higher_is_better: true
+filter_list:
+  - name: remove_whitespace
+    filter:
+      - function: remove_whitespace
+      - function: take_first
+metadata:
+  version: 1.0
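The added filter_list chains two post-processing steps over each generation before scoring. Roughly speaking (this sketch is not the harness's filter classes), remove_whitespace strips stray leading whitespace from every candidate and take_first keeps a single candidate:

    def remove_whitespace(responses):
        # Drop leading whitespace from each candidate generation.
        return [r.lstrip() for r in responses]

    def take_first(responses):
        # Score only the first remaining candidate.
        return responses[0]

    generations = ["   PROVED", "  DISPROVED"]
    print(take_first(remove_whitespace(generations)))  # "PROVED"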
lm_eval/tasks/glue/cola/default.yaml

@@ -13,4 +13,4 @@ doc_to_decontamination_query: sentence
 metric_list:
   - metric: mcc
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/glue/mnli/default.yaml

@@ -11,4 +11,4 @@ doc_to_choice: ["True", "Neither", "False"]
 metric_list:
   - metric: acc
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/glue/mrpc/default.yaml

@@ -12,4 +12,4 @@ metric_list:
   - metric: acc
   - metric: f1
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/glue/qnli/default.yaml

@@ -11,4 +11,4 @@ doc_to_choice: ["yes", "no"]
 metric_list:
   - metric: acc
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/glue/qqp/default.yaml

@@ -12,4 +12,4 @@ metric_list:
   - metric: acc
   - metric: f1
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/glue/rte/default.yaml

@@ -11,4 +11,4 @@ doc_to_choice: ["True", "False"]
 metric_list:
   - metric: acc
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/glue/sst2/default.yaml

@@ -11,4 +11,4 @@ doc_to_choice: ["negative", "positive"]
 metric_list:
   - metric: acc
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/glue/wnli/default.yaml

@@ -11,4 +11,4 @@ doc_to_choice: ["False", "True"]
 metric_list:
   - metric: acc
 metadata:
-  - version: 2.0
+  version: 2.0
lm_eval/tasks/gsm8k/gsm8k-cot-self-consistency.yaml

@@ -31,4 +31,4 @@ filter_list:
       - function: "majority_vote"
       - function: "take_first"
 metadata:
-  - version: 0.0
+  version: 0.0
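This config's filter chain votes across several sampled chain-of-thought generations, then take_first keeps the winner. A minimal illustration of majority voting over already-extracted answers using collections.Counter (not the harness's implementation):

    from collections import Counter

    def majority_vote(answers):
        # Return the most common extracted answer across repeated samples.
        return Counter(answers).most_common(1)[0][0]

    sampled_answers = ["72", "72", "68", "72"]
    print(majority_vote(sampled_answers))  # "72"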
lm_eval/tasks/gsm8k/gsm8k-cot.yaml

@@ -41,4 +41,4 @@ filter_list:
         regex_pattern: "The answer is (\\-?[0-9\\.\\,]+)."
       - function: "take_first"
 metadata:
-  - version: 0.0
+  version: 0.0
lm_eval/tasks/gsm8k/gsm8k.yaml

@@ -34,4 +34,4 @@ filter_list:
         regex_pattern: "#### (\\-?[0-9\\.\\,]+)"
       - function: "take_first"
 metadata:
-  - version: 1.0
+  version: 1.0
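Both GSM8K configs keep a regex filter that pulls the final numeric answer out of each generation before take_first. A quick check of what the pattern in gsm8k.yaml matches once YAML unescapes the backslashes (plain re here, not the harness's filter class):

    import re

    # "#### (\\-?[0-9\\.\\,]+)" in the YAML becomes this pattern after unescaping.
    pattern = re.compile(r"#### (\-?[0-9\.\,]+)")

    generation = "She has 16 - 3 - 4 = 9 eggs left, so 9 * 2 = 18.\n#### 18"
    match = pattern.search(generation)
    print(match.group(1) if match else None)  # "18"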
lm_eval/tasks/headqa/headqa_en.yaml

@@ -20,4 +20,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 1.0
+  version: 1.0