gaoqiong / lm-evaluation-harness
Commit 51519e40, authored Jun 25, 2024 by haileyschoelkopf
add many explicit group configs
parent 44a602ab
Changes: 33
Showing 13 changed files with 12 additions and 27 deletions (+12 -27)
lm_eval/tasks/agieval/lsat-rc.yaml (+0 -4)
lm_eval/tasks/agieval/math.yaml (+0 -3)
lm_eval/tasks/agieval/sat-en-without-passage.yaml (+0 -4)
lm_eval/tasks/agieval/sat-en.yaml (+0 -4)
lm_eval/tasks/agieval/sat-math.yaml (+0 -4)
lm_eval/tasks/bbh/cot_fewshot/_bbh.yaml (+1 -1)
lm_eval/tasks/bbh/cot_fewshot/_bbh_cot_fewshot.yaml (+1 -1)
lm_eval/tasks/bbh/cot_zeroshot/_bbh_cot_zeroshot.yaml (+1 -1)
lm_eval/tasks/bbh/fewshot/_bbh_fewshot.yaml (+1 -1)
lm_eval/tasks/bbh/zeroshot/_bbh_zeroshot.yaml (+1 -1)
lm_eval/tasks/storycloze/storycloze_2016.yaml (+1 -1)
lm_eval/tasks/storycloze/storycloze_2018.yaml (+1 -1)
lm_eval/tasks/super_glue/README.md (+5 -1)

lm_eval/tasks/agieval/lsat-rc.yaml

     include: aqua-rat.yaml
    -group:
    -  - agieval
    -  - agieval_nous
    -  - agieval_en
     task: agieval_lsat_rc
     dataset_path: hails/agieval-lsat-rc

lm_eval/tasks/agieval/math.yaml

    -group:
    -  - agieval
    -  - agieval_en
     task: agieval_math
     dataset_path: hails/agieval-math
     dataset_name: null
     ...

lm_eval/tasks/agieval/sat-en-without-passage.yaml

     include: aqua-rat.yaml
    -group:
    -  - agieval
    -  - agieval_nous
    -  - agieval_en
     task: agieval_sat_en_without_passage
     dataset_path: hails/agieval-sat-en-without-passage

lm_eval/tasks/agieval/sat-en.yaml

     include: aqua-rat.yaml
    -group:
    -  - agieval
    -  - agieval_nous
    -  - agieval_en
     task: agieval_sat_en
     dataset_path: hails/agieval-sat-en

lm_eval/tasks/agieval/sat-math.yaml

     include: aqua-rat.yaml
    -group:
    -  - agieval
    -  - agieval_nous
    -  - agieval_en
     task: agieval_sat_math
     dataset_path: hails/agieval-sat-math

lm_eval/tasks/bbh/cot_fewshot/_bbh.yaml

    @@ -27,7 +27,7 @@ task:
       - bbh_cot_fewshot_tracking_shuffled_objects_three_objects
       - bbh_cot_fewshot_web_of_lies
       - bbh_cot_fewshot_word_sorting
    -aggregate_metric:
    +aggregate_metric_list:
       - metric: exact_match
         aggregation: mean
         weight_by_size: true
     ...

lm_eval/tasks/bbh/cot_fewshot/_bbh_cot_fewshot.yaml

    @@ -27,7 +27,7 @@ task:
       - bbh_cot_fewshot_tracking_shuffled_objects_three_objects
       - bbh_cot_fewshot_web_of_lies
       - bbh_cot_fewshot_word_sorting
    -aggregate_metric:
    +aggregate_metric_list:
       - metric: exact_match
         aggregation: mean
         weight_by_size: true
     ...

lm_eval/tasks/bbh/cot_zeroshot/_bbh_cot_zeroshot.yaml

    @@ -27,7 +27,7 @@ task:
       - bbh_cot_zeroshot_tracking_shuffled_objects_three_objects
       - bbh_cot_zeroshot_web_of_lies
       - bbh_cot_zeroshot_word_sorting
    -aggregate_metric:
    +aggregate_metric_list:
       - metric: exact_match
         aggregation: mean
         weight_by_size: true
     ...

lm_eval/tasks/bbh/fewshot/_bbh_fewshot.yaml

    @@ -27,7 +27,7 @@ task:
       - bbh_fewshot_tracking_shuffled_objects_three_objects
       - bbh_fewshot_web_of_lies
       - bbh_fewshot_word_sorting
    -aggregate_metric:
    +aggregate_metric_list:
       - metric: exact_match
         aggregation: mean
         weight_by_size: true
     ...

lm_eval/tasks/bbh/zeroshot/_bbh_zeroshot.yaml

    @@ -27,7 +27,7 @@ task:
       - bbh_zeroshot_tracking_shuffled_objects_three_objects
       - bbh_zeroshot_web_of_lies
       - bbh_zeroshot_word_sorting
    -aggregate_metric:
    +aggregate_metric_list:
       - metric: exact_match
         aggregation: mean
         weight_by_size: true
     ...
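
The BBH group configs above ask for a `mean` aggregation of `exact_match` with `weight_by_size: true`. As a rough illustration of what a size-weighted mean does compared with a plain mean, here is a minimal sketch. It assumes that `weight_by_size` weights each subtask's score by its number of evaluated examples; the function names, scores, and sizes are hypothetical and are not taken from the harness or from this commit.

```python
from typing import Sequence


def size_weighted_mean(scores: Sequence[float], sizes: Sequence[int]) -> float:
    """Average per-subtask scores, weighting each by its example count.

    Illustrative only: assumes this is what `weight_by_size: true` requests.
    """
    if len(scores) != len(sizes):
        raise ValueError("scores and sizes must have the same length")
    total = sum(sizes)
    if total == 0:
        raise ValueError("at least one subtask must have examples")
    return sum(score * n for score, n in zip(scores, sizes)) / total


def unweighted_mean(scores: Sequence[float]) -> float:
    """Plain mean: every subtask counts equally, whatever its size."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(scores) / len(scores)


# Hypothetical exact_match scores and example counts for two subtasks.
example_scores = [0.5, 0.25]
example_sizes = [100, 300]

weighted = size_weighted_mean(example_scores, example_sizes)
plain = unweighted_mean(example_scores)
print(f"weighted={weighted} plain={plain}")
```

With these made-up numbers the larger subtask pulls the weighted mean toward its own score, which is the practical difference between the two settings.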

lm_eval/tasks/storycloze/storycloze_2016.yaml

    -group: storycloze
    +tag: storycloze
     task: storycloze_2016
     dataset_path: story_cloze
     dataset_name: 2016
     ...

lm_eval/tasks/storycloze/storycloze_2018.yaml

    -group: storycloze
    +tag: storycloze
     task: storycloze_2018
     dataset_path: story_cloze
     dataset_name: 2018
     ...

lm_eval/tasks/super_glue/README.md

    @@ -26,10 +26,14 @@ Homepage: https://super.gluebenchmark.com/
     }
     ```
     
    -### Groups and Tasks
    +### Groups, Tags, and Tasks
     
     #### Groups
     
    +None.
    +
    +#### Tags
    +
     * `super-glue-lm-eval-v1`: SuperGLUE eval adapted from LM Eval V1
     * `super-glue-t5-prompt`: SuperGLUE prompt and evaluation that matches the T5 paper (if using accelerate, will error if record is included.)
     ...