gaoqiong/lm-evaluation-harness, commit a2af2101 (unverified)
Merge branch 'EleutherAI:main' into main
Authored Jul 12, 2024 by Yen-Ting Lin; committed by GitHub on Jul 12, 2024.
Parents: 82cb25c1, d5f39bf8
Changes: 1000
Showing 20 changed files with 75 additions and 22 deletions.
lm_eval/tasks/arc_mt/arc_challenge_mt_hu.yaml (+3 -0)
lm_eval/tasks/arc_mt/arc_challenge_mt_is.yaml (+22 -0)
lm_eval/tasks/arc_mt/arc_challenge_mt_it.yaml (+3 -0)
lm_eval/tasks/arc_mt/arc_challenge_mt_nb.yaml (+3 -0)
lm_eval/tasks/arc_mt/arc_challenge_mt_pl.yaml (+3 -0)
lm_eval/tasks/arc_mt/arc_challenge_mt_pt.yaml (+3 -0)
lm_eval/tasks/arc_mt/arc_challenge_mt_sv.yaml (+3 -0)
lm_eval/tasks/arithmetic/README.md (+2 -2)
lm_eval/tasks/arithmetic/arithmetic_1dc.yaml (+1 -1)
lm_eval/tasks/asdiv/README.md (+1 -1)
lm_eval/tasks/babi/README.md (+5 -1)
lm_eval/tasks/basqueglue/README.md (+8 -4)
lm_eval/tasks/basqueglue/bec.yaml (+2 -2)
lm_eval/tasks/basqueglue/bhtc.yaml (+2 -2)
lm_eval/tasks/basqueglue/coref.yaml (+2 -2)
lm_eval/tasks/basqueglue/qnli.yaml (+2 -2)
lm_eval/tasks/basqueglue/vaxx.yaml (+2 -2)
lm_eval/tasks/basqueglue/wic.yaml (+2 -2)
lm_eval/tasks/bbh/README.md (+5 -1)
lm_eval/tasks/bbh/_generate_configs.py (+1 -0)
Too many changes to show. To preserve performance, only 1000 of 1000+ files are displayed.
lm_eval/tasks/arc_mt/arc_challenge_mt_hu.yaml (new file, mode 100644)
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_hu
dataset_name: hu
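Each of these per-language files is a thin override: `include` pulls in `arc_challenge_mt_fi.yaml`, and the keys listed after it replace the included values. The sketch below models that as a shallow dictionary update in which the including file wins. This is an assumption about how the config loader behaves, not lm-eval's actual code, and `fi_config` is a hypothetical stand-in because the contents of `arc_challenge_mt_fi.yaml` are not part of this diff.

```python
from typing import Any, Dict


def resolve_include(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow-merge an included config with the file that includes it.

    Assumption: keys in `override` replace keys in `base`; nested values
    are not merged recursively.
    """
    merged = dict(base)  # copy so the base config is not mutated
    for key, value in override.items():
        if key != "include":  # the include directive itself is not a task field
            merged[key] = value
    return merged


# Hypothetical stand-in for arc_challenge_mt_fi.yaml (its real contents are not shown).
fi_config = {
    "task": "arc_challenge_mt_fi",
    "dataset_name": "fi",
    "output_type": "multiple_choice",
}

# The keys from arc_challenge_mt_hu.yaml as shown above.
hu_file = {
    "include": "arc_challenge_mt_fi.yaml",
    "task": "arc_challenge_mt_hu",
    "dataset_name": "hu",
}

hu_config = resolve_include(fi_config, hu_file)
print(hu_config)
```

The `it`, `nb`, `pl`, `pt`, and `sv` files in this commit follow the same pattern and differ only in `task` and `dataset_name`.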
lm_eval/tasks/arc_mt/arc_challenge_mt_is.yaml (new file, mode 100644)
group:
  - arc_challenge_mt
task: arc_challenge_mt_is
dataset_path: mideind/icelandic-arc-challenge
output_type: multiple_choice
training_split: train
validation_split: validation
test_split: test
doc_to_text: "Question: {{question}}\nAnswer:"
doc_to_target: "{{choices.label.index(answerKey)}}"
doc_to_choice: "{{choices.text}}"
should_decontaminate: true
doc_to_decontamination_query: "Question: {{question}}\nAnswer:"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
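The Jinja templates in this config turn one dataset record into a prompt, a gold index, and a list of answer strings. The sketch below reproduces those three expressions in plain Python for a hypothetical ARC-style record (not taken from mideind/icelandic-arc-challenge), then shows how `acc` and `acc_norm` could pick an answer from per-choice log-likelihoods. The log-likelihood values are made-up placeholders, and normalizing by each choice's character length is an assumption about how `acc_norm` is computed.

```python
from typing import Dict, List


def doc_to_text(doc: Dict) -> str:
    # Mirrors the template "Question: {{question}}\nAnswer:"
    return f"Question: {doc['question']}\nAnswer:"


def doc_to_target(doc: Dict) -> int:
    # Mirrors "{{choices.label.index(answerKey)}}": position of the gold label
    return doc["choices"]["label"].index(doc["answerKey"])


def doc_to_choice(doc: Dict) -> List[str]:
    # Mirrors "{{choices.text}}": the candidate answer strings
    return doc["choices"]["text"]


def argmax(values: List[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def acc_and_acc_norm(loglikelihoods: List[float], choices: List[str], gold: int) -> Dict[str, float]:
    # acc: pick the choice with the highest raw log-likelihood
    acc = 1.0 if argmax(loglikelihoods) == gold else 0.0
    # acc_norm (assumed): divide each log-likelihood by the choice's character length
    normalized = [ll / len(text) for ll, text in zip(loglikelihoods, choices)]
    acc_norm = 1.0 if argmax(normalized) == gold else 0.0
    return {"acc": acc, "acc_norm": acc_norm}


# Hypothetical record in the shape the templates expect.
doc = {
    "question": "Which gas do plants absorb from the air?",
    "choices": {
        "text": ["Oxygen", "Carbon dioxide", "Helium", "Neon"],
        "label": ["A", "B", "C", "D"],
    },
    "answerKey": "B",
}

print(doc_to_text(doc))
gold = doc_to_target(doc)
lls = [-4.0, -6.5, -7.0, -5.0]  # placeholder scores, not model output
print(acc_and_acc_norm(lls, doc_to_choice(doc), gold))
```

With these placeholders the raw scores favor the short "Oxygen", while length normalization favors the longer gold answer, which is the kind of bias `acc_norm` is meant to offset.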
lm_eval/tasks/arc_mt/arc_challenge_mt_it.yaml (new file, mode 100644)
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_it
dataset_name: it
lm_eval/tasks/arc_mt/arc_challenge_mt_nb.yaml (new file, mode 100644)
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_nb
dataset_name: nb
lm_eval/tasks/arc_mt/arc_challenge_mt_pl.yaml (new file, mode 100644)
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_pl
dataset_name: pl
lm_eval/tasks/arc_mt/arc_challenge_mt_pt.yaml (new file, mode 100644)
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_pt
dataset_name: pt
lm_eval/tasks/arc_mt/arc_challenge_mt_sv.yaml (new file, mode 100644)
include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_sv
dataset_name: sv
lm_eval/tasks/arithmetic/README.md
@@ -27,9 +27,9 @@ Homepage: https://github.com/openai/gpt-3/tree/master/data
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
-#### Groups
+#### Tags
 * `arithmetic`: Evaluates `1dc` to `5ds`
...
lm_eval/tasks/arithmetic/arithmetic_1dc.yaml
-group:
+tag:
   - arithmetic
 task: arithmetic_1dc
 dataset_path: EleutherAI/arithmetic
...
lm_eval/tasks/asdiv/README.md
@@ -32,7 +32,7 @@ Homepage: https://github.com/chaochun/nlu-asdiv-dataset
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
...
lm_eval/tasks/babi/README.md
@@ -21,12 +21,16 @@ Homepage: https://github.com/facebookarchive/bAbI-tasks
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
 * Not part of a group yet
+#### Tags
+* No tags applied.
 #### Tasks
 * `babi`
...
lm_eval/tasks/basqueglue/README.md
@@ -43,20 +43,24 @@ Homepage: `https://github.com/hitz-zentroa/latxa`
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
-* `basque-glue`: First version of the implementation
+None.
+#### Tags
+* `basque-glue`: First version of the implementation. Calls all subtasks, but does not average.
 #### Tasks
 * `bhtc_v2`: Topic classification of news extracts with 12 categories.
-* `bec`: Sentiment analysis on tweets about the campaign for the 2016 Basque elections.
+* `bec2016eu`: Sentiment analysis on tweets about the campaign for the 2016 Basque elections.
 * `vaxx_stance`: Stance detection on tweets around the anti-vaccine movement.
 * `qnlieu`: Q&A NLI as in [glue/qnli](../glue/qnli).
 * `wiceu`: Word-in-Context as in [super_glue/wic](../super_glue/wic).
-* `epec_korref_bin`: Correference detection as in [super_glue/wsc](../super_glue/wsc).
+* `epec_koref_bin`: Correference detection as in [super_glue/wsc](../super_glue/wsc).
 ### Checklist
...
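The README change spells out what moving `basque-glue` from a group to a tag means: the tag still runs every subtask but no longer averages them. A minimal sketch of that distinction follows. `TAGS`, `SCORES`, `run_tag`, and `run_group` are hypothetical, the scores are made-up placeholders rather than results, and the unweighted mean in the group case is an assumption, since how a real group aggregates depends on its configuration.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical registry: tag name -> member task names (task names from the README above).
TAGS: Dict[str, List[str]] = {
    "basque-glue": ["bhtc_v2", "bec2016eu", "vaxx_stance", "qnlieu", "wiceu", "epec_koref_bin"],
}

# Made-up placeholder scores, not real results.
SCORES: Dict[str, float] = {
    "bhtc_v2": 0.60,
    "bec2016eu": 0.50,
    "vaxx_stance": 0.40,
    "qnlieu": 0.70,
    "wiceu": 0.55,
    "epec_koref_bin": 0.65,
}


def run_tag(name: str) -> Dict[str, float]:
    # A tag only selects tasks: each subtask is reported on its own.
    return {task: SCORES[task] for task in TAGS[name]}


def run_group(name: str) -> Dict[str, float]:
    # A group (assumed here) also reports an unweighted mean over its members.
    results = run_tag(name)
    results[name] = mean(results.values())
    return results


tag_results = run_tag("basque-glue")
group_results = run_group("basque-glue")
print(sorted(tag_results))
print(round(group_results["basque-glue"], 4))
```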
lm_eval/tasks/basqueglue/bec.yaml
-group: basque-glue
+tag: basque-glue
 task: bec2016eu
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: bec
@@ -13,4 +13,4 @@ metric_list:
     aggregation: !function utils.micro_f1_score
     higher_is_better: true
 metadata:
-  - version: 1.0
+  version: 1.0
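The `metadata` hunk turns a one-element list (`- version: 1.0`) into a mapping (`version: 1.0`). The sketch below uses the plain Python values a YAML parser would produce for each form to show why that matters to any code that reads the version by key. `read_version` is a hypothetical helper for illustration, not lm-eval's loader.

```python
from typing import Any, Optional

# What a YAML parser yields for each form of the metadata block.
old_metadata = [{"version": 1.0}]  # from "metadata:" followed by "  - version: 1.0"
new_metadata = {"version": 1.0}    # from "metadata:" followed by "  version: 1.0"


def read_version(metadata: Any) -> Optional[float]:
    """Hypothetical helper: look the version up by key, as a dict."""
    try:
        return metadata.get("version")
    except AttributeError:
        # A list has no .get(), so the old form cannot be read this way.
        return None


print(read_version(new_metadata))  # 1.0
print(read_version(old_metadata))  # None
```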
lm_eval/tasks/basqueglue/bhtc.yaml
-group: basque-glue
+tag: basque-glue
 task: bhtc_v2
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: bhtc
@@ -13,4 +13,4 @@ metric_list:
     aggregation: !function utils.micro_f1_score
     higher_is_better: true
 metadata:
-  - version: 1.0
+  version: 1.0
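Both `bec.yaml` and `bhtc.yaml` aggregate with `utils.micro_f1_score`, whose body is not part of this diff. As a reference point, micro-averaged F1 pools true positives, false positives, and false negatives across all classes before combining them. This stdlib sketch implements that textbook definition; `micro_f1` is a hypothetical name and this is not the basqueglue helper. When every example gets exactly one gold label and one prediction, micro-F1 equals accuracy, which the example shows.

```python
from typing import Hashable, Sequence


def micro_f1(golds: Sequence[Hashable], preds: Sequence[Hashable]) -> float:
    """Micro-averaged F1 over all classes (textbook definition)."""
    if len(golds) != len(preds):
        raise ValueError("golds and preds must have the same length")
    labels = set(golds) | set(preds)
    tp = fp = fn = 0
    # Pool per-class counts across every label before computing F1.
    for label in labels:
        for g, p in zip(golds, preds):
            if p == label and g == label:
                tp += 1
            elif p == label:
                fp += 1
            elif g == label:
                fn += 1
    # 2TP / (2TP + FP + FN) is algebraically the same as 2PR / (P + R).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


golds = [0, 1, 2, 2, 1]
preds = [0, 2, 2, 2, 1]
print(micro_f1(golds, preds))  # 0.8, same as accuracy (4 of 5 correct)
```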
lm_eval/tasks/basqueglue/coref.yaml
-group: basque-glue
+tag: basque-glue
 task: epec_koref_bin
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: coref
@@ -13,4 +13,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/basqueglue/qnli.yaml
-group: basque-glue
+tag: basque-glue
 task: qnlieu
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: qnli
@@ -13,4 +13,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/basqueglue/vaxx.yaml
-group: basque-glue
+tag: basque-glue
 task: vaxx_stance
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: vaxx
@@ -13,4 +13,4 @@ metric_list:
     aggregation: !function utils.vaxx_f1_score
     higher_is_better: true
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/basqueglue/wic.yaml
-group: basque-glue
+tag: basque-glue
 task: wiceu
 dataset_path: orai-nlp/basqueGLUE
 dataset_name: wic
@@ -14,4 +14,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  - version: 1.0
+  version: 1.0
lm_eval/tasks/bbh/README.md
@@ -21,15 +21,19 @@ Homepage: https://github.com/suzgunmirac/BIG-Bench-Hard
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
 - `bbh`: is the same as `bbh_cot_fewshot`.
 - `bbh_zeroshot`
 - `bbh_fewshot`
 - `bbh_cot_fewshot`
 - `bbh_cot_zeroshot`
+#### Tags
+None.
 #### Tasks
...
lm_eval/tasks/bbh/_generate_configs.py
"""
Take in a YAML, and output all other splits with this YAML
"""
import argparse
import os
import re
...
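The docstring describes the script's job as taking one YAML and emitting the other variants from it. The include-style `arc_mt` files added in this commit have exactly that shape, so a minimal sketch of such a generator, applied to them, might look like the following. It writes the three-line override files into a temporary directory. `LANGS`, `render_override`, and `write_overrides` are illustrative assumptions, not the contents of `_generate_configs.py`; `arc_challenge_mt_is.yaml` is a standalone config and is not produced this way.

```python
import os
import tempfile
from typing import List

# Language codes of the include-based arc_mt files added in this commit.
LANGS: List[str] = ["hu", "it", "nb", "pl", "pt", "sv"]


def render_override(lang: str, base_file: str = "arc_challenge_mt_fi.yaml") -> str:
    # Each file includes the base config and overrides only task and dataset_name.
    return (
        f"include: {base_file}\n"
        f"task: arc_challenge_mt_{lang}\n"
        f"dataset_name: {lang}\n"
    )


def write_overrides(out_dir: str, langs: List[str]) -> List[str]:
    # Write one YAML file per language and return the paths in input order.
    paths = []
    for lang in langs:
        path = os.path.join(out_dir, f"arc_challenge_mt_{lang}.yaml")
        with open(path, "w", encoding="utf-8") as f:
            f.write(render_override(lang))
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    written = write_overrides(tmp, LANGS)
    names = [os.path.basename(p) for p in written]
    with open(written[0], encoding="utf-8") as f:
        first = f.read()

print(names)
print(first)
```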