gaoqiong / lm-evaluation-harness
Commit b13753cd, authored Jan 22, 2024 by haileyschoelkopf
Merge branch 'main' into fix-task-table
Parents: 8ea9c59d, 5c25dd55
Changes: 232
Showing 20 changed files with 108 additions and 11 deletions (+108 -11)
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_ro.yaml  +6 -0
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_ru.yaml  +6 -0
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_sk.yaml  +6 -0
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_sr.yaml  +6 -0
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_sv.yaml  +6 -0
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_ta.yaml  +6 -0
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_te.yaml  +6 -0
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_uk.yaml  +6 -0
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_vi.yaml  +6 -0
lm_eval/tasks/okapi/hellaswag_multilingual/utils.py  +24 -0
lm_eval/tasks/polemo2/polemo2_in.yaml  +3 -2
lm_eval/tasks/qasper/freeform.yaml  +1 -1
lm_eval/tasks/race/README.md  +19 -1
lm_eval/tasks/scrolls/task.py  +1 -1
lm_eval/tasks/squadv2/task.py  +1 -1
lm_eval/tasks/translation/wmt_common_yaml  +1 -1
lm_eval/tasks/triviaqa/default.yaml  +1 -1
lm_eval/tasks/truthfulqa/truthfulqa_gen.yaml  +1 -1
lm_eval/tasks/unscramble/anagrams1.yaml  +1 -1
lm_eval/tasks/unscramble/anagrams2.yaml  +1 -1
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_ro.yaml
0 → 100644
include: _hellaswag_yaml
task: hellaswag_ro
dataset_path: alexandrainst/m_hellaswag
dataset_name: ro
training_split: null
validation_split: val
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_ru.yaml
0 → 100644
include: _hellaswag_yaml
task: hellaswag_ru
dataset_path: alexandrainst/m_hellaswag
dataset_name: ru
training_split: null
validation_split: val
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_sk.yaml
0 → 100644
include: _hellaswag_yaml
task: hellaswag_sk
dataset_path: alexandrainst/m_hellaswag
dataset_name: sk
training_split: null
validation_split: val
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_sr.yaml
0 → 100644
include: _hellaswag_yaml
task: hellaswag_sr
dataset_path: alexandrainst/m_hellaswag
dataset_name: sr
training_split: null
validation_split: val
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_sv.yaml
0 → 100644
include: _hellaswag_yaml
task: hellaswag_sv
dataset_path: alexandrainst/m_hellaswag
dataset_name: sv
training_split: null
validation_split: val
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_ta.yaml
0 → 100644
include: _hellaswag_yaml
task: hellaswag_ta
dataset_path: alexandrainst/m_hellaswag
dataset_name: ta
training_split: null
validation_split: val
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_te.yaml
0 → 100644
include: _hellaswag_yaml
task: hellaswag_te
dataset_path: alexandrainst/m_hellaswag
dataset_name: te
training_split: null
validation_split: val
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_uk.yaml
0 → 100644
include: _hellaswag_yaml
task: hellaswag_uk
dataset_path: alexandrainst/m_hellaswag
dataset_name: uk
training_split: null
validation_split: val
lm_eval/tasks/okapi/hellaswag_multilingual/hellaswag_vi.yaml
0 → 100644
include: _hellaswag_yaml
task: hellaswag_vi
dataset_path: alexandrainst/m_hellaswag
dataset_name: vi
training_split: null
validation_split: val
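The per-language configs in this part of the diff differ only in the language code used for `task` and `dataset_name`. As an illustration of that pattern (not a claim about how the commit produced these files), the sketch below renders the same six keys from a template. The names `LANGS`, `TEMPLATE`, `render_config`, and `render_all` are hypothetical, and `LANGS` covers only the nine codes visible in this section.

```python
# Language codes that appear in this part of the diff (assumption: not the full set).
LANGS = ["ro", "ru", "sk", "sr", "sv", "ta", "te", "uk", "vi"]

# The six keys shared by every per-language config shown above.
TEMPLATE = """include: _hellaswag_yaml
task: hellaswag_{lang}
dataset_path: alexandrainst/m_hellaswag
dataset_name: {lang}
training_split: null
validation_split: val
"""


def render_config(lang: str) -> str:
    """Return the YAML text for one language code."""
    return TEMPLATE.format(lang=lang)


def render_all(langs=LANGS) -> dict:
    """Map each file name to its rendered YAML text (nothing is written to disk)."""
    return {f"hellaswag_{lang}.yaml": render_config(lang) for lang in langs}


configs = render_all()
print(configs["hellaswag_ro.yaml"])
```

Because each file sets only the keys listed here, everything else is presumably supplied by the shared `_hellaswag_yaml` file it includes.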
lm_eval/tasks/okapi/hellaswag_multilingual/utils.py
0 → 100644
import datasets
import re


def preprocess(text):
    text = text.strip()
    # NOTE: Brackets are artifacts of the WikiHow dataset portion of HellaSwag.
    text = text.replace(" [title]", ". ")
    text = re.sub("\\[.*?\\]", "", text)
    # Collapse the double spaces left behind by the two substitutions above.
    text = text.replace("  ", " ")
    return text


def process_docs(dataset: datasets.Dataset) -> datasets.Dataset:
    def _process_doc(doc):
        ctx = doc["ctx_a"] + " " + doc["ctx_b"].capitalize()
        out_doc = {
            "query": preprocess(doc["activity_label"] + ": " + ctx),
            "choices": [preprocess(ending) for ending in doc["endings"]],
            "gold": int(doc["label"]),
        }
        return out_doc

    return dataset.map(_process_doc)
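To make the effect of `preprocess` and `_process_doc` concrete, here is a minimal sketch that applies the same logic to one hand-written record. The record is invented for illustration and is not taken from `alexandrainst/m_hellaswag`. `process_doc` is a standalone copy of the inner function so the example runs without the `datasets` package, and the final `replace` is assumed to collapse two spaces into one.

```python
import re


def preprocess(text):
    text = text.strip()
    text = text.replace(" [title]", ". ")
    text = re.sub(r"\[.*?\]", "", text)  # drop bracketed WikiHow markers
    text = text.replace("  ", " ")       # assumed: collapse double spaces
    return text


def process_doc(doc):
    # Standalone copy of the nested _process_doc, for illustration only.
    ctx = doc["ctx_a"] + " " + doc["ctx_b"].capitalize()
    return {
        "query": preprocess(doc["activity_label"] + ": " + ctx),
        "choices": [preprocess(ending) for ending in doc["endings"]],
        "gold": int(doc["label"]),
    }


# Invented sample record using the field names that process_docs reads.
sample = {
    "activity_label": "Baking",
    "ctx_a": "[header] How to bake bread [title] Mix the dough.",
    "ctx_b": "then",
    "endings": ["[step] Knead it well.", "Leave it alone."],
    "label": "0",
}

result = process_doc(sample)
print(result["query"])    # Baking: How to bake bread. Mix the dough. Then
print(result["choices"])  # [' Knead it well.', 'Leave it alone.']
print(result["gold"])     # 0
```

Note that the first choice keeps a leading space: `strip()` runs before the bracketed marker is removed, so whitespace exposed by that removal is not trimmed.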
lm_eval/tasks/polemo2/polemo2_in.yaml
@@ -2,7 +2,7 @@ group:
   - polemo2
 task: polemo2_in
 dataset_path: allegro/klej-polemo2-in
-dataset_name: klej-polemo2-in
+dataset_name: null
 output_type: generate_until
 training_split: train
 validation_split: validation
@@ -41,5 +41,6 @@ metric_list:
   - metric: accuracy
     aggregation: mean
     higher_is_better: true
+    hf_evaluate: true
 metadata:
-  version: 0.0
+  version: 1.0
lm_eval/tasks/qasper/freeform.yaml
@@ -15,4 +15,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  version: 1.0
+  version: 2.0
lm_eval/tasks/race/README.md
@@ -17,7 +17,25 @@ Homepage: https://www.cs.cmu.edu/~glai1/data/race/
 ### Citation
 ```
-BibTeX-formatted citation goes here
+@inproceedings{lai-etal-2017-race,
+    title = "{RACE}: Large-scale {R}e{A}ding Comprehension Dataset From Examinations",
+    author = "Lai, Guokun and
+      Xie, Qizhe and
+      Liu, Hanxiao and
+      Yang, Yiming and
+      Hovy, Eduard",
+    editor = "Palmer, Martha and
+      Hwa, Rebecca and
+      Riedel, Sebastian",
+    booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing",
+    month = sep,
+    year = "2017",
+    address = "Copenhagen, Denmark",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/D17-1082",
+    doi = "10.18653/v1/D17-1082",
+    pages = "785--794"
+}
 ```
 ### Groups and Tasks
lm_eval/tasks/scrolls/task.py
@@ -108,7 +108,7 @@ def _num_cpu_cores():
 class _SCROLLSTask(Task):
-    VERSION = 1
+    VERSION = 2
     DATASET_PATH = "tau/scrolls"
     DATASET_NAME = None
     PRUNE_TOKENIZERS = None
lm_eval/tasks/squadv2/task.py
@@ -49,7 +49,7 @@ def _squad_agg(key, items):
 @register_task("squadv2")
 class SQuAD2(Task):
-    VERSION = 2
+    VERSION = 3
     DATASET_PATH = "squad_v2"
     DATASET_NAME = None
lm_eval/tasks/translation/wmt_common_yaml
@@ -14,4 +14,4 @@ generation_kwargs:
   temperature: 0.0
 repeats: 1
 metadata:
-  version: 0.0
+  version: 1.0
lm_eval/tasks/triviaqa/default.yaml
@@ -28,4 +28,4 @@ metric_list:
     ignore_case: true
     ignore_punctuation: true
 metadata:
-  version: 2.0
+  version: 3.0
lm_eval/tasks/truthfulqa/truthfulqa_gen.yaml
@@ -76,4 +76,4 @@ metric_list:
     aggregation: mean
     higher_is_better: true
 metadata:
-  version: 2.0
+  version: 3.0
lm_eval/tasks/unscramble/anagrams1.yaml
@@ -17,4 +17,4 @@ metric_list:
     ignore_case: false
     ignore_punctuation: false
 metadata:
-  version: 1.0
+  version: 2.0
lm_eval/tasks/unscramble/anagrams2.yaml
@@ -17,4 +17,4 @@ metric_list:
     ignore_case: false
     ignore_punctuation: false
 metadata:
-  version: 1.0
+  version: 2.0