gaoqiong / lm-evaluation-harness
Commit e4db76cb, authored Jul 09, 2024 by haileyschoelkopf
Merge branch 'main' into multimodal-prototyping
Parents: 6cc6e9cd, ad80f555
Changes: 871
Showing 20 changed files with 36 additions and 20 deletions
lm_eval/tasks/model_written_evals/sycophancy/sycophancy_on_political_typology_quiz.yaml  +1 -1
lm_eval/tasks/model_written_evals/winogenerated/winogenerated.yaml  +1 -1
lm_eval/tasks/okapi/arc_multilingual/_arc_yaml  +1 -1
lm_eval/tasks/okapi/hellaswag_multilingual/_hellaswag_yaml  +1 -1
lm_eval/tasks/okapi/mmlu_multilingual/_default_yaml  +1 -1
lm_eval/tasks/okapi/truthfulqa_multilingual/_truthfulqa_mc1_yaml  +1 -1
lm_eval/tasks/paloma/_paloma_template  +1 -1
lm_eval/tasks/paws-x/_pawsx.yaml  +15 -0
lm_eval/tasks/paws-x/pawsx_template_yaml  +0 -1
lm_eval/tasks/pile/pile_arxiv.yaml  +0 -2
lm_eval/tasks/polemo2/polemo2_in.yaml  +1 -1
lm_eval/tasks/qa4mre/qa4mre_2011.yaml  +1 -1
lm_eval/tasks/qasper/bool.yaml  +1 -1
lm_eval/tasks/qasper/freeform.yaml  +1 -1
lm_eval/tasks/scrolls/task.py  +1 -1
lm_eval/tasks/squad_completion/task.py  +1 -1
lm_eval/tasks/squadv2/task.py  +1 -1
lm_eval/tasks/storycloze/storycloze_2016.yaml  +1 -1
lm_eval/tasks/storycloze/storycloze_2018.yaml  +1 -1
lm_eval/tasks/super_glue/README.md  +5 -1
lm_eval/tasks/model_written_evals/sycophancy/sycophancy_on_political_typology_quiz.yaml
-group: sycophancy
+tag: sycophancy
 task: sycophancy_on_political_typology_quiz
 dataset_path: EleutherAI/sycophancy
 dataset_name: sycophancy_on_political_typology_quiz
...
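Most YAML edits in this commit follow the same pattern: a leaf task's `group` key becomes `tag`, while group configs such as `_pawsx.yaml` keep `group` and list their subtasks under `task`. The sketch below applies that rename to a config already parsed into a dict. `group_to_tag` is a hypothetical helper for illustration, not part of lm-evaluation-harness, and it treats "the `task` value is a list" as the marker of a group config, which is an assumption drawn only from this diff.

```python
from typing import Any


def group_to_tag(config: dict[str, Any]) -> dict[str, Any]:
    """Rename a leaf task's `group` key to `tag` (hypothetical helper, not a harness API)."""
    # Group configs (like _pawsx.yaml) list their subtasks under `task`;
    # those legitimately keep `group`, so return an unchanged copy.
    if isinstance(config.get("task"), list):
        return dict(config)
    migrated = dict(config)  # shallow copy so the caller's dict is untouched
    value = migrated.pop("group", None)
    # A null or empty `group` carries no label, so it is simply dropped.
    if value:
        migrated.setdefault("tag", value)
    return migrated


old = {
    "group": "sycophancy",
    "task": "sycophancy_on_political_typology_quiz",
    "dataset_path": "EleutherAI/sycophancy",
}
print(group_to_tag(old))
```

The same helper covers the okapi and paloma templates, where the value is a list (`group:` followed by `- arc_multilingual`); the list is carried over unchanged as the `tag` value.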
lm_eval/tasks/model_written_evals/winogenerated/_template_yaml → lm_eval/tasks/model_written_evals/winogenerated/winogenerated.yaml
-group: winogenerated
+tag: winogenerated
 dataset_path: EleutherAI/winogenerated
 output_type: multiple_choice
 validation_split: validation
...
lm_eval/tasks/okapi/arc_multilingual/_arc_yaml
-group:
+tag:
   - arc_multilingual
 dataset_path: null
 dataset_name: null
...
lm_eval/tasks/okapi/hellaswag_multilingual/_hellaswag_yaml
-group:
+tag:
   - hellaswag_multilingual
 dataset_path: null
 dataset_name: null
...
lm_eval/tasks/okapi/mmlu_multilingual/_default_yaml
-group:
+tag:
   - m_mmlu
 dataset_path: alexandrainst/m_mmlu
 test_split: test
...
lm_eval/tasks/okapi/truthfulqa_multilingual/_truthfulqa_mc1_yaml
-group:
+tag:
   - truthfulqa_multilingual
 dataset_path: null
 dataset_name: null
...
lm_eval/tasks/paloma/_paloma_template
-group:
+tag:
   - paloma
 dataset_path: allenai/paloma
 output_type: loglikelihood_rolling
...
lm_eval/tasks/paws-x/_pawsx.yaml (new file, mode 0 → 100644)
+group: pawsx
+task:
+  - paws_en
+  - paws_de
+  - paws_es
+  - paws_fr
+  - paws_ja
+  - paws_ko
+  - paws_zh
+aggregate_metric_list:
+  - metric: acc
+    aggregation: mean
+    weight_by_size: true
+metadata:
+  version: 0.0
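The new `pawsx` group combines the `acc` of its seven subtasks with `aggregation: mean` and `weight_by_size: true`. A reasonable reading, assumed here rather than confirmed by this diff, is a mean weighted by each subtask's number of examples, falling back to a plain average when the flag is false. `aggregate_mean` is a hypothetical illustration, not the harness's own aggregation code, and the scores and sizes below are made up.

```python
def aggregate_mean(
    scores: dict[str, float], sizes: dict[str, int], weight_by_size: bool = True
) -> float:
    """Combine per-subtask scores into one group-level score (illustrative sketch)."""
    names = list(scores)
    if not names:
        raise ValueError("no subtask scores to aggregate")
    if weight_by_size:
        # Each subtask contributes in proportion to its number of examples.
        total = sum(sizes[n] for n in names)
        return sum(scores[n] * sizes[n] for n in names) / total
    # Plain macro-average: every subtask counts equally.
    return sum(scores[n] for n in names) / len(names)


# Made-up numbers for illustration only, not real PAWS-X results.
scores = {"paws_en": 0.5, "paws_de": 0.75}
sizes = {"paws_en": 100, "paws_de": 300}
print(aggregate_mean(scores, sizes))                        # 0.6875
print(aggregate_mean(scores, sizes, weight_by_size=False))  # 0.625
```

Size weighting makes the group score equal to accuracy over the pooled examples. The unweighted form instead gives each language the same influence regardless of its split size.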
lm_eval/tasks/paws-x/pawsx_template_yaml
 # This file will be included in the generated language-specific task configs.
 # It doesn't have a yaml file extension as it is not meant to be imported directly
 # by the harness.
-group: pawsx
 task: null
 dataset_path: paws-x
 dataset_name: null
...
lm_eval/tasks/pile/pile_arxiv.yaml
-group:
-  - pile
 task: pile_arxiv
 dataset_path: EleutherAI/pile
 dataset_name: pile_arxiv
...
lm_eval/tasks/polemo2/polemo2_in.yaml
-group:
+tag:
   - polemo2
 task: polemo2_in
 dataset_path: allegro/klej-polemo2-in
...
lm_eval/tasks/qa4mre/qa4mre_2011.yaml
-group:
+tag:
   - qa4mre
 task: qa4mre_2011
 dataset_path: qa4mre
...
lm_eval/tasks/qasper/bool.yaml
-group: qasper
+tag: qasper
 task: qasper_bool
 dataset_path: allenai/qasper
 output_type: multiple_choice
...
lm_eval/tasks/qasper/freeform.yaml
-group: qasper
+tag: qasper
 task: qasper_freeform
 dataset_path: allenai/qasper
 output_type: generate_until
...
lm_eval/tasks/scrolls/task.py
...
@@ -116,7 +116,7 @@ class _SCROLLSTask(ConfigurableTask):
     PRUNE_MAX_TOKENS = None
     PRUNE_NUM_PROC = None

-    def __init__(self):
+    def __init__(self, config=None):
         super().__init__(config={"metadata": {"version": self.VERSION}})
         if self.DATASET_NAME is not None:
             self.metric = load_metric(_download_metric(), config_name=self.DATASET_NAME)
...
lm_eval/tasks/squad_completion/task.py
...
@@ -12,7 +12,7 @@ class SQUADCompletion(ConfigurableTask):
     DATASET_PATH = "hazyresearch/based-squad"
     DATASET_NAME = "default"

-    def __init__(self):
+    def __init__(self, **kwargs):
         super().__init__(config={"metadata": {"version": self.VERSION}})

     def has_training_docs(self):
...
lm_eval/tasks/squadv2/task.py
...
@@ -52,7 +52,7 @@ class SQuAD2(ConfigurableTask):
     DATASET_PATH = "squad_v2"
     DATASET_NAME = None

-    def __init__(self):
+    def __init__(self, config=None):
         super().__init__(config={"metadata": {"version": self.VERSION}})

     # HF changed squad on us so we have to make sure we aren't running the old one
...
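All three Python task classes (scrolls, squad_completion, squadv2) now accept a `config` argument, or `**kwargs`, and still build their own metadata from `VERSION`. One likely motivation, not stated in the diff, is that a loader can then construct class-based tasks with a `config=` keyword like any other task. The sketch uses a simplified stand-in base class and a hypothetical `build_task` loader, neither of which is the harness's API. It shows why the old zero-argument signature breaks under such a call.

```python
from typing import Any, Optional


class ConfigurableTaskStub:
    """Simplified stand-in for the harness's ConfigurableTask (hypothetical)."""

    def __init__(self, config: Optional[dict[str, Any]] = None) -> None:
        self.config = config or {}


class OldSignatureTask(ConfigurableTaskStub):
    VERSION = 1

    def __init__(self):  # pre-commit signature: cannot receive a config
        super().__init__(config={"metadata": {"version": self.VERSION}})


class NewSignatureTask(ConfigurableTaskStub):
    VERSION = 1

    def __init__(self, config=None):  # post-commit signature (scrolls, squadv2)
        # The incoming config is accepted but ignored, as in the diff.
        super().__init__(config={"metadata": {"version": self.VERSION}})


class KwargsTask(ConfigurableTaskStub):
    VERSION = 1

    def __init__(self, **kwargs):  # post-commit signature (squad_completion)
        super().__init__(config={"metadata": {"version": self.VERSION}})


def build_task(task_cls: type, config: dict[str, Any]) -> ConfigurableTaskStub:
    """Hypothetical loader that always passes `config=` as a keyword."""
    return task_cls(config=config)


for cls in (OldSignatureTask, NewSignatureTask, KwargsTask):
    try:
        task = build_task(cls, {"task": "example"})
        print(cls.__name__, task.config)
    except TypeError as err:
        print(cls.__name__, "fails:", err)
```

Under this assumption, `config=None` and `**kwargs` work equally well. The keyword form documents the expected argument, while `**kwargs` also tolerates any other keywords a caller might pass.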
lm_eval/tasks/storycloze/storycloze_2016.yaml
-group: storycloze
+tag: storycloze
 task: storycloze_2016
 dataset_path: story_cloze
 dataset_name: 2016
...
lm_eval/tasks/storycloze/storycloze_2018.yaml
-group: storycloze
+tag: storycloze
 task: storycloze_2018
 dataset_path: story_cloze
 dataset_name: 2018
...
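After these edits, `storycloze`, `qasper`, and `polemo2` are tags declared on their leaf tasks rather than groups. The sketch below assumes, without confirmation from this diff, that a tag is only a selection label. Naming a tag selects every task that carries it, and unlike a group with an `aggregate_metric_list`, it yields no combined score. `build_tag_index` and `expand` are hypothetical helpers, not harness APIs. The configs are trimmed to the `task` and `tag` keys shown in the hunks above.

```python
from collections import defaultdict
from typing import Any


def build_tag_index(configs: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each tag to the names of the tasks that declare it."""
    index = defaultdict(list)
    for cfg in configs:
        tags = cfg.get("tag") or []
        if isinstance(tags, str):  # `tag` may be a single string or a list
            tags = [tags]
        for tag in tags:
            index[tag].append(cfg["task"])
    return dict(index)


def expand(name: str, tag_index: dict[str, list[str]]) -> list[str]:
    """Resolve a tag to its member tasks; any other name is treated as one task."""
    return list(tag_index.get(name, [name]))


# Leaf task configs as they look after this commit (other keys omitted).
configs = [
    {"task": "storycloze_2016", "tag": "storycloze"},
    {"task": "storycloze_2018", "tag": "storycloze"},
    {"task": "qasper_bool", "tag": "qasper"},
    {"task": "qasper_freeform", "tag": "qasper"},
    {"task": "polemo2_in", "tag": ["polemo2"]},
]
index = build_tag_index(configs)
print(expand("storycloze", index))  # ['storycloze_2016', 'storycloze_2018']
print(expand("qasper_bool", index))  # ['qasper_bool']
```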
lm_eval/tasks/super_glue/README.md
...
@@ -26,10 +26,14 @@ Homepage: https://super.gluebenchmark.com/
 }
 ```

-### Groups and Tasks
+### Groups, Tags, and Tasks

 #### Groups

+None.
+
+#### Tags
+
 * `super-glue-lm-eval-v1`: SuperGLUE eval adapted from LM Eval V1
 * `super-glue-t5-prompt`: SuperGLUE prompt and evaluation that matches the T5 paper (if using accelerate, will error if record is included.)
...