gaoqiong / lm-evaluation-harness
Commit 7d09b24c
Authored Jul 03, 2024 by haileyschoelkopf
fix alllll the merge conflicts
Parents: 96dfe976, 6348b947
Changes: 395
Showing 20 changed files with 36 additions and 21 deletions
lm_eval/tasks/model_written_evals/sycophancy/sycophancy_on_philpapers2020.yaml +1 -1
lm_eval/tasks/model_written_evals/sycophancy/sycophancy_on_political_typology_quiz.yaml +1 -1
lm_eval/tasks/model_written_evals/winogenerated/winogenerated.yaml +1 -1
lm_eval/tasks/okapi/arc_multilingual/_arc_yaml +1 -1
lm_eval/tasks/okapi/hellaswag_multilingual/_hellaswag_yaml +1 -1
lm_eval/tasks/okapi/mmlu_multilingual/_default_yaml +1 -1
lm_eval/tasks/okapi/truthfulqa_multilingual/_truthfulqa_mc1_yaml +1 -1
lm_eval/tasks/paws-x/_pawsx.yaml +15 -0
lm_eval/tasks/paws-x/pawsx_template_yaml +0 -1
lm_eval/tasks/pile/pile_arxiv.yaml +0 -2
lm_eval/tasks/polemo2/polemo2_in.yaml +1 -1
lm_eval/tasks/qa4mre/qa4mre_2011.yaml +1 -1
lm_eval/tasks/qasper/bool.yaml +1 -1
lm_eval/tasks/qasper/freeform.yaml +1 -1
lm_eval/tasks/squad_completion/task.py +1 -1
lm_eval/tasks/storycloze/storycloze_2016.yaml +1 -1
lm_eval/tasks/storycloze/storycloze_2018.yaml +1 -1
lm_eval/tasks/super_glue/README.md +5 -1
lm_eval/tasks/swde/task.py +1 -1
lm_eval/tasks/translation/iwslt2017_ar-en.yaml +1 -2
lm_eval/tasks/model_written_evals/sycophancy/sycophancy_on_philpapers2020.yaml
-group: sycophancy
+tag: sycophancy
 task: sycophancy_on_philpapers2020
 dataset_path: EleutherAI/sycophancy
 dataset_name: sycophancy_on_philpapers2020
 ...
lm_eval/tasks/model_written_evals/sycophancy/sycophancy_on_political_typology_quiz.yaml
-group: sycophancy
+tag: sycophancy
 task: sycophancy_on_political_typology_quiz
 dataset_path: EleutherAI/sycophancy
 dataset_name: sycophancy_on_political_typology_quiz
 ...
lm_eval/tasks/model_written_evals/winogenerated/_template_yaml → lm_eval/tasks/model_written_evals/winogenerated/winogenerated.yaml
-group: winogenerated
+tag: winogenerated
 dataset_path: EleutherAI/winogenerated
 output_type: multiple_choice
 validation_split: validation
 ...
lm_eval/tasks/okapi/arc_multilingual/_arc_yaml
-group:
+tag:
   - arc_multilingual
 dataset_path: null
 dataset_name: null
 ...
lm_eval/tasks/okapi/hellaswag_multilingual/_hellaswag_yaml
-group:
+tag:
   - hellaswag_multilingual
 dataset_path: null
 dataset_name: null
 ...
lm_eval/tasks/okapi/mmlu_multilingual/_default_yaml
-group:
+tag:
   - m_mmlu
 dataset_path: alexandrainst/m_mmlu
 test_split: test
 ...
lm_eval/tasks/okapi/truthfulqa_multilingual/_truthfulqa_mc1_yaml
-group:
+tag:
   - truthfulqa_multilingual
 dataset_path: null
 dataset_name: null
 ...
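The hunks above all make the same one-line change: the top-level `group` key becomes `tag`, and the value underneath is left alone. The sketch below shows how that rename could be applied to a config's text. It is not the tool used for this commit, and the helper name `group_to_tag` is made up for illustration.

```python
import re

# Matches a top-level `group:` key only (no leading indentation),
# so nested keys that happen to be called `group` are left untouched.
_GROUP_KEY = re.compile(r"^group(\s*):", flags=re.MULTILINE)


def group_to_tag(yaml_text: str) -> str:
    """Rename the first top-level `group:` key to `tag:` (hypothetical helper)."""
    return _GROUP_KEY.sub(r"tag\1:", yaml_text, count=1)


scalar_form = "group: sycophancy\ntask: sycophancy_on_philpapers2020\n"
list_form = "group:\n  - arc_multilingual\ndataset_path: null\n"

print(group_to_tag(scalar_form))
print(group_to_tag(list_form))
```

Not every `group` key in this commit is renamed: `lm_eval/tasks/paws-x/_pawsx.yaml` keeps `group: pawsx` because it lists subtasks under `task:`, so a rewrite like this would need to skip such files.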
lm_eval/tasks/paws-x/_pawsx.yaml (new file, mode 0 → 100644)
+group: pawsx
+task:
+  - paws_en
+  - paws_de
+  - paws_es
+  - paws_fr
+  - paws_ja
+  - paws_ko
+  - paws_zh
+aggregate_metric_list:
+  - metric: acc
+    aggregation: mean
+    weight_by_size: true
+metadata:
+  version: 0.0
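To make the `aggregate_metric_list` entry concrete, here is a rough sketch of what `aggregation: mean` with `weight_by_size: true` presumably computes: a mean of per-subtask `acc` in which each subtask counts in proportion to its number of examples. The function name, scores, and sizes below are invented for illustration, and the harness's own implementation may differ in its details.

```python
def aggregate_mean(scores, sizes, weight_by_size=True):
    """Combine per-subtask scores into one group-level score.

    scores: per-subtask metric values (e.g. acc for paws_en, paws_de, ...)
    sizes:  number of evaluated examples in each subtask
    """
    if len(scores) != len(sizes):
        raise ValueError("scores and sizes must have the same length")
    if not weight_by_size:
        # Plain (macro) mean: every subtask counts equally.
        return sum(scores) / len(scores)
    # Size-weighted (micro) mean: larger subtasks count proportionally more.
    return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)


# Made-up numbers: two subtasks with 100 and 300 examples.
weighted = aggregate_mean([0.5, 1.0], [100, 300])
unweighted = aggregate_mean([0.5, 1.0], [100, 300], weight_by_size=False)
print(weighted, unweighted)
```

With equal subtask sizes the two settings give the same value, so the flag only matters when the subtasks differ in size.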
lm_eval/tasks/paws-x/pawsx_template_yaml
 # This file will be included in the generated language-specific task configs.
 # It doesn't have a yaml file extension as it is not meant to be imported directly
 # by the harness.
-group: pawsx
 task: null
 dataset_path: paws-x
 dataset_name: null
 ...
lm_eval/tasks/pile/pile_arxiv.yaml
-group:
-  - pile
 task: pile_arxiv
 dataset_path: EleutherAI/pile
 dataset_name: pile_arxiv
 ...
lm_eval/tasks/polemo2/polemo2_in.yaml
-group:
+tag:
   - polemo2
 task: polemo2_in
 dataset_path: allegro/klej-polemo2-in
 ...
lm_eval/tasks/qa4mre/qa4mre_2011.yaml
-group:
+tag:
   - qa4mre
 task: qa4mre_2011
 dataset_path: qa4mre
 ...
lm_eval/tasks/qasper/bool.yaml
-group: qasper
+tag: qasper
 task: qasper_bool
 dataset_path: allenai/qasper
 output_type: multiple_choice
 ...
lm_eval/tasks/qasper/freeform.yaml
-group: qasper
+tag: qasper
 task: qasper_freeform
 dataset_path: allenai/qasper
 output_type: generate_until
 ...
lm_eval/tasks/squad_completion/task.py
@@ -12,7 +12,7 @@ class SQUADCompletion(ConfigurableTask):
     DATASET_PATH = "hazyresearch/based-squad"
     DATASET_NAME = "default"
 
-    def __init__(self):
+    def __init__(self, **kwargs):
         super().__init__(config={"metadata": {"version": self.VERSION}})
 
     def has_training_docs(self):
 ...
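The commit does not say why `__init__` gained `**kwargs`. One plausible reading is that whatever constructs these task classes now passes keyword arguments, which the old signature would reject. The sketch below illustrates that effect in isolation. `ConfigurableTask` here is a hypothetical stand-in rather than the real harness class, and `OldTask`, `NewTask`, and `try_build` are made-up names.

```python
class ConfigurableTask:
    """Hypothetical stand-in for the harness base class (not the real one)."""

    def __init__(self, config=None):
        self.config = config or {}


class OldTask(ConfigurableTask):
    VERSION = 0

    def __init__(self):  # pre-commit signature: accepts no extra arguments
        super().__init__(config={"metadata": {"version": self.VERSION}})


class NewTask(ConfigurableTask):
    VERSION = 0

    def __init__(self, **kwargs):  # post-commit signature: extra keywords accepted
        # As in the diff, kwargs are accepted but not forwarded to the parent.
        super().__init__(config={"metadata": {"version": self.VERSION}})


def try_build(task_cls, **kwargs):
    """Return the constructed task, or the TypeError message if construction fails."""
    try:
        return task_cls(**kwargs)
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_build(OldTask, config={"extra": 1}))  # fails: unexpected keyword argument
print(try_build(NewTask, config={"extra": 1}).config)
```

Because the keyword arguments are swallowed rather than forwarded, any caller-supplied config is silently ignored in this sketch, which is worth keeping in mind when reading the same change in `lm_eval/tasks/swde/task.py`.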
lm_eval/tasks/storycloze/storycloze_2016.yaml
-group: storycloze
+tag: storycloze
 task: storycloze_2016
 dataset_path: story_cloze
 dataset_name: 2016
 ...
lm_eval/tasks/storycloze/storycloze_2018.yaml
-group: storycloze
+tag: storycloze
 task: storycloze_2018
 dataset_path: story_cloze
 dataset_name: 2018
 ...
lm_eval/tasks/super_glue/README.md
@@ -26,10 +26,14 @@ Homepage: https://super.gluebenchmark.com/
 }
 ```
 
-### Groups and Tasks
+### Groups, Tags, and Tasks
 
 #### Groups
 
+None.
+
+#### Tags
+
 * `super-glue-lm-eval-v1`: SuperGLUE eval adapted from LM Eval V1
 * `super-glue-t5-prompt`: SuperGLUE prompt and evaluation that matches the T5 paper (if using accelerate, will error if record is included.)
 ...
lm_eval/tasks/swde/task.py
@@ -12,7 +12,7 @@ class SWDE(ConfigurableTask):
     DATASET_PATH = "hazyresearch/based-swde-v2"
     DATASET_NAME = "default"
 
-    def __init__(self):
+    def __init__(self, **kwargs):
         super().__init__(config={"metadata": {"version": self.VERSION}})
 
     def has_training_docs(self):
 ...
lm_eval/tasks/translation/iwslt2017_ar-en.yaml
@@ -5,8 +5,7 @@ doc_to_target: ' {{translation["en"]}}'
 doc_to_text: 'Arabic phrase: {{translation["ar"]}}
 
   English phrase:'
-group:
-  - generate_until
+tag:
   - translation
   - iwslt2017
 include: wmt_common_yaml
 ...