gaoqiong / lm-evaluation-harness
Commit e4db76cb, authored Jul 09, 2024 by haileyschoelkopf
Merge branch 'main' into multimodal-prototyping
Parents: 6cc6e9cd, ad80f555
Changes: 871
Showing 20 changed files with 42 additions and 47 deletions (+42 -47)
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_general_knowledge.yaml (+2 -3)
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_geography.yaml (+2 -3)
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_history.yaml (+2 -3)
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_islamic_studies.yaml (+2 -3)
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_math.yaml (+2 -3)
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_natural_science.yaml (+2 -3)
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_social_science.yaml (+2 -3)
lm_eval/tasks/arabicmmlu/arabicmmlu_prof_law.yaml (+2 -3)
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_accounting.yaml (+2 -3)
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_computer_science.yaml (+2 -3)
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_economics.yaml (+2 -3)
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_management.yaml (+2 -3)
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_political_science.yaml (+2 -3)
lm_eval/tasks/arc/README.md (+5 -1)
lm_eval/tasks/arc/arc_easy.yaml (+1 -1)
lm_eval/tasks/arc_mt/arc_challenge_mt_fi.yaml (+1 -1)
lm_eval/tasks/arithmetic/README.md (+2 -2)
lm_eval/tasks/arithmetic/arithmetic_1dc.yaml (+1 -1)
lm_eval/tasks/asdiv/README.md (+1 -1)
lm_eval/tasks/babi/README.md (+5 -1)
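The per-file counts in the file list can be cross-checked against the reported totals. The following is a minimal Python sketch with the counts transcribed by hand from the list; the variable names are illustrative and not part of the repository.

```python
# Per-file (additions, deletions), transcribed from the file list of this page.
arabicmmlu_files = 13  # each arabicmmlu YAML file shows +2 / -3
other_files = {
    "arc/README.md": (5, 1),
    "arc/arc_easy.yaml": (1, 1),
    "arc_mt/arc_challenge_mt_fi.yaml": (1, 1),
    "arithmetic/README.md": (2, 2),
    "arithmetic/arithmetic_1dc.yaml": (1, 1),
    "asdiv/README.md": (1, 1),
    "babi/README.md": (5, 1),
}

additions = arabicmmlu_files * 2 + sum(a for a, _ in other_files.values())
deletions = arabicmmlu_files * 3 + sum(d for _, d in other_files.values())
changed = arabicmmlu_files + len(other_files)

print(changed, additions, deletions)  # 20 42 47
```

The result agrees with the "20 changed files with 42 additions and 47 deletions" summary for this page of the commit.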
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_general_knowledge.yaml
 "dataset_name": "Primary General Knowledge"
-"group": "arabicmmlu_other"
-"group_alias": "other"
-"include": "_default_template_yaml"
+"tag": "arabicmmlu_other_tasks"
+"include": "_default_arabicmmlu_template_yaml"
 "task": "arabicmmlu_primary_general_knowledge"
 "task_alias": "Primary General Knowledge"
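The same three-line-to-two-line change repeats across all thirteen arabicmmlu files in this commit. As a rough illustration of the pattern only, here is a Python sketch that applies it to a config held as a plain dict. The function name `migrate_arabicmmlu_config` is hypothetical, and the rule "tag is the old group name plus `_tasks`" is inferred from the files shown here, not taken from the harness's own code.

```python
def migrate_arabicmmlu_config(old: dict) -> dict:
    """Apply the group -> tag change seen in this diff to one task config.

    Hypothetical helper: the mapping rules are inferred from the diff,
    not from lm-evaluation-harness itself.
    """
    new = {}
    for key, value in old.items():
        if key == "group":
            # The diff replaces "group": X with "tag": X + "_tasks".
            new["tag"] = f"{value}_tasks"
        elif key == "group_alias":
            # group_alias is dropped in the new files.
            continue
        elif key == "include":
            # The shared template is renamed in the new files.
            new["include"] = "_default_arabicmmlu_template_yaml"
        else:
            new[key] = value
    return new


old_config = {
    "dataset_name": "Primary General Knowledge",
    "group": "arabicmmlu_other",
    "group_alias": "other",
    "include": "_default_template_yaml",
    "task": "arabicmmlu_primary_general_knowledge",
    "task_alias": "Primary General Knowledge",
}
new_config = migrate_arabicmmlu_config(old_config)
for key, value in new_config.items():
    print(f'"{key}": "{value}"')
```

Because Python dicts preserve insertion order, the printed keys come out in the same order as the new file: `dataset_name`, `tag`, `include`, `task`, `task_alias`.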
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_geography.yaml
 "dataset_name": "Primary Geography"
-"group": "arabicmmlu_social_science"
-"group_alias": "social science"
-"include": "_default_template_yaml"
+"tag": "arabicmmlu_social_science_tasks"
+"include": "_default_arabicmmlu_template_yaml"
 "task": "arabicmmlu_primary_geography"
 "task_alias": "Primary Geography"
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_history.yaml
 "dataset_name": "Primary History"
-"group": "arabicmmlu_humanities"
-"group_alias": "humanities"
-"include": "_default_template_yaml"
+"tag": "arabicmmlu_humanities_tasks"
+"include": "_default_arabicmmlu_template_yaml"
 "task": "arabicmmlu_primary_history"
 "task_alias": "Primary History"
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_islamic_studies.yaml
 "dataset_name": "Primary Islamic Studies"
-"group": "arabicmmlu_humanities"
-"group_alias": "humanities"
-"include": "_default_template_yaml"
+"tag": "arabicmmlu_humanities_tasks"
+"include": "_default_arabicmmlu_template_yaml"
 "task": "arabicmmlu_primary_islamic_studies"
 "task_alias": "Primary Islamic Studies"
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_math.yaml
 "dataset_name": "Primary Math"
-"group": "arabicmmlu_stem"
-"group_alias": "stem"
-"include": "_default_template_yaml"
+"tag": "arabicmmlu_stem_tasks"
+"include": "_default_arabicmmlu_template_yaml"
 "task": "arabicmmlu_primary_math"
 "task_alias": "Primary Math"
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_natural_science.yaml
 "dataset_name": "Primary Natural Science"
-"group": "arabicmmlu_stem"
-"group_alias": "stem"
-"include": "_default_template_yaml"
+"tag": "arabicmmlu_stem_tasks"
+"include": "_default_arabicmmlu_template_yaml"
 "task": "arabicmmlu_primary_natural_science"
 "task_alias": "Primary Natural Science"
lm_eval/tasks/arabicmmlu/arabicmmlu_primary_social_science.yaml
 "dataset_name": "Primary Social Science"
-"group": "arabicmmlu_social_science"
-"group_alias": "social science"
-"include": "_default_template_yaml"
+"tag": "arabicmmlu_social_science_tasks"
+"include": "_default_arabicmmlu_template_yaml"
 "task": "arabicmmlu_primary_social_science"
 "task_alias": "Primary Social Science"
lm_eval/tasks/arabicmmlu/arabicmmlu_prof_law.yaml
 "dataset_name": "Prof Law"
-"group": "arabicmmlu_humanities"
-"group_alias": "humanities"
-"include": "_default_template_yaml"
+"tag": "arabicmmlu_humanities_tasks"
+"include": "_default_arabicmmlu_template_yaml"
 "task": "arabicmmlu_prof_law"
 "task_alias": "Prof Law"
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_accounting.yaml
 "dataset_name": "Univ Accounting"
-"group": "arabicmmlu_social_science"
-"group_alias": "social science"
-"include": "_default_template_yaml"
+"tag": "arabicmmlu_social_science_tasks"
+"include": "_default_arabicmmlu_template_yaml"
 "task": "arabicmmlu_univ_accounting"
 "task_alias": "Univ Accounting"
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_computer_science.yaml
 "dataset_name": "Univ Computer Science"
-"group": "arabicmmlu_stem"
-"group_alias": "stem"
-"include": "_default_template_yaml"
+"tag": "arabicmmlu_stem_tasks"
+"include": "_default_arabicmmlu_template_yaml"
 "task": "arabicmmlu_univ_computer_science"
 "task_alias": "Univ Computer Science"
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_economics.yaml
 "dataset_name": "Univ Economics"
-"group": "arabicmmlu_social_science"
-"group_alias": "social science"
-"include": "_default_template_yaml"
+"tag": "arabicmmlu_social_science_tasks"
+"include": "_default_arabicmmlu_template_yaml"
 "task": "arabicmmlu_univ_economics"
 "task_alias": "Univ Economics"
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_management.yaml
 "dataset_name": "Univ Management"
-"group": "arabicmmlu_other"
-"group_alias": "other"
-"include": "_default_template_yaml"
+"tag": "arabicmmlu_other_tasks"
+"include": "_default_arabicmmlu_template_yaml"
 "task": "arabicmmlu_univ_management"
 "task_alias": "Univ Management"
lm_eval/tasks/arabicmmlu/arabicmmlu_univ_political_science.yaml
 "dataset_name": "Univ Political Science"
-"group": "arabicmmlu_social_science"
-"group_alias": "social science"
-"include": "_default_template_yaml"
+"tag": "arabicmmlu_social_science_tasks"
+"include": "_default_arabicmmlu_template_yaml"
 "task": "arabicmmlu_univ_political_science"
 "task_alias": "Univ Political Science"
lm_eval/tasks/arc/README.md
...
@@ -29,10 +29,14 @@ Homepage: https://allenai.org/data/arc
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
+None.
+#### Tags
 * `ai2_arc`: Evaluates `arc_easy` and `arc_challenge`
 #### Tasks
...
lm_eval/tasks/arc/arc_easy.yaml
-group:
+tag:
   - ai2_arc
 task: arc_easy
 dataset_path: allenai/ai2_arc
...
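For files like this one, the change is a one-line rename of the top-level `group:` key to `tag:`, with the list beneath it left untouched. A minimal text-level sketch in Python follows; the helper name `rename_group_key` is hypothetical, the list indentation in the sample is assumed, and this is not the script used to produce the commit.

```python
import re


def rename_group_key(yaml_text: str) -> str:
    """Rename a top-level `group:` key to `tag:` (hypothetical helper).

    Only an unindented `group:` at the start of a line is touched, so keys
    such as `group_alias:` and nested keys are left alone.
    """
    return re.sub(r"^group:", "tag:", yaml_text, flags=re.MULTILINE)


# Sample text modeled on arc_easy.yaml; the indentation is an assumption.
old_text = "group:\n  - ai2_arc\ntask: arc_easy\ndataset_path: allenai/ai2_arc\n"
new_text = rename_group_key(old_text)
print(new_text)
```

Working on the raw text rather than a parsed YAML document keeps comments, ordering, and trailing content of the file unchanged.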
lm_eval/tasks/arc_mt/arc_challenge_mt_fi.yaml
-group:
+tag:
   - arc_challenge_mt
 task: arc_challenge_mt_fi
 dataset_path: LumiOpen/arc_challenge_mt
...
lm_eval/tasks/arithmetic/README.md
...
@@ -27,9 +27,9 @@ Homepage: https://github.com/openai/gpt-3/tree/master/data
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
-#### Groups
+#### Tags
 * `arithmetic`: Evaluates `1dc` to `5ds`
...
lm_eval/tasks/arithmetic/arithmetic_1dc.yaml
-group:
+tag:
   - arithmetic
 task: arithmetic_1dc
 dataset_path: EleutherAI/arithmetic
...
lm_eval/tasks/asdiv/README.md
...
@@ -32,7 +32,7 @@ Homepage: https://github.com/chaochun/nlu-asdiv-dataset
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
...
lm_eval/tasks/babi/README.md
...
@@ -21,12 +21,16 @@ Homepage: https://github.com/facebookarchive/bAbI-tasks
 }
 ```
-### Groups and Tasks
+### Groups, Tags, and Tasks
 #### Groups
 * Not part of a group yet
+#### Tags
+* No tags applied.
 #### Tasks
 * `babi`
...