gaoqiong / lm-evaluation-harness
Commit 9b6b0f5e (Unverified)
Authored Jun 25, 2024 by jonabur
Committed by GitHub, Jun 25, 2024
add arc_challenge_mt (#1900)
* add arc_challenge_mt
* add README
* add icelandic
parent 0ae3d3eb
Changes: 13 changed files with 87 additions and 0 deletions

lm_eval/tasks/arc_mt/README.md  +12 -0
lm_eval/tasks/arc_mt/arc_challenge_mt_da.yaml  +3 -0
lm_eval/tasks/arc_mt/arc_challenge_mt_de.yaml  +3 -0
lm_eval/tasks/arc_mt/arc_challenge_mt_el.yaml  +3 -0
lm_eval/tasks/arc_mt/arc_challenge_mt_es.yaml  +3 -0
lm_eval/tasks/arc_mt/arc_challenge_mt_fi.yaml  +23 -0
lm_eval/tasks/arc_mt/arc_challenge_mt_hu.yaml  +3 -0
lm_eval/tasks/arc_mt/arc_challenge_mt_is.yaml  +22 -0
lm_eval/tasks/arc_mt/arc_challenge_mt_it.yaml  +3 -0
lm_eval/tasks/arc_mt/arc_challenge_mt_nb.yaml  +3 -0
lm_eval/tasks/arc_mt/arc_challenge_mt_pl.yaml  +3 -0
lm_eval/tasks/arc_mt/arc_challenge_mt_pt.yaml  +3 -0
lm_eval/tasks/arc_mt/arc_challenge_mt_sv.yaml  +3 -0
lm_eval/tasks/arc_mt/README.md
0 → 100644

# arc mt

arc mt is an implementation of tasks to support machine translated arc
challenge evals, to improve eval support across a number of additional
languages.

The main page for the effort is [here](https://huggingface.co/datasets/LumiOpen/arc_challenge_mt) and we will
include more data and analysis there.

Initial datasets include a number of European languages, and we plan to expand
more in the future.
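
As a rough illustration of how the tasks added in this commit might be selected for a run, the sketch below assembles an `lm_eval` command line for a few of the new task names. The helper `build_command` is hypothetical and the model identifier is a placeholder; neither comes from the commit itself.

```python
import shlex

# Task names as defined by the YAML files in this commit.
TASKS = ["arc_challenge_mt_fi", "arc_challenge_mt_da", "arc_challenge_mt_is"]


def build_command(model_name, tasks, num_fewshot=0):
    """Assemble an lm_eval CLI invocation as an argument list (hypothetical helper)."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_name}",
        "--tasks", ",".join(tasks),
        "--num_fewshot", str(num_fewshot),
    ]


# "your-org/your-model" is a placeholder, not a real checkpoint.
command = build_command("your-org/your-model", TASKS)
print(shlex.join(command))
```

The command is only printed here, not executed, so the snippet can be tried without downloading a model or dataset.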
lm_eval/tasks/arc_mt/arc_challenge_mt_da.yaml
0 → 100644

include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_da
dataset_name: da
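
The per-language files rely on `include` to inherit everything from the Finnish config and then override two keys. A minimal sketch of that idea follows, assuming the including file's keys take precedence over the included file's keys; `resolve_include` is a hypothetical helper, not the harness's own loader, and only a subset of the base keys is shown.

```python
# Keys from the base config (a subset of arc_challenge_mt_fi.yaml).
base_config = {
    "task": "arc_challenge_mt_fi",
    "dataset_path": "LumiOpen/arc_challenge_mt",
    "dataset_name": "fi",
    "output_type": "multiple_choice",
}

# Keys set directly in arc_challenge_mt_da.yaml (the include line itself is omitted).
da_overrides = {
    "task": "arc_challenge_mt_da",
    "dataset_name": "da",
}


def resolve_include(base, overrides):
    """Return a new config in which the including file's keys win (assumed semantics)."""
    merged = dict(base)       # copy so the base config is left untouched
    merged.update(overrides)  # overrides replace same-named base keys
    return merged


da_config = resolve_include(base_config, da_overrides)
print(da_config)
```

Under this assumption the Danish task keeps the shared dataset path, prompt templates, and metrics, and differs only in its task name and dataset subset.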
lm_eval/tasks/arc_mt/arc_challenge_mt_de.yaml
0 → 100644

include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_de
dataset_name: de
lm_eval/tasks/arc_mt/arc_challenge_mt_el.yaml
0 → 100644

include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_el
dataset_name: el
lm_eval/tasks/arc_mt/arc_challenge_mt_es.yaml
0 → 100644

include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_es
dataset_name: es
lm_eval/tasks/arc_mt/arc_challenge_mt_fi.yaml
0 → 100644

group:
  - arc_challenge_mt
task: arc_challenge_mt_fi
dataset_path: LumiOpen/arc_challenge_mt
dataset_name: fi
output_type: multiple_choice
training_split: train
validation_split: validation
test_split: test
doc_to_text: "Question: {{question}}\nAnswer:"
doc_to_target: "{{choices.label.index(answerKey)}}"
doc_to_choice: "{{choices.text}}"
should_decontaminate: true
doc_to_decontamination_query: "Question: {{question}}\nAnswer:"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
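
To make the three `doc_to_*` templates in this config concrete, here is a small sketch that mirrors them with plain Python functions. The sample document is made up for illustration (it is not taken from the dataset), and the functions only imitate what the templates express rather than reproducing the harness's template rendering.

```python
# A made-up document in the shape the templates expect (not taken from the dataset).
doc = {
    "question": "Which object is attracted to a magnet?",
    "choices": {
        "text": ["a wooden spoon", "an iron nail", "a glass cup", "a rubber band"],
        "label": ["A", "B", "C", "D"],
    },
    "answerKey": "B",
}


def doc_to_text(doc):
    # Mirrors "Question: {{question}}\nAnswer:"
    return f"Question: {doc['question']}\nAnswer:"


def doc_to_target(doc):
    # Mirrors "{{choices.label.index(answerKey)}}": position of the gold label.
    return doc["choices"]["label"].index(doc["answerKey"])


def doc_to_choice(doc):
    # Mirrors "{{choices.text}}": the candidate answer strings.
    return doc["choices"]["text"]


prompt = doc_to_text(doc)
target = doc_to_target(doc)
choices = doc_to_choice(doc)

print(prompt)
print("gold index:", target)
print("gold answer:", choices[target])
```

The target is an index into the choice list rather than the letter label, which is why the template looks up `answerKey` in `choices.label`.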
lm_eval/tasks/arc_mt/arc_challenge_mt_hu.yaml
0 → 100644

include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_hu
dataset_name: hu
lm_eval/tasks/arc_mt/arc_challenge_mt_is.yaml
0 → 100644

group:
  - arc_challenge_mt
task: arc_challenge_mt_is
dataset_path: mideind/icelandic-arc-challenge
output_type: multiple_choice
training_split: train
validation_split: validation
test_split: test
doc_to_text: "Question: {{question}}\nAnswer:"
doc_to_target: "{{choices.label.index(answerKey)}}"
doc_to_choice: "{{choices.text}}"
should_decontaminate: true
doc_to_decontamination_query: "Question: {{question}}\nAnswer:"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
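
The `metric_list` in these configs reports both `acc` and `acc_norm`, each aggregated by `mean`. The sketch below shows one way the two can diverge on a multiple-choice item, assuming `acc_norm` divides each choice's log-likelihood by the choice's character length before taking the argmax. The log-likelihood values and choices are invented, and the helper names are hypothetical.

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def score_item(loglikelihoods, choices, gold_index):
    """Return (acc, acc_norm) for a single multiple-choice item."""
    # acc: pick the choice with the highest raw log-likelihood.
    acc = 1.0 if argmax(loglikelihoods) == gold_index else 0.0
    # acc_norm: divide by choice length first (character length is an assumption here).
    normalized = [ll / len(text) for ll, text in zip(loglikelihoods, choices)]
    acc_norm = 1.0 if argmax(normalized) == gold_index else 0.0
    return acc, acc_norm


def mean(values):
    """The 'mean' aggregation named in the config."""
    return sum(values) / len(values)


# Invented items: (log-likelihood per choice, choice texts, gold index).
items = [
    # The gold answer is long, so its raw score is lower than the short distractor's.
    ([-4.0, -6.0], ["yes", "because it is made of iron"], 1),
    # The gold answer wins under both scoring rules.
    ([-2.0, -9.0], ["an iron nail", "a cup"], 0),
]

scores = [score_item(lls, texts, gold) for lls, texts, gold in items]
mean_acc = mean([s[0] for s in scores])
mean_acc_norm = mean([s[1] for s in scores])
print("acc:", mean_acc, "acc_norm:", mean_acc_norm)
```

Length normalization matters most when answer options differ a lot in length, since raw log-likelihood tends to penalize longer strings.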
lm_eval/tasks/arc_mt/arc_challenge_mt_it.yaml
0 → 100644

include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_it
dataset_name: it
lm_eval/tasks/arc_mt/arc_challenge_mt_nb.yaml
0 → 100644

include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_nb
dataset_name: nb
lm_eval/tasks/arc_mt/arc_challenge_mt_pl.yaml
0 → 100644

include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_pl
dataset_name: pl
lm_eval/tasks/arc_mt/arc_challenge_mt_pt.yaml
0 → 100644

include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_pt
dataset_name: pt
lm_eval/tasks/arc_mt/arc_challenge_mt_sv.yaml
0 → 100644

include: arc_challenge_mt_fi.yaml
task: arc_challenge_mt_sv
dataset_name: sv
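
The ten three-line configs in this commit follow one pattern: include the Finnish base, then set `task` and `dataset_name` from the language code. As a sketch of that pattern only (nothing in the commit says the files were generated this way, and `render_stub` is a hypothetical helper), the stubs could be rendered like this:

```python
# Language codes of the three-line configs added in this commit.
LANGUAGES = ["da", "de", "el", "es", "hu", "it", "nb", "pl", "pt", "sv"]


def render_stub(lang):
    """Render the text of a per-language config that includes the Finnish base."""
    return (
        "include: arc_challenge_mt_fi.yaml\n"
        f"task: arc_challenge_mt_{lang}\n"
        f"dataset_name: {lang}\n"
    )


# Map each file name to its rendered contents; nothing is written to disk.
stubs = {f"arc_challenge_mt_{lang}.yaml": render_stub(lang) for lang in LANGUAGES}

print(stubs["arc_challenge_mt_sv.yaml"])
```

Adding another language that lives in the same dataset would then, under this pattern, only require a new language code.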