gaoqiong / lm-evaluation-harness
Commit 2106fbeb, authored Jan 15, 2025 by Baber

Merge branch 'main' into mathvista

# Conflicts:
#	lm_eval/models/openai_completions.py

Parents: 4354fe46, 703fbffd

Changes: 574
Showing 20 changed files with 160 additions and 0 deletions (+160 -0)

lm_eval/tasks/mlqa/generate_tasks.py  +48 -0
lm_eval/tasks/mlqa/mlqa_ar_ar.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_ar_de.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_ar_en.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_ar_es.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_ar_hi.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_ar_vi.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_ar_zh.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_common_yaml  +22 -0
lm_eval/tasks/mlqa/mlqa_de_ar.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_de_de.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_de_en.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_de_es.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_de_hi.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_de_vi.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_de_zh.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_en_ar.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_en_de.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_en_en.yaml  +5 -0
lm_eval/tasks/mlqa/mlqa_en_es.yaml  +5 -0

lm_eval/tasks/mlqa/generate_tasks.py  0 → 100644
# ruff: noqa: E731, E741
"""
Script to generate task YAMLs for the mlqa dataset.
Based on `tasks/bigbench/generate_tasks.py`.
"""

from datasets import get_dataset_config_names


chosen_subtasks = []

language_dict = {
    "en": "english",
    "es": "spanish",
    "hi": "hindi",
    "vi": "vietnamese",
    "de": "german",
    "ar": "arabic",
    "zh": "chinese",
}


def main() -> None:
    configs = get_dataset_config_names("facebook/mlqa", trust_remote_code=True)
    for config in configs:
        if len(config.split(".")) == 2:
            continue
        else:
            chosen_subtasks.append(config)
    assert len(chosen_subtasks) == 49
    for task in chosen_subtasks:
        file_name = f"{task.replace('.', '_')}.yaml"
        context_lang = file_name.split("_")[1]
        # Not using yaml to avoid tagging issues with !function
        with open(file_name, "w", encoding="utf-8") as f:
            f.write("# Generated by generate_tasks.py\n")
            # Manually writing the YAML-like content inside files to avoid tagging issues
            f.write("include: mlqa_common_yaml\n")
            f.write(f"task: {task.replace('.', '_')}\n")
            f.write(f"dataset_name: {task}\n")
            f.write(f"process_results: !function utils.process_results_{context_lang}\n")


if __name__ == "__main__":
    main()
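
To try the naming logic of generate_tasks.py without downloading anything, the filter and the five written lines can be pulled into pure functions. This is only a sketch: `SAMPLE_CONFIGS`, `select_subtasks`, and `render_task_yaml` are hypothetical names that do not appear in the commit, and the sample config names are illustrative stand-ins for whatever `get_dataset_config_names` returns.

```python
from typing import List

# Illustrative config names (assumption); the script gets the real list from
# datasets.get_dataset_config_names("facebook/mlqa", trust_remote_code=True).
SAMPLE_CONFIGS = [
    "mlqa-translate-train.ar",
    "mlqa-translate-test.de",
    "mlqa.ar.ar",
    "mlqa.de.en",
    "mlqa.en.zh",
]


def select_subtasks(configs: List[str]) -> List[str]:
    """Skip names that split into exactly two dot-separated parts, as the script does."""
    return [c for c in configs if len(c.split(".")) != 2]


def render_task_yaml(task: str) -> str:
    """Return the same five lines the script writes for one task."""
    task_name = task.replace(".", "_")
    # The second underscore-separated field is the context language.
    context_lang = task_name.split("_")[1]
    return (
        "# Generated by generate_tasks.py\n"
        "include: mlqa_common_yaml\n"
        f"task: {task_name}\n"
        f"dataset_name: {task}\n"
        f"process_results: !function utils.process_results_{context_lang}\n"
    )


if __name__ == "__main__":
    for task in select_subtasks(SAMPLE_CONFIGS):
        print(render_task_yaml(task))
```

Returning a string instead of writing a file keeps the sketch easy to check; the committed script writes the same text straight to `<task_name>.yaml` in the current working directory.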

lm_eval/tasks/mlqa/mlqa_ar_ar.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_ar_ar
dataset_name: mlqa.ar.ar
process_results: !function utils.process_results_ar

lm_eval/tasks/mlqa/mlqa_ar_de.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_ar_de
dataset_name: mlqa.ar.de
process_results: !function utils.process_results_ar

lm_eval/tasks/mlqa/mlqa_ar_en.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_ar_en
dataset_name: mlqa.ar.en
process_results: !function utils.process_results_ar

lm_eval/tasks/mlqa/mlqa_ar_es.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_ar_es
dataset_name: mlqa.ar.es
process_results: !function utils.process_results_ar

lm_eval/tasks/mlqa/mlqa_ar_hi.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_ar_hi
dataset_name: mlqa.ar.hi
process_results: !function utils.process_results_ar

lm_eval/tasks/mlqa/mlqa_ar_vi.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_ar_vi
dataset_name: mlqa.ar.vi
process_results: !function utils.process_results_ar

lm_eval/tasks/mlqa/mlqa_ar_zh.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_ar_zh
dataset_name: mlqa.ar.zh
process_results: !function utils.process_results_ar

lm_eval/tasks/mlqa/mlqa_common_yaml  0 → 100644
dataset_path: facebook/mlqa
dataset_kwargs:
  trust_remote_code: true
test_split: test
validation_split: validation
output_type: generate_until
doc_to_text: "Context: {{context}}\n\nQuestion: {{question}}\n\nAnswer:"
doc_to_target: "{{answers}}"
process_docs: !function utils.process_docs
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
  - metric: f1
    aggregation: mean
    higher_is_better: true
generation_kwargs:
  until:
    - "\n"
  do_sample: false
metadata:
  version: 0.0
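
The common config reports `exact_match` and `f1`, and each task file points `process_results` at a per-language function in `utils`, which is not shown in this part of the diff. For orientation only, here is a minimal sketch of SQuAD-style answer scoring. It assumes whitespace tokenization and ASCII punctuation stripping; `normalize`, `exact_match`, `f1`, and `score` are hypothetical names, and the real `utils.process_results_*` functions may normalize text differently for each language.

```python
import string
from collections import Counter
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace (assumed rules)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1(prediction: str, reference: str) -> float:
    """Token-level F1 over the multiset overlap of normalized tokens."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def score(prediction: str, references: List[str]) -> Dict[str, float]:
    """Score one prediction against several references, keeping the best value."""
    return {
        "exact_match": max(exact_match(prediction, r) for r in references),
        "f1": max(f1(prediction, r) for r in references),
    }


if __name__ == "__main__":
    print(score("the Eiffel Tower", ["Eiffel Tower", "Tour Eiffel"]))
```

With `aggregation: mean` in the config, per-example values like these would be averaged over the evaluated split.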

lm_eval/tasks/mlqa/mlqa_de_ar.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_de_ar
dataset_name: mlqa.de.ar
process_results: !function utils.process_results_de

lm_eval/tasks/mlqa/mlqa_de_de.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_de_de
dataset_name: mlqa.de.de
process_results: !function utils.process_results_de

lm_eval/tasks/mlqa/mlqa_de_en.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_de_en
dataset_name: mlqa.de.en
process_results: !function utils.process_results_de

lm_eval/tasks/mlqa/mlqa_de_es.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_de_es
dataset_name: mlqa.de.es
process_results: !function utils.process_results_de

lm_eval/tasks/mlqa/mlqa_de_hi.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_de_hi
dataset_name: mlqa.de.hi
process_results: !function utils.process_results_de

lm_eval/tasks/mlqa/mlqa_de_vi.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_de_vi
dataset_name: mlqa.de.vi
process_results: !function utils.process_results_de

lm_eval/tasks/mlqa/mlqa_de_zh.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_de_zh
dataset_name: mlqa.de.zh
process_results: !function utils.process_results_de

lm_eval/tasks/mlqa/mlqa_en_ar.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_en_ar
dataset_name: mlqa.en.ar
process_results: !function utils.process_results_en

lm_eval/tasks/mlqa/mlqa_en_de.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_en_de
dataset_name: mlqa.en.de
process_results: !function utils.process_results_en

lm_eval/tasks/mlqa/mlqa_en_en.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_en_en
dataset_name: mlqa.en.en
process_results: !function utils.process_results_en

lm_eval/tasks/mlqa/mlqa_en_es.yaml  0 → 100644
# Generated by generate_tasks.py
include: mlqa_common_yaml
task: mlqa_en_es
dataset_name: mlqa.en.es
process_results: !function utils.process_results_en
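
This page of the diff shows only 18 of the generated task files. The `assert len(chosen_subtasks) == 49` in generate_tasks.py is consistent with pairing each of the seven languages in `language_dict` with each of the seven. The sketch below enumerates that grid under that assumption; `LANGS` and `task_names` are hypothetical names, and the pairing is inferred from the file names rather than stated in the commit.

```python
from itertools import product

# The seven language codes listed in language_dict in generate_tasks.py.
LANGS = ["en", "es", "hi", "vi", "de", "ar", "zh"]

# One task per ordered pair; the first code is the one the script calls
# context_lang, and the second is assumed to be the question language.
task_names = sorted(f"mlqa_{ctx}_{q}" for ctx, q in product(LANGS, repeat=2))

if __name__ == "__main__":
    print(len(task_names))   # size of the full grid
    print(task_names[:18])   # the alphabetically first 18 tasks
```

In alphabetical order, the first 18 names are the seven `ar` tasks, the seven `de` tasks, and `mlqa_en_ar` through `mlqa_en_es`, which matches the YAML files listed on this page.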