gaoqiong / lm-evaluation-harness · Commits

Unverified commit eea16d36, authored May 08, 2024 by Jess, committed by GitHub on May 08, 2024

Merge branch 'EleutherAI:main' into main

Parents: 72f5f4b1, 885f48d6
Showing 20 of 45 changed files, with 330 additions and 0 deletions (+330, −0).
lm_eval/tasks/unitxt/claim_stance_topic.yaml (+3, −0)
lm_eval/tasks/unitxt/cnn_dailymail.yaml (+3, −0)
lm_eval/tasks/unitxt/coedit_gec.yaml (+3, −0)
lm_eval/tasks/unitxt/dbpedia_14.yaml (+3, −0)
lm_eval/tasks/unitxt/ethos_binary.yaml (+3, −0)
lm_eval/tasks/unitxt/financial_tweets.yaml (+3, −0)
lm_eval/tasks/unitxt/generate_yamls.py (+135, −0)
lm_eval/tasks/unitxt/law_stack_exchange.yaml (+3, −0)
lm_eval/tasks/unitxt/ledgar.yaml (+3, −0)
lm_eval/tasks/unitxt/medical_abstracts.yaml (+3, −0)
lm_eval/tasks/unitxt/stsb.yaml (+3, −0)
lm_eval/tasks/unitxt/unfair_tos.yaml (+3, −0)
lm_eval/tasks/unitxt/unitxt_datasets (+18, −0)
lm_eval/tasks/unitxt/unitxt_tasks.classification.multi_class (+24, −0)
lm_eval/tasks/unitxt/unitxt_tasks.classification.multi_label (+24, −0)
lm_eval/tasks/unitxt/unitxt_tasks.grammatical_error_correction (+24, −0)
lm_eval/tasks/unitxt/unitxt_tasks.qa.with_context.extractive (+18, −0)
lm_eval/tasks/unitxt/unitxt_tasks.regression.two_texts (+18, −0)
lm_eval/tasks/unitxt/unitxt_tasks.span_labeling.extraction (+18, −0)
lm_eval/tasks/unitxt/unitxt_tasks.summarization.abstractive (+18, −0)
lm_eval/tasks/unitxt/claim_stance_topic.yaml (new file, mode 100644)

include: unitxt_tasks.classification.multi_class
task: claim_stance_topic
dataset_name: card=cards.claim_stance_topic,template=templates.classification.multi_class.title
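Each dataset card YAML holds only three keys: it includes the shared task template and overrides `task` and `dataset_name`. A minimal sketch of that include-then-override merge, using plain dicts (the `merge_included` helper is illustrative, not the harness's actual loader):

```python
def merge_included(base: dict, overrides: dict) -> dict:
    """Apply card-level overrides on top of the included task template."""
    merged = dict(base)          # start from the shared task config
    merged.update(overrides)     # card keys win on conflict
    merged.pop("include", None)  # 'include' is a directive, not config
    return merged

base = {"dataset_path": "unitxt/data", "output_type": "generate_until"}
card = {
    "include": "unitxt_tasks.classification.multi_class",
    "task": "claim_stance_topic",
    "dataset_name": "card=cards.claim_stance_topic,"
                    "template=templates.classification.multi_class.title",
}
config = merge_included(base, card)
```

The card's `task` and `dataset_name` end up in the final config, while everything else comes from the shared template file.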
lm_eval/tasks/unitxt/cnn_dailymail.yaml (new file, mode 100644)

include: unitxt_tasks.summarization.abstractive
task: cnn_dailymail
dataset_name: card=cards.cnn_dailymail,template=templates.summarization.abstractive.full
lm_eval/tasks/unitxt/coedit_gec.yaml (new file, mode 100644)

include: unitxt_tasks.grammatical_error_correction
task: coedit_gec
dataset_name: card=cards.coedit_gec,template=templates.grammatical_error_correction.simple
lm_eval/tasks/unitxt/dbpedia_14.yaml (new file, mode 100644)

include: unitxt_tasks.classification.multi_class
task: dbpedia_14
dataset_name: card=cards.dbpedia_14,template=templates.classification.multi_class.title
lm_eval/tasks/unitxt/ethos_binary.yaml (new file, mode 100644)

include: unitxt_tasks.classification.multi_class
task: ethos_binary
dataset_name: card=cards.ethos_binary,template=templates.classification.multi_class.title
lm_eval/tasks/unitxt/financial_tweets.yaml (new file, mode 100644)

include: unitxt_tasks.classification.multi_class
task: financial_tweets
dataset_name: card=cards.financial_tweets,template=templates.classification.multi_class.title
lm_eval/tasks/unitxt/generate_yamls.py (new file, mode 100644)
#
# This file generates a set of LM eval harness YAML files
# that load unitxt datasets (https://github.com/IBM/unitxt)
#
import unitxt_wrapper
import yaml
from unitxt.artifact import fetch_artifact
from unitxt.standard import StandardRecipe


# This code is required to properly dump LM harness YAML that contains references to functions
def function_representer(dumper: yaml.SafeDumper, func) -> yaml.nodes.ScalarNode:
    return dumper.represent_scalar(
        "!function", f"{func.__module__}.{func.__name__}", style=None
    )


def write_task_yaml(filename, data):
    yaml.add_representer(type(data["process_results"]), function_representer)
    with open(filename, "w") as stream:
        yaml.dump(data, stream, sort_keys=False)


def write_card_yaml(filename, data):
    with open(filename, "w") as stream:
        yaml.dump(data, stream, sort_keys=False)


default_template_per_task = {
    "tasks.classification.multi_label": "templates.classification.multi_label.title",
    "tasks.classification.multi_class": "templates.classification.multi_class.title",
    "tasks.summarization.abstractive": "templates.summarization.abstractive.full",
    "tasks.regression.two_texts": "templates.regression.two_texts.simple",
    "tasks.qa.with_context.extractive": "templates.qa.with_context.simple",
    "tasks.grammatical_error_correction": "templates.grammatical_error_correction.simple",
    "tasks.span_labeling.extraction": "templates.span_labeling.extraction.title",
}


def generate_task_yaml(task: str):
    """
    Generate an LM Eval Harness YAML file based on a Unitxt task definition.

    The output YAML is based on 'template.yaml.file' found in the current directory.
    The common template is filled with the specific metrics for the task.
    It still leaves the 'dataset_name' and 'task' name unspecified.
    """
    print("*" * 80)
    print("*")
    print(f"* Generating YAML base file for task {task}")
    print("*")
    task_definition, _ = fetch_artifact(task)
    data = {
        "group": ["unitxt"],
        "dataset_path": "unitxt/data",
        "output_type": "generate_until",
        "training_split": "train",
        "validation_split": "test",
        "doc_to_text": "{{source}}",
        "doc_to_target": "target",
        "process_results": unitxt_wrapper.process_results,
        "generation_kwargs": {"until": ["</s>"]},
        "metric_list": [],
        "metadata": {"version": 1.0},
    }
    for metric_name in task_definition.metrics:
        new_metric = {"metric": "", "aggregation": "unitxt", "higher_is_better": True}
        new_metric["metric"] = metric_name.replace("metrics.", "unitxt_")
        data["metric_list"].append(new_metric)
    write_task_yaml(f"unitxt_{task}", data)


def generate_card_yaml(card: str):
    """
    Generate an LM Eval Harness YAML file based on the Unitxt dataset card.

    It includes the task YAML for the dataset, and overrides the 'dataset_name'
    and 'task' with the card.
    """
    print("*" * 80)
    print("*")
    print(f"* Generating YAML file for unitxt dataset {card}")
    print("*")
    card_definition, _ = fetch_artifact(f"cards.{card}")
    task = card_definition.task.__id__
    if task in default_template_per_task:
        template = default_template_per_task[task]
    else:
        raise ValueError(
            f"Default template was not defined for task {task} "
            "in 'default_template_per_task' dict in generate_yamls.py"
        )
    data = {}
    data["include"] = f"unitxt_{task}"
    data["task"] = card
    data["dataset_name"] = f"card=cards.{card},template={template}"
    # This is faster than the load_dataset approach:
    # dataset = load_dataset('unitxt/data', data["dataset_name"] + ",loader_limit=100", trust_remote_code=True)
    recipe = StandardRecipe(card=f"cards.{card}", template=template, loader_limit=100)
    stream = recipe()
    dataset = stream.to_dataset()
    print(dataset)
    print("Sample input:")
    print(dataset["test"][0]["source"])
    print("Sample output:")
    print(dataset["test"][0]["target"])
    write_card_yaml(f"{card}.yaml", data)


def main():
    for task in default_template_per_task.keys():
        try:
            generate_task_yaml(task)
        except Exception as e:
            print(f"Unable to generate YAML for {task} due to:")
            print(e)
            raise e
    with open("unitxt_datasets") as f:
        for unitxt_dataset in f:
            unitxt_dataset = unitxt_dataset.strip()
            if unitxt_dataset.startswith("### END ###"):
                exit(0)
            if not unitxt_dataset.startswith("#"):
                try:
                    generate_card_yaml(unitxt_dataset)
                except Exception as e:
                    print(f"Unable to generate YAML for {unitxt_dataset} due to:")
                    print(e)
                    raise e


if __name__ == "__main__":
    main()
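The metric handling in generate_task_yaml above is a simple prefix rewrite from unitxt's metric namespace into the harness's. Isolated as a small sketch (the `to_unitxt_metric` helper name is ours, and the metric ids below are examples):

```python
def to_unitxt_metric(metric_name: str) -> str:
    # Maps a unitxt metric id into the harness namespace,
    # e.g. "metrics.f1_micro" -> "unitxt_f1_micro"
    return metric_name.replace("metrics.", "unitxt_")

# Build the metric_list entries exactly as the generator does
metric_list = [
    {"metric": to_unitxt_metric(m), "aggregation": "unitxt", "higher_is_better": True}
    for m in ["metrics.f1_micro", "metrics.accuracy", "metrics.f1_macro"]
]
```

This is why every metric in the generated task templates carries the `unitxt_` prefix and the `unitxt` aggregation.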
lm_eval/tasks/unitxt/law_stack_exchange.yaml (new file, mode 100644)

include: unitxt_tasks.classification.multi_class
task: law_stack_exchange
dataset_name: card=cards.law_stack_exchange,template=templates.classification.multi_class.title
lm_eval/tasks/unitxt/ledgar.yaml (new file, mode 100644)

include: unitxt_tasks.classification.multi_class
task: ledgar
dataset_name: card=cards.ledgar,template=templates.classification.multi_class.title
lm_eval/tasks/unitxt/medical_abstracts.yaml (new file, mode 100644)

include: unitxt_tasks.classification.multi_class
task: medical_abstracts
dataset_name: card=cards.medical_abstracts,template=templates.classification.multi_class.title
lm_eval/tasks/unitxt/stsb.yaml (new file, mode 100644)

include: unitxt_tasks.regression.two_texts
task: stsb
dataset_name: card=cards.stsb,template=templates.regression.two_texts.simple
lm_eval/tasks/unitxt/unfair_tos.yaml (new file, mode 100644)

include: unitxt_tasks.classification.multi_label
task: unfair_tos
dataset_name: card=cards.unfair_tos,template=templates.classification.multi_label.title
lm_eval/tasks/unitxt/unitxt_datasets (new file, mode 100644)
coedit_gec
atis
20_newsgroups
ag_news
argument_topic
banking77
claim_stance_topic
cnn_dailymail
dbpedia_14
ethos_binary
financial_tweets
law_stack_exchange
ledgar
medical_abstracts
stsb
unfair_tos
xsum
yahoo_answers_topics
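main() in generate_yamls.py consumes this file line by line, skipping `#` comment lines and stopping at a `### END ###` sentinel. A standalone sketch of that filtering, reading from an in-memory string instead of the file (the `read_dataset_names` helper is illustrative):

```python
import io

def read_dataset_names(f) -> list:
    """Collect dataset names, skipping comments and stopping at the sentinel."""
    names = []
    for line in f:
        line = line.strip()
        if line.startswith("### END ###"):
            break
        if line and not line.startswith("#"):
            names.append(line)
    return names

sample = io.StringIO("coedit_gec\n# skipped comment\nstsb\n### END ###\nxsum\n")
names = read_dataset_names(sample)  # xsum is past the sentinel, so it is dropped
```

The sentinel makes it easy to keep experimental dataset names at the bottom of the file without generating YAMLs for them.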
lm_eval/tasks/unitxt/unitxt_tasks.classification.multi_class (new file, mode 100644)
group:
- unitxt
dataset_path: unitxt/data
output_type: generate_until
training_split: train
validation_split: test
doc_to_text: '{{source}}'
doc_to_target: target
process_results: !function 'unitxt_wrapper.process_results'
generation_kwargs:
until:
- </s>
metric_list:
- metric: unitxt_f1_micro
aggregation: unitxt
higher_is_better: true
- metric: unitxt_accuracy
aggregation: unitxt
higher_is_better: true
- metric: unitxt_f1_macro
aggregation: unitxt
higher_is_better: true
metadata:
  version: 1.0
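The `dataset_name` values that accompany templates like the one above are comma-separated `key=value` strings handed to the unitxt data loader. A rough sketch of how such a string decomposes (the `parse_dataset_name` helper is illustrative, not the unitxt API):

```python
def parse_dataset_name(name: str) -> dict:
    # "card=cards.stsb,template=..." -> {"card": "cards.stsb", "template": "..."}
    return dict(pair.split("=", 1) for pair in name.split(","))

parsed = parse_dataset_name(
    "card=cards.stsb,template=templates.regression.two_texts.simple"
)
```

Extra options such as `loader_limit=100` can be appended to the same string, as the commented-out `load_dataset` call in generate_yamls.py shows.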
lm_eval/tasks/unitxt/unitxt_tasks.classification.multi_label (new file, mode 100644)
group:
- unitxt
dataset_path: unitxt/data
output_type: generate_until
training_split: train
validation_split: test
doc_to_text: '{{source}}'
doc_to_target: target
process_results: !function 'unitxt_wrapper.process_results'
generation_kwargs:
until:
- </s>
metric_list:
- metric: unitxt_f1_micro_multi_label
aggregation: unitxt
higher_is_better: true
- metric: unitxt_accuracy
aggregation: unitxt
higher_is_better: true
- metric: unitxt_f1_macro_multi_label
aggregation: unitxt
higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/unitxt/unitxt_tasks.grammatical_error_correction (new file, mode 100644)
group:
- unitxt
dataset_path: unitxt/data
output_type: generate_until
training_split: train
validation_split: test
doc_to_text: '{{source}}'
doc_to_target: target
process_results: !function 'unitxt_wrapper.process_results'
generation_kwargs:
until:
- </s>
metric_list:
- metric: unitxt_char_edit_dist_accuracy
aggregation: unitxt
higher_is_better: true
- metric: unitxt_rouge
aggregation: unitxt
higher_is_better: true
- metric: unitxt_char_edit_distance[reference_field=original_text]
aggregation: unitxt
higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/unitxt/unitxt_tasks.qa.with_context.extractive (new file, mode 100644)
group:
- unitxt
dataset_path: unitxt/data
output_type: generate_until
training_split: train
validation_split: test
doc_to_text: '{{source}}'
doc_to_target: target
process_results: !function 'unitxt_wrapper.process_results'
generation_kwargs:
until:
- </s>
metric_list:
- metric: unitxt_squad
aggregation: unitxt
higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/unitxt/unitxt_tasks.regression.two_texts (new file, mode 100644)
group:
- unitxt
dataset_path: unitxt/data
output_type: generate_until
training_split: train
validation_split: test
doc_to_text: '{{source}}'
doc_to_target: target
process_results: !function 'unitxt_wrapper.process_results'
generation_kwargs:
until:
- </s>
metric_list:
- metric: unitxt_spearman
aggregation: unitxt
higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/unitxt/unitxt_tasks.span_labeling.extraction (new file, mode 100644)
group:
- unitxt
dataset_path: unitxt/data
output_type: generate_until
training_split: train
validation_split: test
doc_to_text: '{{source}}'
doc_to_target: target
process_results: !function 'unitxt_wrapper.process_results'
generation_kwargs:
until:
- </s>
metric_list:
- metric: unitxt_ner
aggregation: unitxt
higher_is_better: true
metadata:
  version: 1.0
lm_eval/tasks/unitxt/unitxt_tasks.summarization.abstractive (new file, mode 100644)
group:
- unitxt
dataset_path: unitxt/data
output_type: generate_until
training_split: train
validation_split: test
doc_to_text: '{{source}}'
doc_to_target: target
process_results: !function 'unitxt_wrapper.process_results'
generation_kwargs:
until:
- </s>
metric_list:
- metric: unitxt_rouge
aggregation: unitxt
higher_is_better: true
metadata:
  version: 1.0