gaoqiong / lm-evaluation-harness
Commit a6c640d3 (Unverified)
Authored Jun 16, 2023 by Lintang Sutawika; committed by GitHub Jun 16, 2023
Merge branch 'big-refactor' into seq2seq-refactor
Parents: 55eccc29, 24e3e3fa
Changes: 261
Showing 20 changed files with 23 additions and 365 deletions
lm_eval/tasks/pile/pile_enron.yaml +2 -21
lm_eval/tasks/pile/pile_europarl.yaml +1 -20
lm_eval/tasks/pile/pile_freelaw.yaml +1 -20
lm_eval/tasks/pile/pile_github.yaml +1 -20
lm_eval/tasks/pile/pile_gutenberg.yaml +1 -20
lm_eval/tasks/pile/pile_hackernews.yaml +1 -20
lm_eval/tasks/pile/pile_nih-exporter.yaml +1 -20
lm_eval/tasks/pile/pile_opensubtitles.yaml +1 -20
lm_eval/tasks/pile/pile_openwebtext2.yaml +1 -20
lm_eval/tasks/pile/pile_philpapers.yaml +1 -20
lm_eval/tasks/pile/pile_pile-cc.yaml +1 -20
lm_eval/tasks/pile/pile_pubmed-abstracts.yaml +1 -20
lm_eval/tasks/pile/pile_pubmed-central.yaml +1 -20
lm_eval/tasks/pile/pile_stackexchange.yaml +1 -20
lm_eval/tasks/pile/pile_ubuntu-irc.yaml +1 -20
lm_eval/tasks/pile/pile_uspto.yaml +1 -20
lm_eval/tasks/pile/pile_wikipedia.yaml +1 -20
lm_eval/tasks/pile/pile_youtubesubtitles.yaml +1 -20
lm_eval/tasks/piqa/piqa.yaml +2 -2
lm_eval/tasks/sciq/sciq.yaml +2 -2
lm_eval/tasks/pile/pile_enron.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_enron
dataset_path: EleutherAI/the_pile
dataset_name: enron_emails
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
dataset_name: pile_enron
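Each pile task file in this commit gains an `include: pile_arxiv.yaml` key while its own settings shrink. The page does not show how the harness resolves that key, so the following is only a minimal sketch of one plausible reading: the included file supplies defaults and the including file's own keys override them. The function name `resolve_include`, the in-memory `FILES` table, and the contents given for `pile_arxiv.yaml` are all hypothetical stand-ins, not taken from the repository.

```python
from typing import Any, Callable, Dict

Config = Dict[str, Any]


def resolve_include(config: Config, load_base: Callable[[str], Config]) -> Config:
    """Overlay a config's own keys on top of the file named by its `include` key."""
    if "include" not in config:
        return dict(config)
    # Resolve the base first so that chained includes would also work.
    base = resolve_include(load_base(config["include"]), load_base)
    overrides = {key: value for key, value in config.items() if key != "include"}
    # Keys in the including file win over keys inherited from the base.
    return {**base, **overrides}


# Hypothetical stand-in for files on disk; the real pile_arxiv.yaml is not shown here.
FILES: Dict[str, Config] = {
    "pile_arxiv.yaml": {
        "group": ["pile"],
        "task": "pile_arxiv",
        "dataset_path": "EleutherAI/the_pile",
        "dataset_name": "pile_arxiv",
        "output_type": "loglikelihood_rolling",
    },
}

pile_enron: Config = {
    "group": ["pile"],
    "include": "pile_arxiv.yaml",
    "task": "pile_enron",
    "dataset_name": "pile_enron",
}

resolved = resolve_include(pile_enron, FILES.__getitem__)
```

Under this reading, `resolved` keeps `task` and `dataset_name` from the enron file and inherits `output_type` from the base, which would explain why the per-task files no longer need to repeat the shared settings.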
lm_eval/tasks/pile/pile_europarl.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_europarl
dataset_path: EleutherAI/the_pile
dataset_name: pile_europarl
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_freelaw.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_freelaw
dataset_path: EleutherAI/the_pile
dataset_name: pile_freelaw
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_github.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_github
dataset_path: EleutherAI/the_pile
dataset_name: pile_github
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_gutenberg.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_gutenberg
dataset_path: EleutherAI/the_pile
dataset_name: pile_gutenberg
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_hackernews.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_hackernews
dataset_path: EleutherAI/the_pile
dataset_name: pile_hackernews
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_nih-exporter.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_nih-exporter
dataset_path: EleutherAI/the_pile
dataset_name: pile_nih-exporter
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_opensubtitles.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_opensubtitles
dataset_path: EleutherAI/the_pile
dataset_name: pile_opensubtitles
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_openwebtext2.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_openwebtext2
dataset_path: EleutherAI/the_pile
dataset_name: pile_openwebtext2
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_philpapers.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_philpapers
dataset_path: EleutherAI/the_pile
dataset_name: pile_philpapers
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_pile-cc.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_pile-cc
dataset_path: EleutherAI/the_pile
dataset_name: pile_pile-cc
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_pubmed-abstracts.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_pubmed-abstracts
dataset_path: EleutherAI/the_pile
dataset_name: pile_pubmed-abstracts
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_pubmed-central.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_pubmed-central
dataset_path: EleutherAI/the_pile
dataset_name: pile_pubmed-central
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_stackexchange.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_stackexchange
dataset_path: EleutherAI/the_pile
dataset_name: pile_stackexchange
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_ubuntu-irc.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_ubuntu-irc
dataset_path: EleutherAI/the_pile
dataset_name: pile_ubuntu-irc
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_uspto.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_uspto
dataset_path: EleutherAI/the_pile
dataset_name: pile_uspto
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_wikipedia.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_wikipedia
dataset_path: EleutherAI/the_pile
dataset_name: pile_wikipedia
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
lm_eval/tasks/pile/pile_youtubesubtitles.yaml
group:
  - pile
include: pile_arxiv.yaml
task: pile_youtubesubtitles
dataset_path: EleutherAI/the_pile
dataset_name: pile_youtubesubtitles
output_type: loglikelihood_rolling
test_split: train
template_aliases: ""
doc_to_text: ""
doc_to_target: "{{text}}"
should_decontaminate: true
doc_to_decontamination_query: "{{text}}"
metric_list:
  - metric: word_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: byte_perplexity
    aggregation: weighted_perplexity
    higher_is_better: false
  - metric: bits_per_byte
    aggregation: bits_per_byte
    higher_is_better: false
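The pile task blocks above all name the same three metrics, aggregated with `weighted_perplexity` or `bits_per_byte`. The page does not include the aggregation code, so the sketch below only illustrates the usual definitions, assuming each document contributes a `(log_likelihood, weight)` pair where the log-likelihood is in nats and the weight is that document's word or byte count. The function names mirror the config values for readability and should not be read as the harness's implementation.

```python
import math
from typing import List, Tuple

# One (log_likelihood_in_nats, weight) pair per document; weight is a word or byte count.
Item = Tuple[float, float]


def weighted_mean(items: List[Item]) -> float:
    """Total log-likelihood divided by total weight (per-word or per-byte average)."""
    log_likelihoods, weights = zip(*items)
    return sum(log_likelihoods) / sum(weights)


def weighted_perplexity(items: List[Item]) -> float:
    """Perplexity per unit of weight: exp of the negative average log-likelihood."""
    return math.exp(-weighted_mean(items))


def bits_per_byte(items: List[Item]) -> float:
    """Average negative log-likelihood per byte, converted from nats to bits."""
    return -weighted_mean(items) / math.log(2)


# Illustrative input: one 8-byte document scored at exactly one bit per byte.
example_items: List[Item] = [(-8 * math.log(2), 8.0)]
example_bpb = bits_per_byte(example_items)
example_ppl = weighted_perplexity(example_items)
```

Lower values are better for all three, which matches the `higher_is_better: false` entries in the configs.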
lm_eval/tasks/piqa/piqa.yaml
group:
  - piqa_yaml_grp
task: piqa_yaml
  - multiple_choice
task: piqa
dataset_path: piqa
dataset_name: null
output_type: multiple_choice
...
lm_eval/tasks/sciq/sciq.yaml
group:
  - sciq_yaml_grp
task: sciq_yaml
  - multiple_choice
task: sciq
dataset_path: sciq
dataset_name: null
output_type: multiple_choice
...