gaoqiong / lm-evaluation-harness
Commit a7286607, authored Sep 05, 2023 by haileyschoelkopf
change words/bytes calc for wikitext
Parent: cc7828dd
Showing 2 changed files with 13 additions and 0 deletions (+13 -0)
lm_eval/tasks/wikitext/preprocess_wikitext.py (+12 -0)
lm_eval/tasks/wikitext/wikitext.yaml (+1 -0)
lm_eval/tasks/wikitext/preprocess_wikitext.py @ a7286607
@@ -34,3 +34,15 @@ def wikitext_detokenizer(doc):
     string = string.replace(" 's", "'s")
 
     return string
+
+
+def process_results(doc, results):
+    (loglikelihood,) = results
+    # IMPORTANT: wikitext counts number of words in *original doc before detokenization*
+    _words = len(re.split(r"\s+", doc["page"]))
+    _bytes = len(doc["page"].encode("utf-8"))
+    return {
+        "word_perplexity": (loglikelihood, _words),
+        "byte_perplexity": (loglikelihood, _bytes),
+        "bits_per_byte": (loglikelihood, _bytes),
+    }
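The effect of counting on the original page rather than the detokenized text, and the way `(loglikelihood, count)` pairs might be reduced to corpus-level metrics, can be sketched as follows. This is a minimal illustration, not code from this commit: the sample strings are made up, the helper names `count_words_and_bytes`, `weighted_perplexity`, and `bits_per_byte` are hypothetical, and the aggregation formulas assume the usual definitions (perplexity as the exponential of the negative total log-likelihood divided by the total count, bits per byte as the same ratio converted from nats to bits).

```python
import math
import re


def count_words_and_bytes(page):
    """Count whitespace-delimited words and UTF-8 bytes, mirroring the diff above."""
    return len(re.split(r"\s+", page)), len(page.encode("utf-8"))


def weighted_perplexity(items):
    """Assumed aggregation: exp(-(sum of log-likelihoods) / (sum of counts))."""
    total_ll = sum(ll for ll, _ in items)
    total_count = sum(count for _, count in items)
    return math.exp(-total_ll / total_count)


def bits_per_byte(items):
    """Assumed aggregation: negative log-likelihood per byte, in bits."""
    total_ll = sum(ll for ll, _ in items)
    total_bytes = sum(n for _, n in items)
    return -total_ll / (total_bytes * math.log(2))


# Hypothetical page in WikiText-style tokenization, plus a hand-written
# stand-in for what a detokenizer might produce from it.
raw_page = "The well @-@ known park , opened in 1990 ."
detok_page = "The well-known park, opened in 1990."

raw_words, raw_bytes = count_words_and_bytes(raw_page)
detok_words, detok_bytes = count_words_and_bytes(detok_page)
print(raw_words, raw_bytes)      # counts on the original page
print(detok_words, detok_bytes)  # counts after detokenization

# Two hypothetical documents as (loglikelihood, word_count) pairs.
print(weighted_perplexity([(-20.0, 10), (-10.0, 5)]))
```

Because the denominator changes with the text being counted, the same log-likelihood yields a different per-word or per-byte figure depending on which version of the page is measured, which is why the comment in the diff insists on the original `doc["page"]`.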
lm_eval/tasks/wikitext/wikitext.yaml @ a7286607
@@ -7,6 +7,7 @@ validation_split: validation
 test_split: test
 doc_to_text: ""
 doc_to_target: !function preprocess_wikitext.wikitext_detokenizer
+process_results: !function preprocess_wikitext.process_results
 should_decontaminate: true
 doc_to_decontamination_query: "{{page}}"
 metric_list:
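The `!function preprocess_wikitext.process_results` entry points the task config at a Python callable by dotted name. How the harness resolves that tag is not shown in this commit; the sketch below only illustrates the general idea with the standard library, and `resolve_dotted_function` is a hypothetical helper, not a harness API.

```python
import importlib


def resolve_dotted_function(path):
    """Resolve a 'module.function' string to a callable (illustrative only)."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {path!r}")
    func = getattr(importlib.import_module(module_name), attr)
    if not callable(func):
        raise TypeError(f"{path!r} does not name a callable")
    return func


# Demonstrated on a standard-library function, because preprocess_wikitext
# is only importable from within the task directory.
sqrt = resolve_dotted_function("math.sqrt")
print(sqrt(9.0))
```

A real loader would likely resolve the module relative to the YAML file's location rather than the import path, so treat this as an approximation of the mechanism.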