gaoqiong / lm-evaluation-harness
Commit 42d54f8c
Authored Jan 19, 2024 by haileyschoelkopf

add the hack (works for Mistral/Llama, destroys performance for GPT2)

Parent: 588a493c
Showing 1 changed file with 20 additions and 0 deletions.

lm_eval/models/huggingface.py (+20 -0), viewed at 42d54f8c
...
@@ -755,6 +755,26 @@ class HFLM(LM):
         # context_enc = self.tok_encode(context, add_special_tokens=False)
         context_enc_len = len(context_enc)
         continuation_enc = whole_enc[context_enc_len:]
+        # quite the hack, but what this does:
+        # circumvents the addition of an extraneous sentencepiece underline token
+        # that was produced when passing " <word>" into the Llama / Mistral tokenizer.
+        # if instead we pass "<word>" in, we don't get this extra token (29871 for Llama.)
+        # which would hurt performance if provided.
+        if (
+            len(continuation.lstrip()) + 1 == len(continuation)
+            and continuation.startswith(" ")
+        ) or (len(continuation_enc) == 0):
+            context_enc_2 = context_enc
+            continuation_enc_2 = self.tok_encode(
+                continuation[1:], add_special_tokens=False
+            )
+            # assert context_enc == context_enc_2
+            # assert continuation_enc == continuation_enc_2, f"{continuation_enc},{continuation_enc_2}"
+            return context_enc_2, continuation_enc_2
         return context_enc, continuation_enc

     def loglikelihood(self, requests: List[Instance]) -> List[Tuple[float, bool]]:
...
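To see which continuations take the new branch, the condition can be lifted out of `HFLM` into a standalone function. The following is a minimal sketch, not the harness's code: `apply_leading_space_hack` and `toy_encode` are hypothetical names introduced here, and `toy_encode` is a stand-in that emits one id per character, so it does not reproduce the SentencePiece behavior the commit works around. It only illustrates the control flow.

```python
from typing import Callable, List, Tuple


def apply_leading_space_hack(
    context_enc: List[int],
    continuation: str,
    continuation_enc: List[int],
    tok_encode: Callable[[str], List[int]],
) -> Tuple[List[int], List[int]]:
    """Standalone restatement of the branch added in this commit (hypothetical helper)."""
    # True only when the continuation begins with exactly one space
    # and carries no other leading whitespace.
    single_leading_space = (
        len(continuation.lstrip()) + 1 == len(continuation)
        and continuation.startswith(" ")
    )
    if single_leading_space or len(continuation_enc) == 0:
        # Re-encode the continuation without its first character.
        return context_enc, tok_encode(continuation[1:])
    return context_enc, continuation_enc


def toy_encode(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one id per character (its code point)."""
    return [ord(ch) for ch in text]


ctx = toy_encode("Q: 2+2=")

# Exactly one leading space: the space is dropped before re-encoding.
one_space = apply_leading_space_hack(ctx, " 4", toy_encode(" 4"), toy_encode)[1]
# one_space == [52]

# Two leading spaces: the condition is false, the original encoding is kept.
two_spaces = apply_leading_space_hack(ctx, "  4", toy_encode("  4"), toy_encode)[1]
# two_spaces == [32, 32, 52]
```

Note that the second clause, `len(continuation_enc) == 0`, triggers regardless of what the first character is, so in that case `continuation[1:]` discards a character that may not be a space.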