"docs/vscode:/vscode.git/clone" did not exist on "d69062c67af46a2e624be92162e9db585eef329b"
Commit b1568cf4 authored by Luciano Martins's avatar Luciano Martins Committed by khluu
Browse files

[Model] Fix Gemma 4 token repetition by dynamic BOS injection for PT models (#39842)


Signed-off-by: default avatarLuciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: default avatarLuciano Martins <lucianommartins@users.noreply.github.com>
(cherry picked from commit 6dc94914)
parent a4ac72ce
......@@ -168,9 +168,14 @@ class Gemma4ProcessingInfo(BaseProcessingInfo):
Setting ``add_special_tokens=False`` here prevents the duplicate and
ensures both ``llm.generate()`` and the chat/completions API behave
correctly.
correctly for IT models. For PT models (without chat template), we
keep the default (True) to ensure BOS is added for raw prompts.
"""
tokenizer = self.ctx.get_tokenizer()
has_chat_template = getattr(tokenizer, "chat_template", None) is not None
params = super().get_default_tok_params()
if has_chat_template:
params = params.with_kwargs(add_special_tokens=False)
return params
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment