Unverified commit 6dc94914, authored by Luciano Martins, committed by GitHub

[Model] Fix Gemma 4 token repetition by dynamic BOS injection for PT models (#39842)


Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
parent 27c0ca50
@@ -167,9 +167,14 @@ class Gemma4ProcessingInfo(BaseProcessingInfo):
         Setting ``add_special_tokens=False`` here prevents the duplicate and
         ensures both ``llm.generate()`` and the chat/completions API behave
-        correctly.
+        correctly for IT models. For PT models (without chat template), we
+        keep the default (True) to ensure BOS is added for raw prompts.
         """
+        tokenizer = self.ctx.get_tokenizer()
+        has_chat_template = getattr(tokenizer, "chat_template", None) is not None
         params = super().get_default_tok_params()
-        params = params.with_kwargs(add_special_tokens=False)
+        if has_chat_template:
+            params = params.with_kwargs(add_special_tokens=False)
         return params
......
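
The effect of the change can be illustrated with a small, self-contained sketch. It does not use vLLM or a real tokenizer: `ToyTokenizer`, `default_add_special_tokens`, and the `<bos>` string are hypothetical stand-ins, and the sketch assumes that an IT model's chat template already emits BOS (which the docstring's mention of a duplicate suggests). Only the `getattr(tokenizer, "chat_template", None) is not None` check is taken from the diff.

```python
from dataclasses import dataclass
from typing import List, Optional

BOS = "<bos>"  # placeholder BOS string, assumed for this sketch


@dataclass
class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer; not a real library class."""

    chat_template: Optional[str] = None

    def encode(self, text: str, add_special_tokens: bool = True) -> List[str]:
        tokens = text.split()
        # Prepend BOS only when special tokens are requested.
        return [BOS] + tokens if add_special_tokens else tokens


def default_add_special_tokens(tokenizer) -> bool:
    """Mirror the check in the diff: keep the default (True) unless a
    chat template is present."""
    has_chat_template = getattr(tokenizer, "chat_template", None) is not None
    return not has_chat_template


it_tok = ToyTokenizer(chat_template="{{ bos_token }}{{ messages }}")  # IT model
pt_tok = ToyTokenizer()  # PT model, no chat template

# IT: the rendered chat prompt is assumed to start with BOS already,
# so the tokenizer must not add a second one.
it_tokens = it_tok.encode(
    f"{BOS} hello", add_special_tokens=default_add_special_tokens(it_tok)
)

# PT: a raw prompt carries no BOS, so the tokenizer has to add it.
pt_tokens = pt_tok.encode(
    "hello", add_special_tokens=default_add_special_tokens(pt_tok)
)

# Old behaviour for PT (always False): the raw prompt ends up without BOS.
pt_tokens_old = pt_tok.encode("hello", add_special_tokens=False)
```

Under these assumptions both `it_tokens` and `pt_tokens` contain exactly one BOS, while `pt_tokens_old` contains none, which is the situation the commit title associates with token repetition on PT models.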