Commit b1568cf4 authored by Luciano Martins, committed by khluu

[Model] Fix Gemma 4 token repetition by dynamic BOS injection for PT models (#39842)


Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
(cherry picked from commit 6dc94914)
parent a4ac72ce
@@ -168,10 +168,15 @@ class Gemma4ProcessingInfo(BaseProcessingInfo):
         Setting ``add_special_tokens=False`` here prevents the duplicate and
         ensures both ``llm.generate()`` and the chat/completions API behave
-        correctly.
+        correctly for IT models. For PT models (without chat template), we
+        keep the default (True) to ensure BOS is added for raw prompts.
         """
+        tokenizer = self.ctx.get_tokenizer()
+        has_chat_template = getattr(tokenizer, "chat_template", None) is not None
         params = super().get_default_tok_params()
-        params = params.with_kwargs(add_special_tokens=False)
+        if has_chat_template:
+            params = params.with_kwargs(add_special_tokens=False)
         return params
     def get_hf_processor(self, **kwargs: object) -> Gemma4Processor:
......
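
The decision made in this hunk can be illustrated in isolation. The sketch below is not vLLM code: `TokParams`, `FakeTokenizer`, and `default_tok_params` are hypothetical stand-ins invented for illustration, and only the `chat_template` check and the `with_kwargs(add_special_tokens=False)` call mirror the diff. It assumes that leaving `add_special_tokens` unset means the tokenizer default (True) applies, as the docstring states.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class TokParams:
    """Hypothetical stand-in for the tokenization params object."""
    kwargs: dict = field(default_factory=dict)

    def with_kwargs(self, **overrides):
        # Return a copy with the overrides merged in; the original is unchanged.
        return replace(self, kwargs={**self.kwargs, **overrides})


@dataclass
class FakeTokenizer:
    """Hypothetical tokenizer; PT checkpoints ship without a chat template."""
    chat_template: Optional[str] = None


def default_tok_params(tokenizer, base=TokParams()):
    # Same check as the diff: a missing attribute and an explicit None
    # both count as "no chat template".
    has_chat_template = getattr(tokenizer, "chat_template", None) is not None
    if has_chat_template:
        # IT model: the chat template already supplies BOS, so the
        # tokenizer must not add a second one.
        return base.with_kwargs(add_special_tokens=False)
    # PT model: keep the default so BOS is added to raw prompts.
    return base


it_params = default_tok_params(FakeTokenizer(chat_template="{{ messages }}"))
pt_params = default_tok_params(FakeTokenizer())
print(it_params.kwargs)
print(pt_params.kwargs)
```

Because the check is `is not None`, an empty-string chat template would still be treated as an IT model in this sketch, which matches the condition as written in the diff.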