Commit 8865da15 authored by sangho.lee, committed by GitHub

[Bugfix][Multi Modal] Fix incorrect Molmo token processing (#26873)


Signed-off-by: sanghol <sanghol@allenai.org>
parent f0862eae
...
@@ -1264,13 +1264,16 @@ class MolmoMultiModalProcessor(BaseMultiModalProcessor[MolmoProcessingInfo]):
     ) -> list[int]:
         processor = self.info.get_hf_processor()
 
-        # Apply the chat template to the tokens
+        # The chat template is already applied to the prompt tokens
+        # Use message_format="none" to avoid applying it again
+        # Prepend an empty space if `always_start_with_space` is True
         tokens = processor.processor.get_tokens_input(  # type: ignore
             self.info.get_tokenizer().decode(prompt_tokens),
-            message_format=processor.message_format,
+            message_format="none",
             always_start_with_space=processor.always_start_with_space,
         )
 
+        # Prepend a BOS token id to the tokens
         processed_data = self.info.ctx.call_hf_processor(
             processor,  # type: ignore
             dict(tokens=tokens),
...
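The effect of switching to `message_format="none"` can be illustrated with a toy example. This is only a sketch: `format_prompt` is a hypothetical stand-in written for this illustration, not the Molmo processor's `get_tokens_input`, and the `"User: ... Assistant:"` wrapping is an assumed template shape. It shows why re-applying a template to an already-templated prompt duplicates the role markers, while `"none"` only adds the optional leading space.

```python
def format_prompt(text: str, message_format: str, always_start_with_space: bool) -> str:
    """Hypothetical stand-in for a prompt-formatting step (not the real Molmo API)."""
    if message_format == "role":
        # Assumed template shape, used here for illustration only.
        text = f"User: {text} Assistant:"
    elif message_format != "none":
        raise ValueError(f"unknown message_format: {message_format!r}")
    if always_start_with_space:
        text = " " + text
    return text


# A prompt that has already been through the chat template once.
templated = format_prompt("Describe this image.", "role", always_start_with_space=False)

# Before the fix: the template is applied a second time.
double = format_prompt(templated, "role", always_start_with_space=True)

# After the fix: "none" leaves the template alone; only the leading space is added.
single = format_prompt(templated, "none", always_start_with_space=True)

print(repr(double))
print(repr(single))
```

In this sketch, `double` contains the role markers twice, whereas `single` is just the original templated prompt with one leading space.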