Commit 90ce374d authored by Ruiyang Sun, committed by GitHub

fix(llama): fix LlamaTokenizer (#22746)

Bug in LlamaTokenizer when  #22742
parent d85bf954
@@ -246,9 +246,12 @@ class LlamaTokenizer(PreTrainedTokenizer):
         Returns:
             `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
         """
-        sep = [self.sep_token_id]
-        cls = [self.cls_token_id]
+        bos_token_id = [self.bos_token_id] if self.add_bos_token else []
+        eos_token_id = [self.eos_token_id] if self.add_eos_token else []
 
-        if token_ids_1 is None:
-            return len(cls + token_ids_0 + sep) * [0]
-        return len(cls + token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1]
+        output = [0] * len(bos_token_id + token_ids_0 + eos_token_id)
+
+        if token_ids_1 is not None:
+            output += [1] * len(bos_token_id + token_ids_1 + eos_token_id)
+
+        return output
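
For reviewers who want to check the new logic outside the tokenizer class, here is a minimal standalone sketch of the added lines. The function name `token_type_ids_sketch`, its keyword parameters, and the default ID values (1 for BOS, 2 for EOS) are assumptions made for illustration; they are not part of this commit or of the library's API. The sketch only mirrors the arithmetic of the new side of the diff: each sequence is counted together with the BOS/EOS tokens that are enabled, the first sequence gets type 0, and the optional second sequence gets type 1.

```python
from typing import List, Optional


def token_type_ids_sketch(
    token_ids_0: List[int],
    token_ids_1: Optional[List[int]] = None,
    bos_token_id: int = 1,  # assumed ID, for illustration only
    eos_token_id: int = 2,  # assumed ID, for illustration only
    add_bos_token: bool = True,
    add_eos_token: bool = False,
) -> List[int]:
    """Standalone mirror of the logic added in this diff (hypothetical helper)."""
    # A special token contributes to the length only when its flag is set.
    bos = [bos_token_id] if add_bos_token else []
    eos = [eos_token_id] if add_eos_token else []

    # First sequence, including any enabled special tokens, is type 0.
    output = [0] * len(bos + token_ids_0 + eos)

    # Optional second sequence, counted the same way, is type 1.
    if token_ids_1 is not None:
        output += [1] * len(bos + token_ids_1 + eos)

    return output


if __name__ == "__main__":
    print(token_type_ids_sketch([10, 11, 12]))
    print(token_type_ids_sketch([10, 11], [20], add_eos_token=True))
```

With both flags disabled, the result has exactly one entry per input token, which is a quick way to confirm that the length depends only on the flags and the input lengths.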