Commit b0ad0695 authored by Patrick von Platen, committed by GitHub

[Tokenization] fix edge case for bert tokenization (#3517)

* fix edge case for bert tokenization

* add Lysandre's comments for improvement

* use new is_pretokenized flag
parent 80fa0f78
@@ -1396,7 +1396,7 @@ class PreTrainedTokenizer(SpecialTokensMixin):
     input_ids = []
     for ids_or_pair_ids in batch_text_or_text_pairs:
-        if isinstance(ids_or_pair_ids, (list, tuple)) and len(ids_or_pair_ids) == 2:
+        if isinstance(ids_or_pair_ids, (list, tuple)) and len(ids_or_pair_ids) == 2 and not is_pretokenized:
             ids, pair_ids = ids_or_pair_ids
         else:
             ids, pair_ids = ids_or_pair_ids, None
......
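
A minimal sketch of the edge case this hunk appears to address, reproducing only the changed condition outside of `PreTrainedTokenizer`. The helper names `split_entry_old` and `split_entry_new` are hypothetical and exist only for illustration; the branch bodies are copied from the diff. With the old condition, a pretokenized input that happens to contain exactly two words is unpacked as a (text, text_pair) pair; with `is_pretokenized` set, it stays a single sequence.

```python
def split_entry_old(ids_or_pair_ids):
    # Hypothetical helper: the condition as it was before this commit.
    # Any 2-element list/tuple is treated as a (text, text_pair) pair.
    if isinstance(ids_or_pair_ids, (list, tuple)) and len(ids_or_pair_ids) == 2:
        ids, pair_ids = ids_or_pair_ids
    else:
        ids, pair_ids = ids_or_pair_ids, None
    return ids, pair_ids


def split_entry_new(ids_or_pair_ids, is_pretokenized=False):
    # Hypothetical helper: the condition after this commit.
    # A 2-element input is only unpacked as a pair when it is NOT pretokenized.
    if isinstance(ids_or_pair_ids, (list, tuple)) and len(ids_or_pair_ids) == 2 and not is_pretokenized:
        ids, pair_ids = ids_or_pair_ids
    else:
        ids, pair_ids = ids_or_pair_ids, None
    return ids, pair_ids


# A single pretokenized sequence that happens to have exactly two words.
two_words = ["hello", "world"]

old_result = split_entry_old(two_words)
new_result = split_entry_new(two_words, is_pretokenized=True)

print(old_result)  # the two words are misread as a text pair
print(new_result)  # the two words stay together as one sequence
```

Inputs that are not pretokenized behave the same under both conditions, so a genuine pair such as `("first sentence", "second sentence")` is still unpacked into `ids` and `pair_ids`.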