Commit 9fde13a3 authored by BramVanroy's avatar BramVanroy Committed by Lysandre Debut
Browse files

Add check to verify existence of pad_token_id

In batch_encode_plus we have to ensure that the tokenizer has a pad_token_id so that, when padding, no None values are added as padding. That would happen with gpt2, openai, transfoxl.

closes https://github.com/huggingface/transformers/issues/2640
parent e63a81dd
......@@ -998,7 +998,8 @@ class PreTrainedTokenizer(object):
for key, value in batch_outputs.items():
padded_value = value
if key != "input_len":
# verify that the tokenizer has a pad_token_id
if key != "input_len" and self.pad_token_id is not None:
# Padding handle
padded_value = [
v + [self.pad_token_id if key == "input_ids" else 1] * (max_seq_len - len(v))
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment