The extra_id sentinel tokens run from 0 to 99, not from 1 to 100 (#5967)
a = tokenizer.encode("we got a <extra_id_99>", return_tensors='pt', add_special_tokens=True)
print(a)
>tensor([[ 62, 530, 3, 9, 32000]])
a = tokenizer.encode("we got a <extra_id_100>", return_tensors='pt', add_special_tokens=True)
print(a)
>tensor([[ 62, 530, 3, 9, 3, 2, 25666, 834, 23, 26, 834, 2915, 3155]])
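
The outputs above are consistent with sentinel ids being assigned from the top of the vocabulary downward, so that <extra_id_99> lands on 32000 while <extra_id_100> is not a known token and gets split into ordinary pieces. A minimal sketch of that mapping in plain Python, without loading a tokenizer, might look like the following. The constants SP_VOCAB_SIZE and EXTRA_IDS and the helper sentinel_token_id are hypothetical names for illustration; the values are assumed from the output above and the usual default of 100 sentinels, and may differ for other checkpoints.

```python
import re

# Assumptions for illustration only: a 32000-piece SentencePiece vocabulary
# and 100 sentinel tokens, which matches the id 32000 printed above.
SP_VOCAB_SIZE = 32000
EXTRA_IDS = 100

# Matches "<extra_id_N>" where N has no leading zeros.
_SENTINEL_RE = re.compile(r"<extra_id_(0|[1-9]\d*)>")


def sentinel_token_id(token):
    """Return the assumed id of a sentinel token, or None if it is not valid.

    Hypothetical helper: valid sentinels are <extra_id_0> .. <extra_id_99>,
    numbered downward from the last id in the full vocabulary.
    """
    match = _SENTINEL_RE.fullmatch(token)
    if match is None:
        return None
    n = int(match.group(1))
    if not 0 <= n < EXTRA_IDS:
        return None  # e.g. <extra_id_100> is out of range
    return SP_VOCAB_SIZE + EXTRA_IDS - 1 - n


print(sentinel_token_id("<extra_id_99>"))   # 32000
print(sentinel_token_id("<extra_id_0>"))    # 32099
print(sentinel_token_id("<extra_id_100>"))  # None
```

Under these assumptions the valid range is exactly 0 through 99, which is why the second call in the snippet above falls back to sub-word pieces.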