fix LayoutLMv3TokenizerFast subword label after 'Ġ' token (#21695)
LayoutLMv3TokenizerFast can produce an empty 'Ġ' token with `offset_mapping = (0, 0)`. The token that follows it was wrongly assumed to also be the beginning of a word, so it was not assigned `pad_token_label`. Modify the test to use text that produces a 'Ġ' token. Remove the copy check from LayoutLMv2TokenizerFast for `_batch_encode_plus`. Fixes issue #19978.
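
The labeling rule described above can be sketched roughly as follows. This is a standalone illustration, not the code from the commit: the function name `align_first_subword_labels`, its parameters, and the sample offsets are all hypothetical, and `-100` is only assumed here as the value of `pad_token_label`. It assumes each token comes with a word-relative `(start, end)` offset and a word index (`None` for special tokens).

```python
from typing import List, Optional, Sequence, Tuple


def align_first_subword_labels(
    offsets: Sequence[Tuple[int, int]],
    word_ids: Sequence[Optional[int]],
    word_labels: Sequence[int],
    pad_token_label: int = -100,  # assumed padding label value
) -> List[int]:
    """Label only the first subword of each word (hypothetical helper).

    A token whose offset starts at 0 normally begins a word. An empty
    token with offset (0, 0), such as 'Ġ', also starts at 0, so the
    token right after it must not be treated as a second word start.
    """
    labels: List[int] = []
    previous_token_empty = False
    for offset, word_id in zip(offsets, word_ids):
        if word_id is None:
            # Special tokens never carry a word label.
            labels.append(pad_token_label)
            continue
        if offset[0] == 0 and not previous_token_empty:
            labels.append(word_labels[word_id])
        else:
            labels.append(pad_token_label)
        # Remember whether this token was empty for the next iteration.
        previous_token_empty = tuple(offset) == (0, 0)
    return labels


# Made-up encoding: <s>, 'Ġ' (empty), first-word text, second word, </s>
offsets = [(0, 0), (0, 0), (0, 5), (0, 3), (0, 0)]
word_ids = [None, 0, 0, 1, None]
print(align_first_subword_labels(offsets, word_ids, word_labels=[7, 8]))
# prints [-100, 7, -100, 8, -100]
```

Without the `previous_token_empty` flag, the token after 'Ġ' would also start at offset 0 and receive the word label a second time, which is the behavior the commit message describes as wrong.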