"examples/contrib/vscode:/vscode.git/clone" did not exist on "b0cbcdb05b39e6c81db049d2b4d7dfc5d823210d"
run_ner.py / bert-base-multilingual-cased can output empty tokens (#2991)
* Use tokenizer.num_added_tokens to count number of added special_tokens instead of hardcoded numbers. Signed-off-by:Morgan Funtowicz <morgan@huggingface.co> * run_ner.py - Do not add a label to the labels_ids if word_tokens is empty. This can happen when using bert-base-multilingual-cased with an input containing an unique space. In this case, the tokenizer will output just an empty word_tokens thus leading to an non-consistent behavior over the labels_ids tokens adding one more tokens than tokens vector. Signed-off-by:
Morgan Funtowicz <morgan@huggingface.co>
Showing
Please register or sign in to comment