Add new pre-trained models BERTweet and PhoBERT (#6129)
* Add BERTweet and PhoBERT models
* Update modeling_auto.py
  Re-add `bart` to LM_MAPPING
* Update tokenization_auto.py
  Re-add `from .configuration_mobilebert import MobileBertConfig`
  not sure why it's replaced by `from transformers.configuration_mobilebert import MobileBertConfig`
* Add BERTweet and PhoBERT to pretrained_models.rst
* Update tokenization_auto.py
  Remove BertweetTokenizer and PhobertTokenizer out of tokenization_auto.py (they are currently not supported by AutoTokenizer).
* Update BertweetTokenizer - without nltk
* Update model card for BERTweet
* PhoBERT - with Auto mode - without import fastBPE
* PhoBERT - with Auto mode - without import fastBPE
* BERTweet - with Auto mode - without import fastBPE
* Add PhoBERT and BERTweet to TF modeling auto
* Improve Docstrings for PhobertTokenizer and BertweetTokenizer
* Update PhoBERT and BERTweet model cards
* Fixed a merge conflict in tokenization_auto
* Used black to reformat BERTweet- and PhoBERT-related files
* Used isort to reformat BERTweet- and PhoBERT-related files
* Reformatted BERTweet- and PhoBERT-related files based on flake8
* Updated test files
* Updated test files
* Updated tf test files
* Updated tf test files
* Updated tf test files
* Updated tf test files
* Update commits from huggingface
* Delete unnecessary files
* Add tokenizers to auto and init files
* Add test files for tokenizers
* Revised model cards
* Update save_vocabulary function in BertweetTokenizer and PhobertTokenizer and test files
* Revised test files
* Update orders of Phobert and Bertweet tokenizers in auto tokenization file