    Add new pre-trained models BERTweet and PhoBERT (#6129) · af2322c7
    Dat Quoc Nguyen authored
    * Add BERTweet and PhoBERT models
    
    * Update modeling_auto.py
    
    Re-add `bart` to LM_MAPPING
    
    * Update tokenization_auto.py
    
    Re-add `from .configuration_mobilebert import MobileBertConfig`
    (not sure why it was replaced by `from transformers.configuration_mobilebert import MobileBertConfig`)
    
    * Add BERTweet and PhoBERT to pretrained_models.rst
    
    * Update tokenization_auto.py
    
    Remove BertweetTokenizer and PhobertTokenizer from tokenization_auto.py (they are currently not supported by AutoTokenizer).
    
    * Update BertweetTokenizer - without nltk
    
    * Update model card for BERTweet
    
    * PhoBERT - with Auto mode - without import fastBPE
    
    * PhoBERT - with Auto mode - without import fastBPE
    
    * BERTweet - with Auto mode - without import fastBPE
    
    * Add PhoBERT and BERTweet to TF modeling auto
    
    * Improve Docstrings for PhobertTokenizer and BertweetTokenizer
    
    * Update PhoBERT and BERTweet model cards
    
    * Fixed a merge conflict in tokenization_auto
    
    * Used black to reformat BERTweet- and PhoBERT-related files
    
    * Used isort to reformat BERTweet- and PhoBERT-related files
    
    * Reformatted BERTweet- and PhoBERT-related files based on flake8
    
    * Updated test files
    
    * Updated test files
    
    * Updated tf test files
    
    * Updated tf test files
    
    * Updated tf test files
    
    * Updated tf test files
    
    * Update commits from huggingface
    
    * Delete unnecessary files
    
    * Add tokenizers to auto and init files
    
    * Add test files for tokenizers
    
    * Revised model cards
    
    * Update save_vocabulary function in BertweetTokenizer and PhobertTokenizer and test files
    
    * Revised test files
    
    * Update order of Phobert and Bertweet tokenizers in auto tokenization file
test_tokenization_bertweet.py 2.62 KB