    improve saving strategy of sentencepiece tokenizer (#15328) · ade7371a
    SaulLu authored
    
    
    * add new test
    
    * add a feature to save the sentencepiece tokenizer model when the init file was deleted
    
    * update marian
    
    * update m2m_100
    
    * fix marian
    
    * update speech to text
    
    * override test for layoutxlm
    
    * fix saving bartpho
    
    * remove hardcoded values bartpho
    
    * special token string version
    
    * finish bartpho
    
    * override layoutxlm test
    
    * add mbart
    
    * move special tokens list
    
    * format
    
    * Revert "format"
    
    This reverts commit 37a40df37903a932c2f951cbd33acb684246bae7.
    
    * simplify list of string of special tokens
    
    * Re-write `self.fairseq_tokens_to_ids` initialization logic with special tokens
    Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
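
The saving strategy described above (save the sentencepiece model even when the file it was initialized from has been deleted) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `FakeSentencePieceModel`, the standalone `save_vocabulary` function, its parameters, the default file name, and `demo` are all hypothetical. The only thing assumed about the real library is that `sentencepiece.SentencePieceProcessor` exposes `serialized_model_proto()`.

```python
import os
import shutil
import tempfile


class FakeSentencePieceModel:
    """Hypothetical stand-in for sentencepiece.SentencePieceProcessor."""

    def __init__(self, proto: bytes):
        self._proto = proto

    def serialized_model_proto(self) -> bytes:
        # The real processor returns the serialized model as bytes.
        return self._proto


def save_vocabulary(sp_model, vocab_file, save_directory,
                    filename="sentencepiece.bpe.model"):
    """Write the tokenizer model into save_directory and return its path."""
    out_vocab_file = os.path.join(save_directory, filename)
    if os.path.isfile(vocab_file):
        # Original file still exists: copy it unless it is already the target.
        if os.path.abspath(vocab_file) != os.path.abspath(out_vocab_file):
            shutil.copyfile(vocab_file, out_vocab_file)
    else:
        # Original file was deleted: dump the in-memory model instead.
        with open(out_vocab_file, "wb") as f:
            f.write(sp_model.serialized_model_proto())
    return out_vocab_file


def demo() -> bytes:
    """Save with a missing source file and return the bytes written."""
    model = FakeSentencePieceModel(b"dummy-model-bytes")
    with tempfile.TemporaryDirectory() as tmp:
        missing = os.path.join(tmp, "deleted.model")  # never created
        out_path = save_vocabulary(model, missing, tmp)
        with open(out_path, "rb") as f:
            return f.read()
```

Under these assumptions, `demo()` returns the bytes held by the in-memory model, because the source file does not exist and the fallback branch is taken.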
test_tokenization_mbart.py 13.7 KB