    Add regression tests for slow sentencepiece tokenizers. (#11737) · fcad8018
    Philip May authored
    * add test_vocab_size for sentencepiece tok.
    
    * add test_get_vocab for sentencepiece tok.
    
    * add test_convert_token_and_id for sentencepiece tok.
    
    * add test_tokenize_and_convert_tokens_to_string for all tok.
    
    * improve test_tokenize_and_convert_tokens_to_string for sentencepiece tok.
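    The checks above can be sketched in miniature. The snippet below is a hypothetical stand-in, not the transformers test code: `ToyTokenizer` and its tiny vocab are invented for illustration, but the assertions mirror what the commits describe (vocab_size matches get_vocab, and tokenize followed by convert_tokens_to_string round-trips the input).

    ```python
    from typing import Dict, List


    class ToyTokenizer:
        """Hypothetical stand-in for a slow sentencepiece tokenizer."""

        def __init__(self) -> None:
            # "▁" marks a word boundary, as in sentencepiece output
            self._vocab: Dict[str, int] = {"▁hello": 0, "▁wor": 1, "ld": 2, "<unk>": 3}

        @property
        def vocab_size(self) -> int:
            return len(self._vocab)

        def get_vocab(self) -> Dict[str, int]:
            return dict(self._vocab)

        def tokenize(self, text: str) -> List[str]:
            # crude greedy longest-match segmentation over the toy vocab
            pieces: List[str] = []
            text = "▁" + text.replace(" ", "▁")
            while text:
                for end in range(len(text), 0, -1):
                    if text[:end] in self._vocab:
                        pieces.append(text[:end])
                        text = text[end:]
                        break
                else:
                    pieces.append("<unk>")
                    text = text[1:]
            return pieces

        def convert_tokens_to_string(self, tokens: List[str]) -> str:
            return "".join(tokens).replace("▁", " ").strip()


    def run_regression_checks(tok: ToyTokenizer) -> None:
        # test_vocab_size: vocab_size must agree with len(get_vocab())
        assert tok.vocab_size == len(tok.get_vocab())
        # test_get_vocab: all ids must be ints
        assert all(isinstance(i, int) for i in tok.get_vocab().values())
        # test_tokenize_and_convert_tokens_to_string: round-trip the input
        text = "hello world"
        assert tok.convert_tokens_to_string(tok.tokenize(text)) == text


    run_regression_checks(ToyTokenizer())
    ```

    The real tests run the same style of invariant checks against each slow sentencepiece tokenizer instead of a toy class.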
    
    * add common tokenizer integration tests
    - for albert
    - for barthez
    
    * add tokenizer integration tests to bert gen.
    
    * add most tokenizer integration tests
    
    * fix camembert tokenizer integration test
    
    * add tokenizer integration test to marian
    
    * add tokenizer integration test to reformer
    
    * add typing and doc to tokenizer_integration_test_util
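    A shared, typed integration-test helper of the kind this commit documents might look like the sketch below. The signature and `FakeTokenizer` are assumptions for illustration, not the actual `tokenizer_integration_test_util` API: the idea is to encode fixed sequences and compare against a pinned expected encoding, so any change in tokenizer output fails the test.

    ```python
    from typing import Any, Dict, List


    def integration_test_util(
        expected_encoding: Dict[str, List[List[int]]],
        tokenizer: Any,
        sequences: List[str],
    ) -> None:
        """Hypothetical sketch of a shared tokenizer integration-test helper.

        Encodes `sequences` with `tokenizer` (assumed callable, returning a
        dict of lists like Hugging Face tokenizers) and asserts the result
        matches the pinned `expected_encoding`, catching regressions.
        """
        encoding = tokenizer(sequences)
        for key, expected in expected_encoding.items():
            assert encoding[key] == expected, f"mismatch for {key!r}"


    class FakeTokenizer:
        """Deterministic fake used only to exercise the helper."""

        def __call__(self, sequences: List[str]) -> Dict[str, List[List[int]]]:
            # map each character to its ordinal; no padding, no specials
            return {"input_ids": [[ord(c) for c in s] for s in sequences]}


    # usage: pin the expected ids once, then rerun on every change
    integration_test_util({"input_ids": [[104, 105]]}, FakeTokenizer(), ["hi"])
    ```

    Keeping the helper typed and documented lets each model's test file supply only its pinned encodings and tokenizer, as the albert, barthez, camembert, marian, and reformer commits above do.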
    
    * fix tokenizer integration test of reformer
    
    * improve test_sentencepiece_tokenize_and_convert_tokens_to_string
    
    * empty commit to trigger CI
    
    * fix tokenizer integration test of reformer
    
    * remove code not needed anymore
    
    * empty commit to trigger CI
    
    * empty commit to trigger CI