"git@developer.sourcefind.cn:wuxk1/megatron-lm.git" did not exist on "12518332df3797ae1213102c95a1bccbf04c324d"
  • Philip May's avatar
    Add regression tests for slow sentencepiece tokenizers. (#11737) · fcad8018
    Philip May authored
    * add test_vocab_size for sentencepiece tokenizers.
    
    * add test_get_vocab for sentencepiece tokenizers.
    
    * add test_convert_token_and_id for sentencepiece tokenizers.
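    
    (Illustration only, not the commit's actual code: a minimal sketch of what
    these three sentencepiece checks amount to. AlbertTokenizer, the
    albert-base-v2 checkpoint, and the 30000 vocab size are assumed examples.)
    
    ```python
    from transformers import AlbertTokenizer
    
    tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
    
    # test_vocab_size: the reported size should match the underlying
    # sentencepiece model (30000 is the assumed size for albert-base-v2).
    assert tokenizer.vocab_size == 30000
    
    # test_get_vocab: get_vocab() must cover at least vocab_size entries.
    vocab = tokenizer.get_vocab()
    assert len(vocab) >= tokenizer.vocab_size
    
    # test_convert_token_and_id: token <-> id conversion must round-trip.
    token = "<pad>"
    token_id = tokenizer._convert_token_to_id(token)
    assert tokenizer._convert_id_to_token(token_id) == token
    ```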
    
    * add test_tokenize_and_convert_tokens_to_string for all tokenizers.
    
    * improve test_tokenize_and_convert_tokens_to_string for sentencepiece tokenizers.
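    
    (Again an assumed sketch rather than the commit's code: the round trip this
    test exercises, tokenize followed by convert_tokens_to_string, should
    recover the input text, modulo the lowercasing ALBERT's tokenizer applies
    by default.)
    
    ```python
    from transformers import AlbertTokenizer
    
    tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
    
    text = "This is a test sentence."
    # Sentencepiece marks word starts with "▁"; convert_tokens_to_string
    # strips those markers and rejoins the pieces into plain text.
    tokens = tokenizer.tokenize(text)
    # Assumed to hold for this plain ASCII sample (ALBERT lowercases input).
    assert tokenizer.convert_tokens_to_string(tokens) == text.lower()
    ```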
    
    * add common tokenizer integration tests
    - for albert
    - for barthez
    
    * add tokenizer integration tests to bert generation.
    
    * add tokenizer integration tests for most remaining tokenizers
    
    * fix camembert tokenizer integration test
    
    * add tokenizer integration test to marian
    
    * add tokenizer integration test to reformer
    
    * add typing and doc to tokenizer_integration_test_util
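    
    (The real helper lives in the shared tokenizer test module; the sketch
    below is a simplified, assumed version of its typed shape: golden ids are
    pinned per checkpoint and compared against a fresh encoding. The parameter
    names and the fallback sentence are assumptions, not the exact signature.)
    
    ```python
    from typing import Dict, List, Optional
    
    from transformers import AutoTokenizer
    
    def tokenizer_integration_test_util(
        expected_encoding: Dict[str, List[List[int]]],
        model_name: str,
        revision: Optional[str] = None,
        sequences: Optional[List[str]] = None,
    ) -> None:
        """Encode `sequences` with the tokenizer loaded from `model_name`
        and compare against hard-coded golden ids, so that a silent change
        in sentencepiece handling fails the test."""
        if sequences is None:
            # the real helper falls back to a shared list of sample sentences
            sequences = ["Transformers provides thousands of pretrained models."]
        tokenizer = AutoTokenizer.from_pretrained(model_name, revision=revision)
        encoding = tokenizer(sequences, padding=True)
        for key, expected in expected_encoding.items():
            assert encoding[key] == expected, f"regression detected in {key}"
    ```
    
    (A per-model test would then pass its checkpoint name and a pinned
    expected_encoding dict; the actual golden values are omitted here.)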
    
    * fix tokenizer integration test of reformer
    
    * improve test_sentencepiece_tokenize_and_convert_tokens_to_string
    
    * empty commit to trigger CI
    
    * fix tokenizer integration test of reformer
    
    * remove code that is no longer needed
    
    * empty commit to trigger CI
    
    * empty commit to trigger CI