    Added test cases for rembert referring to albert and reformer test_tok… (#27637) · 4d4febb7
    Nilesh authored
    
    
    * Added test cases for rembert, referring to the albert and reformer test_tokenization tests
    
    * removed CURL_CA_BUNDLE='
    
    * Set the flags test_sentencepiece_ignore_case and space_between_special_tokens to True
    
    * Overrode test_added_tokens_serialization
    
    * The slow->fast tokenizer check failed because the [MASK] token was initialized differently in the slow and fast tokenizers, so the [MASK] initialization was made uniform between the two
    
    * Added a few more test cases to test_encode_decode_round_trip and modified the slow tokenizer's mask_token to be an AddedToken instance with lstrip=True
    
    * Added a few test cases to the encoder/decoder round-trip test and also modified the rembert slow tokenizer so that mask_token is an AddedToken with lstrip=True
    
    * Cleaned up the code, added # fmt: skip to prevent line breaks after make style, and added comments marking the test cases copied from other tests
    
    * Corrected a few comments
    
    * Fixed a quality issue
    
    * Ran fix-copies
    
    * Fixed a few minor issues, since make fix-copies broke a few test cases by stripping the text
    
    * Reverted the changes made by repo-consistency
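    The mask_token change above hinges on what lstrip=True means for a special token: whitespace directly to the left of [MASK] is absorbed into the token instead of staying attached to the preceding text. If one tokenizer applies this and the other does not, the same input splits differently, which is the kind of slow/fast mismatch the commit describes. The sketch below is a hypothetical, pure-Python illustration of that splitting behavior only; split_on_special is not a transformers or tokenizers API, and the real tokenizers implement this internally.

```python
import re
from typing import List


def split_on_special(text: str, token: str, lstrip: bool = False) -> List[str]:
    """Split `text` around occurrences of the special `token`.

    Hypothetical illustration of AddedToken(lstrip=...) semantics: with
    lstrip=True, whitespace immediately left of the token is consumed by the
    token match instead of remaining in the preceding text piece.
    """
    left = r"\s*" if lstrip else ""
    # Capturing group so re.split keeps the matched token pieces.
    pattern = f"({left}{re.escape(token)})"
    pieces: List[str] = []
    for piece in re.split(pattern, text):
        if not piece:
            continue  # skip empty strings re.split yields at the edges
        # Normalize a matched token so absorbed whitespace is discarded.
        pieces.append(token if piece.strip() == token else piece)
    return pieces


if __name__ == "__main__":
    print(split_on_special("Hello [MASK] world", "[MASK]"))
    print(split_on_special("Hello [MASK] world", "[MASK]", lstrip=True))
```

    Without lstrip the space stays on "Hello ", so a downstream subword model sees different input than it would with lstrip=True; making both tokenizers agree on this flag is what keeps the encode/decode round trip consistent.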
    
    ---------
    Co-authored-by: Kokane <kokanen@apac.corpdir.net>
test_tokenization_rembert.py