tests/models/rembert/test_tokenization_rembert.py · 4d4febb7aa3f49c29d7f4fc29bbc1760edd583b6 · chenpangpang / transformers

Added test cases for rembert referring to albert and reformer test_tok… (#27637) · 4d4febb7

Nilesh authored Dec 04, 2023



* Added test cases for rembert, modeled on the albert and reformer test_tokenization tests

* Removed CURL_CA_BUNDLE='

* Set the flags test_sentencepiece_ignore_case and space_between_special_tokens to True

* Overrode test_added_tokens_serialization

* The slow->fast tokenizer test failed because [MASK] was initialized differently in the slow and fast tokenizers, so the [MASK] token initialization had to be made uniform between the two

* Added a few more test cases to test_encode_decode_round_trip and modified the slow tokenizer's mask_token to be an AddedToken instance with lstrip=True

* Added a few test cases to the test_encoder_decoder round trip and also modified the rembert slow tokenizer to have mask_token as an AddedToken with lstrip=True

* Cleaned up the code, added fmt: skip to avoid line breaks after make style, and added comments marking the copied test cases

* Corrected a few comments

* Fixed quality issue

* Ran fix-copies

* Fixed a few minor issues, as make fix-copies broke a few test cases while stripping the text

* Reverted the changes made by repo-consistency

---------
Co-authored-by: Kokane <kokanen@apac.corpdir.net>
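
The commit message above hinges on the [MASK] token being configured identically (with lstrip=True) in the slow and fast tokenizers. The sketch below is a simplified, standard-library stand-in for that idea and is not the transformers or tokenizers implementation: `SketchAddedToken` and `split_on_token` are hypothetical names introduced only for illustration. It shows why a mismatch in `lstrip` can make two otherwise equivalent tokenizers split the same text differently.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchAddedToken:
    """Hypothetical stand-in for an added special token (illustration only)."""
    content: str
    lstrip: bool = False  # if True, the token absorbs whitespace on its left
    rstrip: bool = False  # if True, the token absorbs whitespace on its right


def split_on_token(text: str, token: SketchAddedToken) -> list[str]:
    """Split text around a special token, honoring lstrip/rstrip."""
    pattern = re.escape(token.content)
    if token.lstrip:
        pattern = r"\s*" + pattern
    if token.rstrip:
        pattern = pattern + r"\s*"
    # With a capturing group, re.split alternates: text, match, text, match, ...
    parts = re.split(f"({pattern})", text)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # A match: emit the bare token, dropping any absorbed whitespace.
            pieces.append(token.content)
        elif part:
            pieces.append(part)
    return pieces


text = "Hello [MASK] world"
plain = split_on_token(text, SketchAddedToken("[MASK]"))
stripped = split_on_token(text, SketchAddedToken("[MASK]", lstrip=True))
print(plain)     # ['Hello ', '[MASK]', ' world']
print(stripped)  # ['Hello', '[MASK]', ' world']
```

In this sketch the two configurations disagree on the piece before [MASK] ("Hello " versus "Hello"), which is the kind of divergence that a slow-versus-fast comparison test would flag until both sides use the same settings.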


test_tokenization_rembert.py 13.6 KB
