1. 21 Jun, 2024 1 commit
    • SPLIT PR: add user defined symbols and control symbols (#31305) · 1e79eade
      Ita Zaporozhets authored
      * PR SPLIT: moving original changes for adding user defined symbols
      
      * adding gemma test and generalizing gemma converter
      
      * ruff
      
      * update common test
      
      * update serialization test
      
      * update deberta v2 tests, as the rust version adds '.' as a user-added token, so a space is not added
      
      * removing commented lines
      
      * applying feedback - use only added_tokens to add and check piece.type instead of trainer_spec for user_defined_symbols
      
      * add comment referencing sentencepiece
  2. 22 May, 2024 1 commit
  3. 15 Apr, 2024 1 commit
  4. 13 Mar, 2024 1 commit
  5. 04 Dec, 2023 1 commit
    • Added test cases for rembert referring to albert and reformer test_tok… (#27637) · 4d4febb7
      Nilesh authored
      
      
      * Added test cases for rembert referring to albert and reformer test_tokenization
      
      * removed CURL_CA_BUNDLE='
      
      * Added flag test_sentencepiece_ignore_case and space_between_special_tokens to True
      
      * Overrode test_added_tokens_serialization
      
      * Made the initialization of the [MASK] token uniform between the slow and fast tokenizers, as the slow->fast test failed because [MASK] was initialized differently in each
      
      * Added a few more test cases in test_encode_decode_round_trip and modified the slow tokenizer's mask_token to be an AddedToken instance with lstrip=True
      
      * Added a few test cases in test_encoder_decoder round trip and also modified the slow tokenizer of rembert to have mask_token as an AddedToken with lstrip=True
      
      * Cleaned the code, added fmt: skip to avoid line breaks after make style, and added comments indicating where the test cases were copied from
      
      * Corrected a few comments
      
      * Fixed quality issue
      
      * Ran fix-copies
      
      * Fixed a few minor issues, as make fix-copies broke a few test cases while stripping the text
      
      * Reverted the changes made by repo-consistency
      
      ---------
      Co-authored-by: Kokane <kokanen@apac.corpdir.net>