"comfy/vscode:/vscode.git/clone" did not exist on "7daad468ec945aabfbf3f502c6c059bfc818014d"
  • Thomas Wolf authored · 9aeacb58
    
    Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141)
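
    For context, a minimal sketch of what this change enables: loading a Rust-backed fast tokenizer for a SentencePiece-based model such as T5. The checkpoint name is only an example, and the runtime requirements are assumptions:

    ```python
    # Assumes the `tokenizers` library (and `protobuf`, added later in this PR)
    # is installed.
    from transformers import T5TokenizerFast

    tok = T5TokenizerFast.from_pretrained("t5-small")
    # Offset mappings are a fast-tokenizer-only feature.
    enc = tok("Hello world", return_offsets_mapping=True)
    print(enc["input_ids"], enc["offset_mapping"])
    ```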
    
    * [WIP] SP tokenizers
    
    * fixing tests for T5
    
    * WIP tokenizers
    
    * serialization
    
    * update T5
    
    * WIP T5 tokenization
    
    * slow to fast conversion script
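
    A rough sketch of how such a conversion helper is used; the module path and signature are assumed from this PR's layout, with "t5-small" as an example checkpoint:

    ```python
    # Hedged sketch: turn a slow (pure-Python) tokenizer into a Rust-backed
    # `tokenizers.Tokenizer` via the conversion utilities this PR introduces.
    from transformers import T5Tokenizer
    from transformers.convert_slow_tokenizer import convert_slow_tokenizer

    slow = T5Tokenizer.from_pretrained("t5-small")
    fast_backend = convert_slow_tokenizer(slow)  # assumed to return a tokenizers.Tokenizer
    print(fast_backend.encode("Hello world").tokens)
    ```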
    
    * Refactoring to move tokenizer implementations inside transformers
    
    * Adding gpt - refactoring - quality
    
    * WIP adding several tokenizers to the fast world
    
    * WIP Roberta - moving implementations
    
    * update to dev4; switch file loading to in-memory loading
    
    * Updating and fixing
    
    * advancing on the tokenizers - updating do_lower_case
    
    * style and quality
    
    * moving forward with tokenizers conversion and tests
    
    * MBart, T5
    
    * dropping the fast version of Transformer-XL
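
    This is the breaking change called out in the title: the fast Transfo-XL tokenizer is removed, so code importing it has to fall back to the slow implementation. A migration sketch ("transfo-xl-wt103" is the reference checkpoint):

    ```python
    # Before this commit: `from transformers import TransfoXLTokenizerFast`
    # After it, only the slow, pure-Python tokenizer remains.
    from transformers import TransfoXLTokenizer

    tok = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
    ```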
    
    * Adding to autotokenizers + style/quality
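
    Once a fast class is registered with the auto classes, it can be requested through `AutoTokenizer`; a small usage sketch ("t5-small" again as an example):

    ```python
    from transformers import AutoTokenizer

    # use_fast=True selects the Rust-backed implementation when one is
    # registered for the model type.
    tok = AutoTokenizer.from_pretrained("t5-small", use_fast=True)
    print(tok.is_fast)  # True when a fast tokenizer was loaded
    ```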
    
    * update init and space_between_special_tokens
    
    * style and quality
    
    * bump up tokenizers version
    
    * add protobuf
    
    * fix pickle Bert JP with Mecab
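
    The pickling fix can be sanity-checked with a round-trip like the following; this is a sketch that assumes the MeCab dependencies and the `cl-tohoku/bert-base-japanese` checkpoint are available:

    ```python
    import pickle
    from transformers import BertJapaneseTokenizer

    # Round-trip through pickle; before this fix, the wrapped MeCab
    # tokenizer object made the dump fail.
    tok = BertJapaneseTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")
    restored = pickle.loads(pickle.dumps(tok))
    assert restored.tokenize("吾輩は猫である") == tok.tokenize("吾輩は猫である")
    ```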
    
    * fix newly added tokenizers
    
    * style and quality
    
    * fix bert japanese
    
    * fix funnel
    
    * limit tokenizer warning to one occurrence
    
    * clean up file
    
    * fix new tokenizers
    
    * fast tokenizers deep tests
    
    * WIP adding all the special fast tests on the new fast tokenizers
    
    * quick fix
    
    * adding more fast tokenizers in the fast tests
    
    * all tokenizers in fast version tested
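
    The deep tests amount to parity checks between the slow and fast implementations; a simplified sketch of the idea, not the actual test code:

    ```python
    from transformers import T5Tokenizer, T5TokenizerFast

    slow = T5Tokenizer.from_pretrained("t5-small")
    fast = T5TokenizerFast.from_pretrained("t5-small")

    text = "A quick parity check between implementations."
    # Both implementations should produce the same token ids.
    assert slow.encode(text) == fast.encode(text)
    ```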
    
    * Adding BertGenerationFast
    
    * bump up setup.py for CI
    
    * remove BertGenerationFast (too early)
    
    * bump up tokenizers version
    
    * Clean old docstrings
    
    * Typo
    
    * Update following Lysandre comments
    Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>