    [`Core Tokenization`] Support a fix for spm fast models (#26678) · 81899778
    Arthur authored
    * fix
    
    * last attempt
    
    * current work
    
    * fix forward compatibility
    
    * save all special tokens
    
    * current state
    
    * revert additional changes
    
    * updates
    
    * remove tokenizer.model
    
    * add a test and the fix
    
    * nit
    
    * revert one more break
    
    * fix typefield issue
    
    * quality
    
    * more tests
    
    * fix fields for FC
    
    * more nits?
    
    * new additional changes
    
    * how
    
    * some updates
    
    * the fix
    
    * where do we stand
    
    * nits
    
    * nits
    
    * revert unrelated changes
    
    * nits nits nits
    
    * styling
    
    * don't break llama just yet
    
    * revert llama changes
    
    * safe arg check
    
    * fixup
    
    * Add a test for T5
    
    * Necessary changes
    
    * Tests passing; added tokens need to not be normalized. If the added tokens are normalized, the normalizer triggers stripping, which is unwanted for normal functioning
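
    The point above, that an added token must be registered with `normalized=False` so the tokenizer's normalizer does not rewrite or strip it, can be sketched with the `AddedToken` class that `transformers` re-exports from the `tokenizers` library (a minimal illustration, not the exact code changed in this PR):

    ```python
    # Sketch: registering an added token as non-normalized, assuming the
    # Hugging Face `tokenizers`/`transformers` AddedToken API.
    from transformers import AddedToken

    # normalized=False keeps the normalizer (e.g. SentencePiece whitespace
    # handling) from touching the token; lstrip/rstrip=False avoid the
    # surrounding-whitespace stripping described above.
    special = AddedToken("<extra_id_0>", normalized=False, lstrip=False, rstrip=False)

    print(special.normalized)  # False: the token bypasses normalization
    ```

    A tokenizer would then receive it via `tokenizer.add_tokens([special])`, so the raw string survives encoding intact.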
    
    * Add even more tests for when normalization is set to True (which does not work)
    
    * Update to main
    
    * nits
    
    * fmt
    
    * more and more test
    
    * comments
    
    * revert change as tests are failing
    
    * make the test more readable
    
    * nits
    
    * refactor the test
    
    * nit
    
    * updates
    
    * simplify
    
    * style
    
    * style
    
    * style convert slow
    
    * Update src/transformers/convert_slow_tokenizer.py