• [`Core tokenization`] `add_dummy_prefix_space` option to help with latest issues (#28010) · 15cfe389
    Arthur authored
    * add add_dummy_prefix_space option to slow
    
    * checking kwargs might be better. Should be there for all spm tokenizer IMO
    
    * nits
    
    * fix copies
    
    * more copied
    
    * nits
    
    * add prefix space
    
    * nit
    
    * nits
    
    * Update src/transformers/convert_slow_tokenizer.py
    
    * fix init
    
    * revert wrong styling
    
    * fix
    
    * nits
    
    * style
    
    * updates
    
    * make sure we use slow tokenizer for conversion instead of looking for the decoder
    
    * support llama as well
    
    * update llama tokenizer fast
    
    * nits
    
    * nits nits nits
    
    * update the doc
    
    * update
    
    * update to fix tests
    
    * skip unrelated failing test
    
    * Update src/transformers/convert_slow_tokenizer.py
    
    * add proper testing
    
    * test decode as well
    
    * more testing
    
    * format
    
    * fix llama test
    
    * Apply suggestions from code review
test_tokenization_t5.py 32.3 KB