• Fix llama tokenizer (#22402) · c0f99b4d
    Arthur authored
    * draft
    
    * update tokenization llama and conversion script
    
    * more updates
    
    * initial commit
    
    * style
    
    * default pad to None
    
    * draft tokenization tests
    
    * update test
    
    * update tokenization tests
    
    * nits
    
    * update
    
    * versioning test
    
    * major fix
    
    * fix more tests
    
    * finish fixing special masks
    
    * last nit
    
    * more nits
    
    * add encode decode tests
    
    * add more
    
    * fix token type ids
    
    * style
test_tokenization_llama.py 17.6 KB