    Cache: new Cache format in decoder-only models (#31421) · a30c865f
    Raushan Turganbay authored
    
    
    * draft bart with new cache
    
    * add cache for decoder-only models
    
    * revert utils
    
    * modify docstring
    
    * revert bart
    
    * minor fixes
    
    * fix copies (not related)
    
    * revert tests
    
    * remove enc-dec related code
    
    * remove bloom
    
    * remove opt (enc-dec)
    
    * update docstring
    
    * git, codegen, gpt_neo, gpt_neox, gptj
    
    * clean up
    
    * copied from statements
    
    * revert
    
    * tmp
    
    * update warning msg
    
    * forgot git
    
    * add more flags
    
    * run-slow git,codegen,gpt_neo,gpt_neox,gpj
    
    * add cache flag to VLMs
    
    * remove files
    
    * style
    
    * video LLMs also need a flag
    
    * style
    
    * llava will go in another PR
    
    * style
    
    * [run-slow] codegen, falcon, git, gpt_neo, gpt_neox, gptj, idefics
    
    * Update src/transformers/models/gpt_neo/modeling_gpt_neo.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * copy from
    
    * deprecate until v4.45 and warn if not training
    
    * nit
    
    * fix test
    
    * test static cache
    
    * add more tests and fix models
    
    * fix copies
    
    * return sliding window mask
    
    * run slow tests & fix + codestyle
    
    * one more falcon fix for alibi
    
    ---------
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>