    [`BC 4.37 -> 4.38`] for Llama family, memory and speed (#29753) · ff841900
    Arthur authored
    * attempt to fix
    
    * the actual fix that works with compilation!
    
    * this?
    
    * temporary update
    
    * nit?
    
    * dispatch to memory efficient?
    
    * update both models that have static cache support
    
    * fix copies fix compile
    
    * make sure fix
    
    * fix cohere and gemma
    
    * fix beams?
    
    * nit
    
    * slipped through the cracks
    
    * nit
    
    * nits
    
    * update
    
    * fix-copies
    
    * skip failing tests
    
    * nits
test_modeling_llama.py 33.8 KB