    flash attn: add auto mode for llama engine (#13052) · bd6c1d6b
    Daniel Hiltgen authored
    * flash attn: add auto mode for llama engine
    
    If the user does not specify flash attention (fa) in the environment, use auto mode.
    
    * review comments
    
    * ensure kv cache quantized types have FA explicitly enabled
    
    additional review comments
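The behavior described in this commit message can be sketched as a tri-state setting. This is only an illustration under stated assumptions, not the code in server.go: the type `FlashAttnMode`, the helpers `parseFlashAttn`, `isQuantizedKV`, and `resolveFlashAttn`, and the rule that any cache type other than `f16` or `f32` counts as quantized are all hypothetical names and simplifications introduced here.

```go
package main

import (
	"strconv"
	"strings"
)

// FlashAttnMode is a hypothetical tri-state for the flash attention setting.
type FlashAttnMode int

const (
	FlashAttnAuto     FlashAttnMode = iota // user expressed no preference
	FlashAttnEnabled                       // explicitly on
	FlashAttnDisabled                      // explicitly off
)

// parseFlashAttn maps the raw environment value to a mode.
// An empty or unparseable value is treated as "not specified" and yields auto.
func parseFlashAttn(raw string) FlashAttnMode {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return FlashAttnAuto
	}
	on, err := strconv.ParseBool(raw)
	if err != nil {
		return FlashAttnAuto
	}
	if on {
		return FlashAttnEnabled
	}
	return FlashAttnDisabled
}

// isQuantizedKV reports whether a KV cache type is quantized.
// Assumption for this sketch: everything except "", "f16", and "f32" is quantized.
func isQuantizedKV(kvType string) bool {
	switch strings.ToLower(kvType) {
	case "", "f16", "f32":
		return false
	}
	return true
}

// resolveFlashAttn upgrades auto to explicitly enabled when the KV cache
// type is quantized; an explicit user choice is returned unchanged.
func resolveFlashAttn(mode FlashAttnMode, kvType string) FlashAttnMode {
	if mode == FlashAttnAuto && isQuantizedKV(kvType) {
		return FlashAttnEnabled
	}
	return mode
}
```

A caller would read the environment value, pass it to `parseFlashAttn`, and then pass the result with the configured cache type to `resolveFlashAttn`. The sketch leaves an explicit "disabled" untouched even with a quantized cache type; how the real code handles that combination is not stated in the commit message.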
server.go 54.9 KB