llm/memory.go · 71cb86af3e8b8006540550a8eeb9fed106b77eee · OpenDAS / ollama

llm: Remove unneeded warning with flash attention enabled · 71cb86af

Jesse Gross authored Sep 09, 2025

If flash attention is enabled without KV cache quantization, we will
currently always get this warning:
level=WARN source=server.go:226 msg="kv cache type not supported by model" type=""
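
To illustrate the condition described above, here is a minimal sketch, not the actual change in this commit. It assumes the warning should only fire when a KV cache type was explicitly requested and the model does not support it. The names kvCacheWarning and supportedKVCacheTypes are hypothetical, as is the list of supported types; none of them are taken from the ollama source.

```go
package main

import "fmt"

// supportedKVCacheTypes is a hypothetical stand-in for whatever the model
// reports as supported; the entries are illustrative only.
var supportedKVCacheTypes = map[string]bool{
	"f16":  true,
	"q8_0": true,
	"q4_0": true,
}

// kvCacheWarning returns the warning text that would be logged, or an empty
// string when nothing should be logged. An empty kvCacheType means no
// quantization was requested, so it is never treated as "unsupported".
// Assumption: without flash attention the cache type is simply ignored.
func kvCacheWarning(flashAttention bool, kvCacheType string) string {
	if !flashAttention || kvCacheType == "" {
		return ""
	}
	if supportedKVCacheTypes[kvCacheType] {
		return ""
	}
	return fmt.Sprintf("kv cache type not supported by model type=%q", kvCacheType)
}

func main() {
	// Flash attention on, no quantization requested: nothing to warn about.
	fmt.Printf("%q\n", kvCacheWarning(true, ""))
	// Flash attention on, an unknown type requested: the warning is kept.
	fmt.Printf("%q\n", kvCacheWarning(true, "bogus"))
}
```

The point of the guard is that type="" is not an unsupported type, only an unset one, so the warning stays reserved for requests the model cannot honor.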


memory.go 15 KB
