ml/backend/ggml/ggml.go · 4100ed7bdd417ae6d25bf64467fb9df33f3f6525 · OpenDAS / ollama

Jesse Gross authored Feb 21, 2025

Similar to the llama engine, quantizing the KV cache requires
flash attention to be enabled through the Ollama server.
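
The rule in the message above can be illustrated with a minimal sketch. It is not the code from ggml.go: `effectiveKVCacheType` is a hypothetical helper, and falling back to `f16` when flash attention is off is an assumption about how such a gate might behave. The type names `f16`, `q8_0`, and `q4_0` are the values Ollama documents for `OLLAMA_KV_CACHE_TYPE`, and flash attention itself is toggled on the server with `OLLAMA_FLASH_ATTENTION`.

```go
package main

import "fmt"

// effectiveKVCacheType is a hypothetical helper (not part of ggml.go).
// It returns the KV cache type that would actually be used, given whether
// flash attention is enabled and which type was requested.
// Assumption: quantized types are honored only with flash attention on,
// and anything else falls back to the unquantized "f16" cache.
func effectiveKVCacheType(flashAttention bool, requested string) string {
	switch requested {
	case "q8_0", "q4_0":
		// Quantized caches are gated on flash attention.
		if flashAttention {
			return requested
		}
		return "f16"
	default:
		// Empty, "f16", or unrecognized values use the default cache type.
		return "f16"
	}
}

func main() {
	fmt.Println(effectiveKVCacheType(true, "q8_0"))  // q8_0
	fmt.Println(effectiveKVCacheType(false, "q8_0")) // f16
}
```

Keeping the decision in one small function makes the dependency explicit: a quantized cache type on its own has no effect unless flash attention is also enabled.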

ggml.go 23.8 KB