fs/ggml/ggml.go · 29ddfc2cab7f5a83a96c3133094f67b22e4f27d1 · OpenDAS / ollama

ggml: Disable flash attention for gemma2 · 29ddfc2c

Jesse Gross authored Sep 09, 2025

Our new engine implementation of gemma2 doesn't support flash
attention, which means that it also doesn't support KV cache
quantization. Currently, both can still be enabled for gemma2,
which results in a crash.
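
The dependency described above (no flash attention implies no KV cache quantization) can be illustrated with a minimal sketch. This is not the code from ggml.go: the names `flashAttentionUnsupported`, `supportsFlashAttention`, and `resolveOptions` are hypothetical, the "llama" architecture is only a placeholder for a model that does support flash attention, and the sketch assumes "f16" is the unquantized cache type while "q8_0" is a quantized one.

```go
package main

import "fmt"

// flashAttentionUnsupported lists architectures whose engine implementation
// cannot use flash attention. Only "gemma2" comes from the commit message;
// the map itself is a hypothetical structure for this sketch.
var flashAttentionUnsupported = map[string]bool{
	"gemma2": true,
}

// supportsFlashAttention reports whether flash attention may be enabled
// for the given architecture (hypothetical helper).
func supportsFlashAttention(arch string) bool {
	return !flashAttentionUnsupported[arch]
}

// resolveOptions turns the requested settings into the settings that are
// safe to apply. A quantized KV cache is assumed to require flash attention,
// so when flash attention is off the cache type falls back to "f16".
func resolveOptions(arch string, wantFlashAttention bool, wantKVCacheType string) (bool, string) {
	flashAttention := wantFlashAttention && supportsFlashAttention(arch)

	kvCacheType := wantKVCacheType
	if !flashAttention && kvCacheType != "f16" {
		// Quantization is unavailable without flash attention.
		kvCacheType = "f16"
	}
	return flashAttention, kvCacheType
}

func main() {
	for _, arch := range []string{"gemma2", "llama"} {
		fa, kv := resolveOptions(arch, true, "q8_0")
		fmt.Printf("%s: flash attention=%v, kv cache type=%s\n", arch, fa, kv)
	}
}
```

Rejecting the unsupported combination up front, rather than letting it reach the engine, turns a crash into a silent fallback; a real implementation might also log a warning when it overrides a requested setting.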


ggml.go 24.4 KB
