    llm: Allow overriding flash attention setting · fdb10946
    Jesse Gross authored
    As we automatically enable flash attention for more models, there
    are likely some cases where we get it wrong. This allows setting
    OLLAMA_FLASH_ATTENTION=0 to disable it, even for models that usually
    have flash attention.
    Changed file: memory.go
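    A minimal sketch of the idea described in the commit message: an explicit
    OLLAMA_FLASH_ATTENTION value overrides whatever the automatic per-model
    heuristic decided. This is not the actual memory.go implementation; the
    helper name and structure here are hypothetical.

    ```go
    package main

    import (
    	"fmt"
    	"os"
    	"strconv"
    )

    // flashAttentionEnabled reports whether flash attention should be used.
    // autoDefault is what the model-based heuristic chose; an explicit
    // OLLAMA_FLASH_ATTENTION setting (e.g. "0" or "1") overrides it.
    func flashAttentionEnabled(autoDefault bool) bool {
    	if v := os.Getenv("OLLAMA_FLASH_ATTENTION"); v != "" {
    		if enabled, err := strconv.ParseBool(v); err == nil {
    			return enabled // explicit user setting wins over the heuristic
    		}
    	}
    	return autoDefault
    }

    func main() {
    	// Example: the heuristic enabled flash attention, but running with
    	// OLLAMA_FLASH_ATTENTION=0 would turn it off.
    	fmt.Println(flashAttentionEnabled(true))
    }
    ```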