    llm: Allow overriding flash attention setting · fdb10946
    Jesse Gross authored
    As we automatically enable flash attention for more models, there
    are likely some cases where we get it wrong. This allows setting
    OLLAMA_FLASH_ATTENTION=0 to disable it, even for models that usually
    have flash attention.
    Changed file: memory.go
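    A minimal sketch of the idea described in the commit message: an explicit
    OLLAMA_FLASH_ATTENTION value overrides whatever the automatic per-model
    heuristic decided. This is not the actual memory.go implementation; the
    helper name and structure here are hypothetical.

    ```go
    package main

    import (
    	"fmt"
    	"os"
    	"strconv"
    )

    // flashAttentionEnabled reports whether flash attention should be used.
    // autoDefault is what the model-based heuristic chose; an explicit
    // OLLAMA_FLASH_ATTENTION setting (e.g. "0" or "1") overrides it.
    func flashAttentionEnabled(autoDefault bool) bool {
    	if v := os.Getenv("OLLAMA_FLASH_ATTENTION"); v != "" {
    		if enabled, err := strconv.ParseBool(v); err == nil {
    			return enabled // explicit user setting wins over the heuristic
    		}
    	}
    	return autoDefault
    }

    func main() {
    	// Example: the heuristic enabled flash attention, but running with
    	// OLLAMA_FLASH_ATTENTION=0 would turn it off.
    	fmt.Println(flashAttentionEnabled(true))
    }
    ```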