".github/git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "53748217e692792cd1f96c25777e02628b557061"
llm: Allow overriding flash attention setting
As we automatically enable flash attention for more models, there will likely be cases where this automatic choice is wrong. This change allows setting OLLAMA_FLASH_ATTENTION=0 to disable flash attention, even for models that would normally have it enabled.
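A minimal sketch of how such an override could be wired up, not the actual Ollama code: the `flashAttentionEnabled` helper and the per-model default passed into it are hypothetical, and only the OLLAMA_FLASH_ATTENTION variable name comes from the commit message.

```go
// Hypothetical sketch of an environment-variable override for an
// automatically chosen flash attention setting.
package main

import (
	"fmt"
	"os"
	"strconv"
)

// flashAttentionEnabled returns whether flash attention should be used.
// modelDefault stands in for the automatic per-model decision;
// OLLAMA_FLASH_ATTENTION, if set, overrides it in either direction.
func flashAttentionEnabled(modelDefault bool) bool {
	if v := os.Getenv("OLLAMA_FLASH_ATTENTION"); v != "" {
		if enabled, err := strconv.ParseBool(v); err == nil {
			return enabled // "0"/"false" disables, "1"/"true" enables
		}
	}
	return modelDefault
}

func main() {
	// A model that would normally get flash attention automatically,
	// forced off when OLLAMA_FLASH_ATTENTION=0 is set.
	fmt.Println(flashAttentionEnabled(true))
}
```

In practice the override is applied by setting the variable in the server's environment (for example, OLLAMA_FLASH_ATTENTION=0) before starting it.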