Jesse Gross authored
As with the llama engine, quantizing the KV cache requires flash attention to be enabled through the Ollama server.
4100ed7b