ml: Add support for quantized KV cache
As with the llama engine, quantizing the KV cache requires flash attention to be enabled through the Ollama server.
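For reference, a minimal sketch of starting the server with both settings, using the `OLLAMA_FLASH_ATTENTION` and `OLLAMA_KV_CACHE_TYPE` environment variables documented for the Ollama server (the `q8_0` value shown is one of the supported quantized cache types; this is an illustration, not part of this change):

```go
// Launch `ollama serve` with flash attention enabled and a quantized KV cache.
package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	cmd := exec.Command("ollama", "serve")
	cmd.Env = append(os.Environ(),
		"OLLAMA_FLASH_ATTENTION=1",  // flash attention must be on before the cache can be quantized
		"OLLAMA_KV_CACHE_TYPE=q8_0", // e.g. q8_0 or q4_0; the default is f16
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```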