With the new version of GGML in #12245, KV cache quantization no longer causes a fallback to CPU.