llm: Support KV cache quantization with gpt-oss
With the new version of GGML in #12245, KV cache quantization no longer causes a fallback to CPU.