".github/ISSUE_TEMPLATE/-feature2-.yaml" did not exist on "31b01f5b99a88d50ab95fe1c41c5f670c602febe"
  • ggml: Always set cache padding to 256 · 7837a5bc
    Jesse Gross authored
    We currently use cache padding of 32 when not using flash attention
    and 256 with flash attention, which is based on the historic alignment
    requirements of these kernels. The restrictions have since been
    loosened but there are still performance benefits, such as better
    CUDA graph reuse.
    
    Since the requirement is no longer kernel-specific, set the padding
    uniformly to 256, as llama.cpp has.
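    
    A minimal Go sketch of the padding behavior this describes, rounding a
    KV cache length up to the 256-entry granularity; the constant and
    helper names are illustrative, not the actual symbols in ggml.go:
    
        package main
        
        import "fmt"
        
        // cachePadding is the uniform KV cache padding after this change,
        // applied whether or not flash attention is enabled.
        const cachePadding = 256
        
        // padCacheSize rounds n up to the next multiple of cachePadding.
        func padCacheSize(n int) int {
                return (n + cachePadding - 1) / cachePadding * cachePadding
        }
        
        func main() {
                fmt.Println(padCacheSize(1000)) // 1024
                fmt.Println(padCacheSize(4096)) // 4096 (already a multiple of 256)
        }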