Simplify the `attention` function (#2609) · 59ea38cb
Daniël de Kok authored
    * Simplify the `attention` function
    
    - Use one definition rather than multiple.
    - Add `key`/`value` arguments, so that we don't need the
      `PREFILL_IN_KVCACHE` constant.
    - Make it kwargs-only (to avoid mixing up the various `Tensor` args);
      see the sketch after this message.
    
    * Fixup flashinfer support
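
Below is a minimal sketch (not the code from this commit) of what a single, kwargs-only `attention` definition with explicit `key`/`value` arguments can look like. Parameters beyond `query`/`key`/`value`, such as `softmax_scale` and `causal`, are illustrative assumptions rather than the actual signature.

```python
# Sketch only: a kwargs-only attention helper with explicit key/value tensors.
# Passing `key`/`value` in directly lets the caller decide whether to attend
# over fresh tensors or cached ones, so a flag like `PREFILL_IN_KVCACHE` is
# not needed. `softmax_scale` and `causal` are assumed parameters.
from typing import Optional

import torch


def attention(
    *,  # kwargs-only: prevents mixing up the positional `Tensor` arguments
    query: torch.Tensor,  # [batch, heads, q_len, head_dim]
    key: torch.Tensor,    # [batch, heads, kv_len, head_dim]
    value: torch.Tensor,  # [batch, heads, kv_len, head_dim]
    softmax_scale: Optional[float] = None,
    causal: bool = True,
) -> torch.Tensor:
    """One definition used for both prefill and decode in this sketch."""
    if softmax_scale is None:
        softmax_scale = query.shape[-1] ** -0.5
    scores = torch.matmul(query, key.transpose(-2, -1)) * softmax_scale
    if causal:
        q_len, kv_len = query.shape[-2], key.shape[-2]
        # Each query may only attend to keys at or before its own position.
        mask = torch.ones(q_len, kv_len, dtype=torch.bool, device=query.device)
        mask = mask.tril(diagonal=kv_len - q_len)
        scores = scores.masked_fill(~mask, float("-inf"))
    probs = torch.softmax(scores, dim=-1)
    return torch.matmul(probs, value)


# Callers must name every tensor, e.g.:
# out = attention(query=q, key=k, value=v, causal=True)
```

Because callers must name every tensor, swapping `key` and `value` at a call site becomes a visible error rather than a silent mix-up, and the same definition can serve both prefill and decode without a `PREFILL_IN_KVCACHE`-style constant.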