    kvcache: Don't shift empty batches · c116a752
    Jesse Gross authored
    When we context shift, we delete half the context and apply RoPE
    with an offset to the other half. We used to RoPE across the entire
    context in a single pass with a zero offset for the deleted
    section. With the change to shifting in batches, we can skip any
    batches where all of the offsets would be zero. This typically
    reduces the number of operations by half.
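The skip described above can be sketched in Go roughly as follows. This is an illustrative sketch only, not the code in causal.go: the names `allZero` and `shiftInBatches`, the flat `positions` slice, and the `applyRoPE` callback are all hypothetical, and it assumes that cells at or after `beginIndex` receive the shift offset while every other cell (deleted or untouched) gets a zero offset.

```go
package main

// allZero reports whether every offset in a batch is zero.
func allZero(offsets []int32) bool {
	for _, o := range offsets {
		if o != 0 {
			return false
		}
	}
	return true
}

// shiftInBatches walks the cells in batches of batchSize and calls applyRoPE
// only for batches that contain at least one non-zero offset. It returns the
// number of batches that were actually shifted.
//
// Assumptions (hypothetical, for illustration): positions[i] is the position
// stored in cell i, and cells with a position >= beginIndex are shifted by
// offset. All other cells keep a zero offset.
func shiftInBatches(
	positions []int32,
	beginIndex, offset int32,
	batchSize int,
	applyRoPE func(start int, offsets []int32),
) int {
	shifted := 0
	for start := 0; start < len(positions); start += batchSize {
		end := start + batchSize
		if end > len(positions) {
			end = len(positions)
		}

		// Build the per-cell offsets for this batch.
		offsets := make([]int32, end-start)
		for i := range offsets {
			if positions[start+i] >= beginIndex {
				offsets[i] = offset
			}
		}

		// Nothing to rotate in this batch, so skip the RoPE call entirely.
		if allZero(offsets) {
			continue
		}

		applyRoPE(start, offsets)
		shifted++
	}
	return shifted
}
```

For example, with eight cells at positions 0 through 7, a batch size of 2, `beginIndex` 4 and `offset` -4, the first two batches have all-zero offsets and are skipped, so only two of the four batches reach `applyRoPE`.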
causal.go 18.4 KB