    kvcache: Sliding window cache only needs a single batch total · 1feff619
    Jesse Gross authored
    When computing the size of the cache for sliding window attention,
    we don't need to multiply the batch size by the number of parallel
    sequences - the batch size is constant.
    
    This also simplifies the check for whether to allocate the cache
    size based on capacity or window size as the batch size is already
    incorporated into the capacity when handled by the runner.
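
The sizing rule described above can be sketched roughly as follows. This is an illustration only, not the code in causal.go: the function names, the parameter names, and the assumption that capacity is a per-sequence token count are all hypothetical. It contrasts charging the batch once against charging it once per parallel sequence.

```go
package main

// slidingWindowCacheSize is a hypothetical helper that sketches the rule in
// the commit message. All names are assumptions made for illustration.
//
//	maxSequences: number of parallel sequences
//	capacity:     assumed per-sequence capacity in tokens (batch already included)
//	windowSize:   sliding window length in tokens
//	maxBatch:     maximum batch size in tokens
func slidingWindowCacheSize(maxSequences, capacity, windowSize, maxBatch int) int {
	// If the capacity is smaller than the window, the window never fills,
	// so size the cache from the capacity alone.
	if capacity < windowSize {
		return maxSequences * capacity
	}
	// Otherwise each sequence keeps one window of history, and room for a
	// single batch is added once in total.
	return maxSequences*windowSize + maxBatch
}

// perSequenceBatchCacheSize shows the larger estimate that results from
// multiplying the batch size by the number of parallel sequences.
func perSequenceBatchCacheSize(maxSequences, windowSize, maxBatch int) int {
	return maxSequences * (windowSize + maxBatch)
}
```

With 4 sequences, a 1024-token window, and a 512-token batch, the single-batch estimate is 4*1024 + 512 = 4608 cells, while the per-sequence estimate is 4*(1024 + 512) = 6144 cells.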
causal.go 16.4 KB