    kvcache: Pass granular cache size into implementations · 3ed7ad3a
    Jesse Gross authored
    Currently the runner computes the required KV cache size and
    creates a cache of that size. The size is the context size
    multiplied by the number of parallel sequences.
    
    Cache implementations can make better decisions about their memory
    usage, so instead pass in the required capacity, number of sequences
    and maximum batch size. For now, the causal cache just uses this to
    compute the size in the same way as before.
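
A minimal sketch of the computation described above, not the actual code in cache.go. The type `CacheParams`, its field names, and the function `causalCacheSize` are hypothetical. The sketch also assumes that the required capacity is counted per sequence, so that multiplying it by the number of sequences reproduces the old "context size times parallel sequences" result.

```go
package main

import "fmt"

// CacheParams is a hypothetical bundle of the three values the commit
// message says are now passed to cache implementations. The real
// signature in cache.go may look different.
type CacheParams struct {
	Capacity     int // assumed: required capacity (in tokens) per sequence
	MaxSequences int // number of parallel sequences
	MaxBatch     int // maximum batch size; unused by this sketch
}

// causalCacheSize computes the total cache size the way the runner did
// before the change: per-sequence capacity times number of sequences.
// MaxBatch is ignored here, matching "the same way as before".
func causalCacheSize(p CacheParams) int {
	return p.Capacity * p.MaxSequences
}

func main() {
	p := CacheParams{Capacity: 2048, MaxSequences: 4, MaxBatch: 512}
	fmt.Println(causalCacheSize(p)) // prints 8192
}
```

Passing the three values separately leaves room for other cache implementations to size themselves differently, for example by taking the maximum batch size into account, without changing the runner.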
cache.go 2.52 KB