kvcache/cache.go · 3ed7ad3ab32b458aa2fdb8d0144c546efdb26a72 · OpenDAS / ollama

kvcache: Pass granular cache size into implementations · 3ed7ad3a

Jesse Gross authored Mar 18, 2025

Currently the runner computes the KV cache size needed, which is the
context size times the number of parallel sequences, and creates a
cache of that size.

Cache implementations can make better decisions about their own memory
usage, so instead pass in the required capacity, the number of
sequences and the maximum batch size. For now, the causal cache just
uses these values to compute the size in the same way as before.


cache.go 2.52 KB
