    runner.go: Retry decoding after defragmentation if needed · 7121dfa3
    Jesse Gross authored
    The KV cache can become fragmented when the cache is shifted or when
    different sequences are processed. Decode uses a heuristic to decide
    whether it should defrag. However, this heuristic is not always
    accurate, so decoding can occasionally fail unexpectedly.
    
    For these cases, if decode indicates that there is no KV cache space,
    we should defrag and then try again.