    runner.go: Don't add inputs to cache view until actually processed · c3ff9164
    Jesse Gross authored
    We need to track which tokens are in the cache ourselves. We currently
    add tokens to the cache tracker when we add them to the batch, but they
    are not actually in the cache until we call Decode. This can cause
    confusion when we are shifting the cache.
    
    Avoids "could not find a KV slot for the batch" issues.
    
    Bug #7545
cache.go 5.88 KB