    ollamarunner: Base cached tokens on current prompt · 499ae731
    Jesse Gross authored
    When we restore a sequence from the cache, we split the prompt into
    the already used tokens (stored in the cache) and new tokens that
    need to be processed. Currently, the references to the used tokens
    come from the previously stored sequence.
    
    However, even though we know that the used tokens are semantically
    equivalent to the prefix of the prompt, tokens can contain pointers
    which are no longer valid. As a result, it is better to get the
    used tokens from the prompt, which has currently valid pointers.
    
    This doesn't currently have any impact because it isn't possible
    to reuse the pointers (which are tensors) anyway. However, it
    will become an issue once reuse is possible.
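
The split described above can be sketched roughly as follows. This is a minimal illustration, not the actual cache.go code: the `input` type, its `data` field, and the `commonPrefixLen` and `splitPrompt` helpers are hypothetical names invented for this example. The point it shows is that both the used and the pending slices are taken from the current prompt, so any pointers they carry belong to the current request.

```go
package main

// input is a hypothetical stand-in for one prompt element: a token ID plus
// an optional pointer to request-scoped data (a tensor in the real runner).
type input struct {
	token int32
	data  *[]float32 // assumed to be valid only for the request that created it
}

// commonPrefixLen counts the leading elements that are semantically equal.
// It compares token IDs only and deliberately ignores the pointers.
func commonPrefixLen(cached, prompt []input) int {
	n := 0
	for n < len(cached) && n < len(prompt) && cached[n].token == prompt[n].token {
		n++
	}
	return n
}

// splitPrompt returns the prefix already covered by the cache and the
// remainder that still needs processing. Both results are slices of prompt,
// never of cached, so they never expose pointers from an earlier request.
func splitPrompt(cached, prompt []input) (used, pending []input) {
	n := commonPrefixLen(cached, prompt)
	return prompt[:n], prompt[n:]
}
```

Returning `cached[:n]` for the used part would be semantically equivalent in terms of token IDs, but it would keep pointers from the earlier request alive in the restored sequence, which is the situation the commit avoids.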
cache.go 7.12 KB