    kvcache: Add check for values that fall out of sliding window cache · b4297006
    jmorganca authored
    
    
    The sliding window cache trims entries that are outside the window for
    the latest token. This works when we are extending the cache, such as
    when the conversation continues. However, if we have a partial overlap
    in conversation (including the BOS tokens), then we resume from a past
    point in the conversation and the needed tokens are no longer stored
    in memory. This verifies that the new window overlaps with the old one
    before reusing the cache.
    Co-authored-by: Jesse Gross <jesse@ollama.com>
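
The overlap check described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from this commit: the function name `canResume`, its parameters, and the exact window arithmetic are assumptions. It treats the cache as holding a window that ends at the last stored position, and allows reuse only when the window needed at the resume position does not start before the stored one.

```go
package main

import "fmt"

// maxInt32 returns the larger of two int32 values.
func maxInt32(a, b int32) int32 {
	if a > b {
		return a
	}
	return b
}

// canResume is a hypothetical helper (not the actual function in this commit).
// last is the position of the newest token still held in the cache,
// pos is the position we want to resume from, and windowSize is the
// sliding window length. Entries older than last-windowSize are assumed
// to have been trimmed already.
func canResume(last, pos, windowSize int32) bool {
	storedStart := maxInt32(0, last-windowSize) // oldest position still cached
	neededStart := maxInt32(0, pos-windowSize)  // oldest position the new window needs

	// Reuse is safe only if everything the new window needs is still stored.
	return neededStart >= storedStart
}

func main() {
	// Extending the conversation: the needed window matches the stored one.
	fmt.Println(canResume(100, 100, 32))

	// Partial overlap (for example, only the BOS token matches): the needed
	// window starts at 0, but positions before 68 were already trimmed.
	fmt.Println(canResume(100, 5, 32))

	// Short conversation: nothing has been trimmed yet, so any point works.
	fmt.Println(canResume(20, 5, 32))
}
```

Under these assumptions, a caller would fall back to reprocessing the prompt from the start whenever `canResume` reports false, rather than reusing a cache that no longer holds the required entries.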
encoder.go 3.46 KB