    ollamarunner: Don't truncate a SameBatch · 493385eb
    Jesse Gross authored
    When truncating inputs to the context window at the beginning of
    a sequence, we remove the minimum amount possible. However, this
    may cause us to truncate to the middle of a set of inputs that
    the model specified should not be split up. To avoid this, we
    need to remove the rest of the partial batch.
cache.go 7.48 KB