1. 15 May, 2025 3 commits
    • ollamarunner: Base cached tokens on current prompt · 499ae731
      Jesse Gross authored
      When we restore a sequence from the cache, we split the prompt into
      the already used tokens (stored in the cache) and new tokens that
      need to be processed. Currently, the references to the used tokens
      come from the stored previous sequence.
      
      However, even though the used tokens are semantically equivalent
      to the prefix of the prompt, tokens can contain pointers which are
      no longer valid. As a result, it is better to take the used tokens
      from the prompt, whose pointers are currently valid.
      
      This doesn't currently have any impact because it isn't possible
      to reuse the pointers (which are tensors) anyway. However, it
      becomes an issue once we can.
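The split described above can be sketched in Go. This is a hypothetical illustration with made-up names (`token`, `restoreFromCache`), not the actual ollamarunner code; the point is that the reused prefix is sliced from the current prompt rather than copied from the cached previous sequence:

```go
package main

import "fmt"

// token is a stand-in for the runner's input token type. In the real
// runner a token can carry pointers (e.g. to tensors), which is why
// stale references from a previous sequence must not be reused.
type token struct{ id int }

// restoreFromCache splits prompt into (used, remaining) given how many
// tokens the cache already holds. The used slice is taken from the
// current prompt, not from the stored previous sequence, so any
// pointers inside those tokens are the currently valid ones.
// Hypothetical sketch; names do not match the ollama source.
func restoreFromCache(prompt []token, numCached int) (used, remaining []token) {
	if numCached > len(prompt) {
		numCached = len(prompt)
	}
	return prompt[:numCached], prompt[numCached:]
}

func main() {
	prompt := []token{{1}, {2}, {3}, {4}, {5}}
	used, remaining := restoreFromCache(prompt, 3)
	fmt.Println(len(used), len(remaining)) // 3 2
}
```

Because the used tokens are a slice of the live prompt, they stay valid even if the cached sequence's token data has since been invalidated.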
    • fix pixel values padding (#10718) · ef202789
      Michael Yang authored
      * panic if trying to pad 4d
      
      * fix pixel values padding
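The shape of this fix can be sketched in Go: pad a batch of pixel-value slices to a common length, and refuse (panic) when asked to pad a 4D tensor. This is a hypothetical illustration; `padPixelValues` and its signature do not match the ollama source:

```go
package main

import "fmt"

// padPixelValues zero-pads each image's flattened pixel values to the
// longest slice in the batch so they can be stacked into one tensor.
// rank is the logical tensor rank of each entry; the guard mirrors the
// commit's "panic if trying to pad 4d". Hypothetical sketch only.
func padPixelValues(batch [][]float32, rank int) [][]float32 {
	if rank >= 4 {
		panic("padding 4d tensors is not supported")
	}
	longest := 0
	for _, pv := range batch {
		if len(pv) > longest {
			longest = len(pv)
		}
	}
	out := make([][]float32, len(batch))
	for i, pv := range batch {
		padded := make([]float32, longest) // zero-initialized
		copy(padded, pv)
		out[i] = padded
	}
	return out
}

func main() {
	batch := [][]float32{{1, 2, 3}, {4}}
	fmt.Println(padPixelValues(batch, 3)) // [[1 2 3] [4 0 0]]
}
```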
    • fix mllama conversion (#10716) · 55760195
      Michael Yang authored
      The cross-attention Q and K projections need to have their heads swapped, similar to the non-cross-attention Q and K tensors.
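A head-wise permutation of this kind can be sketched in Go. This is a hypothetical illustration of the standard llama-style Q/K permute (interleaving the two rotary halves of each head), not the actual converter code; `permuteQK` and its layout are assumptions:

```go
package main

import "fmt"

// permuteQK reorders the rows of an attention projection weight,
// grouped by head, so the two halves of each head are interleaved.
// This matches the conventional llama-conversion permute applied to
// non-cross-attention Q/K tensors. Hypothetical sketch only: rows is
// a row-major weight matrix with nHeads * headDim rows.
func permuteQK(rows [][]float32, nHeads int) [][]float32 {
	headDim := len(rows) / nHeads
	half := headDim / 2
	out := make([][]float32, 0, len(rows))
	for h := 0; h < nHeads; h++ {
		base := h * headDim
		for i := 0; i < half; i++ {
			// take row i from the first half, then row i from the second
			out = append(out, rows[base+i], rows[base+half+i])
		}
	}
	return out
}

func main() {
	rows := [][]float32{{0}, {1}, {2}, {3}}
	ids := []float32{}
	for _, r := range permuteQK(rows, 1) {
		ids = append(ids, r[0])
	}
	fmt.Println(ids) // [0 2 1 3]
}
```

The fix applies the same transformation to the cross-attention Q and K projections that the converter already applies to the self-attention ones.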
  2. 14 May, 2025 4 commits
  3. 13 May, 2025 7 commits
  4. 12 May, 2025 5 commits
  5. 11 May, 2025 2 commits
  6. 10 May, 2025 5 commits
  7. 08 May, 2025 5 commits
  8. 07 May, 2025 5 commits
  9. 06 May, 2025 4 commits