1. 15 Dec, 2024 1 commit
  2. 09 Dec, 2024 1 commit
    • Jesse Gross's avatar
      prompt: Don't trim whitespace from prompts · 900f64e6
      Jesse Gross authored
      New lines can be an important part of a user's prompt and trimming
      it can alter the results. We previously only trimmed prompts with
      images but refactoring brought this behavior to all prompts, where
      it became more noticable.
      
      The /generate endpoint adds less whitespace and therefore doesn't
      need to trim it out - this brings the same behavior to /chat.
      
      Thanks to @gabe-l-hart for spotting the issue!
      
      Fixes #7795
      900f64e6
  3. 05 Nov, 2024 1 commit
    • Jesse Gross's avatar
      prompt: Use a single token when estimating mllama context size · 34a75102
      Jesse Gross authored
      Currently we assume that images take 768 tokens of context size for
      the purposes of clipping old messages that exceed the context window.
      However, our mllama implementation stores the full image embedding
      in a single token. As a result, there is significant waste of context
      space.
      
      Ideally, we would handle this more generically and have the
      implementation report the number of tokens. However, at the moment
      this would just result in a similar set of 'if' conditions in the
      runner plus APIs to report it back. So for now, we just keep this
      simple.
      34a75102
  4. 30 Oct, 2024 1 commit
    • Jesse Gross's avatar
      runner.go: Better abstract vision model integration · c826e574
      Jesse Gross authored
      
      
      -Update mllama to take the cross attention state as embeddings in
      a batch, more similar to how Llava handles it. This improves
      integration with the input cache.
      -Pass locations in a prompt for embeddings using tags similar to Llava.
      -Abstract interface to vision models so the main runner accesses Clip
      and Mllama similarly
      Co-authored-by: default avatarMichael Yang <mxyng@pm.me>
      c826e574
  5. 18 Oct, 2024 1 commit
  6. 15 Jul, 2024 1 commit
  7. 13 Jul, 2024 1 commit
  8. 05 Jul, 2024 2 commits
  9. 01 Jul, 2024 1 commit
  10. 26 Mar, 2024 1 commit
  11. 29 Feb, 2024 1 commit
  12. 16 Feb, 2024 1 commit
  13. 12 Feb, 2024 1 commit