1. 05 Nov, 2024 1 commit
    • prompt: Use a single token when estimating mllama context size · 34a75102
      Jesse Gross authored
      Currently we assume that images take 768 tokens of context size for
      the purposes of clipping old messages that exceed the context window.
      However, our mllama implementation stores the full image embedding
      in a single token, so the estimate wastes a significant amount of
      context space.
      
      Ideally, we would handle this more generically and have the
      implementation report the number of tokens. However, at the moment
      this would just result in a similar set of 'if' conditions in the
      runner plus APIs to report it back. So for now, we just keep this
      simple.
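      The fix described above amounts to a per-model special case in the token
      estimate. A minimal sketch of the idea (function and parameter names here
      are hypothetical, not the actual ollama code):

      ```go
      package main

      import "fmt"

      // imageTokens estimates how many context tokens one image occupies.
      // mllama stores the full image embedding in a single token, while
      // Llava-style models are assumed to use the 768-token estimate.
      func imageTokens(modelFamily string) int {
      	if modelFamily == "mllama" {
      		return 1
      	}
      	return 768
      }

      // contextUsed totals the context consumed by a prompt's text tokens
      // plus its images, for deciding which old messages to clip.
      func contextUsed(textTokens, numImages int, modelFamily string) int {
      	return textTokens + numImages*imageTokens(modelFamily)
      }

      func main() {
      	fmt.Println(contextUsed(1000, 2, "mllama")) // 1002
      	fmt.Println(contextUsed(1000, 2, "llava"))  // 2536
      }
      ```

      With the old flat 768-token estimate, the mllama case would have been
      counted as 2536 tokens instead of 1002, clipping far more history than
      necessary.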
  2. 30 Oct, 2024 1 commit
    • runner.go: Better abstract vision model integration · c826e574
      Jesse Gross authored
      - Update mllama to take the cross attention state as embeddings in
        a batch, more similar to how Llava handles it. This improves
        integration with the input cache.
      - Pass locations in a prompt for embeddings using tags similar to Llava.
      - Abstract interface to vision models so the main runner accesses Clip
        and Mllama similarly.
      Co-authored-by: Michael Yang <mxyng@pm.me>
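      The abstraction in the last bullet can be sketched as a single interface
      that both backends implement, so the runner never branches on the model
      type. This is a hypothetical illustration; the interface and type names
      are not taken from the actual runner code:

      ```go
      package main

      import "fmt"

      // VisionModel is an assumed common interface: the runner converts an
      // image into embedding vectors and splices them into the batch at
      // tagged locations in the prompt.
      type VisionModel interface {
      	EmbedImage(data []byte) ([][]float32, error)
      }

      type clipModel struct{}

      // Llava-style CLIP produces many embedding vectors per image
      // (stubbed here as 768 empty slots).
      func (clipModel) EmbedImage(data []byte) ([][]float32, error) {
      	return make([][]float32, 768), nil
      }

      type mllamaModel struct{}

      // Mllama carries its cross-attention state as a single embedding slot.
      func (mllamaModel) EmbedImage(data []byte) ([][]float32, error) {
      	return make([][]float32, 1), nil
      }

      func main() {
      	// The runner can treat both backends uniformly.
      	for _, m := range []VisionModel{clipModel{}, mllamaModel{}} {
      		embd, _ := m.EmbedImage(nil)
      		fmt.Println(len(embd))
      	}
      }
      ```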
  3. 18 Oct, 2024 1 commit
  4. 15 Jul, 2024 1 commit
  5. 13 Jul, 2024 1 commit
  6. 05 Jul, 2024 2 commits
  7. 01 Jul, 2024 1 commit
  8. 26 Mar, 2024 1 commit
  9. 29 Feb, 2024 1 commit
  10. 16 Feb, 2024 1 commit
  11. 12 Feb, 2024 1 commit