- 05 Nov, 2024 1 commit
Jesse Gross authored
Currently we assume that images take 768 tokens of context size for the purposes of clipping old messages that exceed the context window. However, our mllama implementation stores the full image embedding in a single token. As a result, there is significant waste of context space. Ideally, we would handle this more generically and have the implementation report the number of tokens. However, at the moment this would just result in a similar set of 'if' conditions in the runner plus APIs to report it back. So for now, we just keep this simple.
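As a rough illustration of the trade-off described above, here is a minimal sketch of how a runner might budget context tokens per image when clipping old messages. The imageNumTokens helper and the isMllama flag are hypothetical and not the actual ollama code:

```go
package main

import "fmt"

// imageNumTokens is a hypothetical helper returning the number of context
// tokens to budget for a single image when clipping old messages.
func imageNumTokens(isMllama bool) int {
	if isMllama {
		// mllama stores the full image embedding in a single token.
		return 1
	}
	// Other vision models are assumed to cost roughly 768 tokens per image.
	return 768
}

func main() {
	fmt.Println("mllama image tokens:", imageNumTokens(true))   // 1
	fmt.Println("default image tokens:", imageNumTokens(false)) // 768
}
```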
- 30 Oct, 2024 1 commit
Jesse Gross authored
- Update mllama to take the cross attention state as embeddings in a batch, more similar to how Llava handles it. This improves integration with the input cache.
- Pass locations in a prompt for embeddings using tags similar to Llava.
- Abstract the interface to vision models so the main runner accesses Clip and Mllama similarly.

Co-authored-by: Michael Yang <mxyng@pm.me>
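A minimal sketch of what such an abstracted vision interface could look like, assuming the runner only needs image embeddings plus a per-image token count; the ImageEmbedder name and method signatures are illustrative, not the actual ollama API:

```go
package vision

// ImageEmbedder is a hypothetical interface that both Clip and Mllama could
// satisfy, so the main runner does not need model-specific branches.
type ImageEmbedder interface {
	// EmbedImage converts raw image bytes into embedding vectors that are
	// placed into the batch at the position marked by the image tag.
	EmbedImage(data []byte) ([][]float32, error)

	// NumImageTokens reports how many context tokens one image occupies.
	NumImageTokens() int
}
```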
- 18 Oct, 2024 1 commit
Patrick Devine authored
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
- 15 Jul, 2024 1 commit
Michael Yang authored
- 13 Jul, 2024 1 commit
Michael Yang authored
* fix system prompt
* execute template when hitting previous roles
* fix tests

Co-authored-by: jmorganca <jmorganca@gmail.com>
- 05 Jul, 2024 2 commits
Michael Yang authored
Michael Yang authored
- 01 Jul, 2024 1 commit
Michael Yang authored
- 26 Mar, 2024 1 commit
Patrick Devine authored
- 29 Feb, 2024 1 commit
Michael Yang authored
instead of appending image tags, prepend them - this generally produces better results
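A minimal sketch of the difference, assuming an "[img-0]" style placeholder tag (the exact tag format is an assumption):

```go
package main

import "fmt"

func main() {
	userText := "What is in this picture?"

	// Prepend the image tag rather than appending it; per the commit above,
	// prepending generally produces better results.
	prompt := "[img-0] " + userText

	fmt.Println(prompt)
}
```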
- 16 Feb, 2024 1 commit
Bruce MacDonald authored
- 12 Feb, 2024 1 commit
Jeffrey Morgan authored