- 14 Mar, 2025 1 commit
Jesse Gross authored
Previously, processing multiple images in a batch could trigger segfaults, so sending images together was disabled as a mitigation. The trigger was processing one image on the CPU and one on the GPU. This can no longer happen:
- The vision encoder is now on the GPU, so both images would be processed on the GPU.
- We require images to be fully contained in a batch, and each image, including its special tokens, is over half the batch size. As a result, we will never get two images in the same batch.

Fixes #9731
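To see why the two constraints rule out a second image, here is a minimal Go sketch of the batching arithmetic (the batch size, token counts, and names are illustrative, not the actual Ollama code):

```go
package main

import "fmt"

const batchSize = 512

// fitsInBatch reports whether an image's tokens can still be placed in the
// current batch without being split across a batch boundary.
func fitsInBatch(used, imageTokens int) bool {
	return used+imageTokens <= batchSize
}

func main() {
	imageTokens := batchSize/2 + 1 // each image is over half the batch size
	used := 0

	if fitsInBatch(used, imageTokens) {
		used += imageTokens // the first image fits
	}

	// A second image of the same size can never fit: 257 + 257 > 512.
	fmt.Println("second image fits:", fitsInBatch(used, imageTokens))
}
```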
-
- 11 Mar, 2025 3 commits
jmorganca authored
This reverts commit c7eae586b899083acebcd9b3847b89ea78c2850c.
-
Jesse Gross authored
This is useful for a few things:
- Work around bugs, such as having 2 images in one batch
- Keep the image in a single batch for fully connected attention
- Improve performance by not evaluating embeddings multiple times (see the sketch below)
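On the last point, a minimal sketch of one way to avoid re-evaluating embeddings, assuming a content-hash cache (all names and types here are hypothetical, not the actual implementation):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type embedding []float32

var embedCache = map[[sha256.Size]byte]embedding{}

// encodeImage stands in for the expensive vision-encoder forward pass.
func encodeImage(data []byte) embedding {
	return make(embedding, 4) // placeholder output
}

// imageEmbedding returns a cached embedding when the same image bytes have
// been encoded before, so the encoder runs at most once per distinct image.
func imageEmbedding(data []byte) embedding {
	key := sha256.Sum256(data)
	if e, ok := embedCache[key]; ok {
		return e
	}
	e := encodeImage(data)
	embedCache[key] = e
	return e
}

func main() {
	img := []byte("...image bytes...")
	_ = imageEmbedding(img) // first call: encodes
	_ = imageEmbedding(img) // second call: cache hit, no re-evaluation
	fmt.Println("distinct embeddings computed:", len(embedCache))
}
```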
-
Jesse Gross authored
-
- 04 Mar, 2025 1 commit
Daniel Hiltgen authored
* Include unified vision layers in memory prediction

  For newer vision models with a single GGUF, include the projection estimates.

* Adjust CLI to handle both styles of vision model metadata

* Wire up new tokenizers for new engine

  If we're loading the new engine, use the new model text processor instead of calling into cgo wrappers for llama.cpp. This also cleans up some tech debt from the older tokenization flow for the C++ server, which was no longer used, and adjusts the grammar handling logic to pass through to the new engine instead of using the cgo schema-to-grammar call.

* Lay foundation for auto selection of new engine
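A minimal sketch of the tokenizer routing described above, assuming a shared interface with one Go-native and one cgo-backed implementation (all types and names are hypothetical):

```go
package main

import "fmt"

// TextProcessor is a stand-in for the tokenizer interface.
type TextProcessor interface {
	Encode(s string) []int32
}

type goTokenizer struct{} // the new engine's Go-native text processor

func (goTokenizer) Encode(s string) []int32 { return []int32{1, 2, 3} }

type cgoTokenizer struct{} // legacy path through llama.cpp via cgo

func (cgoTokenizer) Encode(s string) []int32 { return []int32{4, 5, 6} }

// tokenizerFor picks the text processor based on which engine is loaded.
func tokenizerFor(newEngine bool) TextProcessor {
	if newEngine {
		return goTokenizer{}
	}
	return cgoTokenizer{}
}

func main() {
	tp := tokenizerFor(true)
	fmt.Println(tp.Encode("hello"))
}
```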
-
- 14 Feb, 2025 2 commits
Jesse Gross authored
This provides integration with the new Ollama engine (58245413 next ollama runner (#7913)) and the rest of the Ollama infrastructure, such as the runner and Ollama server. It also builds out the KV cache infrastructure to support requirements of how Ollama runs models, such as:
- Parallel processing
- Memory management for defragmentation and shifting
- Multi-modal models

Both old and new engines continue to be supported. By default, only the old engine is used. To enable the new engine:
1. Start the server with the OLLAMA_NEW_ENGINE environment variable set:
   OLLAMA_NEW_ENGINE=1 ./ollama serve
2. Start a model that is supported by the Ollama engine. This one is Llama 3.1 8b Q4_K_M:
   ./ollama run jessegross/llama3.1
-
Jesse Gross authored
This allows the list of supported models to live in its own file instead of being mixed into the runner code.
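A minimal sketch of what such a separation could look like (the file name and identifiers are hypothetical):

```go
// models.go: data-only file listing supported model architectures.
package main

import "fmt"

// modelsSupported is kept in its own file so adding a model is a one-line,
// data-only change that never touches runner logic.
var modelsSupported = []string{
	"llama",
	"mllama",
}

func supported(arch string) bool {
	for _, m := range modelsSupported {
		if m == arch {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(supported("llama"))
}
```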
-
- 15 Dec, 2024 1 commit
Patrick Devine authored
Refactor mllama image processing code, and add pixtral and qwen2vl
-
- 09 Dec, 2024 1 commit
Jesse Gross authored
Newlines can be an important part of a user's prompt, and trimming them can alter the results. We previously only trimmed prompts with images, but refactoring brought this behavior to all prompts, where it became more noticeable. The /generate endpoint adds less whitespace and therefore doesn't need to trim it out; this change brings the same behavior to /chat. Thanks to @gabe-l-hart for spotting the issue! Fixes #7795
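A minimal sketch of the behavior difference, using a toy prompt template (names are hypothetical, not the actual Ollama code):

```go
package main

import (
	"fmt"
	"strings"
)

// renderPrompt builds a toy chat prompt; when trim is true, trailing
// whitespace is stripped, which also deletes newlines the user typed.
func renderPrompt(user string, trim bool) string {
	prompt := "### User:\n" + user
	if trim {
		prompt = strings.TrimSpace(prompt)
	}
	return prompt
}

func main() {
	user := "line one\n\n" // the trailing newlines are deliberate
	fmt.Printf("trimmed:   %q\n", renderPrompt(user, true))
	fmt.Printf("preserved: %q\n", renderPrompt(user, false))
}
```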
-
- 05 Nov, 2024 1 commit
Jesse Gross authored
Currently we assume that images take 768 tokens of context size for the purposes of clipping old messages that exceed the context window. However, our mllama implementation stores the full image embedding in a single token. As a result, there is significant waste of context space. Ideally, we would handle this more generically and have the implementation report the number of tokens. However, at the moment this would just result in a similar set of 'if' conditions in the runner plus APIs to report it back. So for now, we just keep this simple.
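A minimal sketch of the accounting described above, using the 768-token estimate from the message and otherwise hypothetical names:

```go
package main

import "fmt"

const imageTokenEstimate = 768 // fixed per-image charge used when clipping

type message struct {
	textTokens int
	images     int
}

func tokens(m message) int {
	return m.textTokens + m.images*imageTokenEstimate
}

// clip keeps the newest messages whose combined estimate fits the window.
func clip(history []message, window int) []message {
	total := 0
	for i := len(history) - 1; i >= 0; i-- {
		total += tokens(history[i])
		if total > window {
			return history[i+1:]
		}
	}
	return history
}

func main() {
	hist := []message{
		{textTokens: 900},           // oldest
		{textTokens: 50, images: 1}, // charged 50 + 768 = 818 tokens
		{textTokens: 100},           // newest
	}
	fmt.Println("messages kept:", len(clip(hist, 1024))) // 2: the oldest is dropped
}
```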
-
- 30 Oct, 2024 1 commit
Jesse Gross authored
- Update mllama to take the cross-attention state as embeddings in a batch, more similar to how Llava handles it. This improves integration with the input cache.
- Pass locations in a prompt for embeddings using tags similar to Llava.
- Abstract the interface to vision models so the main runner accesses Clip and Mllama similarly.

Co-authored-by: Michael Yang <mxyng@pm.me>
-
- 18 Oct, 2024 1 commit
Patrick Devine authored
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
-
- 15 Jul, 2024 1 commit
Michael Yang authored
-
- 13 Jul, 2024 1 commit
Michael Yang authored
* fix system prompt
* execute template when hitting previous roles
* fix tests

Co-authored-by: jmorganca <jmorganca@gmail.com>
-
- 05 Jul, 2024 2 commits
Michael Yang authored
-
Michael Yang authored
-
- 01 Jul, 2024 1 commit
Michael Yang authored
-
- 26 Mar, 2024 1 commit
Patrick Devine authored
-
- 29 Feb, 2024 1 commit
Michael Yang authored
Instead of appending image tags, prepend them; this generally produces better results.
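A minimal sketch of the change, assuming a hypothetical [img-N] tag format:

```go
package main

import "fmt"

// buildPrompt places image tags before the user's text instead of after it.
func buildPrompt(text string, nImages int) string {
	tags := ""
	for i := 0; i < nImages; i++ {
		tags += fmt.Sprintf("[img-%d] ", i)
	}
	return tags + text // prepended, not appended
}

func main() {
	fmt.Println(buildPrompt("describe these images", 2))
}
```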
-
- 16 Feb, 2024 1 commit
Bruce MacDonald authored
-
- 12 Feb, 2024 1 commit
Jeffrey Morgan authored
-