Commits · 1f371ea92f7ebe4edd208b6732753473b2c4d0cd · OpenDAS / ollama

22 May, 2025 2 commits

ml: Panic rather than return error on tensor allocation failure · 1f371ea9

Jesse Gross authored May 19, 2025

FromFloatSlice and FromIntSlice return an error if the shape doesn't
match the passed data or if memory can't be allocated. Since these
are inputs, the memory being allocated is system memory rather than VRAM.

In many cases, the caller can't really handle the error and panics.

Empty and Zeros directly panic if they can't allocate memory.

This makes things consistent by panicking for the first two cases,
removing a fair amount of error handling code. This is also consistent
with how Go typically handles these situations.

1f371ea9

fix: mllama quality (#10807) · adff143b

Michael Yang authored May 22, 2025

* fix mllama convert

- transform attn_gate and ffn_gate
- swap attention heads for vision models

* fix mllama

the mlp gate was applied in the wrong place

adff143b

21 May, 2025 3 commits

feat: port qwen2 model (#10782) · c8900113
Michael Yang authored May 21, 2025

c8900113
feat: qwen3 dense and sparse models (#10708) · e0ed984c
Michael Yang authored May 21, 2025
```
* feat: qwen3 dense
* feat: qwen3moe
* fix llama4 moe
```
e0ed984c

fix: qwen25vl assign samebatch in multimodal input (#10789) · 69b2fe92

Michael Yang authored May 21, 2025

setting samebatch on the vision start token is problematic because it
will be shared with other inputs that also use images. this will cause
the input to be cached and the runner will not see SameBatch. SameBatch
will also be incorrect since it may be for a different image.

assigning samebatch to the input tokens resolves this by ensuring it's
assigned correctly to inputs corresponding to the image.

not setting same batch correctly may cause panics during inference since
images are no longer guaranteed to be in the same batch.

69b2fe92

20 May, 2025 1 commit
- ml: add more rope options (#10775) · 9ed8bf14
  Michael Yang authored May 20, 2025
  
  9ed8bf14
19 May, 2025 1 commit
- fix llama and mistral3 models (#10774) · ff180c34
  Michael Yang authored May 19, 2025
```
* fix llama model

* fix mistral3.1 model

do not set default vision layers
```
  ff180c34
16 May, 2025 1 commit

model: handle multiple eos tokens (#10577) · 333e3604

Michael Yang authored May 16, 2025

* get eos_token_id from generation_config.json

* refactor

* include both ids and strings in trace

* comments

* remove special case for gemma3 special vocab (#10743)
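
In Hugging Face style `generation_config.json` files, `eos_token_id` may be either a single integer or a list of integers. The sketch below shows one way to accept both forms in Go. The type and function names (`eosTokenIDs`, `generationConfig`, `parseEOS`) are hypothetical and are not taken from the ollama source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eosTokenIDs accepts either a single integer or a list of integers.
// The name is an assumption made for this example.
type eosTokenIDs []int32

// UnmarshalJSON tries the list form first and falls back to a scalar.
func (e *eosTokenIDs) UnmarshalJSON(b []byte) error {
	var many []int32
	if err := json.Unmarshal(b, &many); err == nil {
		*e = many
		return nil
	}
	var one int32
	if err := json.Unmarshal(b, &one); err != nil {
		return err
	}
	*e = eosTokenIDs{one}
	return nil
}

// generationConfig models only the field this example needs.
type generationConfig struct {
	EOSTokenID eosTokenIDs `json:"eos_token_id"`
}

// parseEOS returns every end-of-sequence token id found in data.
func parseEOS(data []byte) ([]int32, error) {
	var cfg generationConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return cfg.EOSTokenID, nil
}

func main() {
	ids, err := parseEOS([]byte(`{"eos_token_id": [1, 106]}`))
	fmt.Println(ids, err)
}
```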

333e3604

15 May, 2025 2 commits

ollamarunner: Separate text and multimodal graphs · 3c14461d

Jesse Gross authored May 05, 2025

For some multimodal models (such as gemma3), we create a single
graph that generates the image embedding and then use this in the
text model. The embedding tensor is completely opaque to the runner.

However, this doesn't work if we need to use the embedding in multiple
batches. This can arise if the embedding is larger than the batch size.
In these cases (as with llama4), we would like to create views that
are more appropriately sized. However, if we do this then the original
source tensor is used in multiple graphs, which isn't allowed. To
avoid that problem, models with this pattern compute the embedding
tensor on first use and recreate the individual views. There is no
longer a single vision and text graph.

This codifies the pattern of separating vision and text graphs. The
logic of computing tensors on demand is moved to the runner, so models
no longer have to worry about this. It also gives the runner visibility
into the multimodal tensors, which is important for memory management.
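
The compute-on-first-use idea can be sketched as follows. This is a simplified model rather than the runner's implementation: `multimodalEntry` and its fields are hypothetical, and plain slices stand in for tensors and views.

```go
package main

import "fmt"

// multimodalEntry is a hypothetical runner-side record for one image.
type multimodalEntry struct {
	compute   func() []float32 // stands in for running the vision graph
	embedding []float32        // cached result, unset until first use
	computed  bool
}

// view returns rows [start, end) of the embedding, where each row has
// dim values. The embedding is computed only on the first call.
func (m *multimodalEntry) view(start, end, dim int) []float32 {
	if !m.computed {
		m.embedding = m.compute()
		m.computed = true
	}
	return m.embedding[start*dim : end*dim]
}

func main() {
	calls := 0
	e := &multimodalEntry{compute: func() []float32 {
		calls++
		return make([]float32, 6*4) // 6 rows of width 4
	}}
	// The embedding has 6 rows but the batch size is 4, so it is
	// consumed as two views across two batches.
	a := e.view(0, 4, 4)
	b := e.view(4, 6, 4)
	fmt.Println(len(a), len(b), calls)
}
```

Keeping the cache in the runner rather than in each model is what lets the runner see how much memory the multimodal tensors occupy.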

3c14461d

fix pixel values padding (#10718) · ef202789
Michael Yang authored May 15, 2025
```
* panic if trying to pad 4d

* fix pixel values padding
```
ef202789

14 May, 2025 2 commits
- model: add Qwen2.5-VL support (#10385) · 0aa8b371
  Bruce MacDonald authored May 13, 2025
  
  0aa8b371
- chore: update mllama to use ollama engine (#10637) · 23125648
  Michael Yang authored May 13, 2025
  
  23125648
13 May, 2025 1 commit
- fix vocabulary (#10679) · 526b2ed1
  Michael Yang authored May 12, 2025
  
  526b2ed1
12 May, 2025 1 commit
- models: remove unused qwen2vl processing (#10677) · a7240c6d
  Bruce MacDonald authored May 12, 2025
  
  a7240c6d
26 Apr, 2025 1 commit
- model: fix build (#10416) · 5cfc1c39
  Michael Yang authored Apr 25, 2025
  
  5cfc1c39
25 Apr, 2025 6 commits
- fixes for maverick · 7ba9fa9c
  Michael Yang authored Apr 21, 2025
  
  7ba9fa9c
- chunked attention · 8bf11b84
  Michael Yang authored Apr 10, 2025
  
  8bf11b84
- connect vision to text · 470af8ab
  Michael Yang authored Apr 17, 2025
  
  470af8ab
- image processing · 178761ae
  Michael Yang authored Apr 16, 2025
```
Co-authored-by: Patrick Devine <patrick@infrahq.com>
```
  178761ae
- llama4 · f0c66e6d
  Michael Yang authored Apr 03, 2025
  
  f0c66e6d
- fix token type · d26c18e2
  Michael Yang authored Apr 23, 2025
  
  d26c18e2
24 Apr, 2025 1 commit
- llama: remove model loading for grammar (#10096) · a53d744b
  Parth Sareen authored Apr 24, 2025
  
  a53d744b
18 Apr, 2025 1 commit
- arange · 40b8fdbd
  Michael Yang authored Apr 03, 2025
  
  40b8fdbd
03 Apr, 2025 2 commits

model: support for mistral-small in the ollama runner · 6bd0a983

Bruce MacDonald authored Mar 14, 2025

Mistral is a popular research lab making open source models. This updates
the forward pass of llama architecture models to support both llama models
and mistral models by accounting for additional metadata present in mistral
models, and finding the correct dimensions for the output projection.

6bd0a983

fs: move ml.Config to fs package · 3b96a936
Michael Yang authored Mar 18, 2025

3b96a936

02 Apr, 2025 1 commit
- model: fix issues with spm tokenizer for Gemma 3 (#10081) · b51e0f39
  Jeffrey Morgan authored Apr 02, 2025
  
  b51e0f39
20 Mar, 2025 3 commits

model: Pass input tensor instead of raw data to models · 0fbfcf3c

Jesse Gross authored Mar 19, 2025

Rather than directly giving the input data to models, we can
pass a tensor instead. In the short term, this saves some duplicated
code.

Longer term, we will want to overlap setting up the next batch with
processing of the current one. In this case, we will only have the
shape of the tensor, but it will not be loaded with data at the time of
graph generation. By passing only a tensor to models now, we set up
this possibility and prevent them from relying on data that they won't
have in the future.

Although the same could be done for Positions and Outputs, in some
cases we either need the raw input data or don't use them at all.
Therefore, for now we leave them as they are and allow models to
convert them to tensors as needed.

0fbfcf3c

input: Rename Options to Batch · 0c220935
Jesse Gross authored Mar 19, 2025
```
Options is no longer very descriptive of this struct.
```
0c220935
gemma2: Remove second call to Rows · b078dd15
Jesse Gross authored Mar 19, 2025
```
Looks like a merge conflict that broke the model.
```
b078dd15

19 Mar, 2025 1 commit
- ml: use input context for extracting outputs (#9875) · da0e3452
  Jeffrey Morgan authored Mar 18, 2025
  
  da0e3452
14 Mar, 2025 2 commits

ollamarunner: Use a separate context per multimodal input · 282bfaaa

Jesse Gross authored Mar 13, 2025

Currently there is a single context per sequence, shared by
all multimodal inputs. Since we build a vision encoder graph per
image, with a large number of inputs we can eventually hit the
maximum number of graph nodes per context.

This changes to use a separate context for each image, ensuring
that available resource limits are consistent.
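
A toy model of the problem and the fix, under stated assumptions: `graphContext`, `maxNodes`, and both encode functions are hypothetical, and the node limit is an arbitrary small number chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// maxNodes is a hypothetical per-context graph node limit.
const maxNodes = 8

// graphContext is a stand-in that only tracks how many nodes it holds.
type graphContext struct{ nodes int }

var errTooManyNodes = errors.New("too many graph nodes in context")

// addGraph reserves n nodes or fails if the context would overflow.
func (c *graphContext) addGraph(n int) error {
	if c.nodes+n > maxNodes {
		return errTooManyNodes
	}
	c.nodes += n
	return nil
}

// encodeShared builds every image graph in one shared context, so
// node usage grows with the number of images.
func encodeShared(images, nodesPerImage int) error {
	ctx := &graphContext{}
	for i := 0; i < images; i++ {
		if err := ctx.addGraph(nodesPerImage); err != nil {
			return fmt.Errorf("image %d: %w", i, err)
		}
	}
	return nil
}

// encodeSeparate gives each image its own context, so the limit
// applies per image regardless of how many images there are.
func encodeSeparate(images, nodesPerImage int) error {
	for i := 0; i < images; i++ {
		ctx := &graphContext{}
		if err := ctx.addGraph(nodesPerImage); err != nil {
			return fmt.Errorf("image %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	fmt.Println(encodeShared(4, 3))
	fmt.Println(encodeSeparate(4, 3))
}
```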

282bfaaa

ml: Allow models to constrain inputs to a single batch · 9679f401

Jesse Gross authored Mar 12, 2025

Models may require that a set of inputs all be processed as part
of the same batch. For example, if an image has multiple patches
with fully connected attention between them, we should not split
the batch in the middle of an image.

Fixes #9697
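
One way to picture the constraint is a batch splitter that refuses to cut through a marked group of inputs. This is a sketch only: the `Input` type, its `SameBatch` semantics (the number of following inputs that must share a batch with this one), and `splitBatches` are assumptions for illustration, not the runner's code.

```go
package main

import "fmt"

// Input is a hypothetical model input. SameBatch is assumed to mean:
// this many of the following inputs must share a batch with this one.
type Input struct {
	Token     int32
	SameBatch int
}

// splitBatches groups inputs into batches of at most batchSize,
// starting a new batch early when a marked group would not fit.
func splitBatches(inputs []Input, batchSize int) ([][]Input, error) {
	var batches [][]Input
	var cur []Input
	for _, in := range inputs {
		need := 1 + in.SameBatch
		if need > batchSize {
			return nil, fmt.Errorf("group of %d inputs exceeds batch size %d", need, batchSize)
		}
		if len(cur)+need > batchSize {
			batches = append(batches, cur)
			cur = nil
		}
		cur = append(cur, in)
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches, nil
}

func main() {
	// Three text tokens, an image start followed by three patches
	// that must stay with it, then one more text token.
	inputs := []Input{
		{Token: 1}, {Token: 2}, {Token: 3},
		{Token: 100, SameBatch: 3}, {Token: 101}, {Token: 102}, {Token: 103},
		{Token: 4},
	}
	batches, err := splitBatches(inputs, 4)
	if err != nil {
		panic(err)
	}
	for _, b := range batches {
		fmt.Println(len(b))
	}
}
```

With a batch size of 4, the first batch is closed after three tokens so that the image and its patches land together in the second batch.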

9679f401

13 Mar, 2025 1 commit
- fix: error if image requested without vision model · 5e2e0b46
  Michael Yang authored Mar 13, 2025
  
  5e2e0b46
12 Mar, 2025 1 commit

models/gemma3: remove final logit softcap (#9692) · a70820da

Bruce MacDonald authored Mar 12, 2025

Softcap isn't in the whitepaper/implementation for the language model so we should remove it. There is no discernible difference in output with it removed.

a70820da

11 Mar, 2025 6 commits
- all: address linter errors · 83f0ec82
  jmorganca authored Mar 11, 2025
  
  83f0ec82
- use 2d pooling · 63a39406
  Michael Yang authored Mar 11, 2025
  
  63a39406
- add trailing \n\n after <end_of_image> to match reference implementation · 11bfa627
  jmorganca authored Mar 11, 2025
  
  11bfa627
- reduce kernel size, add TODO for loading from config · f63e62e5
  jmorganca authored Mar 11, 2025
  
  f63e62e5
- Revert "Allow models to force a new batch" · 65b0f329
  jmorganca authored Mar 11, 2025
```
This reverts commit c7eae586b899083acebcd9b3847b89ea78c2850c.
```
  65b0f329
- Allow models to force a new batch · 06007c0a
  Jesse Gross authored Mar 10, 2025
```
This is useful for a few things:
 - Work around bugs, such as having 2 images in one batch
 - Keep the image in a single batch for fully connected attention
 - Improve performance by not evaluating embeddings multiple times
```
  06007c0a