Commits · 20e35938630e633a5f40fd2c8b097b0dbfbda8d9 · OpenDAS / ollama

11 Mar, 2025 30 commits
- model: validate left and right pairs before merging them · 20e35938
  jmorganca authored Mar 11, 2025
  
- use 2d pooling · 63a39406
  Michael Yang authored Mar 11, 2025
  
- llm: auto detect models that require Ollama Engine (#1 ) · ab39e08e
  Daniel Hiltgen authored Mar 11, 2025
  
- add trailing \n\n after <end_of_image> to match reference implementation · 11bfa627
  jmorganca authored Mar 11, 2025
  
- reduce kernel size, add TODO for loading from config · f63e62e5
  jmorganca authored Mar 11, 2025
  
- Revert "Allow models to force a new batch" · 65b0f329
  jmorganca authored Mar 11, 2025
```
This reverts commit c7eae586b899083acebcd9b3847b89ea78c2850c.
```
- Allow models to force a new batch · 06007c0a
  Jesse Gross authored Mar 10, 2025
```
This is useful for a few things:
 - Work around bugs, such as having 2 images in one batch
 - Keep the image in a single batch for fully connected attention
 - Improve performance by not evaluating embeddings multiple times
```
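
The batching behaviour described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the commit: the `input` type, its `newBatch` flag, and `splitBatches` are hypothetical names introduced here. It assumes an input (for example, an image) can request that it begin a fresh batch, so it is never split across batches or combined with an earlier image.

```go
package main

import "fmt"

// input is a hypothetical stand-in for one element of the input stream.
type input struct {
	token    int
	newBatch bool // hypothetical flag: this input must start a fresh batch
}

// splitBatches groups inputs into batches of at most batchSize elements and
// starts a new batch early whenever an input sets newBatch.
func splitBatches(inputs []input, batchSize int) [][]input {
	var batches [][]input
	var cur []input
	for _, in := range inputs {
		// Flush the current batch if it is full or the input forces a break.
		if len(cur) > 0 && (in.newBatch || len(cur) >= batchSize) {
			batches = append(batches, cur)
			cur = nil
		}
		cur = append(cur, in)
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	inputs := []input{{token: 1}, {token: 2}, {token: 3, newBatch: true}, {token: 4}, {token: 5}}
	for i, b := range splitBatches(inputs, 4) {
		fmt.Println("batch", i, "size", len(b))
	}
}
```

In this sketch the third input forces a break even though the first batch is not yet full, which is the property the commit message relies on to keep an image inside a single batch.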
- Disable causal attention based on batch index · a8e83a76
  Jesse Gross authored Mar 10, 2025
```
Currently we are using positions, which are relative to a
sequence and may not be unique.
```
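
To illustrate why a batch index is a safer key than a position, here is a minimal sketch of an additive attention mask in which causal masking is switched off for selected rows of the batch. The function name `buildMask` and the `nonCausal` set are assumptions made for this example; the actual implementation in the commit may differ. Batch indices are unique within a batch, whereas two sequences can both contain, say, position 5.

```go
package main

import (
	"fmt"
	"math"
)

// buildMask returns a batch x batch additive attention mask. Entry [i][j] is
// 0 when query i may attend to key j and -Inf otherwise. Queries whose batch
// index appears in nonCausal may attend to every key; all other queries are
// masked causally (they cannot see keys that come later in the batch).
func buildMask(batch int, nonCausal map[int]bool) [][]float32 {
	negInf := float32(math.Inf(-1))
	mask := make([][]float32, batch)
	for i := range mask {
		mask[i] = make([]float32, batch)
		for j := range mask[i] {
			if j > i && !nonCausal[i] {
				mask[i][j] = negInf
			}
		}
	}
	return mask
}

func main() {
	// Hypothetical case: batch index 1 holds an image token and is non-causal.
	for _, row := range buildMask(3, map[int]bool{1: true}) {
		fmt.Println(row)
	}
}
```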
- Restrict Gemma to a single image per request · 47500550
  Jesse Gross authored Mar 10, 2025
  
- Fix follow up images and images split across batches · 2c40c4d3
  Jesse Gross authored Mar 09, 2025
  
- use non-causal mask only for image positions · e9527893
  Michael Yang authored Mar 10, 2025
  
- use non-causal mask for inputs with images · 9d2a20a7
  Michael Yang authored Mar 10, 2025
  
- fix gemma3 1b conversion · 2e54d72f
  Patrick Devine authored Mar 10, 2025
  
- compat with upstream gguf · 6b32a2d5
  Michael Yang authored Mar 10, 2025
  
- fallback to cpu · c5cbe4fc
  Michael Yang authored Mar 10, 2025
  
- fix vision encoder · f8889128
  Michael Yang authored Mar 09, 2025
  
- ollama debug tensor · 9e4642e9
  Michael Yang authored Mar 09, 2025
  
- duplicate token_embd to output · 6b0486c2
  Michael Yang authored Mar 09, 2025
  
- skip repacking vision tensors · d368c039
  Michael Yang authored Mar 09, 2025
  
- fix configs · 9b54267e
  Patrick Devine authored Mar 08, 2025
  
- update model · 46bb0169
  Michael Yang authored Mar 08, 2025
  
- use fast attention · 8934324b
  Michael Yang authored Mar 07, 2025
  
- Fix tests and drift from main · 0e886595
  Jesse Gross authored Mar 07, 2025
  
- fix conversion · c62861f4
  Patrick Devine authored Mar 07, 2025
  
- set non-causal attention · 0df18004
  Michael Yang authored Mar 07, 2025
  
- temporary work around for converting spm · 631fecc6
  Patrick Devine authored Mar 07, 2025
  
- fix drift from main · 4346c240
  Jesse Gross authored Mar 07, 2025
  
- add gemma vision encoder · 4b037a97
  Michael Yang authored Mar 06, 2025
  
- gemma2 impl · 5f74d1fd
  Patrick Devine authored Feb 07, 2025
  
- Build release for windows with local script (#9636) · 4dcf8016
  Daniel Hiltgen authored Mar 11, 2025
  
10 Mar, 2025 10 commits
- Merge pull request #9590 from ollama/mxyng/dump-pad · 26a26998
  Michael Yang authored Mar 10, 2025
```
fix: pad tensor item if ge zero
```
- fix: pad tensor item if ge zero · 9926eae0
  Michael Yang authored Mar 07, 2025
```
this produces nicer output since both positive and negative values
produce the same width
```
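
The padding idea in this commit message can be sketched as below. This is an illustrative helper, not the code from the commit: `formatItem` is a hypothetical name, and the fixed-decimal formatting is an assumption. A leading space stands in for the missing minus sign, so positive and negative values render at the same width and columns of a tensor dump line up.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatItem renders v with a fixed number of decimals and prefixes values
// that render without a minus sign with a space, so they occupy the same
// width as negative values. Checking the rendered string (rather than
// v >= 0) also handles negative zero, which formats as "-0.00".
func formatItem(v float64, decimals int) string {
	s := strconv.FormatFloat(v, 'f', decimals, 64)
	if s[0] != '-' {
		s = " " + s
	}
	return s
}

func main() {
	for _, v := range []float64{1.5, -1.5, 0, -0.25} {
		fmt.Printf("[%s]\n", formatItem(v, 2))
	}
}
```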
- docs: add opik to observability integrations (#9626) · 8585b7b1
  Vincent Koc authored Mar 11, 2025
  
- sample: add numerical stability to temperature/softmax transform (#9631) · 7e34f4fb
  Parth Sareen authored Mar 10, 2025
  
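
The commit title does not say which stabilization was applied. One common approach, shown here only as an assumption, is to subtract the largest logit before exponentiating. The function name `softmaxWithTemperature` is hypothetical, and the sketch assumes `temperature > 0`.

```go
package main

import (
	"fmt"
	"math"
)

// softmaxWithTemperature divides logits by temperature and normalizes them
// into probabilities. Subtracting the maximum logit first keeps every
// exponent <= 0 (for temperature > 0), so math.Exp cannot overflow.
func softmaxWithTemperature(logits []float64, temperature float64) []float64 {
	if len(logits) == 0 {
		return nil
	}
	maxLogit := logits[0]
	for _, l := range logits[1:] {
		if l > maxLogit {
			maxLogit = l
		}
	}
	probs := make([]float64, len(logits))
	var sum float64
	for i, l := range logits {
		probs[i] = math.Exp((l - maxLogit) / temperature)
		sum += probs[i]
	}
	for i := range probs {
		probs[i] /= sum
	}
	return probs
}

func main() {
	// Without the max subtraction, exp(1000) would overflow to +Inf.
	fmt.Println(softmaxWithTemperature([]float64{1000, 1000}, 1))
	fmt.Println(softmaxWithTemperature([]float64{1, 2, 3}, 0.8))
}
```

Shifting all logits by the same constant does not change the resulting probabilities, so the subtraction affects only numerical behaviour.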
- Merge pull request #9569 from dwt/patch-1 · fe776293
  Michael Yang authored Mar 10, 2025
```
Better WantedBy declaration
```
- docs: Add OLLAMA_CONTEXT_LENGTH to FAQ. (#9545) · d8a5d96b
  frob authored Mar 10, 2025
  
- docs: add SwiftChat (#9540) · 757668c4
  Xiaowei Zhu authored Mar 10, 2025
  
- docs(tool): add mcp-llm (#9537) · 96ec8afd
  Sam authored Mar 11, 2025
  
- sample: temporarily use grammars for constrained generation in new engine (#9586) · e093db92
  Jeffrey Morgan authored Mar 10, 2025
  
- model: Update encoder cache to use multimodal input processing handler · a1cda80b
  Jesse Gross authored Mar 08, 2025
```
The encoder cache needs to know the position of images in the input
stream so that it knows when to delete them. Previously images didn't
have a position, so we implied one by breaking batches before an
image and then assuming the image was in the first position. However,
multimodal objects are now given explicit positions in the input
stream, so we can use that instead.

Breaking batches was also a way to simulate a cross attention mask
for mllama. However, given that it only supports a single sequence
and a single image, this mask doesn't serve any real purpose.
Removing the batch break does not appear to affect the quality of
the output.

Most of this is simply moving the input data structures to a new
package to avoid import cycles.
```
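
As a rough illustration of the first paragraph of this commit message, the sketch below stores each encoded image together with its explicit position in the input stream and deletes it when that position falls inside a removed range. Every name here (`encoderCache`, `cachedImage`, `put`, `removeRange`) is hypothetical and chosen for the example; it is not the API introduced by the commit.

```go
package main

import "fmt"

// cachedImage is a hypothetical cache entry: an encoder output plus the
// explicit position of the image in the input stream.
type cachedImage struct {
	embedding []float32
	pos       int
}

// encoderCache is a hypothetical cache keyed by an image identifier.
type encoderCache struct {
	entries map[string]cachedImage
}

func newEncoderCache() *encoderCache {
	return &encoderCache{entries: make(map[string]cachedImage)}
}

// put records an encoded image and the position it occupies.
func (c *encoderCache) put(id string, embedding []float32, pos int) {
	c.entries[id] = cachedImage{embedding: embedding, pos: pos}
}

// removeRange deletes every entry whose position lies in [begin, end) and
// returns how many entries were deleted.
func (c *encoderCache) removeRange(begin, end int) int {
	removed := 0
	for id, e := range c.entries {
		if e.pos >= begin && e.pos < end {
			delete(c.entries, id) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	c := newEncoderCache()
	c.put("image-a", []float32{0.1, 0.2}, 3)
	c.put("image-b", []float32{0.3, 0.4}, 10)
	fmt.Println("removed:", c.removeRange(0, 5), "remaining:", len(c.entries))
}
```

Because each entry carries its own position, no assumption about where an image sits inside a batch is needed, which is the point the commit message makes about dropping the batch break.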