- 11 Mar, 2025 (16 commits)
  - Michael Yang authored
  - Michael Yang authored
  - Michael Yang authored
  - Michael Yang authored
  - Michael Yang authored
  - Patrick Devine authored
  - Michael Yang authored
  - Michael Yang authored
  - Jesse Gross authored
  - Patrick Devine authored
  - Michael Yang authored
  - Patrick Devine authored
  - Jesse Gross authored
  - Michael Yang authored
  - Patrick Devine authored
  - Daniel Hiltgen authored
- 10 Mar, 2025 (10 commits)
  - Michael Yang authored
  - fix: pad tensor item if ge zero
    Michael Yang authored
    This produces nicer output, since positive and negative values then print at the same width.
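A minimal sketch (not the actual Ollama code) of the idea in the commit above: when formatting tensor items, give non-negative values a leading space so they occupy the same width as negative values, which carry a minus sign.

```go
package main

import "fmt"

// formatItem pads non-negative values with a leading space so columns of
// mixed-sign values line up.
func formatItem(v float32) string {
	if v >= 0 {
		return fmt.Sprintf(" %.4f", v)
	}
	return fmt.Sprintf("%.4f", v)
}

func main() {
	fmt.Println(formatItem(1.2345))  // " 1.2345"
	fmt.Println(formatItem(-1.2345)) // "-1.2345"
}
```

Go's space flag (`fmt.Sprintf("% .4f", v)`) expresses the same rule in a single verb.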
  - Vincent Koc authored
  - Parth Sareen authored
  - Michael Yang authored
  - Better WantedBy declaration
    frob authored
  - Xiaowei Zhu authored
  - Sam authored
  - Jeffrey Morgan authored
  - Jesse Gross authored
    The encoder cache needs to know the position of images in the input stream so that it knows when to delete them. Previously, images didn't have a position, so we implied one by breaking batches before an image and then assuming the image was in the first position. However, multimodal objects are now given explicit positions in the input stream, so we can use that instead.
    Breaking batches was also a way to simulate a cross-attention mask for mllama. However, given that it only supports a single sequence and a single image, this mask doesn't serve any real purpose. Removing the batch break does not appear to affect the quality of the output.
    Most of this change simply moves the input data structures to a new package to avoid import cycles.
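A minimal sketch (hypothetical types and names, not the actual implementation) of what position-aware eviction looks like: the cache keys each image encoding by its explicit position in the input stream and frees it once processing has moved past that position, with no batch break needed.

```go
package sketch

// encoderCache keys each image encoding by its explicit position in the
// input stream, rather than inferring position from batch breaks.
type encoderCache struct {
	entries map[int32][]float32 // input-stream position -> encoder output
}

func newEncoderCache() *encoderCache {
	return &encoderCache{entries: make(map[int32][]float32)}
}

func (c *encoderCache) put(pos int32, encoding []float32) {
	c.entries[pos] = encoding
}

// prune deletes encodings for images whose position has already been
// consumed, which is when they are safe to evict.
func (c *encoderCache) prune(oldest int32) {
	for pos := range c.entries {
		if pos < oldest {
			delete(c.entries, pos)
		}
	}
}
```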
- 09 Mar, 2025 (1 commit)
  - Jesse Gross authored
    It's OK to fail on startup, but we shouldn't panic at runtime based on user input. Downgrade the panic to a warning.
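A minimal sketch (hypothetical names, not the actual Ollama code) of the pattern: validate user-supplied input at runtime, log a warning, and fall back to a safe default instead of panicking.

```go
package sketch

import "log/slog"

type cacheType int

const (
	cacheTypeF16 cacheType = iota
	cacheTypeQ8_0
)

var cacheTypes = map[string]cacheType{"f16": cacheTypeF16, "q8_0": cacheTypeQ8_0}

// kvCacheTypeFromStr no longer panics on unknown input; it warns and
// falls back to the f16 default.
func kvCacheTypeFromStr(s string) cacheType {
	t, ok := cacheTypes[s]
	if !ok {
		slog.Warn("unsupported KV cache type, using f16", "type", s)
		return cacheTypeF16
	}
	return t
}
```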
- 08 Mar, 2025 (5 commits)
  - Jesse Gross authored
    As with the llama engine, quantizing the KV cache requires flash attention to be enabled through the Ollama server.
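A minimal sketch (hypothetical function, not the actual server code) of enforcing that requirement: a quantized KV cache type is honored only when flash attention is enabled, otherwise the server falls back to unquantized f16.

```go
package sketch

import "log/slog"

// effectiveKVCacheType returns the KV cache type to actually use.
// Quantized types are only valid when flash attention is enabled.
func effectiveKVCacheType(requested string, flashAttention bool) string {
	if requested != "f16" && !flashAttention {
		slog.Warn("quantized KV cache requires flash attention; falling back",
			"requested", requested, "using", "f16")
		return "f16"
	}
	return requested
}
```

Ollama surfaces both knobs as server settings: OLLAMA_FLASH_ATTENTION enables flash attention, and OLLAMA_KV_CACHE_TYPE selects the cache quantization.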
  - Jesse Gross authored
  - Jesse Gross authored
    Backends can impose additional alignment requirements on buffer sizes. We should ensure that we meet these, or allocations can fail.
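A minimal sketch of the usual way to meet such a requirement: round the requested size up to the next multiple of the backend's alignment (assumed here to be a power of two).

```go
package sketch

// alignUp rounds size up to the next multiple of align, which must be a
// power of two; the bit-clear of the low bits drops any remainder.
func alignUp(size, align uintptr) uintptr {
	return (size + align - 1) &^ (align - 1)
}
```

For example, alignUp(1000, 256) returns 1024, so a backend requiring 256-byte granularity accepts the allocation.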
  - Jesse Gross authored
    Models can disable causality for all or part of their processing while continuing to store data in the KV cache.
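A minimal sketch (hypothetical API) of what a per-section causal toggle means for the attention mask: with causality on, a query attends only to itself and earlier positions; with it off, every query sees every cached position, while the KV cache is populated either way.

```go
package sketch

// buildMask returns true where query i may attend to position j, given
// nPast cached positions ahead of the current batch of nQuery tokens.
func buildMask(nPast, nQuery int, causal bool) [][]bool {
	mask := make([][]bool, nQuery)
	for i := range mask {
		mask[i] = make([]bool, nPast+nQuery)
		for j := range mask[i] {
			// query i sits at absolute position nPast+i
			mask[i][j] = !causal || j <= nPast+i
		}
	}
	return mask
}
```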
  - Jesse Gross authored
    Debug logging of every token previously caused test timeouts on slower machines.
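A minimal sketch of one way to keep per-token logging cheap (a hypothetical helper, not necessarily how the commit resolved it): check the logger's level before doing any per-token formatting work.

```go
package sketch

import (
	"context"
	"log/slog"
)

// logToken skips all formatting work unless debug logging is enabled.
func logToken(ctx context.Context, id int, piece string) {
	if !slog.Default().Enabled(ctx, slog.LevelDebug) {
		return
	}
	slog.DebugContext(ctx, "token", "id", id, "piece", piece)
}
```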
- 07 Mar, 2025 (8 commits)
  - Jesse Gross authored
  - Michael Yang authored
    This ensures the tensor is created on the right buffer type for backends such as CPU.
  - Michael Yang authored
  - Michael Yang authored
  - Michael Yang authored
    Temporary until tensor loading can accurately account for vision models.
  - Michael Yang authored
  - Michael Yang authored
  - Michael Yang authored