1. 20 Jun, 2025 1 commit
    • ggml: Check return status for computation. · 87b7af6c
      Jesse Gross authored
      We don't check the return status after computing the graph, which
      can silently lead to bad outputs if we try to keep going and future
      computation succeeds. This appears to happen in certain cases on
      Apple M2 devices.
      
      Fixes #11070
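The fix above follows a common pattern: propagate the backend's status after a graph compute instead of assuming success. A minimal Go sketch of that pattern, where `computeGraph` is a hypothetical stand-in for the real ggml call, not ollama's actual function:

```go
package main

import (
	"errors"
	"fmt"
)

// computeGraph is a hypothetical stand-in for the backend graph
// computation. The real ggml call returns a status code; here we
// model it as an error return.
func computeGraph(ok bool) error {
	if !ok {
		return errors.New("graph computation failed")
	}
	return nil
}

func main() {
	// Check the status and stop instead of silently continuing, so a
	// failed compute never feeds bad outputs into later computation.
	if err := computeGraph(false); err != nil {
		fmt.Println("compute error:", err)
		return
	}
	fmt.Println("compute succeeded")
}
```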
  2. 19 Jun, 2025 1 commit
  3. 18 Jun, 2025 6 commits
  4. 17 Jun, 2025 1 commit
  5. 16 Jun, 2025 3 commits
  6. 14 Jun, 2025 1 commit
  7. 12 Jun, 2025 2 commits
  8. 11 Jun, 2025 3 commits
  9. 10 Jun, 2025 3 commits
  10. 09 Jun, 2025 1 commit
  11. 08 Jun, 2025 1 commit
  12. 07 Jun, 2025 2 commits
  13. 06 Jun, 2025 4 commits
  14. 05 Jun, 2025 2 commits
  15. 04 Jun, 2025 1 commit
  16. 31 May, 2025 1 commit
  17. 30 May, 2025 1 commit
  18. 29 May, 2025 3 commits
    • ggml: Export GPU UUIDs · aaa78180
      Jesse Gross authored
      This enables matching up devices and information reported by the backend
      with system management libraries such as nvml to get accurate free
      memory reporting.
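Matching devices like this amounts to a join keyed on the UUID string, which both the backend and the management library report for the same physical GPU. A sketch of that join; the type and field names below are illustrative assumptions, not the actual ollama/ggml or NVML structures:

```go
package main

import "fmt"

// backendDevice is a hypothetical record for a device reported by the
// compute backend, identified by its exported UUID.
type backendDevice struct {
	Name string
	UUID string
}

// nvmlDevice is a hypothetical record for the same GPU as seen by a
// system management library, with its free memory reading.
type nvmlDevice struct {
	UUID    string
	FreeMiB uint64
}

// matchFreeMemory joins backend devices with management-library
// reports by UUID, yielding accurate free memory per backend device.
func matchFreeMemory(backend []backendDevice, nvml []nvmlDevice) map[string]uint64 {
	byUUID := make(map[string]uint64, len(nvml))
	for _, d := range nvml {
		byUUID[d.UUID] = d.FreeMiB
	}
	free := make(map[string]uint64)
	for _, d := range backend {
		if mib, ok := byUUID[d.UUID]; ok {
			free[d.Name] = mib
		}
	}
	return free
}

func main() {
	free := matchFreeMemory(
		[]backendDevice{{Name: "CUDA0", UUID: "GPU-1234"}},
		[]nvmlDevice{{UUID: "GPU-1234", FreeMiB: 20480}},
	)
	fmt.Println(free["CUDA0"]) // 20480
}
```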
    • llm: Make "POST predict" error message more informative · f15ffc43
      Jesse Gross authored
      "POST predict" basically means that the runner has crashed, which
      can happen for many reasons. However, many people think this is a specific
      error and either report only this message or group together unrelated
      bugs. This replaces it with a more friendly and helpful message.
    • add thinking support to the api and cli (#10584) · 5f57b0ef
      Devon Rifkin authored
      - Both `/api/generate` and `/api/chat` now accept a `"think"`
        option that allows specifying whether thinking mode should be on or
        not
      - Templates get passed this new option so, e.g., qwen3's template can
        put `/think` or `/no_think` in the system prompt depending on the
        value of the setting
      - Models' thinking support is inferred by inspecting model templates.
        The prefix and suffix the parser uses to identify thinking support is
        also automatically inferred from templates
      - Thinking control & parsing is opt-in via the API to prevent breaking
        existing API consumers. If the `"think"` option is not specified, the
        behavior is unchanged from previous versions of ollama
      - Add parsing for thinking blocks in both streaming/non-streaming mode
        in both `/generate` and `/chat`
      - Update the CLI to make use of these changes. Users can pass `--think`
        or `--think=false` to control thinking, or during an interactive
        session they can use the commands `/set think` or `/set nothink`
      - A `--hidethinking` option has also been added to the CLI. This makes
        it easy to use thinking in scripting scenarios like
        `ollama run qwen3 --think --hidethinking "my question here"` where you
        just want to see the answer but still want the benefits of thinking
        models
  19. 27 May, 2025 3 commits