1. 06 Jan, 2026 1 commit
    • preserve tool definition and call JSON ordering (#13525) · e51dead6
      Devon Rifkin authored
      * preserve tool definition and call JSON ordering
      
      This is another iteration of
      <https://github.com/ollama/ollama/pull/12518>, but this time we've
      simplified things by relaxing the competing requirements of being
      compatible AND order-preserving with templates (vs. renderers). We
      maintain backwards compatibility at the cost of not guaranteeing order
      for templates. We plan on moving more and more models to renderers,
      which have been updated to use these new data types. We could also
      add an opt-in way for templates to receive an order-preserved list
      (e.g., via sibling template vars).
      
      * orderedmap_test: remove testify
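      A minimal sketch of the idea behind an order-preserving JSON
      object (illustrative only; this is not Ollama's actual orderedmap
      implementation): keys are recorded in the order they are decoded
      and re-emitted in that same order.

        package orderedmap

        import (
            "bytes"
            "encoding/json"
        )

        // Map is a JSON object that remembers key order.
        type Map struct {
            keys   []string
            values map[string]json.RawMessage
        }

        func (m *Map) UnmarshalJSON(data []byte) error {
            dec := json.NewDecoder(bytes.NewReader(data))
            if _, err := dec.Token(); err != nil { // consume '{'
                return err
            }
            m.values = map[string]json.RawMessage{}
            for dec.More() {
                tok, err := dec.Token() // object keys are always strings
                if err != nil {
                    return err
                }
                key := tok.(string)
                var val json.RawMessage // keep the value verbatim
                if err := dec.Decode(&val); err != nil {
                    return err
                }
                m.keys = append(m.keys, key)
                m.values[key] = val
            }
            _, err := dec.Token() // consume '}'
            return err
        }

        func (m Map) MarshalJSON() ([]byte, error) {
            var buf bytes.Buffer
            buf.WriteByte('{')
            for i, k := range m.keys { // emit keys in decoded order
                if i > 0 {
                    buf.WriteByte(',')
                }
                kb, _ := json.Marshal(k)
                buf.Write(kb)
                buf.WriteByte(':')
                buf.Write(m.values[k])
            }
            buf.WriteByte('}')
            return buf.Bytes(), nil
        }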
  2. 11 Dec, 2025 1 commit
    • embeddings: modified batch size (#13429) · 3475d915
      nicole pardal authored

      This PR detects embedding models and sets batch_size = context_size so the full input fits in a single batch.
      Previously, if batch size was smaller than the input, tokens could be split across batches and cause a SIGTRAP crash.
      This change ensures all tokens stay in one batch and prevents crashes.
      Fixes: #12938 #13054
      Co-authored-by: Jesse Gross <jesse@ollama.com>
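      A hedged sketch of the fix (field and function names are
      illustrative, not Ollama's actual options): for embedding models
      the batch size is raised to the context size so a prompt that fits
      in the context is never split across batches.

        package sketch

        // adjustBatchSize sets batch_size = context_size for embedding
        // models, per the commit message above.
        func adjustBatchSize(isEmbedding bool, numCtx, batchSize int) int {
            if isEmbedding && batchSize < numCtx {
                return numCtx
            }
            return batchSize
        }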
  3. 11 Nov, 2025 1 commit
    • server: add logprobs and top_logprobs support to Ollama's API (#12899) · 59241c5b
      Baptiste Jamin authored

      Adds logprobs support to Ollama's API, including Ollama's
      OpenAI-compatible API. When the new 'logprobs' boolean parameter is
      set, Ollama returns the log probability of each generated token. An
      integer 'top_logprobs' parameter (up to 20) can also be specified;
      when set, the API additionally returns that many of the most likely
      tokens at each token position.
      Co-authored-by: Baptiste Jamin <baptiste@crisp.chat>
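      An illustrative request shape for the new parameters (a sketch
      following the commit message, not the definitive API surface):

        package sketch

        type GenerateRequest struct {
            Model  string `json:"model"`
            Prompt string `json:"prompt"`
            // Logprobs requests the log probability of each generated token.
            Logprobs bool `json:"logprobs,omitempty"`
            // TopLogprobs (up to 20) additionally requests that many of the
            // most likely tokens at each token position.
            TopLogprobs int `json:"top_logprobs,omitempty"`
        }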
  4. 27 Oct, 2025 1 commit
    • server: Consolidate embedding truncation in runner (#12730) · 5d347f6d
      nicole pardal authored
      Checking that embedding prompts fit in the context window (and
      truncating them when necessary) currently happens in two places:
      the Ollama server and the runner. This can lead to inconsistencies
      in both the checks and the reported number of tokens processed.
      Since this processing has to happen in the runner anyway, this
      consolidates all of the logic there.
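      A minimal sketch of the consolidated check (illustrative names,
      not the runner's actual code): a single place decides whether a
      tokenized prompt fits the context window, truncating or erroring.

        package sketch

        import "fmt"

        func fitToContext(tokens []int, numCtx int, truncate bool) ([]int, error) {
            if len(tokens) <= numCtx {
                return tokens, nil
            }
            if !truncate {
                return nil, fmt.Errorf("input length %d exceeds context length %d", len(tokens), numCtx)
            }
            return tokens[:numCtx], nil // keep only what fits
        }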
  5. 17 Oct, 2025 1 commit
    • test: harden scheduler tests (#12662) · 68e04c7f
      Daniel Hiltgen authored
      * test: harden scheduler tests
      
      This removes reschedDelay, which was stale code, and adds a
      configurable timeout for waitForVRAMRecovery so tests can set a
      very short timeout and avoid the scheduler getting stuck and
      hitting a test timeout.
      
      * test: tune tests for partial loads
      
      Give stress tests more time when the model is split between CPU/GPU
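      A sketch of the configurable-timeout idea (hypothetical signature;
      the real waitForVRAMRecovery is internal to the scheduler): tests
      pass a very short timeout so a stuck wait fails fast instead of
      hanging until the suite-level timeout.

        package sketch

        import "time"

        // waitOrTimeout returns true if done fires before the timeout.
        func waitOrTimeout(done <-chan struct{}, timeout time.Duration) bool {
            select {
            case <-done:
                return true
            case <-time.After(timeout):
                return false
            }
        }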
  6. 02 Oct, 2025 1 commit
    • Update GGML to b6646 (#12245) · c68f367e
      Daniel Hiltgen authored
      Notable EOLs with this change:
      - macOS v12 and v13 are no longer supported (v14+ required)
      - AMD gfx900 and gfx906 are no longer supported
  7. 01 Oct, 2025 1 commit
    • Use runners for GPU discovery (#12090) · bc8909fb
      Daniel Hiltgen authored
      This revamps how we discover GPUs in the system by leveraging the
      Ollama runner. This should eliminate inconsistency between our GPU
      discovery and the runner's capabilities at runtime, particularly
      for cases where we try to filter out unsupported GPUs. Now the
      runner does that implicitly based on the actual device list. In
      some cases free VRAM reporting can be unreliable, which can lead to
      scheduling mistakes, so this also includes a patch to leverage more
      reliable VRAM reporting libraries if available.
      
      Automatic workarounds have been removed, as only one GPU relied on
      them; that workaround is now documented. This GPU will soon fall
      off the support matrix with the next ROCm bump.
      
      Additional cleanup of the scheduler and discovery packages can be done in the
      future once we have switched on the new memory management code, and removed
      support for the llama runner.
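      A hedged sketch of the shape of this design (hypothetical types,
      not Ollama's actual discovery API): rather than filtering GPUs up
      front, the scheduler asks each runner which devices it can
      actually drive.

        package sketch

        // DeviceInfo describes one GPU as reported by a runner.
        type DeviceInfo struct {
            ID       string
            FreeVRAM uint64 // bytes; ideally from a reliable vendor library
        }

        // Runner reports its own usable devices; unsupported GPUs simply
        // never appear in the list, so no separate filtering is needed.
        type Runner interface {
            Devices() ([]DeviceInfo, error)
        }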
  8. 09 Sep, 2025 3 commits
    • 20b53eaa
      Parth Sareen authored
    • tests: reduce stress on CPU to 2 models (#12161) · 67451828
      Daniel Hiltgen authored
      * tests: reduce stress on CPU to 2 models
      
      This should avoid flakes due to systems getting overloaded with 3 (or more) models running concurrently.
      
      * tests: allow slow systems to pass on timeout
      
      If a slow system is still streaming a response, and the response
      will pass validation, don't fail just because the system is slow.
      
      * test: unload embedding models more quickly
    • llm: Clamp batch size to context size · e119783e
      Jesse Gross authored
      The context must always be able to store the current batch, so
      if the user requests a small context then we should also shrink
      the batch to match. This also fixes the TestLongInputContext
      test on the new engine. (The old engine already has this behavior.)
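      The inverse of the embedding batch-size fix above, as a sketch
      (illustrative names): a small requested context also shrinks the
      batch, since the context must always hold the current batch.

        package sketch

        func clampBatch(batchSize, numCtx int) int {
            if batchSize > numCtx {
                return numCtx // the context must fit the whole batch
            }
            return batchSize
        }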
  9. 29 Aug, 2025 1 commit
    • perf: build graph for next batch async to keep GPU busy (#11863) · 517807cd
      Daniel Hiltgen authored
      * perf: build graph for next batch in parallel to keep GPU busy
      
      This refactors the main run loop of the ollama runner to perform
      the main GPU-intensive tasks (Compute+Floats) in a goroutine so we
      can prepare the next batch in parallel, reducing the amount of time
      the GPU stalls waiting for the next batch of work.
      
      * tests: tune integration tests for ollama engine
      
      This tunes the integration tests to focus more on models supported
      by the new engine.
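      A sketch of the overlap pattern (illustrative types; not the
      runner's actual run loop): GPU compute for the current batch runs
      in a goroutine while the loop prepares the next batch.

        package sketch

        type Batch struct{ /* tokens, graph, ... */ }

        func run(batches []Batch, prepare func(Batch) Batch, compute func(Batch)) {
            done := make(chan struct{})
            close(done) // nothing in flight yet
            for _, b := range batches {
                next := prepare(b) // CPU-side prep overlaps prior GPU work
                <-done             // wait for the previous batch to finish
                done = make(chan struct{})
                go func(b Batch, fin chan struct{}) {
                    compute(b) // GPU-intensive work
                    close(fin)
                }(next, done)
            }
            <-done // drain the final batch
        }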
  10. 15 Aug, 2025 1 commit
    • test: improve scheduler/concurrency stress tests (#11906) · d6f7233a
      Daniel Hiltgen authored
      * test: improve scheduler/concurrency stress tests
      
      The scheduler test used to use approximate memory figures and
      would often overshoot or undershoot a system's capacity, leading to
      flaky test results. This should improve the reliability of this
      scenario by leveraging ps output to determine exactly how many
      models it takes to trigger thrashing.
      
      The concurrency test is also refined to target num_parallel + 1 and handle
      timeouts better.
      
      With these refinements, TestMultiModelConcurrency became redundant.
      
      * test: add parallel generate with history
      
      TestGenerateWithHistory will help verify that caching and context
      are properly handled while making requests.
      
      * test: focus embed tests on embedding models
      
      remove non-embedding models from the embedding tests
  11. 05 Jul, 2025 1 commit
    • int: add performance integration tests (#11173) · 4f473e22
      Daniel Hiltgen authored
      usage example:
        go test --tags=integration,perf -count 1 ./integration -v -timeout 1h -run TestModelsPerf 2>&1 | tee int.log
        cat int.log | grep MODEL_PERF_HEADER | cut -f2- -d: > perf.csv
        cat int.log | grep MODEL_PERF_DATA | cut -f2- -d: >> perf.csv
  12. 06 May, 2025 1 commit
    • Move quantization to new backend (#10363) · 42481045
      Daniel Hiltgen authored
      * Move quantization logic to GGML via new backend
      
      This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.
      
      * Remove "add model quantizations"
      
      This is no longer needed now that quantization is implemented in Go+GGML code directly.