Commits · 76eb7d0fff04563ee89e253fc71a4cbf5d0f05f7 · OpenDAS / ollama

30 Oct, 2025 1 commit
- testing: test more models with tool calling (#12867) · 76eb7d0f
  Patrick Devine authored Oct 30, 2025
  
  76eb7d0f
29 Oct, 2025 3 commits
- int: harden server lifecycle (#12835) · c8864710
  Daniel Hiltgen authored Oct 29, 2025
```
this should reduce zombies during integration runs
```
  c8864710
- tests: fix embeddinggemma integration test (#12830) · 05aff4a4
  Patrick Devine authored Oct 29, 2025
  
  05aff4a4
- feat(model): add qwen3vl (#12665) · 7d25b9e1
  Michael Yang authored Oct 28, 2025
  
  7d25b9e1
28 Oct, 2025 2 commits
- embed: add distance correlation test for library embed models (#12796) · 36d64fb5
  Patrick Devine authored Oct 28, 2025
  
  36d64fb5
- Revert "server: Consolidate embedding truncation in runner (#12730)" (#12810) · 29f63f37
  Patrick Devine authored Oct 28, 2025
```
This reverts commit 5d347f6d.
```
  29f63f37
27 Oct, 2025 1 commit

server: Consolidate embedding truncation in runner (#12730) · 5d347f6d

nicole pardal authored Oct 27, 2025

Currently, checking the length of prompts for embeddings to ensure
they fit in the context window (and possible truncation) occurs in
two places - the Ollama server and runner. This can lead to
inconsistencies in both the checks and reported number of tokens
processed. Since we have to do this processing in the runner, this
consolidates all of the logic there.

5d347f6d

20 Oct, 2025 1 commit
- runner: always truncate embeddings requests (#12714) · 5fe7ba1b
  Jeffrey Morgan authored Oct 20, 2025
  
  5fe7ba1b
17 Oct, 2025 1 commit

test: harden scheduler tests (#12662) · 68e04c7f

Daniel Hiltgen authored Oct 17, 2025

* test: harden scheduler tests

This removes reschedDelay which was stale code, and adds
a new configurable timeout for the waitForVRAMRecovery so
tests can now set the timeout to be very short to avoid the
scheduler getting stuck and hitting a test timeout.

* test: tune tests for partial loads

Give stress tests more time when the model is split between CPU/GPU

68e04c7f

16 Oct, 2025 1 commit
- test: add a few missing embedding models (#12661) · b531777a
  Daniel Hiltgen authored Oct 16, 2025
  
  b531777a
08 Oct, 2025 1 commit
- Integration test tuning (#12492) · 4e5d862e
  Daniel Hiltgen authored Oct 08, 2025
```
Remove some flaky scenarios, and switch to chat for better reliability
```
  4e5d862e
02 Oct, 2025 1 commit

Update GGML to b6646 (#12245) · c68f367e

Daniel Hiltgen authored Oct 02, 2025

Notable EOLs with this change:
- MacOS v12 and v13 are no longer supported (v14+ required)
- AMD gfx900 and gfx906 are no longer supported

c68f367e

01 Oct, 2025 1 commit

Use runners for GPU discovery (#12090) · bc8909fb

Daniel Hiltgen authored Oct 01, 2025

This revamps how we discover GPUs in the system by leveraging the Ollama
runner. This should eliminate inconsistency between our GPU discovery and the
runners capabilities at runtime, particularly for cases where we try to filter
out unsupported GPUs. Now the runner does that implicitly based on the actual
device list. In some cases free VRAM reporting can be unreliable which can
leaad to scheduling mistakes, so this also includes a patch to leverage more
reliable VRAM reporting libraries if available.

Automatic workarounds have been removed as only one GPU leveraged this, which
is now documented. This GPU will soon fall off the support matrix with the next
ROCm bump.

Additional cleanup of the scheduler and discovery packages can be done in the
future once we have switched on the new memory management code, and removed
support for the llama runner.

bc8909fb

22 Sep, 2025 1 commit

tests: add single threaded history test (#12295) · c23e6f4c

Daniel Hiltgen authored Sep 22, 2025

* tests: add single threaded history test

Also tidies up some existing tests to handle more model output variation

* test: add support for testing specific architectures

c23e6f4c

18 Sep, 2025 1 commit
- fix(integration): check truncated length (#12337) · ceac416e
  Michael Yang authored Sep 18, 2025
  
  ceac416e
12 Sep, 2025 1 commit

tests: tighten up a few flaky tests (#12271) · 44a67928

Daniel Hiltgen authored Sep 12, 2025

Sometimes the context test results are pure emoji's
Thanksgiving has too much variability, so swap for a more straight forward prompt.

44a67928

09 Sep, 2025 3 commits

tests: add tool calling integration test (#12232) · 20b53eaa
Parth Sareen authored Sep 09, 2025

20b53eaa

tests: reduce stress on CPU to 2 models (#12161) · 67451828

Daniel Hiltgen authored Sep 09, 2025

* tests: reduce stress on CPU to 2 models

This should avoid flakes due to systems getting overloaded with 3 (or more) models running concurrently

* tests: allow slow systems to pass on timeout

If a slow system is still streaming a response, and the response
will pass validation, don't fail just because the system is slow.

* test: unload embedding models more quickly

67451828

llm: Clamp batch size to context size · e119783e

Jesse Gross authored Sep 08, 2025

The context must always be able to store the current batch, so
if the user requests a small context then we should also shrink
the batch to match. This also fixes the TestLongInputContext
test on the new engine. (The old engine already has this behavior.)

e119783e

29 Aug, 2025 1 commit

perf: build graph for next batch async to keep GPU busy (#11863) · 517807cd

Daniel Hiltgen authored Aug 29, 2025

* perf: build graph for next batch in parallel to keep GPU busy

This refactors the main run loop of the ollama runner to perform the main GPU
intensive tasks (Compute+Floats) in a go routine so we can prepare the next
batch in parallel to reduce the amount of time the GPU stalls waiting for the
next batch of work.

* tests: tune integration tests for ollama engine

This tunes the integration tests to focus more on models supported
by the new engine.

517807cd

15 Aug, 2025 1 commit

test: improve scheduler/concurrency stress tests (#11906) · d6f7233a

Daniel Hiltgen authored Aug 15, 2025

* test: improve scheduler/concurrency stress tests

The scheduler test used to use approximate memory figures and would often
over or under shoot a systems capcity leading to flaky test results.
This should improve the reliability of this scenario by leveraging
ps output to determinie exactly how many models it takes to
trigger thrashing.

The concurrency test is also refined to target num_parallel + 1 and handle
timeouts better.

With these refinements, TestMultiModelConcurrency was redundant

* test: add parallel generate with history

TestGenerateWithHistory will help verify caching and context
are properly handled while making requests

* test: focus embed tests on embedding models

remove non-embedding models from the embedding tests

d6f7233a

14 Aug, 2025 1 commit
- test: add valid responses (#11902) · c385ca86
  Daniel Hiltgen authored Aug 14, 2025
```
some of the new models need a few more valid responses to pass
```
  c385ca86
13 Aug, 2025 1 commit
- int: adjust a few models for integration tests (#11872) · a24f9060
  Daniel Hiltgen authored Aug 13, 2025
  
  a24f9060
07 Aug, 2025 1 commit
- tests: add integration coverage for oss-gpt (#11696) · 114c3f22
  Daniel Hiltgen authored Aug 07, 2025
```
Also wires up support to override the default "smol" model
```
  114c3f22
11 Jul, 2025 1 commit

Only load supported models on new engine (#11362) · f8a6e888

Daniel Hiltgen authored Jul 11, 2025

* Only load supported models on new engine

Verify the model is supported before trying to load

* int: testcase for all library models

f8a6e888

05 Jul, 2025 1 commit

int: add performance integration tests (#11173) · 4f473e22

Daniel Hiltgen authored Jul 05, 2025

4f473e22

19 Jun, 2025 1 commit
- int: add coverage for older models (#11137) · f2527b08
  Daniel Hiltgen authored Jun 19, 2025
```
Verified these fail on 0.9.1 and pass on HEAD.
```
  f2527b08
24 May, 2025 1 commit
- tests: drop llama3.2-vision embedding tests (#10837) · 2307fc2b
  Daniel Hiltgen authored May 24, 2025
  
  2307fc2b
22 May, 2025 1 commit

integration: add qwen2.5-vl (#10815) · fdd4d479

Daniel Hiltgen authored May 22, 2025

Replace the older llava model with qwen2.5 for vision tests
Skip split-batch test on small VRAM systems to avoid excessive test time

fdd4d479

06 May, 2025 1 commit

Move quantization to new backend (#10363) · 42481045

Daniel Hiltgen authored May 06, 2025

* Move quantization logic to GGML via new backend

This moves the model aware logic to Go code and calls GGMLs quantization code for model creation.

* Remove "add model quantizations"

This is no longer needed now that quantization is implemented in Go+GGML code directly.

42481045

04 May, 2025 1 commit
- file close check and close. (#10554) · 7e5c8eee
  湛露先生 authored May 05, 2025
```
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
```
  7e5c8eee
29 Apr, 2025 1 commit

integration: fix embedding tests error handling (#10478) · 7bec2724

Daniel Hiltgen authored Apr 29, 2025

The cleanup routine from InitServerconnection should run in the defer of the test case to properly detect failures and report the server logs

7bec2724

16 Apr, 2025 1 commit

Integration test improvements (#9654) · ed4e1393

Daniel Hiltgen authored Apr 16, 2025

Add some new test coverage for various model architectures,
and switch from orca-mini to the small llama model.

ed4e1393

08 Apr, 2025 1 commit
- fix(integration): move waitgroup Add(1) outside goroutine to avoid potential issue (#10070) · e7019c94
  CYJiang authored Apr 09, 2025
```
Signed-off-by: googs1025 <googs1025@gmail.com>
```
  e7019c94
02 Apr, 2025 1 commit

chore(all): replace instances of interface with any (#10067) · 9876c9fa

Bruce MacDonald authored Apr 02, 2025

Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.

9876c9fa

14 Mar, 2025 1 commit

ml: Allow models to constrain inputs to a single batch · 9679f401

Jesse Gross authored Mar 12, 2025

Models may require that a set of inputs all be processed as part
of the same batch. For example, if an image has multiple patches
with fully connected attention between them, we should not split
the batch in the middle of an image.

Fixes #9697

9679f401

10 Dec, 2024 1 commit
- all: fix typos in documentation, code, and comments (#7021) · abfdc471
  Stefan Weil authored Dec 10, 2024
  
  abfdc471
22 Nov, 2024 1 commit
- tests: fix max queue integration test (#7782) · f0a35181
  Daniel Hiltgen authored Nov 22, 2024
```
This had fallen out of sync with the envconfig behavior, where max queue default was not zero.
```
  f0a35181
20 Nov, 2024 1 commit

runner.go: Retry decoding after defragmentation if needed · 7121dfa3

Jesse Gross authored Nov 19, 2024

Fragmentation of the KV cache can occur due to cache shifting or
different sequences getting processed. Decode uses a heuristic to
decide if it should defrag. However, this heuristic isn't 100%
accurate, so decoding can sometimes fail by surprise.

For these cases, if decode indicates that there is no KV cache space,
we should defrag and then try again.

7121dfa3

01 Nov, 2024 1 commit
- Add basic mllama integration tests (#7455) · 8a9bb0d0
  Daniel Hiltgen authored Oct 31, 2024
  
  8a9bb0d0