1. 17 Oct, 2025 1 commit
    • test: harden scheduler tests (#12662) · 68e04c7f
      Daniel Hiltgen authored
      * test: harden scheduler tests
      
      This removes reschedDelay, which was stale code, and adds a
      new configurable timeout for waitForVRAMRecovery so tests can
      set the timeout very short, keeping the scheduler from getting
      stuck and hitting a test timeout (see the sketch after this
      entry).
      
      * test: tune tests for partial loads
      
      Give stress tests more time when the model is split between CPU and GPU.
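      A minimal sketch of the configurable-timeout pattern this commit
      describes; the Scheduler struct, vramRecoveryTimeout field, and
      waitForVRAMRecovery signature are illustrative assumptions, not
      the actual ollama scheduler code:

      ```go
      package sched

      import "time"

      // Illustrative only: names here are assumptions, not ollama's
      // internals.
      type Scheduler struct {
          // How long waitForVRAMRecovery polls before giving up.
          // Production code keeps a generous default; tests shrink it.
          vramRecoveryTimeout time.Duration
      }

      func NewScheduler() *Scheduler {
          return &Scheduler{vramRecoveryTimeout: 5 * time.Second}
      }

      // waitForVRAMRecovery polls until enough VRAM is free or the
      // configured timeout expires.
      func (s *Scheduler) waitForVRAMRecovery(freeVRAM func() uint64, needed uint64) bool {
          deadline := time.Now().Add(s.vramRecoveryTimeout)
          for time.Now().Before(deadline) {
              if freeVRAM() >= needed {
                  return true
              }
              time.Sleep(10 * time.Millisecond)
          }
          return false
      }
      ```

      A test can then set vramRecoveryTimeout to a few milliseconds so
      a stuck recovery fails fast instead of tripping the suite's
      overall timeout.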
  2. 02 Oct, 2025 1 commit
    • Update GGML to b6646 (#12245) · c68f367e
      Daniel Hiltgen authored
      Notable EOLs with this change:
      - macOS v12 and v13 are no longer supported (v14+ required)
      - AMD gfx900 and gfx906 are no longer supported
  3. 09 Sep, 2025 1 commit
    • llm: Clamp batch size to context size · e119783e
      Jesse Gross authored
      The context must always be able to store the current batch, so
      if the user requests a small context we should also shrink the
      batch to match (a minimal sketch follows this entry). This also
      fixes the TestLongInputContext test on the new engine. (The old
      engine already has this behavior.)
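      A minimal sketch of the clamp, using illustrative names
      (effectiveBatchSize, numCtx) rather than the engine's actual
      ones:

      ```go
      package llm

      // effectiveBatchSize caps the batch at the context size: the
      // context must be able to hold an entire batch, so a small
      // context also shrinks the batch.
      func effectiveBatchSize(requestedBatch, numCtx int) int {
          if requestedBatch > numCtx {
              return numCtx
          }
          return requestedBatch
      }
      ```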
  4. 29 Aug, 2025 1 commit
    • perf: build graph for next batch async to keep GPU busy (#11863) · 517807cd
      Daniel Hiltgen authored
      * perf: build graph for next batch in parallel to keep GPU busy
      
      This refactors the main run loop of the ollama runner to perform
      the GPU-intensive tasks (Compute + Floats) in a goroutine so the
      next batch can be prepared in parallel, reducing the time the GPU
      stalls waiting for its next batch of work (see the sketch after
      this entry).
      
      * tests: tune integration tests for ollama engine
      
      This tunes the integration tests to focus more on models supported
      by the new engine.
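      A hedged sketch of that overlap, with placeholder types and
      callbacks (batch, nextBatch, compute) standing in for the
      runner's real API: GPU work for batch N runs in a goroutine
      while the loop prepares batch N+1.

      ```go
      package runner

      type batch struct{ tokens []int }

      // runLoop overlaps CPU-side batch preparation with GPU compute:
      // while the goroutine runs compute for batch N, the loop body
      // builds batch N+1.
      func runLoop(nextBatch func() (*batch, bool), compute func(*batch)) {
          done := make(chan struct{}, 1)
          done <- struct{}{} // no GPU work in flight yet

          for {
              b, ok := nextBatch() // prepare the next batch (CPU work)
              if !ok {
                  break
              }
              <-done // wait for the previous batch's GPU work
              go func(b *batch) {
                  compute(b) // GPU-intensive Compute + Floats
                  done <- struct{}{}
              }(b)
          }
          <-done // drain the last in-flight batch
      }
      ```

      The one-slot buffered channel keeps exactly one batch on the GPU
      while the next is being prepared, and the final receive ensures
      the loop does not return with work still in flight.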
  5. 15 Aug, 2025 1 commit
    • test: improve scheduler/concurrency stress tests (#11906) · d6f7233a
      Daniel Hiltgen authored
      * test: improve scheduler/concurrency stress tests
      
      The scheduler test used to use approximate memory figures and
      would often overshoot or undershoot a system's capacity, leading
      to flaky test results. This should improve the reliability of
      this scenario by leveraging ps output to determine exactly how
      many models it takes to trigger thrashing (see the sketch after
      this entry).
      
      The concurrency test is also refined to target num_parallel + 1 and handle
      timeouts better.
      
      With these refinements, TestMultiModelConcurrency was redundant.
      
      * test: add parallel generate with history
      
      TestGenerateWithHistory will help verify that caching and
      context are properly handled while making requests.
      
      * test: focus embed tests on embedding models
      
      Remove non-embedding models from the embedding tests.
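      One way to get that ps-style ground truth is the documented
      GET /api/ps endpoint, which reports the models currently loaded.
      A small sketch, assuming a server at the default localhost:11434
      and decoding only the fields it needs:

      ```go
      package main

      import (
          "encoding/json"
          "fmt"
          "net/http"
      )

      // psResponse mirrors just the parts of the /api/ps reply this
      // example uses.
      type psResponse struct {
          Models []struct {
              Name     string `json:"name"`
              SizeVRAM int64  `json:"size_vram"`
          } `json:"models"`
      }

      // loadedModels asks a running Ollama server how many models are
      // currently resident.
      func loadedModels(host string) (int, error) {
          resp, err := http.Get(host + "/api/ps")
          if err != nil {
              return 0, err
          }
          defer resp.Body.Close()

          var ps psResponse
          if err := json.NewDecoder(resp.Body).Decode(&ps); err != nil {
              return 0, err
          }
          return len(ps.Models), nil
      }

      func main() {
          n, err := loadedModels("http://localhost:11434")
          if err != nil {
              fmt.Println("server not reachable:", err)
              return
          }
          fmt.Printf("%d model(s) currently loaded\n", n)
      }
      ```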
  6. 20 Nov, 2024 1 commit
    • runner.go: Retry decoding after defragmentation if needed · 7121dfa3
      Jesse Gross authored
      Fragmentation of the KV cache can occur due to cache shifting or
      different sequences getting processed. Decode uses a heuristic to
      decide if it should defrag; however, this heuristic isn't 100%
      accurate, so decoding can sometimes fail unexpectedly.
      
      For those cases, if decode indicates that there is no KV cache
      space, we should defrag and then try again (see the sketch after
      this entry).
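      A minimal sketch of the retry, assuming hypothetical decode and
      defrag hooks and a sentinel error; the real runner's error
      handling and defrag entry point differ:

      ```go
      package runner

      import "errors"

      // errNoKVSlot is a stand-in for the "no KV cache space" failure
      // the commit message describes.
      var errNoKVSlot = errors.New("no kv cache slot available")

      // decodeWithRetry defrags the KV cache and retries once when
      // decode reports that no cache space is left.
      func decodeWithRetry(decode func() error, defrag func()) error {
          err := decode()
          if errors.Is(err, errNoKVSlot) {
              // The defrag heuristic missed this case: compact the
              // cache and try the same batch one more time.
              defrag()
              err = decode()
          }
          return err
      }
      ```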
  7. 23 Apr, 2024 1 commit
    • Request and model concurrency · 34b9db5a
      Daniel Hiltgen authored
      This change adds support for multiple concurrent requests, as
      well as loading multiple models by spawning multiple runners. The
      defaults are currently 1 concurrent request per model and only 1
      loaded model at a time, but these can be adjusted by setting
      OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS (see the sketch
      after this entry).
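      The environment variable names are real; the parsing helper
      below is an illustrative sketch of how such knobs are typically
      read, not ollama's actual envconfig code:

      ```go
      package envconfig

      import (
          "os"
          "strconv"
      )

      // intFromEnv returns the env var as a positive integer, or the
      // default when it is unset or malformed.
      func intFromEnv(key string, def int) int {
          if v := os.Getenv(key); v != "" {
              if n, err := strconv.Atoi(v); err == nil && n > 0 {
                  return n
              }
          }
          return def
      }

      var (
          NumParallel     = intFromEnv("OLLAMA_NUM_PARALLEL", 1)      // requests per model
          MaxLoadedModels = intFromEnv("OLLAMA_MAX_LOADED_MODELS", 1) // models resident at once
      )
      ```

      For example, running the server with OLLAMA_MAX_LOADED_MODELS=2
      and OLLAMA_NUM_PARALLEL=4 allows two resident models serving
      four concurrent requests each.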