Commits · 517807cdf29d2c8d22bc748a2cfde2b61bd67c98 · OpenDAS / ollama

29 Aug, 2025 1 commit

perf: build graph for next batch async to keep GPU busy (#11863) · 517807cd

Daniel Hiltgen authored Aug 29, 2025

* perf: build graph for next batch in parallel to keep GPU busy

This refactors the main run loop of the ollama runner to perform the main
GPU-intensive tasks (Compute+Floats) in a goroutine so we can prepare the next
batch in parallel, reducing the amount of time the GPU stalls waiting for the
next batch of work.

* tests: tune integration tests for ollama engine

This tunes the integration tests to focus more on models supported
by the new engine.

517807cd
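
The overlap described in the first change above can be illustrated with a minimal Go sketch. It is not the runner's actual code: `batch`, `graph`, `buildGraph`, `compute`, and `runPipelined` are hypothetical stand-ins, and the "GPU" step is simulated on the CPU. The point is only the shape of the loop: the compute step for the current batch runs in a goroutine while the calling goroutine builds the graph for the next batch.

```go
package main

import "fmt"

// batch and graph are hypothetical stand-ins, not types from the ollama runner.
type batch struct{ id int }
type graph struct{ batchID int }

// buildGraph stands in for the CPU-side work of preparing a batch.
func buildGraph(b batch) graph { return graph{batchID: b.id} }

// compute stands in for the GPU-intensive step (Compute+Floats in the commit message).
func compute(g graph) string { return fmt.Sprintf("result-%d", g.batchID) }

// runPipelined computes each graph in a goroutine while the caller
// builds the graph for the following batch.
func runPipelined(batches []batch) []string {
	results := make([]string, 0, len(batches))
	if len(batches) == 0 {
		return results
	}
	next := buildGraph(batches[0])
	for i := range batches {
		current := next
		done := make(chan string, 1)
		go func() { done <- compute(current) }()

		// While the goroutine is busy, prepare the next batch.
		if i+1 < len(batches) {
			next = buildGraph(batches[i+1])
		}

		// Wait for the current batch before starting the next compute step.
		results = append(results, <-done)
	}
	return results
}

func main() {
	fmt.Println(runPipelined([]batch{{1}, {2}, {3}}))
}
```

Results stay in submission order because each iteration waits for its own compute step before launching the next one; only the graph construction overlaps with compute.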

22 May, 2025 1 commit

integration: add qwen2.5-vl (#10815) · fdd4d479

Daniel Hiltgen authored May 22, 2025

Replace the older llava model with qwen2.5 for vision tests
Skip split-batch test on small VRAM systems to avoid excessive test time

fdd4d479

16 Apr, 2025 1 commit

Integration test improvements (#9654) · ed4e1393

Daniel Hiltgen authored Apr 16, 2025

Add some new test coverage for various model architectures,
and switch from orca-mini to the small llama model.

ed4e1393

02 Apr, 2025 1 commit

chore(all): replace instances of interface with any (#10067) · 9876c9fa

Bruce MacDonald authored Apr 02, 2025

Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.

9876c9fa
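
A short Go illustration of the equivalence stated above. The function names are made up for this example; the only language fact relied on is that `any` is an alias for `interface{}`, so the two spellings denote the same type.

```go
package main

import "fmt"

// describeOld uses the pre-Go 1.18 spelling of the empty interface.
func describeOld(v interface{}) string { return fmt.Sprintf("%T", v) }

// describeNew uses the alias; its signature has the identical type.
func describeNew(v any) string { return fmt.Sprintf("%T", v) }

func main() {
	// This assignment compiles because func(any) string and
	// func(interface{}) string are the same type.
	var f func(interface{}) string = describeNew
	fmt.Println(f(42), describeOld("x"))
}
```

Because the types are identical, a replacement like the one in this commit is purely textual and does not change behavior.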

14 Mar, 2025 1 commit

ml: Allow models to constrain inputs to a single batch · 9679f401

Jesse Gross authored Mar 12, 2025

Models may require that a set of inputs all be processed as part
of the same batch. For example, if an image has multiple patches
with fully connected attention between them, we should not split
the batch in the middle of an image.

Fixes #9697

9679f401
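
One way to picture the constraint described above is a batch splitter that treats a tied group of inputs as an indivisible unit. The sketch below is an assumption-laden illustration, not the ollama implementation: the `input` type, its `sameBatch` field, and `splitBatches` are hypothetical names chosen for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// input is a hypothetical stand-in for a model input.
// sameBatch says how many of the following inputs must share this input's batch.
type input struct {
	token     int
	sameBatch int
}

// splitBatches groups inputs into batches of at most batchSize,
// never splitting an input from the inputs tied to it.
func splitBatches(inputs []input, batchSize int) ([][]input, error) {
	var batches [][]input
	var cur []input
	for i := 0; i < len(inputs); {
		// A span is this input plus the inputs it is tied to.
		span := 1 + inputs[i].sameBatch
		if i+span > len(inputs) {
			span = len(inputs) - i
		}
		if span > batchSize {
			return nil, errors.New("tied inputs do not fit in one batch")
		}
		// Start a new batch rather than cut the span in the middle.
		if len(cur)+span > batchSize {
			batches = append(batches, cur)
			cur = nil
		}
		cur = append(cur, inputs[i:i+span]...)
		i += span
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches, nil
}

func main() {
	// Two text tokens, then an image start tied to its two patches, then one more token.
	inputs := []input{{1, 0}, {2, 0}, {3, 2}, {4, 0}, {5, 0}, {6, 0}}
	batches, err := splitBatches(inputs, 4)
	if err != nil {
		panic(err)
	}
	for _, b := range batches {
		fmt.Println(len(b))
	}
}
```

With a batch size of 4, a naive splitter would cut after the fourth input and separate the image start from its last patch; here the first batch is closed early so the tied inputs stay together.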

01 Nov, 2024 1 commit

Add basic mllama integration tests (#7455) · 8a9bb0d0

Daniel Hiltgen authored Oct 31, 2024

8a9bb0d0

14 Jun, 2024 1 commit

refined test timing · 68dfc623

Daniel Hiltgen authored May 31, 2024

adjust timing on some tests so they don't time out on small/slow GPUs

68dfc623

23 Apr, 2024 1 commit

Request and model concurrency · 34b9db5a

Daniel Hiltgen authored Mar 30, 2024

This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.

34b9db5a

26 Mar, 2024 1 commit

change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347) · 1b272d5b

Patrick Devine authored Mar 26, 2024

1b272d5b

25 Mar, 2024 1 commit

Integration tests conditionally pull · 7b6cbc10

Daniel Hiltgen authored Mar 24, 2024

If images aren't present, pull them.
Also fixes the expected responses

7b6cbc10

23 Mar, 2024 1 commit

Revamp go based integration tests · 949b6c01

Daniel Hiltgen authored Mar 23, 2024

This uplevels the integration tests to run the server, which can allow
testing an existing server or a remote server.

949b6c01

23 Dec, 2023 1 commit

Guard integration tests with a tag · 697bea69

Daniel Hiltgen authored Dec 22, 2023

This should help CI avoid running the integration test logic in a
container where it's not currently possible.

697bea69

19 Dec, 2023 1 commit

Add automated test for multimodal · 51082535

Daniel Hiltgen authored Dec 13, 2023

A simple test case that verifies llava:7b can read text in an image

51082535