Commits · 517807cdf29d2c8d22bc748a2cfde2b61bd67c98 · OpenDAS / ollama

29 Aug, 2025 1 commit

perf: build graph for next batch async to keep GPU busy (#11863) · 517807cd

Daniel Hiltgen authored Aug 29, 2025

* perf: build graph for next batch in parallel to keep GPU busy

This refactors the main run loop of the ollama runner to perform the main
GPU-intensive tasks (Compute+Floats) in a goroutine so we can prepare the next
batch in parallel, reducing the amount of time the GPU stalls waiting for the
next batch of work.

* tests: tune integration tests for ollama engine

This tunes the integration tests to focus more on models supported
by the new engine.
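
The run-loop refactor described in the first bullet of this commit can be illustrated with a small, self-contained sketch. It is not the runner's actual code: `batch`, `prepare`, `compute`, and `runPipelined` are hypothetical names, and the "GPU" step is simulated with plain arithmetic. It only shows the general shape of overlapping preparation of batch i+1 with computation of batch i.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for a prepared compute graph plus its inputs.
type batch struct{ id int }

// prepare simulates the CPU-side work of building the graph for batch i.
func prepare(i int) batch { return batch{id: i} }

// compute simulates the GPU-intensive step and returns a result for the batch.
func compute(b batch) int { return b.id * b.id }

// runPipelined computes n batches. While batch i is being computed in its own
// goroutine, the calling goroutine prepares batch i+1.
func runPipelined(n int) []int {
	results := make([]int, 0, n)
	if n <= 0 {
		return results
	}
	next := prepare(0)
	for i := 0; i < n; i++ {
		cur := next
		done := make(chan int, 1)
		go func() { done <- compute(cur) }() // simulated GPU work runs asynchronously
		if i+1 < n {
			next = prepare(i + 1) // overlaps with compute(cur)
		}
		results = append(results, <-done) // wait before launching the next compute
	}
	return results
}

func main() {
	fmt.Println(runPipelined(4)) // prints [0 1 4 9]
}
```

In this sketch only one compute step is in flight at a time, so results stay in order; the gain comes purely from hiding the preparation time behind the compute time.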


16 Apr, 2025 1 commit

Integration test improvements (#9654) · ed4e1393

Daniel Hiltgen authored Apr 16, 2025

Add some new test coverage for various model architectures,
and switch from orca-mini to the small llama model.


02 Apr, 2025 1 commit

chore(all): replace instances of interface with any (#10067) · 9876c9fa

Bruce MacDonald authored Apr 02, 2025

Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.
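
A short illustration of why the replacement is purely cosmetic: because `any` is an alias, a function written with one spelling can be assigned to a variable typed with the other. The function names here are made up for the example.

```go
package main

import "fmt"

// describeOld uses the pre-1.18 spelling of the empty interface.
func describeOld(v interface{}) string { return fmt.Sprintf("%T", v) }

// describeNew uses the alias; its signature is the identical type.
func describeNew(v any) string { return fmt.Sprintf("%T", v) }

func main() {
	// This assignment compiles only because any and interface{} are the same type.
	var f func(interface{}) string = describeNew
	fmt.Println(describeOld(42), f("hi")) // prints: int string
}
```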


31 Oct, 2024 1 commit

Give unicode test more time to run (#7437) · 921779bb

Daniel Hiltgen authored Oct 31, 2024

* Give unicode test more time to run

Some slower GPUs (or partial CPU/GPU loads) can take more than the default 30s to complete this test.

* Give more time for concurrency test

CPU inference can be very slow under stress.


29 Oct, 2024 1 commit

tests: Add test for Unicode processing · 078f666f

Jesse Gross authored Oct 23, 2024

22 Oct, 2024 1 commit

runner.go: Merge partial unicode characters before sending · 03e40efa

Jesse Gross authored Oct 21, 2024

We check for partial unicode characters and accumulate them before
sending. However, when we did send, we still sent each individual piece
separately, leading to broken output. This combines everything into
a single group, which is also more efficient.

This also switches to the built-in check for valid unicode characters,
which is stricter. After this, we should never send back an invalid
sequence.

Fixes #7290
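
The accumulate-then-send behaviour described in this commit message can be sketched as follows. This is an illustration under assumptions, not the code in runner.go: the `accumulator` type and its `add` method are hypothetical, and the only standard-library call relied on is `utf8.Valid`, the built-in validity check.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// accumulator is a hypothetical buffer for pieces that are not yet sendable.
type accumulator struct{ buf []byte }

// add appends piece to the buffer. If the buffered bytes form valid UTF-8,
// it returns them as one combined string and clears the buffer; otherwise
// it returns "" and keeps waiting for the rest of the character.
func (a *accumulator) add(piece string) string {
	a.buf = append(a.buf, piece...)
	if !utf8.Valid(a.buf) {
		return "" // likely a partial multi-byte character
	}
	out := string(a.buf) // everything accumulated goes out as a single group
	a.buf = a.buf[:0]
	return out
}

func main() {
	var acc accumulator
	euro := "€" // three bytes in UTF-8: 0xE2 0x82 0xAC
	pieces := []string{"a", euro[:1], euro[1:2], euro[2:] + "b"}
	for _, p := range pieces {
		if out := acc.add(p); out != "" {
			fmt.Printf("%q\n", out) // prints "a", then "€b"
		}
	}
}
```

A sketch this simple would hold output indefinitely if a byte sequence never became valid; a real implementation would need a policy for that case.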


22 Jul, 2024 1 commit

int · 0f191012

Michael Yang authored Jul 03, 2024

23 Apr, 2024 2 commits

Local unicode test case · f2ea8470

Daniel Hiltgen authored Apr 16, 2024

Request and model concurrency · 34b9db5a

Daniel Hiltgen authored Mar 30, 2024

This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
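
As a rough illustration of how two such settings might be read and applied, the sketch below parses the variables with a default of 1 and uses a buffered channel as a semaphore to cap in-flight work. The helpers `envInt` and `runLimited` are hypothetical and are not taken from the ollama scheduler; treating unset, malformed, or non-positive values as "use the default" is an assumption of this sketch only.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"sync"
)

// envInt returns the positive integer stored in the named variable, or def
// when the variable is unset, malformed, or not positive (sketch assumption).
func envInt(getenv func(string) string, name string, def int) int {
	n, err := strconv.Atoi(getenv(name))
	if err != nil || n < 1 {
		return def
	}
	return n
}

// runLimited starts n jobs with at most limit running at once and returns
// the peak number of jobs observed in flight.
func runLimited(n, limit int) int {
	sem := make(chan struct{}, limit) // one slot per allowed concurrent job
	var mu sync.Mutex
	var wg sync.WaitGroup
	inFlight, peak := 0, 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit jobs are already running
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when the job ends
			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()
			mu.Lock()
			inFlight--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	parallel := envInt(os.Getenv, "OLLAMA_NUM_PARALLEL", 1)
	maxModels := envInt(os.Getenv, "OLLAMA_MAX_LOADED_MODELS", 1)
	fmt.Println("parallel:", parallel, "max loaded models:", maxModels)
	fmt.Println("peak within limit:", runLimited(8, parallel) <= parallel)
}
```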


01 Apr, 2024 1 commit

Integration test improvements · 4fec5816

Daniel Hiltgen authored Mar 27, 2024

Cleaner shutdown logic, a bit of response hardening

26 Mar, 2024 1 commit

change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347) · 1b272d5b

Patrick Devine authored Mar 26, 2024

25 Mar, 2024 1 commit

Integration tests conditionally pull · 7b6cbc10

Daniel Hiltgen authored Mar 24, 2024

If images aren't present, pull them.
Also fixes the expected responses

23 Mar, 2024 1 commit

Revamp go based integration tests · 949b6c01

Daniel Hiltgen authored Mar 23, 2024

This uplevels the integration tests to run the server themselves, which also
allows testing against an existing server or a remote server.
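
One way a test harness could choose between launching its own server and targeting one that is already running is sketched below. The environment variable `TEST_SERVER_URL`, the `target` type, and `resolveTarget` are all hypothetical names invented for this example and are not confirmed by the commit; the fallback address uses ollama's default port, 11434.

```go
package main

import (
	"fmt"
	"os"
)

// target describes where the tests should send requests.
type target struct {
	baseURL    string
	startLocal bool // true when the harness must launch its own server
}

// resolveTarget picks an existing or remote server when the (hypothetical)
// TEST_SERVER_URL variable is set, and a locally started server otherwise.
func resolveTarget(getenv func(string) string) target {
	if u := getenv("TEST_SERVER_URL"); u != "" {
		return target{baseURL: u, startLocal: false}
	}
	return target{baseURL: "http://127.0.0.1:11434", startLocal: true}
}

func main() {
	t := resolveTarget(os.Getenv)
	fmt.Printf("base URL: %s, start local server: %v\n", t.baseURL, t.startLocal)
}
```

Passing `getenv` as a parameter keeps the decision testable without touching the process environment.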
