- 29 Aug, 2025 1 commit
-
-
Daniel Hiltgen authored
* perf: build graph for next batch in parallel to keep GPU busy

  This refactors the main run loop of the ollama runner to perform the main GPU-intensive tasks (Compute+Floats) in a goroutine so we can prepare the next batch in parallel, reducing the time the GPU stalls waiting for the next batch of work.

* tests: tune integration tests for ollama engine

  This tunes the integration tests to focus more on models supported by the new engine.
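
The overlap described in the perf commit can be illustrated with a minimal sketch. This is not the runner's actual code: `batch`, `prepare`, `compute`, and `runPipelined` are hypothetical names, and the "GPU" step is simulated by a plain sum. It only shows the general shape of running the heavy step in a goroutine while the next batch is built.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for a prepared unit of work.
type batch struct {
	id     int
	inputs []float32
}

// prepare builds a batch on the CPU (hypothetical placeholder).
func prepare(id int) batch {
	return batch{id: id, inputs: []float32{float32(id), float32(id) + 1}}
}

// compute stands in for the GPU-intensive step; here it only sums the inputs.
func compute(b batch) float32 {
	var sum float32
	for _, v := range b.inputs {
		sum += v
	}
	return sum
}

// runPipelined computes batch i in a goroutine while batch i+1 is prepared
// on the calling goroutine, then waits for the result before moving on.
func runPipelined(n int) []float32 {
	results := make([]float32, 0, n)
	if n <= 0 {
		return results
	}
	next := prepare(0)
	for i := 0; i < n; i++ {
		cur := next // per-iteration copy, safe to capture in the goroutine
		done := make(chan float32, 1)
		go func() { done <- compute(cur) }()
		if i+1 < n {
			next = prepare(i + 1) // overlaps with compute(cur)
		}
		results = append(results, <-done)
	}
	return results
}

func main() {
	fmt.Println(runPipelined(3))
}
```

The buffered channel lets the compute goroutine finish without blocking even if preparation takes longer, and results are still collected in batch order.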
-
- 16 Apr, 2025 1 commit
-
-
Daniel Hiltgen authored
Add some new test coverage for various model architectures, and switch from orca-mini to the small llama model.
-
- 08 Apr, 2025 1 commit
-
-
CYJiang authored
Signed-off-by: googs1025 <googs1025@gmail.com>
-
- 02 Apr, 2025 1 commit
-
-
Bruce MacDonald authored
Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.
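
A small sketch of why the two spellings are interchangeable. The function names `describe` and `describeOld` are made up for illustration and do not come from the repository.

```go
package main

import "fmt"

// describe takes the empty interface spelled as `any`.
func describe(v any) string {
	return fmt.Sprintf("%T", v)
}

// describeOld takes the same type spelled as `interface{}`.
func describeOld(v interface{}) string {
	return describe(v)
}

func main() {
	var a any = 42
	var b interface{} = a // no conversion needed: the types are identical

	// A func(interface{}) value is assignable to a func(any) variable,
	// which only compiles because the two parameter types are the same type.
	var f func(any) string = describeOld

	fmt.Println(describe(a), describeOld(b), f("text"))
}
```

Because `any` is an alias rather than a new type, replacing one spelling with the other changes no behavior.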
-
- 10 Dec, 2024 1 commit
-
-
Stefan Weil authored
-
- 22 Nov, 2024 1 commit
-
-
Daniel Hiltgen authored
This had fallen out of sync with the envconfig behavior, where max queue default was not zero.
-
- 05 Aug, 2024 1 commit
-
-
Michael Yang authored
-
- 22 Jul, 2024 1 commit
-
-
Michael Yang authored
-
- 16 May, 2024 1 commit
-
-
Daniel Hiltgen authored
This test needs to be able to adjust the queue size down from our default setting for a reliable test, so it needs to skip on remote test execution mode.
-
- 05 May, 2024 1 commit
-
-
Daniel Hiltgen authored
-