Commits · 4e415029b30b2dc8a666491fdbe6254536e5d810 · OpenDAS / ollama

20 Nov, 2024 1 commit

runner.go: Retry decoding after defragmentation if needed · 7121dfa3

Jesse Gross authored Nov 19, 2024

Fragmentation of the KV cache can occur due to cache shifting or
different sequences getting processed. Decode uses a heuristic to
decide if it should defrag. However, this heuristic isn't 100%
accurate, so decoding can sometimes fail by surprise.

For these cases, if decode indicates that there is no KV cache space,
we should defrag and then try again.

7121dfa3

09 Jul, 2024 1 commit

Fix context exhaustion integration test for small gpus · 73e2c8f6

Daniel Hiltgen authored Jul 09, 2024

On the smaller GPUs, the initial model load of llama2 took over 30s (the
default timeout for the DoGenerate helper)

73e2c8f6

14 Jun, 2024 3 commits

review comments and coverage · 6f351bf5
Daniel Hiltgen authored Jun 05, 2024

6f351bf5
refined test timing · 68dfc623
Daniel Hiltgen authored May 31, 2024
```
adjust timing on some tests so they don't timeout on small/slow GPUs
```
68dfc623

Improve multi-gpu handling at the limit · 6fd04ca9

Daniel Hiltgen authored May 18, 2024

Still not complete, needs some refinement to our prediction to understand the
discrete GPUs available space so we can see how many layers fit in each one
since we can't split one layer across multiple GPUs we can't treat free space
as one logical block

6fd04ca9

23 Apr, 2024 1 commit

Request and model concurrency · 34b9db5a

Daniel Hiltgen authored Mar 30, 2024

This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.

34b9db5a

04 Apr, 2024 1 commit
- Add test case for context exhaustion · aeb1fb51
  Daniel Hiltgen authored Apr 04, 2024
```
Confirmed this fails on 0.1.30 with known regression
but passes on main
```
  aeb1fb51
01 Apr, 2024 1 commit
- Integration test improvements · 4fec5816
  Daniel Hiltgen authored Mar 27, 2024
```
Cleaner shutdown logic, a bit of response hardening
```
  4fec5816
26 Mar, 2024 1 commit
- change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347) · 1b272d5b
  Patrick Devine authored Mar 26, 2024
  
  1b272d5b
25 Mar, 2024 1 commit
- Integration tests conditionally pull · 7b6cbc10
  Daniel Hiltgen authored Mar 24, 2024
```
If images aren't present, pull them.
Also fixes the expected responses
```
  7b6cbc10
23 Mar, 2024 1 commit

Revamp go based integration tests · 949b6c01

Daniel Hiltgen authored Mar 23, 2024

This uplevels the integration tests to run the server which can allow
testing an existing server, or a remote server.

949b6c01