Commits · 73e2c8f68fe075ea159a20bbf778c0cf801316ad · OpenDAS / ollama

09 Jul, 2024 1 commit

Fix context exhaustion integration test for small gpus · 73e2c8f6

Daniel Hiltgen authored Jul 09, 2024

On smaller GPUs, the initial model load of llama2 took more than 30s (the
default timeout for the DoGenerate helper).


14 Jun, 2024 4 commits

review comments and coverage · 6f351bf5
Daniel Hiltgen authored Jun 05, 2024

refined test timing · 68dfc623
Daniel Hiltgen authored May 31, 2024
```
Adjust timing on some tests so they don't time out on small or slow GPUs
```

Improve multi-gpu handling at the limit · 6fd04ca9

Daniel Hiltgen authored May 18, 2024

Still not complete. Our prediction needs some refinement to account for the
free space on each discrete GPU, so we can see how many layers fit on each one.
Since we can't split one layer across multiple GPUs, we can't treat free space
as one logical block.
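
The constraint above can be sketched as follows. This is an illustration only, not the repository's prediction code: `layersPerGPU` is a hypothetical function, and it assumes every layer has the same size, which real models do not guarantee.

```go
package main

import "fmt"

// layersPerGPU (hypothetical) assigns whole layers to GPUs in order.
// A layer cannot be split across GPUs, so each GPU's free memory is
// divided by the layer size separately instead of being pooled.
// Assumption: all layers are layerBytes in size.
func layersPerGPU(freeBytes []uint64, layerBytes uint64, totalLayers int) (perGPU []int, placed int) {
	perGPU = make([]int, len(freeBytes))
	if layerBytes == 0 {
		return perGPU, 0
	}
	remaining := totalLayers
	for i, free := range freeBytes {
		if remaining == 0 {
			break
		}
		fit := int(free / layerBytes) // whole layers only; the remainder is unusable
		if fit > remaining {
			fit = remaining
		}
		perGPU[i] = fit
		remaining -= fit
	}
	return perGPU, totalLayers - remaining
}

func main() {
	// Two GPUs, each with room for 2.5 layers. Pooled, that looks like
	// 5 layers of space, but only 2 whole layers fit on each device.
	perGPU, placed := layersPerGPU([]uint64{2500, 2500}, 1000, 5)
	fmt.Println(perGPU, placed) // prints "[2 2] 4"
}
```

Treating the 5000 bytes as one logical block would predict that all 5 layers fit, while the per-device calculation places only 4.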


Fix concurrency integration test to work locally · 206797bd
Daniel Hiltgen authored May 23, 2024
```
This worked remotely, but locally it wound up trying to spawn multiple
servers, which doesn't work
```

16 May, 2024 1 commit

Skip max queue test on remote · 7f2fbad7

Daniel Hiltgen authored May 16, 2024

This test needs to be able to adjust the queue size down from
our default setting for a reliable test, so it needs to skip on
remote test execution mode.


10 May, 2024 1 commit
- Integration fixes · 074dc3b9
  Daniel Hiltgen authored May 10, 2024
  
06 May, 2024 1 commit
- update tests · a7248f6e
  Michael Yang authored Apr 16, 2024
  
05 May, 2024 1 commit
- Add integration test to push max queue limits · 45d61aaa
  Daniel Hiltgen authored May 05, 2024
  
23 Apr, 2024 2 commits

Local unicode test case · f2ea8470
Daniel Hiltgen authored Apr 16, 2024


Request and model concurrency · 34b9db5a

Daniel Hiltgen authored Mar 30, 2024

This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The defaults are
currently 1 concurrent request per model and only 1 loaded model at a
time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and
OLLAMA_MAX_LOADED_MODELS.


04 Apr, 2024 1 commit
- Add test case for context exhaustion · aeb1fb51
  Daniel Hiltgen authored Apr 04, 2024
```
Confirmed this fails on 0.1.30 with the known regression
but passes on main
```
03 Apr, 2024 1 commit
- Fix macOS builds on older SDKs (#3467) · cd135317
  Jeffrey Morgan authored Apr 03, 2024
  
01 Apr, 2024 1 commit
- Integration test improvements · 4fec5816
  Daniel Hiltgen authored Mar 27, 2024
```
Cleaner shutdown logic, a bit of response hardening
```
26 Mar, 2024 1 commit
- change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347) · 1b272d5b
  Patrick Devine authored Mar 26, 2024
  
25 Mar, 2024 1 commit
- Integration tests conditionally pull · 7b6cbc10
  Daniel Hiltgen authored Mar 24, 2024
```
If images aren't present, pull them.
Also fixes the expected responses
```
23 Mar, 2024 1 commit

Revamp go based integration tests · 949b6c01

Daniel Hiltgen authored Mar 23, 2024

This uplevels the integration tests to run the server, which also allows
testing an existing server or a remote server.
