Commits · 45cacbaf0568a4d38d74ebdd0957fe01bd06719d · OpenDAS / ollama

14 Jun, 2024 2 commits

Reintroduce nvidia nvml library for windows · 434dfe30

Daniel Hiltgen authored Jun 03, 2024

This library will give us the most reliable free VRAM reporting on Windows
to enable concurrent model scheduling.

Refine GPU discovery to bootstrap once · 43ed358f

Daniel Hiltgen authored May 15, 2024

Now that we call the GPU discovery routines many times to
update memory, this splits initial discovery from free memory
updating.

23 Apr, 2024 1 commit

Request and model concurrency · 34b9db5a

Daniel Hiltgen authored Mar 30, 2024

This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.

01 Apr, 2024 2 commits

Release gpu discovery library after use · 526d4eb2

Daniel Hiltgen authored Mar 30, 2024

Leaving the cudart library loaded kept ~30 MB of memory
pinned on the GPU by the main process. This change ensures
we don't hold GPU resources when idle.

Detect too-old cuda driver · 10ed1b62

Daniel Hiltgen authored Mar 28, 2024

"cudart init failure: 35" isn't particularly helpful in the logs.

25 Mar, 2024 1 commit

add support for libcudart.so for CUDA devices (adds Jetson support) · dfc6721b

Jeremy authored Mar 25, 2024