Commits · 920a4b0794d997b46aac502560e7dc2860875e1f · orangecat / ollama

01 May, 2024 1 commit

Add CUDA Driver API for GPU discovery · 089daaea

Daniel Hiltgen authored Apr 30, 2024

We're seeing some corner cases with cudart that might be resolved by
switching to the driver API, which comes bundled with the driver package.

23 Apr, 2024 1 commit

Request and model concurrency · 34b9db5a

Daniel Hiltgen authored Mar 30, 2024

This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.

01 Apr, 2024 2 commits

Release gpu discovery library after use · 526d4eb2

Daniel Hiltgen authored Mar 30, 2024

Leaving the cudart library loaded kept ~30m of memory
pinned in the GPU in the main process.  This change ensures
we don't hold GPU resources when idle.

Detect too-old cuda driver · 10ed1b62

Daniel Hiltgen authored Mar 28, 2024

"cudart init failure: 35" isn't particularly helpful in the logs.

25 Mar, 2024 1 commit

add support for libcudart.so for CUDA devices (adds Jetson support) · dfc6721b

Jeremy authored Mar 25, 2024
